Question

How does the shape of clusters create a challenge when implementing a clustering algorithm? How would...

How does the shape of clusters create a challenge when implementing a clustering algorithm?

How would you pick k when using the k-Means algorithm? Explain your reasoning.

Answer #1

Assume the number of clusters k is fixed in advance, so the algorithm returns exactly k clusters. Every data point must then be assigned to some cluster, including points that share little similarity with any of them. Such points distort the shape of the cluster they join, which makes the resulting clusters harder to interpret accurately.

For example, consider a graph in which, for k=2, two clearly distinct clusters form, but a point x lies almost equidistant from both. Whichever cluster x joins, it changes that cluster's shape and reduces the similarity among that cluster's data points.
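The effect of such a point can be checked numerically. The sketch below is only an illustration under assumed data: the two clusters, the point x, and the helper names centroid and sum_of_squares are all hypothetical and not taken from the original graph. It assigns x to the nearer centroid, as k-means would, and compares the cluster's centroid and spread before and after.

```python
from math import dist  # available in Python 3.8+


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def sum_of_squares(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


# Two compact, well-separated clusters (made-up data).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
cluster_b = [(9.0, 0.0), (9.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A point lying almost midway between the two clusters.
x = (4.9, 0.5)

# k-means has no "belongs to neither" option: x goes to the nearer centroid.
dist_a = dist(x, centroid(cluster_a))
dist_b = dist(x, centroid(cluster_b))
joins_a = dist_a < dist_b

before = sum_of_squares(cluster_a)
after = sum_of_squares(cluster_a + [x])

print("x joins cluster A:", joins_a)
print("centroid of A before:", centroid(cluster_a))
print("centroid of A after: ", centroid(cluster_a + [x]))
print("spread of A before:", before, "after:", after)
```

With this data, x is only marginally closer to cluster A, yet adding it pulls A's centroid toward the gap and increases A's sum of squares several times over.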

We can pick a suitable k using the elbow method. We plot WCSS against k, where WCSS (within-cluster sum of squares) is the sum of squared distances between each data point and the centroid of its cluster, added up over all k clusters. WCSS generally decreases as k grows, so the optimal k is the value at the "elbow" of the curve: the point after which WCSS stops dropping sharply and decreases only slowly and roughly linearly. Up to that point, each additional cluster makes the clusters noticeably more compact, so more similar data points are grouped together; beyond it, extra clusters add little.
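For reference, WCSS is commonly written as below. The symbols are notation introduced here, not taken from the question: C_j is the set of points in cluster j, and mu_j is that cluster's centroid.

```latex
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```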

For example, if the elbow of the plot falls at k=y, then k=y is the optimal value for k.
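The elbow method can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The data (three tight groups of points) is made up. The farthest-point seeding is chosen only to keep the run deterministic; libraries typically use randomized seeding with several restarts. Locating the elbow by the largest second difference of WCSS is one simple heuristic; in practice the plot is often read by eye.

```python
from math import dist  # available in Python 3.8+


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption made for reproducibility):
    start at the first point, then keep adding the point that is farthest
    from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign points, recompute centroids, repeat."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def wcss(points, k):
    """Sum of squared distances from each point to its cluster centroid."""
    centroids, clusters = kmeans(points, k)
    return sum(dist(p, c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)


# Made-up data: three tight groups of five points each.
offsets = [(0.0, 0.0), (-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

ks = list(range(1, 7))
scores = {k: wcss(points, k) for k in ks}

# Heuristic: the elbow is where the curve bends most sharply, i.e. where
# the second difference of WCSS is largest (endpoints are excluded).
elbow_k = max(ks[1:-1], key=lambda k: scores[k - 1] - 2 * scores[k] + scores[k + 1])

for k in ks:
    print(k, round(scores[k], 3))
print("elbow at k =", elbow_k)
```

On this data the curve falls steeply up to k=3 and is nearly flat afterwards, so the heuristic selects k=3, which matches the three groups the data was built from.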
