Explain the five methods used to measure the distance between clusters.
Euclidean distance
This is the most common, “natural” and intuitive way of computing a distance between two samples. It is the square root of the sum of squared differences across all dimensions, so it reflects the magnitude of the differences in the sample levels directly. This distance type is usually used for data sets that are suitably normalized or that have no special distribution problem.
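As a minimal sketch of the Euclidean distance between two samples, assuming each sample is a plain Python list of numbers (the function name `euclidean_distance` is chosen here for illustration only):

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dimensions")
    # Square root of the sum of squared per-dimension differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # prints 5.0
```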
Manhattan distance
Also known as city-block distance, this distance measurement is especially relevant for discrete data sets. While the Euclidean distance corresponds to the length of the shortest path between two samples (i.e. “as the crow flies”), the Manhattan distance refers to the sum of distances along each dimension (i.e. “walking round the block”).
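The "walking round the block" idea can be sketched as a sum of absolute differences along each dimension. The function name `manhattan_distance` and the list-based sample format are assumptions made for this example:

```python
def manhattan_distance(a, b):
    """City-block distance: sum of absolute differences per dimension."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))

# Same pair of points as a 3-4-5 triangle: the crow flies 5, the walker covers 7.
print(manhattan_distance([0, 0], [3, 4]))  # prints 7
```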
Pearson Correlation distance
This distance is based on the Pearson correlation coefficient, which is calculated from the sample values and their standard deviations. The correlation coefficient r takes values from –1 (perfect negative correlation) to +1 (perfect positive correlation). The Pearson distance dp is computed as dp = 1 - r and lies between 0 (when the correlation coefficient is +1, i.e. the two samples are most similar) and 2 (when the correlation coefficient is -1). Note that the data are centered by subtracting the mean and scaled by dividing by the standard deviation.
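A rough sketch of the Pearson distance dp = 1 - r, assuming two equal-length lists of numbers. The helper names `pearson_r` and `pearson_distance` are illustrative and not taken from any particular library:

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Center each sample by subtracting its mean.
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    # Dividing by this term is equivalent to scaling by both standard deviations.
    denom = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sum(x * y for x, y in zip(ca, cb)) / denom

def pearson_distance(a, b):
    """Pearson distance: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson_r(a, b)

print(pearson_distance([1, 2, 3], [2, 4, 6]))  # perfectly correlated samples
print(pearson_distance([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated samples
```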
Absolute Pearson Correlation distance
In this distance, the absolute value of the Pearson correlation coefficient is used; hence the corresponding distance lies between 0 and 1, just like the absolute value of the correlation coefficient.
The equation for the Absolute Pearson distance -da- is:
da = 1 - |r|
Taking the absolute value gives equal meaning to positive and negative correlations, so anti-correlated samples are clustered together with positively correlated ones.
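To illustrate how anti-correlated samples end up at distance 0, here is a small sketch of the absolute Pearson distance. It assumes list inputs, and the names `pearson_r` and `absolute_pearson_distance` are hypothetical helpers written for this example:

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sum(x * y for x, y in zip(ca, cb)) / denom

def absolute_pearson_distance(a, b):
    """One minus the absolute correlation, so the sign of r is ignored."""
    return 1.0 - abs(pearson_r(a, b))

# A rising and a falling sample are perfectly anti-correlated (r = -1),
# yet their absolute Pearson distance is 0.
print(absolute_pearson_distance([1, 2, 3], [3, 2, 1]))
```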
Un-centered Correlation distance
This is the same as the Pearson correlation distance, except that the sample means are set to zero in the expression for the un-centered correlation coefficient. The un-centered correlation coefficient lies between –1 and +1; hence the distance lies between 0 and 2.
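A possible sketch of the un-centered correlation distance follows, assuming list inputs. With the means set to zero, the coefficient equals the cosine of the angle between the two sample vectors. The function names are illustrative only:

```python
import math

def uncentered_r(a, b):
    """Correlation computed without subtracting the sample means."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dimensions")
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0:
        raise ValueError("correlation is undefined for an all-zero sample")
    return sum(x * y for x, y in zip(a, b)) / denom

def uncentered_distance(a, b):
    """Un-centered correlation distance, between 0 and 2."""
    return 1.0 - uncentered_r(a, b)

# Scaling a sample leaves the distance at 0, but shifting it by a constant
# does not, because the offset is no longer removed by centering.
print(uncentered_distance([1, 2, 3], [2, 4, 6]))
print(uncentered_distance([1, 2, 3], [11, 12, 13]))
```

In the second call the Pearson distance would be 0, since the two samples differ only by a constant offset; the un-centered version treats that offset as a real difference.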