Cumberlands Chapter 8 Cluster Analysis Problems

Instructions

10% WILL BE DEDUCTED IF YOU CREATE A NEW OR SEPARATE DOCUMENT.

10% WILL BE DEDUCTED IF YOU CREATE A TITLE PAGE TYPE OF DOCUMENT.

1. Consider a data set consisting of 2^20 data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^16 prototype vectors are used. How many bytes of storage does that data set take before and after compression, and what is the compression ratio?
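
The bookkeeping behind a vector quantization storage estimate can be sketched as follows. This is only an illustration on a smaller, made-up data set, not the one in the problem. It assumes that each data vector is replaced by a whole-byte index into the prototype table and that the prototype table itself must also be stored. The function name vq_storage and the example sizes are hypothetical.

```python
def vq_storage(n_vectors, n_components, bytes_per_component, n_prototypes):
    """Return (raw_bytes, compressed_bytes, ratio) for vector quantization.

    Assumption: each vector is stored as an index into the prototype
    table, rounded up to a whole number of bytes, and the table of
    prototypes is stored alongside the indices.
    """
    # Uncompressed: every component of every vector is stored.
    raw_bytes = n_vectors * n_components * bytes_per_component

    # Prototype table: one full vector per prototype.
    codebook_bytes = n_prototypes * n_components * bytes_per_component

    # Bits needed to address any prototype, rounded up to whole bytes.
    index_bits = (n_prototypes - 1).bit_length()
    index_bytes = (index_bits + 7) // 8

    compressed_bytes = codebook_bytes + n_vectors * index_bytes
    return raw_bytes, compressed_bytes, raw_bytes / compressed_bytes


# Made-up example: 4096 vectors, 8 components, 4-byte values, 256 prototypes.
raw, compressed, ratio = vq_storage(4096, 8, 4, 256)
print(raw, compressed, round(ratio, 2))
```

The same three quantities (raw size, prototype table size, index size) are what the problem asks you to work out by hand for its own numbers.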

2. Find all well-separated clusters in the set of points shown below.

Note: take a photo of the clusters, print the photo, circle the clusters, take another photo, and paste your response into this document.
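
For reference, under the usual well-separated definition every point in a cluster is closer to every other point in that cluster than to any point outside it. The sketch below checks that condition for a proposed labeling. The function name is_well_separated and the sample points are hypothetical and are not taken from the figure.

```python
from math import dist


def is_well_separated(points, labels):
    """Check the well-separated condition for a proposed clustering.

    For every point, its farthest neighbor inside its own cluster must
    be strictly closer than its nearest point in any other cluster.
    """
    for i, p in enumerate(points):
        inside = [dist(p, q) for j, q in enumerate(points)
                  if j != i and labels[j] == labels[i]]
        outside = [dist(p, q) for j, q in enumerate(points)
                   if labels[j] != labels[i]]
        # Singleton clusters, or a single overall cluster, pass trivially.
        if inside and outside and max(inside) >= min(outside):
            return False
    return True


# Made-up points: two tight groups that are far apart.
sample = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(is_well_separated(sample, [0, 0, 0, 1, 1]))  # groups respected
print(is_well_separated(sample, [0, 0, 1, 1, 1]))  # one point mislabeled
```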

3. Identify the clusters in the figure below using the center-, contiguity-, and density-based definitions. Also indicate the number of clusters for each case and give a brief indication of your reasoning. Note that darkness or the number of dots indicates density. If it helps, assume center-based means K-means, contiguity-based means single link, and density-based means DBSCAN.

4. For the following sets of two-dimensional points, (1) provide a sketch of how they would be split into clusters by K-means for the given number of clusters and (2) indicate approximately where the resulting centroids would be. Assume that we are using the squared error objective function. If you think that there is more than one possible solution, then please indicate whether each solution is a global or local minimum.
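
The difference between a global and a local minimum of the squared error can be seen on a tiny made-up example. The sketch below is a minimal K-means (Lloyd's iteration) in plain Python, run from two different starting centroids on the four corners of a 4-by-1 rectangle. The points, the starting centroids, and the helper names are hypothetical and are unrelated to the point sets in the problem.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Basic K-means from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Corners of a wide rectangle, K = 2.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Starting between the left and right pairs gives the left/right split.
_, sse_left_right = kmeans(corners, [(0, 0.5), (4, 0.5)])

# Starting between the top and bottom pairs gets stuck in a top/bottom split.
_, sse_top_bottom = kmeans(corners, [(2, 0), (2, 1)])

print(sse_left_right, sse_top_bottom)
```

Both runs converge, because neither assignment changes after the first iteration, but the top/bottom split has a much larger squared error. It is a stable solution that is only a local minimum, while the left/right split is the global one for this example.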

