I Datasets
Construct your own toy datasets in 2D, such as Gaussian clusters with more or less overlap, or clusters with curved
shapes as in the 2moons dataset. You will also use the MNIST dataset of handwritten digits.
Since clustering algorithms are unsupervised, they do not use the class labels yn ∈ {0,...,9}, only the instances xn ∈ RD (where D = 784). You can still use the labels to check whether they agree with the clusters found by an algorithm.
II Using clustering algorithms
Algorithms and plots We consider the following algorithms: k-means, EM for Gaussian mixtures with full covariances, mean-shift and connected-components. Figure 1 shows pseudocode for the algorithms. For algorithms that only converge in the limit of infinitely many iterations (EM, mean-shift), we stop after maxit iterations (e.g. 100) or once the error function changes by less than a small tolerance tol (e.g. 10⁻³). With toy datasets in 2D, we plot the following figures:
• For every algorithm: the dataset in 2D, with points colored according to the cluster they belong to.
• For k-means: the value of the error function after each iteration; it should decrease monotonically and stop in a finite number of iterations (see the k-means sketch after this list).
• For EM with Gaussian mixtures: the value of the log-likelihood function after each iteration; it should increase monotonically (see the EM sketch after this list). To get a hard clustering, we assign point xn to cluster k if p(k|xn) > p(j|xn) ∀j ≠ k. To get a soft clustering (which is more informative), we plot p(k|x) itself for each cluster as a function of x ∈ R2 (as a color plot, or as a contour plot for each cluster).
• For mean-shift: the contours of the kernel density estimate and its modes; and, for any given point xn, the value of the density p(x) after each mean-shift iteration (initialized from xn), which should increase monotonically (see the mean-shift sketch after this list).
• For connected-components: the dataset in 2D, with points connected by edges in the ε-ball graph (see the connected-components sketch after this list).
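These checks translate directly into code. Below is a minimal k-means sketch in MATLAB (a hypothetical helper, not the code in lab04.m; it assumes R2016b or later for implicit expansion). It records the error function after each iteration and uses the maxit/tol stopping rule described above, so plot(E) should be monotonically decreasing.

  function [labels, C, E] = kmeans_sketch(X, K, maxit, tol)
  % X: N x D dataset, K: number of clusters. Returns cluster labels, the
  % centroids C, and the error function value E(it) after each iteration.
  [N, ~] = size(X);
  C = X(randperm(N, K), :);                    % initialize centroids at K random points
  E = zeros(maxit, 1);
  for it = 1:maxit
    % Assignment step: each point goes to its nearest centroid.
    D2 = sum(X.^2, 2) + sum(C.^2, 2)' - 2*(X*C');    % squared Euclidean distances
    [e, labels] = min(D2, [], 2);
    E(it) = sum(e);                            % error function: sum of squared distances
    % Update step: each centroid becomes the mean of its assigned points.
    for k = 1:K
      idx = labels == k;
      if any(idx), C(k, :) = mean(X(idx, :), 1); end   % keep old centroid if cluster is empty
    end
    if it > 1 && E(it-1) - E(it) < tol         % stop once the error barely changes
      E = E(1:it); return
    end
  end
  end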
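A corresponding EM sketch for a Gaussian mixture with full covariances, under the same assumptions. It tracks the log-likelihood per iteration and returns the posteriors R(n,k) = p(k|xn); a hard clustering is then [~, labels] = max(R, [], 2), and the soft-clustering plots use the columns of R (or p(k|x) evaluated on a grid of points x) directly.

  function [R, mu, S, ppi, L] = gmm_em_sketch(X, K, maxit, tol)
  % X: N x D dataset. Returns posteriors R(n,k) = p(k|xn), means mu (K x D),
  % covariances S (D x D x K), proportions ppi (1 x K), log-likelihoods L.
  [N, D] = size(X);
  mu = X(randperm(N, K), :);                   % means: K random data points
  S = repmat(var(X(:)) * eye(D), [1 1 K]);     % covariances: spherical start
  ppi = ones(1, K) / K;                        % mixing proportions: uniform start
  L = zeros(maxit, 1); logp = zeros(N, K);
  for it = 1:maxit
    % E step: log of pi_k * N(xn; mu_k, Sigma_k), via Cholesky for stability.
    for k = 1:K
      Rk = chol(S(:, :, k));                   % Sigma_k = Rk' * Rk
      Xc = (X - mu(k, :)) / Rk;                % centered, whitened data
      logp(:, k) = log(ppi(k)) - sum(log(diag(Rk))) ...
                   - D/2*log(2*pi) - 0.5*sum(Xc.^2, 2);
    end
    m = max(logp, [], 2);                      % log-sum-exp trick
    L(it) = sum(m + log(sum(exp(logp - m), 2)));       % log-likelihood
    R = exp(logp - m); R = R ./ sum(R, 2);     % posteriors p(k|xn)
    % M step: update proportions, means and covariances from the posteriors.
    Nk = sum(R, 1);
    ppi = Nk / N;
    for k = 1:K
      mu(k, :) = R(:, k)' * X / Nk(k);
      Xc = X - mu(k, :);
      Sk = (Xc .* R(:, k))' * Xc / Nk(k);
      S(:, :, k) = Sk + 1e-10 * trace(Sk)/D * eye(D);  % regularization (see the Notes)
    end
    if it > 1 && L(it) - L(it-1) < tol, L = L(1:it); return, end
  end
  end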
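A minimal mean-shift sketch for the Gaussian kernel (again hypothetical). The kernel density at z is proportional to sum(w) below (up to the constant 1/(N (2πσ²)^(D/2))), so recording sum(w) per iteration gives the monotone density plot described above.

  function [Z, labels] = meanshift_sketch(X, sigma, maxit, tol)
  % Mean-shift with a Gaussian kernel of bandwidth sigma. Each point is
  % moved uphill on the kernel density estimate until it reaches a mode.
  N = size(X, 1);
  Z = X;                                       % one iterate per data point
  for n = 1:N
    z = Z(n, :);
    for it = 1:maxit
      w = exp(-sum((X - z).^2, 2) / (2*sigma^2));      % kernel weights; p(z) ~ sum(w)
      znew = (w' * X) / sum(w);                % mean-shift update: weighted mean
      if norm(znew - z) < tol, break, end      % converged to a mode
      z = znew;
    end
    Z(n, :) = z;
  end
  % Simple merging heuristic: iterates that ended at the same mode
  % (up to a coarse quantization) form one cluster.
  [~, ~, labels] = unique(round(Z / (10*tol)), 'rows');
  end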
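Finally, a connected-components sketch for the ε-ball graph with the Euclidean distance (hypothetical helper; any distance function d(·,·) could replace the squared-distance computation).

  function labels = conncomp_sketch(X, epsilon)
  % Connected components of the eps-ball graph: points xi, xj are joined
  % by an edge iff their Euclidean distance is at most epsilon.
  N = size(X, 1);
  D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2*(X*X');        % pairwise squared distances
  A = (D2 <= epsilon^2) & ~eye(N);             % adjacency matrix of the graph
  labels = zeros(N, 1); c = 0;
  for s = 1:N
    if labels(s) == 0                          % unvisited point: start a new component
      c = c + 1; stack = s;
      while ~isempty(stack)                    % depth-first traversal
        i = stack(end); stack(end) = [];
        if labels(i) == 0
          labels(i) = c;
          stack = [stack; find(A(:, i) & labels == 0)];        % push unvisited neighbors
        end
      end
    end
  end
  end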
With the MNIST dataset, try EM with Gaussian mixtures (and also k-means) with different values of K and plot:
1. The mean µk of each cluster k = 1,...,K, as a grayscale image, with its mixing proportion πk = p(k).
2. The posterior probabilities p(k|xn) for k = 1,...,K for a given digit image xn, plotted as a bar chart (see the plotting sketch after this list).
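A hypothetical plotting sketch for these two items, assuming 28 × 28 images stored as rows (D = 784) and the outputs mu, ppi, R of the EM sketch above:

  K = size(mu, 1);
  n = 1;                                       % index of the digit image xn to inspect
  figure;
  for k = 1:K                                  % item 1: cluster means as grayscale images
    subplot(2, K, k);
    imagesc(reshape(mu(k, :), [28 28])');      % drop the transpose if digits appear mirrored
    colormap gray; axis image off;
    title(sprintf('\\pi_%d = %.2f', k, ppi(k)));       % mean mu_k with its proportion pi_k
  end
  subplot(2, 1, 2);                            % item 2: posterior bar chart for xn
  bar(R(n, :)); xlabel('cluster k'); ylabel('p(k | x_n)');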
Exploration Explore each algorithm in different settings. First, using the same dataset:
• Try different values of the user parameter (number of clusters K for k-means and Gaussian mixtures with EM, bandwidth σ > 0 for mean-shift, scale ε > 0 for connected-components).
• For algorithms that depend on the initialization (k-means and EM), try different random initializations.
Then, explore the algorithms and plots with different datasets, number of clusters, clusters with more or less overlap, with different shapes, etc. See the end of file lab04.m for suggestions of things to explore.
Notes
• Some of these algorithms may be slow (in particular, EM and mean-shift). Use small datasets to get results fast.
• For EM with Gaussian mixtures, we add a small number to the diagonal of each covariance matrix Σk (e.g. 10⁻¹⁰ tr(Σk)/D) to ensure Σk is full rank (hence invertible). We do this right after updating Σk in the M step, as in the one-line sketch below.
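In code, assuming Sk holds the just-updated D × D matrix Σk (as in the EM sketch above), this note is one line:

  Sk = Sk + 1e-10 * trace(Sk)/D * eye(D);      % add to the diagonal so Sigma_k is full rank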
Figure 1: Pseudocode for k-means, EM for Gaussian mixtures, mean-shift for the Gaussian kernel and connected-components for an ε-ball graph (using a distance function d(·,·)). In all cases, the input is a dataset x1,...,xN ∈ RD and a user parameter: number of clusters K for k-means and Gaussian mixtures with EM, bandwidth σ > 0 for mean-shift, and scale ε > 0 for connected-components.