Machine Learning Homework 6 - Kernel K-means and Spectral Clustering
I. Homework Objective: Implement kernel k-means and spectral clustering (both normalized cut and ratio cut) in the language of your choice. The clustering should take both spatial similarity and color similarity into account.
II. Data: Two 100×100 images are provided. Each pixel in an image is treated as a data point, so each image contains 10,000 data points.
III. Kernel: For both kernel k-means and spectral clustering, please use the new kernel defined below to compute the Gram matrix.
$$k(x, x') = e^{-\gamma_s \lVert S(x) - S(x') \rVert^2} \times e^{-\gamma_c \lVert C(x) - C(x') \rVert^2}$$
This newly defined kernel is the product of two RBF kernels, so it accounts for spatial similarity and color similarity at the same time. 𝑆(𝑥) is the spatial information (i.e., the coordinates of the pixel) of data point 𝑥, and 𝐶(𝑥) is its color information (i.e., the RGB values). Both 𝛾𝑠 and 𝛾𝑐 are hyperparameters that you can tune as you see fit.
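A minimal NumPy sketch of how the Gram matrix for this kernel could be computed is given below. The function name `gram_matrix`, the `(H, W, 3)` image layout, the use of (row, column) indices as 𝑆(𝑥), and the default values of `gamma_s` and `gamma_c` are all assumptions made for illustration; they are not prescribed by the assignment.

```python
import numpy as np


def gram_matrix(image, gamma_s=1e-3, gamma_c=1e-3):
    """Gram matrix of the spatial-times-color RBF kernel.

    image: array of shape (H, W, 3) holding RGB values (assumed layout).
    gamma_s, gamma_c: hyperparameters; the defaults are arbitrary placeholders.
    Returns an (H*W, H*W) matrix, with pixels ordered row by row.
    """
    h, w, _ = image.shape
    # S(x): the (row, col) coordinate of every pixel, shape (H*W, 2)
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    spatial = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(np.float64)
    # C(x): the RGB values of every pixel, shape (H*W, 3)
    color = image.reshape(-1, 3).astype(np.float64)

    def sq_dists(a):
        # Pairwise squared Euclidean distances: ||a_i||^2 + ||a_j||^2 - 2 a_i.a_j
        sq = np.sum(a * a, axis=1)
        d = sq[:, None] + sq[None, :] - 2.0 * (a @ a.T)
        return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

    # Product of the spatial RBF kernel and the color RBF kernel
    return np.exp(-gamma_s * sq_dists(spatial)) * np.exp(-gamma_c * sq_dists(color))
```

For a 100×100 image the result is a 10,000×10,000 matrix, which takes about 800 MB in float64, so it may be worth computing it once and reusing it for every method.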
IV. Requirements:
‣ Part 1: Make videos or GIF images that show the clustering procedure of your kernel k-means and spectral clustering (both normalized cut and ratio cut) programs: visualize the cluster assignments of the data points at each iteration, giving each cluster a different color. (Hint: NumPy can help you solve the eigenvalue problem.)
‣ Part 2: In addition to clustering the data into 2 clusters, try more clusters (e.g., 3 or 4) and show your results. (You also need to make videos or GIF images.)
‣ Part 3: Try different initialization methods (e.g., k-means++) for the k-means clustering used in kernel k-means and in spectral clustering (both normalized cut and ratio cut), and show the corresponding results. (You also need to make videos or GIF images.)
‣ Part 4: For spectral clustering (both normalized cut and ratio cut), try to examine whether the data points within the same cluster have the same coordinates in the eigenspace of the graph Laplacian. You should plot the result and discuss it in the report.
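The two building blocks behind these requirements can be sketched in NumPy as follows. This is only an illustrative outline under stated assumptions, not a reference solution: the function names `kernel_kmeans` and `spectral_embedding` are hypothetical, the initialization is plain random assignment (one of several options Part 3 asks you to compare), ratio cut is taken to use the unnormalized Laplacian L = D - W, and normalized cut is taken to use the generalized eigenproblem L u = λ D u, solved here through the symmetric normalized Laplacian.

```python
import numpy as np


def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Kernel k-means on a Gram matrix K; returns final labels and per-iteration labels."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)  # assumption: random initial assignment
    history = [labels.copy()]            # one entry per frame of a video or GIF
    k_diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # empty cluster: its distance stays at infinity
            # ||phi(x) - mu_c||^2 = K_xx - (2/|C|) sum_j K_xj + (1/|C|^2) sum_jl K_jl
            dist[:, c] = (k_diag
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        history.append(new_labels.copy())
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, history


def spectral_embedding(W, k, normalized=False):
    """Eigenspace coordinates (n, k) from the k smallest Laplacian eigenvalues."""
    d = W.sum(axis=1)
    L = np.diag(d) - W  # unnormalized graph Laplacian (ratio cut)
    if normalized:
        d_inv_sqrt = 1.0 / np.sqrt(d)
        # Symmetric normalized Laplacian D^{-1/2} L D^{-1/2}
        L = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    U = eigvecs[:, :k]
    if normalized:
        # Map back to the solutions of L u = lambda D u (normalized cut relaxation)
        U = d_inv_sqrt[:, None] * U
    return U
```

The rows of the matrix returned by `spectral_embedding` are the points that k-means then clusters, and they are also the coordinates Part 4 asks you to plot. Each entry of `history` can be reshaped to the image size and drawn as one frame of the video or GIF.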