Starting from:

$25

DASC521 -Machine Learning Homework 09 - Spectral Clustering - Solved

In this homework, you will implement a spectral clustering algorithm in Python. Here are the steps you need to follow:

1.      You are given a two-dimensional data set in the file named hw08_data_set.csv, which contains 1000 data points generated randomly from nine bivariate Gaussian

densities with the following parameters.  
 
𝜇! = #++55..00( ,
Σ! = #+−00..86
−0.6( ,

+0.8
𝑁! = 100
𝜇" = #−+55..00( ,
Σ" = #++00..86
+0.6( ,

+0.8
𝑁" = 100
𝜇# = #−−55..00( ,
Σ# = #+−00..86
−0.6( ,

+0.8
𝑁# = 100
𝜇$ = #+−55..00( ,
Σ$ = #++00..86
+0.6( ,

+0.8
𝑁$ = 100
𝜇% = #++50..00( ,
Σ% = #++00..20
+0.0( ,

+1.2
𝑁% = 100
𝜇& = #++05..00( ,
Σ& = #++10..20
+0.0( ,

+0.2
𝑁& = 100
𝜇’ = #−+50..00( ,
Σ’ = #++00..20
+0.0( ,

+1.2
𝑁’ = 100
𝜇( = #+−05..00( ,
Σ( = #++10..20
+0.0( ,

+0.2
𝑁( = 100
𝜇) = #++00..00( ,
Σ) = #++10..60
+0.0( ,

+1.6
𝑁) = 200
The given data points are shown in the following figure.

 

2.      You should first calculate the Euclidean distances between the pairs of data points. The data point pairs with distance less than 𝛿 = 2.0 are considered as connected. Construct the matrix 𝐁 as follows:

 

1,

𝑏*+ = 4

0,

𝑏** = 0
5𝒙* − 𝒙+5" < 𝛿 otherwise.
 

You should also visualize this connectivity matrix by drawing a line between two data points if they are connected. Your figure should be like the following figure. (20 points)

 

 

3.      You should then calculate 𝐃 and 𝐋 matrices as described in the lecture notes. You should normalize the Laplacian matrix using the formula below. (20 points)

 

𝐋,-../01*2 = 𝐈 − 𝐃3!/"𝐁𝐃3!/"

 

4.      Find the eigenvectors of the normalized Laplacian matrix and pick 𝑅 = 5 eigenvectors that corresponds to 𝑅 smallest eigenvectors (eigenvectors that corresponds to 2nd smallest, 3rd smallest, 4th smallest, 5th smallest and 6th smallest eigenvalues since the smallest eigenvalue is 0). Using these eigenvectors construct the matrix 𝐙 as described in the lecture notes. Please note that the eigenvalues might not be returned in a decreasing or increasing order from the eig function. (20 points)

 

5.      Run k-means clustering algorithm on 𝐙 matrix to find 𝐾 = 5 clusters. When initializing your algorithm, use the following rows of 𝐙 matrix for initial centroids: 242, 528, 570, 590, 648, 667, 774, 891, and 955. (20 points)

 

6.      Draw the clustering result obtained by your spectral clustering algorithm by coloring each cluster with a different color. Your figure should be like the following figure. (20 points) 

More products