task 3.1: fun with k-means clustering: Download the 2D data in the file data-clustering-1.csv and plot it; the result should look something like this
Next, implement
• Lloyd’s algorithm for k-means clustering (e.g. simply using scipy)
• Hartigan’s algorithm for k-means clustering
• MacQueen’s algorithm for k-means clustering
For k = 3, run each algorithm on the above data and plot your results. In fact, run each of them several times and look at the results. What do you observe? Are the results always the same or do they vary from run to run?
Measure the run times of each of your implementations (run them each at least 10 times and determine their average run times). What do you observe?
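To convey the kind of implementation that is expected, here is a minimal sketch of Lloyd's algorithm together with a simple timing loop; Hartigan's and MacQueen's variants can follow the same skeleton. It assumes the CSV stores one comma-separated 2D point per row and transposes otherwise; adapt the loading step if the actual file layout differs.

```python
import time
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=None):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # initialize the k centroids with randomly chosen data points
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute centroids as cluster means
        # (keep the old centroid if a cluster happens to become empty)
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # converged
            break
        mu = new_mu
    return labels, mu

X = np.loadtxt("data-clustering-1.csv", delimiter=",")
if X.shape[0] == 2:                   # transpose if the points are stored as columns
    X = X.T

times = []
for _ in range(10):                   # average the run time over 10 runs
    t0 = time.perf_counter()
    labels, mu = lloyd_kmeans(X, k=3)
    times.append(time.perf_counter() - t0)
print("average run time: %.4f s" % np.mean(times))
```

The random choice of initial centroids is the only source of randomness in this sketch.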
task 3.2: spectral clustering: Download data-clustering-2.csv and plot the 2D data in this file; the result should look something like this
Set k = 2 and apply the k-means algorithms you implemented in the previous task to this data. What do you observe?
Next, implement spectral clustering. Proceed as follows: Let $\{x_1, \ldots, x_n\} \subset \mathbb{R}^2$ be the given data. First, compute an $n \times n$ similarity matrix $S$ where
$$S_{ij} = e^{-\beta \, \lVert x_i - x_j \rVert^2}$$
and then compute the Laplacian matrix $L = D - S$ where the diagonal matrix $D$ is given by
$$D_{ij} = \begin{cases} \sum_{k=1}^{n} S_{ik} & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$
Note that row $i$ of $L$ can be understood as a feature vector $f(x_i)$ for data point $x_i$. Next, compute the eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $L$ and sort them in ascending order, that is, let $\lambda_n$ denote the largest eigenvalue and $u_n$ the corresponding eigenvector.
The eigenvector $u_2$ that corresponds to the second smallest eigenvalue $\lambda_2$ is called the Fiedler vector and is of particular significance in clustering. You will find that some of its entries are greater than 0 and some are less than 0. To cluster the given data into two clusters $C_1$ and $C_2$, do the following: if entry $i$ of $u_2$ is greater than 0, assign $x_i$ to cluster $C_1$; if it is less than 0, assign $x_i$ to cluster $C_2$.
Set β to some value (say β = 1), cluster the data as described, and plot your results. What do you observe?
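One possible NumPy rendering of this procedure, again assuming one point per row in the CSV (transposed otherwise); note that np.linalg.eigh already returns the eigenvalues of the symmetric matrix L in ascending order, so the Fiedler vector is simply the second column of the returned eigenvector matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt("data-clustering-2.csv", delimiter=",")
if X.shape[0] == 2:                      # transpose if points are stored as columns
    X = X.T

beta = 1.0
# similarity matrix S_ij = exp(-beta * ||x_i - x_j||^2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
S = np.exp(-beta * sq_dists)

# Laplacian L = D - S, where D is the diagonal matrix of row sums of S
D = np.diag(S.sum(axis=1))
L = D - S

# eigen-decomposition of the symmetric matrix L (eigenvalues in ascending order)
lam, U = np.linalg.eigh(L)

# the Fiedler vector belongs to the second smallest eigenvalue
fiedler = U[:, 1]
labels = (fiedler > 0).astype(int)       # cluster C1 if entry > 0, else C2

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.show()
```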
task 3.3: dimensionality reduction: The file data-dimred-X.csv contains a $500 \times 150$ data matrix $X$, that is, 150 data vectors $x_i \in \mathbb{R}^{500}$.
In fact, these data vectors are from three classes and you can find the corresponding class labels $y_i \in \{1, 2, 3\}$ in the file data-dimred-y.csv.
The goal of this task is to explore mappings $\mathbb{R}^{500} \to \mathbb{R}^2$ that allow us to visualize (plot) high-dimensional data.
First of all, perform dimensionality reduction using PCA. That is, normalize the data in $X$ to zero mean and compute the eigen-decomposition of the corresponding covariance matrix. Then use the two eigenvectors $u_1$ and $u_2$ corresponding to the two largest eigenvalues to project the (normalized!) data into $\mathbb{R}^2$. What do you observe?
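The PCA part could, for instance, be realized as follows; the sketch assumes the columns of the 500 × 150 matrix are the data vectors (as stated above) and that both CSV files are comma separated.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt("data-dimred-X.csv", delimiter=",")   # shape (500, 150), columns are data vectors
y = np.loadtxt("data-dimred-y.csv", delimiter=",")   # 150 class labels in {1, 2, 3}

# normalize to zero mean and compute the covariance matrix
Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / (Xc.shape[1] - 1)

# eigen-decomposition; eigh returns eigenvalues in ascending order,
# so the last two columns belong to the two largest eigenvalues
lam, U = np.linalg.eigh(C)
U2 = U[:, -2:]                                       # shape (500, 2)

# project the normalized data into R^2 and plot, colored by class label
Z = U2.T @ Xc                                        # shape (2, 150)
plt.scatter(Z[0], Z[1], c=y)
plt.show()
```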
Second of all, perform dimensionality reduction using multiclass LDA (as discussed in lecture 14). To this end, make use of the fact that the data in $X$ are from three classes. Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ and then the eigen-decomposition of the matrix $S_W^{-1} S_B$. Again, use the two eigenvectors $u_1$ and $u_2$ corresponding to the two largest eigenvalues to project the data into $\mathbb{R}^2$. What do you observe? Does the result differ from the one you obtained via PCA?
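A corresponding sketch for the LDA projection is given below; it uses the usual textbook definitions of the scatter matrices and, since 150 samples in $\mathbb{R}^{500}$ make $S_W$ rank deficient, resorts to a pseudo-inverse. That workaround is an assumption on my part; regularizing $S_W$ or reducing the dimension with PCA first would work as well.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt("data-dimred-X.csv", delimiter=",")   # shape (500, 150)
y = np.loadtxt("data-dimred-y.csv", delimiter=",")   # 150 class labels

m = X.mean(axis=1, keepdims=True)                    # overall mean
S_W = np.zeros((X.shape[0], X.shape[0]))             # within-class scatter
S_B = np.zeros_like(S_W)                             # between-class scatter
for c in np.unique(y):
    Xc = X[:, y == c]                                # data vectors of class c
    mc = Xc.mean(axis=1, keepdims=True)              # class mean
    S_W += (Xc - mc) @ (Xc - mc).T
    S_B += Xc.shape[1] * (mc - m) @ (mc - m).T

# eigen-decomposition of S_W^{-1} S_B; a pseudo-inverse stands in for the
# inverse because S_W is singular for this data set
lam, U = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
order = np.argsort(lam.real)[::-1]                   # sort descending
U2 = U[:, order[:2]].real                            # two leading eigenvectors

Z = U2.T @ X                                         # project into R^2
plt.scatter(Z[0], Z[1], c=y)
plt.show()
```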
What if you project the data from $\mathbb{R}^{500}$ to $\mathbb{R}^3$? For both approaches, this can be accomplished using the first three eigenvectors and creating 3D plots.

task 3.4: non-monotonic neurons: The two files xor-X.csv and xor-y.csv contain data points $x_i \in \mathbb{R}^2$ and label values which, when plotted appropriately, should lead to a picture like this
Note that XOR problems like this pose nonlinear classification problems, because there is no single hyperplane that would separate the blue from the orange dots. XOR problems are therefore famously used to prove the limitations of a single perceptron
$$y(x) = f\bigl(w^\top x - \theta\bigr)$$
where $f$ is a monotonic activation function such as the Heaviside step function.
However, this limitation is a historical artifact: monotonic activation functions are a persistent meme that arose in the 1940s, yet there is nothing that would prevent us from considering more flexible neural networks composed of neurons with non-monotonic activations.
Mandatory sub-task: Given the above data, train a non-monotonic neuron
$$y(x) = f\bigl(w^\top x - \theta\bigr)$$
where $f$ is now a non-monotonic activation function.
In order to do so, perform gradient descent over the squared-error loss
$$E(w, \theta) = \sum_{i} \bigl(y_i - f(w^\top x_i - \theta)\bigr)^2.$$
That is, randomly initialize w0 ∈ R2 and θ0 ∈ R and then iteratively compute the updates
∂E wt+1 = wt − ηw
∂w
Note: Good choices for the step sizes are $\eta_\theta = 0.001$ and $\eta_w = 0.005$, but you are encouraged to experiment with these parameters and see how they influence the behavior and outcome of the training procedure.
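The following sketch spells out these updates under the assumptions made here, namely the squared-error loss from above and a Gaussian $f(z) = e^{-z^2}$ that merely stands in for the non-monotonic activation specified on the original sheet; substitute the given $f$ and its derivative before use.

```python
import numpy as np

X = np.loadtxt("xor-X.csv", delimiter=",")     # data points x_i in R^2
y = np.loadtxt("xor-y.csv", delimiter=",")     # label values
if X.shape[0] == 2:                            # transpose if points are stored as columns
    X = X.T

def f(z):                                      # placeholder non-monotonic activation (assumption)
    return np.exp(-z ** 2)

def df(z):                                     # its derivative
    return -2.0 * z * np.exp(-z ** 2)

rng = np.random.default_rng(0)
w = rng.normal(size=2)                         # random initialization w_0
theta = rng.normal()                           # random initialization theta_0
eta_w, eta_theta = 0.005, 0.001                # step sizes suggested above

for t in range(50):                            # iterate the update equations
    z = X @ w - theta
    err = f(z) - y                             # residuals of the squared-error loss
    grad_w = 2.0 * (err * df(z)) @ X           # dE/dw
    grad_theta = -2.0 * np.sum(err * df(z))    # dE/dtheta
    w -= eta_w * grad_w
    theta -= eta_theta * grad_theta

print("w_50 =", w, " theta_50 =", theta)
```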
If all goes well, and say you also implement a function that can visualize a classifier, you should observe something like this
(left: result obtained from $w_0, \theta_0$; right: result obtained from $w_{50}, \theta_{50}$)
Voluntary sub-task: If you want to impress your professor, then also train a kernel SVM on the above data using a polynomial kernel
$$k(x_i, x_j) = \bigl(x_i^\top x_j + c\bigr)^d.$$
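If scikit-learn is fair game for this voluntary part, a polynomial-kernel SVM can be trained in a few lines; the degree and coef0 values below are merely starting points to experiment with, and implementing the dual problem yourself is of course the more impressive route.

```python
import numpy as np
from sklearn.svm import SVC

X = np.loadtxt("xor-X.csv", delimiter=",")
y = np.loadtxt("xor-y.csv", delimiter=",")
if X.shape[0] == 2:                            # transpose if points are stored as columns
    X = X.T

# scikit-learn's polynomial kernel is (gamma * x^T z + coef0)^degree
svm = SVC(kernel="poly", degree=2, coef0=1.0)
svm.fit(X, y)
print("training accuracy:", svm.score(X, y))
```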
task 3.5: exploring numerical instabilities: This task revisits task 2.1 and is not so much a task in itself as an eye-opener! The goal is to raise awareness of the fact that doing math on digital computers may lead to unreliable results!