Problem 1 (K-means)
Implement the K-means algorithm discussed in class. Generate 500 observations from a mixture of three Gaussians on R^2 with mixing weights π = [0.2, 0.5, 0.3] and means µ_1, µ_2, µ_3 and covariances Σ_1, Σ_2, Σ_3 as given below. (A code sketch of this setup follows part (b).)

[displayed equation specifying µ_1, µ_2, µ_3 and Σ_1, Σ_2, Σ_3 not recovered from the source]
a) For K = 2,3,4,5, plot the value of the K-means objective function per iteration for 20 iterations (the algorithm may converge before that).
b) For K = 3, 5, plot the 500 data points and indicate each point's cluster at the final iteration by marking it with a color or a symbol.
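Below is a minimal NumPy sketch of the data generation and of Lloyd's algorithm. The means and covariances shown are placeholders (assumptions, not the assignment's actual parameters) and should be replaced with the values specified above.

import numpy as np

rng = np.random.default_rng(0)

# --- Generate 500 points from a 3-component Gaussian mixture on R^2. ---
# PLACEHOLDER parameters: substitute the µ_k, Σ_k from the problem statement.
pi = np.array([0.2, 0.5, 0.3])
mus = [np.zeros(2), np.array([3.0, 0.0]), np.array([0.0, 3.0])]   # assumed
covs = [np.eye(2)] * 3                                            # assumed

z = rng.choice(3, size=500, p=pi)                 # latent component labels
X = np.stack([rng.multivariate_normal(mus[k], covs[k]) for k in z])

def kmeans(X, K, n_iter=20):
    """Lloyd's algorithm; returns per-iteration objective and final labels."""
    n = X.shape[0]
    centers = X[rng.choice(n, K, replace=False)]  # initialize at data points
    objective = []
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        objective.append(d2[np.arange(n), labels].sum())
        # Update step: each center moves to the mean of its cluster.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return np.array(objective), labels, centers

Running kmeans(X, K) for K = 2, 3, 4, 5 and plotting the returned objective array gives the curves for part (a); the labels from K = 3 and K = 5 give the colorings for part (b).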
Problem 2 (Bayes classifier revisited)
In this problem, you will implement the EM algorithm for the Gaussian mixture model (GMM), with the purpose of using it in a Bayes classifier. The data is a processed version of the spam email data you looked at in Homework 2; each labeled pair (x, y) now has x ∈ R^10. Recall that the Bayes classifier requires learning a class-conditional density for each class, and unsupervised learning algorithms are useful for this. In this problem, the class-conditional density will be a GMM.

In these experiments, initialize all covariance matrices to the empirical covariance of the data being modeled. Randomly initialize the means by sampling from a single multivariate Gaussian whose parameters are the mean and covariance of the data being modeled. Initialize the mixing weights to be uniform.
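This initialization scheme translates directly into code; a minimal sketch (the function name init_gmm is ours):

import numpy as np

def init_gmm(X, K, rng):
    """Initialization prescribed above: every component gets the empirical
    covariance of X, means are sampled from N(mean(X), cov(X)), and the
    mixing weights are uniform."""
    mu_hat = X.mean(axis=0)
    sigma_hat = np.cov(X, rowvar=False)
    means = rng.multivariate_normal(mu_hat, sigma_hat, size=K)
    covs = np.stack([sigma_hat.copy() for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    return weights, means, covs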
a) Implement the EM algorithm for the GMM described in class (see the sketch after part (b)). Using the training data provided, for each class separately, plot the log marginal objective function for a 3-Gaussian mixture model over 10 different runs and for iterations 5 to 30. There should be two plots, each with 10 curves.
b) Using the best run for each class after 30 iterations, predict the testing data with a Bayes classifier and show the result in a 2 × 2 confusion matrix, along with the accuracy percentage. Repeat this process for a 1-, 2-, 3- and 4-Gaussian mixture model. Show all results side by side, and don't repeat Part (a) for these other cases. Note that a 1-Gaussian GMM doesn't require an iterative algorithm, although your implementation will likely still work in this case.
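The following is a hedged sketch of the EM iterations and the resulting Bayes classifier, reusing init_gmm from the sketch above. It assumes class priors are estimated as empirical label frequencies, and it evaluates densities without log-space stabilization, which is adequate at this data size but not fully robust.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=30, rng=None):
    """EM for a K-component GMM; returns the parameters and the
    per-iteration log marginal likelihood."""
    if rng is None:
        rng = np.random.default_rng()
    weights, means, covs = init_gmm(X, K, rng)
    n = X.shape[0]
    log_marginal = []
    for _ in range(n_iter):
        # E-step: responsibilities phi[i, k] ∝ pi_k * N(x_i | mu_k, Sigma_k).
        dens = np.stack([multivariate_normal.pdf(X, means[k], covs[k],
                                                 allow_singular=True)
                         for k in range(K)], axis=1)       # shape (n, K)
        weighted = dens * weights
        marg = weighted.sum(axis=1)
        log_marginal.append(np.log(marg).sum())            # objective to plot
        phi = weighted / marg[:, None]
        # M-step: closed-form updates for weights, means, covariances.
        nk = phi.sum(axis=0)
        weights = nk / n
        means = (phi.T @ X) / nk[:, None]
        for k in range(K):
            D = X - means[k]
            covs[k] = (phi[:, k, None] * D).T @ D / nk[k]
    return weights, means, covs, np.array(log_marginal)

def bayes_predict(X, params_by_class, priors):
    """Bayes classifier: pick the class maximizing prior times GMM density.
    params_by_class is a list of (weights, means, covs) tuples, one per class;
    priors are assumed to be the empirical label frequencies."""
    scores = []
    for (weights, means, covs), prior in zip(params_by_class, priors):
        dens = sum(w * multivariate_normal.pdf(X, m, c, allow_singular=True)
                   for w, m, c in zip(weights, means, covs))
        scores.append(np.log(prior) + np.log(dens))
    return np.argmax(np.stack(scores, axis=1), axis=1)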
Problem 3 (Matrix factorization)
In this problem, you will implement the MAP inference algorithm for the matrix completion problem discussed in class. As a reminder, for user locations u_i ∈ R^d and movie locations v_j ∈ R^d, we have

u_i ∼ N(0, λ^{-1} I), i = 1, ..., N_1,    v_j ∼ N(0, λ^{-1} I), j = 1, ..., N_2.

We are given an N_1 × N_2 matrix M with missing values. Given the set Ω = {(i, j) : M_ij is measured}, for each (i, j) ∈ Ω we model

M_ij ∼ N(u_i^T v_j, σ^2).
You should run your code on the user-movie ratings dataset provided on Courseworks and the course website. For your algorithm, set σ^2 = 0.25, d = 10 and λ = 1. Train the model on the larger training set for 100 iterations. For each user-movie pair (i, j) in the test set, predict the rating using the dot product u_i^T v_j. Note that the mean rating has been subtracted from the data, and you do not need to round your prediction. Since the update equations are in the slides, there is no need to re-derive them.
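A minimal sketch of the coordinate-ascent MAP inference, assuming the standard ridge-regression form of the updates from the slides (each u_i solves a regularized least squares against the movies that user rated, and symmetrically for each v_j); M_obs is assumed to be a list of 0-indexed (i, j, rating) triples:

import numpy as np

def map_mf(M_obs, N1, N2, d=10, sigma2=0.25, lam=1.0, n_iter=100, rng=None):
    """MAP coordinate ascent for matrix completion.
    M_obs: list of (i, j, rating) triples for the observed set Omega,
    with 0-based user index i and movie index j (an assumed convention)."""
    if rng is None:
        rng = np.random.default_rng()
    U = rng.standard_normal((N1, d))        # u_i ~ N(0, I) initialization
    V = rng.standard_normal((N2, d))        # v_j ~ N(0, I) initialization
    by_user, by_movie = {}, {}
    for i, j, m in M_obs:
        by_user.setdefault(i, []).append((j, m))
        by_movie.setdefault(j, []).append((i, m))
    ridge = lam * sigma2 * np.eye(d)
    log_joint = []
    for _ in range(n_iter):
        # Update each user vector given the current movie vectors:
        # a ridge-regression solve over the movies that user rated.
        for i, rated in by_user.items():
            js = np.array([j for j, _ in rated])
            ms = np.array([m for _, m in rated])
            Vi = V[js]
            U[i] = np.linalg.solve(ridge + Vi.T @ Vi, Vi.T @ ms)
        # Symmetric update for each movie vector.
        for j, raters in by_movie.items():
            idx = np.array([i for i, _ in raters])
            ms = np.array([m for _, m in raters])
            Uj = U[idx]
            V[j] = np.linalg.solve(ridge + Uj.T @ Uj, Uj.T @ ms)
        # Log joint likelihood, up to additive constants.
        sq_err = sum((m - U[i] @ V[j]) ** 2 for i, j, m in M_obs)
        penalty = lam / 2 * ((U ** 2).sum() + (V ** 2).sum())
        log_joint.append(-sq_err / (2 * sigma2) - penalty)
    return U, V, np.array(log_joint)

The test RMSE for the table in part (a) is then sqrt(mean((M_ij - u_i^T v_j)^2)) over the test pairs.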
a) Run your code 10 times. For each run, initialize your u_i and v_j vectors as N(0, I) random vectors. On a single plot, show the log joint likelihood for iterations 2 to 100 for each run. In a table, show in each row the final value of the training objective function next to the RMSE on the testing set. Sort these rows in decreasing order of the objective function.
b) For the run with the highest objective value, pick the movies “Star Wars,” “My Fair Lady,” and “Goodfellas,” and for each movie find the 10 closest movies according to the Euclidean distance between their respective locations v_j. List the query movie, the ten nearest movies, and their distances. A mapping from index to movie is provided with the data.
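A small sketch of this nearest-movie lookup, assuming the provided index-to-movie mapping has been loaded into a Python list names aligned with the rows of V (that format is an assumption):

import numpy as np

def nearest_movies(V, names, query, k=10):
    """Return the k movies whose location v_j is closest (in Euclidean
    distance) to the query movie's location."""
    q = names.index(query)
    dists = np.linalg.norm(V - V[q], axis=1)
    dists[q] = np.inf                      # exclude the query movie itself
    order = np.argsort(dists)[:k]
    return [(names[j], dists[j]) for j in order]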