ISYE6740 Homework 4 Solution


1. Implementing the EM algorithm for the MNIST dataset.
Implement the EM algorithm for fitting a Gaussian mixture model to the MNIST dataset. We reduce the dataset to two classes, digits “2” and “6” only, so you will fit a GMM with C = 2. Use the data file data.mat or data.dat on Canvas. True labels of the data are also provided in label.mat and label.dat.
The data matrix is of size 784-by-1990, i.e., there are 1990 images in total, and each column of the matrix corresponds to one image of size 28-by-28 pixels (the image is vectorized; the original image can be recovered by mapping the vector back into a matrix).
(a) (5 points) Select from the data one raw image of “2” and one of “6” and visualize them.
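
As an illustration, here is a minimal Python sketch for part (a). It assumes the .mat files store the 784-by-1990 matrix under the key 'data' and the labels under the key 'label'; the actual key names in the Canvas files may differ, and the reshaped image may need a transpose if it appears rotated.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import loadmat

    # Load the vectorized images (784-by-1990) and their true labels.
    # The dictionary keys 'data' and 'label' are assumptions about the files.
    data = loadmat('data.mat')['data']
    labels = loadmat('label.mat')['label'].ravel()

    fig, axes = plt.subplots(1, 2)
    for ax, digit in zip(axes, [2, 6]):
        col = np.where(labels == digit)[0][0]                 # first image of this digit
        ax.imshow(data[:, col].reshape(28, 28), cmap='gray')  # transpose if rotated
        ax.set_title(f'digit "{digit}"')
        ax.axis('off')
    plt.show()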
(b) (15 points) Use random Gaussian vectors with zero mean as the random initial means, and two identity matrices I as the initial covariance matrices for the clusters. Plot the log-likelihood function versus the number of iterations to show that your algorithm is converging.
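
One way to implement this is the minimal EM sketch below, reusing the data matrix loaded in the part (a) sketch. The small ridge term reg added to each covariance is an implementation choice to keep the 784-dimensional Gaussians well conditioned, not part of the problem statement.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import multivariate_normal

    def em_gmm(X, C=2, n_iter=50, reg=1e-2, seed=0):
        """EM for a C-component GMM on the n-by-d sample matrix X."""
        n, d = X.shape
        rng = np.random.default_rng(seed)
        pi = np.full(C, 1.0 / C)                  # equal initial mixing weights
        mu = rng.standard_normal((C, d))          # zero-mean Gaussian initial means
        Sigma = np.stack([np.eye(d)] * C)         # identity initial covariances
        ll_history = []
        for _ in range(n_iter):
            # E-step: log p(x_i, z_i = c), normalized with log-sum-exp.
            logp = np.column_stack([
                np.log(pi[c]) + multivariate_normal.logpdf(X, mu[c], Sigma[c])
                for c in range(C)])
            m = logp.max(axis=1, keepdims=True)
            log_norm = m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))
            tau = np.exp(logp - log_norm)         # responsibilities
            ll_history.append(log_norm.sum())     # log-likelihood this iteration
            # M-step: re-estimate weights, means, and covariances.
            Nc = tau.sum(axis=0)
            pi = Nc / n
            mu = (tau.T @ X) / Nc[:, None]
            for c in range(C):
                Xc = X - mu[c]
                Sigma[c] = (tau[:, c] * Xc.T) @ Xc / Nc[c] + reg * np.eye(d)
        return pi, mu, Sigma, ll_history

    # Samples are rows, so transpose the 784-by-1990 matrix; rescaling the
    # pixel values is optional but keeps the likelihoods numerically tame.
    pi, mu, Sigma, ll = em_gmm(data.T / 255.0)
    plt.plot(ll)
    plt.xlabel('iteration'); plt.ylabel('log-likelihood')
    plt.show()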
(c) (15 points) Report the fitted GMM when EM has terminated in your algorithm, including the weight of each component and the mean vectors (please reformat the vectors into 28-by-28 images and show these images in your submission). Ideally, you should see that these means correspond to “average” images. No need to report the covariance matrices.
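
Continuing the sketch above, the fitted weights and means could be reported as follows (variables carried over from the part (b) sketch):

    import matplotlib.pyplot as plt

    # Reformat each fitted mean vector into a 28-by-28 image.
    print('mixing weights:', pi)
    fig, axes = plt.subplots(1, 2)
    for c, ax in enumerate(axes):
        ax.imshow(mu[c].reshape(28, 28), cmap='gray')
        ax.set_title(f'component {c}, weight {pi[c]:.2f}')
        ax.axis('off')
    plt.show()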
2. Basic optimization. (50 points.)
The background of logistic regression will be discussed in the next lecture. Here, we just focus on the properties of the optimization problem related to training a logistic regression model.
Consider a simplified logistic regression problem. We are given m training samples (xi, yi), i = 1, ..., m, where the data xi ∈ R and the labels yi ∈ {0,1}. To train/fit a logistic regression model for classification, we solve the following optimization problem, where θ ∈ R is the parameter we aim to find:
max_θ ℓ(θ),    (1)

where the log-likelihood function is

ℓ(θ) = Σ_{i=1}^{m} [ −log(1 + exp(−θ xi)) + (yi − 1) θ xi ].
(a) (15 points) Derive the gradient of the objective function ℓ(θ) in (1) and write pseudo-code for performing gradient descent to find the optimizer θ∗. This is essentially what the training procedure does. Pseudo-code means you write down the procedure for the algorithm in steps, not necessarily in any specific programming language.
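
For reference, differentiating ℓ(θ) term by term with σ(z) = 1/(1 + e^{−z}) gives ℓ'(θ) = Σ_{i=1}^{m} (yi − σ(θ xi)) xi. A minimal Python sketch of the resulting procedure might look as follows; it is written as gradient ascent since (1) is a maximization (equivalently, gradient descent on −ℓ), and the step size and tolerance are illustrative choices.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_ascent(x, y, step=0.01, tol=1e-6, max_iter=10000):
        """Maximize ℓ(θ) for 1-D inputs x and labels y in {0, 1}."""
        theta = 0.0                                        # initial guess
        for _ in range(max_iter):
            grad = np.sum((y - sigmoid(theta * x)) * x)    # full gradient ℓ'(θ)
            theta += step * grad                           # move uphill on ℓ
            if abs(grad) < tol:                            # near-stationary: stop
                break
        return theta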
(b) (15 points) Write down a stochastic gradient descent algorithm for solving the training problem (1) of logistic regression.
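
A matching stochastic variant replaces the full-data gradient with the gradient of one randomly drawn sample per update; the decaying step size below is one common, illustrative choice.

    import numpy as np

    def sgd(x, y, step0=0.1, n_epochs=100, seed=0):
        """Stochastic gradient ascent for problem (1), one sample per step."""
        rng = np.random.default_rng(seed)
        theta, t = 0.0, 0
        for _ in range(n_epochs):
            for i in rng.permutation(len(x)):              # shuffle each epoch
                t += 1
                grad_i = (y[i] - 1.0 / (1.0 + np.exp(-theta * x[i]))) * x[i]
                theta += (step0 / np.sqrt(t)) * grad_i     # per-sample ascent step
        return theta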
(c) (20 points) Show that the training problem in basic logistic regression is concave. Derive the Hessian matrix of ℓ(θ) and, based on this, show that the training problem (1) is concave. Explain why the problem can therefore be solved efficiently and why gradient descent achieves a unique global optimizer, as we discussed in the lecture.
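
As a sketch of the required computation (θ is scalar here, so the Hessian reduces to the second derivative), with σ(z) = 1/(1 + e^{−z}):

    % Second derivative of the log-likelihood, using the gradient from (a).
    \begin{align*}
      \ell'(\theta)  &= \sum_{i=1}^{m} \bigl(y_i - \sigma(\theta x_i)\bigr)\, x_i, \\
      \ell''(\theta) &= -\sum_{i=1}^{m} x_i^{2}\, \sigma(\theta x_i)\bigl(1 - \sigma(\theta x_i)\bigr) \;\le\; 0.
    \end{align*}

Since σ(z)(1 − σ(z)) > 0 for every z, the Hessian is negative semidefinite everywhere (and negative definite whenever some xi ≠ 0), so ℓ is concave; any stationary point reached by gradient ascent is then a global maximizer.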
