STA414/2104: MNIST Logistic Regression and Gaussian Discriminant Analysis

MNIST dataset. In this assignment, you will fit both generative and discriminative models to the MNIST dataset of handwritten digits.

Each datapoint in the MNIST dataset (http://yann.lecun.com/exdb/mnist/) is a 28x28 black-and-white image of a handwritten digit in {0,...,9}, together with a label indicating which digit it is.

MNIST is the 'fruit fly' of machine learning: a simple, standard problem that is useful for comparing the properties of different algorithms. Starter Python code that loads and plots the MNIST dataset is attached. For this assignment, we will binarize the data, converting grey-scale pixels to either black or white (0 or 1), with > 0.5 as the cutoff (this is already done in the starter code).
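For concreteness, here is a minimal sketch of this binarization step (the starter code already does it; the array name and placeholder data below are illustrative, not the starter code's actual variables):

```python
import numpy as np

# Illustrative only: `train_images` stands in for an (N, 784) array of
# grey-scale pixel intensities in [0, 1].
train_images = np.random.rand(2000, 784)                   # placeholder data for this sketch
train_images = (train_images > 0.5).astype(np.float64)     # pixels > 0.5 become 1, the rest 0
```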

The starter code hw2-train.py is to be used in both questions below. You will need to write the missing parts of the functions and submit the completed code for evaluation. Note that each missing part should typically be a few lines of code, so make sure your code is compact. When comparing models, you will need a training set and a test set. Build a dataset of only 2000 training samples (controlled by N_data) to use when coding or debugging, to make loading and training faster. Inspect the starter code carefully before you start coding.

 

Fig. 1: Samples from the MNIST dataset.

1. Multi-class Logistic Regression Classifier - 50 pts. In this question, you will fit a discriminative model using gradient descent. Our model will be multi-class logistic regression:

p(t_k = 1 | x, w) = exp(w_k^T x) / ∑_{i=0}^{9} exp(w_i^T x)

Omit bias (intercept) parameters for this question.

(a)     How many parameters does this model have?

(b)     Write down the log-likelihood and convert it into a minimization problem over the cross-entropy loss E. Derive the gradient of E with respect to each w_k, i.e., ∇_{w_k} E(w).

(c)      Code up a gradient descent optimizer using the starter code provided to you, and minimize the cross-entropy loss. Report the final training and test accuracies achieved. The training must be done over the full training dataset, unless there are computational issues, in which case you can reduce the number of training samples depending on the memory available. Report the number of samples used to obtain the final result. Hint: For the log_softmax function, use scipy.special.logsumexp (already imported in the starter code) or an equivalent to make your code more numerically stable. Avoid nested for loops, and instead use matrix operations to keep your code fast. Each missing chunk should be only a few lines of code!
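As a minimal sketch of the numerically stable log_softmax the hint refers to (the function name follows the hint, but the shapes and example data are assumptions, not the starter code's exact interface):

```python
import numpy as np
from scipy.special import logsumexp

def log_softmax(z):
    """Row-wise log-softmax of an (N, K) array of logits z = X @ W.

    Subtracting logsumexp along each row computes log(exp(z_k) / sum_i exp(z_i))
    without exponentiating large numbers, which avoids overflow.
    """
    return z - logsumexp(z, axis=1, keepdims=True)

# Quick check on random logits for N = 5 samples and K = 10 classes.
z = np.random.randn(5, 10)
log_probs = log_softmax(z)
assert np.allclose(np.exp(log_probs).sum(axis=1), 1.0)
```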

(d)    Plot the final weights obtained as 10 images.
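One possible way to lay out the 10 weight images with matplotlib (a sketch only; it assumes the learned weights sit in a (784, 10) array named W, which may differ from your code):

```python
import numpy as np
import matplotlib.pyplot as plt

W = np.random.randn(784, 10)   # placeholder for the learned weights, one column per class

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for k, ax in enumerate(axes.ravel()):
    ax.imshow(W[:, k].reshape(28, 28), cmap="gray")   # reshape class k's weights to 28x28
    ax.set_title(str(k))
    ax.axis("off")
plt.tight_layout()
plt.savefig("weights.png")
```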

What to submit?

a)    Number of parameters.

b)    Log-likelihood, resulting cross-entropy minimization, and the gradient.

c)     Final training and test accuracies, as well as the number of samples used in training.

d)    Figure containing each weight wk as an image.

e)     Your entire code should be attached to the end of your answers.

2. Gaussian Discriminant Analysis - 50 pts. In this part, we train a generative model using the MNIST dataset. We assume that the data generating distribution is Gaussian, i.e.

(2.1)    p(x | C_k) = N(x | µ_k, Σ).

We know that the posterior p(C_k | x) can be written in terms of the softmax function

(2.2)    p(C_k | x) = exp(a_k) / ∑_j exp(a_j),    where    a_k = w_k^T x + w_{k0}.

Here, we also know that

(2.3)    w_k = Σ^{-1} µ_k    and    w_{k0} = -(1/2) µ_k^T Σ^{-1} µ_k + ln p(C_k).

(a)    Write down the log-likelihood implied by this model and find the maximum likelihood estimator (MLE) for the priors p(Ck) = πk and the class means µk, for k = 1,...,K. Note that you do not need to derive the MLE for the covariance matrix.

(b)   Compute the MLEs obtained in the previous part together with the following estimator for the covariance matrix

(2.4)    Σ̂ = ∑_{k=1}^{K} (N_k / N) S_k,    where    S_k = (1/N_k) ∑_{n ∈ C_k} (x_n − µ_k)(x_n − µ_k)^T,

where N_k is the number of images that belong to class k, and N is the total number of images. In order to make Σ̂ invertible, add a small multiple of the identity, e.g. εI for a small ε > 0. Plot the mean of each class as an image. Hint: In this part, if you use the entire training dataset to train your model, your computer's memory will likely run out. Start with a small number N_data = 2000 and slowly increase it. In your final model, use as many samples as the computer memory permits. Report this number below. Try to avoid for loops as much as possible; many of these operations can be written as matrix-matrix products. Take advantage of the 1-of-K encoding.

(c)    Using the MLE estimators obtained in the previous part as well as the posterior (2.2), make predictions on both the training and test sets and report the accuracy obtained on each. Also, report the number of training images used to compute the MLE estimators.
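A sketch of computing predictions from the posterior (2.2) via the activations a_k, with placeholder parameters standing in for your MLEs (all names and shapes are assumptions):

```python
import numpy as np

# Placeholder setup with the same assumed shapes as the previous sketch.
N, D, K = 2000, 784, 10
X = (np.random.rand(N, D) > 0.5).astype(float)
T = np.eye(K)[np.random.randint(0, K, size=N)]
pi = T.sum(axis=0) / N
mu = (T.T @ X) / T.sum(axis=0)[:, None]
Sigma = np.cov(X, rowvar=False) + 1e-2 * np.eye(D)   # stand-in for the fitted covariance

Sigma_inv = np.linalg.inv(Sigma)
Wk = mu @ Sigma_inv                                              # rows are w_k = Sigma^{-1} mu_k
wk0 = -0.5 * np.sum((mu @ Sigma_inv) * mu, axis=1) + np.log(pi)  # w_k0 from (2.3)

A = X @ Wk.T + wk0              # (N, K) activations a_k = w_k^T x + w_k0
pred = np.argmax(A, axis=1)     # argmax of the softmax equals argmax of a_k
accuracy = np.mean(pred == np.argmax(T, axis=1))
```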

(d)    Briefly compare the performance of this model to that of logistic regression.

(e)    Using the generative model you trained, generate 10 images from digit 0 and 10 images from digit 3.
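A sketch of generating such images by sampling from the fitted class-conditional Gaussians N(µ_k, Σ) (placeholder parameters below; substitute your MLEs, and optionally binarize or clip the samples before plotting):

```python
import numpy as np
import matplotlib.pyplot as plt

D = 784
rng = np.random.default_rng(0)

# Placeholders for the fitted parameters (use your MLE estimates in practice).
mu = rng.random((10, D))
Sigma = 0.1 * np.eye(D)

fig, axes = plt.subplots(2, 10, figsize=(15, 3))
for row, digit in enumerate([0, 3]):
    samples = rng.multivariate_normal(mu[digit], Sigma, size=10)   # 10 draws from N(mu_k, Sigma)
    for col in range(10):
        axes[row, col].imshow(samples[col].reshape(28, 28), cmap="gray")
        axes[row, col].axis("off")
plt.tight_layout()
plt.savefig("generated_digits.png")
```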
