In this problem you will implement and train a 3-layer neural network to classify images of clothing items from the Fashion MNIST dataset. Similarly to Homework 3, the input to the network will be a 28 × 28-pixel image (converted into a 784-dimensional vector); the output will be a vector of 10 probabilities (one for each clothing type). Specifically, the network you create should implement a function g : R784 → R10, where:
z(1) = W(1)x + b(1)
h(1) = relu(z(1))
z(2) = W(2)h(1) + b(2)
ŷ = g(x) = softmax(z(2))
Computing each of the intermediate outputs z(1), h(1), z(2), and yˆ is known as forward propagation since it follows the direction of the edges in the directed graph shown below:
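As a concrete illustration, forward propagation for a single input vector can be sketched as below. This is a minimal NumPy sketch, not the required implementation; the function and variable names are my own, and a full solution would vectorize over minibatches.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def forward(x, W1, b1, W2, b2):
    """Forward propagation for one 784-dim input x, following the
    equations above: z(1), h(1), z(2), and yhat in order."""
    z1 = W1 @ x + b1
    h1 = relu(z1)
    z2 = W2 @ h1 + b2
    yhat = softmax(z2)
    return z1, h1, z2, yhat
```

Keeping the intermediate values z(1) and h(1) around (rather than only ŷ) pays off later, since the gradient expressions reuse them.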
Loss function: For the Fashion MNIST dataset you should use the cross-entropy loss function:

fCE(W(1), b(1), W(2), b(2)) = −(1/n) Σi=1..n Σk=1..10 yk(i) log ŷk(i)

where n is the number of examples and y(i) is the one-hot label vector of the i-th example.
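The mean cross-entropy over a batch can be computed in one vectorized expression. A sketch (my own helper name; the small epsilon guarding against log(0) is a common practical choice, not part of the problem statement):

```python
import numpy as np

def cross_entropy(Yhat, Y):
    """Mean cross-entropy loss. Yhat and Y are (n, 10) arrays whose rows
    are predicted probability vectors and one-hot labels, respectively."""
    eps = 1e-12  # guard against log(0) when a predicted probability is 0
    return -np.mean(np.sum(Y * np.log(Yhat + eps), axis=1))
```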
Gradients: The gradients of fCE with respect to each set of parameters are:

∇W(2) fCE = (ŷ − y) h(1)⊤
∇b(2) fCE = (ŷ − y)
∇W(1) fCE = g x⊤
∇b(1) fCE = g

where column-vector g is defined so that

g⊤ = (ŷ − y)⊤ W(2) ∘ relu′(z(1))⊤

In the equations above, relu′ is the derivative of relu and ∘ denotes element-wise multiplication. Also, make sure that you follow the transposes exactly!
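For a single example, the four gradient expressions translate almost line-for-line into NumPy. This is a sketch under the column-vector convention used above (function names are my own; a full implementation would average these over a minibatch):

```python
import numpy as np

def relu_prime(z):
    """Derivative of relu: 1 where z > 0, else 0."""
    return (z > 0).astype(z.dtype)

def gradients(x, y, z1, h1, yhat, W2):
    """Gradients of the cross-entropy for one example, following the
    equations above. x is the input, y the one-hot label; z1, h1, yhat
    come from forward propagation."""
    dz2 = yhat - y                        # (yhat − y)
    grad_W2 = np.outer(dz2, h1)           # (yhat − y) h(1)^T
    grad_b2 = dz2
    g = (W2.T @ dz2) * relu_prime(z1)     # g^T = (yhat − y)^T W(2) ∘ relu'(z(1))^T
    grad_W1 = np.outer(g, x)              # g x^T
    grad_b1 = g
    return grad_W1, grad_b1, grad_W2, grad_b2
```

Note how np.outer produces exactly the transposes written in the equations: a column vector times a row vector yields a matrix of the right shape.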
Hyperparameter tuning: In this problem, there are several different hyperparameters that will impact the network’s performance:
• Number of units in the hidden layer (suggestions: {30,40,50})
• Learning rate (suggestions: {0.001,0.005,0.01,0.05,0.1,0.5})
• Minibatch size (suggestions: {16,32,64,128,256})
• Number of epochs
• Regularization strength
In order not to “cheat” – and thus overestimate the performance of the network – it is crucial to optimize the hyperparameters only on a validation set; do not use the test set. To create a validation set, simply randomly select and set aside 20% of the training examples and associated labels.
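One way to carve out the validation set is to permute the indices once and split them 80/20. A sketch (helper name and seed are my own choices; any reproducible random split works):

```python
import numpy as np

def split_train_val(X, y, val_frac=0.2, seed=0):
    """Randomly hold out val_frac of the training examples (and their
    labels) as a validation set; return (Xtr, ytr, Xval, yval)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    val, tr = idx[:n_val], idx[n_val:]
    return X[tr], y[tr], X[val], y[val]
```

Fixing the seed keeps the split identical across runs, so hyperparameter comparisons are apples-to-apples.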
Your task: Use stochastic gradient descent to minimize the cross-entropy with respect to W(1),W(2),b(1), and b(2). Specifically:
(a) Implement stochastic gradient descent for the network shown above. [40 points]
(b) Implement the pack and unpack functions shown in the starter code. Use these to verify that your implemented cost and gradient functions are correct (the discrepancy should be less than 0.01) using a numerical derivative approximation – see the call to check_grad in the starter code. [10 points]
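The idea behind pack/unpack is that scipy.optimize.check_grad expects cost and gradient functions of a single flat parameter vector. A sketch of one possible layout (the exact signatures in the starter code may differ; the argument order and default sizes here are assumptions):

```python
import numpy as np

def pack(W1, b1, W2, b2):
    """Flatten all parameters into one 1-D vector."""
    return np.concatenate([W1.ravel(), b1, W2.ravel(), b2])

def unpack(w, n_in=784, n_hidden=40, n_out=10):
    """Inverse of pack: slice the flat vector back into W1, b1, W2, b2."""
    i = 0
    W1 = w[i:i + n_hidden * n_in].reshape(n_hidden, n_in); i += n_hidden * n_in
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_out * n_hidden].reshape(n_out, n_hidden); i += n_out * n_hidden
    b2 = w[i:i + n_out]
    return W1, b1, W2, b2
```

With these in place, scipy.optimize.check_grad(cost, grad, pack(W1, b1, W2, b2)) returns the discrepancy between your analytic gradient and a numerical approximation, which the assignment requires to be below 0.01.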
(c) Optimize the hyperparameters by training on the training set and selecting the parameter settings that optimize performance on the validation set. You should systematically (i.e., in code) try at least 10 (in total, not for each hyperparameter) different hyperparameter settings; accordingly, make sure there is a method called findBestHyperparameters (and please name it as such to help us during grading) [10 points]. Include a screenshot showing the progress and final output (selected hyperparameter values) of your hyperparameter optimization.
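A systematic search can be as simple as a small grid over the suggested values. In this sketch, findBestHyperparameters takes a caller-supplied evaluate function mapping a hyperparameter setting to validation accuracy (that indirection is my own choice; your version will call your own training code directly, and may also vary epochs and regularization strength):

```python
import itertools

def findBestHyperparameters(evaluate):
    """Try every combination in a small grid (18 settings here, satisfying
    the 'at least 10' requirement) and return the one with the highest
    validation accuracy. `evaluate(n_hidden, lr, batch)` is assumed to
    train a network and return its validation accuracy."""
    grid = itertools.product([30, 40, 50],       # hidden units
                             [0.01, 0.05, 0.1],  # learning rate
                             [64, 128])          # minibatch size
    best, best_acc = None, -1.0
    for hp in grid:
        acc = evaluate(*hp)
        print(f"hidden={hp[0]} lr={hp[1]} batch={hp[2]} val_acc={acc:.4f}")
        if acc > best_acc:
            best, best_acc = hp, acc
    return best, best_acc
```

The printed line per setting doubles as the "progress" output the screenshot should capture.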
(d) After you have optimized your hyperparameters, then run your trained network on the test set and report (1) the cross-entropy and (2) the accuracy (percent correctly classified images). Include a screenshot showing both these values during the last 20 epochs of SGD. The test accuracy (percentage correctly classified test images) should be at least 87%. [5 points]
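Test accuracy is just the fraction of argmax matches, expressed as a percentage. A sketch (helper name is my own):

```python
import numpy as np

def accuracy(Yhat, Y):
    """Percent of examples whose highest-probability predicted class
    matches the one-hot label. Yhat and Y are (n, 10) arrays."""
    return 100.0 * np.mean(np.argmax(Yhat, axis=1) == np.argmax(Y, axis=1))
```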
Datasets: You should use the following datasets:
• https://s3.amazonaws.com/jrwprojects/fashion_mnist_train_images.npy
• https://s3.amazonaws.com/jrwprojects/fashion_mnist_train_labels.npy
• https://s3.amazonaws.com/jrwprojects/fashion_mnist_test_images.npy
• https://s3.amazonaws.com/jrwprojects/fashion_mnist_test_labels.npy
In addition to your Python code (homework6_WPIUSERNAME.py), create a PDF file (homework6_WPIUSERNAME.pdf) containing the screenshots described above. Please submit both the PDF and Python files in a single Zip file.