1. The goal of this problem is to minimize a function given a certain input using gradient descent by breaking down the overall function into smaller components via a computation graph. The function is defined as:
(a) Please calculate the gradient of the function with respect to the weights.
Solution:
(b) Starting from the initialization w1 = 0.3, w2 = −0.5, x1 = 0.2, x2 = 0.4, draw the computation graph. Please use backpropagation as we did in class.
You can draw the graph on paper and insert a photo into your report.
The goal is for you to practice working with computation graphs. As a consequence, you must include the intermediate values during the forward and backward pass.
Solution:
The computation graph is shown below. All numbers above the edges are values from the forward pass; all numbers below the edges are values from the backward pass.
(c) Implement the above computation graph in the accompanying Colab Notebook using numpy. Use the values from (b) to initialize the weights and fix the input. Use a constant step size of 0.01. Plot the weight values w1 and w2 over 30 iterations in a single figure in the report.
Solution:
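Since the original function definition did not survive into this report, the sketch below uses a stand-in: the sigmoid of a weighted sum, f(w, x) = 1/(1 + e^(−(w1·x1 + w2·x2))). The forward/backward/update structure is the point of the exercise and carries over to whatever the actual f from the Notebook is.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the problem's function (the original definition is missing
# from this report): f(w, x) = sigmoid(w1*x1 + w2*x2).
def forward(w, x):
    s = w @ x                        # intermediate node: weighted sum
    return 1.0 / (1.0 + np.exp(-s))  # output node: sigmoid

def backward(w, x, f):
    # Backprop through the graph: df/ds = f*(1 - f), ds/dw = x.
    return f * (1.0 - f) * x

w = np.array([0.3, -0.5])            # initialization from part (b)
x = np.array([0.2, 0.4])
lr = 0.01                            # constant step size
history = [w.copy()]

for _ in range(30):                  # 30 gradient-descent iterations
    f = forward(w, x)
    w = w - lr * backward(w, x, f)   # step against the gradient to minimize f
    history.append(w.copy())

history = np.array(history)
plt.plot(history[:, 0], label="w1")
plt.plot(history[:, 1], label="w2")
plt.xlabel("iteration"); plt.ylabel("weight value")
plt.legend(); plt.show()
```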
2. The goal of this problem is to understand the classification ability of a neural network. Specifically, we consider the XOR problem. Go to the link in footnote [1] and answer the following questions. Hint: hit "reset the network" (right next to the run button) after you change the architecture.
(a) Can a linear classifier, without any hidden layers, solve the XOR problem?
Solution: No. Without any hidden layers the model is a linear classifier, so it can only separate the data with a single line. The XOR data is not linearly separable, so no line can divide the two classes.
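As a quick sanity check (not part of the assignment), logistic regression trained directly on the four XOR points never classifies all of them correctly:

```python
import numpy as np

# The four XOR points and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Logistic regression (no hidden layer), trained by full-batch gradient descent.
rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0
for _ in range(10000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= 0.5 * X.T @ (p - y) / len(y)       # gradient of the logistic loss
    b -= 0.5 * np.mean(p - y)

pred = (p > 0.5).astype(int)
print(pred, "accuracy:", np.mean(pred == y))  # never reaches 1.0
```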
(b) With one hidden layer and ReLU(x) = max(0,x), how many neurons in the hidden layer do you need to solve the XOR problem? Describe the training loss and estimated prediction accuracy when using 2, 3 and 4 neurons. Discuss the intuition of why a certain number of neurons is necessary to solve XOR.
Solution:
When using 2 neurons, the training loss is 0.268 and the estimated prediction accuracy is 78%. The figure is shown below.
When using 3 neurons, the training loss is 0.260 and the estimated prediction accuracy is 73%. The figure is shown below.
When using 4 neurons, the training loss is 0.002 and the estimated prediction accuracy is 100%. The figure is shown below.
I think that there are 2 possible states for x1 and 2 possible states for x2. Since each neuron in a layer can only perform one manipulation, we need 2 × 2 = 4 neurons to represent the four input combinations of x1 XOR x2. Therefore, the four neurons in the hidden layer can each detect one combination, letting the output layer make the right prediction.
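This four-conditions intuition can be made concrete by hand-wiring a four-neuron ReLU hidden layer, one detector per input combination (a sketch of the idea, assuming inputs in {0, 1}, not the playground's learned weights):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# One hidden ReLU neuron per input combination (inputs assumed in {0, 1}):
# the rows of W1 detect (0,0), (0,1), (1,0), (1,1) respectively.
W1 = np.array([[-1.0, -1.0],   # fires on (0,0): ReLU(1 - x1 - x2)
               [-1.0,  1.0],   # fires on (0,1): ReLU(x2 - x1)
               [ 1.0, -1.0],   # fires on (1,0): ReLU(x1 - x2)
               [ 1.0,  1.0]])  # fires on (1,1): ReLU(x1 + x2 - 1)
b1 = np.array([1.0, 0.0, 0.0, -1.0])
w2 = np.array([0.0, 1.0, 1.0, 0.0])  # output sums the two XOR-true detectors

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W1 @ np.array(x, dtype=float) + b1)
    print(x, "->", int(w2 @ h))      # prints 0, 1, 1, 0
```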
3. In this problem, we want to build a neural network from scratch using NumPy for a real-world problem. We consider the MNIST dataset (http://yann.lecun.com/exdb/mnist/), a hand-written digit classification dataset. Please follow the formulas in the accompanying Colab Notebook. Hint: make sure you pass the loss and gradient checks in the notebook.
(a) Implement the loss and gradient of a linear classifier (Python function linear_classifier_forward_and_backward).
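A sketch of one way this function might look, assuming a softmax cross-entropy loss and shapes X: (N, D), y: (N,) with integer class labels; the notebook's actual signature may differ.

```python
import numpy as np

def linear_classifier_forward_and_backward(W, b, X, y):
    """Softmax cross-entropy loss and gradients for a linear classifier.

    Assumed shapes: W: (D, C), b: (C,), X: (N, D), y: (N,) integer labels.
    """
    N = X.shape[0]
    scores = X @ W + b                               # (N, C) class scores
    scores -= scores.max(axis=1, keepdims=True)      # stabilize the softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)        # (N, C) probabilities
    loss = -np.log(probs[np.arange(N), y]).mean()    # mean cross-entropy

    dscores = probs.copy()                           # backward pass
    dscores[np.arange(N), y] -= 1.0
    dscores /= N
    dW = X.T @ dscores                               # gradient w.r.t. W
    db = dscores.sum(axis=0)                         # gradient w.r.t. b
    return loss, dW, db
```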
(b) Implement the loss and gradient of a multilayer perceptron with one hidden layer and ReLU(x) = max(0,x) (Python function mlp_single_hidden_forward_and_backward).
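A sketch under the same assumed shapes and softmax loss, adding one ReLU hidden layer of width H:

```python
import numpy as np

def mlp_single_hidden_forward_and_backward(W1, b1, W2, b2, X, y):
    """One-hidden-layer ReLU MLP with softmax loss (shapes are assumptions).

    W1: (D, H), b1: (H,), W2: (H, C), b2: (C,), X: (N, D), y: (N,).
    """
    N = X.shape[0]
    h_pre = X @ W1 + b1                       # forward: hidden pre-activation
    h = np.maximum(0.0, h_pre)                # ReLU
    scores = h @ W2 + b2
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    dscores = probs.copy()                    # backward through softmax
    dscores[np.arange(N), y] -= 1.0
    dscores /= N
    dW2 = h.T @ dscores
    db2 = dscores.sum(axis=0)
    dh = dscores @ W2.T
    dh[h_pre <= 0] = 0.0                      # backward through ReLU
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return loss, dW1, db1, dW2, db2
```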
(c) Implement the loss and gradient of a multilayer perceptron with two hidden layers, a skip connection, and ReLU(x) = max(0,x) (Python function mlp_two_hidden_forward_and_backward).
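A sketch assuming the skip connection bypasses the second hidden layer (h2 = ReLU(h1 W2 + b2) + h1, so both hidden layers share width H); the placement of the skip in the notebook may differ.

```python
import numpy as np

def mlp_two_hidden_forward_and_backward(W1, b1, W2, b2, W3, b3, X, y):
    """Two-hidden-layer ReLU MLP with a skip connection around layer two.

    Assumed shapes: W1: (D, H), W2: (H, H), W3: (H, C), X: (N, D), y: (N,).
    """
    N = X.shape[0]
    a1 = X @ W1 + b1
    h1 = np.maximum(0.0, a1)
    a2 = h1 @ W2 + b2
    h2 = np.maximum(0.0, a2) + h1             # skip connection adds h1 back
    scores = h2 @ W3 + b3
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0
    dscores /= N
    dW3 = h2.T @ dscores
    db3 = dscores.sum(axis=0)
    dh2 = dscores @ W3.T
    da2 = dh2 * (a2 > 0)                      # ReLU branch of the skip
    dW2 = h1.T @ da2
    db2 = da2.sum(axis=0)
    dh1 = da2 @ W2.T + dh2                    # the skip routes dh2 straight to h1
    da1 = dh1 * (a1 > 0)
    dW1 = X.T @ da1
    db1 = da1.sum(axis=0)
    return loss, dW1, db1, dW2, db2, dW3, db3
```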
(d) Plot the development accuracy at each epoch for the three models in a single figure, using the following hyperparameters: batch size 50, learning rate 0.005, and 20 epochs.
Solution:
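A sketch of the training loop that could produce this figure; the helper names (sgd_dev_curve, predict) are illustrative stand-ins, not the notebook's own.

```python
import numpy as np
import matplotlib.pyplot as plt

def sgd_dev_curve(params, fwd_bwd, predict, data, batch_size=50,
                  lr=0.005, n_epochs=20):
    """Mini-batch SGD; returns the development accuracy after each epoch.

    fwd_bwd(*params, Xb, yb) returns (loss, *grads) in the same order as
    params; predict(*params, X) returns class labels. Names are illustrative.
    """
    X_tr, y_tr, X_dev, y_dev = data
    curve = []
    for _ in range(n_epochs):
        order = np.random.permutation(len(X_tr))
        for i in range(0, len(X_tr), batch_size):
            idx = order[i:i + batch_size]
            _, *grads = fwd_bwd(*params, X_tr[idx], y_tr[idx])
            params = [p - lr * g for p, g in zip(params, grads)]
        curve.append(np.mean(predict(*params, X_dev) == y_dev))
    return params, curve

# e.g. for the linear model of part (a):
# predict = lambda W, b, X: (X @ W + b).argmax(axis=1)
# _, acc = sgd_dev_curve([W0, b0], linear_classifier_forward_and_backward,
#                        predict, (X_tr, y_tr, X_dev, y_dev))
# plt.plot(acc, label="linear"); plt.legend(); plt.show()
```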
(e) Try other hyperparameters and select the best set using development accuracy. Once you have picked the best model and hyperparameters, add its per-epoch development accuracy to the above figure (make a new figure) and report the test accuracy of the selected model and hyperparameters.
Solution: The best hyperparameters I have found so far are batch size = 100, learning rate = 0.01, and 20 epochs. The development accuracy is 97.30%, higher than the 97.29% development accuracy of the original MLP with two hidden layers.
The figure is shown below.
The test accuracy is 97.18%.
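One way to run the search in part (e), reusing the sgd_dev_curve sketch above; init_mlp_params and predict_mlp are hypothetical stand-ins for however the notebook builds and evaluates the model.

```python
# Grid search over batch size and learning rate, selected by dev accuracy.
# init_mlp_params() and predict_mlp are hypothetical stand-ins for the
# notebook's own model construction and prediction helpers.
best_cfg, best_acc = None, 0.0
for bs in (50, 100, 200):
    for lr in (0.001, 0.005, 0.01):
        _, curve = sgd_dev_curve(init_mlp_params(),
                                 mlp_two_hidden_forward_and_backward,
                                 predict_mlp, data,
                                 batch_size=bs, lr=lr, n_epochs=20)
        if curve[-1] > best_acc:
            best_cfg, best_acc = (bs, lr), curve[-1]
print("selected:", best_cfg, "dev accuracy:", best_acc)
```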
[1] https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.01&regularizationRate=0&noise=0&networkShape=&seed=0.10699&showTestData=false&discretize=true&percTrainData=80&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false