1. Neural net functions
a) Sketch the function generated by the following 3-neuron ReLU neural network.
f(x) = 2(x − 0.5)+ − 2(2x − 1)+ + 4(0.5x − 2)+
where x ∈ R and where (z)+ = max(0,z) for any z ∈ R. Note that this is a single-input, single-output function. Plot f(x) vs x by hand.
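A quick numerical check of the hand sketch (not required by the problem) is to evaluate f directly:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def f(x):
    # f(x) = 2(x - 0.5)+ - 2(2x - 1)+ + 4(0.5x - 2)+
    return 2 * relu(x - 0.5) - 2 * relu(2 * x - 1) + 4 * relu(0.5 * x - 2)

for x in np.linspace(-1.0, 6.0, 15):
    print(f"f({x:.1f}) = {f(x):.2f}")
```

(Observe that (2x − 1)+ = 2(x − 0.5)+, which simplifies the sketch considerably.)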
b) Consider the continuous function depicted below. Approximate this function with a ReLU neural network with 2 neurons. The function should be of the form
f(x) = v1(w1x + b1)+ + v2(w2x + b2)+.
Indicate the weights and biases of each neuron and sketch the neural network function.
c) A neural network fw can be used for binary classification by predicting the label as ŷ = sign(fw(x)). Consider a setting where x ∈ R2 and the desired classifier outputs −1 if both elements of x are less than or equal to zero and +1 otherwise. Sketch the desired classification regions in the two-dimensional plane, and provide a formula for a ReLU network with 2 neurons that produces the desired classification. For simplicity, assume in this question that sign(0) = −1.
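Once you have a candidate network, a brute-force grid check catches mistakes quickly. The sketch below tests one valid 2-neuron candidate, fw(x) = (x1)+ + (x2)+ (unit input weights, zero biases, unit output weights); other choices also work:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def fw(x1, x2):
    # 2-neuron ReLU net: unit input weights, zero biases, unit output weights
    return relu(x1) + relu(x2)

def predict(x1, x2):
    # sign convention from the problem: sign(0) = -1
    return 1 if fw(x1, x2) > 0 else -1

def target(x1, x2):
    return -1 if (x1 <= 0 and x2 <= 0) else 1

grid = np.linspace(-2, 2, 21)
assert all(predict(a, b) == target(a, b) for a in grid for b in grid)
print("candidate network matches the target classifier on the grid")
```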
2. Gradients of a neural net. Consider a 2-layer neural network of the form f(x) = Σj vj (wjT x)+. Suppose we want to train our network on a dataset of N samples xi with corresponding labels yi, using the least squares loss function L = Σi (yi − f(xi))^2. Derive the gradient descent update steps for the input weights wj and output weights vj.
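A reliable way to verify a derived gradient is a finite-difference check. The sketch below assumes the network form f(x) = Σj vj (wjT x)+ with least squares loss Σi (f(xi) − yi)^2, and compares the chain-rule gradients against central differences on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, J = 20, 3, 4
X = rng.normal(size=(N, d))
y = rng.normal(size=N)
W = rng.normal(size=(J, d))   # input weights, row j is w_j
v = rng.normal(size=J)        # output weights

def loss(W, v):
    f = np.maximum(W @ X.T, 0.0).T @ v    # f(x_i) = sum_j v_j (w_j^T x_i)+
    return np.sum((f - y) ** 2)

# analytic gradients via the chain rule
pre = X @ W.T                 # pre-activations w_j^T x_i, shape (N, J)
act = np.maximum(pre, 0.0)
r = act @ v - y               # residuals f(x_i) - y_i
grad_v = 2 * act.T @ r                          # dL/dv_j = sum_i 2 r_i (w_j^T x_i)+
grad_W = 2 * ((pre > 0) * np.outer(r, v)).T @ X  # dL/dw_j = sum_i 2 r_i v_j 1[w_j^T x_i > 0] x_i

# central finite differences on one coordinate of each
eps = 1e-6
dv = np.zeros_like(v); dv[0] = eps
num_v = (loss(W, v + dv) - loss(W, v - dv)) / (2 * eps)
dW = np.zeros_like(W); dW[1, 2] = eps
num_W = (loss(W + dW, v) - loss(W - dW, v)) / (2 * eps)
assert abs(num_v - grad_v[0]) < 1e-4
assert abs(num_W - grad_W[1, 2]) < 1e-4
print("analytic gradients match finite differences")
```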
3. Compressing neural nets. Large neural network models can be approximated by considering low rank approximations to their weight matrices. The neural network f(x) = Σj vj (wjT x)+ can be written as
f(x) = vT(Wx)+.
where v is a J × 1 vector of the output weights and W is a J × d matrix whose jth row is wjT. Let σ1, σ2, ... denote the singular values of W, in decreasing order, and assume that σi ≤ ε for i > r. Let fr denote the neural network obtained by replacing W with its best rank r approximation Ŵr. Assuming that x has unit norm, find an upper bound on the difference maxx |f(x) − fr(x)|. (Hint: for any pair of vectors a and b, the following inequality holds: ‖a+ − b+‖2 ≤ ‖a − b‖2.)
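Whatever bound you derive can be spot-checked numerically. The sketch below (my own check, not part of the problem) uses the fact that ‖(W − Ŵr)x‖2 ≤ σr+1 for unit-norm x, which with the hint gives ‖v‖2 σr+1 as one candidate bound:

```python
import numpy as np

rng = np.random.default_rng(1)
J, d, r = 8, 6, 3
W = rng.normal(size=(J, d))
v = rng.normal(size=J)

# best rank-r approximation via truncated SVD
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Wr = (U[:, :r] * s[:r]) @ Vt[:r]

bound = np.linalg.norm(v) * s[r]          # ||v||_2 * sigma_{r+1}

# spot-check the bound on many random unit-norm inputs
worst = 0.0
for _ in range(2000):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    diff = abs(v @ np.maximum(W @ x, 0.0) - v @ np.maximum(Wr @ x, 0.0))
    worst = max(worst, diff)
assert worst <= bound + 1e-12
print(f"max observed gap {worst:.4f} <= bound {bound:.4f}")
```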
a) Build a classifier using a fully connected three layer neural network with logistic activation functions. Your network should
• take a vector x ∈ R10 as input (nine features plus a constant offset),
• have a single, fully connected hidden layer with 32 neurons,
• output a scalar ŷ.
Note that since the logistic activation function is always positive, your decision rule should be as follows: ŷ > 0.5 corresponds to a ‘happy’ face, while ŷ ≤ 0.5 corresponds to ‘not happy’.
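A minimal numpy sketch of such a network's forward pass (the initialization scale is my own choice, and the bias is absorbed into the constant-offset input as the problem specifies):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Fully connected net: 10 inputs -> 32 hidden (logistic) -> 1 output (logistic)."""
    def __init__(self, rng):
        self.W1 = rng.normal(scale=0.1, size=(32, 10))
        self.W2 = rng.normal(scale=0.1, size=(1, 32))

    def forward(self, x):
        h = logistic(self.W1 @ x)        # hidden activations
        return logistic(self.W2 @ h)[0]  # scalar y-hat in (0, 1)

rng = np.random.default_rng(0)
net = MLP(rng)
x = rng.normal(size=10)   # nine features plus a constant offset
yhat = net.forward(x)
print("prediction:", "happy" if yhat > 0.5 else "not happy")
```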
b) Train your classifier using stochastic gradient descent (start with a step size of α = 0.05) and create a plot with the number of epochs on the horizontal axis and training accuracy on the vertical axis. Does your classifier achieve 0% training error? If so, how many epochs does it take for your classifier to achieve perfect classification on the training set?
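The training loop can be structured as below. This is a sketch only: the face dataset is not reproduced here, so the features and labels are arbitrary synthetic stand-ins, and squared loss is an assumption (the problem does not fix the loss). The recorded per-epoch accuracies `accs` are what you would plot:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic stand-in data (the real face features are not reproduced here)
N = 200
X = np.hstack([rng.normal(size=(N, 9)), np.ones((N, 1))])  # 9 features + offset
y = (X[:, 0] + X[:, 1] > 0).astype(float)                  # placeholder labels

W1 = rng.normal(scale=0.1, size=(32, 10))
W2 = rng.normal(scale=0.1, size=(1, 32))
alpha = 0.05
accs = []
for epoch in range(50):
    for i in rng.permutation(N):
        x, t = X[i], y[i]
        h = logistic(W1 @ x)            # hidden layer
        p = logistic(W2 @ h)[0]         # output y-hat
        # squared-loss backprop; logistic'(z) = s(1 - s)
        dout = 2 * (p - t) * p * (1 - p)
        dh = dout * W2[0] * h * (1 - h)
        W2 -= alpha * dout * h[None, :]
        W1 -= alpha * np.outer(dh, x)
    preds = logistic(W2 @ logistic(W1 @ X.T))[0] > 0.5
    accs.append(np.mean(preds == y))
print(f"final training accuracy: {accs[-1]:.2f}")
```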
c) Find a more realistic estimate of the accuracy of your classifier by using 8-fold cross validation. Can you achieve perfect test accuracy?
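For part (c), 8-fold cross validation only needs a disjoint partition of the sample indices; a minimal sketch of the fold bookkeeping (the dataset size 64 below is a placeholder):

```python
import numpy as np

def kfold_indices(n, k=8, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(64, k=8)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # train on train_idx, evaluate on test_idx, then average the k accuracies
    assert len(train_idx) + len(test_idx) == 64
```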