CSE 676: Assignment #1

0.1  Softmax [20 points]
1)  [10 points] Prove that softmax is invariant to constant shifts in the input, i.e., for any input vector x and a constant scalar c, the following holds:

$$\mathrm{softmax}(x) = \mathrm{softmax}(x + c) ,$$

where $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$, and $x + c$ means adding $c$ to every dimension of $x$.
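
A sketch of the standard argument: the common factor $e^c$ cancels between numerator and denominator,

$$\mathrm{softmax}(x + c)_i = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^c \, e^{x_i}}{e^c \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}} = \mathrm{softmax}(x)_i .$$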

2)  [10 points] Let $z = Wx + c$, where $W$ and $c$ are some matrix and vector, respectively. Let

$$J = \sum_i \log \mathrm{softmax}(z)_i .$$

Calculate the derivatives of $J$ w.r.t. $W$ and $c$, respectively, i.e., calculate $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial c}$.
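
One chain-rule outline (writing $K$ for the dimension of $z$ and $\mathbf{1}$ for the all-ones vector): since $\log \mathrm{softmax}(z)_i = z_i - \log \sum_j e^{z_j}$,

$$\frac{\partial J}{\partial z} = \mathbf{1} - K \, \mathrm{softmax}(z), \qquad \frac{\partial J}{\partial c} = \frac{\partial J}{\partial z}, \qquad \frac{\partial J}{\partial W} = \frac{\partial J}{\partial z} \, x^{\top} .$$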

0.2  Logistic Regression with Regularization [20 points]
1)  [10 points] Let the data be $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0,1\}$. Logistic regression is a binary classification model, with the probability of $y_i$ being 1 as:

$$p(y_i = 1 \mid x_i; \theta) = \sigma(\theta^{\top} x_i) = \frac{1}{1 + e^{-\theta^{\top} x_i}} ,$$

where $\theta$ is the model parameter. Assume we impose an L2 regularization term on the parameter, defined as:

$$\frac{\lambda}{2} \, \|\theta\|_2^2 ,$$

with a positive constant $\lambda$. Write out the final objective function for this logistic regression with regularization model.
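
For reference, combining the negative log-likelihood with the L2 term gives an objective of the following shape (one common convention; the exact scaling of the regularizer may differ):

$$\min_{\theta} \; -\sum_{i=1}^{n} \Big[ y_i \log \sigma(\theta^{\top} x_i) + (1 - y_i) \log\big(1 - \sigma(\theta^{\top} x_i)\big) \Big] + \frac{\lambda}{2} \, \|\theta\|_2^2 .$$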

2)  [10 points] Suppose we use gradient descent to solve for the model parameter. Derive the update rule for $\theta$. Your answer should contain the derivation, not just the final answer.
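
Using $\sigma'(a) = \sigma(a)\big(1 - \sigma(a)\big)$, the derivation should reduce to an update of the following shape, with learning rate $\eta$ (an assumed symbol):

$$\theta \leftarrow \theta - \eta \Big[ \sum_{i=1}^{n} \big( \sigma(\theta^{\top} x_i) - y_i \big) x_i + \lambda \theta \Big] .$$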

0.3  Derivative of the Softmax Function [30 points]
1) [10 points] Define the loss function as

$$J(z) = -\sum_{k=1}^{K} y_k \log \tilde{y}_k ,$$

where $\tilde{y}_k = \mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$, and $(y_1, \cdots, y_K)$ is a known probability vector. Derive the gradient $\frac{\partial J(z)}{\partial z}$.

Note $z = (z_1, \cdots, z_K)$ is a vector, so $\frac{\partial J(z)}{\partial z}$ is in the form of a vector. Your answer should contain the derivation, not just the final answer.
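
A sketch of the key step: differentiating $\log \tilde{y}_k$ gives $\frac{\partial \log \tilde{y}_k}{\partial z_j} = \delta_{kj} - \tilde{y}_j$, so

$$\frac{\partial J}{\partial z_j} = -\sum_{k=1}^{K} y_k \big( \delta_{kj} - \tilde{y}_j \big) = \tilde{y}_j - y_j \quad \text{(using } \textstyle\sum_k y_k = 1\text{)}, \qquad \text{i.e., } \frac{\partial J}{\partial z} = \tilde{y} - y .$$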

2) [10 points] Assume the above softmax is the output layer of an FNN. Briefly explain how the derivative is used in the backpropagation algorithm.
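
Schematically, the output-layer gradient $\frac{\partial J}{\partial z} = \tilde{y} - y$ seeds the backward pass: for each earlier layer with input $h$, the chain rule propagates it as

$$\frac{\partial J}{\partial h} = \Big( \frac{\partial z}{\partial h} \Big)^{\!\top} \frac{\partial J}{\partial z} ,$$

and analogous products yield the parameter gradients layer by layer.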

3) [10 points] Let $z = W^{\top} h + b$, where $W$ is a matrix, and $b$ and $h$ are vectors. Use the chain rule to calculate the gradients of $J$ w.r.t. $W$ and $b$, i.e., $\frac{\partial J}{\partial W}$ and $\frac{\partial J}{\partial b}$, respectively.
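
With $z = W^{\top} h + b$ we have $z_k = \sum_m W_{mk} h_m + b_k$, so the chain rule gives a sketch of the expected shape:

$$\frac{\partial J}{\partial W} = h \, \Big( \frac{\partial J}{\partial z} \Big)^{\!\top}, \qquad \frac{\partial J}{\partial b} = \frac{\partial J}{\partial z} ,$$

where $\frac{\partial J}{\partial z} = \tilde{y} - y$ from Part 1.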

0.4  MNIST with FNN [30 points]
1) [30 points] Design an FNN for MNIST classification. Implement the model and plot two curves in one figure: i) training loss vs. training iterations; ii) test loss vs. training iterations. (A minimal implementation sketch appears after the submission notes below.)

–   You can use online code. However, you must reference (cite) the code in your answer.

–   Submission includes the plot of the two curves and the runnable code (with a ReadMe file containing instructions on how to run the code).
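
For reference, a minimal sketch in PyTorch; the architecture (784 → 256 → 10), learning rate, epoch count, and logging interval are illustrative assumptions, not the required design. If you adapt online code instead, remember to cite it as required above.

```python
# Minimal FNN for MNIST with training/test loss curves (PyTorch sketch).
# The architecture, learning rate, epoch count, and logging interval below
# are illustrative assumptions, not the required design.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

device = "cuda" if torch.cuda.is_available() else "cpu"

tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# Two-layer fully connected network: 784 -> 256 -> 10.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def avg_test_loss():
    # Average cross-entropy over the whole test set.
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            total += criterion(model(x), y).item() * y.size(0)
            n += y.size(0)
    model.train()
    return total / n

iters, train_losses, test_losses = [], [], []
step = 0
for epoch in range(2):                # a couple of epochs for illustration
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 100 == 0:           # record both losses every 100 iterations
            iters.append(step)
            train_losses.append(loss.item())
            test_losses.append(avg_test_loss())
        step += 1

# One figure with both required curves.
plt.plot(iters, train_losses, label="training loss")
plt.plot(iters, test_losses, label="test loss")
plt.xlabel("training iteration")
plt.ylabel("loss")
plt.legend()
plt.savefig("mnist_fnn_loss.png")
```

Logging both losses at the same iterations makes the two curves directly comparable in one figure, as the submission requires.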
