CS260 Homework 2 Solved

Problem 1. True or False
Decide whether the following statements are true or false. Justify your answers.

(a)   (10 pt) If classifier A has smaller training error than classifier B, then classifier A will have smaller generalization (test) error than classifier B.

(b)   (10 pt) The VC dimension is always equal to the number of parameters in the model.

(c)    (10 pt) For non-convex problems, gradient descent is guaranteed to converge to the global minimum.

Problem 2. Multiple choice questions
Choose the correct answer and justify your answer.

(a) (20 pt) Which of the following is not a possible growth function $m_{\mathcal{H}}(N)$ for some hypothesis set? (1) $2^N$ (2) $2^{\sqrt{N}}$ (3) $1$ (4) $N^2 - N + 2$ (5) none of the other choices

Problem 3. Proximal Gradient Descent
Consider solving the following problem

$$\min_{w}\ \frac{1}{2}\|Xw - y\|_2^2 + \lambda\|w\|_1,$$

where $X \in \mathbb{R}^{n \times d}$ is the feature matrix (each row is a feature vector), $y \in \mathbb{R}^n$ is the label vector, $\|w\|_1 := \sum_i |w_i|$, and $\lambda > 0$ is a constant that balances the loss and the regularization. This is known as the Lasso regression problem, and our goal is to derive the “proximal gradient method” for solving it.
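
For concreteness, here is a minimal sketch of evaluating this objective in NumPy (the function name `lasso_objective` and the plain $\frac{1}{2}$ factor on the squared loss follow the formulation above; this is only an illustration, not part of the assignment):

```python
import numpy as np

def lasso_objective(w, X, y, lam):
    """Evaluate (1/2) * ||Xw - y||_2^2 + lam * ||w||_1 at a given w."""
    residual = X @ w - y                                  # Xw - y, shape (n,)
    return 0.5 * (residual @ residual) + lam * np.sum(np.abs(w))
```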

•   (10 pt) The gradient descent algorithm cannot be directly applied since the objective function is nondifferentiable. Discuss why the objective function is non-differentiable.

•   (30 pt) In class we showed that gradient descent is based on the idea of function approximation. To form an approximation for a non-differentiable function, we split it into a differentiable part and a non-differentiable part. Let $g(w) := \frac{1}{2}\|Xw - y\|_2^2$ denote the differentiable part; as discussed in the gradient descent lecture, we approximate $g(w)$ by

$$\hat{g}(w) := g(w_t) + \nabla g(w_t)^\top (w - w_t) + \frac{1}{2\eta}\|w - w_t\|_2^2.$$

In each iteration of proximal gradient descent, we obtain the next iterate ($w_{t+1}$) by minimizing the following approximation function:

$$w_{t+1} = \arg\min_{w}\ \hat{g}(w) + \lambda\|w\|_1.$$

Derive the closed-form solution for $w_{t+1}$ given $w_t$, $\nabla g(w_t)$, $\eta$, and $\lambda$. What is the time complexity of one proximal gradient descent iteration?
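
As an illustration of what the derivation should produce, here is a minimal NumPy sketch of one proximal gradient iteration, assuming the standard result that the proximal operator of $\eta\lambda\|\cdot\|_1$ is entrywise soft-thresholding; the function names and the choice of step size $\eta$ are illustrative, not prescribed by the assignment:

```python
import numpy as np

def soft_threshold(z, thresh):
    """Entrywise soft-thresholding: sign(z) * max(|z| - thresh, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def prox_grad_step(w_t, X, y, lam, eta):
    """One proximal gradient iteration for (1/2)||Xw - y||_2^2 + lam * ||w||_1.

    Take a gradient step on the smooth part g, then apply the prox of
    eta * lam * ||.||_1, i.e. soft-thresholding with threshold eta * lam.
    """
    grad = X.T @ (X @ w_t - y)   # gradient of g(w) = (1/2)||Xw - y||_2^2, costs O(nd)
    return soft_threshold(w_t - eta * grad, eta * lam)
```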
