Optimization for Machine Learning
Theoretical Problems (30 points)
1. Let $f(x) = \dots$, where $x \in \mathbb{R}^d$. Find its conjugate $f^*(x)$.
2. Let $x \in \mathbb{R}$ and $y \in \mathbb{R}_+$. Is $f(x, y) = x^2/y$ a convex function? Prove your claim.
3. Consider the convex set $C = \{x \in \mathbb{R}^d : \|x\|_\infty \le 1\}$. Given $y \in \mathbb{R}^d$, compute the projection $\mathrm{proj}_C(y)$.
4. Compute $\partial f(x)$ for the following functions of $x \in \mathbb{R}^d$:
• $f(x) = \|x\|_2$
• $f(x) = \mathbb{1}(\|x\|_\infty \le 1)$
• $f(x) = \|x\|_2 + \|x\|_\infty$
5. Consider the square root Lasso method. Given $X \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^n$, we want to find $w \in \mathbb{R}^d$ (together with auxiliary variables $\xi \in \mathbb{R}^d$) to solve
$\min_{w, \xi} \ \|Xw - y\|_2 + \lambda \sum_{j=1}^{d} \xi_j$, (1)
subject to $\xi_j \ge w_j$, $\xi_j \ge -w_j$ $(j = 1, \dots, d)$. (2)
Lasso produces sparse solutions. Define the support of the solution $w_*$ as
$S = \{j : w_{*,j} \neq 0\}$.
Write down the KKT conditions under the assumption that $Xw_* \neq y$. Simplify them in terms of $S$, $X_S$, $X_{\bar S}$, $y$, and $w_S$. Here $X_S$ contains the columns of $X$ in $S$, $X_{\bar S}$ contains the columns of $X$ not in $S$, and $w_S$ contains the nonzero components of $w_*$.
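As a purely optional sanity check, the conditions you derive for problem 5 can be tested numerically. The sketch below solves the equivalent unconstrained form of (1)-(2) with cvxpy; the value of $\lambda$, the problem sizes, and the support threshold are illustrative assumptions, not part of the assignment.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, lam = 50, 20, 0.5                      # illustrative sizes and lambda
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Substituting xi_j = |w_j| turns (1)-(2) into an unconstrained problem.
w = cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm(X @ w - y, 2) + lam * cp.norm(w, 1))).solve()

w_star = w.value
S = np.flatnonzero(np.abs(w_star) > 1e-6)    # numerical support
r = X @ w_star - y                           # residual, nonzero by assumption
corr = X.T @ r / np.linalg.norm(r)           # normalized correlations
print("support:", S)
print("on S (compare with your stationarity condition):", corr[S])
print("off S (compare with your bound):", np.delete(corr, S))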
Programming Problem (4 points)
We consider the ridge regression problem with randomly generated data. The goal is to implement gradient descent and to experiment with different strong-convexity settings and different learning rates.
• Use the Python template “prog-template.py” and implement the functions marked with ’# implement’.
• Submit your code and outputs. Compare your results to the theoretical convergence rates from class, and discuss your experimental findings.
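For orientation, a minimal sketch of such a gradient descent loop is given below. The function names, signatures, and the exact form of the objective are illustrative assumptions; the actual interfaces are fixed by prog-template.py.

import numpy as np

def ridge_objective(w, X, y, mu):
    # f(w) = (1/2n) ||Xw - y||_2^2 + (mu/2) ||w||_2^2  (assumed form)
    n = X.shape[0]
    return 0.5 * np.linalg.norm(X @ w - y) ** 2 / n + 0.5 * mu * np.sum(w ** 2)

def ridge_gradient(w, X, y, mu):
    # gradient of the objective above
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + mu * w

def gradient_descent(X, y, mu, lr, n_iters):
    # plain gradient descent with a constant learning rate
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iters):
        w = w - lr * ridge_gradient(w, X, y, mu)
        history.append(ridge_objective(w, X, y, mu))
    return w, history

With this objective the strong-convexity parameter is at least mu, so the theory from class predicts linear convergence whose rate improves as mu grows and degrades as the condition number grows; this is the behavior your experiments should be compared against.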