Theoretical Problems
1. Consider the quadratic objective function $Q(x)$ defined on $x = [x_1, x_2]^\top \in \mathbb{R}^2$. Assume that we want to solve
\[ x^* = \arg\min_x Q(x) \]
starting from $x_0 = 0$.
• Find $A$ and $b$ so that $Q(x) = \frac{1}{2} x^\top A x - b^\top x$.
• For the gradient descent method with constant learning rate $\eta$, what range must $\eta$ lie in to guarantee convergence? What is the optimal value of $\eta$, and what is the corresponding convergence rate?
• For CG, how many iterations $T$ are needed so that $x_T = x^*$? Find the values of $\alpha_1$, $\beta_1$, and $\alpha_2$.
• For the Heavy-Ball method with constant $\eta$ and $\beta$, what are the optimal values of $(\eta, \beta)$ that achieve the fastest asymptotic convergence rate, and what is the corresponding rate? (A numerical sketch of all three methods follows this list.)
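A minimal numerical sketch of all three methods on an illustrative 2-d quadratic; the $A$ and $b$ below are placeholders (not the ones from the problem statement), and the step-size and momentum choices are the classical formulas for a strongly convex quadratic with extreme Hessian eigenvalues $\mu$ and $L$.

import numpy as np

A = np.array([[2.0, 0.0], [0.0, 10.0]])    # placeholder SPD matrix (not the problem's A)
b = np.array([1.0, 1.0])                   # placeholder vector
grad = lambda x: A @ x - b                 # gradient of Q(x) = 0.5 x^T A x - b^T x
mu, L = np.linalg.eigvalsh(A)[[0, -1]]     # smallest / largest eigenvalues

# Gradient descent: eta must lie in (0, 2/L); the optimal constant step is
# eta = 2/(mu + L), giving rate (kappa - 1)/(kappa + 1) with kappa = L/mu.
x = np.zeros(2)
eta = 2.0 / (mu + L)
for _ in range(200):
    x -= eta * grad(x)

# Heavy-Ball: the classical optimal parameters for a quadratic.
kappa = L / mu
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)) ** 2
x_hb, x_prev = np.zeros(2), np.zeros(2)
for _ in range(200):
    x_hb, x_prev = x_hb - eta_hb * grad(x_hb) + beta * (x_hb - x_prev), x_hb

# Conjugate gradient: exact on an n-dimensional quadratic in at most n steps.
x_cg, r = np.zeros(2), b.copy()            # r0 = b - A x0 = -grad(x0)
p = r.copy()
for _ in range(2):                         # n = 2 iterations suffice here
    alpha = (r @ r) / (p @ (A @ p))
    x_cg = x_cg + alpha * p
    r_new = r - alpha * (A @ p)
    p, r = r_new + ((r_new @ r_new) / (r @ r)) * p, r_new

print(x, x_hb, x_cg, np.linalg.solve(A, b))  # all four should (nearly) agree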
2. Consider the regularized logistic regression objective
\[ f(w) = \frac{1}{n} \sum_{i=1}^n \log\big(1 + \exp(-y_i w^\top x_i)\big) + \frac{\lambda}{2} \|w\|_2^2, \]
where $x_i \in \mathbb{R}^d$ and $y_i \in \{\pm 1\}$. Assume $\|x_i\|_2 \le 1$ for all $i$.
• Find the smoothness parameter $L$ of $f(w)$.
• Find an estimate of the Lipschitz constant $G$ of $f(w)$ in the region $\{w : f(w) \le f(0)\}$ that holds for every dataset $\{x_i\}$ with $\|x_i\|_2 \le 1$. (A numeric sanity check follows this list.)
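As a numeric sanity check (assuming the standard objective written above), the largest Hessian eigenvalue should never exceed $1/4 + \lambda$ when $\|x_i\|_2 \le 1$; this is an illustration, not a proof.

import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x_i||_2 <= 1
y = rng.choice([-1.0, 1.0], size=n)

def hessian(w):
    # Hessian of the average logistic loss plus the L2 term: each logistic
    # term contributes sigma(z)(1 - sigma(z)) x_i x_i^T with z = y_i w^T x_i.
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))            # equals 1 - sigma(z)
    return (X.T * (s * (1.0 - s))) @ X / n + lam * np.eye(d)

w = rng.normal(size=d)
print(np.linalg.eigvalsh(hessian(w)).max(), 0.25 + lam)  # first <= second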
3. Consider training data $(x_i, y_i)$ with $\|x_i\|_2 \le 1$ and $y_i \in \{\pm 1\}$, and suppose we would like to solve the linear SVM problem
\[ f(w) = \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n \max\big(0, 1 - y_i w^\top x_i\big) \]
using subgradient descent with $w_0 = 0$ and learning rates $\eta_t \le \eta < 1/\lambda$.
• Let $C = \{w : \|w\|_2 \le R\}$. Find the smallest $R$ such that, for all training data satisfying the assumptions above, the iterates of subgradient descent without projection remain in $C$.
• Find an upper bound on the Lipschitz constant $G$ of $f(w)$ in $C$. (A short sketch follows this list.)
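A minimal subgradient-descent sketch for this objective, which can be used to probe the radius $R$ and the bound on $G$ empirically; the data, $\lambda$, and step sizes below are illustrative choices, not part of the problem.

import numpy as np

rng = np.random.default_rng(1)
n, d, lam, eta = 100, 5, 0.5, 1.0          # eta < 1/lam as required
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # ||x_i||_2 <= 1
y = rng.choice([-1.0, 1.0], size=n)

w = np.zeros(d)
max_norm = 0.0
for t in range(1, 1001):
    margins = y * (X @ w)
    # Subgradient: lam * w minus the average of y_i x_i over violated margins.
    g = lam * w - (X.T @ (y * (margins < 1.0))) / n
    w -= (eta / np.sqrt(t)) * g            # eta_t = eta / sqrt(t) <= eta
    max_norm = max(max_norm, np.linalg.norm(w))

print(max_norm)  # compare against the radius R derived above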
4. Given a nonsmooth function, we would like to find a smooth approximation of it.
• Find a closed-form solution of
\[ f(x) = \min_{z \in \mathbb{R}} \Big\{ |z| + \frac{1}{2\gamma} (x - z)^2 \Big\}, \qquad x \in \mathbb{R}. \]
• Let $x \in \mathbb{R}^d$. Find
\[ f(x) = \min_{z \in \mathbb{R}^d} \Big\{ \|z\|_2 + \frac{1}{2\gamma} \|x - z\|_2^2 \Big\}. \]
Proof. Let $z^*$ achieve the minimum on the right-hand side. If $\|x\|_2 \le \gamma$, then $z^* = 0$ is a solution, and
\[ f(x) = \frac{\|x\|_2^2}{2\gamma}. \]
If $\|x\|_2 > \gamma$, then $z^* = (1 - \gamma/\|x\|_2)\, x$, and
\[ f(x) = \|x\|_2 - \frac{\gamma}{2}. \]
We thus obtain
\[ f(x) = \begin{cases} \dfrac{\|x\|_2^2}{2\gamma}, & \|x\|_2 \le \gamma, \\ \|x\|_2 - \dfrac{\gamma}{2}, & \text{otherwise}. \end{cases} \]
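A quick numeric cross-check of this closed form: minimize over $z$ directly and compare against the piecewise expression ($\gamma$ and the test points are illustrative).

import numpy as np
from scipy.optimize import minimize

gamma = 1.0

def envelope(x):
    # Direct minimization over z (Nelder-Mead, since the objective is nonsmooth at z = 0).
    obj = lambda z: np.linalg.norm(z) + np.dot(x - z, x - z) / (2.0 * gamma)
    return minimize(obj, np.zeros_like(x), method="Nelder-Mead").fun

def closed_form(x):
    nx = np.linalg.norm(x)
    return nx ** 2 / (2.0 * gamma) if nx <= gamma else nx - gamma / 2.0

for x in (np.array([0.3, 0.4]), np.array([3.0, -4.0])):
    print(envelope(x), closed_form(x))     # the two values should agree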
Programming Problem
We consider optimization with the smoothed hinge loss on randomly generated data; one common form of this loss is sketched after the instructions below.
• Use the Python template “prog-template.py”, and implement the functions marked with ’# implement’.
• Submit your code and outputs. Please note that in real machine learning problems, strong-convexity settings are often associated with L2 regularization: it is necessary to choose a proper regularization to better estimate the true $w$. Thus, you should
1. choose the most appropriate setting mentioned in “prog-template.py” for real problems (including $\lambda$, $\gamma$, and the optimizer);
2. try several different values of $\lambda$ and discuss how $\lambda$ influences optimization and prediction, respectively.
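For reference, one common definition of the smoothed hinge loss is the Moreau smoothing of Problem 4 applied to $z \mapsto \max(0, 1 - z)$; the exact definition in “prog-template.py” may differ, so treat this as an illustrative sketch rather than the reference implementation.

import numpy as np

def smoothed_hinge(z, gamma):
    # 0 for z >= 1, quadratic on [1 - gamma, 1), linear below 1 - gamma.
    return np.where(z >= 1.0, 0.0,
           np.where(z >= 1.0 - gamma, (1.0 - z) ** 2 / (2.0 * gamma),
                    1.0 - z - gamma / 2.0))

def smoothed_hinge_grad(z, gamma):
    # Derivative in z; it is (1/gamma)-Lipschitz, which makes the loss smooth.
    return np.where(z >= 1.0, 0.0,
           np.where(z >= 1.0 - gamma, -(1.0 - z) / gamma, -1.0))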