COMP6211E-Homework 2 Solved

           Theoretical Problems 

1.     Consider the quadratic objective function

 

defined on x = [x1, x2] ∈ R2. Assume that we want to solve

x∗ = argmin_x Q(x),

starting from x0 = 0.

•    Find A and b so that Q(x) = (1/2) x⊤Ax − b⊤x.

•    For the gradient descent method with constant learning rate η, what range should η belong to? What is the optimal value of η, and what is the corresponding convergence rate?

•    For CG, how many iterations T are needed so that xT = x∗? Find the values of α1, β1, and α2.

•    For the Heavy-Ball method with constant η and β, what are the optimal values of (η, β) to achieve the fastest asymptotic convergence rate, and what is the corresponding convergence rate? (A numerical sketch of these methods follows this list.)
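As a companion to this problem, here is a minimal numerical sketch of the three methods on an illustrative quadratic Q(x) = (1/2) x⊤Ax − b⊤x. The particular A, b, step sizes, and iteration counts below are assumptions for illustration, not the instance used in the homework.

```python
import numpy as np

# Illustrative quadratic Q(x) = 0.5 x^T A x - b^T x; this A and b are
# placeholders, not the homework's instance.
A = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def grad(x):
    return A @ x - b

mu, L = np.linalg.eigvalsh(A)[0], np.linalg.eigvalsh(A)[-1]

# Gradient descent: converges for 0 < eta < 2/L; the classical optimal
# constant step is eta = 2/(L + mu), with rate (L - mu)/(L + mu).
eta = 2.0 / (L + mu)
x = np.zeros(2)
for _ in range(100):
    x = x - eta * grad(x)
print("GD        ", x, "error", np.linalg.norm(x - x_star))

# Heavy-ball: x_{t+1} = x_t - eta*grad(x_t) + beta*(x_t - x_{t-1});
# below is the classical tuning for quadratics.
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
x_prev = np.zeros(2)
x = np.zeros(2)
for _ in range(100):
    x, x_prev = x - eta_hb * grad(x) + beta * (x - x_prev), x
print("Heavy-ball", x, "error", np.linalg.norm(x - x_star))

# Conjugate gradient from x0 = 0: for a 2x2 SPD system it reaches x* in
# at most 2 iterations (alpha and beta_cg are the CG coefficients).
x = np.zeros(2)
r = b - A @ x
p = r.copy()
for t in range(2):
    alpha = (r @ r) / (p @ A @ p)
    x = x + alpha * p
    r_new = r - alpha * (A @ p)
    beta_cg = (r_new @ r_new) / (r @ r)
    p = r_new + beta_cg * p
    r = r_new
print("CG (2 it.)", x, "error", np.linalg.norm(x - x_star))
```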

2.   Consider the regularized logistic regression:

 

where xi ∈ Rd and yi ∈ {±1}. Assume ‖xi‖₂ ≤ 1 for all i.

•   Find the smoothness parameter L of f(w).

•   Find an estimate of the Lipschitz constant G in the region {w : f(w) ≤ f(0)} that holds for every dataset {xi} with ‖xi‖₂ ≤ 1. (A numerical sketch follows this list.)
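For intuition, the sketch below numerically probes the curvature of a regularized logistic objective of the assumed form f(w) = (λ/2)‖w‖₂² + (1/n) Σi log(1 + exp(−yi w⊤xi)); the data size, the value of λ, and this exact normalization are illustrative assumptions rather than the homework's (missing) definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1   # problem size and lambda are illustrative assumptions

# Random data with ||x_i||_2 <= 1 and labels in {+1, -1}.
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
y = rng.choice([-1.0, 1.0], size=n)

def hessian(w):
    # Hessian of the assumed objective
    # f(w) = (lam/2)||w||^2 + (1/n) sum_i log(1 + exp(-y_i w.x_i)),
    # namely lam*I + (1/n) sum_i s_i (1 - s_i) x_i x_i^T, s_i = sigmoid(y_i w.x_i).
    s = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    return lam * np.eye(d) + (X.T * (s * (1.0 - s))) @ X / n

# A smoothness parameter L must upper-bound the largest Hessian eigenvalue at
# every w; with ||x_i||_2 <= 1 and s(1-s) <= 1/4 one expects at most lam + 1/4.
for trial in range(5):
    w = rng.normal(size=d)
    print(f"lambda_max(Hessian) = {np.linalg.eigvalsh(hessian(w)).max():.4f}"
          f"   (lam + 1/4 = {lam + 0.25:.4f})")
```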

3.  Consider training data (xi, yi) such that ‖xi‖₂ ≤ 1 and yi ∈ {±1}, and suppose we would like to solve the linear SVM problem

 

using subgradient descent with w0 = 0 and learning rates ηt ≤ η < 1/λ.

•    Let C = {w : ‖w‖₂ ≤ R}. Find the smallest R such that, for all training data satisfying the assumptions of the problem, the iterates of subgradient descent without projection remain in C.


•    Find an upper bound on the Lipschitz constant G of f(w) in C. (A numerical sketch of this subgradient method appears below.)
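As a rough companion to this problem, the sketch below runs projection-free subgradient descent on an assumed objective f(w) = (λ/2)‖w‖₂² + (1/n) Σi max(0, 1 − yi w⊤xi) and tracks the largest iterate norm. The data, the value of λ, the constant step size, and the comparison radius 1/λ are illustrative assumptions, not asserted answers.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 100, 5, 0.5                # assumed problem size and lambda
eta = 0.5 / lam                        # constant step satisfying eta < 1/lam

X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))   # ||x_i|| <= 1
y = rng.choice([-1.0, 1.0], size=n)

def subgrad(w):
    # Subgradient of the assumed objective
    # (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i):
    # the hinge contributes -y_i x_i where the margin is violated, 0 otherwise.
    margins = y * (X @ w)
    active = (margins < 1.0).astype(float)
    return lam * w - (X.T @ (active * y)) / n

w = np.zeros(d)
max_norm = 0.0
for t in range(1000):
    w = w - eta * subgrad(w)
    max_norm = max(max_norm, np.linalg.norm(w))

# Without projection the iterates stay bounded; 1/lam is one natural
# candidate radius to compare against under these assumptions.
print(f"max ||w_t|| = {max_norm:.4f},  1/lam = {1.0 / lam:.4f}")
```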

4.    Given a nonsmooth function, we would like to find its smooth approximation.

•   Find a closed form solution of

  .

•   Let x ∈ Rd, find

f(x) = min_{z∈Rd} [ ‖z‖₂ + (1/(2γ))‖z − x‖₂² ].

Proof. Let z achieve the minimum on the right-hand side. If ‖x‖₂ ≤ γ, then z = 0 is a solution, and

f(x) = ‖x‖₂²/(2γ).

If ‖x‖₂ > γ, then z = (1 − γ/‖x‖₂)x, and

f(x) = ‖x‖₂ − γ/2.

We thus obtain

f(x) = ‖x‖₂²/(2γ)    if ‖x‖₂ ≤ γ,
f(x) = ‖x‖₂ − γ/2    otherwise.
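A quick numerical sanity check of this closed form (in the scalar case): compare the piecewise expression with a direct grid minimization of |z| + (z − x)²/(2γ). The value of γ, the grid, and the test points are arbitrary illustrative choices.

```python
import numpy as np

gamma = 0.7   # example smoothing parameter

def huber(x):
    # Piecewise closed form from above (in 1D, so ||x||_2 = |x|):
    # x^2/(2*gamma) when |x| <= gamma, else |x| - gamma/2.
    return x * x / (2 * gamma) if abs(x) <= gamma else abs(x) - gamma / 2

def envelope(x, grid=np.linspace(-5, 5, 200001)):
    # Direct evaluation of min_z { |z| + (z - x)^2 / (2*gamma) } on a dense grid.
    return np.min(np.abs(grid) + (grid - x) ** 2 / (2 * gamma))

for x in [-2.0, -0.3, 0.0, 0.5, 1.5]:
    print(f"x={x:5.2f}  closed form={huber(x):.4f}  grid min={envelope(x):.4f}")
```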

 

Programming Problem 

We consider optimization with the smoothed hinge loss, and randomly generated data.

• Use the python template “prog-template.py”, and implement the functions marked with ’# implement’.

• Submit your code and outputs.

Please note that in real machine learning problems, strong-convexity settings are often associated with L2 regularization: it is necessary to choose a proper regularization to better estimate the true w. Thus, you should

1.    Choose the most appropriate setting mentioned in “prog-template.py” for real problems (including λ, γ, and the optimizer);

2.    Try several different values of λ and discuss how λ influences optimization and prediction, respectively. (A hedged sketch of a smoothed hinge loss objective follows below.)
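The template's actual function names and settings live in “prog-template.py”; independently of that file, the sketch below shows one common way to define a smoothed (Huber-style) hinge loss with smoothing parameter γ and an L2 weight λ, together with plain gradient descent on synthetic data. All names (smoothed_hinge, objective_and_grad), constants, and the exact form of the smoothing are assumptions for illustration, not the template's API.

```python
import numpy as np

def smoothed_hinge(margin, gamma):
    # Huber-style smoothing of max(0, 1 - margin); gamma controls how far
    # the quadratic region extends (an assumed, common definition).
    return np.where(margin >= 1.0, 0.0,
           np.where(margin >= 1.0 - gamma,
                    (1.0 - margin) ** 2 / (2.0 * gamma),
                    1.0 - margin - gamma / 2.0))

def smoothed_hinge_deriv(margin, gamma):
    # Derivative of the smoothed hinge with respect to the margin.
    return np.where(margin >= 1.0, 0.0,
           np.where(margin >= 1.0 - gamma, -(1.0 - margin) / gamma, -1.0))

def objective_and_grad(w, X, y, lam, gamma):
    # Assumed objective: f(w) = (lam/2)||w||^2 + (1/n) sum_i l_gamma(y_i w.x_i).
    margins = y * (X @ w)
    f = 0.5 * lam * (w @ w) + smoothed_hinge(margins, gamma).mean()
    g = lam * w + (X.T @ (smoothed_hinge_deriv(margins, gamma) * y)) / len(y)
    return f, g

# Tiny synthetic run with plain gradient descent (step size chosen ad hoc).
rng = np.random.default_rng(0)
n, d, lam, gamma = 200, 10, 0.01, 0.5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))
w = np.zeros(d)
for t in range(500):
    f, g = objective_and_grad(w, X, y, lam, gamma)
    w -= 0.5 * g
print(f"final objective {f:.4f}, training accuracy "
      f"{np.mean(np.sign(X @ w) == y):.3f}")
```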
