COMP6211E Homework 3

Optimization for Machine Learning (Homework #3)



Theoretical Problems 

1.     Consider x ∈ R+, and consider the function

h(x) = −ln x

•   Find the Bregman divergence Dh(x,y).

•   Consider function f(x) = 0.5x − ln(x + 1).

What are the smoothness and strong convexity parameters of f with respect to h?

•   Consider the h-mirror descent method for minimizing f(x). What is the formula for obtaining xt from xt−1 with learning rate ηt?
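For reference, a minimal sketch of the general definitions these questions rely on (the problem-specific answers for h(x) = −ln x still have to be derived):

Dh(x,y) = h(x) − h(y) − ⟨∇h(y), x − y⟩,

and one common form of the h-mirror descent update with learning rate ηt is

∇h(xt) = ∇h(xt−1) − ηt∇f(xt−1),   equivalently   xt = argminx { ηt⟨∇f(xt−1), x⟩ + Dh(x, xt−1) }.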

2.     Consider a composite optimization problem

f(x) + g(x),

with non-convex regularizer g(x) given by

g(x) = λ Σj ln(1 + |xj|),

for some λ > 0. Since the logarithm grows slowly as |xj| increases, this regularizer leads to less bias than L1 regularization. Assume we want to apply the proximal gradient method

proxηg [x − η∇f(x)]

to solve this problem, where the proximal optimization problem becomes

proxηg(z) = argminx { 0.5‖x − z‖² + ηg(x) }                                                (1)

•   Show that when η > 0 is sufficiently small, the proximal optimization problem (1) is strongly convex for all z. What is the range of such η?

•   Find the closed-form solution for proxηg(z) when η is sufficiently small so that (1) is convex.

•   Suppose f(x) is L-smooth and 2λ-strongly convex. Does the proximal gradient algorithm converge? If so, how many iterations are needed to obtain an ε-suboptimal primal solution?
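A minimal sketch of the proximal gradient iteration described above, in Python. The names grad_f and prox_g are hypothetical placeholders (the closed-form proximal mapping for this log regularizer is exactly what the second bullet asks for), and the fixed step size is only illustrative:

import numpy as np

def proximal_gradient(x0, grad_f, prox_g, eta, num_iters=100):
    # Generic loop: x_t = prox_{eta g}( x_{t-1} - eta * grad_f(x_{t-1}) ).
    # grad_f(x): gradient of the smooth part f at x
    # prox_g(z, eta): solves argmin_x 0.5*||x - z||^2 + eta*g(x)
    # eta: step size, assumed small enough that the prox subproblem (1) is convex
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        z = x - eta * grad_f(x)   # gradient step on f
        x = prox_g(z, eta)        # proximal step on g
    return x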


3.     Consider the optimization problem

minx f0(A0x) + f1(A1x) + ··· + fn(Anx),

where Ai are matrices. We rewrite it as

minx,z f0(z0) + f1(z1) + ··· + fn(zn)          subject to A1x − z1 = 0,...,Anx − zn = 0, A0x − z0 = 0.

•   Write down the Lagrangian function L(x,z,α) with multipliers α1,...,αn and α0 corresponding to the n + 1 linear constraints.

•   Find the dual formulation φD(α) = minx,z L(x,z,α) in terms of fi∗.

•   If n = 1 and A1 = I, write down the dual objective function in α0 by eliminating α1. Assuming the relevant conjugate function is smooth, how can the primal variable x be recovered from the dual variable α0?

4.     (4 points) Consider a symmetric positive definite matrix A, and let

 .

•   Find the Fenchel dual of f(x) + g(x).

•   Find ∇f∗(α) and ∇g∗(α).

•   Write down a closed-form formula for the dual ascent method.

•   Find the smoothness parameters of f∗ and g∗, and explain how to set the learning rate for dual ascent.
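As a reminder, a sketch of the standard forms under one common sign convention (the course's convention may differ): for minx f(x) + g(x), the Fenchel dual is

φD(α) = −f∗(α) − g∗(−α),

dual ascent iterates αt+1 = αt + η∇φD(αt), the primal variable is recovered as x = ∇f∗(α), and the learning rate η is typically chosen from the smoothness (Lipschitz-gradient) constants of f∗ and g∗, e.g. η ≤ 1/(Lf∗ + Lg∗).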

Programming Problem 

•   Download the data in the mnist sub-directory, which contains class 1 (positive) versus class 7 (negative).

•   Use the Python template “prog-template.py” and implement the functions marked with ’# implement’.

•   Complete the code and generate the plots.

•   Discuss the behavior of the proximal and dual averaging algorithms in the experiments.

•   Discuss the impact of different values of µ in the experiments.

•   Moreover, can you reduce the oscillation of GD by selecting other learning rates when the objective function is highly non-smooth?
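For orientation, a minimal, generic sketch of the two update rules the plots compare. The function names (soft_threshold, grad_loss), the choice of an L1 penalty, and the simple RDA-style dual averaging variant are illustrative assumptions, not the template's actual API; the role of µ in prog-template.py is not reproduced here:

import numpy as np

def soft_threshold(z, tau):
    # Elementwise shrinkage: proximal mapping of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(grad_loss, x0, lam, eta, num_iters):
    # x_t = prox_{eta*lam*||.||_1}( x_{t-1} - eta * grad_loss(x_{t-1}) )
    x = x0.copy()
    for _ in range(num_iters):
        x = soft_threshold(x - eta * grad_loss(x), eta * lam)
    return x

def dual_averaging(grad_loss, x0, lam, gamma, num_iters):
    # RDA-style update: keep a running average of gradients and solve
    # x_t = argmin_x <gbar, x> + lam*||x||_1 + (gamma/(2*sqrt(t)))*||x||^2 in closed form.
    x = x0.copy()
    gbar = np.zeros_like(x0)
    for t in range(1, num_iters + 1):
        gbar += (grad_loss(x) - gbar) / t   # running average of gradients
        x = -(np.sqrt(t) / gamma) * soft_threshold(gbar, lam)
    return x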

