COMP6211E Homework 3

Optimization for Machine Learning (Homework #3)



Theoretical Problems 

1.     Consider x ∈ R+, and consider the function

h(x) = −ln x

•   Find the Bregman divergence Dh(x,y).

•   Consider function f(x) = 0.5x − ln(x + 1).

What are the smoothness and strong convexity parameters of f with respect to h?

•   Consider the h-mirror descent method for minimizing f(x). What is the formula for obtaining xt from xt−1 with learning rate ηt?
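For reference, a minimal sketch of the general definitions these questions rely on (the problem-specific answers for h(x) = −ln x still have to be derived):

Dh(x,y) = h(x) − h(y) − ⟨∇h(y), x − y⟩,

and one common form of the h-mirror descent update with learning rate ηt is

∇h(xt) = ∇h(xt−1) − ηt∇f(xt−1),   equivalently   xt = argminx { ηt⟨∇f(xt−1), x⟩ + Dh(x, xt−1) }.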

2.     Consider a composite optimization problem

f(x) + g(x),

with non-convex regularizer g(x) given by

g(x) = λ Σj ln(1 + |xj|),

for some λ > 0. Since the logarithm grows slowly as |xj| increases, this regularizer leads to less bias than L1 regularization. Assume we want to apply the proximal gradient method

proxηg [x − η∇f(x)]

to solve this problem, where the proximal optimization problem becomes

proxηg(z) = argminx { 0.5‖x − z‖² + ηg(x) }                                                (1)

•   Show that when η > 0 is sufficiently small, the proximal optimization problem (1) is strongly convex for all z. What is the range of such η?

•   Find the closed-form solution for proxηg(z) when η is sufficiently small so that (1) is convex.

•   Suppose f(x) is L-smooth and 2λ-strongly convex. Does the proximal gradient algorithm converge? If so, how many iterations are needed to obtain an ε-suboptimal primal solution?
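A minimal sketch of the proximal gradient iteration described above, in Python. The names grad_f and prox_g are hypothetical placeholders (the closed-form proximal mapping for this log regularizer is exactly what the second bullet asks for), and the fixed step size is only illustrative:

import numpy as np

def proximal_gradient(x0, grad_f, prox_g, eta, num_iters=100):
    # Generic loop: x_t = prox_{eta g}( x_{t-1} - eta * grad_f(x_{t-1}) ).
    # grad_f(x): gradient of the smooth part f at x
    # prox_g(z, eta): solves argmin_x 0.5*||x - z||^2 + eta*g(x)
    # eta: step size, assumed small enough that the prox subproblem (1) is convex
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        z = x - eta * grad_f(x)   # gradient step on f
        x = prox_g(z, eta)        # proximal step on g
    return x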


3.     Consider the optimization problem

minx f0(A0x) + f1(A1x) + ··· + fn(Anx),

where Ai are matrices. We rewrite it as

minx,z f0(z0) + f1(z1) + ··· + fn(zn)          subject to A1x − z1 = 0,...,Anx − zn = 0, A0x − z0 = 0.

•   Write down the Lagrangian function L(x,z,α) with multipliers α1,...,αn and α0 corresponding to the n + 1 linear constraints.

•   Find the dual formulation φD(α) = minx,z L(x,z,α) in terms of fi∗.

•   If n = 1 and A1 = I, write down the dual objective function in α0 by eliminating α1. Assuming the relevant conjugate function is smooth, how can the primal variable x be recovered from the dual variable α0?

4.     (4 points) Consider a symmetric positive definite matrix A, and let

 .

•   Find the Fenchel dual of f(x) + g(x).

•   Find ∇f∗(α) and ∇g∗(α).

•   Write down a closed-form formula for the dual ascent method.

•   Find the smoothness parameters of f∗ and g∗, and explain how to set the learning rate for dual ascent.
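As a reminder, a sketch of the standard forms under one common sign convention (the course's convention may differ): for minx f(x) + g(x), the Fenchel dual is

φD(α) = −f∗(α) − g∗(−α),

dual ascent iterates αt+1 = αt + η∇φD(αt), the primal variable is recovered as x = ∇f∗(α), and the learning rate η is typically chosen from the smoothness (Lipschitz-gradient) constants of f∗ and g∗, e.g. η ≤ 1/(Lf∗ + Lg∗).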

Programming Problem 

•   Download the data in the mnist sub-directory, which contains class 1 (positive) versus class 7 (negative).

•   Use the Python template “prog-template.py” and implement the functions marked with ’# implement’.

•   Complete the code and generate the plots.

•   Discuss the behavior of the proximal and dual averaging algorithms in the experiments.

•   Discuss the impact of different values of µ in the experiments.

•   Moreover, can you reduce the oscillation of GD by selecting other learning rates when the objective function is highly non-smooth?
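For orientation, a minimal, generic sketch of the two update rules the plots compare. The function names (soft_threshold, grad_loss), the choice of an L1 penalty, and the simple RDA-style dual averaging variant are illustrative assumptions, not the template's actual API; the role of µ in prog-template.py is not reproduced here:

import numpy as np

def soft_threshold(z, tau):
    # Elementwise shrinkage: proximal mapping of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(grad_loss, x0, lam, eta, num_iters):
    # x_t = prox_{eta*lam*||.||_1}( x_{t-1} - eta * grad_loss(x_{t-1}) )
    x = x0.copy()
    for _ in range(num_iters):
        x = soft_threshold(x - eta * grad_loss(x), eta * lam)
    return x

def dual_averaging(grad_loss, x0, lam, gamma, num_iters):
    # RDA-style update: keep a running average of gradients and solve
    # x_t = argmin_x <gbar, x> + lam*||x||_1 + (gamma/(2*sqrt(t)))*||x||^2 in closed form.
    x = x0.copy()
    gbar = np.zeros_like(x0)
    for t in range(1, num_iters + 1):
        gbar += (grad_loss(x) - gbar) / t   # running average of gradients
        x = -(np.sqrt(t) / gamma) * soft_threshold(gbar, lam)
    return x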

