IE684-Lab6

Please check the submission deadline announced in moodle.


Let {x_k} be a sequence in R^n that converges to x∗. One possible classification of the convergence behavior is stated below:

The convergence is Q-linear if there is a constant r ∈ (0,1) such that

  ||x_{k+1} − x∗|| / ||x_k − x∗|| ≤ r   for all k sufficiently large,

where the norm || · || is generally assumed to be the Euclidean norm.

The convergence is Q-superlinear if

  lim_{k→∞} ||x_{k+1} − x∗|| / ||x_k − x∗|| = 0.

The convergence is Q-quadratic if there exists some M > 0 such that

  ||x_{k+1} − x∗|| / ||x_k − x∗||² ≤ M   for all k sufficiently large.

1. For each of the following sequences, check empirically how fast it converges and try to verify the theoretical rate of convergence. You can check the rates by plotting the iterates (a small plotting sketch is given after this list).

(a)                  (1 + 0.005)^(−2^k), k = 0,1,2,...

(b)                  1 + (0.005)^k, k = 0,1,2,...

(c)                  1 + (0.005)^(2^k), k = 0,1,2,...

(d)                  1 + k^(−k), k = 0,1,2,... (assume 0^0 = 1)
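A minimal plotting sketch for this empirical check, assuming the sequence forms reconstructed above (the variable names and the choice of 15 terms are our own):

import numpy as np
import matplotlib.pyplot as plt

K = np.arange(15)
sequences = {
    '(a)': (1 + 0.005) ** (-(2.0 ** K)),                                   # limit 0
    '(b)': 1 + 0.005 ** K,                                                  # limit 1
    '(c)': 1 + 0.005 ** (2.0 ** K),                                         # limit 1
    '(d)': 1 + np.array([float(k) ** (-k) if k > 0 else 1.0 for k in K]),   # limit 1, with 0^0 = 1
}
limits = {'(a)': 0.0, '(b)': 1.0, '(c)': 1.0, '(d)': 1.0}

for label, x in sequences.items():
    err = np.abs(x - limits[label])
    mask = err > 0                      # drop terms that underflow to zero before taking logs
    plt.semilogy(K[mask], err[mask], marker='o', label=label)
    # successive error ratios: roughly constant -> Q-linear, tending to 0 -> Q-superlinear
    print(label, err[1:6] / err[:5])
plt.xlabel('k')
plt.ylabel('|x_k - limit|')
plt.legend()
plt.show()

A straight line on the semilog plot indicates Q-linear decrease of the error, while a curve bending downwards faster than any straight line indicates superlinear or quadratic behavior.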

In this lab, we will consider the ordinary least squares regression (OLSR) problem. We will discuss a few optimization algorithms to solve it. Please use only Python as your programming language.

Preparation Exercise (PREP):

1.    Import the required Python packages using the following commands:

import numpy as np
import matplotlib.pyplot as plt

(Recall the functionality of these packages.)

2.    For replication purposes, initialize the random number generator using np.random.seed(1000)

3.    Use the np.random.randn function to create A as a random numpy array of 1000 rows and 10 columns (recall the purpose of the np.random.randn function). In data science parlance, we shall call A a data set of 1000 data points, each of dimension 10.



4.    Create x̄ as a random vector of size 10 × 1 such that, when i is odd, each component x̄_i of x̄ is sampled uniformly from [−(i+1), −i] ⊂ R, and when i is even, each component x̄_i of x̄ is sampled uniformly from [i, i+1] ⊂ R (use an appropriate numpy function here).

5.    Use np.random.randn to create ε as a random vector of size 1000 × 1.

6.    Compute y = Ax̄ + ε. Use an appropriate numpy function to do the matrix multiplication Ax̄ efficiently (a sketch covering steps 1-6 is given below).
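Putting the preparation steps 1-6 together, a minimal sketch could look as follows (the 1-based odd/even indexing of the components of x̄ is our reading of step 4; adjust it if your convention differs):

import numpy as np

np.random.seed(1000)                       # step 2: for replication purposes

A = np.random.randn(1000, 10)              # step 3: 1000 data points, each of dimension 10

# step 4: component i (i = 1,...,10) of xbar is drawn uniformly from
# [-(i+1), -i] when i is odd and from [i, i+1] when i is even
xbar = np.zeros((10, 1))
for i in range(1, 11):
    if i % 2 == 1:
        xbar[i - 1, 0] = np.random.uniform(-(i + 1), -i)
    else:
        xbar[i - 1, 0] = np.random.uniform(i, i + 1)

eps = np.random.randn(1000, 1)             # step 5: noise vector
y = A @ xbar + eps                         # step 6: matrix multiplication via the @ operator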


Exercise 1: Direct least squares loss minimization

Note that y is a noisy version of Ax̄. We will now try to estimate x̄ assuming that we are given y and A. One possible approach is to solve the following problem:

                        min_{x ∈ R^10}  f(x) := (1/2) ||Ax − y||_2^2.                                        (1)

The loss term (1/2)||Ax − y||_2^2 is called the ordinary least squares (OLS) loss and the problem (1) is called the OLS Regression problem.

1.    Write Python functions using appropriate numpy routines to compute the objective function value, the gradient and the Hessian of f (a sketch of one possible implementation is given at the end of this exercise).

2.    [R] With a starting point of , solve problem (1) using Newton's method implemented with backtracking line search (use α0 = 0.99, ρ = 0.5, γ = 0.5 for backtracking line search and τ = 10^−5). Comment on any difficulties you face when computing the inverse of the Hessian (recall that you need to use an appropriate Python function to compute this inverse). If you do face difficulties, think of a remedy so that you can avoid the issue.

Let x∗ be the final optimal solution provided by your algorithm. Report the values of x∗ and x̄, and discuss the observations.

Plot the values log(||x_k − x∗||_2) against iterations k = 0,1,2,....

Prepare a different plot for plotting log(|f(x_k) − f(x∗)|) obtained from Newton's method against the iterations.

Comment on the convergence rates of the iterates and the objective function values, by recalling the definitions given above.

3.    [R] With a starting point of , solve problem (1) using the BFGS method implemented in the previous lab with backtracking line search (use α0 = 0.99, ρ = 0.5, γ = 0.5 for backtracking line search and τ = 10^−5).

Let x∗ be the final optimal solution provided by your BFGS algorithm. Report the values of x∗ and x̄, and discuss the observations.

Plot the values log(||x_k − x∗||_2) against iterations k = 0,1,2,....

Prepare a different plot for plotting log(|f(x_k) − f(x∗)|) obtained from the BFGS method against the iterations.

Comment on the convergence rates of the iterates and the objective function values obtained by BFGS method.

4.    [R] Compare and contrast the results obtained by Newton’s method and BFGS method and comment on the time taken by both the methods.
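A minimal sketch of the pieces asked for in this exercise, assuming the objective f(x) = (1/2)||Ax − y||² as reconstructed in (1) (the function and variable names are our own; the BFGS routine from the previous lab can be plugged into the same line search and stopping rule):

import numpy as np

def f(x, A, y):
    # objective value: 0.5 * ||Ax - y||^2
    r = A @ x - y
    return 0.5 * float(np.vdot(r, r))

def grad_f(x, A, y):
    # gradient: A^T (Ax - y)
    return A.T @ (A @ x - y)

def hess_f(x, A, y):
    # Hessian: A^T A (constant, since f is quadratic)
    return A.T @ A

def backtracking(x, p, A, y, alpha0=0.99, rho=0.5, gamma=0.5):
    # shrink the step size until an Armijo-type sufficient-decrease condition holds
    alpha = alpha0
    fx = f(x, A, y)
    slope = float(np.vdot(grad_f(x, A, y), p))
    while f(x + alpha * p, A, y) > fx + gamma * alpha * slope:
        alpha *= rho
    return alpha

def newton(x0, A, y, tau=1e-5, max_iter=100):
    x, iterates = x0.copy(), [x0.copy()]
    for _ in range(max_iter):
        g = grad_f(x, A, y)
        if np.linalg.norm(g) <= tau:
            break
        H = hess_f(x, A, y)
        # solving H p = -g is preferable to forming the explicit inverse of H;
        # if H is (nearly) singular, np.linalg.pinv or adding a small multiple
        # of the identity to H are possible remedies
        p = -np.linalg.solve(H, g)
        alpha = backtracking(x, p, A, y)
        x = x + alpha * p
        iterates.append(x.copy())
    return x, iterates

The list of iterates returned by newton (with whatever starting point the assignment specifies) gives the data for the log(||x_k − x∗||_2) and log(|f(x_k) − f(x∗)|) plots; wrapping the calls to newton and to the BFGS routine in time.time() measurements gives the timing comparison asked for in item 4.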

Exercise 2: Regularized least squares loss minimization

1.    Let us now introduce the following regularized problem (with λ > 0):

                        min_{x ∈ R^10}  f_λ(x) := (1/2) ||Ax − y||_2^2 + (λ/2) ||x||_2^2.                                        (2)

[R] Comment on the significance of the newly added regularizer term (λ/2)||x||_2^2, when compared to problem (1).

2.    Write Python functions to compute the function value, gradient and Hessian of f_λ (a sketch of one possible implementation is given at the end of this exercise).

3.    For each of the given values of λ, perform the following: with a starting point of , solve problem (2) using the Newton and BFGS methods with backtracking line search (use α0 = 0.99, ρ = 0.5, γ = 0.5 for backtracking line search and τ = 10^−5).

4.    [R] For Newton’s method prepare the following plots and discuss the relevant observations:

(a)    Prepare a single plot where you depict the values log(||x_k − x∗||_2) against iterations k = 0,1,2,..., for each value of λ (use different colors for different λ values; if necessary, add zoomed versions of the plots to depict the behavior clearly, and use appropriate legends in your plots). Comment on the convergence rates of the iterates for each value of λ.

(b)    Prepare a different plot for plotting log(|f(x_k) − f(x∗)|) against the iterations, for each value of λ (use different colors for different λ values; if necessary, add zoomed versions of the plots to depict the behavior clearly, and use appropriate legends in your plots). Comment on the convergence rates of the objective function values.

5.    [R] For BFGS method prepare the following plots and discuss the relevant observations:

(a)    Prepare a single plot where you depict the values log(||x_k − x∗||_2) against iterations k = 0,1,2,..., for each value of λ (use different colors for different λ values; if necessary, add zoomed versions of the plots to depict the behavior clearly, and use appropriate legends in your plots). Comment on the convergence rates of the iterates for each value of λ.

(b)    Prepare a different plot for plotting log(|f(x_k) − f(x∗)|) against the iterations, for each value of λ (use different colors for different λ values; if necessary, add zoomed versions of the plots to depict the behavior clearly, and use appropriate legends in your plots). Comment on the convergence rates of the objective function values.

6.    [R] Compare and contrast the results obtained by Newton’s method and BFGS method and comment on the time taken by both the methods for each value of λ.
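Under the same assumption on the loss term, the regularized objective from (2) needs only small changes relative to Exercise 1 (a sketch, assuming f_λ(x) = (1/2)||Ax − y||² + (λ/2)||x||²; the names are our own):

import numpy as np

def f_reg(x, A, y, lam):
    # regularized objective: 0.5 * ||Ax - y||^2 + 0.5 * lam * ||x||^2
    r = A @ x - y
    return 0.5 * float(np.vdot(r, r)) + 0.5 * lam * float(np.vdot(x, x))

def grad_f_reg(x, A, y, lam):
    # gradient: A^T (Ax - y) + lam * x
    return A.T @ (A @ x - y) + lam * x

def hess_f_reg(x, A, y, lam):
    # Hessian: A^T A + lam * I; the lam * I term improves the conditioning
    return A.T @ A + lam * np.eye(x.shape[0])

Looping over the λ values given in the assignment, calling the Newton and BFGS routines from Exercise 1 with these functions, and wrapping each call in time.time() measurements then produces the per-λ plots and the timing comparison.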

 
