IE684-Lab3 Solved

Exercise 1: Gradient Descent with Scaling

Recall that in the last lab, when we tried to solve certain problems of the form min_{x ∈ ℝⁿ} f(x) using the gradient descent algorithm, we noticed that the algorithm needed a large number of iterations to find the minimizer. To avoid such situations, we consider the modified gradient descent scheme illustrated in Algorithm 1.

 

Input: Starting point x0, stopping tolerance τ.
Initialize k = 0, pk = −∇f(xk).

while ‖pk‖2 > τ do
    Choose a suitable scaling matrix Dk.
    ηk = argmin_{η≥0} f(xk + η Dk pk) = argmin_{η≥0} f(xk − η Dk ∇f(xk))
    x_{k+1} = xk + ηk Dk pk = xk − ηk Dk ∇f(xk)
    k = k + 1
    pk = −∇f(xk)
end

Output: xk.

Algorithm 1: Gradient Descent Procedure with Scaling

 

1.       Note that in the notebook file shared with you, we provide the motivation for the modified gradient descent procedure with scaling.

2.       [R] Consider the function f. Write code to find the Hessian matrix of f and its condition number (a sketch appears after this list).

3.       Note that the update step in the modified gradient descent scheme uses a scaled gradient. Thus it becomes important to set up some criteria for choosing the Dk matrix in every iteration. In this exercise, we will assume Dk to be a diagonal matrix. The following questions will help in designing a suitable Dk.

4.       [R] Based on our discussion on condition number and the derivation of the gradient descent scheme with scaling, can you identify and write down the matrix Q whose condition number needs to be analyzed in the new gradient descent scheme with scaling?

5.       [R] Based on the matrix Q, can you come up with a useful choice for Dk (assuming Dk to be diagonal)?

6.       Write all related code to implement Algorithm 1 to find the minimizer of the function f (an implementation sketch appears after this list).

7.       [R] Note down the minimizer and the minimum function value of f.

8.       [R] With starting point x0 = (1, 4000) and a stopping tolerance τ = 10⁻¹², find the number of iterations taken by the gradient descent algorithm (without scaling) with exact line search, the gradient descent algorithm (without scaling) with backtracking line search, and the gradient descent algorithm (with scaling) with backtracking line search. For backtracking line search, use α0 = 1, ρ = 0.5, γ = 0.5. Note the minimizer and the minimum objective function value in each case. Comment on your observations. (A sketch of an exact line search routine appears after this list.)

9.       [R] With starting point x0 = (1, 4000) and τ = 10⁻¹², we will now study the behavior of the gradient descent algorithm (without scaling) with backtracking line search and the gradient descent algorithm (with scaling) with backtracking line search for different choices of α0. Take γ = ρ = 0.5. Try α0 ∈ {1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each α0, record the final minimizer, the final objective function value, and the number of iterations to terminate for both algorithms. Prepare a plot where the numbers of iterations of both algorithms are plotted against the α0 values. Use different colors and a legend to distinguish the curves corresponding to the two algorithms. Comment on your observations, and comment on the minimizers and objective function values obtained for the different choices of α0 for the two algorithms. (A plotting sketch for this sweep appears after this list.)

10.   [R] With starting point x0 = (1, 4000) and τ = 10⁻¹², we will now study the behavior of the gradient descent algorithm (without scaling) with backtracking line search and the gradient descent algorithm (with scaling) with backtracking line search for different choices of ρ. Take α0 = 1, γ = 0.5. Try ρ ∈ {0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each ρ, record the final minimizer, the final objective function value, and the number of iterations to terminate for both algorithms. Prepare a plot where the numbers of iterations of both algorithms are plotted against the ρ values. Use different colors and a legend to distinguish the curves corresponding to the two algorithms. Comment on your observations, and comment on the minimizers and objective function values obtained for the different choices of ρ for both algorithms. (The plotting sketch after this list applies here as well, with ρ in place of α0.)
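For question 2 above, the definition of f is not reproduced in this sheet, so the sketch below uses a placeholder quadratic; the symbols x1, x2 and the expression f_expr are assumptions to be replaced by the lab's function. It computes the Hessian symbolically with sympy and its condition number with numpy.

# Hedged sketch: f_expr below is a placeholder quadratic, not the lab's f.
import sympy as sp
import numpy as np

x1, x2 = sp.symbols('x1 x2')
f_expr = 0.5 * (x1**2 + 4000 * x2**2)   # placeholder objective (assumption)

# Hessian of f as a symbolic matrix
H = sp.hessian(f_expr, (x1, x2))
print("Hessian:", H)

# For a quadratic the Hessian is constant; evaluate it numerically and
# compute its condition number.
H_num = np.array(H.subs({x1: 1.0, x2: 1.0}).tolist(), dtype=float)
print("Condition number:", np.linalg.cond(H_num))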
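For questions 6 and 8, a minimal sketch of Algorithm 1 with a diagonal scaling matrix and backtracking line search is given below. The placeholder objective, gradient, and scaling matrix (f, grad_f, D_fn) are assumptions standing in for the lab's function; only the loop structure mirrors Algorithm 1.

import numpy as np

def backtracking(f, grad_f, x, d, alpha0=1.0, rho=0.5, gamma=0.5):
    # Backtracking (Armijo) line search along the direction d.
    alpha = alpha0
    fx, gx = f(x), grad_f(x)
    # The alpha > 1e-16 test is only a numerical safeguard against stalling.
    while f(x + alpha * d) > fx + gamma * alpha * np.dot(gx, d) and alpha > 1e-16:
        alpha *= rho
    return alpha

def scaled_gradient_descent(f, grad_f, D_fn, x0, tau=1e-12,
                            alpha0=1.0, rho=0.5, gamma=0.5, max_iter=10**6):
    # Algorithm 1: x_{k+1} = x_k - eta_k * D_k * grad f(x_k), with D_k = D_fn(x_k).
    x = np.asarray(x0, dtype=float)
    k = 0
    while np.linalg.norm(grad_f(x)) > tau and k < max_iter:
        d = -D_fn(x) @ grad_f(x)                  # scaled descent direction D_k p_k
        eta = backtracking(f, grad_f, x, d, alpha0, rho, gamma)
        x = x + eta * d
        k += 1
    return x, k

# Placeholder problem (an assumption, not the lab's f): f(x) = 0.5*(x1^2 + 4000*x2^2)
f = lambda x: 0.5 * (x[0]**2 + 4000.0 * x[1]**2)
grad_f = lambda x: np.array([x[0], 4000.0 * x[1]])
D_fn = lambda x: np.diag([1.0, 1.0 / 4000.0])     # reciprocal of the Hessian diagonal

x_star, iters = scaled_gradient_descent(f, grad_f, D_fn, x0=(1.0, 4000.0))
print(x_star, f(x_star), iters)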
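Question 8 also needs exact line search for the unscaled variant. One hedged way to obtain ηk = argmin_{η≥0} f(xk + η pk) is a bounded one-dimensional minimization; for a quadratic f(x) = (1/2) xᵀAx − bᵀx the step along pk = −∇f(xk) also has the closed form η = (gᵀg)/(gᵀAg) with g = ∇f(xk). The helper name and the upper bound on η below are assumptions.

from scipy.optimize import minimize_scalar

def exact_line_search(f, x, d, eta_max=1e6):
    # Numerically solve eta_k = argmin_{0 <= eta <= eta_max} f(x + eta * d).
    phi = lambda eta: f(x + eta * d)
    res = minimize_scalar(phi, bounds=(0.0, eta_max), method='bounded')
    return res.x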
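For questions 9 and 10, the sweep-and-plot pattern might look like the sketch below (shown for α0; the ρ sweep is analogous). It reuses f, grad_f, D_fn, and scaled_gradient_descent from the sketch above; passing an identity scaling matrix recovers gradient descent without scaling. The colors and markers are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

identity = lambda x: np.eye(2)   # D_k = I gives plain (unscaled) gradient descent

alpha0_values = [1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01]
iters_plain, iters_scaled = [], []
for a0 in alpha0_values:
    _, k_plain = scaled_gradient_descent(f, grad_f, identity, x0=(1.0, 4000.0),
                                         tau=1e-12, alpha0=a0, rho=0.5, gamma=0.5)
    _, k_scaled = scaled_gradient_descent(f, grad_f, D_fn, x0=(1.0, 4000.0),
                                          tau=1e-12, alpha0=a0, rho=0.5, gamma=0.5)
    iters_plain.append(k_plain)
    iters_scaled.append(k_scaled)

plt.plot(alpha0_values, iters_plain, 'o-', color='tab:blue', label='GD without scaling')
plt.plot(alpha0_values, iters_scaled, 's-', color='tab:orange', label='GD with scaling')
plt.xlabel('alpha_0')
plt.ylabel('iterations to terminate')
plt.legend()
plt.show()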


Consider the function q(x).

1.    [R] Can you design a suitable diagonal matrix Dk for the gradient descent algorithm with scaling to solve min_x q(x)? If you can come up with a suitable choice of Dk, use it in the implementation of Algorithm 1 (with backtracking line search) to find the minimizer of q(x) with starting point x0 = (8, 8) and τ = 10⁻⁵. Use α0 = 1, ρ = 0.5, γ = 0.5 for backtracking line search. Comment on your observations when compared to gradient descent (without scaling) with backtracking line search. If you cannot find a suitable choice of Dk, explain your reasons clearly.

2.    If we allow the scaling matrix Dk to be non-diagonal, then a natural choice turns out to be Dk = (∇²f(xk))⁻¹. This choice of Dk yields the popular Newton's method. Implement Algorithm 1 with this choice of Dk (a sketch appears after this list).

3.    [R] Based on our discussion on condition number and the derivation of the gradient descent scheme with scaling, can you identify and write down the matrix Q whose condition number needs to be analyzed in the new gradient descent scheme with scaling when Dk = (∇²f(xk))⁻¹?

4.    [R] With starting point x0 = (8, 8) and a stopping tolerance τ = 10⁻⁵, find the number of iterations taken by the gradient descent algorithm (without scaling) with backtracking line search and the gradient descent algorithm (with scaling) with backtracking line search. For backtracking line search, use α0 = 1, ρ = 0.5, γ = 0.5. Note the minimizer and the minimum objective function value in each case. Comment on your observations. Also note, in each iteration, the condition number of the Hessian matrix involved in the gradient descent algorithm (without scaling) with backtracking line search and the condition number of the matrix Q involved in the gradient descent algorithm (with scaling) with backtracking line search. Prepare a plot depicting the behavior of the condition numbers in both algorithms against iterations. Use different colors and a legend to distinguish the methods. Comment on your observations. (A condition-number sketch appears after this list.)
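For question 2 above, a hedged sketch of the Newton choice Dk = (∇²f(xk))⁻¹ plugged into Algorithm 1 follows; hess_f below is a placeholder Hessian (an assumption, not the Hessian of q), and the scaled_gradient_descent routine from the earlier sketch is assumed to be available.

import numpy as np

def newton_scaling(hess_f):
    # Returns a D_fn with D_k = (grad^2 f(x_k))^{-1}, i.e. Newton's method inside Algorithm 1.
    # In practice one would solve the linear system grad^2 f(x_k) d = -grad f(x_k) rather
    # than forming the inverse; the explicit inverse is kept here only to mirror the formula.
    return lambda x: np.linalg.inv(hess_f(x))

# Placeholder Hessian (an assumption): a constant symmetric positive definite matrix.
hess_f = lambda x: np.array([[1.0, 0.0], [0.0, 4000.0]])

# Usage with the earlier sketch (assumed available in the same notebook):
# x_star, iters = scaled_gradient_descent(f, grad_f, newton_scaling(hess_f),
#                                         x0=(8.0, 8.0), tau=1e-5)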
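For question 4, a hedged way to record and plot the condition numbers is sketched below. The form Qk = Dk^{1/2} ∇²f(xk) Dk^{1/2} used for the scaled scheme is an assumption based on the change-of-variables derivation alluded to in the questions, and xs_plain / xs_scaled are assumed lists of iterates saved during the two runs.

import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import sqrtm

def condition_numbers(hess_f, D_fn, xs):
    # For each recorded iterate x_k, compute kappa(H_k) and kappa(Q_k), where
    # H_k = grad^2 f(x_k) and Q_k = D_k^{1/2} H_k D_k^{1/2} (assumed form of Q).
    kappa_H, kappa_Q = [], []
    for x in xs:
        H = hess_f(x)
        Dhalf = np.real(sqrtm(D_fn(x)))
        kappa_H.append(np.linalg.cond(H))
        kappa_Q.append(np.linalg.cond(Dhalf @ H @ Dhalf))
    return kappa_H, kappa_Q

# xs_plain and xs_scaled are assumed per-iteration histories from the two runs:
# kH, _ = condition_numbers(hess_f, lambda x: np.eye(2), xs_plain)
# _, kQ = condition_numbers(hess_f, D_fn, xs_scaled)
# plt.plot(kH, color='tab:blue', label='without scaling: cond(Hessian)')
# plt.plot(kQ, color='tab:orange', label='with scaling: cond(Q)')
# plt.xlabel('iteration'); plt.ylabel('condition number'); plt.legend(); plt.show()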
