Exercise 1: Backtracking Line Search
Recall that we implemented the gradient descent algorithm to solve min_{x ∈ ℝ^n} f(x). The main ingredients in the gradient descent iterations are the descent direction p_k, which is set to −∇f(x_k), and the step length η_k, which is found by solving an optimization problem (or sometimes taken to be a constant value over all iterations). We used the following procedure in the previous lab:
Input: Starting point x_0, stopping tolerance τ
Initialize k = 0, p_k = −∇f(x_k)
while ‖p_k‖_2 > τ do
    η_k = argmin_{η ≥ 0} f(x_k + η p_k) = argmin_{η ≥ 0} f(x_k − η ∇f(x_k))
    x_{k+1} = x_k + η_k p_k = x_k − η_k ∇f(x_k)
    k = k + 1
end
Output: x_k
Algorithm 1: Gradient Descent Procedure with line search to find step length
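As a concrete reference, here is a minimal Python sketch of Algorithm 1. It assumes the gradient is available as a callable named evalg (the naming used later in Exercise 2) and that the step-length rule is passed in as a function, so the exact, constant, or backtracking variants can all be plugged into step_length; the max_iter safeguard is an addition for convenience.

```python
import numpy as np

def gradient_descent(evalg, x0, step_length, tau=1e-5, max_iter=100000):
    """Algorithm 1: gradient descent with a pluggable step-length rule.

    evalg(x) returns the gradient of f at x; step_length(x, p) returns eta_k.
    """
    x = np.asarray(x0, dtype=float)
    k = 0
    p = -evalg(x)                      # descent direction p_k = -grad f(x_k)
    while np.linalg.norm(p) > tau and k < max_iter:
        eta = step_length(x, p)        # exact, constant, or backtracking step
        x = x + eta * p                # x_{k+1} = x_k + eta_k * p_k
        p = -evalg(x)
        k += 1
    return x, k
```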
We saw that for particular quadratic functions, a closed-form analytical solution exists for the minimizer of min_{η ≥ 0} f(x_k + η p_k). However, a closed-form expression for a suitable step length might not always be available. To handle more general situations, we will devise a different procedure in this lab.
To find the step length, we will use the following property: suppose a non-zero p ∈ ℝ^n is a descent direction at a point x, and let γ ∈ (0,1). Then there exists ε > 0 such that
    f(x + αp) ≤ f(x) + γ α ∇f(x)^T p,   for all α ∈ (0, ε].
This condition is called a sufficient decrease condition.
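To make the property concrete, here is a small numerical check for the lab's quadratic f(x) = (x_1 − 8)^2 + (x_2 + 12)^2 at the (arbitrarily chosen) point x = (1, 1): sufficiently small values of α satisfy the sufficient decrease inequality, while larger ones need not.

```python
import numpy as np

f = lambda x: (x[0] - 8) ** 2 + (x[1] + 12) ** 2
g = lambda x: np.array([2 * (x[0] - 8), 2 * (x[1] + 12)])   # gradient of f

x, gamma = np.array([1.0, 1.0]), 0.5
p = -g(x)                                                   # a descent direction
for alpha in [1.0, 0.5, 0.1, 0.01]:
    holds = f(x + alpha * p) <= f(x) + gamma * alpha * g(x) @ p
    print(f"alpha = {alpha}: sufficient decrease holds = {holds}")
```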
Using the idea of sufficient decrease, the step length η_k can be found using the backtracking procedure illustrated below, which searches for an appropriate value of α.
Input: x_k, p_k = −∇f(x_k), α_0, ρ ∈ (0,1), γ ∈ (0,1)
Initialize α = α_0
while f(x_k + α p_k) > f(x_k) + γ α ∇f(x_k)^T p_k do
    α = ρ α
end
Output: α
Algorithm 2: Backtracking line search
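A minimal sketch of Algorithm 2 in the same style, again assuming evalf and evalg callables for the objective and its gradient (these names are assumptions carried over from Exercise 2). Inside gradient descent, the step length for iteration k is then obtained as backtracking(evalf, evalg, x_k, p_k).

```python
import numpy as np

def backtracking(evalf, evalg, x, p, alpha0=1.0, rho=0.5, gamma=0.5):
    """Algorithm 2: shrink alpha by rho until sufficient decrease holds."""
    alpha = alpha0
    fx, gx = evalf(x), evalg(x)
    while evalf(x + alpha * p) > fx + gamma * alpha * gx @ p:
        alpha *= rho                   # alpha <- rho * alpha
    return alpha
```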
1. Note that in the notebook file shared with you, the gradient descent algorithm is implemented to solve min_x f(x) = (x_1 − 8)^2 + (x_2 + 12)^2 with a constant step length η = 0.1, stopping tolerance τ = 10^{-5}, and starting point x_0 = (1, 1).
2. Implement the methods to find the step length using exact line search (closed-form expression) and using backtracking line search for the function f(x) = f(x_1, x_2) = (x_1 − 8)^2 + (x_2 + 12)^2; a sketch of the exact step for this quadratic appears after this list.
3. [R] Note down the minimizer and the minimum function value of f(x) = f(x_1, x_2) = (x_1 − 8)^2 + (x_2 + 12)^2.
4. [R] Consider stopping tolerance τ = 10^{-12} and starting point x_0 = (25, 25). Compare the number of iterations taken by the gradient descent procedure which uses exact step length computation against the gradient descent procedure which uses the backtracking line search procedure (with α_0 = 1, ρ = 0.5, γ = 0.5). Comment on your observations.
5. [R] With starting point x_0 = (25, 25) and τ = 10^{-10}, we will now study the behavior of the backtracking line search algorithm for different choices of α_0. Take γ = ρ = 0.5 and try α_0 ∈ {1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each α_0, record the final minimizer, the final objective function value, and the number of iterations taken by gradient descent with backtracking line search to terminate. Prepare a plot of the number of iterations against the α_0 values. Comment on your observations, and on the minimizers and objective function values obtained for the different choices of α_0. Check and comment whether, for any α_0 value, gradient descent with backtracking line search takes fewer iterations than the gradient descent procedure with exact line search.
6. [R] With starting point x_0 = (25, 25) and τ = 10^{-10}, we will now study the behavior of the backtracking line search algorithm for different choices of ρ. Take α_0 = 1, γ = 0.5 and try ρ ∈ {0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each ρ, record the final minimizer, the final objective function value, and the number of iterations taken by gradient descent with backtracking line search to terminate. Prepare a plot of the number of iterations against the ρ values. Comment on your observations, and on the minimizers and objective function values obtained for the different choices of ρ. Check and comment whether, for any ρ value, gradient descent with backtracking line search takes fewer iterations than the gradient descent procedure with exact line search.
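As referenced in item 2, here is a hedged sketch of the exact line-search step for this particular quadratic, together with a comparison run for item 4; it reuses the gradient_descent and backtracking sketches above, and the closed-form step η = −(∇f(x)^T p)/(p^T H p) follows because the Hessian of this f is the constant matrix H = 2I.

```python
import numpy as np

evalf = lambda x: (x[0] - 8) ** 2 + (x[1] + 12) ** 2
evalg = lambda x: np.array([2 * (x[0] - 8), 2 * (x[1] + 12)])
H = 2 * np.eye(2)                                  # Hessian of this quadratic

def exact_step(x, p):
    """Closed-form minimizer of eta -> f(x + eta*p) for a quadratic with Hessian H."""
    return -(evalg(x) @ p) / (p @ H @ p)

x0 = np.array([25.0, 25.0])
x_ex, k_ex = gradient_descent(evalg, x0, exact_step, tau=1e-12)
x_bt, k_bt = gradient_descent(evalg, x0,
                              lambda x, p: backtracking(evalf, evalg, x, p),
                              tau=1e-12)
print("exact:", x_ex, k_ex, " backtracking:", x_bt, k_bt)
```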
Exercise 2:
Consider the function h(x), where x ∈ ℝ^N.
1. Consider N = 3. Implement evalf, evalg, and the other related functions needed for gradient descent (with exact line search and with backtracking line search) to solve min_x h(x).
2. [R] Note down the minimizer and minimum function value of h(x).
3. [R] Consider stopping tolerance τ = 10^{-10} and starting point x_0. Compare the number of iterations taken by the gradient descent procedure which uses exact step length computation against the gradient descent procedure which uses the backtracking line search procedure (with α_0 = 1, ρ = 0.5, γ = 0.5). Comment on your observations.
4. [R] Check if similar observations hold in the N = 4 case as well. Choose a starting point x_0 and, for the backtracking line search procedure, use α_0 = 1, ρ = 0.5, γ = 0.5. Comment on your observations.
5. [R] Can you also comment on the expected observations for the N > 4 case, without actually running the program?
Exercise 3:
Consider the function q(x), where x ∈ ℝ^2.
1. [R] Find the minimizer of the function q(x).
2. [R] Can you use the exact line search idea to find a closed-form expression for solving min_η q(x + ηp), where p ≠ 0 is a descent direction at x? If yes, find the closed-form expression for the optimal value of η. If you cannot find a closed-form solution, explain the reasons.
3. If you could use the exact line search procedure to solve min_η q(x + ηp) for finding the step length, implement the procedure. If not, you may ignore the implementation.
4. Write appropriate code to implement the gradient descent procedure with backtracking line search to solve min_x q(x).
5. [R] With starting point x_0 = (100, 100) and τ = 10^{-10}, we will now study the behavior of the backtracking line search algorithm for different choices of α_0. Take γ = ρ = 0.5 and try α_0 ∈ {1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each α_0, record the final minimizer, the final objective function value, and the number of iterations taken by gradient descent with backtracking line search to terminate. Prepare a plot of the number of iterations against the α_0 values. Comment on your observations, and on the minimizers and objective function values obtained for the different choices of α_0. If you have implemented exact line search, check and comment whether, for any α_0 value, gradient descent with backtracking line search takes fewer iterations than the gradient descent procedure with exact line search.
6. [R] With starting point x_0 = (100, 100) and τ = 10^{-10}, we will now study the behavior of the backtracking line search algorithm for different choices of ρ. Take α_0 = 1, γ = 0.5 and try ρ ∈ {0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1, 0.01}. For each ρ, record the final minimizer, the final objective function value, and the number of iterations taken by gradient descent with backtracking line search to terminate. Prepare a plot of the number of iterations against the ρ values. Comment on your observations, and on the minimizers and objective function values obtained for the different choices of ρ. If you have implemented exact line search, check and comment whether, for any ρ value, gradient descent with backtracking line search takes fewer iterations than the gradient descent procedure with exact line search.