MAT3007 – Exercise Sheet 8
Problem 1 (A One-Dimensional Problem): (approx. 20 pts)
We consider the optimization problem
min f(x) := 1/(x(x − 1)²) − log(x) − 2(x − 1)/(x + 1)   s.t. x ∈ [1.5, 4.5].
Implement the bisection or the golden section method to solve this problem and output a solution with accuracy at least 10⁻⁵.
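A minimal Python sketch of the golden section method is given below. The objective f used at the end follows the reconstruction of the problem stated above, so the formula should be checked against the original sheet before use.

import math

def golden_section(f, a, b, tol=1e-5, maxit=1000):
    # Golden section search for a minimizer of a unimodal f on [a, b].
    phi = (math.sqrt(5) - 1) / 2              # interval reduction factor ~0.618
    x1, x2 = b - phi * (b - a), a + phi * (b - a)
    f1, f2 = f(x1), f(x2)
    for _ in range(maxit):
        if b - a <= tol:
            break
        if f1 <= f2:                          # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = f(x1)
        else:                                 # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2

# Objective as reconstructed above -- verify against the original sheet.
f = lambda x: 1.0 / (x * (x - 1) ** 2) - math.log(x) - 2.0 * (x - 1) / (x + 1)
print(golden_section(f, 1.5, 4.5, tol=1e-5))

Since the final interval has width at most tol, the returned midpoint lies within tol/2 of the minimizer, which meets the 10⁻⁵ accuracy requirement.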
Problem 2 (The Gradient Method): (approx. 30 pts)
In this exercise, we want to solve the optimization problem
min_{x ∈ R²} f(x) := x1⁴ + (2/3)x1³ + (1/2)x1² − 2x1²x2 + (4/3)x2².   (1)
via the gradient descent method. (This is problem 1 discussed in sheet 6).
Implement the gradient method that was presented in the lecture as a function gradient_method in MATLAB or Python.
The following input functions and parameters should be considered:
• obj, grad – function handles that calculate and return the objective function f(x) and the gradient ∇f(x) at an input vector x ∈ Rn. You can treat these handles as functions or fields of a class or structure f or you can use f and ∇f directly in your code. (For example, your function can have the form gradient_method(obj,grad,...)).
• x0 – the initial point.
• tol – a tolerance parameter. The method should stop whenever the current iterate xk satisfies the criterion ‖∇f(xk)‖ ≤ tol.
We want to investigate the performance of the gradient method for different step size strategies. In particular, we want to test and compare backtracking and exact line search. The following parameters will be relevant for these strategies:
• σ, γ ∈ (0,1) – parameters for backtracking and the Armijo condition. (At iteration k, we choose αk as the largest element in {1, σ, σ², ...} satisfying the condition f(xk − αk∇f(xk)) − f(xk) ≤ −γ·αk·‖∇f(xk)‖².)
• You can use the golden section method to determine the exact step size αk. The parameters for the golden section method are: maxit (maximum number of iterations), tol (stopping tolerance), [0,a] (the interval of the step size).
You can organize the latter parameters in an appropriate options class or structure. It is also possible to implement separate algorithms for backtracking and exact line search. The method(s) should return the final iterate xk that satisfies the stopping criterion.
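One possible Python sketch of such a gradient_method is given below. The interface and parameter names (obj, grad, x0, tol, sigma, gamma, ls_maxit, ls_tol, a) mirror the sheet but are otherwise a design choice; the exact line search reuses the golden_section routine sketched in Problem 1, and the objective and gradient defined at the end assume the reconstruction of problem (1) given above.

import numpy as np

def gradient_method(obj, grad, x0, tol=1e-5, maxit=10000,
                    step="backtracking", sigma=0.5, gamma=0.1,
                    ls_maxit=100, ls_tol=1e-6, a=2.0):
    # Gradient descent with backtracking (Armijo) or exact (golden section) line search.
    # Returns the final iterate and the full iterate path (useful for the plots in b)).
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        if step == "backtracking":
            alpha, f_x = 1.0, obj(x)
            # largest alpha in {1, sigma, sigma^2, ...} satisfying the Armijo condition
            while obj(x - alpha * g) - f_x > -gamma * alpha * np.dot(g, g):
                alpha *= sigma
        else:
            # exact line search on [0, a] via the golden section routine from Problem 1
            alpha = golden_section(lambda t: obj(x - t * g), 0.0, a,
                                   tol=ls_tol, maxit=ls_maxit)
        x = x - alpha * g
        path.append(x.copy())
    return x, path

# Problem (1) as reconstructed above -- verify the coefficients against the sheet.
obj = lambda x: x[0]**4 + (2/3)*x[0]**3 + 0.5*x[0]**2 - 2*x[0]**2*x[1] + (4/3)*x[1]**2
grad = lambda x: np.array([4*x[0]**3 + 2*x[0]**2 + x[0] - 4*x[0]*x[1],
                           -2*x[0]**2 + (8/3)*x[1]])

x_bt, path_bt = gradient_method(obj, grad, x0=[3.0, 3.0], tol=1e-5)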
a) Apply the gradient method with backtracking and parameters (σ,γ) = (0.5,0.1) and exact line search (maxit = 100, tol = 10⁻⁶, a = 2) to solve the problem min_x f(x).
The algorithms should use the stopping tolerance tol = 10⁻⁵. Test the methods using the initial point x0 = (3, 3)ᵀ and report the behavior and performance of the methods. In particular, compare the number of iterations and the point to which the different gradient descent methods converged.
b) Let us define the set of initial points
X0 := { (−3, −3)ᵀ, (3, −3)ᵀ, (−3, 3)ᵀ, (3, 3)ᵀ }.
Run the methods:
– Gradient descent method with backtracking and (σ,γ) = (0.5,0.1),
– Gradient method with exact line search and maxit = 100, tol = 10⁻⁶, a = 2,
again for every initial point in the set X0 using the tolerance tol = 10⁻⁵. For each algorithm/step size strategy, create a single figure that contains all of the solution paths generated for the different initial points. The initial points and limit points should be clearly visible. Add a contour plot of the function f in the background of each figure.
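A possible matplotlib sketch of such a path-plus-contour figure is shown below. It assumes the gradient_method, obj, and grad from the sketch above; the grid range and plot styling are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

X0_list = [np.array(p, dtype=float) for p in [(-3, -3), (3, -3), (-3, 3), (3, 3)]]

# Contour plot of f in the background (grid range chosen to cover all paths).
xx, yy = np.meshgrid(np.linspace(-4, 4, 400), np.linspace(-4, 4, 400))
zz = xx**4 + (2/3)*xx**3 + 0.5*xx**2 - 2*xx**2*yy + (4/3)*yy**2

plt.figure()
plt.contour(xx, yy, zz, levels=50)
for x0 in X0_list:
    _, path = gradient_method(obj, grad, x0, tol=1e-5, step="backtracking")
    P = np.array(path)
    plt.plot(P[:, 0], P[:, 1], "o-", markersize=3)   # solution path
    plt.plot(P[0, 0], P[0, 1], "ks")                 # initial point (black square)
    plt.plot(P[-1, 0], P[-1, 1], "r*")               # limit point (red star)
plt.title("Gradient method with backtracking")
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()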
Problem 3 (A Limited-Memory Version of Adagrad): (approx. 25 pts)
In this exercise, we investigate a limited-memory version of the gradient descent method with adaptive diagonal scaling. The update of the method is given by
xk+1 = xk + αk·dk,   dk := −Dk⁻¹∇f(xk),   (2)

where αk ≥ 0 is a suitable step size and Dk ∈ Rn×n is a diagonal matrix that is chosen as follows:

Dk = diag(v1k, ..., vnk)   and   vik = sqrt( ε + Σ_{j = tm(k)}^{k} ((∇f(xj))i)² ),   tm(k) = max{0, k − m},   ∀ i = 1, ..., n,

where (∇f(xj))i denotes the i-th component of ∇f(xj). Here, ε > 0, the memory constant m ∈ N, and the initial point x0 ∈ Rn are given parameters.
a) Show that the direction dk is a descent direction for all k ∈ N (assuming ∇f(xk) ≠ 0).
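A sketch of the argument, assuming the reconstructed update (2) above: since every vik ≥ sqrt(ε) > 0, the scaling matrix Dk is positive definite, and therefore

\nabla f(x_k)^\top d_k
  = -\nabla f(x_k)^\top D_k^{-1} \nabla f(x_k)
  = -\sum_{i=1}^{n} \frac{\bigl(\nabla f(x_k)\bigr)_i^2}{v_i^k}
  < 0
\qquad \text{whenever } \nabla f(x_k) \neq 0,

i.e., dk is a descent direction.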
b) Write a MATLAB or Python code and implement the gradient method with adaptive diagonal scaling (Adagrad) and backtracking (i.e., implement the abstract descent method with dk as descent direction). Run Adagrad on problem (1) introduced in the last exercise and compare its performance with the tested approaches in Problem 2 using the same initial points and stopping tolerance as in Problem 2 b). You can use the following parameters:
– (σ,γ) = (0.5,0.1) – parameters for backtracking.
– ε = 10⁻⁶ – scaling parameter in the update (2).
– m = 25 – memory parameter.
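A possible Python sketch of this limited-memory Adagrad variant, built on the reconstructed update (2), is given below; the interface mirrors the gradient_method sketch from Problem 2 and reuses the obj and grad handles defined there.

import numpy as np
from collections import deque

def adagrad_lm(obj, grad, x0, tol=1e-5, maxit=10000,
               sigma=0.5, gamma=0.1, eps=1e-6, m=25):
    # Gradient method with limited-memory adaptive diagonal scaling (update (2))
    # and Armijo backtracking along the scaled direction dk = -Dk^{-1} grad f(xk).
    x = np.asarray(x0, dtype=float)
    sq_grads = deque(maxlen=m + 1)        # squared gradients for j = tm(k), ..., k
    path = [x.copy()]
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        sq_grads.append(g ** 2)
        v = np.sqrt(eps + sum(sq_grads))  # the vector (v1k, ..., vnk)
        d = -g / v                        # dk = -Dk^{-1} grad f(xk)
        alpha, f_x = 1.0, obj(x)
        while obj(x + alpha * d) - f_x > gamma * alpha * np.dot(g, d):
            alpha *= sigma
        x = x + alpha * d
        path.append(x.copy())
    return x, path

x_ada, path_ada = adagrad_lm(obj, grad, x0=[3.0, 3.0], tol=1e-5, eps=1e-6, m=25)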
Generate a plot of the different solution paths of Adagrad using the different initial points from X0 and add a contour plot of f in the background.
c) How does Adagrad behave when different memory parameters m are chosen (suitable choices might include m ∈ {5, 10, 15, 25, ..., maxit})? Does the performance or convergence behavior change?
Problem 4 (Globalized Newton’s Method): (approx. 25 pts)
Implement the globalized Newton method with backtracking that was presented in the lecture as a function newton_glob in MATLAB or Python.
The pseudo-code for the full globalized Newton method is given as follows:

Algorithm 1: The Globalized Newton Method
Initialization: Select an initial point x0 ∈ Rn, parameters γ, γ1, γ2, σ ∈ (0,1), and a tolerance tol > 0.
for k = 0, 1, ... do: compute the Newton step, test it against the descent condition stated below (falling back to the gradient direction if the test fails), determine αk by backtracking, and update xk.
The following input functions and parameters should be considered:
• obj, grad, hess – function handles that calculate and return the objective function f(x), the gradient ∇f(x), and the Hessian ∇2f(x) at an input vector x ∈ Rn. You can treat these handles as functions or fields of a class or structure f or you can use f, ∇f, and ∇2f from part a) and b) directly in the algorithm. (For example, your function can have the form newton_glob(obj,grad,hess,...)).
• x0 – the initial point.
• tol – a tolerance parameter. The method should stop whenever the current iterate xk satisfies the criterion ‖∇f(xk)‖ ≤ tol.
• γ1,γ2 > 0 – parameters for the Newton condition.
• σ,γ ∈ (0,1) – parameters for backtracking and the Armijo condition.
You can again organize the latter parameters in an appropriate options class or structure. You can use the backslash operator A\b in MATLAB or numpy.linalg.solve(A,b) in Python to solve the linear system of equations Ax = b. If the computed Newton step sk = −∇²f(xk)⁻¹∇f(xk) is a descent direction and satisfies

−∇f(xk)ᵀ sk ≥ γ1 · min{1, ‖sk‖^γ2} · ‖sk‖²,

we accept it as the next direction dk. Otherwise, the gradient direction dk = −∇f(xk) is chosen. The method should return the final iterate xk that satisfies the stopping criterion.
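A possible Python sketch of newton_glob following the description above is given below; the interface mirrors the earlier sketches and is only one of many reasonable designs.

import numpy as np

def newton_glob(obj, grad, hess, x0, tol=1e-5, maxit=1000,
                sigma=0.5, gamma=1e-4, gamma1=1e-6, gamma2=0.1):
    # Globalized Newton method: take the Newton step if it passes the descent test,
    # otherwise the gradient step; Armijo backtracking in both cases.
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        # Newton step: solve hess(x) s = -g
        try:
            s = np.linalg.solve(hess(x), -g)
        except np.linalg.LinAlgError:
            s = None
        # Accept the Newton step only if it satisfies the descent condition above
        if s is not None and -np.dot(g, s) >= gamma1 * min(1.0, np.linalg.norm(s) ** gamma2) * np.dot(s, s):
            d = s
        else:
            d = -g
        # Armijo backtracking along the chosen direction d
        alpha, f_x = 1.0, obj(x)
        while obj(x + alpha * d) - f_x > gamma * alpha * np.dot(g, d):
            alpha *= sigma
        x = x + alpha * d
        path.append(x.copy())
    return x, path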
a) Test your approach on the Rosenbrock function f : R2 → R
f(x) = 100(x2 − x1²)² + (1 − x1)²
with initial point x0 = (−1, −0.5)ᵀ and parameters (σ,γ) = (0.5, 10⁻⁴) and (γ1,γ2) = (10⁻⁶, 0.1). (Notice that γ is smaller here.) Besides the globalized Newton method, also run the gradient method with backtracking ((σ,γ) = (0.5, 10⁻⁴)) on this problem and compare the performance of the two approaches using the tolerance tol = 10⁻⁷.
Does the Newton method always utilize the Newton direction? Does the method always use full step sizes αk = 1?
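For part a), the Rosenbrock objective, gradient, and Hessian can be set up as follows; the call at the end assumes the newton_glob sketch above.

import numpy as np

# Rosenbrock function, gradient, and Hessian
rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosen_grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def rosen_hess(x):
    return np.array([[-400.0 * (x[1] - 3.0 * x[0] ** 2) + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x_newton, path_n = newton_glob(rosen, rosen_grad, rosen_hess, x0=[-1.0, -0.5],
                               tol=1e-7, sigma=0.5, gamma=1e-4,
                               gamma1=1e-6, gamma2=0.1)
print(len(path_n) - 1, "iterations, final iterate:", x_newton)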
b) Repeat the performance test described in Problem 2 b) and Problem 3 b) for problem (1) using the globalized Newton method. You can use the parameters (σ,γ) = (0.5, 0.1), (γ1,γ2) = (10⁻⁶, 0.1), and tol = 10⁻⁵.
Plot all of the solution paths obtained by Newton’s method for the different initial points in X0 in one figure (with a contour plot of f in the background).
Information: MATLAB and Python Code
• For those questions that ask you to write code to solve the problem, please attach the code to the homework. Please also state the optimal solution and the optimal value or show the obtained plot / figure in your solution sheet (depending on the specific question).
