Problem 1
Give a closed-form solution to the loss minimization problem.
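A minimal worked sketch, assuming the loss in question is the least-squares loss of linear regression (the loss itself is not reproduced in this excerpt), with hypothetical design matrix X and target vector y:

    L(w) = ½ ‖Xw − y‖²,  ∇L(w) = Xᵀ(Xw − y) = 0  ⟹  w* = (XᵀX)⁻¹ Xᵀ y,

provided XᵀX is invertible.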
Problem 2
In the gradient descent algorithm, α > 0 is the learning rate. If α is small enough, then the function value is guaranteed to decrease. In practice, we may anneal α, meaning that we start from a relatively large α but decrease it gradually.

Show that α cannot be decreased too fast: if α is decreased too fast, then even if it remains strictly positive, the gradient descent algorithm may not converge to the optimum of a convex function.
Hint: Give a concrete loss function and an annealing scheduler such that the gradient descent algorithm fails to converge to the optimum.
Another hint: Think of the scheme of our attendance bonus in this course. Why can't a student get more than five marks, even if the student catches infinitely many errors?
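A minimal sketch of the kind of counterexample the hints point to, in Python. The loss f(w) = |w|, the starting point w = 2, and the geometric scheduler α_t = 2^(−(t+1)) are illustrative choices, not taken from the assignment; any scheduler whose learning rates have a finite sum behaves the same way, because the subgradient of |w| has magnitude 1, so the iterates can travel a total distance of at most Σ_t α_t = 1, which is less than the distance 2 to the optimum.

    # Gradient descent on the convex loss f(w) = |w|, whose optimum is w = 0.
    # The learning rate is halved every step, so the alphas sum to at most 1.
    w = 2.0
    alpha = 0.5                          # initial learning rate (illustrative)
    for t in range(100):
        grad = 1.0 if w > 0 else -1.0    # subgradient of |w|
        w -= alpha * grad
        alpha *= 0.5                     # annealed too fast: sum of alphas < 1
    print(w)                             # stays near 1.0, never reaches 0.0

This is exactly the attendance-bonus scheme: each further step (or caught error) is worth half as much as the previous one, and a geometric series has a finite sum, so the total progress (or bonus) is capped no matter how many steps are taken.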