
COMP5630 Assignment #5 - Linear Regression and Regularization Solution


Submission Instructions
Tasks
1 Linear Regression [30 pts]
Suppose that y = w0 + w1x1 + w2x2 + ϵ, where ϵ ∼ N(0, σ²).
a) [10 pts] Write down an expression for P(y|x1,x2).
b) [10 pts] Assume you are given a set of training observations (x1,i, x2,i, yi) for i = 1, ..., n. Write down the conditional log-likelihood of this training data. Drop any constants that do not depend on the parameters w0, w1, or w2.
c) [10 pts] Based on your answer, show that finding the MLE of that conditional log-likelihood is equivalent to minimizing the least-squares objective

Σ_{i=1}^{n} (yi − (w0 + w1x1,i + w2x2,i))².
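As a hint of the intended derivation (a sketch, not the graded solution): since y | x1, x2 ∼ N(w0 + w1x1 + w2x2, σ²), the conditional log-likelihood of n independent observations reduces, up to additive constants, to

```latex
\ell(w_0, w_1, w_2)
  = \sum_{i=1}^{n} \log P\!\left(y_i \mid x_{1,i}, x_{2,i}\right)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{n}
      \bigl(y_i - (w_0 + w_1 x_{1,i} + w_2 x_{2,i})\bigr)^2 + \text{const},
```

so maximizing the log-likelihood over the weights is the same as minimizing the sum of squared residuals.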

2 Regularization [30 pts]
a) [15 pts] Find the partial derivatives of the regularized least-squares objective

E(w0, w1, w2) = Σ_{i=1}^{n} (yi − (w0 + w1x1,i + w2x2,i))² + λ(w1² + w2²)

with respect to w0, w1, and w2. Although this problem has a closed-form solution, in practice it is often solved via gradient descent. Write down the gradient descent update rules for w0, w1, and w2.
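As a sketch of what those update rules compute (not part of the provided template; the function name, argument order, and the choice to leave w0 unpenalized, as suggested by part (b), are assumptions):

```python
# Hypothetical gradient-descent step for the regularized objective
#   E = sum_i (y_i - (w0 + w1*x1_i + w2*x2_i))^2 + lam*(w1^2 + w2^2).
# gamma is the learning rate; lam is the regularization weight.

def ridge_gradient_step(w0, w1, w2, x1, x2, y, lam, gamma):
    """One gradient-descent update; x1, x2, y are equal-length sequences."""
    resid = [yi - (w0 + w1 * a + w2 * c) for a, c, yi in zip(x1, x2, y)]
    g0 = -2 * sum(resid)                                          # dE/dw0 (no penalty)
    g1 = -2 * sum(r * a for r, a in zip(resid, x1)) + 2 * lam * w1  # dE/dw1
    g2 = -2 * sum(r * c for r, c in zip(resid, x2)) + 2 * lam * w2  # dE/dw2
    return w0 - gamma * g0, w1 - gamma * g1, w2 - gamma * g2
```

With a sufficiently small γ, each step decreases E until the iterates approach the regularized minimizer.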
b) [15 pts] Suppose that w1, w2 ∼ N(0, τ²). Prove that the MAP estimate of w0, w1, and w2 under this prior is equivalent to minimizing the above regularized least-squares problem with λ = σ²/τ².
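A sketch of the connection (assuming independent Gaussian priors on w1 and w2 and a flat prior on w0): the log-posterior is, up to additive constants,

```latex
\log P(w_0, w_1, w_2 \mid \text{data})
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{n}
      \bigl(y_i - (w_0 + w_1 x_{1,i} + w_2 x_{2,i})\bigr)^2
    - \frac{1}{2\tau^2}\left(w_1^2 + w_2^2\right) + \text{const},
```

and multiplying through by −2σ² shows that maximizing the posterior is equivalent to minimizing the regularized objective with λ = σ²/τ².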
3 Programming: Implement Linear Regression [40 pts]
In this assignment, you will implement Linear Regression for a small dataset. Note that you will implement Linear Regression from scratch; external libraries such as Tensorflow, Sklearn, or Pytorch cannot be used. The dataset (“numpydataset.csv”) has been provided on Canvas as part of this assignment. We have also provided template code (“Linear Regression.py”) to simplify the input, output, and plot functions.
You will complete the following tasks for this programming assignment:
a) [10 pts] Implement a method MSE() that will compute the loss incurred at each epoch. We aim to minimize the distance between the data and the fitted line after each epoch; an epoch is a single pass over the dataset. The higher the cost (or error), the more the parameter values need to change in order to bring it down. The error function is

E = (1/n) Σ_{i=1}^{n} (yi − ŷi)²,

where yi is the actual value, ŷi is the predicted value, and n is the total number of data points (samples). Invoke the function to calculate the initial loss when m = 0 and b = 0. (We use m and w interchangeably; both refer to the weight, i.e., the slope.)
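A minimal sketch of such a method (the lowercase name mse and the argument order are assumptions; the template's actual signature may differ):

```python
# Mean squared error of the line y_hat = m*x + b over the dataset.
# With m = 0 and b = 0, this reduces to the mean of y_i^2 (the initial loss).

def mse(x, y, m, b):
    """Average squared residual between y and the line m*x + b."""
    n = len(y)
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / n
```

For example, mse([1, 2], [2, 4], 0, 0) returns (4 + 16) / 2 = 10.0, while the perfect fit m = 2, b = 0 gives 0.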
b) [20 pts] Your aim in Linear Regression is to minimize the MSE using the GRADIENT DESCENT algorithm. Extensively used in machine learning, this procedure finds a local minimum of a function: given a learning rate γ, it takes small iterative steps in the direction of −∇E (the negative gradient) in order to reduce the error. Each iteration of gradient descent updates w and b (collectively denoted by θ) according to

θ_{t+1} = θ_t − γ ∇E(θ_t),

where θ_t denotes the weights and the bias at iteration t.
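The update above can be sketched as a loop over the slope m and intercept b (a hypothetical implementation, not the template's code; the gamma and n_epochs defaults are illustrative, and because the MSE in part (a) is averaged, the gradients carry a 1/n factor):

```python
# Gradient descent on E(m, b) = (1/n) * sum_i (y_i - (m*x_i + b))^2.
# Each epoch applies theta <- theta - gamma * grad E(theta) for theta = (m, b).

def gradient_descent(x, y, gamma=0.001, n_epochs=100):
    m = b = 0.0
    n = len(y)
    for _ in range(n_epochs):
        resid = [yi - (m * xi + b) for xi, yi in zip(x, y)]
        dm = (-2.0 / n) * sum(r * xi for r, xi in zip(resid, x))  # dE/dm
        db = (-2.0 / n) * sum(resid)                              # dE/db
        m -= gamma * dm
        b -= gamma * db
    return m, b
```

If γ is too large the iterates oscillate or diverge; if it is too small, convergence within the epoch budget is slow — which is exactly the trade-off part (c) asks you to examine.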
c) [10 pts] Using the template code, compute the MSE after 100 iterations for learning rates 0.001 and 0.01 separately. The code will also create two plots (one per learning rate) showing the fitted line along with the data points. Which learning rate do you think works better? Why? Explain your answer with relevant data and observations.
