Exercise 1: Baseline predictor
Compute the baseline predictor 𝑹̂ based on the following raw data matrix 𝑹:
\[
\mathbf{R} = \begin{bmatrix}
5 & - & 5 & 4 \\
- & 1 & 1 & 4 \\
4 & 1 & 2 & 4 \\
3 & 4 & - & 3 \\
1 & 5 & 3 & -
\end{bmatrix}
\]
(Hint: This involves a least squares problem with sixteen equations and nine variables. Feel free to use any programming language; for example, the backslash operator or pinv() in Matlab can be helpful. If the least squares problem has multiple solutions, take any one of them.
Show the matrix 𝑨, the vectors 𝒄 and 𝒃, and the resulting 𝑹̂. Note that a rating must lie between 1 and 5. There is no need to submit the programming code.)
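The setup described in the hint can be sketched in Python with NumPy as follows. This is only an illustrative sketch under stated assumptions: rows of 𝑹 are users and columns are movies, "−" marks a missing rating, the baseline has the form r̄ + b_u + b_i (average rating plus a user bias and a movie bias), and predictions are clipped to the range 1 to 5. The variable names are chosen here for illustration.

```python
import numpy as np

# Rating matrix as read from the exercise (rows = users, columns = movies).
# np.nan marks a missing rating (assumed meaning of the "-" entries).
R = np.array([
    [5, np.nan, 5, 4],
    [np.nan, 1, 1, 4],
    [4, 1, 2, 4],
    [3, 4, np.nan, 3],
    [1, 5, 3, np.nan],
])

n_users, n_movies = R.shape
known = ~np.isnan(R)
r_bar = R[known].mean()  # average of the 16 known ratings

# One equation per known rating: b_u + b_i = r_ui - r_bar.
# Unknowns are ordered as 5 user biases followed by 4 movie biases.
users, movies = np.nonzero(known)
A = np.zeros((users.size, n_users + n_movies))
A[np.arange(users.size), users] = 1.0
A[np.arange(users.size), n_users + movies] = 1.0
c = R[users, movies] - r_bar

# A is rank deficient, so the solution is not unique;
# lstsq returns the minimum-norm least squares solution.
b, *_ = np.linalg.lstsq(A, c, rcond=None)
b_user, b_movie = b[:n_users], b[n_users:]

# Baseline prediction for every (user, movie) pair, clipped to 1..5.
R_hat = np.clip(r_bar + b_user[:, None] + b_movie[None, :], 1.0, 5.0)
print(np.round(R_hat, 2))
```

np.linalg.lstsq plays the role of Matlab's pinv() here: both pick the minimum-norm solution among all minimizers, which is one valid choice when the hint says to take any of them.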
Exercise 2: Neighborhood predictor
Using the given 𝑹 and the 𝑹̂ computed in the previous exercise, compute the neighborhood predictor 𝑹̂ᴺ with 𝐿 = 2. Compute neighbors across the columns (movies).
(Hint: Note that a rating must lie between 1 and 5. Compute the similarity matrix, show the neighbors of each movie, and find 𝑹̂ᴺ. There is no need to submit the programming code.)
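One possible way to organize the three steps in the hint (similarity matrix, neighbors, prediction) is sketched below in Python. Several choices here are assumptions rather than part of the exercise statement: similarity is taken as the cosine of the baseline residuals over users who rated both movies, neighbors are the 𝐿 movies with the largest absolute similarity, only neighbors that the user actually rated contribute to a prediction, and the clipped baseline is used to form the residuals. Check each of these against the definitions given in class.

```python
import numpy as np

R = np.array([
    [5, np.nan, 5, 4],
    [np.nan, 1, 1, 4],
    [4, 1, 2, 4],
    [3, 4, np.nan, 3],
    [1, 5, 3, np.nan],
])
known = ~np.isnan(R)
n_users, n_movies = R.shape
L = 2  # number of neighbors per movie

# Baseline predictor (same least squares setup as in Exercise 1).
r_bar = R[known].mean()
users, movies = np.nonzero(known)
A = np.zeros((users.size, n_users + n_movies))
A[np.arange(users.size), users] = 1.0
A[np.arange(users.size), n_users + movies] = 1.0
b, *_ = np.linalg.lstsq(A, R[users, movies] - r_bar, rcond=None)
R_hat = np.clip(r_bar + b[:n_users, None] + b[None, n_users:], 1.0, 5.0)

# Baseline residuals on known entries; 0 where the rating is missing.
R_tilde = np.where(known, R - R_hat, 0.0)

# Cosine similarity between movie columns over users who rated both movies.
D = np.zeros((n_movies, n_movies))
for i in range(n_movies):
    for j in range(n_movies):
        if i == j:
            continue
        both = known[:, i] & known[:, j]
        num = R_tilde[both, i] @ R_tilde[both, j]
        den = np.linalg.norm(R_tilde[both, i]) * np.linalg.norm(R_tilde[both, j])
        D[i, j] = num / den if den > 0 else 0.0

# For each movie, keep the L other movies with the largest |similarity|.
neighbors = {}
for i in range(n_movies):
    strength = np.abs(D[i])
    strength[i] = -np.inf  # a movie is not its own neighbor
    neighbors[i] = [int(j) for j in np.argsort(-strength)[:L]]

# Neighborhood prediction: baseline plus a similarity-weighted residual average.
R_hat_N = R_hat.copy()
for u in range(n_users):
    for i in range(n_movies):
        # Assumption: only neighbors that user u actually rated contribute.
        used = [j for j in neighbors[i] if known[u, j]]
        weight = sum(abs(D[i, j]) for j in used)
        if weight > 0:
            R_hat_N[u, i] += sum(D[i, j] * R_tilde[u, j] for j in used) / weight
R_hat_N = np.clip(R_hat_N, 1.0, 5.0)

print(np.round(D, 2))
print(neighbors)
print(np.round(R_hat_N, 2))
```

If a user has rated none of a movie's neighbors, this sketch simply falls back to the baseline prediction for that entry.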
Exercise 3: Least squares
a) Solve for 𝒃 in the following least squares problem, by hand or using any programming language:
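\[
\min_{\mathbf{b}} \; \|\mathbf{A}\mathbf{b} - \mathbf{c}\|_2^2,
\]
where
\[
\mathbf{A} = \begin{bmatrix}
1 & 0 & 2 \\
1 & 1 & 0 \\
0 & 2 & 1 \\
2 & 1 & 1
\end{bmatrix}
\quad \text{and} \quad
\mathbf{c} = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 3 \end{bmatrix}.
\]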
(Hint: Take the derivative of ‖𝑨𝒃 − 𝒄‖₂² with respect to 𝒃.)
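Setting the derivative from the hint to zero gives 2𝑨ᵀ(𝑨𝒃 − 𝒄) = 0, that is, the normal equations 𝑨ᵀ𝑨𝒃 = 𝑨ᵀ𝒄. A minimal Python sketch that forms and solves this 3×3 system is shown below; it assumes 𝑨 and 𝒄 as read from the problem statement and that 𝑨ᵀ𝑨 is invertible, which holds when 𝑨 has full column rank.

```python
import numpy as np

# A and c as read from the exercise statement.
A = np.array([
    [1, 0, 2],
    [1, 1, 0],
    [0, 2, 1],
    [2, 1, 1],
], dtype=float)
c = np.array([2, 1, 1, 3], dtype=float)

# Zero gradient: 2 A^T (A b - c) = 0  =>  (A^T A) b = A^T c.
AtA = A.T @ A
Atc = A.T @ c
b = np.linalg.solve(AtA, Atc)

print("A^T A =\n", AtA)
print("A^T c =", Atc)
print("b =", b)
print("squared residual =", np.sum((A @ b - c) ** 2))
```

Printing 𝑨ᵀ𝑨 and 𝑨ᵀ𝒄 makes it easy to compare against a hand calculation before trusting the final 𝒃.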
b) Solve the above least squares problem again with regularization. Vary the regularization parameter 𝜆 over 𝜆 = 0, 0.2, 0.4, 0.6, 0.8, …, 4.8, 5.0, and plot both ‖𝑨𝒃 − 𝒄‖₂² and ‖𝒃‖₂² against 𝜆.
(Hint: Take the derivative of ‖𝑨𝒃 − 𝒄‖₂² + 𝜆‖𝒃‖₂² with respect to 𝒃 to obtain a system of linear equations. Note that 𝜆 ranges from 0 to 5.0 in steps of 0.2.)
(Hint: We briefly mentioned regularization in class; it is a method for preventing overfitting. You should be able to do part (b) by differentiating ‖𝒃‖₂² in the same way as ‖𝑨𝒃 − 𝒄‖₂².)
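Differentiating the regularized objective and setting the result to zero gives the linear system (𝑨ᵀ𝑨 + 𝜆𝑰)𝒃 = 𝑨ᵀ𝒄. The sweep over 𝜆 can then be sketched in Python as below. The helper plot_curves and the output file name are hypothetical names introduced for this illustration, and plotting assumes matplotlib is installed.

```python
import numpy as np

# A and c as read from the exercise statement.
A = np.array([
    [1, 0, 2],
    [1, 1, 0],
    [0, 2, 1],
    [2, 1, 1],
], dtype=float)
c = np.array([2, 1, 1, 3], dtype=float)

lambdas = np.linspace(0.0, 5.0, 26)  # 0, 0.2, 0.4, ..., 5.0
I = np.eye(A.shape[1])

residuals, norms = [], []
for lam in lambdas:
    # Zero gradient of ||Ab - c||^2 + lam * ||b||^2:
    # (A^T A + lam * I) b = A^T c.
    b = np.linalg.solve(A.T @ A + lam * I, A.T @ c)
    residuals.append(np.sum((A @ b - c) ** 2))
    norms.append(np.sum(b ** 2))
residuals = np.array(residuals)
norms = np.array(norms)


def plot_curves(path="regularization.png"):
    """Plot both quantities against lambda (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(lambdas, residuals, marker="o", label="||Ab - c||^2")
    ax.plot(lambdas, norms, marker="s", label="||b||^2")
    ax.set_xlabel("lambda")
    ax.legend()
    fig.savefig(path)
```

Calling plot_curves() writes the figure to disk. As a sanity check, the residual term should not decrease and ‖𝒃‖₂² should not increase as 𝜆 grows, since a larger 𝜆 penalizes large 𝒃 more heavily.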