Problem 1 (written) –
Imagine you have a sequence of N observations (x1,...,xN), where each xi ∈ {0,1,2,...,∞}. You model this sequence as i.i.d. from a Poisson distribution with unknown parameter λ ∈ R+, where
$p(x_i \mid \lambda) = \frac{\lambda^{x_i}}{x_i!}\, e^{-\lambda}$.
(a) What is the joint likelihood of the data (x1,...,xN)?
(b) Derive the maximum likelihood estimate λML for λ.
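A numerical check can help confirm a derived expression for λML. The sketch below is only an illustration: the observations `xs` and the search grid are made up, and the function name `poisson_log_likelihood` is hypothetical. It evaluates the Poisson log-likelihood on a grid of candidate rates and reports the maximizer, which you can compare with your closed-form answer evaluated on the same data.

```python
import math

def poisson_log_likelihood(lam, xs):
    """Log of the joint Poisson likelihood of i.i.d. counts xs at rate lam."""
    # sum_i [ x_i * log(lam) - lam - log(x_i!) ]; lgamma(x + 1) equals log(x!)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

# Hypothetical observations, made up for illustration only.
xs = [2, 0, 3, 1, 4, 2]

# Candidate rates 0.01, 0.02, ..., 10.00 (an arbitrary search range).
grid = [k / 100 for k in range(1, 1001)]

# Grid point with the highest log-likelihood.
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))
print("grid maximizer of the log-likelihood:", lam_grid)
```

Working with the log-likelihood rather than the likelihood itself avoids underflow for larger N and does not change the maximizer.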
To help learn λ, you use a prior distribution. You select the distribution p(λ) = gamma(a,b).
(c) Derive the maximum a posteriori (MAP) estimate λMAP for λ.
(d) Use Bayes rule to derive the posterior distribution of λ and identify the name of this distribution.
(e) What are the mean and variance of λ under this posterior? Discuss how they relate to λML and λMAP.
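Derived answers to (c) through (e) can be checked numerically without knowing the posterior in closed form. The sketch below makes an assumption the problem statement does not spell out: that gamma(a,b) is parameterized by shape a and rate b, so that p(λ) ∝ λ^(a−1) e^(−bλ). The data `xs`, the values of `a` and `b`, and the grid are all made up. It approximates the posterior mode, mean and variance by a Riemann sum over a fine grid.

```python
import math

def log_unnormalized_posterior(lam, xs, a, b):
    """log[ p(xs | lam) * p(lam) ] up to an additive constant."""
    log_lik = sum(x * math.log(lam) - lam for x in xs)   # drops the -log(x_i!) terms
    log_prior = (a - 1) * math.log(lam) - b * lam        # gamma(a, b), rate form (assumed)
    return log_lik + log_prior

# Hypothetical data and prior parameters, for illustration only.
xs = [2, 0, 3, 1, 4, 2]
a, b = 2.0, 1.0

# Fine grid over lambda; the upper limit 20 is an arbitrary cut-off far into the tail.
step = 0.001
grid = [step * k for k in range(1, 20001)]

log_post = [log_unnormalized_posterior(lam, xs, a, b) for lam in grid]
shift = max(log_post)                                    # subtract the max for numerical stability
weights = [math.exp(v - shift) for v in log_post]
total = sum(weights)

post_mode = grid[weights.index(max(weights))]
post_mean = sum(lam * w for lam, w in zip(grid, weights)) / total
post_var = sum((lam - post_mean) ** 2 * w for lam, w in zip(grid, weights)) / total
print(post_mode, post_mean, post_var)
```

If your prior uses a scale parameter instead of a rate, replace `b * lam` with `lam / b` in `log_prior`.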
Problem 2 (written) –
You have data (xi,yi) for i = 1,...,n, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. You model this as $y_i \overset{iid}{\sim} N(x_i^T w, \sigma^2)$. You use the data you have to approximate w with $w_{RR} = (\lambda I + X^T X)^{-1} X^T y$, where X and y are defined as in the lectures. Derive the results for E[wRR] and V[wRR] given in the slides.
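A simulation is one way to check derived expressions for E[wRR] and V[wRR]. The sketch below is an illustration under made-up settings (the sizes `n` and `d`, the noise level `sigma`, the penalty `lam`, the design matrix `X` and the weights `w_true` are all hypothetical). It holds X fixed, draws many response vectors from the model, computes wRR for each, and reports the empirical mean and covariance, which should match your formulas up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and parameters, chosen only for illustration.
n, d = 50, 3
sigma, lam = 0.5, 2.0
X = rng.normal(size=(n, d))            # fixed design matrix
w_true = np.array([1.0, -2.0, 0.5])    # weights used to simulate y

A = lam * np.eye(d) + X.T @ X          # lambda * I + X^T X

# Draw many independent response vectors y ~ N(X w, sigma^2 I), one per row of Y.
trials = 20000
Y = X @ w_true + sigma * rng.normal(size=(trials, n))

# w_RR = A^{-1} X^T y for every simulated y; row t of W is the estimate from trial t.
W = np.linalg.solve(A, X.T @ Y.T).T

empirical_mean = W.mean(axis=0)
empirical_cov = np.cov(W, rowvar=False)
print(empirical_mean)
print(empirical_cov)
```

Setting `lam = 0.0` gives the least squares case, which is a useful second check on the same formulas.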
Problem 3 (coding) –
In this problem you will analyze data using the linear regression techniques we have discussed. The goal of the problem is to predict the miles per gallon a car will get using six quantities (features) about that car. The zip file containing the data can be found on Courseworks.[1] The data is broken into training and testing sets. Each row in both “X” files contains six features for a single car (plus a 1 in the 7th dimension) and the same row in the corresponding “y” file contains the miles per gallon for that car.
Part 1. Using the training data only, write code to solve the ridge regression problem
$w_{RR} = \arg\min_w \|y - Xw\|^2 + \lambda \|w\|^2$.
(a) For λ = 0,1,2,3,...,5000, solve for wRR. (Notice that when λ = 0, wRR = wLS.) In one figure, plot the 7 values in wRR as a function of df(λ). You will need to call a built-in SVD function to do this, as discussed in the slides. Be sure to label your 7 curves by their dimension in x.[2]
(b) Two dimensions clearly stand out over the others. Which ones are they and what information can we get from this?
(c) For λ = 0,...,50, predict all 42 test cases. Plot the root mean squared error (RMSE)[3] on the test set as a function of λ—not as a function of df(λ). What does this figure tell you when choosing λ for this problem (and when choosing between ridge regression and least squares)?
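One possible way to organize the computations in Part 1 is sketched below. It assumes the usual definition df(λ) = Σ_j s_j² / (s_j² + λ), where the s_j are the singular values of X; check this against the slides. The function names `ridge_path_svd` and `rmse` are hypothetical, and the data are synthetic stand-ins rather than the homework files, so the numbers it produces say nothing about the actual dataset.

```python
import numpy as np

def ridge_path_svd(X, y, lambdas):
    """Return (W, df): W[k] is w_RR for lambdas[k], df[k] its degrees of freedom."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
    Uty = U.T @ y
    lambdas = np.asarray(lambdas, dtype=float)
    denom = s ** 2 + lambdas[:, None]                  # shape (K, r): s_j^2 + lambda_k
    # w_RR = V diag(s_j / (s_j^2 + lambda)) U^T y, computed for all lambdas at once.
    W = ((s / denom) * Uty) @ Vt
    df = (s ** 2 / denom).sum(axis=1)
    return W, df

def rmse(y_true, y_pred):
    """Root mean squared error between two vectors of equal length."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (NOT the homework files): 6 random features plus a column of ones.
rng = np.random.default_rng(0)

def make_data(n):
    X = np.hstack([rng.normal(size=(n, 6)), np.ones((n, 1))])
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5, 23.0]) + rng.normal(size=n)
    return X, y

X_train, y_train = make_data(350)
X_test, y_test = make_data(42)

lambdas = np.arange(0, 51)
W, df = ridge_path_svd(X_train, y_train, lambdas)
test_rmse = [rmse(y_test, X_test @ w) for w in W]
print(df[0], df[-1], test_rmse[0])
```

Computing the SVD once and reusing it for every λ avoids solving a separate linear system for each of the 5001 values in part (a). Column j of `W` plotted against `df` gives the curve for dimension j + 1.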
Part 2. Modify your code to learn a pth-order polynomial regression model for p = 1,2,3. (You’ve already done p = 1 above.) For this implementation use the method discussed in the slides. Also, be sure to standardize each additional dimension of your data.
(d) In one figure, plot the test RMSE as a function of λ = 0,...,100 for p = 1,2,3. Based on this plot, which value of p should you choose and why? How does your assessment of the ideal value of λ change for this problem?
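The feature expansion for Part 2 might look like the sketch below. The method from the slides is not reproduced here, so this is an assumption: it appends element-wise powers 2,...,p of the six original features (no cross terms) and standardizes each added column using the training mean and standard deviation, applying the same training statistics to the test set. The function name `poly_expand` and the random data are hypothetical.

```python
import numpy as np

def poly_expand(X_train, X_test, p, n_features=6):
    """Append standardized element-wise powers 2..p of the first n_features columns."""
    train_blocks, test_blocks = [X_train], [X_test]
    for power in range(2, p + 1):
        tr = X_train[:, :n_features] ** power
        te = X_test[:, :n_features] ** power
        # Standardize with statistics from the training data only.
        mu, sd = tr.mean(axis=0), tr.std(axis=0)
        train_blocks.append((tr - mu) / sd)
        test_blocks.append((te - mu) / sd)
    return np.hstack(train_blocks), np.hstack(test_blocks)

# Synthetic stand-in data (NOT the homework files): 6 features plus a column of ones.
rng = np.random.default_rng(0)
X_train = np.hstack([rng.normal(size=(100, 6)), np.ones((100, 1))])
X_test = np.hstack([rng.normal(size=(42, 6)), np.ones((42, 1))])

Xp_train, Xp_test = poly_expand(X_train, X_test, p=3)
print(Xp_train.shape, Xp_test.shape)
```

The column of ones is left out of the expansion because its powers are again all ones and would have zero standard deviation.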
[1] See https://archive.ics.uci.edu/ml/datasets/Auto+MPG for more details on this dataset. Since I have done some preprocessing, you must use the data provided with this homework.
[2] The dimensions correspond to: 1. cylinders, 2. displacement, 3. horsepower, 4. weight, 5. acceleration, 6. year made
[3] RMSE = $\sqrt{\frac{1}{42} \sum_{i=1}^{42} \left(y_i^{test} - y_i^{pred}\right)^2}$.