CS760 Assignment 5: Linear Regression

The Wisconsin State Climatology Office keeps a record of the number of days Lake Mendota was covered by ice at http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html, and likewise for Lake Monona: http://www.aos.wisc.edu/~sco/lakes/Monona-ice.html. As with any real problem, the data is not as clean or as organized as one would like for machine learning. Curate two clean data sets, one for each lake, starting from 1855-56 and ending in 2018-19. Let x be the year: for 1855-56, x = 1855; for 2017-18, x = 2017; and so on. Let y be the ice days in that year: for Mendota and 1855-56, y = 118; for 2017-18, y = 94; and so on. Some years have multiple freeze-thaw cycles, such as 2001-02; for that year, x = 2001, y = 21.

1.      Plot year vs. ice days for the two lakes as two curves in the same plot. Produce another plot for year vs. yMonona − yMendota.
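
A minimal plotting sketch, assuming the curated data has been saved as two hypothetical CSV files (mendota.csv and monona.csv) with columns year and ice_days:

```python
# Sketch only: file and column names (mendota.csv, monona.csv, year, ice_days)
# are assumptions, not part of the assignment.
import pandas as pd
import matplotlib.pyplot as plt

mendota = pd.read_csv("mendota.csv")   # columns: year, ice_days
monona = pd.read_csv("monona.csv")

# Plot 1: year vs. ice days, both lakes on the same axes.
plt.figure()
plt.plot(mendota["year"], mendota["ice_days"], label="Mendota")
plt.plot(monona["year"], monona["ice_days"], label="Monona")
plt.xlabel("year")
plt.ylabel("ice days")
plt.legend()

# Plot 2: year vs. (Monona - Mendota) ice days.
diff = monona["ice_days"].values - mendota["ice_days"].values
plt.figure()
plt.plot(mendota["year"], diff)
plt.xlabel("year")
plt.ylabel("Monona - Mendota ice days")
plt.show()
```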

2.      Split the datasets: x ≤ 1970 as training, and x > 1970 as test. (Comment: due to the temporal nature this is NOT an iid split. But we will work with it.) On the training set, compute the sample mean and the sample standard deviation for the two lakes, respectively.
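
A sketch of the split and the training-set statistics, continuing from the hypothetical DataFrames above and assuming the two frames are row-aligned by year:

```python
# Years <= 1970 for training, years > 1970 for testing.
train_mask = mendota["year"] <= 1970
mendota_train, mendota_test = mendota[train_mask], mendota[~train_mask]
monona_train, monona_test = monona[train_mask], monona[~train_mask]

for name, df in [("Mendota", mendota_train), ("Monona", monona_train)]:
    # pandas .std() defaults to ddof=1, i.e. the sample standard deviation.
    print(name, df["ice_days"].mean(), df["ice_days"].std())
```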

3.      Using training sets, train a linear regression model

ŷMendota = β0 + β1 x + β2 yMonona

to predict yMendota. Note: we are treating yMonona as an observed feature. Do this by finding the closed-form MLE solution for β = (β0, β1, β2) (no regularization):

β̂_MLE = argmin_β  Σ_{i=1}^{n} ( β0 + β1 x^(i) + β2 y_Monona^(i) − y_Mendota^(i) )².

Give the MLE formula in matrix form (define your matrices), then give the MLE value of β0,β1,β2.
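
A sketch of the computation, assuming the design matrix X has columns [1, year, Monona ice days] and the target y is the Mendota ice days; under the Gaussian noise model the MLE coincides with the least-squares solution β̂ = (XᵀX)⁻¹Xᵀy:

```python
import numpy as np

# Design matrix: intercept column, year, Monona ice days (training set).
X = np.column_stack([
    np.ones(len(mendota_train)),
    mendota_train["year"].values,
    monona_train["ice_days"].values,
])
y = mendota_train["ice_days"].values

# Closed-form least-squares / MLE solution; lstsq is numerically more stable
# than explicitly inverting X^T X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1, beta2 =", beta)
```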

4.      Using the MLE above, give the (1) mean squared error and (2) R2 values on the Mendota test set. (You will need to use the Monona test data as observed features.)
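
A sketch of the two test-set metrics, reusing the hypothetical test DataFrames and the β̂ above; R² is computed as 1 − SS_res/SS_tot with the test-set mean:

```python
X_test = np.column_stack([
    np.ones(len(mendota_test)),
    mendota_test["year"].values,
    monona_test["ice_days"].values,   # Monona test data used as observed features
])
y_test = mendota_test["ice_days"].values

pred = X_test @ beta
mse = np.mean((pred - y_test) ** 2)
r2 = 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print("test MSE:", mse, "test R^2:", r2)
```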

5.      “Reset” to Q3, but this time use gradient descent to learn the β’s. Recall our objective function is the mean squared error on the training set:

f(β) = (1/n) Σ_{i=1}^{n} ( β0 + β1 x^(i) + β2 y_Monona^(i) − y_Mendota^(i) )².

Derive the gradient.
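
For reference, a sketch of the gradient in matrix form, assuming X is the n × 3 design matrix and y the Mendota ice-day vector from Q3, and writing e^(i) = β0 + β1 x^(i) + β2 y_Monona^(i) − y_Mendota^(i):

```latex
\nabla f(\beta) = \frac{2}{n} X^{\top}(X\beta - y),
\qquad\text{componentwise}\qquad
\frac{\partial f}{\partial \beta_0} = \frac{2}{n}\sum_{i=1}^{n} e^{(i)},\quad
\frac{\partial f}{\partial \beta_1} = \frac{2}{n}\sum_{i=1}^{n} e^{(i)} x^{(i)},\quad
\frac{\partial f}{\partial \beta_2} = \frac{2}{n}\sum_{i=1}^{n} e^{(i)} y^{(i)}_{\mathrm{Monona}}.
```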

6.      Implement gradient descent. Initialize β0 = β1 = β2 = 0. Use a fixed step-size parameter η = 0.1 and print the objective function value for the first 10 iterations. Tell us if further iterations make your gradient descent converge, and if yes, when; compare the β's to the closed-form solution. Try other η values and tell us what happens. Hint: update β0, β1, β2 simultaneously in an iteration. Don't use a new β0 to calculate β1, and so on.
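
A minimal gradient-descent sketch using the same X, y, and gradient as above, with a simultaneous update of all three coefficients:

```python
def gradient_descent(X, y, eta=0.1, iters=10):
    beta = np.zeros(X.shape[1])          # beta0 = beta1 = beta2 = 0
    n = len(y)
    for t in range(iters):
        resid = X @ beta - y
        print(t, np.mean(resid ** 2))    # training MSE at this iterate
        grad = (2.0 / n) * (X.T @ resid)
        beta = beta - eta * grad         # all betas updated simultaneously
    return beta

beta_gd = gradient_descent(X, y, eta=0.1, iters=10)
```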

7.      As preprocessing, normalize your year and Monona features (but not yMendota). Then repeat Q6.
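
A sketch of one common normalization (standardizing each feature with the training-set mean and standard deviation; the exact scheme is a choice, not prescribed by the assignment), then rebuilding X and rerunning gradient descent:

```python
def standardize(col, mean, std):
    return (col - mean) / std

# Training-set statistics only.
year_mean, year_std = mendota_train["year"].mean(), mendota_train["year"].std()
mon_mean, mon_std = monona_train["ice_days"].mean(), monona_train["ice_days"].std()

X_norm = np.column_stack([
    np.ones(len(mendota_train)),
    standardize(mendota_train["year"].values, year_mean, year_std),
    standardize(monona_train["ice_days"].values, mon_mean, mon_std),
])
beta_norm = gradient_descent(X_norm, y, eta=0.1, iters=10)
```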

8.      “Reset” to Q3 (no normalization, use closed-form solution), but train a linear regression model without using Monona:

ŷMendota = γ0 + γ1 x.

(a)    Interpret the sign of γ1.

(b)   Some analysts claim that because β1 in the closed-form solution in Q3 is positive, fixing all other factors, as the years go by the number of Mendota ice days will increase, i.e. the model in Q3 indicates a cooling trend. Discuss this viewpoint and relate it to question 8(a).

9.      Of course, Weka has linear regression. Reset to Q3. Save the training data in .arff format for Weka. Use classifiers / functions / LinearRegression. Choose “Use training set.” Bring up Linear Regression options, set “ridge” to 0 so it does not regularize. Run it and tell us the model: it is in the output in the form of “β1 * year + β2 * Monona + β0.”
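
A sketch of one way to write the training data as a minimal ARFF file (file and relation names are hypothetical); in the Weka Explorer the last attribute is treated as the class by default, so mendota is placed last:

```python
with open("mendota_train.arff", "w") as f:
    f.write("@relation lake_ice\n")
    f.write("@attribute year numeric\n")
    f.write("@attribute monona numeric\n")
    f.write("@attribute mendota numeric\n")   # last attribute = target by default
    f.write("@data\n")
    for yr, mon, men in zip(mendota_train["year"],
                            monona_train["ice_days"],
                            mendota_train["ice_days"]):
        f.write(f"{yr},{mon},{men}\n")
```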

10.   Ridge regression.

(a)    Then set ridge to 1 and tell us the resulting Weka model.

(b)   Meanwhile, derive the closed-form solution in matrix form for the ridge regression problem:

β̂_ridge = argmin_β  (1/n) Σ_{i=1}^{n} ( β0 + β1 x^(i) + β2 y_Monona^(i) − y_Mendota^(i) )²  +  λ‖β‖²_A

where

‖β‖²_A := βᵀ A β

and

        0  0  0
A  =    0  1  0 .
        0  0  1

This A matrix has the effect of NOT regularizing the bias β0, which is standard practice in ridge regression. Note: Derive the closed-form solution, do not blindly copy lecture notes.

(c)    Let λ = 1 and tell us the value of β from your ridge regression model.
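
A sketch of the closed-form ridge solution under the objective above; with the 1/n factor on the squared-error term, setting the gradient to zero gives β̂ = (XᵀX + nλA)⁻¹ Xᵀy (drop the n if your objective uses a plain sum instead of a mean):

```python
lam = 1.0
n = len(y)
A = np.diag([0.0, 1.0, 1.0])   # A does not regularize the bias beta0
beta_ridge = np.linalg.solve(X.T @ X + n * lam * A, X.T @ y)
print("ridge beta:", beta_ridge)
```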

Extra Credit: Multinomial Naïve Bayes [10 pts]
Consider the Multinomial Naïve Bayes model. For each point (x, y), y ∈ {0,1}, x = (x1, x2, ..., xM), where each xj is an integer from {1, 2, ..., K} for 1 ≤ j ≤ M. Here K and M are two fixed integers. Suppose we have N data points {(x^(i), y^(i)) : 1 ≤ i ≤ N}, generated as follows.

for i ∈ {1, ..., N}:
    y^(i) ∼ Bernoulli(φ)
    for j ∈ {1, ..., M}:
        x_j^(i) ∼ Multinomial(θ_{y^(i)}, 1)

Here φ ∈ R and θ_k ∈ R^K (k ∈ {0,1}) are parameters. Note that Σ_l θ_{k,l} = 1, since these are the parameters of a multinomial distribution.

Derive the formula for estimating the parameters φ and θ_k, as we have done in the lecture for the Bernoulli Naïve Bayes model. Show the steps.
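
For orientation only, a sketch of the form the maximum-likelihood estimates take (the question asks you to derive these from the log-likelihood, e.g. with a Lagrange multiplier for the constraint Σ_l θ_{k,l} = 1):

```latex
\hat{\phi} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\{y^{(i)} = 1\},
\qquad
\hat{\theta}_{k,l} =
\frac{\sum_{i=1}^{N}\sum_{j=1}^{M} \mathbf{1}\{x_j^{(i)} = l,\; y^{(i)} = k\}}
     {M \sum_{i=1}^{N} \mathbf{1}\{y^{(i)} = k\}}.
```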

Extra Credit: Logistic Regression [10 pts]
(1)  Suppose for each class i ∈ {1, ..., K}, the class-conditional density p(x|y = i) is normal with mean µ_i ∈ R^d and the same covariance Σ ∈ R^(d×d):

p(x|y = i) = N(x | µ_i, Σ).

Compute p(y = i|x). Can it be represented as a softmax over a linear transformation of x? Show the calculation steps.
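
A sketch of the first step via Bayes' rule, writing π_i := p(y = i) for the class priors; with a shared Σ, the term −½xᵀΣ⁻¹x and the Gaussian normalizing constants are common to all classes and cancel between numerator and denominator:

```latex
p(y = i \mid x)
= \frac{\pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma)}
= \frac{\exp\!\big(\mu_i^{\top}\Sigma^{-1}x - \tfrac{1}{2}\mu_i^{\top}\Sigma^{-1}\mu_i + \log \pi_i\big)}
       {\sum_{j=1}^{K} \exp\!\big(\mu_j^{\top}\Sigma^{-1}x - \tfrac{1}{2}\mu_j^{\top}\Sigma^{-1}\mu_j + \log \pi_j\big)}.
```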




(2)  Suppose p(x|y = i) has different covariances Σ_i ∈ R^(d×d):

p(x|y = i) = N(x | µ_i, Σ_i).

Again, compute p(y = i|x). Can it be represented as a softmax over a linear transformation of x? Show the calculation steps.
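
For comparison, a sketch of the corresponding log-numerator when the covariances differ; note that the class-dependent quadratic term −½xᵀΣ_i⁻¹x now remains:

```latex
\log\big(\pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)\big)
= -\tfrac{1}{2} x^{\top}\Sigma_i^{-1}x + \mu_i^{\top}\Sigma_i^{-1}x
  - \tfrac{1}{2}\mu_i^{\top}\Sigma_i^{-1}\mu_i
  - \tfrac{1}{2}\log\lvert\Sigma_i\rvert + \log \pi_i + \text{const}.
```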
