MTH9899 Homework 1

Problem 1 We reviewed the closed-form solution for Linear Regression/OLS in class and showed that $X^\top e = 0$. Explain why the following properties can be derived from this:

i. Observed values of your features $X$ are uncorrelated with the residuals $e$.

ii. The sum of the residuals is zero, i.e. $\sum_i e_i = 0$ (assume that the regression includes a constant term, i.e. an intercept).

iii. The sample mean of the residuals is zero.

iv. $\bar{y} = \bar{x}^\top \hat{\beta}$, where $\bar{y}$ and $\bar{x}$ are the sample means of the target and the features.

v. The predicted values $\hat{y}$ are uncorrelated with the residuals $e$.
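These properties all follow from $X^\top e = 0$ when $X$ contains an intercept column. As a quick illustration (my own sketch, not part of the problem), they can be verified numerically:

```python
import numpy as np

# Illustrative check (not part of the problem) that X^T e = 0 implies
# the properties above when X contains an intercept column.
rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # intercept + 2 features
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=N)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS
y_hat = X @ beta
e = y - y_hat

assert np.allclose(X.T @ e, 0)                       # X^T e = 0
assert np.isclose(e.sum(), 0)                        # residuals sum to zero
assert np.isclose(y.mean(), X.mean(axis=0) @ beta)   # y_bar = x_bar^T beta_hat
assert np.isclose(y_hat @ e, 0)                      # y_hat uncorrelated with e
```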

Problem 2

i. Ignoring more sophisticated algorithms, like the Strassen algorithm, multiplying an $a \times b$ matrix by a $b \times c$ matrix takes O(abc) time. Work out the time complexity of computing a naive K-fold cross-validation Ridge Regression on an $N \times F$ input matrix (a sketch of the naive procedure follows below).
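For concreteness, here is what the naive procedure looks like (a sketch with names of my own choosing; nothing here is prescribed by the assignment). Each fold re-forms and re-solves the normal equations from scratch, which is what the complexity accounting should capture:

```python
import numpy as np

def cv_betas_naive(X, y, lam, K):
    # Naive K-fold CV ridge: rebuild and solve the normal equations from
    # scratch for every fold. Per fold: O(N F^2) to form the Gram matrix
    # plus O(F^3) for the solve.
    N, F = X.shape
    folds = np.array_split(np.arange(N), K)
    betas = []
    for idx in folds:
        mask = np.ones(N, dtype=bool)
        mask[idx] = False                      # drop fold i
        X_tr, y_tr = X[mask], y[mask]
        betas.append(np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(F),
                                     X_tr.T @ y_tr))
    return betas
```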

ii. We can be more efficient: we do not have to recompute $(X^\top X)^{-1}$ from scratch each time. In particular, if you break $X$ up into K chunks, there is a faster way.

 

•    Define $X_{-i}$ as $X$ with the $i$th fold omitted. Given this hint, write a description of how you can efficiently compute $\hat{\beta}_{-i} = (X_{-i}^\top X_{-i} + \lambda I)^{-1} X_{-i}^\top y_{-i}$ for all K folds (a sketch follows below).
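The key observation is that the full Gram matrix decomposes over the folds, $X^\top X = \sum_k X_k^\top X_k$, so each fold's contribution can be subtracted rather than recomputed. A minimal sketch of this idea (the helper name is my own):

```python
import numpy as np

def cv_betas_efficient(X, y, lam, K):
    # Form X^T X and X^T y once (O(N F^2) total), then subtract each
    # fold's contribution: O(N_i F^2 + F^3) per fold instead of O(N F^2 + F^3).
    N, F = X.shape
    folds = np.array_split(np.arange(N), K)
    G = X.T @ X                        # full Gram matrix, computed once
    b = X.T @ y                        # full X^T y, computed once
    betas = []
    for idx in folds:
        Xi, yi = X[idx], y[idx]
        G_i = G - Xi.T @ Xi            # X_{-i}^T X_{-i}
        b_i = b - Xi.T @ yi            # X_{-i}^T y_{-i}
        betas.append(np.linalg.solve(G_i + lam * np.eye(F), b_i))
    return betas
```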

iii. Assume you have a sample of data with N samples and K features, i.e. $X$ is of size $N \times K$, and a target variable $Y$, a vector of size N. You wish to estimate a Ridge regression model with $\lambda = \hat{\lambda}$ to find $\hat{\beta}$, a vector of size K, such that $Y \approx X\hat{\beta}$, but you are only allowed to use the closed-form solution to OLS, i.e. $\hat{\beta} = (X^\top X)^{-1} X^\top y$. You are not allowed to modify the existing data in $X$, but you are allowed to append to it. Propose what data you would append to $X$ and $Y$ so that the OLS solution on the modified $X$ and $Y$ is equivalent to the Ridge solution, and justify why this works.
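As a sanity check on whichever augmentation you propose, the standard construction (append $\sqrt{\lambda}\, I_K$ as K extra rows of $X$ and K zeros to $Y$) can be verified numerically; this sketch and its variable names are my own, not part of the assignment:

```python
import numpy as np

# Verify that OLS on the augmented data reproduces the ridge solution.
rng = np.random.default_rng(0)
N, K, lam = 100, 3, 2.5
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

X_aug = np.vstack([X, np.sqrt(lam) * np.eye(K)])   # K extra rows: sqrt(lam) * I
y_aug = np.concatenate([y, np.zeros(K)])           # K extra zero targets

beta_ols_aug = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_aug)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
assert np.allclose(beta_ols_aug, beta_ridge)       # X_aug^T X_aug = X^T X + lam*I
```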

Problem 3 We discussed the tradeoff between bias and variance. In this question, we will run some simulations to see how λ can affect the variance of β. Attached, you will find Python code to generate a number of different datasets for testing; use the data-generating code exactly as provided. Write Python methods to implement the closed-form solutions for Linear Regression and Ridge Regression, as discussed in class, for use below.
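A minimal sketch of the two closed-form estimators (the function names are my own; the assignment does not prescribe them):

```python
import numpy as np

def ols_beta(X, y):
    # Closed-form OLS: beta = (X^T X)^{-1} X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_beta(X, y, lam):
    # Closed-form ridge: beta = (X^T X + lambda I)^{-1} X^T y
    F = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(F), X.T @ y)
```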

i. For datasets 1-2 in the code, generate each dataset 1000 times (by calling e.g. get_dataset(1) or get_dataset(2) each time). For each of these 1000 draws, perform simple OLS regression and record the β values. Plot a histogram of the β values and report $\mu_\beta$ and $\sigma^2_\beta$ (see the sketch below). Note that the true betas for datasets 1 and 2 are identical. Explain the differences between the histograms.
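One way to organize the loop, assuming the provided generator is called get_dataset(i) and returns an (X, y) pair (adapt if its actual signature differs), and using the ols_beta helper sketched above:

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_ols(dataset_id, n_trials=1000):
    # Draw the dataset n_trials times, fit OLS each time, and collect betas.
    # get_dataset is assumed to be the generator provided with the assignment.
    betas = np.array([ols_beta(*get_dataset(dataset_id))
                      for _ in range(n_trials)])
    print(f"dataset {dataset_id}: mu_beta={betas.mean(axis=0)}, "
          f"var_beta={betas.var(axis=0)}")
    plt.hist(betas[:, 0], bins=50)                 # histogram of the first beta
    plt.title(f"OLS beta[0], dataset {dataset_id}")
    plt.show()
    return betas
```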

ii. Repeat what you did in (i) above for dataset 3. How does the distribution of the betas change, and how do you explain this?

iii. Repeat the above trials with Ridge Regression instead, using reasonable λ values. Prepare a graph of how $\mu_\beta$ and $\sigma^2_\beta$ change as a function of λ for each of the datasets (see the sketch below); you do NOT need to include histograms of all of your distributions. Explain the outputs.
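A sketch of the λ sweep, built on the hypothetical ridge_beta and get_dataset names used above:

```python
import numpy as np

def sweep_lambdas(dataset_id, lams, n_trials=1000):
    # For each lambda, re-run the simulation and record the mean and
    # variance of the fitted betas across trials.
    stats = []
    for lam in lams:
        betas = np.array([ridge_beta(*get_dataset(dataset_id), lam)
                          for _ in range(n_trials)])
        stats.append((lam, betas.mean(axis=0), betas.var(axis=0)))
    return stats
```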

iv. For the trials from (iii), calculate and report the effective degrees of freedom (for each dataset and λ value) to make sure that the λ values you are using are reasonable. You should see effective DOFs from 2 down to less than 1 (see ESLII, equation 3.50, for the effective DOF calculation).
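ESLII eq. 3.50 gives the effective degrees of freedom of ridge as $\mathrm{df}(\lambda) = \sum_j d_j^2 / (d_j^2 + \lambda)$, where the $d_j$ are the singular values of $X$. A minimal sketch:

```python
import numpy as np

def effective_dof(X, lam):
    # ESLII eq. 3.50: df(lambda) = sum_j d_j^2 / (d_j^2 + lambda),
    # where d_j are the singular values of X.
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + lam))
```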
