STA414/2104: Cross Validation and Ridge Regression (Solved)

2. Regression. In this question, you will derive certain properties of linear regression.

2.1. Linear regression. Suppose that $X \in \mathbb{R}^{n \times m}$ with $n \geq m$ and $t \in \mathbb{R}^n$, and that $t \mid (X, w) \sim \mathcal{N}(Xw, \sigma^2 I)$. We know that the maximum likelihood estimate $\hat{w}$ of $w$ is given by

$$\hat{w} = (X^\top X)^{-1} X^\top t. \tag{2.1}$$

(a) Write the log-likelihood implied by the model above, and compute its gradient w.r.t. $w$. By setting the gradient equal to $0$, derive the above estimator $\hat{w}$.

(b) Find the distribution of $\hat{w}$, including its expectation and covariance matrix.
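For reference, a sketch of the standard derivation (assuming $X$ has full column rank, so that $X^\top X$ is invertible):

$$\ell(w) = \log p(t \mid X, w) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\lVert t - Xw\rVert^2, \qquad \nabla_w \ell(w) = \frac{1}{\sigma^2} X^\top (t - Xw).$$

Setting the gradient to zero gives the normal equations $X^\top X\, w = X^\top t$, hence $\hat{w} = (X^\top X)^{-1} X^\top t$. Since $\hat{w}$ is a linear transformation of the Gaussian vector $t$, it is itself Gaussian, with $\mathbb{E}[\hat{w}] = (X^\top X)^{-1} X^\top X w = w$ and $\mathrm{Cov}(\hat{w}) = \sigma^2 (X^\top X)^{-1}$, i.e. $\hat{w} \sim \mathcal{N}\big(w, \sigma^2 (X^\top X)^{-1}\big)$.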

3. Cross validation. In this problem, you will write a function that performs the K-fold cross validation (CV) procedure to tune the penalty parameter $\lambda$ in ridge regression. The CV procedure is one of the most commonly used methods for tuning hyperparameters. In this question, you shouldn't use the package scikit-learn to perform CV. You should implement all of the functions below yourself. You may use numpy and scipy for basic math operations such as linear algebra, sampling, etc.

In class we learned about training, test, and validation procedures, which assume that you have enough data to set aside a validation set and a test set for assessing the performance of your machine learning algorithm. In practice, however, this may be problematic since we may not have enough data. A remedy to this issue is K-fold cross-validation, which uses one part of the available data to fit the model and a different part to test it. The K-fold CV procedure splits the data into K equal-sized parts; for example, when K = 5, the scenario looks like this:

 

Fig 1: The K-fold split of the training data (K = 5), with one block used for validation and the remaining blocks for training. Credit: Elements of Statistical Learning.

1. We first set aside a test dataset and never use it until the training and parameter tuning procedures are complete. We will use this data for final evaluation. In this question, the test data is provided to you as a separate dataset.

2. The CV error estimates the test error of a particular hyperparameter choice. For a particular hyperparameter value, we split the training data into K blocks (see the figure), and for k = 1, 2, ..., K we use the k-th block for validation and the remaining K − 1 blocks for training. Therefore, we train and validate our algorithm K times. Our CV estimate of the test error for that particular hyperparameter choice is the average validation error across these K blocks (written out as a formula after this list).

3. We repeat the above procedure for several hyperparameter choices and choose the one that gives us the smallest CV error (which is an estimate of the test error).
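In symbols, writing $\mathcal{V}_k$ for the index set of the $k$-th block and $\hat{w}^{(-k)}_\lambda$ for the ridge coefficients trained with penalty $\lambda$ on all blocks except the $k$-th (this notation is not part of the assignment, just a compact restatement of the procedure above):

$$\mathrm{CV}(\lambda) \;=\; \frac{1}{K}\sum_{k=1}^{K} \frac{1}{\lvert\mathcal{V}_k\rvert}\sum_{i \in \mathcal{V}_k}\big(t_i - x_i^\top \hat{w}^{(-k)}_\lambda\big)^2,$$

and the selected hyperparameter is the $\lambda$ minimizing $\mathrm{CV}(\lambda)$.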

Below, we will code the above procedure for tuning the regularization parameter $\lambda$ of ridge regression, which is a hyperparameter. Your cross_validation function will rely on 6 short functions which are defined below along with their variables; a sketch implementation in Python is given after the pseudocode below.

•    data is a variable and refers to a (t, X) pair (can be test, training, or validation), where t is the target (response) vector and X is the feature matrix.

•    model is a variable and refers to the coefficients of the trained model, i.e. $\hat{w}$.

•    data_shf = shuffle_data(data) is a function that takes data as an argument and returns a version of it with the samples randomly permuted. Here, we are considering a uniformly random permutation of the training data. Note that t and X need to be permuted in the same way, preserving the target-feature pairs.

•    data_fold, data_rest = split_data(data, num_folds, fold) is a function that takes data, the number of partitions as num_folds, and the selected partition fold as its arguments, and returns the selected partition (block) fold as data_fold and the remaining data as data_rest. If we consider 5-fold cross validation, num_folds = 5, and your function splits the data into 5 blocks and returns the block fold ($\in \{1, 2, 3, 4, 5\}$) as the validation fold and the remaining 4 blocks as data_rest. Note that data_rest $\cup$ data_fold = data, and data_rest $\cap$ data_fold = $\emptyset$.

•    model = train_model(data, lambd) is a function that takes data and lambd as its arguments, and returns the coefficients of ridge regression with penalty level $\lambda$. For simplicity, you may ignore the intercept and use the closed-form expression analogous to equation (2.1), with the ridge penalty added.

•    predictions = predict(data, model) is a function that takes data and model as its arguments, and returns the predictions based on data and model.

•    error = loss(data, model) is a function which takes data and model as its arguments and returns the average squared error loss based on model. This means that if data is composed of $t \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, and model is $\hat{w}$, then the return value is $\lVert t - X\hat{w}\rVert^2 / n$.

•    cv_error = cross_validation(data, num_folds, lambd_seq) is a function that takes the training data, the number of folds num_folds, and a sequence of $\lambda$'s as lambd_seq as its arguments, and returns the cross validation error across all $\lambda$'s. Take lambd_seq to be 50 evenly spaced numbers over the interval (0.02, 1.5). This means cv_error will be a vector of 50 errors corresponding to the values of lambd_seq. Your function will look like:

data = shuffle_data(data)
for i = 1, 2, ..., length(lambd_seq)
    lambd = lambd_seq(i)
    cv_loss_lmd = 0.
    for fold = 1, 2, ..., num_folds
        val_cv, train_cv = split_data(data, num_folds, fold)
        model = train_model(train_cv, lambd)
        cv_loss_lmd += loss(val_cv, model)
    cv_error(i) = cv_loss_lmd / num_folds
return cv_error
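The following is one possible Python sketch of these six functions (a minimal version, not the only correct implementation). It assumes each data pair is stored as a dictionary with keys 'X' and 't', as in the loading code further below, and uses the intercept-free ridge closed form $\hat{w}_\lambda = (X^\top X + \lambda I)^{-1} X^\top t$; the exact scaling convention for $\lambda$ (e.g. $\lambda$ vs. $n\lambda$) is a modelling choice.

import numpy as np

def shuffle_data(data):
    # Apply the same random permutation to the rows of X and to t.
    n = data['X'].shape[0]
    perm = np.random.permutation(n)
    return {'X': data['X'][perm], 't': data['t'][perm]}

def split_data(data, num_folds, fold):
    # Return the fold-th block (1-indexed) as data_fold and the remaining blocks as data_rest.
    X_blocks = np.array_split(data['X'], num_folds)
    t_blocks = np.array_split(data['t'], num_folds)
    data_fold = {'X': X_blocks[fold - 1], 't': t_blocks[fold - 1]}
    data_rest = {'X': np.concatenate([b for i, b in enumerate(X_blocks) if i != fold - 1]),
                 't': np.concatenate([b for i, b in enumerate(t_blocks) if i != fold - 1])}
    return data_fold, data_rest

def train_model(data, lambd):
    # Ridge coefficients without intercept: (X^T X + lambda I)^{-1} X^T t.
    X, t = data['X'], data['t']
    return np.linalg.solve(X.T @ X + lambd * np.eye(X.shape[1]), X.T @ t)

def predict(data, model):
    # Predictions are X @ w_hat.
    return data['X'] @ model

def loss(data, model):
    # Average squared error ||t - X w_hat||^2 / n.
    residual = data['t'] - predict(data, model)
    return np.mean(residual ** 2)

def cross_validation(data, num_folds, lambd_seq):
    # CV error for each lambda: average validation loss over the folds.
    data = shuffle_data(data)
    cv_error = np.zeros(len(lambd_seq))
    for i, lambd in enumerate(lambd_seq):
        cv_loss_lmd = 0.0
        for fold in range(1, num_folds + 1):
            val_cv, train_cv = split_data(data, num_folds, fold)
            model = train_model(train_cv, lambd)
            cv_loss_lmd += loss(val_cv, model)
        cv_error[i] = cv_loss_lmd / num_folds
    return cv_error

For example, with data_train loaded as shown below, cross_validation(data_train, 5, np.linspace(0.02, 1.5, 50)) returns the 5-fold CV error for each of the 50 penalty values.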

Download the dataset hw1_data.zip from the course webpage and extract it into your working directory, or note its location file_path. For example, file_path could be /Users/yourname/Desktop/

• In Python:

import numpy as np

data_train = {'X': np.genfromtxt('data_train_X.csv', delimiter=','),
              't': np.genfromtxt('data_train_y.csv', delimiter=',')}
data_test = {'X': np.genfromtxt('data_test_X.csv', delimiter=','),
             't': np.genfromtxt('data_test_y.csv', delimiter=',')}

Here, the design matrix X is loaded as data_??['X'], and the target vector t is loaded as data_??['t'], where ?? is either train or test.

(a)    Write the above 6 functions, and identify the correct order and arguments to do cross validation.

(b) Find the training and test errors corresponding to each $\lambda$ in lambd_seq. This part does not use the cross_validation function, but you may find the other functions helpful.
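A possible sketch for this part, assuming the helper functions and the data_train/data_test dictionaries defined above, with lambd_seq as specified in the cross_validation description:

lambd_seq = np.linspace(0.02, 1.5, 50)  # 50 evenly spaced penalty values

train_errors, test_errors = [], []
for lambd in lambd_seq:
    model = train_model(data_train, lambd)        # fit ridge on the full training set
    train_errors.append(loss(data_train, model))  # training error for this lambda
    test_errors.append(loss(data_test, model))    # test error for this lambda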

(c) Plot the training error, test error, and 5-fold and 10-fold cross validation errors on the same plot for each $\lambda$ value in lambd_seq. What is the value of $\lambda$ proposed by your cross validation procedure? Comment on the shapes of the error curves.
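A minimal plotting sketch, assuming matplotlib is available and the quantities from the previous sketches have been computed (the axis labels and legend text are just one possible choice):

import matplotlib.pyplot as plt

cv_error_5 = cross_validation(data_train, 5, lambd_seq)    # 5-fold CV errors
cv_error_10 = cross_validation(data_train, 10, lambd_seq)  # 10-fold CV errors

plt.plot(lambd_seq, train_errors, label='training error')
plt.plot(lambd_seq, test_errors, label='test error')
plt.plot(lambd_seq, cv_error_5, label='5-fold CV error')
plt.plot(lambd_seq, cv_error_10, label='10-fold CV error')
plt.xlabel('lambda')
plt.ylabel('average squared error')
plt.legend()
plt.show()

# The lambda proposed by CV is the one minimizing the CV error, e.g.:
best_lambda_5 = lambd_seq[np.argmin(cv_error_5)]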
