STAT542 - HW1 - Solved

Coding Assignment 
 


This assignment is related to the simulation study described in Section 2.3.1 (the so-called Scenario 2) of “Elements of Statistical Learning” (ESL).

Scenario 2: the two-dimensional data X ∈ ℝ² in each class is generated from a mixture of 10 different bivariate Gaussian distributions with uncorrelated components and different means, i.e.,

$$X \mid Y = k, Z = l \sim N(m_{kl}, s^2 I_2),$$

where k = 0, 1, l = 1 : 10, P(Y = k) = 1/2, and P(Z = l) = 1/10. In other words, given Y = k, X follows a mixture distribution with density function

$$\frac{1}{10} \sum_{l=1}^{10} \frac{1}{2\pi s^2} \exp\left(-\frac{\|x - m_{kl}\|^2}{2 s^2}\right).$$

You can choose your own values for s and the twenty 2-dim vectors m_kl, or you can generate them from some distribution.

I have also discussed this example in class; please check the notes from this week on “kNN vs. LinearRegression” and the related R code.

Following the data generating process, generate a training sample of size 200 and a test sample of size 10,000.
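One way to follow this data-generating process in Python is sketched below. The function names `make_centers` and `sample_mixture`, the seed, the placement of the centers around (1, 0) and (0, 1), and the value s² = 1/5 are illustrative assumptions borrowed from the ESL setup; the assignment leaves all of these choices open.

```python
import numpy as np

def make_centers(rng, n_centers=10):
    """Draw the component means m_kl (one assumed way to choose them)."""
    # Assumption: class-0 centers scatter around (1, 0) and class-1 centers
    # around (0, 1); any other choice of the twenty vectors is allowed.
    m0 = rng.normal(size=(n_centers, 2)) + np.array([1.0, 0.0])
    m1 = rng.normal(size=(n_centers, 2)) + np.array([0.0, 1.0])
    return np.stack([m0, m1])  # shape (2, n_centers, 2), indexed [k, l, :]

def sample_mixture(rng, centers, s, n):
    """Draw n pairs (X, Y) from the two-class Gaussian mixture."""
    n_centers = centers.shape[1]
    y = rng.integers(0, 2, size=n)          # P(Y = k) = 1/2
    z = rng.integers(0, n_centers, size=n)  # P(Z = l) = 1/10
    x = centers[y, z] + s * rng.normal(size=(n, 2))  # N(m_kl, s^2 I_2)
    return x, y

rng = np.random.default_rng(542)  # arbitrary seed, for reproducibility only
centers = make_centers(rng)
s = np.sqrt(1 / 5)                # assumed value, not fixed by the assignment
x_train, y_train = sample_mixture(rng, centers, s, 200)
x_test, y_test = sample_mixture(rng, centers, s, 10_000)
```

The centers are drawn once and then held fixed, so the training and test samples come from the same mixture.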

Evaluate the performance (the averaged 0/1 error[1]) for the following three procedures:

•    Linear regression with cut-off value 0.5,

•    kNN classification with k = 1, 3, 5, 7, 11, 21, 31, 45, 69, 101, 151, and

•    the Bayes rule (assume you know the values of the m_kl’s and s).
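The three procedures could be sketched in Python roughly as follows. All function names here are hypothetical, and the small data generator at the bottom (centers around (1, 0) and (0, 1), s² = 1/5, seed 0) is an assumed setup included only so the block runs on its own. Because P(Y = 0) = P(Y = 1) and both classes share the same s, the Bayes rule reduces to comparing the two sums of exponentials; the factor 1/10 and the normalizing constant cancel.

```python
import numpy as np

def linear_regression_predict(x_train, y_train, x_new, cutoff=0.5):
    """Least-squares fit of Y on (1, X); predict 1 when the fit exceeds cutoff."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    beta, *_ = np.linalg.lstsq(a_train, y_train.astype(float), rcond=None)
    a_new = np.column_stack([np.ones(len(x_new)), x_new])
    return (a_new @ beta > cutoff).astype(int)

def knn_predict(x_train, y_train, x_new, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances, shape (n_new, n_train); small enough for 10,000 x 200.
    d2 = ((x_new[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Every k in the assignment is odd, so a two-class vote cannot tie.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def bayes_predict(centers, s, x_new):
    """Bayes rule: pick the class whose mixture density is larger at x."""
    # centers has shape (2, 10, 2); d2 has shape (2, n_new, 10).
    d2 = ((x_new[None, :, None, :] - centers[:, None, :, :]) ** 2).sum(axis=3)
    density = np.exp(-d2 / (2 * s**2)).sum(axis=2)  # up to a common constant
    return (density[1] > density[0]).astype(int)

def error_rate(y_true, y_pred):
    """Averaged 0/1 error."""
    return float(np.mean(y_true != y_pred))

# Assumed data setup, only to make the sketch runnable.
rng = np.random.default_rng(0)
s = np.sqrt(1 / 5)
centers = rng.normal(size=(2, 10, 2)) + np.array([[[1.0, 0.0]], [[0.0, 1.0]]])

def draw(n):
    y = rng.integers(0, 2, size=n)
    z = rng.integers(0, 10, size=n)
    return centers[y, z] + s * rng.normal(size=(n, 2)), y

x_train, y_train = draw(200)
x_test, y_test = draw(10_000)

ks = [1, 3, 5, 7, 11, 21, 31, 45, 69, 101, 151]
knn_train = [error_rate(y_train, knn_predict(x_train, y_train, x_train, k)) for k in ks]
knn_test = [error_rate(y_test, knn_predict(x_train, y_train, x_test, k)) for k in ks]
lr_train = error_rate(y_train, linear_regression_predict(x_train, y_train, x_train))
lr_test = error_rate(y_test, linear_regression_predict(x_train, y_train, x_test))
bayes_test = error_rate(y_test, bayes_predict(centers, s, x_test))
```

The kNN training error at k = 1 is zero by construction, since each training point is its own nearest neighbor.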

Summarize your results graphically. Design your graph so that it shows the test and training errors for linear regression and kNN, and the test error for the Bayes classifier. Check Figure 2.4 of ESL and figures from the notes.
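A possible layout for such a graph, loosely modeled on Figure 2.4 of ESL, is sketched below with matplotlib. The function name `plot_errors`, its arguments, the log-scaled axis, and the styling are all assumptions. The kNN errors are plotted against the degrees of freedom N/k, the linear regression errors are placed at 3 degrees of freedom (intercept plus two coefficients), and the Bayes test error is drawn as a horizontal line.

```python
import matplotlib
matplotlib.use("Agg")  # render to file without needing a display
import matplotlib.pyplot as plt

def plot_errors(ks, knn_train, knn_test, lr_train, lr_test,
                bayes_test, n_train, path):
    """Write a PDF comparing kNN, linear regression, and Bayes errors."""
    df = [n_train / k for k in ks]  # degrees of freedom N / k for kNN
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(df, knn_train, "o-", color="tab:blue", label="kNN train")
    ax.plot(df, knn_test, "o-", color="tab:orange", label="kNN test")
    # Linear regression in two dimensions has 3 parameters.
    ax.plot([3], [lr_train], "s", color="tab:blue", markersize=9,
            label="Linear regression train")
    ax.plot([3], [lr_test], "s", color="tab:orange", markersize=9,
            label="Linear regression test")
    ax.axhline(bayes_test, color="tab:purple", linestyle="--",
               label="Bayes test")
    ax.set_xscale("log")  # spreads out the large-k (low df) end
    ax.set_xlabel("Degrees of freedom (N / k)")
    ax.set_ylabel("Averaged 0/1 error")
    ax.legend()
    fig.savefig(path)  # the file extension selects the format, e.g. .pdf
    plt.close(fig)
```

Given error lists for the eleven values of k and scalar errors for the other two procedures, a call such as `plot_errors(ks, knn_train, knn_test, lr_train, lr_test, bayes_test, 200, "errors.pdf")` would write the figure to `errors.pdf`; the variable names in that call are placeholders.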

Write R/Python code to simulate the data, compute the errors, and produce a PDF file of your graph.


