1. [30 points] We define a new kind of discriminant analysis for a classification problem with a binary response. The classes have prior probabilities π0 and π1. Given the class k, the conditional distribution of the inputs X1,...,Xp is multivariate normal with a class-dependent mean µk and covariance matrix σkΣ. The matrix Σ is common to both classes and σk is a class-dependent constant. All parameters (πk, µk, and σk for each class, as well as Σ) are set to their maximum likelihood estimates.
(a) Provide an equation describing the classifier’s decision boundary or discriminant. What would the boundary look like?
(b) Why might this classifier be preferable to Linear Discriminant Analysis?
(c) Why might this classifier be preferable to Quadratic Discriminant Analysis?
2. Compare leave-one-out cross validation to 10-fold cross validation, with reference to the bias-variance tradeoff.
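As an illustration (not part of the exam), here is a minimal sketch of running both schemes on a synthetic dataset; it assumes scikit-learn, and the data and model are purely illustrative:

```python
# Sketch: LOOCV vs. 10-fold CV on a synthetic regression problem.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=50)

model = LinearRegression()
# LOOCV: n fits; nearly unbiased error estimate, but highly overlapping folds.
loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
# 10-fold: 10 fits; slightly more bias, typically lower variance.
kf_mse = -cross_val_score(model, X, y,
                          cv=KFold(10, shuffle=True, random_state=0),
                          scoring="neg_mean_squared_error").mean()
print(f"LOOCV MSE: {loo_mse:.3f}, 10-fold MSE: {kf_mse:.3f}")
```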
3. [10 points] A total of n samples were simulated from the following distribution:
$$X_1, X_2, X_3, X_4 \sim N(0,1) \ \text{i.i.d.}, \qquad Y = f(X_1, X_2, X_3, X_4) + \varepsilon,$$
where f is non-linear. Consider the following regression methods for Y: linear regression with predictors X1, X2, X3, and X4, and 3-nearest-neighbors regression. On the same plot, sketch a plausible learning curve for each method. A learning curve for regression shows the average test MSE as a function of n. Explain your reasoning.
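To build intuition (not part of the exam), a minimal simulation sketch follows. Since the exam leaves f unspecified, the particular non-linear f and noise level below are hypothetical stand-ins; it assumes scikit-learn.

```python
# Sketch: empirical learning curves for OLS vs. 3-NN regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

def f(X):
    # Hypothetical non-linear signal; the exam's f is unspecified.
    return np.sin(2 * X[:, 0]) + X[:, 1] ** 2

def simulate(n):
    X = rng.normal(size=(n, 4))
    return X, f(X) + rng.normal(scale=0.5, size=n)

X_test, y_test = simulate(2000)
for n in (25, 50, 100, 400, 1600):
    X, y = simulate(n)
    ols = LinearRegression().fit(X, y)
    knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    ols_mse = np.mean((ols.predict(X_test) - y_test) ** 2)
    knn_mse = np.mean((knn.predict(X_test) - y_test) ** 2)
    print(f"n={n:5d}  OLS MSE: {ols_mse:.3f}  3-NN MSE: {knn_mse:.3f}")
```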
4. [10 points] The clusterings below were produced by single-linkage hierarchical clustering and k-means clustering. Determine which one is which and explain your reasoning.
5. We apply the Bootstrap to a dataset with n distinct observations x1,...,xn.
(a) [10 points] What is the probability that the jth observation is included in a specific bootstrap sample?
(b) [10 points] What is the expected value of the fraction of distinct observations in a specific bootstrap sample (i.e., the number of distinct observations from x1,...,xn divided by n)? Does this expectation converge as n grows large? Hints: (i) use the probability from part (a); (ii) $\lim_{n \to \infty} (1 - 1/n)^n = e^{-1}$.
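As a check (not part of the exam), a minimal Monte Carlo sketch of parts (a) and (b):

```python
# Sketch: P(x_j in a bootstrap sample) = 1 - (1 - 1/n)^n, so the expected
# fraction of distinct observations tends to 1 - e^{-1} ~ 0.632.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 10_000
fractions = np.empty(reps)
for r in range(reps):
    sample = rng.integers(0, n, size=n)        # draw n indices with replacement
    fractions[r] = np.unique(sample).size / n  # fraction of distinct observations
print(f"simulated: {fractions.mean():.4f}")
print(f"theory:    {1 - (1 - 1/n) ** n:.4f}")
```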
6. [10 points] A group of 33 people were asked to report their happiness on a scale from 0 to 20. We apply a linear model with an intercept to regress happiness onto two predictors: yearly income and the amount of money paid in taxes last year.
The t-statistics for income and taxes have corresponding p-values of 0.14 and 0.52, respectively. The RSS of the model is 30 and the sample variance of the happiness scores is 1.5.
What would you conclude about the relationship between happiness, income, and tax contributions?
7. (a) [10 points] Identify which of the two classifiers, k-nearest neighbors with k = 15 or logistic regression, would be more appropriate for each dataset below. Explain how one might adjust the true positive rate of each method.
Note: Red circles are negative and blue triangles are positive.
(b) [10 points] Each of the ROC curves below corresponds to one of the datasets in part (a). In each case, we applied the optimal classifier among k-nearest neighbors and logistic regression. Match each ROC curve to its corresponding dataset and explain your reasoning.
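As an illustration (not part of the exam), a minimal sketch of how shifting the decision threshold of a probabilistic classifier moves the true positive rate along its ROC curve; it assumes scikit-learn, and the dataset and coefficients are synthetic. The same thresholding applies to k-NN via its predicted class proportions.

```python
# Sketch: moving the decision threshold trades off true and false positives.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
y = np.repeat([0, 1], 200)

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba(X)[:, 1]            # estimated P(Y = 1 | X = x)
print(f"AUC: {roc_auc_score(y, p):.3f}")
for t in (0.3, 0.5, 0.7):                 # lowering t raises TPR (and FPR)
    pred = p >= t
    tpr = (pred & (y == 1)).sum() / (y == 1).sum()
    print(f"threshold={t:.1f}  TPR={tpr:.3f}")
```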
Cheat sheet
The sample variance of x1,...,xn is:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
The residual sum of squares for a regression model is:
$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
t-test:
The t-statistic for the hypothesis H0 : βi = 0 is
$$t = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)}.$$
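As an illustration (not part of the exam), a minimal sketch of this computation on synthetic data; it assumes statsmodels:

```python
# Sketch: t-statistics and two-sided p-values from an OLS fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(33, 2))
y = 0.3 * X[:, 0] + rng.normal(size=33)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.tvalues)  # beta_hat_i / SE(beta_hat_i) for each coefficient
print(fit.pvalues)  # p-values for H0: beta_i = 0
```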
F-test:
The F-statistic for the hypothesis H0 : βp−q+1 = βp−q+2 = ··· = βp = 0 is
$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/q}{\mathrm{RSS}/(n - p - 1)},$$
where RSS0 is the residual sum of squares for the null model H0, and RSS is the residual sum of squares for the full model with all predictors. Asymptotically, the F-statistic has the F-distribution with degrees of freedom d1 = q and d2 = n − p − 1.
Minimum F-statistic to reject H0 at a significance level α = 0.01:

    d2 \ d1         1         2         3         4
          1  4052.181  4999.500  5403.352  5624.583
         10    10.044     7.559     6.552     5.994
         20     8.096     5.849     4.938     4.431
         30     7.562     5.390     4.510     4.018
        120     6.851     4.787     3.949     3.480
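For reference (not part of the exam), these entries are quantiles of the F-distribution and can be reproduced with SciPy:

```python
# Sketch: reproduce the table as 99th-percentile F quantiles.
from scipy.stats import f

alpha = 0.01
for d2 in (1, 10, 20, 30, 120):
    row = [f.ppf(1 - alpha, d1, d2) for d1 in (1, 2, 3, 4)]
    print(d2, [f"{v:.3f}" for v in row])
# e.g. f.ppf(0.99, 2, 30) -> 5.390, matching the table.
```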
Logistic regression:
Logistic regression assigns an input x to the positive class if the estimated conditional probability exceeds 1/2:
$$\hat{P}(Y = 1 \mid X = x) = \frac{e^{\hat{\beta}_0 + \hat{\beta}^\top x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}^\top x}} > \frac{1}{2}.$$
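A minimal sketch of this decision rule in NumPy, with hypothetical coefficient values:

```python
# Sketch: the logistic decision rule; coefficients are illustrative only.
import numpy as np

beta0, beta = -0.5, np.array([1.2, -0.8])  # hypothetical fitted coefficients

def p_positive(x):
    z = beta0 + x @ beta
    return 1.0 / (1.0 + np.exp(-z))        # estimated P(Y = 1 | X = x)

x = np.array([0.4, 0.1])
print(p_positive(x), p_positive(x) > 0.5)  # assign positive if p > 1/2
```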
LDA:
The log-posterior of class k given an input x is:
$$\log P(Y = k \mid X = x) = \log \pi_k + x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + C,$$
where C is a constant which does not depend on k.
QDA:
The log-posterior of class k given an input x in QDA is:
$$\log P(Y = k \mid X = x) = \log \pi_k - \tfrac{1}{2} \log \lvert \Sigma_k \rvert - \tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + C,$$
where C is a constant which does not depend on k.
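A minimal sketch evaluating both discriminant scores in NumPy, dropping the constant C (which does not affect the argmax over k); all parameter values below are illustrative:

```python
# Sketch: LDA and QDA class scores, up to the additive constant C.
import numpy as np

def lda_score(x, pi_k, mu_k, Sigma_inv):
    # log pi_k + x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k
    return np.log(pi_k) + x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k

def qda_score(x, pi_k, mu_k, Sigma_k):
    # log pi_k - 0.5 log|Sigma_k| - 0.5 (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
    d = x - mu_k
    return (np.log(pi_k) - 0.5 * np.linalg.slogdet(Sigma_k)[1]
            - 0.5 * d @ np.linalg.solve(Sigma_k, d))

# Predict the class with the largest score; parameters here are illustrative.
x, Sigma = np.array([0.5, -1.0]), np.eye(2)
print(lda_score(x, 0.5, np.zeros(2), np.linalg.inv(Sigma)))
print(qda_score(x, 0.5, np.zeros(2), Sigma))
```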