MATH5473 Homework 3: MLE and James-Stein Estimator

1. Maximum Likelihood Method: consider n random samples from a multivariate normal distribution, X_i ∈ R^p ∼ N(µ, Σ) with i = 1, ..., n.

(a) Show the log-likelihood function

\[
\ell(\mu, \Sigma) = -\frac{n}{2}\log\det\Sigma - \frac{n}{2}\,\mathrm{trace}\left(\Sigma^{-1}\hat{\Sigma}_n\right) - \frac{n}{2}(\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu) + C,
\]

where \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\), \(\hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T\), and the constant C does not depend on µ and Σ;

(b)    Show that f(X) = trace(AX^{-1}) with X ≻ 0 has the first-order approximation

\[
f(X + \Delta) \approx f(X) - \mathrm{trace}\left(X^{-1} A X^{-1} \Delta\right),
\]

hence formally df(X)/dX = −X^{-1}AX^{-1} (note: (I + X)^{-1} ≈ I − X);

(c)    Show that g(X) = log det(X) with X ≻ 0 has the first-order approximation

\[
g(X + \Delta) \approx g(X) + \mathrm{trace}\left(X^{-1}\Delta\right),
\]

hence dg(X)/dX = X^{-1} (note: consider the eigenvalues of X^{-1/2}∆X^{-1/2});
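
Before proving (b) and (c), the two linearizations can be sanity-checked with finite differences. The NumPy sketch below compares the exact changes f(X + ∆) − f(X) and g(X + ∆) − g(X) against the claimed first-order terms for a small random symmetric ∆; the particular matrices X, A and the perturbation size are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5

# A random symmetric positive definite X and an arbitrary fixed matrix A
B = rng.standard_normal((p, p))
X = B @ B.T + p * np.eye(p)
A = rng.standard_normal((p, p))

# A small symmetric perturbation Delta
D = rng.standard_normal((p, p))
Delta = 1e-6 * (D + D.T)

Xinv = np.linalg.inv(X)

def f(M):              # f(X) = trace(A X^{-1})
    return np.trace(A @ np.linalg.inv(M))

def g(M):              # g(X) = log det(X)
    return np.linalg.slogdet(M)[1]

# (b): exact change vs. -trace(X^{-1} A X^{-1} Delta)
print("(b)", f(X + Delta) - f(X), "vs", -np.trace(Xinv @ A @ Xinv @ Delta))

# (c): exact change vs. trace(X^{-1} Delta)
print("(c)", g(X + Delta) - g(X), "vs", np.trace(Xinv @ Delta))
```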

(d)    Use these formal derivatives with respect to positive semi-definite matrix variables to show that the maximum likelihood estimator of Σ is

\[
\hat{\Sigma}_{\mathrm{MLE}} = \hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T .
\]

A reference for (b) and (c) can be found in Convex Optimization by Boyd and Vandenberghe, examples in Appendix A.4.1 and A.4.3.
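
As a numerical illustration of (a) and (d), the sketch below evaluates the Gaussian log-likelihood (via SciPy) at µ = X̄ and compares Σ = Σ̂_n, the 1/n sample covariance, with a few alternative covariance candidates; the MLE candidate should attain the largest value. The simulated dimensions, true parameters, and perturbations are arbitrary choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, p = 500, 3

# Simulate X_i ~ N(mu_true, Sigma_true)
mu_true = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((p, p))
Sigma_true = L @ L.T + np.eye(p)
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)

Xbar = X.mean(axis=0)
Sigma_mle = (X - Xbar).T @ (X - Xbar) / n   # 1/n sample covariance

def loglik(mu, Sigma):
    # Sum of log N(X_i; mu, Sigma) over the sample
    return multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()

print("log-likelihood at (Xbar, Sigma_mle):", loglik(Xbar, Sigma_mle))
# Alternative covariance candidates should all score (slightly) lower
for scale in [0.8, 1.2]:
    print(f"  at (Xbar, {scale} * Sigma_mle):", loglik(Xbar, scale * Sigma_mle))
print("  at (Xbar, unbiased 1/(n-1) covariance):",
      loglik(Xbar, np.cov(X, rowvar=False)))
```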

2.    Shrinkage Estimators: consider the Gaussian sequence model y ∼ N(µ, I_p) with unknown mean µ ∈ R^p, for which the MLE is µ̂^MLE = y.

(a)    Consider the Ridge regression

\[
\hat{\mu}^{\mathrm{ridge}} = \arg\min_{\mu \in \mathbb{R}^p} \ \frac{1}{2}\|y - \mu\|_2^2 + \frac{\lambda}{2}\|\mu\|_2^2 .
\]

Show that the solution is given by

\[
\hat{\mu}^{\mathrm{ridge}} = C y, \qquad C = \frac{1}{1+\lambda} I_p .
\]

Compute the risk (mean squared error) of this estimator. The risk of the MLE is recovered in the special case C = I.
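
The closed form and its risk can be probed by simulation. Assuming the objective stated above, the sketch below estimates E‖µ̂^ridge − µ‖² by Monte Carlo and compares it with p/(1 + λ)² + (λ/(1 + λ))²‖µ‖², the value implied by the closed-form solution, which reduces to the MLE risk p at λ = 0; the choices of p, λ, and µ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
p, lam, n_mc = 10, 0.5, 200_000

mu = np.linspace(-2, 2, p)                 # arbitrary true mean
y = mu + rng.standard_normal((n_mc, p))    # y ~ N(mu, I_p), one row per replication

mu_ridge = y / (1.0 + lam)                 # closed-form ridge solution per replication

mc_risk = ((mu_ridge - mu) ** 2).sum(axis=1).mean()
formula = p / (1 + lam) ** 2 + (lam / (1 + lam)) ** 2 * (mu ** 2).sum()

print("Monte Carlo risk:", mc_risk)
print("Closed-form risk:", formula)
print("MLE risk (C = I, i.e. lambda = 0):", p)
```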

(b)    Consider the LASSO problem,

\[
\hat{\mu}^{\mathrm{lasso}} = \arg\min_{\mu \in \mathbb{R}^p} \ \frac{1}{2}\|y - \mu\|_2^2 + \lambda\|\mu\|_1 .
\]

Show that the solution is given by soft-thresholding,

\[
\hat{\mu}_i^{\mathrm{soft}} = \mu^{\mathrm{soft}}(y_i; \lambda) := \mathrm{sign}(y_i)\,(|y_i| - \lambda)_+ .
\]

For the choice λ = √(2 log p), show that the risk is bounded by

\[
\mathbb{E}\|\hat{\mu}^{\mathrm{soft}}(y) - \mu\|^2 \le 1 + (2\log p + 1)\sum_{i=1}^{p} \min(\mu_i^2, 1) .
\]

Under what conditions on µ is such a risk smaller than that of the MLE? Note: see Gaussian Estimation by Iain Johnstone, Lemma 2.9 and the reasoning before it.
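
Because the LASSO objective above separates across coordinates, the soft-thresholding claim can be checked one coordinate at a time, and the risk bound can be probed by Monte Carlo. The sketch below does both for an arbitrary sparse µ; a grid search and a simulation are of course only suggestive, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# 1) Coordinate-wise check: soft(y0, lam) minimizes 0.5*(y0 - m)^2 + lam*|m|
lam = 0.7
grid = np.linspace(-5, 5, 100_001)
for y0 in [-2.3, -0.4, 0.0, 0.5, 3.1]:
    obj = 0.5 * (y0 - grid) ** 2 + lam * np.abs(grid)
    assert abs(grid[obj.argmin()] - soft(y0, lam)) < 1e-3

# 2) Monte Carlo check of the risk bound with lambda = sqrt(2 log p)
p, n_mc = 64, 50_000
mu = np.zeros(p)
mu[:5] = 3.0                                   # an arbitrary sparse mean vector
lam = np.sqrt(2 * np.log(p))
y = mu + rng.standard_normal((n_mc, p))
risk = ((soft(y, lam) - mu) ** 2).sum(axis=1).mean()
bound = 1 + (2 * np.log(p) + 1) * np.minimum(mu ** 2, 1).sum()
print("Monte Carlo risk:", risk, " bound:", bound, " MLE risk:", p)
```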

(c)    Consider the ℓ0 regularization,

\[
\hat{\mu}^{\mathrm{hard}} = \arg\min_{\mu \in \mathbb{R}^p} \ \frac{1}{2}\|y - \mu\|_2^2 + \frac{\lambda^2}{2}\|\mu\|_0 ,
\]

where ‖µ‖_0 = #{i : µ_i ≠ 0}. Show that the solution is given by hard-thresholding,

\[
\hat{\mu}_i^{\mathrm{hard}} = \mu^{\mathrm{hard}}(y_i; \lambda) := y_i\, I(|y_i| > \lambda) .
\]

Rewriting µ̂^hard(y) = (1 − g(y))y, is g(y) weakly differentiable? Why?
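
The ℓ0 objective also decouples over coordinates, and for a single coordinate the only candidates are keeping it (µ_i = y_i, paying λ²/2) or killing it (µ_i = 0, paying y_i²/2). The sketch below simply carries out that comparison for a few arbitrary test values and checks it against the hard-thresholding rule.

```python
import numpy as np

def hard(y, lam):
    return y * (np.abs(y) > lam)

lam = 1.3
for y0 in [-3.0, -1.3001, -1.2999, 0.0, 0.7, 2.5]:
    # One-dimensional l0 objective: keep (mu = y0) costs lam^2/2, kill (mu = 0) costs y0^2/2
    cost_keep = 0.5 * lam ** 2
    cost_kill = 0.5 * y0 ** 2
    best = y0 if cost_keep < cost_kill else 0.0
    print(y0, " hard-threshold:", hard(y0, lam), " keep-vs-kill best:", best)
```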

(d)    Consider the James-Stein estimator

\[
\hat{\mu}^{\mathrm{JS}}_{\alpha}(y) = \left(1 - \frac{\alpha}{\|y\|^2}\right) y .
\]

Show that the risk is

\[
\mathbb{E}\|\hat{\mu}^{\mathrm{JS}}_{\alpha}(y) - \mu\|^2 = \mathbb{E}\, U_{\alpha}(y),
\qquad
U_{\alpha}(y) = p - \frac{2\alpha(p-2) - \alpha^2}{\|y\|^2} .
\]

Find the optimal α∗ = argmin_α U_α(y). Show that for p > 2, the risk of the James-Stein estimator is smaller than that of the MLE for all µ ∈ R^p.
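
A Monte Carlo sketch makes part (d) concrete: it estimates the risk of µ̂^JS_α over a range of α and compares it with the MLE risk p; the minimum should appear near α = p − 2. The particular p, µ, and α grid below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_mc = 10, 100_000
mu = np.full(p, 1.0)                      # arbitrary true mean
y = mu + rng.standard_normal((n_mc, p))   # y ~ N(mu, I_p)

def js_risk(alpha):
    est = (1.0 - alpha / (y ** 2).sum(axis=1, keepdims=True)) * y
    return ((est - mu) ** 2).sum(axis=1).mean()

alphas = np.arange(0, 2 * (p - 2) + 1)
risks = np.array([js_risk(a) for a in alphas])
print("MLE risk:", p)
print("best alpha (Monte Carlo):", alphas[risks.argmin()], " expected: p - 2 =", p - 2)
print("JS risk at alpha = p - 2:", js_risk(p - 2))
```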


(e)    In general, an odd, monotone, unbounded function Θ_λ : R → R with parameter λ ≥ 0 is called a shrinkage rule if it satisfies

[shrinkage] 0 ≤ Θ_λ(|t|) ≤ |t|;

[odd] Θ_λ(−t) = −Θ_λ(t);

[monotone] Θ_λ(t) ≤ Θ_λ(t′) for t ≤ t′;

[unbounded] lim_{t→∞} Θ_λ(t) = ∞.

Which rules above are shrinkage rules?
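
As a rough aid for part (e), the sketch below probes the four properties on a finite grid of t values for the soft rule from (b), the hard rule from (c), and the linear (ridge-type) rule from (a); a grid check is only suggestive, and borderline cases (e.g. behaviour exactly at t = ±λ) should still be argued by hand.

```python
import numpy as np

lam = 1.0
t = np.linspace(-10, 10, 4001)               # symmetric, sorted grid of test points

rules = {
    "soft": lambda t: np.sign(t) * np.maximum(np.abs(t) - lam, 0.0),
    "hard": lambda t: t * (np.abs(t) > lam),
    "linear (ridge)": lambda t: t / (1.0 + lam),
}

for name, theta in rules.items():
    th = theta(t)
    th_abs = theta(np.abs(t))
    shrink = np.all((0 <= th_abs) & (th_abs <= np.abs(t)))
    odd = np.allclose(theta(-t), -th)
    monotone = np.all(np.diff(th) >= -1e-12)      # t is sorted, so check differences
    unbounded = theta(np.array([1e6]))[0] > 1e3   # crude proxy for lim = infinity
    print(f"{name:15s} shrink={shrink} odd={odd} monotone={monotone} unbounded={unbounded}")
```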

3.    Necessary Conditions for Admissibility of Linear Estimators. Consider the linear estimator µ̂_C(y) = Cy for y ∼ N(µ, σ²I_p).

Show that µ̂_C is admissible only if

(a)    C is symmetric;

(b)    0 ≤ ρ_i(C) ≤ 1 (where ρ_i(C) are the eigenvalues of C);

(c)    ρ_i(C) = 1 for at most two indices i.

These conditions are satisfied by the MLE when p = 1 and p = 2.
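
A small experiment can illustrate why condition (b) is necessary: for symmetric C, the risk E‖Cy − µ‖² = σ² trace(C²) + ‖(I − C)µ‖² never increases when the eigenvalues of C are clipped into [0, 1], so any C violating (b) is dominated. The sketch below, with an arbitrary C and a few arbitrary µ, only illustrates this domination; it is not the admissibility argument itself.

```python
import numpy as np

rng = np.random.default_rng(5)
p, sigma2 = 4, 1.0

def risk(C, mu):
    # E||Cy - mu||^2 for y ~ N(mu, sigma2 * I_p)
    return sigma2 * np.trace(C @ C.T) + np.linalg.norm((np.eye(p) - C) @ mu) ** 2

# A symmetric C with one eigenvalue above 1 and one below 0 (violating condition (b))
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
evals = np.array([1.4, 0.7, 0.2, -0.3])
C_bad = Q @ np.diag(evals) @ Q.T

# The same C with eigenvalues clipped into [0, 1]
C_clip = Q @ np.diag(np.clip(evals, 0.0, 1.0)) @ Q.T

for mu in [np.zeros(p), rng.standard_normal(p), 10 * rng.standard_normal(p)]:
    print("risk(C_bad):", risk(C_bad, mu), "  risk(C_clipped):", risk(C_clip, mu))
```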


4.    *James-Stein Estimator for p = 1, 2 and an upper bound: if we use SURE to calculate the risk of the James-Stein estimator,

\[
\mathbb{E}\|\hat{\mu}^{\mathrm{JS}}(y) - \mu\|^2 = p - (p-2)^2\,\mathbb{E}\left[\frac{1}{\|y\|^2}\right],
\]

it seems that for p = 1 the James-Stein estimator should still have lower risk than the MLE for any µ. Can you find out what happens in the p = 1 and p = 2 cases?

Moreover, can you derive the following upper bound for the risk of the James-Stein estimator?

\[
\mathbb{E}\|\hat{\mu}^{\mathrm{JS}}(y) - \mu\|^2 \le 2 + \frac{(p-2)\|\mu\|^2}{(p-2) + \|\mu\|^2} .
\]
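
To explore the p = 1, 2 cases and the bound numerically, the sketch below estimates the James-Stein risk (with α = p − 2) by Monte Carlo for several p and compares it with the MLE risk p and with the bound above. For p = 1 the term 1/‖y‖² has infinite expectation (so the Monte Carlo estimate is correspondingly unstable), and for p = 2 the correction vanishes; the values of p and µ used are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n_mc = 200_000

def js_risk_mc(p, mu_norm=2.0):
    mu = np.zeros(p)
    mu[0] = mu_norm                           # arbitrary mean with ||mu|| = mu_norm
    y = mu + rng.standard_normal((n_mc, p))
    alpha = p - 2
    est = (1.0 - alpha / (y ** 2).sum(axis=1, keepdims=True)) * y
    return ((est - mu) ** 2).sum(axis=1).mean(), mu_norm ** 2

for p in [1, 2, 3, 10]:
    risk, m2 = js_risk_mc(p)
    bound = 2 + (p - 2) * m2 / ((p - 2) + m2) if p > 2 else float("nan")
    # For p = 1, E[1/||y||^2] is infinite, so the Monte Carlo estimate is unstable/huge;
    # for p = 2, alpha = p - 2 = 0 and James-Stein coincides with the MLE.
    print(f"p={p:2d}  JS risk (MC) ~ {risk:10.2f}   MLE risk = {p}   bound = {bound}")
```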
