MATH5473-Homework 2 Solved

1.    Phase transition in PCA “spike” model: Consider a finite sample of n i.i.d. vectors $x_1, x_2, \ldots, x_n$ drawn from the p-dimensional Gaussian distribution $\mathcal{N}(0, \sigma^2 I_{p\times p} + \lambda_0 u u^T)$, where $\lambda_0/\sigma^2$ is the signal-to-noise ratio (SNR) and $u \in \mathbb{R}^p$. In class we showed that the largest eigenvalue $\lambda$ of the sample covariance matrix

$$S_n = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T$$

pops outside the support of the Marcenko-Pastur distribution if

$$\frac{\lambda_0}{\sigma^2} > \sqrt{\gamma}, \qquad \gamma = p/n,$$

or equivalently, if

$$\mathrm{SNR} > \sqrt{\frac{p}{n}}.$$

(Notice that $\sqrt{\gamma} < (1+\sqrt{\gamma})^2$, that is, $\lambda_0$ can be “buried” well inside the support of the Marcenko-Pastur distribution and still the largest eigenvalue pops outside its support.) All the following questions refer to the limit n → ∞ and to almost sure values:



(a)      Find $\lambda$ given SNR > $\sqrt{\gamma}$.

(b)    Use your previous answer to explain how the SNR can be estimated from the eigenvalues of the sample covariance matrix.

(c)     Find the squared correlation between the eigenvector $v$ of the sample covariance matrix (corresponding to the largest eigenvalue $\lambda$) and the “true” signal component $u$, as a function of the SNR, p and n. That is, find $|\langle u, v\rangle|^2$.

(d)    Confirm your result using MATLAB, Python, or R simulations (e.g., set $u = e_1$, choose $\sigma = 1$, and take $\lambda_0$ at different levels. Compute the largest eigenvalue and its associated eigenvector, and compare them with the true ones).
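A minimal Python sketch of the simulation asked for in (d) might look as follows. It is not the course's reference solution. The closed-form limits in `spike_theory` and the inversion in `snr_from_eigenvalue` are the standard spiked-covariance results and are stated here as assumptions rather than derived; all function names, sample sizes, and the seed are hypothetical choices for illustration.

```python
import numpy as np


def spike_theory(snr, gamma, sigma=1.0):
    """Assumed asymptotic limits: (top eigenvalue, squared correlation)."""
    if snr <= np.sqrt(gamma):
        # Below the threshold the top eigenvalue sticks to the
        # Marcenko-Pastur upper edge and the correlation vanishes.
        return sigma**2 * (1 + np.sqrt(gamma)) ** 2, 0.0
    lam = sigma**2 * (1 + snr) * (1 + gamma / snr)
    corr2 = (1 - gamma / snr**2) / (1 + gamma / snr)
    return lam, corr2


def snr_from_eigenvalue(lam, gamma, sigma=1.0):
    """Invert lam = sigma^2 (1 + s)(1 + gamma / s) for the root s > sqrt(gamma).

    Only meaningful when lam lies above the Marcenko-Pastur upper edge.
    """
    b = lam / sigma**2 - 1 - gamma
    return 0.5 * (b + np.sqrt(b**2 - 4 * gamma))


def spike_simulation(n, p, lam0, sigma=1.0, seed=0):
    """Draw n samples from N(0, sigma^2 I + lam0 u u^T) with u = e_1."""
    rng = np.random.default_rng(seed)
    u = np.zeros(p)
    u[0] = 1.0
    # Isotropic noise plus an independent Gaussian coefficient along u.
    X = sigma * rng.standard_normal((n, p))
    X += np.sqrt(lam0) * np.outer(rng.standard_normal(n), u)
    S = X.T @ X / n                       # sample covariance S_n
    evals, evecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return evals[-1], float(evecs[:, -1] @ u) ** 2


if __name__ == "__main__":
    n, p = 2000, 500                      # gamma = p / n = 0.25
    for lam0 in (0.25, 1.0, 2.0, 4.0):
        lam_hat, corr2_hat = spike_simulation(n, p, lam0)
        lam_th, corr2_th = spike_theory(lam0, p / n)
        print(f"lam0={lam0}: eigenvalue {lam_hat:.3f} vs {lam_th:.3f}, "
              f"corr^2 {corr2_hat:.3f} vs {corr2_th:.3f}")
```

With σ = 1 the SNR equals λ0, so sweeping `lam0` across √γ = 0.5 should show the empirical values moving from the bulk-edge regime to the spiked regime, up to finite-sample fluctuations.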

Homework 2. Random Matrix Theory and PCA

2.    Exploring S&P500 Stock Prices: Take the Standard & Poor’s 500 data:

https://github.com/yao-lab/yao-lab.github.io/blob/master/data/snp452-data.mat
which contains the data matrix $X \in \mathbb{R}^{p\times n}$ of n = 1258 consecutive observation days and p = 452 daily closing stock prices, and the cell variable “stock” collects the names, codes, and the affiliated industrial sectors of the 452 stocks. Use MATLAB, Python, or R for the following exploration.

(a)     Take the logarithmic prices $Y = \log X$;

(b)    For each observation time $t \in \{1,\ldots,1257\}$, calculate logarithmic price jumps

$$\Delta Y_{i,t} = Y_{i,t} - Y_{i,t-1}, \qquad i \in \{1,\ldots,452\};$$

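(c)     Construct the realized covariance matrix $\hat\Sigma \in \mathbb{R}^{452\times 452}$ by

$$\hat\Sigma_{i,j} = \frac{1}{1257}\sum_{\tau=1}^{1257} \Delta Y_{i,\tau}\,\Delta Y_{j,\tau};$$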

(d)    Compute the eigenvalues (and eigenvectors) of $\hat\Sigma$ and store them in descending order as $\{\hat\lambda_k, k = 1,\ldots,p\}$.

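(e)     Horn’s Parallel Analysis: the following procedure describes a so-called Parallel Analysis of PCA using random permutations on data. Given the matrix $[\Delta Y_{i,t}]$, apply random permutations $\pi_i : \{1,\ldots,t\} \to \{1,\ldots,t\}$ on each of its rows, $\Delta\tilde Y_{i,\pi_i(j)}$, such that

$$[\Delta\tilde Y_{\pi(i),t}] = \begin{bmatrix}
\Delta Y_{1,1} & \Delta Y_{1,2} & \Delta Y_{1,3} & \cdots & \Delta Y_{1,t} \\
\Delta Y_{2,\pi_2(1)} & \Delta Y_{2,\pi_2(2)} & \Delta Y_{2,\pi_2(3)} & \cdots & \Delta Y_{2,\pi_2(t)} \\
\Delta Y_{3,\pi_3(1)} & \Delta Y_{3,\pi_3(2)} & \Delta Y_{3,\pi_3(3)} & \cdots & \Delta Y_{3,\pi_3(t)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\Delta Y_{n,\pi_n(1)} & \Delta Y_{n,\pi_n(2)} & \Delta Y_{n,\pi_n(3)} & \cdots & \Delta Y_{n,\pi_n(t)}
\end{bmatrix}.$$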
Define $\tilde\Sigma = \frac{1}{t}\,\Delta\tilde Y\,\Delta\tilde Y^T$ as the null covariance matrix. Repeat this R times and compute the eigenvalues of $\tilde\Sigma_r$ for each 1 ≤ r ≤ R. Evaluate the p-value for each estimated eigenvalue $\hat\lambda_k$ by $(N_k+1)/(R+1)$, where $N_k$ is the number of times that $\hat\lambda_k$ is less than the k-th largest eigenvalue of $\tilde\Sigma_r$ over 1 ≤ r ≤ R. Eigenvalues with small p-values are unlikely to arise from the spectrum of a randomly permuted matrix and are thus considered to be signal. Draw your own conclusions from your observations and analysis of this data. A reference is: Buja and Eyuboglu, “Remarks on Parallel Analysis”, Multivariate Behavioral Research, 27(4): 509-540, 1992.
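Steps (a) through (e) can be sketched in Python roughly as below. This is an illustrative outline under stated assumptions, not the reference solution: the covariance is normalized by the number of jumps and is not mean-centered, the first row is left unpermuted to mirror the display above, and the demo runs on synthetic one-factor prices (the hypothetical helper `synthetic_prices`) instead of the real file. To use the actual data, one could load the .mat file with `scipy.io.loadmat`; the variable name of the price matrix inside the file is not confirmed here.

```python
import numpy as np


def log_price_jumps(X):
    """Steps (a)-(b): X is (p, n) positive prices; returns (p, n-1) jumps."""
    return np.diff(np.log(X), axis=1)


def covariance(dY):
    """Step (c): uncentered covariance, normalized by the number of jumps."""
    return dY @ dY.T / dY.shape[1]


def eigenvalues_desc(S):
    """Step (d): eigenvalues of a symmetric matrix in descending order."""
    return np.linalg.eigvalsh(S)[::-1]


def parallel_analysis(dY, R=100, seed=0):
    """Step (e): permutation p-values (N_k + 1) / (R + 1) for each eigenvalue."""
    rng = np.random.default_rng(seed)
    lam_hat = eigenvalues_desc(covariance(dY))
    counts = np.zeros(lam_hat.size, dtype=int)
    for _ in range(R):
        # Generator.permuted shuffles every row independently along axis=1.
        dY_perm = rng.permuted(dY, axis=1)
        dY_perm[0] = dY[0]                # keep row 1 fixed, as in the display
        lam_null = eigenvalues_desc(covariance(dY_perm))
        counts += lam_hat < lam_null      # N_k: times the null eigenvalue wins
    return lam_hat, (counts + 1) / (R + 1)


def synthetic_prices(p=20, n=300, seed=1):
    """Hypothetical stand-in data: one common factor plus idiosyncratic noise."""
    rng = np.random.default_rng(seed)
    factor = rng.standard_normal(n - 1)
    returns = 0.01 * (factor + rng.standard_normal((p, n - 1)))
    log_prices = np.concatenate(
        [np.zeros((p, 1)), np.cumsum(returns, axis=1)], axis=1)
    return np.exp(log_prices)


if __name__ == "__main__":
    dY = log_price_jumps(synthetic_prices())
    lam_hat, pvals = parallel_analysis(dY, R=100)
    for k in range(5):
        print(f"k={k + 1}: eigenvalue {lam_hat[k]:.2e}, p-value {pvals[k]:.3f}")
```

Because each row is only reordered, the permutation keeps every stock's own variance while destroying cross-sectional dependence, which is why eigenvalues that exceed the permuted spectrum are read as signal.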

3.    *Finite rank perturbations of random symmetric matrices: Wigner’s semi-circle law (proved by Eugene Wigner in 1951) concerns the limiting distribution of the eigenvalues of random symmetric matrices. It states, for example, that the limiting eigenvalue distribution of n × n symmetric matrices whose entries $w_{ij}$ on and above the diagonal (i ≤ j) are i.i.d. Gaussians $\mathcal{N}(0, \frac{1}{4n})$ (and the entries below the diagonal are determined by symmetrization, i.e., $w_{ji} = w_{ij}$) is the semi-circle:

$$\mu(dt) = \frac{2}{\pi}\sqrt{1-t^2}\,dt,$$

where the distribution is supported in the interval [−1,1].

(a)     Confirm Wigner’s semi-circle law using MATLAB, Python, or R simulations (take, e.g., n = 400).

(b)    Find the largest eigenvalue of a rank-1 perturbation of a Wigner matrix. That is, find the largest eigenvalue of the matrix

$$W + \lambda_0 u u^T,$$

where W is an n × n random symmetric matrix as above, and u is some deterministic unit-norm vector. Determine the value of $\lambda_0$ for which a phase transition occurs. What is the correlation between the top eigenvector of $W + \lambda_0 u u^T$ and the vector u as a function of $\lambda_0$? Use techniques similar to the ones we used in class for analyzing finite rank perturbations of sample covariance matrices.
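Both parts of this problem lend themselves to a quick numerical check. The Python sketch below assumes entries with variance 1/(4n), which is the scaling that gives a semi-circle supported on [−1,1]. The limits in `perturbation_theory` are the standard rank-one perturbation results for that scaling and are included as an assumption for comparison, not as a derivation; all function names and the seed are hypothetical.

```python
import numpy as np


def wigner_matrix(n, rng):
    """Symmetric matrix with i.i.d. N(0, 1/(4n)) entries on and above the diagonal."""
    A = rng.normal(scale=np.sqrt(1.0 / (4 * n)), size=(n, n))
    # Keep the upper triangle (with diagonal) and mirror it below the diagonal.
    return np.triu(A) + np.triu(A, 1).T


def semicircle_density(t):
    """Semi-circle density (2 / pi) * sqrt(1 - t^2) on [-1, 1], zero outside."""
    t = np.asarray(t, dtype=float)
    return (2 / np.pi) * np.sqrt(np.clip(1 - t**2, 0, None))


def top_eigenpair_perturbed(n, lam0, seed=0):
    """Top eigenvalue of W + lam0 u u^T and squared overlap with u = e_1."""
    rng = np.random.default_rng(seed)
    W = wigner_matrix(n, rng)
    u = np.zeros(n)
    u[0] = 1.0
    evals, evecs = np.linalg.eigh(W + lam0 * np.outer(u, u))
    return evals[-1], float(evecs[:, -1] @ u) ** 2


def perturbation_theory(lam0):
    """Assumed limits for this scaling: (top eigenvalue, squared correlation)."""
    if lam0 <= 0.5:
        # Below the threshold the top eigenvalue stays at the bulk edge.
        return 1.0, 0.0
    return lam0 + 1 / (4 * lam0), 1 - 1 / (4 * lam0**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    evals = np.linalg.eigvalsh(wigner_matrix(400, rng))
    hist, edges = np.histogram(evals, bins=20, range=(-1.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    gap = np.max(np.abs(hist - semicircle_density(centers)))
    print(f"max histogram deviation from the semi-circle: {gap:.3f}")
    for lam0 in (0.25, 0.75, 1.0, 2.0):
        lam_hat, corr2_hat = top_eigenpair_perturbed(800, lam0)
        lam_th, corr2_th = perturbation_theory(lam0)
        print(f"lam0={lam0}: eigenvalue {lam_hat:.3f} vs {lam_th:.3f}, "
              f"corr^2 {corr2_hat:.3f} vs {corr2_th:.3f}")
```

Taking u = e_1 loses no generality for Gaussian entries of this kind only approximately, since the diagonal and off-diagonal variances are equal here; it is used purely to keep the sketch short.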

[Some Hints about homework] For Wigner Matrix ), the answer is

    
