$25
1. PCA experiments
(a) Set up data matrix X = (x1,...,xn) ∈ Rp×n;
(b) Compute the sample mean ˆµn and form X˜ = X − eµˆTn;
(c) Compute top k SVD of X˜ = USkV T;
(d) Plot eigenvalue curve, i.e. i vs. λi(Σˆn)/tr(Σˆn) (i = 1,...,k), with top-k eigenvalue λi for sample covariance matrix , which gives you explained variation of data by principal components;
(e) Use imshow to visualize the mean and top-k principle components as left singular vectors U = [u1,...,uk];
(f) For k = 1, order the images (xi) (i = 1,...,n) according to the top first right singular vector, v1, in an ascending order of v1(i);
(g) For k = 2, scatter plot (v1,v2) and select a grid on such a plane to show those images on the grid (e.g. Figure 14.23 in book [ESL]: Elements of Statistical Learning).
(h) * You may try the parallel analysis with permutation test to see how many significantprinciple components you will obtain.
2. MDS of cities: Go to the following website http://www.geobytes.com/citydistancetool.htm Perform the following experiment.
(a) Input a few cities (no less than 7) in your favorite, and collect the pairwise air traveling distances shown on the website in to a matrix D;
(b) Make your own codes of Multidimensional Scaling algorithm for D;
(c) Plot the normalized eigenvalues λi/(Pi λi) in a descending order of magnitudes, analyze your observations (did you see any negative eigenvalues? if yes, why?);
1
Homework 1. PCA and MDS 2
(d) Make a scatter plot of those cities using top 2 or 3 eigenvectors, and analyze yourobservations.
3. Positive Semi-definiteness: Recall that a n-by-n real symmetric matrix K is called positive semi-definite (p.s.d. or 0) iff for every x ∈ Rn, xTKx ≥ 0.
(a) Show that 0 if and only if its eigenvalues are all nonnegative.
(b) Show that dij = Kii+Kjj −2Kij is a squared distance function, i.e. there exists vectors ui,vj ∈ Rn (1 ≤ i,j ≤ n) such that dij = kui − ujk2.
(c) Let α ∈ Rn be a signed measure s.t. Pi αi = 1 (or eTα = 1) and Hα = I − eαT be the Householder centering matrix. Show that 0 for matrix D = [dij].
(d) If 0 and ), show that 0 (elementwise sum), and 0 (Hadamard product or elementwise product).
4. Distance: Suppose that d : Rp × Rp → R is a distance function.
(a) Is d2 a distance function? Prove or give a counter example.
√
(b) Is d a distance function? Prove or give a counter example.
5. ∗Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference to the singular value decomposition is Chapter 2 in this book:
Matrix Computations, Golub and Van Loan, 3rd edition.
(a) Existence: Prove the existence of the singular value decomposition. That is, show that if A is an m×n real valued matrix, then A = UΣV T, where U is m×m orthogonal matrix, V is n×n orthogonal matrix, and Σ = diag(σ1,σ2,...,σp) (where p = min{m,n}) is an m×n diagonal matrix. It is customary to order the singular values in decreasing order: σ1 ≥ σ2 ≥ ... ≥ σp ≥ 0. Determine to what extent the SVD is unique. (See Theorem 2.5.2, page 70 in Golub and Van Loan).
(b) Best rank-k approximation - operator norm: Prove that the “best” rank-k approximation of a matrix in the operator norm sense is given by its SVD. That is, if A = UΣV T is the SVD of A, then Ak = UΣkV T (where Σk = diag(σ1,σ2,...,σk,0,...,0) is a diagonal matrix containing the largest k singular values) is a rank-k matrix that satisfies
kA − Akk = min kA − Bk.
rank(B)=k
(Recall that the operator norm of A is kAk = maxkxk=1 kAxk. See Theorem 2.5.3 (page 72) in Golub and Van Loan).
(c) Best rank-k approximation - Frobenius norm: Show that the SVD also provides the best rank-k approximation for the Frobenius norm, that is, Ak = UΣkV T satisfies
kA − AkkF = min kA − BkF.
rank(B)=k
Homework 1. PCA and MDS 3
(d) Schatten p-norms: A matrix norm k · k that satisfies
kQAZk = kAk,
for all Q and Z orthogonal matrices is called a unitarily invariant norm. The Schatten p-norm of a matrix A is given by the `p norm (p ≥ 1) of its vector of singular values, namely,
.
Show that the Schatten p-norm is unitarily invariant. Note that the case p = 1 is sometimes called the nuclear norm of the matrix, the case p = 2 is the Frobenius norm, and p = ∞ is the operator norm.
(e) Best rank-k approximation for unitarily invariant norms: Show that the SVD provides the best rank-k approximation for any unitarily invariant norm. See also 7.4.51 and
7.4.52 in:
Matrix Analysis, Horn and Johnson, Cambridge University Press, 1985.
(f) Closest rotation: Given a square n × n matrix A whose SVD is A = UΣV T, show that its closest (in the Frobenius norm) orthogonal matrix R (satisfying RRT = RTR = I) is given by R = UV T. That is, show that
,
where A = UΣV T. In other words, R is obtained from the SVD of A by dropping the diagonal matrix Σ. Use this observation to conclude what is the optimal rotation that aligns two sets of points p1,p2,...,pn and q1,...,qn in Rd, that is, find R that minimizes . See also (the papers are posted on course website):
• [Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D., “Least-squares fitting of two 3-D point sets”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (5), pp. 698–700, 1987.
• [Keller75] Keller, J. B., “Closest Unitary, Orthogonal and Hermitian Operators to a Given Operator”, Mathematics Magazine, 48 (4), pp. 192–197, 1975.
• [FanHoffman55] Fan, K. and Hoffman, A. J., “Some Metric Inequalities in the Space of Matrices”, Proceedings of the American Mathematical Society, 6 (1), pp. 111–116, 1955.