1. Suppose A is an n-by-n symmetric, rank-one matrix with right singular vectors v_k, k = 1, 2, ..., n. Show that the power method converges to v_1 and determine the number of iterations needed to converge to within 1% of v_1 starting from an initial vector b.
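A minimal MATLAB sketch of the power iteration in problem 1, run on a made-up rank-one matrix (the dimension n, the vector v1, and the singular value sigma1 below are illustrative assumptions, not assignment data):

```matlab
% Power method on a synthetic symmetric, rank-one matrix (illustrative only).
n = 5;
v1 = randn(n,1);  v1 = v1/norm(v1);    % unit vector playing the role of v_1
sigma1 = 3;                             % example nonzero singular value
A = sigma1*(v1*v1');                    % symmetric, rank-one matrix

b = randn(n,1);                         % initial vector (assumed not orthogonal to v_1)
for k = 1:5
    b = A*b;                            % power method update
    b = b/norm(b);                      % renormalize
    err = norm(b - sign(v1'*b)*v1);     % distance to v_1, up to sign
    fprintf('iteration %d: error = %.2e\n', k, err);
end
```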
2. Use the starter script available for download on the course page. The script loads the 1000 three-dimensional data points in sdata.csv into a 1000-by-3 matrix X. The second section of the code displays the data in a scatter plot. We wish to approximate the data using a subspace. Use the rotate tool to view the data cloud from different perspectives.
a) Does the data appear to lie in a low-dimensional subspace? Why or why not? Remember the definition of a subspace.
b) What could you do to the data so that it lies (approximately) in a low-dimensional subspace?
c) The third section of the code removes the mean (average) value of the 1000 data points. Use the rotate tool to inspect the scatterplot of the data with the mean removed. Does the mean-removed data appear to lie in a low-dimensional subspace?
d) The fourth section of the code finds the SVD of the mean-removed data. If a is a unit-norm vector representing the best one-dimensional subspace for the mean-removed data, complete the line of code to define a in terms of the SVD matrices. The fifth section displays the one-dimensional subspace with the mean-removed data. Note that the length of the vector representing the subspace is scaled by the root-mean-squared value of the data for display purposes. Use the rotate tool to inspect the relationship between the subspace and the data, and comment on how well a one-dimensional subspace captures the data. (Illustrative MATLAB sketches of the computations in parts (d)–(k) appear after part (k).)
e) Let xz_i, i = 1, 2, ..., 1000, be the individual mean-removed data points and a the unit-norm vector representing the best one-dimensional subspace for the data. Thus, xz_i ≈ a w_i. Find w_i in terms of the SVD matrices U, S, and V.
f) Now write the original data x_i, i = 1, 2, ..., 1000, as x_i ≈ a w_i + b. What is b?
g) Let E be the difference between X and the rank-one approximation. Find a mathematical expression for ||E||_F^2 in terms of the singular values of the mean-removed data Xz.
h) Now try a rank-two approximation. Use the SVD to find an orthonormal basis for the best plane containing the mean-removed data. Display the mean-removed data and the basis vectors for the plane.
i) Your rank-two approximation for the original data is x_i ≈ a_1 w1_i + a_2 w2_i + b, i = 1, 2, ..., 1000. Express w2_i, i = 1, 2, ..., 1000, in terms of the SVD of the mean-removed data matrix Xz. Display a scatter plot of the original data (in red) and the rank-two approximations (in blue). Does the rank-two approximation lie in a plane? Does that plane capture the dominant components of the data?
j) Let E be the difference between X and the rank-two approximation. Find a mathematical expression for ||E||_F^2 in terms of the singular values of the mean-removed data Xz.
k) Find and compare the numerical values of ||E||_F^2 for the rank-one and rank-two approximations.
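For reference, here is a minimal MATLAB sketch of the computations behind parts (c)–(g), written independently of the course starter script. The variable names X, Xz, a, and w follow the problem statement, and loading the data with csvread is an assumption (the starter script may do this differently):

```matlab
% Mean removal, SVD, and the best one-dimensional subspace (parts (c)-(g)).
X  = csvread('sdata.csv');               % 1000-by-3 data matrix (or reuse the starter script's X)
mu = mean(X,1);                           % 1-by-3 mean of the data points
Xz = X - repmat(mu, size(X,1), 1);        % mean-removed data

[U,S,V] = svd(Xz, 'econ');                % economy-size SVD, Xz = U*S*V'
a = V(:,1);                               % unit-norm vector spanning the best 1-D subspace
w = U(:,1)*S(1,1);                        % weights so that Xz(i,:)' is approximately a*w(i)

X1 = w*a' + repmat(mu, size(X,1), 1);     % rank-one approximation of the original data
E1 = X - X1;                              % approximation error
fprintf('rank-one ||E||_F^2 = %.4f\n', norm(E1,'fro')^2);
```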
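And a continuation sketching parts (h)–(k), reusing X, mu, Xz, U, S, and V from the sketch above:

```matlab
% Rank-two approximation and error comparison (parts (h)-(k)).
A2 = V(:,1:2);                            % orthonormal basis for the best plane
W2 = U(:,1:2)*S(1:2,1:2);                 % weights w1, w2 for each data point
X2 = W2*A2' + repmat(mu, size(X,1), 1);   % rank-two approximation of the original data

figure; hold on;
scatter3(X(:,1),  X(:,2),  X(:,3),  20, 'r');   % original data in red
scatter3(X2(:,1), X2(:,2), X2(:,3), 20, 'b');   % rank-two approximation in blue
hold off; grid on;

E2 = X - X2;
fprintf('rank-two ||E||_F^2 = %.4f\n', norm(E2,'fro')^2);
```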
3. Consider the face emotion classification problem studied previously. Design and compare the performance of the classifiers proposed in parts (a) and (b) below. In each case, divide the dataset into 8 equal-sized subsets (e.g., examples 1–16, 17–32, etc.). Use 6 subsets of the data to estimate w for each choice of the regularization parameter, then select the best value of the regularization parameter by estimating the error on one of the two subsets held out from training, and finally use the w corresponding to the best value of the regularization parameter to predict the labels of the remaining held-out subset. Count the number of mistakes made on this hold-out set and divide by 16 (the size of the subset) to estimate the error rate. Note there are 8 × 7 = 56 different choices of the two hold-out subsets, so repeat this process 56 times and average the error rates across the 56 cases to obtain a final estimate.
a) Truncated SVD. Use the pseudo-inverse V Σ_r^{-1} U^T, where Σ_r^{-1} is computed by inverting only the r largest singular values (and setting the rest to zero). Hence the regularization parameter r takes values r = 1, 2, ..., 9.
b) Ridge Regression. Let ŵ_λ = argmin_w ||y − Xw||_2^2 + λ||w||_2^2, for the following values of the regularization parameter: λ = 0, 2^{-1}, 2^0, 2^1, 2^2, 2^3, and 2^4. Show that ŵ_λ can be computed using the SVD, and use this fact in your code.
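A minimal MATLAB sketch of the two solvers in parts (a) and (b), assuming X (an n-by-p feature matrix) and y (an n-by-1 label vector) come from the face emotion dataset; the 56-way hold-out procedure described above would wrap around these computations:

```matlab
% Regularized least-squares solvers via the SVD (illustrative sketch).
[U,S,V] = svd(X, 'econ');                 % X = U*S*V'
s = diag(S);                              % singular values

% (a) Truncated SVD: invert only the r largest singular values.
r = 3;                                    % example value of the regularization parameter
w_tsvd = V(:,1:r) * ((U(:,1:r)'*y) ./ s(1:r));

% (b) Ridge regression: w = V * diag(s./(s.^2 + lambda)) * U' * y.
lambda = 2;                               % example value of lambda
w_ridge = V * ((s ./ (s.^2 + lambda)) .* (U'*y));

% Predictions on a hold-out set would then be sign(Xholdout*w), compared
% against the true labels to count mistakes.
```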