Task 1: Let $X \in \mathbb{R}^{p \times N}$, $p < N$, be a centered matrix of $N$ $p$-dimensional data samples (one sample per column) with the full-size SVD $X = U \Sigma V^\top$. Assume that the singular values are sorted in descending order, i.e. $\sigma_{1,1} \ge \ldots \ge \sigma_{p,p}$.
• Provide a normalized vector $\hat{s} \in \mathbb{R}^p$ such that
$$\hat{s} = \operatorname*{arg\,max}_{s \,:\, \|s\| = 1} \; s^\top \Sigma \Sigma^\top s.$$
• Show that the empirical variance of the inner products of the columns of $X$ with a normalized vector $a$,
$$\frac{1}{N} \sum_{i=1}^{N} \bigl(a^\top x_i\bigr)^2,$$
where $x_i$ denotes the $i$-th column of $X$ (since $X$ is centered, the mean of these inner products is zero), is maximized when $a$ is set to the first column of $U$, i.e. $a = u_1$ (note that $\|a\| = 1$).
Hint: Write $a$ as a linear combination of the columns of $U$ and verify that such a representation does not affect the norm constraint. (A sketch of this route is given below.)
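For reference, a minimal outline of the derivation the hint points to (with $x_i$ the $i$-th column of $X$ and $e_1$ the first standard basis vector; a sketch, not the full write-up):

```latex
% Write a = Uc. Since U is orthogonal, \|a\| = \|Uc\| = \|c\| = 1,
% so the norm constraint is unchanged. Then
\frac{1}{N}\sum_{i=1}^{N}\bigl(a^\top x_i\bigr)^2
  = \frac{1}{N}\, a^\top X X^\top a
  = \frac{1}{N}\, c^\top \Sigma \Sigma^\top c
  = \frac{1}{N}\sum_{i=1}^{p} \sigma_{i,i}^2\, c_i^2,
% which, since \sigma_{1,1} \ge \ldots \ge \sigma_{p,p}, is maximized
% over \|c\| = 1 by c = e_1, giving a = U e_1 = u_1. The first bullet
% is the same maximization with s in place of c, so \hat{s} = e_1.
```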
Task 2: For this task, download the modified version of the Yale Face Database B provided on Moodle (task2_data.zip). The Yale Face Database B consists of single-light-source images of 10 subjects, each seen in different poses and illumination conditions. In the provided form, the database is divided into 5 subsets. In subset 0 the subject is illuminated by an almost frontal light source, while for subsets 1-4 the light source is gradually moved along the horizon. Subset 0 will serve as the training set, while subsets 1-4 are used for testing.
• Write a function that takes as input a matrix T of vectorized images from subset 0 and returns the first 20 left singular vectors U[:,1], ..., U[:,20]. Display the first 3 of these vectors as images, i.e., reshape them to size 50×50 and display them (a sketch is given after this list).
• Write a function that takes as input: the training set T (a matrix composed of vectorized pictures from subset 0); a vector containing the labels of the training set (i.e., if the i-th sample belongs to class j, the i-th entry of the labels vector is j; in this exercise j is an integer between 1 and 10); the test samples S (a matrix composed of vectorized pictures from subsets 1-4) and the corresponding labels (in a separate vector); the 20 singular vectors from the first step; and the parameter k that denotes how many of the PCs are used. Use the Euclidean distance to classify each test image based on its three nearest neighbors, comparing the test samples with the training samples in the reduced space. As output, return the fraction of images from S that were misclassified, i.e., the error rate. Repeat this for subsets 1 through 4 and for k = 1, ..., 20, and plot the error rate for each subset (a sketch of the classifier is given after this list).
• Repeat the above experiment without using the first three singular vectors, i.e., use k = 1, ..., 17 singular vectors starting from the 4th (see the last sketch below). Plot the error rate as before. How do you explain the difference in recognition rate?
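For the first bullet, a minimal NumPy sketch (assumptions: images are stored one per column of T, the function name and the placeholder data are illustrative, and the reshape order must match how the images were vectorized):

```python
import numpy as np
import matplotlib.pyplot as plt

def first_singular_vectors(T, n_vectors=20):
    """Return the first n_vectors left singular vectors of T,
    where each column of T is one vectorized 50x50 image."""
    # Depending on the convention used in class, you may want to
    # subtract the mean image first (Task 1 assumes centered data).
    # Economy-size SVD; columns of U are sorted by descending singular value.
    U, _, _ = np.linalg.svd(T, full_matrices=False)
    return U[:, :n_vectors]

T = np.random.rand(2500, 50)  # placeholder; load subset 0 from task2_data.zip
U20 = first_singular_vectors(T)

# Display the first three singular vectors as 50x50 images.
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(U20[:, i].reshape(50, 50), cmap="gray")
    plt.axis("off")
plt.show()
```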
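For the second bullet, a sketch of the 3-nearest-neighbor classifier in the reduced space (assumptions: the label vectors are NumPy integer arrays, T and S hold one image per column, and the mean subtraction mirrors the centering convention chosen above):

```python
import numpy as np

def knn_error_rate(T, train_labels, S, test_labels, U20, k):
    """Classify each column of S by a majority vote of its 3 nearest
    training samples (Euclidean distance) in the space spanned by the
    first k of the 20 singular vectors; return the error rate."""
    Uk = U20[:, :k]
    mu = T.mean(axis=1, keepdims=True)         # mean training image
    train_proj = Uk.T @ (T - mu)               # shape (k, n_train)
    test_proj = Uk.T @ (S - mu)                # shape (k, n_test)
    errors = 0
    for j in range(test_proj.shape[1]):
        # Euclidean distances to all training samples in the reduced space.
        dists = np.linalg.norm(train_proj - test_proj[:, j:j + 1], axis=0)
        nearest = train_labels[np.argsort(dists)[:3]]
        prediction = np.bincount(nearest).argmax()   # majority vote
        errors += int(prediction != test_labels[j])
    return errors / test_proj.shape[1]

# Per subset, e.g.:
# rates = [knn_error_rate(T, y_train, S1, y_S1, U20, k) for k in range(1, 21)]
# plt.plot(range(1, 21), rates)
```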
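For the third bullet, the only change to the classifier sketch above is which columns of U20 are projected onto: replace the line `Uk = U20[:, :k]` with

```python
# Skip the first three singular vectors: with 0-based indexing,
# columns 3 .. 3+k-1 are the 4th through (k+3)-th vectors, k = 1, ..., 17.
Uk = U20[:, 3:3 + k]
```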