Introduction
In this assignment we will first work with a simple 2D example of the direct PCA method for ease of visualization. We will then experiment with the dual PCA method in the case of compact face description and try out some of the nice properties of such a description. More about the theory of PCA and its applications can be found in [1] (section 22.3.1, pages 507 to 515).
Exercise 1: Direct PCA method
The primary purpose of the PCA method is reduction of dimensionality. This means that we want to find a linear transformation of the input data that describes the data in a low-dimensional orthogonal space (also called a subspace) while losing as little as possible of the information needed to reconstruct the data. A subspace is defined by its basis vectors. Let us first review the direct calculation of the basis vectors with the PCA method (Algorithm 1). Here we assume that each input entry is structured as a column vector of size m × 1.
In Algorithm 1 we can see that the first steps of the PCA method compute the mean value and the covariance matrix of the data. These are the two parameters that define a Gaussian distribution. As we will refer to this distribution later on, we should give it a little more attention. We already introduced the Gaussian distribution in Assignment 2, where we used a special case of such a distribution to construct a Gaussian kernel. In that case the distribution had a mean value of zero and a diagonal covariance matrix with equal variance along the x and y axes.

Algorithm 1: The direct PCA algorithm.
1: Build a matrix X of size m × N so that we combine N input entries as a sequence: X = [x1,x2,...,xN]
2: Calculate the mean value of the data: µ = (1/N) ∑ xi
3: Center the data: Xd = [x1 − µ,x2 − µ,...,xN − µ]
4: Compute the covariance matrix: C = (1/(N − 1)) Xd Xd^T
5: Compute the SVD of the covariance matrix: C = U S V^T. The matrix U is of size m × m and its columns represent the eigenvectors of the matrix C. The corresponding eigenvalues are stored on the diagonal of the matrix S and are sorted in descending order. The columns of U therefore represent an orthogonal basis (subspace), while the eigenvalues tell us how strongly the data is scattered along the corresponding eigenvectors.
6: A new input entry xq is projected into the PCA space (subspace) by first centering it using µ and then multiplying it by the matrix of eigenvectors: y = U^T (xq − µ).
7: The transformation from the subspace back to the original space is done using the equation: xq = U y + µ.
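Below is a minimal NumPy sketch of Algorithm 1 for reference. The 1/(N − 1) normalisation of the covariance matrix is an assumption made here; any positive scale factor yields the same eigenvectors.

```python
import numpy as np

def direct_pca(X):
    """Direct PCA (Algorithm 1). X is an m x N matrix with one input entry per column."""
    mu = np.mean(X, axis=1, keepdims=True)        # step 2: mean value of the data
    Xd = X - mu                                   # step 3: center the data
    C = Xd @ Xd.T / (X.shape[1] - 1)              # step 4: covariance matrix (m x m)
    U, S, VT = np.linalg.svd(C)                   # step 5: eigenvectors in U, eigenvalues in S (descending)
    return U, S, mu

def to_pca(x, U, mu):
    """Step 6: project an input entry into the PCA subspace."""
    return U.T @ (x - mu)

def from_pca(y, U, mu):
    """Step 7: transform a point from the subspace back to the original space."""
    return U @ y + mu
```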
Figure 1: (a) a 3D plot of a 2D Gaussian distribution, (b) the same distribution shown in 2D using an ellipse.
The general formula for a Gaussian distribution is

p(x) = 1 / ((2π)^(d/2) |Σ|^(1/2)) · exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ)),   (1)

where x is a d-dimensional vector, µ is the mean vector and Σ is the covariance matrix (in our case d = 2).
Figure 1(a) shows an example of a Gaussian distribution where the values on the z axis give the probability density p(x,y). Figure 1(b) shows an alternative illustration, where the ellipse marks the border that contains 68% of the probability density. The points that were used to compute the mean value and the covariance of the drawn Gaussian distribution are also shown in the right image.
How to calculate eigenvectors and eigenvalues analytically?
Let us briefly review how to analytically calculate the eigenvectors and eigenvalues of a square matrix C of size m × m. Consider the matrix Z that we get if we multiply the identity matrix of size m × m by a constant, e.g., Z = λI. If subtracting this matrix from C produces a matrix with determinant 0, we have transformed C into a singular matrix:
|C − λI| ≡ 0. (2)
The values λ that satisfy the equation above are called the eigenvalues of the matrix C. For an m × m matrix we can compute m eigenvalues λi. The product of the eigenvalues equals the determinant of the matrix: det(C) = λ1 λ2 ··· λm. Each eigenvalue λi corresponds to an eigenvector xi. The eigenvectors are mutually perpendicular (the dot product of two eigenvectors is 0).
For the eigenvalue λi we compute the eigenvector xi of size m × 1 by setting up the following equation:
(C − λiI)xi ≡ 0. (3)
The solution of the equation gives us the elements of the eigenvector only up to scale, i.e., the equation is satisfied by infinitely many vectors that differ only in scale. In practice we set the first element to 1 and then calculate the remaining elements of the vector. In the end we rescale the vector so that its length becomes 1 (a unit vector). The unit eigenvector for λi is defined as ei = xi/||xi||.
Solve the following assignment by hand for practice: You are given four points A(3,4), B(3,6), C(7,6) and D(6,4). Calculate the eigenvectors and eigenvalues for the given set of points.
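If you want to check your hand calculation, a quick NumPy cross-check is sketched below; the 1/(N − 1) sample covariance is an assumption, and a different scale only rescales the eigenvalues.

```python
import numpy as np

pts = np.array([[3, 4], [3, 6], [7, 6], [6, 4]], dtype=float).T   # 2 x 4, points as columns
Xd = pts - pts.mean(axis=1, keepdims=True)                        # centered data
C = Xd @ Xd.T / (pts.shape[1] - 1)                                # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)                              # eigh: C is symmetric, ascending order
print("eigenvalues:", eigvals[::-1])                              # descending, to match the PCA convention
print("eigenvectors (columns):\n", eigvecs[:, ::-1])
```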
A simple example of a direct PCA
(a) Write a script that calculates and visualizes PCA on the 2D data from the file points.txt (as always, the first column contains the x coordinates and the second column the y coordinates). Plot the points and draw the representation of the Gaussian distribution using drawEllipse from the supplementary material. Follow Algorithm 1 to compute the eigenvectors and eigenvalues of the PCA subspace.
(b) The matrix U contains the eigenvectors that represent the basis of our PCA subspace. Draw the eigenvectors on the plot from the previous task. Their origin should lie at the mean of the data µ. Since both vectors have length 1, a better way of visualizing them is to multiply each vector by the corresponding eigenvalue[1]. Draw the first eigenvector in red and the second in green (a plotting sketch is given after the question below).
Question: What do you notice about the relationship between the eigenvectors and the data? What happens to the eigenvectors if you change the data or add more points?
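One possible way to draw the two scaled eigenvectors, assuming the `direct_pca` helper from the Algorithm 1 sketch above; following the footnote, the vectors are scaled by the square root of the eigenvalues so that they show the standard deviation along each direction.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt("points.txt").T            # 2 x N matrix, one point per column
U, S, mu = direct_pca(X)                  # helper from the Algorithm 1 sketch

plt.scatter(X[0], X[1])
for i, color in enumerate(["red", "green"]):
    v = U[:, i] * np.sqrt(S[i])           # standard deviation along the i-th eigenvector
    plt.plot([mu[0, 0], mu[0, 0] + v[0]], [mu[1, 0], mu[1, 0] + v[1]], color=color)
plt.gca().set_aspect("equal")
plt.show()
```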
(c) The eigenvalues represent the reconstruction error we incur if we discard the corresponding eigenvector. Plot the cumulative graph of the eigenvalues and normalize it so that the largest value is 1. From the graph, determine what percentage of the information we retain if we discard the second eigenvector. Put differently, what percentage of the variance is explained by the first eigenvector alone?
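A small sketch of the cumulative plot, continuing with `S` from the previous snippet:

```python
cumulative = np.cumsum(S) / np.sum(S)                 # normalised so the last value is 1
plt.bar(range(1, len(S) + 1), cumulative)
plt.show()
print("variance explained by the first eigenvector: %.1f %%" % (100 * cumulative[0]))
```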
(d) Now remove the direction of the lowest variance from the input data. This means that we project the data onto the subspace of the first eigenvector. We can do this by transforming the data into the PCA space and setting the components corresponding to the eigenvectors we want to remove to 0. The resulting points can then be transformed back to the original space. Project each of the input points to the PCA space, then project them back to Cartesian space by multiplying them with the diminished matrix U (a sketch follows after the question below).
Question: What happens to the reconstructed points? Where is the data projected to?
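A sketch of the reconstruction, reusing `X`, `U` and `mu` from the snippets above. Here the unwanted component is simply set to 0 in the PCA space, which gives the same result as multiplying with the diminished matrix U.

```python
Y = U.T @ (X - mu)          # project all points into the PCA space
Y[1, :] = 0                 # discard the direction with the lowest variance (second eigenvector)
X_rec = U @ Y + mu          # transform back to the original space

plt.scatter(X[0], X[1], label="original")
plt.scatter(X_rec[0], X_rec[1], label="reconstructed")
plt.legend(); plt.gca().set_aspect("equal"); plt.show()
```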
(e) For the point qpoint = [3, 6]^T, calculate the closest point from the input data (using Euclidean distance). Which point is the closest? Then, project all the points (including qpoint) to PCA subspace and remove the variation in the direction of the second vector. Calculate the distances again. Which point is the closest to qpoint now? Visualize the reconstruction.
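A possible sketch for the distance comparison, reusing the variables from the snippets above; `q` stands for qpoint.

```python
q = np.array([[3.0], [6.0]])                          # the query point qpoint
d_before = np.linalg.norm(X - q, axis=0)              # Euclidean distances in the original space
print("closest point before projection:", X[:, np.argmin(d_before)])

Z = np.hstack([X, q])                                 # project all points, including the query
Y = U.T @ (Z - mu)
Y[1, :] = 0                                           # remove the variation along the second eigenvector
Z_rec = U @ Y + mu
d_after = np.linalg.norm(Z_rec[:, :-1] - Z_rec[:, -1:], axis=0)
print("closest point after projection:", X[:, np.argmin(d_after)])
```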
Exercise 2: The dual PCA method
When analyzing image data, the dimensionality can become very large (much larger than the number of input samples), and the direct method of computing the eigenvectors that we used previously is no longer suitable. E.g., for 10000-dimensional data we would have to store a covariance matrix of size 10000 × 10000, which comes close to the limits of computer memory. Since the number of data samples N is lower than the dimensionality, we can use the dual PCA method instead. The main difference from the direct approach is in steps 4 and 5 (compare Algorithm 1 to Algorithm 2).
(a) For our purposes it is only necessary to calculate the eigenvectors and eigenvalues correctly up to a scale factor. Implement the dual method according to Algorithm 2 and test it on the data from points.txt. Verify whether you get the same eigenvectors (matrix U) as with Algorithm 1. The first two eigenvectors should be the same as with Algorithm 1. Algorithm 2 gives you a larger matrix U; however, all eigenvectors except the first two are equal to zero. For the projection to and from the subspace you can leave the matrix U as it is.
(b) Project the data from the previous assignment to the PCA space using the matrix U, and then project it back to the original space. If you have implemented the method correctly, you will get the original data back (up to numerical error).
Algorithm 2: The dual PCA algorithm.
1: Build a matrix X of size m × N so that we combine N input entries as a sequence: X = [x1,x2,...,xN]
2: Calculate the mean value of the data: µ = (1/N) ∑ xi
3: Center the data: Xd = [x1 − µ,x2 − µ,...,xN − µ]
4: Compute the dual covariance matrix: C̃ = (1/(N − 1)) Xd^T Xd (of size N × N).
5: Compute the singular value decomposition (SVD) of the dual covariance matrix: C̃ = U S V^T. Then compute the basis of the eigenvector space as U = Xd U (S (N − 1))^(−1/2).
Note: it is expected that only the first min(m, N) columns of the eigenvector matrix will differ from zero.
6: A new input entry xq is projected into the PCA space (subspace) by first centering it using µ and then multiplying it by the matrix of eigenvectors: y = U^T (xq − µ).
7: The transformation from the subspace to the original space is done using equation: xq = Uy + µ.
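Below is a minimal NumPy sketch of Algorithm 2. The 1/(N − 1) scaling and the tiny constant added to the eigenvalues (see the note in Exercise 3) are assumptions made for numerical convenience; they only affect the scale of the result. Projection to and from the subspace is done exactly as in Algorithm 1 (steps 6 and 7).

```python
import numpy as np

def dual_pca(X):
    """Dual PCA (Algorithm 2). X is an m x N matrix with one input entry per column."""
    m, N = X.shape
    mu = np.mean(X, axis=1, keepdims=True)                 # step 2: mean value of the data
    Xd = X - mu                                            # step 3: center the data
    C_dual = Xd.T @ Xd / (N - 1)                           # step 4: dual covariance matrix (N x N)
    U_dual, S, VT = np.linalg.svd(C_dual)                  # step 5: SVD of the dual covariance matrix
    S = S + 1e-15                                          # avoid division by zero for near-zero eigenvalues
    U = Xd @ U_dual @ np.diag(1.0 / np.sqrt(S * (N - 1)))  # basis of the eigenvector space (m x N)
    return U, S, mu
```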
Exercise 3: Image decomposition examples
Here we will use the dual PCA method on a computer vision problem. In the supplementary material there are three series of images. Each series contains 64 images of a face under different lighting conditions. Your task will be to analyze each series using the PCA method.
(a) Data preparation: First, we have to formulate the problem in a way that is compatible with the PCA method. Since PCA operates on points in a vector space, we can represent a grayscale image of size m × n as a point in mn-dimensional space by reshaping it into a vector. Write a function that reads all the images from one of the series, transforms them into grayscale, reshapes them using np.reshape and stacks the resulting column vectors into a matrix of size mn × 64 (one image per column).
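A possible preparation function is sketched below. The directory layout is an assumption, OpenCV is used only for reading and grayscale conversion, and the image shape is returned as well so that vectors can later be reshaped back into images.

```python
import os
import numpy as np
import cv2
from matplotlib import pyplot as plt

def prepare_face_matrix(directory):
    """Read all images in `directory`, convert them to grayscale and stack them as columns."""
    columns, shape = [], None
    for name in sorted(os.listdir(directory)):
        img = cv2.imread(os.path.join(directory, name))
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
        shape = gray.shape                                 # (m, n), remembered for reshaping back
        columns.append(np.reshape(gray, -1))               # flatten the m x n image into a vector
    return np.stack(columns, axis=1), shape                # matrix of size (m*n) x 64 and the image shape
```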
(b) Using dual PCA: Use dual PCA on the vectors of images. Write a function that takes the matrix of image vectors as input and returns the eigenvectors of the PCA subspace and the mean of the input data.
Note: In step 5 of Algorithm 2, be careful when computing the inverse of S, as some of the eigenvalues can be very close to 0 and division by zero causes numerical errors when computing a matrix inverse. Take into account that S is a diagonal matrix and must therefore have non-zero diagonal elements. One way of avoiding this numerical problem is to add a very small constant to the diagonal elements, e.g. 10^−15.
Transform the first five eigenvectors back into matrices using the np.reshape function and display them as images. What do the resulting images represent (both numerically and in the context of faces)?
Project the first image from the series to the PCA space and then back again. Is the result the same? What do you notice when you set one component of the vector in the image space (e.g. the component with index 4074) to 0 and display the image? Repeat a similar procedure for a vector in the PCA space (project the image to the PCA space, set one of the first five components to zero, project the vector back to the image space and display it as an image). What is the difference? How many pixels are changed by the first operation and how many by the second[2]?
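A sketch of these experiments, assuming the `prepare_face_matrix` and `dual_pca` helpers above; the directory name is a placeholder for your first series.

```python
X, shape = prepare_face_matrix("faces/1")     # placeholder path for the first series
U, S, mu = dual_pca(X)

# The first five eigenvectors reshaped back into images.
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(np.reshape(U[:, i], shape), cmap="gray")
    plt.axis("off")
plt.show()

x = X[:, [0]]                                 # first image of the series as a column vector
y = U.T @ (x - mu)                            # project to the PCA space ...
x_rec = U @ y + mu                            # ... and back to the image space

x_img = x.copy(); x_img[4074] = 0             # zero one component in the image space
y_pca = y.copy(); y_pca[1] = 0                # zero one of the first five components in the PCA space
x_pca = U @ y_pca + mu

for i, img in enumerate([x_rec, x_img, x_pca]):
    plt.subplot(1, 3, i + 1)
    plt.imshow(np.reshape(img, shape), cmap="gray")
    plt.axis("off")
plt.show()
```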
(c) Effect of the number of components on the reconstruction: Take a random image and project it into the PCA space. Then change the vector in the PCA space by retaining only the first 32 components and setting the remaining components to 0. Project the resulting vector back to the image space and display it as an image. Repeat the procedure for the first 16, 8, 4, 2 and 1 components. Display the resulting images together in one figure. What do you notice?
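One way to produce the series of reconstructions, reusing `X`, `U`, `mu` and `shape` from the sketch above:

```python
idx = np.random.randint(X.shape[1])           # pick a random image from the series
y = U.T @ (X[:, [idx]] - mu)

for i, k in enumerate([32, 16, 8, 4, 2, 1]):
    y_k = y.copy()
    y_k[k:] = 0                               # keep only the first k components
    plt.subplot(1, 6, i + 1)
    plt.imshow(np.reshape(U @ y_k + mu, shape), cmap="gray")
    plt.title(str(k)); plt.axis("off")
plt.show()
```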
(d) F Informativeness of each component: Each eigenvector holds information that defines some aspect of the PCA space. By changing the values of
a vector in a periodic way, we can observe how different weights for an eigenvector affect the reconstructed image.
Use the second series of images for this task. Take the average photo that you compute based on all images in the series[3]. Project the average image to PCA space. Then, select one of the more important eigenvectors and manually set its corresponding weight in the projected vector to some value of your choice. Project the modified vector back to image space and observe the change.
In order to see the changes more easily, write a script that sweeps over a range of values for your selected eigenvector. To smoothly change the values, use np.sin and np.linspace together with some scaling factor x to make the differences more pronounced. Also, use plt.draw in combination with plt.pause to display the results as an animated sequence of images.
Modify the script to change two parameters at the same time, in a circular way (i.e. using both np.sin and np.cos). Experiment with different eigenvector pairs and report your observations.
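A rough sketch of the animated loop, reusing the helpers above; the directory name, the chosen eigenvector indices (0 and 1) and the scaling factor are arbitrary choices that you will want to adapt.

```python
X, shape = prepare_face_matrix("faces/2")     # placeholder path for the second series
U, S, mu = dual_pca(X)
avg = np.mean(X, axis=1, keepdims=True)       # the average photo of the series
y = U.T @ (avg - mu)                          # the average image in the PCA space

scale = 3000                                  # arbitrary scaling factor; tune it to see the effect clearly
for t in np.linspace(0, 2 * np.pi, 100):
    y_mod = y.copy()
    y_mod[0] += scale * np.sin(t)             # vary two weights in a circular way
    y_mod[1] += scale * np.cos(t)
    plt.clf()
    plt.imshow(np.reshape(U @ y_mod + mu, shape), cmap="gray")
    plt.axis("off")
    plt.draw(); plt.pause(0.03)
```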
(e) F Reconstruction of a foreign image: The PCA space is built from an array of data. In this task we will check the effect that the transformation to the PCA space has on an image that is not similar to the images that were used to build the space. Write a script that computes the PCA space for the first face dataset. Then load the image elephant.jpg, reshape it into a vector, transform it into the precalculated PCA space and transform it back to image space. Reshape the resulting vector back into an image and display both the original and the reconstructed image. Comment on the results.
(f) F Recognition with a subspace: PCA subspaces can be used in a simple object recognition scenario. Generally, if an image is similar to the images used to build the PCA subspace, the reconstruction error incurred when projecting to said PCA space should be small. In this task, you will try to implement a simple facial recognition system based on this property.
Use a camera to capture images (10 at the very least) of your face with varying illumination and facial expressions. Resize them to a common resolution and try to align them by eye location. Construct a PCA space from these images. Then, write a script that captures images from your webcam, detects faces in them (you can use something from OpenCV like Haar cascades) and extracts the regions that contain faces. Resize these regions to the correct resolution, transform them to the precalculated PCA space and back to image space. Then, based on some similarity measure (a simple L2 norm will do) and some threshold, decide whether a region contains your face or not. Display this information on the image stream.
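A very rough outline of the recognition loop is sketched below. It assumes that `U`, `mu` and `shape` were obtained by running the helpers above on your own captured face images, it uses the stock OpenCV frontal-face Haar cascade, and the reconstruction-error threshold is a value you have to tune by hand.

```python
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
threshold = 2500.0                               # reconstruction-error threshold; tune by hand

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (shape[1], shape[0])).astype(np.float64)
        v = np.reshape(face, (-1, 1))
        rec = U @ (U.T @ (v - mu)) + mu          # project to the PCA space and back
        err = np.linalg.norm(v - rec)            # L2 reconstruction error
        label = "me" if err < threshold else "unknown"
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, "%s (%.0f)" % (label, err), (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```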
(g) F Linear discriminant analysis: Based on the instructions from the lecture slides (Subspace-based recognition) implement the LDA algorithm and apply it to our face database. Make sure that you also address the problem of singularities in high-dimensional data using appropriate pre-processing (also described in the lecture slides). Visualize all the image samples in the 2D PCA subspace and the LDA subspace with different colors for each class to demonstrate that the LDA subspace better separates classes.
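A compact sketch of the two-stage procedure from the slides: the images are first reduced with PCA (to avoid singular scatter matrices), then the generalized eigenproblem of the between- and within-class scatter is solved in the reduced space. The directory names, the class labels and the number of retained PCA components are assumptions.

```python
def lda(Y, labels, dims=2):
    """LDA on PCA-reduced data Y (d x N) with integer class labels (length N)."""
    classes = np.unique(labels)
    mu_all = np.mean(Y, axis=1, keepdims=True)
    d = Y.shape[0]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Yc = Y[:, labels == c]
        mu_c = np.mean(Yc, axis=1, keepdims=True)
        Sw += (Yc - mu_c) @ (Yc - mu_c).T                        # within-class scatter
        Sb += Yc.shape[1] * (mu_c - mu_all) @ (mu_c - mu_all).T  # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)                            # strongest directions first
    return eigvecs[:, order[:dims]].real

# Build one data matrix from all three series (paths are placeholders) and label the classes.
X = np.hstack([prepare_face_matrix("faces/%d" % i)[0] for i in (1, 2, 3)])
labels = np.repeat([0, 1, 2], 64)

U, S, mu = dual_pca(X)
Y = U.T[:30] @ (X - mu)      # keep e.g. 30 PCA components so that Sw is not singular
W = lda(Y, labels)
Z = W.T @ Y                  # 2D LDA coordinates of all samples, ready for a colored scatter plot
```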
References
[1] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2002.
[1] The eigenvalue tells us the variance of the data along its corresponding eigenvector. If you multiply the eigenvector by the square root of the eigenvalue, you will visualize the standard deviation along that eigenvector.
[2] Visualize the difference by subtracting the original image from the reprojected one.
[3] The value of each pixel equals the average value of the corresponding pixels across all images. In the case of image vectors this is simply their average vector.