Task 1: The function linear_pca expects a data matrix X ∈ ℝ^{p×N} and a number of principal components k, and returns the first k PCA scores for the matrix X.
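linear_pca itself is provided with the assignment; purely for orientation, here is a minimal sketch of how such a function might be implemented, assuming an SVD of the centered data (the provided version may differ, e.g. in sign conventions):

    import numpy as np

    def linear_pca(X, k):
        # Sketch only: the provided implementation may differ.
        # X has shape (p, N), one sample per column.
        Xc = X - X.mean(axis=1, keepdims=True)      # center each feature
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        # Project the centered samples onto the leading k principal
        # directions; the result has shape (k, N), one column per sample.
        return U[:, :k].T @ Xc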
• Provide code that tests the function with selected images from the provided MNIST training dataset by visualizing the first two scores in a scatter plot (a sketch is given after this list).
• Complete the function gram_pca such that it has the same functionality as linear_pca, but expects the Gram matrix K = XᵀX instead of the data matrix X as its input. Do not assume that K was produced from centered data. Note: it is important to be consistent in notation here; e.g., for a data matrix of 1000 MNIST images, we have X ∈ ℝ^{784×1000} and K ∈ ℝ^{1000×1000}. (A sketch is given after this list.)
• Test your implementation and show that gram_pca(dot(X.T,X), k) yields results equivalent to those of linear_pca(X, k) (see the equivalence check after this list).
• There is an unknown vector space H, equipped with an inner product ⟨·,·⟩_H, and a function
ϕ : ℝ^p → H,
such that
⟨ϕ(x), ϕ(y)⟩_H = exp(−‖x − y‖² / (2σ²))
holds for every x, y ∈ ℝ^p. The expression on the right-hand side of the equation is called the Gaussian kernel, and σ is a parameter to choose by hand.
The function gaussian_kernel_pca expects a data matrix X, a reduced dimension k, and a parameter σ. It returns the first k Kernel PCA scores of the data. In other words, the function returns the first k PCA scores of
ϕ(x_1), ϕ(x_2), …, ϕ(x_N),
where x_i denotes the i-th data sample, i.e. the i-th column of the data matrix. The function gaussian_kernel_pca is already written, but for it to work, the function compute_gaussian_gram_matrix must return correct results. Complete compute_gaussian_gram_matrix accordingly (a sketch is given after this list).
• Test gaussian_kernel_pca with some MNIST training images and σ = 1000 (a sketch is given below).
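A minimal test sketch for the scatter plot of the first two linear PCA scores; load_mnist_train is a hypothetical loader standing in for however the provided MNIST data is actually accessed:

    import matplotlib.pyplot as plt

    # load_mnist_train is hypothetical: adapt to the provided dataset.
    # X: (784, N), one flattened 28x28 image per column; y: (N,) digit labels.
    X, y = load_mnist_train()

    scores = linear_pca(X[:, :1000], 2)      # scores has shape (2, 1000)
    plt.scatter(scores[0], scores[1], c=y[:1000], cmap="tab10", s=8)
    plt.colorbar(label="digit")
    plt.xlabel("PC 1 score")
    plt.ylabel("PC 2 score")
    plt.show()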
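One possible way to complete gram_pca, assuming scores of shape (k, N) as in the sketch above: double-center the Gram matrix (which corresponds to centering the columns of X), eigendecompose it, and scale the leading eigenvectors by the square roots of their eigenvalues:

    import numpy as np

    def gram_pca(K, k):
        # K = X^T X, not assumed to come from centered data.
        N = K.shape[0]
        J = np.full((N, N), 1.0 / N)
        # Double centering: Kc = (I - J) K (I - J) = Xc^T Xc.
        Kc = K - J @ K - K @ J + J @ K @ J
        vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
        idx = np.argsort(vals)[::-1][:k]         # keep the top-k eigenpairs
        vals = np.clip(vals[idx], 0.0, None)     # guard against tiny negatives
        vecs = vecs[:, idx]
        # If Xc = U S V^T, then Kc = V S^2 V^T and the scores U^T Xc = S V^T,
        # i.e. sqrt(eigenvalue) times the corresponding eigenvector.
        return np.sqrt(vals)[:, None] * vecs.T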
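A sketch of the equivalence check, here on random stand-in data of MNIST shape; note that PCA scores are unique only up to the sign of each component, so the comparison is on absolute values:

    X = np.random.rand(784, 200)             # stand-in for MNIST columns
    S1 = linear_pca(X, 5)
    S2 = gram_pca(np.dot(X.T, X), 5)
    # Each principal direction is determined only up to its sign.
    print(np.allclose(np.abs(S1), np.abs(S2), atol=1e-6))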
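A sketch of compute_gaussian_gram_matrix under the kernel definition above; if the assignment defines the exponent differently (e.g. without the factor 2σ²), adjust the denominator accordingly:

    import numpy as np

    def compute_gaussian_gram_matrix(X, sigma):
        # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), X of shape (p, N).
        sq = np.sum(X**2, axis=0)                    # ||x_i||^2 per column
        # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j
        d2 = sq[:, None] + sq[None, :] - 2.0 * np.dot(X.T, X)
        return np.exp(-np.clip(d2, 0.0, None) / (2.0 * sigma**2))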
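Finally, a sketch of the Kernel PCA test, reusing the hypothetical loader from above and assuming the provided gaussian_kernel_pca returns scores of shape (k, N):

    X, y = load_mnist_train()                 # hypothetical loader, see above
    scores = gaussian_kernel_pca(X[:, :1000], 2, 1000.0)
    plt.scatter(scores[0], scores[1], c=y[:1000], cmap="tab10", s=8)
    plt.xlabel("kernel PC 1 score")
    plt.ylabel("kernel PC 2 score")
    plt.show()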