Recall: For a sample of $d_1$- and $d_2$-dimensional data of size $N$, given as two data matrices $X \in \mathbb{R}^{d_1 \times N}$, $Y \in \mathbb{R}^{d_2 \times N}$ (assumed to be centered), canonical correlation analysis (CCA) finds a one-dimensional projection maximizing the cross-correlation for constant auto-correlation. The primal optimization problem is: find $w_x \in \mathbb{R}^{d_1}$, $w_y \in \mathbb{R}^{d_2}$ maximizing
$$ J(w_x, w_y) = w_x^\top C_{xy} w_y $$
subject to
$$ w_x^\top C_{xx} w_x = 1, \qquad w_y^\top C_{yy} w_y = 1, \tag{1} $$
where $C_{xx} = \frac{1}{N} X X^\top \in \mathbb{R}^{d_1 \times d_1}$ and $C_{yy} = \frac{1}{N} Y Y^\top \in \mathbb{R}^{d_2 \times d_2}$ are the auto-covariance matrices of $X$ and $Y$ respectively, and $C_{xy} = \frac{1}{N} X Y^\top \in \mathbb{R}^{d_1 \times d_2}$ is the cross-covariance matrix of $X$ and $Y$.
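For concreteness, these covariance matrices can be computed directly from the (centered) data matrices. Below is a minimal NumPy sketch, assuming the $d \times N$ data-matrix convention of this sheet; the function name `covariances` is purely illustrative.

```python
import numpy as np

def covariances(X, Y):
    """Illustrative sketch: X is d1 x N, Y is d2 x N, both assumed centered."""
    N = X.shape[1]
    Cxx = X @ X.T / N   # d1 x d1 auto-covariance of X
    Cyy = Y @ Y.T / N   # d2 x d2 auto-covariance of Y
    Cxy = X @ Y.T / N   # d1 x d2 cross-covariance of X and Y
    return Cxx, Cyy, Cxy
```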
Exercise 1: Primal CCA (10+5 P)
We have seen in the lecture that a solution of the canonical correlation analysis problem can be found among the eigenvectors of the generalized eigenvalue problem:
$$ \begin{bmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} = \lambda \begin{bmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} $$
(a) Show that among all eigenvectors $(w_x, w_y)$, the solution of the CCA problem is the one associated with the highest eigenvalue.
(b) Show that if $(w_x, w_y)$ is a solution, then $(-w_x, -w_y)$ is also a solution of the CCA problem.
Exercise 2: Dual CCA (10+15+5+5 P)
In this exercise, we would like to derive the dual optimization problem.
(a) Show that it is always possible to find an optimal solution in the span of the data, that is,
$$ w_x = X \alpha_x, \qquad w_y = Y \alpha_y $$
with some coefficient vectors $\alpha_x \in \mathbb{R}^N$ and $\alpha_y \in \mathbb{R}^N$.
(b) Show that the solution of the dual optimization problem is found in an eigenvector of the generalized eigenvalue problem
$$ \begin{bmatrix} 0 & AB \\ BA & 0 \end{bmatrix} \begin{bmatrix} \alpha_x \\ \alpha_y \end{bmatrix} = \lambda \begin{bmatrix} A^2 & 0 \\ 0 & B^2 \end{bmatrix} \begin{bmatrix} \alpha_x \\ \alpha_y \end{bmatrix}, $$
where $A = X^\top X$ and $B = Y^\top Y$.
(c) Show that the solution of the dual is given by the eigenvector associated with the highest eigenvalue.
(d) Show how a solution to the original problem can be obtained from the solution of the generalized eigenvalue problem of the dual.
Exercise 3: CCA and Least Square Regression (20 P)
Consider some supervised dataset with the inputs stored in a matrix $X \in \mathbb{R}^{D \times N}$ and the targets stored in a vector $y \in \mathbb{R}^N$. We assume that both the inputs and the targets are centered. The least squares regression optimization problem is:
$$ \min_{v \in \mathbb{R}^D} \; \| X^\top v - y \|^2 $$
We would like to relate least squares regression and CCA, specifically, their respective solutions $v$ and $(w_x, w_y)$.
(a) Show that if $X$ and $y$ are the two modalities of CCA (i.e. $X \in \mathbb{R}^{D \times N}$ and $y \in \mathbb{R}^{1 \times N}$), the first part of the CCA solution (i.e. the vector $w_x$) is equal to the solution $v$ of least squares regression up to a scaling factor.
Exercise 4: Programming (30 P)
Download the programming files on ISIS and follow the instructions.
Exercise sheet 2 (programming)
Machine Learning 2
Canonical Correlation Analysis
In this exercise, we consider canonical correlation analysis (CCA) on two simple problems, one in low dimensions and one in high dimensions. The goal is to implement the primal and dual versions of CCA to handle these two different cases. The first dataset consists of two trajectories in two dimensions. The dataset is extracted and plotted below. The first data points are shown in dark blue, and the last ones are shown in yellow.
For these two trajectories, which can be understood as two different modalities of the same data, we would like to determine under which projections they appear maximally correlated.
Implementing Primal CCA
As stated in the lecture, the CCA problem in its primal form consists of maximizing the cross-correlation objective:
$$J(w_x,w_y) = w_x^\top C_{xy} w_y$$
subject to auto-correlation constraints $w_x^\top C_{xx} w_x = 1$ and $w_y^\top C_{yy} w_y = 1$. Using the method of Lagrange multipliers, this optimization problem can be reduced to finding the first eigenvector of the generalized eigenvalue problem:
$$ \begin{bmatrix}0 & C_{xy}\\C_{yx} & 0\end{bmatrix} \begin{bmatrix}w_x\\w_y\end{bmatrix} = \lambda \begin{bmatrix}C_{xx} & 0\\0 & C_{yy}\end{bmatrix} \begin{bmatrix}w_x\\w_y\end{bmatrix} $$
Your first task is to write a function that solves the CCA problem in the primal (i.e. that solves the generalized eigenvalue problem above). The function you need to implement receives two matrices X and Y of size N $\times$ d1 and N $\times$ d2 respectively. It returns two vectors of size d1 and d2 corresponding to the projections associated with the modalities X and Y. (Hint: Note that the data matrices X and Y have not been centered yet.)
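One possible implementation is sketched below. It is not the official solution: it simply assumes that `scipy.linalg.eigh` is used to solve the symmetric generalized eigenvalue problem, and the function name `cca_primal` is illustrative. Since `eigh` returns eigenvalues in ascending order, the last eigenvector is the one associated with the highest eigenvalue.

```python
import numpy as np
import scipy.linalg

def cca_primal(X, Y):
    """Sketch of a primal CCA solver. X: N x d1, Y: N x d2 (rows are samples)."""
    N = len(X)
    X = X - X.mean(axis=0)                       # center each modality
    Y = Y - Y.mean(axis=0)
    Cxx, Cyy, Cxy = X.T @ X / N, Y.T @ Y / N, X.T @ Y / N
    d1, d2 = Cxx.shape[0], Cyy.shape[0]
    LHS = np.block([[np.zeros((d1, d1)), Cxy],
                    [Cxy.T, np.zeros((d2, d2))]])
    RHS = np.block([[Cxx, np.zeros((d1, d2))],
                    [np.zeros((d2, d1)), Cyy]])
    # generalized symmetric eigenvalue problem LHS v = lambda RHS v
    _, V = scipy.linalg.eigh(LHS, RHS)
    wx, wy = V[:d1, -1], V[d1:, -1]              # eigenvector of the largest eigenvalue
    # rescale so that each auto-correlation constraint w^T C w = 1 holds exactly
    wx = wx / np.sqrt(wx @ Cxx @ wx)
    wy = wy / np.sqrt(wy @ Cyy @ wy)
    return wx, wy
```

If $C_{xx}$ or $C_{yy}$ happens to be singular, a small ridge term (e.g. $10^{-9} I$) can be added to the right-hand-side block matrix before calling the solver.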
In each modality, the arrow points in a specific direction (note that the optimal CCA directions are defined up to a sign flip of both $w_x$ and $w_y$).
Furthermore, we can verify that CCA has learned a meaningful solution by projecting the data onto it.
Clearly, the data is correlated in the projected space.
Implementing Dual CCA
In the second part of the exercise, we consider the case where the data is high-dimensional (with d $\gg$ N). Such high dimensionality occurs, for example, when the input data are images. We consider the scenario where sources emit spatially, and two (noisy) receivers measure the spatial field at different locations. We would like to identify a signal that is common to the two measured locations, e.g. a given source emitting at a given frequency. We first load the data and show one example.
Several sources can be perceived; however, there is a significant level of noise. Here again, we will use CCA to find subspaces where the two modalities are maximally correlated. In this example, because there are many more dimensions than there are data points, it is more advantageous to solve CCA in the dual. Your task is to implement a CCA dual solver that receives two data matrices of size N $\times$ d1 and N $\times$ d2 respectively as input, and returns the associated CCA directions (two vectors of respective sizes d1 and d2).
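A possible sketch of such a dual solver is shown below, again only as an illustration (the name `cca_dual` and the ridge parameter `eps` are assumptions, not part of the exercise). It works entirely with the $N \times N$ Gram matrices $A = XX^\top$ and $B = YY^\top$ (which, with rows as samples, play the role of $X^\top X$ and $Y^\top Y$ in Exercise 2), so no $d \times d$ matrix ever needs to be formed; the small ridge term keeps the right-hand-side matrix positive definite.

```python
import numpy as np
import scipy.linalg

def cca_dual(X, Y, eps=1e-9):
    """Sketch of a dual CCA solver. X: N x d1, Y: N x d2, with d1, d2 >> N."""
    N = len(X)
    X = X - X.mean(axis=0)                       # center each modality
    Y = Y - Y.mean(axis=0)
    A, B = X @ X.T, Y @ Y.T                      # N x N Gram matrices
    Z = np.zeros((N, N))
    LHS = np.block([[Z, A @ B],
                    [B @ A, Z]])
    RHS = np.block([[A @ A, Z],
                    [Z, B @ B]]) + eps * np.eye(2 * N)   # ridge for stability
    _, V = scipy.linalg.eigh(LHS, RHS)
    ax, ay = V[:N, -1], V[N:, -1]                # dual coefficients, largest eigenvalue
    wx, wy = X.T @ ax, Y.T @ ay                  # map back to the input space
    # rescale so that the auto-correlation of each projection equals one
    wx = wx / np.sqrt(np.sum((X @ wx) ** 2) / N)
    wy = wy / np.sqrt(np.sum((Y @ wy) ** 2) / N)
    return wx, wy
```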
Since the learned CCA directions have the same dimensionality as the inputs, they can be rendered in a similar fashion.
Here, we can clearly see a common factor that has been extracted between the two fields, specifically a point source emitting at a particular frequency. A test sequence of 100 pairs of images can now be projected on these two filters:
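Assuming hypothetical test matrices `Xtest` and `Ytest` of shape 100 $\times$ d1 and 100 $\times$ d2, and the directions `wx`, `wy` returned by the dual solver above, the projection is a simple matrix-vector product:

```python
# project each (centered) test pair onto the learned directions wx and wy
zx = (Xtest - Xtest.mean(axis=0)) @ wx   # 100 projected values, first modality
zy = (Ytest - Ytest.mean(axis=0)) @ wy   # 100 projected values, second modality
print(np.corrcoef(zx, zy)[0, 1])         # correlation between the two projections
```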
Clearly the two projected signals are correlated and the input noise has been strongly reduced.