CSC411 Homework 7: Representer Theorem and Compositional Kernels (Solved)

1.     Representer Theorem. In this question, you’ll prove and apply a simplified version of the Representer Theorem, which is the basis for a lot of kernelized algorithms. Consider a linear model:

z = w^T ψ(x),        y = g(z),

where ψ is a feature map and g is some function (e.g. identity, logistic, etc.). We are given a training set {(x^(1), t^(1)), …, (x^(N), t^(N))}. We are interested in minimizing the expected loss plus an L2 regularization term:

(1/N) Σ_{i=1}^N L(y^(i), t^(i)) + (λ/2) ||w||^2,

where L is some loss function. Let Ψ denote the feature matrix whose i-th row is ψ(x^(i))^T:

Ψ = [ ψ(x^(1))^T ]
    [ ψ(x^(2))^T ]
    [     ⋮      ]
    [ ψ(x^(N))^T ].

Observe that this formulation captures a lot of the models we’ve covered in this course, including linear regression, logistic regression, and SVMs.
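To make this setup concrete, here is a minimal sketch in Python. The degree-2 polynomial feature map ψ and the logistic choice of g are illustrative assumptions for the example, not part of the assignment.

# Minimal sketch of the model z = w^T psi(x), y = g(z).
# The feature map and the choice of g below are illustrative assumptions.
import numpy as np

def psi(x):
    """Degree-2 polynomial feature map for a scalar input x (assumed)."""
    return np.array([1.0, x, x ** 2])

def g(z):
    """Logistic function, one possible choice of g."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.0, 2.0])   # an arbitrary weight vector
x = 0.3
z = w @ psi(x)                   # z = w^T psi(x)
y = g(z)                         # y = g(z)
print(z, y)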

(a)    Show that the optimal weights must lie in the row space of Ψ.

Hint: Given a subspace S, a vector v can be decomposed as v = vS +v⊥, where vS is the projection of v onto S, and v⊥ is orthogonal to S. (You may assume this fact without proof, but you can review it here[1].) Apply this decomposition to w and see if you can show something about one of the two components.
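As a numerical sanity check of part (a) (not a substitute for the proof), the sketch below solves the squared-error version of the regularized problem in closed form and verifies that the optimum w lies, up to numerical error, in the row space of Ψ. The random Ψ, targets t, and regularization strength are made-up test values.

# Sanity check: the L2-regularized least-squares optimum lies in row(Psi).
# Psi, t, and lam are arbitrary test values.
import numpy as np

rng = np.random.default_rng(0)
N, D, lam = 20, 50, 0.1                 # D > N, so row(Psi) is a proper subspace of R^D
Psi = rng.standard_normal((N, D))
t = rng.standard_normal(N)

# Closed-form minimizer of (1/2N)||Psi w - t||^2 + (lam/2)||w||^2:
# (Psi^T Psi + N*lam*I) w = Psi^T t
w = np.linalg.solve(Psi.T @ Psi + N * lam * np.eye(D), Psi.T @ t)

# Distance from w to the row space of Psi (= column space of Psi^T),
# computed via least squares; it should be ~0.
alpha, *_ = np.linalg.lstsq(Psi.T, w, rcond=None)
print(np.linalg.norm(Psi.T @ alpha - w))   # ~1e-14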

(b)   Another way of stating the result from part (a) is that w = Ψ^T α for some vector α. Hence, instead of solving for w, we can solve for α. Consider the vectorized form of the L2-regularized linear regression cost function:

J(w) = (1/2N) ||Ψw − t||^2 + (λ/2) ||w||^2.

Substitute w = Ψ^T α to write the cost function as a function of α. Determine the optimal value of α. Your answer should be an expression involving λ, t, and the Gram matrix K = ΨΨ^T. For simplicity, you may assume that K is positive definite. (The algorithm still works if K is merely PSD; it's just a bit more work to derive.)

Hint: the cost function J(α) is a quadratic function. Simplify the formula into the following form:

J(α) = (1/2) α^T A α + b^T α + c

for some positive definite matrix A, vector b, and constant c (which can be ignored). You may assume without proof that the minimum of such a quadratic function is given by α = −A^(-1) b.
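The fact quoted at the end of the hint is easy to verify numerically. In the sketch below, A, b, and c are arbitrary test values (not quantities from the derivation); the check confirms that the gradient Aα + b vanishes at α = −A^(-1) b and that random perturbations never decrease the cost.

# Check: for positive definite A, J(alpha) = 1/2 alpha^T A alpha + b^T alpha + c
# is minimized at alpha = -A^{-1} b.  A, b, c are arbitrary test values.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)              # symmetric positive definite
b = rng.standard_normal(5)
c = 3.0

J = lambda a: 0.5 * a @ A @ a + b @ a + c
alpha_star = -np.linalg.solve(A, b)      # candidate minimizer -A^{-1} b

print(np.allclose(A @ alpha_star + b, 0.0))                            # gradient vanishes
perturbations = rng.standard_normal((1000, 5))
print(all(J(alpha_star + p) >= J(alpha_star) for p in perturbations))  # no lower cost nearby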

2.    Compositional Kernels. One of the most useful facts about kernels is that they can be composed using addition and multiplication: the sum of two kernels is a kernel, and the product of two kernels is a kernel. We'll show this in the case of kernels which represent dot products between finite-dimensional feature vectors. (A numerical check of both closure properties is sketched after part (b).)

(a)    Suppose k1(x, x′) = ψ1(x)^T ψ1(x′) and k2(x, x′) = ψ2(x)^T ψ2(x′). Let kS be the sum kernel kS(x, x′) = k1(x, x′) + k2(x, x′). Find a feature map ψS such that kS(x, x′) = ψS(x)^T ψS(x′).

(b)   Suppose k1(x, x′) = ψ1(x)^T ψ1(x′) and k2(x, x′) = ψ2(x)^T ψ2(x′). Let kP be the product kernel kP(x, x′) = k1(x, x′) k2(x, x′). Find a feature map ψP such that kP(x, x′) = ψP(x)^T ψP(x′).
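As referenced above, here is a numerical check of both closure properties. The feature maps ψ1 and ψ2 below are illustrative assumptions (the assignment does not specify them); the check builds the Gram matrices of the sum and product kernels on random data and confirms that both are positive semidefinite.

# Numerical check: sum and product of kernels yield PSD Gram matrices.
# psi1 and psi2 are illustrative feature maps, not given in the assignment.
import numpy as np

def psi1(x):
    return np.array([x[0], x[1], x[0] * x[1]])

def psi2(x):
    return np.array([1.0, x[0] ** 2, x[1] ** 2])

def k1(x, xp):
    return psi1(x) @ psi1(xp)

def k2(x, xp):
    return psi2(x) @ psi2(xp)

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 2))                     # 30 random 2-D points

K_sum  = np.array([[k1(a, b) + k2(a, b) for b in X] for a in X])
K_prod = np.array([[k1(a, b) * k2(a, b) for b in X] for a in X])

# A valid kernel's Gram matrix is symmetric PSD: eigenvalues >= 0
# (up to floating-point error).
for name, K in (("sum", K_sum), ("product", K_prod)):
    print(name, np.linalg.eigvalsh(K).min() >= -1e-9)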

