Foundations of Computer and Data Science
Problem 1: Consider the matrix

[matrix $A$ not recovered in this copy; it depends on a parameter $\varepsilon$]

a) Find the eigenvalues/eigenvectors of $A$ assuming $\varepsilon \neq 0$. Force your eigenvectors to have unit norm. b) Diagonalize $A$ using the eigenvalues/eigenvectors you computed. c) Now let $\varepsilon \to 0$. What do you observe happening to the matrices you use for diagonalization as $\varepsilon$ becomes smaller and smaller? So what do you conclude when $\varepsilon = 0$?
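Although the matrix itself is missing here, the effect asked about in part c) can be previewed numerically. The minimal Python sketch below uses a stand-in matrix of my own choosing, with rows $(1, 1)$ and $(\varepsilon, 1)$, which need not be the matrix from the problem; it shows the eigenvector matrix used for diagonalization approaching a singular matrix as $\varepsilon \to 0$:

```python
import numpy as np

# Illustrative stand-in matrix (an assumption; the original A was not recovered):
# A(eps) = [[1, 1], [eps, 1]] is diagonalizable for eps != 0 but not for eps = 0.
for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    A = np.array([[1.0, 1.0],
                  [eps, 1.0]])
    w, V = np.linalg.eig(A)  # columns of V are unit-norm eigenvectors
    # As eps -> 0 the two eigenvectors align, so V approaches a singular matrix
    # and the diagonalization A = V diag(w) V^{-1} breaks down in the limit.
    print(f"eps={eps:8.0e}  eigenvalues={w}  det(V)={np.linalg.det(V):.2e}")
```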
Problem 2: Let $A, B$ be two matrices of the same dimensions $k \times m$.

a) With direct computation show that $\mathrm{trace}(AB^\top) = \mathrm{trace}(B^\top A) = \mathrm{trace}(BA^\top) = \mathrm{trace}(A^\top B)$.

b) Use question a) to compute $\mathrm{E}[x^\top A x]$ where $\mathrm{E}[\cdot]$ denotes expectation, $A$ is a constant matrix and $x$ is a random vector for which we know that $\mathrm{E}[xx^\top] = Q$. Hint: The trace of a scalar is the scalar itself. Trace and expectation are linear operations, so we can change their order of application.

c) Using the previous properties show that for any matrix $A$ of dimensions $k \times k$ we have $\mathrm{trace}(A) = \mathrm{trace}(UAU^{-1})$ for any nonsingular matrix $U$ of dimensions $k \times k$; in other words, the trace does not change if we apply a similarity transformation.

d) Use question c) to prove that if a matrix $A$ of dimensions $k \times k$ is diagonalizable then its trace is equal to the sum of its eigenvalues (actually this is true even if the matrix is not diagonalizable).

e) Regarding question d), how do you explain this equality given that when $A$ is real the trace is also real, whereas the eigenvalues can be complex?

f) Using again question d), what can you say about the coefficient $c_{k-1}$ of the characteristic polynomial $\lambda^k + c_{k-1}\lambda^{k-1} + \cdots + c_0$ of $A$? We recall that we already know that $c_0 = (-1)^k \lambda_1 \cdots \lambda_k = (-1)^k \det(A)$.
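As a quick numerical sanity check of parts a) and b) (a minimal Python sketch, not part of the assignment): chaining the hint's two steps gives $\mathrm{E}[x^\top A x] = \mathrm{trace}(AQ)$, which a Monte Carlo average approximates below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part a): check the four trace identities on random k x m matrices.
k, m = 4, 3
A = rng.standard_normal((k, m))
B = rng.standard_normal((k, m))
t1 = np.trace(A @ B.T)
t2 = np.trace(B.T @ A)
t3 = np.trace(B @ A.T)
t4 = np.trace(A.T @ B)
print(np.allclose([t1, t2, t3], t4))   # True

# Part b): E[x^T C x] = trace(C Q), with Q = E[x x^T] and C constant
# (C plays the role of the matrix A in the problem statement).
C = rng.standard_normal((k, k))
L = rng.standard_normal((k, k))
Q = L @ L.T                             # a valid (symmetric PSD) covariance
x = rng.multivariate_normal(np.zeros(k), Q, size=100_000)
empirical = np.mean(np.sum((x @ C) * x, axis=1))   # Monte Carlo E[x^T C x]
print(empirical, np.trace(C @ Q))       # the two values nearly coincide
```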
Problem 3: A matrix $A$ is called nilpotent if $A^r = 0$ for some integer $r > 1$. a) Show that all eigenvalues of $A$ must be equal to 0. b) Is such a matrix diagonalizable? c) Give an example of a $2 \times 2$ matrix which is nilpotent. d) If we apply a similarity transformation to a nilpotent matrix, is the resulting matrix nilpotent? Apply the previous observation to your example in c) to obtain a second example of a nilpotent matrix.
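The following minimal Python sketch illustrates the claims with the standard example whose rows are $(0, 1)$ and $(0, 0)$ (one possible answer to c), not necessarily the intended one):

```python
import numpy as np

# A standard 2 x 2 nilpotent matrix: N != 0 but N^2 = 0.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(N @ N)                  # zero matrix, so N is nilpotent (r = 2)
print(np.linalg.eigvals(N))   # both eigenvalues are 0, as part a) predicts

# Part d): similarity transformations preserve nilpotency, since
# (U N U^{-1})^r = U N^r U^{-1} = 0.
U = np.array([[1.0, 2.0],
              [3.0, 7.0]])    # any nonsingular U works
M = U @ N @ np.linalg.inv(U)
print(M @ M)                  # again (numerically) the zero matrix
```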
Problem 4: There is a clock, and at each period of the clock you obtain a new pair $(y_t, X_t)$, where $y_t$ is a scalar and $X_t$ a vector. We are interested in modeling $y$ as an inner product of $X$ with a vector $A$:

$$y_n = X_n^\top A + e_n, \quad n = 1, 2, \ldots, t,$$

where $\{e_n\}$ denotes the error term between the model $X_n^\top A$ and the target value $y_n$. At each time $t$ we have available the data $(y_t, X_t), (y_{t-1}, X_{t-1}), \ldots, (y_1, X_1)$ and we would like to find the best $A$ that minimizes the sum of the squared errors, namely perform the minimization

$$\min_A \sum_{n=1}^{t} (y_n - X_n^\top A)^2. \qquad (1)$$

a) Show that if $A_t$ denotes the solution, at time $t$, of the optimization in (1), then

$$A_t = R_t^{-1} U_t, \quad \text{where} \quad R_t = \sum_{n=1}^{t} X_n X_n^\top, \quad U_t = \sum_{n=1}^{t} y_n X_n. \qquad (2)$$
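To make formula (2) concrete, here is a minimal Python sketch (with synthetic data of my own choosing) that builds $R_t$ and $U_t$ and checks the result against an off-the-shelf least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: k-dimensional regressors and a "true" parameter vector.
k, t = 5, 500
A_true = rng.standard_normal(k)
X = rng.standard_normal((t, k))                  # row n is X_n^T
y = X @ A_true + 0.1 * rng.standard_normal(t)    # y_n = X_n^T A + e_n

# Formula (2): A_t = R_t^{-1} U_t with R_t = sum X_n X_n^T, U_t = sum y_n X_n.
R = X.T @ X
U = X.T @ y
A_t = np.linalg.solve(R, U)

# Same answer as a standard least-squares routine.
A_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(A_t, A_ls))  # True
```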
b) At the next time instant $t+1$, when we receive the new pair $(y_{t+1}, X_{t+1})$, we need to reapply the formula in (2) to compute the new estimate $A_{t+1}$. Express the computational complexity in terms of the size $k$ of the vector $X_t$.

c) Show that we can compute $A_{t+1}$ by using the previous $A_t$ and applying the following updating formulas

$$e_{t+1} = y_{t+1} - X_{t+1}^\top A_t, \quad K_{t+1} = Q_t X_{t+1}, \quad \gamma_{t+1} = 1 + X_{t+1}^\top K_{t+1},$$

$$A_{t+1} = A_t + \frac{e_{t+1}}{\gamma_{t+1}} K_{t+1}, \quad Q_{t+1} = Q_t - \frac{1}{\gamma_{t+1}} K_{t+1} K_{t+1}^\top,$$

where $Q_t = R_t^{-1}$. What is the complexity of the algorithm in terms of $k$? This recursive algorithm is known as Recursive Least Squares (RLS) and is one of the most famous learning algorithms in the literature. Hint: To prove the validity of RLS you need to apply the matrix inversion lemma to $R_{t+1} = R_t + X_{t+1} X_{t+1}^\top$ in order to compute $Q_{t+1}$ in terms of $Q_t$. From (2) you must also use the fact that $U_{t+1} = U_t + y_{t+1} X_{t+1}$ and $A_{t+1} = Q_{t+1} U_{t+1}$.
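As an illustration (a minimal Python sketch, not the required derivation), the recursion can be implemented and checked against the batch formula (2). Each update involves only matrix-vector products and an outer product, i.e. $O(k^2)$ operations per step, versus $O(k^3)$ for re-solving (2) from scratch:

```python
import numpy as np

rng = np.random.default_rng(2)

k, t_max = 4, 300
A_true = rng.standard_normal(k)
X = rng.standard_normal((t_max, k))
y = X @ A_true + 0.05 * rng.standard_normal(t_max)

# Initialize from the first k samples so that R_t is invertible.
t0 = k
Q = np.linalg.inv(X[:t0].T @ X[:t0])   # Q_t = R_t^{-1}
A = Q @ (X[:t0].T @ y[:t0])            # batch solution at time t0

# RLS recursion: every step below costs O(k) or O(k^2) operations.
for n in range(t0, t_max):
    x_new, y_new = X[n], y[n]
    e = y_new - x_new @ A              # prediction error e_{t+1}
    K = Q @ x_new                      # gain direction K_{t+1} = Q_t X_{t+1}
    gamma = 1.0 + x_new @ K            # normalization gamma_{t+1}
    A = A + (e / gamma) * K            # parameter update
    Q = Q - np.outer(K, K) / gamma     # Q_{t+1} via the matrix inversion lemma

# Agrees with the batch estimate from formula (2) on all t_max samples.
A_batch = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(A, A_batch))         # True
```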