MIDTERM
Advanced Machine Learning DATA 442/642
Exercise 1
Show that the $\ell_1$ norm is a convex function (as is every norm), yet it is not strictly convex. In contrast, show that the squared Euclidean norm $\|\cdot\|_2^2$ is a strictly convex function.
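Before writing the proof, a quick numerical sanity check can build intuition: for two points on the same ray, the $\ell_1$ norm attains equality in the convexity inequality (so strict convexity fails), while the squared Euclidean norm keeps a strict inequality. This is only an illustration of the claim, not a proof.

```python
import numpy as np

# Two points on the same ray through the origin: y = 2x.
x = np.array([1.0, 0.0])
y = np.array([2.0, 0.0])
mid = 0.5 * (x + y)

# l1 norm: equality holds at the midpoint, so it is not strictly convex.
l1_mid = np.linalg.norm(mid, 1)
l1_avg = 0.5 * (np.linalg.norm(x, 1) + np.linalg.norm(y, 1))
print(l1_mid, l1_avg)   # 1.5 1.5 (equality)

# Squared Euclidean norm: strict inequality whenever x != y.
sq_mid = np.linalg.norm(mid) ** 2
sq_avg = 0.5 * (np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)
print(sq_mid, sq_avg)   # 2.25 2.5 (strictly smaller)
```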
Exercise 2
Let the observations resulting from an experiment be $x_n$, $n = 1, 2, \ldots, N$. Assume that they are independent and that they originate from a Gaussian PDF with mean $\mu$ and variance $\sigma^2$. Both the mean and the variance are unknown. Prove that the maximum likelihood (ML) estimates of these quantities are given by
$$\hat{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat{\sigma}^2_{ML} = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \hat{\mu}_{ML}\right)^2.$$
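A numerical check of the claim (not a substitute for the proof): the sample mean and the biased sample variance should yield a log-likelihood no smaller than any nearby perturbation of the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# ML estimates: sample mean and *biased* sample variance (divide by N).
mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()   # same as x.var(ddof=0)

def log_lik(mu, var):
    # Gaussian log-likelihood of the whole sample.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

best = log_lik(mu_hat, var_hat)
# Perturbing either estimate should only decrease the likelihood.
for d in (-0.1, 0.1):
    assert log_lik(mu_hat + d, var_hat) < best
    assert log_lik(mu_hat, var_hat + d) < best
```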
Exercise 3
For the regression model $y = X\theta + \eta$, where the noise vector $\eta = [\eta_1, \ldots, \eta_N]^\top$ comprises samples from a zero-mean Gaussian random variable with covariance matrix $\Sigma_n$, show that the Fisher information matrix is given by
$$I(\theta) = X^\top \Sigma_n^{-1} X,$$
where $X$ is the input matrix.
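The result can be sanity-checked by Monte Carlo: the Fisher information equals the covariance of the score, and for this model the score is $X^\top \Sigma_n^{-1}(y - X\theta)$. The sketch below uses an arbitrary diagonal noise covariance; dimensions and sample counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, l = 50, 3
X = rng.normal(size=(N, l))
theta = rng.normal(size=l)

# Diagonal noise covariance (an arbitrary choice for the check).
Sigma = np.diag(rng.uniform(0.5, 2.0, size=N))
Sigma_inv = np.linalg.inv(Sigma)
L_chol = np.linalg.cholesky(Sigma)

# Score of the Gaussian linear model: X^T Sigma^{-1} (y - X theta).
def score(y):
    return X.T @ Sigma_inv @ (y - X @ theta)

# Monte Carlo estimate of Cov[score], which should match I(theta).
scores = []
for _ in range(20000):
    y = X @ theta + L_chol @ rng.normal(size=N)
    scores.append(score(y))
emp = np.cov(np.array(scores).T)

fisher = X.T @ Sigma_inv @ X   # closed-form Fisher information
print(np.max(np.abs(emp - fisher)))   # small Monte Carlo error
```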
Exercise 4
Consider the regression problem described in one of our labs. Read the same audio file, then add white Gaussian noise at a 15 dB level and randomly “hit” 10% of the data samples with outliers (set the outlier values to 80% of the maximum value of the data samples).
(a) Find the reconstructed data samples obtained by support vector regression. Employ the Gaussian kernel with $\sigma = 0.004$ and set $\epsilon = 0.003$ and $C = 1$. Plot the fitted curve of the reconstructed samples together with the data used for training.
(b) Repeat step (a) using $C = 0.05, 0.1, 0.5, 5, 10, 100$.
(c) Repeat step (a) using different values of $\epsilon$.
(d) Repeat step (a) using $\sigma = 0.001, 0.002, 0.01, 0.05, 0.1$.
(e) Comment on the results.
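A minimal sketch of step (a), with two stated assumptions: the lab's audio file is not available here, so a synthetic waveform stands in for it, and scikit-learn's `SVR` is used, which parameterizes the Gaussian kernel by `gamma` $= 1/(2\sigma^2)$ rather than by $\sigma$ directly.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Stand-in for the lab's audio signal (assumption of this sketch).
n = 400
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 5 * t) * np.exp(-2 * t)

# Add white Gaussian noise at 15 dB SNR.
sig_pow = np.mean(clean ** 2)
noise_std = np.sqrt(sig_pow / 10 ** (15 / 10))
y = clean + rng.normal(scale=noise_std, size=n)

# Hit 10% of the samples with outliers at 80% of the maximum value.
idx = rng.choice(n, size=n // 10, replace=False)
y[idx] = 0.8 * np.max(y)

# Gaussian-kernel SVR with the exercise's parameters.
sigma, eps, C = 0.004, 0.003, 1.0
svr = SVR(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=C, epsilon=eps)
svr.fit(t.reshape(-1, 1), y)
y_fit = svr.predict(t.reshape(-1, 1))
print(y_fit.shape)
```

For the plot, `matplotlib.pyplot.plot(t, y, '.', t, y_fit, '-')` shows the fitted curve against the training data; parts (b)–(d) only change `C`, `epsilon`, or `sigma` in the loop.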
Exercise 5
Show, using Lagrange multipliers, that the $\ell_2$ minimizer in equation (9.18) from the textbook admits the closed-form solution
$$\hat{\theta} = X^\top (X X^\top)^{-1} y.$$
Now show that for the system $y = X\theta$, with $X \in \mathbb{R}^{n \times l}$ and $n > l$, the least-squares solution is given by
$$\hat{\theta} = (X^\top X)^{-1} X^\top y.$$
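Both closed forms can be verified numerically against NumPy's pseudoinverse and least-squares routines; the dimensions below are arbitrary choices for the check.

```python
import numpy as np

rng = np.random.default_rng(3)

# Underdetermined case (n < l): minimum l2-norm solution.
n, l = 5, 12
X = rng.normal(size=(n, l))
y = rng.normal(size=n)
theta_min = X.T @ np.linalg.inv(X @ X.T) @ y
assert np.allclose(X @ theta_min, y)                  # solves the system
assert np.allclose(theta_min, np.linalg.pinv(X) @ y)  # matches pinv

# Overdetermined case (n > l): ordinary least squares.
n, l = 12, 5
X = rng.normal(size=(n, l))
y = rng.normal(size=n)
theta_ls = np.linalg.inv(X.T @ X) @ X.T @ y
assert np.allclose(theta_ls, np.linalg.lstsq(X, y, rcond=None)[0])
```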
Exercise 6
Show that the null space of a full-rank $N \times l$ matrix $X$ is a subspace of dimensionality $l - N$, for $N < l$.
Exercise 7
Generate in Python a sparse vector $\theta \in \mathbb{R}^l$, $l = 100$, with its first five components taking random values drawn from a normal distribution with zero mean and unit variance, and the rest equal to zero. Also build a sensing matrix $X$ with $N = 30$ rows whose entries are normally distributed with zero mean and variance $1/N$, in order to get 30 observations based on the linear regression model $y = X\theta$. Then perform the following tasks.
(a) Use a LASSO implementation to reconstruct θ from y and X.
(b) Repeat the experiment 500 times, with different realizations of $X$, in order to compute the probability of correct reconstruction (assume the reconstruction is exact when $\|\theta - \hat{\theta}\| < 10^{-8}$).
(c) Repeat the same experiment (500 times) with matrices of the form
$$X(i,j) = \begin{cases} +\sqrt{p/N}, & \text{with probability } \frac{1}{2p},\\[2pt] 0, & \text{with probability } 1 - \frac{1}{p},\\[2pt] -\sqrt{p/N}, & \text{with probability } \frac{1}{2p}, \end{cases}$$
for $p$ equal to $1, 9, 25, 36, 64$ (make sure that each row and each column of $X$ has at least one nonzero component). Explain why the probability of reconstruction falls as $p$ increases (observe that both the sensing matrix and the unknown vector are sparse).
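A minimal sketch of a single trial of part (a), using scikit-learn's `Lasso`; the regularization weight `alpha` is a tuning choice of this sketch, not specified by the exercise. Parts (b) and (c) wrap this in a loop over 500 realizations and count how often the reconstruction error falls below the threshold.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
l, N, k = 100, 30, 5

# Sparse ground truth: first five components N(0, 1), rest zero.
theta = np.zeros(l)
theta[:k] = rng.normal(size=k)

# Gaussian sensing matrix with entries of variance 1/N.
X = rng.normal(scale=1.0 / np.sqrt(N), size=(N, l))
y = X @ theta

# LASSO reconstruction of theta from y and X.
lasso = Lasso(alpha=1e-4, fit_intercept=False, max_iter=100000)
lasso.fit(X, y)
theta_hat = lasso.coef_

err = np.linalg.norm(theta_hat - theta)
print(err)   # small when reconstruction succeeds
```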