1 Proof - Ridge Regression - Primal vs Dual
In the primal formulation, the ridge regression problem takes the following form:
\hat{\beta} = \arg\min_{\beta} \; \| y - X\beta \|^2 + \tau \| \beta \|^2 ,    (1)
where X is an N × D matrix, β is a D-dimensional vector, and y is an N-dimensional vector. As you saw in the lecture, the optimal β̂ is given by
\hat{\beta} = (X^T X + \tau I_D)^{-1} X^T y .    (2)
Here I_D is the D × D identity matrix. You also know that the dual formulation of the problem is given by
\hat{\alpha} = \arg\max_{\alpha} \; 2 \alpha^T y - \alpha^T (X X^T + \tau I_N) \alpha ,    (3)
with the solution for α̂
\hat{\alpha} = (X X^T + \tau I_N)^{-1} y .    (4)
For each feasible α, a corresponding feasible β is given by:
\beta = X^T \alpha .    (5)
Prove that the optimal β̂ corresponds to the optimal α̂ (and not to some other α):
\hat{\beta} = X^T \hat{\alpha} .    (6)
Hint: Start by proving the following lemma (e.g. using the SVD of X). It will be useful in your derivation:
(X^T X + \tau I_D)^{-1} X^T = X^T (X X^T + \tau I_N)^{-1} .    (7)
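If you want to convince yourself numerically before writing the proof, a quick check of the lemma (7) with random data could look like this (the sizes N, D and the value of τ below are arbitrary choices):

import numpy as np

# numerical sanity check of the lemma (7)
rng = np.random.default_rng(0)
N, D, tau = 7, 4, 0.3                                    # arbitrary dimensions and regularizer
X = rng.normal(size=(N, D))
lhs = np.linalg.solve(X.T @ X + tau * np.eye(D), X.T)    # (X^T X + tau I_D)^{-1} X^T
rhs = X.T @ np.linalg.inv(X @ X.T + tau * np.eye(N))     # X^T (X X^T + tau I_N)^{-1}
print(np.allclose(lhs, rhs))                             # should print True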
2 Kernelized (Ridge) Regression
From the lecture you know that, using the dual formulation of ridge regression, we can introduce the kernel trick with kernel matrix G. New regressed values y_new are computed through
y_{\text{new}} = y^T (G + \tau I_N)^{-1} \kappa^T .    (8)
The elements of the kernel matrix are computed on the training set via the kernel function K
G_{i_1 i_2} = K(X_{i_1}, X_{i_2}),    (9)
and the weight vector α = y^T (G + τ I_N)^{-1} only needs to be computed once at the beginning. In contrast, the kernel vector κ has to be recomputed for each new instance X_new according to κ_i = K(X_new, X_i).
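As a point of reference, a minimal dense sketch of (8) and (9) might look as follows; X_train, y_train and X_new are placeholder names, the kernel is the squared exponential from (10) below, and a dense solve like this is only feasible for small training sets (the actual task requires the sparse approach described after the guidelines):

import numpy as np

def se_kernel(A, B, sigma):
    # squared-exponential kernel (10) between all rows of A and all rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_predict_dense(X_train, y_train, X_new, sigma, tau):
    G = se_kernel(X_train, X_train, sigma)                       # Gram matrix (9)
    # alpha = y^T (G + tau I_N)^{-1}: obtained via a solve, since G is symmetric
    alpha = np.linalg.solve(G + tau * np.eye(len(X_train)), y_train)
    kappa = se_kernel(X_new, X_train, sigma)                     # one kernel vector per new point
    return kappa @ alpha                                         # predictions (8)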
In this task, you shall apply kernel ridge regression to reconstruct all missing pixels in the grayscale image cc_90.png (available under external link on MaMPF). The features X_i are the pixel coordinates, and the response y_i is the corresponding grayvalue. Pixels with grayvalue = 0 are considered missing and shall be replaced with their regressed values. Use a squared exponential kernel function
K(x, x') = \exp\!\left( -\frac{\| x - x' \|^2}{2 \sigma^2} \right) .    (10)
If you are careless in structuring your program, it will be too slow to run. Therefore, keep to the following guidelines:
Cut off the kernel function (i.e. set it to zero) at a sensibly large radius to get a sparse kernel matrix, and use a sparse array class.
Do not loop over each cell in the kernel matrix, vectorize at least one dimension.
Do not explicitly invert any matrices. In general, but especially for sparse matrices, solving Ax = b for x is many times faster than computing x = A^{-1} b explicitly. For example, you can use scipy.sparse.linalg.spsolve; a sketch that follows these guidelines is given below.
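One possible way to follow these guidelines is sketched here; sigma, tau and the cutoff radius are placeholder values you will have to tune, and the row-wise construction of the Gram matrix is just one way to vectorize a single dimension:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def se_row(x, X, sigma, cutoff):
    # kernel values of one point x against all rows of X, cut off beyond `cutoff`
    d2 = np.sum((X - x) ** 2, axis=1)                  # vectorized over all training points
    idx = np.flatnonzero(d2 <= cutoff ** 2)
    return idx, np.exp(-d2[idx] / (2.0 * sigma ** 2))

def sparse_gram(X, sigma, cutoff):
    # sparse Gram matrix (9); loop over rows only, never over single cells
    rows, cols, vals = [], [], []
    for i, x in enumerate(X):
        idx, v = se_row(x, X, sigma, cutoff)
        rows.extend([i] * len(idx)); cols.extend(idx); vals.extend(v)
    return sp.csr_matrix((vals, (rows, cols)), shape=(len(X), len(X)))

# sketch of the overall pipeline (X_train/y_train: coordinates and grayvalues of known pixels)
# G = sparse_gram(X_train, sigma=2.0, cutoff=8.0)
# A = (G + tau * sp.identity(G.shape[0])).tocsc()
# alpha = spsolve(A, y_train)                          # solves A alpha = y, no explicit inverse
# for each missing pixel x_new:
#     idx, kappa = se_row(x_new, X_train, sigma=2.0, cutoff=8.0)
#     y_new = kappa @ alpha[idx]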
Do the same experiment for the Nadaraya-Watson kernel regression (a heuristic simplification of kernel ridge regression, which does not require the expensive pre-computed weights α):
y_{\text{new}} = \frac{\sum_i \kappa_i y_i}{\sum_i \kappa_i} .    (11)
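A compact sketch of (11), reusing the hypothetical se_row helper from the sparse sketch above (the same cutoff applies, so pixels without training neighbours inside the cutoff need special handling):

def nadaraya_watson(x_new, X_train, y_train, sigma, cutoff):
    # weighted average of the training grayvalues, with the kernel values as weights, cf. (11)
    idx, kappa = se_row(x_new, X_train, sigma, cutoff)
    if kappa.size == 0:                        # no neighbours within the cutoff radius
        return 0.0                             # or enlarge the cutoff for this pixel
    return np.dot(kappa, y_train[idx]) / np.sum(kappa)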
Play with the σ of the squared exponential as well as the τ of the ridge regression and find the parameters that produce the visually best reconstructed image for both approaches. Comment on the quality of the two methods.
3 Fitting Circles
In this exercise we will have a closer look at fitting circles to data. The numpy file circles.npy (available under external link on MaMPF) contains many pairs of x-y-coordinates, and can be loaded through data = np.load("circles.npy"). Visualize the data in a scatter plot to show that the points are arranged in the shape of several circles and circle segments. Pay attention that the axes are scaled identically when plotting the data, otherwise your circles will look like ellipses. How many circles or circle segments would you fit into the data as a human?
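A minimal sketch of this first step (matplotlib is assumed; the important part is the equal axis scaling):

import numpy as np
import matplotlib.pyplot as plt

data = np.load("circles.npy")               # shape (n_points, 2): x-y coordinates
plt.scatter(data[:, 0], data[:, 1], s=2)
plt.gca().set_aspect("equal")               # identical scaling on both axes, so circles stay circular
plt.show()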
3.1 RANSAC
First, implement the RANSAC algorithm for fitting circles:
For a set number of times N, repeat the following:
Randomly choose 3 points and determine the circle passing through all of them, parametrized through its radius and the coordinates of the center. For example, you can achieve this by forming and solving a system of three equations, one for each point (a sketch of this step is given below the list). Classify points that are closer than ε to the circle as inliers and count them.
If the inlier count for this circle is higher than for the best circle so far, save the current circle and its inliers as the new best.
Fit further circles by deleting all inliers of the last fitted circle from the dataset and repeating the procedure.
Estimate the number of iterations N that is needed, as you learnt in the lecture.
Plot all fitted circles on top of the original data and comment on the result. Is the result sensitive to the value of ε, the parameter that is used to determine the inliers?
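A sketch of the three-point circle fit and the inlier test referenced above; p1, p2, p3 are NumPy arrays of shape (2,), and the linear system below is one standard way to set up the three equations:

import numpy as np

def circle_from_3_points(p1, p2, p3):
    # Subtracting ||p1 - c||^2 = r^2 from the equations for p2 and p3 gives a
    # linear 2x2 system for the center c; the radius follows from any of the points.
    A = 2.0 * np.array([p2 - p1, p3 - p1])
    b = np.array([p2 @ p2 - p1 @ p1, p3 @ p3 - p1 @ p1])
    c = np.linalg.solve(A, b)                 # singular for collinear points -> resample
    return c, np.linalg.norm(p1 - c)

def inlier_indices(points, c, r, eps):
    # points whose distance to the circle line is below eps count as inliers
    dist = np.abs(np.linalg.norm(points - c, axis=1) - r)
    return np.flatnonzero(dist < eps)

# RANSAC loop: repeat N times, keep the circle with the largest inlier count.
# The usual iteration estimate is N >= log(1 - p) / log(1 - w**3), with p the
# desired success probability and w the assumed inlier fraction (see the lecture).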
Hint: For plotting circles, you can use the following methods:
circle = plt.Circle((cx, cy), radius=r, fill=False)  # Create a circle
plt.gca().add_patch(circle)                           # Add it to the plot
In the next two sub-tasks, you will further improve the fits of your circles using two different methods.
If you were not able to implement RANSAC, then either try to fit a circle manually in the data and get the inliers this way, or create data for a circle + noise and use it for the rest of the exercise.
3.2 Algebraic Distance (3 Points)
For each set of inliers you found with RANSAC, fit a circle by minimizing the algebraic distance:
\min_{c, r} \; \sum_i \left( \| x_i - c \|^2 - r^2 \right)^2    (12)
You can use scipy.optimize.least_squares (or another optimization library of your choice) as an implementation. Plot the refined circles on top of the data and comment on the results.
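A possible sketch of the algebraic fit with scipy.optimize.least_squares; the residual per point is the algebraic distance from (12), and the RANSAC circle is a natural initial guess (c0, r0):

import numpy as np
from scipy.optimize import least_squares

def fit_algebraic(points, c0, r0):
    # parameter vector p = (cx, cy, r); residuals are the algebraic distances (12)
    def residuals(p):
        c, r = p[:2], p[2]
        return np.sum((points - c) ** 2, axis=1) - r ** 2
    res = least_squares(residuals, x0=np.array([c0[0], c0[1], r0]))
    return res.x[:2], res.x[2]                # refined center and radius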
3.3 Levenberg-Marquardt
For each set of inliers you found with RANSAC, solve
\min_{c, r} \; \sum_i \left( \| x_i - c \| - r \right)^2    (13)
by adapting the Levenberg-Marquardt algorithm to this problem[1]. Derive the theoretical approach by hand, but feel free to use scipy.optimize.least_squares (or another optimization library of your choice) as an implementation.
For the theory, first solve for r and express it in terms of x and c, then calculate the derivative with respect to c. Again, plot the resulting circles and comment on what you find.
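A sketch of the geometric fit (13): setting the derivative of (13) with respect to r to zero gives r(c) = (1/n) Σ_i ||x_i − c||, so only the center remains to be optimized; scipy's method="lm" stands in here for the hand-derived Levenberg-Marquardt update:

import numpy as np
from scipy.optimize import least_squares

def fit_geometric(points, c0):
    # residual per point: ||x_i - c|| - r(c), with r eliminated analytically, cf. (13)
    def residuals(c):
        d = np.linalg.norm(points - c, axis=1)
        return d - d.mean()
    res = least_squares(residuals, x0=np.asarray(c0, dtype=float), method="lm")
    c = res.x
    return c, np.linalg.norm(points - c, axis=1).mean()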
3.4 Comparison
It was mentioned in the lecture that the solution from (13) is more robust to outliers than solving (12). Try to show this experimentally.
Hint: You can approach this problem for example as follows: Pick one set of inliers, sample a subset of those and maybe add a couple of outliers. Then apply your procedures from 3.2 and 3.3 to this dataset.
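One way this experiment could be scripted, assuming the hypothetical fit_algebraic and fit_geometric helpers sketched above and a variable inliers holding one RANSAC inlier set; the subset size and the number of outliers are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
subset = inliers[rng.choice(len(inliers), size=50, replace=False)]
outliers = rng.uniform(subset.min(axis=0), subset.max(axis=0), size=(5, 2))  # points mostly off the circle
test_set = np.vstack([subset, outliers])

c0 = subset.mean(axis=0)                                    # crude initial guess from the clean subset
r0 = np.linalg.norm(subset - c0, axis=1).mean()
c_alg, r_alg = fit_algebraic(test_set, c0, r0)              # algebraic fit, Section 3.2
c_geo, r_geo = fit_geometric(test_set, c0)                  # geometric fit, Section 3.3
# plot both circles over test_set and compare how strongly the outliers pull them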
[1] Since the Levenberg-Marquardt algorithm was not covered in the lecture this time, you can watch last year’s recordings at https://mampf.mathi.uni-heidelberg.de/media/17998/play, starting around 36’30