Consider the problem of learning a regression model from 5 univariate observations
((0.8), (1), (1.2), (1.4), (1.6)) with targets (24,20,10,13,12).
1) [5v] Consider the basis functions φ_j(x) = x^j for performing a 3rd-order polynomial regression,
   ẑ(x, w) = Σ_{j=0}^{3} w_j φ_j(x) = w_0 + w_1 x + w_2 x^2 + w_3 x^3.
   Learn the Ridge regression (ℓ2 regularization) on the transformed data space using the closed-form solution with λ = 2.
Hint: use numpy matrix operations (e.g., numpy.linalg.pinv for the pseudo-inverse) to validate your calculations.
2) [1v] Compute the training RMSE for the learnt regression model.
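Questions 1 and 2 can be validated with a short numpy sketch. It assumes the intercept w_0 is penalized along with the other coefficients; if the course convention excludes the bias from the ℓ2 penalty, zero out the first diagonal entry of λI:

```python
import numpy as np

x = np.array([0.8, 1.0, 1.2, 1.4, 1.6])
y = np.array([24, 20, 10, 13, 12], dtype=float)
lam = 2.0

# design matrix with polynomial basis functions phi_j(x) = x^j, j = 0..3
Phi = np.vander(x, N=4, increasing=True)

# closed-form Ridge solution: w = (Phi^T Phi + lam I)^{-1} Phi^T y
w = np.linalg.pinv(Phi.T @ Phi + lam * np.eye(4)) @ Phi.T @ y

# training RMSE of the learnt model (question 2)
rmse = np.sqrt(np.mean((y - Phi @ w) ** 2))
print("w =", w)
print("RMSE =", rmse)
```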
3) [6v] Consider a multi-layer perceptron with one hidden layer of 2 nodes. Using the
activation function f(x) = e^(0.1x) on all units, all weights initialized to 1 (including biases), and the
half squared error loss, perform one batch gradient descent update (with learning rate η = 0.1)
for the first three observations (0.8), (1) and (1.2).
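This update can be checked with a numpy sketch, assuming the architecture is 1 input → 2 hidden → 1 output and that the output unit also applies f (as "all units" suggests); the identity f'(z) = 0.1·f(z) simplifies the backward pass:

```python
import numpy as np

f = lambda z: np.exp(0.1 * z)          # activation; note f'(z) = 0.1 * f(z)

X = [0.8, 1.0, 1.2]                    # first three observations
T = [24.0, 20.0, 10.0]                 # corresponding targets
eta = 0.1

# 1 input -> 2 hidden -> 1 output, all weights and biases initialized to 1
W1, b1 = np.ones((2, 1)), np.ones(2)
W2, b2 = np.ones((1, 2)), np.ones(1)

# accumulate gradients over the batch
gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
for x, t in zip(X, T):
    z1 = W1[:, 0] * x + b1             # hidden pre-activations
    h = f(z1)
    z2 = W2 @ h + b2                   # output pre-activation
    o = f(z2)
    d2 = (o - t) * 0.1 * o             # dE/dz2 with E = 0.5 (o - t)^2
    d1 = (W2.T @ d2) * 0.1 * h         # dE/dz1, backpropagated
    gW2 += np.outer(d2, h); gb2 += d2
    gW1 += np.outer(d1, [x]); gb1 += d1

# one batch gradient descent update
W1 -= eta * gW1; b1 -= eta * gb1
W2 -= eta * gW2; b2 -= eta * gb2
print(W1, b1, W2, b2)
```

Since the initial outputs undershoot every target, all accumulated gradients are negative and every parameter increases after the update.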
II. Programming and critical analysis [8v]
Consider the following three regressors applied on kin8nm.arff data (available at the webpage):
− linear regression with a Ridge regularization term of 0.1
− two MLPs – MLP1 and MLP2 – each with two hidden layers of size 10, the hyperbolic tangent
as the activation function of all nodes, a maximum of 500 iterations, and a fixed
seed (random_state=0). MLP1 should be parameterized with early stopping, while MLP2
should not consider early stopping. The remaining parameters (e.g., loss function, batch size,
regularization term, solver) should be set to their defaults.
Using a 70-30 training-test split with a fixed seed (random_state=0):
4) [4v] Compute the MAE of the three regressors: linear regression, MLP1 and MLP2.
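A sklearn sketch of the pipeline; it uses a synthetic dataset as a stand-in for kin8nm.arff (which, once downloaded, can be loaded with scipy.io.arff.loadarff), so the MAE values it prints are illustrative only:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# stand-in data; for the exercise, load kin8nm.arff with scipy.io.arff.loadarff
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# 70-30 training-test split with a fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "ridge": Ridge(alpha=0.1),
    "MLP1": MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                         max_iter=500, random_state=0, early_stopping=True),
    "MLP2": MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                         max_iter=500, random_state=0),  # no early stopping
}

maes = {name: mean_absolute_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
        for name, m in models.items()}
print(maes)
```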
5) [1.5v] Plot the residuals (in absolute value) using two visualizations: boxplots and histograms.
Hint: consider using the boxplot and hist functions from matplotlib.pyplot to this end.
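A minimal matplotlib sketch of the two visualizations, using randomly generated absolute residuals as a placeholder for |y_test − ŷ_test|:

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend for scripting
import matplotlib.pyplot as plt
import numpy as np

# placeholder residuals; for the exercise use np.abs(y_test - model.predict(X_test))
rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(size=200))

fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
ax_box.boxplot(residuals)
ax_box.set_title("absolute residuals (boxplot)")
ax_hist.hist(residuals, bins=20)
ax_hist.set_title("absolute residuals (histogram)")
fig.savefig("residuals.png")
```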
6) [1v] How many iterations were required for MLP1 and MLP2 to converge?
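The iteration counts can be read from the fitted models' n_iter_ attribute; the sketch below again uses stand-in data in place of the kin8nm training split:

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# stand-in data; for the exercise, fit on the kin8nm training split
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

for early in (True, False):
    mlp = MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                       max_iter=500, random_state=0,
                       early_stopping=early).fit(X, y)
    print("early_stopping =", early, "-> n_iter_ =", mlp.n_iter_)
```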
7) [1.5v] What might be motivating the unexpected differences in the number of iterations?
Hypothesize one reason underlying the observed performance differences between the two MLPs.
END