Machine Learning Homework 3 - Solved

Consider the problem of learning a regression model from 5 univariate observations ((0.8), (1), (1.2), (1.4), (1.6)) with targets (24, 20, 10, 13, 12).
1) [5v] Consider the basis functions $\phi_j(x) = x^j$ for performing a third-order polynomial regression,

$$\hat{z}(x, \mathbf{w}) = \sum_{j=0}^{3} w_j \phi_j(x) = w_0 + w_1 x + w_2 x^2 + w_3 x^3.$$

Learn the Ridge regression (๐‘™
2 regularization) on the transformed data space using the closed 
form solution with ๐œ† = 2. 
Hint: use NumPy matrix operations (e.g., linalg.pinv for the inverse) to validate your calculations.
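A minimal NumPy sketch of the closed-form Ridge solution $\mathbf{w} = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top \mathbf{z}$, where $\Phi$ is the $5 \times 4$ design matrix whose rows are $[1, x, x^2, x^3]$. It assumes the penalty covers all four weights, bias $w_0$ included; if your course excludes the bias from regularization, zero the first diagonal entry of $\lambda I$. The helper names `design_matrix` and `ridge_closed_form` are illustrative.

```python
import numpy as np

# Training data from the statement: five univariate inputs and their targets.
x = np.array([0.8, 1.0, 1.2, 1.4, 1.6])
z = np.array([24.0, 20.0, 10.0, 13.0, 12.0])


def design_matrix(x, degree=3):
    """Rows [x^0, x^1, ..., x^degree], i.e. phi_j(x) = x^j."""
    return np.vander(x, N=degree + 1, increasing=True)


def ridge_closed_form(phi, z, lam):
    """w = (Phi^T Phi + lam * I)^-1 Phi^T z.

    Assumption: the l2 penalty covers every weight, bias w0 included.
    """
    gram = phi.T @ phi + lam * np.eye(phi.shape[1])
    return np.linalg.pinv(gram) @ phi.T @ z


phi = design_matrix(x)                  # 5 x 4 design matrix
w = ridge_closed_form(phi, z, lam=2.0)  # [w0, w1, w2, w3]
print("w =", np.round(w, 4))
```

`np.vander(..., increasing=True)` builds the polynomial features directly, so the same helper works for any degree.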
2) [1v] Compute the training RMSE for the learnt regression model. 
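The training RMSE is $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(z_i - \hat{z}_i)^2}$ over the $N = 5$ training points. This sketch recomputes the Ridge weights with $\lambda = 2$ so it runs on its own, again assuming the bias $w_0$ is penalized too; `rmse` is an illustrative helper.

```python
import numpy as np

x = np.array([0.8, 1.0, 1.2, 1.4, 1.6])
z = np.array([24.0, 20.0, 10.0, 13.0, 12.0])

# Cubic design matrix and Ridge weights (lambda = 2, bias penalized: an assumption).
phi = np.vander(x, N=4, increasing=True)
w = np.linalg.pinv(phi.T @ phi + 2.0 * np.eye(4)) @ phi.T @ z


def rmse(z_true, z_pred):
    """Root mean squared error over all observations."""
    return np.sqrt(np.mean((z_true - z_pred) ** 2))


z_hat = phi @ w  # predictions on the training inputs
train_rmse = rmse(z, z_hat)
print(f"training RMSE = {train_rmse:.4f}")
```

Since ordinary least squares minimizes the training error, the Ridge RMSE can only match or exceed that of the unregularized fit; the penalty trades training error for smaller weights.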
3) [6v] Consider a multi-layer perceptron with one hidden layer of 2 nodes. Using the activation function $f(x) = e^{0.1x}$ on all units, all weights initialized to 1 (including biases), and the half squared error loss, perform one batch gradient descent update (with learning rate $\eta = 0.1$) for the first three observations (0.8), (1) and (1.2).
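A hand-rolled NumPy sketch of one batch update for a 1-2-1 network (one input, two hidden units, one output), with $f$ applied to the output unit as well, since the statement says "all units". It uses $f'(x) = 0.1\,e^{0.1x}$ and the loss $E = \frac{1}{2}\sum_n (t_n - \hat{z}_n)^2$ with targets 24, 20 and 10. Assumption: the batch gradient is the sum (not the mean) of the per-observation gradients; divide by 3 if your course averages. `forward` and `half_sse` are illustrative helpers.

```python
import numpy as np


def f(v):
    """Activation on every unit (hidden and output): f(v) = exp(0.1 v)."""
    return np.exp(0.1 * v)


def f_prime(v):
    """Derivative of the activation: f'(v) = 0.1 exp(0.1 v)."""
    return 0.1 * np.exp(0.1 * v)


def forward(X, W1, b1, W2, b2):
    """Forward pass; X holds one observation per column."""
    Z1 = W1 @ X + b1   # (2, n) hidden pre-activations
    A1 = f(Z1)         # (2, n) hidden outputs
    Z2 = W2 @ A1 + b2  # (1, n) output pre-activation
    A2 = f(Z2)         # (1, n) network predictions
    return Z1, A1, Z2, A2


def half_sse(T, Y):
    """Half squared error summed over the batch: 1/2 * sum_n (t_n - y_n)^2."""
    return 0.5 * np.sum((T - Y) ** 2)


# First three observations (inputs 0.8, 1, 1.2) and their targets.
X = np.array([[0.8, 1.0, 1.2]])
T = np.array([[24.0, 20.0, 10.0]])

# 1-2-1 network: every weight and bias starts at 1.
W1, b1 = np.ones((2, 1)), np.ones((2, 1))
W2, b2 = np.ones((1, 2)), np.ones((1, 1))
eta = 0.1

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
loss_before = half_sse(T, A2)

# Backpropagation: delta = dE/d(pre-activation) for each layer.
delta2 = (A2 - T) * f_prime(Z2)         # (1, n)
delta1 = (W2.T @ delta2) * f_prime(Z1)  # (2, n), uses W2 before the update

# Batch gradients: sum the per-observation contributions (columns).
dW2, db2 = delta2 @ A1.T, delta2.sum(axis=1, keepdims=True)
dW1, db1 = delta1 @ X.T, delta1.sum(axis=1, keepdims=True)

# One batch gradient-descent step with learning rate eta.
W1, b1 = W1 - eta * dW1, b1 - eta * db1
W2, b2 = W2 - eta * dW2, b2 - eta * db2

loss_after = half_sse(T, forward(X, W1, b1, W2, b2)[3])
print("W1 =", W1.ravel(), "b1 =", b1.ravel())
print("W2 =", W2.ravel(), "b2 =", b2.ravel())
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Because both hidden units start with identical weights, they receive identical gradients and stay identical after the update.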
 
II. Programming and critical analysis [8v] 
Consider the following three regressors applied to the kin8nm.arff data (available on the webpage):
− linear regression with a Ridge regularization term of 0.1
− two MLPs, MLP1 and MLP2, each with two hidden layers of size 10, the hyperbolic tangent
function as the activation function of all nodes, a maximum of 500 iterations, and a fixed
seed (random_state=0). MLP1 should be parameterized with early stopping, while MLP2
should not use early stopping. The remaining parameters (e.g., loss function, batch size,
regularization term, solver) should be left at their default values.
Using a 70-30 training-test split with a fixed seed (random_state=0): 
4) [4v] Compute the MAE of the three regressors: linear regression, MLP1 and MLP2.
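A scikit-learn sketch for question 4: `Ridge(alpha=0.1)` plays the linear model, and both MLPs are `MLPRegressor` instances with the listed settings, everything else at its defaults. Assumptions: the file is read with `scipy.io.arff.loadarff`, sits at `kin8nm.arff` in the working directory, and its last attribute is the regression target; `load_kin8nm`, `build_regressors` and `evaluate_mae` are hypothetical helper names.

```python
import os

import pandas as pd
from scipy.io import arff
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

DATA_PATH = "kin8nm.arff"  # assumed location of the dataset


def load_kin8nm(path):
    """Return (X, y); assumes the target is the last ARFF attribute."""
    data, _meta = arff.loadarff(path)
    df = pd.DataFrame(data)
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    y = df.iloc[:, -1].to_numpy(dtype=float)
    return X, y


def build_regressors():
    """The three regressors from the statement; unlisted parameters keep sklearn defaults."""
    common = dict(hidden_layer_sizes=(10, 10), activation="tanh",
                  max_iter=500, random_state=0)
    return {
        "linear (Ridge)": Ridge(alpha=0.1),
        "MLP1": MLPRegressor(early_stopping=True, **common),
        "MLP2": MLPRegressor(early_stopping=False, **common),
    }


def evaluate_mae(X, y):
    """70-30 split with random_state=0; fit each model and return its test MAE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    maes = {}
    for name, model in build_regressors().items():
        model.fit(X_train, y_train)
        maes[name] = mean_absolute_error(y_test, model.predict(X_test))
    return maes


# Run only when the dataset file is actually present.
if os.path.exists(DATA_PATH):
    X, y = load_kin8nm(DATA_PATH)
    for name, mae in evaluate_mae(X, y).items():
        print(f"{name}: MAE = {mae:.4f}")
```

If MLP2 runs all 500 iterations without meeting its tolerance, scikit-learn emits a `ConvergenceWarning`; that is a warning, not an error.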
5) [1.5v] Plot the residuals (in absolute value) using two visualizations: boxplots and histograms.
Hint: consider using the boxplot and hist functions from matplotlib.pyplot to this end.
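One way to draw both views side by side with `matplotlib.pyplot`, using `boxplot` and `hist` as the hint suggests. `plot_abs_residuals` is a hypothetical helper that takes a dict mapping each model name to its absolute test residuals $|z - \hat{z}|$. The guarded data-loading part assumes `kin8nm.arff` is in the working directory with the target as its last attribute.

```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.io import arff
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def plot_abs_residuals(abs_res):
    """Boxplots (left) and grouped histograms (right) of |y - y_hat| per model."""
    names = list(abs_res)
    values = [abs_res[n] for n in names]
    fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(11, 4))

    ax_box.boxplot(values)
    ax_box.set_xticks(range(1, len(names) + 1))
    ax_box.set_xticklabels(names)
    ax_box.set_ylabel("|residual|")
    ax_box.set_title("Absolute residuals (boxplot)")

    ax_hist.hist(values, bins=30, label=names)  # one bar series per model
    ax_hist.set_xlabel("|residual|")
    ax_hist.set_ylabel("count")
    ax_hist.set_title("Absolute residuals (histogram)")
    ax_hist.legend()

    fig.tight_layout()
    return fig


DATA_PATH = "kin8nm.arff"  # assumed location of the dataset
if os.path.exists(DATA_PATH):
    df = pd.DataFrame(arff.loadarff(DATA_PATH)[0])
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    y = df.iloc[:, -1].to_numpy(dtype=float)  # target assumed to be the last column
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    common = dict(hidden_layer_sizes=(10, 10), activation="tanh",
                  max_iter=500, random_state=0)
    models = {
        "linear": Ridge(alpha=0.1),
        "MLP1": MLPRegressor(early_stopping=True, **common),
        "MLP2": MLPRegressor(early_stopping=False, **common),
    }
    abs_res = {n: np.abs(y_te - m.fit(X_tr, y_tr).predict(X_te))
               for n, m in models.items()}
    plot_abs_residuals(abs_res)
    plt.show()
```

Tick labels are set with `set_xticks`/`set_xticklabels` rather than the `boxplot` label keyword, whose name changed across Matplotlib versions.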
6) [1v] How many iterations were required for MLP1 and MLP2 to converge?
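In scikit-learn, a fitted `MLPRegressor` exposes the number of epochs it actually ran as `n_iter_`. The sketch reads it for both MLPs fitted on the 70% training split; `mlp_iterations` is a hypothetical helper. A value equal to 500 means the model stopped at `max_iter` rather than meeting its tolerance criterion.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def mlp_iterations(X, y):
    """Fit MLP1 (early stopping) and MLP2 (no early stopping) on the 70% split
    and return the number of epochs each one ran (n_iter_)."""
    X_tr, _X_te, y_tr, _y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    common = dict(hidden_layer_sizes=(10, 10), activation="tanh",
                  max_iter=500, random_state=0)
    iters = {}
    for name, early in (("MLP1", True), ("MLP2", False)):
        model = MLPRegressor(early_stopping=early, **common).fit(X_tr, y_tr)
        iters[name] = model.n_iter_
    return iters
```

With the kin8nm features and target loaded as `X` and `y`, `mlp_iterations(X, y)` returns a dict with one iteration count per MLP.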
7) [1.5v] What could explain the unexpected differences in the number of iterations?
Hypothesize one reason underlying the observed performance differences between the MLPs.
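The stopping rules are a natural place to look. Per the scikit-learn documentation, with `early_stopping=True` the model sets aside `validation_fraction` (default 0.1) of the training data and stops once the validation score has not improved by at least `tol` (default 1e-4) for `n_iter_no_change` (default 10) consecutive epochs; with `early_stopping=False` the same rule is applied to the training loss instead. The sketch below, built around a hypothetical `stopping_report` helper, prints those settings and plots the training loss (`loss_curve_`) alongside the validation R² (`validation_scores_`, populated only when early stopping is on), as evidence for or against a hypothesis.

```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

STOP_PARAMS = ("early_stopping", "validation_fraction", "n_iter_no_change", "tol", "max_iter")


def stopping_report(model: MLPRegressor):
    """Print the stopping-related settings of a fitted MLPRegressor and plot its curves.

    Returns the figure; a second (twin) axis is added only when validation scores exist.
    """
    params = model.get_params()
    settings = {k: params[k] for k in STOP_PARAMS}
    print(settings, "n_iter_ =", model.n_iter_)

    fig, ax_loss = plt.subplots(figsize=(6, 4))
    ax_loss.plot(model.loss_curve_, color="tab:blue")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("training loss", color="tab:blue")

    # validation_scores_ stays None unless early_stopping=True.
    if getattr(model, "validation_scores_", None) is not None:
        ax_val = ax_loss.twinx()
        ax_val.plot(model.validation_scores_, color="tab:orange")
        ax_val.set_ylabel("validation R^2", color="tab:orange")

    fig.tight_layout()
    return fig
```

Comparing where the validation curve flattens with where the training loss flattens shows which criterion ended each run.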
END

More products