Generating Synthetic Data
This assignment shows how we can extend ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions and radial basis functions. The function we want to fit is $y_{\mathrm{true}} = f_{\mathrm{true}}(x) = 6(\sin(x + 2) + \sin(2x + 4))$. This is a univariate function, as it has only one input variable. First, we generate synthetic inputs $x_i$ by sampling $n = 750$ points from a uniform distribution on the interval $[-7.5, 7.5]$.
We can then generate a synthetic data set by adding Gaussian noise.
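The cell below calls a helper f_true that is not shown in this excerpt; a minimal sketch of such a helper, directly implementing the formula above, might look like this:
In [ ]: import numpy as np

        def f_true(x):
            # True function: f(x) = 6 * (sin(x + 2) + sin(2x + 4))
            return 6.0 * (np.sin(x + 2) + np.sin(2 * x + 4))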
In [ ]: import numpy as np                    # For all our math needs
        n = 750                               # Number of data points
        X = np.random.uniform(-7.5, 7.5, n)   # Training examples, in one dimension
        e = np.random.normal(0.0, 5.0, n)     # Random Gaussian noise
        y = f_true(X) + e                     # True labels with noise
Now, we plot the raw data as well as the true function (without noise).
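The plotting cell itself is not reproduced here; a rough sketch (assuming matplotlib and the f_true helper above) could be:
In [ ]: import matplotlib.pyplot as plt                      # Standard plotting library

        x_grid = np.linspace(-7.5, 7.5, 500)                 # Dense grid for a smooth curve
        plt.figure()
        plt.scatter(X, y, 12, marker='o')                    # Noisy observations
        plt.plot(x_grid, f_true(x_grid), 'k-', linewidth=2)  # True function, without noise
        plt.xlabel('x', fontsize=16)
        plt.ylabel('y', fontsize=16)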
Recall that we want to build a model that generalizes well to future data; to do so, we need to pick a model that achieves a good trade-off between fit and complexity (that is, between bias and variance). We randomly split the overall data set D into three subsets:
Training set: Dtrn consists of the actual training examples that will be used to train the model;
Validation set: Dval consists of validation examples that will be used to tune model hyperparameters (such as λ > 0 in ridge regression) in order to find the best trade-off between fit and complexity (that is, the value of λ that produces the best model);
Test set: Dtst consists of test examples that will be used to estimate how the final model will perform on future, unseen data.
For this example, let us randomly partition the data into three non-intersecting sets: Dtrn = 60% of D, Dval = 10% of D and Dtst = 30% of D.
In [ ]: # scikit-learn has many tools and utilities for model selection
        from sklearn.model_selection import train_test_split
        import matplotlib.pyplot as plt   # For plotting

        tst_frac = 0.3  # Fraction of examples to sample for the test set
        val_frac = 0.1  # Fraction of examples to sample for the validation set

        # First, we use train_test_split to partition (X, y) into training and test sets
        X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_frac, random_state=42)

        # Next, we use train_test_split to further partition (X_trn, y_trn) into training and validation sets
        X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_size=val_frac, random_state=42)

        # Plot the three subsets
        plt.figure()
        plt.scatter(X_trn, y_trn, 12, marker='o', color='orange')
        plt.scatter(X_val, y_val, 12, marker='o', color='green')
        plt.scatter(X_tst, y_tst, 12, marker='o', color='blue')
1. **Regression with Polynomial Basis Functions**, 30 points.
This problem extends ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions. In order to learn nonlinear models using linear regression, we have to explicitly transform the data into a higher-dimensional space. The nonlinear hypothesis class we will consider is the set of $d$-degree polynomials of the form $f(x) = w_0 + w_1 x + w_2 x^2 + \ldots + w_d x^d$, or a linear combination of polynomial basis functions:
$$f(x) = [w_0, w_1, w_2, \ldots, w_d]^T \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^d \end{bmatrix}.$$
The monomials $\{1, x, x^2, \ldots, x^d\}$ are called basis functions, and each basis function $x^k$ has a corresponding weight $w_k$ associated with it, for all $k = 1, \ldots, d$. We transform each univariate data point $x_i$ into a multivariate ($d$-dimensional) data point via $\phi(x_i) \rightarrow [1, x_i, x_i^2, \ldots, x_i^d]$. When this transformation is applied to every data point, it produces the Vandermonde matrix:
$$\Phi = \begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^d \\ 1 & x_2 & x_2^2 & \ldots & x_2^d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^d \end{bmatrix}.$$
a. (10 points)
Complete the Python function below that takes univariate data as input and computes a Vandermonde matrix of dimension d. This transforms one-dimensional data into d-dimensional data in terms of the polynomial basis and allows us to model regression using a d-degree polynomial.
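For reference, one possible implementation of polynomial_transform (the name used in the model-selection loop below) is a thin wrapper around numpy.vander; treat this as a sketch, not the required solution:
In [ ]: def polynomial_transform(X, d):
            # Map each scalar x to the row [1, x, x^2, ..., x^d]
            return np.vander(X, N=d + 1, increasing=True)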
b.
Complete the Python function below that takes a Vandermonde matrix Φ and the labels y as input and learns weights via ordinary least squares regression. Specifically, given a Vandermonde matrix Φ, implement the computation of $w = (\Phi^T \Phi)^{-1} \Phi^T y$. Remember that in Python, @ performs matrix multiplication, while * performs element-wise multiplication. Alternately, numpy.dot (https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.dot.html) also performs matrix multiplication.
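A minimal sketch of such a train_model function (the name used in the loop below), computing the closed-form OLS solution from the formula above:
In [ ]: def train_model(Phi, y):
            # Ordinary least squares: w = (Phi^T Phi)^{-1} Phi^T y
            return np.linalg.inv(Phi.T @ Phi) @ (Phi.T @ y)
In practice, numpy.linalg.solve(Phi.T @ Phi, Phi.T @ y) avoids forming the explicit inverse and is usually more numerically stable.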
c. (5 points)
Complete the Python function below that takes a Vandermonde matrix Φ, corresponding labels y, and a linear regression model w as input and evaluates the model using mean squared error. That is,
$$\epsilon_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - w^T \Phi_i\right)^2.$$
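A sketch of such an evaluate_model function (the name used in the loop below):
In [ ]: def evaluate_model(Phi, y, w):
            # Mean squared error between the labels y and the predictions Phi @ w
            return np.mean((y - Phi @ w) ** 2)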
d. (5 points, Discussion)
We can explore the effect of model complexity by varying $d = 3, 6, 9, \ldots, 24$ to steadily increase the non-linearity of the models. For each model, we train using the transformed training data (Φ, whose dimension increases with d), evaluate its performance on the transformed validation data, and estimate our future accuracy using the transformed test data.
From the plot of d vs. validation error below, which choice of d do you expect will generalize best?
In [ ]: w = {}               # Dictionary to store all the trained models
        validationErr = {}   # Validation error of the models
        testErr = {}         # Test error of all the models

        for d in range(3, 25, 3):  # Iterate over polynomial degree
            Phi_trn = polynomial_transform(X_trn, d)                 # Transform training data into d dimensions
            w[d] = train_model(Phi_trn, y_trn)                       # Learn model on training data

            Phi_val = polynomial_transform(X_val, d)                 # Transform validation data into d dimensions
            validationErr[d] = evaluate_model(Phi_val, y_val, w[d])  # Evaluate model on validation data

            Phi_tst = polynomial_transform(X_tst, d)                 # Transform test data into d dimensions
            testErr[d] = evaluate_model(Phi_tst, y_tst, w[d])        # Evaluate model on test data

        # Plot the validation and test errors for all the models
        plt.figure()
        plt.plot(list(validationErr.keys()), list(validationErr.values()), marker='o', linewidth=3, markersize=12)
        plt.plot(list(testErr.keys()), list(testErr.values()), marker='s', linewidth=3, markersize=12)
        plt.xlabel('Polynomial degree', fontsize=16)
        plt.ylabel('Validation/Test error', fontsize=16)
        plt.xticks(list(validationErr.keys()), fontsize=12)
        plt.legend(['Validation Error', 'Test Error'], fontsize=16)
        plt.axis([2, 25, 15, 60])
Finally, let's visualize each learned model.
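The visualization cell is not reproduced here; one way to overlay the learned polynomials on the true function (a sketch, assuming the helpers and the dictionary w defined above):
In [ ]: x_grid = np.linspace(-7.5, 7.5, 1000)                # Dense grid for smooth curves
        plt.figure()
        plt.plot(x_grid, f_true(x_grid), 'k-', linewidth=3)  # True function
        for d in range(3, 25, 3):                            # The learned models from the loop above
            Phi_grid = polynomial_transform(x_grid, d)
            plt.plot(x_grid, Phi_grid @ w[d], linewidth=2)   # Learned d-degree polynomial
        plt.legend(['true'] + ['d = {0}'.format(d) for d in range(3, 25, 3)], fontsize=12)
        plt.axis([-8, 8, -15, 15])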
2. **Regression with Radial Basis Functions**, 70 points
In the previous case, we considered a nonlinear extension to linear regression using a linear combination of polynomial basis functions, where each basis function was introduced as a feature $\phi(x) = x^k$. Now, we consider Gaussian radial basis functions of the form
$$\phi(x) = e^{-\gamma (x - \mu)^2},$$
whose shape is defined by its center μ and its width γ > 0. In the case of polynomial basis regression, the user's choice of the dimension d determined the transformation and the model. For radial basis regression, we have to contend with deciding how many radial basis functions we should have, and what their center and width parameters should be. For simplicity, let's assume that γ = 0.1 is fixed. Instead of trying to identify the number of radial basis functions or their centers, we can treat **each data point as the center of a radial basis function**, which means that the model will be:
$$f(x) = [w_1, w_2, w_3, \ldots, w_n]^T \begin{bmatrix} e^{-\gamma (x - x_1)^2} \\ e^{-\gamma (x - x_2)^2} \\ e^{-\gamma (x - x_3)^2} \\ \vdots \\ e^{-\gamma (x - x_n)^2} \end{bmatrix}.$$
This transformation uses radial basis functions centered around the data points, $e^{-\gamma (x - x_i)^2}$, and each basis function has a corresponding weight $w_i$ associated with it, for all $i = 1, \ldots, n$. We transform each univariate data point $x_j$ into a multivariate ($n$-dimensional) data point via $\phi(x_j) \rightarrow [e^{-\gamma (x_j - x_1)^2}, \ldots, e^{-\gamma (x_j - x_n)^2}]$. When this transformation is applied to every data point, it produces the radial-basis kernel matrix:
$$\Phi = \begin{bmatrix} 1 & e^{-\gamma (x_1 - x_2)^2} & e^{-\gamma (x_1 - x_3)^2} & \ldots & e^{-\gamma (x_1 - x_n)^2} \\ e^{-\gamma (x_2 - x_1)^2} & 1 & e^{-\gamma (x_2 - x_3)^2} & \ldots & e^{-\gamma (x_2 - x_n)^2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e^{-\gamma (x_n - x_1)^2} & e^{-\gamma (x_n - x_2)^2} & e^{-\gamma (x_n - x_3)^2} & \cdots & 1 \end{bmatrix}.$$
a. (15 points)
Complete the Python function below that takes univariate data as input and computes a radial-basis kernel. This transforms one-dimensional data into n-dimensional data in terms of Gaussian radial-basis functions centered at each data point and allows us to model nonlinear (kernel) regression.
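The function name and signature are not fixed by the text above, so the following is only a sketch; it assumes a function radial_basis_transform(X, B, gamma) where B holds the centers (here, the training points themselves) and gamma defaults to 0.1:
In [ ]: def radial_basis_transform(X, B, gamma=0.1):
            # Entry (i, j) is exp(-gamma * (X[i] - B[j])^2); B is the vector of centers
            return np.exp(-gamma * (X[:, np.newaxis] - B[np.newaxis, :]) ** 2)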
b.
Complete the Python function below that takes a radial-basis kernel matrix Φ, the labels y, and a regularization parameter λ > 0 as input and learns weights via ridge regression. Specifically, given a radial-basis kernel matrix Φ, implement the computation of $w = (\Phi^T \Phi + \lambda I_n)^{-1} \Phi^T y$.
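A minimal sketch of the ridge-regression solver; the name train_ridge_model is an assumption, not given in the text:
In [ ]: def train_ridge_model(Phi, y, lam):
            # Ridge regression: w = (Phi^T Phi + lam * I_n)^{-1} Phi^T y
            n = Phi.shape[1]
            return np.linalg.inv(Phi.T @ Phi + lam * np.eye(n)) @ (Phi.T @ y)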
c.
As before, we can explore the tradeoff between fit and complexity by varying $\lambda \in \{10^{-3}, 10^{-2}, \ldots, 1, \ldots, 10^3\}$. For each model, train using the transformed training data (Φ) and evaluate its performance on the transformed validation and test data. Plot two curves: (i) λ vs. validation error and (ii) λ vs. test error, as above.
What are some ideal values of λ?
In [ ]: #
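One possible shape for this cell, mirroring the polynomial-degree sweep above (radial_basis_transform and train_ridge_model are the assumed names from the sketches above):
In [ ]: w_rbf = {}           # Trained models, indexed by the exponent of lambda
        validationErr = {}   # Validation error of the models
        testErr = {}         # Test error of the models

        Phi_trn = radial_basis_transform(X_trn, X_trn)   # Centers are the training points
        Phi_val = radial_basis_transform(X_val, X_trn)
        Phi_tst = radial_basis_transform(X_tst, X_trn)

        for p in range(-3, 4):                           # lambda = 10^p for p = -3, ..., 3
            lam = 10.0 ** p
            w_rbf[p] = train_ridge_model(Phi_trn, y_trn, lam)
            validationErr[p] = evaluate_model(Phi_val, y_val, w_rbf[p])
            testErr[p] = evaluate_model(Phi_tst, y_tst, w_rbf[p])

        # Plot validation and test error as a function of lambda (on a log10 scale)
        plt.figure()
        plt.plot(list(validationErr.keys()), list(validationErr.values()), marker='o', linewidth=3, markersize=12)
        plt.plot(list(testErr.keys()), list(testErr.values()), marker='s', linewidth=3, markersize=12)
        plt.xlabel('log10(lambda)', fontsize=16)
        plt.ylabel('Validation/Test error', fontsize=16)
        plt.legend(['Validation Error', 'Test Error'], fontsize=16)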
d.
Plot the learned models as well as the true model, as in the polynomial basis case above. How does the linearity of the model change with λ?
In [ ]: #
You have to submit a single .py file that contains all the code.