1. Linear Algebra Review
Let $\theta := (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$ be a vector, and $\theta_0 \in \mathbb{R}$ be a scalar. Let the hyperplane $H$ be the set of all points $x := (x_1, \dots, x_d) \in \mathbb{R}^d$ such that $\theta^\top x + \theta_0 = 0$, where
$$\theta^\top x = \theta_1 x_1 + \cdots + \theta_d x_d$$
is the dot product. The goal is to find the shortest distance between $H$ and a point $y \in \mathbb{R}^d$. There are many ways to solve this problem, but we will be using Lagrange multipliers to familiarize ourselves with this powerful method.
Let $\tilde{x}$ be the point on $H$ that is closest to $y$. Then $\tilde{x}$ solves the optimization problem
$$\underset{x \in \mathbb{R}^d}{\text{minimize}} \;\; (x - y)^\top (x - y) \quad \text{subject to} \quad \theta^\top x + \theta_0 = 0.$$
The Lagrangian for this optimization problem is
$$L(x, \lambda) = (x - y)^\top (x - y) + \lambda \, (\theta^\top x + \theta_0),$$
where $\lambda$ is the Lagrange multiplier.
1.1. Write down the derivatives of $L(x, \lambda)$ with respect to $x_1, \dots, x_d$ and $\lambda$.
1.2. Equate the derivatives to zero, and solve the equations to find $\tilde{x}$.
1.3. Use $\tilde{x}$ to find the distance from $y$ to the hyperplane $H$.
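A hand-derived answer to 1.2 and 1.3 can be sanity-checked numerically. The sketch below is not part of the assignment and does not use Lagrange multipliers: it writes every point of the hyperplane as a particular point plus a combination of directions orthogonal to the normal vector, then solves an ordinary least-squares problem. The helper name `closest_point_on_hyperplane` and the example numbers are hypothetical, and the code assumes a nonzero normal vector and dimension at least 2.

```python
import numpy

def closest_point_on_hyperplane(theta, theta0, y):
    """Numerically find the point of {x : theta.x + theta0 = 0} nearest to y.

    Hypothetical helper for checking hand-derived answers; assumes
    theta is nonzero and d >= 2.
    """
    theta = numpy.asarray(theta, dtype=float)
    y = numpy.asarray(y, dtype=float)

    # A particular point on H: satisfy the constraint with one coordinate.
    k = numpy.argmax(numpy.abs(theta))
    x_p = numpy.zeros_like(theta)
    x_p[k] = -theta0 / theta[k]

    # Rows 1..d-1 of vh span the directions orthogonal to theta,
    # so every point of H can be written as x_p + N @ t.
    _, _, vh = numpy.linalg.svd(theta[None, :])
    N = vh[1:].T

    # Unconstrained least squares over t: minimize ||x_p + N t - y||^2.
    t, *_ = numpy.linalg.lstsq(N, y - x_p, rcond=None)
    return x_p + N @ t

# Example values chosen for illustration only.
theta, theta0, y = [3.0, 4.0], -5.0, [0.0, 0.0]
x_tilde = closest_point_on_hyperplane(theta, theta0, y)
distance = numpy.linalg.norm(x_tilde - numpy.asarray(y))
print(x_tilde, distance)
```

For these example values the printed point should be close to (0.6, 0.8) and the distance close to 1.0, which can be compared against the closed-form answer from 1.3.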
2. Probability Review
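Let $X$ and $Y$ be independent Poisson random variables, i.e.
$$P(X = x) = \frac{\alpha^x e^{-\alpha}}{x!}, \qquad P(Y = y) = \frac{\beta^y e^{-\beta}}{y!}, \qquad \text{for all integers } x, y \ge 0,$$
for some rates $\alpha, \beta > 0$. Let the random variable $Z = X + Y$ be their sum.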
2.1. Write P(Z = z) as a sum of products of P(X = x) and P(Y = y).
2.2. Show that Z is also Poisson, and find its rate γ.
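A candidate rate from 2.2 can be checked numerically before writing up the proof. The following is a minimal sketch using only the standard library; the function names `poisson_pmf`, `pmf_of_sum`, and `max_gap` are hypothetical, and the rates in the example call are arbitrary. It computes P(Z = z) by brute force and reports how far that is from a Poisson distribution with the candidate rate.

```python
import math

def poisson_pmf(k, rate):
    """P(K = k) for a Poisson random variable with the given rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

def pmf_of_sum(z, alpha, beta):
    """P(X + Y = z) for independent X ~ Poisson(alpha), Y ~ Poisson(beta),
    computed by summing over every way to split z into x + (z - x)."""
    return sum(poisson_pmf(x, alpha) * poisson_pmf(z - x, beta)
               for x in range(z + 1))

def max_gap(alpha, beta, gamma, z_max=30):
    """Largest difference between P(Z = z) and the Poisson(gamma) pmf
    over z = 0, ..., z_max. A gap near zero supports the candidate gamma."""
    return max(abs(pmf_of_sum(z, alpha, beta) - poisson_pmf(z, gamma))
               for z in range(z_max + 1))

# A deliberately wrong candidate rate, so the gap is visibly nonzero.
print(max_gap(alpha=1.5, beta=2.0, gamma=1.0))
```

Replacing `gamma=1.0` with the rate derived in 2.2 should drive the printed gap down to floating-point noise. This is only a numerical check, not a substitute for the proof.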
3. Linear Regression
We will use PyTorch to perform linear regression using gradient descent. Import the Boston housing data from the following link.
https://www.dropbox.com/s/kkeu8nvto35n0dt/boston.csv?dl=1
We will train a linear model that predicts the prices of houses MEDV using three inputs:
(i) average number of rooms per dwelling RM; (ii) index of accessibility to radial highways RAD; (iii) per capita crime rate by town CRIM.
You can access the selected inputs and target variables using the following code:
import matplotlib.pyplot as plt
import numpy

csv = 'boston.csv'
data = numpy.genfromtxt(csv, delimiter=',')
The data contains 506 observations on housing prices in suburban Boston. The first three columns are the inputs RM, RAD and CRIM. The last column is the target MEDV.
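Before converting anything, it can be worth checking that the loaded array has the expected shape and no `nan` entries. Whether boston.csv starts with a header line is not stated here, so the following is a sketch under that assumption: it uses a made-up two-row stand-in for the file (the values are not real data) to show how `numpy.genfromtxt` treats a header, and how its `skip_header` argument drops it.

```python
import io
import numpy

# A made-up two-row stand-in for boston.csv, with a header line.
text = "RM,RAD,CRIM,MEDV\n6.5,1,0.01,24.0\n6.4,2,0.03,21.6\n"

# Without skip_header, the header is parsed as a row of nan values.
raw = numpy.genfromtxt(io.StringIO(text), delimiter=',')
print(raw.shape)
print(numpy.isnan(raw[0]))

# With skip_header=1, only the numeric rows remain.
clean = numpy.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(clean.shape, numpy.isnan(clean).any())
```

If `data` loaded from the real file has one more row than the number of observations and a first row of `nan`, passing `skip_header=1` is one way to fix it.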
Convert the data to PyTorch tensors using the following code.
import torch

inputs = data[:, [0,1,2]]
inputs = inputs.astype(numpy.float32)
inputs = torch.from_numpy(inputs)

target = data[:,3]
target = target.astype(numpy.float32)
target = torch.from_numpy(target)
3.1. Write the code to generate (random) weights $w_{\mathrm{RM}}, w_{\mathrm{RAD}}, w_{\mathrm{CRIM}}$ and bias $b$. After that, write a function to compute the linear model.
3.2. Write a function that computes the mean squared error (MSE).
3.3. Complete the loop below to update the weights and bias using a fixed learning rate (try different values from 0.01 to 0.0001) over 200 iterations/epochs.
4 DUE 14 FEB. TOTAL 40 POINTS.
for i in range(200):
    print("Epoch", i, ":")
    # compute the model predictions
    # compute the loss and its gradient
    print("Loss=", loss)
    with torch.no_grad():
        # update the weights
        # update the bias
        w.grad.zero_()
        b.grad.zero_()
(We use w.grad.zero_() and b.grad.zero_() to reset the gradients to zero because PyTorch accumulates gradients.)
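The accumulation behaviour can be seen on a toy example that is separate from the regression task. This is a minimal sketch with a single scalar parameter and an arbitrary loss chosen only for illustration; it shows that a second call to `backward()` adds to the stored gradient rather than replacing it, and that `zero_()` clears it.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# loss = (3w)^2 = 9 w^2, so d(loss)/dw = 18 w = 36 at w = 2.
loss = (3.0 * w) ** 2
loss.backward()
first = w.grad.item()

# Recompute the same loss and backpropagate again without resetting:
# the new gradient is added to the one already stored in w.grad.
loss = (3.0 * w) ** 2
loss.backward()
second = w.grad.item()

# Reset the stored gradient in place.
w.grad.zero_()
after_reset = w.grad.item()

print(first, second, after_reset)  # 36.0 72.0 0.0
```

Without the reset at the end of each epoch, every update in the training loop would use the sum of all gradients computed so far instead of the current one.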