ECE4710J Homework 3 Solved

Geometry of Least Squares
1. Suppose we have a dataset represented by the design matrix X and response vector Y. We use linear regression and obtain the optimal weights θ̂. Draw the geometric interpretation of the column space of the design matrix, span(X), the response vector Y, the residuals Y − Xθ̂, and the predictions Xθ̂ (using the optimal parameters) and Xα (using an arbitrary vector α).
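As a concrete companion to the picture, here is a minimal numerical sketch (the toy data, sizes, and variable names are assumptions for illustration, not part of the assignment): it computes θ̂ with numpy, forms the predictions Xθ̂ and the residuals Y − Xθ̂, and contrasts them with Xα for an arbitrary α.

```python
import numpy as np

# Toy data: n = 5 observations, p = 2 features (intercept column + one feature).
# All values here are assumed for illustration.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])
Y = 3.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=5)

# Optimal weights: theta_hat minimizes ||Y - X @ theta||^2.
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ theta_hat          # predictions X @ theta_hat, a vector in span(X)
resid = Y - Y_hat              # residuals Y - X @ theta_hat

# The residuals are orthogonal to every column of X, hence to all of span(X).
print(X.T @ resid)             # approximately [0, 0]

# Any other alpha also gives a vector X @ alpha in span(X),
# but its residual Y - X @ alpha is at least as long as the optimal one.
alpha = np.array([1.0, 1.0])
print(np.linalg.norm(Y - X @ alpha) >= np.linalg.norm(resid))   # True
```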

 

(a) What is always true about the residuals in least squares regression? Select all that apply.

□ A. They are orthogonal to the column space of the design matrix.

□ B. They represent the errors of the predictions.

□ C. Their sum is equal to the mean squared error.

□ D. Their sum is equal to zero.

□ E. None of the above.


(b) Which are true about the predictions made by OLS? Select all that apply.

□ A. They are projections of the observations onto the column space of the design matrix.

□ B. They are linear combinations of the features.

□ C. They are orthogonal to the residuals.

□ D. They are orthogonal to the column space of the features.

□ E. None of the above.

(c) We fit a simple linear regression to our data (xᵢ, yᵢ), i = 1, 2, 3, where xᵢ is the independent variable and yᵢ is the dependent variable. Our regression line is of the form ŷ = θ̂₀ + θ̂₁x. Suppose we plot the relationship between the residuals of the model and the fitted values ŷ, and find that there is a curve. What does this tell us about our model? (A sketch of such a residual plot follows the answer choices.)

□ A. The relationship between our dependent and independent variables is well represented by a line.

□ B. The accuracy of the regression line varies with the size of the dependent variable.

□ C. The variables need to be transformed, or additional independent variables are needed.
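For intuition only, here is a minimal sketch (the quadratic toy data and all names are assumed, not taken from the homework) of producing such a residuals-versus-fitted plot: fitting a line to data with curvature leaves a visible curve in the residuals.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed toy data with a quadratic (nonlinear) relationship between x and y.
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.3, size=x.size)

# Fit the simple linear regression y_hat = theta0 + theta1 * x.
X = np.column_stack([np.ones_like(x), x])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta_hat
resid = y - y_hat

# Residuals vs. fitted values: a visible curve suggests the linear form is inadequate.
plt.scatter(y_hat, resid)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```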



Understanding Dimensions
2. In this exercise, we will examine many of the terms that we have been working with in regression (e.g., θ̂) and connect them to their dimensions and to the concepts that they represent.

First, we define some notation. The n × p design matrix X has n observations on p features. (In lecture, we stated that we sometimes say X corresponds to p + 1 features, where the additional feature is a column of all 1s for the intercept term, but strictly speaking that column doesn't need to exist. In this problem, one of the p columns may be a column of all 1s.) Y is the response variable. It is a vector containing the true response for all observations. We assume in this problem that we use X and Y to compute the optimal parameters θ̂ for a linear model, and that this linear model generates predictions using Ŷ = Xθ̂, as we saw in lecture and in Question 1 of this assignment. Each of the n rows of our design matrix X contains all features for a single observation. Each of the p columns of our design matrix X contains a single feature, for all observations. We denote the rows and columns of X as follows:

X:,j   the jth column vector of X, j = 1, …, p
Xi,:   the ith row vector of X, i = 1, …, n
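In numpy terms (a small assumed example, not part of the problem statement), these slices correspond to X[:, j] and X[i, :], keeping in mind that numpy indices start at 0:

```python
import numpy as np

# Assumed 4x3 design matrix just to show the slicing; numpy indices start at 0.
n, p = 4, 3
X = np.arange(n * p).reshape(n, p)

col_j = X[:, 1]   # X:,j  -- one feature for all n observations, shape (n,)
row_i = X[0, :]   # Xi,:  -- all p features for one observation, shape (p,)
print(col_j.shape, row_i.shape)   # (4,) (3,)
```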

Below, on the left, we have several expressions, labelled a through h, and on the right we have several terms, labelled 1 through 10. For each expression, determine its shape (e.g., n × p) and match it to one of the given terms. Terms may be used more than once or not at all. If a specific expression is nonsensical because the dimensions don't line up for a matrix multiplication, write "N/A" for both.

(a) X

(b) θ̂

(c) X:,j

(d) X1,: · θ̂

(e) X:,1 · θ̂

(f) Xθ̂

(g) (XᵀX)⁻¹XᵀY

(h) (I − X(XᵀX)⁻¹Xᵀ)Y

1. the residuals

2. 0

3. 1st response, y₁

4. 1st predicted value, ŷ₁

5. 1st residual, e₁

6. the estimated coefficients

7. the predicted values

8. the features for a single observation

9. the value of a specific feature for all observations

10. the design matrix

As an example, for 2a, you would write: "2a. Dimension: n × p, Term: 10".
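These shapes can be sanity-checked numerically. Below is a minimal sketch (the sizes n = 6, p = 3 and the random data are assumptions for illustration) that builds each expression with numpy and prints its shape; an expression whose dimensions don't line up raises an error instead.

```python
import numpy as np

# Assumed sizes and random data, purely to check dimensions.
n, p = 6, 3
rng = np.random.default_rng(2)
X = rng.normal(size=(n, p))     # design matrix, n x p
Y = rng.normal(size=n)          # response vector, length n

# theta_hat solves the normal equations (X^T X) theta = X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.shape(X))                              # (a) -> (6, 3), i.e. n x p
print(np.shape(theta_hat))                      # (b) -> (3,),   i.e. p x 1
print(np.shape(X[:, 0]))                        # (c) -> (6,),   i.e. n x 1
print(np.shape(X[0, :] @ theta_hat))            # (d) -> (),     a scalar
# (e) X[:, 0] @ theta_hat pairs shapes (n,) and (p,); for n != p it raises an error (N/A).
print(np.shape(X @ theta_hat))                  # (f) -> (6,),   i.e. n x 1
print(np.shape(np.linalg.inv(X.T @ X) @ X.T @ Y))                      # (g) -> (3,), p x 1
print(np.shape((np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T) @ Y))    # (h) -> (6,), n x 1
```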
