We explored linear models in the last lecture. We will strengthen this understanding by implementing linear and logistic regression models as part of the assignment.
Section I - Linear Regression
We will implement a linear regression model to fit a curve to some data. Since the data is nonlinear, we will implement polynomial regression and use ridge regression to obtain the best possible fit.
1. Load Data and Visualize
Let us load a dataset of points $(x, y)$. As a first step, let's import the required libraries followed by the dataset.
In [3]:
import numpy as np
from datasets import ridge_reg_data

# Libraries for evaluating the solution
import pytest
import numpy.testing as npt
import random

random.seed(1)
np.random.seed(1)

train_X, train_Y, test_X, test_Y = ridge_reg_data()  # Pre-defined function for loading the dataset
train_Y = train_Y.reshape(-1, 1)  # reshaping from (m,) -> (m,1)
test_Y = test_Y.reshape(-1, 1)
print('train_X.shape is ', train_X.shape)
print('train_Y.shape is ', train_Y.shape)
print('test_X.shape is ', test_X.shape)
print('test_Y.shape is ', test_Y.shape)
Visualize Data
The dataset is split into train and test sets. The train set consists of 300 samples and the test set consists of 200 samples. We will use a scatter plot to visualize the relationship between $x$ and $y$. Let's visualize the data using the scatter plot from matplotlib.
In [4]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(train_X, train_Y, marker='o', s=4)
plt.ylim(-2, 3)
plt.xlabel('x')
plt.ylabel('y');
Linear Regression - Polynomial Transformation
Using the train data we hope to learn a relationship mapping $x$ to $y$. We can evaluate this mapping using the test data. Linear regression will try to fit a straight line (linear relation) mapping $x$ to $y$. However, we observe that $x$ and $y$ do not have a linear relationship, so a straight line will not be a good fit. We need a nonlinear mapping (curve) between $x$ and $y$.
We discussed in the lecture that nonlinear regression can be achieved by transforming the scalar $x$ into a high-dimensional sample and performing linear regression with the transformed data. We can transform $x$ into a $d$-dimensional vector ($d \geq 2$) in order to perform nonlinear regression. For example, $d = 5$ transforms $x$ into a $(d+1)$-dimensional vector $[1, x, x^2, x^3, x^4, x^5]^\top$, where $x^k$ is $x$ raised to $k$. In vectorized notation, the dataset $X$ is transformed to $\Phi(X)$ of dimension $m \times (d+1)$, where $m$ is the number of samples.
Every scalar $x$ is converted into a $(d+1)$-dimensional vector $[1, x_1, x_2, x_3, \ldots, x_d]^\top$. We can now perform linear regression in $(d+1)$ dimensions:
$$y = \Phi(x)\theta = \theta_0 + x_1\theta_1 + \ldots + x_{d-1}\theta_{d-1} + x_d\theta_d$$
In the above equation, $y$ is the target variable, $\theta = [\theta_0, \ldots, \theta_d]^\top$ are the parameters/weights of the model, and $\Phi(x) = [1, x_1, \ldots, x_d]$ is the transformed data point in row-vector format, where $x_k$ is the $k$th component.
In vectorized notation, the linear regression for $m$ samples is written as $\hat{Y} = \Phi(X)\theta$, where $\Phi(X)$ has the data points as row vectors and is of dimensions $m \times (d+1)$:
$$\begin{bmatrix} \hat{y}^{(1)} \\ \hat{y}^{(2)} \\ \vdots \\ \hat{y}^{(m)} \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} & \ldots & x_d^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & \ldots & x_d^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} & \ldots & x_d^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$$
$\Phi(X)$ - the design matrix of dimension $m \times (d+1)$, where $m$ is the number of samples and $d$ is the degree of the polynomial that we are trying to fit. The first column of 1's in the design matrix accounts for the bias, resulting in $d+1$ dimensions.
$Y$ - vector of the prediction labels of dimension $m \times 1$.
Let's implement a function to achieve this transformation.
In [5]:
def poly_transform(X, d):
    '''
    Function to transform scalar values into (d+1)-dimension vectors.
    Each scalar value x is transformed into a vector [1, x, x^2, x^3, ..., x^d].
    Inputs:
        X: vector of m scalar inputs of shape (m, 1) where each row is a scalar input x
        d: number of dimensions
    Outputs:
        Phi: transformed matrix of shape (m, (d+1))
    '''
    Phi = np.ones((X.shape[0], 1))
    for i in range(1, d + 1):
        col = np.power(X, i)
        Phi = np.hstack([Phi, col])
    return Phi
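As a quick, optional sanity check (not part of the assignment), the sketch below calls the poly_transform function above on a small hypothetical input to confirm the expected shape and column values.

import numpy as np

# Illustrative check only: transform three scalars with d = 3.
X_demo = np.array([[1.0], [2.0], [3.0]])
Phi_demo = poly_transform(X_demo, 3)
print(Phi_demo.shape)  # expected (3, 4): columns are [1, x, x^2, x^3]
print(Phi_demo[1])     # expected [1. 2. 4. 8.]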
Linear Regression - Objective Function
Let us define the objective function that will be optimized by the linear regression model.
$$L(\Phi(X), Y, \theta) = (Y - \Phi(X)\theta)^\top (Y - \Phi(X)\theta)$$
Here, $\Phi(X)$ is the design matrix of dimensions $m \times (d+1)$ and $Y$ is the $m$-dimensional vector of labels. $\theta$ is the $(d+1)$-dimensional vector of weight parameters.
Hint: You may want to use numpy.dot.
In [6]:
def lin_reg_obj(Y, Phi, theta):
    '''
    Objective function to estimate loss for the linear regression model.
    Inputs:
        Phi: Design matrix of dimensions (m, (d+1))
        Y: ground truth labels of dimensions (m, 1)
        theta: Parameters of linear regression of dimensions ((d+1), 1)
    Outputs:
        loss: scalar loss
    '''
    # your code here
    prod = np.dot(Phi, theta)
    loss = np.dot(np.transpose(Y - prod), (Y - prod))
    return loss
In [7]:
# Contains hidden tests
random.seed(1)
np.random.seed(1)
m1 = 10; d1 = 5;
X_t = np.random.randn(m1, 1)
Y_t = np.random.randn(m1, 1)
theta_t = np.random.randn((d1 + 1), 1)
PHI_t = poly_transform(X_t, d1)
loss_est = lin_reg_obj(Y_t, PHI_t, theta_t)
Linear Regression - Closed Form Solution
Let us define a closed form solution to the objective function. Feel free to revisit the lecture to review the topic.
The closed form solution is given by,
$$\theta = \left(\Phi(X)^\top \Phi(X)\right)^{-1} \Phi(X)^\top Y$$
Here $\Phi(X)$ is the $(m \times (d+1))$-dimensional design matrix obtained using the poly_transform function defined earlier and $Y$ are the ground truth labels of dimensions $(m \times 1)$.
Hint: You may want to use numpy.linalg.inv and numpy.dot.
In [8]:
# Closed form solution
def lin_reg_fit(Phi_X, Y):
    '''
    A function to estimate the linear regression model parameters using the closed form solution.
    Inputs:
        Phi_X: Design matrix of dimensions (m, (d+1))
        Y: ground truth labels of dimensions (m, 1)
    Outputs:
        theta: Parameters of linear regression of dimensions ((d+1), 1)
    '''
    # your code here
    Phi_t = np.transpose(Phi_X)
    inverse_m = np.linalg.inv(np.dot(Phi_t, Phi_X))
    theta = np.dot(np.dot(inverse_m, Phi_t), Y)
    return theta
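As an optional cross-check (an assumption on my part, not required by the assignment), the closed form estimate should agree with NumPy's least-squares solver on the same design matrix; a minimal sketch:

import numpy as np

np.random.seed(0)
X_chk = np.random.randn(30, 1)
Y_chk = np.random.randn(30, 1)
Phi_chk = poly_transform(X_chk, 4)

theta_closed = lin_reg_fit(Phi_chk, Y_chk)
# np.linalg.lstsq solves the same least-squares problem with a more stable factorization.
theta_lstsq, *_ = np.linalg.lstsq(Phi_chk, Y_chk, rcond=None)
print(np.allclose(theta_closed, theta_lstsq))  # expected True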
In [9]:
# Contains hidden tests
random.seed(1)
np.random.seed(1)
m1 = 10; d1 = 5;
X_t = np.random.randn(m1, 1)
Y_t = np.random.randn(m1, 1)
PHI_t = poly_transform(X_t, d1)
theta_est = lin_reg_fit(PHI_t, Y_t)
Metrics for Evaluation
We will evaluate the goodness of our linear regression model using the root mean square error (RMSE). This compares the difference between the estimated Y-labels and the ground truth Y-labels. The smaller the RMSE value, the better the fit.
1. RMSE (Root Mean Squared Error)
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_{\text{pred}}^{(i)} - y^{(i)}\right)^2}$$
Hint: You may want to use: numpy.sqrt, numpy.sum or numpy.dot.
In [117]:
def get_rmse(Y_pred, Y):
    '''
    Function to evaluate the goodness of the linear regression model.
    Inputs:
        Y_pred: estimated labels of dimensions (m, 1)
        Y: ground truth labels of dimensions (m, 1)
    Outputs:
        rmse: root mean square error
    '''
    # your code here
    diff = Y_pred - Y
    diff_squared = diff ** 2
    mean_diff = diff_squared.mean()
    rmse = np.sqrt(mean_diff)
    return rmse
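A brief, optional comparison (illustrative only) of the vectorized implementation against the summation form of the formula, assuming the get_rmse defined above:

import numpy as np

np.random.seed(0)
Y_pred_chk = np.random.randn(5, 1)
Y_chk = np.random.randn(5, 1)

# Summation form of the same RMSE formula, written out explicitly.
manual = np.sqrt(np.sum((Y_pred_chk - Y_chk) ** 2) / Y_chk.shape[0])
print(np.isclose(get_rmse(Y_pred_chk, Y_chk), manual))  # expected True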
In [118]:
# Contains hidden tests
random.seed(1)
np.random.seed(1)
m1 = 50
Y_Pred_t = np.random.randn(m1, 1)
Y_t = np.random.randn(m1, 1)
rmse_est = get_rmse(Y_Pred_t, Y_t)
Let's visualize the nonlinear regression fit and the RMSE evaluation error on the test data.
In [119]:
d = 20
Phi_X_tr = poly_transform(train_X, d)
theta = lin_reg_fit(Phi_X_tr, train_Y)

# Estimate the prediction on the train data
Y_Pred_tr = np.dot(Phi_X_tr, theta)
rmse = get_rmse(Y_Pred_tr, train_Y)
print('Train RMSE = ', rmse)

# Perform the same transform on the test data
Phi_X_ts = poly_transform(test_X, d)
# Estimate the prediction on the test data
Y_Pred_ts = np.dot(Phi_X_ts, theta)
# Evaluate the goodness of the fit
rmse = get_rmse(Y_Pred_ts, test_Y)
print('Test RMSE = ', rmse)

import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(test_X, test_Y, marker='o', s=4)

# Sampling more points to plot a smooth curve
px = np.linspace(-2, 2, 100).reshape(-1, 1)
PX = poly_transform(px, d)
py = np.dot(PX, theta)
plt.xlabel('x')
plt.ylabel('y')
plt.ylim(-2, 3)
plt.plot(px, py, color='red');
2. Ridge Regression
The degree of the polynomial regression is $d = 20$. Even though the curve appears to be smooth, it may be fitting to the noise. We will use ridge regression to get a smoother fit and avoid overfitting. Recall the ridge regression objective:
$$L(\Phi(X), Y, \theta, \lambda) = (Y - \Phi(X)\theta)^\top (Y - \Phi(X)\theta) + \lambda^2 \theta^\top \theta$$
where $\lambda \geq 0$ is the regularization parameter. The larger the value of $\lambda$, the smoother the curve. The closed form solution to the objective is given by:
$$\theta = \left(\Phi(X)^\top \Phi(X) + \lambda^2 I_d\right)^{-1} \Phi(X)^\top Y$$
Here, $I_d$ is the identity matrix of dimensions $((d+1) \times (d+1))$, $\Phi(X)$ is the $(m \times (d+1))$-dimensional design matrix obtained using the poly_transform function defined earlier, and $Y$ are the ground truth labels of dimensions $(m \times 1)$.
Ridge Regression Closed Form Solution
Similar to linear regression, let's implement the closed form solution to ridge regression.
In [120]:
def ridge_reg_fit(Phi_X, Y, lamb_d):
    '''
    A function to estimate the ridge regression model parameters using the closed form solution.
    Inputs:
        Phi_X: Design matrix of dimensions (m, (d+1))
        Y: ground truth labels of dimensions (m, 1)
        lamb_d: regularization parameter
    Outputs:
        theta: Parameters of ridge regression of dimensions ((d+1), 1)
    '''
    # Step 1: get the dimension (d+1) from Phi_X to create the identity matrix I_d
    # Step 2: Estimate the closed form solution similar to lin_reg_fit but now include the lamb_d**2 * I_d term
    # your code here
    col_no = np.size(Phi_X, 1)
    Phi_t = np.transpose(Phi_X)
    Phi_product = np.dot(Phi_t, Phi_X)
    identity_m = np.power(lamb_d, 2) * np.identity(col_no)
    inverse_m = np.linalg.inv(Phi_product + identity_m)
    theta = np.dot(np.dot(inverse_m, Phi_t), Y)
    return theta
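One property worth noting: with lamb_d = 0 the penalty term vanishes and ridge regression reduces to ordinary least squares. A minimal, optional sketch of that check, assuming the functions defined above:

import numpy as np

np.random.seed(0)
X_chk = np.random.randn(25, 1)
Y_chk = np.random.randn(25, 1)
Phi_chk = poly_transform(X_chk, 3)

# lamb_d = 0 removes the lambda^2 * I_d term, so the two solutions should coincide.
theta_ridge = ridge_reg_fit(Phi_chk, Y_chk, 0.0)
theta_ols = lin_reg_fit(Phi_chk, Y_chk)
print(np.allclose(theta_ridge, theta_ols))  # expected True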
In [121]:
# Contains hidden tests
random.seed(1)
np.random.seed(1)
m1 = 10; d1 = 5;
lamb_d_t = 0.1
X_t = np.random.randn(m1, 1)
Y_t = np.random.randn(m1, 1)
PHI_t = poly_transform(X_t, d1)
theta_est = ridge_reg_fit(PHI_t, Y_t, lamb_d_t)
Cross Validation to Estimate ($\lambda$)
In order to avoid overfitting when using a high degree polynomial, we have used ridge regression. We now need to estimate the optimal value of $\lambda$ using cross-validation.
We will obtain a generic value of $\lambda$ using the entire training dataset to validate. We will employ the method of $k$-fold cross validation, where we split the training data into $k$ non-overlapping random subsets. In every cycle, for a given value of $\lambda$, $(k-1)$ subsets are used for training the ridge regression model and the remaining subset is used for evaluating the goodness of the fit. We estimate the average goodness of the fit across all the subsets and select the $\lambda$ (lamb_d) that results in the best fit.
K-fold cross validation
It is easier to shuffle the indices and slice the training data into the required number of segments than to process the complete dataset. The function k_val_ind below returns a 2D list of indices by splitting the data points into 'k_fold' sets.
Refer to the following documentation for splitting and shuffling:
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.shuffle.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html
In [122]:
def k_val_ind(index, k_fold, seed=1):
    '''
    Function to split the data into k folds for cross validation. Returns the indices of the data points
    belonging to every split.
    Inputs:
        index: all the indices of the training data
        k_fold: number of folds to split the data into
    Outputs:
        k_set: list of arrays with indices
    '''
    np.random.seed(seed)
    np.random.shuffle(index)  # Shuffle the indices
    k_set = np.split(index, k_fold)  # Split the indices into 'k_fold' sets
    return k_set
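A tiny illustration (hypothetical numbers, not part of the assignment) of what k_val_ind returns: 10 indices shuffled reproducibly and split into 5 folds of 2.

import numpy as np

demo_index = np.arange(10)
demo_sets = k_val_ind(demo_index, 5)  # note: shuffles demo_index in place using the fixed seed
for fold, idx in enumerate(demo_sets):
    print('fold', fold, '->', idx)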
K-Fold Cross Validation (10 Points)
Let's now implement $k$-fold cross validation.
In [123]:
def k_fold_cv(k_fold, train_X, train_Y, lamb_d, d):
    '''
    Function to implement k-fold cross validation.
    Inputs:
        k_fold: number of validation subsets
        train_X: training data of dimensions (m, 1)
        train_Y: ground truth training labels
        lamb_d: ridge regularization lambda parameter
        d: polynomial degree
    Outputs:
        rmse_list: list of root mean square errors (RMSE) for k_folds
    '''
    index = np.arange(train_X.shape[0])  # indices of the training data
    k_set = k_val_ind(index, k_fold)  # pre-defined function to shuffle and split indices
    Phi_X = poly_transform(train_X, d)  # transform all the data to (m, (d+1)) dimensions
    rmse_list = []
    for i in range(k_fold):
        ind = np.zeros(train_X.shape[0], dtype=bool)  # binary mask
        ind[k_set[i]] = True  # validation portion is indicated
        # Note: Phi_X[~ind,:] is the training subset and Phi_X[ind,:] is the validation subset.
        # Similarly for the training and validation labels.
        # Step 1: Estimate the theta parameter using ridge_reg_fit with the training subset, training labels and lamb_d
        # Step 2: Estimate the prediction Y_pred over the validation set as a dot product of Phi_X[ind,:] and theta
        # Step 3: Use 'get_rmse' to determine the RMSE between Y_pred and train_Y[ind]
        # your code here
        theta = ridge_reg_fit(Phi_X[~ind, :], train_Y[~ind, :], lamb_d)
        Y_pred = np.dot(Phi_X[ind, :], theta)
        rmse = get_rmse(Y_pred, train_Y[ind])
        rmse_list.append(rmse)
    return rmse_list
In [124]:
# Contains hidden tests
np.random.seed(1)
m1 = 20; d1 = 5;
k_fold_t = 5  # number of portions to split the training data
lamb_d_t = 0.1
X_t = np.random.randn(m1, 1)
Y_t = np.random.randn(m1, 1)
rmse_list_est = k_fold_cv(k_fold_t, X_t, Y_t, lamb_d_t, d1)
Let us select the value of $\lambda$ that provides the lowest error based on the RMSE returned by the 'k_fold_cv' function.
In this example, we will choose the best value of $\lambda$ among 6 values.
In [125]:
k_fold = 5
l_range = [0, 1e-3, 1e-2, 1e-1, 1, 10]  # The set of lamb_d parameters used for validation.
th = float('inf')
for lamb_d in l_range:
    print('lambda:' + str(lamb_d))
    rmse = k_fold_cv(k_fold, train_X, train_Y, lamb_d, d)
    print("RMSE: ", rmse)
    print("*************")
    mean_rmse = np.mean(rmse)
    if mean_rmse < th:
        th = mean_rmse
        l_best = lamb_d
print("Best value for the regularization parameter(lamb_d):", l_best)
Evaluation on the Test Set
As discussed in the previous section, we will present the final evaluation of the model based on the test set.
In [126]:
lamb_d = l_best
# Step 1: Create Phi_X using 'poly_transform(.)' on train_X with d=20
# Step 2: Estimate theta using ridge_reg_fit(.) with Phi_X, train_Y and the best lambda
# Step 3: Create Phi_X_test using 'poly_transform(.)' on test_X with d=20
# Step 4: Estimate Y_pred for the test data using Phi_X_test and theta
# Step 5: Estimate rmse using get_rmse(.) on Y_pred and test_Y
# your code here
d = 20
Phi_X = poly_transform(train_X, d)
theta = ridge_reg_fit(Phi_X, train_Y, lamb_d)
Phi_X_test = poly_transform(test_X, d)
Y_pred = np.dot(Phi_X_test, theta)
rmse = get_rmse(Y_pred, test_Y)
print("RMSE on test set is " + str(rmse))
In [127]:
# Contains hidden tests checking for rmse < 0.5
Let's visualize the model's prediction on the test data set.
In [128]:
print('Test RMSE = ', rmse)
%matplotlib inline
plt.scatter(test_X, test_Y, marker='o', s=4)

# Sampling more points to plot a smooth curve
px = np.linspace(-2, 2, 100).reshape(-1, 1)
PX = poly_transform(px, d)
py = np.dot(PX, theta)
plt.xlabel('x')
plt.ylabel('y')
plt.ylim(-2, 3)
plt.plot(px, py, color='red');
You have completed linear ridge regression and estimated the best value for the regularization parameter $\lambda$ using k-fold cross validation.
Section II - Logistic Regression
Machine learning is used in medicine to assist doctors with crucial decision-making based on diagnostic data. In this assignment we will design a logistic regression model (single layer neural network) to predict whether a subject is diabetic or not. The model will classify the subjects into two groups, diabetic (Class 1) or non-diabetic (Class 0) - a binary classification model.
We will be using the 'Pima Indians Diabetes dataset' to train our model, which contains different clinical parameters (features) for multiple subjects along with the label (diabetic or non-diabetic). Each subject is represented by 8 features (Pregnancies, Glucose, Blood-Pressure, SkinThickness, Insulin, BMI, Diabetes-Pedigree-Function, Age) and the 'Outcome', which is the class label. The dataset contains the results from 768 subjects.
We will split the dataset into train and test data, train our model on the train data, and predict the categories on the test data.
In [717]:
# importing a few libraries
import numpy as np
from datasets import pima_data
import sys
import matplotlib.pyplot as plt
import numpy.testing as npt
1. Load Data, Visualize and Normalize
Let us load the training and test data.
In [718]:
train_X, train_Y, test_X, test_Y = pima_data()
print('train_X.shape = ', train_X.shape)
print('train_Y.shape = ', train_Y.shape)
print('test_X.shape = ', test_X.shape)
print('test_Y.shape = ', test_Y.shape)

# Lets examine the data
print('\nFew Train data examples')
print(train_X[:5, :])
print('\nFew Train data labels')
print(train_Y[:5])
In [719]:
# We notice the data is not normalized. Lets do a simple normalization scaling the data to between 0 and 1
# Normalized data is easier to train using large learning rates
train_X = np.nan_to_num(train_X / train_X.max(axis=0))
test_X = np.nan_to_num(test_X / test_X.max(axis=0))

# Lets reshape the data so it matches our notation from the lecture.
# train_X should be (d, m) and train_Y should be (1, m); similarly for test_X and test_Y
train_X = train_X.T
train_Y = train_Y.reshape(1, -1)
test_X = test_X.T
test_Y = test_Y.reshape(1, -1)

print('train_X.shape = ', train_X.shape)
print('train_Y.shape = ', train_Y.shape)
print('test_X.shape = ', test_X.shape)
print('test_Y.shape = ', test_Y.shape)

# Lets examine the data and verify it is normalized
print('\nFew Train data examples')
print(train_X[:, :5])
print('\nFew Train data labels')
print(train_Y[0, :5])
In [720]:
# There are 8 features for each of the data points. Lets plot the data using a couple of features
fig, ax = plt.subplots()
plt.scatter(train_X[6, :], train_X[7, :], c=train_Y[0])
plt.xlabel('Diabetes-Pedigree-Function')
plt.ylabel('Age')
plt.show();

# We have plotted train_X[6,:], train_X[7,:].
# Feel free to insert your own cells to plot and visualize different variable pairs.
2. Quick Review of the Steps Involved in Logistic Regression Using Gradient Descent.
1. Training data $X$ is of dimensions $(d \times m)$ where $d$ is the number of features and $m$ is the number of samples. Training labels $Y$ are of dimensions $(1 \times m)$.
2. Initialize the logistic regression model parameters $w$ and $b$, where $w$ is of dimensions $(d, 1)$ and $b$ is a scalar. $w$ is initialized to small random values and $b$ is set to zero.
3. Calculate $Z$ using $X$ and the initial parameter values $(w, b)$: $Z = w^\top X + b$
4. Apply the sigmoid activation to estimate $A$ from $Z$: $A = \frac{1}{1 + \exp(-Z)}$
5. Calculate the loss $L()$ between the predicted probabilities $A$ and the ground truth labels $Y$: $loss = logistic\_loss(A, Y)$
6. Calculate the gradient dZ (or $\frac{dL}{dZ}$): $dZ = (A - Y)$
7. Calculate the gradients $\frac{dL}{dw}$, represented by $dw$, and $\frac{dL}{db}$, represented by $db$: $dw, db = grad\_fn(X, dZ)$
8. Adjust the model parameters using the gradients, where $\alpha$ is the learning rate: $w := w - \alpha \cdot dw$, $b := b - \alpha \cdot db$
9. Loop until the loss converges or for a fixed number of epochs. We will first define the functions logistic_loss() and grad_fn() along with other functions below.
Review Lecture Notes
Initialize Parameters
We will initialize the model parameters. The weights will be initialized with small random values and the bias as 0. While the bias will be a scalar, the dimension of the weight vector will be $(d \times 1)$, where $d$ is the number of features.
Hint: np.random.randn can be used here to create a vector of random values of the desired shape.
In [721]:
def initialize(d, seed=1):
    '''
    Function to initialize the parameters for the logistic regression model
    Inputs:
        d: number of features for every data point
        seed: random generator seed for reproducing the results
    Outputs:
        w: weight vector of dimensions (d, 1)
        b: scalar bias value
    '''
    np.random.seed(seed)
    # NOTE: initialize w to be a (d,1) column vector instead of a (d,) vector
    # Hint: initialize w to a random vector with small values. For example, 0.01*np.random.randn(.) can be used.
    # and initialize b to the scalar 0
    # your code here
    w = 0.01 * np.random.randn(d, 1)
    b = 0
    return w, b
In [722]:
# Contains hidden tests
Sigmoid Function
Let's now implement the sigmoid activation function,
$$\sigma(z) = \frac{1}{1 + \exp(-z)}$$
where $z$ is the input variable.
Hint: numpy.exp can be used for defining the exponential function.
In [723]:
def sigmoid(z):
    '''
    Function to compute the sigmoid activation 1 / (1 + exp(-z)) element-wise.
    '''
    # your code here
    A = 1. / (1 + np.exp(-1 * z))
    return A
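For very large |z| the direct formula can overflow inside np.exp and trigger warnings. That is not an issue for this assignment, but a common numerically stable variant (offered as a sketch, not the required solution) evaluates the two branches separately:

import numpy as np

def sigmoid_stable(z):
    # For z >= 0: 1 / (1 + exp(-z)); for z < 0: exp(z) / (1 + exp(z)).
    # Both branches avoid taking exp of large positive numbers.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

print(sigmoid_stable(np.array([-1000.0, 0.0, 1000.0])))  # expected [0.  0.5 1. ]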
In [724]:
# Contains hidden tests
np.random.seed(1)
d = 2
m1 = 5
X_t = np.random.randn(d, m1)
Logistic Loss Function
We will define the objective function that will be used later for determining the loss between the model prediction and the ground truth labels. We will use the vectors $A$ (activation output of the logistic neuron) and $Y$ (ground truth labels) for defining the loss:
$$L(A, Y) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)}) \,\right]$$
where $m$ is the number of input data points and is used for averaging the total loss.
Hint: numpy.sum and numpy.log.
In [725]:
def logistic_loss(A, Y):
    '''
    Function to calculate the logistic loss given the predictions and the targets.
    Inputs:
        A: Estimated prediction values, A is of dimension (1, m)
        Y: groundtruth labels, Y is of dimension (1, m)
    Outputs:
        loss: logistic loss
    '''
    m = A.shape[1]
    # your code here
    calc = np.dot(np.log(A), np.transpose(Y)) + np.dot(np.log(1 - A), np.transpose(1 - Y))
    loss = (-1 / m) * np.sum(calc)
    return loss
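If a prediction ever saturates to exactly 0 or 1, np.log produces -inf and the loss becomes nan/inf. The graded tests here do not require handling that, but a common safeguard (a sketch under that assumption, not the required solution) clips the probabilities first:

import numpy as np

def logistic_loss_safe(A, Y, eps=1e-12):
    # Clip predictions away from exactly 0 and 1 before taking logs.
    A = np.clip(A, eps, 1.0 - eps)
    m = A.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

A_demo = np.array([[1.0, 0.0, 0.7]])
Y_demo = np.array([[1.0, 0.0, 1.0]])
print(logistic_loss_safe(A_demo, Y_demo))  # finite value instead of nan/inf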
In [726]:
# Contains hidden tests
np.random.seed(1)
d = 2
m1 = 10
X_t = np.random.randn(d, m1)
Y_t = np.random.rand(1, m1)
Y_t[Y_t > 0.5] = 1
Y_t[Y_t <= 0.5] = 0
Gradient Function
Let us define the gradient function for calculating the gradients ($\frac{dL}{dw}$, $\frac{dL}{db}$). We will use it during gradient descent.
The gradients can be calculated as,
$$dw = \frac{1}{m} X(A - Y)^\top$$
$$db = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$
Instead of $(A - Y)$, we will use dZ (or $\frac{dL}{dZ}$) since $dZ = (A - Y)$.
Make sure the gradients are of the correct dimensions. Refer to the lecture for more information.
Hint: numpy.dot and numpy.sum. Check the use of the 'keepdims' parameter.
In [727]:
def grad_fn(X, dZ):
    '''
    Function to calculate the gradients of the weights (dw) and bias (db) w.r.t. the objective function L.
    Inputs:
        X: training data of dimensions (d, m)
        dZ: gradient dL/dZ where L is the logistic loss and Z = w^T*X + b is the input to the sigmoid activation function
            dZ is of dimensions (1, m)
    Outputs:
        dw: gradient dL/dw - gradient of the weights w.r.t. the logistic loss. It is of dimensions (d, 1)
        db: gradient dL/db - gradient of the bias w.r.t. the logistic loss. It is a scalar
    '''
    m = X.shape[1]
    # your code here
    dw = 1 / m * np.dot(X, np.transpose(dZ))
    db = 1 / m * np.sum(dZ)
    return dw, db
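The analytic gradients can be spot-checked against central finite differences of the loss. This is an optional sketch (assuming the sigmoid, logistic_loss and grad_fn defined above), not part of the assignment.

import numpy as np

def numerical_dw(w, b, X, Y, eps=1e-6):
    # Central finite differences of the logistic loss w.r.t. each weight component.
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j, 0] += eps
        w_minus[j, 0] -= eps
        loss_plus = logistic_loss(sigmoid(np.dot(w_plus.T, X) + b), Y)
        loss_minus = logistic_loss(sigmoid(np.dot(w_minus.T, X) + b), Y)
        grad[j, 0] = (loss_plus - loss_minus) / (2 * eps)
    return grad

np.random.seed(0)
d_chk, m_chk = 3, 20
X_chk = np.random.randn(d_chk, m_chk)
Y_chk = (np.random.rand(1, m_chk) > 0.5).astype(float)
w_chk, b_chk = 0.01 * np.random.randn(d_chk, 1), 0.0

A_chk = sigmoid(np.dot(w_chk.T, X_chk) + b_chk)
dw_analytic, _ = grad_fn(X_chk, A_chk - Y_chk)
print(np.allclose(dw_analytic, numerical_dw(w_chk, b_chk, X_chk, Y_chk), atol=1e-6))  # expected True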
In [728]:
# Contains hidden tests
np.random.seed(1)
d = 2
m1 = 10
X_t = np.random.randn(d, m1)
Y_t = np.random.rand(1, m1)
Y_t[Y_t > 0.5] = 1
Y_t[Y_t <= 0.5] = 0
Training the Model
We will now implement the steps for gradient descent discussed earlier.
In [729]:
def model_fit(w, b, X, Y, alpha, n_epochs, log=False):
    '''
    Function to fit a logistic model with the parameters w, b to the training data X with labels Y.
    Inputs:
        w: weight vector of dimensions (d, 1)
        b: scalar bias value
        X: training data of dimensions (d, m)
        Y: training data labels of dimensions (1, m)
        alpha: learning rate
        n_epochs: number of epochs to train the model
    Outputs:
        params: a dictionary holding the parameters w and b
        losses: a list of the train loss recorded every 100 epochs
    '''
    losses = []
    for epoch in range(n_epochs):
        # Implement the steps in logistic regression using the functions defined earlier.
        # For each iteration of the for loop:
        # Step 1: Calculate the output Z = w.T*X + b
        # Step 2: Apply the sigmoid activation: A = sigmoid(Z)
        # Step 3: Calculate loss = logistic_loss(.) between the predicted values A and the groundtruth labels Y
        # Step 4: Estimate the gradient dZ = A - Y
        # Step 5: Estimate the gradients dw and db using grad_fn(.)
        # Step 6: Update the parameters w and b using the gradients dw, db and the learning rate:
        #         w = w - alpha * dw
        #         b = b - alpha * db
        # your code here
        Z = np.dot(np.transpose(w), X) + b
        A = sigmoid(Z)
        loss = logistic_loss(A, Y)
        dZ = A - Y
        dw, db = grad_fn(X, dZ)
        w = w - alpha * dw
        b = b - alpha * db
        if epoch % 100 == 0:
            losses.append(loss)
            if log == True:
                print("After %i iterations, Loss = %f" % (epoch, loss))
    params = {"w": w, "b": b}
    return params, losses
In [730]:
# Contains hidden tests
np.random.seed(1)
d = 2
m1 = 10
X_t = np.random.randn(d, m1)
Y_t = np.random.rand(1, m1)
Y_t[Y_t > 0.5] = 1
Y_t[Y_t <= 0.5] = 0
Model Prediction
Once we have the optimal values of the model parameters $(w, b)$, we can determine the accuracy of the model on the test data:
$$Z = w^\top X + b$$
$$A = \sigma(Z)$$
In [731]:
def model_predict(params, X, Y=np.array([]), pred_threshold=0.5):
    '''
    Function to calculate category predictions on given data and return the accuracy of the predictions.
    Inputs:
        params: a dictionary holding the parameters w and b
        X: training data of dimensions (d, m)
        Y: training data labels of dimensions (1, m). If not provided, the function merely makes predictions on X
    Outputs:
        Y_Pred: Predicted class labels for X. Has dimensions (1, m)
        acc: accuracy of the prediction over X if Y is provided, else 0
        loss: loss of the prediction over X if Y is provided, else Inf
    '''
    w = params['w']
    b = params['b']
    m = X.shape[1]
    # Calculate Z using X, w and b
    # Calculate A using the sigmoid - A is the set of (1, m) probabilities
    # Calculate the prediction labels Y_Pred of size (1, m) using A and pred_threshold
    # When A > pred_threshold, Y_Pred is 1, else 0
    # your code here
    Z = np.dot(np.transpose(w), X) + b
    A = sigmoid(Z)
    Y_Pred = np.copy(A)
    for i in range(len(A[0, :])):
        if A[0][i] > pred_threshold:
            Y_Pred[0][i] = 1
        else:
            Y_Pred[0][i] = 0
    Y_Pred = Y_Pred.reshape(1, -1)
    acc = 0
    loss = float('inf')
    if Y.size != 0:
        loss = logistic_loss(A, Y)
        acc = np.mean(Y_Pred == Y)
    return Y_Pred, acc, loss
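A side note (sketch only, same behavior as the loop in model_predict above): the element-wise thresholding can also be written as a single vectorized comparison.

import numpy as np

A_demo = np.array([[0.2, 0.8, 0.5, 0.91]])
# Boolean comparison followed by a cast gives the same 0/1 labels as the explicit loop (strict '>').
Y_pred_demo = (A_demo > 0.5).astype(float)
print(Y_pred_demo)  # expected [[0. 1. 0. 1.]]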
In [732]:
# Contains hidden tests
np.random.seed(1)
d = 2
m1 = 10
# Test standard
X_t = np.random.randn(d, m1)
Y_t = np.random.rand(1, m1)
Y_t[Y_t > 0.5] = 1
Y_t[Y_t <= 0.5] = 0
3. Putting it All Together
We will train our logistic regression model using the data we have loaded and test our predictions on diabetes classification.
In [733]:
# We can use a decently large learning rate because the features have been normalized
# When features are not normalized, larger learning rates may cause the learning to oscillate
# and go out of bounds, leading to 'nan' errors
# Feel free to adjust the learning rate alpha and n_epochs to vary the test accuracy
# You should be able to get test accuracy > 70%
# You can go up to 75% to 80% test accuracies as well
alpha = 0.15
n_epochs = 8000

# Write code to initialize parameters w and b with initialize(.) (use train_X to get the feature dimension d)
# Use model_fit(.) to estimate the updated 'params' of the logistic regression model and track how the 'losses' vary
# Use the variables 'params' and 'losses' to store the outputs of model_fit(.)
# your code here
w, b = initialize(np.size(train_X, 0), seed=1)
params, losses = model_fit(w, b, train_X, train_Y, alpha, n_epochs, log=False)
In [734]:
Y_Pred_tr, acc_tr, loss_tr = model_predict(params, train_X, train_Y)
Y_Pred_ts, acc_ts, loss_ts = model_predict(params, test_X, test_Y)
print("Train Accuracy of the model:", acc_tr)
print("Test Accuracy of the model:", acc_ts)

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(losses)
plt.xlabel('Iterations (x100)')
plt.ylabel('Train loss');
In [661]:
# Contains hidden tests testing accuracy of test to be greater than 0.7 with the above parameter settings
Congratulations on completing this week's assignment - building a single layer neural network for binary classification. In the following weeks, we will learn to build and train a multilayer neural network for multi-category classification.
train_X.shape is (300, 1)
train_Y.shape is (300, 1)
test_X.shape is (200, 1)
test_Y.shape is (200, 1)
Train RMSE = 0.5136340617403364
Test RMSE = 0.5037691797614892
lambda:0
RMSE: [0.900555177526016, 0.5995063480546855, 0.48899370047004265, 0.5734994260228984, 0.5778294698629982]
*************
lambda:0.001
RMSE: [0.9254777112353597, 0.6018895272984746, 0.4886770424932113, 0.5708466724857937, 0.5784618729196633]
*************
lambda:0.01
RMSE: [1.0459044884476891, 0.625182224075695, 0.493313775686007, 0.5570647419146361, 0.5899621997377145]
*************
lambda:0.1
RMSE: [0.826147452070392, 0.646524587121386, 0.4903308187338262, 0.5660950349966889, 0.5945668688216352]
*************
lambda:1
RMSE: [0.6799665144970705, 0.6886693542491483, 0.5647357620788945, 0.6393074933827457, 0.6470329335868142]
*************
lambda:10
RMSE: [0.7335261033175889, 0.6993069184592259, 0.7556132543797057, 0.7992608698443789, 0.8199075014141747]
*************
Best value for the regularization parameter(lamb_d): 0.1
RMSE on test set is 0.49850101448090123
Test RMSE = 0.49850101448090123
train_X.shape = (500, 8)
train_Y.shape = (500,)
test_X.shape = (268, 8)
test_Y.shape = (268,)
Few Train data examples
[[6.000e+00 1.480e+02 7.200e+01 3.500e+01 0.000e+00 3.360e+01 6.270e-01
5.000e+01]
[1.000e+00 8.500e+01 6.600e+01 2.900e+01 0.000e+00 2.660e+01 3.510e-01
3.100e+01]
[8.000e+00 1.830e+02 6.400e+01 0.000e+00 0.000e+00 2.330e+01 6.720e-01
3.200e+01]
[1.000e+00 8.900e+01 6.600e+01 2.300e+01 9.400e+01 2.810e+01 1.670e-01
2.100e+01]
[0.000e+00 1.370e+02 4.000e+01 3.500e+01 1.680e+02 4.310e+01 2.288e+00
3.300e+01]]
Few Train data labels
[1. 0. 1. 0. 1.]
train_X.shape = (8, 500)
train_Y.shape = (1, 500)
test_X.shape = (8, 268)
test_Y.shape = (1, 268)
Few Train data examples
[[0.35294118 0.05882353 0.47058824 0.05882353 0. ]
[0.74371859 0.42713568 0.91959799 0.44723618 0.68844221]
[0.59016393 0.54098361 0.52459016 0.54098361 0.32786885]
[0.35353535 0.29292929 0. 0.23232323 0.35353535]
[0. 0. 0. 0.11111111 0.19858156]
[0.50074516 0.39642325 0.34724292 0.41877794 0.64232489]
[0.25909091 0.14504132 0.27768595 0.06900826 0.94545455]
[0.61728395 0.38271605 0.39506173 0.25925926 0.40740741]]
Few Train data labels
[1. 0. 1. 0. 1.]
Train Accuracy of the model: 0.782
Test Accuracy of the model: 0.7201492537313433