Binary Classification Problem
In the last lab, we developed an optimization method to solve the optimization problem associated with the binary classification problem. In this lab, we will introduce constraints into the optimization problem and extend the scheme developed in the last lab.
Recall that for a data set {(xi, yi)}, i = 1, ..., n, where xi ∈ X ⊆ Rd, yi ∈ {+1,−1}, we solve:

min_{w ∈ Rd} f(w) = (λ/2)∥w∥² + (1/n) Σ_{i=1}^{n} L(yi, w⊤xi), (1)
where we considered the following loss functions:
Lh(yi,w⊤xi) = max{0,1 − yiw⊤xi} (hinge)
Lℓ(yi,w⊤xi) = log(1 + exp(−yiw⊤xi)) (logistic)
Lsh(yi,w⊤xi) = (max{0,1 − yiw⊤xi})2. (squared hinge)
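To make the three losses concrete, here is a minimal NumPy sketch (the function names are our own choices, not part of the lab's codebase); each function takes a label y ∈ {+1,−1} and the score w⊤x:

import numpy as np

def hinge_loss(y, score):
    #L_h(y, w^T x) = max{0, 1 - y * w^T x}
    return np.maximum(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    #L_l(y, w^T x) = log(1 + exp(-y * w^T x))
    return np.log1p(np.exp(-y * score))

def squared_hinge_loss(y, score):
    #L_sh(y, w^T x) = (max{0, 1 - y * w^T x})^2
    return np.maximum(0.0, 1.0 - y * score) ** 2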
Solving the optimization problem (1) facilitates learning a classification rule h : X → {+1,−1}. We used the following prediction rule for a test sample x̂:

h(x̂) = sign(w⊤x̂). (2)
In the last lab, we used a decomposition of f(w) and solved an equivalent problem of the following form:

min_{w ∈ Rd} (1/n) Σ_{i=1}^{n} fi(w), where fi(w) = (λ/2)∥w∥² + L(yi, w⊤xi). (3)
In this lab, we will consider a constrained variant of the optimization problem (1).
For a data set {(xi, yi)}, i = 1, ..., n, where xi ∈ X ⊆ Rd, yi ∈ {+1,−1}, we solve:

min_{w ∈ Rd} f(w) = (λ/2)∥w∥² + (1/n) Σ_{i=1}^{n} L(yi, w⊤xi), s.t. w ∈ C (4)
where C ⊂ Rd is a closed convex set.
Hence we will develop an optimization method to solve the following equivalent constrained form of (4):

min_{w ∈ C} (1/n) Σ_{i=1}^{n} fi(w). (5)
Exercise 0: Data Preparation
1. Load the wine dataset from the scikit-learn package using the following code snippet. We will load the features into the matrix A such that the i-th row of A contains the features of the i-th sample. The label vector will be loaded into y.
(a) [R] Check the number of classes C and the class label values in the wine data. Check whether the class labels come from the set {0,1,...,C − 1} or from the set {1,2,...,C}.
(b) When loading the labels into y do the following:
If the class labels are from the set {0,1,...,C − 1} convert classes 0,2,3,...,C − 1 to −1.
If the class labels are from the set {1,2,...,C} convert classes 2,3,...,C to −1.
Thus, you will have class labels eventually belonging to the set {+1,−1}.
2. Normalize the columns of the matrix A such that each column has entries in the range [−1,1].
3. Note that a shuffled index array indexarr is used in the code. Use this index array to partition the data and labels into train and test splits. In particular, use the first 80% of the indices to create the training data and labels, and the remaining 20% to create the test data and labels. Store them in the variables train_data, train_label, test_data, test_label.
import numpy as np
#we will load the wine data from the scikit-learn package
from sklearn.datasets import load_wine
wine = load_wine()
print(wine.data.shape) #check the shape of the wine data
A = wine.data
#Normalize the columns of A so that all entries are in the range [-1,+1]
#for i in range(A.shape[1]):
#    A[:, i] = ???
print(wine.target.shape) #check the shape of the wine target
#How many labels does the wine data have?
#C = num_of_classes
#print(C)
n = wine.data.shape[0] #Number of data points
d = wine.data.shape[1] #Dimension of data points
#In the following code, we create an nx1 vector of target labels
y = 1.0*np.ones([A.shape[0],])
for i in range(wine.target.shape[0]):
    # y[i] = ???? # Convert class labels that are not 1 into -1
    pass
indexarr = np.arange(n) #index array
np.random.shuffle(indexarr) #shuffle the indices
#print(indexarr) #check indexarr after shuffling
#Use the first 80% of indexarr to create the train data and the remaining 20% to create the test data
#train_data = ????
#train_label = ????
#test_data = ????
#test_label = ????
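For reference, one possible completion of the above template is sketched below; it is not the only valid answer. Max-abs scaling is assumed for the normalization (the wine features are nonnegative, so dividing each column by its maximum absolute value lands in [0,1] ⊆ [−1,1]), and class 1 is kept as +1 per the comment in the template:

import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
A = wine.data.copy()
#Scale each column by its maximum absolute value
for i in range(A.shape[1]):
    A[:, i] = A[:, i] / np.max(np.abs(A[:, i]))
n, d = A.shape
#Wine labels lie in {0, 1, 2}; keep class 1 as +1, map classes 0 and 2 to -1
y = np.where(wine.target == 1, 1.0, -1.0)
indexarr = np.arange(n)
np.random.shuffle(indexarr)
split = int(0.8 * n) #first 80% of the shuffled indices for training
train_data, train_label = A[indexarr[:split]], y[indexarr[:split]]
test_data, test_label = A[indexarr[split:]], y[indexarr[split:]]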
4. Use the Python function developed in the last lab where you implemented the prediction rule in eqn. (2).
5. Use the Python function developed in the previous lab which takes as input the model parameter w, the data features, and the labels, and returns the accuracy on the data. (A sketch of both functions follows.)
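If the last lab's functions are not at hand, the following sketch shows the expected shape of both (the names are ours; data is assumed to hold one sample per row):

import numpy as np

def predict(w, x):
    #Prediction rule of eqn. (2): h(x) = sign(w^T x)
    return np.sign(np.dot(w, x))

def compute_accuracy(w, data, label):
    #Fraction of samples whose predicted sign matches the label
    return np.mean(np.sign(data @ w) == label)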
Exercise 2: An Optimization Algorithm
1. To solve the problem (5), we shall use the following method (denoted by ALG-LAB9). Assume that the training data contains ntrain samples.
(a) For t = 1, 2, 3, ..., do:
i. Sample i uniformly at random from {1, 2, ..., ntrain}.
ii. wt+1 = ProjC(wt − ηt∇fi(wt)).
The notation ProjC(z) = argminu∈C ∥u − z∥2 denotes the orthogonal projection of point z onto set C. In other words, we wish to find a point u⋆ ∈ C which is closest to z in terms of ℓ2 distance. For specific examples of set C, the orthogonal projection has a nice closed form.
2. When C = {w ∈ Rd : ∥w∥∞ ≤ 1}, find an expression for ProjC(z). (Recall: for w = (w1, w2, ..., wd)⊤, we have ∥w∥∞ = max{|w1|, |w2|, ..., |wd|}.)
3. Consider the hinge loss function Lh. Use the Python modules developed in the last lab to compute the loss function Lh and the objective function value. Also use the modules developed in the last lab to compute the gradient (or sub-gradient) of fi(w) for the loss Lh. Denote the (sub-)gradient by gi(w) = ∇w fi(w). (A sketch is given below.)
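Should those modules be unavailable, a minimal sketch for the hinge loss is given below, assuming the decomposition fi(w) = (λ/2)∥w∥² + Lh(yi, w⊤xi) from eqn. (3); lam stands for λ and the function names are our own:

import numpy as np

def objective(w, data, label, lam):
    #f(w) = (lam/2)||w||^2 + (1/n) sum_i max{0, 1 - y_i w^T x_i}
    margins = 1.0 - label * (data @ w)
    return 0.5 * lam * np.dot(w, w) + np.mean(np.maximum(0.0, margins))

def g_i(w, xi, yi, lam):
    #A sub-gradient of f_i(w): the hinge term contributes -y_i x_i
    #only when the margin term 1 - y_i w^T x_i is positive
    if 1.0 - yi * np.dot(w, xi) > 0:
        return lam * w - yi * xi
    return lam * w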
4. Define a module to compute the orthogonal projection onto set C.
def compute_orthogonal_projection(z):
    #complete the code
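As a cross-check for step 2: when C = {w ∈ Rd : ∥w∥∞ ≤ 1}, the objective argmin_{u∈C} ∥u − z∥2 decouples across coordinates, so each entry of z is independently clipped into [−1, 1]. A minimal sketch:

import numpy as np

def compute_orthogonal_projection(z):
    #Coordinate-wise clipping projects z onto the l-infinity unit ball
    return np.clip(z, -1.0, 1.0)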
5. Modify the code developed in the previous lab to implement ALG-LAB9. Use the following template.
def OPT1(data, label, lam, num_epochs): #'lambda' is reserved in Python, so we use lam
    t = 1
    #initialize w
    #w = ???
    arr = np.arange(data.shape[0])
    for epoch in range(num_epochs):
        np.random.shuffle(arr) #shuffle every epoch
        for i in np.nditer(arr): #pass through the data points
            #step = ???
            #Update w using w <- Proj_C(w - step * g_i(w))
            t = t + 1
            if t > 1e4: t = 1
    return w
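One way the blanks might be filled in, as a sketch only: w is initialized to zeros (an assumption; the lab may prescribe otherwise), and g_i and compute_orthogonal_projection refer to the modules from steps 3 and 4.

def OPT1(data, label, lam, num_epochs):
    t = 1
    w = np.zeros(data.shape[1]) #assumed initialization
    arr = np.arange(data.shape[0])
    for epoch in range(num_epochs):
        np.random.shuffle(arr)
        for i in np.nditer(arr):
            step = 1.0 / t #decaying step size, step = 1/t
            #Projected sub-gradient step: w <- Proj_C(w - step * g_i(w))
            w = compute_orthogonal_projection(w - step * g_i(w, data[i], label[i], lam))
            t = t + 1
            if t > 1e4: t = 1
    return w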
6. In OPT1, use num_epochs=500 and step = 1/t. For each λ ∈ {10−3, 10−2, 0.1, 1, 10}, perform the following tasks (a plotting sketch follows this list):
(a) [R] Plot the objective function value in every epoch. Use different colors for different λ values.
(b) [R] Plot the test set accuracy in every epoch. Use different colors for different λ values.
(c) [R] Plot the train set accuracy in every epoch. Use different colors for different λ values.
(d) [R] Tabulate the final test set accuracy and train set accuracy for each λ value.
(e) [R] Explain your observations.
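A sketch of the experiment loop referenced above, assuming matplotlib is available and that OPT1 has been extended to also return the per-epoch objective values (that bookkeeping is not shown here):

import matplotlib.pyplot as plt

lambdas = [1e-3, 1e-2, 0.1, 1.0, 10.0]
for lam in lambdas:
    #Assumes a modified OPT1 that also returns the per-epoch objective values
    w, obj_history = OPT1(train_data, train_label, lam, num_epochs=500)
    plt.plot(obj_history, label="lambda = " + str(lam))
plt.xlabel("epoch")
plt.ylabel("objective function value")
plt.legend()
plt.show()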
7. [R] Repeat the experiments (with num_epochs=500) for the loss functions Lℓ and Lsh. Explain your observations.
Exercise 3: A different constraint set
1. When C = {w ∈ Rd : ∥w∥1 ≤ 1}, find an expression for ProjC(z). (Recall: for w = (w1, w2, ..., wd)⊤, we have ∥w∥1 = |w1| + |w2| + ··· + |wd|.)
2. Consider the hinge loss function Lh. Use the Python modules developed in the last lab to compute the loss function Lh and the objective function value. Also use the modules developed in the last lab to compute the gradient (or sub-gradient) of fi(w) for the loss Lh. Denote the (sub-)gradient by gi(w) = ∇w fi(w).
3. Define a module to compute the orthogonal projection onto set C.
def compute_orthogonal_projection_ex3(z):
    #complete the code
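Unlike the ℓ∞ case, the ℓ1-ball projection does not decouple across coordinates. A standard approach (sort the magnitudes, find a threshold θ, then soft-threshold) is sketched below; treat it as a reference to check against your own derivation in step 1, not as the prescribed solution:

import numpy as np

def compute_orthogonal_projection_ex3(z):
    #Projection onto C = {w : ||w||_1 <= 1} via soft-thresholding
    if np.sum(np.abs(z)) <= 1.0:
        return z #z already lies inside C
    u = np.sort(np.abs(z))[::-1] #magnitudes, sorted in descending order
    css = np.cumsum(u)
    j = np.arange(1, z.size + 1)
    #Largest (1-indexed) j at which u_j > (css_j - 1)/j
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)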
4. In OPT1, use num_epochs=500 and step = 1/t. For each λ ∈ {10−3, 10−2, 0.1, 1, 10}, perform the following tasks:
(a) [R] Plot the objective function value in every epoch. Use different colors for different λ values.
(b) [R] Plot the test set accuracy in every epoch. Use different colors for different λ values.
(c) [R] Plot the train set accuracy in every epoch. Use different colors for different λ values.
(d) [R] Tabulate the final test set accuracy and train set accuracy for each λ value.
(e) [R] Explain your observations.
5. [R] Repeat the experiments (with num_epochs=500) for the loss functions Lℓ and Lsh. Explain your observations.