IE684-Lab8

Binary Classification Problem
In the last lab, you might have recognized that decomposing a problem can help in designing optimization procedures to handle large data. In this lab, we will continue with this theme and try to develop procedures which are scalable and achieve reasonably accurate solutions.

Here, we will consider a different problem, namely the binary (or two-class) classification problem in machine learning. The problem is of the following form. For a data set {(xi, yi)}_{i=1}^n where xi ∈ X ⊆ R^d and yi ∈ {+1,−1}, we solve:

min_w f(w) = (λ/2)‖w‖² + Σ_{i=1}^n L(yi, w⊤xi),                    (1)

where λ > 0 is a regularization parameter and L is a loss function.

Note that we intend to learn a classification rule h : X → {+1,−1} by solving the problem (1). We will use the following prediction rule for a test sample x̂:

h(x̂) = sign(w⊤x̂).                    (2)

We will consider the following loss functions:

Lh(yi,w⊤xi) = max{0,1 − yiw⊤xi} (hinge)

Lℓ(yi,w⊤xi) = log(1 + exp(−yiw⊤xi)) (logistic)

Lsh(yi,w⊤xi) = (max{0,1 − yiw⊤xi})2. (squared hinge)

Exercise 0: [R] For an example (x,y) ∈ X × Y, let z = yw⊤x. Then, note that the loss functions Lh, Lℓ and Lsh can be equivalently written as Gh(z), Gℓ(z), Gsh(z). Write the loss functions Gh(z), Gℓ(z) and Gsh(z) as functions of z. Plot these loss functions Gh(z), Gℓ(z) and Gsh(z) for z on the real line (−∞, ∞). Distinguish the loss functions using different colors. Comment on the behavior of the respective loss functions with respect to z.
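A minimal plotting sketch for this exercise is given below, using numpy and matplotlib; the finite plotting range [−4, 4], the number of grid points and the colors are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-4, 4, 400)            # finite plotting range (arbitrary choice)
G_h  = np.maximum(0.0, 1.0 - z)        # hinge:         G_h(z)  = max{0, 1 - z}
G_l  = np.log(1.0 + np.exp(-z))        # logistic:      G_l(z)  = log(1 + exp(-z))
G_sh = np.maximum(0.0, 1.0 - z)**2     # squared hinge: G_sh(z) = (max{0, 1 - z})^2

plt.plot(z, G_h,  'r', label='hinge')
plt.plot(z, G_l,  'g', label='logistic')
plt.plot(z, G_sh, 'b', label='squared hinge')
plt.xlabel('z')
plt.ylabel('loss')
plt.legend()
plt.show()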

Exercise 1: Data Preparation
1. Load the iris dataset from the scikit-learn package using the code snippet below. We will load the features into the matrix A such that the i-th row of A contains the features of the i-th sample. The label vector will be loaded into y.

(a)    [R] Check the number of classes C and the class label values in iris data. Check if the class labels are from the set {0,1,...,C − 1} or if they are from the set {1,2,...,C}.

(b)    When loading the labels into y do the following:

If the class labels are from the set {0,1,...,C − 1} convert classes 0,2,3,...,C − 1 to −1.

If the class labels are from the set {1,2,...,C} convert classes 2,3,...,C to −1.

Thus, you will have class labels eventually belonging to the set {+1,−1}.

(c)     Note that a shuffled index array indexarr is used in the code. Use this index array to partition the data and labels into train and test splits. In particular, use the first 80% of the indices to create the training data and labels. Use the remaining 20% to create the test data and labels. Store them in the variables train_data, train_label, test_data, test_label.

import numpy as np
#we will load the iris data from scikit-learn package
from sklearn.datasets import load_iris
iris = load_iris()
#check the shape of iris data
print(iris.data.shape)
A = iris.data
#check the shape of iris target
print(iris.target.shape)
#How many labels does iris data have?
#C = num_of_classes
#print(C)
n = iris.data.shape[0] #Number of data points
d = iris.data.shape[1] #Dimension of data points
#In the following code, we create an nx1 vector of target labels
y = 1.0*np.ones([A.shape[0],])
for i in range(iris.target.shape[0]):
    # y[i] = ???? # Convert class labels that are not 1 into -1
#Create an index array
indexarr = np.arange(n) #index array
np.random.shuffle(indexarr) #shuffle the indices
#print(indexarr) #check indexarr after shuffling
#Use the first 80% of indexarr to create the train data and the remaining 20% to create the test data
#train_data = ????
#train_label = ????
#test_data = ????
#test_label = ????
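One possible way to fill in the template above is sketched below. It keeps class 1 as +1 and maps every other class to −1 (as required in part (b)), and uses the first 80% of the shuffled indices for training; variable names follow the template.

#One possible completion of the template (a sketch, not the only valid choice)
for i in range(iris.target.shape[0]):
    y[i] = 1.0 if iris.target[i] == 1 else -1.0  # class 1 -> +1, all other classes -> -1

num_train = int(0.8*n)                 # first 80% of the shuffled indices
train_data  = A[indexarr[:num_train]]
train_label = y[indexarr[:num_train]]
test_data   = A[indexarr[num_train:]]
test_label  = y[indexarr[num_train:]]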

 

(d)   Write a python function which implements the prediction rule in eqn. (2). Use the following code template.

 

def predict(w,x):
    #return ???
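A minimal sketch of the prediction rule in eqn. (2), with the (arbitrary) convention that a zero score is mapped to +1:

def predict(w, x):
    score = np.dot(w, x)                  # w^T x
    return 1.0 if score >= 0 else -1.0    # sign(w^T x), tie score = 0 mapped to +1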

 

(e)    Write a python function which takes as input the model parameter w, data features and labels and returns the accuracy on the data. (Use the predict function).

 

def compute_accuracy(data,labels,model_w):
    #Use predict function defined above
    #return ???
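A corresponding sketch of the accuracy computation, assuming data holds one sample per row and the labels are in {+1, −1}:

def compute_accuracy(data, labels, model_w):
    correct = 0
    for i in range(data.shape[0]):
        if predict(model_w, data[i]) == labels[i]:
            correct += 1
    return correct / data.shape[0]   # fraction of correctly classified samples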

 


Exercise 2: An Optimization Algorithm
1.    Note that problem (1) can be written as

min_w f(w) = min_w Σ_{i=1}^n fi(w).                    (3)

[R] Find an appropriate choice of fi(w).
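One natural choice, assuming the regularized form of (1) above, is to split the regularization term equally across the samples, i.e. fi(w) = (λ/2n)‖w‖² + L(yi, w⊤xi), so that Σ_{i=1}^n fi(w) = f(w).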

2.    Consider the loss function Lh. Write a python function to compute the loss function Lh. Use the following code template.

def compute_loss_h(x,y,model_w):
    #return ???
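A minimal sketch of the hinge-loss computation for a single sample; the function name and signature follow the pattern of the gradient template below and are an assumption.

def compute_loss_h(x, y, model_w):
    # L_h(y, w^T x) = max{0, 1 - y * w^T x}
    return max(0.0, 1.0 - y*np.dot(model_w, x))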

 

4.    Write an expression to compute the gradient (or sub-gradient) of fi(w) for the loss function Lh. Denote the (sub-)gradient by gi(w) = ∇wfi(w). Define a python function to compute the gradient.

 

def compute_grad_loss_h(x,y,model_w):
    #return ???
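A sub-gradient of the hinge loss with respect to w is −y·x when 1 − y w⊤x > 0 and 0 otherwise. A minimal sketch covering only the loss term is given below; the gradient of any regularization term in your choice of fi(w) would be added separately in the update.

def compute_grad_loss_h(x, y, model_w):
    # sub-gradient of max{0, 1 - y * w^T x} with respect to w
    if 1.0 - y*np.dot(model_w, x) > 0:
        return -y*x
    else:
        return np.zeros_like(model_w)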

 

5.    Write an optimization algorithm where you pass through the training samples one by one and do the (sub-)gradient updates for each sample. Recall that this is similar to ALG-LAB8. Use the following template.

 

def OPT1(data, label, lambda_reg, num_epochs):
    # regularization parameter renamed to lambda_reg since 'lambda' is a reserved keyword in Python
    t = 1
    #initialize w
    #w = ???
    arr = np.arange(data.shape[0])
    for epoch in range(num_epochs):
        np.random.shuffle(arr) #shuffle every epoch
        for i in np.nditer(arr): #Pass through the data points
            # step = ???
            # Update w using w <- w - step * g_i(w)
            t = t+1
            if t > 1e4:
                t = 1
    return w
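A possible completion of the template is sketched below. It assumes the decomposition fi(w) = (λ/2n)‖w‖² + Lh(yi, w⊤xi), zero initialization of w, and the diminishing step size step = 1/t mentioned in task 7; it is named OPT1_sketch only to distinguish it from the template.

def OPT1_sketch(data, label, lambda_reg, num_epochs):
    t = 1
    n, d = data.shape
    w = np.zeros(d)                      # initialize w to the zero vector (one possible choice)
    arr = np.arange(n)
    for epoch in range(num_epochs):
        np.random.shuffle(arr)           # shuffle every epoch
        for i in np.nditer(arr):         # pass through the data points one by one
            step = 1.0/t                 # diminishing step size (an assumption)
            g_i = (lambda_reg/n)*w + compute_grad_loss_h(data[i], label[i], w)
            w = w - step*g_i             # (sub-)gradient update for sample i
            t = t + 1
            if t > 1e4:
                t = 1
    return w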

 

6.    Run OPT1 on the training data for different values of λ and perform the following tasks (a sketch of one possible experiment loop appears after the list):

(a)    [R] Plot the objective function value in every epoch. Use different colors for different λ values.

(b)    [R] Plot the test set accuracy in every epoch. Use different colors for different λ values.

(c)    [R] Plot the train set accuracy in every epoch. Use different colors for different λ values.

(d)   [R] Tabulate the final test set accuracy and train set accuracy for each λ value.

(e)    [R] Explain your observations.
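The per-epoch plots in (a)-(c) require recording the objective value and the train/test accuracies at the end of every epoch (for example, by having OPT1 append them to lists and return them alongside w). For the final accuracies in (d), a minimal sketch of the surrounding loop is given below; the λ grid is only an illustrative placeholder for the values specified in the lab sheet.

lambdas = [0.001, 0.01, 0.1, 1.0]   # illustrative placeholder values, not the ones from the lab sheet
for lam in lambdas:
    w = OPT1_sketch(train_data, train_label, lam, num_epochs=1000)   # or your completed OPT1
    train_acc = compute_accuracy(train_data, train_label, w)
    test_acc = compute_accuracy(test_data, test_label, w)
    print("lambda =", lam, "train acc =", train_acc, "test acc =", test_acc)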

7.    [R] Note that in OPT1, a fixed number of epochs is used. Can you think of some other suitable stopping criterion for terminating OPT1? Implement your stopping criterion and check how it differs from the one in OPT1. Use step = 1/t and the λ which achieved the best test set accuracy in the previous experiment.

8.    [R] Repeat the experiments (with num_epochs=1000 and with your modified stopping criterion) for the other loss functions Lℓ and Lsh. Explain your observations.
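For these repetitions, only the gradient/sub-gradient of the loss term changes. Sketches analogous to compute_grad_loss_h are given below; the function names follow the same pattern and are assumptions.

def compute_grad_loss_l(x, y, model_w):
    # gradient of log(1 + exp(-y * w^T x)) with respect to w
    z = y*np.dot(model_w, x)
    return -y*x/(1.0 + np.exp(z))

def compute_grad_loss_sh(x, y, model_w):
    # sub-gradient of (max{0, 1 - y * w^T x})^2 with respect to w
    margin = 1.0 - y*np.dot(model_w, x)
    if margin > 0:
        return -2.0*margin*y*x
    else:
        return np.zeros_like(model_w)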
