Previously, we implemented a neural network using only NumPy. We observed that as we added more components to the network (activations, regularization, batchnorm, etc.), the code became complex and difficult to manage. In this assignment we will switch to the deep learning library PyTorch, which modularizes the components of a network and hides much of this complexity.
The goal of this assignment is to become familiar with PyTorch and to build a neural network in PyTorch for image classification on the FashionMNIST dataset. As discussed in the lecture, the size of a model grows very quickly as fully connected layers are added, so it would be impractical to build a deep neural network for images with only fully connected layers. In this assignment, we will instead build a Convolutional Neural Network, the most popular approach to extracting features from an image without too much computation (compared to a fully connected layer).
It is ideal to solve this assignment on a computer with a GPU. We also fix the random seeds to ensure deterministic behaviour (i.e., reproducible results) of the model.
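A typical seeding cell for this looks like the following (a minimal sketch; the seed value is illustrative):

import random
import numpy as np
import torch

seed = 0  # illustrative value; any fixed seed gives reproducible runs
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)   # also seed all GPUs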
Setting up the DataLoaders
In the following cell we first download and arrange the data. The FashionMNIST dataset is available in the official PyTorch repository, so the cell checks for the FashionMNIST data and downloads it if it is not already available.
The following parts are already written for you to handle the data.
- Import the necessary PyTorch packages for data handling.
- Move the data onto PyTorch tensors.
- Define parameters such as batch_size for data handling. A different batch_size is used for the test data so that the number of test samples is perfectly divisible by it.
- Create DataLoaders to iterate over the training and test data (a minimal sketch of this cell follows the list).
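A minimal sketch of such a data-handling cell, assuming torchvision is available; the batch sizes and the data directory are illustrative:

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.ToTensor()  # PIL image -> float tensor in [0, 1], shape (1, 28, 28)

train_data = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_data = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

batch_size = 64        # illustrative training batch size
test_batch_size = 50   # divides the 10000 test samples evenly

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=test_batch_size, shuffle=False)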
Visualize a Few Data Samples
In the following cell we peek at a random batch of images together with their labels and visualize them.
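One way to do this, assuming the train_loader and train_data names from the data-handling sketch above:

import matplotlib.pyplot as plt

images, labels = next(iter(train_loader))       # grab one (shuffled) batch
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, img, label in zip(axes, images, labels):
    ax.imshow(img.squeeze(), cmap='gray')       # drop the channel dimension for plotting
    ax.set_title(train_data.classes[label.item()])  # human-readable class name
    ax.axis('off')
plt.show()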
Architecture
We implement a Convolutional Neural Network as our model, making use of the following layers:
- convolution layers for extracting features,
- batchnorm layers for normalizing the activations in the hidden layers,
- the ReLU activation function for the non-linearity between layers,
- fully connected layers at the end.
Model:
We make use of the following convolutional neural network architecture for our dataset:
- convolution layer: output_channels=16, kernel_size=3, stride=1, padding=1
- batch normalization layer
- ReLU activation layer
- max pool layer: kernel_size=2, stride=2
- convolution layer: output_channels=32, kernel_size=3, stride=1, padding=1
- batch normalization layer
- ReLU activation layer
- max pool layer: kernel_size=2, stride=2
- convolution layer: output_channels=64, kernel_size=5, stride=1, padding=2
- batch normalization layer
- ReLU activation layer
- max pool layer: kernel_size=2, stride=2
- fully connected layer: number_of_classes outputs
Build the model
We first define a class called Model inheriting from PyTorch's nn.Module.
In __init__ (the constructor), we define all the layers that are used to build the model.
We then define a forward function for the sequential model that takes images as input and returns the predictions as output. All of these layers are available in the PyTorch package; read the documentation/source code for a better understanding (a sketch of one possible implementation follows the links below).
- Convolutional layer: https://pytorch.org/docs/stable/nn.html#convolution-layers
- Batchnorm layer: https://pytorch.org/docs/stable/nn.html#normalization-layers
- Activation ReLU: https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
- Maxpooling layer: https://pytorch.org/docs/stable/nn.html#pooling-layers
- Fully connected layer: https://pytorch.org/docs/stable/nn.html#linear-layers
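One possible implementation, consistent with the architecture above and with the model printout shown later in this notebook (the layer names follow that printout; treat this as a sketch, not the required solution):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)
        self.bn3 = nn.BatchNorm2d(64)
        # After three 2x2 max-poolings a 28x28 input shrinks to 3x3, so 64 * 3 * 3 = 576 features.
        self.fc = nn.Linear(576, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.bn1(self.conv1(x))))   # (B, 16, 14, 14)
        x = self.pool(self.relu(self.bn2(self.conv2(x))))   # (B, 32, 7, 7)
        x = self.pool(self.relu(self.bn3(self.conv3(x))))   # (B, 64, 3, 3)
        x = x.view(x.size(0), -1)                           # flatten to (B, 576)
        return self.fc(x)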
## Run the cell to check the implementation of your model
# The testcase only tests the input and output dimensions of your architecture.
# The only constraints your Model needs to satisfy are:
# - The Model object is initialized by providing num_classes as input
# - The network takes input Tensors of dimensions (B, 1, 28, 28), where B is an arbitrary batch_size,
#   1 is the number of channels in the grayscale image and 28 is the image size
# - The output of the network is a Tensor of dimensions (B, 10), where 10 is num_classes
model = Model(num_classes=10)
test_input1 = torch.randn(16, 1, 28, 28)
out1 = model(test_input1)
test_input2 = torch.rand(20, 1, 28, 28)
out2 = model(test_input2)
#hidden tests follow
Initialize the CNN Model
Define a loss criterion. In this assignment we will use the cross-entropy loss between the predictions and the ground truth.
CrossEntropyLoss - https://pytorch.org/docs/stable/nn.html#crossentropyloss
We also define an optimization strategy to update the weights. In this assignment we use the widely used Adam optimizer from the PyTorch package.
Adam - https://pytorch.org/docs/stable/optim.html#algorithms
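A sketch of this initialization, assuming the Model class above. The learning rate and schedule are not stated in the text; they are inferred from the decaying LR column in the training log below, which matches an ExponentialLR schedule with gamma=0.999 stepped once per iteration, starting from lr=0.01:

model = Model(num_classes=10)
criterion = nn.CrossEntropyLoss()   # expects raw logits and integer class labels
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)                    # lr inferred from the log
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)   # assumed schedule
print(model)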
Model(
  (conv1): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc): Linear(in_features=576, out_features=10, bias=True)
)
Training the Model
The training loop is set up in the following way (a sketch follows the list). For every batch in the defined number of epochs:
- Move the images and labels to the GPU (after checking is_cuda).
- Extract the output by passing the images through the model.
- Pass the output and the ground truth to the loss criterion to compute the batch loss.
- Clear the gradients.
- Backpropagate (compute gradients w.r.t. the parameters) using backward().
- Update the parameters with a single optimization step.
- Update the training loss for the plots.
- Repeat.
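A minimal sketch of such a loop, reusing the names from the earlier sketches (train_loader, criterion, optimizer, scheduler). The print format mirrors the log below, but the logging interval and the per-iteration scheduler step are assumptions, and the validation passes shown in the log are omitted for brevity:

import time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

num_epochs = 5
start = time.time()
for epoch in range(num_epochs):
    model.train()                                   # training mode: batchnorm uses batch statistics
    for itr, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)  # move the batch to the GPU if available
        outputs = model(images)                     # forward pass
        loss = criterion(outputs, labels)           # batch loss
        optimizer.zero_grad()                       # clear old gradients
        loss.backward()                             # backpropagate
        optimizer.step()                            # single optimization step
        scheduler.step()                            # decay the learning rate (assumed per iteration)
        if itr % 100 == 0:
            lr = optimizer.param_groups[0]['lr']
            print(f"Epoch {epoch}/{num_epochs}, itr = {itr}, Train Loss = {loss.item():.3f}, LR = {lr:.3E}")
print('Time to train in seconds', time.time() - start)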
Epoch 0/5, itr = 200, Train Loss = 0.461, LR = 8.178E-03
Epoch 0/5, itr = 300, Train Loss = 0.298, LR = 7.400E-03
Epoch 0/5, itr = 400, Train Loss = 0.399, LR = 6.695E-03
Epoch 0/5, itr = 500, Train Loss = 0.359, LR = 6.058E-03
Epoch 0/5, itr = 600, Train Loss = 0.292, LR = 5.481E-03
Epoch 0/5, itr = 700, Train Loss = 0.409, LR = 4.959E-03
------------------------------------------------
Epoch 0/5, itr = 0, Val Loss = 0.291, LR = 4.491E-03
Epoch 0/5, itr = 100, Val Loss = 0.188, LR = 4.491E-03
################################################
Epoch 1/5, itr = 0, Train Loss = 0.446, LR = 4.487E-03
Epoch 1/5, itr = 100, Train Loss = 0.352, LR = 4.060E-03
Epoch 1/5, itr = 200, Train Loss = 0.372, LR = 3.673E-03
Epoch 1/5, itr = 300, Train Loss = 0.256, LR = 3.324E-03
Epoch 1/5, itr = 400, Train Loss = 0.291, LR = 3.007E-03
Epoch 1/5, itr = 500, Train Loss = 0.217, LR = 2.721E-03
Epoch 1/5, itr = 600, Train Loss = 0.410, LR = 2.462E-03
Epoch 1/5, itr = 700, Train Loss = 0.169, LR = 2.227E-03
------------------------------------------------
Epoch 1/5, itr = 0, Val Loss = 0.266, LR = 2.017E-03
Epoch 1/5, itr = 100, Val Loss = 0.177, LR = 2.017E-03
################################################
Epoch 2/5, itr = 0, Train Loss = 0.150, LR = 2.015E-03
Epoch 2/5, itr = 100, Train Loss = 0.166, LR = 1.823E-03
Epoch 2/5, itr = 200, Train Loss = 0.146, LR = 1.650E-03
Epoch 2/5, itr = 300, Train Loss = 0.270, LR = 1.493E-03
Epoch 2/5, itr = 400, Train Loss = 0.125, LR = 1.351E-03
Epoch 2/5, itr = 500, Train Loss = 0.298, LR = 1.222E-03
Epoch 2/5, itr = 600, Train Loss = 0.160, LR = 1.106E-03
Epoch 2/5, itr = 700, Train Loss = 0.303, LR = 1.000E-03
------------------------------------------------
Epoch 2/5, itr = 0, Val Loss = 0.117, LR = 9.061E-04
Epoch 2/5, itr = 100, Val Loss = 0.268, LR = 9.061E-04
################################################
Epoch 3/5, itr = 0, Train Loss = 0.127, LR = 9.052E-04
Epoch 3/5, itr = 100, Train Loss = 0.118, LR = 8.190E-04
Epoch 3/5, itr = 200, Train Loss = 0.139, LR = 7.410E-04
Epoch 3/5, itr = 300, Train Loss = 0.165, LR = 6.705E-04
Epoch 3/5, itr = 400, Train Loss = 0.260, LR = 6.066E-04
Epoch 3/5, itr = 500, Train Loss = 0.133, LR = 5.489E-04
Epoch 3/5, itr = 600, Train Loss = 0.197, LR = 4.966E-04
Epoch 3/5, itr = 700, Train Loss = 0.137, LR = 4.493E-04
------------------------------------------------
Epoch 3/5, itr = 0, Val Loss = 0.121, LR = 4.070E-04
Epoch 3/5, itr = 100, Val Loss = 0.290, LR = 4.070E-04
################################################
Epoch 4/5, itr = 0, Train Loss = 0.084, LR = 4.066E-04
Epoch 4/5, itr = 100, Train Loss = 0.100, LR = 3.679E-04
Epoch 4/5, itr = 200, Train Loss = 0.212, LR = 3.328E-04
Epoch 4/5, itr = 300, Train Loss = 0.319, LR = 3.011E-04
Epoch 4/5, itr = 400, Train Loss = 0.079, LR = 2.725E-04
Epoch 4/5, itr = 500, Train Loss = 0.074, LR = 2.465E-04
Epoch 4/5, itr = 600, Train Loss = 0.126, LR = 2.231E-04
Epoch 4/5, itr = 700, Train Loss = 0.101, LR = 2.018E-04
------------------------------------------------
Epoch 4/5, itr = 0, Val Loss = 0.124, LR = 1.828E-04
Epoch 4/5, itr = 100, Val Loss = 0.227, LR = 1.828E-04
################################################
Time to train in seconds 51.0800986289978
Testing the Classification
In the testing loop we don't update the weights. The trained model is run on all the samples in the test data to compute the accuracy and observe how well the model generalizes to unseen data.
The testing loop is set up in the following way (a sketch follows the list). For every batch in the testing data:
- Put the model in evaluation mode and turn off gradient computation.
- Move the images and labels to the available device.
- Extract the output from the model for the input.
- Compute the predicted class by choosing the one with the maximum score among the predictions.
- Compare the predicted classes with the true classes and calculate the accuracy.
- Update test_loss for the plots.
- Repeat.
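A minimal sketch of the testing loop under the same assumptions (test_loader, criterion, device, and model from the sketches above):

model.eval()                           # evaluation mode: batchnorm uses running statistics
correct, total, test_loss = 0, 0, 0.0
with torch.no_grad():                  # turn off gradient computation
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)                    # forward pass only
        test_loss += criterion(outputs, labels).item()
        preds = outputs.argmax(dim=1)              # class with the maximum score
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.2f}%")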