EE569 Homework 5: CNN Training on LeNet-5 and CIFAR-10 Classification
Problem 1: CNN Training on LeNet-5
In this problem, you will learn to train a simple convolutional neural network (CNN) called LeNet-5, introduced by LeCun et al. [1], and apply it to the CIFAR-10 dataset [2]. LeNet-5 was designed for handwritten and machine-printed character recognition. Its architecture is shown in Fig. 1. The network has two convolutional (conv) layers and three fully connected (fc) layers. Each conv layer is followed by a max pooling layer. Both conv layers use filters with a 5x5 spatial receptive field. The first and second conv layers have 6 and 16 filters, respectively. The stride is 1 and no padding is used. Each max pooling layer takes a 2x2 input window and reduces it to a single value (1x1) by choosing the maximum of the four responses. The first two fc layers have 120 and 84 units, respectively. The last fc layer, the output layer, has 10 units to match the number of object classes in the CIFAR-10 dataset. Use the popular ReLU activation function [3] for all conv and fc layers except the output layer, which uses softmax [4] to compute the class probabilities.
Figure 1: A CNN architecture derived from LeNet-5
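The layer description above can be checked with a short calculation. The following is a minimal sketch, not part of the assignment, that traces feature-map sizes and parameter counts through the network. It assumes a 32x32 RGB input (as in CIFAR-10), a pooling stride of 2 (non-overlapping 2x2 windows), and one bias per filter or unit; the function names `conv_out` and `lenet5_summary` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_summary(in_size=32, in_channels=3, num_classes=10):
    """Return (name, output shape, parameter count) for each layer."""
    layers = []
    size, channels = in_size, in_channels
    for name, filters in (("conv1", 6), ("conv2", 16)):
        size = conv_out(size, kernel=5)            # 5x5 conv, stride 1, no padding
        params = (5 * 5 * channels + 1) * filters  # weights plus one bias per filter
        channels = filters
        layers.append((name, (channels, size, size), params))
        # Assumption: 2x2 max pooling with stride 2; pooling has no parameters.
        size = conv_out(size, kernel=2, stride=2)
        layers.append((name.replace("conv", "pool"), (channels, size, size), 0))
    features = channels * size * size              # flattened input to the first fc layer
    for name, units in (("fc1", 120), ("fc2", 84), ("fc3", num_classes)):
        layers.append((name, (units,), (features + 1) * units))
        features = units
    return layers


summary = lenet5_summary()
for name, shape, params in summary:
    print(f"{name:6s} {str(shape):14s} {params:6d}")
total_params = sum(p for _, _, p in summary)
print("total parameters:", total_params)
```

Under these assumptions the spatial size goes 32, 28, 14, 10, 5, so the first fc layer sees 16 x 5 x 5 = 400 inputs.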
The CIFAR-10 dataset consists of 60,000 RGB images of 32x32 pixels in 10 classes (6,000 images per class). It includes a labeled training set of 50,000 images and a test set of 10,000 images. Fig. 2 shows some example images from the CIFAR-10 dataset.
Figure 2: CIFAR-10 images
(a) CNN Architecture
Explain the architecture and operational mechanism of convolutional neural networks by performing the following tasks.
1. Describe CNN components in your own words: 1) the fully connected layer, 2) the convolutional layer, 3) the max pooling layer, 4) the activation function, and 5) the softmax function. What are the functions of these components?
2. What is the over-fitting issue in model learning? Explain any technique that has been used in CNN training to avoid over-fitting.
3. Why do CNNs work much better than traditional methods in many computer vision problems? You can use the image classification problem as an example to elaborate your points.
4. Explain the loss function and the classical backpropagation (BP) optimization procedure to train such a convolutional neural network.
Show your understanding as much as possible in your own words in your report.
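As a concrete reference point for tasks 1 and 4, the following is a minimal sketch of ReLU, softmax, and a loss on plain Python lists. The assignment does not name a specific loss; cross-entropy is assumed here because it is the usual companion of a softmax output. The helper names are hypothetical.

```python
import math


def relu(values):
    """Element-wise ReLU: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Assumed loss: negative log-probability of the true class."""
    return -math.log(probs[label])


def grad_logits(probs, label):
    """Gradient of the cross-entropy loss w.r.t. the logits: p - one_hot(label)."""
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


logits = [2.0, 1.0, 0.1]                  # toy scores for a 3-class example
probs = softmax(logits)
loss = cross_entropy(probs, label=0)
grad = grad_logits(probs, label=0)
print(probs, loss, grad)
```

Backpropagation starts from this output-layer gradient and applies the chain rule layer by layer to obtain the gradient for every weight.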
(b) CIFAR-10 Classification
Train the CNN given in Fig. 1 using the 50,000 training images from the CIFAR-10 dataset. You may apply suitable preprocessing techniques and random network initialization to make training easier.
1. Plot accuracy curves (epoch vs. accuracy, or iteration vs. accuracy) for the training and test sets separately. Plot the curves under 5 different yet representative initial parameter settings (filter weights, learning rate, decay, etc.). Discuss your observations and the effect of the different settings.
2. Find the best parameter setting to achieve the highest accuracy on the test set. Then, plot the performance curves for the test set and the training set under this setting.
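The bookkeeping behind these curves can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, and the numbers in `histories` are placeholders, not measured results. In practice each list would be filled by evaluating the network on the training and test sets after every epoch.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_setting(histories):
    """Pick the (setting, epoch, test accuracy) with the highest test accuracy.

    `histories` maps a setting name to a list of (train_acc, test_acc)
    pairs, one pair per epoch.
    """
    best = None
    for name, curve in histories.items():
        for epoch, (train_acc, test_acc) in enumerate(curve, start=1):
            if best is None or test_acc > best[2]:
                best = (name, epoch, test_acc)
    return best


# Placeholder values for illustration only, NOT experimental results.
histories = {
    "setting_A": [(0.30, 0.28), (0.45, 0.40)],
    "setting_B": [(0.25, 0.24), (0.35, 0.33)],
}
print(accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
print(best_setting(histories))
```

Each per-setting list can then be passed to any plotting tool to draw the training and test curves on the same axes.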
(c) State-of-the-Art CIFAR-10 Classification
Check the state-of-the-art implementations of CIFAR-10 classification in [5]. Select one paper from the list for discussion.
1. Describe what the authors did to achieve such a result. You do not have to implement the network.
2. Compare the solution with LeNet-5 and discuss pros and cons of the two methods.
You can add pictures, flowcharts, and diagrams in your report. If you do so, you need to cite their sources.
Problem 2: CIFAR-10 Classification
Feel free to modify the baseline CNN in Fig. 1 to improve the classification accuracy obtained in Problem 1(b). For example, you can increase the depth of the network by adding more layers and/or change the number of filters in some layers. You can augment the dataset. You can also try different activation functions or optimization algorithms. All of these have the potential to improve the result. You may need to fine-tune the training parameters to get the training job done.
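Dataset augmentation, mentioned above, can be illustrated with two common transforms. This is a minimal sketch under stated assumptions: the assignment does not prescribe any particular augmentation, the images are treated as single-channel grids (lists of rows) to keep the example short, and the function names are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror an image stored as a list of rows."""
    return [row[::-1] for row in image]


def random_crop(image, pad, rng):
    """Zero-pad by `pad` pixels on each side, then crop back to the original size."""
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    top = rng.randint(0, 2 * pad)    # random vertical shift
    left = rng.randint(0, 2 * pad)   # random horizontal shift
    return [row[left:left + w] for row in padded[top:top + h]]


def augment(image, rng, pad=1, flip_prob=0.5):
    """Apply a random flip followed by a random shifted crop."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return random_crop(image, pad, rng)


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # toy 3x3 "image"
print(augment(image, random.Random(0)))
```

Augmentation is applied to training images only; the test set is left unchanged so that accuracies remain comparable.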
Your score will be determined by three aspects:
• Motivation and logic behind your design: You can draw a diagram of your network architecture and explain it in your own words. Describe the training parameter setting used to reach the result below. Discuss the sources of performance improvement compared with Problem 1(b).
• Classification accuracy: Report the best accuracy that you can achieve; report the training time and inference time; draw the training and test accuracy curves using the epoch-accuracy (or iteration-accuracy) plot; draw the test accuracy curve obtained by randomly dropping training samples to see how the performance degrades.
• Model size: You have limited resources at hand, such as GPUs, so we do not want you to spend time on long, complex training processes or to use pre-trained models from GitHub. For example, using ResNet in this homework is meaningless. Therefore, you are required to report the model size (number of parameters) of your network, which should relieve the pressure to push the accuracy as high as possible.
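For the experiment in the second bullet, randomly dropping training samples, one possible way to pick the retained subset is sketched below. The keep fractions and the helper name `subsample_indices` are assumptions chosen for illustration; the assignment does not specify them.

```python
import random


def subsample_indices(num_samples, keep_fraction, seed=0):
    """Return a sorted random subset of sample indices to keep for training."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    keep = max(1, round(num_samples * keep_fraction))
    return sorted(rng.sample(range(num_samples), keep))


# Assumed keep fractions; 50,000 is the CIFAR-10 training set size.
fractions = [1.0, 0.5, 0.25, 0.125, 0.0625]
subset_sizes = {f: len(subsample_indices(50000, f)) for f in fractions}
print(subset_sizes)
```

Retraining on each subset and recording the final test accuracy gives one point per fraction on the degradation curve. Uniform sampling like this does not guarantee balanced classes; sampling per class is an alternative if balance matters.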