Assignment 2
• It is your responsibility to make sure that all code and other deliverables are in the correct format and that your submission compiles and runs. We will not manually check your code (this is not feasible given the class size); code that does not run in our test environment will directly receive a score of 0. Likewise, your entire programming part will NOT be graded and will receive a score of 0 if your code prints anything that is not asked for in each question.
Theory Problem Set
You must show your work for full credit.
1. The convolution layer inside a CNN is intrinsically an affine transformation: a vector is received as input and is multiplied by a matrix to produce an output (to which a bias vector is usually added before passing the result through a nonlinearity). This operation can be represented as y = Ax, in which A describes the affine transformation.
We will first revisit the convolution layer as discussed in the class. Consider a convolution layer with a 3x3 kernel W, operated on a single input channel X, represented as:
W = [ w00 w01 w02 ]        X = [ x00 x01 x02 x03 ]
    [ w10 w11 w12 ]            [ x10 x11 x12 x13 ]
    [ w20 w21 w22 ]            [ x20 x21 x22 x23 ]
                               [ x30 x31 x32 x33 ]    (1)
Using this example, let us work out a stride-4 convolution layer, with zero padding size of 2. Consider ‘flattening’ the input tensor X in row-major order as:
X = [ x00  x01  x02  x03  x10  x11  x12  x13  x20  x21  x22  x23  x30  x31  x32  x33 ]^T    (2)
Write down this convolution as a matrix operation A such that Y = AX, where the output Y is also flattened in row-major order. NOTE: For this problem we are referring to a convolution in the deep learning context (i.e. do not flip the kernel).
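For intuition (and as a way to check your answer numerically), here is a minimal NumPy sketch that builds such a matrix A for a single-channel convolution without kernel flipping; the function name and the 4x4 input size are our own illustrative choices, not part of the required solution:

import numpy as np

def conv_as_matrix(W, in_h, in_w, stride, pad):
    # Build A such that flatten(conv(X)) = A @ flatten(X), with X a single
    # (in_h, in_w) channel flattened in row-major order and W unflipped.
    kh, kw = W.shape
    out_h = (in_h + 2 * pad - kh) // stride + 1
    out_w = (in_w + 2 * pad - kw) // stride + 1
    A = np.zeros((out_h * out_w, in_h * in_w))
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kh):
                for kx in range(kw):
                    iy = oy * stride + ky - pad  # input row hit by this tap
                    ix = ox * stride + kx - pad  # input col hit by this tap
                    if 0 <= iy < in_h and 0 <= ix < in_w:  # zero padding
                        A[oy * out_w + ox, iy * in_w + ix] = W[ky, kx]
    return A

# Sanity check: random 3x3 kernel, 4x4 input, stride 4, zero padding 2.
W = np.random.randn(3, 3)
X = np.random.randn(4, 4)
A = conv_as_matrix(W, 4, 4, stride=4, pad=2)
Y = (A @ X.reshape(-1)).reshape(2, 2)  # row-major flattened output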
2. Consider a specific 2 hidden layer ReLU network with inputs x ∈ R, 1 dimensional outputs, and 2 neurons per hidden layer. This function is given by
h(x) = W(3) max{0, W(2) max{0, W(1)x + b(1)} + b(2)} + b(3)    (3)
where the max is element-wise, with weights:
(4)
(5)
(6)
(7)
(8)
(9)
An interesting property of networks with piece-wise linear activations like the ReLU is that, as a whole, they compute piece-wise linear functions. At each of the following points x = xo, determine the value of the new weight W ∈ R and bias b ∈ R of the linear piece of h containing xo, so that in particular
Wxo + b = h(xo).
xo = 2 (10)
xo = −1 (11)
xo = 1 (12)
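If you want to double-check your hand-derived W and b, a finite-difference sketch like the one below recovers the local slope and intercept of any piece-wise linear scalar function; the weights here are placeholders, not the values from equations (4)-(9), and the check is only valid when xo does not sit exactly on a kink:

import numpy as np

# Placeholder weights: substitute the actual values from equations (4)-(9).
W1, b1 = np.array([[1.0], [-1.0]]), np.array([0.0, 0.0])
W2, b2 = np.array([[1.0, 1.0], [1.0, -1.0]]), np.array([0.0, 0.0])
W3, b3 = np.array([[1.0, -1.0]]), np.array([0.0])

def h(x):
    z1 = np.maximum(0, W1 @ np.array([x]) + b1)
    z2 = np.maximum(0, W2 @ z1 + b2)
    return float(W3 @ z2 + b3)

def local_affine(f, x0, eps=1e-4):
    # Central difference gives the slope of the active linear piece;
    # the intercept then follows from W * x0 + b = f(x0).
    W = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)
    b = f(x0) - W * x0
    return W, b

for x0 in (2.0, -1.0, 1.0):
    print(x0, local_affine(h, x0))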
Paper Review
In this section, you must choose one of the papers below and complete the following:
1. provide a short review of the paper
2. answer paper-specific questions
Guidelines: Please restrict your reviews to no more than 350 words and answers to questions to no more than 350 words per question. The review part (1) should include answers to the following:
• What is the main contribution of this paper? In other words, briefly summarize its key insights. What are some strengths and weaknesses of this paper?
Paper Choice 1:
The first of our paper reviews for this module comes from recent work that discusses biases induced by convolutional neural networks. It turns out that CNNs and humans have different biases, but there might be methods to force the network to learn biases that better reflect those of humans.
The paper can be viewed here.
Questions for this paper:
• Why would we care about the biases of the neural network, even if it can do well on the training set? Should we desire the network to have the same biases as humans or not? Why or why not?
• Why would training on stylized images change the bias of the network, and why might this bias generalize better to datasets with corruptions?
Paper Choice 2:
The second paper is an empirical study of transfer learning, specifically looking at which tasks (e.g. semantic segmentation, edge detection, depth estimation, etc.) transfer better to other tasks. The paper is here and the webpage/visualization is here.
Questions for this paper:
• Do the task pairs with stronger arrows (better transfer) make sense in terms of why they would transfer better? Pick one positive pair (with good transfer) and one negative pair (with bad transfer) and conjecture why it might be the case. Note that there are several types of features
in deep learning, including low-level (e.g. edges), mid-level (components), and high-level (abstract concepts and classification layer) that you might reason about.
• What does this say in terms of practical usage of deep learning across tasks? How might we use this information to guess where to transfer from if we have a new target task?
Coding: Implement and train a network on CIFAR-10
Overview
Convolutional Neural Networks (CNNs) are one of the major advancements in computer vision over the past decade. In this assignment, you will complete a simple CNN architecture from scratch and learn how to implement CNNs with PyTorch, one of the most commonly used deep learning frameworks. You will also run different experiments on imbalanced datasets to evaluate your model and techniques to deal with imbalanced data.
Python and dependencies
In this assignment, we will work with Python 3. If you do not have a Python distribution installed yet, we recommend installing Anaconda (or Miniconda) with Python 3. We provide environment.yaml (under ./part1-convnet), which contains a list of the libraries needed to set up the environment for this assignment. You can use it to create a copy of the conda environment. Refer to the conda user manual for more details.
$ conda env create -f environment.yaml
If you already have your own Python development environment, please refer to this file to find the necessary libraries, which we use to set up the same coding/grading environment.
Code Test
There are two ways (steps) that you can test your implementation:
1. Python Unit Tests: Some public unit tests are provided in the tests/ directory of the assignment repository. You can test each part of your implementation with these test cases by running:
$ python -m unittest tests.<name_of_tests>
1 Implementing CNN from Scratch
You will work in ./part1-convnet for this part of the assignment. Note that vectorization is not a requirement for this part of the assignment.
1.1 Module Implementation
You will now learn how to build a CNN from scratch. Typically, a convolutional neural network is composed of several different modules, and these modules work together to make the network effective. For each module, you will implement a forward pass (computing the output) and a backward pass (computing the gradients). Therefore, your tasks are as follows:
(a) Follow the instructions in the code to complete each module in ./modules. Specifically, modules to be implemented are 2D convolution, 2D Max Pooling, ReLU, and Linear. These will be the building blocks of the full network. The file ./modules/conv_classifier.py ties each of the aforementioned modules together and is the subject of the next section.
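To make the expected forward/backward pattern concrete, here is a minimal ReLU sketch; the attribute and method names are illustrative, so follow the actual interface defined in ./modules:

import numpy as np

class ReLU:
    def forward(self, x):
        self.cache = x             # save the input for the backward pass
        return np.maximum(0, x)

    def backward(self, dout):
        x = self.cache
        return dout * (x > 0)      # gradient flows only where the input was positive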
1.2 Network Implementation
After finishing each module, it’s time to put things together to form a real convolutional neural network. Your task is:
(a) Follow the instructions in the code to complete a CNN network in ./modules/conv_classifier.py. The network is constructed by a list of module definitions in order and should handle both forward and backward communication between modules.
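Conceptually, the forward pass feeds each module's output into the next, and the backward pass walks the modules in reverse, passing each upstream gradient back. A minimal sketch of this pattern (the real class and its interface live in ./modules/conv_classifier.py, so treat the names below as illustrative):

class ConvNet:
    def __init__(self, modules):
        self.modules = modules     # layer objects, in forward order

    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x

    def backward(self, dout):
        for m in reversed(self.modules):
            dout = m.backward(dout)
        return dout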
1.3 Optimizer
You have implemented a simple SGD optimizer in assignment-1. In practice, it is common to use a momentum term in SGD for better convergence. Specifically, we introduce a new velocity term vt, and the update rule is as follows:

vt = β vt−1 − η ∇w L(wt)
wt+1 = wt + vt

where β denotes the momentum coefficient and η denotes the learning rate.
(a) Follow the instructions in the code to complete SGD with momentum in ./optimizer/sgd.py. Hint: you will need to store and use the velocity from the previous iteration of SGD to compute the new gradient for the current iteration. Feel free to add member variable(s) to achieve this.
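A minimal sketch of this update, assuming parameters are stored as dicts holding a weight 'w' and its gradient 'dw' (the actual interface is defined in ./optimizer/sgd.py):

class SGDMomentum:
    def __init__(self, params, lr=0.01, beta=0.9):
        self.params = params
        self.lr = lr                               # learning rate (eta)
        self.beta = beta                           # momentum coefficient
        self.velocities = [0.0 for _ in params]    # v_0 = 0

    def step(self):
        for i, p in enumerate(self.params):
            # v_t = beta * v_{t-1} - eta * gradient
            self.velocities[i] = self.beta * self.velocities[i] - self.lr * p['dw']
            # w_{t+1} = w_t + v_t
            p['w'] = p['w'] + self.velocities[i]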
You might have noticed that the training process of your implementation can be extremely slow. Therefore, we will deliberately overfit the model on only a small portion of the data to verify whether the model is learning something. First, download the dataset by running:
$ cd data
$ sh get_data.sh
$ cd ../

Microsoft Windows 10 only:
C:\assignmentfolder> cd data
C:\assignmentfolder\data> get_data.bat
C:\assignmentfolder\data> cd ..
You can then simply run:
$ python train.py
which trains a small CNN on only 50 samples from the CIFAR-10 dataset. The script will produce a plot on the training data only; be sure to include the plot in your report. Your final accuracy should be slightly under 0.9 with the given network in the script.
2 PyTorch
You will work in ./part2-pytorch for this part of the assignment. The main function in main.py contains the major logic of the code and you can run it by
$ python main.py --config configs/<name_of_config_file>.yaml
2.1 Training
The first step in working with PyTorch is to familiarize yourself with its basic training loop.
1. Complete train and validate functions in main.py.
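As a reference for the general shape of these functions, here is a generic PyTorch training/validation skeleton; the actual signatures and bookkeeping are defined in main.py, so adapt rather than copy:

import torch

def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()             # clear gradients from the last step
        loss = criterion(model(images), labels)
        loss.backward()                   # backpropagate
        optimizer.step()                  # update parameters

@torch.no_grad()
def validate(model, loader, device):
    model.eval()
    total, correct = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total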
2.2 PyTorch Model
You will now implement some actual networks with PyTorch. We provide some starter files for you in ./models. The models for you to implement are as follows:
1. Two-Layer Network. This is the same network you have implemented from scratch in assignment 1. You will build the model with two fully connected layers and a sigmoid activation function in between the two layers. Please implement the model as instructed in ./models/twolayer.py
2. Vanilla Convolutional Neural Network. You will build the model with a convolution layer, a ReLU activation, and a max-pooling layer, followed by a fully connected layer for classification. Your convolution layer should use 32 output channels, a kernel size of 7 with stride 1, and zero padding. Your max-pooling should use a kernel size of 2 and a stride of 2. The fully connected layer should have 10 output features. Please implement the model as instructed in ./models/cnn.py (a sketch of this architecture appears after this list).
3. Your Own Network. You are now free to build your own model. Note that it is okay for you to borrow some insights from existing well-known networks; however, directly using those networks as-is is NOT allowed. In other words, you have to build your model from scratch, which also means that using any sort of pre-trained weights is NOT allowed. Please implement your model in ./models/my_model.py.
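For the Vanilla CNN above, a minimal sketch of the architecture is shown below; it assumes 3x32x32 CIFAR-10 inputs and a convolution padding of 0, so check ./models/cnn.py for the exact skeleton you must follow:

import torch.nn as nn

class VanillaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, kernel_size=7, stride=1, padding=0)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # 32x32 input -> 26x26 after the conv -> 13x13 after pooling
        self.fc = nn.Linear(32 * 13 * 13, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(start_dim=1))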
We provide configuration files for each of these three models. For the Two-Layer Network and Vanilla CNN, you need to train the model without modifying the configuration file. The script automatically saves the weights of the best model at the end of training. We will evaluate your implementation by loading your model weights and evaluating the model on the CIFAR-10 test data. You should expect accuracies of around 0.3 and 0.4 for the Two-Layer Network and Vanilla CNN, respectively.
3 Data Wrangling
So far we have worked with well-balanced datasets (where samples of each class are evenly distributed). In practice, however, datasets are often not balanced. In this section, you will explore the limitations of the standard training strategy on this type of dataset. As this is an exploration, it is up to you to design experiments or tests to validate that these methods are correct and effective.
You will work with an unbalanced version of CIFAR-10 in this section, and you should use the ResNet-32 model in ./models/resnet.py.
3.1 Class-Balanced Focal Loss
NOTE: the CVPR-19 paper uses sigmoid CB focal loss (section 4). Softmax CB focal loss is not described in the paper, but it is easy to derive from the papers it references. You are free to use sigmoid or softmax, but be careful with the implementation (there are differences between them).
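As a starting point, here is a sketch of the softmax variant using the effective-number weighting (1 − β) / (1 − β^{n_y}) from the CVPR-19 paper; the function signature, hyperparameter defaults, and normalization are our assumptions, so verify them against the paper:

import torch
import torch.nn.functional as F

def cb_focal_loss(logits, targets, samples_per_class, beta=0.9999, gamma=2.0):
    # Class-balanced weights from the effective number of samples per class.
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(samples_per_class)  # normalize

    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, targets, reduction='none')  # per-sample CE
    p_t = torch.exp(-ce)                               # prob. of the true class
    focal = (1.0 - p_t) ** gamma * ce                  # focal modulation
    return (weights[targets] * focal).mean()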
4 Deliverables
4.1 Code Submission
4.1.1 Part-1 ConvNet
Simply run bash collect_submission.sh (or collect_submission.bat if running MS Windows 10) in part1-convnet, and upload the zip file to Gradescope (part1-code). The zip file should contain: modules/, optimizer/, trainer.py, and train.py.
4.1.2 Part-2 PyTorch
Simply run bash collect_submission.sh (or collect_submission.bat if running MS Windows 10) in part2-pytorch, and upload the zip file to Gradescope (part2-code). The zip file should contain: configs/, losses/, checkpoints/, models/, and main.py. Make sure you include all configs used for the focal loss experiments in case we want to verify the results in your report.
4.2 Write-up
Please follow the report template in the starter folder to complete your writeup. You will need to explain your design of the network from Section 2.2 and explain your experimental results from the data wrangling section.
Note: Explanations should address why things work the way they do, grounded in proper deep learning theory. If you need more than one slide for a question, you are free to create new slides right after the given one.
You will need to export your report in PDF format and submit it under "Assignment 2 Writeup" in Gradescope. When submitting to Gradescope, make sure you select ALL corresponding slides for each question. Failing to do so will result in -1 point for each incorrectly tagged question, with more severe penalties on future assignments.