This homework contains two questions. The first question asks you to follow PyTorch tutorials to learn about PyTorch. In the second question, you use a CNN for few-shot visual counting. The maximum score for this homework is 100 + 15 bonus points.
1 Learning PyTorch (40 points)
In this question, you will follow some tutorials, in order to learn PyTorch. Specifically, you will need to follow https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html and https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html and answer the following questions.
1.1 Tensors and Operations (8 points)
Follow the tutorial and make sure you understand each step: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html
(a) After you follow this tutorial, construct a randomly initialized tensor x of size (3, 2) and dtype float. Report it in your answer.
(b) Construct another tensor y with the same size as x, but filled with ones. Report it in your answer. Also print its size using the PyTorch method size().
(c) Add the two tensors x and y, (1) saving the result in another tensor out and (2) saving the result in y (in-place addition). Report your result for both cases.
(d) Construct a randomly initialized NumPy array x of size (3, 2) and dtype float. Convert it to a tensor using the PyTorch function from_numpy(). Convert it back to a NumPy array using the method .numpy().
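The calls named in this question can be sketched as follows. This is a minimal illustration, not a model answer: the shape (2, 4) and all variable names other than x, y, and out are arbitrary choices made here.

```python
import numpy as np
import torch

# Shape (2, 4) is an arbitrary choice for illustration.
x = torch.rand(2, 4, dtype=torch.float)  # uniform samples in [0, 1)
y = torch.ones_like(x)                   # same size and dtype as x, filled with ones
print(y.size())                          # torch.Size([2, 4])

out = torch.add(x, y)  # out-of-place: x and y are left unchanged
y.add_(x)              # in-place: the trailing underscore means y is modified

a = np.random.rand(2, 4)   # float64 NumPy array
t = torch.from_numpy(a)    # tensor that shares memory with a
b = t.numpy()              # back to NumPy, still sharing the same memory
```

Because from_numpy() and .numpy() share memory on the CPU, modifying a in place also changes t and b.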
1.2 Autograd (8 points)
Follow the tutorial and make sure you understand each step: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
(a) After you follow this tutorial, construct a randomly initialized tensor x of size (3, 2) and set requires_grad=True to track computation with it. Report it in your answer.
(b) Construct another tensor y by multiplying x by 10 and then adding 0.1. Then, take the maximum and save it to the final tensor out. Report all the intermediate results. Explain the grad_fn attribute of each intermediate result.
(c) Do backpropagation and compute the gradient of out with respect to x. Report it in your answer.
(d) Using the context manager "with torch.no_grad():", run step (b) again. Explain the difference.
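The autograd mechanics referred to above can be sketched on a different computation. The operations below (scaling by 3 and summing) and the names z and y_ng are chosen here purely for illustration.

```python
import torch

x = torch.rand(2, 2, requires_grad=True)  # leaf tensor created by the user
print(x.grad_fn)                           # None: leaves have no grad_fn

y = x * 3      # result of an operation: y.grad_fn records the multiplication
z = y.sum()    # scalar result: z.grad_fn records the sum

z.backward()   # backpropagate; fills x.grad with dz/dx
print(x.grad)  # every entry is 3, since z = sum(3 * x)

with torch.no_grad():
    y_ng = x * 3  # same arithmetic, but no graph is recorded

print(y_ng.requires_grad, y_ng.grad_fn)  # False None
```

Inside the no_grad block the values are identical, but the result carries no history, so backward() cannot be called through it.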
1.3 Neural Networks (10 points)
Follow the tutorial and make sure you understand each step: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
(a) After you follow this tutorial, define your own network. You are free to use whatever architecture you like, but it should be different from the one in the tutorial. In addition, your network should have at least one convolutional and one linear layer. Print and report your network in your answers. How many parameters does your network have?
(b) Construct a randomly initialized input of size 32x32. Run the forward function of your network and print the output. Report the output and its size in your answers.
Figure 1: Few-shot counting task. Given an image from a novel class and a few exemplar objects from the same image delineated by bounding boxes, the objective is to count the total number of objects of the novel class in the image.
(c) Construct a random target output of the same size as your network's output. Use MSE for the loss and SGD for the optimizer and perform backpropagation. Select one intermediate layer of your network and report its bias gradients after backpropagation.
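The workflow in this question (define a module, count parameters, run a forward pass, backpropagate an MSE loss with SGD) can be sketched as below. TinyNet is a hypothetical, deliberately minimal network invented for this illustration; its layer sizes, the single input channel, and the learning rate are assumptions, not requirements of the assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class TinyNet(nn.Module):  # hypothetical example network
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)  # 1x32x32 -> 4x30x30
        self.fc = nn.Linear(4 * 15 * 15, 10)        # after 2x2 pooling: 4x15x15

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        return self.fc(torch.flatten(x, 1))


net = TinyNet()
print(net)

# conv: 4*1*3*3 + 4 = 40; fc: 900*10 + 10 = 9010; total 9050
n_params = sum(p.numel() for p in net.parameters())

inp = torch.randn(1, 1, 32, 32)   # batch of one single-channel 32x32 input
out = net(inp)
target = torch.randn_like(out)    # random target with the output's shape

optimizer = optim.SGD(net.parameters(), lr=0.01)
optimizer.zero_grad()             # clear any stale gradients
loss = nn.MSELoss()(out, target)
loss.backward()                   # populates .grad on every parameter
print(net.conv.bias.grad)         # bias gradients of the conv layer
optimizer.step()                  # one SGD update
```

Gradients are read before optimizer.step() here only for clarity; step() updates the parameters but leaves the .grad values in place.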
1.4 Training (6 points)
Follow the tutorial and make sure you understand each step: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
Similarly to the tutorial, train a network on the CIFAR10 dataset. Use the network architecture you defined for Question 1.3. Train it for 2-3 epochs and follow the same steps as in the tutorial. You do not need to train it on a GPU. Report the accuracy on the whole test set, as well as the accuracy per class.
1.5 Transfer Learning (8 points)
Follow the tutorial and make sure you understand each step: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
(a) Similarly to this tutorial, fine-tune the pre-trained ResNet-18 on a new dataset. Use the CIFAR10 dataset, similarly to Question 1.4. After loading the pre-trained model, change only the last linear layer, in order to output the correct number of classes (Be careful to set the number of classes of CIFAR10). Train it for only 2-3 epochs. You do not need to train it on a GPU. Report the accuracy on the whole test set, as well as the accuracy per class.
(b) Run the same experiment as in (a), but now use the pre-trained model as a feature extractor. Freeze all of the network except the final layer. Again, report the accuracy on the whole test set, as well as the accuracy per class.
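The difference between fine-tuning and feature extraction comes down to which parameters have requires_grad set. The sketch below shows the freezing pattern on StandIn, a hypothetical small module used here in place of a pre-trained ResNet-18 so that the example runs without downloading weights; the attribute name fc mirrors the final layer that the tutorial replaces.

```python
import torch
import torch.nn as nn


class StandIn(nn.Module):  # hypothetical stand-in for a pre-trained backbone
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, 1000)  # original classifier head

    def forward(self, x):
        return self.fc(torch.flatten(self.features(x), 1))


model = StandIn()

# Freeze every existing parameter.
for p in model.parameters():
    p.requires_grad = False

# A newly constructed layer has requires_grad=True by default,
# so replacing the head leaves it as the only trainable part.
model.fc = nn.Linear(model.fc.in_features, 10)  # CIFAR10 has 10 classes

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```

For fine-tuning as in (a), the freezing loop is skipped and all of model.parameters() are passed to the optimizer.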
2 Few Shot Counting (60 points + 15 bonus)
For this question, we will use a CNN for counting objects in images. Given an image from a novel class and a few exemplar objects from the same image delineated by bounding boxes, the objective of few-shot counting is to obtain the total number of objects of the novel class in the image, as illustrated in Figure 1. We are providing you with FamNet [1], which is a CNN trained on the training set of the FSC147 dataset for the few-shot visual counting task. There are two ways to use FamNet: 1) we can use the pretrained FamNet to obtain the count for any test image; 2) we can adapt FamNet to any test image using a few bounding boxes from the test image. This adaptation is called test time adaptation. The paper discusses two loss functions for doing the test time adaptation. Read the paper to get a better understanding.
We are also providing you with the Validation and Test sets from the FSC147 dataset. The Val and Test sets consist of 1286 and 1190 images respectively. We have divided each of the Val and Test sets into two subsets: PartA and PartB. Val-PartA consists of 100 images from the Val set of the FSC147 dataset, and Val-PartB consists of the remaining 1186 images. Similarly, Test-PartA and Test-PartB consist of 100 and 1090 images respectively from the FSC147 Test set.
Additionally, and unlike the paper [1], we are providing you with negative regions for the Val and Test images, which are manually specified regions without any object of interest. The negative regions are provided as a binary map where a value of 1 is assigned to pixels where there isn't any object of interest. At locations with value 0, there may or may not be an object of interest. Binary maps for negative areas can be found here: https://bit.ly/3gshsvP
2.1 Reading and comprehension (5 points)
Read the paper [1], and summarize it in one paragraph. Reading the paper will give you a clear understanding of the few-shot counting task and how to adapt FamNet to any test image. The paper can be found here: https://www3.cs.stonybrook.edu/~minhhoai/papers/fewshot_counting_CVPR21.pdf
2.2 Result analysis (10 points)
Set up the GitHub repo https://github.com/cvlab-stonybrook/LearningToCountEverything. Follow the instructions to install it and download the Val and Test sets. Run the quick demo to see some example results. For this question, you might need a GPU. You can use Google Colab.
Run the evaluation code test.py without test time adaptation on the Val set, and report the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics. Also include a scatter plot for the Val set where the X-axis shows the ground truth count and the Y-axis shows the predicted count. In the pdf report, include the 5 images, with their corresponding predicted density maps (output of FamNet), that have the highest over-count error (predicted-count - gt-count). Similarly, include the 5 images and corresponding predicted density maps with the highest under-count error (gt-count - predicted-count).
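The two metrics and the over-count and under-count rankings can be sketched as follows. The function name mae_rmse and the toy counts are made up here for illustration and are not taken from the repo or the dataset.

```python
import numpy as np


def mae_rmse(gt, pred):
    """Return (MAE, RMSE) between ground-truth and predicted counts."""
    err = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))


# Toy counts, invented for this example.
gt = [10, 20, 30, 40]
pred = [12, 18, 33, 40]

mae, rmse = mae_rmse(gt, pred)
# Errors are 2, -2, 3, 0, so MAE = 7/4 = 1.75 and RMSE = sqrt(17/4)

err = np.array(pred) - np.array(gt)   # positive = over-count
over = np.argsort(-err)[:2]           # indices of the largest over-counts: [2, 0]
under = np.argsort(err)[:2]           # indices of the largest under-counts: [1, 3]
```

For the report, the slice [:2] would become [:5], and the resulting indices select which images and density maps to show.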
2.3 Adaptation (35 points)
Test time adaptation of FamNet using Negative Strokes: In the paper [1], we adapt FamNet for any test image using a few exemplar bounding boxes from the test image. Your task is to use the negative stroke annotation for a test image and come up with a new loss function to adapt FamNet for any test image. You'll have to tune two hyper-parameters for the test time adaptation: the number of gradient descent steps for the adaptation, and the scalar weight to be multiplied with the loss. Use the Val-PartA subset to find the best hyper-parameters. Report the MAE and RMSE metrics on Val-PartA. Feel free to use Val-PartB for any finetuning, but it is not required. Also, present a scatter plot on Val-PartA which shows the ground truth count on the X-axis and the predicted count on the Y-axis. Include plots for two cases: 1) with test time adaptation using your designed loss function, and 2) without any test time adaptation. (Hint for designing the negative stroke loss: since a value of 1 in the negative stroke map signifies pixels where there isn't any object of interest, the network should predict 0 density values at such locations. Design a loss to enforce this behavior.)
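One possible reading of the hint is a squared penalty on the predicted density inside the negative region. The sketch below is only that, one candidate under stated assumptions, and not the loss expected by the assignment or used in the paper. The function negative_stroke_loss, the stand-in regressor (a single convolution), the random features, the mask layout, and the learning rate are all hypothetical; the actual FamNet modules and the repo's adaptation code are not reproduced here.

```python
import torch
import torch.nn as nn


def negative_stroke_loss(density, neg_mask, weight=1.0):
    """Mean squared density over pixels where neg_mask == 1, scaled by weight."""
    masked = density * neg_mask
    return weight * (masked ** 2).sum() / neg_mask.sum().clamp(min=1)


torch.manual_seed(0)
features = torch.rand(1, 4, 16, 16)        # stand-in for frozen backbone features
regressor = nn.Conv2d(4, 1, 3, padding=1)  # hypothetical stand-in for the density head

neg_mask = torch.zeros(1, 1, 16, 16)
neg_mask[..., :8, :] = 1.0                 # pretend the top half was marked negative

num_steps, loss_weight = 20, 1.0           # the two hyper-parameters to tune
optimizer = torch.optim.SGD(regressor.parameters(), lr=1e-3)

history = []
for _ in range(num_steps):
    optimizer.zero_grad()
    loss = negative_stroke_loss(regressor(features), neg_mask, loss_weight)
    loss.backward()
    optimizer.step()
    history.append(loss.item())

with torch.no_grad():
    count = regressor(features).sum().item()  # count = sum of the density map
```

Pixels where the mask is 0 contribute nothing to this loss, which matches the statement that such locations may or may not contain objects.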
2.4 Bonus question (maximum 15 points)
Kaggle Submission for Test-PartB: Submit your results to Kaggle on the Test-PartB subset. Your csv file should contain your predictions for Test-PartB, in addition to your predictions for Test-PartA from Question 2.5. The top three people on the leaderboard will receive 15, 10, and 5 bonus points.