1 Introduction
Single image super resolution (SISR) is a classical image restoration problem which aims to recover a high-resolution (HR) image from the corresponding low-resolution (LR) image.
In this assignment, you will implement a super-resolution convolutional neural network (SRCNN) in PyTorch. We use “Learning a Deep Convolutional Network for Image Super-Resolution” [1] as the basic reference. The basic network architecture and implementation details are provided in the following sections. When you finish the assignment, submit the source code and the trained model.
2 Implementation details
2.1 SRCNN
SRCNN uses pairs of LR and HR images to learn the mapping between them. For this purpose, image databases containing LR and HR pairs are created and used as a training set. The learned mapping can be used to predict HR details in a new image.
The SRCNN consists of the following operations:
1. Preprocessing: Upscales the LR image to the desired HR size (using bicubic interpolation).
2. Feature extraction: Extracts a set of feature maps from the upscaled LR image.
3. Non-linear mapping: Maps the feature maps representing LR patches to HR patches.
4. Reconstruction: Produces the HR image from the HR patches.
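The preprocessing step can be sketched with PyTorch's built-in interpolation. This is only a minimal illustration, not the skeleton code's data pipeline: the function name bicubic_upscale, the [0, 1] value range, and the random stand-in batch are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def bicubic_upscale(lr: torch.Tensor, scale: int = 3) -> torch.Tensor:
    """Upscale a batch of LR images shaped (N, C, H, W) by `scale` (hypothetical helper)."""
    up = F.interpolate(lr, scale_factor=float(scale), mode="bicubic", align_corners=False)
    # Bicubic interpolation can overshoot the input range, so clamp back to [0, 1].
    return up.clamp(0.0, 1.0)


lr_batch = torch.rand(2, 3, 16, 16)    # stand-in LR batch with values in [0, 1] (assumption)
upscaled = bicubic_upscale(lr_batch)   # spatial size grows from 16 x 16 to 48 x 48
```

The upscaled tensor has the same spatial size as the HR target, which is what lets the later convolutional layers keep the resolution unchanged.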
Operations 2–4 above can each be cast as a convolutional layer in a CNN that accepts the upscaled image as input and outputs the HR image. This CNN consists of three convolutional layers:
Layer 1: Patch extraction
o 64 filters of size 3 x 9 x 9 (padding=4, stride=1)
o Activation function: ReLU
o Output: 64 feature maps
Layer 2: Non-linear mapping
o 32 filters of size 64 x 1 x 1 (padding=0, stride=1)
o Activation function: ReLU
o Output: 32 feature maps
Layer 3: Reconstruction
o 3 filters of size 32 x 5 x 5 (padding=2, stride=1)
o Activation function: Identity
o Output: HR image
The overall structure of SRCNN is shown in Figure 1.
Figure 1. Network architecture of SRCNN with upscaling factor 3
In this assignment, you will implement an SRCNN with upscaling factor 3 in PyTorch. Let Y_θ(x) denote this SRCNN model in the following sections.
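The three layers listed above map directly onto torch.nn.Conv2d. The following is a minimal sketch under the stated filter sizes and paddings, not the reference solution for model.py; the class layout and attribute names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SRCNN(nn.Module):
    """Three-layer SRCNN operating on an already upscaled RGB image (sketch)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # Layer 1: 64 filters of size 3 x 9 x 9, padding 4 keeps the spatial size.
        self.patch_extraction = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        # Layer 2: 32 filters of size 64 x 1 x 1.
        self.nonlinear_mapping = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        # Layer 3: 3 filters of size 32 x 5 x 5, identity activation.
        self.reconstruction = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.patch_extraction(x))
        x = self.relu(self.nonlinear_mapping(x))
        return self.reconstruction(x)


model = SRCNN()
x = torch.rand(1, 3, 48, 48)   # stand-in for an upscaled LR image
y = model(x)                   # same shape as x, since every layer preserves H and W
```

Because stride is 1 and each padding equals half the kernel size (rounded down), the output has the same height and width as the input.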
2.2 Model Training
A typical training framework for a neural network is as follows:
Define the neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process input through the network
Compute the loss between output and the ground truth (how far is the output from being correct)
Propagate gradients back into the network’s parameters
Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
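The update rule in the last step can be written out by hand on a toy problem. The sketch below fits a single scalar weight with plain gradient descent; the data, learning rate, and step count are arbitrary choices for illustration and have nothing to do with SRCNN itself.

```python
import torch

x = torch.linspace(-1.0, 1.0, 32)
y = 2.0 * x                                  # ground truth: the ideal weight is 2
weight = torch.zeros(1, requires_grad=True)
learning_rate = 0.5

losses = []
for _ in range(50):
    output = weight * x                      # process input through the "network"
    loss = ((output - y) ** 2).mean()        # how far the output is from being correct
    loss.backward()                          # propagate gradients back to `weight`
    with torch.no_grad():
        weight -= learning_rate * weight.grad   # weight = weight - learning_rate * gradient
    weight.grad.zero_()                      # clear the gradient buffer for the next step
    losses.append(loss.item())
```

In practice an optimizer object from torch.optim performs the update and the gradient reset, so this loop is rarely written manually.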
The SRCNN is a simple feed-forward neural network. It takes the upscaled LR image, feeds it through several layers one after the other, and finally produces the output. The overall training procedure of this network is the same as the above framework. Specifically, with PyTorch, the training procedure for SRCNN can be described by the following pseudocode:
procedure TrainOneEpoch(model Y_θ, optimizer, trainset)
    for each (LR_i, HR_i) pair in trainset do
        zero the gradient buffers of optimizer
        compute output_i = Y_θ(LR_i)
        compute the loss ℓ = loss_function(output_i, HR_i)
        back-propagate the gradients from ℓ to the parameters θ of model Y_θ
        use optimizer to update the parameters θ
        record the loss for training statistics [optional]
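One possible PyTorch rendering of the pseudocode is sketched below. It assumes the data loader yields (upscaled LR, HR) tensor pairs, which may not match the skeleton code; the stand-in model, synthetic data, and the name train_one_epoch are likewise illustrative.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, optimizer, loader, loss_function, device="cpu"):
    """Run one pass over `loader` and return the mean per-sample loss."""
    model.train()
    total, count = 0.0, 0
    for lr_up, hr in loader:
        lr_up, hr = lr_up.to(device), hr.to(device)
        optimizer.zero_grad()               # zero the gradient buffers
        output = model(lr_up)               # output_i = Y_theta(LR_i)
        loss = loss_function(output, hr)    # loss between output and ground truth
        loss.backward()                     # back-propagate gradients to the parameters
        optimizer.step()                    # update the parameters
        total += loss.item() * lr_up.size(0)
        count += lr_up.size(0)
    return total / count


# Synthetic stand-ins so the sketch runs without the real dataset (assumption).
torch.manual_seed(0)
hr = torch.rand(16, 3, 24, 24)
lr_up = (hr + 0.05 * torch.randn_like(hr)).clamp(0.0, 1.0)
loader = DataLoader(TensorDataset(lr_up, hr), batch_size=8, shuffle=True)
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for the SRCNN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
history = [train_one_epoch(model, optimizer, loader, nn.MSELoss()) for _ in range(10)]
```

Recording the returned value per epoch, as in history, covers the optional statistics step.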
Note that the actual code might differ from the pseudocode. Please check the tutorial notes and the PyTorch documentation for the related APIs. We use mean squared error (MSE) as the loss_function:
\[ L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left\| Y_\theta(LR_i) - HR_i \right\|^2 \]
where n is the number of training samples. This loss function is available in the PyTorch API. Using MSE as the loss function favors a high peak signal-to-noise ratio (PSNR). The PSNR is a widely used metric for quantitatively evaluating image restoration quality and is at least partially related to perceptual quality. We will also use PSNR (the higher the better) to measure the performance of the trained model. The PSNR-related snippets are provided in the skeleton code.
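The link between MSE and PSNR can be made concrete with a small example. This sketch assumes pixel values in [0, 1] and uses a hypothetical helper psnr_from_mse; it is not the PSNR snippet shipped with the skeleton code. Note also that nn.MSELoss with its default reduction averages over every element, so it differs from the per-sample squared norm above only by a constant factor.

```python
import math

import torch
import torch.nn as nn


def psnr_from_mse(mse: float, max_value: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error (hypothetical helper)."""
    return 10.0 * math.log10(max_value ** 2 / mse)


criterion = nn.MSELoss()                  # default reduction="mean" over all elements
hr = torch.full((1, 3, 8, 8), 0.5)        # stand-in ground truth
output = hr + 0.1                         # stand-in prediction, off by 0.1 everywhere
mse = criterion(output, hr).item()        # about 0.01
psnr = psnr_from_mse(mse)                 # about 20 dB
```

Since PSNR is a decreasing function of MSE, lowering the training loss raises the PSNR computed on the same images.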
In this assignment, we use the 91-Image dataset as the training dataset and the Set-5 dataset as the testing dataset. The data-related code is provided in the skeleton code.
Other hyperparameters related to training are listed below:
Training epochs=100; one epoch means one complete pass over the whole dataset
Optimizer: Adam
Learning rate=0.0001
Training batch size=128; the number of inputs fed into the network at once
Note that the above hyperparameters might not lead to reasonable performance. You are encouraged to find other possible hyperparameters to achieve better performance.
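The listed hyperparameters translate into an optimizer and a data loader roughly as follows. This is a sketch with a placeholder model and random patches standing in for the real training set; the constant names and patch size are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

NUM_EPOCHS = 100        # number of passes over the whole dataset
LEARNING_RATE = 1e-4    # 0.0001
BATCH_SIZE = 128        # inputs fed into the network at once

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for the SRCNN model
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# 300 random (LR, HR) patch pairs as a stand-in for the training set (assumption).
patches = TensorDataset(torch.rand(300, 3, 16, 16), torch.rand(300, 3, 16, 16))
loader = DataLoader(patches, batch_size=BATCH_SIZE, shuffle=True)
batch_sizes = [lr.size(0) for lr, _ in loader]      # the last batch may be smaller
```

When experimenting with other settings, changing one hyperparameter at a time makes it easier to attribute a PSNR change to its cause.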
2.3 Skeleton code usage
2.3.1 Project structure
The skeleton code consists of 6 files:
train.py: a CLI program, which contains the procedure of model training [to be completed]
model.py: SRCNN model [to be completed]
py: dataset related codes
py: helper functions
super_resolve.py: a CLI program, which can super resolve images given a well-trained model
info.py: submission info [to be completed]
In this assignment, you are required to implement an SRCNN in PyTorch 1.2+. To make the skeleton code functional, you need to complete three files: train.py, model.py, and info.py.
2.3.2 train.py
The usage of train.py can be described as follows:
# train the SRCNN model using GPU, set learning rate=0.0005, batch size=256,
# make the program train 100 epochs and save a checkpoint every 10 epochs
python train.py train --cuda --lr=0.0005 --batch-size=256 --num-epoch=100 --save-freq=10
# train the SRCNN model using CPU, set learning rate=0.001, batch size=128,
# make the program train 20 epochs and save a checkpoint every 2 epochs
python train.py train --lr=0.001 --batch-size=128 --num-epoch=20 --save-freq=2
# resume training with GPU from "checkpoint.x" with saved hyperparameters
python train.py resume checkpoint.x --cuda
# resume training from "checkpoint.x" and override some of the saved hyperparameters
python train.py resume checkpoint.x --batch-size=16 --num-epoch=200
# inspect "checkpoint.x"
python train.py inspect checkpoint.x
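For orientation, a command-line interface with these subcommands could be declared with argparse along the following lines. This is a hypothetical sketch, not the parser in the skeleton's train.py: the flag names are taken from the commands above, while the default values and the build_parser function are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the train / resume / inspect subcommands."""
    parser = argparse.ArgumentParser(prog="train.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train")
    train.add_argument("--cuda", action="store_true")
    train.add_argument("--lr", type=float, default=1e-4)          # assumed default
    train.add_argument("--batch-size", type=int, default=128)     # assumed default
    train.add_argument("--num-epoch", type=int, default=100)      # assumed default
    train.add_argument("--save-freq", type=int, default=10)       # assumed default

    resume = sub.add_parser("resume")
    resume.add_argument("checkpoint")
    resume.add_argument("--cuda", action="store_true")
    # None means "keep the value saved in the checkpoint".
    resume.add_argument("--batch-size", type=int, default=None)
    resume.add_argument("--num-epoch", type=int, default=None)

    inspect_cmd = sub.add_parser("inspect")
    inspect_cmd.add_argument("checkpoint")
    return parser


parser = build_parser()
train_args = parser.parse_args(["train", "--cuda", "--lr=0.0005", "--batch-size=256"])
resume_args = parser.parse_args(["resume", "checkpoint.x", "--batch-size=16"])
```

argparse converts dashes in option names to underscores, so --batch-size is read back as train_args.batch_size.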
Note that a checkpoint consists of the parameters of a trained model, the state of an optimizer, and the arguments (or hyperparameters) used in the current training procedure. Thus, you can use a checkpoint to resume training.
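A checkpoint with these three parts can be written and restored with torch.save and torch.load. The sketch below is an assumption about the layout: the dictionary keys "model", "optimizer", and "args" are hypothetical, and an in-memory buffer replaces the checkpoint file so the example has no side effects.

```python
import io

import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for the SRCNN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
args = {"lr": 1e-4, "batch_size": 128, "num_epoch": 100}

# Hypothetical checkpoint layout: parameters, optimizer state, and arguments.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "args": args,
}
buffer = io.BytesIO()            # stands in for a file such as "checkpoint.x"
torch.save(checkpoint, buffer)

# Resuming: rebuild the objects, then load the saved states into them.
buffer.seek(0)
loaded = torch.load(buffer, map_location="cpu")
restored = nn.Conv2d(3, 3, kernel_size=3, padding=1)
restored.load_state_dict(loaded["model"])
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=loaded["args"]["lr"])
restored_optimizer.load_state_dict(loaded["optimizer"])
```

Saving state dictionaries rather than whole objects keeps the checkpoint loadable even if the surrounding training script changes.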
2.3.3 super_resolve.py
The usage of super_resolve.py can be described as follows:
# use the model stored in "checkpoint.x" to super resolve "lr.bmp"
python super_resolve.py --checkpoint checkpoint.x lr.bmp
You may use this program to perform qualitative comparison using the images inside the image_examples.zip file. This file contains LR images, upscaled images with bicubic interpolation, and ground truth (GT) HR images.
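The inference path behind such a program might look like the sketch below: bicubic upscaling followed by a forward pass with gradients disabled. It is not the actual super_resolve.py; the function name, the [0, 1] value range, and the untrained placeholder model are assumptions, and image loading and saving are left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def super_resolve(model: nn.Module, lr: torch.Tensor, scale: int = 3) -> torch.Tensor:
    """Upscale `lr` (N, C, H, W) with bicubic interpolation, then refine it with `model`."""
    model.eval()   # inference mode for layers that behave differently in training
    upscaled = F.interpolate(lr, scale_factor=float(scale), mode="bicubic", align_corners=False)
    # Clamp so the result is a valid image in [0, 1] before saving or display.
    return model(upscaled).clamp(0.0, 1.0)


model = nn.Conv2d(3, 3, kernel_size=5, padding=2)   # untrained placeholder for a trained SRCNN
lr_image = torch.rand(1, 3, 20, 20)                 # stand-in for a loaded LR image
sr_image = super_resolve(model, lr_image)           # shape (1, 3, 60, 60)
```

For a qualitative comparison, the same LR input can be shown next to its plain bicubic upscaling and the GT image.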