Deep Learning Homework 5 - PyTorch for Classification Solution

In the previous exercises, we have implemented the constituents of a deep learning framework. Now we will use the knowledge we gathered while doing so. We will implement a version of the common convolutional neural network architecture ResNet and the necessary workflow surrounding deep learning algorithms using the open source library PyTorch.
PyTorch allows you to build data-flow graphs for numerical computations. A graph consists of nodes, which represent mathematical operations (e.g. convolutions), and edges, which represent the data arrays (tensors).
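As a minimal illustration of this idea (not part of the assignment), the following snippet builds a small graph from a few tensor operations and backpropagates through it:

import torch

# Two tensors that require gradients become leaf nodes of the graph.
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)

# Each operation (matrix product, ReLU, sum) adds a node to the graph.
y = torch.relu(x @ w).sum()

# Backpropagation traverses the graph and fills in the gradients.
y.backward()
print(x.grad.shape, w.grad.shape)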
We will use our implementation to detect defects on solar cells. To this end, a dataset containing images of solar cells is provided along with the corresponding labels. We will complement the baseline implementation task with an open classification challenge. During the challenge period, the best results of each team will be listed in an online leader board.
1 Dataset
In this exercise, we focus on two different types of defects (see Fig. 1):
1. Cracks: Visible cracks in the cell material (see the left and right examples in Fig. 1).
2. Inactive regions: These regions are mainly caused by cracks. They occur when a part of the cell becomes disconnected and therefore no longer contributes to the power production. Hence, the cell performance is decreased.
Of course, the two defect types are related, since an inactive region is often caused by cracks. However, we treat them as independent and only classify a cell as cracked if the cracks are visible (see Fig. 1, right example).

Figure 1: Left: Crack on a polycrystalline module; Middle: Inactive region; Right: Cracks and inactive regions on a monocrystalline module
The images are provided in PNG format and are collected in a zip archive. The filenames and the corresponding labels are listed in the CSV file data.csv. Each row of this file contains the path to an image in our dataset and two numbers indicating whether the solar cell shows a “crack” and whether it can be considered “inactive”.
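For orientation only, loading the label file with pandas might look like the following; the separator and the exact column names are assumptions and should be checked against the provided data.csv:

import pandas as pd

# Separator and column names are assumptions; verify them against data.csv.
data = pd.read_csv("data.csv", sep=";")

# Each row holds the relative path to a PNG image plus the two binary labels
# for "crack" and "inactive".
print(data.columns.tolist())
print(len(data), "samples")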
2 Preparation
We provide a skeleton that you will complete during this assignment, as well as a few unit tests as an additional guideline. Furthermore, the skeleton contains an environment.yml file that lists the necessary software packages for a conda environment (see the conda documentation).
The computers in the CIP pools already have the required packages installed. However, before using PyTorch, the corresponding modules first have to be activated via the terminal, e.g. by issuing
module load python3
module load torch
3 Data and Evaluation
When you start working on a new machine learning problem, you have to deal with the format of the given data collection and implement a pipeline to load, preprocess and augment the data.
Task
• Implement a container for our dataset in the file data.py, i.e., a class ChallengeDataset which inherits from the torch.utils.data.Dataset class and provides some basic functionalities. Example code for this can be found here.
• The constructor of ChallengeDataset receives two parameters: First, a pandas.DataFrame “data” – a container structure that stores the information found in the file ”data.csv”. The second parameter is a flag “mode” of type String which can be either “val” or “train”.
Furthermore, create a member “self._transform” of type “tv.transforms.Compose”. It takes a list of torchvision.transforms as a parameter, which should include at least the following: ToPILImage(), ToTensor() and Normalize().
• Overwrite the method __len__(self). It returns the length of the currently loaded data.
• Overwrite the method __getitem__(self, index), which returns a sample as a tuple of the image and the corresponding label. Since our raw data is grayscale, you need to convert the image to RGB using the skimage.color.gray2rgb(*args) function. Before returning the sample, perform the transformations specified in the _transform member. The two return values need to be of type torch.Tensor.
Notes
• The Normalize transformation requires the mean and standard deviation of your data. Both are given in the skeleton.
• The Compose object makes it easy to perform a chain of transformations on the data. Among other things, this is useful for data augmentation. In the torchvision.transforms package, you can find different augmentation strategies. Consider creating two different transform chains depending on whether you are in the training or the validation dataset (see the sketch after these notes).
• You can test your implementation using the corresponding test in the PytorchChallengeTests.py file.
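A minimal sketch of ChallengeDataset under the description above is given below. The values of train_mean and train_std are placeholders (the skeleton provides the real ones), and the column order of the DataFrame (image path first, then the two labels) is an assumption:

import pandas as pd
import torch
import torchvision as tv
from skimage.color import gray2rgb
from skimage.io import imread
from torch.utils.data import Dataset

# Placeholder values; the skeleton provides the actual mean and standard deviation.
train_mean = [0.5, 0.5, 0.5]
train_std = [0.5, 0.5, 0.5]

class ChallengeDataset(Dataset):
    def __init__(self, data: pd.DataFrame, mode: str):
        self.data = data      # rows: image path, "crack" label, "inactive" label
        self.mode = mode      # either "train" or "val"
        self._transform = tv.transforms.Compose([
            tv.transforms.ToPILImage(),
            tv.transforms.ToTensor(),
            tv.transforms.Normalize(train_mean, train_std),
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        row = self.data.iloc[index]
        image = gray2rgb(imread(row.iloc[0]))   # grayscale PNG -> RGB array
        label = torch.tensor(row.iloc[1:].to_numpy(dtype="float32"))
        return self._transform(image), label

For mode == “train”, additional augmentation transforms (e.g. random flips) could be inserted directly after ToPILImage().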
4 Architecture
In this exercise, you will implement a variant of the ResNet architecture. The details of the architecture are specified in Tab. 1.
The main components of ResNet are blocks that are augmented by skip connections. We name these blocks ResBlock(in_channels, out_channels, stride). For our variant of ResNet, each ResBlock consists of a sequence of (Conv2D, BatchNorm, ReLU) that is repeated twice. The skip connection is added to the output of the BatchNorm of the second sequence, and then ReLU is applied.
Within the ResBlock, the number of input and output channels for Conv2D is given by the arguments in_channels and out_channels, respectively. The stride of the first Conv2D is given by stride. For the second convolution, no stride is used.
All Conv2D layers within a ResBlock have a filter size of 3.
Finally, the input of the ResBlock is added to its output. Therefore, the size and number of channels of the input need to be adapted. To this end, we recommend applying a 1 × 1 convolution to the input with stride and channels set accordingly. We also recommend applying a batch norm layer to this result before adding it to the output.
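Under this description, a ResBlock might be sketched as follows; the padding of 1 for the 3 × 3 convolutions is an assumption made so that only the stride changes the spatial size (a sketch, not the reference solution):

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        # First (Conv2D, BatchNorm, ReLU) with the given stride, filter size 3.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # Second (Conv2D, BatchNorm) without stride; ReLU follows the addition.
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution plus batch norm to adapt the skip connection.
        self.skip = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.skip(x)   # add the adapted input to the block output
        return self.relu(out)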
Task
Implement the ResNet architecture according to the specification in Tab. 1 in the file “model.py”. The model should inherit from the base class torch.nn.Module. Overwrite the necessary methods to allow training your model.
Model: ResNet
Layers:
Conv2D(3, 64, 7, 2)
BatchNorm()
ReLU()
MaxPool(3, 2)
ResBlock(64, 64, 1)
ResBlock(64, 128, 2)
ResBlock(128, 256, 2)
ResBlock(256, 512, 2)
GlobalAvgPool()
Flatten()
FC(512, 2)
Sigmoid()
Table 1: Architectural details for our ResNet. Convolutional layers are denoted by Conv2D(in_channels, out_channels, filter_size, stride). Max pooling is denoted by MaxPool(pool_size, stride). ResBlock(in_channels, out_channels, stride) denotes one block within a residual network. Fully connected layers are represented by FC(in_features, out_features).
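Combining Table 1 with the ResBlock sketched above, the model in “model.py” could be structured roughly as follows; the layer arguments are taken from Table 1, and no padding is added where the table does not specify it:

import torch.nn as nn

# ResBlock is the block sketched earlier in this section.

class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2),   # Conv2D(3, 64, 7, 2)
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),       # MaxPool(3, 2)
            ResBlock(64, 64, 1),
            ResBlock(64, 128, 2),
            ResBlock(128, 256, 2),
            ResBlock(256, 512, 2),
            nn.AdaptiveAvgPool2d((1, 1)),                # GlobalAvgPool()
            nn.Flatten(),
            nn.Linear(512, 2),                           # FC(512, 2)
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.layers(x)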
5 Training
The training process alternates between training for one epoch on the training dataset (training step) and assessing the performance on the validation dataset (validation step). After each epoch, a decision is made whether the training process should be continued. A common stopping criterion is called EarlyStopping, with the following behaviour: if the validation loss does not decrease for a specified number of epochs, the training process is stopped. This criterion will be used in our implementation and should be realised in trainer.py.
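The core of this criterion fits into a few lines; the class and attribute names below are illustrative and not taken from the skeleton:

class EarlyStopping:
    """Stop training once the validation loss has not decreased for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience   # True means "stop"

The Trainer would call step() with the validation loss after every epoch and stop as soon as it returns True.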
Task
Implement the class Trainer in “trainer.py” according to the comments.
6 Put it together
Now that we have all of the individual parts of our pipeline ready, we can assemble them in “train.py”. This means we first have to load our data, split it into a training and a validation set, create our model, and set up an optimizer and the loss function before we can start training.
There are many different classification tasks, and they require different loss functions. In a multi-class setting, the SoftMax loss is often used. This loss assumes that the class assignments are mutually exclusive. That assumption, however, does not hold for our problem, as the classes “crack” and “inactive” are not mutually exclusive.
Task
Make yourself familiar with loss functions that are suitable for a multi-label setting. Keep in mind that some loss functions internally perform a “sigmoid” operation; however, we already included a Sigmoid layer in our model. Implement the missing parts in “train.py” according to the comments.
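A rough outline of how the pieces could be assembled in “train.py” is sketched below. The CSV separator, the class name ResNet, the split ratio, the batch size, the learning rate, and the choice of torch.nn.BCELoss (plain binary cross-entropy, suitable because the model already ends in a Sigmoid) are assumptions to be checked against the skeleton’s comments:

import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

from data import ChallengeDataset   # files of this assignment
from model import ResNet

# Load the labels and create a training/validation split.
data = pd.read_csv("data.csv", sep=";")   # separator is an assumption
train_df, val_df = train_test_split(data, test_size=0.2)

train_loader = DataLoader(ChallengeDataset(train_df, "train"), batch_size=32, shuffle=True)
val_loader = DataLoader(ChallengeDataset(val_df, "val"), batch_size=32)

model = ResNet()
# The model ends in Sigmoid(), so plain binary cross-entropy fits the multi-label
# setting; a loss with a built-in sigmoid would apply the operation twice.
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

These objects are then handed over to the Trainer from “trainer.py”, which runs the alternating training and validation steps.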
7 Train and tune hyperparameters
At this point, we are able to start training the model. We might need to adjust the hyperparameters to obtain good results.
Task
Train the model and watch the evaluation measures on the validation split. Observe and document how changes in the hyperparameters affect the performance, and determine good hyperparameter settings by experiment. Most commonly, you start with the default values and vary, for example, the learning rate by factors of 10. Note that the learning rate and the batch size are highly dependent on each other. For the ratio between training and validation data, try to use a validation set that is as small as possible while still being a representative subset.
8 Save model and submit
Debug your implementation until every test in the suite passes. You can run all tests by providing no command-line parameter. To run the unit tests, you can either execute them with Python in the terminal or with the dedicated unittest environment of PyCharm. We recommend the latter, as it provides a better overview of all tests. For the automated computation of the bonus points achieved in this exercise, run the unit tests with the bonus flag in a terminal, i.e.
python3 PytorchChallengeTests.py Bonus
or set up a new “Python” configuration in PyCharm with Bonus as “Parameters”. Note that in some cases you need to set your src folder as “Working Directory”. More information about PyCharm configurations can be found here.
Make sure you don’t forget to upload your submission to StudOn. Use the dispatch tool, which checks all files for completeness and zips the files you need for the upload. Try python3 dispatch.py --help to check out the manual. For dispatching your folder run e.g.
python3 dispatch.py -i ./src -o submission.zip
Then upload the .zip file to StudOn, create an account for the online leaderboard, and submit your model.
With the skeleton, we provide the ability to save a trained model using the save_onnx(epoch) function. An example of how to use it is given in export_onnx.py. There, an onnx file is created automatically that can be submitted to the online evaluation server. For this to work, the inputs and outputs of your model must have fixed names, so please do not change the specified names in “trainer.py”.
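For background only, an ONNX export with fixed tensor names generally looks like the snippet below; the skeleton’s save_onnx(epoch) already performs this step with the names required by the evaluation server, so the names and the input resolution here are merely placeholders:

import torch
from model import ResNet   # the class name is an assumption

model = ResNet()
dummy_input = torch.randn(1, 3, 300, 300)   # assumed input resolution
torch.onnx.export(model, dummy_input, "checkpoint.onnx",
                  input_names=["input"], output_names=["output"])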
In order to submit a model, you need to be part of a team. Open the team-page to create a team or join an existing team. Note that only one group member should create the team. The other person can then join the team. If you are working alone, you need to create a one-person team.
Finally, you can start submitting jobs to the test server. Please note that we will do the final evaluation on a second test set that will not be available on the evaluation service during the challenge period. Therefore, you should avoid optimizing your parameters using the feedback from the evaluation server. You should use your validation split for that.