Machine Learning HW13: Food Classification

Task Description
Network Compression: use a small model to approximate the prediction/accuracy of a large model.
In this task, you need to train a very small model to complete HW3, that is, to perform classification on the food-11 dataset.
Task - Food Classification AGAIN!

Same as HW3, the images are collected from the food-11 dataset and classified into 11 classes.
The dataset here is slightly modified:
Training set: 280 * 11 labeled images + 6786 unlabeled images
Validation set: 60 * 11 labeled images
Testing set: 3347 images
DO NOT utilize the original dataset or labels.
○ This is cheating.

Intro
There are many types of Network/Model Compression; here we introduce two of them:
○ Knowledge Distillation: let the small model learn better by observing the behavior (predictions) of the large model during training. (Literally: let the small model extract the knowledge out of the big model.)

○ Design Architecture: Use fewer parameters to represent the original layer.

(E.g. Normal Convolution → Depthwise & Pointwise Convolution)

○ If you are interested in Network Pruning, you can view the Colab tutorial in ML-Spring2020-HW7.

Intro - Knowledge Distillation
When training the small model, add some information from the large model (such as the probability distribution of its predictions) to help the small model learn better.
We have provided a well-trained network to help you do knowledge distillation (Acc ≈ 0.855).
Please note that this provided model is the only pre-trained model you may use in this homework.
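As a rough sketch of what this looks like in code, the loss below (the function name, temperature T, and weight alpha are illustrative choices, not necessarily those used in the sample code) blends a temperature-softened KL divergence against the teacher's prediction with the ordinary cross-entropy on the hard labels:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=20.0, alpha=0.5):
    # Soft target: KL divergence between temperature-softened student and
    # teacher distributions, scaled by T^2 (as in Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard target: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

During training, you would forward the same batch through the frozen teacher inside torch.no_grad() to obtain teacher_logits, and backpropagate only through the student.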
Intro - Design Architecture
Depthwise & Pointwise Convolution Layer (Proposed in MobileNet)
○ You can think of the original convolution as a Dense/Linear layer in which each weight is a filter, and the original multiplication becomes a convolution operation (input * weight  →  input * filter).

○ Depthwise: let each channel pass through its own filter first; then (in the pointwise step) every pixel passes through a shared-weight Dense/Linear layer.

○ Pointwise is a 1x1 Conv.

It is strongly recommended that you use similar techniques to design your model.
(With N input channels, M output channels, and k×k kernels, the parameter count drops from N·M·k·k for a normal convolution to N·k·k + N·M for depthwise + pointwise.)
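A minimal PyTorch sketch of such a replacement block (the helper name dwpw_conv and the BatchNorm/ReLU choices are illustrative assumptions, not the required design):

```python
import torch.nn as nn

def dwpw_conv(in_ch, out_ch, kernel_size=3, padding=1):
    """Depthwise + pointwise block that can replace a normal convolution."""
    return nn.Sequential(
        # Depthwise: groups=in_ch gives each input channel its own k x k filter
        # (in_ch * k * k weights instead of in_ch * out_ch * k * k).
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding, groups=in_ch, bias=False),
        # Pointwise: 1x1 convolution mixing the channels (in_ch * out_ch weights).
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

For example, with in_ch = 64, out_ch = 128 and a 3×3 kernel, a normal convolution needs 64·128·9 ≈ 73.7k weights, while the block above needs only 64·9 + 64·128 ≈ 8.8k (plus a few BatchNorm parameters).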

Regulations
You should NOT plagiarize; if you use any other resource, you should cite it in the reference. (*)
Do NOT share codes or prediction files with any living creatures.
Do NOT use any approaches to submit your results more than 5 times a day.
Do NOT search or use additional data.
Do NOT search the label or dataset on the Internet.
Do NOT use pre-trained models on any image datasets.
Your final grade will be multiplied by 0.9 if you violate any of the above rules.
Lee & TAs reserve the right to change the rules & grades.
Special Regulations - 1
Make sure that the total number of parameters of your model is less than or equal to 100,000. Please make sure to follow this rule before submitting to Kaggle / NTU COOL, to prevent anyone from polluting the leaderboard.
○ If you don't follow this rule, you'll get 0 points on this assignment.

DO NOT USE TEST DATA FOR ANY PURPOSE OTHER THAN INFERENCE. If you use the teacher network to predict pseudo-labels for the test data, you could simply overfit the student network to those pseudo-labels without using the training/unlabeled data. Your Kaggle accuracy would then be as high as the teacher network's, but in fact you would just be overfitting the test data, and your true testing accuracy would be very low.
○ This contradicts the purpose of this assignment (network compression); therefore, you should not misuse the test data.

○ If you have any concerns, you can email us.

Special Regulations - 2
We strongly recommend that you use the torchsummary package to measure the number of parameters of your model. Note that non-trainable parameters should also be considered.
Ensemble techniques (or any other multi-model techniques) are allowed, but you need to sum the parameter counts of all the models and make sure the total does not exceed 100,000.
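One way to check this is sketched below; the 3×128×128 input size is an assumed preprocessing choice (adjust it to your pipeline), and `model` is assumed to be your student network already on the GPU, since torchsummary defaults to CUDA:

```python
from torchsummary import summary

# Per-layer report; torchsummary lists trainable and non-trainable parameters.
summary(model, input_size=(3, 128, 128))

# Manual check over every parameter tensor, trainable or not; for an ensemble,
# sum this over all member models.
total = sum(p.numel() for p in model.parameters())
assert total <= 100_000, f"{total} parameters exceeds the 100,000 limit"
```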
Grades
