● Unsupervised anomaly detection in computer vision: whether a machine learning model can tell if a testing image is of the same class (distribution) as the training images
[Figure: the model is trained on images from one distribution; at testing time, images from that seen distribution are normal, while images from unseen distributions are anomalies]
Data
● Training set: about 140k human face images (size 64×64×3)
● Testing set: another 10k images from the same distribution as the training set (normal data, class label 0), along with 10k human face images from other distributions (anomalies, class label 1)
● Notice: additional training data and pretrained models are prohibited
● To extract the data: tar zxvf data-bin.tar.gz (a loading sketch follows below)
● data-bin/
○ trainingset.npy
○ testingset.npy
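As a minimal loading sketch (the shapes in the comments are assumptions based on the description above; verify them after extraction):

```python
import numpy as np

# Assumes the archive was extracted in the current working directory
train = np.load('data-bin/trainingset.npy')
test = np.load('data-bin/testingset.npy')

print(train.shape)  # expected roughly (140000, 64, 64, 3)
print(test.shape)   # expected roughly (20000, 64, 64, 3)
```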
Method - Autoencoder
● When to stop training? Training should stop when the MSE loss converges
● During inference, we compute the reconstruction error between the input image and the reconstructed one
● The reconstruction error is referred to as the abnormality (anomaly score)
● The abnormality of an image serves as a measure of how likely its distribution was unseen during training
● Therefore, we use the abnormality as our predicted value (see the sketch below)
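As an illustrative sketch of this scoring step (the model and preprocessing here are placeholders, not the assignment's sample code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def anomaly_scores(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return one anomaly score per image: the MSE between input and reconstruction.

    `images` is assumed to be a float tensor of shape (N, 3, 64, 64),
    normalized the same way as during training.
    """
    model.eval()
    reconstruction = model(images)
    # Mean squared error per image (averaged over channels and pixels)
    return ((images - reconstruction) ** 2).mean(dim=(1, 2, 3))
```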
Accuracy score
● Usually, we compute accuracy scores for classification tasks
● Here, our model functions as a sensor (or a detector) rather than a classifier
● Thus, we need a threshold on the abnormality (usually the reconstruction error) to determine whether a piece of data is an anomaly
● If we used the accuracy score for this assignment, you would have to try every possible threshold for a single model to get a satisfactory score
● However, what we want is a sensor that achieves the highest accuracy on average over every possible threshold (see the sketch below)
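To make the threshold dependence concrete, here is a small sketch (with made-up scores and labels) showing how the accuracy of the same sensor changes as the threshold moves:

```python
import numpy as np

scores = np.array([0.1, 0.2, 0.7, 0.8, 0.9])  # made-up anomaly scores
labels = np.array([0, 0, 1, 0, 1])            # made-up ground truth

for threshold in [0.15, 0.5, 0.85]:
    predictions = (scores > threshold).astype(int)
    accuracy = (predictions == labels).mean()
    print(f'threshold={threshold}: accuracy={accuracy:.2f}')
# Accuracy depends on the chosen threshold, so a single accuracy
# number does not characterize the sensor itself.
```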
Which sensor is better?
Metric - ROC_AUC score
● A good sensor should
○ Give high anomaly scores to the anomalies and low scores to the normal data
○ Exhibit a large gap between the scores of the two groups
● An ROC curve is suitable for our task
● Each point on the ROC curve stands for the true positive rate and false positive rate at a certain threshold
● The Area Under the ROC Curve (AUC) is calculated to measure the general ability of the model
ROC_AUC score
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Kaggle
Metric: ROC_AUC score

Sample output:

ID   Anomaly score   Label
0    11383           0
1    256676          1
2    862365          1
3    152435          0
4    848171          0
Sort by score:

ID   Anomaly score   Label
2    862365          1
4    848171          0
1    256676          1
3    152435          0
0    11383           0
https://towardsdatascience.com/how-to-calculate-use-the-auc-score-1fc85c9a8430
ID   Anomaly score   Label   fp (before normalization)   tp (before normalization)
2    862365          1       0                           1
4    848171          0       1                           1
1    256676          1       1                           2
3    152435          0       2                           2
0    11383           0       3                           2
ID   Anomaly score   Label   fp         tp
0    11383           0       0          0.5
3    152435          0       0.333333   0.5
1    256676          1       0.333333   1
4    848171          0       0.666667   1
2    862365          1       1          1
Area Under Curve: 0.5 × 1/3 + 2/3 = 5/6 ≈ 0.8333
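The same value can be checked against scikit-learn's roc_auc_score using the five sample rows above:

```python
from sklearn.metrics import roc_auc_score

# Ground-truth labels and anomaly scores from the worked example above
labels = [0, 1, 1, 0, 0]  # IDs 0..4
scores = [11383, 256676, 862365, 152435, 848171]

print(roc_auc_score(labels, scores))  # 0.8333...
```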
Scoring
● Code submission: 4 pt
● Baselines: 6 pt (3 pt for the public baselines and 3 pt for the private ones)
○ Simple public: 1 pt (public score: 0.64046)
○ Medium public: 1 pt (public score: 0.75719)
○ Strong public: 0.5 pt (public score: 0.81304)
○ Boss public: 0.5 pt (public score: 0.86590)
○ Simple private: 1 pt
○ Medium private: 1 pt
○ Strong private: 0.5 pt
○ Boss private: 0.5 pt
● Bonus for submitting report: 0.5 pt
Bonus
● If you succeed in beating both boss baselines, you can get an extra 0.5 pt by submitting a brief report explaining your methods (in fewer than 100 English words), which will be made public to the whole class
● Report Template
Baseline guides
● Simple
○ FCN autoencoder
● Medium
○ CNN autoencoder (a minimal sketch follows this list)
○ Try smaller models (fewer layers)
○ Smaller batch size
● Strong
○ Add BatchNorm
○ Train for longer
● Boss:
○ Add an extra classifier
○ Sample random noises as anomaly images
○ Or one-class-classification (OCC) with GANs: OCGAN, End-to-end OCC, paper pool for Anomaly Detection
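As a minimal sketch of the medium/strong guides above (a small CNN autoencoder with BatchNorm; the layer sizes are illustrative assumptions, not the actual baseline architecture):

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """A small CNN autoencoder for 64x64x3 images; sizes are illustrative."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 12, 4, stride=2, padding=1),   # 64 -> 32
            nn.BatchNorm2d(12),
            nn.ReLU(),
            nn.Conv2d(12, 24, 4, stride=2, padding=1),  # 32 -> 16
            nn.BatchNorm2d(24),
            nn.ReLU(),
            nn.Conv2d(24, 48, 4, stride=2, padding=1),  # 16 -> 8
            nn.BatchNorm2d(48),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(48, 24, 4, stride=2, padding=1),  # 8 -> 16
            nn.BatchNorm2d(24),
            nn.ReLU(),
            nn.ConvTranspose2d(24, 12, 4, stride=2, padding=1),  # 16 -> 32
            nn.BatchNorm2d(12),
            nn.ReLU(),
            nn.ConvTranspose2d(12, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Tanh(),  # assumes inputs are normalized to [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```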
Baseline training statistics
● Simple
○ Number of parameters: 3176419
○ Training time on colab: ~ 30 min
● Medium
○ Number of parameters: 47355
○ Training time on colab: ~ 30 min
● Strong
○ Number of parameters: 47595
○ Training time on colab: 4 ~ 5 hrs
● Boss:
○ Number of parameters: 4364140
○ Training time on colab: 1.5~3 hrs
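To compare your own model against these counts, the number of parameters can be computed directly (shown here with the illustrative ConvAutoencoder from the sketch above):

```python
model = ConvAutoencoder()  # the illustrative sketch from the baseline guides
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
```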
Strong baseline training curve