Introduction to Machine Learning Program Assignment #4 - Support Vector Machine & ANN
This programming assignment requires you to understand Support Vector Machines and Artificial Neural Networks.
Competition
This homework is held on Kaggle as a competition so that you can see how a competition works.
Click the link to participate.
The competition provides you with a training set and a testing set.
training set - train.json
testing set - test.json
Since it’s a competition, you won’t know the answers for the testing set; predicting them and submitting your predictions is your task.
The standard procedure for a competition:
1. Understand the data.
2. Split the provided training set into a training subset and a validation subset for validation.
3. Preprocess the data, construct the models, and tune them.
4. Retrain the best model with as much data as possible, predict the testing set, and make a submission.
5. Win the competition.
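For step 1, a quick look at class balance and the ingredient vocabulary is a reasonable place to start. The sketch below uses only the standard library and a hypothetical three-recipe list shaped like the train.json nodes; on the real data you would pass the parsed file instead.

```python
from collections import Counter

# Hypothetical toy recipes shaped like the nodes in train.json.
recipes = [
    {"id": 1, "cuisine": "italian", "ingredients": ["basil", "tomatoes", "olive oil"]},
    {"id": 2, "cuisine": "italian", "ingredients": ["garlic", "olive oil"]},
    {"id": 3, "cuisine": "indian", "ingredients": ["garam masala", "onions", "naan", "spinach"]},
]

# Class balance: how many recipes per cuisine.
cuisine_counts = Counter(r["cuisine"] for r in recipes)
# Ingredient vocabulary: how often each ingredient string appears.
ingredient_counts = Counter(ing for r in recipes for ing in r["ingredients"])
# Recipe lengths vary, which matters when choosing a fixed-size encoding.
lengths = [len(r["ingredients"]) for r in recipes]

print(cuisine_counts.most_common())  # [('italian', 2), ('indian', 1)]
print(len(ingredient_counts), "distinct ingredients")
print("ingredients per recipe: min", min(lengths), "max", max(lengths))
```

A strongly imbalanced cuisine count is one reason to report per-class recall and precision rather than accuracy alone.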
If you have any questions, post them in the Discussion section or on Discord so everyone can see and understand.
Objective
Data Input - 5%
Download the training set and testing set from Kaggle.
Data Preprocessing - 15%
Transform the data format and shape so that your model can process it.
Shuffle the data.
Apply any data augmentation that can boost your final results - 10%
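One possible preprocessing pipeline, sketched under assumptions: each recipe becomes a binary bag-of-ingredients vector via scikit-learn's MultiLabelBinarizer, features and labels are shuffled together with sklearn.utils.shuffle, and drop_ingredients is a hypothetical augmentation (not prescribed by the assignment) that appends copies of recipes with some ingredients randomly removed. The toy data is made up.

```python
import random

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils import shuffle

# Hypothetical toy data with the same shape as train.json: ingredient lists + cuisines.
ingredients = [
    ["soy sauce", "ginger", "rice"],
    ["tomatoes", "basil", "olive oil", "garlic"],
    ["garam masala", "onions", "naan"],
]
labels = ["chinese", "italian", "indian"]


def drop_ingredients(recipes, labels, p=0.2, seed=0):
    """Hypothetical augmentation: append a copy of each recipe in which every
    ingredient is independently dropped with probability p."""
    rng = random.Random(seed)
    new_recipes, new_labels = [], []
    for recipe, label in zip(recipes, labels):
        kept = [ing for ing in recipe if rng.random() >= p]
        if kept:  # skip copies that lost every ingredient
            new_recipes.append(kept)
            new_labels.append(label)
    return recipes + new_recipes, labels + new_labels


# In a real run, augment only the training subset (after the holdout split).
aug_ingredients, aug_labels = drop_ingredients(ingredients, labels)

# Bag-of-ingredients encoding: one binary column per distinct ingredient string.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(aug_ingredients)
y = np.array(aug_labels)

# Shuffle rows of X and y with the same permutation so they stay aligned.
X, y = shuffle(X, y, random_state=42)
print(X.shape, sorted(set(y)))
```

Fit the encoder on training data only and reuse it through mlb.transform for validation and test recipes. Applying augmentation after the split keeps near-duplicate copies of a recipe out of the validation set.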
Model Construction - 50%
Support Vector Machine - 20%
For the SVM model, you may want to try different kernel types and compare the results.
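A minimal kernel comparison with scikit-learn's SVC, assuming the features are already numeric. Synthetic data from make_classification stands in for the encoded recipes, and the kernel list and C=1.0 are arbitrary starting points rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the encoded recipe features; replace with your own X, y.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Same C for every kernel so only the kernel changes between runs.
results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, C=1.0)  # gamma defaults to "scale"
    clf.fit(X_train, y_train)
    results[kernel] = clf.score(X_val, y_val)  # validation accuracy

for kernel, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kernel:<8} {acc:.3f}")
```

On a large training set, kernel SVC training time grows quickly with the number of samples; LinearSVC is usually much faster for the linear case, so comparing kernels on a subsample first may save time.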
Artificial Neural Networks - 30%
For the ANN model, you may use any neural-network-based model you want and implement it yourself.
Any framework (such as TensorFlow or PyTorch) is allowed.
Explain the reasoning behind your choice of model, data augmentation, and training process.
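A small feed-forward network sketch, using scikit-learn's MLPClassifier as a stand-in; a TensorFlow or PyTorch model would follow the same fit-then-evaluate pattern. The layer size, alpha, and max_iter are illustrative assumptions, and synthetic data replaces the encoded recipes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the encoded recipe features; replace with your own X, y.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# One hidden layer of 128 ReLU units trained with Adam and L2 penalty alpha.
# early_stopping=True sets aside 10% of X_train internally and stops once that
# score has not improved (by at least tol) for 10 consecutive epochs.
mlp = MLPClassifier(hidden_layer_sizes=(128,), activation="relu", solver="adam",
                    alpha=1e-4, max_iter=200, early_stopping=True, random_state=0)
mlp.fit(X_train, y_train)

print(f"validation accuracy: {mlp.score(X_val, y_val):.3f}")
print(f"epochs run: {mlp.n_iter_}")
```

Early stopping is one simple way to justify the training length in the report; the number of epochs actually run is available as mlp.n_iter_.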
Validation method
Holdout validation with a 7:3 ratio (training subset : validation subset)
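A 7:3 holdout split can be done with scikit-learn's train_test_split by setting test_size=0.3. The toy arrays below are hypothetical; stratify=y is an optional choice that keeps class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy features/labels: 10 samples, two balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array(["italian"] * 5 + ["mexican"] * 5)

# test_size=0.3 gives the 7:3 holdout split; stratify=y keeps class
# proportions similar in both parts; random_state makes it reproducible.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=42)

print(len(X_train), len(X_val))  # 7 3
```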
Results - 10%
Report the performance of every experiment setting in tables, using the following metrics:
1. Confusion matrix
2. Accuracy
3. Sensitivity (Recall)
4. Precision
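All four metrics are available in sklearn.metrics. Because this is a multi-class problem, recall and precision need an averaging rule; the sketch assumes macro averaging (the unweighted mean over classes), and the labels and predictions are made up for illustration.

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical validation labels and predictions for three cuisines.
y_true = ["indian", "indian", "italian", "italian", "mexican", "mexican"]
y_pred = ["indian", "italian", "italian", "italian", "mexican", "indian"]
labels = ["indian", "italian", "mexican"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(pd.DataFrame(cm, index=[f"true {c}" for c in labels],
                   columns=[f"pred {c}" for c in labels]))

acc = accuracy_score(y_true, y_pred)
# Multi-class recall and precision need an averaging rule; "macro" is the
# unweighted mean of the per-class scores.
rec = recall_score(y_true, y_pred, labels=labels, average="macro")
prec = precision_score(y_true, y_pred, labels=labels, average="macro")
print(f"accuracy={acc:.3f}  recall={rec:.3f}  precision={prec:.3f}")
# accuracy=0.667  recall=0.667  precision=0.722
```

Here 4 of 6 predictions are correct, so accuracy is 4/6. Passing average=None instead returns per-class scores, which can go directly into a results table.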
Comparison & Conclusion - 10%
Also include some feedback: anything you want to tell me.
Kaggle Submission - 10% (+30%)
After validation, you should have working SVM and ANN models.
Retrain one of your best models on the whole train.json, predict test.json, and submit your y_test.csv to Kaggle.
You can check sample_submission.csv for the submission format.
Take a screenshot of the Leaderboard, highlight your name, and put it in the report.
The top 10 on the final Private Leaderboard can earn 30 bonus points.
Note that you still need to submit your report and code to the newE3 system.
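The retrain-and-submit step can be sketched as follows, under assumptions: a bag-of-ingredients encoder with LinearSVC as the "best model", toy data in place of the parsed train.json and test.json, and a submission header of id,cuisine, which you should confirm against sample_submission.csv before uploading.

```python
import csv

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the parsed train.json / test.json contents (hypothetical).
train_ingredients = [["soy sauce", "ginger"], ["basil", "olive oil"],
                     ["soy sauce", "rice"], ["basil", "tomatoes"]]
train_labels = ["chinese", "italian", "chinese", "italian"]
test_ids = [101, 102]
test_ingredients = [["ginger", "rice"], ["tomatoes", "olive oil"]]

# Retrain on ALL labelled data: fit the encoder and the model on the full set.
mlb = MultiLabelBinarizer()
X_full = mlb.fit_transform(train_ingredients)
model = LinearSVC(C=1.0)
model.fit(X_full, train_labels)

# Encode test recipes with the SAME fitted encoder
# (ingredients never seen in training are ignored, with a warning).
X_test = mlb.transform(test_ingredients)
predictions = model.predict(X_test)

# Assumed header "id,cuisine"; check sample_submission.csv for the exact format.
with open("y_test.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "cuisine"])
    writer.writerows(zip(test_ids, predictions))
```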
Data - Recipe Ingredients Dataset
The objective of the competition is to predict the category of a dish’s cuisine given a list of its ingredients.
In the dataset, we include the recipe id, the type of cuisine, and the list of ingredients of each recipe (of variable length). The data is stored in JSON format.
An example of a recipe node in train.json:
{
  "id": 24717,
  "cuisine": "indian",
  "ingredients": [
    "tumeric",
    "vegetable stock",
    "tomatoes",
    "garam masala",
    "naan",
    "red lentils",
    "red chili peppers",
    "onions",
    "spinach",
    "sweet potatoes"
  ]
},
In test.json, each recipe has the same format as in train.json, except that the cuisine field is removed, since it is the target variable you are going to predict.
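Reading the data only needs the standard json module. This sketch assumes each file is a JSON array of recipe objects (the trailing comma in the example node suggests this), and parse_recipes is a hypothetical helper name; the in-memory string reuses the example node above.

```python
import json

# In-memory stand-in for train.json, built from the example node above.
# Assumption: each file is a JSON array of recipe objects.
TRAIN_JSON = """[
  {"id": 24717, "cuisine": "indian",
   "ingredients": ["tumeric", "vegetable stock", "tomatoes", "garam masala",
                   "naan", "red lentils", "red chili peppers", "onions",
                   "spinach", "sweet potatoes"]}
]"""


def parse_recipes(text, has_label=True):
    """Hypothetical helper: split a JSON array of recipes into ids,
    ingredient lists and labels. test.json has no "cuisine" field,
    so pass has_label=False for it."""
    recipes = json.loads(text)
    ids = [r["id"] for r in recipes]
    ingredients = [r["ingredients"] for r in recipes]
    labels = [r["cuisine"] for r in recipes] if has_label else None
    return ids, ingredients, labels


ids, ingredients, labels = parse_recipes(TRAIN_JSON)
print(ids, labels, len(ingredients[0]))  # [24717] ['indian'] 10
```

For the real files, read the text first, e.g. `with open("train.json", encoding="utf-8") as f: ids, X_raw, y = parse_recipes(f.read())`. Alternatively, pandas.read_json("train.json") returns a DataFrame with one row per recipe.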
Submission & Scoring Policy
Please submit a zip file containing the following to the newE3 system.
Report
Explanation of how your code works.
All the content mentioned above.
Your name and student ID at the very beginning - 10%
Accepted format: HTML
Source code
Accepted language: Python 3
Accepted format: .ipynb
Package-provided models are allowed
Your score will be determined mainly by the submitted report.
If there is any problem with your code, the TA may ask you (by email) to demo it. Otherwise, no demo is needed.
Scores will be adjusted at the end of the semester to fit the school regulations.
Plagiarism is not allowed.
If you are caught the first time, you will get a ZERO on that homework.
The second time, you will FAIL this class.
Tools that might be useful
JupyterLab, pre-installed in the PC classrooms
NumPy - Math thingy
matplotlib - Plot thingy
pandas - Data thingy
scipy - Science thingy
scikit-learn - Machine Learning and stuff
Neural Network frameworks
TensorFlow
Keras
PyTorch