NCTU_ML-Homework 4 Solved

Before we start
You may choose to go to the PC classrooms or finish this HW elsewhere.
It won’t affect a thing. For fairness’ sake, we’ll use Discord as the Q&A system.

A small change: this time we WILL answer questions asked verbally in class.
You can still use Discord for Q&A.
Join the Discord server for TA support

This is the same server used for programming assignments #1, #2, and #3.
Ask questions there, and we will reply. (We won’t respond to raised hands.)
Try not to ask for obvious answers or bug fixes.
Memes and chit chat welcome
Competition
This homework is held on Kaggle as a competition so that you can see how a competition works.

Click the link to participate.
The competition provides you with a training set and a testing set.
training set - train.json
testing set - test.json
Since it’s a competition, you won’t know the answer to the testing set, which is for you to predict and submit.
The standard procedure of a competition:
Understand the data
Split the provided training set into a training subset and a validation set.
Preprocessing, model construction, tuning
Retrain the best model with as much data as possible, predict the testing set, and make a submission.
Win the competition
If you have any questions, post them in the Discussion section or on Discord so everyone can see and understand.
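The split step in the procedure above could be sketched as follows. This is only an illustration in plain Python, assuming the 7:3 holdout ratio required in the Objective section; `holdout_split` and the toy `recipes` list are hypothetical names, not part of the assignment.

```python
import random

def holdout_split(records, train_parts=7, valid_parts=3, seed=0):
    """Shuffle a copy of records and split it into (train, validation)."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = len(shuffled) * train_parts // (train_parts + valid_parts)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the recipes loaded from train.json.
recipes = [{"id": i} for i in range(10)]
train_subset, validation_set = holdout_split(recipes)
print(len(train_subset), len(validation_set))
```

Shuffling before cutting matters because the file may be ordered in some way; a fixed seed makes every experiment see the same validation set, so results stay comparable.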
Objective
Data Input - 5%
Download the training set and testing set from Kaggle.
Data Preprocessing - 15%
Transform data format and shape so your model can process them.
Shuffle the data.
Any data augmentation that can boost your final results. - 10%
Model Construction - 50%
Support Vector Machine - 20%
For the SVM model, you may want to try out different types of kernels and compare the results.
Artificial Neural Networks - 30%
For the ANN model, you may use any neural-network-based model you want and implement it yourself.
Any framework (such as TensorFlow or PyTorch) is allowed.
Explain the reasoning behind your model choice, data augmentation, and training process.
Validation method
Holdout validation with the ratio 7:3
Results - 10%
Report the performance of every experiment setting in tables, using the following metrics:
Confusion matrix
Accuracy
Sensitivity (Recall)
Precision
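The metrics listed above can be computed from the validation predictions roughly as follows. This is a minimal plain-Python sketch for the multi-class case; the function names and the toy labels are made up for illustration, and recall and precision are reported per class.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_scores(matrix, labels):
    """Return {label: (recall, precision)} computed from the matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # row total: TP + FN
        predicted = sum(row[i] for row in matrix)  # column total: TP + FP
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        scores[label] = (recall, precision)
    return scores

# Toy labels standing in for validation-set truth and model predictions.
labels = ["indian", "italian", "mexican"]
y_true = ["indian", "indian", "italian", "mexican", "mexican", "italian"]
y_pred = ["indian", "italian", "italian", "mexican", "indian", "italian"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)
print(matrix, accuracy(matrix), scores)
```

If a single recall or precision number per experiment is wanted for the tables, the per-class values can be averaged; which averaging to use is a reporting choice the assignment text does not specify.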
Comparison & Conclusion - 10%
Also include some feedback: anything you want to tell me.
Kaggle Submission - 10% (+30%)
After the validation, you now have working SVM and ANN models.
Retrain one of your best models with the whole train.json, predict test.json, and submit your y_test.csv to Kaggle.
You can check sample_submission.csv for the submission format.
Take a screenshot of the Leaderboard, highlight your name, and put it in the report.
The top 10 on the final Private Leaderboard get 30 bonus points.
Note that you still need to submit your report and code to the newE3 system.
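Writing the submission file might look like the sketch below. The column names `id` and `cuisine` are an assumption, as are the ids and predictions; the authoritative format is whatever sample_submission.csv shows.

```python
import csv
import io

def write_submission(out, ids, predictions):
    """Write one (id, predicted cuisine) row per test recipe."""
    writer = csv.writer(out)
    writer.writerow(["id", "cuisine"])  # assumed header; check sample_submission.csv
    writer.writerows(zip(ids, predictions))

# Hypothetical ids and predictions standing in for real model output.
test_ids = [101, 102]
predicted = ["indian", "italian"]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_submission(buffer, test_ids, predicted)
print(buffer.getvalue())

# For the real file, something like:
# with open("y_test.csv", "w", newline="") as f:
#     write_submission(f, test_ids, predicted)
```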
Data - Recipe Ingredients Dataset
The objective of the competition is to predict the category of a dish’s cuisine given a list of its ingredients.
In the dataset, we include the recipe id, the type of cuisine, and the list of ingredients of each recipe (of variable length). The data is stored in JSON format.
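One possible way to turn the variable-length ingredient lists into fixed-length model input is a multi-hot (bag-of-ingredients) encoding. The sketch below assumes train.json is a JSON array of records with the `id`, `cuisine`, and `ingredients` fields described above; the two inline records are hand-written for illustration, and `encode` is a hypothetical helper.

```python
import json

# Hand-written records in the assumed shape of train.json entries.
raw = '''[
  {"id": 24717, "cuisine": "indian",
   "ingredients": ["tumeric", "onions", "spinach"]},
  {"id": 1, "cuisine": "italian",
   "ingredients": ["tomatoes", "onions", "basil"]}
]'''
recipes = json.loads(raw)  # for the real file: json.load(open("train.json"))

# Build a sorted vocabulary so every ingredient gets a stable column index.
vocabulary = sorted({name for r in recipes for name in r["ingredients"]})
index = {name: i for i, name in enumerate(vocabulary)}

def encode(ingredients):
    """Return a 0/1 vector with a 1 for each known ingredient present."""
    vector = [0] * len(vocabulary)
    for name in ingredients:
        if name in index:  # ingredients not in the vocabulary are ignored
            vector[index[name]] = 1
    return vector

X = [encode(r["ingredients"]) for r in recipes]
y = [r["cuisine"] for r in recipes]
print(vocabulary)
print(X, y)
```

The vocabulary should be built from the training data only, so that test.json recipes are encoded with the same columns; ingredients that appear only in the test set are dropped by the `if name in index` check.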
An example of a recipe node in train.json:
{
  "id": 24717,
  "cuisine": "indian",
  "ingredients": [
    "tumeric",
    "vegetable stock",
    "tomatoes",
    "garam masala",
    "naan",
    "red lentils",
    "red chili peppers",
    "onions",
    "spinach",
    "sweet potatoes"
  ]
}
