Please clearly state the UB Person numbers and UB IT names of all group members on the cover of the report.
Two datasets (project3_dataset1, project3_dataset2) can be found on Piazza. Please check the README file first for a short description of the two datasets. This is a team project. Each team consists of at most three members.
Complete the following tasks:
Implement three classification algorithms by yourself: Nearest Neighbor, Decision Tree, and Naïve Bayes.
Implement Random Forests based on your own implementation of Decision Tree.
Adopt 10-fold Cross Validation to evaluate the performance of all methods on the provided two datasets in terms of Accuracy, Precision, Recall, and F-1 measure.
We will send you an invite for a Kaggle competition. For that dataset, the class labels of the testing data are held out. Apply various tricks on top of any classification algorithm discussed in class (including nearest neighbor, decision tree, Naïve Bayes, SVM, logistic regression, bagging, AdaBoost, random forests) and tune parameters using the training data. You may call packages for these algorithms, but you must implement any improvement on top of them yourself. Submit your classification result for the testing data. Your efforts towards improving these algorithms will be evaluated. Teams near the top of the leaderboard after the deadline will receive bonus points.
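The evaluation step in the tasks above (10-fold Cross Validation with Accuracy, Precision, Recall, and F-1 measure) can be sketched roughly as follows. This is only an illustrative sketch, not a required design: it assumes binary labels encoded as 0 and 1 with 1 as the positive class, numeric features, and metrics averaged over folds. The helper names (`k_fold_indices`, `binary_metrics`, `nearest_neighbor_predict`, `cross_validate`) are hypothetical, and the 1-nearest-neighbor predictor is just a placeholder classifier. Check the README of the datasets before relying on any of these assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one fold.

    Assumption: binary labels. Undefined ratios (zero denominator)
    are reported as 0.0 by convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def nearest_neighbor_predict(train_X, train_y, x):
    """Placeholder classifier: 1-NN with squared Euclidean distance."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_validate(X, y, predict, k=10, seed=0):
    """Average each metric over k folds; every sample is tested exactly once."""
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [predict(train_X, train_y, X[i]) for i in fold]
        fold_metrics = binary_metrics([y[i] for i in fold], preds)
        for name, value in fold_metrics.items():
            totals[name] += value
    return {name: total / k for name, total in totals.items()}
```

Calling `cross_validate(X, y, nearest_neighbor_predict)` returns a dictionary with the four averaged scores. Any of the other methods can be evaluated the same way, provided it is wrapped in a function with the same `(train_X, train_y, x)` signature.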
Your final submission should be a zip file named project3.zip. The zip file must contain two folders, Code and Report:
Code: Implementation of the four methods and your implementation for the Kaggle competition. The four methods must be implemented by yourself. The implementation for the Kaggle competition can use learning packages for the algorithms mentioned above, but the improvement must be implemented by yourself. Include a README file with your code submission that explains how to execute your code.
Report: For the four methods: Describe the flow of each implemented method and the choices you made (such as parameter settings, pre-processing, etc.). Compare their performance, and state their pros and cons based on your findings. For the competition: Explain why you chose a particular base algorithm, state clearly the parameters you chose for it, and discuss the improvements you made to its performance.