1. Implement a KNN-based classifier to predict digits from the images of handwritten digits in the dataset.
2. Featurize the images as vectors that can be used for classification.
3. Experiment with different values of K (number of neighbors).
4. Experiment with different distance measures, such as Euclidean distance and Manhattan distance.
5. Report accuracy score, F1-score, confusion matrix and any other metrics you find useful.
6. Implement baselines such as random guessing/majority voting and compare performance. Also, report the performance of scikit-learn’s kNN classifier. Report your findings.
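As a starting point for items 1 to 4, here is a minimal sketch of a from-scratch KNN classifier with interchangeable distance measures. It is one possible layout, not a required design: the function names (flatten, euclidean, manhattan, knn_predict, accuracy) and the tiny 2x2 "images" are invented for illustration, and flattening raw pixels is only the simplest way to featurize an image.

```python
from collections import Counter

def flatten(image):
    """Featurize a 2-D image (a list of pixel rows) as one flat vector."""
    return [pixel for row in image for pixel in row]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training vectors closest to query."""
    # sorted() is stable, so equally distant points keep their training order.
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, Counter returns the label it encountered first.
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy 2x2 "images" invented for illustration: dark is class 0, bright is class 1.
train_images = [[[0, 0], [0, 1]], [[0, 1], [0, 0]], [[9, 9], [9, 8]], [[8, 9], [9, 9]]]
train_labels = [0, 0, 1, 1]
test_images = [[[1, 0], [0, 0]], [[9, 9], [9, 9]]]
test_labels = [0, 1]

train_X = [flatten(img) for img in train_images]
test_X = [flatten(img) for img in test_images]

# Swap the distance function (and k) to run the experiments from items 3 and 4.
for metric in (euclidean, manhattan):
    preds = [knn_predict(train_X, train_labels, q, k=3, distance=metric) for q in test_X]
    print(metric.__name__, preds, accuracy(test_labels, preds))
```

This brute-force version sorts the whole training set for every query, which is fine for a sketch but slow on a full digits dataset; a vectorized distance computation would be the natural next step.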
2. k-Nearest Neighbors - Task 2
1. Implement a KNN-based classifier to classify samples in the Mushroom Database from the given set of features. Missing data (denoted by "?") must be handled appropriately.
2. Choose an appropriate distance measure for categorical features.
3. Experiment with different values of K (number of neighbors).
4. Report accuracy score, F1-score, confusion matrix and any other metrics you find useful.
5. Implement baselines such as random guessing/majority voting and compare performance. Also, report the performance of scikit-learn’s kNN classifier. Report your findings.
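For items 1 and 2 of this task, the sketch below shows one way (of several) to handle "?" values and compare categorical rows. It assumes mode imputation for missing values and Hamming distance (the number of mismatched attributes) as the distance measure; neither choice is mandated by the task. The helper names and the small table of single-letter feature codes are made up for illustration and are not taken from the actual dataset.

```python
from collections import Counter

MISSING = "?"

def column_modes(rows):
    """Most frequent known value in each column (ignoring '?')."""
    modes = []
    for col in zip(*rows):
        known = [v for v in col if v != MISSING]
        # If a column is entirely missing, leave '?' in place.
        modes.append(Counter(known).most_common(1)[0][0] if known else MISSING)
    return modes

def fill_missing(row, modes):
    """Replace each '?' in row with the training mode of its column."""
    return [modes[j] if v == MISSING else v for j, v in enumerate(row)]

def hamming(a, b):
    """Number of positions where two categorical rows disagree."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3):
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical rows and labels, invented for this example.
raw_rows = [
    ["x", "s", "n"],
    ["x", "s", "?"],
    ["x", "s", "n"],
    ["b", "y", "w"],
    ["b", "?", "w"],
    ["b", "y", "w"],
]
labels = ["e", "e", "e", "p", "p", "p"]

# Modes are computed on the training rows only, then reused for queries.
modes = column_modes(raw_rows)
train_X = [fill_missing(row, modes) for row in raw_rows]

query = fill_missing(["b", "y", "?"], modes)
print(knn_predict(train_X, labels, query, k=3))
```

Other reasonable treatments of "?" include dropping affected rows or treating "?" as its own category; it is worth comparing them when reporting results.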
3. Decision Tree
1. Implement a decision tree to predict housing prices for the given dataset using the available features.
2. The various attributes of the data are explained in the file data description.txt. Note that some attributes are categorical while others are continuous.
3. Feel free to use Python libraries such as binarytree, or any other Python library, to implement the binary tree. However, you cannot use libraries like scikit-learn that automatically create the decision tree for you.
4. Experiment with different measures for choosing how to split a node (Gini impurity, information gain, variance reduction). You could also try different approaches to decide when to terminate the tree.
5. Report metrics such as Mean Squared Error (MSE) and Mean Absolute Error (MAE), along with any other metrics that you feel may be useful.
6. For feature engineering, you may consider normalizing/standardizing the data.
7. Implement simple baselines such as always predicting the mean/median of the training data. Also, compare the performance against scikit-learn’s decision tree. Report your findings.
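To make the splitting, stopping and evaluation steps above concrete, here is a minimal sketch of a regression tree that splits on variance reduction, together with MSE, MAE and a mean-prediction baseline. It is a simplification under stated assumptions: it handles only numeric features with threshold splits (categorical attributes would need encoding or equality-based splits), it stops on a maximum depth and a minimum sample count, and all names and the one-feature toy data are invented for illustration.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(X, y):
    """Return (feature index, threshold) with the lowest weighted child variance,
    or None if no split improves on the parent's variance."""
    best, best_score = None, variance(y)
    n = len(y)
    for j in range(len(X[0])):
        # Skip the largest value: splitting there would leave the right side empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * variance(left) + len(right) * variance(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    can_split = depth < max_depth and len(y) >= min_samples
    split = best_split(X, y) if can_split else None
    if split is None:
        return sum(y) / len(y)  # leaf: predict the mean target
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }

def predict(tree, row):
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data invented for illustration: one numeric feature, price-like targets.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [100.0, 110.0, 105.0, 300.0, 310.0, 305.0]

tree = build_tree(X, y, max_depth=1)
tree_preds = [predict(tree, row) for row in X]
baseline_preds = [sum(y) / len(y)] * len(y)  # always predict the training mean

# Scored on the training data only to keep the sketch short;
# a real comparison should use a held-out split.
print("tree MAE:", mae(y, tree_preds), "baseline MAE:", mae(y, baseline_preds))
```

Changing max_depth and min_samples is one way to experiment with termination criteria; swapping variance for another impurity measure only requires replacing the variance function used inside best_split.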