KNN (k-nearest neighbors) is a classification algorithm that predicts the label of a test sample from the training samples closest to it, based on the distance between the test sample and each sample in the training set. Although KNN is simple, it can work surprisingly well when the training and test data come from the same distribution. For this assignment, you need to build a KNN classifier for digit classification using the scikit-learn digits dataset.
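For reference, the KNN decision rule can be written as follows, assuming a plain (unweighted) majority vote among the K nearest neighbors; the assignment itself does not fix the voting scheme. Here $\mathcal{N}_K(\mathbf{x})$ denotes the indices of the K training samples $\mathbf{x}_i$ with the smallest distance $d(\mathbf{x}, \mathbf{x}_i)$ to the test sample $\mathbf{x}$, and the digit classes are $0, \dots, 9$:

$$\hat{y}(\mathbf{x}) = \arg\max_{c \in \{0, \dots, 9\}} \sum_{i \in \mathcal{N}_K(\mathbf{x})} \mathbb{1}[y_i = c]$$

Two common choices for $d$ are the Euclidean distance $d_2(\mathbf{x}, \mathbf{z}) = \sqrt{\sum_j (x_j - z_j)^2}$ and the Manhattan distance $d_1(\mathbf{x}, \mathbf{z}) = \sum_j |x_j - z_j|$.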
Purpose:
· Get familiar with the Python programming language and the scikit-learn library.
· Develop a KNN algorithm for a given task.
Directions:
For this assignment, you need to build a KNN classifier from scratch. Detailed instructions for each step are given below.
· Dataset Preparation
o You need to load the dataset using sklearn.datasets.load_digits.
§ More information about the function can be found at: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits
o After loading the dataset, randomly shuffle it and split it into train/dev/test sets.
§ Use 70% of the data for the train set, 15% for the dev set, and 15% for the test set.
§ Make sure that each image still matches its label after shuffling.
§ You may want to use the random shuffling functions provided by NumPy.
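A minimal sketch of this preparation step is shown below. It assumes NumPy's `default_rng` generator and shuffles a single index array with `permutation`, then applies that same permutation to both the images and the labels; shuffling `X` and `y` independently (for example with two separate `np.random.shuffle` calls) would break the image/label pairing. The function name `load_and_split` and the fixed `seed` are illustrative choices, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_digits


def load_and_split(seed=0, train_frac=0.70, dev_frac=0.15):
    """Load the digits data and split it into train/dev/test (70/15/15 by default)."""
    digits = load_digits()
    X, y = digits.data, digits.target  # X: (1797, 64) flattened 8x8 images, y: (1797,)

    # Shuffle one index array and apply it to both X and y,
    # so every image stays paired with its own label.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]

    # Cut the shuffled data into consecutive train/dev/test blocks.
    n_train = int(train_frac * len(X))
    n_dev = int(dev_frac * len(X))
    X_train, y_train = X[:n_train], y[:n_train]
    X_dev, y_dev = X[n_train:n_train + n_dev], y[n_train:n_train + n_dev]
    X_test, y_test = X[n_train + n_dev:], y[n_train + n_dev:]
    return X_train, y_train, X_dev, y_dev, X_test, y_test
```

With the 1,797 samples in the digits dataset, this yields 1,257 training, 269 dev, and 271 test samples; the rounding remainder ends up in the test set.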
· KNN Development
o You need to write your own distance function(s).
o Use the train set as the training data, and use the dev set to determine the best K and best distance metric.
§ You may need to test multiple K values and distance metrics to select the optimal ones.
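The sketch below illustrates one way to structure this step using only NumPy. The three metrics (Euclidean, Manhattan, cosine), the candidate K values `(1, 3, 5, 7, 9)`, and all function names are illustrative assumptions rather than requirements of the assignment. Each metric computes the distance from one query vector to every training row, `knn_predict` takes a majority vote over the K nearest labels, and `select_k_and_metric` scores every (K, metric) pair on the dev set.

```python
from collections import Counter

import numpy as np


# Each metric returns the distances from one query vector x to every row of X.
def euclidean(X, x):
    return np.sqrt(np.sum((X - x) ** 2, axis=1))


def manhattan(X, x):
    return np.sum(np.abs(X - x), axis=1)


def cosine(X, x):
    # 1 - cosine similarity; the small epsilon avoids division by zero for all-zero vectors.
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(x) + 1e-12
    return 1.0 - (X @ x) / norms


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine}


def knn_predict(X_train, y_train, X_query, k, metric):
    """Predict a label for each row of X_query by majority vote of its k nearest neighbors."""
    preds = []
    for x in X_query:
        dists = metric(X_train, x)
        nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k closest training samples
        # Counter keeps first-seen order, so a tied vote goes to the label
        # that appears first among the distance-sorted neighbors.
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


def accuracy(X_train, y_train, X_eval, y_eval, k, metric):
    """Fraction of X_eval samples whose KNN prediction matches y_eval."""
    return float(np.mean(knn_predict(X_train, y_train, X_eval, k, metric) == y_eval))


def select_k_and_metric(X_train, y_train, X_dev, y_dev, ks=(1, 3, 5, 7, 9)):
    """Score every (k, metric name) pair on the dev set and return the best one."""
    results = {}
    for name, fn in METRICS.items():
        for k in ks:
            results[(k, name)] = accuracy(X_train, y_train, X_dev, y_dev, k, fn)
    best = max(results, key=results.get)  # first pair reaching the highest dev accuracy
    return best, results
```

Once the best `(k, name)` pair has been chosen on the dev set, the model can be tested by calling `accuracy(X_train, y_train, X_test, y_test, k, METRICS[name])` exactly once on the test split, so that the test set plays no part in model selection. The `results` dictionary also gives the full table of dev accuracies that the report asks you to describe.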
· Test the Model
o After the optimal K value and distance metric are selected, test the model using the test set.
· Submission
o You need to submit a written report for this assignment.
o For this report, you need to:
§ Explain what you have done
· E.g., which distance metrics and which K values you tested.
§ Report the accuracy that your selected model achieves on the test set
· You also need to indicate the K value and distance metric used to achieve this result
§ Visualize the prediction result
· Randomly select 10 data samples from the test set and specify the ground truth label and the predicted label for each of the samples.
§ Include your code as an appendix
· You could save your Colab code as a PDF file and attach it to your report, or you could copy and paste your code into the report.
o If you copy and paste your code, preserve its indentation and keep it readable.
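For the visualization item in the report, a sketch along the following lines could show 10 randomly selected test digits with their ground-truth and predicted labels. It assumes matplotlib is available and that `y_pred` holds your KNN predictions for the test set; the function name `show_predictions` and the output filename are hypothetical. The non-interactive `Agg` backend writes the figure to a file so the code also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; in Colab you can drop this line to show the figure inline
import matplotlib.pyplot as plt
import numpy as np


def show_predictions(X_test, y_test, y_pred, n=10, seed=0, path="predictions.png"):
    """Plot n random test digits titled with their true and predicted labels."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_test), size=n, replace=False)  # n distinct random test samples
    cols = (n + 1) // 2
    fig, axes = plt.subplots(2, cols, figsize=(2 * cols, 5))
    for ax in axes.ravel():
        ax.axis("off")  # hide frames, including any unused panel when n is odd
    for ax, i in zip(axes.ravel(), idx):
        ax.imshow(X_test[i].reshape(8, 8), cmap="gray_r")  # each row is a flattened 8x8 image
        ax.set_title(f"true {y_test[i]} / pred {y_pred[i]}", fontsize=9)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return idx
```

Returning the selected indices makes it easy to list the same 10 samples in a small table of true versus predicted labels in the written report.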