CSC781 Assignment 2 – KNN Digits Classifier

KNN (k-nearest neighbors) is a classification algorithm that predicts the label of a test sample from its distance to the samples in the training set. Though KNN is a simple algorithm, it can work surprisingly well when the test data and training data come from the same distribution. For this assignment, you need to build a KNN classifier for digit classification using the scikit-learn digits dataset.

Purpose:

· Get familiar with the Python programming language and the scikit-learn library.

· Develop a KNN algorithm for a given task.

Directions:

For this assignment, you need to build a KNN classifier from scratch. Below are detailed instructions for what you need to do.

· Dataset Preparation

o You need to load the dataset using sklearn.datasets.load_digits.

§ More information about the function can be found at: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits

o After loading the dataset, randomly shuffle it and split it into train/dev/test sets.

§ Use 70% of the data for the train set, 15% for the dev set, and 15% for the test set.

§ You need to make sure that the labels and images still match after shuffling the data.

§ You may want to use the random shuffle function provided by NumPy.
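The dataset preparation steps above can be sketched as follows. This is only one possible approach, not the required one: the function names `shuffle_and_split` and `load_digits_splits`, the fixed seed, and the use of a single index permutation (instead of shuffling the arrays in place) are all assumptions made for illustration. Indexing both arrays with the same permutation is what keeps each image paired with its label.

```python
import numpy as np


def shuffle_and_split(images, labels, train_frac=0.70, dev_frac=0.15, seed=0):
    """Shuffle images and labels together, then split into train/dev/test.

    Hypothetical helper: the name, defaults, and seed are illustrative choices.
    """
    assert len(images) == len(labels), "images and labels must have equal length"

    # One shared permutation of row indices keeps every image with its label.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    images, labels = images[order], labels[order]

    n_train = int(train_frac * len(images))
    n_dev = int(dev_frac * len(images))

    train = (images[:n_train], labels[:n_train])
    dev = (images[n_train:n_train + n_dev], labels[n_train:n_train + n_dev])
    # Whatever remains (about 15%) becomes the test set.
    test = (images[n_train + n_dev:], labels[n_train + n_dev:])
    return train, dev, test


def load_digits_splits(seed=0):
    """Load the scikit-learn digits dataset and split it 70/15/15."""
    # Imported here so shuffle_and_split can be used without scikit-learn.
    from sklearn.datasets import load_digits

    digits = load_digits()
    # digits.data holds the flattened 8x8 images, digits.target the labels.
    return shuffle_and_split(digits.data, digits.target, seed=seed)
```

Calling `load_digits_splits()` returns three `(images, labels)` pairs. Fixing the seed makes the split reproducible, which helps when comparing K values and distance metrics later.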

· KNN Development

o You need to write your own distance function (do not rely on a library implementation).

o Use the train set as the training data, and use the dev set to determine the best K and best distance metric.

§ You may need to test multiple K values and distance metrics to select the optimal ones.
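A minimal sketch of the KNN development step is shown below, assuming Euclidean and Manhattan distance as the two candidate metrics and a simple majority vote among the K nearest neighbors. The assignment does not prescribe these choices; the function names (`euclidean`, `manhattan`, `knn_predict`, `select_on_dev`) and the tie-breaking behavior are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def euclidean(a, b):
    """Euclidean (L2) distance along the last axis; supports broadcasting."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def manhattan(a, b):
    """Manhattan (L1) distance along the last axis; supports broadcasting."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.sum(np.abs(diff), axis=-1)


def knn_predict(train_x, train_y, query_x, k, distance):
    """Predict a label for each query row by majority vote of its k nearest
    training samples under the given distance function."""
    preds = []
    for q in query_x:
        d = distance(train_x, q)  # distance from q to every training sample
        nearest = np.argsort(d, kind="stable")[:k]
        # Counter keeps insertion order, so a tied vote goes to the label
        # whose first neighbor is closest.
        votes = Counter(train_y[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


def accuracy(pred, truth):
    """Fraction of predictions that equal the ground-truth labels."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))


def select_on_dev(train_x, train_y, dev_x, dev_y, ks, metrics):
    """Try every (metric, k) pair and return (best_acc, best_k, best_metric_name)
    according to dev-set accuracy. The first best pair wins on ties."""
    best = None
    for name, fn in metrics.items():
        for k in ks:
            acc = accuracy(knn_predict(train_x, train_y, dev_x, k, fn), dev_y)
            if best is None or acc > best[0]:
                best = (acc, k, name)
    return best
```

A call such as `select_on_dev(train_x, train_y, dev_x, dev_y, [1, 3, 5, 7], {"euclidean": euclidean, "manhattan": manhattan})` would return the dev accuracy together with the K and metric name that achieved it. The test set is deliberately not touched during this search.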

· Test the Model

o After the optimal K value and distance metric are selected, test the model using the test set.

· Submission

o You need to submit a written report for this assignment.

o For this report, you need to:

§ Explain what you have done

· E.g., what distance metrics you have tested, what K values you have tested, etc.

§ Report the best performance on the test set (in terms of accuracy)

· You also need to indicate the K value and distance metric used to achieve this result

§ Visualize the prediction result

· Randomly select 10 data samples from the test set and specify the ground truth label and the predicted label for each of the samples.

§ Include your code as an appendix

· You could save your Colab code as a PDF file and attach it to your report, or you could copy and paste your code into the report.

o If you want to copy/paste your code, make sure to maintain the appropriate indentation and make the code readable.
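For the report items above (test accuracy and the ten-sample comparison of ground truth and predicted labels), one possible sketch is given below. It assumes the test labels and the model's predictions are already available as arrays; the helper names `summarize_test_results` and `format_report`, the seed, and the plain-text table layout are hypothetical choices, not part of the assignment.

```python
import numpy as np


def summarize_test_results(test_y, pred_y, n_samples=10, seed=0):
    """Return the test accuracy and a random sample of
    (index, ground_truth, predicted) rows for the report."""
    test_y = np.asarray(test_y)
    pred_y = np.asarray(pred_y)
    acc = float(np.mean(test_y == pred_y))

    # Sample without replacement so the same test item is not listed twice.
    rng = np.random.default_rng(seed)
    size = min(n_samples, len(test_y))
    picks = rng.choice(len(test_y), size=size, replace=False)
    rows = [(int(i), int(test_y[i]), int(pred_y[i])) for i in picks]
    return acc, rows


def format_report(acc, rows, k, metric_name):
    """Render the accuracy line and the sampled rows as plain text."""
    lines = [
        f"Test accuracy: {acc:.4f} (K={k}, metric={metric_name})",
        "index  truth  predicted",
    ]
    for i, truth, pred in rows:
        lines.append(f"{i:>5}  {truth:>5}  {pred:>9}")
    return "\n".join(lines)
```

Printing `format_report(...)` gives a small table that can be pasted into the report. Showing the corresponding 8x8 images next to each row would be a natural extension.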
