CS6364 Machine Learning Project 1



Header: List the major resources you used to complete this project and the programming language you used.

You may use any programming language you like (Python, C++, C, Java, ...). All programming must be done individually, from first principles. You may use existing tools only for simple linear algebra, such as matrix multiplication or inversion. Do NOT use any toolkit that performs machine learning functions, and do NOT collaborate with your classmates. Cite any resources you used.

In this project you will practice the basics of machine learning classification by creating a K-NN classifier for two datasets. You will also practice how to describe, evaluate, and write up a report on classifier performance.

Expect your project report to need about 2 pages per dataset if you make interesting figures and keep them reasonably small, or 3 pages if your figures are large.

Datasets: The project explores two datasets: the famous MNIST dataset of small images of handwritten digits, and a dataset on the prevalence of diabetes among the Pima, a Native American tribe. You can access the datasets here:

1.   https://www.kaggle.com/uciml/pima-indians-diabetes-database

2.   https://www.kaggle.com/c/digit-recognizer/data
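Both Kaggle downloads are CSV files with a header row, so one small reader can serve both. This is a sketch that assumes the label columns are named `Outcome` (Pima) and `label` (digit-recognizer `train.csv`) and that every other column is a numeric feature; `load_rows` is a hypothetical helper name. As far as we know, the digit-recognizer `test.csv` has no label column, so a labeled test set would have to be carved out of `train.csv`.

```python
import csv


def load_rows(lines, label_column):
    """Parse CSV text (header row first) into feature vectors and integer labels.

    `lines` may be an open file or any iterable of CSV lines.
    Assumes every non-label column holds a number.
    """
    X, y = [], []
    for row in csv.DictReader(lines):
        y.append(int(float(row[label_column])))
        # Keep all remaining columns, in header order, as float features.
        X.append([float(v) for col, v in row.items() if col != label_column])
    return X, y
```

For example, `load_rows(open("diabetes.csv", newline=""), "Outcome")`, where the file name is an assumption about the Kaggle download.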

Programming Task: For each dataset, build a K-NN classifier from the training data, then evaluate and report on its performance.
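K-NN needs no training step beyond storing the labeled examples. A minimal brute-force sketch in Python, using only the standard library in the spirit of the from-first-principles rule, computes the distance from a query to every stored example, keeps the k closest, and takes a majority vote. `knn_predict` is a hypothetical name, and Euclidean distance (`math.dist`, Python 3.8+) is only a default.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, distance=math.dist):
    """Label `query` by majority vote among its k nearest training points."""
    # Brute force: measure the distance from the query to every training example.
    scored = [(distance(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest examples (equal distances keep their original order).
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Majority vote; Counter.most_common breaks count ties in first-seen order,
    # so a tied vote goes to the label whose member is closest to the query.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Choosing an odd k for a two-class problem such as Pima avoids tied votes altogether.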

(30 points) Dataset details: Describe the data and show some simple visualizations (for images, a few examples from each category; for other data, perhaps scatter plots or histograms that give a big-picture view of the data). Describe your training/test split for K-NN and justify your choices.
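One defensible split is a stratified random hold-out: shuffle each class separately with a fixed seed and hold out the same fraction of every class, so the minority class (positive diabetes outcomes in Pima) appears in the test set in roughly its overall proportion. The 20% test fraction and the seed are illustrative assumptions, not requirements of the assignment, and `stratified_split` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Return (train_X, train_y, test_X, test_y), holding out a fraction of each class."""
    # Group example indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):  # fixed class order keeps the shuffles reproducible
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])
```

Because the seed is fixed, the same split can be reported and reproduced exactly.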

(15 points) Algorithm Description: K-NN is a very clear algorithm, so here describe any data preprocessing, feature scaling, distance metrics, or other choices you made.
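Because K-NN compares raw distances, a feature on a large scale (for example, Pima insulin or glucose readings) can dominate one on a small scale. A common preprocessing step, shown here as an assumption rather than a requirement, is z-score standardization with statistics computed on the training split only and then applied unchanged to the test split. For MNIST, where every pixel shares the same 0 to 255 range, dividing by 255 or skipping scaling is a common alternative. Some Pima columns also appear to use 0 for missing readings; how to treat those is a separate choice not shown here. The helper names are hypothetical.

```python
import math


def fit_zscore(train_X):
    """Per-feature mean and standard deviation, computed on the training rows only."""
    n, d = len(train_X), len(train_X[0])
    means = [sum(row[j] for row in train_X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in train_X) / n
        # A constant column has zero spread; use 1.0 so it maps to 0 instead of dividing by 0.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def apply_zscore(X, means, stds):
    """Scale any split (train or test) with the training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
```

Fitting on the training split alone keeps test-set information out of the preprocessing.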

(45 points) Algorithm Results: Show the accuracy of your algorithm. For the Pima dataset, report accuracy together with tables of false positives, false negatives, true positives, and true negatives; use three different distance metrics and compare the results.
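Three metrics that are easy to implement from first principles are Manhattan (L1), Euclidean (L2), and Chebyshev (L-infinity) distance; they are one possible choice of "three different distance metrics", not the only one, and each can be passed as the distance function of a brute-force K-NN. The `binary_confusion` and `accuracy` helpers (hypothetical names) tally the four cells of the Pima table, assuming the diabetic class is coded as 1, as in the Kaggle file's `Outcome` column.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """L-infinity distance: largest single coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def binary_confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that were correct: (TP + TN) / total."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0
```

Running the same split through each metric and reporting one table per metric makes the comparison direct.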

For the MNIST digits, show the complete confusion matrix. Choose a single digit, measure the accuracy for that digit, and show how it varies as a function of K.
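A sketch of the MNIST bookkeeping follows. It reads "accuracy for a single digit" as the fraction of that digit's test images labeled correctly (its row of the confusion matrix), which is an interpretation, not something the assignment states. Since only the vote changes with K, the sketch sorts each test point's neighbors once and reuses that ordering for every K in the sweep. All function names are hypothetical.

```python
import math
from collections import Counter


def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows are true digits, columns are predicted digits."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def digit_accuracy(matrix, digit):
    """Share of test images of `digit` that were labeled `digit` (diagonal / row total)."""
    row = matrix[digit]
    total = sum(row)
    return row[digit] / total if total else 0.0


def predictions_for_ks(train_X, train_y, test_X, ks, distance=math.dist):
    """Predict every test point for several k values, sorting neighbors only once."""
    preds = {k: [] for k in ks}
    for q in test_X:
        # Order all training indices by distance to q (brute force).
        order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], q))
        for k in ks:
            # Majority vote over the k closest; count ties go to the label seen first.
            votes = Counter(train_y[i] for i in order[:k])
            preds[k].append(votes.most_common(1)[0][0])
    return preds
```

To plot accuracy against K for one digit, build `confusion_matrix(test_y, preds[k])` for each k and pass it to `digit_accuracy` with that digit.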

(10 points) Runtime: Describe the runtime of your algorithm and also report the actual "wall-clock" time it took to compute your results.
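For brute-force K-NN with n training points, d features, and m test points, computing all distances costs O(m·n·d), and fully sorting each query's distances adds O(m·n log n); selecting only the k smallest with a heap lowers that step to O(m·n log k). The sketch below times any function with `time.perf_counter`, Python's high-resolution clock for measuring elapsed wall-clock intervals, and shows the heap-based selection; both helper names are hypothetical.

```python
import heapq
import math
import time


def timed(fn, *args, **kwargs):
    """Call fn once; return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def k_nearest_labels(train_X, train_y, query, k):
    """Labels of the k closest training points, nearest first, via a size-k heap."""
    pairs = heapq.nsmallest(
        k, zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    return [label for _, label in pairs]
```

Reporting the machine, language version, and dataset sizes alongside the measured seconds makes the wall-clock numbers easier to interpret.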
