
CSE572 Project 3: Cluster Validation

In this project you will apply the cluster validation technique to data extracted from a provided data set.

Objectives
Students will be able to:

●       Develop code that performs clustering.  

●       Test and analyze the results of the clustering code.

●       Assess the accuracy of the clustering using SSE and supervised cluster validity metrics.

 

Technology Requirements
●       Python 3.6 to 3.8 (do not use 3.9)

●       scikit-learn==0.21.2

●       pandas==0.25.1

●       Python pickle

Project Description
For this project you will write a Python program that takes a data set and performs clustering. Using the provided training data set, you will perform cluster validation to determine the amount of carbohydrates in each meal.

Directions
There are two main parts to the process:  

1.     Extract features from Meal data

2.     Cluster Meal data based on the amount of carbohydrates in each meal

Data:

Use the Project 1 data files:

●       CGMData.csv

●       InsulinData.csv

Extracting Ground Truth:  

Derive the maximum and minimum meal intake amounts from the Y column of the Insulin data. Discretize the meal amounts into bins of size 20. Then take each row of the meal data matrix that you generated in Project 2 and place it in the bin that matches its meal amount label.

In total you should have n = (max - min)/20 bins.
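The binning step can be sketched as follows. This is a minimal illustration, not the required solution: the helper name `bin_meal_amounts` and the sample carbohydrate values are hypothetical, and rounding n up when (max - min) is not a multiple of 20 is an assumption the handout does not state.

```python
import math

import numpy as np


def bin_meal_amounts(carbs, bin_size=20):
    """Map each meal's carbohydrate amount to a bin index starting at 0."""
    carbs = np.asarray(carbs, dtype=float)
    lo, hi = carbs.min(), carbs.max()
    # Assumption: round up so a partial final bin still counts, and keep
    # at least one bin when every meal has the same amount.
    n_bins = max(1, math.ceil((hi - lo) / bin_size))
    # Floor-divide the offset from the minimum, then clip so that the
    # maximum amount falls in the last bin instead of opening a new one.
    labels = np.minimum(((carbs - lo) // bin_size).astype(int), n_bins - 1)
    return labels, n_bins


# Made-up amounts for illustration only (not taken from InsulinData.csv).
labels, n_bins = bin_meal_amounts([3, 10, 25, 47, 60, 123])
```

Here `labels` serves as the ground-truth label per meal row and `n_bins` as the n used for clustering.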

Performing clustering:

Use the features from Project 2 to cluster the meal data into n clusters. Use both DBSCAN and KMeans.
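A rough sketch of the clustering step with scikit-learn is shown below. The function name `cluster_meals`, the synthetic feature matrix, the standardization step, and the `eps` and `min_samples` values are all assumptions made for illustration. Note that DBSCAN does not accept a target cluster count: it decides the number of clusters itself and labels noise points -1, so reaching exactly n clusters would need extra tuning or post-processing that is not shown here.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler


def cluster_meals(features, n_clusters, eps=0.5, min_samples=5):
    """Cluster a (meals x features) matrix with KMeans and DBSCAN."""
    # Assumption: standardize so no single feature dominates the distances.
    scaled = StandardScaler().fit_transform(features)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(scaled)
    # eps and min_samples are placeholder values and need tuning on real data.
    dbscan = DBSCAN(eps=eps, min_samples=min_samples).fit(scaled)
    return scaled, kmeans.labels_, dbscan.labels_


# Synthetic stand-in for the Project 2 feature matrix: three tight blobs.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo_features = np.vstack(
    [center + rng.normal(scale=0.3, size=(20, 2)) for center in centers]
)
scaled, kmeans_labels, dbscan_labels = cluster_meals(demo_features, n_clusters=3)
```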

Report the accuracy of your clustering using the SSE, entropy, and purity metrics.
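One common way to compute these three metrics is sketched below, using only NumPy. The function names are hypothetical, and several choices are assumptions rather than requirements of the handout: entropy uses log base 2, per-cluster values are weighted by cluster size, and DBSCAN noise points (label -1) are left out. The sketch also assumes at least one point is not noise.

```python
import numpy as np


def sse(features, cluster_labels):
    """Sum of squared distances from each point to its cluster centroid."""
    features = np.asarray(features, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    total = 0.0
    for cluster in np.unique(cluster_labels):
        if cluster == -1:  # assumption: DBSCAN noise is excluded
            continue
        members = features[cluster_labels == cluster]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)


def entropy_and_purity(cluster_labels, true_bins):
    """Size-weighted entropy (base 2) and purity against ground-truth bins."""
    cluster_labels = np.asarray(cluster_labels)
    true_bins = np.asarray(true_bins)
    keep = cluster_labels != -1  # assumption: DBSCAN noise is excluded
    cluster_labels, true_bins = cluster_labels[keep], true_bins[keep]
    # Contingency matrix: rows are clusters, columns are ground-truth bins.
    counts = np.array(
        [[np.sum((cluster_labels == c) & (true_bins == b))
          for b in np.unique(true_bins)]
         for c in np.unique(cluster_labels)],
        dtype=float,
    )
    sizes = counts.sum(axis=1)
    probs = counts / sizes[:, None]
    # Replace zeros by 1 inside the log so that 0 * log2(0) contributes 0.
    safe = np.where(probs > 0, probs, 1.0)
    cluster_entropy = -(probs * np.log2(safe)).sum(axis=1)
    weights = sizes / sizes.sum()
    entropy = float((weights * cluster_entropy).sum())
    purity = float((weights * probs.max(axis=1)).sum())
    return entropy, purity


# Tiny hand-checkable example with made-up labels.
demo_sse = sse([[0.0], [2.0], [4.0], [6.0]], [0, 0, 1, 1])
demo_entropy, demo_purity = entropy_and_purity(
    [0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 1]
)
```

For KMeans, scikit-learn's fitted `inertia_` attribute reports the same kind of within-cluster sum of squares, so the `sse` helper is mainly needed for DBSCAN.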

Expected Output:

A Result.csv file that contains a 1 x 6 vector. The vector should have the following format:

SSE for KMeans | SSE for DBSCAN | Entropy for KMeans | Entropy for DBSCAN | Purity for KMeans | Purity for DBSCAN
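Writing the six values in this order could look like the sketch below. The helper name `write_result` is hypothetical, the numbers are placeholders rather than real results, and writing a single row with no header is an assumption about what the grader expects. The demo writes to a temporary directory; in the project the path would simply be Result.csv.

```python
import csv
import os
import tempfile


def write_result(path, sse_kmeans, sse_dbscan, entropy_kmeans,
                 entropy_dbscan, purity_kmeans, purity_dbscan):
    """Write the six metrics as one CSV row in the order the handout lists."""
    row = [sse_kmeans, sse_dbscan, entropy_kmeans,
           entropy_dbscan, purity_kmeans, purity_dbscan]
    # Assumption: one row, no header line.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerow(row)


# Placeholder values only; real values come from the metric computations.
out_path = os.path.join(tempfile.mkdtemp(), "Result.csv")
write_result(out_path, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

with open(out_path, newline="") as handle:
    written = list(csv.reader(handle))
```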
