TDT4300 Assignment 3: Clustering

1       k-Means Clustering
1.1       Assignment
This is the programming part of the assignment. Your task is to implement the k-means clustering algorithm and assess the quality of its output by calculating the Silhouette Coefficient. You are given a Jupyter Notebook[1] (formerly known as the IPython Notebook) file, k_means_clustering.ipynb, in which you have to implement only two functions: kmeans() and silhouette_score(). Everything else has already been prepared for you. As you may have guessed, the programming language of our choice is Python.

Before you start, you need to install Jupyter Notebook and Python 3. Once that is done, open your terminal, navigate to the folder containing the k_means_clustering.ipynb file, and execute the command jupyter notebook. A browser window should pop up with the Jupyter Notebook interface. Open the k_means_clustering.ipynb notebook, read through it carefully, and execute it cell by cell. If this is new to you, familiarize yourself with Jupyter Notebook and Python first.

The assignment is straightforward, as you do not have to worry about anything except the core k-means algorithm and the Silhouette Coefficient. If you consider yourself a good programmer without knowledge of Python, you should not struggle, and you can add a new programming language to your portfolio. If you consider yourself a rather inexperienced programmer without knowledge of Python, this is a good chance to learn a new, beginner-friendly programming language and to gain more programming practice. If programming scares you, seek help from other students. Use Piazza to find help if you do not know anyone. We do not have to remind you that plagiarism is not tolerated.
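As a rough orientation for the two functions named above, the following is a minimal sketch in plain Python. The names kmeans_sketch and silhouette_sketch, their signatures, the seeded random initialization, and the convention of scoring singleton clusters as 0 are assumptions made for illustration. The notebook's kmeans() and silhouette_score() may expect different arguments and return values.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm (hypothetical signature). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the centroids will not either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def silhouette_sketch(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points (hypothetical signature)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Silhouette Coefficient needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumption: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (not from the assignment): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans_sketch(points, k=2)
print(labels, round(silhouette_sketch(points, labels), 3))
```

A score close to 1 indicates compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. An array library such as NumPy would typically replace the explicit loops for larger datasets.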


2           Hierarchical Agglomerative Clustering (HAC)
(a)     Explain Hierarchical Agglomerative Clustering (HAC) and the difference between MIN-link and MAX-link.

(b)     You are given a two-dimensional dataset shown in Table 1. Perform HAC (for both MIN-link and MAX-link) and present the results in the form of a dendrogram. Use the Euclidean distance. Describe thoroughly the process and the outcome of each step.

(c)     Verify your results using the KNIME data analytics platform. For clarification, MIN-link and MAX-link are referred to in KNIME as the SINGLE and COMPLETE linkage methods, respectively. We provide you with the file hac_dataset.csv containing the very same data. Present a picture of your workflow and the dendrograms.

ID    x     y
A     4     3
B     5     8
C     5     7
D     9     2
E     11    6
F     14    8
Table 1: Dataset for HAC.
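
A hand-computed merge sequence for Table 1 can be cross-checked with a short script. The sketch below is a naive implementation written for illustration, not the KNIME one. The function name hac_sketch and the tie-breaking rule (lowest list index first) are assumptions. It compares squared distances as integers so that ties are exact. One tie matters in this dataset: A lies at distance √26 from both B and D, so the third MAX-link merge depends on how ties are broken, and another tool may produce a different but equally valid dendrogram.

```python
import math

# Table 1 from the assignment.
POINTS = {"A": (4, 3), "B": (5, 8), "C": (5, 7),
          "D": (9, 2), "E": (11, 6), "F": (14, 8)}


def sq_dist(p, q):
    """Squared Euclidean distance; stays an exact integer for integer inputs."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hac_sketch(points, linkage):
    """Naive HAC. Pass linkage=min for MIN-link or linkage=max for MAX-link.

    Returns the merge history as (cluster_1, cluster_2, distance) tuples.
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []

    def cluster_distance(c1, c2):
        # MIN-link uses the closest pair of members, MAX-link the farthest.
        return linkage(sq_dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        # Assumption: ties are broken by the lowest indices (i, j).
        d2, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], round(math.sqrt(d2), 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


for name, linkage in (("MIN-link", min), ("MAX-link", max)):
    print(name)
    for left, right, d in hac_sketch(POINTS, linkage):
        print(f"  merge {''.join(left)} + {''.join(right)} at distance {d}")
```

Each printed distance corresponds to the height at which the two branches join in the dendrogram.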

3        DBSCAN Clustering
You are given the following points: P1 = (1,1), P2 = (3,3), P3 = (3,4), P4 = (2,4), P5 = (6,5), P6 = (7,6), P7 = (7,8), P8 = (6,10), P9 = (12,4), P10 = (5,11), P11 = (6,11), P12 = (5,10), P13 = (16,8), P14 = (11,9), P15 = (13,8), P16 = (10,7), P17 = (12,8), P18 = (15,3).

(a)     Your task is to perform DBSCAN clustering given the parameters Eps = 2 (Euclidean metric) and MinPts = 3 (including the analyzed point). Identify core, border and noise points. Identify clusters. Describe thoroughly the process and the outcome of each step.

(b)     Verify your results using the KNIME data analytics platform. We provide you with the file dbscan_dataset.csv containing the very same data. Present a picture of your workflow and the scatter plot with marked clusters and outliers.
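
The DBSCAN procedure on the points above can also be sketched in a few lines of Python as a cross-check for the manual solution. The function name dbscan_sketch and its return format are assumptions, and so is the choice to treat a point at distance exactly Eps as a neighbour (distance <= Eps). That choice matters here, because P6 and P7 are exactly 2 apart, so a strict comparison would change which points are core.

```python
# Points from the assignment.
POINTS = {
    "P1": (1, 1), "P2": (3, 3), "P3": (3, 4), "P4": (2, 4), "P5": (6, 5),
    "P6": (7, 6), "P7": (7, 8), "P8": (6, 10), "P9": (12, 4), "P10": (5, 11),
    "P11": (6, 11), "P12": (5, 10), "P13": (16, 8), "P14": (11, 9),
    "P15": (13, 8), "P16": (10, 7), "P17": (12, 8), "P18": (15, 3),
}
EPS, MIN_PTS = 2, 3


def sq_dist(p, q):
    """Squared Euclidean distance; exact for integer coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dbscan_sketch(points, eps, min_pts):
    """Returns (labels, core): labels maps name -> cluster id, or None for noise."""
    # Eps-neighbourhoods include the point itself, as the assignment specifies.
    # Assumption: the boundary is inclusive (distance <= eps).
    neighbours = {
        p: [q for q in points if sq_dist(points[p], points[q]) <= eps ** 2]
        for p in points
    }
    core = {p for p, ns in neighbours.items() if len(ns) >= min_pts}
    labels = dict.fromkeys(points)  # None = noise or not yet assigned
    cluster_id = 0
    for p in points:
        if p not in core or labels[p] is not None:
            continue
        # Start a new cluster from an unassigned core point and grow it.
        cluster_id += 1
        labels[p] = cluster_id
        frontier = [p]
        while frontier:
            current = frontier.pop()
            for q in neighbours[current]:
                if labels[q] is None:
                    labels[q] = cluster_id
                    if q in core:
                        frontier.append(q)  # only core points keep expanding
    return labels, core


labels, core = dbscan_sketch(POINTS, EPS, MIN_PTS)
for name, cluster in labels.items():
    if cluster is None:
        kind = "noise"
    else:
        kind = "core" if name in core else "border"
    print(f"{name}: cluster={cluster} ({kind})")
```

Points that end up with a cluster id but are not in the core set are border points, and points left without an id are noise. A border point within reach of two clusters is assigned to whichever cluster reaches it first, so its label can depend on the processing order.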

 

[1] https://jupyter.org/
