CSC487 Data Mining Homework 4-Solved


Please use the table below for questions 1 through 3. Notice that the Count column is NOT an attribute. It only
tells how many times a row occurs in our database. The status column is our target variable.
department 

2. Split your diabetes data into two parts for training and testing purposes. Namely, reserve
the last 10 rows of diabetes_train.csv for the test set. Then fit an SVM classifier on the larger
portion of the data and test it on the 10 rows you reserved.
 41 ... 32.0 0.391 39 tested_negative 
. 24.3 0.178 50 tested_positive 
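
The split-and-fit step described in this question can be sketched as follows. This is only a minimal sketch under stated assumptions: pandas and scikit-learn are installed, the target column is named `class` (the CSV header is not shown here, so that name is a guess), and the helper names `hold_out_last` and `fit_and_score` are hypothetical.

```python
def hold_out_last(rows, n_test=10):
    """Split rows into (train, test), reserving the last n_test rows for testing."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


def fit_and_score(csv_path="diabetes_train.csv", target="class", n_test=10):
    """Fit an SVM on all but the last n_test rows and return the test accuracy.

    Assumption: `target` is the name of the label column; adjust it to
    match the actual header of the CSV file.
    """
    # Imported lazily so that hold_out_last works without these packages.
    import pandas as pd
    from sklearn.svm import SVC

    df = pd.read_csv(csv_path)
    # Slicing a DataFrame with a plain slice selects rows by position.
    train, test = hold_out_last(df, n_test)

    model = SVC(kernel="rbf")  # scikit-learn's default kernel, stated explicitly
    model.fit(train.drop(columns=[target]), train[target])
    return model.score(test.drop(columns=[target]), test[target])
```

Calling `fit_and_score()` from the directory that holds diabetes_train.csv would return the fraction of the 10 reserved rows that are classified correctly. With only 10 test rows, that accuracy can only move in steps of 0.1.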

3. Draw the ROC curve based on the table below and fill in the empty columns based on the threshold
at each step.
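
Filling in the columns at each threshold can be sketched as below. The scores and labels are hypothetical, since the homework table is not reproduced here, and the sketch assumes a row is predicted positive when its score is greater than or equal to the threshold.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) triples, highest threshold first.

    labels holds 1 for positive rows and 0 for negative rows.
    A row is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


# Hypothetical data, not the table from the homework.
scores = [0.9, 0.8, 0.6, 0.4]
labels = [1, 0, 1, 0]

for threshold, tpr, fpr in roc_points(scores, labels):
    print(f"threshold={threshold}  TPR={tpr}  FPR={fpr}")
```

Plotting FPR on the horizontal axis against TPR on the vertical axis, starting from the point (0, 0), gives the ROC curve.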


4. Please use the data shown for question
a.) If h and c are selected as the initial centers for your k-means clustering, assign memberships
for the other points, and compute the means (centroids) of your initial clusters. You can use
Manhattan distance.

We can see that the cluster with initial center c contains {c, e, f, g}, and the cluster with initial center h
contains {a, b, d, h}. Moving forward, we will refer to the cluster with initial center c as C1 and the cluster
with initial center h as C2. To compute the means, or centroids, of C1 and C2, we take the average x and y
values of the points within each cluster. We will refer to the centroid of C1 as z1 and the centroid of C2 as z2.
We can now see that we have clusters C1 = {c, e, f, g} and C2 = {a, b, d, h}. In fact, our clusters are unchanged.
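
One k-means step of this kind (assign by Manhattan distance, average the members, then reassign) can be sketched as follows. The coordinates are hypothetical, chosen only so that the sketch yields the same memberships as above. They are not the homework's data points.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points, centers):
    """Map each cluster label to the sorted names of its nearest points."""
    clusters = {label: [] for label in centers}
    for name, p in points.items():
        nearest = min(centers, key=lambda label: manhattan(p, centers[label]))
        clusters[nearest].append(name)
    return {label: sorted(members) for label, members in clusters.items()}


def centroids(points, clusters):
    """Average the x and y values of each cluster's members.

    Assumes every cluster has at least one member.
    """
    result = {}
    for label, members in clusters.items():
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        result[label] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return result


# Hypothetical coordinates, not the homework's data.
points = {
    "a": (1, 1), "b": (2, 1), "d": (1, 2), "h": (2, 2),
    "c": (8, 8), "e": (9, 8), "f": (8, 9), "g": (9, 9),
}

initial = assign(points, {"C1": points["c"], "C2": points["h"]})
z = centroids(points, initial)   # z["C1"] plays the role of z1, z["C2"] of z2
updated = assign(points, z)

print(initial)
print(z)
print(updated == initial)  # True: the memberships did not change
```

When the reassignment returns the same memberships, the centroids will not move on the next step either, so the procedure can stop.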
5. Given the distance matrix below, answer the following questions. Notice that this is a
distance matrix, meaning the distance between any pair of points can be found by checking the
corresponding cell.
a.) Perform hierarchical clustering using single link measure for the above and draw the final 
dendrogram. 
Iteration 1
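
Each iteration of single-link clustering merges the two clusters whose closest members are nearest to each other. A minimal sketch of that loop is shown below. The four-point distance matrix is hypothetical, not the one from the homework, and the function name `single_link` is an assumption.

```python
def single_link(names, dist):
    """Agglomerative clustering with the single-link measure.

    dist maps frozenset({p, q}) to the distance between points p and q.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def link(a, b):
        # Single link: distance between the closest pair across two clusters.
        return min(dist[frozenset((p, q))] for p in a for q in b)

    while len(clusters) > 1:
        pairs = [
            (link(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        ]
        d, i, j = min(pairs)  # ties are broken by cluster position
        a, b = clusters[i], clusters[j]
        merges.append((sorted(a), sorted(b), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(a | b)
    return merges


# Hypothetical distance matrix, not the one from the homework.
names = ["a", "b", "c", "d"]
dist = {
    frozenset("ab"): 2, frozenset("ac"): 6, frozenset("ad"): 10,
    frozenset("bc"): 5, frozenset("bd"): 9, frozenset("cd"): 4,
}

for step, (left, right, d) in enumerate(single_link(names, dist), start=1):
    print(f"Iteration {step}: merge {left} and {right} at distance {d}")
```

The merge distances are the heights at which the corresponding branches join in the dendrogram.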

b.) Determine whether a point is core based on ϵ = 6 and minPts = 2.
To be considered core, a point must have at least 2 points within a distance of 6.
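
The core-point check can be sketched as follows. Whether a point counts itself among its neighbours varies between textbooks, so the sketch exposes that choice as a hypothetical `count_self` flag. The distance matrix is again hypothetical, not the homework's.

```python
def core_points(names, dist, eps=6, min_pts=2, count_self=True):
    """Return the points with at least min_pts points within distance eps.

    dist maps frozenset({p, q}) to the distance between points p and q.
    Assumption: with count_self=True a point is counted as its own
    neighbour (distance 0); set it to False to count other points only.
    """
    core = []
    for p in names:
        neighbours = [q for q in names if q != p and dist[frozenset((p, q))] <= eps]
        total = len(neighbours) + (1 if count_self else 0)
        if total >= min_pts:
            core.append(p)
    return core


# Hypothetical distance matrix, not the one from the homework.
names = ["a", "b", "c", "d"]
dist = {
    frozenset("ab"): 2, frozenset("ac"): 7, frozenset("ad"): 10,
    frozenset("bc"): 8, frozenset("bd"): 9, frozenset("cd"): 12,
}

print(core_points(names, dist))                    # ['a', 'b']
print(core_points(names, dist, count_self=False))  # []
```

On this toy matrix the two conventions give different answers, so it is worth stating which one is being used before marking points as core.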
