1. Please use the table below for questions 1 through 3. Note that the Count column is NOT an attribute. It only
tells how many times a row occurs in our database. status is our target variable.
[Table not recovered. Surviving column header: department]
2. Split your diabetes data into two parts for training and testing purposes. Namely, reserve
the last 10 rows of diabetes_train.csv for the test set. Then fit an SVM classifier on the larger
portion of the data and test it on the 10 rows you reserved.
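A minimal sketch of this split-and-fit workflow follows. It rests on assumptions not stated in the question: that diabetes_train.csv has a header row, that every column except the last is numeric, and that the last column holds the class label (for example tested_positive). The helper names are hypothetical, and the scaling step and RBF kernel are illustrative choices rather than the required setup.

```python
import csv


def load_rows(path):
    """Read a CSV with a header row; return (header, data_rows)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def split_last_n(rows, n=10):
    """Reserve the last n rows for testing; everything before is training."""
    if not 0 < n < len(rows):
        raise ValueError("n must be between 1 and len(rows) - 1")
    return rows[:-n], rows[-n:]


def to_xy(rows):
    """Assumption: the last column is the label, the rest are numeric."""
    X = [[float(v) for v in row[:-1]] for row in rows]
    y = [row[-1] for row in rows]
    return X, y


def fit_and_score(train_rows, test_rows):
    """Fit an SVM on the training rows and return accuracy on the test rows."""
    # Imported here so the helpers above work without scikit-learn installed.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X_train, y_train = to_xy(train_rows)
    X_test, y_test = to_xy(test_rows)
    # SVMs are sensitive to feature scale, so standardize before fitting.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```

With the real file in place, usage would look like `header, rows = load_rows("diabetes_train.csv")`, then `train, test = split_last_n(rows, 10)`, then `fit_and_score(train, test)`.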
41 ... 32.0 0.391 39 tested_negative
. 24.3 0.178 50 tested_positive
3. Draw the ROC curve based on the table below and fill in the empty columns based on the threshold
at each step.
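One way to fill in the TPR and FPR columns is to sweep the threshold over the observed scores and count true and false positives at each step. The sketch below assumes an instance is predicted positive when its score is at least the threshold, and it uses made-up scores and labels, since the original table is not available here. The function name is hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for each distinct score, highest first.

    labels are 1 for positive and 0 for negative; an instance is predicted
    positive when score >= threshold (an assumed convention).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


# Illustrative data only, not the values from the assignment.
example = roc_points([0.9, 0.8, 0.6, 0.4], [1, 0, 1, 0])
```

Plotting FPR on the horizontal axis against TPR on the vertical axis, starting from (0, 0), gives the ROC curve.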
[ROC table and plot not recovered. Surviving axis label: TPR]
4. Please use the data shown for question
a.) If h and c are selected as the initial centers for your k-means clustering, assign memberships
for other points, and compute the means (centroids) of your initial clusters. You can use
Manhattan distance.
We can see that the cluster with initial center c contains {c, e, f, g}, and the cluster with initial center h contains
{a, b, d, h}. From here on, C1 denotes the cluster with initial center c and C2 the cluster with initial center h.
To compute the means (centroids) of C1 and C2, we take the average x and y values of the points within each
cluster. We denote the centroid of C1 by z1 and the centroid of C2 by z2.
Reassigning each point to its nearest centroid gives clusters C1 = {c, e, f, g} and C2 = {a, b, d, h}, so our clusters are unchanged.
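The assignment and centroid steps used in this answer can be sketched as follows. The point names and coordinates are made up for illustration, because the original scatter plot is not available here, and the function names are hypothetical. Ties go to whichever center comes first, and an empty cluster would raise a division error; both are simplifications.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points, centers):
    """Map each center label to the sorted names of its nearest points."""
    clusters = {label: [] for label in centers}
    for name, xy in points.items():
        nearest = min(centers, key=lambda label: manhattan(xy, centers[label]))
        clusters[nearest].append(name)
    return {label: sorted(members) for label, members in clusters.items()}


def centroids(points, clusters):
    """Average the x and y values of the members of each cluster."""
    result = {}
    for label, members in clusters.items():
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        result[label] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return result


# Illustrative points only, not the ones from the assignment.
points = {"p": (0, 0), "q": (1, 0), "r": (5, 5), "s": (6, 5)}
initial = {"C1": points["p"], "C2": points["s"]}
first = assign(points, initial)      # memberships from the initial centers
z = centroids(points, first)         # centroids z1, z2 of those clusters
second = assign(points, z)           # reassign; stop when nothing changes
```

When `second` equals `first`, as it does for these sample points, the clustering has converged.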
5. Given the distance matrix below, answer the following questions. Note that this is a
distance matrix, meaning the distance between any pair of points can be found in the
cell corresponding to that pair.
a.) Perform hierarchical clustering on the matrix above using the single-link measure and draw the final
dendrogram.
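As a rough sketch of the single-link procedure: at each iteration, merge the two clusters whose closest pair of members is nearest, and record the merge height for the dendrogram. The distance values below are invented for illustration, since the original matrix is not available here, and the function name is hypothetical.

```python
from itertools import combinations


def single_link_merges(names, dist):
    """Return the merges as (cluster_a, cluster_b, height), in order.

    dist[p][q] is the distance between points p and q (symmetric).
    """
    clusters = [frozenset([n]) for n in names]

    def link(a, b):
        # Single link: distance between the closest pair across clusters.
        return min(dist[p][q] for p in a for q in b)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Illustrative distances only, not the matrix from the assignment.
dist = {
    "p": {"q": 1, "r": 4},
    "q": {"p": 1, "r": 3},
    "r": {"p": 4, "q": 3},
}
merges = single_link_merges(["p", "q", "r"], dist)
```

The heights in `merges` are the levels at which the dendrogram joins each pair of branches.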
Iteration 1
b.) Determine whether a point is core based on ϵ = 6 and minPts = 2.
For a point to be considered core, at least 2 points must lie within a distance of 6 of it.
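The core-point check can be sketched as a count over one row of the distance matrix. Whether a point counts itself toward minPts varies between textbooks, so the sketch exposes that as a flag. The flag, the function name, and the sample distances are all assumptions made for illustration.

```python
def core_points(names, dist, eps=6, min_pts=2, include_self=True):
    """Return the points with at least min_pts points within eps.

    include_self=True counts the point itself, as many DBSCAN
    descriptions do; set it to False to count only other points.
    """
    core = []
    for p in names:
        count = sum(1 for q in names if q != p and dist[p][q] <= eps)
        if include_self:
            count += 1
        if count >= min_pts:
            core.append(p)
    return core


# Illustrative distances only, not the matrix from the assignment.
dist = {
    "p": {"q": 2, "r": 9},
    "q": {"p": 2, "r": 7},
    "r": {"p": 9, "q": 7},
}
with_self = core_points(["p", "q", "r"], dist)
without_self = core_points(["p", "q", "r"], dist, include_self=False)
```

For these sample distances, p and q are core when each point counts itself, and no point is core when it does not.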