CS4342 - Assignment #6 - Solved

Conceptual and Theoretical Questions 

1. This problem involves hyperplanes in two dimensions.

(a)  Sketch the hyperplane 1 + 3 𝑋1 − 𝑋2 = 0. Indicate the set of points for which 1 + 3 𝑋1 − 𝑋2 > 0, as well as the set of points for which 1 + 3 𝑋1 − 𝑋2 < 0.

(b)  On the same plot, sketch the hyperplane −2 + 𝑋1 + 2 𝑋2 = 0. Indicate the set of points for which −2 + 𝑋1 + 2 𝑋2 > 0, as well as the set of points for which −2 + 𝑋1 + 2 𝑋2 < 0.

Points: 5 
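
The sketches in this question are meant to be drawn by hand. If you want to check them, the following is a minimal matplotlib sketch (not part of the assignment) that plots both hyperplanes as lines in the (𝑋1, 𝑋2)-plane; the plotting range is an arbitrary choice.

import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-3, 3, 100)

# 1 + 3 X1 - X2 = 0 is the line X2 = 1 + 3 X1;
# points below it satisfy 1 + 3 X1 - X2 > 0, points above it satisfy < 0.
plt.plot(x1, 1 + 3 * x1, label="1 + 3 X1 - X2 = 0")

# -2 + X1 + 2 X2 = 0 is the line X2 = (2 - X1) / 2;
# points above it satisfy -2 + X1 + 2 X2 > 0, points below it satisfy < 0.
plt.plot(x1, (2 - x1) / 2, label="-2 + X1 + 2 X2 = 0")

plt.xlabel("X1")
plt.ylabel("X2")
plt.legend()
plt.show()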

 

2. We have seen that in p = 2 dimensions, a linear decision boundary takes the form 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 = 0. We now investigate a non-linear decision boundary.

(a)  Sketch the curve

(1 + 𝑋1)² + (2 − 𝑋2)² = 4.

(b)  On your sketch, indicate the set of points for which (1 + 𝑋1)² + (2 − 𝑋2)² > 4, as well as the set of points for which (1 + 𝑋1)² + (2 − 𝑋2)² ≤ 4.

(c)  Suppose that a classifier assigns an observation to the blue class if (1 + 𝑋1)² + (2 − 𝑋2)² > 4, and to the red class otherwise. To what class is the observation (0, 0) classified? (−1, 1)? (2, 2)? (3, 8)?

(d)  Argue that while the decision boundary in (c) is not linear in terms of 𝑋1 and 𝑋2, it is linear in terms of 𝑋1, 𝑋2, 𝑋1², and 𝑋2².

Points: 5 
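
For part 2(d), a one-line expansion of the boundary equation makes the argument concrete:

(1 + 𝑋1)² + (2 − 𝑋2)² = 4  ⟺  1 + 2 𝑋1 − 4 𝑋2 + 𝑋1² + 𝑋2² = 0,

which has the form 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋1² + 𝛽4 𝑋2² = 0, i.e. it is linear in the variables 𝑋1, 𝑋2, 𝑋1², and 𝑋2², even though it is not linear in 𝑋1 and 𝑋2 alone.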

 

3. Here we explore the maximal margin classifier on a toy data set.

  

 

(a)  We are given n = 7 observations in p = 2 dimensions. For each observation, there is an associated class label:

Obs.  𝑋1  𝑋2  Y
 1     3   4   Red
 2     2   2   Red
 3     4   4   Red
 4     1   4   Red
 5     2   1   Blue
 6     4   3   Blue
 7     4   1   Blue

Sketch the observations.

(b)  Sketch the optimal separating hyperplane, and provide the equation for this hyperplane (of the form 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 = 0).

(c)  Describe the classification rule for the maximal margin classifier.

It should be something along the lines of “Classify to Red if 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 > 0, and classify to Blue otherwise.” Provide the values for 𝛽0, 𝛽1, and 𝛽2.

(d)  On your sketch, indicate the margin for the maximal margin hyperplane.

(e)  Indicate the support vectors for the maximal margin classifier.

(f)   Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.

(g)  Sketch a hyperplane that is not the optimal separating hyperplane, and provide the equation for this hyperplane.

(h)  Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.

Points: 5 
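
Question 3 is meant to be answered with a hand-drawn sketch. If you want to verify your answer, the following is a minimal scikit-learn sketch; it assumes the seven observations in the table above and uses a very large C so that SVC behaves like a hard-margin (maximal margin) classifier on the separable data.

import numpy as np
from sklearn.svm import SVC

# Observations from the table above; +1 encodes Red, -1 encodes Blue.
X = np.array([[3, 4], [2, 2], [4, 4], [1, 4], [2, 1], [4, 3], [4, 1]])
y = np.array([1, 1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (maximal margin) classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("beta1, beta2:", clf.coef_[0])    # coefficients of X1 and X2
print("beta0:", clf.intercept_[0])      # intercept
print("support vectors:", clf.support_vectors_)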

  

4. Suppose that we have four observations, for which we compute the dissimilarity matrix

        0      0.3    0.4    0.7
        0.3    0      0.5    0.8
        0.4    0.5    0      0.45
        0.7    0.8    0.45   0

For instance, the dissimilarity between the first and second observations is 0.3, and the dissimilarity between the second and fourth observations is 0.8.

(a)  On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering these four observations using complete linkage. Be sure to indicate on the plot the height at which each fusion occurs, as well as the observations corresponding to each leaf in the dendrogram.

(b)  Repeat (a), this time using single linkage clustering.

(c)  Suppose that we cut the dendrogram obtained in (a) such that two clusters result. Which observations are in each cluster?

(d)  Suppose that we cut the dendrogram obtained in (b) such that two clusters result. Which observations are in each cluster?

(e)  It is mentioned in the chapter that at each fusion in the dendrogram, the position of the two clusters being fused can be swapped without changing the meaning of the dendrogram. Draw a dendrogram that is equivalent to the dendrogram in (a), for which two or more of the leaves are repositioned, but for which the meaning of the dendrogram is the same.

Points: 5 
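
The dendrograms in question 4 are also meant to be drawn by hand. A minimal scipy sketch for checking them, assuming the dissimilarity matrix shown above, is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Dissimilarity matrix from the problem statement.
D = np.array([[0.0,  0.3,  0.4,  0.7],
              [0.3,  0.0,  0.5,  0.8],
              [0.4,  0.5,  0.0,  0.45],
              [0.7,  0.8,  0.45, 0.0]])

d = squareform(D)  # condensed form expected by linkage()

for method in ("complete", "single"):      # parts (a) and (b)
    Z = linkage(d, method=method)
    dendrogram(Z, labels=[1, 2, 3, 4])
    plt.title(method + " linkage")
    plt.show()
    # Parts (c) and (d): cut each dendrogram so that two clusters result.
    print(method, fcluster(Z, t=2, criterion="maxclust"))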

 

5. In this problem, you will perform K-means clustering manually, with K = 2, on a small example with n = 6 observations and p = 2 features. The observations are as follows.

Obs.  𝑋1  𝑋2
 1     1   4
 2     1   3
 3     0   4
 4     5   1
 5     6   2
 6     4   0

(a)  Plot the observations.

(b)  Randomly assign a cluster label to each observation. Report the cluster labels for each observation.

(c)  Compute the centroid for each cluster.

(d)  Assign each observation to the centroid to which it is closest, in terms of Euclidean distance. Report the cluster labels for each observation.

(e)  Repeat (c) and (d) until the answers obtained stop changing.

(f)   In your plot from (a), color the observations according to the cluster labels obtained.

Points: 5 
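
A minimal numpy sketch of the manual K-means iteration in question 5; the balanced random initial assignment and the seed are arbitrary choices made to keep the sketch short.

import numpy as np

# Observations from the table above.
X = np.array([[1, 4], [1, 3], [0, 4], [5, 1], [6, 2], [4, 0]], dtype=float)

rng = np.random.default_rng(0)
labels = rng.permutation([0, 0, 0, 1, 1, 1])   # (b) random (balanced) initial labels

while True:
    # (c) centroid of each cluster
    centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    # (d) reassign each observation to its closest centroid (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    # (e) stop once the assignments no longer change
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels

print("final labels:", labels)
print("final centroids:", centroids)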

 

6. Suppose that for a particular data set, we perform hierarchical clustering using single linkage and using complete linkage. We obtain two dendrograms.

(a) At a certain point on the single linkage dendrogram, the clusters {1, 2, 3} and {4, 5} fuse. On the complete linkage dendrogram, the clusters {1, 2, 3} and {4, 5} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?

(b) At a certain point on the single linkage dendrogram, the clusters {5} and {6} fuse. On the complete linkage dendrogram, the clusters {5} and {6} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?

Points: 5 

 

7. In words, describe the results that you would expect if you performed K-means clustering of the eight shoppers in Figure 10.14 (shown below), on the basis of their sock and computer purchases, with K = 2. Give three answers, one for each of the variable scalings displayed. Explain.

[Figure 10.14: the eight shoppers' sock and computer purchases, displayed under the three variable scalings.]

Points: 5                                                 


Applied Questions 

1. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

(a)  Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

import numpy as np

X1 = np.random.uniform(size=500) - 0.5
X2 = np.random.uniform(size=500) - 0.5
y = 1 * (X1**2 - X2**2 > 0)

(See https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html)

(b)  Plot the observations, colored according to their class labels. Your plot should display 𝑋1 on the x-axis, and 𝑋2 on the y-axis.

(c)  Fit a logistic regression model to the data, using 𝑋1 and 𝑋2 as predictors.

(d)  Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

(e)  Now fit a logistic regression model to the data using non-linear functions of 𝑋1 and 𝑋2 as predictors (e.g. 𝑋1², 𝑋1 × 𝑋2, log(𝑋2), and so forth).

(f)   Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

(g)  Fit a support vector classifier to the data with 𝑋1 and 𝑋2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

(h)  Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

(i)   Comment on your results.

Hint: 

-      Check https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html 

Points: 15 
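
A minimal sketch of one possible workflow for this question, using numpy, matplotlib, and scikit-learn. The particular non-linear features used in (e) and the RBF gamma used in (h) are arbitrary choices, not requirements of the assignment.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# (a) two classes separated by a quadratic boundary
X1 = rng.uniform(size=500) - 0.5
X2 = rng.uniform(size=500) - 0.5
y = 1 * (X1**2 - X2**2 > 0)
X = np.column_stack([X1, X2])

def plot_labels(labels, title):
    plt.scatter(X1, X2, c=labels, cmap="coolwarm", s=10)
    plt.xlabel("X1"); plt.ylabel("X2"); plt.title(title); plt.show()

plot_labels(y, "(b) true classes")

# (c)-(d) logistic regression on X1, X2: linear decision boundary
lin_lr = LogisticRegression().fit(X, y)
plot_labels(lin_lr.predict(X), "(d) logistic regression, linear features")

# (e)-(f) logistic regression on non-linear transformations of the features
X_nl = np.column_stack([X1, X2, X1**2, X2**2, X1 * X2])
nl_lr = LogisticRegression().fit(X_nl, y)
plot_labels(nl_lr.predict(X_nl), "(f) logistic regression, quadratic features")

# (g) support vector classifier (linear kernel)
svc = SVC(kernel="linear").fit(X, y)
plot_labels(svc.predict(X), "(g) linear SVC")

# (h) SVM with a non-linear (RBF) kernel
svm = SVC(kernel="rbf", gamma=1).fit(X, y)
plot_labels(svm.predict(X), "(h) RBF-kernel SVM")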

 

2. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

(a)  Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

(b)  Fit a support vector classifier to the data with the linear kernel, in order to predict whether a car gets high or low gas mileage. Report the cross-validation error. Comment on your results.

(c)  Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree. Comment on your results.

(d)  Make some plots to back up your assertions in (b) and (c).

Hints: 

-      Check https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html 

-      Check https://scikit-learn.org/0.18/auto_examples/svm/plot_iris.html 

Points: 15 
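
A minimal sketch of one possible approach. It assumes the ISLR Auto data has been saved locally as Auto.csv (a hypothetical path) with the usual columns; the 5-fold cross-validation and the small grids of gamma and degree values are arbitrary choices.

import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Assumes the Auto data is saved locally as "Auto.csv" (hypothetical path);
# missing horsepower values are coded as "?".
auto = pd.read_csv("Auto.csv", na_values="?").dropna()

# (a) binary response: 1 if mpg is above the median, else 0
auto["mpg01"] = (auto["mpg"] > auto["mpg"].median()).astype(int)
X = auto.drop(columns=["mpg", "mpg01", "name"])
y = auto["mpg01"]

# (b) linear support vector classifier, 5-fold cross-validation error
linear = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))
print("linear CV error:", 1 - cross_val_score(linear, X, y, cv=5).mean())

# (c) radial and polynomial kernels with a few gamma / degree values
for gamma in (0.01, 0.1, 1):
    rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma))
    print("rbf gamma=", gamma, "CV error:", 1 - cross_val_score(rbf, X, y, cv=5).mean())
for degree in (2, 3, 4):
    poly = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=degree))
    print("poly degree=", degree, "CV error:", 1 - cross_val_score(poly, X, y, cv=5).mean())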

 

 

3. Consider the USArrests data. We will now perform hierarchical clustering on the states.

(a)  Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.

(b)  Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?

(c)  Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.

(d)  What effect does scaling the variables have on the hierarchical clustering obtained? In your opinion, should the variables be scaled before the inter-observation dissimilarities are computed? Provide a justification for your answer.

Hints: 

-      Check https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html 

Points: 15 
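
A minimal scipy sketch for this question. It assumes the USArrests data has been saved locally as USArrests.csv with the state names in the first column (a hypothetical path).

import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.stats import zscore

# Assumes the USArrests data is saved locally as "USArrests.csv"
# with the state names in the first column (hypothetical path).
usarrests = pd.read_csv("USArrests.csv", index_col=0)

# (a) complete linkage with Euclidean distance on the raw variables
Z = linkage(usarrests.values, method="complete", metric="euclidean")
dendrogram(Z, labels=usarrests.index.to_list())
plt.show()

# (b) cut the dendrogram so that three clusters result
print(pd.Series(fcluster(Z, t=3, criterion="maxclust"), index=usarrests.index))

# (c) same clustering after scaling each variable to standard deviation one
Z_scaled = linkage(zscore(usarrests.values), method="complete", metric="euclidean")
print(pd.Series(fcluster(Z_scaled, t=3, criterion="maxclust"), index=usarrests.index))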

 

 

4. In this problem, you will generate simulated data, and then perform PCA and K-means clustering on the data.

(a)  Generate a simulated data set with 20 observations in each of three classes (i.e. 60 observations total), and 50 variables. Use uniformly or normally distributed samples.

(b)  Perform PCA on the 60 observations and plot the first two principal component score vectors. Use a different color to indicate the observations in each of the three classes. If the three classes appear separated in this plot, then continue on to part (c). If not, then return to part (a) and modify the simulation so that there is greater separation between the three classes. Do not continue to part (c) until the three classes show at least some separation in the first two principal component score vectors. Hint: you can assign different means to different classes to create separate clusters.

(c)  Perform K-means clustering of the observations with K = 3. How well do the clusters that you obtained in K-means clustering compare to the true class labels?

(d)  Perform K-means clustering with K = 2. Describe your results.

(e)  Now perform K-means clustering with K = 4, and describe your results.

(f)   Now perform K-means clustering with K = 3 on the first two principal component score vectors, rather than on the raw data. That is, perform K-means clustering on the 60 × 2 matrix whose first column is the first principal component score vector and whose second column is the second principal component score vector. Comment on the results.

(g)  Using the z-score function to scale your variables, perform K-means clustering with K = 3 on the data after scaling each variable to have standard deviation one. How do these results compare to those obtained in (b)? Explain.

Hints: 

-   Check https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html 

-   Check https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html 

-   Check https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html 

Points: 25 
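
A minimal sketch of one possible approach. The class means, the random seed, and the KMeans settings below are arbitrary choices; shifting the mean of each class is one simple way to create the separation asked for in (b).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# (a) 60 observations, 50 variables, three classes with different means
means = [0.0, 3.0, 6.0]
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(20, 50)) for m in means])
true_labels = np.repeat([0, 1, 2], 20)

# (b) first two principal component score vectors
scores = PCA(n_components=2).fit_transform(X)
plt.scatter(scores[:, 0], scores[:, 1], c=true_labels)
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()

# (c)-(e) K-means with K = 3, 2, 4 on the raw data
for k in (3, 2, 4):
    km = KMeans(n_clusters=k, n_init=20, random_state=0).fit(X)
    print("K =", k, ":", km.labels_)

# (f) K-means with K = 3 on the 60 x 2 matrix of principal component scores
print(KMeans(n_clusters=3, n_init=20, random_state=0).fit(scores).labels_)

# (g) K-means with K = 3 after z-scoring each variable
print(KMeans(n_clusters=3, n_init=20, random_state=0).fit(zscore(X)).labels_)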
