CECS551 – Programming Assignment 3

1.    Write an R function called partition that takes as input a data frame df and a real number α ∈ (0,1), and returns a list consisting of two data frames df1 and df2, where df1 consists of ⌊αr⌋ randomly selected rows (without replacement) of df, and df2 consists of the remaining unselected rows of df, where r denotes the number of rows of df. For example, if r = 100 and α = 0.4, then df1 will consist of 40 randomly selected rows of df, and df2 will consist of the other 60 rows.
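The partition described above can be sketched in base R as follows (a minimal sketch; the `drop = FALSE` arguments simply preserve the data-frame shape for one-column frames):

```r
# partition: split df into floor(alpha * r) randomly chosen rows (df1)
# and the remaining rows (df2), sampling without replacement.
partition <- function(df, alpha) {
  r   <- nrow(df)
  k   <- floor(alpha * r)                        # size of df1
  idx <- sample(seq_len(r), k, replace = FALSE)  # rows selected for df1
  list(df1 = df[idx, , drop = FALSE],
       df2 = df[-idx, , drop = FALSE])
}
```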

2.    Implement the R function

best_svm(df, alpha, degree, cost)

which works as follows. First, partition(df,alpha) is called to obtain two data frames df1 and df2 from df, where df1 will serve as the training set, and df2 will serve as the test set. Now, for each (d,c) combination, where d is a member of vector degree, and c is a member of vector cost, function

svm(Class~., data = df1, kernel="polynomial", degree = d, type = "C-classification",cost=c)

is called to create a support-vector machine model, where we assume that “Class” is the name of the data-frame attribute that represents the class label of each vector in the frame. The model is then tested against data set df2. Finally, best_svm returns a list having the three attributes d, c, and accuracy, giving the d and c values of the model demonstrating the highest accuracy against the test set, along with that accuracy. Note: make sure that df1 and df2 remain fixed across all learning models.
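One possible sketch of best_svm, assuming the svm() function from the e1071 package (which the call above implies) and a partition function as in Exercise 1 (inlined here so the sketch is self-contained). The split is made once, so every (d, c) model trains and tests on the same data:

```r
library(e1071)

partition <- function(df, alpha) {
  idx <- sample(nrow(df), floor(alpha * nrow(df)))
  list(df1 = df[idx, ], df2 = df[-idx, ])
}

best_svm <- function(df, alpha, degree, cost) {
  parts <- partition(df, alpha)         # fixed df1/df2 for all models
  train <- parts$df1
  test  <- parts$df2
  best  <- list(d = NA, c = NA, accuracy = -Inf)
  for (d in degree) {
    for (c in cost) {
      model <- svm(Class ~ ., data = train, kernel = "polynomial",
                   degree = d, type = "C-classification", cost = c)
      acc <- mean(predict(model, test) == test$Class)
      if (acc > best$accuracy) best <- list(d = d, c = c, accuracy = acc)
    }
  }
  best
}
```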

3.    Review and download the Car data set at

https://archive.ics.uci.edu/ml/datasets/Car+Evaluation

Place attribute names in line one of the file, making sure that Class is the name of the final attribute. Load the data set into a data frame df, and call best_svm on inputs df, α = 0.8, degree = (1,2,3,4), and cost = (0.1,1.0,10.0,100.0,1000.0,10000.0). Call best_svm 10 different times on these inputs, and provide a table (as an R comment) showing the d, c, and accuracy values of each call.

4.    By setting the C-classification cost to 10^5 (meaning very little to no slack allowance), determine the least degree value d that can produce a (nonlinear) model that can attain 100% accuracy for the entire data set.
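This degree search can be sketched as a simple loop, assuming e1071's svm() and a data frame whose class attribute is named Class; the function name and the max_degree cutoff are illustrative choices, not part of the assignment:

```r
library(e1071)

# Return the smallest polynomial degree whose near-hard-margin model
# (cost = 1e5) classifies every row of df correctly, or NA if none
# up to max_degree does.
least_perfect_degree <- function(df, max_degree = 10) {
  for (d in seq_len(max_degree)) {
    model <- svm(Class ~ ., data = df, kernel = "polynomial",
                 degree = d, type = "C-classification", cost = 1e5)
    if (mean(predict(model, df) == df$Class) == 1) return(d)
  }
  NA
}
```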

5.    Repeat Exercise 2, but instead implement the function

best.svm.cross(df, degree, cost, n)

This function works similarly to the previous one, but now, instead of partitioning df, cross-validation is performed on the entire data set using n > 0 folds. Thus, the call to svm is now

svm(Class~., data = df, kernel="polynomial", degree = d, type = "C-classification",cost=c, cross = n)

Finally, best.svm.cross returns a list having the three attributes d, c, and accuracy, giving the d and c values of the model demonstrating the highest average cross-validation accuracy, along with the value of that average accuracy (obtained from the model via model$tot.accuracy).
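A sketch of this cross-validation variant, again assuming e1071; note that model$tot.accuracy is reported by e1071 as a percentage, not a fraction:

```r
library(e1071)

best.svm.cross <- function(df, degree, cost, n) {
  best <- list(d = NA, c = NA, accuracy = -Inf)
  for (d in degree) {
    for (c in cost) {
      model <- svm(Class ~ ., data = df, kernel = "polynomial",
                   degree = d, type = "C-classification",
                   cost = c, cross = n)
      acc <- model$tot.accuracy   # average n-fold CV accuracy (percent)
      if (acc > best$accuracy) best <- list(d = d, c = c, accuracy = acc)
    }
  }
  best
}
```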

6.    Apply your function best.svm.cross to the Wisconsin Breast Cancer data set found at

https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29

Place attribute names in line one of the file, making sure that Class is the name of the final attribute. Load the data set into a data frame df, and call best.svm.cross on inputs df, degree = (1,2,3,4), cost = (0.1,1.0,10.0,100.0,1000.0,10000.0), and n = 10. In a comment, report on your findings.

7.    Implement the R function

bootstrap(df, model, p, n)

that takes as input a data frame df, an svm model trained on df, a probability p, and a positive integer n. Function bootstrap then makes n bootstrap samples of df and, for each sample S, determines the accuracy of model tested against S. The n accuracies are then sorted and used to create a confidence interval I so that, with probability p, the model accuracy lies within I. Finally, bootstrap returns a list having the two attributes lower and upper that store the lower and upper bounds of the confidence interval.
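This can be sketched with the percentile method in base R, assuming model works with predict() and that df$Class holds the true labels; each bootstrap sample draws nrow(df) rows with replacement:

```r
# bootstrap: percentile confidence interval for model accuracy.
# With probability p, the accuracy lies in [lower, upper].
bootstrap <- function(df, model, p, n) {
  accs <- replicate(n, {
    S <- df[sample(nrow(df), replace = TRUE), ]  # one bootstrap sample
    mean(predict(model, S) == S$Class)
  })
  accs <- sort(accs)
  tail_prob <- (1 - p) / 2                       # mass cut from each tail
  list(lower = unname(quantile(accs, tail_prob)),
       upper = unname(quantile(accs, 1 - tail_prob)))
}
```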

Use function bootstrap on the Wisconsin Breast Cancer data set, with p = 0.90 and n = 100. For the model, train an svm on the entire data set using the values for d and c as reported in Exercise 6. In a comment, provide the 90% confidence interval for the accuracy.
