CECS551-Programming Assignment 1 Solved

Mammographic Mass Data Set
Review and download the mammographic mass data set at

https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass

In examining the data, notice that some datapoints have missing attribute values; in these cases “?” is substituted for each missing value. Add a header line to the csv file that labels the attributes as Birads, Age, Shape, Margin, Density, and Severity.
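As an alternative to editing the file by hand, the column labels and the missing-value marker can be supplied when the file is read. The sketch below is only an illustration under stated assumptions: the five data rows are a made-up miniature written to a temporary file, not the downloaded data set, and the name mammogram.frame is just a convenient choice.

```r
# Made-up miniature of the raw file: no header line, "?" marks a missing value.
raw_text <- c(
  "5,67,3,5,3,1",
  "4,43,1,1,?,1",
  "5,58,4,5,3,1",
  "4,28,1,1,3,0",
  "5,?,1,?,3,0"
)
tmp_path <- tempfile(fileext = ".csv")
writeLines(raw_text, tmp_path)

# Attribute labels requested by the assignment.
attribute_names <- c("Birads", "Age", "Shape", "Margin", "Density", "Severity")

# header = FALSE because the raw file has no header line;
# na.strings = "?" turns every "?" field into NA while reading.
mammogram.frame <- read.csv(tmp_path, header = FALSE,
                            col.names = attribute_names,
                            na.strings = "?")

print(mammogram.frame)
print(colSums(is.na(mammogram.frame)))  # number of NA values per attribute
```

With the real file, tmp_path would be replaced by the path of the downloaded data.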

Exercises
1.    In R the more appropriate indicator for missing data is “NA” (not available). Therefore, replace each occurrence of “?” with “NA”.

a.     For this exercise, create an R data frame for the mammographic data using only datapoints that have no missing values. This can be done using the complete.cases function, which takes a data frame and returns a Boolean vector v, where v[i] equals TRUE if and only if the i-th data-frame sample is complete (meaning it contains no NA). For example, if the data frame is stored in mammogram.frame, then

mammogram2.frame = mammogram.frame[complete.cases(mammogram.frame),]

creates a new data frame called mammogram2.frame that contains all the complete mammogram data samples.

b.     Use R’s summary function to provide a statistical summary of each attribute of the altered data frame.

c.      Use the svm function from the e1071 package to construct a linear classifier for the data set. Report the percentage of datapoints that are correctly classified by the svm model.

d.     Repeat part c) using a degree-2 polynomial classifier. This particular type of svm can be constructed using the input options kernel = "polynomial", degree = 2, type = "C-classification"
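Parts c and d differ only in the kernel options passed to svm, so one way to organize them is a small helper that fits a model and reports the percentage of correctly classified datapoints. The following is a minimal sketch, not a reference solution: it assumes the e1071 package is installed, toy.frame is made-up data standing in for the cleaned mammogram data frame, and training_accuracy is a hypothetical helper name.

```r
library(e1071)  # assumption: installed beforehand via install.packages("e1071")

set.seed(551)  # arbitrary seed so the made-up data is reproducible

# Made-up stand-in for the cleaned mammogram data frame (not the real data).
n <- 300
toy.frame <- data.frame(
  Age    = round(runif(n, 20, 90)),
  Shape  = sample(1:4, n, replace = TRUE),
  Margin = sample(1:5, n, replace = TRUE)
)
# Invented labelling rule with noise, only so there is something to learn.
score <- 0.05 * (toy.frame$Age - 55) + 0.6 * (toy.frame$Shape - 2.5) + rnorm(n)
toy.frame$Severity <- factor(as.integer(score > 0))

# Hypothetical helper: fit an svm with the given kernel options and return
# the percentage of samples in `frame` that the fitted model gets right.
training_accuracy <- function(frame, ...) {
  model <- svm(Severity ~ ., data = frame, type = "C-classification", ...)
  predicted <- predict(model, frame)
  100 * mean(predicted == frame$Severity)
}

linear.accuracy <- training_accuracy(toy.frame, kernel = "linear")
poly2.accuracy  <- training_accuracy(toy.frame, kernel = "polynomial", degree = 2)

cat(sprintf("linear kernel:       %.1f%% correct\n", linear.accuracy))
cat(sprintf("degree-2 polynomial: %.1f%% correct\n", poly2.accuracy))
```

Note that the percentage is measured on the same datapoints used to fit the model, which matches what the exercise asks for but tends to be more optimistic than accuracy on held-out data.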

2.    Repeat each part of the previous exercise, but, for Part a, instead of removing datapoints with missing attribute values, replace each NA with the nominal value -1 and use the entire data set.
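The two ways of handling missing values in Exercises 1 and 2 can be compared side by side on a small example. The sketch below uses a made-up three-attribute frame (toy.frame and the derived names are illustrative, not part of the assignment) to show what each strategy keeps.

```r
# Made-up frame with a few missing values (not the real data).
toy.frame <- data.frame(
  Age      = c(67, 43, NA, 28, 74),
  Density  = c(3, NA, 3, 3, NA),
  Severity = c(1, 1, 1, 0, 1)
)

# Exercise 1 strategy: keep only the rows that contain no NA.
keep <- complete.cases(toy.frame)
toy.complete <- toy.frame[keep, ]

# Exercise 2 strategy: keep every row, replacing each NA with -1.
toy.filled <- toy.frame
toy.filled[is.na(toy.filled)] <- -1

print(keep)                 # TRUE FALSE FALSE TRUE FALSE
print(nrow(toy.complete))   # 2
print(nrow(toy.filled))     # 5
print(summary(toy.filled))  # the -1 entries now count as ordinary values
```

Because -1 is treated as an ordinary number, it lowers the minimum and mean that summary reports for the affected attributes, which is worth keeping in mind when comparing the summaries from the two exercises.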
