CS60050-Assignment 2 Solved

1)  Randomly divide the data into 80% for training and 20% for testing. Apply the following:

a)     Handle the missing values in both train and test set.

b)     Encode categorical variables using appropriate encoding method (in-built function allowed).

c)     After completing steps (a) and (b), compute 5-fold cross validation on the training set (normalisation of data is allowed, if required). Print the final test accuracy.
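
One possible way to organise step (1) is sketched below. The assignment's dataset and classifier are not stated here, so the data frame, its column names (`age`, `income`, `city`), the median / most-frequent imputation, one-hot encoding and the logistic regression model are all assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical columns, not the assignment's dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["A", "B", "C"], n),
})
y = (df["age"] + rng.normal(0, 5, n) > 40).astype(int)

# Inject some missing values so that step (a) has something to handle.
df.loc[rng.choice(n, 20, replace=False), "income"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "city"] = np.nan

numeric = ["age", "income"]
categorical = ["city"]

# (a) impute, (b) encode; scaling is the optional normalisation.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),  # assumed classifier
])

# Random 80% / 20% split.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

# (c) 5-fold cross validation on the training set, then the test accuracy.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("CV accuracy per fold:", cv_scores)
print("Mean CV accuracy:", cv_scores.mean())
print("Test accuracy:", test_accuracy)
```

Keeping the imputers and the encoder inside the pipeline means their statistics are learned from the training rows only and then reused on the test rows, so both sets are handled consistently.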

2)  Apply PCA (select number of components by preserving 95% of total variance) on the processed data from step (1).  

a)     Plot the graph for PCA (in-built function allowed for PCA and visualisation).  

b)     Use the features extracted from PCA to train your model. Compute 5-fold cross validation on the training set (normalisation of data is allowed, if required). Print the final test accuracy.  
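
A minimal sketch of step (2) follows, again on synthetic data because the processed data from step (1) is not available here. The data generator, the standardisation step and the logistic regression classifier are assumptions. Passing a fraction to scikit-learn's `PCA` as `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data standing in for the processed data of step (1).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),              # preserve 95% of the variance
    ("clf", LogisticRegression(max_iter=1000)),   # assumed classifier
])

# 5-fold cross validation on the training set, then the test accuracy.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# Cumulative explained variance: the curve to plot for step (a).
pca = pipe.named_steps["pca"]
cumulative = np.cumsum(pca.explained_variance_ratio_)

print("Components kept:", pca.n_components_)
print("Cumulative explained variance:", cumulative)
print("Mean CV accuracy:", cv_scores.mean())
print("Test accuracy:", test_accuracy)
```

For the plot in step (a), `cumulative` can be drawn against the component index with any plotting library, for example matplotlib's `plt.plot`, with a horizontal line at 0.95 to mark the cut-off.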

3)  Using the processed data from step (1), apply the following:

a)     A feature value is considered an outlier if it is greater than mean + 3 × standard deviation of that feature. The sample having the largest number of such outlier features must be dropped.

b)     Using the sequential backward selection method, remove features.  

c)     Print the final set of features formed.

d)     Compute 5-fold cross validation on the training set (normalisation of data is allowed, if required). Print the final test accuracy. Report the results.
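
Step (3) could be sketched as follows. Several choices are assumptions, since the assignment leaves them open: rows tied for the largest outlier count are all dropped in a single pass, the rule is applied to the training split only, backward selection stops at 4 features, and the classifier is logistic regression. The helper `drop_max_outlier_samples` and the feature names `f0` to `f7` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split


def drop_max_outlier_samples(X, y):
    """Drop the row(s) with the largest count of values above mean + 3 * std."""
    threshold = X.mean(axis=0) + 3 * X.std(axis=0)   # one threshold per feature
    counts = (X > threshold).sum(axis=1)             # outlier features per row
    if counts.max() == 0:                            # nothing to drop
        return X, y
    keep = counts < counts.max()
    return X[keep], y[keep]


# Synthetic numeric data standing in for the processed data of step (1).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_redundant=0, random_state=0
)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# (a) outlier rule.
X_train, y_train = drop_max_outlier_samples(X_train, y_train)

# (b) sequential backward selection down to an assumed target of 4 features.
clf = LogisticRegression(max_iter=1000)
sbs = SequentialFeatureSelector(
    clf, n_features_to_select=4, direction="backward", cv=5
)
sbs.fit(X_train, y_train)

# (c) final set of features.
selected = feature_names[sbs.get_support()]
print("Selected features:", list(selected))

# (d) 5-fold cross validation on the training set, then the test accuracy.
X_train_sel = sbs.transform(X_train)
X_test_sel = sbs.transform(X_test)
cv_scores = cross_val_score(clf, X_train_sel, y_train, cv=5)
clf.fit(X_train_sel, y_train)
test_accuracy = clf.score(X_test_sel, y_test)
print("Mean CV accuracy:", cv_scores.mean())
print("Test accuracy:", test_accuracy)
```

If the intended reading of step (a) is to repeat the rule until no outliers remain, the helper can be called in a loop until it returns the data unchanged.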
