$30
Task 6: Machine Learning Support Vector Machine Classifier
Problem Statement:
Spam email classification using a Support Vector Machine: In this assignment you will use an SVM to classify emails into spam and non-spam categories, and report the classification accuracy for various SVM parameters and kernel functions.
Data Set Description:
An email is represented by various features, such as the frequency of occurrence of certain keywords, the length of capitalized words, etc. A data set containing about 4601 instances is available at this link (data folder): Link
The data format is also described at the above link. You have to randomly pick 70% of the data set as training data and use the remaining 30% as test data.
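The random 70/30 split described above could be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_70_30`, the fixed seed, and the placeholder rows are assumptions made for the example, not part of the assignment.

```python
import random


def split_70_30(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test).

    `seed` is fixed only so the split is reproducible (an assumption;
    the assignment just asks for a random split).
    """
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for the email feature vectors (hypothetical).
rows = list(range(4601))
train, test = split_70_30(rows)
print(len(train), len(test))
```

Every instance ends up in exactly one of the two sets, so the test accuracy is measured on emails the classifier never saw during training.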
Assignment Tasks:
In this assignment you may use any SVM package to classify the above data set. You should use one of the following languages: C/C++/Java/Python. You have to study the performance of the SVM algorithms.
Report Format:
The report should contain the following sections:
Library: Mention the library that you are using.
Methodology: Details of the SVM package used.
Experimental Results: You have to use each of the following three kernel functions: (a) Linear, (b) Quadratic, (c) RBF.
For each of the kernels, you have to report the training and test set classification accuracy for the best value of the generalization constant C. The best C value is the one that gives the best test set accuracy among the different values of C that you tried. Report accuracies in the form of a
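One way to organize the experiment described above is sketched here with scikit-learn, which is only one possible SVM package. Several things are assumptions made for illustration: the synthetic data from `make_classification` (a stand-in for the real email features), the grid `C_VALUES`, the feature standardization step, and the choice to model the quadratic kernel as a degree-2 polynomial kernel with scikit-learn's default `gamma` and `coef0`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the email data (assumption, not the real data set).
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

# The three kernels from the assignment; "quadratic" is taken to mean a
# polynomial kernel of degree 2 (assumption).
KERNELS = {
    "linear": {"kernel": "linear"},
    "quadratic": {"kernel": "poly", "degree": 2},
    "rbf": {"kernel": "rbf"},
}
C_VALUES = [0.01, 0.1, 1, 10, 100]  # hypothetical trial values


def sweep_kernels(X_train, y_train, X_test, y_test):
    """Return {kernel: (best_C, train_acc, test_acc)}, choosing best_C by
    test accuracy as the assignment specifies."""
    results = {}
    for name, params in KERNELS.items():
        best = None
        for C in C_VALUES:
            # Scaling is an added preprocessing step (assumption).
            model = make_pipeline(StandardScaler(), SVC(C=C, **params))
            model.fit(X_train, y_train)
            train_acc = model.score(X_train, y_train)
            test_acc = model.score(X_test, y_test)
            if best is None or test_acc > best[2]:
                best = (C, train_acc, test_acc)
        results[name] = best
    return results


results = sweep_kernels(X_train, y_train, X_test, y_test)
for name, (C, train_acc, test_acc) in results.items():
    print(f"{name:10s} C={C:<6} train={train_acc:.3f} test={test_acc:.3f}")
```

Keeping the best `(C, train, test)` triple per kernel yields one row per kernel for the report. To run this on the real data, replace the synthetic `X` and `y` with the loaded features and labels.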
For Reference:
Support Vector Machines