Total Credit: 30 (Implementation: 20; Documentation & Explanation: 10)
Problem: Design a spam filter based on the Naive Bayes classifier. You have to implement both the multinomial and the multivariate versions of the classifier. To avoid zero counts, make sure you also implement add-one smoothing.
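The two variants and the smoothing step can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes "multivariate" means the multivariate Bernoulli event model (word presence or absence per message), and the names NaiveBayes and tokenize, the whitespace tokenizer, and the choice to skip words never seen in training are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Naive Bayes with add-one smoothing; mode is 'multinomial' or 'bernoulli'."""

    def __init__(self, mode="multinomial"):
        if mode not in ("multinomial", "bernoulli"):
            raise ValueError("mode must be 'multinomial' or 'bernoulli'")
        self.mode = mode

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = Counter(labels)            # messages per class
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            # Multinomial counts every occurrence; Bernoulli counts each
            # word at most once per message (document frequency).
            if self.mode == "bernoulli":
                tokens = set(tokens)
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        return self

    def _log_likelihood(self, text, c):
        tokens = tokenize(text)
        if self.mode == "multinomial":
            # Add-one smoothing: (count + 1) / (class tokens + |V|).
            denom = sum(self.counts[c].values()) + len(self.vocab)
            # Assumption: words never seen in training are skipped.
            return sum(math.log((self.counts[c][t] + 1) / denom)
                       for t in tokens if t in self.vocab)
        present = set(tokens)
        ll = 0.0
        for t in self.vocab:
            # Add-one smoothing: (docs with word + 1) / (class docs + 2).
            p = (self.counts[c][t] + 1) / (self.n_docs[c] + 2)
            ll += math.log(p if t in present else 1.0 - p)
        return ll

    def predict(self, text):
        def score(c):
            prior = math.log(self.n_docs[c] / self.total)
            return prior + self._log_likelihood(text, c)
        return max(self.classes, key=score)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Note that the Bernoulli variant also multiplies in the probability of every vocabulary word that is absent from the message, which is the main practical difference from the multinomial variant.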
Reading the dataset
Download and unpack this zip file (smsspamcollection.zip). The SMS Spam Collection v.1 is a set of tagged SMS messages that have been collected for SMS spam research. It contains 5,574 SMS messages in English, each tagged as either ham (legitimate) or spam. It has a total of 4,827 legitimate messages (86.6%) and a total of 747 spam messages (13.4%). The files contain one message per line. Each line is composed of two columns: one with a label (ham or spam) and the other with the raw text. Here are some examples:
● ham What you doing?how are you?
● spam FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! ubscribe6GBP/ month inc 3hrs 16 stop?txtStop
Evaluating the classifier
Report the 5-fold cross-validation results in terms of accuracy.
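One way to organize the evaluation is sketched below. It is an illustration under stated assumptions: the two columns are assumed to be separated by a tab (the handout does not say), and the function names load_sms, k_fold_indices, cross_validate and majority_baseline, the fixed shuffle seed, and the train_fn interface are hypothetical. The majority-class baseline is only a stand-in so the sketch runs on its own; a Naive Bayes trainer would be passed in its place.

```python
import random


def load_sms(path):
    """Parse lines of the form '<label><TAB><text>' into (texts, labels)."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumption: a tab separates the label from the raw text.
            label, text = line.split("\t", 1)
            labels.append(label)
            texts.append(text)
    return texts, labels


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(texts, labels, train_fn, k=5, seed=0):
    """Return one accuracy per fold.

    train_fn(train_texts, train_labels) must return a callable that maps
    a message to a predicted label.
    """
    folds = k_fold_indices(len(texts), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        predict = train_fn([texts[j] for j in train_idx],
                           [labels[j] for j in train_idx])
        correct = sum(predict(texts[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


def majority_baseline(train_texts, train_labels):
    # Placeholder model: always predicts the most frequent training label.
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda text: most_common
```

Each message is used for testing exactly once, and the model is retrained from scratch on the other four folds every time, so no test message influences the counts used to classify it. Reporting the per-fold accuracies together with their mean makes the spread across folds visible. Since about 86.6% of the messages are ham, the majority baseline is a useful point of comparison for the Naive Bayes results.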