$25
In this problem, you will implement Fisher’s Linear Discriminant from scratch, as covered in class: given the higher-dimensional data, reduce it to one dimension while maximizing the difference between the class means and minimizing the sum of the class variances. Then calculate the intersection point of the two normal distributions fitted to the collapsed clusters, and find the discriminant vector in 1-D and 3-D.
Note that, for this question, you need not make any train-test split; you can use the entire data for the procedure.
Try to vectorize your code as much as possible to make your computations faster and more efficient. Do not hard-code any part of the implementation unless it is absolutely necessary.
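The projection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution: the function name `fisher_direction` and the synthetic arrays `X0` and `X1` are hypothetical stand-ins for the two classes of the provided 3-D dataset, and the direction is computed with the standard closed form, w proportional to S_w^{-1}(m1 - m0).

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector w proportional to S_w^{-1} (m1 - m0) for two classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Solve S_w w = (m1 - m0) rather than forming an explicit inverse.
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic stand-in data (assumption: the real dataset is loaded elsewhere).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(200, 3))
X1 = rng.normal(loc=[3.0, 3.0, 3.0], scale=1.0, size=(200, 3))

w = fisher_direction(X0, X1)   # unit vector along the discriminant line in 3-D
z0, z1 = X0 @ w, X1 @ w        # 1-D projections of each class
```

Every step operates on whole arrays, so no explicit loop over samples is needed.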
Plot of the higher-dimensional data (you can use Matplotlib’s 3D plotting feature for this).
Plots of the reduced clusters and their corresponding normal distributions in two separate plots. It is recommended that you use two different colors (say red and blue) to represent the two classes. Also visualize the discriminant line in your plots.
The intersection point of the two normal distributions and the unit vector along the discriminant line in 1-D and 3-D.
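One way to find the intersection point is to equate the two log-densities, which gives a quadratic in x. The sketch below assumes equal class priors and takes the means and standard deviations of the projected clusters as inputs; the names `normal_pdf` and `gaussian_intersections` and the example numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gaussian_intersections(mu0, s0, mu1, s1):
    """Real x where N(mu0, s0^2) and N(mu1, s1^2) have equal density."""
    # Coefficients of a*x^2 + b*x + c = 0 from equating the log-densities.
    a = 1.0 / (2 * s1**2) - 1.0 / (2 * s0**2)
    b = mu0 / s0**2 - mu1 / s1**2
    c = mu1**2 / (2 * s1**2) - mu0**2 / (2 * s0**2) + np.log(s1 / s0)
    if np.isclose(a, 0.0):
        # Equal variances: the equation is linear (assumes mu0 != mu1).
        return np.array([-c / b])
    roots = np.roots([a, b, c])
    return np.sort(roots[np.isreal(roots)].real)

# Illustrative values only; in practice use the statistics of z0 and z1.
roots = gaussian_intersections(0.0, 1.0, 4.0, 2.0)
# Keep the root lying between the two means as the decision threshold.
threshold = roots[(roots > 0.0) & (roots < 4.0)][0]
```

When the variances differ there are two intersection points; the one between the class means is the natural threshold in 1-D. Given a unit direction w, the corresponding point on the discriminant line in 3-D would be `threshold * w`.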
Problem Statement
In this problem, you will implement a simple Naive Bayes classifier to classify mails as spam or not spam. You will need to set up 7-fold cross-validation to train and test your model. You may choose to discard stop words, commas, full stops, numbers, hyphens, brackets, exclamation marks, and any other one- or two-letter words (such as a, an, the, be, etc.) that do not contribute to the sentiment of the text.
Use Laplace smoothing to avoid zero probabilities and the resulting division by zero.
Try to vectorize your code as much as possible to make your computations faster and more efficient. Do not hard-code any part of the implementation unless it is absolutely necessary.
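A minimal sketch of a multinomial Naive Bayes classifier with Laplace (add-one) smoothing is given below. It is one possible reading of the task, not the required design: the helper names `tokenize`, `fit_nb`, and `predict_nb`, the regex-based cleaning, the rule of dropping words shorter than three letters, and the four toy mails are all assumptions made for illustration. A fuller solution would also filter a stop-word list.

```python
import re
from collections import Counter
import numpy as np

def tokenize(text):
    # Lowercase, keep alphabetic runs only, drop one- and two-letter words.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]

def fit_nb(docs, labels, alpha=1.0):
    labels = list(labels)
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    classes = sorted(set(labels))
    counts = np.zeros((len(classes), len(vocab)))
    for d, y in zip(docs, labels):
        for t, n in Counter(tokenize(d)).items():
            counts[classes.index(y), index[t]] += n
    # Laplace smoothing: add alpha to every word count so no probability is zero.
    totals = counts.sum(axis=1, keepdims=True) + alpha * len(vocab)
    log_lik = np.log((counts + alpha) / totals)
    log_prior = np.log(np.array([labels.count(c) for c in classes]) / len(labels))
    return index, classes, log_prior, log_lik

def predict_nb(model, docs):
    index, classes, log_prior, log_lik = model
    X = np.zeros((len(docs), len(index)))
    for r, d in enumerate(docs):
        for t in tokenize(d):
            if t in index:  # words unseen in training are ignored
                X[r, index[t]] += 1
    scores = X @ log_lik.T + log_prior  # log-posterior up to a constant
    return [classes[i] for i in scores.argmax(axis=1)]

# Toy data (assumption: 1 = spam, 0 = not spam).
docs = ["win free money now", "free prize claim now",
        "meeting schedule for monday", "project meeting notes attached"]
labels = [1, 1, 0, 0]
model = fit_nb(docs, labels)
predictions = predict_nb(model, ["free money prize", "monday project meeting"])
```

Working in log space keeps the scoring step a single matrix product and avoids underflow on long mails.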
Accuracy of your model over each fold and the overall average accuracy.
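The per-fold and average accuracies could be collected with a loop like the one below. This is a sketch under assumptions: `kfold_accuracies` is a hypothetical helper that accepts any `fit` and `predict` callables, and the majority-class model and synthetic labels are placeholders standing in for the Naive Bayes classifier and the mail dataset.

```python
import numpy as np

def kfold_accuracies(X, y, fit, predict, k=7, seed=0):
    """Accuracy on each of k held-out folds after training on the other k-1."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        accs.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return np.array(accs)

# Placeholder model: always predict the most frequent training label.
def fit_majority(X, y):
    return np.bincount(y).argmax()

def predict_majority(model, X):
    return np.full(len(X), model)

# Placeholder data: 70 samples, every fifth one labelled 1.
X = np.arange(70).reshape(-1, 1)
y = (np.arange(70) % 5 == 0).astype(int)

accs = kfold_accuracies(X, y, fit_majority, predict_majority)
print("per-fold accuracy:", accs)
print("average accuracy:", accs.mean())
```

Shuffling before splitting guards against any ordering in the data file, and fixing the seed keeps the folds reproducible.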
Problem Statement
In this problem, you will implement a linear perceptron as discussed in class. You are given two different datasets on which you need to train and test your model independently. Create 70:30 train-test splits on both datasets, and train the model for a maximum of 10^6 iterations in case it does not converge on the given dataset.
Try to vectorize your code as much as possible to make your computations faster and more efficient. Do not hard-code any part of the implementation unless it is absolutely necessary.
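The training procedure can be sketched as below. Several details are assumptions rather than requirements of the assignment: labels are taken to be in {-1, +1}, one "iteration" is counted as one weight update, the bias is folded into the weight vector, and the helper names and the two synthetic blobs are hypothetical. The iteration cap is left as a parameter.

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7, seed=0):
    """Shuffle, then split into train and test parts (70:30 by default)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def train_perceptron(X, y, max_iter=10_000):
    """Perceptron rule; y must be in {-1, +1}. Stops early once converged."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        # Vectorized check for misclassified points (margin <= 0).
        mis = np.flatnonzero(y * (Xb @ w) <= 0)
        if mis.size == 0:
            break  # every training point is correctly classified
        i = mis[0]
        w += y[i] * Xb[i]  # move the boundary toward the misclassified point
    return w

def predict(w, X):
    return np.where(add_bias(X) @ w > 0, 1, -1)

# Synthetic, well-separated stand-in data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, size=(100, 2)),
               rng.normal(3.0, 0.5, size=(100, 2))])
y = np.concatenate([-np.ones(100, dtype=int), np.ones(100, dtype=int)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y)
w = train_perceptron(X_tr, y_tr)
accuracy = np.mean(predict(w, X_te) == y_te)
```

On data that is not linearly separable the loop never finds an empty misclassified set, which is why the cap on iterations is needed.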
Accuracy of your model on both the datasets.
The dataset that was more linearly separable.
Major limitations of the Perceptron classifier.