ECE448 - Assignment 3 - Naive Bayes and Logistic Regression Classification

In this assignment you will apply machine learning techniques to image and text classification tasks, and use a logistic regression classifier to perform binary classification.

Programming language 

You may only use modules from the Python standard library and numpy. 

Contents
•       Part 1: Image Classification

•       Part 2: Text Classification

•       Part 3: Linear Classifier

•       Extra Credit

•       Provided Code Skeleton

•       Deliverables

•       Report checklist

Part 1: Digit Image Classification
 


Data: You are provided with part of the MNIST digit dataset. There are 55000 training examples and 10000 test examples. The labels range from 0 to 9, one for each digit. In this section, you will apply a Naive Bayes model to this task.

Naive Bayes model 

•       Features: Each image consists of 28×28 pixels, which we represent as a flattened array of size 784, where each feature/pixel Fi takes on intensity values from 0 to 255 (8-bit grayscale).

•       Training: The goal of the training stage is to estimate the likelihoods P(Fi | class) for every pixel location i and for every digit class (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The likelihood estimate is defined as

P(Fi = f | class) = (# of times pixel i has value f in training examples from this class) / (Total # of training examples from this class)

In addition, as discussed in the lecture, you have to smooth the likelihoods to ensure that there are no zero counts. Laplace smoothing is a very simple method that increases the observation count of every value f by some constant k. This corresponds to adding k to the numerator above, and k*V to the denominator (where V is the number of possible values the feature can take on). The higher the value of k, the stronger the smoothing. Experiment with different values of k (say, from 0.1 to 10) and find the one that gives the highest classification accuracy.

You should also estimate the priors P(class) by the empirical frequencies of different classes in the training set.
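As a concrete illustration of this training step, here is a minimal numpy sketch that estimates the smoothed log-likelihoods and the log-priors. It assumes x_train is an (N, 784) integer array of pixel intensities and y_train an (N,) array of digit labels, matching the provided .npy files; the function and variable names are illustrative, not part of the skeleton.

import numpy as np

def train_naive_bayes(x_train, y_train, k=1.0, num_classes=10, num_values=256):
    """Estimate log P(class) and log P(Fi = f | class) with Laplace smoothing."""
    n_features = x_train.shape[1]
    log_prior = np.zeros(num_classes)
    # log_likelihood[c, i, f] = log P(Fi = f | class c)
    log_likelihood = np.zeros((num_classes, n_features, num_values))
    for c in range(num_classes):
        x_c = x_train[y_train == c]                      # training examples of class c
        log_prior[c] = np.log(len(x_c) / len(x_train))   # empirical class prior
        for i in range(n_features):
            counts = np.bincount(x_c[:, i], minlength=num_values)
            # Laplace smoothing: add k to each count and k*V to the total
            log_likelihood[c, i] = np.log((counts + k) / (len(x_c) + k * num_values))
    return log_prior, log_likelihood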

•       Testing: You will perform maximum a posteriori (MAP) classification of each test digit according to the learned Naive Bayes model. Suppose a test image has feature values f1, f2, ... , f784. According to this model, the posterior probability (up to scale) of each class given the image is

P(class) · P(f1 | class) · P(f2 | class) · ... · P(f784 | class)

Note that in order to avoid underflow, it is standard to work with the log of the above quantity: log P(class) + log P(f1 | class) + log P(f2 | class) + ... + log P(f784 | class)

After you compute the above decision function values for all ten classes for every test image, you will use them for MAP classification.
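A minimal log-space MAP prediction step, built on the quantities from the training sketch above (again, the names are illustrative), might look like this:

import numpy as np

def predict(x_test, log_prior, log_likelihood):
    """Return the MAP label and log-posterior scores (up to a constant) per test image."""
    n_test, n_features = x_test.shape
    num_classes = log_prior.shape[0]
    scores = np.zeros((n_test, num_classes))
    pixel_idx = np.arange(n_features)
    for c in range(num_classes):
        # log P(class) + sum_i log P(Fi = fi | class)
        scores[:, c] = log_prior[c] + log_likelihood[c, pixel_idx, x_test].sum(axis=1)
    return scores.argmax(axis=1), scores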

•       Evaluation: Report your performance in terms of the average classification rate and the classification rate for each digit (the percentage of all test images of a given digit that are correctly classified). Also report your confusion matrix. This is a 10x10 matrix whose entry in row r and column c is the percentage of test images from class r that are classified as class c. In addition, for each class, show the test examples from that class that have the highest and the lowest posterior probabilities according to your classifier. You can think of these as the most and least "prototypical" instances of each digit class (and the least "prototypical" one is probably misclassified).
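One way to compute the confusion matrix and the per-class rates, sketched under the same illustrative naming (y_true and y_pred are the true and predicted label arrays), is:

import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Entry [r, c] is the percentage of test examples from class r classified as class c."""
    cm = np.zeros((num_classes, num_classes))
    for r in range(num_classes):
        from_r = y_pred[y_true == r]
        for c in range(num_classes):
            cm[r, c] = 100.0 * np.mean(from_r == c)
    return cm

# The per-class classification rates are the diagonal entries; the average
# classification rate is simply np.mean(y_pred == y_true).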

•       Likelihood visualization: When using classifiers in real domains, it is important to be able to inspect what they have learned. One way to inspect a naive Bayes model is to look at the most likely features for a given label. Another tool for understanding the parameters is to visualize the feature likelihoods for the high-intensity pixels of each class. Here, high intensities refer to pixel values from 128 to 255. Therefore, the likelihood for the high-intensity pixel feature Fi of class c1 is the sum of the probabilities of the top 128 intensities at pixel location i for class c1:

feature likelihood(Fi, c1) = Σ (k = 128 to 255) P(Fi = k | c1)

For each of the ten classes, plot the trained likelihoods for the high-intensity pixel features to see what the model has learned.
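For instance, the per-class high-intensity likelihood maps can be computed from the log-likelihood array in the training sketch above; how you render the resulting 28x28 maps (e.g. with matplotlib's imshow, if a plotting library is allowed for the report figures) is up to you, and the names below are illustrative.

import numpy as np

def high_intensity_likelihood_maps(log_likelihood):
    """Sum P(Fi = k | class) over k = 128..255 for every class and pixel."""
    probs = np.exp(log_likelihood)           # shape (10, 784, 256)
    maps = probs[:, :, 128:].sum(axis=2)     # shape (10, 784)
    return maps.reshape(-1, 28, 28)          # one 28x28 likelihood map per class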

Part 2: Text Classification
You are given a dataset consisting of texts that belong to 14 different classes. We have split the dataset into a training set and a development set. The training set consists of 3865 texts and their corresponding class labels from 1-14, with instances from each of the classes; the development set consists of 483 test instances and their corresponding labels. We have already preprocessed the dataset and extracted it into a Python list structure in text_main.py. Using the training set, you will learn a Naive Bayes classifier that predicts the correct class label for an unseen text. Use the development set to test the accuracy of your learned model. Report the accuracy, recall, and F1 score that you get on your development set. We will have a separate (unseen) train/test set that we will use to run your code after you turn it in. No nonstandard outside Python libraries can be used.

Unigram Model
The bag of words model in NLP is a simple unigram model which considers a text to be represented as a bag of independent words. That is, we ignore the position the words appear in, and only pay attention to their frequency in the text. Here each text consists of a group of words. Using Bayes theorem, you need to compute the probability of a text belonging to one of the 14 classes given the words in the text. Thus you need to estimate the posterior probabilities:

P(Class = Ci | Words) ∝ P(Class = Ci) · Π (over all words in the text) P(Word | Class = Ci)

It is standard practice to use the log probabilities so as to avoid underflow. Also, P(words) is just a constant, so it will not affect your computation.

Training and Development
•       Training: To train the algorithm you are going to need to build a bag of words model using the texts. After you build the model, you will need to estimate the log-likelihoods log P(Word | Class = Ci). The variable Ci can only take on 14 values, 1-14. Additionally, you will need to smooth the likelihoods to prevent zero probabilities. In order to accomplish this, use Laplace smoothing.

•       Development: After you have computed the log-likelihoods, you will have your model predict class labels for the texts from the development set. In order to do this, you will perform MAP classification using the equation shown above.
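As one possible shape for this, here is a minimal standard-library sketch of the unigram model with Laplace smoothing. The names train_set (a list of texts, each a list of words), train_labels, and the '<UNK>' fallback key are illustrative assumptions, not part of the provided skeleton; the use_prior flag also makes it easy to run the prior-free (ML) variant asked about later in this section.

import math
from collections import Counter, defaultdict

def fit_unigram_nb(train_set, train_labels, k=1.0):
    """train_set: list of texts, each a list of words; train_labels: class ids 1-14."""
    word_counts = defaultdict(Counter)          # class -> word frequency counts
    class_counts = Counter(train_labels)
    vocab = set()
    for words, label in zip(train_set, train_labels):
        word_counts[label].update(words)
        vocab.update(words)
    V = len(vocab)
    log_prior = {c: math.log(class_counts[c] / len(train_labels)) for c in class_counts}
    log_like = {}
    for c in word_counts:
        total = sum(word_counts[c].values())
        log_like[c] = {w: math.log((word_counts[c][w] + k) / (total + k * V)) for w in vocab}
        # words never seen in this class fall back to the smoothed zero count
        log_like[c]['<UNK>'] = math.log(k / (total + k * V))
    return log_prior, log_like

def predict_class(words, log_prior, log_like, use_prior=True):
    """MAP prediction; set use_prior=False for the maximum-likelihood variant."""
    best_class, best_score = None, -math.inf
    for c in log_prior:
        score = log_prior[c] if use_prior else 0.0
        score += sum(log_like[c].get(w, log_like[c]['<UNK>']) for w in words)
        if score > best_score:
            best_class, best_score = c, score
    return best_class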

Use only the training set to learn the individual probabilities. The following results should be put in your report:

1.       Plot your confusion matrix. This is a 14x14 matrix whose entry in row r and column c is the percentage of test texts from class r that are classified as class c.

2.       Accuracy, recall, and F1 scores for each of the classes on the development set.

3.       Top 20 feature words of each of the classes.

4.       Calculate your accuracy without including the class prior in the Naive Bayes equation, i.e. computing only the maximum-likelihood (ML) inference for each instance. Report the change in accuracy numbers, if any, and state your reasoning for this observation. Is including the class prior always beneficial? Change your class prior to a uniform distribution. What is the change in the results?

Part 3: Linear Classifier
 



There are some points on the 2-D plane, some of which are labeled 1 and others labeled 0. Your task is to find the boundary line that correctly separates these two categories of points. In the image shown above, the solid line is the true boundary, and the dashed line is the boundary found by a logistic regression classifier.

Note:

a)         You need to implement a logistic regression classifier for this task. logistic.py is the only file you need to modify in this section.

b)        Although we only do classification on 2-D points in this task, your code should work for arbitrary dimensions.

Logistic regression model 

The logistic regression model, also known as a differentiable perceptron, is as follows:

f(wᵀf) = sigmoid(wᵀf) = 1 / (1 + e^(−wᵀf))

Note: 

a)         This logistic regression model is different from the one in the lecture slide. You should implement this one in this task, NOT the one in the slide.

b)        The derivative of the sigmoid function is f′(x) = f(x) × (1 − f(x)).

•       Features: The coordinates of every point. Denoting the number of points as N and the dimension of the coordinates as P, the feature matrix should be P×N.

•       Training: Implement the training process of the Logistic Regression model. Recall the loss function of logistic regression from the lecture slides, which is as follows: (Note: a better measure would be the logistic loss, which is not required in this MP. If you are interested, see Logistic regression here.)

L(y1, …, yn, f1, …, fn) = Σ (i = 1 to n) (yi − sigmoid(wᵀfi))²
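A minimal sketch of the corresponding gradient-descent training step, assuming a (P, N) feature matrix as described above and 0/1 labels (the learning rate, iteration count, and the absence of an explicit bias feature are illustrative choices, not requirements of the skeleton):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(features, labels, lr=0.1, num_iters=1000):
    """Gradient descent on the squared loss sum_i (y_i - sigmoid(w^T f_i))^2.

    features: (P, N) matrix, one column per point
    labels:   (N,) array of 0/1 labels
    """
    P, N = features.shape
    w = np.zeros(P)
    for _ in range(num_iters):
        s = sigmoid(w @ features)                          # (N,) predictions
        # dL/dw = sum_i -2 (y_i - s_i) * s_i * (1 - s_i) * f_i, using f'(x) = f(x)(1 - f(x))
        grad = -2.0 * (features @ ((labels - s) * s * (1.0 - s)))
        w -= lr * grad
    return w

def predict_labels(w, features):
    """Classify a point as 1 when sigmoid(w^T f) >= 0.5."""
    return (sigmoid(w @ features) >= 0.5).astype(int)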

•       Testing: The provided code already implements the testing process for you; you do NOT need to implement it. But do not forget to report the test results in your report.

•       Evaluation: The process of training and testing is repeated many times, and the average training error and testing error are taken as the evaluation of the model. This is also implemented in the skeleton code for you.

Extra Credit Suggestion
Implement the naive Bayes algorithm over a bigram model as opposed to the unigram model. The bigram model is defined as follows:

P(w1 … wn) = P(w1) P(w2 | w1) … P(wn | wn−1)

Then combine the bigram model and the unigram model into a mixture model defined with parameter λ:

(1 − λ) · P(Y) Π (i = 1 to n) P(wi | Y) + λ · P(Y) Π (i = 1 to m) P(bi | Y)

where the wi are the n unigrams (words) of the text and the bi are its m bigrams.

Did the bigram model help improve accuracy? Find the best parameter λ that gives the highest classification accuracy. Report the optimal parameter λ and your results (accuracy numbers) on the bigram model and the optimal mixture model, and answer the following questions:

1.       Running naive Bayes on the bigram model relaxes the naive assumption of the model a bit. However, is this always a good thing? Why or why not?

2.       What would happen if we did an N-gram model where N was a really large number?
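Building on the unigram sketch from Part 2, one hedged reading of the mixture scoring is to interpolate the two log-space scores with λ; here bi_like is assumed to be estimated by counting adjacent word pairs exactly as the unigram counts were, and '<UNK>' is again an illustrative fallback key.

import math

def bigrams(words):
    """Adjacent word pairs, e.g. ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(words, words[1:]))

def mixture_score(words, c, log_prior, uni_like, bi_like, lam):
    """(1 - lambda) * unigram log score + lambda * bigram log score for class c."""
    uni = log_prior[c] + sum(uni_like[c].get(w, uni_like[c]['<UNK>']) for w in words)
    bi = log_prior[c] + sum(bi_like[c].get(b, bi_like[c]['<UNK>']) for b in bigrams(words))
    return (1.0 - lam) * uni + lam * bi

Sweeping lam over, say, 0.0 to 1.0 and picking the value with the best development accuracy would give the optimal λ to report.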

Provided Code Skeleton
We have provided a zip file with all the code to get you started on your MP.

For part 1, you are provided the following. The doc strings in the python files explain the purpose of each function.

•       image_main.py- This is the main file which loads the dataset and calls your Naive Bayes algorithms.

•       naive_bayes.py- This is the only file that needs to be modified.

•       x_train.npy, y_train.npy, x_test.npy and y_test.npy- These files contain the training and testing examples.

For part 2, you are provided the following. The doc strings in the python files explain the purpose of each function

•       text_main.py- This is the main file which loads the dataset and calls your Naive Bayes Algorithm.

•       TextClassifier.py- This is the only file that needs to be modified.

•       train_text.csv- This file contains the training examples.

•       dev_text.csv- This file contains the development examples for testing your model.

•       stop_words.csv- This file contains the stop words which are required for preprocessing the dataset.

For part 3, you are provided the following. The doc strings in the python files explain the purpose of each function

•       linear_classifier_main.py- This is the main file which loads the dataset and calls your Perceptron and Logistic Regression Algorithm.

•       logistic.py- This is the only file that needs to be modified to implement your Logistic Regression Algorithm.

•       mkdata.py - This is the file to make synthetic data for your algorithm.

•       plotdata.py - This file plots the experimental results of your perceptron model.

•       plotdata_log_reg.py- This file plots the experimental results of your Logistic Regression model.


Report Checklist

1.       Title Page:

List of all team members, course number and section for which each member is registered, date on which the report was written

2.       Section I:

Image Classification. Report the average classification rate, the classification rate for each class, and the confusion matrix. For each class, show the test examples from that class that have the highest and lowest posterior probabilities according to your classifier. Show the ten visualization plots of the high-intensity feature likelihoods.

3.       Section II:

Text Classification. Report all your results: the confusion matrix, recall, precision, and F1 score for all 14 classes. Include the top feature words for each of the classes. Also report the change in accuracy when the class prior is changed to a uniform distribution and when it is removed, and provide the reasoning for these observations.

4.       Section III:

Linear Classifier. Report the average error rates on the training and test sets for your Logistic Regression model. Show the visual results of the model.

