CS329 Lab 3 - Naive Bayes Solution


• Understand the Naive Bayes algorithm
• Learn how to implement the Naive Bayes classifier in Python

Outline
Ø Background and Probability Basics
Ø Probabilistic Classification Principle
• Probabilistic discriminative models
• Generative models and their application to classification
• MAP and converting generative into discriminative
Ø Naïve Bayes – a generative model
• Principle and Algorithms (discrete vs. continuous)
• Example: Play Tennis
Ø Zero Conditional Probability and Treatment
Ø Summary

Background
• There are three methodologies:
ü Model a classification rule directly
Examples: k-NN, linear classifier, SVM, neural nets, ...
ü Model the probability of class memberships given input data
Examples: logistic regression, probabilistic neural nets (softmax), ...
ü Make a probabilistic model of data within each class
Examples: naive Bayes, model-based, ...
• Important ML taxonomy for learning models:
– probabilistic models vs. non-probabilistic models
– discriminative models vs. generative models
Background
• Based on this taxonomy, we can see the essence of different learning models (classifiers) more clearly.
                  Probabilistic                              Non-Probabilistic
Discriminative    Logistic regression,                       k-NN, linear classifier, SVM,
                  probabilistic neural nets, ...             neural networks, ...
Generative        Naïve Bayes, model-based (e.g., GMM), ...  N.A. (?)
Probability Basics
Ø Prior, conditional and joint probability for random variables
– Prior probability: P(x)
– Conditional probability: P(x1|x2), P(x2|x1)
– Joint probability: x = (x1, x2), P(x) = P(x1, x2)
– Relationship: P(x1, x2) = P(x2|x1)P(x1) = P(x1|x2)P(x2)
– Independence: P(x2|x1) = P(x2), P(x1|x2) = P(x1), P(x1, x2) = P(x1)P(x2)
Ø Bayesian rule: P(c|x) = P(x|c)P(c) / P(x), i.e., Posterior = Likelihood × Prior / Evidence
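As a quick numeric illustration of the Bayesian rule (the prior and likelihood values below are made up for illustration, not part of the lab data), the posterior can be computed directly from the prior, the likelihood, and the evidence obtained via the law of total probability:

    # Illustrative only: made-up prior/likelihood values for a two-class problem.
    priors = {"c1": 0.6, "c2": 0.4}        # P(c)
    likelihoods = {"c1": 0.2, "c2": 0.5}   # P(x|c) for one particular observation x

    # Evidence P(x) = sum over c of P(x|c) P(c)  (law of total probability)
    evidence = sum(likelihoods[c] * priors[c] for c in priors)

    # Posterior P(c|x) = P(x|c) P(c) / P(x)
    posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
    print(posteriors)   # roughly {'c1': 0.375, 'c2': 0.625}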

Establishing a probabilistic model for classification
– Discriminative model: P(c|x), c ∈ {c1, ..., cL}, x = (x1, ..., xn)
• To train a discriminative classifier, regardless of its probabilistic or non-probabilistic nature, all training examples of the different classes must be used jointly to build a single classifier.
• A probabilistic classifier outputs L probabilities for the L class labels, whereas a non-probabilistic classifier outputs a single label.
Establishing a probabilistic model for classification (cont.)

– Generative model (must be probabilistic): P(x|c), c ∈ {c1, ..., cL}, x = (x1, ..., xn)
[Figure: L class-conditional models P(x|c1), ..., P(x|cL), each defined over x = (x1, x2, ..., xn)]
• L probabilistic models have to be trained independently
• Each is trained on only the examples of the same label
• Output L probabilities for a given input with L models
• “Generative” means that such a model can produce data according to the class-conditional distribution via sampling.

Maximum A Posterior (MAP) classification rule
– For an input x, find the largest of the L probabilities output by a discriminative probabilistic classifier, P(c1|x), ..., P(cL|x).
– Assign x to the label c* if P(c*|x) is the largest.
Generative classification with the MAP rule
– Apply the Bayesian rule to convert them into posterior probabilities:
P(ci|x) = P(x|ci)P(ci) / P(x) ∝ P(x|ci)P(ci) for i = 1, 2, ..., L
where P(x) is a common factor for all L probabilities.
– Then apply the MAP rule to assign a label
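This generative-plus-MAP route can be sketched in a few lines of Python; the class-conditional "models" below are placeholder functions standing in for whatever densities have actually been learned, so the example only illustrates the decision rule itself:

    # Sketch: MAP classification with a generative model.
    # p_x_given_c maps each class label to a function estimating P(x|c);
    # the constant "densities" below are placeholders for illustration only.
    priors = {"c1": 0.7, "c2": 0.3}
    p_x_given_c = {
        "c1": lambda x: 0.1,   # placeholder class-conditional model for c1
        "c2": lambda x: 0.4,   # placeholder class-conditional model for c2
    }

    def map_classify(x):
        # P(x) is a common factor, so the argmax of P(x|c) P(c) suffices.
        return max(priors, key=lambda c: p_x_given_c[c](x) * priors[c])

    print(map_classify(x=[0.5]))   # 'c2', because 0.4*0.3 > 0.1*0.7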

Naive Bayes
Ø Bayes classification
P(c|x) ∝ P(x|c)P(c) = P(x1, ..., xn|c)P(c) for c = c1, ..., cL
Difficulty: learning the joint probability P(x1, ..., xn|c) is infeasible!
Ø Naïve Bayes classification
– Assume all input features are class conditionally independent!
P(x1 , x2 ,, xn |c)  P(x1 |x2 ,, xn ,c)P(x2 ,, xn |c)
Applying the  P(x1 |c)P(x2 ,, xn |c) independence
assumption  P(x1 |c)P(x2 |c)  P(xn |c)
– Apply the MAP classification rule: assign x' = (a1, a2, ..., an) to c* if
[P(a1|c*) ... P(an|c*)] P(c*) > [P(a1|c) ... P(an|c)] P(c) for c ≠ c*, c = c1, ..., cL,
where each bracketed product is the naive Bayes estimate of P(a1, ..., an|c*) and P(a1, ..., an|c), respectively.
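A minimal Python sketch of this factorized MAP rule, assuming the priors and conditional probability tables have already been estimated; the table layout cond_prob[c][j][a] is just one possible convention, not prescribed by the lab:

    from math import prod

    def naive_bayes_predict(x, priors, cond_prob):
        """MAP rule with the naive Bayes factorization.

        x            -- tuple (a1, ..., an) of feature values
        priors[c]    -- estimate of P(c)
        cond_prob    -- cond_prob[c][j][a] estimates P(x_j = a | c)
        """
        scores = {
            c: prod(cond_prob[c][j][a] for j, a in enumerate(x)) * priors[c]
            for c in priors
        }
        # Assign x to the class with the largest P(a1|c) ... P(an|c) P(c)
        return max(scores, key=scores.get)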
Naive Bayes



• Learning Phase
Outlook    Play=Yes  Play=No      Temperature  Play=Yes  Play=No
Sunny      2/9       3/5          Hot          2/9       2/5
Overcast   4/9       0/5          Mild         4/9       2/5
Rain       3/9       2/5          Cool         3/9       1/5

Humidity   Play=Yes  Play=No      Wind         Play=Yes  Play=No
High       3/9       4/5          Strong       3/9       3/5
Normal     6/9       1/5          Weak         6/9       2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
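One way such tables could be built by frequency counting is sketched below; the examples argument is a hypothetical list of (feature dict, label) pairs standing in for the full 14-example Play-Tennis dataset, which is not reproduced here:

    from collections import Counter, defaultdict

    def learn_naive_bayes(examples):
        """Estimate P(c) and P(feature=value | c) by frequency counting.

        examples -- hypothetical list of (feature_dict, label) pairs, e.g.
                    ({"Outlook": "Sunny", ..., "Wind": "Weak"}, "Yes")
        """
        class_counts = Counter(label for _, label in examples)
        priors = {c: n / len(examples) for c, n in class_counts.items()}

        # cond_counts[c][feature][value] = number of class-c examples with that value
        cond_counts = defaultdict(lambda: defaultdict(Counter))
        for features, label in examples:
            for feature, value in features.items():
                cond_counts[label][feature][value] += 1

        cond_prob = {
            c: {f: {v: n / class_counts[c] for v, n in vals.items()}
                for f, vals in feats.items()}
            for c, feats in cond_counts.items()
        }
        return priors, cond_prob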
• Test Phase
– Given a new instance, predict its label x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase:
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
– Decision making with the MAP rule
P(Yes|x’) ≈ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ≈ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
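These two numbers can be reproduced directly from the table entries above (pure arithmetic, not a full classifier):

    # Unnormalized posteriors for x' = (Sunny, Cool, High, Strong),
    # using the conditional probability tables from the learning phase.
    p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ~ 0.0053
    p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ~ 0.0206
    print(round(p_yes, 4), round(p_no, 4))           # 0.0053 0.0206
    print("Yes" if p_yes > p_no else "No")           # No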
Naive Bayes
• Algorithm: Continuous-valued Features
– A continuous-valued feature takes numberless (infinitely many) values, so its conditional probabilities cannot be tabulated.
– The conditional probability is therefore often modeled with the normal (Gaussian) distribution:
P̂(xj|ci) = 1/(√(2π)·σji) · exp( −(xj − μji)² / (2σji²) )
μji: mean (average) of the values of feature xj over the examples for which c = ci
σji: standard deviation of the values of feature xj over the examples for which c = ci
– Learning Phase: for X = (X1, ..., XF) and C = c1, ..., cL,
output F×L normal distributions and P(C = ci) for i = 1, ..., L
– Test Phase: given an unknown instance X’ = (a1, ..., aF)
• Instead of looking up tables, calculate the conditional probabilities with all the normal distributions obtained in the learning phase
• Apply the MAP rule to assign a label (the same as in the discrete case)

Naive Bayes
• Example: Continuous-valued Features
– Temperature is naturally of continuous value.
Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
No: 27.3, 30.1, 17.4, 29.5, 15.1
– Estimate mean and variance for each class
μ = (1/N) Σ_{n=1..N} x_n,   σ² = (1/(N−1)) Σ_{n=1..N} (x_n − μ)²   (sample variance)
μ_Yes = 21.64, σ_Yes = 2.35
μ_No = 23.88, σ_No = 7.09
– Learning Phase: output two Gaussian models for P(temp|C)
P̂(x|Yes) = 1/(√(2π)·2.35) · exp( −(x − 21.64)² / (2·2.35²) )
P̂(x|No) = 1/(√(2π)·7.09) · exp( −(x − 23.88)² / (2·7.09²) )
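A short sketch that refits these two Gaussians from the listed temperatures; note it uses the sample standard deviation (N−1 denominator), which is what matches the reported 2.35 and 7.09, and the query temperature 22.0 is just an arbitrary example value:

    from math import sqrt, pi, exp
    from statistics import mean, stdev   # stdev uses the N-1 (sample) denominator

    temps_yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
    temps_no  = [27.3, 30.1, 17.4, 29.5, 15.1]

    def gaussian_pdf(x, mu, sigma):
        return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

    mu_yes, sigma_yes = mean(temps_yes), stdev(temps_yes)   # ~21.64, ~2.35
    mu_no,  sigma_no  = mean(temps_no),  stdev(temps_no)    # ~23.88, ~7.09

    # Class-conditional likelihoods of an example temperature of 22.0
    print(gaussian_pdf(22.0, mu_yes, sigma_yes), gaussian_pdf(22.0, mu_no, sigma_no))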

Zero conditional probability
• If no training example of a class contains a particular feature value
– In this circumstance, we face a zero conditional probability problem during testing:
P̂(x1|ci) ··· P̂(ajk|ci) ··· P̂(xn|ci) = 0 for xj = ajk, whenever P̂(ajk|ci) = 0
– As a remedy, the class conditional probabilities are re-estimated with the m-estimate:
P̂(ajk|ci) = (nc + m·p) / (n + m)
nc: number of training examples for which xj = ajk and c = ci
n: number of training examples for which c = ci
p: prior estimate (usually p = 1/t for t possible values of xj)
m: weight given to the prior (number of “virtual” examples, m ≥ 1)
Zero conditional probability
• Example: P(outlook=overcast|no)=0 in the play-tennis dataset
– Adding m “virtual” examples
• In this dataset, the number of training examples for the “no” class is 5.
• We can only add m = 1 “virtual” example in our m-estimate remedy.
– The “outlook” feature can take only 3 values, so p = 1/3.
– Re-estimate with the m-estimate: P(Outlook=Overcast|Play=No) = (0 + 1·(1/3)) / (5 + 1) = 1/18 ≈ 0.056.
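A tiny sketch of the m-estimate calculation, plugging in the numbers from this example:

    def m_estimate(n_c, n, p, m=1):
        """m-estimate of a conditional probability: (n_c + m*p) / (n + m)."""
        return (n_c + m * p) / (n + m)

    # P(Outlook=Overcast | Play=No): no "No" example has Outlook=Overcast,
    # there are 5 "No" examples, Outlook takes 3 values so p = 1/3, and m = 1.
    print(m_estimate(n_c=0, n=5, p=1/3, m=1))   # 1/18, about 0.056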

Summary
• Probabilistic Classification Principle
– Discriminative vs. generative models: learning P(c|x) vs. P(x|c)
– Generative models for classification: MAP and the Bayesian rule
• Naïve Bayes: the conditional independence assumption
– Training and test are very efficient.
– Two different data types lead to two different learning algorithms.
– It sometimes works well even for data violating the assumption!
• Naïve Bayes: a popular generative model for classification
– Performance competitive with most state-of-the-art classifiers, even when the independence assumption is violated
– Many successful applications, e.g., spam mail filtering
– A good candidate for a base learner in ensemble learning
– Apart from classification, naïve Bayes can do more...

Lab Task
1. Complete the exercises and questions in lab03_Bayes.pdf.
2. Submit it to bb.
