Assignment #2
Naive Bayes Classifiers
Submission Instructions
Tasks
1 Independent Events and Bayes Theorem [20 pts]
1. [5 Points] For events A, B prove:
(¬A denotes the event that A does not occur.)
2. Let X, Y , and Z be random variables taking values in {0, 1}. The following table lists the probability of each possible assignment of 0 and 1 to the variables X, Y , and Z:
              Z = 0              Z = 1
         X = 0   X = 1      X = 0   X = 1
Y = 0     0.1    0.05        0.1     0.1
Y = 1     0.2    0.1         0.175   0.175
(a) [5 Points] Is X independent of Y ? Why or why not?
(b) [5 Points] Is X conditionally independent of Y given Z? Why or why not?
(c) [5 Points] Calculate P(X ≠ Y | Z = 0).
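As a quick numerical sanity check on your written answers (a sketch only; it does not replace the justifications the problem asks for), the joint table can be encoded directly and the relevant marginals summed:

```python
# Joint distribution P(X, Y, Z) from the table above, keyed by (x, y, z).
joint = {
    (0, 0, 0): 0.1,   (1, 0, 0): 0.05,
    (0, 1, 0): 0.2,   (1, 1, 0): 0.1,
    (0, 0, 1): 0.1,   (1, 0, 1): 0.1,
    (0, 1, 1): 0.175, (1, 1, 1): 0.175,
}

def marginal(**fixed):
    """Sum the joint over all assignments consistent with `fixed` (x, y, z)."""
    return sum(p for (x, y, z), p in joint.items()
               if all({'x': x, 'y': y, 'z': z}[k] == v for k, v in fixed.items()))

# (a) Marginal independence: compare P(X=x, Y=y) against P(X=x) * P(Y=y).
for x in (0, 1):
    for y in (0, 1):
        print(x, y, marginal(x=x, y=y), marginal(x=x) * marginal(y=y))

# (b) Conditional independence given Z: compare P(X, Y | Z=z) against
#     P(X | Z=z) * P(Y | Z=z) for each z.
for z in (0, 1):
    pz = marginal(z=z)
    for x in (0, 1):
        for y in (0, 1):
            lhs = marginal(x=x, y=y, z=z) / pz
            rhs = (marginal(x=x, z=z) / pz) * (marginal(y=y, z=z) / pz)
            print(z, x, y, round(lhs, 4), round(rhs, 4))

# (c) P(X != Y | Z = 0): sum the two disagreeing cells, normalize by P(Z=0).
p_neq = (marginal(x=0, y=1, z=0) + marginal(x=1, y=0, z=0)) / marginal(z=0)
```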
2 Maximum Likelihood Estimation [40 pts]
This problem explores maximum likelihood estimation (MLE), a technique for estimating an unknown parameter of a probability distribution from observed samples. Suppose we observe the values of n i.i.d. random variables X1, . . . , Xn drawn from a single Bernoulli distribution with parameter θ. In other words, for each Xi, we know that:
P(Xi = 1) = θ and P(Xi = 0) = 1 − θ
Our goal is to estimate the value of θ from these observed values of X1 through Xn.
For any hypothetical value θˆ, we can compute the probability of observing the outcome X1, . . . , Xn if the true parameter value θ were equal to θˆ. This probability of the observed data is often called the data likelihood, and the function L(θˆ) = P(X1, . . . , Xn | θˆ) that maps each θˆ to the corresponding likelihood is called the likelihood function. A natural way to estimate the unknown parameter θ is to choose the θˆ that maximizes the likelihood function. Formally,
θˆMLE = argmax_θˆ L(θˆ)
Often it is more convenient to work with the log likelihood function l(θˆ) = log L(θˆ). Since the log function is increasing, we also have:
θˆMLE = argmax_θˆ l(θˆ)
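As a concrete illustration of why the log can be taken, here is a small sketch (with a made-up sample) that evaluates both L(θˆ) and l(θˆ) on a grid; because log is strictly increasing, both peak at the same θˆ:

```python
import math

# Hypothetical sample of n i.i.d. Bernoulli draws (made up for illustration).
xs = [1, 0, 1, 1, 0, 1, 1, 0]
n, k = len(xs), sum(xs)  # k = number of 1s observed

def likelihood(theta):
    """L(theta) for an i.i.d. Bernoulli sample with k ones out of n."""
    return theta**k * (1 - theta)**(n - k)

def log_likelihood(theta):
    """l(theta) = log L(theta), expanded using log rules."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

grid = [i / 1000 for i in range(1, 1000)]  # avoid theta = 0 and 1
best_L = max(grid, key=likelihood)
best_l = max(grid, key=log_likelihood)
# The two maximizers coincide because log is strictly increasing.
```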
1. [8 Points] Write a formula for the log likelihood function, l(θˆ). Your function should depend on the random variables X1, . . . , Xn, the hypothetical parameter θˆ, and should be simplified as far as possible (i.e., don’t just write the definition of the log likelihood function). Does the log likelihood function depend on the order of the random variables?
2. [8 Points] Consider the following sequence of 10 samples: X = (0,1,0,1,1,0,0,1,1,1).
Compute the maximum likelihood estimate for the 10 samples. Show all of your work (hint: recall that if x∗ maximizes f(x), then f′(x∗) = 0).
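A numerical cross-check for whatever closed form you derive (this does not replace the calculus the hint asks for): a fine grid search over θˆ for the given 10 samples should agree with your answer.

```python
import math

xs = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1]  # the 10 samples from the problem
n, k = len(xs), sum(xs)

def log_likelihood(theta):
    """Bernoulli log likelihood with k ones out of n samples."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# Brute-force maximization over a fine grid (endpoints excluded so log is defined).
grid = [i / 10000 for i in range(1, 10000)]
theta_mle = max(grid, key=log_likelihood)
```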
3. [8 Points] Now we will consider a related distribution. Suppose we observe the values of m i.i.d. random variables Y1, . . . , Ym drawn from a single Binomial distribution B(n, θ). A Binomial distribution models the number of 1's in a sequence of n independent Bernoulli variables with parameter θ. In other words, for k = 0, 1, . . . , n:
P(Yi = k) = (n choose k) · θ^k · (1 − θ)^(n−k)
Write a formula for the log likelihood function, l(θˆ). Your function should depend on the random variables Y1, . . . , Ym and the hypothetical parameter θˆ.
4. [8 Points] Consider two Binomial random variables Y1 and Y2 with the same parameters, n = 5 and θ. The Bernoulli variables for Y1 and Y2 resulted in (0,1,0,1,1) and (0,0,1,1,1), respectively. Therefore, Y1 = 3 and Y2 = 3. Compute the maximum likelihood estimate for the 2 samples. Show your work.
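The same grid-search cross-check works here (again, a sketch to verify your derivation, not a substitute for it), using the Binomial log likelihood for the two observed counts:

```python
import math

n = 5        # Bernoulli trials per Binomial draw
ys = [3, 3]  # observed values Y1 and Y2 from the problem

def log_likelihood(theta):
    """Sum of Binomial(n, theta) log probabilities over the observed counts."""
    return sum(math.log(math.comb(n, y))
               + y * math.log(theta)
               + (n - y) * math.log(1 - theta)
               for y in ys)

grid = [i / 10000 for i in range(1, 10000)]
theta_mle = max(grid, key=log_likelihood)
```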
5. [8 Points] How do your answers for parts 1 and 3 compare? What about parts 2 and 4? If you got the same or different answers, why was that the case?
3 Implementing Naive Bayes [40 pts]
• train.csv: Training data. Each row is a document; the comma-separated columns are features (word counts). There are 4527 documents and 5180 words.
• train labels.txt: labels for the training data
• test.csv: Test data, 1806 documents and 5180 words
• test labels.txt: labels for the test data
• word indices: words corresponding to the feature indices.
Train your classifier on the given training set. For each classifier, report the training accuracy, the testing accuracy, and the time spent training the classifier.
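One possible shape for the implementation is a Multinomial Naive Bayes with Laplace (add-one) smoothing over word counts. The sketch below is an assumption about the intended model, demonstrated on a tiny synthetic count matrix; loading the actual train.csv / label files is left out.

```python
import numpy as np

def train_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing.

    X: (n_docs, n_words) word-count matrix; y: (n_docs,) integer labels.
    Returns the class list, log class priors, and per-class log word probabilities.
    """
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    log_cond = []
    for c in classes:
        counts = X[y == c].sum(axis=0) + alpha           # smoothed word counts
        log_cond.append(np.log(counts / counts.sum()))   # log P(word | class)
    return classes, log_prior, np.array(log_cond)

def predict_nb(model, X):
    """Pick the class maximizing log P(c) + sum_w count_w * log P(w | c)."""
    classes, log_prior, log_cond = model
    scores = X @ log_cond.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Tiny synthetic example (a stand-in for train.csv / train_labels.txt):
X = np.array([[3, 0, 1], [2, 1, 0], [0, 3, 2], [1, 4, 1]])
y = np.array([0, 0, 1, 1])
model = train_nb(X, y)
train_acc = (predict_nb(model, X) == y).mean()
```

For the assignment data you would build `X` from the comma-separated counts in train.csv, `y` from the labels file, and time the `train_nb` call (e.g. with `time.perf_counter`).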