VE492 - Homework 8 Solved

Question 1: Maximum Likelihood Estimation
We will begin with a short derivation. Consider a probability distribution with a domain that consists of $n$ different values. We get to observe $N$ total samples from this distribution. We use $N_i$ to represent the number of the samples for which outcome $i$ occurs. Our goal is to estimate the probabilities $\theta_1, \theta_2, \ldots, \theta_{n-1}$ of each of those events. The probability of the last outcome, $n$, then equals $1 - \theta_1 - \theta_2 - \cdots - \theta_{n-1}$.

In maximum likelihood estimation, we choose the $\theta_i$ that maximize the likelihood of the observed samples:

$$L(\theta_1, \ldots, \theta_{n-1}) = \theta_1^{N_1}\, \theta_2^{N_2} \cdots \theta_{n-1}^{N_{n-1}}\, (1 - \theta_1 - \cdots - \theta_{n-1})^{N_n}$$

For this derivation, it is easiest to work with the log of the likelihood. Maximizing the log-likelihood also maximizes the likelihood, since the two quantities are related by a monotonic transformation. Taking logs, we obtain

$$\log L = N_1 \log \theta_1 + \cdots + N_{n-1} \log \theta_{n-1} + N_n \log\left(1 - \theta_1 - \cdots - \theta_{n-1}\right)$$

Setting the derivatives with respect to $\theta_1, \ldots, \theta_{n-1}$ equal to zero, we obtain $n-1$ equations in the $n-1$ unknowns:

$$\frac{\partial \log L}{\partial \theta_i} = \frac{N_i}{\theta_i} - \frac{N_n}{1 - \theta_1 - \cdots - \theta_{n-1}} = 0, \qquad i = 1, \ldots, n-1$$

That is, the maximum likelihood estimate of $\theta$ can be found by solving a linear system of $n-1$ equations in $n-1$ unknowns. Doing so shows that the maximum likelihood estimate is simply the count for each outcome divided by the total number of samples. I.e., we have that:

$$\hat{\theta}_i = \frac{N_i}{N}$$
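As a quick sanity check of this result, here is a minimal Python sketch (the function and variable names are ours, not the assignment's) that computes maximum likelihood estimates directly from observed counts:

```python
from fractions import Fraction

def mle_estimates(counts):
    """MLE for a categorical distribution: each count divided by the total."""
    total = sum(counts.values())
    return {outcome: Fraction(c, total) for outcome, c in counts.items()}

# Using the sample counts from Part 1 below:
print(mle_estimates({"R": 3, "G": 1, "B": 7}))
# {'R': Fraction(3, 11), 'G': Fraction(1, 11), 'B': Fraction(7, 11)}
```

Using `Fraction` keeps the estimates as the irreducible fractions the notice below asks for.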

Notice: please write the answer to each sub-question in one row, so there will be 3 rows for this question. Please use irreducible fractions in your answers.

Part 1.

Now, consider a sampling process with 3 possible outcomes: R, G, and B. We observe the following sample counts:

outcome   R   G   B
count     3   1   7
1)        What is the total sample count $N$?

2)        What are the maximum likelihood estimates of the probabilities of each outcome, $P(R)$, $P(G)$, and $P(B)$?

Part 2.
 
Now, use Laplace smoothing with strength $k$ to estimate the probabilities of each outcome.

$P_{LAP,k}(R) = $

$P_{LAP,k}(G) = $

$P_{LAP,k}(B) = $
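For reference, Laplace smoothing with strength $k$ adds $k$ pseudo-counts to every outcome: $P_{LAP,k}(x) = \frac{N_x + k}{N + k|X|}$, where $|X|$ is the number of distinct outcomes. A minimal Python sketch (names are ours):

```python
from fractions import Fraction

def laplace_estimates(counts, k):
    """Laplace-smoothed estimates: add k pseudo-counts to each outcome."""
    total = sum(counts.values()) + k * len(counts)
    return {outcome: Fraction(c + k, total) for outcome, c in counts.items()}

# Example with the Part 1 counts and an illustrative strength k = 1:
print(laplace_estimates({"R": 3, "G": 1, "B": 7}, k=1))
# {'R': Fraction(2, 7), 'G': Fraction(1, 7), 'B': Fraction(4, 7)}
```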

Part 3.
Now, consider Laplace smoothing in the limit $k \to \infty$. Fill in the corresponding probability estimates.

$P_{LAP,\infty}(R) = $        $P_{LAP,\infty}(G) = $        $P_{LAP,\infty}(B) = $
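To see what happens in this limit, divide the numerator and denominator of the smoothed estimate by $k$ (a short derivation added for clarity):

```latex
\lim_{k \to \infty} P_{LAP,k}(x)
  = \lim_{k \to \infty} \frac{N_x + k}{N + k\,|X|}
  = \lim_{k \to \infty} \frac{N_x/k + 1}{N/k + |X|}
  = \frac{1}{|X|}
```

That is, as $k \to \infty$ the smoothed estimates approach the uniform distribution over the $|X|$ outcomes, regardless of the observed counts.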

Question 2: Poisson Parameter Evaluation
We will now consider maximum likelihood estimation in the context of a different probability distribution. Under the Poisson distribution, the probability of an event occurring $k$ times is:

$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Here $\lambda$ is the parameter we wish to estimate. [Figure: the distribution plotted for several values of $\lambda$.]

On a sheet of scratch paper, work out the maximum likelihood estimate for $\lambda$, given observations of several counts $k_i$.

Hints: start by taking the product of the expression above over all the observed $k_i$, and then taking the log. Then, differentiate with respect to $\lambda$, set the result equal to 0, and solve for $\lambda$ in terms of the $k_i$.
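Following those hints, the derivation (added here for completeness) runs:

```latex
L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{k_i} e^{-\lambda}}{k_i!}
\quad\Longrightarrow\quad
\log L(\lambda) = \sum_{i=1}^{n} \bigl( k_i \log \lambda - \lambda - \log k_i! \bigr)

\frac{d}{d\lambda} \log L(\lambda) = \frac{\sum_{i} k_i}{\lambda} - n = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} k_i
```

That is, the maximum likelihood estimate of $\lambda$ is the sample mean of the observed counts.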

You observe the samples $k_1, k_2, \ldots, k_n$. What is your maximum likelihood estimate of $\lambda$?

Question 3: Naive Bayes
In this question, we will train a Naive Bayes classifier to predict class labels $Y$ as a function of input features $X_1$, $X_2$, and $X_3$.

We are given the following 15 training points:

X1   X2   X3   Y
 1    0    0   A
 1    0    0   A
 1    0    0   A
 1    1    0   A
 1    0    0   A
 1    0    0   A
 1    0    0   A
 0    0    0   A
 1    0    0   A
 1    1    0   A
 1    0    1   B
 0    0    0   B
 1    0    1   B
 1    0    0   B
 0    1    0   C
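A minimal Python sketch (variable names are ours) that encodes these 15 points and computes the maximum likelihood prior and conditionals asked for below:

```python
from collections import Counter
from fractions import Fraction

# The 15 training points as (X1, X2, X3, Y) tuples, read off the table above.
DATA = [
    (1, 0, 0, "A"), (1, 0, 0, "A"), (1, 0, 0, "A"), (1, 1, 0, "A"),
    (1, 0, 0, "A"), (1, 0, 0, "A"), (1, 0, 0, "A"), (0, 0, 0, "A"),
    (1, 0, 0, "A"), (1, 1, 0, "A"), (1, 0, 1, "B"), (0, 0, 0, "B"),
    (1, 0, 1, "B"), (1, 0, 0, "B"), (0, 1, 0, "C"),
]

label_counts = Counter(y for *_, y in DATA)

# MLE prior: P(Y = y) = count(y) / N.
prior = {y: Fraction(c, len(DATA)) for y, c in label_counts.items()}

# MLE conditional: P(X_f = v | Y = y) = count(X_f = v and Y = y) / count(y).
def conditional(feature_index, value, label):
    matches = sum(1 for row in DATA if row[feature_index] == value and row[3] == label)
    return Fraction(matches, label_counts[label])

print(prior["A"])              # Fraction(2, 3)
print(conditional(0, 1, "A"))  # P(X1=1 | Y=A) = Fraction(9, 10)
```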
Note: please write your answer for each table in one row; there will be 10 rows for this question. Please round values to 3 decimal places.

What is the maximum likelihood estimate of the prior $P(Y)$?

Y    P(Y)
A
B
C
What are the maximum likelihood estimates of the conditional probability distributions? Fill in the table for $P(X_1 \mid Y)$ below (the second and third are done for you).

X1   Y    P(X1 | Y)
0    A
1    A
0    B
1    B
0    C
1    C
 
 
Now consider a new data point $(X_1 = 0,\; X_2 = 0,\; X_3 = 1)$. Use your classifier to determine the joint probability of each cause $Y$ and this new data point, along with the posterior probability of $Y$ given the new data:

Y    P(Y, X1=0, X2=0, X3=1)
A
B
C

Y    P(Y | X1=0, X2=0, X3=1)
A
B
C
What label does your classifier give to the new data point? (Break ties alphabetically). Write capital letters only.
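Continuing the sketch above (it assumes the `DATA`, `prior`, and `conditional` definitions from the previous block), the Naive Bayes joint and posterior for the new point can be computed as:

```python
# Naive Bayes joint: P(y, x1, x2, x3) = P(y) * product over features of P(x_f | y).
def joint(x, label):
    p = prior[label]
    for f, v in enumerate(x):
        p *= conditional(f, v, label)
    return p

new_point = (0, 0, 1)
joints = {y: joint(new_point, y) for y in ("A", "B", "C")}
total = sum(joints.values())
# Posterior by normalizing the joints over the three labels.
posteriors = {y: p / total for y, p in joints.items()}
print(joints)
print(posteriors)
```

The predicted label is the one with the largest posterior (equivalently, the largest joint), breaking ties alphabetically.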
Now use Laplace smoothing with strength $k$ to estimate the prior for the same data.

Y    P_LAP(Y)
A
B
C
Use Laplace smoothing with strength $k$ to estimate the conditional probability distributions below (again, the second and third are done for you).

X1   Y    P_LAP(X1 | Y)
0    A
1    A
0    B
1    B
0    C
1    C
 
Now consider again the new data point $(X_1 = 0,\; X_2 = 0,\; X_3 = 1)$. Use the Laplace-smoothed version of your classifier to determine the joint probability of each cause $Y$ and this new data point, along with the posterior probability of $Y$ given the new data:

Y    P_LAP(Y, X1=0, X2=0, X3=1)
A
B
C

Y    P_LAP(Y | X1=0, X2=0, X3=1)
A
B
C
What label does your (Laplace-Smoothed) classifier give to the new data point? (Break ties alphabetically). Write a single capital letter.
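The Laplace-smoothed versions of the earlier sketch follow the same pattern (again assuming the earlier `DATA` and `label_counts`; the smoothing strength is left as a parameter `k`, since its value did not survive extraction):

```python
from fractions import Fraction

# Smoothed prior: add k pseudo-counts per label; there are 3 labels (A, B, C).
def prior_lap(label, k):
    return Fraction(label_counts[label] + k, len(DATA) + k * 3)

# Smoothed conditional: add k pseudo-counts per feature value; features are binary.
def conditional_lap(feature_index, value, label, k):
    matches = sum(1 for row in DATA if row[feature_index] == value and row[3] == label)
    return Fraction(matches + k, label_counts[label] + k * 2)

print(prior_lap("C", 1))              # (1 + 1) / (15 + 3) = Fraction(1, 9)
print(conditional_lap(2, 1, "C", 1))  # (0 + 1) / (1 + 2)  = Fraction(1, 3)
```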
Question 4: Datasets
When training a classifier, it is common to split the available data into a training set, a hold-out set, and a test set, each of which has a different role.
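For concreteness, a common way to realize such a three-way split (the fractions and seed here are illustrative choices, not from the assignment):

```python
import random

def three_way_split(data, train_frac=0.6, holdout_frac=0.2, seed=0):
    """Shuffle the data, then cut it into training, hold-out, and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_holdout = int(len(rows) * holdout_frac)
    return (rows[:n_train],
            rows[n_train:n_train + n_holdout],
            rows[n_train + n_holdout:])

train, holdout, test = three_way_split(range(100))
print(len(train), len(holdout), len(test))  # 60 20 20
```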

 

Which data set is used to learn the conditional probabilities?
Training Data
Hold-Out Data
Test Data

Which data set is used to tune the Laplace smoothing hyperparameters?
Training Data
Hold-Out Data
Test Data

Which data set is used for quantifying performance results?
Training Data
Hold-Out Data
Test Data

Question 5: Linear Separability
Consider the data in the figure below.

The data is plotted as a function of two features, $X_1$ and $X_2$. As plotted, the data is not linearly separable. Which of the following candidate features $X_3$, when added, would cause the data to be linearly separable? Choose all answers that apply.

$X_3 = X_1$

$X_3 = \sin(X_1)$

$X_3 = X_1^2 + X_2^2$

$X_3 = X_1 X_2$

$X_3 = X_1^2$

$X_3 = X_2$

$X_3 = X_1 + X_2$

$X_3 = 1$ if $X_1 \ge c_1$ and $X_2 \ge c_2$, and $X_3 = 0$ otherwise
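Since the figure is not reproduced here, the following sketch is only a generic illustration of the idea, using XOR-style data as an assumed example (not the assignment's actual figure): adding a product feature can lift non-separable 2D data into a 3D space where a linear separator exists.

```python
# XOR-style data: not linearly separable using (x1, x2) alone.
points = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]

# Add the candidate feature x3 = x1 * x2.
lifted = [((x1, x2, x1 * x2), label) for (x1, x2), label in points]

# In the lifted space, the hyperplane w.x + b = 0 with the weights below
# puts every positive point on one side and every negative point on the other.
w, b = (1, 1, -2), -0.5
for x, label in lifted:
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    assert (score > 0) == (label > 0)
print("linearly separable after adding x3 = x1 * x2")
```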
