Q1. Let x be a d-dimensional binary vector with a multivariate Bernoulli distribution

P(x|θ) = ∏_{i=1}^{d} θ_i^{x_i} (1 − θ_i)^{1−x_i}    (1)
where θ = (θ_1, ..., θ_d)^t is an unknown parameter vector, θ_i being the probability that x_i = 1. Show that, given n i.i.d. samples x_1, ..., x_n, the maximum likelihood estimate for θ is

θ̂ = (1/n) ∑_{k=1}^{n} x_k    (2)
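Before attempting the derivation, it can help to see the claim numerically. The sketch below is a sanity check, not a substitute for the requested proof; the parameter values, sample size, and variable names are illustrative choices of ours, not part of the problem. It draws samples from a known multivariate Bernoulli and confirms by grid search that the per-dimension sample mean maximizes the log-likelihood of Eq. (1).

```python
import math
import random

random.seed(0)

# Illustrative setup (our own choices): d = 3 dimensions, n = 2000 samples
# drawn from a multivariate Bernoulli with known parameters theta_true.
theta_true = [0.2, 0.5, 0.8]
n = 2000
data = [[1 if random.random() < t else 0 for t in theta_true]
        for _ in range(n)]

# Candidate estimate: the per-dimension sample mean.
theta_hat = [sum(x[i] for x in data) / n for i in range(len(theta_true))]

def dim_loglik(i, t):
    """Log-likelihood contribution of dimension i at parameter value t."""
    s = sum(x[i] for x in data)        # number of 1s observed in dimension i
    return s * math.log(t) + (n - s) * math.log(1 - t)

# The likelihood factorizes across dimensions, so check each one separately:
# the grid argmax should sit within one grid step of the sample mean.
for i, th in enumerate(theta_hat):
    grid_best = max((j / 1000 for j in range(1, 1000)),
                    key=lambda t: dim_loglik(i, t))
    assert abs(grid_best - th) < 2e-3
```

Because the log-likelihood is concave in each θ_i, the grid search can only land on the grid point nearest the true maximizer, which is why a one-grid-step tolerance suffices.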
Q2. Show that if our model is poor, the maximum likelihood classifier we derive is not the best, by exploring the following example. Suppose we have two equally probable categories. Further, we know that p(x|ω_1) ∼ N(0,1) but assume that p(x|ω_2) ∼ N(µ,1). Imagine, however, that the true underlying distribution is p(x|ω_2) ∼ N(1, 10^6).
a. What is the value of the maximum likelihood estimate µ_ML in our poor model, given a large amount of data?
b. What is the decision boundary arising from this maximum likelihood estimate in the poor model?
c. Give an expression for the optimal Bayes decision boundary given the true underlying distributions p(x|ω_1) ∼ N(0,1) and p(x|ω_2) ∼ N(1, 10^6). Compare this with part b.
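A simulation makes the gap between the two boundaries concrete. The sketch below is illustrative only (sample sizes and names are our own choices, and the grid derivation in part c follows the standard equal-density argument, not any expression from the text): it fits µ_ML under the poor unit-variance model, applies the resulting midpoint threshold, and compares its empirical error against the two-sided rule obtained from the true densities.

```python
import math
import random

random.seed(1)

# Class 1 data really come from N(0,1); class 2 data come from the TRUE
# distribution N(1, 10^6), although the model assumes N(mu, 1).
n = 200_000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(1.0, 1000.0) for _ in range(n)]   # std = sqrt(10^6)

# (a) Under the assumed model, mu_ML is the sample mean of the class-2
# data; it converges to the true mean, 1 (slowly, given the huge variance).
mu_ml = sum(x2) / n

# (b) Two unit-variance Gaussians with equal priors give a midpoint rule:
# decide omega_2 whenever x > mu_ML / 2, a single threshold near 0.5.
def err_threshold(b):
    wrong = sum(x > b for x in x1) + sum(x <= b for x in x2)
    return wrong / (2 * n)

# (c) With the true densities, omega_1 wins only on a bounded interval
# around 0: equating the densities (the class-2 exponent is ~0 near the
# origin) gives |x| = sqrt(2 ln 1000), roughly 3.72.
a = math.sqrt(2 * math.log(1000.0))
def err_interval(half_width):
    wrong = (sum(abs(x) >= half_width for x in x1)
             + sum(abs(x) < half_width for x in x2))
    return wrong / (2 * n)

print(f"mu_ML ~ {mu_ml:.2f}")
print(f"poor-model threshold error:   {err_threshold(mu_ml / 2):.3f}")
print(f"true-density interval error:  {err_interval(a):.4f}")
```

The poor model's threshold misclassifies a large fraction of both classes, while the interval rule derived from the true densities errs only in the thin tails, which is the point of the exercise.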
Q3. Suppose we employ a novel method for estimating the mean of a data set D = {x_1, x_2, ..., x_n}: we assign the mean to be the value of the first point in the set, i.e., x_1.
a. Show that this method is unbiased.
b. State why this method is nevertheless highly undesirable.
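A quick simulation illustrates both parts at once. The distribution, sample sizes, and names below are our own illustrative choices: over many repeated datasets, the average of the x_1 estimator sits on the true mean (no bias), but its spread never shrinks with n the way the sample mean's does.

```python
import random
import statistics

random.seed(2)

# Repeat the estimation experiment many times and record both estimators.
true_mean, n, trials = 5.0, 100, 20_000
first_point, sample_mean = [], []
for _ in range(trials):
    data = [random.gauss(true_mean, 1.0) for _ in range(n)]
    first_point.append(data[0])           # the "novel" estimator: just x_1
    sample_mean.append(sum(data) / n)     # the usual sample mean

# (a) Both averages sit near the true mean of 5, so both are unbiased...
print(statistics.mean(first_point), statistics.mean(sample_mean))
# (b) ...but x_1 keeps variance sigma^2 while the sample mean shrinks it
# to sigma^2 / n, which is why the novel method is highly undesirable.
print(statistics.variance(first_point), statistics.variance(sample_mean))
```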
Ungraded
Q4. Derive the MLE for the binomial distribution X ∼ Bin(N, µ):

P(X = m) = C(N, m) µ^m (1 − µ)^{N−m}    (3)
The distribution gives the probability of observing m successes (say, heads) in N independent Bernoulli trials. Since the heads can appear anywhere among the N trials, the binomial coefficient counts the number of such arrangements.
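As with Q1, the answer can be checked numerically before deriving it. The snippet below is a sanity check, not the requested derivation; N and m are example values of our own choosing. A fine grid search over the log of Eq. (3) should land on the closed-form maximizer µ̂ = m/N.

```python
import math

# Example values (not from the text): 13 heads in 40 trials.
N, m = 40, 13

def loglik(mu):
    """Log of Eq. (3): log C(N,m) + m log(mu) + (N - m) log(1 - mu)."""
    return (math.log(math.comb(N, m))
            + m * math.log(mu) + (N - m) * math.log(1 - mu))

# The log-likelihood is concave in mu, so a fine grid search lands on
# (or immediately next to) the closed-form answer m / N.
mu_best = max((j / 10_000 for j in range(1, 10_000)), key=loglik)
print(mu_best, m / N)   # both ~0.325
```

Note that the C(N, m) term does not depend on µ, so it drops out when setting the derivative to zero; it only shifts the log-likelihood by a constant.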