Recall the guidance regarding plagiarism in the course introduction: it applies to this homework, and any detected plagiarism may result in penalties ranging from loss of marks to suspension.
1 [50 Marks] Expectation Maximisation
Consider a model with continuous observed variables x ∈ R^D and hidden variables t ∈ {0,1}^K and z ∈ R^Q. The hidden variable t is a K-dimensional binary random variable with a 1-of-K representation, where t_k ∈ {0,1} and Σ_k t_k = 1, i.e. exactly one component of t is equal to 1 while all others are equal to 0. The prior distribution over t is given by
p(t_k = 1|θ) = π_k, (1)
where the mixing weights satisfy 0 ≤ π_k ≤ 1 and Σ_k π_k = 1. This can also be written in the form
p(t|θ) = ∏_{k=1}^{K} π_k^{t_k}. (2)
The hidden variable z is a Q-dimensional continuous random variable with prior distribution
p(z|θ) = p(z) = N(z|0, I). (3)
The conditional likelihood of x given z and t_k = 1 is a Gaussian defined as
p(x|z, t_k = 1, θ) = N(x|W_k z + b_k, Ψ), (4)
where W_k ∈ R^{D×Q}, b_k ∈ R^D, and Ψ ∈ R^{D×D} is a diagonal covariance matrix. Another way to express this is
p(x|z, t, θ) = ∏_{k=1}^{K} N(x|W_k z + b_k, Ψ)^{t_k}. (5)
Let us collectively denote the set of all observed variables by X and the hidden variables by Z and T. The joint distribution is denoted by p(Z,T,X|θ), and is governed by the set of model parameters θ = {π_k, W_k, b_k}_{k=1}^{K} ∪ {Ψ}.
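To make the generative process in equations (1)-(5) concrete, here is a minimal sampling sketch in Python/numpy (the language of the notebook in the practical part); the dimensions and parameter values are arbitrary placeholders, not values supplied by the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Q, K = 5, 2, 3  # placeholder dimensions

# Placeholder parameters θ = {π_k, W_k, b_k, Ψ}
pi = np.full(K, 1.0 / K)                 # mixing weights, sum to 1
W = rng.normal(size=(K, D, Q))           # factor loadings W_k
b = rng.normal(size=(K, D))              # offsets b_k
Psi = np.diag(rng.uniform(0.1, 1.0, D))  # diagonal noise covariance

def sample_x():
    """Ancestral sampling: t ~ p(t), z ~ p(z), then x ~ p(x|z, t)."""
    k = rng.choice(K, p=pi)              # eq. (1): p(t_k = 1) = π_k
    z = rng.normal(size=Q)               # eq. (3): z ~ N(0, I)
    x = rng.multivariate_normal(W[k] @ z + b[k], Psi)  # eq. (4)
    return x, k, z

x, k, z = sample_x()
print(f"sampled from component {k}: x = {x}")
```

Sampling t and z first and then drawing x conditioned on them is exactly the factorisation p(t) p(z) p(x|z,t) that the graphical model in part (a) should encode.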
In the questions below, unless otherwise stated explicitly, you must show all your working. Omission of details or derivations may yield a reduction in the corresponding marks.
a) [5 marks] Draw the graphical representation for this probabilistic model, making sure to include the parameters θ in the graph. (Non-random variables can be included similarly to random variables, except that circles are not drawn around them).
b) [5 marks] In terms of K,D,Q, give an expression for the number of parameters we are required to estimate under this model.
c) [10 marks] In the E-step of the expectation maximisation (EM) algorithm, we are required to compute the expected sufficient statistics of the posterior over hidden variables. The posterior responsibility of mixture component k for a data point n is expressed as
r_nk def= p(t_nk = 1|x_n, θ^old) = E_{p(t_nk|x_n, θ^old)}[t_nk]. (6)
The conditional posterior over the local hidden factor z_n is a Gaussian with mean m_nk and covariance C_nk,
p(z_n|t_nk = 1, x_n, θ^old) = N(z_n|m_nk, C_nk). (7)
The covariance is given by
C_nk = (I + W_k^T Ψ^{-1} W_k)^{-1}, (8)
where
m_nk def= E_{p(z_n|t_nk=1, x_n, θ^old)}[z_n], and S_nk def= E_{p(z_n|t_nk=1, x_n, θ^old)}[z_n z_n^T]. (9)
i) [5 marks] Give analytical expressions for the responsibilities r_nk and the expected sufficient statistics m_nk and S_nk in terms of the old model parameters θ^old.
ii) [1 mark] To de-clutter notation and simplify subsequent analysis, it is helpful to introduce an augmented factor loading matrix and hidden factor vector,
W̃_k def= [W_k b_k], and z̃ def= [z^T 1]^T. (10)
Accordingly, give expressions for the sufficient statistics of the conditional posterior on the augmented hidden factor vectors,
m̃_nk def= E_{p(z̃_n|t_nk=1, x_n, θ^old)}[z̃_n], and S̃_nk def= E_{p(z̃_n|t_nk=1, x_n, θ^old)}[z̃_n z̃_n^T]. (11)
Note you need only express these in terms of m_nk and S_nk.
iii) [4 marks] Show that the sufficient statistics of the joint posterior factorise as follows,
E_{p(z_n, t_n|x_n, θ^old)}[t_nk z̃_n] = r_nk m̃_nk, and E_{p(z_n, t_n|x_n, θ^old)}[t_nk z̃_n z̃_n^T] = r_nk S̃_nk.
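As a numerical companion to part (c), the following sketch computes the posterior quantities of equations (6)-(11) for a single data point, again with placeholder parameters. It relies on the standard Gaussian marginalisation and conditioning identities; the closed forms for r_nk and m_nk that appear here are precisely what part (i) asks you to derive, so treat this as a way to check your derivation numerically rather than as a substitute for it.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
D, Q, K = 5, 2, 3
pi = np.full(K, 1.0 / K)
W = rng.normal(size=(K, D, Q))
b = rng.normal(size=(K, D))
Psi = np.diag(rng.uniform(0.1, 1.0, D))
Psi_inv = np.linalg.inv(Psi)
x = rng.normal(size=D)  # a single observation x_n

# Responsibilities r_nk (eq. 6): proportional to π_k p(x_n|t_nk = 1, θ),
# where z has been marginalised out of eqs. (3) and (4).
marg = np.array([multivariate_normal.pdf(x, mean=b[k], cov=W[k] @ W[k].T + Psi)
                 for k in range(K)])
r = pi * marg
r /= r.sum()

# Posterior over z_n given t_nk = 1 (eqs. 7-9).
C = np.stack([np.linalg.inv(np.eye(Q) + W[k].T @ Psi_inv @ W[k]) for k in range(K)])
m = np.stack([C[k] @ W[k].T @ Psi_inv @ (x - b[k]) for k in range(K)])
S = np.stack([C[k] + np.outer(m[k], m[k]) for k in range(K)])  # E[z z^T] = C + m m^T

# Augmented statistics (eqs. 10-11): z̃ = [z^T 1]^T.
m_aug = np.concatenate([m, np.ones((K, 1))], axis=1)
S_aug = np.stack([np.block([[S[k], m[k][:, None]],
                            [m[k][None, :], np.ones((1, 1))]]) for k in range(K)])

# Part (iii): joint-posterior statistics are the responsibility-weighted
# conditional statistics, e.g. E[t_nk z̃_n] = r_nk m̃_nk.
E_t_z = r[:, None] * m_aug
print("responsibilities:", r)
```

A direct Monte Carlo average of t_nk z̃_n over samples from the joint posterior should converge to r[:, None] * m_aug, which is one way to convince yourself of the factorisation in part (iii).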
d) [10 marks] Write down the full expression for the expected complete-data log likelihood (also known as the auxiliary function) for this model,
Q(θ, θ^old) def= E_{p(Z,T|X,θ^old)}[log p(Z,T,X|θ)]. (12)
e) [20 marks] Optimise the auxiliary function Q w.r.t. the model parameters θ to obtain the M-step updates. Show all your working and highlight each individual update equation.
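Equation (12) is just a posterior expectation, so any candidate closed form for Q can be sanity-checked by Monte Carlo. The sketch below, which reuses x, pi, W, b, Psi, r, m, and C from the E-step sketch above, estimates Q(θ^old, θ^old) for a single data point; the expression you derive in part (d), evaluated at θ = θ^old and summed over n, should agree with such estimates up to sampling noise.

```python
# Monte Carlo estimate of eq. (12) for one data point at θ = θ^old:
# sample (t_n, z_n) from the posterior, average log p(z_n, t_n, x_n|θ).
n_samples = 20_000
total = 0.0
for _ in range(n_samples):
    k = rng.choice(K, p=r)                   # t_n ~ p(t_n|x_n, θ^old)
    z = rng.multivariate_normal(m[k], C[k])  # z_n ~ p(z_n|t_nk = 1, x_n, θ^old)
    total += (np.log(pi[k])                                                     # log p(t_n|θ)
              + multivariate_normal.logpdf(z, mean=np.zeros(Q), cov=np.eye(Q))  # log p(z_n|θ)
              + multivariate_normal.logpdf(x, mean=W[k] @ z + b[k], cov=Psi))   # log p(x_n|z_n, t_n, θ)
print("Monte Carlo estimate of Q:", total / n_samples)
```

The same check is useful in part (e): after each M-step update, Q evaluated at the new parameters should not decrease.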
2 [50 Marks] Practical Part
See the Jupyter notebook comp9418 ass2.ipynb.