$30
This assignment concerns the scenario considered in the slides, where a coin Z is tossed to choose between one of two other coins A and B. The chosen coin is then tossed N times. The whole procedure is repeated D times.From the complete data set it would be easy to determine the probabilities of Z indicating A, and for each of A and B, their probabilities of turning up heads. The hidden variable variant is where as data you just know on each trial what the N coin tosses yielded with the chosen coin, but you don’t know which coin was being tossed: the outcome on Z is hidden.
Suppose the following data set, where there are just 2 trials, and the chosen coin is tossed just 2 times (this was considered in the slides):
d Z tosses of chosen coin
1 ? H H
2 ? T T
Below there is a detailed working through of a first iteration of the EM procedure for estimating parameters applied to this. For the assignment you have to give a similar working through of the second iteration. Suppose the following notation
θa
P(Z = a)
θb
P(Z = b)
θh|a
P(h|a) ie. the head prob of coin A
θt|a
P(t|a) ie. the tail prob of coin A
θh|b
P(h|b) ie. the head prob of coin B
θt|b
P(t|b) ie. the tail prob of coin B
#(d,h)
num of heads in trial d
#(d,t)
num of tails in trial d
If Xd is a particular trial – ie. outcomes of the N tosses of a chosen coin – the probability of the version where the chosen coin was A is given by
P(Z = a,Xd) = P(Z = a) × P(h|a)#(d,h) × P(t|a)#(d,t)
= θa × θh#(|ad,h) × θt#(|ad,t)
and likewise the probability of the version where the chosen coin was B is given by
P(Z = b,Xd) = P(Z = b) × P(h|b)#(d,h) × P(t|b)#(d,t)
= θb × θh#(|bd,h) × θt#(|bd,t)
From these joint probability formula can work out the conditional probabilities for the hidden variable:
P(Z = a|Xd)
=
P(Z = a,Xd)
Pc P(Z = c,Xd)
P(Z = b|Xd)
=
P(Z = b,Xd)
Pc P(Z = c,Xd)
In the slides we used the notation γd(Z) for this, where d is index of the data item.
On the particular data set at hand the joint probability formulae are particularly simple
P(Z = a,X1) = θa × θh2|a P(Z = b,X1) = θb × θh2|b
P(Z = a,X2) = θa × θt2|a
P(Z = b,X2) = θb × θt2|b
and thus the conditional probalities are:
γ1(a) =
θa × θh2|a
θa × θh2|a + θb × θh2|b
γ1(b) =
θb × θh2|b
θa × θh2|a + θb × θh2|b
γ2(a) =
θa × θt2|a
θa × θt2|a + θb × θt2|b
γ2(b) =
θb × θt2|b
θa × θt2|a + θb × θt2|b
To carry out an EM estimation of the parameters given the data we need some initial setting of the parameters. We will suppose this is:
,
θh|a = 43, θt|a = 14 θh|b = 21, θt|b = 21
ITERATION 1
For each piece of data have to first compute the conditional probabilities of the hidden variable given the data:
d = 1 : p(Z = A,HH) = 0.5 × 0.75 × 0.75 = 0.28125 d = 1 : p(Z = B,HH) = 0.5 × 0.5 × 0.5 = 0.125 d = 1 :→ sum = 0.40625 d = 1 :→ γ1(A) = 0.692308 d = 1 :→ γ1(B) = 0.307692 d = 2 : p(Z = A,TT) = 0.5 × 0.25 × 0.25 = 0.03125 d = 2 : p(Z = B,TT) = 0.5 × 0.5 × 0.5 = 0.125 d = 2 :→ sum = 0.15625
d = 2 :→ γ2(A) = 0.2
d = 2 :→ γ2(B) = 0.8
Armed with these γ values we now treat each data item Xd as if it splits into two versions, one filling out Z as a, and with ’count’ γd(a), and one filling out Z as b, and with ’count’ γd(b).
We then go through this virtual corpus accumulating counts of certain kinds of event. For events of hidden variable being Z = a and Z = b we get
E(A) = γ1(a) + γ2(a) = 0.692308+ 0.2 = 0.892308
E(B) = γ1(b) + γ2(b) = 0.307692+ 0.8 = 1.10769
Then we need to go through the Z = a cases and count types of coin toss, and likewise for Z = b cases
E(A,H) = Pd γd(a)#(d,h) = (0.692308× 2 + 0.2 × 0) = 1.38462
E(A,T) = Pd γd(a)#(d,t) = (0.692308 × 0 + 0.2 × 2) = 0.4
E(B,H) = Pd γd(b)#(d,h) = (0.307692× 2 + 0.8 × 0) = 0.615385 E(B,T) = Pd γd(b)#(d,t) = (0.307692× 0 + 0.8 × 2) = 1.6 Then from these ’expected’ counts we re-estimate parameters
est(θa) = E(A)/2 = 0.892308/2 = 0.446154 est(θb) = E(B)/2 = 1.10769/2 = 0.553846 est(θh|a) = E(A,H)/PX[E(A,X)] = 1.38462/(1.38462+ 0.4) = 1.38462/1.78462 = 0.775862 est(θt|a) = E(A,T)/PX[E(A,X)] = 0.4/(1.38462+ 0.4) = 0.4/1.78462 = 0.224138 est(θh|b) = E(B,H)/PX[E(B,X)] = 0.615385/(0.615385+1.6) = 0.615385/2.21538 = 0.277778 est(θt|b) = E(B,T)/PX[E(B,X)] = 1.6/(0.615385+ 1.6) = 1.6/2.21538 = 0.722222
Note the denominator 2 in the re-estimation formula for θa. We could have written the denominator as E(A) + E(B), but this is Pd γd(a) + Pd γd(b) = Pd[γd(a) + γd(b)] = Pd[1] = 2