$25
Implement the EM algorithm for a p-dimensional Gaussian mixture model with G components:
G
X
pk · N(x;µk,Σ). k=1
Store the parameters as a list in R with three components
• prob: a G-dimensional probability vector;
• mean: a p-by-G matrix with the k-th column being µk, the p-dimensional mean for the k-th Gaussian component;
• Sigma: a p-by-p covariance matrix shared by all G components.
Your code should have the following structure.
Estep <- function(data, G, para){
# Your Code
# return the n-by-G probability matrix
}
Mstep <- function(data, G, para, post.prob){
# Your Code
# Return the updated parameters
}
myEM <- function(data, T, G, para){ for(t in 1:T){ post.prob <- Estep(data, G, para) para <- Mstep(data, G, para, post.prob)
} return(para)
}
You should test your code on the faithful data from the R package mclust with G = 2. The estimated parameters from your algorithm and the one from mclust after T = 10 iterations should be the same.
library(mclust)
n <- nrow(faithful) Z <- matrix(0, n, 2)
Z[sample(1:n, 120), 1] <- 1 Z[, 2] <- 1 - Z[, 1]
ini0 <- mstep(modelName="EEE", faithful, Z)$parameters
# Output from my EM alg
para0 <- list(prob = ini0$pro, mean = ini0$mean,
Sigma = ini0$variance$Sigma)
myEM(T=10, para=para0)
# Output from mclust
Rout <- em(modelName = "EEE", data = faithful, control = emControl(eps=0, tol=0, itmax = 10), parameters = ini0)$parameters
list(Rout$pro, Rout$mean, Rout$variance$Sigma)