CSE176 Lab 2: Datasets and Gaussian Classifier
Construct your own toy datasets in 1D and 2D, such as Gaussian classes with more or less overlap, or classes with curved shapes as in the 2moons dataset. You will also use the MNIST dataset of handwritten digits.
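As one possible starting point, here is a minimal sketch (MATLAB/Octave) of generating such toy datasets; all sample sizes, means, covariances and variable names are illustrative choices, not prescribed by the lab:

% Two 2D Gaussian classes with some overlap (all parameter values illustrative).
N1 = 200; N2 = 200;                          % points per class
mu1 = [0 0]; S1 = [1 0.3; 0.3 1];            % class 1 mean and covariance
mu2 = [2 1]; S2 = [1 -0.2; -0.2 0.5];        % class 2; move mu2 closer for more overlap
X = [randn(N1,2)*chol(S1) + mu1; ...         % sample N(mu,S) via the Cholesky factor
     randn(N2,2)*chol(S2) + mu2];
Y = [ones(N1,1); 2*ones(N2,1)];              % labels in {1,2}

% A 1D dataset: two Gaussian classes on the real line.
X1d = [0.7*randn(N1,1); 2 + 1.2*randn(N2,1)];

% A 2moons-style dataset: two interleaved noisy half-circles.
t1 = pi*rand(N1,1); t2 = pi*rand(N2,1);
Xm = [cos(t1) sin(t1); 1-cos(t2) 0.5-sin(t2)] + 0.1*randn(N1+N2,2);
Ym = Y;                                      % same labels as above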
I. Using a Gaussian classifier

In a Gaussian classifier, for each class k = 1,...,K, the class-conditional probability distribution p(x|Ck) for a point x ∈ R^D is a Gaussian distribution with parameters (µk, Σk) (mean vector of D × 1 and covariance matrix of D × D). Also, each class has another parameter, its proportion πk = p(Ck) ∈ [0,1] (its prior probability). We will consider 6 types of covariance matrix Σk:
            Non-shared (separately for each class)    Shared (equal for all classes)
full        ’F’: Σk (unconstrained)                   ’f’: Σk = Σ ∀k
diagonal    ’D’: Σk = Dk                              ’d’: Σk = D ∀k
isotropic   ’I’: Σk = σk²I                            ’i’: Σk = σ²I ∀k

To use a Gaussian classifier, we need to solve two problems:
1. Training: to learn the parameters from a training set. This is given by the maximum likelihood estimate (MLE). For the prior distribution and mean vectors, this is:

   prior distribution: πk = Nk/N        mean vector: µk = (1/Nk) ∑_{n ∈ class k} xn

where Nk is the number of points in class k, N is the total number of training points, and the sum above is over the data points in class k. For the covariance matrix, the MLE depends on the type of covariance:
’F’: Σk = (1/Nk) ∑_{n ∈ class k} (xn − µk)(xn − µk)ᵀ (the covariance of the points in class k)
’f’: Σ = ∑_{k=1..K} πk Σk
’D’: Dk = diag(Σk) (the diagonal elements of Σk)
’d’: D = ∑_{k=1..K} πk Dk = diag(Σ)
’I’: Σk = σk²I, where σk² = (1/D) ∑_{d=1..D} σkd² and σkd² = dth diagonal element of Σk
’i’: σ² = ∑_{k=1..K} πk σk²
So the shared-case MLE is the weighted average of the non-shared MLEs over the classes, using the πk as weights. Also, we ensure each covariance matrix is full rank by adding a small positive number, e.g. 10⁻¹⁰, to its diagonal (as in the sketch below).
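Putting the training step together, here is a minimal sketch of the MLE for all six covariance types; the function name, signature and variable names are assumptions for illustration, not the lab's provided code:

function [pi_k, mu, Sigma] = gauss_train(X, Y, cov_type, K)
% X: N x D data, Y: N x 1 labels in {1..K}, cov_type: one of 'F','f','D','d','I','i'.
[N, D] = size(X);
pi_k = zeros(K,1); mu = zeros(K,D); C = cell(K,1);
for k = 1:K
  Xk = X(Y==k, :); Nk = size(Xk,1);
  pi_k(k) = Nk / N;                               % prior: pi_k = Nk/N
  mu(k,:) = mean(Xk, 1);                          % mean vector
  Xc = Xk - mu(k,:);
  C{k} = (Xc' * Xc) / Nk;                         % full MLE covariance of class k
end
switch cov_type
  case 'F', Sigma = C;
  case 'D', Sigma = cellfun(@(c) diag(diag(c)), C, 'UniformOutput', false);
  case 'I', Sigma = cellfun(@(c) mean(diag(c))*eye(D), C, 'UniformOutput', false);
  otherwise                                       % shared types: pi_k-weighted average
    S = zeros(D);
    for k = 1:K, S = S + pi_k(k) * C{k}; end
    if cov_type == 'd', S = diag(diag(S)); end
    if cov_type == 'i', S = mean(diag(S)) * eye(D); end
    Sigma = repmat({S}, K, 1);
end
for k = 1:K, Sigma{k} = Sigma{k} + 1e-10*eye(D); end   % keep each matrix full rank
end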
2. Testing: to compute the posterior probabilities for a test point x (typically not in the training set, although it can be any point in R^D). By Bayes’ rule, p(Ck|x) = πk p(x|Ck) / ∑_{j=1..K} πj p(x|Cj). This is done by GMpdf.m, see lab02.m.
Given these posterior probabilities, we can classify x as argmax_{k∈{1,...,K}} p(Ck|x), or construct an ROC curve (for K = 2) or a confusion matrix (for any K) over a test set.
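A minimal sketch of this testing computation (plain MATLAB/Octave, not the provided GMpdf.m, whose interface may differ; all names are assumptions), evaluating the Gaussian densities in the log domain for numerical stability:

function [P, yhat] = gauss_classify(Xt, pi_k, mu, Sigma)
% Xt: M x D test points. Returns posteriors P (M x K) and predicted labels yhat.
[M, D] = size(Xt); K = numel(pi_k);
logp = zeros(M, K);
for k = 1:K
  Xc = Xt - mu(k,:);
  R = chol(Sigma{k});                        % Sigma{k} = R'R, R upper triangular
  z = Xc / R;                                % rows solve z*R = Xc (Mahalanobis part)
  logp(:,k) = log(pi_k(k)) - 0.5*sum(z.^2, 2) ...
              - sum(log(diag(R))) - (D/2)*log(2*pi);  % log pi_k + log N(x; mu_k, Sigma_k)
end
P = exp(logp - max(logp, [], 2));            % subtract the max for stability
P = P ./ sum(P, 2);                          % posteriors p(Ck|x)
[~, yhat] = max(P, [], 2);                   % classify by the largest posterior
end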
For each classifier, we plot the following figures:
• For 1D datasets: p(x|C), p(x|C)p(C) and p(C|x) for each class C = 1,...,K, and maxC p(C|x), for each x value.
• For 2D datasets: contour plot of p(x|C) for each class and class boundaries.
• For any dataset: we plot either the confusion matrix (for any K) or the ROC curve (for K = 2) and give the area-under-the-curve (AUC) value; a minimal sketch of this computation follows this list.
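For K = 2, the ROC curve can be traced by sweeping a threshold on the posterior of one class. A minimal sketch (variable names assumed; with the Statistics Toolbox, perfcurve and confusionmat do the same jobs), where s holds p(C2|x) per test point and ytrue, yhat are column vectors of true and predicted labels:

[~, idx] = sort(s, 'descend');               % sweep the threshold from high to low
pos = (ytrue(idx) == 2);                     % class 2 taken as "positive"
tpr = [0; cumsum(pos)  / sum(pos)];          % true-positive rate at each threshold
fpr = [0; cumsum(~pos) / sum(~pos)];         % false-positive rate (ties ignored here)
auc = trapz(fpr, tpr);                       % area under the ROC curve
plot(fpr, tpr); xlabel('FPR'); ylabel('TPR');

% Confusion matrix for any K (rows: true class, columns: predicted class):
Cm = accumarray([ytrue yhat], 1, [K K]);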
Explore the classifiers and plots with different datasets, different numbers of classes, classes with more or less overlap, etc. See the end of file lab02.m for suggestions of things to explore.