DataAnalytics-3333 Assignment 3 - Solved
Ravish Kamath, 213893664

Question 1
Given the following data, please apply the Fisher linear discriminant method. There are two classes, C1 and 
C2. The class C1 has five observations: 

 
The class C2 has six observations: 

 
a) Compute the mean of the first class µ1, and the mean of the second class µ2. 
b) Compute the within class variation Sw = S1 + S2, where S1 and S2 are the variations within C1 and C2, 
respectively. 
c) Find the optimum projection v which can lead to the maximum separation of the projected observations. 
d) Find the cutoff point (1/2)v^T µ1 + (1/2)v^T µ2. 
e) Given a new observation (5, 3), which class does it belong to? 
Solution 
Part A 
# c1 and c2 hold the Class 1 and Class 2 observations from the question (5 x 2 and 6 x 2 matrices).
mu_1 = c(mean(c1[,1]), mean(c1[,2]))
mu_2 = c(mean(c2[,1]), mean(c2[,2]))
cbind(mu_1, mu_2)
## mu_1 mu_2 
## [1,] 4 4.5 
## [2,] 8 3.0 
As shown, mu_1 and mu_2 are the column means of C1 and C2, respectively. 
Part B 
# S1 = (n1 - 1)*cov(c1) and S2 = (n2 - 1)*cov(c2), with n1 = 5 and n2 = 6, are the
# within-class scatter matrices (sums of squared deviations, not sample covariances).
S_1 = 4*cov(c1)
S_2 = 5*cov(c2)
S_w = S_1 + S_2
S_w
## [,1] [,2] 
## [1,] 27.5 35 
## [2,] 35.0 62 
As shown above, Sw is:
Sw = [ 27.5  35.0 ]
     [ 35.0  62.0 ]
Part C 
S_w_inv = solve(S_w)
# Optimal projection direction: v is proportional to S_w^{-1} (mu_1 - mu_2).
v = t(S_w_inv %*% (mu_1 - mu_2))
v
## [,1] [,2] 
## [1,] -0.4291667 0.3229167 
Here our projection vector v is:
v = (-0.4291667, 0.3229167)
Part D 
(1/2)*v%*%(mu_1 + mu_2) 
## [,1] 
## [1,] -0.04791667 
Here the cutoff point is -0.04791667. 
Part E 
new = c(5,3) 
v%*%new 
## [,1] 
## [1,] -1.177083 
Projecting the new observation gives v^T x = -1.177083, which lies below the cutoff of -0.04791667, so (5, 3) is classified into Class 2. 
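For completeness, the decision rule from Parts C-E can be wrapped into a small helper. This is a sketch of ours (the name classify_fisher is not part of the assignment), reusing the v and cutoff computed above.
# Sketch of the Fisher decision rule from Parts C-E (helper name is ours).
# v projects C1 high (v %*% mu_1 is about 0.867) and C2 low (v %*% mu_2 is about -0.962),
# so a projection above the midpoint cutoff is assigned to Class 1, below it to Class 2.
cutoff = as.numeric((1/2) * v %*% (mu_1 + mu_2))
classify_fisher = function(x) {
  if (as.numeric(v %*% x) > cutoff) "C1" else "C2"
}
classify_fisher(c(5, 3)) # "C2", matching Part E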
Question 2 
In the forensic glass example, we classify the type of the glass shard into six categories based on three 
predictors. The categories are: WinF, WinNF, Veh, Con, Tabl, and Head. The three predictors are the mineral 
concentrations of Na, Mg, and Al. Attached is the R output of the multinomial logistic regression. The R 
function vglm considers the last group as the baseline category. The estimates of the five intercepts and 
the estimates of the 15 slopes are provided in the output. The model contains 20 parameters, which are 
estimated on 214 cases. 
a) Let pij denote the probability that the ith observation belongs to class j. Formulate the logistic model for 
the five log odds: log(pi1/pi6), log(pi2/pi6), log(pi3/pi6), log(pi4/pi6), and log(pi5/pi6). 
b) The ith piece of glass shard is obtained and the Na, Mg, Al concentrations are: 0.20, 0.06, and 0.11, 
respectively. Calculate the probabilities pi1, pi2, pi3, pi4, pi5, and pi6. Based on the predicted class probability, 
which type of glass does this piece of glass belong to? 
Solution 
Part A 
log(pi1/pi6) = 1.613703 + (-2.483557)Na + (3.842907)Mg + (-3.719793)Al
log(pi2/pi6) = 3.444128 + (-2.031676)Na + (1.697162)Mg + (-1.704689)Al
log(pi3/pi6) = 0.999448 + (-1.409505)Na + (3.291350)Mg + (-3.006102)Al
log(pi4/pi6) = 0.067163 + (-2.382624)Na + (0.051466)Mg + (0.263510)Al
log(pi5/pi6) = 0.339579 + (0.151459)Na + (0.699274)Mg + (-1.394559)Al
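For reference, a fit of this kind could be produced roughly as sketched below. This assumes the forensic glass data fgl from the MASS package and the VGAM package; the assignment's coefficients appear to be based on rescaled concentrations, so a fit on the raw fgl columns would not necessarily reproduce the numbers quoted above.
library(MASS) # fgl: forensic glass data, response fgl$type with six categories
library(VGAM) # vglm(): multinomial logit; the last level of type is the baseline by default
fit = vglm(type ~ Na + Mg + Al, family = multinomial, data = fgl)
coef(fit, matrix = TRUE) # five intercepts and fifteen slopes, one column per log-odds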
Part B 
x <- c(1, 0.2, 0.06, 0.11)   # 1 for the intercept, then the Na, Mg, Al concentrations
intercept = c(1.613703, 3.444128, 0.999448, 0.067163, 0.339579)
na = c(-2.483557, -2.031676, -1.409505, -2.382624, 0.151459)
mg = c(3.842907, 1.697162, 3.29135, 0.051466, 0.699274)
al = c(-3.719793, -1.704689, -3.006102, 0.26351, -1.394559)
theta = cbind(intercept, na, mg, al)   # 5 x 4 coefficient matrix, one row per log-odds
xtheta = theta %*% x                   # the five linear predictors
denom_inv = 1 / (1 + exp(xtheta[1]) + exp(xtheta[2]) + exp(xtheta[3]) +
                   exp(xtheta[4]) + exp(xtheta[5]))   # this is also p_i6
p_i = rep(0, 5)
for (i in 1:5){
p_i[i] = exp(xtheta[i]) * denom_inv
}
p_i = round(p_i, 4)
data.frame(p_i)
## p_i 
## 1 0.0965 
## 2 0.7231 
## 3 0.0678 
## 4 0.0259 
## 5 0.0489 
As we can see from the result of the above code, we have pi1 = 0.0965, pi2 = 0.7231, pi3 = 0.0678, pi4 = 0.0259, 
pi5 = 0.0489, and pi6 = 1 - (0.0965 + 0.7231 + 0.0678 + 0.0259 + 0.0489) = 0.0378. Since pi2 is the largest 
predicted class probability, this piece of glass is classified as WinNF. 
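Equivalently, all six probabilities can be obtained in one step from the linear predictors; a short sketch reusing theta and x from the code above:
eta = c(theta %*% x, 0)            # linear predictors; the baseline class 6 has eta = 0
probs = exp(eta) / sum(exp(eta))   # softmax over the six classes
round(probs, 4)                    # the last entry is p_i6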
Question 3 
a. In this question, we consider the discriminant analysis method for multivariate normal data. Given 
classes C1, C2, ..., CK, we assign the prior probabilities P(Cj), j = 1, ..., K. Given that X belongs to class 
Cj, the conditional distribution of X is multivariate normal with mean µj and covariance matrix Σj. Then, 
based on the Bayes formula, 

P(Cj | X) = P(Cj) P(X | Cj) / [ Σ_{j'=1}^K P(Cj') P(X | Cj') ]. 

We can use P(Cj | X) as the discriminant function: we assign X to class j if P(Cj | X) > P(Cj' | X) for all 
other classes j'. As the denominator is a constant that does not depend on j, we can use P(Cj) P(X | Cj) 
as the discriminant function, or equivalently log P(X | Cj) + log P(Cj). The discriminant function is denoted 
by gj(X): 

gj(X) = log P(X | Cj) + log P(Cj) 
      = -(1/2)(X - µj)^T Σj^{-1} (X - µj) - (1/2) log|Σj| + log P(Cj) 

Consider the case that Σj = σ^2 I. In this case, all the predictors are independent with different means and 
equal variance σ^2. Please simplify gj(X) and show that it is a linear function of X. 
b. In this example, we have three classes, each a 2-dimensional Gaussian distribution, with µ1 = (2, -1)^T, 
µ2 = (4, 3)^T, µ3 = (2, 3)^T, and Σ1 = Σ2 = Σ3 = 2 I2, where I2 is the 2 x 2 identity matrix. We assume 
the priors P(C1) = P(C2) = 1/4 and P(C3) = 1/2. Let X = (0.5, 0.4)^T. Calculate g1(X), g2(X), 
and g3(X), and classify the observation X to one of the classes. 
Solution 
Part A 
gj(X) = log P(X | Cj) + log P(Cj) 
      = -(1/2)(X - µj)^T Σj^{-1} (X - µj) - (1/2) log|Σj| + log P(Cj) 
      = -(1/(2σ^2))(X - µj)^T (X - µj) + log P(Cj) + c 
      = -(1/(2σ^2))(X^T X - 2 µj^T X + µj^T µj) + log P(Cj) + c 
      = (1/σ^2) µj^T X - (1/(2σ^2)) µj^T µj + log P(Cj) + c' 
      = wj^T X + wj0, 

where c and c' collect terms that do not depend on j (with Σj = σ^2 I, the term -(1/2) log|Σj| is the same 
for every class, and -X^T X/(2σ^2) does not involve j), wj = µj/σ^2, and wj0 = -µj^T µj/(2σ^2) + log P(Cj) + c'. 
Hence gj(X) is a linear function of X. 
Part B 
mu_1 = c(2,-1) 
mu_2 = c(4,3) 
mu_3 = c(2,3) 
p_C1 = 1/4 
p_C2 = 1/4 
p_C3 = 1/2 
x = c(0.5, 0.4) 
# g_j(x) = (1/sigma^2) mu_j' x - mu_j' mu_j / (2 sigma^2) + log P(C_j), with sigma^2 = 2
g1 = round(1/2*(t(mu_1)%*%x) - 1/(2*2)*(t(mu_1)%*%mu_1) + log(1/4), 4) 
g2 = round(1/2*(t(mu_2)%*%x) - 1/(2*2)*(t(mu_2)%*%mu_2) + log(1/4), 4) 
g3 = round(1/2*(t(mu_3)%*%x) - 1/(2*2)*(t(mu_3)%*%mu_3) + log(1/2), 4) 
cbind(g1, g2, g3) 
## [,1] [,2] [,3] 
## [1,] -2.3363 -6.0363 -2.8431 
As shown above, g1(X) = -2.3363, g2(X) = -6.0363, and g3(X) = -2.8431. Since g1(X) is the largest, 
observation X is classified to Class 1. 
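The same computation can also be written as a small reusable helper; this is a sketch of ours (the function g and the objects mus, priors, scores are not part of the assignment):
# Sketch: generic discriminant score for the common covariance sigma^2 * I, here sigma^2 = 2.
g = function(x, mu, prior, sigma2 = 2) {
  as.numeric(t(mu) %*% x / sigma2 - t(mu) %*% mu / (2 * sigma2) + log(prior))
}
mus = list(mu_1, mu_2, mu_3)
priors = c(p_C1, p_C2, p_C3)
scores = mapply(g, mu = mus, prior = priors, MoreArgs = list(x = x))
scores            # approximately -2.3363 -6.0363 -2.8431
which.max(scores) # 1, i.e. Class 1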
Question 4 
Analyze the student math performance test. Apply the linear discriminant analysis and quadratic discriminant 
analysis on the dataset. The response variable is "schoolsup" and the three predictors are "G1", "G2" and 
"G3". Please randomly select 300 observations as the training set and use your two models to predict the 
schoolsup status of the remaining students. Repeat this cross-validation five times and calculate the average 
misclassification errors of the two models. Which method performs better for this data set, the linear 
discriminant analysis or the quadratic discriminant analysis? 
Solution 
library(MASS)   # lda() and qda()
# df holds the student math performance data (395 observations) with schoolsup, G1, G2, G3.
set.seed(10) 
#Linear discriminant model 
model1 = lda(schoolsup ~ G1 + G2 + G3, data = df) 
model1 
## Call: 
## lda(schoolsup ~ G1 + G2 + G3, data = df) 
## 
## Prior probabilities of groups: 
## no yes 
## 0.8708861 0.1291139 
## 
## Group means: 
## G1 G2 G3 
## no 11.180233 10.883721 10.561047 
## yes 9.078431 9.568627 9.431373 
## 
## Coefficients of linear discriminants: 
## LD1 
## G1 -0.52054302 
## G2 0.07328696 
## G3 0.17578114 
#Quadratic discriminant model 
model2 = qda(schoolsup ~ G1 + G2 + G3, data = df) 
model2 
## Call: 
## qda(schoolsup ~ G1 + G2 + G3, data = df) 
## 
## Prior probabilities of groups: 
## no yes 
## 0.8708861 0.1291139 
## 
## Group means: 
## G1 G2 G3 
## no 11.180233 10.883721 10.561047 
## yes 9.078431 9.568627 9.431373 
# Repeat the 300/95 train/test split five times and record the misclassification
# error of LDA and QDA on the 95 held-out students.
errlin = rep(0, 5)
errqua = rep(0, 5)
for (i in 1:5){ 
training = sample(1:395, 300) 
trainingset = df[training,] 
testingset = df[-training,] 
# linear discriminant analysis 
m1 = lda(schoolsup ~ G1 + G2 + G3, data = trainingset) 
pred_lin = predict(m1, testingset)$class 
tablin = table(testingset$schoolsup, pred_lin) 
errlin[i] = (95 - sum(diag(tablin)))/95 
#Quadratic discriminant analysis 
m2 = qda(schoolsup ~ G1 + G2 + G3, data = trainingset) 
pred_quad = predict(m2, testingset)$class 
tablquad = table(testingset$schoolsup, pred_quad) 
errqua[i] = (95 - sum(diag(tablquad)))/95 
}
merrlin = mean(errlin) 
merrqua = mean(errqua) 
cbind(merrlin, merrqua) 
## merrlin merrqua 
## [1,] 0.1389474 0.1410526 
Based on the results of this cross-validation, the linear discriminant analysis gives a slightly lower average 
misclassification error than the quadratic discriminant analysis (0.1389 vs. 0.1411), so LDA performs better 
for this data set. 
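As a side note, each per-split error in the loop could equivalently be computed directly from the predictions; a minimal sketch using the objects from the last iteration (the names errlin_alt and errqua_alt are ours):
# Equivalent to (95 - sum(diag(table)))/95: the fraction of held-out students misclassified.
errlin_alt = mean(pred_lin != testingset$schoolsup)
errqua_alt = mean(pred_quad != testingset$schoolsup)
c(errlin_alt, errqua_alt)   # errors from the fifth split only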
Question 5 
Suppose we have 2-class observations with p-dimensional predictors. We have samples x1, ..., xn, with n1 
samples from Class 1 and n2 samples from Class 2. Let v be a unit vector. The projection of sample xi onto 
a line in direction v is given by the inner product yi = v^T xi. Let µ1 and µ2 be the means of class 1 and 
class 2, and let µ̃1 and µ̃2 be the means of the projections of class 1 and class 2. Denote the variance of the 
projected samples of class 1 by s̃1^2 = Σ_{xi in C1} (yi - µ̃1)^2 and the variance of the projected samples of 
class 2 by s̃2^2 = Σ_{xi in C2} (yi - µ̃2)^2. The Fisher linear discriminant projects to the direction v which 
maximizes 

J(v) = (µ̃1 - µ̃2)^2 / (s̃1^2 + s̃2^2). 

Let the variance of the original samples of class 1 be S1 = Σ_{xi in C1} (xi - µ1)(xi - µ1)^T and the variance 
of the original samples of class 2 be S2 = Σ_{xi in C2} (xi - µ2)(xi - µ2)^T. Define the within-class variation 
Sw = S1 + S2 and the between-class variation Sb = (µ1 - µ2)(µ1 - µ2)^T. Prove that the objective function 
can be simplified as 

J(v) = (v^T Sb v) / (v^T Sw v). 
Solution 
Sb measures the separation between the two classes before projection. For the numerator of J(v), 

(µ̃1 - µ̃2)^2 = (v^T µ1 - v^T µ2)^2 
            = (v^T (µ1 - µ2))((µ1 - µ2)^T v) 
            = v^T (µ1 - µ2)(µ1 - µ2)^T v 
            = v^T Sb v. 

For the denominator, the projected variances can be written in terms of S1 and S2: 

s̃1^2 = Σ_{xi in C1} (v^T xi - v^T µ1)^2 = Σ_{xi in C1} v^T (xi - µ1)(xi - µ1)^T v = v^T S1 v, 
s̃2^2 = Σ_{xi in C2} (v^T xi - v^T µ2)^2 = Σ_{xi in C2} v^T (xi - µ2)(xi - µ2)^T v = v^T S2 v. 

Therefore 

J(v) = (µ̃1 - µ̃2)^2 / (s̃1^2 + s̃2^2) 
     = (v^T Sb v) / (v^T S1 v + v^T S2 v) 
     = (v^T Sb v) / (v^T (S1 + S2) v) 
     = (v^T Sb v) / (v^T Sw v). 
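As an optional numerical sanity check (a sketch of ours, not part of the assignment), the identity can be illustrated with the Question 1 objects mu_1, mu_2 and S_w as computed there: J(v) is a Rayleigh-type quotient, so no direction should give a larger value than v proportional to S_w^{-1}(µ1 - µ2).
# Sketch: with the Question 1 quantities, J(v) = (v' Sb v)/(v' Sw v) is maximized
# at v proportional to solve(S_w, mu_1 - mu_2).
S_b = (mu_1 - mu_2) %*% t(mu_1 - mu_2)
J = function(v) as.numeric((t(v) %*% S_b %*% v) / (t(v) %*% S_w %*% v))
v_opt = solve(S_w, mu_1 - mu_2)
J(v_opt)                            # maximal value of the objective
max(replicate(1000, J(rnorm(2))))   # random directions do not exceed J(v_opt)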