MATH4630 Assignment 1 - Solved

Question 1: Let
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 2 \end{pmatrix}$$
a. For the following questions, you have to clearly show your work. 
1. Find the eigenvalues and eigenvectors for A. 
2. Find the square root of A using Cholesky decomposition. 
3. Find the square root of A using spectral decomposition. 
b. Is A a positive definite matrix? Why or why not?
c. Use any software to verify your answers in part (a). 
Question 2: Let A be a (p × p) matrix partitioned into
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where A11 is a (p1 × p1) matrix, A22 is a (p2 × p2) matrix, and p1 + p2 = p. Similarly, let B
be a (p × p) matrix partitioned into
$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
where B11 is a (p1 × p1) matrix, B22 is a (p2 × p2) matrix, and p1 + p2 = p. Assume
A11, A22, B11, and B22 are nonsingular matrices.
a. Let Ip denote the (p × p) identity matrix. Let AB = Ip. Express Bij in terms of Aij
for all i, j = 1, 2.
b. Let BA = Ip. Express Bij in terms of Aij for all i, j = 1, 2.
c. Show that
$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$$
d. Show that
$$B_{11} = A_{11}^{-1} + A_{11}^{-1}A_{12}(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}$$
and
$$B_{22} = A_{22}^{-1} + A_{22}^{-1}A_{21}(A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}A_{12}A_{22}^{-1}$$
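Question 3: Let
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
where X1 is a p1-dimensional vector, X2 is a p2-dimensional vector, Σ11 is a (p1 × p1) matrix,
Σ22 is a (p2 × p2) matrix, and p1 + p2 = p. Show that the conditional mean and variance of
X1 given X2 = x2 are
$$\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \quad \text{and} \quad \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
respectively.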
Question 4: Consider the following data set: 
x1: 3 3 4 5 6 8
x2: 17.95 15.54 14.00 12.95 8.94 7.49
For the following questions, you have to clearly show your steps. Computer commands and
printouts are not accepted.
a. Find the sample mean vector. 
b. Find the sample unbiased variance matrix. 
c. Report the squared statistical distances $(x_j - \bar{x})' S^{-1} (x_j - \bar{x})$ for j = 1, ..., 6.
d. Assume the data set is from a bivariate normal distribution. 
1. Describe how you would estimate the 50% probability contour of the population 
mean vector. 
2. At the 5% level of significance, is there significant evidence that the population mean
vector is different from (3, 10)'?
Question 5: Data are given in the Excel file.
a. Use a graphical method to check whether the East data are a sample from a normal
distribution. How about the South, West, and North data?
b. Regardless of your result in part (a), obtain the 95% confidence interval for the mean of
(1) North (2) South (3) East (4) West.
Clearly state the necessary assumptions needed for your answer.
c. Consider the data set as a multivariate data set. Use software to report the
sample mean vector, sample covariance matrix and sample correlation matrix.
d. Use a graphical method to check if the data set is a sample from a multivariate normal
distribution.
e. Obtain the equation for the 95% confidence region for the population mean
vector, $\mu = (\mu_N, \mu_S, \mu_E, \mu_W)'$. (No calculations needed. Just the equations.) Clearly
state the necessary assumptions needed for your answer.
f. At the 5% level of significance, test
$$H_0 : \mu = (1450, 1900, 1700, 1700)' \quad \text{vs} \quad H_a : \mu \neq (1450, 1900, 1700, 1700)'$$
g. Based on your answer in part (f), does $\mu = (1450, 1900, 1700, 1700)'$ fall within the
95% confidence region of $\mu$ obtained in part (e)? Why or why not?
4630 Assignment 1 R Code
Ravish Kamath: 213893664 
02 October, 2022 
## *** Package RVAideMemoire v 0.9-81-2 *** 
Question 1 
Let
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 2 \end{pmatrix}$$
(a) For the following questions, you have to clearly show your work. 
1. Find the eigenvalues and eigenvectors for A. 
2. Find the square root of A using Cholesky decomposition. 
3. Find the square root of A using spectral decomposition. 
(b). Is A a positive definite matrix? Why or why not?
(c). Use any software to verify your answers in part (a). 
Solution 
Part A 
Please refer to the handwritten solution 
Part B 
Yes, A is a positive definite matrix: it is symmetric and all of its eigenvalues are strictly positive, namely
$\lambda_1 = 2$, $\lambda_2 = 3 + \sqrt{3}$, $\lambda_3 = 3 - \sqrt{3} > 0$.
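As a check on these eigenvalues, the characteristic polynomial of A can be expanded along its first row. This is only a sketch of one possible route and is not necessarily the working used in the handwritten solution:

```latex
\det(A - \lambda I)
  = (2-\lambda)\bigl[(4-\lambda)(2-\lambda) - 1\bigr] - 1\cdot(2-\lambda)
  = (2-\lambda)\bigl[(4-\lambda)(2-\lambda) - 2\bigr]
  = (2-\lambda)(\lambda^{2} - 6\lambda + 6)
```

Setting this to zero gives $\lambda = 2$ and $\lambda = 3 \pm \sqrt{3}$, which are approximately 4.732, 2 and 1.268.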
Part C 
A = matrix(c(2,1,0,1,4,1,0,1,2), nrow = 3, 
ncol = 3, byrow = TRUE) 
Getting the eigenvalues and the eigenvectors
eigen(A) 
## eigen() decomposition 
## $values 
## [1] 4.732051 2.000000 1.267949 
## 
## $vectors 
## [,1] [,2] [,3] 
## [1,] 0.3250576 7.071068e-01 0.6279630 
## [2,] 0.8880738 -3.140185e-16 -0.4597008 
## [3,] 0.3250576 -7.071068e-01 0.6279630 
Here is the Cholesky Decomposition 
t(chol(A)) 
## [,1] [,2] [,3] 
## [1,] 1.4142136 0.0000000 0.000000 
## [2,] 0.7071068 1.8708287 0.000000 
## [3,] 0.0000000 0.5345225 1.309307 
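The lower-triangular factor printed above can be checked by multiplying it back out. The sketch below rebuilds A from Question 1 and confirms that the factor reproduces it; the names `L_chol` and `chol_ok` are introduced here for illustration only.

```r
# Matrix from Question 1
A <- matrix(c(2, 1, 0,
              1, 4, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

# chol() returns the upper-triangular factor R with t(R) %*% R = A,
# so the lower-triangular Cholesky factor is L = t(R)
L_chol <- t(chol(A))

# L %*% t(L) should reproduce A up to floating-point error
chol_ok <- isTRUE(all.equal(L_chol %*% t(L_chol), A))
chol_ok
```

Note that this factor satisfies A = L L', whereas the spectral square root satisfies A = A^(1/2) A^(1/2) with A^(1/2) symmetric, so the two "square roots" are different matrices.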
Finally, here is the Spectral Decomposition 
ev = eigen(A) 
L = ev$values 
V = ev$vectors 
D = diag(L) 
sqrtD = sqrt(D) 
sqrtD 
## [,1] [,2] [,3] 
## [1,] 2.175328 0.000000 0.000000 
## [2,] 0.000000 1.414214 0.000000 
## [3,] 0.000000 0.000000 1.126033 
sqrtA = V%*%sqrtD%*%t(V) 
sqrtA 
## [,1] [,2] [,3] 
## [1,] 1.38099412 0.3029054 -0.03321944 
## [2,] 0.30290545 1.9535856 0.30290545 
## [3,] -0.03321944 0.3029054 1.38099412 
all.equal(A, zapsmall(sqrtA%*%t(sqrtA)) ) 
## [1] TRUE 
Question 2 
Let A be a (p × p) matrix partitioned into
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where A11 is a (p1 × p1) matrix, A22 is a (p2 × p2) matrix, and p1 + p2 = p. Similarly, let B be a (p × p)
matrix partitioned into
$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
where B11 is a (p1 × p1) matrix, B22 is a (p2 × p2) matrix, and p1 + p2 = p. Assume A11, A22, B11, and B22
are nonsingular matrices.
(a) Let Ip denote the (p × p) identity matrix. Let AB = Ip. Express Bij in terms of Aij for all i, j = 1, 2.
(b) Let BA = Ip. Express Bij in terms of Aij for all i, j = 1, 2.
(c) Show that
$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$$
(d) Show that
$$B_{11} = A_{11}^{-1} + A_{11}^{-1}A_{12}(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}$$
and
$$B_{22} = A_{22}^{-1} + A_{22}^{-1}A_{21}(A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}A_{12}A_{22}^{-1}$$
Solution 
Part A 
Please refer to the handwritten notes 
Part B 
Please refer to the handwritten notes 
Part C 
Please refer to the handwritten notes 
Part D 
Please refer to the handwritten notes 
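The handwritten notes are not reproduced in this printout. As a hedged sketch of one standard route (which may differ from the notes), parts (a) and (c) follow from the block equations of AB = Ip and from a block-triangular factorization of A:

```latex
% Part (a): block equations of AB = I_p
A_{11}B_{11} + A_{12}B_{21} = I_{p_1}, \qquad A_{21}B_{11} + A_{22}B_{21} = 0
\;\Longrightarrow\;
B_{21} = -A_{22}^{-1}A_{21}B_{11}, \qquad
B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}

A_{11}B_{12} + A_{12}B_{22} = 0, \qquad A_{21}B_{12} + A_{22}B_{22} = I_{p_2}
\;\Longrightarrow\;
B_{12} = -A_{11}^{-1}A_{12}B_{22}, \qquad
B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}

% Part (c): block-triangular factorization
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} I & A_{12}A_{22}^{-1} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & 0 \\ 0 & A_{22} \end{pmatrix}
\begin{pmatrix} I & 0 \\ A_{22}^{-1}A_{21} & I \end{pmatrix}
```

The two outer factors are unit triangular and have determinant 1, so |A| = |A22| |A11 - A12 A22^{-1} A21|. Swapping the roles of the two blocks gives the second expression for |A|.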
Question 4 
Consider the following data set: 
x1: 3 3 4 5 6 8 
x2: 17.95 15.54 14.00 12.95 8.94 7.49 
For the following questions, you have to clearly show your steps. Computer commands and printouts are not
accepted.
(a) Find the sample mean vector.
(b) Find the sample unbiased variance matrix.
(c) Report the squared statistical distances $(x_j - \bar{x})' S^{-1} (x_j - \bar{x})$ for j = 1, ..., 6.
(d) Assume the data set is from a bivariate normal distribution.
1. Describe how you would estimate the 50% probability contour of the population mean vector.
2. At the 5% level of significance, is there significant evidence that the population mean vector is different
from (3, 10)'?
Solution 
X = matrix(c(3, 17.95, 3, 15.54, 4, 14, 5, 12.95, 6, 8.94, 
8, 7.49), nrow = 6, ncol = 2, byrow = TRUE) 
n = 6 
p = 2 
Part A 
Please refer to the handwritten notes, but here is the optional R code as well. 
vec1 = matrix(1, 6, 1) 
xbar = 1/6*t(X)%*%vec1 
xbar 
## [,1] 
## [1,] 4.833333 
## [2,] 12.811667 
Part B 
Please refer to the handwritten notes, but here is the optional R code as well. 
M = t(X)%*%X
L = xbar%*%t(xbar)
N = 6*L
S = 1/5*(M-N)
S
## [,1] [,2] 
## [1,] 3.766667 -7.351667 
## [2,] -7.351667 15.717497 
Part C 
Please refer to the handwritten notes, but here is the optional R code as well. 
S_inv = solve(S) 
S_inv 
## [,1] [,2] 
## [1,] 3.048645 1.4259664 
## [2,] 1.425966 0.7306017 
vec1 = matrix(1, 6, 1) 
r = X[,1] - xbar[1,] 
t = X[,2] - xbar[2,] 
centered_mat = cbind(r,t) 
distance = centered_mat%*%S_inv%*%t(centered_mat) 
diag(distance) 
## [1] 2.6705244 1.4200802 0.3246180 0.1644184 2.2190979 3.2012612 
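The matrix-algebra results for Parts A to C can be cross-checked against base R's built-in functions. This is a minimal sketch, not part of the original submission; the names `xbar_check`, `S_check` and `d2_check` are introduced here only for illustration.

```r
# Data from Question 4 (rows are observations, columns are x1 and x2)
X <- matrix(c(3, 17.95, 3, 15.54, 4, 14, 5, 12.95, 6, 8.94, 8, 7.49),
            ncol = 2, byrow = TRUE)

# Sample mean vector (Part A)
xbar_check <- colMeans(X)

# Unbiased sample covariance matrix, divisor n - 1 (Part B)
S_check <- cov(X)

# Squared statistical distances (x_j - xbar)' S^{-1} (x_j - xbar) (Part C)
d2_check <- mahalanobis(X, center = xbar_check, cov = S_check)

round(xbar_check, 4)
round(S_check, 4)
round(d2_check, 4)
```

A quick sanity check: when S is the unbiased sample covariance matrix, the squared distances always sum to (n - 1)p, which is 10 for this data set.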
Part D.1 
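plot(xbar[1], xbar[2], type="p", xlim=c(0, 10),
ylim=c(5, 20), xlab="mu1", ylab="mu2")
# Grid of candidate values for each component of the mean vector
mu1 = matrix(seq(-5, 20, 0.05), ncol=1, byrow=T)
nmu1 = nrow(mu1)
mu2 = matrix(seq(5, 20, 0.05), ncol=1, byrow=T)
nmu2 = nrow(mu2)
# Mark every grid point whose scaled T^2 statistic lies below the 50% F quantile
for (i in 1:nmu1) {
for (j in 1:nmu2) {
mu = matrix(c(mu1[i, 1], mu2[j, 1]), ncol=1, byrow=T)
Fcomp = c((n-p)/((n-1)*p)*(n*t(xbar-mu)%*%solve(S)%*%(xbar-mu)))
Fcrit = qf(0.50, p, n-p)
if (Fcomp < Fcrit) points(mu1[i, 1], mu2[j, 1], pch="*")
}
}
points(xbar[1], xbar[2], col='red')
[Figure: estimated 50% region for the population mean vector, drawn with "*" markers around the sample mean (red point); x-axis: mu1, y-axis: mu2]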
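The grid search above marks the points of the following region, written here as a hedged sketch of the formula the code implements (with $F_{p,\,n-p}(0.50)$ denoting the 0.50 quantile returned by `qf(0.50, p, n-p)`):

```latex
\left\{ \mu \;:\; n\,(\bar{x} - \mu)'\, S^{-1}\, (\bar{x} - \mu)
   \;\le\; \frac{(n-1)\,p}{n-p}\; F_{p,\,n-p}(0.50) \right\},
\qquad n = 6,\; p = 2
```

This is an ellipse centred at the sample mean, with axes along the eigenvectors of S.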

Part D.2 
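Let
$$H_0 : \mu = \begin{pmatrix} 3 \\ 10 \end{pmatrix} \quad \text{vs} \quad H_a : \mu \neq \begin{pmatrix} 3 \\ 10 \end{pmatrix}$$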
mu0 = matrix(c(3, 10), ncol=1, byrow=T) 
Tobs = n*t(xbar-mu0)%*%S_inv%*%(xbar-mu0) 
Tobs 
## [,1] 
## [1,] 184.341 


Fcriticalvalue = (n-1)*p/(n-p)*qf(p = 0.05, df1 = p, df2 = n-p, lower.tail = FALSE) 
Fcriticalvalue 
## [1] 17.36068 
pvalue = pf((n-p)/((n-1)*p)*Tobs, p, n-p, lower.tail = FALSE)
pvalue 
## [,1] 
## [1,] 0.0006973496 
Since the observed Hotelling T² statistic (184.341) is larger than the critical value (17.361), we reject H0
at the 5% level and conclude that there is evidence that the population mean vector is different from µ0 = (3, 10)'.
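The steps of this test can be collected into one small function. The sketch below is an illustration rather than the author's code: `hotelling_one_sample` is a hypothetical helper name, and it assumes the rows of `X` are independent draws from a multivariate normal distribution with n > p.

```r
# One-sample Hotelling T^2 test of H0: mu = mu0 (hypothetical helper)
hotelling_one_sample <- function(X, mu0, alpha = 0.05) {
  n <- nrow(X)
  p <- ncol(X)
  d <- colMeans(X) - mu0                  # xbar - mu0
  S <- cov(X)                             # unbiased sample covariance
  T2 <- n * sum(d * solve(S, d))          # n (xbar - mu0)' S^{-1} (xbar - mu0)
  # Critical value on the T^2 scale: (n-1)p/(n-p) times the upper-alpha F quantile
  crit <- (n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)
  # p-value from the F distribution of (n-p)/((n-1)p) * T^2
  pval <- pf((n - p) / ((n - 1) * p) * T2, p, n - p, lower.tail = FALSE)
  list(T2 = T2, critical = crit, p.value = pval, reject = T2 > crit)
}

# Question 4 data and null value
X <- matrix(c(3, 17.95, 3, 15.54, 4, 14, 5, 12.95, 6, 8.94, 8, 7.49),
            ncol = 2, byrow = TRUE)
res <- hotelling_one_sample(X, mu0 = c(3, 10))
res
```

Using `solve(S, d)` instead of forming the inverse explicitly is a small design choice that avoids one matrix inversion; the result is the same quadratic form.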
Question 5 
Data are given in the Excel file.
(a) Use a graphical method to check whether the East data are a sample from a normal distribution. How
about the South, West, and North data?
(b) Regardless of your result in part (a), obtain the 95% confidence interval for the mean of
(1) North
(2) South
(3) East
(4) West
Clearly state the necessary assumptions needed for your answer.
(c) Consider the data set as a multivariate data set. Use software to report the sample mean vector,
sample covariance matrix and sample correlation matrix.
(d) Use a graphical method to check if the data set is a sample from a multivariate normal distribution.
(e) Obtain the equation for the 95% confidence region for the population mean vector,
$\mu = (\mu_N, \mu_S, \mu_E, \mu_W)'$. (No calculations needed. Just the equations.) Clearly state the necessary assumptions
needed for your answer.
(f) At the 5% level of significance, test
$$H_0 : \mu = (1450, 1900, 1700, 1700)' \quad \text{vs} \quad H_a : \mu \neq (1450, 1900, 1700, 1700)'$$
(g) Based on your answer in part (f), does $\mu = (1450, 1900, 1700, 1700)'$ fall within the 95% confidence
region of $\mu$ obtained in part (e)? Why or why not?
Solution 
The Excel data set is assumed to have been loaded into R under the name df.
df = data.frame(df) 
X = data.matrix(df) 
n = dim(df)[1] 
p = dim(df)[2] 
Part A 
par(mfrow = c(2,2)) 
qqnorm(df$East, main = 'EAST') 
qqline(df$East) 
qqnorm(df$South, main = 'South') 
qqline(df$South) 
qqnorm(df$West, main = 'West') 
qqline(df$West) 
qqnorm(df$North, main = 'North') 
qqline(df$North) 
[Figure: normal Q-Q plots in a 2 x 2 grid with panel titles EAST, South, West and North; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]
Based on the Q-Q plots above, I would conclude that the East sample does not appear to come from a normal
distribution, whereas the other three direction variables do appear to be approximately normally distributed.
Part B 
Our assumptions are that the data from each direction is normally distributed and the variance is unknown. 
onemat = matrix(1, n, 1) 
xbar =1/n*t(X)%*%onemat 
xbar 
## [,1] 
## North 1463.95 
## South 1888.60 
## East 1734.40 
## West 1701.95 
alpha = 0.05 
degrees.freedom = n - 1 
t.score= qt(p = alpha/2, df = degrees.freedom, lower.tail = F) 
t.score 
## [1] 2.093024 
North C.I. 
sample.sd = sd(df$North) 
sample.se = sample.sd/sqrt(n) 
sample.se 
## [1] 67.8844 

lower_bound = xbar[1] - t.score*sample.se 
upper_bound = xbar[1] + t.score*sample.se 
c(lower_bound, upper_bound) 
## [1] 1321.866 1606.034 
Therefore the C.I. for the mean of North would be (1321.866, 1606.034).
South C.I. 
sample.sd = sd(df$South) 
sample.se = sample.sd/sqrt(n) 
sample.se 
## [1] 77.30495 
lower_bound = xbar[2] - t.score*sample.se 
upper_bound = xbar[2] + t.score*sample.se 
c(lower_bound, upper_bound) 
## [1] 1726.799 2050.401 
Therefore the C.I. for the mean of South would be (1726.799, 2050.401). 
East C.I. 
sample.sd = sd(df$East) 
sample.se = sample.sd/sqrt(n) 
sample.se 
## [1] 76.48465 
lower_bound = xbar[3] - t.score*sample.se 
upper_bound = xbar[3] + t.score*sample.se 
c(lower_bound, upper_bound) 
## [1] 1574.316 1894.484 
Therefore the C.I. for the mean of East would be (1574.316, 1894.484).
West C.I. 
sample.sd = sd(df$West) 
sample.se = sample.sd/sqrt(n) 
sample.se 
## [1] 76.20324 
lower_bound = xbar[4] - t.score*sample.se 
upper_bound = xbar[4] + t.score*sample.se 
c(lower_bound, upper_bound) 
## [1] 1542.455 1861.445 
Therefore the C.I. for the mean of West would be (1542.455, 1861.445). 
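The four intervals can be reproduced in one step from the summary statistics printed above. This is a sketch under one stated assumption: the sample size n = 20 is not given explicitly in this printout and is inferred from the printed t quantile 2.093024, which corresponds to 19 degrees of freedom.

```r
# Assumption: n = 20, inferred from the printed t quantile (df = 19)
n <- 20
tcrit <- qt(0.975, df = n - 1)

# Sample means and standard errors as printed in the output above
xbar <- c(North = 1463.95, South = 1888.60, East = 1734.40, West = 1701.95)
se   <- c(North = 67.8844, South = 77.30495, East = 76.48465, West = 76.20324)

# 95% t-intervals: xbar -/+ t_{0.025, n-1} * se
ci <- cbind(lower = xbar - tcrit * se, upper = xbar + tcrit * se)
round(ci, 3)
```

With the raw data available, `t.test(df$North)$conf.int` would give the same interval for North directly.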
Part C 
Sample Mean Vector 
xbar =1/n*t(X)%*%onemat 
xbar 
## [,1] 
## North 1463.95 
## South 1888.60 
## East 1734.40 
## West 1701.95 
Sample Variance-Covariance Matrix 
M = t(X)%*%X
L = xbar%*%t(xbar)
N = n*L
S = 1/(n - 1)*(M-N)
S
## North South East West 
## North 92165.84 91525.08 76724.18 93988.10 
## South 91525.08 119521.09 108840.91 103275.98 
## East 76724.18 108840.91 116998.04 85358.18 
## West 93988.10 103275.98 85358.18 116138.68 
Sample Correlation Matrix 
variances = diag(S) 
D = matrix(diag(variances),ncol=4) 
D_sqrt = sqrt(D) 
D_sqrt_inv = solve(D_sqrt) 
samp_cor = D_sqrt_inv%*%S%*%D_sqrt_inv 
samp_cor 
## [,1] [,2] [,3] [,4] 
## [1,] 1.0000000 0.8720329 0.7388529 0.9084467 
## [2,] 0.8720329 1.0000000 0.9204084 0.8765740 
## [3,] 0.7388529 0.9204084 1.0000000 0.7322635 
## [4,] 0.9084467 0.8765740 0.7322635 1.0000000 
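Base R's `cov2cor()` performs the same rescaling, D^(-1/2) S D^(-1/2), in a single call and keeps the row and column labels. The sketch below re-enters the covariance matrix from the printed output (rounded to two decimals, so results agree only to about six digits); the name `R_check` is introduced here for illustration.

```r
dirs <- c("North", "South", "East", "West")

# Sample covariance matrix as printed in the output above
S <- matrix(c(92165.84,  91525.08,  76724.18,  93988.10,
              91525.08, 119521.09, 108840.91, 103275.98,
              76724.18, 108840.91, 116998.04,  85358.18,
              93988.10, 103275.98,  85358.18, 116138.68),
            nrow = 4, byrow = TRUE, dimnames = list(dirs, dirs))

# Divide each entry s_ij by sqrt(s_ii * s_jj)
R_check <- cov2cor(S)
round(R_check, 4)
```

With the raw data matrix available, `cor(X)` would return the same correlation matrix directly.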
Part D 
library(RVAideMemoire) 
mqqnorm(X, main = 'Multi-normal Q-Q plot') 




[Figure: multivariate normal Q-Q plot titled "Multi-normal Q-Q plot"; x-axis: χ² quantiles, y-axis: Mahalanobis distances]
## [1] 7 11 
Part E 
Please check the handwritten notes. Here is the R code for computing $S^{-1}$. Our assumptions are that the
data are multivariate normally distributed and that the variance-covariance matrix is unknown.
S_inv = solve(S) 
S_inv 
## North South East West 
## North 7.349858e-05 -3.158687e-05 8.816218e-06 -3.787165e-05 
## South -3.158687e-05 1.435897e-04 -8.270559e-05 -4.133832e-05 
## East 8.816218e-06 -8.270559e-05 6.738748e-05 1.688334e-05 
## West -3.787165e-05 -4.133832e-05 1.688334e-05 6.361024e-05 
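The handwritten notes are not reproduced in this printout. As a hedged sketch, the standard form of the 95% confidence region under the stated assumptions, which is consistent with the critical value computed in the Part F code, is:

```latex
\left\{ \mu \;:\; n\,(\bar{x} - \mu)'\, S^{-1}\, (\bar{x} - \mu)
   \;\le\; \frac{(n-1)\,p}{n-p}\; F_{p,\,n-p}(0.05) \right\},
\qquad p = 4
```

Here $F_{p,\,n-p}(0.05)$ denotes the upper 5% point of the F distribution with p and n - p degrees of freedom, $\bar{x}$ is the sample mean vector from Part C, and $S^{-1}$ is the matrix printed above.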
Part F 
mu0 = matrix(c(1450,1900,1700,1700), ncol=1, byrow=T) 
Tobs = n*t(xbar-mu0)%*%S_inv%*%(xbar-mu0) 
Tobs 
## [,1] 
## [1,] 3.967353 
Fcriticalvalue = (n-1)*p/(n-p)*qf(p = 0.05, df1 = p, df2 = n-p, lower.tail = FALSE) 
Fcriticalvalue 
## [1] 14.28286 
pvalue = pf((n-p)/((n-1)*p)*Tobs, p, n-p,lower.tail = FALSE) 
pvalue 
## [,1] 
## [1,] 0.5224764 
Part G 
Based on the R code for Part F, the observed Hotelling T² statistic (3.967) is not greater than the critical
value (14.283); equivalently, the p-value (0.522) is greater than 0.05. We therefore cannot reject H0, and there
is no evidence that the population mean vector is different from $\mu = (1450, 1900, 1700, 1700)'$. Because the 95%
confidence region consists of exactly those values of the mean vector that this test does not reject at the 5%
level, $\mu = (1450, 1900, 1700, 1700)'$ falls within the 95% confidence region.