CS7641 Homework 1: Linear Algebra, Expectation, Covariance and Independence, and Optimization

1         Linear Algebra
1.1        Determinant and Inverse of Matrix
Given a matrix M:

 

(a)   Calculate the determinant of M in terms of r.

(b)   For what value(s) of r does M⁻¹ not exist? Why? What does it mean in terms of the rank and singularity of M for these values of r?

(c)    Calculate M⁻¹ by hand for r = 4. (Hint: please double-check your answer and make sure MM⁻¹ = I.)

(d)   Find the determinant of M⁻¹ for r = 4.
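
The entries of M did not survive the conversion of this document, but parts (a)-(d) can still be sanity-checked symbolically. Below is a minimal SymPy sketch, with a hypothetical placeholder matrix standing in for the assignment's M:

    # Symbolic check of the determinant/inverse computations with SymPy.
    # NOTE: the matrix entries below are a hypothetical placeholder, not
    # the assignment's M, which was lost in formatting.
    import sympy as sp

    r = sp.Symbol('r')
    M = sp.Matrix([[1, 0, 2],
                   [0, r, 0],
                   [3, 0, 1]])              # placeholder entries

    print(sp.det(M))                        # (a) determinant in terms of r
    print(sp.solve(sp.det(M), r))           # (b) r values where M is singular

    M4 = M.subs(r, 4)
    M4_inv = M4.inv()                       # (c) inverse at r = 4
    assert M4 * M4_inv == sp.eye(3)         # the hint's check: M M^-1 = I
    print(sp.det(M4_inv))                   # (d) equals 1/det(M) at r = 4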

1.2        Characteristic Equation

Consider the eigenvalue problem:

Ax = λx,    x ≠ 0

where x is a non-zero eigenvector and λ is an eigenvalue of A. Prove that the determinant |A − λI| = 0.
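
As a hint toward the proof, the chain of implications can be sketched as:

\[
Ax = \lambda x,\; x \neq 0 \;\Rightarrow\; (A - \lambda I)x = 0 \;\Rightarrow\; \ker(A - \lambda I) \neq \{0\} \;\Rightarrow\; A - \lambda I \text{ is singular} \;\Rightarrow\; |A - \lambda I| = 0.
\]

Each step uses a standard fact: a homogeneous system with a non-zero solution has a non-trivial kernel, and a square matrix is singular if and only if its determinant vanishes.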

1.3        Eigenvalues and Eigenvectors

Given a matrix A:

 

(a)   Calculate the eigenvalues of A as a function of x.

(b)   Find the normalized eigenvectors of the matrix A.
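
As with Section 1.1, the matrix A was lost in conversion; the NumPy sketch below cross-checks eigenvalue/eigenvector computations on a hypothetical symmetric placeholder with free parameter x:

    # Numerical sanity check: eigenvalues/eigenvectors of a placeholder A.
    import numpy as np

    def check_eigen(x):
        A = np.array([[2.0, x],
                      [x, 2.0]])            # hypothetical placeholder A
        vals, vecs = np.linalg.eig(A)       # columns of vecs = eigenvectors
        for lam, v in zip(vals, vecs.T):
            assert np.allclose(A @ v, lam * v)          # A v = lambda v
            assert np.isclose(np.linalg.norm(v), 1.0)   # already normalized
        return vals, vecs

    print(check_eigen(1.0))                 # placeholder gives 3.0 and 1.0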

2                Expectation, Covariance and Independence
Suppose X, Y and Z are three different random variables. Let X obey a Bernoulli distribution. The probability distribution function is

 

where c is a constant. Let Y obey a standard normal (Gaussian) distribution, written Y ∼ N(0, 1). X and Y are independent. Meanwhile, let Z = XY.

(a)   Show that Z also follows a normal (Gaussian) distribution, and calculate the expectation and variance of Z. (Hint: the sum rule and the conditional probability formula.)

(b)   How should we choose c so that Y and Z are uncorrelated (i.e., Cov(Y, Z) = 0)?

(c)    Are Y and Z independent? Use probabilities to justify your conclusion. (Example: consider P(Y ∈ (−1, 0)) and P(Z ∈ (2c, 3c)).)
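
The probability distribution function of X above did not survive conversion, so any numerical check has to assume a form for it. The Monte Carlo sketch below ASSUMES the sign-flip variant P(X = 1) = c, P(X = −1) = 1 − c, purely to illustrate how Cov(Y, Z) can be estimated as a function of c:

    # Monte Carlo estimate of Cov(Y, Z) as a function of c.
    # ASSUMPTION: X takes values +1/-1 with P(X = 1) = c (illustrative only;
    # the pmf in the assignment was lost in formatting).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    def cov_yz(c):
        y = rng.standard_normal(n)
        x = np.where(rng.random(n) < c, 1.0, -1.0)  # assumed pmf for X
        z = x * y
        return np.cov(y, z)[0, 1]

    for c in (0.25, 0.5, 0.75):
        print(c, round(cov_yz(c), 4))  # under this assumption, c = 0.5 gives ~0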

3         Optimization
Optimization problems involve minimizing a function (usually termed a loss, cost or error function) or maximizing a function (such as a likelihood) with respect to some variable x. The Karush-Kuhn-Tucker (KKT) conditions are first-order conditions that provide a unified treatment of constrained optimization. In this question, you will solve the following optimization problem:

max_{x,y}  f(x, y) = 2x² + 3xy

s.t.  

(a)   Specify the Lagrange function.

(b)   List the KKT conditions

(c)    Solve for the 4 possibilities formed by each constraint being active or inactive.

(d)   List all candidate points

(e)   Check for maximality and sufficiency
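
The constraints after "s.t." were lost in conversion, so the SciPy sketch below substitutes an assumed unit-disk constraint x² + y² ≤ 1 purely as a stand-in; with the assignment's real constraints in place, the same pattern gives a numerical cross-check of the KKT candidates:

    # Numerical cross-check for the KKT analysis. The constraint used here
    # is an ASSUMED stand-in, not the assignment's (which was lost).
    import numpy as np
    from scipy.optimize import minimize

    def neg_f(p):
        x, y = p
        return -(2 * x**2 + 3 * x * y)      # maximize f by minimizing -f

    # 'ineq' constraints in SciPy mean fun(p) >= 0, i.e. x^2 + y^2 <= 1.
    cons = [{'type': 'ineq', 'fun': lambda p: 1 - p[0]**2 - p[1]**2}]

    starts = [np.array(s) for s in [(0.5, 0.5), (-0.5, 0.5),
                                    (0.5, -0.5), (-0.5, -0.5)]]
    best = min((minimize(neg_f, s, constraints=cons) for s in starts),
               key=lambda res: res.fun)
    print(best.x, -best.fun)                # compare against the KKT candidates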

4         Maximum Likelihood
4.1        Discrete Example
Suppose we have two types of coins, A and B. The probability of a Type A coin showing heads is θ, and the probability of a Type B coin showing heads is 2θ. We have a collection of coins, each of type A or B. Each trial, we choose one coin and flip it. We repeat this experiment 10 times; the results are shown in the table below. (Hint: the probabilities above are for the particular sequence below.)

Coin Type    Result
A            Tail
A            Tail
A            Tail
A            Tail
A            Tail
A            Head
A            Head
B            Head
B            Head
B            Head

(a)   What is the likelihood of the result given θ?

(b)   What is the maximum likelihood estimation for θ?
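
A short SymPy sketch for checking (a) and (b): reading the table off directly, the Type A coins produced 5 tails and 2 heads, and the Type B coins produced 3 heads, so the sequence likelihood is (1 − θ)⁵ θ² (2θ)³:

    # Likelihood of the observed sequence from the table above:
    # 5 tails and 2 heads on Type A coins (P(head) = theta),
    # 3 heads on Type B coins (P(head) = 2*theta).
    import sympy as sp

    theta = sp.Symbol('theta', positive=True)
    L = (1 - theta)**5 * theta**2 * (2 * theta)**3
    crit = sp.solve(sp.diff(sp.log(L), theta), theta)
    print(sp.simplify(L), crit)   # note 2*theta <= 1 caps theta at 1/2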

4.2        Normal distribution
Suppose that we observe samples of a known function g(t) = t³ with unknown amplitude θ at (known) arbitrary locations t_1, ..., t_N, and these samples are corrupted by Gaussian noise. That is, we observe the sequence of random variables

X_n = θ t_n³ + Z_n,            n = 1, ..., N

where the Z_n are independent and Z_n ∼ Normal(0, σ²).

(a)   Given X_1 = x_1, ..., X_N = x_N, compute the log-likelihood function

ℓ(θ; x_1, ..., x_N) = log f_{X_1,...,X_N}(x_1, ..., x_N; θ) = log( f_{X_1}(x_1; θ) f_{X_2}(x_2; θ) ··· f_{X_N}(x_N; θ) )

Note that the X_n are independent (as the last equality suggests) but not identically distributed (they have different means).

(b)  Compute the MLE for θ.
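
Under the stated model, maximizing the log-likelihood in θ is a least-squares problem, and setting the derivative to zero gives the closed form θ̂ = (Σ t_n³ x_n) / (Σ t_n⁶), independent of the noise variance. A NumPy sketch on synthetic (illustrative) data:

    # MLE of the amplitude theta under X_n = theta*t_n^3 + Z_n with
    # i.i.d. Gaussian noise. The data below is synthetic and illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    theta_true = 2.5                      # illustrative ground truth
    t = np.linspace(-1.0, 1.0, 50)        # arbitrary known sample locations
    x = theta_true * t**3 + 0.1 * rng.standard_normal(t.size)

    theta_hat = np.sum(t**3 * x) / np.sum(t**6)
    print(theta_hat)                      # should land near theta_true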

4.3        Bonus for undergrads
The C.D.F. of the independent random variables X_1, X_2, ..., X_n is

F(x) = ... ,              x ≥ β

where α ≥ 0, β ≥ 0.

(a)   Write down the P.D.F. of the above independent random variables.

(b)  Find the MLEs of α and β.

5         Information Theory
5.1        Marginal Distribution
Suppose the joint probability distribution of two binary random variables X and Y is given as follows.

X \ Y    Y = 1    Y = 2
X = 0     1/3      1/3
X = 1      0       1/3

(a)   Show the marginal distributions of X and Y, respectively.

(b)   Find the mutual information for the joint probability distribution in the previous question.
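
A short Python sketch that reads the joint table above as P(X = i, Y = j) and computes both the marginals and the mutual information:

    # Marginals and mutual information from the joint table above
    # (rows X = 0, 1; columns Y = 1, 2).
    import numpy as np

    P = np.array([[1/3, 1/3],
                  [0.0, 1/3]])            # P(X = i, Y = j)

    px = P.sum(axis=1)                    # marginal of X
    py = P.sum(axis=0)                    # marginal of Y
    mi = sum(P[i, j] * np.log2(P[i, j] / (px[i] * py[j]))
             for i in range(2) for j in range(2) if P[i, j] > 0)
    print(px, py, mi)                     # I(X, Y) in bits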

5.2        Mutual Information and Entropy
Consider the dataset below.

Sr. No.   Age           Immunity   Travelled?   Underlying Conditions   Self-quarantine?
1         young         high       no           yes                     no
2         young         high       no           no                      no
3         middle-aged   high       no           yes                     yes
4         senior        medium     no           yes                     yes
5         senior        low        yes          yes                     yes
6         senior        low        yes          no                      no
7         middle-aged   low        yes          no                      yes
8         young         medium     no           yes                     no
9         young         low        yes          yes                     no
10        senior        medium     yes          yes                     yes
11        young         medium     yes          no                      yes
12        middle-aged   medium     no           no                      yes
13        middle-aged   high       yes          yes                     yes
14        senior        medium     no           no                      no

We want to decide whether an individual working in an essential services industry should be allowed to work or self-quarantine. Each input has four features (x1, x2, x3, x4): Age, Immunity, Travelled, Underlying Conditions. The decision (quarantine vs not) is represented as Y .

(a)   Find the entropy H(Y).

(b)   Find the conditional entropies H(Y|x1) and H(Y|x4), respectively.

(c)    Find the mutual information I(x1, Y) and I(x4, Y), and determine which feature (x1 or x4) is more informative.

(d)   Find the joint entropy H(Y, x3).
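
A Python sketch for cross-checking (a)-(d) numerically, with the 14 rows above transcribed into a DataFrame (Y is Self-quarantine?, x1 is Age, x3 is Travelled?, x4 is Underlying Conditions):

    # Empirical entropies on the quarantine dataset above.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'age':  ['young','young','middle-aged','senior','senior','senior',
                 'middle-aged','young','young','senior','young','middle-aged',
                 'middle-aged','senior'],
        'travelled': ['no','no','no','no','yes','yes','yes','no','yes','yes',
                      'yes','no','yes','no'],
        'cond': ['yes','no','yes','yes','yes','no','no','yes','yes','yes',
                 'no','no','yes','no'],
        'y':    ['no','no','yes','yes','yes','no','yes','no','no','yes',
                 'yes','yes','yes','no'],
    })

    def H(s):
        p = s.value_counts(normalize=True).to_numpy()
        return float(-(p * np.log2(p)).sum())

    def H_y_given(col):                    # conditional entropy H(Y|col)
        return sum(len(g) / len(df) * H(g['y'])
                   for _, g in df.groupby(col))

    print(H(df['y']))                                        # (a) H(Y)
    print(H_y_given('age'), H_y_given('cond'))               # (b)
    print(H(df['y']) - H_y_given('age'),
          H(df['y']) - H_y_given('cond'))                    # (c) I(x1,Y), I(x4,Y)
    print(H(df['y'] + ',' + df['travelled']))                # (d) H(Y, x3)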

5.3        Entropy Proofs [7pts]
(a)   Suppose X and Y are independent. Show that H(X|Y) = H(X).

(b)   Suppose X and Y are independent. Show that H(X,Y) = H(X) + H(Y).

 

(c)    Prove that mutual information is symmetric, i.e., I(X, Y) = I(Y, X), where xi ∈ X and yi ∈ Y.
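
All three parts can start from the standard discrete definitions, reproduced here for reference:

\[
H(X) = -\sum_{x} p(x)\log p(x), \qquad H(X \mid Y) = -\sum_{x,y} p(x,y)\log p(x \mid y),
\]
\[
H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y), \qquad I(X,Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}.
\]

Independence gives p(x, y) = p(x) p(y) and hence p(x | y) = p(x); substituting into the sums yields (a) and (b), while the symmetry of the last expression in x and y gives (c).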

6         Bonus for All
(a)   If a random variable X has a Poisson distribution with mean 8, calculate the expectation E[(X + 2)²].

(b)   A person decides to toss a fair coin repeatedly until he gets a head. He will make at most 3 tosses. Let the random variable Y denote the number of heads. Find the variance of Y.
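
Parts (a) and (b) have clean closed forms by hand; the Monte Carlo sketch below is just a numerical sanity check:

    # Monte Carlo sanity checks for bonus parts (a) and (b).
    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000

    # (a) X ~ Poisson(8); estimate E[(X + 2)^2].
    x = rng.poisson(8, n)
    print(np.mean((x + 2.0)**2))   # compare with E[X^2] + 4 E[X] + 4

    # (b) Toss a fair coin until a head appears, at most 3 tosses;
    #     Y = number of heads observed, so Y is 0 or 1.
    tosses = rng.random((n, 3)) < 0.5      # True = head
    y = tosses.any(axis=1).astype(float)   # a head occurs within 3 tosses
    print(np.var(y))                       # compare with p(1 - p), p = 7/8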

(c)    Two random variables X and Y are distributed according to

f(x, y) = ... ,              0 otherwise

What is the probability P(X+Y ≤ 1)?
