STAT215 - Assignment 1

Problem 1: The negative binomial distribution.

Consider a coin with probability p of coming up heads. The number of coin flips before seeing a ‘tails’ follows a geometric distribution with pmf

Pr(X = k; p) = p^k (1 − p).

The number of coin flips before seeing r tails follows a negative binomial distribution with parameters r and p.

(a)     Derive the probability mass function Pr(X = k; r, p) of the negative binomial distribution. Explain your reasoning.

(b)    The geometric distribution has mean p/(1 − p) and variance p/(1 − p)^2. Compute the mean and variance of the negative binomial distribution. Plot the variance as a function of the mean for fixed p and varying r. How does this compare to the Poisson distribution?
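A minimal plotting sketch for part (b). It assumes the standard fact that NB(r, p) is a sum of r i.i.d. geometrics, so its mean is rp/(1 − p) and its variance rp/(1 − p)^2; verify this in your own derivation before trusting the picture.

```python
import numpy as np
import matplotlib.pyplot as plt

# Variance vs. mean of NB(r, p) for fixed p and varying r,
# using mean = r*p/(1-p) and var = r*p/(1-p)**2 (sum of r i.i.d. geometrics).
p = 0.3
r = np.arange(1, 51)
mean = r * p / (1 - p)
var = r * p / (1 - p) ** 2

plt.plot(mean, var, label="negative binomial (p = 0.3)")
plt.plot(mean, mean, "--", label="Poisson (var = mean)")
plt.xlabel("mean")
plt.ylabel("variance")
plt.legend()
plt.show()
```

The negative binomial line has slope 1/(1 − p) > 1, i.e., it is overdispersed relative to the Poisson reference line.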

(c)     Rewrite the negative binomial pmf in terms of the mean µ and the dispersion parameter r. Show that as r → ∞ with µ fixed, the negative binomial converges to a Poisson distribution with mean µ.
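You can check the limit in part (c) numerically. The sketch below uses scipy, whose nbinom counts failures before the r-th success with success probability q, so holding µ = rp/(1 − p) fixed corresponds to q = r/(r + µ); this reparameterization is ours to double-check, not part of the problem statement.

```python
import numpy as np
from scipy import stats

# NB(r, p) with mean mu held fixed approaches Poisson(mu) as r grows.
mu = 5.0
ks = np.arange(0, 20)
for r in [1, 10, 100, 1000]:
    q = r / (r + mu)  # scipy's success probability for fixed mean mu
    nb_pmf = stats.nbinom.pmf(ks, r, q)
    max_gap = np.max(np.abs(nb_pmf - stats.poisson.pmf(ks, mu)))
    print(f"r = {r:5d}: max |NB - Poisson| pmf gap = {max_gap:.2e}")
```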

(d)    The gamma distribution is a continuous distribution on (0, ∞) with pdf

p(x; α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx},

where Γ(·) denotes the gamma function, which has the property that Γ(n) = (n − 1)! for positive integers n. Show that the negative binomial is the marginal distribution over X where X ∼ Poisson(µ) and µ ∼ Gamma(r, (1 − p)/p), integrating over µ. In other words, show that the negative binomial is equivalent to an infinite mixture of Poissons with gamma mixing measure.
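A Monte Carlo sanity check for part (d): sample µ from the gamma mixing distribution, then X | µ from a Poisson, and compare the sample frequencies to the negative binomial pmf. Note the parameterization bookkeeping, which is an assumption to verify: numpy's gamma takes a scale (= 1/rate), and scipy's nbinom takes the "success" probability, which is 1 − p (tails) in this problem's convention.

```python
import numpy as np
from scipy import stats

# Gamma mixture of Poissons vs. the negative binomial pmf.
rng = np.random.default_rng(0)
r, p = 3, 0.4
mus = rng.gamma(shape=r, scale=p / (1 - p), size=200_000)  # rate = (1-p)/p
xs = rng.poisson(mus)

ks = np.arange(0, 15)
empirical = np.array([(xs == k).mean() for k in ks])
exact = stats.nbinom.pmf(ks, r, 1 - p)  # scipy's success prob = tails prob
print(np.round(np.c_[ks, empirical, exact], 4))
```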

(e)     Suppose X_n ∼ NB(r, p) for n = 1, . . . , N are independent samples from a negative binomial distribution. Write the log likelihood ℒ(r, p). Solve for the maximum likelihood estimate (in closed form) of p̂ for fixed r. Plug this into the log likelihood to obtain the profile likelihood ℒ(r, p̂(r)) as a function of r alone.
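A sketch for part (e), written under the assumption that the stationary point works out to p̂(r) = x̄/(x̄ + r) (check this against your own derivation; the helper names are ours):

```python
import numpy as np
from scipy.special import gammaln

def nb_log_lik(x, r, p):
    # Summed log NB pmf: log C(x+r-1, x) + x log p + r log(1-p).
    binom = gammaln(x + r) - gammaln(r) - gammaln(x + 1)
    return np.sum(binom + x * np.log(p) + r * np.log(1 - p))

def profile_log_lik(x, r):
    # Plug the closed-form p-hat(r) back into the log likelihood.
    p_hat = x.mean() / (x.mean() + r)
    return nb_log_lik(x, r, p_hat)

rng = np.random.default_rng(1)
x = rng.negative_binomial(n=4, p=0.6, size=500)  # numpy's p is the 'tails' prob here
rs = np.linspace(0.5, 20, 200)
r_best = rs[np.argmax([profile_log_lik(x, r) for r in rs])]
print("profile MLE of r ≈", r_best)
```

Maximizing the profile likelihood over a one-dimensional grid of r values is a common practical way to fit both parameters.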

 Problem 2: The multivariate normal distribution.

(a) In class we introduced a multivariate Gaussian distribution via its representation as a linear transformation x = Az + µ where z is a vector of independent standard normal random variates. Using the change of variables formula, derive the multivariate Gaussian pdf,

p(x; µ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp{ −(1/2) (x − µ)^T Σ^{−1} (x − µ) },

where µ ∈ R^D and Σ = AA^T ∈ R^{D×D} is a positive definite covariance matrix (positive definiteness requires A to be invertible).

(b) Let r = ‖z‖_2 = (z_1^2 + · · · + z_D^2)^{1/2}, where z is a vector of standard normal variates, as above. We will derive its density function.

(i)      Start by considering the D = 2 dimensional case and note that p(r) dr equals the probability mass assigned by the multivariate normal distribution to the infinitesimal shell at radius r from the origin.

(ii)     Generalize your solution to D > 2 dimensions, using the fact that the surface area of the D-dimensional ball with radius r is 2r^{D−1}π^{D/2}/Γ(D/2).

(iii)    Plot this density for increasing values of the dimension D. What does this tell you about the distribution of high-dimensional Gaussian vectors? (See the sketch following part (iv).)

(iv)    Now use another change of variables to derive the pdf of r^2, the sum of squares of the Gaussian variables. The squared 2-norm follows a χ^2 distribution with D degrees of freedom. Show that it is a special case of the gamma distribution introduced in Problem 1.
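A plotting sketch for parts (iii) and (iv). It leans on scipy's built-in chi and chi2 densities rather than the formula you derive, so use it only to check your answer; the χ^2(D) = Gamma(D/2, 1/2) identity is verified for one case.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Density of r = ||z||_2 for increasing dimension D.
rs = np.linspace(0, 10, 400)
for D in [1, 2, 5, 20, 50]:
    plt.plot(rs, stats.chi.pdf(rs, df=D), label=f"D = {D}")
plt.xlabel("r = ||z||")
plt.ylabel("p(r)")
plt.legend()
plt.show()
# Notice the mass concentrating in a thin shell of radius about sqrt(D).

# chi-squared(D) as a gamma: shape D/2, rate 1/2 (i.e., scale 2).
xs = np.linspace(0.01, 20, 200)
print(np.allclose(stats.chi2.pdf(xs, df=6), stats.gamma.pdf(xs, a=3, scale=2)))
```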

(c) Rewrite the multivariate Gaussian density in natural exponential family form with parameters J and h. How do its natural parameters relate to its mean parameters µ and Σ? What are the sufficient statistics of this exponential family distribution? What is the log normalizer? Show that the derivatives of the log normalizer yield the expected sufficient statistics.
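For part (c), a small conversion sketch. It assumes the usual information-form convention, J = Σ^{−1} (precision) and h = Σ^{−1}µ (precision-weighted mean); confirm that this matches the natural parameters you derive.

```python
import numpy as np

def mean_to_natural(mu, Sigma):
    # J = Sigma^{-1}, h = Sigma^{-1} mu (information form, assumed convention).
    J = np.linalg.inv(Sigma)
    return J, J @ mu

def natural_to_mean(J, h):
    # Invert the map: Sigma = J^{-1}, mu = J^{-1} h.
    Sigma = np.linalg.inv(J)
    return Sigma @ h, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
J, h = mean_to_natural(mu, Sigma)
mu2, Sigma2 = natural_to_mean(J, h)
print(np.allclose(mu, mu2), np.allclose(Sigma, Sigma2))
```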

(d) Consider a directed graphical model on a collection of scalar random variables (x_1, . . . , x_D). Assume that each variable x_d for d > 1 has exactly one parent in the directed graphical model, and let the index of the parent of x_d be denoted by par_d ∈ {1, . . . , d − 1}. The joint distribution is then given by,

x_1 ∼ N(0, β^{−1}),

x_d ∼ N(x_{par_d} + b_d, β^{−1})   for d = 2, . . . , D.

The parameters of the model are β and {b_d}_{d=2}^D. Show that the joint distribution is a multivariate Gaussian and find a closed-form expression for the precision matrix, J. How does the precision matrix change in the two-dimensional model where each x_d ∈ R^2, β^{−1} is replaced by β^{−1}I, and b_d ∈ R^2?

 Problem 3: Bayesian linear regression.

Consider a regression problem with datapoints (x_n, y_n) ∈ R^D × R. We begin with a linear model,

y_n = w^T x_n + ε_n,   ε_n ∼ N(0, β^{−1}),

where w ∈ R^D is a vector of regression weights and β ∈ R_+ specifies the precision (inverse variance) of the errors ε_n.

(a)   Assume an independent Gaussian prior w_i ∼ N(0, α^{−1}) on each weight. Compute the marginal likelihood p({(x_n, y_n)}_{n=1}^N; α, β) = ∫ p(w; α) p({(x_n, y_n)}_{n=1}^N | w; β) dw.
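For part (a), a useful identity to check your answer against: integrating out w makes y jointly Gaussian with mean zero and covariance α^{−1}XX^T + β^{−1}I (a standard result for linear-Gaussian models; verify it in your derivation).

```python
import numpy as np
from scipy import stats

# Evaluate the log marginal likelihood via the induced Gaussian on y.
rng = np.random.default_rng(0)
N, D, alpha, beta = 50, 3, 2.0, 4.0
X = rng.normal(size=(N, D))
w = rng.normal(scale=alpha ** -0.5, size=D)
y = X @ w + rng.normal(scale=beta ** -0.5, size=N)

cov = X @ X.T / alpha + np.eye(N) / beta
log_ml = stats.multivariate_normal(np.zeros(N), cov).logpdf(y)
print("log marginal likelihood:", log_ml)
```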

(b)   Now consider a “spike-and-slab” prior distribution on the entries of w. Let z ∈ {0, 1}^D be a binary vector specifying whether the corresponding entries in w are nonzero. That is, if z_i = 0 then w_i is deterministically zero; otherwise, w_i ∼ N(0, α^{−1}) as above. We can write this as a degenerate Gaussian prior

p(w | z) = ∏_{i=1}^{D} N(w_i | 0, z_i α^{−1}).

Compute the marginal likelihood p({(x_n, y_n)}_{n=1}^N | z, α, β). How would you find the value of z that maximizes this likelihood?
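One practical observation for part (b): given z, the marginal likelihood only involves the columns of X with z_i = 1, so for small D you can enumerate all 2^D values of z exactly. The sketch below instead uses a greedy forward search, which is a heuristic of our choosing, not the prescribed method.

```python
import numpy as np
from scipy import stats

def log_marginal(X, y, z, alpha, beta):
    # Marginal likelihood using only the active columns (z_i = 1).
    Xz = X[:, z.astype(bool)]
    cov = Xz @ Xz.T / alpha + np.eye(len(y)) / beta
    return stats.multivariate_normal(np.zeros(len(y)), cov).logpdf(y)

def greedy_search(X, y, alpha, beta):
    # Greedily activate entries of z while the marginal likelihood improves.
    z = np.zeros(X.shape[1])
    best = log_marginal(X, y, z, alpha, beta)
    improved = True
    while improved:
        improved = False
        for i in np.flatnonzero(z == 0):
            z_try = z.copy()
            z_try[i] = 1
            score = log_marginal(X, y, z_try, alpha, beta)
            if score > best:
                best, z, improved = score, z_try, True
    return z, best

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X[:, :2] @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=40)
print(greedy_search(X, y, alpha=1.0, beta=100.0))
```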

(c)   Suppose that each datapoint has its own precision β_n. Compute the posterior distribution p(w | {(x_n, y_n, β_n)}_{n=1}^N, α).

How does the posterior mean compare to the ordinary least squares estimate?
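For part (c), a sketch comparing the two estimates numerically. It writes the posterior mean in the Gaussian-conjugacy form (αI + X^T B X)^{−1} X^T B y with B = diag(β_1, . . . , β_N); verify this form yourself rather than taking it on faith.

```python
import numpy as np

# Precision-weighted posterior mean vs. ordinary least squares.
rng = np.random.default_rng(0)
N, D, alpha = 100, 3, 1.0
X = rng.normal(size=(N, D))
betas = rng.gamma(2.0, 2.0, size=N)        # known per-datapoint precisions
w_true = np.array([1.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=betas ** -0.5)

B = np.diag(betas)
w_post = np.linalg.solve(alpha * np.eye(D) + X.T @ B @ X, X.T @ B @ y)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("posterior mean:", np.round(w_post, 3))
print("OLS estimate:  ", np.round(w_ols, 3))
```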

(d)   Finally, suppose the per-datapoint precisions β_n are not directly observed, but are independently sampled from a gamma prior distribution,

β_n ∼ Gamma(a, b),

which has the property that E[β_n] = a/b and E[ln β_n] = ψ(a) − ln b, where ψ is the digamma function. Then the errors ε_n are marginally distributed according to the Student’s t distribution, which has heavier tails than the Gaussian and hence is more robust to outliers.

Compute the conditional distribution p(β_n | x_n, y_n, w, a, b), and compute the expected log joint probability

ℒ(w′) = E_{p({β_n} | {(x_n, y_n)}_{n=1}^N, w, a, b)} [ log p({(x_n, y_n, β_n)}_{n=1}^N, w′; α, a, b) ].

What value of w′ maximizes the expected log joint probability? Describe an EM procedure to search for,

w* = argmax_w p(w | {(x_n, y_n)}_{n=1}^N, α, a, b).
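One way to implement part (d): if the conditional works out to β_n | x_n, y_n, w, a, b ∼ Gamma(a + 1/2, b + (y_n − w^T x_n)^2/2) (check this!), then the E-step only needs E[β_n], and the M-step is exactly the weighted ridge update from part (c). A minimal sketch under that assumption:

```python
import numpy as np

def em(X, y, alpha, a, b, n_iters=50):
    # EM for the posterior mode of w with gamma-distributed precisions.
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        resid = y - X @ w
        e_beta = (a + 0.5) / (b + 0.5 * resid ** 2)   # E-step: E[beta_n]
        B = np.diag(e_beta)
        w = np.linalg.solve(alpha * np.eye(D) + X.T @ B @ X,
                            X.T @ B @ y)              # M-step: weighted ridge
    return w

# Heavy-tailed noise: t with df = 2a when a = b (marginalizing the gamma).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + rng.standard_t(df=3, size=200)
print(em(X, y, alpha=1.0, a=1.5, b=1.5))
```

Downweighting large residuals via E[β_n] is what gives this model its robustness to outliers.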

 Problem 4: Multiclass logistic regression applied to larval zebrafish behavior data.

Follow the instructions in this Google Colab notebook to implement a multiclass logistic regression model and fit it to larval zebrafish behavior data from a recent paper: https://colab.research.google.com/drive/1moN5CYNsyxeOSUOmN-QMyqEZwgLSBsjY. Once you’re done, save the notebook in .ipynb format, print a copy in .pdf format, and submit these files along with the rest of your written assignment.
