Probability and Statistics
1. (Bayes Rule, from Murphy exercise 2.4.) After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing negative given that you don't have the disease). The good news is that this is a rare disease, striking only one in 10,000 people. What are the chances that you actually have the disease? (Show your calculations as well as giving the final result.)
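A hand calculation for this kind of problem can be sanity-checked numerically. The sketch below applies Bayes' rule to a binary test; the function name `posterior_given_positive` and its parameter names are illustrative choices, not part of the exercise.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule.

    prior        -- P(disease)
    sensitivity  -- P(positive | disease)
    specificity  -- P(negative | no disease)
    """
    true_pos = sensitivity * prior                    # P(positive, disease)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(positive, no disease)
    return true_pos / (true_pos + false_pos)


# Numbers from the problem statement: 99% accurate test, 1-in-10,000 prevalence.
p = posterior_given_positive(prior=1e-4, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.4f}")
```

The written solution should still show the individual terms of the calculation; the code only confirms the final number.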
2. For any two random variables $X, Y$ the covariance is defined as $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. You may assume $X$ and $Y$ take on discrete values if you find that easier to work with.
a. [1 points] If $E[Y \mid X = x] = x$, show that $\mathrm{Cov}(X, Y) = E[(X - E[X])^2]$.
b. [1 points] If $X, Y$ are independent, show that $\mathrm{Cov}(X, Y) = 0$.
3. Let X and Y be independent random variables with PDFs given by f and g, respectively. Let h be the PDF of the random variable Z = X + Y .
a. Show that $h(z) = \int_{-\infty}^{\infty} f(x)\, g(z - x)\, dx$. (If you are more comfortable with discrete probabilities, you can instead derive an analogous expression for the discrete case, and then give a one-sentence explanation of why your expression is analogous to the continuous case.)
b. If X and Y are both independent and uniformly distributed on [0,1] (i.e. f(x) = g(x) = 1 for x ∈ [0,1] and 0 otherwise) what is h, the PDF of Z = X + Y ?
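A derived density for $Z = X + Y$ can be checked numerically in two independent ways: a histogram of simulated sums, and a Riemann-sum approximation of the convolution of the two densities. The sketch below does both for the uniform case; the seed, sample count, bin count, and grid spacing are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Monte Carlo: draw X, Y ~ Uniform[0, 1] independently and form Z = X + Y.
n_samples = 200_000
z = rng.uniform(0.0, 1.0, n_samples) + rng.uniform(0.0, 1.0, n_samples)
hist, edges = np.histogram(z, bins=40, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Riemann-sum approximation of the convolution of f and g on a grid.
num_points = 1000
dx = 1.0 / num_points
grid = np.linspace(0.0, 1.0, num_points, endpoint=False)  # support of f and g
f = np.ones_like(grid)                                    # f(x) = 1 on [0, 1)
g = np.ones_like(grid)                                    # g(x) = 1 on [0, 1)
h_numeric = np.convolve(f, g) * dx                        # approximates h on [0, 2)
z_grid = np.arange(h_numeric.size) * dx

# Compare the two estimates of h at the histogram bin centers.
h_at_centers = np.interp(centers, z_grid, h_numeric)
max_gap = np.max(np.abs(hist - h_at_centers))
print("max |histogram - numeric convolution| =", max_gap)
```

If the closed-form answer to part b is correct, evaluating it at `centers` should agree with both `hist` and `h_at_centers` up to sampling and discretization error.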
4. [1 points] A random variable $X \sim N(\mu, \sigma^2)$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$. Given that for any $a, b \in \mathbb{R}$, $Y = aX + b$ is also Gaussian, find $a, b$ such that $Y \sim N(0, 1)$.
5. [2 points] For a random variable $Z$, its mean and variance are defined as $E[Z]$ and $E[(Z - E[Z])^2]$, respectively. Let $X_1, \dots, X_n$ be independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. If we define $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, what are the mean and variance of $\sqrt{n}(\hat{\mu}_n - \mu)$?
6. If $f(x)$ is a PDF, the cumulative distribution function (CDF) is defined as $F(x) = \int_{-\infty}^{x} f(y)\, dy$. For any function $g : \mathbb{R} \to \mathbb{R}$ and random variable $X$ with PDF $f(x)$, recall that the expected value of $g(X)$ is defined as $E[g(X)] = \int_{-\infty}^{\infty} g(y) f(y)\, dy$. For a boolean event $A$, define $\mathbf{1}\{A\}$ as 1 if $A$ is true, and 0 otherwise. Thus, $\mathbf{1}\{x \le a\}$ is 1 whenever $x \le a$ and 0 whenever $x > a$. Note that $F(x) = E[\mathbf{1}\{X \le x\}]$. Let $X_1, \dots, X_n$ be independent and identically distributed random variables with CDF $F(x)$. Define $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$.
Note, for every $x$, that $\hat{F}_n(x)$ is an empirical estimate of $F(x)$. You may use your answers to the previous problem.
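a. [1 points] For any $x$, what is $E[\hat{F}_n(x)]$?
b. [1 points] For any $x$, the variance of $\hat{F}_n(x)$ is $E[(\hat{F}_n(x) - F(x))^2]$. Show that $\mathrm{Variance}(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}$.
c. [1 points] Using your answer to b, show that for all $x \in \mathbb{R}$, we have $E[(\hat{F}_n(x) - F(x))^2] \le \frac{1}{4n}$.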
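The empirical CDF is simple to compute, and its mean and variance at a fixed point can be estimated by repeating the experiment many times. The sketch below is one possible implementation; the helper name `empirical_cdf`, the choice of Uniform[0, 1] samples (for which $F(x) = x$ on $[0, 1]$), and the values of `n`, `trials`, and `x0` are all assumptions made for illustration.

```python
import numpy as np


def empirical_cdf(samples, x):
    """Evaluate (1/n) * sum_i 1{X_i <= x} at each point in x."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # side="right" counts the samples that are <= x.
    counts = np.searchsorted(samples, np.asarray(x, dtype=float), side="right")
    return counts / samples.size


rng = np.random.default_rng(0)   # seed chosen only for reproducibility
n, trials = 100, 5_000
x0 = 0.3                         # arbitrary evaluation point

# Repeat the experiment to estimate the mean and variance of the
# empirical CDF at x0 when X_i ~ Uniform[0, 1].
estimates = np.array([empirical_cdf(rng.uniform(size=n), x0) for _ in range(trials)])
print("mean of empirical CDF at x0:    ", estimates.mean())
print("variance of empirical CDF at x0:", estimates.var())
```

Comparing the two printed values against the expressions derived in parts a and b, with $F(x_0) = 0.3$ and $n = 100$, gives a quick check on the algebra.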
7. [1 points] Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables drawn uniformly at random from $[0, 1]$. If $Y = \max\{X_1, \dots, X_n\}$, find $E[Y]$.
8. [1 points] Let $X$ be a random variable with $E[X] = \mu$ and $E[(X - \mu)^2] = \sigma^2$. For any $x \ge 0$, use Markov's inequality to show that $P(X \ge \mu + \sigma x) \le 1/x^2$.
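A Monte Carlo estimate offers a quick check on a closed-form answer for $E[Y]$. The sketch below averages the maximum of $n$ uniform draws over many trials; the helper name `mc_expected_max`, the trial count, and the values of $n$ tried are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility


def mc_expected_max(n, trials=100_000):
    """Monte Carlo estimate of E[max(X_1, ..., X_n)] for X_i ~ Uniform[0, 1]."""
    draws = rng.uniform(0.0, 1.0, size=(trials, n))
    return draws.max(axis=1).mean()


for n in (1, 2, 5, 10):
    print(f"n = {n:2d}: estimated E[Y] = {mc_expected_max(n):.4f}")
```

A correct formula for $E[Y]$ should match these estimates to about two decimal places at this trial count.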
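The inequality in problem 8 holds for any distribution with finite variance, so it can be illustrated on a concrete one. The sketch below uses an Exponential(1) distribution, which has mean 1 and variance 1; the choice of distribution, the sample size, and the values of $x$ are assumptions made for illustration, and a simulation is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Exponential(1) has mean 1 and variance 1.
samples = rng.exponential(scale=1.0, size=500_000)
mu, sigma = 1.0, 1.0

for x in (1.0, 2.0, 3.0):
    tail = np.mean(samples >= mu + sigma * x)   # empirical P(X >= mu + sigma * x)
    bound = 1.0 / x ** 2
    print(f"x = {x}: empirical tail = {tail:.4f}, bound 1/x^2 = {bound:.4f}")
```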
Linear Algebra and Vector Calculus
9. (Rank) Let and . For each matrix A and B,
a. [2 points] what is its rank?
b. [2 points] what is a (minimal size) basis for its column span?
10. (Linear equations) Let , and .
a. [1 points] What is Ac?
b. [2 points] What is the solution to the linear system Ax = b? (Show your work).
11. (Hyperplanes) Assume $w$ is an $n$-dimensional vector and $b$ is a scalar. A hyperplane in $\mathbb{R}^n$ is the set $\{x : x \in \mathbb{R}^n, \text{ s.t. } w^T x + b = 0\}$.
a. [1 points] ($n = 2$ example) Draw the hyperplane for $w = [-1, 2]^T$, $b = 2$. Label your axes.
b. [1 points] ($n = 3$ example) Draw the hyperplane for $w = [1, 1, 1]^T$, $b = 0$. Label your axes.
c. [2 points] Given some $x_0 \in \mathbb{R}^n$, find the squared distance to the hyperplane defined by $w^T x + b = 0$. In other words, solve the following optimization problem:
$$\min_{x} \|x_0 - x\|^2 \quad \text{s.t.} \quad w^T x + b = 0$$
(Hint: if $\tilde{x}_0$ is the minimizer of the above problem, note that $\|x_0 - \tilde{x}_0\| = \frac{|w^T (x_0 - \tilde{x}_0)|}{\|w\|}$. What is $w^T \tilde{x}_0$?)
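A closed-form answer to part c can be checked against a direct numerical solution. The sketch below solves the equality-constrained problem through its KKT linear system rather than through the hint; the helper name `project_onto_hyperplane` and the query point `x0` are illustrative assumptions, while `w` and `b` are taken from part a.

```python
import numpy as np


def project_onto_hyperplane(x0, w, b):
    """Solve min_x ||x0 - x||^2 s.t. w^T x + b = 0 via its KKT system.

    Stationarity: 2 (x - x0) + lam * w = 0
    Feasibility:  w^T x + b = 0
    """
    x0 = np.asarray(x0, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x0.size
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * np.eye(n)
    kkt[:n, n] = w
    kkt[n, :n] = w
    rhs = np.concatenate([2.0 * x0, [-b]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the Lagrange multiplier


# Hyperplane from part a; the query point x0 is an arbitrary illustrative choice.
w = np.array([-1.0, 2.0])
b = 2.0
x0 = np.array([3.0, 1.0])

x_star = project_onto_hyperplane(x0, w, b)
squared_distance = np.sum((x0 - x_star) ** 2)
print("closest point:   ", x_star)
print("constraint value:", w @ x_star + b)
print("squared distance:", squared_distance)
```

Plugging the same `w`, `b`, and `x0` into a hand-derived formula should reproduce `squared_distance`.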
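12. For possibly non-symmetric $A, B \in \mathbb{R}^{n \times n}$ and $c \in \mathbb{R}$, let $f(x, y) = x^T A x + y^T B x + c$. Define $\nabla_z f(x, y) = \left[ \frac{\partial f(x, y)}{\partial z_1}, \frac{\partial f(x, y)}{\partial z_2}, \dots, \frac{\partial f(x, y)}{\partial z_n} \right]^T$.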
a. [2 points] Explicitly write out the function $f(x, y)$ in terms of the components $A_{i,j}$ and $B_{i,j}$ using appropriate summations over the indices.
b. [2 points] What is $\nabla_x f(x, y)$ in terms of the summations over indices and vector notation?
c. [2 points] What is $\nabla_y f(x, y)$ in terms of the summations over indices and vector notation?
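Hand-derived gradients are easy to get wrong by a transpose, and a finite-difference check catches that quickly. The sketch below approximates both gradients with central differences on random inputs; the helper name `numerical_gradient`, the step size `eps`, and the random test values are assumptions made for illustration.

```python
import numpy as np


def f(x, y, A, B, c):
    """f(x, y) = x^T A x + y^T B x + c."""
    return x @ A @ x + y @ B @ x + c


def numerical_gradient(func, z, eps=1e-6):
    """Central-difference approximation of the gradient of func at z."""
    z = np.asarray(z, dtype=float)
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (func(z + step) - func(z - step)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n = 4
A = rng.normal(size=(n, n))     # deliberately non-symmetric
B = rng.normal(size=(n, n))
c = 1.5
x = rng.normal(size=n)
y = rng.normal(size=n)

grad_x = numerical_gradient(lambda v: f(v, y, A, B, c), x)
grad_y = numerical_gradient(lambda v: f(x, v, A, B, c), y)
print("numerical gradient w.r.t. x:", grad_x)
print("numerical gradient w.r.t. y:", grad_y)
```

Evaluating the vector-notation answers to parts b and c at the same `x` and `y` and comparing with `np.allclose` confirms or refutes the derivation.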
13. [1 points] The trace of a matrix is the sum of the diagonal entries; $\mathrm{Tr}(A) = \sum_i A_{ii}$. If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times n}$, show that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
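Before writing the proof, it can help to see the identity hold numerically for non-square matrices, where $AB$ and $BA$ do not even have the same shape. The sketch below uses random matrices; the dimensions and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n, m = 3, 5
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, n))

trace_ab = np.trace(A @ B)      # trace of an n x n matrix
trace_ba = np.trace(B @ A)      # trace of an m x m matrix

# The same quantity written as an explicit double sum over indices.
double_sum = sum(A[i, j] * B[j, i] for i in range(n) for j in range(m))

print("Tr(AB)     =", trace_ab)
print("Tr(BA)     =", trace_ba)
print("double sum =", double_sum)
```

The double sum suggests the structure of the proof: both traces expand to the same sum with the order of summation exchanged.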
14. [1 points] Let $v_1, \dots, v_n$ be a set of non-zero vectors in $\mathbb{R}^d$. Let $V = [v_1, \dots, v_n]$ be the vectors concatenated.
a. What is the minimum and maximum rank of ?
b. What is the minimum and maximum rank of V ?
c. Let $A \in \mathbb{R}^{D \times d}$ for $D > d$. What is the minimum and maximum rank of ?
d. What is the minimum and maximum rank of AV ? What if V is rank d?
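Numerical experiments can build intuition for parts b and d, though they only show particular cases and do not establish the minimum or maximum. The sketch below computes ranks with `np.linalg.matrix_rank` for two hypothetical choices of $V$; the dimensions, the random matrix $A$, and both constructions of $V$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
d, n, D = 3, 5, 6

# Case 1: columns drawn at random, so they are in general position.
V_generic = rng.normal(size=(d, n))

# Case 2: every column is a non-zero multiple of one fixed vector.
V_collinear = np.outer(rng.normal(size=d), np.arange(1, n + 1))

A = rng.normal(size=(D, d))     # a random tall matrix, D > d

for name, V in [("generic", V_generic), ("collinear", V_collinear)]:
    rank_v = np.linalg.matrix_rank(V)
    rank_av = np.linalg.matrix_rank(A @ V)
    print(f"{name:9s}: rank(V) = {rank_v}, rank(AV) = {rank_av}")
```

A random $A$ is only one case; other choices of $A$ can give a different rank for $AV$, which is what part d asks about.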
Programming
15. For the $A, b, c$ as defined in Problem 10, use NumPy to compute (take a screenshot of your answer):
a. [2 points] What is $A^{-1}$?
b. [1 points] What is $A^{-1} b$? What is $Ac$?
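The NumPy calls needed here are `np.linalg.inv` and the `@` matrix-product operator. The sketch below shows them on placeholder values: the matrix and vectors are hypothetical stand-ins, not the $A$, $b$, $c$ of the problem, and must be replaced with the actual definitions.

```python
import numpy as np

# Placeholder values (hypothetical): substitute the A, b, c from the
# problem statement before using the output.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 1.0, 1.0])

A_inv = np.linalg.inv(A)
print("A^{-1} =\n", A_inv)
print("A^{-1} b =", A_inv @ b)
print("A c =", A @ c)

# np.linalg.solve returns the same x without forming the inverse explicitly.
x = np.linalg.solve(A, b)
print("solve(A, b) =", x)
```

For solving $Ax = b$ in practice, `np.linalg.solve` is generally preferred over multiplying by an explicit inverse; the problem asks for $A^{-1}$ itself, so both are shown.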
16. [4 points] Two random variables $X$ and $Y$ have equal distributions if their CDFs, $F_X$ and $F_Y$, respectively, are equal, i.e. for all $x$, $|F_X(x) - F_Y(x)| = 0$. The central limit theorem says that the sum of $k$ independent, zero-mean, variance-$1/k$ random variables converges to a (standard) Normal distribution as $k$ goes to infinity.
We will study this phenomenon empirically (you will use the Python packages NumPy and Matplotlib). Define $Y^{(k)} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} B_i$ where each $B_i$ is equal to $-1$ and $1$ with equal probability. From your solution to problem 5, we know that $\frac{1}{\sqrt{k}} B_i$ is zero-mean and has variance $1/k$.
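a. For $i = 1, \dots, n$ let $Z_i \sim N(0, 1)$. If $F(x)$ is the true CDF from which each $Z_i$ is drawn (i.e., Gaussian) and $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Z_i \le x\}$, use the answer to problem 6 above to choose $n$ large enough such that, for all $x \in \mathbb{R}$, $\sqrt{E[(\hat{F}_n(x) - F(x))^2]} \le 0.0025$, and plot $\hat{F}_n(x)$ from $-3$ to $3$.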
(Hint: with import numpy as np, use Z = np.random.randn(n) to generate the random variables, and import matplotlib.pyplot as plt; plt.step(sorted(Z), np.arange(1, n+1)/float(n)) to plot.)
b. For each $k \in \{1, 8, 64, 512\}$ generate $n$ independent copies of $Y^{(k)}$ and plot their empirical CDF on the same plot as part a.
(Hint: np.sum(np.sign(np.random.randn(n, k))*np.sqrt(1./k), axis=1) generates n of the $Y^{(k)}$ random variables.)
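The two hints can be assembled into a small script. The sketch below is one possible arrangement, not a reference solution: it relies on the fact that the mean squared error of an empirical CDF built from $n$ samples is at most $1/(4n)$, and the helper names (`choose_n`, `empirical_cdf_points`, `sample_y`, `plot_cdfs`), the axis labels, and the seed are all illustrative assumptions. It draws the signs with `rng.choice` instead of `np.sign(np.random.randn(...))`, which yields the same distribution.

```python
import numpy as np


def choose_n(tol):
    """Return an n with sqrt(1 / (4 n)) <= tol, based on the 1/(4n) bound."""
    return int(np.ceil(1.0 / (4.0 * tol ** 2)))


def empirical_cdf_points(samples):
    """Return (sorted samples, step heights) for plotting an empirical CDF."""
    samples = np.sort(samples)
    heights = np.arange(1, samples.size + 1) / samples.size
    return samples, heights


def sample_y(n, k, rng):
    """Draw n copies of Y^(k) = (1/sqrt(k)) * (sum of k independent +/-1 signs)."""
    signs = rng.choice([-1.0, 1.0], size=(n, k))
    return signs.sum(axis=1) / np.sqrt(k)


def plot_cdfs(tol=0.0025, ks=(1, 8, 64, 512), seed=0):
    """Plot the Gaussian empirical CDF together with those of Y^(k)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    rng = np.random.default_rng(seed)
    n = choose_n(tol)
    plt.step(*empirical_cdf_points(rng.standard_normal(n)), label="Gaussian")
    for k in ks:
        plt.step(*empirical_cdf_points(sample_y(n, k, rng)), label=f"k = {k}")
    plt.xlim(-3, 3)
    plt.xlabel("Observations")
    plt.ylabel("Probability")
    plt.legend()
    plt.show()
```

Calling `plot_cdfs()` produces the combined figure. For the largest `k` the sign array has `n * k` entries, so memory use is on the order of a few hundred megabytes at the default tolerance.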
Be sure to always label your axes. Your plot should look something like the following. (Tip: check out seaborn for instantly better-looking plots.)