In this homework you will review some basic linear algebra, probability, and optimization concepts. Each sub problem is worth 10 points.
Problem 1.1. Show that $\mathbb{R}^D$ is a subspace.
Problem 1.2. Subspaces are, by definition, closed under linear combinations. For example, when you add elements of $\mathbb{R}^D$ or multiply them by scalars, you end up with an element of $\mathbb{R}^D$ (as shown in Problem 1.1). In other words, you cannot fall out of $\mathbb{R}^D$ by adding or scaling. However, subspaces are not necessarily closed under all mathematical operations.
(a) Show that $\mathbb{R}^D$ is not closed under element-wise square roots.
(b) Give an example of a subspace that is closed under element-wise square roots (besides being closed under linear combinations, as all subspaces must be).
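The closure idea in Problem 1.2 can be explored numerically. The sketch below is an illustration, not a proof: it represents vectors as Python lists of floats (an assumption made for the example), forms a linear combination, and then takes element-wise square roots with `cmath.sqrt`, so that a negative entry yields a complex number instead of raising an error. The helper names `linear_combination`, `elementwise_sqrt`, and `is_real_vector` are hypothetical.

```python
import cmath

def linear_combination(a, x, b, y):
    """Return a*x + b*y for scalars a, b and equal-length vectors x, y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def elementwise_sqrt(x):
    """Element-wise principal square root, computed over the complex numbers."""
    return [cmath.sqrt(xi) for xi in x]

def is_real_vector(x):
    """True when every entry has zero imaginary part."""
    return all(complex(xi).imag == 0 for xi in x)

x = [1.0, -4.0, 9.0]
y = [0.5, 2.0, -1.0]

combo = linear_combination(2.0, x, -3.0, y)
print(combo, is_real_vector(combo))   # the combination is a real vector again

roots = elementwise_sqrt(x)
print(roots, is_real_vector(roots))   # the entry -4.0 has a purely imaginary root
```

A single vector with one negative entry is enough to see the effect; the written argument still has to say which entry leaves the reals and why.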
Problem 1.3. Let $u_1, \dots, u_R \in \mathbb{R}^D$. Show that $U = \operatorname{span}[u_1, \dots, u_R]$ is a subspace.
Problem 1.4 (Diabetes testing). With 9.3% of the U.S. population having diabetes, there is increasing interest in studying this disease. Geneticists have determined that 95% of the people who develop diabetes have the following genes inactive:
• TCF7L2. Affects insulin secretion and glucose production.
• ABCC8. Helps regulate insulin.
• GLUT2. Helps move glucose into the pancreas.
(a) If you sequence your genome and find out that these genes are inactive, what is the probability that you develop diabetes?
(b) What other information would you need to know?
(c) Based on this information, when should you be concerned?
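To see why part (b) matters, Bayes' rule can be evaluated for several values of the quantity the problem leaves out. The sketch below uses the two figures given in the problem, P(diabetes) = 0.093 and P(genes inactive | diabetes) = 0.95. The candidate values for P(genes inactive) in the whole population are hypothetical and chosen only for illustration, as is the function name `posterior`.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(D | G) = P(G | D) * P(D) / P(G)."""
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability in (0, 1]")
    value = likelihood * prior / evidence
    if value > 1:
        raise ValueError("inconsistent inputs: P(G) must be at least P(G | D) * P(D)")
    return value

P_DIABETES = 0.093          # from the problem statement
P_GENES_GIVEN_DIAB = 0.95   # from the problem statement

# Hypothetical values of P(genes inactive) in the whole population.
for p_genes in (0.10, 0.25, 0.50, 0.90):
    p = posterior(P_DIABETES, P_GENES_GIVEN_DIAB, p_genes)
    print(f"P(genes inactive) = {p_genes:.2f} -> P(diabetes | genes inactive) = {p:.3f}")
```

The answer swings widely with the assumed population frequency of the inactive genes, which is the point of asking what other information is needed.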
Problem 1.5 (Snapchat’s delays). Suppose that you are sending pics to your girlfriend/boyfriend overseas.
Each time you send a picture through the Internet, it takes a certain amount of time to reach your gf/bf. Assume that you can measure the time delay. The delay won’t be constant, since it depends on Internet traffic (in particular at the routers that handle your messages). You and your gf/bf measure the delays of several packet transmissions. It appears that there is a minimal time delay, say $t_0$ (msec). Based on your observations, it seems that larger delays are rarer than shorter ones. Propose a probabilistic model for the delays with a single free parameter $\theta$. The value of $\theta$ should govern the expected delay characteristics. Let $x$ denote a random variable that represents the delay. The observations you have made are assumed to be independent realizations of this random variable. Let $P(x \mid \theta)$ denote the probability density of $x$. Give an explicit form for $P(x \mid \theta)$ and explain the rationale of your model.
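One candidate that fits the description (a hard minimum delay, with larger delays rarer than shorter ones) is a shifted exponential density, equal to (1/θ) exp(-(x - t0)/θ) for x ≥ t0 and 0 otherwise. This is an assumption made for illustration and not the only defensible answer. The sketch below simulates delays under that model; the values `T0` and `THETA` and the function names are hypothetical.

```python
import math
import random

def shifted_exp_pdf(x, theta, t0):
    """Density of t0 + Exponential(mean=theta); zero below the minimum delay t0."""
    if x < t0:
        return 0.0
    return math.exp(-(x - t0) / theta) / theta

def sample_delays(n, theta, t0, rng):
    """Draw n independent delays; expovariate takes the rate, which is 1/theta."""
    return [t0 + rng.expovariate(1.0 / theta) for _ in range(n)]

T0 = 80.0      # hypothetical minimal delay in msec
THETA = 25.0   # hypothetical mean excess delay in msec

rng = random.Random(0)
delays = sample_delays(10_000, THETA, T0, rng)
mean_delay = sum(delays) / len(delays)
print(f"min = {min(delays):.1f} msec, mean = {mean_delay:.1f} msec "
      f"(model mean = {T0 + THETA:.1f} msec)")
```

Under this assumed model the expected delay is t0 + θ, so the single parameter controls how far typical delays sit above the minimum.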
Homework 1: Review
Problem 1.6 (Simulating random variables). In this problem you will simulate random variables and study their distributions.
(a) Generate N = 1000 i.i.d. Uniform(0,1) random variables $x_1, \dots, x_N$, and plot their histogram. Does it look fairly uniform?
(b) Let
$$y_i = \begin{cases} 1 & \text{if } x_i \le p, \\ 0 & \text{otherwise.} \end{cases}$$
What is the distribution of $y_i$?
(c) Plot the histogram of the $y_i$’s with p = 1/4, 1/2, 3/4. Do these histograms match the distribution in your answer to (b)?
(d) Split the $y_i$’s into batches of n, and let $z_k$ be the sum of the kth batch. What is the distribution of $z_k$?
(e) Plot the histogram of the $z_k$’s with n = 10 and p = 1/4, 1/2, 3/4. Do these histograms match the distribution in your answer to (d)?
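The experiment in parts (a) through (e) can be sketched with the Python standard library alone. The code below prints text histograms instead of plotting, so a plotting library can be swapped in where a figure is required. The seed, the ten-bin choice, the use of consecutive batches, and the helper names are assumptions of this sketch. It deliberately does not name the distributions asked for in (b) and (d).

```python
import random
from collections import Counter

def uniform_samples(n, rng):
    """Draw n independent Uniform(0, 1) values."""
    return [rng.random() for _ in range(n)]

def threshold(xs, p):
    """y_i = 1 if x_i <= p, else 0."""
    return [1 if x <= p else 0 for x in xs]

def batch_sums(ys, n):
    """Sum each full batch of n consecutive y_i's; a trailing partial batch is dropped."""
    return [sum(ys[k:k + n]) for k in range(0, len(ys) - n + 1, n)]

def text_histogram(values, title):
    """Print one row per distinct value, with one '#' per ten occurrences."""
    counts = Counter(values)
    print(title)
    for key in sorted(counts):
        print(f"  {key!s:>4}: {'#' * (counts[key] // 10)} ({counts[key]})")

rng = random.Random(0)
N = 1000
xs = uniform_samples(N, rng)

# (a) group the uniform draws into ten equal-width bins, labeled by left edge
text_histogram([int(x * 10) / 10 for x in xs], "x_i by bin")

for p in (0.25, 0.5, 0.75):
    ys = threshold(xs, p)
    text_histogram(ys, f"y_i with p = {p}")            # (c)
    zs = batch_sums(ys, 10)
    text_histogram(zs, f"z_k with n = 10, p = {p}")    # (e)
```

With N = 1000 and n = 10 there are only 100 batch sums, so the histograms for the sums are noticeably rougher than those for the individual draws.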
Problem 1.7 (Logistic Gradient). The following expression describes the log-likelihood $\ell(\theta)$ of the logistic regression model:
[Equation lost in extraction; only the upper summation limit N is recoverable.]
The goal in logistic regression is to maximize this quantity.
(a) Derive an expression for its gradient (w.r.t. θ).
(b) Derive an expression for its Hessian.
(c) Is $\ell(\theta)$ a scalar, a vector, or a matrix? What about its gradient? What about its Hessian?
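A derived gradient can be checked numerically before it is trusted. The sketch below assumes the standard Bernoulli form of the logistic log-likelihood with labels in {0, 1}, $\ell(\theta) = \sum_{i=1}^{N} \left[ y_i \log \sigma(\theta^\top x_i) + (1 - y_i) \log\bigl(1 - \sigma(\theta^\top x_i)\bigr) \right]$ with $\sigma(t) = 1/(1 + e^{-t})$, which may differ from the expression intended in the assignment. It approximates the gradient by central finite differences so the result can be compared against a hand-derived formula. The toy data and all function names are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(theta, X, y):
    """Assumed Bernoulli log-likelihood for labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        s = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(s) + (1 - yi) * math.log(1.0 - s)
    return total

def numerical_gradient(f, theta, h=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Toy data: N = 4 points with D = 2 features each (first feature is a constant 1).
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.5]]
y = [1, 0, 1, 0]
theta = [0.1, -0.3]

value = log_likelihood(theta, X, y)
grad = numerical_gradient(lambda t: log_likelihood(t, X, y), theta)
print(value)
print(grad)
```

To use it, evaluate the hand-derived gradient at the same `theta` and compare entry by entry; agreement to several decimal places is expected, while a sign or indexing mistake usually shows up as a large discrepancy.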