CS7643 Homework 0

TAs: Prabhav Chawla, Yihao Chen, Sameer Dharur, Hrishikesh Kale,
Michael Piseno, Joanne Truong, Tianyu Zhan
Discussions: https://piazza.com/gatech/fall2020/cs48037643
Instructions
1. We will be using a Gradescope Online Assignment for PS0. If you are not registered for this course, please fill out the following form in order to be added to Gradescope and be able to submit PS0: https://forms.gle/jDoqyKa3zTeMbwqT7.
2. While we work through adding students to Gradescope, feel free to refer to this PDF to look at questions that are a part of PS0. However, PS0 submissions will be through the Gradescope Online Assignment only.
3. PS0 will not count towards your final grade; however, you are still required to make a submission.
4. PS0 is meant to serve as a background preparation test. You must NOT collaborate on PS0.
1 Multiple Choice Questions
1. (1 point) true/false We are machine learners with a slight gambling problem (very different from gamblers with a machine learning problem!). Our friend Bob is proposing the following payout on the roll of a die:
payout (1)
where x ∈ {1,2,3,4,5,6} is the outcome of the roll, (+) means payout to us, and (−) means payout to Bob. Is this a good bet, i.e., are we expected to make money?
True False
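The payout rule in equation (1) did not survive this extraction, but the expected-value check itself is mechanical: with six equally likely outcomes the expectation can be computed exactly, and a Monte Carlo run confirms it. Below is a minimal Python sketch with a purely hypothetical payout function standing in for Bob's rule; substitute the actual rule from (1) to settle the bet.

    import random

    def payout(x):
        # HYPOTHETICAL payout rule, for illustration only -- the actual rule
        # from equation (1) was not recovered. Substitute Bob's rule here.
        return x if x % 2 == 0 else -x

    # Exact expectation: each face of a fair die has probability 1/6.
    exact = sum(payout(x) for x in range(1, 7)) / 6

    # Monte Carlo estimate of the same quantity.
    rolls = [payout(random.randint(1, 6)) for _ in range(100_000)]
    approx = sum(rolls) / len(rolls)

    print(f"exact E[payout] = {exact:.4f}, Monte Carlo ~ {approx:.4f}")
    # We are expected to make money iff the exact expectation is positive.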
2. (1 point) X is a continuous random variable with the probability density function:
(2)
Which of the following statements are true about the equation for the corresponding cumulative distribution function (CDF) C(x)?
[Hint: Recall that CDF is defined as C(x) = Pr(X ≤ x).]
… for 0 ≤ x ≤ 1/2
… for 1/2 ≤ x ≤ 1
All of the above
None of the above
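The density in equation (2) is likewise missing from this copy. As a hedged illustration of how a candidate CDF can be sanity-checked, the sketch below numerically integrates a stand-in piecewise "tent" density (an assumption, not the actual f from (2)) and verifies the defining properties of C(x) = Pr(X ≤ x).

    import numpy as np

    def pdf(x):
        # STAND-IN density, assumed for illustration; equation (2) was not
        # recovered. This "tent" density integrates to 1 on [0, 1].
        return np.where(x <= 0.5, 4 * x, 4 - 4 * x)

    # C(x) = Pr(X <= x): integrate the density up to x (trapezoidal rule).
    xs = np.linspace(0.0, 1.0, 1001)
    dens = pdf(xs)
    cdf = np.concatenate([[0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(xs))])

    print(cdf[-1])   # ~1.0: a valid CDF must reach 1 at the right endpoint
    print(cdf[500])  # C(0.5); ~0.5 here because the stand-in density is symmetric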
3. (2 points) A random variable x with the standard normal distribution has the following probability density:

p(x) = (1/√(2π)) e^(−x²/2) (3)
Evaluate the following integral:

∫_{−∞}^{+∞} p(x)(ax² + bx + c) dx (4)
[Hint: We are not sadistic (okay, we’re a little sadistic, but not for this question). This is not a calculus question.]
a + b + c
c
a + c
b + c
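Assuming the integrand reconstructed in (4), a quick Monte Carlo check makes the hint concrete: the integral is just an expectation under the standard normal, so only the known moments matter.

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c = 2.0, 3.0, 5.0

    # Under x ~ N(0, 1): E[x] = 0 and E[x^2] = 1, so by linearity
    # E[a x^2 + b x + c] = a + c -- no integration by hand required.
    x = rng.standard_normal(1_000_000)
    estimate = np.mean(a * x**2 + b * x + c)

    print(estimate)  # ~7.0 = a + c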
4. (2 points) Consider the following function of x = (x1,x2,x3,x4,x5,x6):
(5)
where σ is the sigmoid function

σ(z) = 1/(1 + e^(−z)) (6)
Compute the gradient ∇_x f(·) and evaluate it at x̂ = (−1,3,4,5,−5,7).
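Equation (5) was not recovered here, but whatever f is, a central finite-difference check is a standard way to validate a hand-derived gradient. The sketch below uses a hypothetical sigmoid-based stand-in for (5); replace f with the actual expression and compare against your ∇_x f(x̂).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def f(x):
        # STAND-IN for equation (5), which was not recovered; any smooth
        # sigmoid-based scalar function works for demonstrating the check.
        return sigmoid(x[0] * x[1] + x[2]) + sigmoid(x[3] * x[4] + x[5])

    def numerical_grad(f, x, eps=1e-6):
        # Central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        return g

    x_hat = np.array([-1.0, 3.0, 4.0, 5.0, -5.0, 7.0])
    print(numerical_grad(f, x_hat))  # compare with the hand-derived gradient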

5. (2 points) Which of the following functions are convex?

… x for x ∈ R^n, and a finite set of arbitrary vectors {a1,...,ak}
… for w ∈ R^d
All of the above
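A numerical midpoint test cannot prove convexity, but a single violation disproves it. Reading the option involving {a1,...,ak} as a pointwise maximum of linear functions aᵢᵀx (an assumption; the expression itself was lost in extraction), the sketch below illustrates the test.

    import numpy as np

    rng = np.random.default_rng(1)

    def f(x, A):
        # Pointwise max over a finite set of linear functions a_i^T x --
        # our assumed reading of the {a1, ..., ak} option.
        return np.max(A @ x)

    A = rng.normal(size=(5, 4))  # k = 5 arbitrary vectors in R^4

    # Midpoint test: convexity requires f((u+v)/2) <= (f(u)+f(v))/2 for all u, v.
    ok = all(
        f((u + v) / 2, A) <= (f(u, A) + f(v, A)) / 2 + 1e-12
        for u, v in (rng.normal(size=(2, 4)) for _ in range(10_000))
    )
    print(ok)  # True; a single failure would certify non-convexity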
6. (2 points) Suppose you want to predict an unknown value Y ∈ R, but you are only given a sequence of noisy observations x1,...,xn of Y with i.i.d. noise (xi = Y + εi). If we assume the noise is i.i.d. Gaussian (εi ∼ N(0, σ²)), the maximum likelihood estimate ŷ for Y can be given by:
A: ŷ = argmin …
B: ŷ = argmin …
C: …
Both A & C
Both B & C
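Under the noise model written in parentheses above (the standard reading; the original expressions were lost), maximizing the Gaussian likelihood is the same as minimizing ∑(y − xi)², and the minimizer is the sample mean. A small simulation sketch:

    import numpy as np

    rng = np.random.default_rng(2)
    Y, sigma, n = 3.7, 0.5, 10_000

    # Assumed additive-noise model: x_i = Y + eps_i, eps_i ~ N(0, sigma^2).
    x = Y + sigma * rng.standard_normal(n)

    # Maximizing the Gaussian likelihood minimizes sum_i (y - x_i)^2,
    # whose minimizer is the sample mean.
    y_hat = x.mean()
    print(y_hat)  # ~3.7, approaching Y as n grows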
2 Proofs
7. (3 points) Prove that
log_e x ≤ x − 1, ∀x > 0 (7)
with equality if and only if x = 1.
[Hint: Consider differentiation of log(x) − (x − 1) and think about concavity/convexity and second derivatives.]
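Before proving it, the inequality is easy to eyeball numerically: the gap (x − 1) − log_e x the hint asks you to study is nonnegative on (0, ∞) and vanishes only at x = 1.

    import numpy as np

    xs = np.linspace(1e-6, 10, 1_000_001)
    gap = (xs - 1) - np.log(xs)  # the quantity from the hint

    print(gap.min())             # ~0 and never negative on (0, 10]
    print(xs[gap.argmin()])      # minimum attained at x ~ 1, where the gap is 0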
8. (6 points) Consider two discrete probability distributions p and q over k outcomes:
∑_{i=1}^k pi = ∑_{i=1}^k qi = 1 (8a)

pi > 0, qi > 0, ∀i ∈ {1,...,k} (8b)
The Kullback-Leibler (KL) divergence (also known as the relative entropy) between these distributions is given by:
KL(p,q) = ∑_{i=1}^k pi log(pi/qi) (9)
It is common to refer to KL(p,q) as a measure of distance (even though it is not a proper metric). Many algorithms in machine learning are based on minimizing KL divergence between two probability distributions. In this question, we will show why this might be a sensible thing to do.
[Hint: This question doesn’t require you to know anything more than the definition of KL(p,q) and the identity in Q7.]
(a) Using the results from Q7, show that KL(p,q) is always non-negative.
(b) When is KL(p,q) = 0?
(c) Provide a counterexample to show that the KL divergence is not a symmetric function of its arguments: KL(p,q) ≠ KL(q,p)
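A few lines of Python make parts (a)–(c) tangible: KL(p,q) computed directly from (9) is nonnegative, vanishes when the distributions coincide, and differs from KL(q,p) for a simple two-outcome counterexample.

    import numpy as np

    def kl(p, q):
        # KL(p, q) = sum_i p_i log(p_i / q_i), per (9); requires p_i, q_i > 0 as in (8b)
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum(p * np.log(p / q))

    p = [0.5, 0.5]
    q = [0.9, 0.1]

    print(kl(p, q))            # >= 0, consistent with part (a)
    print(kl(p, p))            # 0.0 when the distributions coincide, part (b)
    print(kl(p, q), kl(q, p))  # unequal values: a counterexample for part (c)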
9. (6 points) In this question, we will get familiar with a fairly popular and useful function called the log-sum-exp function. For x ∈ R^n, the log-sum-exp function is defined (quite literally) as:

f(x) = log(∑_{i=1}^n e^{xi}) (10)
(a) Prove that f(x) is differentiable everywhere in R^n.
[Hint: Multivariable functions are differentiable if the partial derivatives exist and are continuous.]
(b) Prove that f(x) is convex on R^n.
[Hint: One approach is to use the second-order condition for convexity.]
(c) Show that f(x) can be viewed as an approximation of the max function, bounded as follows:
max{x1,...,xn} ≤ f(x) ≤ max{x1,...,xn} + log(n) (11)
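The max-shift that drives the proof of (11) is also the standard trick for evaluating f(x) without overflow. A minimal sketch:

    import numpy as np

    def log_sum_exp(x):
        # Shift by the max before exponentiating -- the standard numerically
        # stable evaluation, mirroring the bound in (11).
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    x = np.array([1000.0, 1001.0, 1002.0])  # naive np.log(np.sum(np.exp(x))) overflows
    lse = log_sum_exp(x)

    # Bound (11): max(x) <= f(x) <= max(x) + log(n)
    print(np.max(x) <= lse <= np.max(x) + np.log(len(x)))  # True
    print(lse)  # ~1002.41, a "soft" maximum of the entries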
