CS433-Labs ex 6

1           Convexity
Prove the following assertions. (The definition of convexity these proofs rely on is recalled in the sketch after the list.)

1.    The affine function $f(x) = ax + b$ is convex, where $a$, $b$, and $x$ are scalars.

2.    If multiple functions $f_n(x)$ are convex over a fixed domain, then their sum $g(x) = \sum_n f_n(x)$ is convex over the same domain.

3.    Take $f, g : \mathbb{R} \to \mathbb{R}$ to be convex functions and $g$ to be increasing. Then $g \circ f$ is also convex.

Note: A function $g$ is increasing if $a \ge b \Rightarrow g(a) \ge g(b)$. An example of a convex and increasing function is $\exp(x)$, $x \in \mathbb{R}$.

4.    If $f : \mathbb{R} \to \mathbb{R}$ is convex, then $g : \mathbb{R}^D \to \mathbb{R}$, where $g(x) := f(w^\top x + b)$, is also convex. Here, $w$ is a constant vector in $\mathbb{R}^D$, $b$ is a constant in $\mathbb{R}$, and $x \in \mathbb{R}^D$.

5.    Let $f : \mathbb{R}^D \to \mathbb{R}$ be strictly convex. Let $x^\star \in \mathbb{R}^D$ be a global minimizer of $f$. Show that this global minimizer is unique. Hint: Do a proof by contradiction.
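As a reference for all five proofs, here is a minimal worked block recalling the definition of convexity in the exercise's notation, together with the equality chain for the affine case of assertion 1:

```latex
% Definition: f is convex iff, for all x, y in its domain and all
% lambda in [0, 1],
\[
  f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y).
\]
% Assertion 1, f(x) = ax + b: expanding the left-hand side,
\[
  a(\lambda x + (1 - \lambda) y) + b
  = \lambda (a x + b) + (1 - \lambda) (a y + b)
  = \lambda f(x) + (1 - \lambda) f(y),
\]
% so the defining inequality holds with equality, and f is convex.
% Strict convexity would require a strict inequality for x != y and
% lambda in (0, 1), which is why assertion 5 assumes it separately.
```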

2           Extension of Logistic Regression to Multi-Class Classification
Suppose we have a classification dataset with $N$ example pairs $\{x_n, y_n\}$, $n \in [1, N]$, where $y_n$ is a categorical variable over $K$ categories, $y_n \in \{1, 2, \ldots, K\}$. We wish to fit a linear model in a similar spirit to logistic regression, using the softmax function instead of the logistic function to link the linear inputs to the categorical output.

We will have $K$ sets of parameters $w_k$; define $\eta_{nk} := w_k^\top x_n$ and compute the probability distribution of the output as follows:

$$p(y_n = k \mid x_n, w) = \frac{\exp(\eta_{nk})}{\sum_{j=1}^{K} \exp(\eta_{nj})}.$$

Note that we indeed have a probability distribution, as $\sum_{k=1}^{K} p(y_n = k \mid x_n, w) = 1$. To make the model identifiable, we will fix $w_K$ to $0$, which means we have $K - 1$ sets of parameters to learn. As in logistic regression, we will assume that each $y_n$ is i.i.d., i.e.,

$$p(y \mid X, w) = \prod_{n=1}^{N} p(y_n \mid x_n, w).$$
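To make the model concrete, here is a minimal NumPy sketch that evaluates these probabilities and the resulting log-likelihood. The names (`softmax_log_probs`, `log_likelihood`, `X`, `W`, `y`), the 0-indexed class labels, and the log-sum-exp shift for numerical stability are illustrative choices, not part of the exercise:

```python
import numpy as np

def softmax_log_probs(X, W):
    """Row-wise log p(y_n = k | x_n, w) under the softmax model.

    X : (N, D) data matrix, one example per row.
    W : (D, K) parameter matrix; hold the last column at 0 to match
        the identifiability convention w_K = 0.
    """
    logits = X @ W                             # (N, K), entry (n, k) is w_k^T x_n
    logits = logits - logits.max(axis=1, keepdims=True)  # log-sum-exp shift
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def log_likelihood(X, y, W):
    """Sum of log p(y_n | x_n, w) over the (assumed i.i.d.) examples.
    Classes are 0-indexed here (0, ..., K-1), unlike 1, ..., K in the text."""
    log_p = softmax_log_probs(X, W)            # (N, K)
    return log_p[np.arange(len(y)), y].sum()   # pick the observed class per row

# Toy usage: K = 3 classes, w_K fixed to 0 via the last column of W.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
W = np.hstack([rng.normal(size=(2, 2)), np.zeros((2, 1))])
y = rng.integers(0, 3, size=5)
print(log_likelihood(X, y, W))
```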

1.    Derive the log-likelihood for this model.

2.    Derive the gradient with respect to each $w_k$.

3.    Show that the negative of the log-likelihood is convex with respect to $w_k$. (A numerical check of the gradient from part 2 is sketched below.)
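The sketch below is a hedged aid rather than a solution: `grad_nll` implements the standard softmax-regression gradient $X^\top (P - Y)$, with $P$ the matrix of predicted probabilities and $Y$ the one-hot labels, and a central finite difference confirms it on random data. All names, and leaving $w_K$ unconstrained, are assumptions made for illustration:

```python
import numpy as np

def nll(X, y, W):
    """Negative log-likelihood of the softmax model (cf. the sketch above)."""
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].sum()

def grad_nll(X, y, W):
    """Candidate gradient X^T (P - Y); column k is the gradient w.r.t. w_k."""
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P = P / P.sum(axis=1, keepdims=True)       # p(y_n = k | x_n, w)
    Y = np.eye(W.shape[1])[y]                  # one-hot encoding of y
    return X.T @ (P - Y)

# Central finite-difference check on a random instance.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(6, 3)), rng.integers(0, 4, size=6)
W = rng.normal(size=(3, 4))
G = grad_nll(X, y, W)
eps = 1e-6
for i, k in [(0, 0), (2, 3)]:
    E = np.zeros_like(W)
    E[i, k] = eps
    numeric = (nll(X, y, W + E) - nll(X, y, W - E)) / (2 * eps)
    assert abs(numeric - G[i, k]) < 1e-4       # analytic and numeric agree
```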

3           Mixture of Linear Regression
In Project-I, you worked on a regression dataset with two or more distinct clusters. For such datasets, a mixture of linear regression models is preferable to a single linear regression model.

Consider a regression dataset with $N$ pairs $\{y_n, x_n\}$. Similar to the Gaussian mixture model (GMM), let $r_n \in \{1, 2, \ldots, K\}$ index the mixture component. The distribution of the output $y_n$ under the $k$th linear model is defined as follows:

$$p(y_n \mid x_n, r_n = k, w) = \mathcal{N}\left(y_n \mid \tilde{x}_n^\top w_k, \sigma^2\right).$$

Here, $w_k$ is the regression parameter vector for the $k$th model, with $w$ being a vector containing all $w_k$. Also, $\tilde{x}_n^\top := [1, x_n^\top]$.
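A small sketch of this component density, assuming a known noise variance $\sigma^2$ and the augmentation $\tilde{x}_n = [1, x_n^\top]^\top$ as defined above; the function name and the default `sigma2=1.0` are illustrative:

```python
import numpy as np

def component_log_density(y_n, x_n, w_k, sigma2=1.0):
    """log N(y_n | x_tilde_n^T w_k, sigma2) for the k-th linear model.

    x_n : (D,) input vector; w_k : (D + 1,) weights, bias weight first;
    sigma2 is assumed known and shared across components.
    """
    x_tilde = np.concatenate(([1.0], x_n))     # prepend the constant 1
    mean = x_tilde @ w_k                       # x_tilde^T w_k
    return -0.5 * (np.log(2 * np.pi * sigma2) + (y_n - mean) ** 2 / sigma2)
```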

1.    Define $r_n$ to be a binary vector of length $K$ such that all the entries are 0 except the $k$th entry, i.e., $r_{nk} = 1$, implying that $x_n$ is assigned to the $k$th mixture component. Rewrite the likelihood $p(y_n \mid x_n, w, r_n)$ in terms of $r_{nk}$.

2.    Write the expression for the joint distribution $p(y \mid X, w, r)$, where $r$ is the set of all $r_1, r_2, \ldots, r_N$.

3.    Assume that $r_n$ follows a multinomial distribution $p(r_{nk} = 1 \mid \pi) = \pi_k$, with $\pi = [\pi_1, \pi_2, \ldots, \pi_K]$. Derive the marginal distribution $p(y_n \mid x_n, w, \pi)$ obtained after marginalizing out $r_n$.

4.    Write the expression for the negative log-likelihood $L(w, \pi) := -\log p(y \mid X, w, \pi)$, whose minimizer is the maximum likelihood estimator, in terms of the data $y$ and $X$ and the parameters $w$ and $\pi$. (A sketch that evaluates this objective follows the list.)

5.    Is $L$ jointly convex with respect to $w$ and $\pi$? Is the model identifiable? Prove your answers.
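To make question 4 concrete, here is a sketch that evaluates $L(w, \pi)$ by marginalizing $r_n$ out per example, exactly as in question 3, using a log-sum-exp over components (via SciPy's `logsumexp`). The stacked weight layout and the fixed $\sigma^2$ are assumptions of this sketch:

```python
import numpy as np
from scipy.special import logsumexp

def mixture_nll(y, X, W, pi, sigma2=1.0):
    """L(w, pi) = -sum_n log sum_k pi_k N(y_n | x_tilde_n^T w_k, sigma2).

    y : (N,) targets; X : (N, D) inputs; W : (K, D + 1) stacked w_k
    (bias weight first); pi : (K,) mixture weights summing to 1.
    """
    X_tilde = np.hstack([np.ones((len(X), 1)), X])       # (N, D + 1)
    means = X_tilde @ W.T                                 # (N, K)
    log_comp = -0.5 * (np.log(2 * np.pi * sigma2)
                       + (y[:, None] - means) ** 2 / sigma2)
    return -logsumexp(np.log(pi) + log_comp, axis=1).sum()

# Toy usage with K = 2 components in one dimension (illustrative values).
rng = np.random.default_rng(2)
X, y = rng.normal(size=(8, 1)), rng.normal(size=8)
W, pi = rng.normal(size=(2, 2)), np.array([0.4, 0.6])
print(mixture_nll(y, X, W, pi))
```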
