Problem 1.
Consider the training objective 𝐽 = ||𝑋𝑤 − 𝑡||² subject to ||𝑤||² ≤ 𝐶 for some constant 𝐶. How would the hypothesis class capacity, overfitting/underfitting, and bias/variance vary according to 𝐶?
                               Larger 𝐶                  Smaller 𝐶
Model capacity (large/small?)  _____                     _____
Overfitting/Underfitting?      __fitting                 __fitting
Bias/variance (high/low?)      __ bias / __ variance     __ bias / __ variance
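To build intuition for the table, note that the constrained objective is equivalent (via a Lagrange multiplier λ ≥ 0) to ridge regression, where a larger λ corresponds to a smaller 𝐶. The sketch below is illustrative only, using synthetic data and arbitrary λ values; it shows that the fitted weight norm shrinks as λ grows, i.e., as the constraint tightens:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
t = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)

# The constraint ||w||^2 <= C corresponds, via Lagrangian duality, to
# minimizing ||Xw - t||^2 + lam * ||w||^2 for some lam >= 0:
# larger lam <-> smaller C.
def ridge(X, t, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

norms = [np.linalg.norm(ridge(X, t, lam)) for lam in (0.0, 1.0, 100.0)]
# The weight norm shrinks as lam grows (i.e., as C shrinks).
assert norms[0] > norms[1] > norms[2]
```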
Note: No proof is needed.

Problem 2.
Consider a one-dimensional linear regression model 𝑡(𝑚) ∼ 𝑁(𝑤𝑥(𝑚), σϵ²) with a Gaussian prior 𝑤 ∼ 𝑁(0, σ²). Show that the posterior of 𝑤 is also a Gaussian distribution, i.e., 𝑤 | 𝑥(1), 𝑡(1), ···, 𝑥(𝑀), 𝑡(𝑀) ∼ 𝑁(µpost, σpost²). Give the formulas for µpost and σpost².
Hint: Work with 𝑃(𝑤|𝐷) ∝ 𝑃(𝑤)𝑃(𝐷|𝑤). You do not need to handle the normalizing term.
Note: If a prior has the same formula (but typically with different parameters) as the posterior, it is known as a conjugate prior. The above conjugacy also applies to multi-dimensional Gaussians, but the formulas for the mean vector and the covariance matrix are more complicated.
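As a numerical sanity check for this problem, the sketch below compares the standard conjugate-Gaussian closed forms (the result you are asked to derive) against a brute-force grid evaluation of 𝑃(𝑤)𝑃(𝐷|𝑤); the data and the values of σϵ and σ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_eps, sigma = 0.5, 2.0          # noise std and prior std (illustrative values)
x = rng.normal(size=20)
t = 1.3 * x + sigma_eps * rng.normal(size=20)

# Standard conjugate-Gaussian posterior for w given the data:
#   1/sigma_post^2 = sum(x^2)/sigma_eps^2 + 1/sigma^2
#   mu_post = sigma_post^2 * sum(x*t)/sigma_eps^2
var_post = 1.0 / (np.sum(x**2) / sigma_eps**2 + 1.0 / sigma**2)
mu_post = var_post * np.sum(x * t) / sigma_eps**2

# Brute-force check: evaluate P(w) * P(D|w) on a grid and normalize.
w = np.linspace(mu_post - 1.0, mu_post + 1.0, 20001)
dw = w[1] - w[0]
log_post = -w**2 / (2 * sigma**2)                    # log prior (up to a constant)
for xi, ti in zip(x, t):
    log_post += -(ti - w * xi)**2 / (2 * sigma_eps**2)  # log likelihood terms
p = np.exp(log_post - log_post.max())
p /= p.sum() * dw                                    # normalize on the grid
mu_grid = np.sum(w * p) * dw
var_grid = np.sum((w - mu_grid)**2 * p) * dw

assert abs(mu_grid - mu_post) < 1e-4
assert abs(var_grid - var_post) < 1e-4
```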
Problem 3.
Give the prior distribution of 𝑤 for linear regression such that the maximum a posteriori estimation is equivalent to the 𝑙1-penalized mean square loss.
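The following is a numerical check rather than a derivation, and it assumes a particular answer (a Laplace prior 𝑝(𝑤) ∝ exp(−|𝑤|/𝑏)), so treat it as a way to verify your own solution: the negative log posterior then equals the 𝑙1-penalized squared loss up to a positive scale factor, so the two objectives share the same minimizer. The data, σϵ, and 𝑏 below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
t = 0.8 * x + 0.3 * rng.normal(size=30)
sigma_eps, b = 0.3, 0.5   # noise std and Laplace scale (illustrative values)

w = np.linspace(-2.0, 2.0, 40001)   # grid includes w = 0, where |w| has its kink

# Negative log posterior with the Laplace prior p(w) = exp(-|w|/b) / (2b):
neg_log_post = np.array(
    [np.sum((t - wi * x)**2) / (2 * sigma_eps**2) + abs(wi) / b for wi in w])

# l1-penalized mean square loss; scaling by sigma_eps^2 maps one objective to the
# other exactly, so the matching penalty weight is lam = sigma_eps^2 / b.
lam = sigma_eps**2 / b
penalized = np.array(
    [0.5 * np.sum((t - wi * x)**2) + lam * abs(wi) for wi in w])

# Both objectives are minimized at the same w on the grid.
assert abs(w[np.argmin(neg_log_post)] - w[np.argmin(penalized)]) < 1e-3
```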