In a 2-class problem with 1 feature, you are given the following data points:
S1: −3, −2, 0        S2: −1, 2
(a) Give the k-nearest neighbor estimate of $p(x \mid S_1)$ with k = 3, for all x. Give both the algebraic (simplest) form and a plot. On the plot, show the location (x value) and height of each peak.
(b) Give the Parzen windows estimate of $p(x \mid S_2)$ with window function:
$$
\Delta(u) =
\begin{cases}
0.25, & -2 \le u < 2 \\
0, & \text{otherwise}
\end{cases}
$$
Give both the algebraic (simplest) form and a plot. On the plot, show the x values of all significant points, and label the values of $p(x \mid S_2)$ clearly.
(c) Estimate the prior probabilities $P(S_1)$ and $P(S_2)$ from the frequency of occurrence of the data points.
(d) Give an expression for the decision rule for a Bayes minimum error classifier using the density and probability estimates from (a)-(c). You may leave your answer in terms of $\hat{p}(x \mid S_1)$, $\hat{p}(x \mid S_2)$, $\hat{P}(S_1)$, $\hat{P}(S_2)$, without plugging in for these quantities.
(e) Solve for the decision boundaries and regions of a Bayes minimum-error classifier, using the density and probability estimates from (a)-(c). Give your answer in two forms:
(i) Algebraic expressions of the decision rule, in simplest form (using numbers and variable x);
(ii) A plot showing the decision boundaries and regions.
Tip: you may find it easiest to develop the algebraic solution and the plot at the same time.
(f) Classify the points x = −0.5, 0.1, 0.5 using the classifier you developed in (e).
(g) Separately, use a discriminative 3-NN classifier to classify the points x = −0.5, 0.1, 0.5. (Hint: if this takes you more than a few steps for each data point, you are doing more work than necessary.)
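For reference only (not required for your solution), here is a minimal numerical sketch of the estimators and classifiers in parts (a)-(g), useful for checking your algebra. It assumes the standard textbook forms — $\hat{p}(x) = k/(N\,V(x))$ with $V(x)$ twice the distance from x to the k-th nearest sample for the k-NN estimate, and $\hat{p}(x) = \frac{1}{N}\sum_i \Delta(x - x_i)$ for the Parzen estimate (the given window already integrates to 1). The function names are illustrative, not part of the assignment.

```python
import numpy as np

# Data points from the problem statement.
S1 = np.array([-3.0, -2.0, 0.0])   # class S1
S2 = np.array([-1.0, 2.0])         # class S2

def knn_density(x, samples, k=3):
    """k-NN density estimate: k / (N * V), with V the width of the smallest
    interval centered at x containing the k nearest samples (V = 2 * r_k in 1D)."""
    r_k = np.sort(np.abs(samples - x))[k - 1]
    return k / (len(samples) * 2.0 * r_k)

def window(u):
    """Rectangular window from part (b): 0.25 on [-2, 2), 0 elsewhere."""
    return np.where((u >= -2) & (u < 2), 0.25, 0.0)

def parzen_density(x, samples):
    """Parzen estimate, assuming the form (1/N) * sum_i window(x - x_i)."""
    return np.mean(window(x - samples))

# Priors estimated from frequency of occurrence, part (c).
N1, N2 = len(S1), len(S2)
P1, P2 = N1 / (N1 + N2), N2 / (N1 + N2)

def bayes_classify(x):
    """Bayes minimum-error rule with the plug-in estimates, parts (d)-(f)."""
    return "S1" if knn_density(x, S1) * P1 > parzen_density(x, S2) * P2 else "S2"

def knn_classify(x, k=3):
    """Discriminative k-NN classifier on the pooled training data, part (g)."""
    pts = np.concatenate([S1, S2])
    labels = np.array(["S1"] * N1 + ["S2"] * N2)
    nearest = labels[np.argsort(np.abs(pts - x))[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

for x in (-0.5, 0.1, 0.5):
    print(x, bayes_classify(x), knn_classify(x))
```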
2. [Comment: this problem is on parameter estimation, which is covered in Lecture 26 on Monday, 4/27.]
In a 1D problem (1 feature), we will estimate parameters for one class. We model the density $p(x \mid \theta)$ as:
$$
p(x \mid \theta) =
\begin{cases}
\theta e^{-\theta x}, & x \ge 0 \\
0, & \text{otherwise}
\end{cases}
$$
in which $\theta \ge 0$.
You are given a dataset $Z: x_1, x_2, \ldots, x_N$, whose points are drawn i.i.d. from $p(x \mid \theta)$.
In this problem, you may use for convenience the notation:
$$
m \triangleq \frac{1}{N}\sum_{i=1}^{N} x_i .
$$
(a) Solve for the maximum likelihood (ML) estimate $\hat{\theta}_{ML}$ of $\theta$, in terms of the given data points. Express your result in simplest form.
For parts (b) and (c) below, assume there is a prior for θ, as follows:
$$
p(\theta) =
\begin{cases}
a e^{-a\theta}, & \theta \ge 0 \\
0, & \text{otherwise}
\end{cases}
$$
in which $a \ge 0$.
(b) Solve for the maximum a posteriori (MAP) estimate $\hat{\theta}_{MAP}$ of $\theta$, in terms of the given data points. Express your result in simplest form.
(c) Write $\hat{\theta}_{MAP}$ as a function of $\hat{\theta}_{ML}$ and the given parameters. Find $\lim_{\sigma_\theta \to \infty} \hat{\theta}_{MAP}$, in which $\sigma_\theta$ is the standard deviation of the prior on $\theta$. What does this limit correspond to in terms of our prior knowledge of $\theta$?
Hint: the standard deviation of $\theta$ for the given $p(\theta)$ is $\sigma_\theta = 1/a$.
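If it helps to organize parts (a)-(c), the log-likelihood of Z under the given model, and the corresponding log-posterior for $\theta$ (up to an additive constant), follow directly from the densities above:
$$
\ln p(Z \mid \theta) = \sum_{i=1}^{N} \ln\!\left(\theta e^{-\theta x_i}\right) = N \ln\theta - N\theta m, \qquad \theta \ge 0,
$$
$$
\ln p(\theta \mid Z) = \ln p(Z \mid \theta) + \ln p(\theta) + \text{const} = N \ln\theta - N\theta m + \ln a - a\theta + \text{const}.
$$
The ML and MAP estimates are the values of $\theta$ maximizing the first and second expressions, respectively.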
3. [Extra credit] Comment: this problem is not more difficult than the regular-credit problems above; it is extra credit because the combined length of Problems 1 and 2 is already sufficient and reasonable for one homework assignment.
In a 2-class problem with D features, you are to use Fisher’s Linear Discriminant to find an optimal 1D feature space. You are given that the scatter matrices for each class (calculated from the data for each class) are diagonal:
$$
S_1 =
\begin{bmatrix}
\sigma_1^2 & & & 0 \\
& \sigma_2^2 & & \\
& & \ddots & \\
0 & & & \sigma_D^2
\end{bmatrix}, \qquad
S_2 =
\begin{bmatrix}
\rho_1^2 & & & 0 \\
& \rho_2^2 & & \\
& & \ddots & \\
0 & & & \rho_D^2
\end{bmatrix}
$$
and you are given the sample means for each class:
$$
m_1 =
\begin{pmatrix}
m_1^{(1)} \\ m_2^{(1)} \\ \vdots \\ m_D^{(1)}
\end{pmatrix}, \qquad
m_2 =
\begin{pmatrix}
m_1^{(2)} \\ m_2^{(2)} \\ \vdots \\ m_D^{(2)}
\end{pmatrix}.
$$
(a) Find Fisher's Linear Discriminant $w$. Express it in simplest form.
(b) Let D = 2. Suppose $\sigma_1^2 = 4\sigma_2^2$, $\rho_1^2 = 4\rho_2^2$, and:
$$
m_1 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \qquad
m_2 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}.
$$
Plot the vectors $m_1$, $m_2$, $(m_1 - m_2)$, and $w$.
(c) Interpreting your answer to part (b), which makes more sense as a 1D feature space direction: $(m_1 - m_2)$ or $w$? Justify your answer.
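Not part of the required solution, but once you have a closed form for $w$ you can sanity-check it numerically. The sketch below assumes the standard Fisher criterion, $w \propto S_W^{-1}(m_1 - m_2)$ with within-class scatter $S_W = S_1 + S_2$; the matrices and means in it are placeholders, not the values from part (b).

```python
import numpy as np

def fisher_direction(S1, S2, m1, m2):
    """Fisher's linear discriminant direction, assuming the standard criterion:
    w proportional to S_W^{-1} (m1 - m2), with within-class scatter S_W = S1 + S2."""
    w = np.linalg.solve(S1 + S2, m1 - m2)
    return w / np.linalg.norm(w)           # unit vector, convenient for plotting

# Placeholder example (not the values from part (b)): diagonal scatters as in the problem.
S1 = np.diag([4.0, 1.0])                   # sigma_1^2, sigma_2^2
S2 = np.diag([4.0, 1.0])                   # rho_1^2,   rho_2^2
m1 = np.array([1.0, 1.0])
m2 = np.array([0.0, 0.0])

print("w direction:        ", fisher_direction(S1, S2, m1, m2))
print("(m1 - m2) direction:", (m1 - m2) / np.linalg.norm(m1 - m2))
```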