Machine Learning – Homework II (Solved)

− Submission: Gxxx.PDF in Fenix, where xxx is your group number. Note that it is possible to submit several times on Fenix to prevent last-minute problems; only the last submission is considered valid.

− Use the provided report template. Include your programming code as an Appendix.

− Exchange of ideas is encouraged. Yet, if copying is detected after automatic or manual checks, the homework is nullified and IST guidelines apply to both content sharers and consumers, irrespective of the underlying intent.

− Please consult the FAQ before posting questions to your faculty hosts.

 

 I. Pen-and-paper [13v]
 

Four positive observations, $\left\{\binom{A}{0}, \binom{B}{1}, \binom{A}{1}, \binom{A}{0}\right\}$, and four negative observations, $\left\{\binom{B}{0}, \binom{B}{0}, \binom{A}{1}, \binom{B}{1}\right\}$, were collected. Consider the problem of classifying observations as positive or negative.

1) [4v] Compute the recall of a distance-weighted $k$NN with $k = 5$ and distance $d(\mathbf{x}_1, \mathbf{x}_2) = \mathrm{Hamming}(\mathbf{x}_1, \mathbf{x}_2) + \ldots$ using a leave-one-out evaluation schema (i.e., when classifying one observation, use all remaining ones).
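A minimal leave-one-out sketch of the distance-weighted $k$NN follows. The additive term of the distance is elided in the statement above, so it is kept as a parameter c; the value c = 0.5 used below is a placeholder, not a value taken from the source.

    def hamming(a, b):
        # Number of positions at which two observations differ.
        return sum(ai != bi for ai, bi in zip(a, b))

    def loo_knn_recall(X, y, k=5, c=0.5):
        # Leave-one-out: classify each observation from all remaining ones,
        # with each of the k nearest neighbours voting with weight 1/distance.
        tp = fn = 0
        for i in range(len(X)):
            dists = sorted((hamming(X[i], X[j]) + c, y[j])
                           for j in range(len(X)) if j != i)
            votes = {}
            for d, label in dists[:k]:
                votes[label] = votes.get(label, 0.0) + 1.0 / d
            pred = max(votes, key=votes.get)
            if y[i] == "P":              # recall only looks at the positives
                tp += pred == "P"
                fn += pred != "P"
        return tp / (tp + fn)

    # The eight training observations transcribed from the statement.
    X = [("A", 0), ("B", 1), ("A", 1), ("A", 0),   # positives
         ("B", 0), ("B", 0), ("A", 1), ("B", 1)]   # negatives
    y = ["P"] * 4 + ["N"] * 4
    print(loo_knn_recall(X, y))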

 

An additional positive observation was acquired, $\binom{B}{0}$, and a third variable $y_3$ was independently monitored, yielding estimates $y_3|P = \{1.2, 0.8, 0.5, 0.9, 0.8\}$ and $y_3|N = \{1, 0.9, 1.2, 0.8\}$.

2) [4v] Considering the nine training observations, learn a Bayesian classifier assuming: i) $y_1$ and $y_2$ are dependent, ii) the variable sets $\{y_1, y_2\}$ and $\{y_3\}$ are independent and equally important, and iii) $y_3$ is normally distributed. Show all parameters.
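For reference, the Gaussian parameters for $y_3$ follow directly from the sample estimates above. A sketch using the unbiased sample variance (a maximum-likelihood fit would divide by $n$ rather than $n-1$); the numerators of the variances are the sums of squared deviations from the respective means:

$$\hat{\mu}_{y_3|P} = \frac{1.2 + 0.8 + 0.5 + 0.9 + 0.8}{5} = 0.84, \qquad \hat{\sigma}^2_{y_3|P} = \frac{0.252}{5 - 1} = 0.063,$$

$$\hat{\mu}_{y_3|N} = \frac{1 + 0.9 + 1.2 + 0.8}{4} = 0.975, \qquad \hat{\sigma}^2_{y_3|N} = \frac{0.0875}{4 - 1} \approx 0.0292.$$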

 

Consider three testing observations, $\left\{\left(\left(\begin{smallmatrix} A \\ 1 \\ 0.8 \end{smallmatrix}\right), \text{Positive}\right), \left(\left(\begin{smallmatrix} B \\ 1 \\ 1 \end{smallmatrix}\right), \text{Positive}\right), \left(\left(\begin{smallmatrix} B \\ 0 \\ 0.9 \end{smallmatrix}\right), \text{Negative}\right)\right\}$.

3) [3v] Under a MAP assumption, compute $P(\text{Positive} \mid \mathbf{x})$ for each testing observation.
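Concretely, under the factorization assumed in question 2, the quantity to compute is

$$P(\text{Positive} \mid \mathbf{x}) = \frac{P(y_1, y_2 \mid \text{Positive})\, p(y_3 \mid \text{Positive})\, P(\text{Positive})}{\sum_{c \in \{\text{Positive},\, \text{Negative}\}} P(y_1, y_2 \mid c)\, p(y_3 \mid c)\, P(c)},$$

with priors $P(\text{Positive}) = 5/9$ and $P(\text{Negative}) = 4/9$ estimated from the nine training observations.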

4) [2v] Given a binary class variable, the default decision threshold of $\theta = 0.5$ in

$$f(\mathbf{x} \mid \theta) = \begin{cases} \text{Positive} & \text{if } P(\text{Positive} \mid \mathbf{x}) > \theta \\ \text{Negative} & \text{otherwise} \end{cases}$$

can be adjusted. Which decision threshold – 0.3, 0.5 or 0.7 – optimizes testing accuracy?
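A minimal sketch of the threshold sweep; the posteriors p_pos below are hypothetical placeholders to be replaced by the $P(\text{Positive} \mid \mathbf{x})$ values obtained in question 3:

    # Sweep the candidate thresholds and report testing accuracy for each.
    p_pos = [0.6, 0.4, 0.2]                       # placeholder posteriors
    truth = ["Positive", "Positive", "Negative"]  # labels from the statement

    for theta in (0.3, 0.5, 0.7):
        preds = ["Positive" if p > theta else "Negative" for p in p_pos]
        acc = sum(p == t for p, t in zip(preds, truth)) / len(truth)
        print(f"theta={theta}: accuracy={acc:.2f}")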

 

 

II.  Programming and critical analysis [7v]
 

Consider the pd_speech.arff dataset available at the course webpage.

5) [3v] Using sklearn, and considering a 10-fold stratified cross-validation (random_state=0), plot the cumulative testing confusion matrices of $k$NN (uniform weights, $k = 5$, Euclidean distance) and Naïve Bayes (Gaussian assumption). Use all remaining classifier parameters as default.
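A possible sklearn sketch follows. It assumes the target column of pd_speech.arff is named "class" (check the .arff header); cross_val_predict stitches the ten testing folds together, so a confusion matrix over its output is the cumulative testing matrix:

    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy.io import arff
    from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
    from sklearn.model_selection import StratifiedKFold, cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    data, _ = arff.loadarff("pd_speech.arff")
    df = pd.DataFrame(data)
    X = df.drop(columns=["class"]).values          # assumed target name
    y = df["class"].str.decode("utf-8").values     # arff strings load as bytes

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                      ("Naive Bayes", GaussianNB())]:
        y_pred = cross_val_predict(clf, X, y, cv=cv)
        cm = confusion_matrix(y, y_pred)           # cumulative over 10 folds
        ConfusionMatrixDisplay(cm, display_labels=sorted(set(y))).plot()
        plt.title(name)
    plt.show()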

6) [2v] Using scipy, test the hypothesis “$k$NN is statistically superior to Naïve Bayes regarding accuracy”, asserting whether it is true.
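A scipy sketch, assuming per-fold accuracies and a one-sided paired t-test as the intended comparison (reusing X, y, cv, and the two classifiers from the previous sketch):

    from scipy import stats
    from sklearn.model_selection import cross_val_score

    acc_knn = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y,
                              cv=cv, scoring="accuracy")
    acc_nb = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="accuracy")

    # H0: equal accuracy; H1: kNN > Naive Bayes (one-sided paired t-test;
    # the `alternative` keyword requires scipy >= 1.6).
    res = stats.ttest_rel(acc_knn, acc_nb, alternative="greater")
    print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
    print("reject H0 at 5%" if res.pvalue < 0.05 else "cannot reject H0")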

7) [2v] Enumerate three possible reasons that could underlie the observed differences in predictive accuracy between $k$NN and Naïve Bayes.
