I. Pen-and-paper [13v]
Four positive observations, {(A, 0), (B, 1), (A, 1), (A, 0)}, and four negative observations, {(B, 0), (B, 0), (A, 1), (B, 1)}, were collected. Consider the problem of classifying observations as positive or negative.
1) [4v] Compute the recall of a distance-weighted kNN with k = 5 and distance d(x1, x2) = Hamming(x1, x2) + 1/2, using a leave-one-out evaluation schema (i.e., when classifying one observation, use all remaining ones).
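A minimal Python sketch of this computation, assuming inverse-distance weights w = 1/d for the vote (the statement says "distance-weighted" but does not fix the weighting scheme):

X = [('A', 0), ('B', 1), ('A', 1), ('A', 0),   # positive observations
     ('B', 0), ('B', 0), ('A', 1), ('B', 1)]   # negative observations
y = ['Positive'] * 4 + ['Negative'] * 4

def dist(a, b):
    # Hamming distance over (y1, y2) plus the 1/2 offset from the statement
    return sum(u != v for u, v in zip(a, b)) + 0.5

tp = fn = 0
for i in range(len(X)):
    # leave-one-out: rank the remaining 7 observations, keep the k = 5 nearest
    nearest = sorted((dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i)[:5]
    votes = {'Positive': 0.0, 'Negative': 0.0}
    for d, label in nearest:
        votes[label] += 1.0 / d  # assumed weighting scheme: w = 1/d
    prediction = max(votes, key=votes.get)
    if y[i] == 'Positive':
        tp += prediction == 'Positive'
        fn += prediction == 'Negative'
print('recall =', tp / (tp + fn))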
An additional positive observation was acquired, (B, 0), and a third variable y3 was independently monitored, yielding estimates y3|Positive = {1.2, 0.8, 0.5, 0.9, 0.8} and y3|Negative = {1, 0.9, 1.2, 0.8}.
2) [4v] Considering the nine training observations, learn a Bayesian classifier assuming: i) y1 and y2 are dependent, ii) the {y1, y2} and {y3} variable sets are independent and equally important, and iii) y3 is normally distributed. Show all parameters.
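A short Python sketch of the parameter estimation under these assumptions (whether the Gaussian variance uses ddof=0 or ddof=1 is a modelling choice the statement leaves open):

import numpy as np
from collections import Counter

pos = [('A', 0), ('B', 1), ('A', 1), ('A', 0), ('B', 0)]   # five positives, incl. the new one
neg = [('B', 0), ('B', 0), ('A', 1), ('B', 1)]             # four negatives
y3_pos = [1.2, 0.8, 0.5, 0.9, 0.8]
y3_neg = [1.0, 0.9, 1.2, 0.8]

# class priors from the nine training observations
priors = {'Positive': len(pos) / 9, 'Negative': len(neg) / 9}

# joint likelihoods P(y1, y2 | class), since y1 and y2 are assumed dependent
pmf_pos = {v: c / len(pos) for v, c in Counter(pos).items()}
pmf_neg = {v: c / len(neg) for v, c in Counter(neg).items()}

# Gaussian parameters of y3 | class (sample mean and variance)
mu_pos, var_pos = np.mean(y3_pos), np.var(y3_pos, ddof=1)
mu_neg, var_neg = np.mean(y3_neg), np.var(y3_neg, ddof=1)
print(priors, pmf_pos, pmf_neg, (mu_pos, var_pos), (mu_neg, var_neg))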
Considering three testing observations, {((A, 1, 0.8), Positive), ((B, 1, 1), Positive), ((B, 0, 0.9), Negative)}:
3) [3v] Under a MAP assumption, compute P(Positive | x) for each testing observation.
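For reference, under the independence structure assumed in question 2 the posterior factorizes as below (a sketch in LaTeX, with mu_c and sigma_c^2 the class-conditional Gaussian parameters of y3):

P(\text{Positive} \mid \mathbf{x})
  = \frac{P(\text{Positive})\, P(y_1, y_2 \mid \text{Positive})\, \mathcal{N}(y_3; \mu_{P}, \sigma_{P}^2)}
         {\sum_{c \in \{\text{Positive}, \text{Negative}\}} P(c)\, P(y_1, y_2 \mid c)\, \mathcal{N}(y_3; \mu_c, \sigma_c^2)}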
4) [2v] Given a binary class variable, the default decision threshold of θ = 0.5 in the rule

f(x|θ) = { Positive if P(Positive|x) > θ
         { Negative otherwise

can be adjusted. Which decision threshold (0.3, 0.5 or 0.7) optimizes testing accuracy?
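A hedged Python sketch of the threshold sweep; the posterior values below are placeholders, not the answer to question 3:

def accuracy(posteriors, labels, theta):
    # decision rule f(x|theta): Positive iff P(Positive|x) > theta
    predictions = ['Positive' if p > theta else 'Negative' for p in posteriors]
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

labels = ['Positive', 'Positive', 'Negative']   # true classes of the three test observations
posteriors = [0.6, 0.4, 0.2]                    # PLACEHOLDERS: substitute the question 3 results
for theta in (0.3, 0.5, 0.7):
    print(theta, accuracy(posteriors, labels, theta))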
II. Programming and critical analysis [7v]
Consider the pd_speech.arff dataset available at the course webpage.
5) [3v] Using sklearn, considering a 10-fold stratified cross-validation (random=0), plot the cumulative testing confusion matrices of kNN (uniform weights, k = 5, Euclidean distance) and Naïve Bayes (Gaussian assumption). Use all remaining classifier parameters as default.
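A minimal sklearn sketch of this protocol, assuming the target column in pd_speech.arff is named 'class' and stored as byte strings, and reading "random=0" as random_state=0 with shuffling:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import arff
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

data, _ = arff.loadarff('pd_speech.arff')
df = pd.DataFrame(data)
X = df.drop(columns=['class']).to_numpy()
y = df['class'].str.decode('utf-8').to_numpy()  # assumed target column name

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {'kNN (k=5, uniform, Euclidean)': KNeighborsClassifier(n_neighbors=5),
          'Gaussian Naive Bayes': GaussianNB()}
for name, model in models.items():
    cumulative = np.zeros((2, 2), dtype=int)
    for train, test in skf.split(X, y):
        model.fit(X[train], y[train])
        # accumulate the per-fold testing confusion matrices
        cumulative += confusion_matrix(y[test], model.predict(X[test]))
    ConfusionMatrixDisplay(cumulative).plot()
    plt.title(name)
plt.show()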
6) [2v] Using scipy, test the hypothesis “kNN is statistically superior to Naïve Bayes regarding accuracy”, asserting whether it is true.
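One way to operationalize this, as a sketch: collect per-fold accuracies on matched splits and run a one-sided paired t-test (the choice of test is an assumption; the statement only fixes scipy). X and y are as loaded in the previous sketch:

from scipy import stats
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc_knn = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=skf)
acc_nb = cross_val_score(GaussianNB(), X, y, cv=skf)

# H1: kNN accuracy > Naive Bayes accuracy on the same folds
stat, pvalue = stats.ttest_rel(acc_knn, acc_nb, alternative='greater')
print(f't = {stat:.3f}, p = {pvalue:.3f}')  # assert the claim at the 5% level if p < 0.05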
7) [2v] Enumerate three possible reasons that could underlie the observed differences in predictive accuracy between kNN and Naïve Bayes.
END