Exercise 1: Bayes’ rule. Suppose that 5% of competitive athletes use performance-enhancing drugs and that a particular drug test has a 2% false-positive rate and a 1.5% false-negative rate.
1. Athlete A tests positive for drug use. What is the probability that Athlete A is using drugs?
2. Athlete B tests negative for drug use. What is the probability that Athlete B is not using drugs?
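The following Python sketch (variable names are mine, not part of the exercise) can be used to check a hand computation of both parts via Bayes’ rule:

# Numerical check for Exercise 1: Bayes' rule with the rates given above.
prior_use = 0.05          # P(user)
fpr = 0.02                # P(test positive | not a user)
fnr = 0.015               # P(test negative | user)

p_pos = (1 - fnr) * prior_use + fpr * (1 - prior_use)       # P(positive)
p_use_given_pos = (1 - fnr) * prior_use / p_pos             # part 1: P(user | positive)

p_neg = fnr * prior_use + (1 - fpr) * (1 - prior_use)       # P(negative)
p_clean_given_neg = (1 - fpr) * (1 - prior_use) / p_neg     # part 2: P(not user | negative)

print(p_use_given_pos, p_clean_given_neg)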
Exercise 2: Bayesian decision theory: losses and risks. Consider a classification problem with K classes, where we incur a loss λik ≥ 0 if we choose class i when the input actually belongs to class k, for i, k ∈ {1, ..., K}.
1. Write the expression for the expected risk Ri(x) for choosing class i as the class for a pattern x, and the rule for choosing the class for x.
Consider a two-class problem with losses λ11 = λ22 = 0 (correct decisions cost nothing), λ12 = 1 (choosing class 1 when the true class is 2) and λ21 (choosing class 2 when the true class is 1).
2. Give the optimal decision rule in the form “p(C1|x) > ...” as a function of λ21.
3. Imagine we consider both misclassification errors as equally costly. When is class 1 chosen (for what values of p(C1|x))?
4. Imagine we want to be very conservative when choosing class 2 and we seek a rule of the form “p(C2|x) > 0.99” (i.e., choose class 2 when its posterior probability exceeds 99%). What should λ21 be?
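As a sanity check for the general rule of part 1, here is a minimal Python sketch of the minimum-risk decision; the loss matrix and posterior values are illustrative only (λ21 = 5 and the posteriors are arbitrary choices, not part of the exercise):

import numpy as np

# Bayes decision rule with a loss matrix: lam[i, k] = loss of choosing
# class i when the true class is k.
lam = np.array([[0.0, 1.0],
                [5.0, 0.0]])           # lambda_21 = 5 is just an example
post = np.array([0.8, 0.2])            # p(C_k | x) for some pattern x

risk = lam @ post                       # R_i(x) = sum_k lam[i, k] * p(C_k | x)
choice = np.argmin(risk) + 1            # choose the class with minimum expected risk
print(risk, "choose class", choice)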
Exercise 3: association rules. Given the following data of transactions at a supermarket, calculate the support and confidence values of the following association rules: meat → avocado, avocado → meat, yogurt → avocado, avocado → yogurt, meat → yogurt, yogurt → meat. What is the best rule to use in practice?
transaction #    items in basket
1                meat, avocado
2                yogurt, avocado
3                meat
4                yogurt, meat
5                avocado, meat, yogurt
6                meat, avocado
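A short Python sketch of the support and confidence computations over these six transactions (the helper functions are mine; support(A → B) is the fraction of baskets containing both A and B, and confidence(A → B) = support(A ∪ B) / support(A)):

# Support and confidence for the rules in Exercise 3.
baskets = [
    {"meat", "avocado"},
    {"yogurt", "avocado"},
    {"meat"},
    {"yogurt", "meat"},
    {"avocado", "meat", "yogurt"},
    {"meat", "avocado"},
]
N = len(baskets)

def support(items):
    # Fraction of baskets that contain all the given items.
    return sum(items <= b for b in baskets) / N

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

for a, b in [("meat", "avocado"), ("avocado", "meat"), ("yogurt", "avocado"),
             ("avocado", "yogurt"), ("meat", "yogurt"), ("yogurt", "meat")]:
    print(f"{a} -> {b}: support={support({a} | {b}):.2f}, "
          f"confidence={confidence({a}, {b}):.2f}")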
Exercise 4: true- and false-positive rates. Consider the following table, where xn is a pattern, yn its ground-truth label (1 = positive class, 2 = negative class) and p(C1|xn) the posterior probability produced by some probabilistic classification algorithm:

n           1     2     3     4     5
yn          1     2     2     1     2
p(C1|xn)    0.6   0.7   0.5   0.9   0.2
We use a classification rule of the form “p(C1|x) ≥ θ”, where θ ∈ [0,1] is a threshold.
1. Give, for all possible values of θ ∈ [0,1], the predicted labels and the corresponding confusion matrix and classification error.
2. Plot the corresponding pairs (fp,tp) as an ROC curve.
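Assuming the rule is “predict class 1 when p(C1|x) ≥ θ”, the following Python sketch sweeps a representative set of thresholds and prints the true-positive rate, false-positive rate and classification error at each one, which gives the (fp, tp) points needed for the ROC curve:

import numpy as np

# Threshold sweep for Exercise 4.
p1 = np.array([0.6, 0.7, 0.5, 0.9, 0.2])   # p(C1 | x_n)
y  = np.array([1, 2, 2, 1, 2])              # ground-truth labels (1 = positive)

for theta in sorted(set(p1) | {0.0, 1.0}):
    pred = np.where(p1 >= theta, 1, 2)      # predict class 1 if p(C1|x) >= theta
    tp = np.mean(pred[y == 1] == 1)         # true-positive rate
    fp = np.mean(pred[y == 2] == 1)         # false-positive rate
    err = np.mean(pred != y)                # classification error
    print(f"theta={theta:.1f}  tp={tp:.2f}  fp={fp:.2f}  error={err:.2f}")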
Exercise 5: ROC curves. Imagine we have a classifier A with false-positive and true-positive rates fpA, tpA ∈ [0,1] such that fpA > tpA (that is, this classifier lies below the diagonal in ROC space). Now consider a classifier B that negates the decision of A, that is, whenever A predicts the positive class then B predicts the negative class and vice versa. Compute the false-positive and true-positive rates fpB, tpB of classifier B. Where does this point lie in ROC space?
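A small simulation can be used to check the answer empirically; the rates fpA = 0.7, tpA = 0.3 and the class prior below are arbitrary illustrative choices:

import numpy as np

# Empirical check for Exercise 5: negate a classifier's decisions and
# measure how its (fp, tp) rates change.
rng = np.random.default_rng(0)
y = rng.choice([1, 2], size=100_000, p=[0.5, 0.5])        # 1 = positive, 2 = negative

fpA, tpA = 0.7, 0.3                                        # a classifier below the diagonal
predA = np.where(y == 1,
                 rng.choice([1, 2], size=y.size, p=[tpA, 1 - tpA]),
                 rng.choice([1, 2], size=y.size, p=[fpA, 1 - fpA]))
predB = np.where(predA == 1, 2, 1)                         # B negates A's decision

tpB = np.mean(predB[y == 1] == 1)
fpB = np.mean(predB[y == 2] == 1)
print(f"tpB ~ {tpB:.2f}, fpB ~ {fpB:.2f}")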
Exercise 6: least-squares regression. Consider the following model, with parameters Θ = {θ1, θ2, θ3} ⊂ R and an input x ∈ R:

h(x; Θ) = θ1 + θ2 sin(2x) + θ3 sin(4x) ∈ R.
1. Write the general expression of the least-squares error function of a model h(x; Θ) with parameters Θ given a sample {(xn, yn), n = 1, ..., N}.
2. Apply it to the above model, simplifying it as much as possible.
3. Find the least-squares estimate for the parameters.
4. Assume the xn values are uniformly distributed in the interval [0, 2π]. Can you find a simpler, approximate way to find the least-squares estimate? Hint: approximate the sums by integrals.
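A numerical illustration in Python, assuming the basis functions are sin(2x) and sin(4x) and using synthetic data (the generating coefficients and noise level below are arbitrary): it compares the exact least-squares solution with the approximation suggested by part 4, where the near-orthogonality of the basis functions over [0, 2π] lets each coefficient decouple.

import numpy as np

# Exact vs. approximate least squares for Exercise 6.
rng = np.random.default_rng(0)
N = 200
x = rng.uniform(0, 2 * np.pi, N)
y = 1.0 + 0.5 * np.sin(2 * x) - 2.0 * np.sin(4 * x) + 0.1 * rng.standard_normal(N)

Phi = np.column_stack([np.ones(N), np.sin(2 * x), np.sin(4 * x)])   # design matrix
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)                 # exact LS solution

# Approximation from part 4: with x uniform on [0, 2*pi] the basis functions
# are (approximately) orthogonal, so each coefficient can be estimated separately.
theta_approx = np.array([np.mean(y),
                         2 * np.mean(y * np.sin(2 * x)),
                         2 * np.mean(y * np.sin(4 * x))])
print(theta_hat, theta_approx)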
Exercise 7: maximum likelihood estimate. A discrete random variable x ∈ {0, 1, 2, ...} follows a Poisson distribution if it has the following probability mass function:

p(x; θ) = θ^x e^(−θ) / x!,   x = 0, 1, 2, ...

where the parameter is θ > 0.
1. Verify that p(x; θ) sums to 1 over x = 0, 1, 2, ..., i.e., that it is a valid probability mass function.
2. Write the general expression of the log-likelihood of a probability mass function p(x;Θ) with parameters Θ for an iid sample x1,...,xN.
3. Apply it to the above distribution, simplifying it as much as possible.
4. Find the maximum likelihood estimate for the parameter θ.
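A quick numerical check in Python (the sample values are arbitrary): the grid maximiser of the Poisson log-likelihood can be compared against the analytic estimate derived in part 4.

import numpy as np
from scipy.special import gammaln

# Poisson log-likelihood and a grid search for its maximiser (Exercise 7).
x = np.array([2, 0, 3, 1, 4, 2, 2, 1])       # an arbitrary iid sample

def loglik(theta, x):
    # sum_n [ x_n*log(theta) - theta - log(x_n!) ]
    return np.sum(x * np.log(theta) - theta - gammaln(x + 1))

thetas = np.linspace(0.1, 5, 500)
best = thetas[np.argmax([loglik(t, x) for t in thetas])]
print("grid maximiser:", best, " sample mean:", x.mean())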
Exercise 8: multivariate Bernoulli distribution. Consider a multivariate Bernoulli distribution, where θ = (θ1, ..., θD)^T ∈ [0,1]^D are the parameters and x ∈ {0,1}^D is the binary random vector:

p(x; θ) = ∏_{d=1}^D θd^{xd} (1 − θd)^{1−xd}.
1. Compute the maximum likelihood estimate for θ given a sample X = {x1,...,xN}.
Let us do document classification using a D-word dictionary (element d in xn is 1 if word d is in document n and 0 otherwise) using a multivariate Bernoulli model for each class. Assume we have K document classes for which we have already obtained the values of the optimal parameters θk = (θk1,...,θkD)T and prior distribution p(Ck) = πk, for k = 1,...,K, by maximum likelihood.
2. Write the discriminant function gk(x) for a probabilistic classifier in general (not necessarily Bernoulli), and the rule to make a decision.
3. Apply it to the multivariate Bernoulli case with K classes. Show that gk(x) is linear in x, i.e., it can be written as gk(x) = wk^T x + wk0, and give the expressions for wk and wk0.
4. Consider K = 2 classes. Show the decision rule can be written as “if w^T x + w0 > 0 then choose class 1”, and give the expressions for w and w0.
5. Compute the numerical values of w and w0 for a two-word dictionary where π1 = 0.7 and θ1, θ2 have given numerical values. Plot in 2D all the possible values of x ∈ {0,1}^2 and the decision boundary corresponding to this classifier.
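A Python sketch of parts 3–5; the prior and the θk values below are illustrative only and do not reproduce the numbers in the exercise statement:

import numpy as np

# Linear discriminants for a multivariate Bernoulli classifier (Exercise 8).
pi = np.array([0.7, 0.3])                          # class priors (illustrative)
theta = np.array([[0.8, 0.2],                      # theta_1 (class 1, illustrative)
                  [0.3, 0.6]])                     # theta_2 (class 2, illustrative)

# g_k(x) = log p(x | C_k) + log pi_k = w_k^T x + w_k0
W = np.log(theta) - np.log(1 - theta)              # w_k, one row per class
w0 = np.sum(np.log(1 - theta), axis=1) + np.log(pi)

# Two-class rule: choose class 1 iff w^T x + w_0 > 0.
w, b = W[0] - W[1], w0[0] - w0[1]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "-> class", 1 if w @ np.array(x) + b > 0 else 2)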
Exercise 9: Gaussian classifiers. Consider a binary classification problem for x ∈ R^D where we use Gaussian class-conditional probabilities p(x|C1) = N(x; μ, σ1^2 I) and p(x|C2) = N(x; μ, σ2^2 I) with σ1 ≠ σ2. That is, they have the same mean and the covariance matrices are isotropic but different. Compute the expression for the class boundary. What shape is it?
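A brute-force numerical check in Python, for D = 2 and illustrative values of the mean, variances and priors: it scans a grid and reports the distance from the mean of the points where the two class discriminants tie, which hints at the shape of the boundary.

import numpy as np

# Numerical exploration of the class boundary for Exercise 9 (D = 2).
mu = np.array([0.0, 0.0])
s1, s2 = 1.0, 2.0                                   # sigma_1, sigma_2 (illustrative)
pi1, pi2 = 0.5, 0.5                                 # priors (illustrative)

def log_gauss(x, mu, s):
    # Log-density of an isotropic Gaussian N(mu, s^2 I).
    d = x - mu
    D = len(mu)
    return -0.5 * (d @ d) / s**2 - D * np.log(s) - 0.5 * D * np.log(2 * np.pi)

# Scan a grid and record points where the two discriminants are (nearly) equal.
xs = np.linspace(-4, 4, 401)
boundary = []
for a in xs:
    for b in xs:
        x = np.array([a, b])
        g = log_gauss(x, mu, s1) + np.log(pi1) - log_gauss(x, mu, s2) - np.log(pi2)
        if abs(g) < 1e-2:
            boundary.append(np.linalg.norm(x - mu))
print("distance of boundary points from the mean:",
      round(min(boundary), 3), "to", round(max(boundary), 3))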