NTU CSIE Homework 6 Solved

1. Consider an aggregation classifier G constructed by uniform blending on 11 classifiers g1, g2, ···, g11. That is,

G(x) = sign(g1(x) + g2(x) + ··· + g11(x)).

Assume that each gt has test 0/1 error Eout(gt) = et. Which of the following is the tightest upper bound of Eout(G)? Choose the correct answer; explain your answer.
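As a sketch of the usual counting argument for bounding a uniform vote (the answer choices themselves are not reproduced above; [[ · ]] below denotes the 0/1 indicator and f(x) the true label at x):

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% G errs at x only if a majority, i.e. at least 6 of the 11 classifiers, err at x:
\[
  [\![\, G(\mathbf{x}) \neq f(\mathbf{x}) \,]\!]
  \;\le\; \frac{1}{6}\sum_{t=1}^{11} [\![\, g_t(\mathbf{x}) \neq f(\mathbf{x}) \,]\!] ,
\]
% and taking the expectation over x on both sides gives
\[
  E_{\mathrm{out}}(G) \;\le\; \frac{1}{6}\sum_{t=1}^{11} e_t .
\]
\end{document}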

2.    Suppose that each x = (x1,x2) ∈ R2 is drawn uniformly from the region
{0 ≤ x1 ≤ 3,0 ≤ x2 ≤ 3}
and the target function is f(x) = sign(x2 − x1). Consider blending the following three hypotheses
linearly to approximate the target function.
 
g1(x) = sign(x1 − 2)
g2(x) = sign(x2 − 1)
g3(x) = sign(x2 − 2)

That is, G(x) = sign(α1 g1(x) + α2 g2(x) + α3 g3(x)) with αt ∈ R. What is the smallest possible Eout(G)? Choose the correct answer; explain your answer.
(Hint: The “boundary” of G must be a “combination” of the boundaries of gt. A quick numerical check is sketched after the choices below.)
[a]  
[b]  
[c]  
[d]  
[e] none of the other choices
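A quick numerical sanity check of such a blend (a minimal sketch; the α values below are arbitrary placeholders to illustrate the computation, not a claimed optimum):

import numpy as np

rng = np.random.default_rng(0)

def f(x):            # target: sign(x2 - x1)
    return np.sign(x[:, 1] - x[:, 0])

def G(x, alpha):     # linear blend of the three hypotheses
    g = np.column_stack([np.sign(x[:, 0] - 2),    # g1(x) = sign(x1 - 2)
                         np.sign(x[:, 1] - 1),    # g2(x) = sign(x2 - 1)
                         np.sign(x[:, 1] - 2)])   # g3(x) = sign(x2 - 2)
    return np.sign(g @ alpha)

# Monte Carlo estimate of Eout(G) for one candidate alpha over the uniform region.
x = rng.uniform(0.0, 3.0, size=(1_000_000, 2))
alpha = np.array([1.0, 1.0, 1.0])                 # placeholder weights
print("estimated Eout(G):", np.mean(G(x, alpha) != f(x)))

Trying a few α vectors this way makes it easy to see which piecewise regions a combination of the one vertical and two horizontal boundaries can and cannot match.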

3. When talking about non-uniform voting in aggregation, we mentioned that α can be viewed as a weight vector learned from any linear algorithm coupled with the transform ϕ(x) = (g1(x), g2(x), ··· , gT(x)). When studying kernel methods, we mentioned that the kernel is simply a computational short-cut for the inner product (ϕ(x))T(ϕ(x′)). In this problem, we mix the two topics together using the decision stumps as our gt(x).

Assume that the input vectors contain only even integers between (and including) 2L and 2R, where L < R. Consider the decision stumps gs,i,θ(x) = s · sign(xi − θ), where

i ∈ {1,2,··· ,d},
d is the finite dimensionality of the input space,
s ∈ {−1,+1},
θ is an odd integer in (2L,2R).

Define

ϕds(x) = (g+1,1,2L+1(x), g+1,1,2L+3(x), ..., g+1,1,2R−1(x), ..., g−1,d,2R−1(x)).

What is Kds(x,x′) = (ϕds(x))T(ϕds(x′))? Choose the correct answer; explain your answer. (A brute-force enumeration of ϕds is sketched after the choices.)

[a]   2d(R − L) − 2∥x − x′∥1

[b]   2d(R − L)² − 2(∥x − x′∥1)²

[c]   2d(R − L) − 2∥x − x′∥2

[d]   2d(R − L)² − 2(∥x − x′∥2)²
[e]    none of the other choices
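A brute-force check of ϕds and Kds by direct enumeration (a sketch; the values of d, L, R and the two test points are small arbitrary choices for illustration):

import numpy as np

def phi_ds(x, L, R):
    # One coordinate per stump g_{s,i,theta}(x) = s * sign(x_i - theta),
    # with s in {+1,-1}, i = 1..d, and theta over the odd integers in (2L, 2R).
    feats = []
    for s in (+1, -1):
        for xi in x:                                  # i = 1, ..., d
            for theta in range(2 * L + 1, 2 * R, 2):  # odd thetas strictly between 2L and 2R
                feats.append(s * np.sign(xi - theta))
    return np.array(feats)

def K_ds(x, xp, L, R):
    return phi_ds(x, L, R) @ phi_ds(xp, L, R)

# Example with d = 3 and even-integer inputs in [2L, 2R] = [-4, 6].
L, R = -2, 3
x, xp = np.array([2, -4, 0]), np.array([6, 0, 2])
print("K_ds(x, x') by enumeration:", K_ds(x, xp, L, R))
# Compare the enumerated value against each candidate closed form above.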
Adaptive Boosting


4.    Consider applying the AdaBoost algorithm on a binary classification data set where 99% of the examples are positive. Because there are so many positive examples, the base algorithm within
AdaBoost returns a constant classifier g1(x) = +1 in the first iteration. Let un be the individual example weight of each example in the second iteration. What is

Choose the correct answer; explain your answer.
[a] 99
[b] 1/99
[c]    1
[d]   100
[e]   1/100
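A toy reweighting check for this setting (a sketch assuming the standard AdaBoost update from the lecture, where misclassified examples are scaled up by √((1−ε1)/ε1) and correct ones scaled down by the same factor; the exact quantity the problem asks about is not reproduced above, so this only prints the second-iteration weights):

import numpy as np

# Toy data matching the 99% / 1% split: 99 positive examples, 1 negative.
y = np.array([+1] * 99 + [-1])
u1 = np.full(len(y), 1.0 / len(y))      # u_n^(1) = 1/N

g1 = np.ones_like(y)                    # base hypothesis g1(x) = +1
eps1 = u1[g1 != y].sum() / u1.sum()     # u-weighted error of g1 (0.01 here)
scale = np.sqrt((1 - eps1) / eps1)      # scaling factor sqrt((1 - eps1) / eps1)

u2 = np.where(g1 != y, u1 * scale, u1 / scale)   # u_n^(2)
print("u_n^(2) of the misclassified negative example:", u2[y == -1][0])
print("u_n^(2) of a correctly classified positive example:", u2[y == +1][0])
print("total weight on mistakes vs. on correct examples:",
      u2[y == -1].sum(), u2[y == +1].sum())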

5. For the AdaBoost algorithm introduced in Lecture 12, let Ut = u1(t) + u2(t) + ··· + uN(t). How many of the following are guaranteed to be non-increasing from the t-th iteration to the (t+1)-th iteration? Choose the correct answer; explain each non-increasing case within your answer.

•   Ein(Gt) to Ein(Gt+1)

•   Eout(Gt) to Eout(Gt+1)


•   Ut to Ut+1

•   un(t) to un(t+1) when gt is correct on (xn,yn)

•   un(t) to un(t+1) when gt is incorrect on (xn,yn)

[a]   1

[b]   2

[c]    3

[d]   4

[e]   5
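For reference when reasoning about the quantities in this and the next two problems, a reminder of the re-weighting step (assuming the standard AdaBoost formulation from the lecture, with εt the normalized u(t)-weighted error of gt and ♦t its scaling factor):

\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
\begin{align*}
  \blacklozenge_t &= \sqrt{\frac{1-\epsilon_t}{\epsilon_t}}, \qquad \alpha_t = \ln \blacklozenge_t,\\
  u_n^{(t+1)} &=
  \begin{cases}
    u_n^{(t)} \cdot \blacklozenge_t & \text{if } g_t(\mathbf{x}_n) \neq y_n,\\
    u_n^{(t)} / \blacklozenge_t & \text{if } g_t(\mathbf{x}_n) = y_n,
  \end{cases}\\
  \sum_{n=1}^{N} u_n^{(t+1)}
    &= \Bigl(\sum_{n=1}^{N} u_n^{(t)}\Bigr) \cdot 2\sqrt{\epsilon_t (1-\epsilon_t)}
    \;\le\; \sum_{n=1}^{N} u_n^{(t)}.
\end{align*}
\end{document}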

6. For the AdaBoost algorithm introduced in Lecture 12, let Ut be as defined in the previous problem. Assume that 0 < εt < 1/2 for each hypothesis gt, where εt is the normalized u(t)-weighted error of gt. What is  ? Choose the correct answer; explain your answer.

7.    Following the previous two problems, assume that , which of the following is the tightest upper bound on the number of iterations T required to ensure Ein(GT) = 0? Choose the correct answer; explain your answer. (Hint: use the fact that
Random Forest = Bagging + Decision Tree

8. Suppose we have a data set of size N = 1126, and we use bootstrapping to sample N′ examples. What is the minimum N′ such that the probability of getting at least one duplicated example (with # copies ≥ 2) is larger than 50%? Choose the correct answer; explain your answer.
[a]   25
[b]   30
[c] 35
[d] 40
[e] none of the other choices
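A direct computation for this bootstrap version of the birthday problem (a sketch; P(all N′ draws distinct) is the usual falling product):

import numpy as np

N = 1126

def p_duplicate(n_prime, N=N):
    # P(at least one duplicate among n_prime draws with replacement from N items)
    # = 1 - (N/N) * ((N-1)/N) * ... * ((N - n_prime + 1)/N)
    return 1.0 - np.prod(1.0 - np.arange(n_prime) / N)

for n_prime in (25, 30, 35, 40):
    print(n_prime, round(p_duplicate(n_prime), 3))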

9. If bootstrapping is used to sample exactly 2N examples out of N, what is the probability that an example is not sampled when N is very large? Choose the closest answer; explain your answer.

[a]   77.9%

[b]   60.7%

[c]    36.8%

[d]   13.5%

[e]   1.8%
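A quick numerical check of the limiting probability (a sketch comparing the finite-N expression with its large-N limit):

import numpy as np

# P(a fixed example is never picked in 2N draws with replacement) = (1 - 1/N)^(2N)
for N in (10, 100, 10_000, 1_000_000):
    print(N, (1 - 1 / N) ** (2 * N))
print("large-N limit exp(-2) =", np.exp(-2))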

10. Suppose we have a set of decision trees. Each tree comes with 2 branching nodes, each equipped with a fixed branching function. The root node has two branches, evaluating whether x1 ≥ 0. If x1 < 0, the node connects to a leaf with some constant output. Otherwise the node connects to another node of two branches, evaluating whether x2 ≥ 0. Each of those branches connects to a constant leaf. Consider three-dimensional input vectors. That is, x = (x1,x2,x3). Which of the following data sets can be shattered by the set of decision trees? Choose the correct answer; explain your answer.

[a]   (1,1,−1),(−1,1,−1),(−1,−1,−1)

[b]   (1,1,−1),(−1,1,−1),(1,−1,−1)

[c]    (1,1,−1),(−1,1,−1),(1,−1,−1),(−1,−1,−1)

[d]   (1,1,−1),(−1,1,−1),(1,−1,−1),(−1,−1,1)

[e]   none of the other choices
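A brute-force shattering check for this tree family (a sketch assuming each of the three constant leaves outputs a label in {−1,+1}; the branching structure is fixed exactly as described, only the leaf constants vary):

from itertools import product

def tree_predict(x, leaves):
    # Fixed structure: root tests x1 >= 0; the right child tests x2 >= 0.
    a, b, c = leaves
    if x[0] < 0:
        return a                      # x1 < 0 leaf
    return b if x[1] >= 0 else c      # x1 >= 0, then branch on x2

def shattered(points):
    # True iff every +-1 labeling of the points is realized by some leaf assignment.
    achievable = {tuple(tree_predict(x, leaves) for x in points)
                  for leaves in product((-1, +1), repeat=3)}
    return len(achievable) == 2 ** len(points)

datasets = {
    "[a]": [(1, 1, -1), (-1, 1, -1), (-1, -1, -1)],
    "[b]": [(1, 1, -1), (-1, 1, -1), (1, -1, -1)],
    "[c]": [(1, 1, -1), (-1, 1, -1), (1, -1, -1), (-1, -1, -1)],
    "[d]": [(1, 1, -1), (-1, 1, -1), (1, -1, -1), (-1, -1, 1)],
}
for name, pts in datasets.items():
    print(name, shattered(pts))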

Experiments with Adaptive Boosting

For Problems 11-16, implement the AdaBoost-Stump algorithm as introduced in Classes 12 and 13. Run the algorithm on the following set for training: https://www.csie.ntu.edu.tw/~htlin/course/ml21fall/hw6/hw6_train.dat and the following set for testing:

https://www.csie.ntu.edu.tw/~htlin/course/ml21fall/hw6/hw6_test.dat Use a total of T = 500 iterations (please do not stop earlier than 500), and calculate Ein and Eout with the 0/1 error.
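A minimal sketch of the training loop for these experiments (it assumes a u-weighted stump learner best_stump, sketched after the stump steps below, and assumes the data files are whitespace-separated rows with the ±1 label in the last column; check the actual file format before relying on the loader):

import numpy as np

def load_data(path):
    # Assumed layout: feature columns followed by the +-1 label in the last column.
    data = np.loadtxt(path)
    return data[:, :-1], data[:, -1]

def stump_predict(s, i, theta, X):
    return s * np.sign(X[:, i] - theta)

def adaboost_stump(X, y, T=500):
    # Returns the T stumps (s, i, theta) and their votes alpha_t.
    u = np.full(len(y), 1.0 / len(y))              # u_n^(1)
    stumps, alphas = [], []
    for _ in range(T):
        err_u, s, theta, i = best_stump(X, y, u)   # u-weighted stump search (see below)
        eps = err_u / u.sum()                      # normalized weighted error epsilon_t
        scale = np.sqrt((1 - eps) / eps)           # assumes 0 < eps < 1
        wrong = stump_predict(s, i, theta, X) != y
        u = np.where(wrong, u * scale, u / scale)  # re-weight for the next iteration
        stumps.append((s, i, theta))
        alphas.append(np.log(scale))
    return stumps, alphas

def G_predict(stumps, alphas, X, t=None):
    # Sign of the alpha-weighted vote of the first t stumps (all of them by default).
    t = len(stumps) if t is None else t
    votes = sum(a * stump_predict(s, i, theta, X)
                for a, (s, i, theta) in zip(alphas[:t], stumps[:t]))
    return np.sign(votes)

With this in place, Ein(g1), Eout(G500), and the intermediate Ein(Gt) values are just np.mean(prediction != y) for the corresponding predictions on the training or test set.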

For the decision stump algorithm, please implement the following steps. Any ties can be arbitrarily broken.

(1)   For any feature i, sort all the xn,i values to x[n],i such that x[n],i ≤ x[n+1],i.

(2)   Consider the thresholds θ ∈ {−∞} ∪ {(x[n],i + x[n+1],i)/2 : n = 1,··· ,N − 1}. Test those thresholds with s ∈ {−1,+1} to determine the best (s,θ) combination that minimizes Einu (the u-weighted 0/1 error) using feature i.

(3)   Pick the best (s,i,θ) combination by enumerating over all possible i.

For those interested in algorithms (who isn’t? :-) ), step 2 can be carried out in O(N) time only!!
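One way to implement steps (1)-(3) with the O(N) scan mentioned above (a sketch; ties are broken toward whichever candidate compares smallest, which the "arbitrary tie-breaking" allowance permits):

import numpy as np

def best_stump_1d(x, y, u):
    # Best (s, theta) for one feature under the u-weighted 0/1 error.
    # The sort is O(N log N); the threshold scan below touches each example once.
    order = np.argsort(x)
    xs, ys, us = x[order], y[order], u[order]
    U = us.sum()
    err_pos = us[ys == -1].sum()       # (s=+1, theta=-inf): predict +1 everywhere
    best = min((err_pos, +1, -np.inf), (U - err_pos, -1, -np.inf))
    for n in range(len(xs) - 1):
        # Moving theta past xs[n]: under s=+1 this point is now predicted -1.
        err_pos += us[n] if ys[n] == +1 else -us[n]
        if xs[n] == xs[n + 1]:
            continue                   # no valid midpoint between equal values
        theta = (xs[n] + xs[n + 1]) / 2
        best = min(best, (err_pos, +1, theta), (U - err_pos, -1, theta))
    return best                        # (weighted error, s, theta)

def best_stump(X, y, u):
    # Step (3): enumerate every feature i and keep the overall best (s, i, theta).
    return min((*best_stump_1d(X[:, i], y, u), i) for i in range(X.shape[1]))

The running sum err_pos is what makes the per-feature scan linear: each threshold update costs O(1), and the s = −1 error is always the total weight minus the s = +1 error.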

11.   (*) What is the value of Ein(g1)? Choose the closest answer; provide your code.

[a]   0.29

[b]   0.33

[c]    0.37

[d]   0.41

[e]   0.45

12.   (*) What is the value of max1≤t≤500 Ein(gt)? Choose the closest answer; provide your code.

[a]   0.40

[b]   0.45

[c]    0.50

[d]   0.55

[e]   0.60

13.   (*) What is the smallest t within the choices below such that min1≤τ≤t Ein(Gτ) ≤ 0.05? Choose the correct answer; provide your code.

[a]   60

[b]   160

[c]    260

[d]   360

[e] 460
14.    (*) What is the value of Eout(g1)? Choose the closest answer; provide your code.

[a]   0.40

[b]   0.45

[c]    0.50

[d]   0.55

[e]   0.60

15.    (*) Define Guniform(x) = sign(g1(x) + g2(x) + ··· + g500(x)), the uniform vote of the 500 decision stumps. What is the value of Eout(Guniform)? Choose the closest answer; provide your code.

[a]   0.23

[b]   0.28

[c]    0.33

[d]   0.38

[e]   0.43

16.    (*) What is the value of Eout(G500)? Choose the closest answer; provide your code.

[a]   0.14

[b]   0.18

[c]    0.22

[d]   0.26

[e]   0.30
