NTU CSIE Homework 5


1.    Consider $N$ "linearly separable" 1D examples $\{(x_n, y_n)\}_{n=1}^{N}$; that is, $x_n \in \mathbb{R}$. Without loss of generality, assume that $x_1 \le x_2 \le \cdots \le x_M < x_{M+1} \le x_{M+2} \le \cdots \le x_N$, $y_n = -1$ for $n = 1, 2, \ldots, M$, and $y_n = +1$ for $n = M+1, M+2, \ldots, N$. Which of the following represents a large-margin separating hyperplane? Choose the correct answer; explain your answer.
[e] none of the other choices

(Hint: The hard-margin SVM gets a specially-scaled version of the solution above.)
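For intuition, a 1D separator can be written as $h(x) = \mathrm{sign}(wx + b)$ with decision threshold $\theta = -b/w$ (assuming $w > 0$); the threshold that maximizes the distance to the closest example sits midway between the two closest oppositely-labeled points:

$$\theta^\star = \frac{x_M + x_{M+1}}{2}, \qquad \text{margin} = \frac{x_{M+1} - x_M}{2}.$$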

2.    Follow the notation from the lecture for the hard-margin SVM in the $\mathcal{Z}$-space. At the optimal $(b, \mathbf{w})$ and $\boldsymbol{\alpha}$, how many of the following values are equal to the length of the margin (the distance between the closest example and the decision boundary)? Choose the correct answer; explain why each value that you count equals the margin.

(1)   $\|\mathbf{w}\|^{-1/2}$

(2)   $2\|\mathbf{w}\|^{-1}$

(3)   $\|\mathbf{w}\|^{-1}$

(4)   $\bigl(\sum_{n=1}^{N} \alpha_n\bigr)^{-1/2}$

(5)   $\sum_{n=1}^{N} [\![\alpha_n = 1]\!]$

(6)   $\bigl(2\sum_{n=1}^{N} \alpha_n - \bigl\|\sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n\bigr\|^{2}\bigr)^{-1/2}$

[a]   0

[b]   1

[c]    2

[d]   3

[e]   4
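As a reminder of the lecture setup, the distance from a point $\mathbf{z}$ to the hyperplane $\mathbf{w}^T\mathbf{z} + b = 0$ is $|\mathbf{w}^T\mathbf{z} + b| / \|\mathbf{w}\|$, and the hard-margin scaling $\min_n y_n(\mathbf{w}^T\mathbf{z}_n + b) = 1$ at the optimum gives

$$\text{margin} = \frac{1}{\|\mathbf{w}\|},$$

which can then be compared against each candidate value (the ones involving $\alpha_n$ additionally need the dual relations between $\boldsymbol{\alpha}$ and $\mathbf{w}$).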

3.    Sometimes we hope to achieve a smaller margin for the positive examples, and a bigger margin for the negative ones. For instance, if we have very few negative examples on hand, we may hope to give them a larger margin to better protect them from noise. Consider an uneven-margin SVM that solves
$$\min_{b,\,\mathbf{w}} \quad \frac{1}{2}\mathbf{w}^T\mathbf{w}$$

subject to $\;y_n(\mathbf{w}^T\mathbf{x}_n + b) \ge 1$ for all $n$ with $y_n = +1$, and $y_n(\mathbf{w}^T\mathbf{x}_n + b) \ge \rho_-$ for all $n$ with $y_n = -1$.

Consider the following examples:

$\mathbf{x}_1 = (0, 1)$,   $y_1 = -1$

$\mathbf{x}_2 = (0, 0)$,   $y_2 = -1$

$\mathbf{x}_3 = (0, -1)$,   $y_3 = -1$

$\mathbf{x}_4 = (1, 0)$,   $y_4 = +1$
Take $\rho_- = 4$. What are the optimal $\mathbf{w}$ and $b$? Choose the correct answer; explain your answer. (Note: you can calculate your answer with a QP solver if you want, but you need to "explain" the solution that was found. We suggest visualizing what happens.)
[a]   the optimal $\mathbf{w} = (1, 0)$, $b = -1$

[b]   the optimal $\mathbf{w} = (2, 0)$, $b = -1$

[c]    the optimal $\mathbf{w} = (5, 0)$, $b = -4$

[d]   the optimal w  

[e]   none of the other choices
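As the note above suggests, one can hand the problem to a generic solver; here is a minimal sketch using scipy.optimize.minimize, assuming the uneven-margin formulation as reconstructed above (margin at least 1 for positive examples and at least $\rho_- = 4$ for negative ones):

    import numpy as np
    from scipy.optimize import minimize

    # Training examples from the problem statement.
    X = np.array([[0., 1.], [0., 0.], [0., -1.], [1., 0.]])
    y = np.array([-1., -1., -1., +1.])
    rho = np.where(y > 0, 1.0, 4.0)   # required margin per example (rho_- = 4)

    def objective(v):                  # v = (w1, w2, b)
        w = v[:2]
        return 0.5 * np.dot(w, w)      # (1/2) * ||w||^2; b is not penalized

    constraints = [
        {"type": "ineq",               # "ineq" means fun(v) >= 0
         "fun": (lambda v, i=i: y[i] * (np.dot(v[:2], X[i]) + v[2]) - rho[i])}
        for i in range(len(y))
    ]

    res = minimize(objective, x0=np.zeros(3), constraints=constraints)
    print("w =", res.x[:2], " b =", res.x[2])

Visualizing the four points together with the decision boundaries of the candidate answers is an equally quick way to check which choice satisfies both margin requirements with the shortest $\mathbf{w}$.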

4.    The dual problem of the uneven-margin SVM defined in Problem 3 can be written as
$$\min_{\boldsymbol{\alpha}} \quad \Box$$

subject to $\displaystyle\sum_{n=1}^{N} y_n\alpha_n = 0$, and $\alpha_n \ge 0$ for $n = 1, 2, \ldots, N$.

What is □? Choose the correct answer; explain your answer.

[d]   $\displaystyle -\sum_{n=1}^{N} [\![y_n = +1]\!]\,\alpha_n \;+\; \sum_{n=1}^{N} \rho_-\,[\![y_n = -1]\!]\,\alpha_n$
[e] none of the other choices
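For reference, the even-margin hard-margin dual from the lecture has the objective

$$\frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n\alpha_m\, y_n y_m\, \mathbf{x}_n^T\mathbf{x}_m \;-\; \sum_{n=1}^{N}\alpha_n,$$

so working out the Lagrangian of the uneven-margin primal mainly changes the term that is linear in the $\alpha_n$.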

5.    Let α∗ be an optimal solution of the original hard-margin SVM (i.e. even margin). Which of the following is an optimal solution of the uneven-margin SVM for a given ρ−? Choose the correct answer; explain your answer.
[a] α∗
[e] none of the other choices

Kernels
6.    Kernels let us work implicitly with high-dimensional feature spaces. Consider the homogeneous polynomial kernel of degree $Q$,

$$K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}')^Q,$$

where each $\mathbf{x}$ and $\mathbf{x}'$ is in $\mathbb{R}^d$, without padding $x_0 = 1$. Now, decompose $K(\mathbf{x}, \mathbf{x}')$ as some $\Phi(\mathbf{x})^T\Phi(\mathbf{x}')$, where $\Phi(\mathbf{x})$ includes only unique terms calculated from $\mathbf{x}$. That is, $x_3 x_5$ would be considered the same term as $x_5 x_3$ (note: this is different from the $\Phi(\mathbf{x})$ that we considered in class). What is the dimension of $\Phi(\mathbf{x})$? Choose the correct answer; explain your answer.
[e] none of the other choices
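One way to sanity-check a counting argument here is to enumerate the unique degree-$Q$ monomials directly for small $d$ and $Q$; a minimal sketch (the particular $(d, Q)$ pairs below are just illustrative):

    from itertools import combinations_with_replacement

    def unique_terms(d, Q):
        """All distinct degree-Q monomials in x_1, ..., x_d (x3*x5 counts as x5*x3)."""
        return list(combinations_with_replacement(range(1, d + 1), Q))

    for d, Q in [(2, 2), (3, 2), (3, 3)]:
        terms = unique_terms(d, Q)
        print(f"d={d}, Q={Q}: {len(terms)} unique terms, e.g. {terms[:4]}")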

7.    For any feature transform $\Phi$ from $\mathcal{X}$ to $\mathcal{Z}$, the squared distance between two examples $\mathbf{x}$ and $\mathbf{x}'$ in the $\mathcal{Z}$-space is $\|\Phi(\mathbf{x}) - \Phi(\mathbf{x}')\|^2$. This distance can be computed with the kernel trick. Consider the quadratic kernel $K_2(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T\mathbf{x}')^2$. What is the tightest upper bound on the squared $\mathcal{Z}$-space distance between two unit vectors $\mathbf{x}$ and $\mathbf{x}'$? Choose the correct answer; explain your answer.
[a]   0
[b]   1
[c]    2
[d]   8
[e]   1126
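The kernel-trick computation referred to above expands the squared distance using only kernel evaluations:

$$\|\Phi(\mathbf{x}) - \Phi(\mathbf{x}')\|^2 = K(\mathbf{x}, \mathbf{x}) - 2K(\mathbf{x}, \mathbf{x}') + K(\mathbf{x}', \mathbf{x}'),$$

so no explicit $\Phi$ is ever needed.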

Kernel Perceptron Learning Algorithm
8. In this problem, we are going to apply the kernel trick to the perceptron learning algorithm. If we run the perceptron learning algorithm on the transformed examples $\{(\phi(\mathbf{x}_n), y_n)\}_{n=1}^{N}$, the algorithm updates $\mathbf{w}_t$ to $\mathbf{w}_{t+1}$ when the current $\mathbf{w}_t$ makes a mistake on $(\phi(\mathbf{x}_{n(t)}), y_{n(t)})$:

$$\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t + y_{n(t)}\,\phi(\mathbf{x}_{n(t)})$$

Because every update is based on one (transformed) example, if we take $\mathbf{w}_0 = \mathbf{0}$, we can represent every $\mathbf{w}_t$ as a linear combination of $\phi(\mathbf{x}_1), \phi(\mathbf{x}_2), \ldots, \phi(\mathbf{x}_N)$. We can then maintain the linear-combination coefficients instead of the whole $\mathbf{w}$. Assume that we maintain an $N$-dimensional vector $\boldsymbol{\alpha}_t$ in the $t$-th iteration such that

$$\mathbf{w}_t = \sum_{n=1}^{N} \alpha_t[n]\,\phi(\mathbf{x}_n)$$

for $t = 0, 1, 2, \ldots$, where $\alpha_t[n]$ indicates the $n$-th component of $\boldsymbol{\alpha}_t$. Set $\boldsymbol{\alpha}_0 = \mathbf{0}$ ($N$ zeros) to match $\mathbf{w}_0 = \mathbf{0}$ ($\tilde{d} + 1$ zeros). What should $\alpha_{t+1}[n(t)]$ be when the current $\mathbf{w}_t$ (represented by $\boldsymbol{\alpha}_t$) makes a mistake on $(\phi(\mathbf{x}_{n(t)}), y_{n(t)})$? Choose the correct answer; explain your answer.

[a]   αt[n(t)] + 1

[b]   αt[n(t)] − 1

[c]    αt[n(t)] + yn(t)

[d]   αt[n(t)] − yn(t)

[e]   none of the other choices
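To see why maintaining $\boldsymbol{\alpha}_t$ is enough, note that checking a mistake only needs $\mathbf{w}_t^T\phi(\mathbf{x})$, which the kernel trick provides without ever forming $\phi$. A minimal sketch of that prediction step (the kernel shown is just an illustrative choice, and the $\boldsymbol{\alpha}$ update itself is left out, since it is exactly what this problem asks for):

    import numpy as np

    def predict(alpha, X, x_query, kernel):
        """sign(w^T phi(x_query)) with w = sum_n alpha[n] * phi(x_n),
        computed via kernel evaluations K(x_n, x_query) = phi(x_n)^T phi(x_query)."""
        score = sum(alpha[n] * kernel(X[n], x_query) for n in range(len(alpha)))
        return np.sign(score)

    # Illustrative kernel: homogeneous polynomial of degree 2.
    poly2 = lambda u, v: float(np.dot(u, v)) ** 2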

Soft-Margin SVM
9.    Suppose we want to emphasize that some training examples are more important than others. Formally, consider a data set $\{(\mathbf{x}_n, y_n, u_n)\}_{n=1}^{N}$, where $u_n$ is a non-negative weight that indicates the importance of the $n$-th example. The soft-margin SVM for this weighted classification problem solves the following constrained optimization problem:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}$$

subject to

We can then derive the dual version of the weighted soft-margin SVM problem that involves only α, y, X, and u:

$$\min_{\boldsymbol{\alpha}} \quad \diamondsuit$$

subject to $\displaystyle\sum_{n=1}^{N} y_n\alpha_n = 0$, and $0 \le \alpha_n \le u_n$ for $n = 1, 2, \ldots, N$.

Which of the following is the correct form of $\diamondsuit$? Choose the correct answer; explain your answer.
[e]   none of the other choices
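As a practical aside, per-example importance of this kind is commonly exposed by SVM software as per-sample weights that rescale the soft-margin penalty; for example, scikit-learn's SVC (a LIBSVM wrapper) accepts a sample_weight argument in fit. The toy data below is made up purely for illustration, and this is only a loose practical analogue of the formulation above, not the exact same problem:

    import numpy as np
    from sklearn.svm import SVC

    # Toy data: give the two negative examples five times the importance.
    X = np.array([[0., 1.], [0., 0.], [2., 0.], [3., 1.]])
    y = np.array([-1, -1, +1, +1])
    u = np.array([5., 5., 1., 1.])

    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y, sample_weight=u)   # per-sample weights rescale C for each example
    print(clf.coef_, clf.intercept_)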

10.    As discussed in class, the primal optimization problem for the soft-margin SVM is equivalent to the following unconstrained optimization problem:

$$\min_{b,\,\mathbf{w}} \quad \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^{N} \max\bigl(1 - y_n(\mathbf{w}^T\mathbf{x}_n + b),\, 0\bigr).$$

Let $s_n = \mathbf{w}^T\mathbf{x}_n + b$ and $\rho_n = y_n \cdot s_n$. The error function $E_{\text{hinge}}(\rho) = \max(1 - \rho, 0)$ is widely known as the hinge error. The hinge error is convex but not differentiable everywhere, so it can be technically complicated to run gradient descent on it. A possible workaround is to approximate the hinge error with a "smooth hinge error." A good candidate for a "smooth hinge error" is the following function

$E_{\text{smooth}}$

Since $E_{\text{smooth}}$ is differentiable everywhere, we can then apply gradient descent and related algorithms. $E_{\text{smooth}}$ has a constant slope of $-1$ for all $\rho \le 0$ and is $0$ for all $\rho \ge 1$, just like $E_{\text{hinge}}$. Now, within $(0, 1)$, what is the uniformly-averaged squared difference between $E_{\text{smooth}}$ and $E_{\text{hinge}}$? Choose the correct answer; explain your answer.
[e]   none of the other choices
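Once a concrete $E_{\text{smooth}}$ is written down, the uniform average over $(0, 1)$ can be checked numerically. The sketch below assumes one common smoothed hinge that satisfies the stated properties (slope $-1$ for $\rho \le 0$, value $0$ for $\rho \ge 1$, differentiable everywhere), which may or may not match the definition omitted above:

    import numpy as np

    def E_hinge(rho):
        return np.maximum(1.0 - rho, 0.0)

    def E_smooth(rho):
        # Assumed candidate: 0.5 - rho for rho <= 0, (1 - rho)^2 / 2 on (0, 1), 0 for rho >= 1.
        return np.where(rho <= 0, 0.5 - rho,
               np.where(rho >= 1, 0.0, 0.5 * (1.0 - rho) ** 2))

    # Uniform average of the squared difference over (0, 1), approximated on a fine grid.
    rho = np.linspace(0.0, 1.0, 1_000_001)
    print(np.mean((E_smooth(rho) - E_hinge(rho)) ** 2))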

Experiments with Soft-Margin SVM
For Problems 11 to 16, we are going to experiment with a real-world data set. Download the processed satimage data sets from LIBSVM Tools.

Training: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/satimage.scale

Testing: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/satimage.scale.t

We will consider binary classification problems of the form “one of the classes” (as the positive class) versus “the other classes” (as the negative class).

The data set contains thousands of examples, and some quadratic programming packages cannot handle this size. We recommend that you consider the LIBSVM package

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Regardless of the package that you choose to use, please read its manual carefully to make sure that you are indeed solving the soft-margin support vector machine taught in class, i.e., the dual formulation below:

$$\min_{\boldsymbol{\alpha}} \quad \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n\alpha_m\, y_n y_m\, K(\mathbf{x}_n, \mathbf{x}_m) \;-\; \sum_{n=1}^{N}\alpha_n$$

subject to $\displaystyle\sum_{n=1}^{N} y_n\alpha_n = 0$, and $0 \le \alpha_n \le C$ for $n = 1, 2, \ldots, N$.

In the following problems, please use the 0/1 error for evaluating $E_{\text{in}}$, $E_{\text{val}}$, and $E_{\text{out}}$ (through the test set). Some practical remarks:

(i)    Please tell your chosen package not to scale the data automatically, as doing so would change the effective kernel and give different results.

(ii)   It is your responsibility to check whether your chosen package solves the designated formulation with enough numerical precision. Please read the manual of your chosen package for software parameters whose values affect the outcome; any ML practitioner needs to deal with this kind of added uncertainty.
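A minimal data-loading sketch in Python with scikit-learn (which wraps LIBSVM); the file names assume the two downloads above are kept under their original names in the working directory, and no extra scaling is applied, per remark (i):

    import numpy as np
    from sklearn.datasets import load_svmlight_file

    # Load the LIBSVM-format satimage training and test sets.
    X_train, y_train = load_svmlight_file("satimage.scale")
    X_test, y_test = load_svmlight_file("satimage.scale.t", n_features=X_train.shape[1])
    X_train, X_test = X_train.toarray(), X_test.toarray()   # small enough to keep dense

    def binarize(y, positive_class):
        """'one of the classes' (+1) versus 'the other classes' (-1)."""
        return np.where(y == positive_class, 1, -1)

The later sketches assume these variables and this helper are in scope.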

11.   (*) Consider the linear soft-margin SVM. That is, either solve the primal formulation of the soft-margin SVM with the given $\mathbf{x}_n$, or take the linear kernel $K(\mathbf{x}_n, \mathbf{x}_m) = \mathbf{x}_n^T\mathbf{x}_m$ in the dual formulation. With $C = 10$ and the binary classification problem of "5" versus "not 5", which of the following numbers is closest to $\|\mathbf{w}\|$ after solving the linear soft-margin SVM? Choose the closest answer; provide your command/code.

[a]   4.5

[b]   5.0

[c]    5.5

[d]   6.0

[e]   6.5
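A minimal sketch for this problem, assuming the data-loading helper above and using scikit-learn's SVC as the soft-margin solver:

    import numpy as np
    from sklearn.svm import SVC

    clf = SVC(kernel="linear", C=10)
    clf.fit(X_train, binarize(y_train, 5))
    print("||w|| =", np.linalg.norm(clf.coef_))   # coef_ is available for the linear kernel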

12.   (*) Consider the polynomial kernel  , where $Q$ is the degree of the polynomial. With $C = 10$ and $Q = 3$, which of the following soft-margin SVM classifiers reaches the largest $E_{\text{in}}$? Choose the correct answer; provide your command/code.

[a]   “2” versus “not 2”

[b]   “3” versus “not 3”

[c]    “4” versus “not 4”

[d]   “5” versus “not 5”

[e]   “6” versus “not 6”
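A sketch for scanning the five classifiers; since the exact polynomial kernel form is elided above, the gamma and coef0 settings below are assumptions that should be adjusted to match the problem's definition. The support-vector counts printed here are also what Problem 13 asks about:

    import numpy as np
    from sklearn.svm import SVC

    for k in [2, 3, 4, 5, 6]:
        yk = binarize(y_train, k)
        clf = SVC(kernel="poly", degree=3, gamma=1, coef0=1, C=10)  # assumed kernel parameters
        clf.fit(X_train, yk)
        E_in = np.mean(clf.predict(X_train) != yk)                  # 0/1 error
        print(f'"{k}" versus "not {k}": E_in = {E_in:.4f}, #SV = {clf.n_support_.sum()}')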

13.   (*) Following Problem 12, which of the following numbers is closest to the maximum number of support vectors within those five soft-margin SVM classifiers? Choose the closest answer; provide your command/code.

[a]   450

[b]   500

[c]    550

[d]   600

[e]   650

14.   (*) Consider the Gaussian kernel $K(\mathbf{x}_n, \mathbf{x}_m) = \exp(-\gamma\|\mathbf{x}_n - \mathbf{x}_m\|^2)$. For the binary classification problem of "1" versus "not 1", when fixing $\gamma = 10$, which of the following values of $C$ results in the lowest $E_{\text{out}}$? If there is a tie, please pick the smallest $C$. Choose the correct answer; provide your command/code.

[a]   0.01

[b]   0.1

[c]    1

[d]   10

[e]   100
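A sketch that scans $C$ with the Gaussian kernel (scikit-learn's rbf kernel is $\exp(-\gamma\|\mathbf{x}-\mathbf{x}'\|^2)$, matching the form above); the same loop with $C = 0.1$ fixed and $\gamma$ varying serves Problem 15:

    import numpy as np
    from sklearn.svm import SVC

    y1_train, y1_test = binarize(y_train, 1), binarize(y_test, 1)
    for C in [0.01, 0.1, 1, 10, 100]:
        clf = SVC(kernel="rbf", gamma=10, C=C)
        clf.fit(X_train, y1_train)
        E_out = np.mean(clf.predict(X_test) != y1_test)   # 0/1 test error
        print(f"C = {C}: E_out = {E_out:.4f}")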

15.   (*) Following Problem 14, when fixing C = 0.1, which of the following values of γ results in the lowest Eout? If there is a tie, please pick the smallest γ. Choose the correct answer; provide your command/code.

[a]   0.1

[b]   1

[c]    10

[d]   100

[e]   1000

16.   (*) Following Problem 14, consider a validation procedure that randomly samples 200 examples from the training set for validation and leaves the other examples for training $g_{\text{svm}}^-$. Fix $C = 0.1$ and use the validation procedure to choose the best $\gamma$ among $\{0.1, 1, 10, 100, 1000\}$ according to $E_{\text{val}}$. If there is a tie in $E_{\text{val}}$, choose the smallest $\gamma$. Repeat the procedure 1000 times. Which of the following values of $\gamma$ is selected the most often? Choose the correct answer; provide your command/code.

[a]   0.1

[b]   1

[c]    10

[d]   100

[e]   1000
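A sketch of the repeated validation procedure (the random seed is an arbitrary choice; scanning the candidate $\gamma$ values in increasing order makes the tie-breaking toward the smallest $\gamma$ automatic):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    gammas = [0.1, 1, 10, 100, 1000]
    y1 = binarize(y_train, 1)
    counts = {g: 0 for g in gammas}

    for _ in range(1000):
        # Randomly hold out 200 training examples for validation.
        mask = np.zeros(len(y1), dtype=bool)
        mask[rng.choice(len(y1), size=200, replace=False)] = True

        best_gamma, best_err = None, np.inf
        for g in gammas:                        # increasing order: smallest gamma wins ties
            clf = SVC(kernel="rbf", gamma=g, C=0.1)
            clf.fit(X_train[~mask], y1[~mask])
            err = np.mean(clf.predict(X_train[mask]) != y1[mask])
            if err < best_err:
                best_gamma, best_err = g, err
        counts[best_gamma] += 1

    print(counts)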
