1. We derived an expression for the signed distance d between an arbitrary point x (or p) and a hyperplane H given by g(x) = w0 + wT x = 0, all in non-augmented feature space. This problem explores the topic further.
(a) Prove that the weight vector w is normal to H.
Hint: For any two points x1 and x2 on H, what is g(x1) − g(x2)? How can you interpret the vector (x1 − x2)?
(b) Show that the vector w points to the positive side of H. (The positive side of H means the d > 0 side.)
Hint: What sign does the distance d from H to x = x1 + aw (with a > 0) have, where x1 is a point on H?
(c) Derive, or state and justify, an expression for the signed distance r between an arbitrary point x(+) and a hyperplane g(x(+)) = w(+)T x(+) = 0 in augmented feature space. Set up the sign of your distance so that w points to the positive-distance side of H.
(d) In weight space, using augmented quantities, derive an expression for the signed distance between an arbitrary point w(+) and a hyperplane g(x(+)) = w(+)T x(+) = 0, in which the vector x(+) defines the positive side of the hyperplane.
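(Below is a minimal numerical sanity check for problem 1, parts (a)-(c), written as a Python sketch. The particular w, w0, and test points are invented for illustration, and the sketch assumes the standard result d = g(x)/‖w‖ that the problem asks you to derive.)

import numpy as np

w = np.array([3.0, 4.0])          # hypothetical weight vector (normal to H)
w0 = -5.0                         # hypothetical bias term

def g(x):
    return w0 + w @ x             # g(x) = w0 + w^T x

x1 = np.array([3.0, -1.0])        # a point on H: -5 + 9 - 4 = 0
x2 = np.array([-1.0, 2.0])        # another point on H: -5 - 3 + 8 = 0
assert abs(g(x1)) < 1e-12 and abs(g(x2)) < 1e-12

# (a) w is normal to H: g(x1) - g(x2) = w^T (x1 - x2) = 0, so w is
#     orthogonal to any vector (x1 - x2) lying in H.
assert abs(w @ (x1 - x2)) < 1e-12

# (b), (c) signed distance d = g(x)/||w||; stepping from x1 along +w
#     with a > 0 gives d > 0, i.e. w points to the positive side of H.
a = 0.7
x = x1 + a * w
d = g(x) / np.linalg.norm(w)
print(d, a * np.linalg.norm(w))   # both are 3.5, and d > 0 as expected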
2. For a 2-class learning problem with one feature, you are given four training data points (in augmented space):
(a) Plot the data points in 2D feature space. Draw a linear decision boundary H that correctly classifies them, showing which side is positive.
(b) Plot the reflected data points in 2D feature space. Draw the same decision boundary; does it still classify them correctly?
(c) Plot the reflected data points, as lines in 2D weight space, showing the positive side of each. Show the solution region.
(d) Also, plot the weight vector w of H from part (a) as a point in weight space. Is w in the solution region?
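(The following Python sketch illustrates the reflection step and the weight-space check in parts (b)-(d). The four augmented data points and the candidate w used here are hypothetical stand-ins, not the actual points given in the assignment.)

import numpy as np

# hypothetical augmented points [1, x] (class 1 first, then class 2) and labels
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
y = np.array([1, 1, 2, 2])

Z = np.where((y == 1)[:, None], X, -X)   # "reflect": negate the class-2 points
w = np.array([0.5, -1.0])                # candidate weight vector = a point in weight space

print(Z @ w)        # every entry > 0, so this w lies in the solution region
print(X @ w > 0)    # equivalently: g(x) > 0 exactly for the class-1 points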
3. (a) Let p(x) be a scalar function of a D-dimensional vector x, and f(p) be a scalar function of p. Prove that:
∇x f(p(x)) = (df/dp) ∇x p(x)
i.e., prove that the chain rule applies in this way. [Hint: you can show it for the ith component of the gradient vector, for any i. It can be done in a couple lines.]
(b) Use relation (18) of DHS A.2.4 to find ∇x(xT x).
(c) Prove your result for ∇x(xT x) from part (b) by writing out the components instead.
(d) Use (a) and (b) to find ∇x[(xT x)^3] in terms of x.
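(A finite-difference check of the gradients in problem 3, parts (b) and (d), assuming the chain-rule relation from part (a). The expected closed forms in the comments are what the derivation should produce; the test point is arbitrary.)

import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, -2.0, 0.5])                   # arbitrary test point
# (b) expect grad_x (x^T x) = 2x
print(num_grad(lambda v: v @ v, x), 2 * x)
# (d) expect grad_x (x^T x)^3 = 3 (x^T x)^2 * 2x = 6 (x^T x)^2 x
print(num_grad(lambda v: (v @ v) ** 3, x), 6 * (x @ x) ** 2 * x)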
4. (a) Use the relations above to find ∇w ‖w‖₂. Express your answer in terms of ‖w‖₂ where possible. Hint: let p = wTw; what is f?
(b) Find: ∇w ‖Mw − b‖₂. Express your result in simplest form. Hint: first choose p (remember it must be a scalar).
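(A numerical check for problem 4, under the reading that the quantities are the unsquared 2-norms ‖w‖₂ and ‖Mw − b‖₂; the closed forms in the comments are what the derivation should yield, and the w, M, and b values are arbitrary test data.)

import numpy as np

def num_grad(f, w, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

w = np.array([1.0, 2.0, -1.0])                      # arbitrary test vector
M = np.array([[1.0, 0.0, 2.0],
              [0.0, -1.0, 1.0]])                    # made-up M and b for the check
b = np.array([0.5, -0.5])

# (a) expect grad_w ||w||_2 = w / ||w||_2
print(num_grad(np.linalg.norm, w), w / np.linalg.norm(w))
# (b) expect grad_w ||Mw - b||_2 = M^T (Mw - b) / ||Mw - b||_2
r = M @ w - b
print(num_grad(lambda v: np.linalg.norm(M @ v - b), w), M.T @ r / np.linalg.norm(r))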
5. [Extra credit] For C > 2, show that total linear separability implies linear separability, and show that linear separability doesn't necessarily imply total linear separability. For the latter, a counterexample will suffice.