Machine Learning Homework 3

1.  Perceptron Lower Bound. Show that for any 0 < γ < 1 there exist a number d > 0, a vector w⋆ ∈ Rd and a sequence of examples (x1,y1),...,(xm,ym) such that:

(a) ‖w⋆‖ = 1 and ‖xi‖ ≤ 1 for all i.

(b) yi⟨w⋆,xi⟩ ≥ γ for all i.

(c) Perceptron makes ⌊1/γ²⌋ mistakes on the sequence.

(Hint: Choose d = ⌊1/γ²⌋ and let {xi}i be the standard basis of Rd.)
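
A quick numeric sanity check of the hinted construction (a minimal sketch, not part of the exercise; the concrete choice γ = 0.25, the labels yi = +1 and w⋆ = (1/√d,...,1/√d) are assumptions of this sketch):

    import numpy as np

    gamma = 0.25
    d = int(1 / gamma ** 2)            # d = 16; here 1/gamma^2 is an exact integer
    w_star = np.ones(d) / np.sqrt(d)   # ||w_star|| = 1, and <w_star, e_i> = 1/sqrt(d) = gamma

    w = np.zeros(d)                    # Perceptron starts from the zero vector
    mistakes = 0
    for i in range(d):
        x_i, y_i = np.eye(d)[i], 1.0   # x_i = e_i, so ||x_i|| = 1
        assert y_i * np.dot(w_star, x_i) >= gamma   # the margin condition (b) holds
        if y_i * np.dot(w, x_i) <= 0:  # <w, x_i> = 0 here, which counts as a mistake
            w += y_i * x_i             # standard Perceptron update
            mistakes += 1

    print(mistakes, "mistakes =", int(1 / gamma ** 2))   # 16 mistakes = 1/gamma^2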

2.   Halving Algorithm. Denote by AHal the Halving algorithm you have seen in class. Let d ≥ 6, X = {1,...,d} and let H = {hi,j : 1 ≤ i < j ≤ d} where

hi,j(x) = 1 if x ∈ {i,j}, and hi,j(x) = 0 otherwise.

Show that M(AHal,H) = 2.

(Definition of the mistake bound M(A,H): Let H be a hypothesis class and A an online algorithm. Given any sequence S = (x1,h⋆(x1)),...,(xm,h⋆(xm)), where m is an integer and h⋆ ∈ H, let MA(S) be the number of mistakes A makes on the sequence S. Then M(A,H) = supS MA(S).)
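
A direct implementation of the Halving algorithm on this class can help sanity-check the claim before proving it (a minimal sketch, not part of the exercise; representing hi,j by the pair (i,j) and trying only increasing 4-element sequences are assumptions of this sketch):

    from itertools import combinations

    d = 6
    X = range(1, d + 1)
    H = list(combinations(X, 2))          # h_ij is represented by the pair (i, j)

    def h(pair, x):
        return 1 if x in pair else 0      # h_ij(x) = 1 iff x is in {i, j}

    def halving_mistakes(target, xs):
        version_space = list(H)
        mistakes = 0
        for x in xs:
            votes = sum(h(p, x) for p in version_space)
            pred = 1 if 2 * votes > len(version_space) else 0   # majority vote
            truth = h(target, x)
            mistakes += (pred != truth)
            version_space = [p for p in version_space if h(p, x) == truth]
        return mistakes

    print(max(halving_mistakes(t, xs)     # worst case over these runs: 2
              for t in H for xs in combinations(X, 4)))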

3.   Interval growth function. The goal of this exercise is to compute the growth function of the interval hypothesis class H = {ha,b : a < b}, where ha,b(x) = 1 if x ∈ [a,b] and 0 otherwise. In other words, your goal is to give an explicit expression for ΠH(m) = maxC:|C|=m |HC|, where HC is the restriction of H to the set C.
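
Before deriving the formula, it can help to enumerate the labelings by brute force (a minimal sketch, not part of the exercise; it takes the m points to be the indices 0,...,m−1, which is without loss of generality for this class):

    def interval_growth(m):
        """Count the distinct 0-1 labelings that intervals realize on m ordered points."""
        labelings = set()
        for a in range(m + 1):            # the interval covers points with a <= index < b
            for b in range(a, m + 1):
                labelings.add(tuple(1 if a <= x < b else 0 for x in range(m)))
        return len(labelings)

    for m in range(1, 8):
        print(m, interval_growth(m))      # compare the output with your closed form in m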

4.   Sample complexity of agnostic PAC. Let H be a hypothesis class of functions from a domain X to {0,1} and let the loss function be the 0-1 loss. Assume that VCdim(H) = d < ∞. Show that there is a constant C > 0 such that for every ε,δ ∈ (0,1), if S is an i.i.d. sample of size

m ≥ (C/ε²)·(d·log(d/ε²) + log(1/δ)),

then

Pr[ ∀h ∈ H : |LS(h) − LD(h)| ≤ ε ] ≥ 1 − δ,

where LS(h) denotes the empirical 0-1 loss of h on S and LD(h) its expected loss.


To prove the above claim you can use the following lemma without proving it:

Lemma: Let a ≥ 1 and b > 0. Then: x ≥ 4a log(2a) + 2b → x ≥ a log(x) + b.

You can also assume that δ is as small as you desire.
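
For intuition on where the lemma enters (an illustrative sketch under assumed constants from one standard version of the VC bound, not necessarily the ones intended here): combining the double-sampling bound Pr[∃h ∈ H : |LS(h) − LD(h)| > ε] ≤ 4ΠH(2m)e^(−mε²/8) with Sauer's lemma ΠH(2m) ≤ (2em/d)^d, the requirement 4(2em/d)^d·e^(−mε²/8) ≤ δ rearranges to

m ≥ (8d/ε²)·log(m) + (8/ε²)·(d·log(2e/d) + log(4/δ)),

which is exactly the implicit form x ≥ a log(x) + b with a = 8d/ε². Taking δ small enough makes b positive, and the lemma then yields the explicit sufficient sample size m ≥ 4a log(2a) + 2b.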

5.  Prediction with Log-Loss. Consider a prediction setting with input X and true label Y ∈ {0,1} (i.e., it has two possible values: zero and one). The predictor h(x;θ) returns a number in [0,1] via the function

                                                                                          h(x;θ) = 1/(1 + e^(−θ(x))).                                                     (1)

Here θ(x) is a function of x which defines h. Note that this construction makes sure that h(x) ∈ [0,1].

Consider the loss function (this is an instance of the well-known cross-entropy loss, which we will learn more about):

                                                                    ∆(y,ŷ) = −y log(ŷ) − (1 − y) log(1 − ŷ)                                    (2)

Find the value of θ(x) that minimizes E[∆(Y,h(X;θ))]. Use this to express h(x) as a simple function of E[Y |X].
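
As a quick numeric check of the answer's shape (a minimal sketch, not part of the exercise; the value q = 0.3 is an arbitrary assumption): conditioned on X = x with q = E[Y |X = x], the expected loss is f(p) = −q log(p) − (1 − q) log(1 − p), and a grid search locates its minimizer.

    import numpy as np

    q = 0.3                                       # an arbitrary value of E[Y | X = x]
    p = np.linspace(1e-6, 1 - 1e-6, 100001)       # candidate predictions h(x) in (0, 1)
    f = -q * np.log(p) - (1 - q) * np.log(1 - p)  # E[Delta(Y, p) | X = x]
    print(p[np.argmin(f)])                        # prints ~0.3, i.e. the minimizer is q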



Programming Assignment
Submission Guidelines:

•     Download the file skeleton_perceptron.py from Moodle. In each of the following questions you should implement only the algorithm in the skeleton file. Plots, tables and any other artifacts should be submitted with the theoretical section.

•     In the file skeleton_perceptron.py there is a helper function. The function reads the examples labeled 0 and 8 and returns them with 0-1 labels. If you are unable to read the MNIST data with the provided script, you can download the file from here: https://github.com/amplab/datasciencesp14/blob/master/lab7/mldata/mnist-original.mat.

•     Your code should be written in Python 3.

•     Make sure to comment out or remove any code that halts execution, such as matplotlib popup windows.

•     Your submission should include exactly one file: perceptron.py.

1. (20 points) Perceptron. Implement the Perceptron algorithm (in the file named perceptron.py). Do not forget to normalize the samples to have unit length (i.e., ‖xi‖ = 1). A minimal illustrative sketch of such an implementation appears after part (d) below.

(a)    (8 points) Use only the first n = 5,10,50,100,500,1000,5000 samples as an input to Perceptron. For each n, run Perceptron 100 times, each time with a different random order of inputs, and calculate the accuracy of the classifier on the test set of each run. You should therefore have 100 accuracy measurements per n. Print a table showing, for each n, the mean accuracy across the 100 runs, as well as the 5% and 95% percentiles of the accuracies obtained.

(b)    (4 points) The weight vector w, returned by Perceptron, can be viewed as a matrix of weights with which we multiply each respective pixel in the input image. Run Perceptron on the entire training set, and show w as a 28×28 image, for example with imshow(reshape(image, (28, 28)), interpolation='nearest'). Give an intuitive interpretation of this image.

(c)     (4 points) Calculate the accuracy of the classifier trained on the full training set, applied on the test set.

(d)    (4 points) Choose one (or two) of the samples in the test set that were misclassified by Perceptron (trained on the full training set) and show it as an image (show the unscaled images). Can you explain why it was misclassified?
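
A minimal sketch of such an implementation (illustrative only; the names X_train, y_train, X_test, y_test stand in for whatever the skeleton's helper returns and are not part of its API):

    import numpy as np

    def perceptron(X, y):
        """X: (n, 784) samples; y: labels in {-1, +1}.
        If the helper returns 0-1 labels, map them first, e.g. y = 2 * y - 1."""
        X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length samples (assumes no all-zero rows)
        w = np.zeros(X.shape[1])
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:                  # mistake: update the separator
                w += y_i * x_i
        return w

    def accuracy(w, X_test, y_test):
        preds = np.where(X_test @ w >= 0, 1, -1)
        return float(np.mean(preds == y_test))

    # Part (a), schematically (placeholder data names, 100 random input orders):
    #   rng = np.random.default_rng(0)
    #   accs = []
    #   for _ in range(100):
    #       order = rng.permutation(len(X_train))
    #       accs.append(accuracy(perceptron(X_train[order], y_train[order]), X_test, y_test))
    #   print(np.mean(accs), np.percentile(accs, 5), np.percentile(accs, 95))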
