ML Homework 2: Union of Intervals

Union of Intervals. In this question, we will study the hypothesis class of finite unions of disjoint intervals, and the properties of the ERM algorithm for this class.

To review, let the sample space be X = [0,1] and assume we study a binary classification problem, i.e. Y = {0,1}. We will try to learn using a hypothesis class that consists of k intervals. More explicitly, let I = {[l1,u1],...,[lk,uk]} be k disjoint intervals, such that 0 ≤ l1 ≤ u1 ≤ l2 ≤ u2 ≤ ... ≤ uk ≤ 1. For each such set of k disjoint

intervals, define the corresponding hypothesis as

    hI(x) = { 1   if x ∈ [l1,u1] ∪ ... ∪ [lk,uk]
            { 0   otherwise

Finally, define Hk as the hypothesis class that consists of all hypotheses that correspond to k disjoint intervals:

Hk = {hI|I = {[l1,u1],...,[lk,uk]}, 0 ≤ l1 ≤ u1 ≤ l2 ≤ u2 ≤ ... ≤ uk ≤ 1}

We note that Hk ⊆ Hk+1, since we can always take one of the intervals to be of length 0 by setting its endpoints to be equal. We are given a sample of size m: ⟨x1,y1⟩,...,⟨xm,ym⟩. Assume that the points are sorted, so that 0 ≤ x1 < x2 < ... < xm ≤ 1.
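For concreteness, evaluating hI on a point can be sketched as follows (a minimal illustration, not part of the assignment code; the name h_I and the list-of-pairs representation of I are our own choices):

```python
def h_I(intervals, x):
    """Return 1 if x falls inside any of the given closed intervals, else 0.

    `intervals` is a list of pairs [(l1, u1), ..., (lk, uk)] with
    0 <= l1 <= u1 <= ... <= uk <= 1, representing the set I.
    """
    return 1 if any(l <= x <= u for (l, u) in intervals) else 0
```

Note that an interval with equal endpoints (li = ui) still matches the single point x = li, which is why Hk ⊆ Hk+1 holds as claimed above.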



Explanation of intervals.py:

The file intervals.py includes a function that implements an ERM algorithm for Hk. Given a sorted list xs = [x1,...,xm], the respective labeling ys = [y1,...,ym] and k, the given function find_best_interval returns a list of up to k intervals and their error count on the given sample. These intervals have the smallest empirical error count possible over all choices of at most k intervals.

Note that in sections (d)-(f) you will need to use this function for large values of m. Execution in these cases could take time (more than 10 minutes per experiment), so plan ahead.


(a) Assume that the true distribution P[x,y] = P[y|x] · P[x] is: x is distributed uniformly on the interval [0,1], and

    P[y = 1|x] = { 0.8   if x ∈ [0,0.2] ∪ [0.4,0.6] ∪ [0.8,1]
                 { 0.1   if x ∈ [0.2,0.4] ∪ [0.6,0.8]

and P[y = 0|x] = 1 − P[y = 1|x].

Write a function that draws m pairs of (x,y) according to the distribution P. Use it to draw a sample of size m = 100 and create a plot:

i.        Plot the points and their labels (set the y-axis range to [−0.1, 1.1] for clarity of presentation).

ii.      Mark the lines x = 0.2,0.4,0.6,0.8 clearly on the plot.

iii.    Run the find_best_interval function on your sample with k = 3, and plot the intervals clearly.
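The sampling step in part (a) can be sketched as follows (the function name sample_from_D and the use of NumPy are our own choices; the distribution itself is the one defined above):

```python
import numpy as np

def sample_from_D(m, rng=None):
    """Draw m labeled pairs (x, y) from the distribution of part (a).

    x ~ Uniform[0, 1]; given x, y ~ Bernoulli(p(x)) with p(x) = 0.8 on
    [0, 0.2] U [0.4, 0.6] U [0.8, 1] and p(x) = 0.1 elsewhere.
    Returns xs sorted in increasing order, with ys matched to xs,
    as find_best_interval expects a sorted sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.sort(rng.uniform(0.0, 1.0, size=m))
    in_high = (xs <= 0.2) | ((xs >= 0.4) & (xs <= 0.6)) | (xs >= 0.8)
    p = np.where(in_high, 0.8, 0.1)
    ys = (rng.uniform(size=m) < p).astype(int)
    return xs, ys
```

Sorting before labeling is harmless here because y depends on x only through p(x), not on the position of x in the sample.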

(b)    Note that here, we know the true distribution P, so for every given hypothesis h ∈ Hk, we can calculate error(h) precisely. What is the hypothesis with the smallest error?

(c)    Write a function that, given a list of intervals, calculates the true error of the respective hypothesis. Then, for k = 3 and m = 10,15,20,...,100, perform the following experiment T = 100 times: (i) Draw a sample of size m and run the ERM algorithm on it; (ii) Calculate the empirical error of the returned hypothesis; (iii) Calculate the true error of the returned hypothesis. Plot the empirical and true errors, averaged across the T runs, as a function of m. Discuss the results. Do the empirical and true errors decrease or increase in m? Why?
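Since x is uniform on [0,1], the true error in part (c) reduces to measuring interval overlaps. One way to sketch it (names HIGH, LOW, overlap, and true_error are our own; the region probabilities come from the distribution in part (a)):

```python
# Regions of [0,1] by conditional label probability, per part (a):
HIGH = [(0.0, 0.2), (0.4, 0.6), (0.8, 1.0)]   # P[y=1|x] = 0.8 here
LOW = [(0.2, 0.4), (0.6, 0.8)]                # P[y=1|x] = 0.1 here

def overlap(a, b):
    """Total length of the intersection of interval a with the region list b."""
    return sum(max(0.0, min(a[1], u) - max(a[0], l)) for (l, u) in b)

def true_error(intervals):
    """e_P(h_I) = P(h_I(x) != y) under the part-(a) distribution."""
    # Lengths where h predicts 1, split by region:
    len1_high = sum(overlap(iv, HIGH) for iv in intervals)
    len1_low = sum(overlap(iv, LOW) for iv in intervals)
    # Where h predicts 0 is the complement within each region:
    len0_high = 0.6 - len1_high   # HIGH regions have total length 0.6
    len0_low = 0.4 - len1_low     # LOW regions have total length 0.4
    # On HIGH: predicting 1 errs w.p. 0.2, predicting 0 errs w.p. 0.8.
    # On LOW:  predicting 1 errs w.p. 0.9, predicting 0 errs w.p. 0.1.
    return 0.2 * len1_high + 0.8 * len0_high + 0.9 * len1_low + 0.1 * len0_low
```

As a sanity check, the hypothesis whose intervals are exactly the HIGH regions attains error 0.2 · 0.6 + 0.1 · 0.4 = 0.16, which should match your answer to (b).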

(d)    Draw a data set of m = 1500 samples. Find the best ERM hypothesis for k = 1,2,...,20, and plot the empirical and true errors as a function of k. How does the error behave? Define k∗ to be the k with the smallest empirical error for ERM. Does this mean the hypothesis with k∗ intervals is a good choice?

(e)    Now we will use the principle of structural risk minimization (SRM), to search for a k that gives good test error. Let δ = 0.1:

•    Use your results from question 3 in the theoretical part and the VC confidence bound to construct a penalty function.

•    Draw a data set of m = 1500 samples, run the experiment in (d) again, but now plot two additional lines as a function of k: 1) the penalty for the best ERM hypothesis and 2) the sum of penalty and empirical error.

•    What is the best value for k in each case? Is it better than the one you chose in (d)?
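The penalty in part (e) should come from your own theoretical answer; as an illustrative sketch only, assuming VCdim(Hk) = 2k and one common form of the VC confidence bound (substitute your exact constants), it might look like:

```python
import math

def srm_penalty(k, m, delta=0.1):
    """Illustrative VC-type penalty for H_k on a sample of size m.

    Assumes VCdim(H_k) = 2k and uses one common form of the VC
    confidence bound, 2 * sqrt((d + ln(2/delta)) / m); replace this
    with the exact bound from theoretical question 3.
    """
    d = 2 * k  # assumed VC dimension of H_k
    return 2.0 * math.sqrt((d + math.log(2.0 / delta)) / m)
```

Whatever exact form you use, the penalty should grow with k and shrink with m, so that the penalized objective (empirical error + penalty) trades off fit against complexity.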

(f) Here we will use holdout validation to search for a k that gives good test error. Draw a data set of m = 1500 samples and use 20% of it as a holdout set for validation. Choose the best hypothesis based on 3 experiments. Discuss how close this gets you to finding the hypothesis with optimal true error.
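One detail to watch in part (f): find_best_interval expects a sorted sample, so after a random split each part must be re-sorted. A sketch of such a split (the name holdout_split is our own choice):

```python
import numpy as np

def holdout_split(xs, ys, holdout_frac=0.2, rng=None):
    """Randomly split a sorted sample into train/holdout parts.

    Index arrays are re-sorted after the random permutation so that
    each part stays sorted by x, as find_best_interval requires.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(xs)
    idx = rng.permutation(m)
    n_hold = int(holdout_frac * m)
    hold, train = np.sort(idx[:n_hold]), np.sort(idx[n_hold:])
    return (xs[train], ys[train]), (xs[hold], ys[hold])
```

You would then run ERM on the training part for each k and pick the k whose hypothesis has the smallest error on the holdout part.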
