1. (60 points) For this problem, you will do LFD Problem 4.4, parts (a) through (d), with some changes, help, instructions, and requirements. First, you can find headers for all the code you need to implement in your SVN repository for the class. There is also a MATLAB script called runexpts.m, which you can use as an example of how to run your code to return the results we want. Second, read Problem 4.3 carefully. You can (and will need to) use the recurrence defined there as well as the formula in 4.3(e).
(a) In addition to answering the question about why we need to normalize f, also prove that the term to normalize by is $\sqrt{\sum_{q=0}^{Q_f} \frac{1}{2q+1}}$ (hint: use the formula in 4.3(e)).
(b) Answer the question. For your implementation, we suggest you use glmfit with the additional options 'normal', 'constant', 'off'.
(c) Answer the question (hint: use the formula in 4.3(e)).
(d) Implement the framework and answer the questions, with the modification that you only need to look at $Q_f \in \{5, 10, 15, 20\}$, $N \in \{40, 80, 120\}$, $\sigma^2 \in \{0, 0.5, 1.0, 1.5, 2.0\}$. Compute both the median and the mean of the overfit measure applied to many (at least 500) different datasets for each choice of parameters, and report how these measures vary as a function of the complexity of the true hypothesis, the number of training examples, and the level of stochastic noise (use line graphs). Explain your observations, and also comment on the differences you observe between the mean and median measures.
Here are some potentially useful notes and hints for this:
• You will be graded on your writeup. Correctness of the code in itself does not count for credit, but we may look at and examine your code manually if needed.
• You should use your judgment in selecting which graphs to show in support of your answers and explanations. There are different acceptable ways to do this. For example, you could include 3-6 graphs, selected to show what you think is most interesting or relevant. For each one, you could hold one variable constant and plot different lines for a second variable, while putting the third one on the X axis. Alternatively, you could explore heatmaps, colormaps, or colorbars.
• Do not use the MATLAB built-in functions related to Legendre polynomials – those compute something different from what we are looking for.
• You may use or modify runexpts.m as you see fit. It's meant to provide an example of how you could do things, not to be the last word on the issue.
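As a rough illustration of how one experiment in part (d) might be organized, here is a minimal sketch. It is written in Python with NumPy purely for illustration, not in MATLAB, and it is not the class's reference code or a substitute for runexpts.m. Several things in it are assumptions rather than statements from this handout: the function names (`legendre_features`, `random_target`, `excess_error`, `overfit_measure`) are hypothetical; the recurrence is the standard three-term Legendre recurrence; inputs are assumed uniform on [-1, 1]; the rescaling of the target assumes E[L_q(x)^2] = 1/(2q+1) under that distribution; and the overfit measure is assumed to be the out-of-sample error of a degree-10 fit minus that of a degree-2 fit, as in LFD Problem 4.4.

```python
import numpy as np


def legendre_features(x, degree):
    """Matrix with one row per point; column k holds L_k(x)."""
    x = np.asarray(x, dtype=float)
    Z = np.ones((x.size, degree + 1))          # L_0(x) = 1
    if degree >= 1:
        Z[:, 1] = x                            # L_1(x) = x
    for k in range(2, degree + 1):
        # L_k(x) = ((2k-1)/k) x L_{k-1}(x) - ((k-1)/k) L_{k-2}(x)
        Z[:, k] = ((2 * k - 1) * x * Z[:, k - 1] - (k - 1) * Z[:, k - 2]) / k
    return Z


def random_target(q_f, rng):
    """Standard-normal coefficients, rescaled (assumption: the expected
    value of f(x)^2 over coefficients and uniform x then equals 1)."""
    a = rng.standard_normal(q_f + 1)
    return a / np.sqrt(np.sum(1.0 / (2 * np.arange(q_f + 1) + 1)))


def excess_error(w, a):
    """E_x[(g(x) - f(x))^2] for uniform x, from Legendre coefficients."""
    n = max(w.size, a.size)
    d = np.zeros(n)
    d[:w.size] += w
    d[:a.size] -= a
    return np.sum(d ** 2 / (2 * np.arange(n) + 1))


def overfit_measure(q_f, n, sigma2, rng):
    """One dataset: error of the degree-10 fit minus error of the degree-2 fit.
    The noise variance adds equally to both errors, so it cancels here."""
    a = random_target(q_f, rng)
    x = rng.uniform(-1.0, 1.0, n)
    y = legendre_features(x, q_f) @ a + np.sqrt(sigma2) * rng.standard_normal(n)
    errors = []
    for degree in (2, 10):
        # Least squares with no separate intercept: column 0 is already L_0 = 1.
        w, *_ = np.linalg.lstsq(legendre_features(x, degree), y, rcond=None)
        errors.append(excess_error(w, a))
    return errors[1] - errors[0]


rng = np.random.default_rng(0)
runs = np.array([overfit_measure(10, 40, 0.5, rng) for _ in range(500)])
print("mean:", np.mean(runs), "median:", np.median(runs))
```

Looping this over the parameter grid and storing the mean and median for each setting would give the values to plot. Fitting in the Legendre feature space without an extra intercept column mirrors the suggestion to turn the constant term off in glmfit.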
2. (20 points) LFD Problem 5.4
3. (20 points) You have been hired by a biologist to learn a decision tree to determine whether a mushroom is poisonous. You have been given the following data:
Color    Stripes   Texture   Poisonous?
Purple   No        Smooth    No
Purple   No        Rough     No
Red      Yes       Smooth    No
Purple   Yes       Rough     Yes
Purple   Yes       Smooth    Yes
Use ID3 to learn a decision tree from the data (this is a written exercise – no need to code it up):
(a) What is the root attribute of the tree? Show the computations.
(b) Draw the decision tree obtained using ID3.
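Although this is a written exercise, the hand computations can be checked with a small script. The following Python sketch assumes the usual ID3 criterion: at each node, pick the attribute with the largest information gain, where entropy is measured in bits. The helper names (`entropy`, `information_gain`) and the row encoding are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("Color", "Stripes", "Texture")

# Each row: (Color, Stripes, Texture, Poisonous?), copied from the data above.
ROWS = [
    ("Purple", "No", "Smooth", "No"),
    ("Purple", "No", "Rough", "No"),
    ("Red", "Yes", "Smooth", "No"),
    ("Purple", "Yes", "Rough", "Yes"),
    ("Purple", "Yes", "Smooth", "Yes"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, index):
    """Entropy of the labels minus the weighted entropy after splitting
    on the attribute stored at position `index` of each row."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[index] for row in rows}:
        subset = [row[-1] for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


for i, name in enumerate(ATTRIBUTES):
    print(f"{name}: gain = {information_gain(ROWS, i):.3f}")
```

Applying the same function recursively to the rows in each branch would check the remaining splits for part (b).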