0. Work through the 03-DTrees notebook.
1. Consider the following dataset, which contains examples describing several cases of sunburn:
     Name   Hair    Height   Build    Lotion  Result
  1  Sarah  blonde  average  light    no      sunburned
  2  Dana   blonde  tall     average  yes     none
  3  Alex   brown   short    average  yes     none
  4  Annie  blonde  short    average  no      sunburned
  5  Emily  red     average  heavy    no      sunburned
  6  Pete   brown   tall     heavy    no      none
  7  John   brown   average  heavy    no      none
  8  Katie  brown   short    light    yes     none
a) What is the entropy of this dataset with respect to the target class label Result?
b) Construct the decision tree that would be built with Information Gain for this dataset. Show your work for selection of the root feature in your tree. You can infer the rest of the tree from the data.
c) Using your decision tree from (b), how would you classify the following example X?
     Hair    Height   Build  Lotion  Result
  X  blonde  average  heavy  no      ???
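To check hand calculations for question 1, the following is a minimal sketch of entropy and Information Gain in plain Python. The helper names `entropy` and `information_gain` and the tuple layout of `ROWS` are assumptions made for this illustration; they are not taken from the 03-DTrees notebook.

```python
from collections import Counter
from math import log2

# The sunburn dataset from question 1, one tuple per example.
# Column order: Hair, Height, Build, Lotion, Result (the target label is last).
FEATURES = ("Hair", "Height", "Build", "Lotion")
ROWS = [
    ("blonde", "average", "light",   "no",  "sunburned"),
    ("blonde", "tall",    "average", "yes", "none"),
    ("brown",  "short",   "average", "yes", "none"),
    ("blonde", "short",   "average", "no",  "sunburned"),
    ("red",    "average", "heavy",   "no",  "sunburned"),
    ("brown",  "tall",    "heavy",   "no",  "none"),
    ("brown",  "average", "heavy",   "no",  "none"),
    ("brown",  "short",   "light",   "yes", "none"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the target minus the weighted entropy after splitting on one column."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


print(f"Entropy of Result: {entropy([row[-1] for row in ROWS]):.4f}")

gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
```

The feature with the largest gain is the one Information Gain would place at the root. The same two functions can be reused on any subset of the rows to work out the lower levels of the tree.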
2. Consider the following dataset:
      Credit History  Debt  Income  Risk
   1  bad             low   0to30   high
   2  bad             high  30to60  high
   3  bad             low   0to30   high
   4  unknown         high  30to60  high
   5  unknown         high  0to30   high
   6  good            high  0to30   high
   7  bad             low   over60  medium
   8  unknown         low   30to60  medium
   9  good            high  30to60  medium
  10  unknown         low   over60  low
  11  unknown         low   over60  low
  12  good            low   over60  low
  13  good            high  over60  low
  14  good            high  over60  low
a) What is the entropy of this dataset with respect to the target class label Risk based on the 14 examples above?
b) Compute the entropy of each of the 3 descriptive features.
c) Which one of the descriptive features would be selected by ID3 at the root of a decision tree? Explain your answer. Show all the steps of the calculations.
3. For the datasets analysed in the 03-DTrees notebook, will the resulting trees be different if the feature selection criterion is ‘gini’ instead of ‘entropy’?
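One way to explore question 3 is to fit the same data twice, once per criterion, and compare the resulting trees. The sketch below is a rough starting point, not the notebook's code. It assumes sklearn's bundled iris data as a stand-in, because the notebook's own datasets are not reproduced here; swap in the notebook's `X` and `y` to answer the question properly.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris is only a placeholder for the notebook's datasets.
data = load_iris()
X, y = data.data, data.target

trees = {}
for criterion in ("entropy", "gini"):
    # A fixed random_state keeps tie-breaking between equally good splits repeatable.
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    trees[criterion] = clf
    print(f"{criterion}: depth={clf.get_depth()}, leaves={clf.get_n_leaves()}")

# Render both trees as text so that they can be compared line by line.
texts = {
    name: export_text(clf, feature_names=list(data.feature_names))
    for name, clf in trees.items()
}
same_structure = texts["entropy"] == texts["gini"]
print("Identical text rendering:", same_structure)
```

Printing `texts["entropy"]` and `texts["gini"]` side by side shows where the two criteria choose different features or thresholds, if they do.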
4. If a decision tree is allowed to grow too bushy, it is likely to overfit the training data. Consequently, decision trees are often pruned to prevent overfitting.
In the Penguins example in the 03-DTrees notebook, we use the min_samples_leaf parameter to control the size of the tree.
a) What does the Penguins tree look like when no pruning is enforced?
b) What other options does sklearn provide to manage the bushiness of the tree? (Check out the documentation)
c) Use two other pruning strategies to produce similar trees.
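As a starting point for question 4, the sketch below fits the same data under several of the size controls that sklearn's `DecisionTreeClassifier` accepts and reports the depth and leaf count of each tree. It assumes the iris data as a stand-in for the Penguins data, and the parameter values are arbitrary illustrations that would need tuning to produce trees similar to the one in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the Penguins data used in the notebook.
X, y = load_iris(return_X_y=True)

# Each entry is one way of limiting tree growth; the values are illustrative only.
settings = {
    "unpruned": {},
    "min_samples_leaf": {"min_samples_leaf": 5},
    "max_depth": {"max_depth": 3},
    "max_leaf_nodes": {"max_leaf_nodes": 4},
    "min_impurity_decrease": {"min_impurity_decrease": 0.01},
    "ccp_alpha": {"ccp_alpha": 0.01},  # cost-complexity post-pruning
}

sizes = {}
for name, params in settings.items():
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0, **params)
    clf.fit(X, y)
    sizes[name] = (clf.get_depth(), clf.get_n_leaves())
    print(f"{name:22s} depth={sizes[name][0]}  leaves={sizes[name][1]}")
```

Comparing each row with the unpruned tree gives a quick feel for how strongly a setting restricts growth. `sklearn.tree.plot_tree` can then be used to inspect the candidates that look closest to the target tree.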
5. Download the zip file ‘03-BYO DTree-Python focus’. It contains a notebook that takes you through building your own Decision Tree classifier in Python with a significant focus on writing good Python code. Work through this notebook.