Project
• This project is about the ID3 decision tree learning algorithm.
• Obtain two or more classification datasets from https://archivebeta.ics.uci.edu/ml/datasets.
o It is up to you to choose whichever datasets you like, but choose them wisely.
o Make sure that at least one of the datasets you choose has at least one attribute with continuous values.
o Make sure that the target attribute (label) of at least one of the datasets you choose can have more than two possible values (not simply binary yes/no classification). For example, the instances in the wine datasets belong to one of three different classes.
o You will need to split the datasets into training sets and validation sets; make sure that there is enough data to do this.
• You are required to implement the ID3 algorithm yourself – do not use an existing implementation (or copy someone else’s work).
• Your implementation needs to support continuous-valued attributes.
• Experiment with your implementation on the datasets you have chosen and discuss your results.
• In your implementation make sure to include a method (whichever one you like) to deal with overfitting.
• Experiment with this overfitting countermeasure and discuss your results.
• If you need to, feel free to use any external libraries to help you import (read) the datasets. The datasets are plain text files, so reading them yourself shouldn’t be a big deal.
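
As a starting point for the continuous-attribute requirement above, the sketch below shows one common way to handle such attributes: sort the values, try the midpoint between each pair of adjacent distinct values as a binary split, and keep the one with the highest information gain. It is only an illustration under assumptions, not a required design; the function names entropy and best_threshold are hypothetical, and the midpoint rule is one of several reasonable choices.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_threshold(values, labels):
    """Find the best binary split `value <= threshold` on one continuous attribute.

    Returns (threshold, information_gain), or (None, 0.0) if no split
    improves on the unsplit data. Hypothetical helper for illustration.
    """
    # Sort examples by attribute value only, so labels are never compared.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Information gain = entropy before split minus weighted entropy after.
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain
```

For example, best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]) picks the threshold 2.5, which separates the two classes perfectly. The same entropy function works unchanged for labels with more than two possible values.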