1. Build a decision tree, and then classify the test set using it
3. Requirements
The program must meet the following requirements:
● Execution file name: dt.exe
● Execute the program with three arguments: training file name, test file name, output file name
  ■ Example:
    - Training file name=‘dt_train.txt’, test file name=‘dt_test.txt’, output file name=‘dt_result.txt’
    - If using Python, you are allowed to use 'dt.py' instead of 'dt.exe'.
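For a Python submission, the three-argument requirement above could be handled roughly as follows. This is only a sketch: the helper name parse_args and the usage message are illustrative assumptions, not part of the assignment.

```python
import sys


def parse_args(argv):
    """Return (training file, test file, output file) from a command line.

    parse_args is a hypothetical helper name; argv is expected to look like
    sys.argv, i.e. the program name followed by exactly three arguments.
    """
    if len(argv) != 4:
        raise SystemExit(
            f"usage: {argv[0]} <training file> <test file> <output file>"
        )
    return argv[1], argv[2], argv[3]


# Equivalent to running: python dt.py dt_train.txt dt_test.txt dt_result.txt
train_path, test_path, output_path = parse_args(
    ["dt.py", "dt_train.txt", "dt_test.txt", "dt_result.txt"]
)
print(train_path, test_path, output_path)
```

In the real program, the call would be parse_args(sys.argv) instead of the hard-coded list.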
● Dataset
  ■ We provide you with 2 datasets
    - Buy_computer: dt_train.txt, dt_test.txt
    - Car_evaluation: dt_train1.txt, dt_test1.txt
  ■ Your program must be able to handle any dataset in this format
  ■ We will evaluate your program with other datasets.
● File format for a training set
[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
  ■ [attribute_name_1] ~ [attribute_name_n]: n attribute names
  ■ [attribute_1] ~ [attribute_n-1]
    - n-1 attribute values of the corresponding tuple
    - All the attributes are categorical (not continuous-valued)
  ■ [attribute_n]: a class label that the corresponding tuple belongs to
  ■ Example 1 (dt_train.txt):
Figure 1. An example of the first training set.
  ■ Example 2 (dt_train1.txt):
Figure 2. An example of the second training set.
    - Title: car evaluation database
    - Attribute values
      ● Buying: vhigh, high, med, low
      ● Maint: vhigh, high, med, low
      ● Doors: 2, 3, 4, 5more
      ● Persons: 2, 4, more
      ● Lug_boot: small, med, big
      ● Safety: low, med, high
    - Class labels: unacc, acc, good, vgood
    - Number of instances: training set - 1,382; test set - 346
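The requirements allow information gain, gain ratio, or the Gini index as the attribute selection measure. As a minimal sketch of the first option, the code below computes the information gain of each categorical attribute. The function names and the four-tuple toy data are made up for illustration and are not taken from the provided datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attr_index):
    """Entropy of the labels minus the weighted entropy after splitting
    the tuples on the categorical attribute at position attr_index."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(
        (len(part) / total) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Toy data (an assumption for illustration): two attributes, four tuples.
rows = [["youth", "high"], ["youth", "low"], ["senior", "high"], ["senior", "low"]]
labels = ["no", "no", "yes", "yes"]

gains = [information_gain(rows, labels, i) for i in range(2)]
best = max(range(2), key=lambda i: gains[i])
print(gains, best)
```

In this toy data the first attribute separates the classes perfectly and the second not at all, so the first attribute would be chosen for the split. Gain ratio and the Gini index follow the same partition-then-score pattern with a different scoring function.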
● Attribute selection measure: information gain, gain ratio, or gini index
● File format for a test set
[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n-1]\n
[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n
[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n
[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n
  ■ The test set does not have [attribute_name_n] (class label)
  ■ Example 1 (dt_test.txt):
Figure 3. An example of the first test set.
  ■ Example 2 (dt_test1.txt):
Figure 4. An example of the second test set.
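Both the training and the test file share the same tab-separated layout: one header line of attribute names, then one tuple per line. A reader along the following lines could serve both. The names read_table and split_labels are illustrative assumptions, and the sketch assumes UTF-8 text with no tab characters inside values.

```python
def read_table(path):
    """Read a tab-separated file whose first line holds the attribute names.

    Returns (header, rows); blank lines are skipped and row order is kept.
    """
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\r\n") for line in f if line.strip()]
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


def split_labels(rows):
    """For a training set only: separate the last column (the class label)
    from the n-1 attribute values of each tuple."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels
```

For a training file, split_labels would be applied to the rows returned by read_table. For a test file the rows are already the n-1 attribute values, so no split is needed.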
● Output file format
[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
[attribute_1]\t[attribute_2]\t … [attribute_n]\n
  ■ Output file name: dt_result.txt (for the 1st dataset), dt_result1.txt (for the 2nd dataset)
  ■ You must print the following values:
    - [attribute_1] ~ [attribute_n-1]: given attribute values in the test set
    - [attribute_n]: a class label predicted by your model for the corresponding tuple
  ■ Please DO NOT CHANGE the order of the tuples in each test set.
    - You should print your outputs to match the order of correct answers.
  ■ Please be sure to use \t to separate your attributes.
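The output rules above (header line first, tab-separated values, predicted label appended as the last column, test-set order preserved) could be met with a writer like this sketch. The function name write_predictions and its parameters are assumptions; predictions is taken to be a list holding one predicted label per test tuple, in the same order as the test set.

```python
def write_predictions(path, test_header, class_name, test_rows, predictions):
    """Write the test tuples with their predicted class label appended.

    test_header: the n-1 attribute names from the test file
    class_name:  the name of the class column (last name in the training header)
    test_rows:   the test tuples, in their original order
    predictions: one predicted label per tuple, in the same order
    """
    # newline="" keeps "\n" from being translated to "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("\t".join(test_header + [class_name]) + "\n")
        for row, label in zip(test_rows, predictions):
            f.write("\t".join(row + [label]) + "\n")
```

Iterating over the test rows and predictions together, without sorting or grouping, is what keeps the output aligned with the order of the correct answers.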