ITE4005 Programming Assignment 2

1.  Build a decision tree, and then classify the test set using it 

 

3. Requirements
The program must meet the following requirements:

- Execution file name: dt.exe
- Execute the program with three arguments: training file name, test file name, output file name
    - Example: training file name = 'dt_train.txt', test file name = 'dt_test.txt', output file name = 'dt_result.txt'
    - If using Python, you may use a 'dt.py' file instead of 'dt.exe'.
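The three-argument requirement can be sketched in Python as follows. This is only a minimal illustration, not the assignment's reference solution: the helper name `parse_args` and the usage message are assumptions.

```python
def parse_args(argv):
    """Split an argv-style list into (training file, test file, output file)."""
    if len(argv) != 4:
        # argv[0] is the script name, so three arguments mean four entries.
        raise SystemExit(f"usage: {argv[0]} <training file> <test file> <output file>")
    return argv[1], argv[2], argv[3]


# Example with the file names from the requirements.
# A real dt.py would instead call parse_args(sys.argv) after `import sys`.
train_path, test_path, output_path = parse_args(
    ["dt.py", "dt_train.txt", "dt_test.txt", "dt_result.txt"]
)
print(train_path, test_path, output_path)
```

Failing early on a wrong argument count keeps the rest of the program free of checks for missing file names.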

- Dataset
    - We provide you with 2 datasets
        - Buy_computer: dt_train.txt, dt_test.txt
        - Car_evaluation: dt_train1.txt, dt_test1.txt
    - Your program must be able to deal with any dataset
    - We will evaluate your program with other datasets.

- File format for a training set

[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

    - [attribute_name_1] ~ [attribute_name_n]: n attribute names
    - [attribute_1] ~ [attribute_n-1]
        - n-1 attribute values of the corresponding tuple
        - All the attributes are categorical (not continuous-valued)
    - [attribute_n]: a class label that the corresponding tuple belongs to
    - Example 1 (dt_train.txt):

 

Figure 1. An example of the first training set.

 

    - Example 2 (dt_train1.txt):

  

Figure 2. An example of the second training set. 

 

        - Title: car evaluation database
        - Attribute values
            - Buying: vhigh, high, med, low
            - Maint: vhigh, high, med, low
            - Doors: 2, 3, 4, 5more
            - Persons: 2, 4, more
            - Lug_boot: small, med, big
            - Safety: low, med, high
        - Class labels: unacc, acc, good, vgood
        - Number of instances: 1,382 in the training set; 346 in the test set
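A training file in this format could be read along the following lines. This is a hedged sketch: the helper names `read_table` and `split_labels` are assumptions, and the two-row sample is made up for illustration rather than taken from the provided datasets.

```python
import io


def read_table(stream):
    """Return (attribute names, tuples) from a tab-separated text stream."""
    lines = [line.rstrip("\r\n") for line in stream]
    # Skip blank lines, then split every remaining line on tabs.
    rows = [line.split("\t") for line in lines if line.strip()]
    return rows[0], rows[1:]


def split_labels(names, rows):
    """Separate the n-1 attribute columns from the class label column."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return names[:-1], names[-1], features, labels


# Made-up sample, not taken from the provided datasets.
sample = io.StringIO("outlook\twindy\tplay\nsunny\tno\tyes\nrainy\tyes\tno\n")
names, rows = read_table(sample)
attribute_names, class_name, features, labels = split_labels(names, rows)
print(attribute_names, class_name, features, labels)
```

With a real file, `read_table` can be given the object returned by `open(path, encoding="utf-8")`. Because nothing in the reader depends on specific attribute names or value sets, the same code should work for any dataset in this format.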

- Attribute selection measure: information gain, gain ratio, or Gini index
- File format for a test set

[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n-1]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n-1]\n 

    - The test set does not have [attribute_name_n] (class label)
    - Example 1 (dt_test.txt):

 

Figure 3. An example of the first test set.

    - Example 2 (dt_test1.txt):

  

Figure 4. An example of the second test set. 
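The attribute selection measures named in the requirements (information gain, gain ratio, Gini index) can be sketched for categorical attributes as below. The function names are assumptions, and the Gini variant shown is a multiway weighted-impurity reduction, which is a simplification of the binary-split form used in CART. The four-tuple sample is made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def partition(rows, labels, index):
    """Group class labels by the value each tuple has at column `index`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[index], []).append(label)
    return groups


def information_gain(rows, labels, index):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = partition(rows, labels, index).values()
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


def gain_ratio(rows, labels, index):
    """Information gain normalized by the split information."""
    total = len(labels)
    groups = partition(rows, labels, index).values()
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups)
    if split_info == 0:
        return 0.0  # the attribute has a single value, so it cannot split
    return information_gain(rows, labels, index) / split_info


def gini_reduction(rows, labels, index):
    """Gini impurity before the split minus the weighted impurity after it."""
    total = len(labels)
    groups = partition(rows, labels, index).values()
    return gini(labels) - sum(len(g) / total * gini(g) for g in groups)


# Made-up sample: column 0 predicts the label perfectly, column 1 not at all.
rows = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
labels = ["yes", "yes", "no", "no"]
best = max(range(2), key=lambda i: information_gain(rows, labels, i))
print(best)
```

Whichever measure is chosen, tree construction picks the attribute with the highest score at each node and recurses on the resulting partitions.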

 

- Output file format

[attribute_name_1]\t[attribute_name_2]\t … [attribute_name_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

[attribute_1]\t[attribute_2]\t … [attribute_n]\n 

    - Output file name: dt_result.txt (for the 1st dataset), dt_result1.txt (for the 2nd dataset)
    - You must print the following values:
        - [attribute_1] ~ [attribute_n-1]: given attribute values in the test set
        - [attribute_n]: a class label predicted by your model for the corresponding tuple
    - Please DO NOT CHANGE the order of the tuples in each test set.
        - Print your outputs in the same order so that they line up with the correct answers.
    - Please be sure to use \t to separate your attributes.
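Producing the output file in this format might look like the sketch below. The helper names `format_predictions` and `write_predictions` are assumptions, and the sample rows and predicted labels are made up. The class-label column name has to come from the training file header, since the test file does not contain it.

```python
def format_predictions(names, test_rows, predicted_labels):
    """Build the output text: header line, then one tab-separated line per tuple."""
    lines = ["\t".join(names)]
    # zip keeps the tuples in the same order as they appear in the test set.
    for row, label in zip(test_rows, predicted_labels):
        lines.append("\t".join(list(row) + [label]))
    return "\n".join(lines) + "\n"


def write_predictions(path, names, test_rows, predicted_labels):
    """Write the formatted predictions to `path`."""
    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(format_predictions(names, test_rows, predicted_labels))


# Made-up sample, not taken from the provided datasets.
text = format_predictions(
    ["outlook", "windy", "play"],
    [["sunny", "no"], ["rainy", "yes"]],
    ["yes", "no"],
)
print(text, end="")
```

Keeping the formatting separate from the file write makes it easy to check the exact tab and newline layout before anything touches the disk.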

 
