Assignment-2 Decision Trees, Random Forests and Perceptron
Instructions
• Your submission should be a single zip file named 2020xxx_HW2.zip (where 2020xxx is your roll number). Include all files (code and the report with theory questions), arranged with proper names. Include a single .pdf report that explains your code and presents the results, relevant graphs, visualizations, and solutions to the theory questions. The submission should follow this structure:
  2020xxx_HW2
  |- code_rollno.py/.ipynb
  |- report_rollno.pdf
  |- (All other files for submission)
• Anything not in the report will not be graded.
• Your code should be neat and well-commented.
• You have to do either Section B or C.
• Section A is mandatory.
1. (4 points) Section A (Theoretical)
(b) (1 mark) Rahul decides to rely on a weather prediction app that claims to accurately forecast 'Rainy' and 'Clear' days. On any given day, the app predicts 'Rainy' with a probability of 0.3 (30 percent) and 'Clear' with a probability of 0.7 (70 percent). The app's accuracy for predicting 'Rainy' days is 80 percent, and its accuracy for predicting 'Clear' days is 90 percent. What is the probability that it is going to rain on a day, given that the app predicts 'Rainy'? Find the probability of all the possible outcomes and state the most likely outcome.

3. (15 points) Section B (Library Implementation)
Decision Tree and Random Forests
Perform a classification task on the heart disease dataset using only the relevant attributes mentioned on the repository website.
Dataset: Heart Disease - UCI Machine Learning Repository

4. (15 points) Section C (Algorithm implementation using packages)
1. Implement a Decision Tree from Scratch for Classification. Create a Decision Tree classifier from scratch using the NumPy and Pandas libraries. Design a class called MyDecisionTree (specifically for solving classification problems).
(a) cost_function(): Develop a cost function that could be either the Gini index or Information gain. This function should compute the impurity of a node or the gain achieved by a potential split.
    Gini index: $1 - \sum_k p_k^2$, where the proportion $p_k$ = (number of values in class $k$) / (count of rows).
(b) make_split(): Define the basic mechanism of splitting a node in the Decision Tree such that it selects the best feature and value to split on.
(c) max_depth(): Define the maximum depth of the tree.
(d) Pruning (optional): Define a function to remove branches that do not contribute much to improving accuracy.
(e) predict()
(f) score(): Create a function to evaluate the Decision Tree (performance metric).
Provide proper documentation (well-commented code).
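
As a rough illustration of the Gini index formula and the idea behind make_split() in Section C, here is a minimal sketch, not a reference solution. The helper names gini_impurity and best_threshold_split are hypothetical and are not part of the required MyDecisionTree interface. The sketch assumes a single numeric feature and a binary split of the form feature <= threshold; a full make_split() would repeat the scan over every feature.

```python
import numpy as np


def gini_impurity(labels):
    """Gini index of a node: 1 - sum(p_k ** 2) over the classes present."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0  # assumption: an empty node is treated as pure
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / labels.size  # p_k = class count / count of rows
    return float(1.0 - np.sum(proportions ** 2))


def best_threshold_split(feature, labels):
    """Hypothetical helper: scan candidate thresholds on ONE numeric feature.

    Returns (threshold, weighted_gini) for the split `feature <= threshold`
    with the lowest weighted Gini index, or (None, parent_gini) when no
    candidate split improves on the unsplit node.
    """
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    best_thr, best_cost = None, gini_impurity(labels)
    values = np.unique(feature)  # sorted distinct feature values
    # Candidate thresholds: midpoints between consecutive distinct values.
    for thr in (values[:-1] + values[1:]) / 2.0:
        left = labels[feature <= thr]
        right = labels[feature > thr]
        # Weight each child's impurity by its share of the rows.
        cost = (left.size * gini_impurity(left)
                + right.size * gini_impurity(right)) / labels.size
        if cost < best_cost:
            best_thr, best_cost = float(thr), float(cost)
    return best_thr, best_cost


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(gini_impurity(y))
    print(best_threshold_split(x, y))
```

On the toy data the two classes are perfectly separable, so the scan settles on the midpoint between 3 and 10. Scoring splits by weighted child impurity is one common choice; Information gain, the alternative named in (a), would replace gini_impurity with an entropy-based measure.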