SML Assignment 5: Decision Trees, Bagging and Random Forest
❖ Dataset:
The attached dataset is about PM2.5.
Training Data: Use two alternate (non-consecutive) years as training data.
Testing Data: Use two of the remaining three years as test data.
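The split described above can be sketched as follows. This is a minimal illustration, not the required method: the `years` array and the specific year values are hypothetical placeholders, since the dataset's column names and year range are not stated here.

```python
import numpy as np

def split_by_year(years, train_years, test_years):
    """Return boolean row masks for the training and test years."""
    years = np.asarray(years)
    train_mask = np.isin(years, train_years)
    test_mask = np.isin(years, test_years)
    return train_mask, test_mask

# Hypothetical year column for a five-year dataset (values are illustrative only).
years = np.array([2010, 2010, 2011, 2012, 2012, 2013, 2014, 2014])

# Two alternate years for training, two of the remaining three for testing.
train_mask, test_mask = split_by_year(years, train_years=[2010, 2012],
                                      test_years=[2011, 2013])
```

Keeping the split as masks makes it easy to apply the same selection to both the feature matrix and each target column.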
❖ Problem Statement:
Perform two tasks on this dataset: classification and regression.
1. Classification Task: Target Column: “Month”
Evaluation Metric: Accuracy
2. Regression Task: Target Column: “PM2.5”
Evaluation Metric: Mean squared error (MSE)
Also, report the mean and standard deviation of the error.
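The metrics above might be computed as in the sketch below. It assumes that "error" means the signed residual (true value minus prediction); if the course defines it differently (for example, as the absolute or squared error), the last two quantities would change accordingly. The function names are illustrative.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def regression_report(y_true, y_pred):
    """MSE plus mean and standard deviation of the signed residuals."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "mse": float(np.mean(err ** 2)),
        "mean_error": float(np.mean(err)),
        "std_error": float(np.std(err)),   # population std (ddof=0), an assumption
    }
```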
Implement the above problem statement (for both classification and regression) from scratch using the following:
i. Decision Trees (DT) (Analyze, as discussed in class, how different depths, widths, and other tree parameters affect the results, and draw your own inferences.)
ii. Bagged Decision Trees
iii. Random Forest
Implement these as taught in class.
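One possible from-scratch sketch of the three methods for the classification task is shown below. It is an assumption-laden illustration, not the version taught in class: it uses Gini impurity, midpoint thresholds, bootstrap sampling for bagging, and a random feature subset at each node for the random forest. All function names and the `max_features` parameter are hypothetical. For regression, the same structure would apply with variance in place of Gini and the mean in place of the majority vote.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def majority(y):
    """Most frequent label (ties go to the smallest label)."""
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def best_split(X, y, feature_ids):
    """Return (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(y)
    for j in feature_ids:
        u = np.unique(X[:, j])
        for t in (u[:-1] + u[1:]) / 2:          # midpoints between sorted values
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, max_depth, max_features=None, rng=None, depth=0):
    """Grow a tree; max_features=None considers every feature at each node."""
    if depth == max_depth or len(np.unique(y)) == 1:
        return majority(y)                       # leaf node
    n_features = X.shape[1]
    if max_features is None:
        feature_ids = np.arange(n_features)
    else:                                        # random forest: random subset
        feature_ids = rng.choice(n_features, size=max_features, replace=False)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority(y)
    j, t = split
    left = X[:, j] <= t
    return (j, t,
            build_tree(X[left], y[left], max_depth, max_features, rng, depth + 1),
            build_tree(X[~left], y[~left], max_depth, max_features, rng, depth + 1))

def predict_one(tree, x):
    """Walk from the root to a leaf for a single sample."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def fit_ensemble(X, y, n_trees, max_depth, max_features=None, seed=0):
    """Bagging when max_features is None; random forest otherwise."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
        trees.append(build_tree(X[idx], y[idx], max_depth, max_features, rng))
    return trees

def predict_ensemble(trees, X):
    """Majority vote over the trees for every row of X."""
    votes = np.array([[predict_one(t, x) for t in trees] for x in X])
    return np.array([majority(row) for row in votes])
```

Varying `max_depth`, `n_trees`, and `max_features` and recording the test accuracy for each setting is one way to carry out the parameter analysis asked for in item i.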
● Gaussian Processes:
In the data provided, you will find signal strength in dB versus distance. Treat “Distance” as the independent variable and “Signal Strength” as the target. Compute the predicted mean and variance of the signal strength at the following 5 points: {Sr. No.: 2, 4, 6, 8, 10}. Train the Gaussian process regression (GPR) model on the remaining data points, {Sr. No.: 1, 3, 5, 7, 9, 11, 12}, from the table provided.
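The GPR prediction can be sketched as below. Several things here are assumptions rather than part of the assignment: the RBF (squared-exponential) kernel, the hyperparameter values, the noise variance, and the choice to centre the targets before fitting a zero-mean GP. The distance and signal values are placeholders standing in for the table, which is not reproduced here, so the resulting numbers are not the answer to the task.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return signal_var * np.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    y_train = np.asarray(y_train, dtype=float)
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(y_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                            # K_s^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)    # diag(K_ss - K_s^T K^{-1} K_s)
    return mean, var

# Placeholder table with 12 rows; replace with the provided values.
sr_no = np.arange(1, 13)
distance = np.linspace(1.0, 12.0, 12)               # hypothetical distances
signal_db = -40.0 - 20.0 * np.log10(distance)       # hypothetical signal strengths

test_rows = np.isin(sr_no, [2, 4, 6, 8, 10])
x_tr, y_tr = distance[~test_rows], signal_db[~test_rows]

offset = y_tr.mean()                                # centre targets for the zero-mean prior
mean, var = gp_predict(x_tr, y_tr - offset, distance[test_rows],
                       length_scale=2.0, signal_var=25.0)
mean = mean + offset
```

The Cholesky factorisation is used instead of an explicit matrix inverse because it is the numerically more stable way to solve the two linear systems involved.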