CSE472 - Machine Learning Sessional - L-4 - T-2 - Assignment 1 - Ensemble Learning

In ensemble learning, we combine the decisions of multiple weak learners to solve a classification problem. The expectation is that the combined decision performs better than any individual model because the models correct one another's errors. There are many ensemble methods, such as stacking, bagging, and boosting. In this assignment you will implement the AdaBoost algorithm. The necessary details are as follows.

1. As the weak/base learner, use a decision stump. A decision stump is a decision tree of depth one, i.e., it branches on only one attribute and then makes a decision (a minimal stump sketch is given after this list).

2. There are several implementations of the AdaBoost algorithm. Follow the pseudocode given in class (one common resampling-based variant is sketched after this list).

3. Make your code as modular as possible. In particular, your main AdaBoost module should treat the base learner (here, a decision stump) as a black box and communicate with it via a generic interface that takes weighted examples as input and returns a classifier, which can then classify any instance.

4. To incorporate the effect of the weighted dataset, create each stump's training data by sampling with replacement according to the example weights. Use information gain as the evaluation criterion.

5. For each stump, check that the total weighted error is less than 0.5 before proceeding to the next step.

6. Use the original (unsampled) data as the test set for each stump, and update the weight vector over those original examples.

7. To train and test your model, use the file bank-additional-full.csv from https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y) or not (a loading sketch is given after this list).

8. Analyze the expected performance of your model using k-fold cross-validation for k = 5, 10, and 20. Use the F1 score as your evaluation metric (see the cross-validation sketch below).
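
The following is a minimal sketch of such a decision stump in Python. It assumes every attribute has already been encoded as a discrete (categorical) value and that labels are in {-1, +1}; the class name DecisionStump and its fit/predict interface are illustrative choices, not part of the assignment statement.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector with labels in {-1, +1}."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y == 1)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

class DecisionStump:
    """Depth-one decision tree: branches on the single attribute with the
    highest information gain and predicts the majority label per branch."""

    def fit(self, X, y):
        base = entropy(y)
        best_gain = -1.0
        for j in range(X.shape[1]):
            values = np.unique(X[:, j])
            # expected entropy remaining after splitting on attribute j
            remainder = sum(
                np.mean(X[:, j] == v) * entropy(y[X[:, j] == v]) for v in values
            )
            if base - remainder > best_gain:
                best_gain = base - remainder
                self.attr = j
                # majority label for each value of the chosen attribute
                self.branch = {
                    v: (1 if np.mean(y[X[:, j] == v] == 1) >= 0.5 else -1)
                    for v in values
                }
        # fallback for attribute values unseen in the (resampled) training data
        self.default = 1 if np.mean(y == 1) >= 0.5 else -1
        return self

    def predict(self, X):
        return np.array([self.branch.get(v, self.default) for v in X[:, self.attr]])
```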
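A sketch of the boosting loop under the same assumptions, following the resampling-based AdaBoost variant in which the weights of correctly classified examples are shrunk by error / (1 - error); the pseudocode given in class may differ in such details and takes precedence. The seed and the 1e-10 error floor are illustrative.

```python
def adaboost(X, y, base_learner, rounds, seed=0):
    """Train up to `rounds` weak hypotheses; returns a weighted-majority classifier."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)              # weight vector over the ORIGINAL examples
    hypotheses, z = [], []
    for _ in range(rounds):
        # incorporate the weights by resampling the training data with replacement
        idx = rng.choice(n, size=n, replace=True, p=w)
        h = base_learner().fit(X[idx], y[idx])
        pred = h.predict(X)              # evaluate on the original data
        error = np.sum(w[pred != y])     # total weighted error
        if error >= 0.5:                 # reject this stump (assignment item 5)
            continue
        error = max(error, 1e-10)        # avoid division by zero
        w[pred == y] *= error / (1 - error)   # shrink weights of correct examples
        w /= w.sum()                          # renormalize
        hypotheses.append(h)
        z.append(np.log((1 - error) / error)) # hypothesis weight

    def classify(Xq):
        votes = sum(zi * h.predict(Xq) for zi, h in zip(z, hypotheses))
        return np.where(votes >= 0, 1, -1)
    return classify
```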
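For item 7, a minimal loading sketch, assuming pandas is available and the CSV has been downloaded locally: the file is semicolon-delimited and the target column y holds yes/no values. Binning the numeric columns into quartiles so the stump can treat every attribute as categorical is just one simple preprocessing choice.

```python
import pandas as pd

df = pd.read_csv("bank-additional-full.csv", sep=";")   # semicolon-delimited
y = np.where(df.pop("y") == "yes", 1, -1)               # target in {-1, +1}
# bin numeric columns into quartiles so every attribute is categorical
for col in df.select_dtypes(include="number").columns:
    df[col] = pd.qcut(df[col], q=4, duplicates="drop").astype(str)
X = df.to_numpy()                                       # object array of strings
```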
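For item 8, a sketch of k-fold cross-validation with the F1 score, treating "yes" (+1) as the positive class, which matters because subscribers are a small minority of this dataset; the function names are illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def k_fold_f1(X, y, k, rounds, seed=0):
    """Mean F1 over k folds of AdaBoost with `rounds` decision stumps."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classify = adaboost(X[train], y[train], DecisionStump, rounds)
        scores.append(f1_score(y[test], classify(X[test])))
    return float(np.mean(scores))

for k in (5, 10, 20):
    print(f"k={k}: mean F1 = {k_fold_f1(X, y, k, rounds=30):.4f}")
```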

 

Instructions for Report Writing

 

1. Your final report will contain the following points:

a. Expected accuracies obtained using k-fold cross-validation for k = 5, 10, and 20.
b. A comparison of the accuracies obtained by a single decision stump and by boosting with 30 rounds.
c. A comparison of the accuracies obtained by boosting with 5, 10, and 20 rounds.

2. Answer the questions precisely and keep the report as simple as possible.
