CS513 – Final Exam Solved

Problem 1 – Data Preparation

The “IBM_attrition_v3” CSV dataset on CANVAS shows whether an employee has left the company (“attrition=yes”) or not. Create a dataset (Attrition_Modified) by engineering the following features:

a) Delete all rows with missing values.

b) Create four categories (income1, income2, income3, income4) based on monthly income (“MonthlyIncome”) as follows:
     i.   Monthly income <= 2900
     ii.  2900 < Monthly income <= 5000
     iii. 5000 < Monthly income <= 8500
     iv.  8500 < Monthly income

c) Create two categories (senior, not-senior) for years at the company (“YearsAtCompany”):
     i.   Years at the company <= 6
     ii.  6 < Years at the company

d) Create two categories (young, mature) for age:
     i.   Age <= 37
     ii.  37 < Age

Finally, drop the original columns MonthlyIncome, YearsAtCompany, and age from the data frame.
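
A minimal sketch of steps a) through d) in plain Python (standard library only). It rests on assumptions the exam does not state: the CSV columns are named MonthlyIncome, YearsAtCompany, and Age; missing values appear as empty cells or markers such as NA; and "<= 6 years" maps to not-senior. The new column names IncomeCat, TenureCat, and AgeCat and the helper load() are hypothetical.

```python
import csv

# Assumed spellings of a missing value in the CSV; adjust to the real file.
MISSING = {"", "NA", "N/A", "?"}
# Assumed column names; the exam text calls the age column "age".
DROP = ("MonthlyIncome", "YearsAtCompany", "Age")


def income_category(income: float) -> str:
    """b) Four income bins with upper bounds 2900, 5000, 8500."""
    if income <= 2900:
        return "income1"
    if income <= 5000:
        return "income2"
    if income <= 8500:
        return "income3"
    return "income4"


def tenure_category(years: float) -> str:
    # c) Assumption: <= 6 years is "not-senior", more than 6 is "senior".
    return "not-senior" if years <= 6 else "senior"


def age_category(age: float) -> str:
    # d) <= 37 is "young", older is "mature".
    return "young" if age <= 37 else "mature"


def engineer(rows):
    """Return Attrition_Modified rows: complete cases only, binned
    features added, original numeric columns dropped."""
    out = []
    for row in rows:
        # a) skip any row with a missing cell (None = short row from DictReader)
        if any(v is None or v.strip() in MISSING for v in row.values()):
            continue
        new = {k: v for k, v in row.items() if k not in DROP}
        # hypothetical names for the new categorical columns
        new["IncomeCat"] = income_category(float(row["MonthlyIncome"]))
        new["TenureCat"] = tenure_category(float(row["YearsAtCompany"]))
        new["AgeCat"] = age_category(float(row["Age"]))
        out.append(new)
    return out


def load(path):
    """Hypothetical helper: read the CSV at `path` and engineer it."""
    with open(path, newline="") as f:
        return engineer(csv.DictReader(f))
```

The bins use `<=` on the upper bound of each range, so a boundary value such as 2900 falls in the lower category, matching the inequalities listed above.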

Problem 2 – Random Forest

Use the Random Forest methodology to develop a classification model for attrition using the “Attrition_Modified” dataset. Create the test and training datasets by selecting every fourth record, starting with the first observation, as the test dataset and using the remaining records as the training dataset. Score the test dataset. What is the accuracy of your model?
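
The "every fourth record, starting with the first" rule is easy to get off by one, so here is a minimal sketch of the split in plain Python. It assumes the records are already in file order; the function name every_fourth_split is hypothetical, and the Random Forest fit itself (for example R's randomForest package or scikit-learn's RandomForestClassifier) is left to whichever library is used.

```python
def every_fourth_split(records):
    """Split records into (train, test).

    Test = every fourth record starting with the first one,
    i.e. 0-based positions 0, 4, 8, ...; train = all other records.
    """
    test = records[0::4]
    train = [r for i, r in enumerate(records) if i % 4 != 0]
    return train, test


if __name__ == "__main__":
    rows = list(range(1, 11))  # stand-in for records 1..10
    train, test = every_fourth_split(rows)
    print("test:", test)    # test: [1, 5, 9]
    print("train:", train)  # train: [2, 3, 4, 6, 7, 8, 10]
```

In R, which counts from 1, the same test positions are `seq(1, nrow(df), by = 4)`, where `df` stands for the modified data frame.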


Problem 3 – C5.0

Use the C5.0 methodology to develop a classification model for attrition using the “Attrition_Modified” dataset. Create the test and training datasets by selecting every fourth record, starting with the first observation, as the test dataset and using the remaining records as the training dataset. Score the test dataset. What is the accuracy of your model?
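
Scoring the test set comes down to comparing each predicted label with the actual one. A minimal confusion-matrix sketch in plain Python, assuming the attrition labels are the strings "Yes" and "No" (the exam only shows "attrition=yes"). The predicted labels would come from whichever C5.0 implementation is used, for example the C5.0() function in R's C50 package; the sample labels below are made up purely to show the call pattern.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("No", "Yes")):
    """Count (actual, predicted) pairs for every combination of labels.

    Pairs involving a label outside `labels` are not reported.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts[(a, p)] for a in labels for p in labels}


def accuracy(matrix):
    """Share of test records on the diagonal (predicted == actual)."""
    total = sum(matrix.values())
    if total == 0:
        raise ValueError("empty confusion matrix")
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


if __name__ == "__main__":
    # Made-up labels, only to show the call pattern.
    actual = ["Yes", "No", "No", "Yes", "No"]
    predicted = ["Yes", "No", "Yes", "No", "No"]
    m = confusion_matrix(actual, predicted)
    for (a, p), n in m.items():
        print(f"actual={a:3} predicted={p:3} count={n}")
    print("accuracy:", accuracy(m))  # accuracy: 0.6
```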


Use Excel to solve the following two problems.

Problem 4 – Neural Network

Using the weights in the table below, construct a neural network with one output layer (node z) and one hidden layer (nodes A and B). Calculate the predicted outcome when the inputs to the input nodes are Node 1 = 0.4, Node 2 = 0.7, Node 3 = 0.7, and Node 4 = 0.2.

Use the actual value of 0.75 and a learning factor of 0.1 to adjust the weight from A to z. (Extra credit for using matrix multiplication.)

From     To   Weight
X        A    0.5
Node 1   A    0.6
Node 2   A    0.8
Node 3   A    0.6
Node 4   A    0.2
x        B    0.7
Node 1   B    0.9
Node 2   B    0.8
Node 3   B    0.4
Node 4   B    0.2
xx       z    0.5
A        z    0.9
B        z    0.9
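
The forward pass and the single weight update can be checked with a short script that mirrors the spreadsheet cells. It is a sketch under assumptions the exam does not state: every node uses the sigmoid activation 1/(1 + e^-x); the rows labelled X, x, and xx are bias inputs held at 1; and the A-to-z update uses the common backpropagation rule for a sigmoid output node, delta = out_z(1 - out_z)(actual - out_z), new w_Az = w_Az + eta * delta * out_A. The hidden layer is computed as a matrix-vector product, the same product Excel's MMULT function would form.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Plain matrix-vector product: one dot product per row of m."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Input vector with a leading 1 for the bias input (assumed meaning of X / x).
inputs = [1.0, 0.4, 0.7, 0.7, 0.2]

# Hidden-layer weights; columns = bias, Node 1, Node 2, Node 3, Node 4.
W_hidden = [
    [0.5, 0.6, 0.8, 0.6, 0.2],  # into A
    [0.7, 0.9, 0.8, 0.4, 0.2],  # into B
]
# Output weights into z; order = bias (xx), A, B.
w_out = [0.5, 0.9, 0.9]

net_hidden = matvec(W_hidden, inputs)          # [net_A, net_B]
out_hidden = [sigmoid(n) for n in net_hidden]  # [out_A, out_B]
net_z = matvec([w_out], [1.0] + out_hidden)[0]
out_z = sigmoid(net_z)                         # predicted outcome

actual = 0.75
eta = 0.1  # learning factor
delta_z = out_z * (1 - out_z) * (actual - out_z)
new_w_Az = w_out[1] + eta * delta_z * out_hidden[0]

print(f"net_A={net_hidden[0]:.4f}  out_A={out_hidden[0]:.4f}")
print(f"net_B={net_hidden[1]:.4f}  out_B={out_hidden[1]:.4f}")
print(f"net_z={net_z:.4f}  out_z={out_z:.4f}")
print(f"delta_z={delta_z:.6f}  new w_Az={new_w_Az:.6f}")
```

Under these assumptions the hidden net inputs are 1.76 (A) and 1.94 (B), the predicted output is about 0.886, and because the prediction overshoots 0.75 the A-to-z weight drops slightly, to about 0.8988.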
 

Problem 5 – C4.5

Use Excel and the C4.5 methodology to develop a classification model for the “admitted” outcome using the following training data (build only the first level of the tree):
Applicant   GRE      GPA      Admitted
1           Medium   High     Yes
2           Low      Low      No
3           High     Medium   Yes
4           Medium   Medium   No
5           Low      Medium   No
6           High     High     Yes
7           Low      Low      No
8           Medium   Medium   Yes
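
The gain-ratio comparison that C4.5 uses to choose the split can be reproduced in a few lines, and each quantity corresponds to a column one would build in the worksheet. This sketch assumes the standard definitions: entropy in bits; information gain = entropy of the parent minus the size-weighted entropy of the branches; split information = entropy of the branch sizes; gain ratio = gain / split information. The names DATA, entropy, and gain_ratio are illustrative.

```python
import math
from collections import Counter

# Training data from the table: (GRE, GPA, Admitted).
DATA = [
    ("Medium", "High", "Yes"),
    ("Low", "Low", "No"),
    ("High", "Medium", "Yes"),
    ("Medium", "Medium", "No"),
    ("Low", "Medium", "No"),
    ("High", "High", "Yes"),
    ("Low", "Low", "No"),
    ("Medium", "Medium", "Yes"),
]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(data, attr_index):
    """Return (information gain, split information, gain ratio)."""
    n = len(data)
    groups = {}
    for row in data:
        groups.setdefault(row[attr_index], []).append(row[-1])
    parent = entropy([row[-1] for row in data])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = parent - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, split_info, ratio


for name, idx in (("GRE", 0), ("GPA", 1)):
    gain, split_info, ratio = gain_ratio(DATA, idx)
    print(f"{name}: gain={gain:.4f} split_info={split_info:.4f} gain_ratio={ratio:.4f}")
```

On this data GPA has gain 0.5 and split information 1.5 (ratio 1/3), while GRE has gain about 0.656 and split information about 1.561 (ratio about 0.42). Under the gain-ratio criterion a one-level tree would therefore split on GRE, with Low → No, High → Yes, and Medium → Yes by a 2-to-1 majority.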
 

More products