There are a total of 4 (four) multi-part questions, with point values noted for each question.
Please show your calculations, or the details of your program(s), for each problem. Your program(s) should be commented so that each step is clearly explained.
Combine all of your answers/files into a single zipped file and post the zipped file.
Problems #1 and #2
Using an “Addiction” dataset, a researcher has prepared the following table of patient counts:
Ethnicity     Age Category   Alcohol   Cocaine   Heroin   Row Total
Black         Old                 30        48       17          95
Black         Young               25        72       13         110
Hispanic      Old                  7         0        5          12
Hispanic      Young                8         7       19          34
White         Old                 60         2       17          79
White         Young               26        10       34          70
Column Total                     156       139      105         400
Use the table above and Excel to classify patient addiction type (alcohol, cocaine, heroin) using Ethnicity and Age Category:
a. Construct a classification and regression tree (CART) (two levels only). (35 Points)
b. Construct a C4.5 decision tree (two levels only). (30 Points)
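The first-level split measures behind parts a and b can be cross-checked outside Excel. The sketch below is only a check on hand or spreadsheet calculations, not a replacement for the Excel work the problem asks for. It assumes the usual textbook conventions: CART scores binary splits by weighted Gini impurity, and C4.5 scores multiway splits by gain ratio (information gain divided by split information). All function and variable names are illustrative.

```python
from math import log2

# Patient counts copied from the table:
# (ethnicity, age category) -> (alcohol, cocaine, heroin)
COUNTS = {
    ("Black", "Old"): (30, 48, 17),
    ("Black", "Young"): (25, 72, 13),
    ("Hispanic", "Old"): (7, 0, 5),
    ("Hispanic", "Young"): (8, 7, 19),
    ("White", "Old"): (60, 2, 17),
    ("White", "Young"): (26, 10, 34),
}


def add(a, b):
    """Element-wise sum of two class-count tuples."""
    return tuple(x + y for x, y in zip(a, b))


def total(cells):
    """Sum class counts over an iterable of (ethnicity, age) keys."""
    out = (0, 0, 0)
    for key in cells:
        out = add(out, COUNTS[key])
    return out


def gini(c):
    """Gini impurity of a class-count tuple."""
    n = sum(c)
    return 1.0 - sum((x / n) ** 2 for x in c)


def entropy(c):
    """Shannon entropy (bits) of a count tuple; zero counts contribute 0."""
    n = sum(c)
    return -sum((x / n) * log2(x / n) for x in c if x > 0)


def weighted(impurity, groups):
    """Size-weighted average impurity over the child groups of a split."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * impurity(g) for g in groups)


ROOT = total(COUNTS)  # class counts for all 400 patients


def cart_splits():
    """Weighted Gini of each binary first-level split (lower is better)."""
    splits = {}
    for eth in ("Black", "Hispanic", "White"):
        left = total(k for k in COUNTS if k[0] == eth)
        right = total(k for k in COUNTS if k[0] != eth)
        splits[f"Ethnicity = {eth} vs. rest"] = weighted(gini, [left, right])
    old = total(k for k in COUNTS if k[1] == "Old")
    young = total(k for k in COUNTS if k[1] == "Young")
    splits["Age = Old vs. Young"] = weighted(gini, [old, young])
    return splits


def c45_gain_ratios():
    """Gain ratio of each multiway first-level split (higher is better)."""
    ratios = {}
    for name, idx in (("Ethnicity", 0), ("Age Category", 1)):
        values = sorted({k[idx] for k in COUNTS})
        groups = [total(k for k in COUNTS if k[idx] == v) for v in values]
        gain = entropy(ROOT) - weighted(entropy, groups)
        split_info = entropy(tuple(sum(g) for g in groups))
        ratios[name] = gain / split_info
    return ratios


if __name__ == "__main__":
    print("Root Gini:", round(gini(ROOT), 4))
    for split, value in cart_splits().items():
        print("CART", split, round(value, 4))
    for attr, value in c45_gain_ratios().items():
        print("C4.5", attr, round(value, 4))
```

The second level of each tree repeats the same calculation inside each child node, restricted to the rows that reach it.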
3) Use R or Python to cluster the seven (7) already normalized points in the accompanying table with the K-means algorithm (K=2), then answer a and b below: (20 points)
Point   X   Y   Z
a       1   1   1
b       5   3   4
c       4   4   5
d       4   3   4
e       1   2   1
f       4   4   4
g       2   1   2
a. What are the members of each cluster?
b. What are the coordinates for the cluster centers?
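A minimal sketch of K-means (Lloyd's algorithm) for these seven points follows, written in plain Python so each step is visible. The problem does not specify how the centers are initialized, so seeding them with points a and b is an assumption made here for reproducibility. Library routines such as R's `kmeans` or scikit-learn's `KMeans` use random starts by default and may number the clusters differently.

```python
# The seven already normalized points from the table: name -> (X, Y, Z)
POINTS = {
    "a": (1, 1, 1),
    "b": (5, 3, 4),
    "c": (4, 4, 5),
    "d": (4, 3, 4),
    "e": (1, 2, 1),
    "f": (4, 4, 4),
    "g": (2, 1, 2),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = list(init_centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignment = {
            name: min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are final
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:
                centers[k] = mean(members)
    return assignment, centers


# Assumed initialization: use points a and b as the two starting centers.
assignment, centers = kmeans(POINTS, [POINTS["a"], POINTS["b"]])

for k, center in enumerate(centers):
    members = sorted(n for n, c in assignment.items() if c == k)
    print(f"Cluster {k}: members={members}, center={center}")
```

The printed membership lists answer part a and the printed centers answer part b; rerunning with a different pair of seed points is a quick way to check that the result does not depend on the starting choice.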
4) Using the data in the table below, construct a neural network with one output layer (z) and one hidden layer (A and B). Calculate the predicted outcome if the inputs to the input nodes are x = 1, Node 1 = 0.4, Node 2 = 0.7, Node 3 = 0.7, and Node 4 = 0.2. (15 points)
From     To   Weight
X        A    0.5
Node 1   A    0.6
Node 2   A    0.8
Node 3   A    0.6
Node 4   A    0.2
x        B    0.7
Node 1   B    0.9
Node 2   B    0.8
Node 3   B    0.4
Node 4   B    0.2
xx       z    0.5
A        z    0.9
B        z    0.9
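The forward pass for this network can be sketched as below. Several points are assumptions, because the problem statement leaves them open: the activation at A, B, and z is taken to be the logistic sigmoid; the rows labeled "X" and "x" are both read as the constant input x = 1 feeding A and B; and "xx" is read as a constant input of 1 feeding z. If the course uses a different activation or a different reading of "xx", only the marked lines need to change.

```python
from math import exp


def sigmoid(t):
    """Logistic activation (assumed; the problem does not name one)."""
    return 1.0 / (1.0 + exp(-t))


# Input values from the problem statement; x = 1 acts as the constant input.
inputs = {"x": 1.0, "Node 1": 0.4, "Node 2": 0.7, "Node 3": 0.7, "Node 4": 0.2}

# Weights into hidden node A (the "X -> A" row is read as x -> A; assumption).
w_A = {"x": 0.5, "Node 1": 0.6, "Node 2": 0.8, "Node 3": 0.6, "Node 4": 0.2}

# Weights into hidden node B.
w_B = {"x": 0.7, "Node 1": 0.9, "Node 2": 0.8, "Node 3": 0.4, "Node 4": 0.2}

# Weights into output node z; "xx" is treated as a constant input of 1 (assumption).
w_z = {"xx": 0.5, "A": 0.9, "B": 0.9}

# Hidden layer: weighted sum of the inputs, then activation.
net_A = sum(w_A[name] * value for name, value in inputs.items())
net_B = sum(w_B[name] * value for name, value in inputs.items())
out_A = sigmoid(net_A)
out_B = sigmoid(net_B)

# Output layer: weighted sum of the hidden outputs plus the constant term.
net_z = w_z["xx"] * 1.0 + w_z["A"] * out_A + w_z["B"] * out_B
out_z = sigmoid(net_z)

print("net_A =", round(net_A, 4), "out_A =", round(out_A, 4))
print("net_B =", round(net_B, 4), "out_B =", round(out_B, 4))
print("net_z =", round(net_z, 4), "predicted z =", round(out_z, 4))
```

Printing the intermediate net inputs alongside the activations makes it easy to compare each step against a hand calculation.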