You will use the Appliances energy prediction data set. Ignore the first attribute, which is a date-time variable, and remove the last attribute, which is a duplicate of the one before it. For Logistic Regression, use the first remaining attribute (after removing the date-time variable), which denotes the Appliances Energy Use, as the response variable, with the remaining attributes as predictor variables. You have to discretize the response variable differently for the two sections, as noted below, since CSCI4390 will implement binary logistic regression, whereas CSCI6390 will implement multiclass logistic regression.
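As a rough sketch of the column handling described above, the following uses a tiny made-up table rather than the real file. The helper name split_response_predictors and the toy rows are illustrative assumptions, not part of the assignment; the only thing taken from the text is the column order (date-time first, Appliances Energy Use second, duplicate attribute last).

```python
import numpy as np

def split_response_predictors(raw):
    """Split rows given in file order into predictors X and response y.

    Assumes column 0 is the date-time variable, column 1 is the
    Appliances Energy Use, and the last column is the duplicate attribute.
    """
    raw = np.asarray(raw, dtype=object)
    data = raw[:, 1:-1]            # drop the date-time and the duplicate column
    y = data[:, 0].astype(float)   # Appliances Energy Use (response)
    X = data[:, 1:].astype(float)  # all remaining attributes (predictors)
    return X, y

# Hypothetical toy rows: date-time, energy use, two predictors, two equal trailing columns.
toy = [
    ["2016-01-11 17:00:00", 60, 30, 19.89, 13.27, 13.27],
    ["2016-01-11 17:10:00", 50, 40, 19.89, 18.60, 18.60],
]
X, y = split_response_predictors(toy)
print(X.shape, y)
```

The same slicing applies unchanged to the full data set once it has been read into a two-dimensional array.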
CSCI4390: Binary Logistic Regression
You will implement the binary logistic regression algorithm as described in Algorithm 24.1 (Chapter 24, page 628).
Note that the Appliances Energy Use attribute takes values in the range [10,1080]. However, for binary regression, we need only two values, so for the purpose of this assignment you should consider energy use less than or equal to 50 as the positive class (1), and energy use higher than 50 as the negative class (0). You need to do this conversion to create the binary response variable before you select the train (70%) and test (30%) subsets.
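The conversion and split for the binary case can be sketched as follows. The function names, the fixed seed, and the choice of a random shuffle are assumptions made for illustration; the assignment text only fixes the threshold of 50 and the 70%/30% proportions.

```python
import numpy as np

def binarize_energy(y, threshold=50):
    """Return 1 where energy use <= threshold (positive class), else 0."""
    return (np.asarray(y) <= threshold).astype(int)

def split_70_30(X, y, train_frac=0.7, seed=0):
    """Shuffle the rows and split them into train and test subsets.

    A random shuffle is an assumption here; the handout does not say
    how the subsets should be selected.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy energy values covering both sides of the threshold.
energy = np.array([10, 50, 60, 1080, 40, 70, 30, 90, 20, 110])
features = np.arange(20, dtype=float).reshape(10, 2)  # made-up predictors

labels = binarize_energy(energy)  # discretize first, then split
X_train, y_train, X_test, y_test = split_70_30(features, labels)
print(labels, X_train.shape, X_test.shape)
```

Discretizing before splitting, as required above, means the train and test labels are produced by the same rule.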
CSCI6390: Multiclass Logistic Regression
You will implement the multiclass logistic regression algorithm as described in Algorithm 24.2 (Chapter 24, page 634).
Note that the Appliances Energy Use attribute takes values in the range [10,1080]. However, for multiclass regression, we will convert these into four classes as follows: energy use less than or equal to 30 is class c1, energy use greater than 30 but less than or equal to 50 is class c2, energy use greater than 50 but less than or equal to 100 is class c3, and finally energy use higher than 100 is class c4. You need to do this conversion to create the categorical response variable, before you select the train (70%) and test (30%) subsets.
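One possible way to build the four-class response is with np.digitize, shown below as a sketch. The integer codes 0 to 3 standing for c1 to c4, the function names, and the optional one-hot encoding step are assumptions for illustration; check Algorithm 24.2 for the response encoding it actually expects.

```python
import numpy as np

def discretize_energy(y):
    """Map energy use to class codes 0..3, standing for c1..c4.

    With right=True each bin is closed on the right:
    y <= 30 -> 0, 30 < y <= 50 -> 1, 50 < y <= 100 -> 2, y > 100 -> 3.
    """
    return np.digitize(np.asarray(y), bins=[30, 50, 100], right=True)

def one_hot(codes, num_classes=4):
    """Encode integer class codes as rows of a 0/1 indicator matrix."""
    return np.eye(num_classes, dtype=int)[codes]

# Values chosen to sit on and just past each class boundary.
energy = np.array([10, 30, 31, 50, 51, 100, 101, 1080])
codes = discretize_energy(energy)
Y = one_hot(codes)
print(codes)
print(Y.shape)
```

As in the binary case, this conversion is applied to the whole response column before the train and test subsets are selected.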