CSCI316 - Group 1 - Solved

Big Data Mining Techniques and Implementation

The Task (10 marks)

 

Dataset: Steel Industry Energy Consumption Dataset 

(Source: https://archive.ics.uci.edu/ml/datasets/Steel+Industry+Energy+Consumption+Dataset) 

 

The data were collected from DAEWOO Steel Co. Ltd in Gwangyang, South Korea, which produces several types of coils, steel plates, and iron plates. The company's electricity consumption data are held in a cloud-based system. Energy consumption data for the industry are stored on the website of the Korea Electric Power Corporation (pccs.kepco.go.kr), where daily, monthly, and annual views of the data are calculated and shown.

 

(Reference: Sathishkumar et al.’s Building Research & Information paper in the above link, which is accessible in the UOW digital library, i.e., https://www.uow.edu.au/library/) 

 

Objective 

The objective of this task is to develop an end-to-end data mining project by using the Python machine learning library Scikit-Learn. The output of the project is a classification model to predict the load type. 

 

Requirements 

(1) The main steps of the project are (a) “discover and visualise the data”, (b) “prepare the data for machine learning algorithms”, (c) “select and train models”, (d) “hyperparameter fine-tuning” and (e) “evaluate the outcomes”. You can structure the project in your own way. Some steps may be performed more than once.

(2) Clearly explain your findings at each step.

(3) In steps (c) and (d), select and train at least 3 classifiers (from 3 different algorithms).

(4) Use ~80% of the data for training and ~20% for testing the models. Stratified sampling must be used.

(5) Define some new features by using the User-Defined Transform functionality, which should implement a parameter that switches those new features on or off in the model fine-tuning step (i.e., step (d)).
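
Requirements (4) and (5) could be approached along the lines of the sketch below. It is only an illustration under stated assumptions, not a reference solution: the data are synthetic, the column names (usage_kwh, reactive_kvarh), the class labels, the RatioFeatureAdder class and its add_ratio parameter are all hypothetical, and a single logistic regression stands in for the three classifiers the task asks for. It shows a stratified ~80/20 split with train_test_split, a custom Scikit-Learn transformer whose constructor parameter turns the new feature on or off, and a GridSearchCV that treats that parameter as one more hyperparameter.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset; column names and labels are hypothetical.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "usage_kwh": rng.uniform(1.0, 100.0, n),      # strictly positive, safe to divide by
    "reactive_kvarh": rng.uniform(0.0, 50.0, n),
})
y = pd.Series(rng.choice(["class_a", "class_b", "class_c"], size=n, p=[0.5, 0.3, 0.2]))


class RatioFeatureAdder(BaseEstimator, TransformerMixin):
    """Optionally append the ratio of column 1 to column 0 as a new feature."""

    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio  # toggled during hyperparameter search

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not self.add_ratio:
            return X
        ratio = X[:, 1] / X[:, 0]
        return np.c_[X, ratio]


# Requirement (4): ~80/20 split, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("features", RatioFeatureAdder()),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Requirement (5): the feature toggle is searched like any other hyperparameter.
param_grid = {
    "features__add_ratio": [True, False],
    "clf__C": [0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_)
```

Because the toggle lives in the transformer's constructor, GridSearchCV can clone the pipeline and set features__add_ratio without any extra plumbing. The same pipeline and grid pattern can be repeated with other estimators in place of the "clf" step to cover the three different algorithms required in (3).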
