DATA8001 - Assignment 1 - Data ETL

All code required to reproduce the data ETL process should be placed in the Python library file lib/R00000000_util.py (at the bottom, where indicated) and be callable from the Jupyter Notebook R00000000_A1_Notebook.ipynb.

 

Original Data Headings 

| Column Name | Column Description |
|---|---|
| car_reg | The car registration plate |
| purchase_date | The purchase date of the car |
| county | The county where the car was purchased & registered |
| make | The car manufacturer's name |
| model | The car model name |
| type | The type of car (e.g., saloon, hatchback etc.) |
| colour | The colour of the car |
| tax_band | The tax band of the car |
| price | The purchase price of the car in Euros |
 

                 

Processed Data Headings & Expected Data Types 

| Column Name | Column Description | Data Type |
|---|---|---|
| car_reg | Cleaned car registration plate | String (uppercase) |
| purchase_date | Cleaned purchase date of the car | Datetime |
| year | The year the car was purchased | Int |
| month | The month the car was purchased | Int |
| county | Cleaned county name | String (uppercase) |
| make | Cleaned car manufacturer's name | String (uppercase) |
| model | Cleaned car model name | String (uppercase) |
| type | Cleaned car type | String (uppercase) |
| colour | Cleaned colour of the car | String (uppercase) |
| tax_band | Cleaned tax band of the car | String (uppercase) |
| price | The purchase price of the car in Euros | Float |
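
The mapping from the original headings to the processed headings and data types could be sketched roughly as follows. This is a minimal illustration using pandas, not the assignment's reference solution: the sample rows, the `etl` function name, and the specific cleaning rules (trimming whitespace, stripping thousands separators from prices) are all assumptions, since the brief does not say what kinds of dirt the raw data contains.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the raw file; these rows are invented for illustration.
RAW_CSV = io.StringIO(
    "car_reg,purchase_date,county,make,model,type,colour,tax_band,price\n"
    " 191-c-1234 ,2019-03-15, cork ,Toyota,Yaris,hatchback,Red,a,18500\n"
    '201-D-5678,2020-11-02,Dublin,ford,focus,Saloon,blue,b,"22,000.50"\n'
)

TEXT_COLUMNS = ["car_reg", "county", "make", "model", "type", "colour", "tax_band"]
COLUMN_ORDER = [
    "car_reg", "purchase_date", "year", "month", "county",
    "make", "model", "type", "colour", "tax_band", "price",
]


def etl(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; assumes every raw column was read as text."""
    df = raw.copy()

    # String columns: trim surrounding whitespace and force uppercase.
    for col in TEXT_COLUMNS:
        df[col] = df[col].str.strip().str.upper()

    # Dates: unparseable values become NaT rather than raising.
    df["purchase_date"] = pd.to_datetime(df["purchase_date"], errors="coerce")

    # Derived integer columns; the nullable Int64 dtype tolerates missing dates.
    df["year"] = df["purchase_date"].dt.year.astype("Int64")
    df["month"] = df["purchase_date"].dt.month.astype("Int64")

    # Price: drop everything except digits and the decimal point, then cast to float.
    price_text = df["price"].str.replace(r"[^0-9.]", "", regex=True)
    df["price"] = pd.to_numeric(price_text, errors="coerce").astype(float)

    return df[COLUMN_ORDER]


raw = pd.read_csv(RAW_CSV, dtype=str)
processed = etl(raw)
print(processed.dtypes)
```

In the assignment itself, a function like this would live in lib/R00000000_util.py and be called from the notebook; the real data will likely need additional rules (for example, inconsistent date formats or misspelled county names) that only inspection of the raw file can reveal.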
 


Data Visualisation – 10%
Load the processed dataset (data/R00000000_processed.csv) into the assignment notebook R00000000_A1_Notebook.ipynb and answer the 5 questions, including 1 (and only 1) visualisation of your choice that best answers each question. Show your workings in the Jupyter Notebook for each question.
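
As an illustration of the "one question, one visualisation" pattern, the sketch below loads processed data and draws a single bar chart. The five questions are not listed in this brief, so the question used here (how does the mean purchase price vary by year?) is a hypothetical placeholder, and the sample rows are invented. In the notebook, the in-memory sample would be replaced by the path data/R00000000_processed.csv.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows in the processed layout, standing in for the real CSV file.
SAMPLE = io.StringIO(
    "car_reg,purchase_date,year,month,county,make,model,type,colour,tax_band,price\n"
    "191-C-1234,2019-03-15,2019,3,CORK,TOYOTA,YARIS,HATCHBACK,RED,A,18500.0\n"
    "192-D-0001,2019-08-01,2019,8,DUBLIN,FORD,FOCUS,SALOON,BLUE,B,20500.0\n"
    "201-D-5678,2020-11-02,2020,11,DUBLIN,FORD,FOCUS,SALOON,BLUE,B,22000.5\n"
)

# parse_dates restores the Datetime type, which a CSV file cannot store natively.
df = pd.read_csv(SAMPLE, parse_dates=["purchase_date"])

# Workings: aggregate first, so the figure plots exactly what the answer quotes.
mean_price_by_year = df.groupby("year")["price"].mean()

# Exactly one figure for this question.
fig, ax = plt.subplots()
mean_price_by_year.plot(kind="bar", ax=ax)
ax.set_xlabel("Year of purchase")
ax.set_ylabel("Mean price (EUR)")
ax.set_title("Mean purchase price by year")
fig.tight_layout()
```

Keeping the aggregation in its own variable makes the workings visible in the notebook and keeps the chart tied to a single, checkable table of numbers.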

 

Data Modelling – 10%
Create a Linear Regression model and any transformations required to give your model the best accuracy. Using the Python class provided in lib/R00000000_util.py, save the object to the model folder as: model/R00000000.pkl.

All code required to reproduce the modelling process should be placed in the Python library file lib/R00000000_util.py and be callable from the Jupyter Notebook R00000000_A1_Notebook.ipynb.

The pickled model file should be loaded and called from the Jupyter Notebook and available to process unseen test data including any transformations required to ensure the model works. Note: the unseen test data will have the same headings & datatypes as your data/R00000000_processed.csv file.
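
One way to make sure the pickled file carries its transformations with it is to bundle them and the regression into a single object before saving. The sketch below does this with a scikit-learn `Pipeline`; this is an assumption, as the brief names neither a library nor the contents of the provided Python class, so the sketch pickles the pipeline directly rather than that class. The feature choice, the training rows, and the temporary save location are likewise invented for illustration; the assignment's target path is model/R00000000.pkl.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature split; which columns actually help is for the analysis to decide.
CATEGORICAL = ["county", "make", "model", "type", "colour", "tax_band"]
NUMERIC = ["year", "month"]
FEATURES = CATEGORICAL + NUMERIC

# Invented training rows in the processed layout (identifier and date columns omitted).
train = pd.DataFrame({
    "county": ["CORK", "DUBLIN", "DUBLIN", "CORK", "GALWAY", "DUBLIN"],
    "make": ["TOYOTA", "FORD", "FORD", "TOYOTA", "FORD", "TOYOTA"],
    "model": ["YARIS", "FOCUS", "FOCUS", "YARIS", "FIESTA", "COROLLA"],
    "type": ["HATCHBACK", "SALOON", "SALOON", "HATCHBACK", "HATCHBACK", "SALOON"],
    "colour": ["RED", "BLUE", "BLUE", "WHITE", "RED", "BLACK"],
    "tax_band": ["A", "B", "B", "A", "A", "B"],
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
    "month": [3, 8, 11, 1, 6, 9],
    "price": [18500.0, 20500.0, 22000.5, 19000.0, 17500.0, 26000.0],
})

# One-hot encode the text columns and pass the numeric ones through unchanged.
# handle_unknown="ignore" lets unseen categories in test data encode as all zeros.
preprocess = ColumnTransformer([
    ("categories", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ("numbers", "passthrough", NUMERIC),
])
pipeline = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
pipeline.fit(train[FEATURES], train["price"])

# Unseen data with the same headings, including a colour not present in training.
unseen = train.head(1).assign(colour="GREEN")
original_pred = pipeline.predict(unseen[FEATURES])

# Round-trip through a pickle file; a temporary folder stands in for model/.
with tempfile.TemporaryDirectory() as folder:
    model_path = Path(folder) / "R00000000.pkl"
    with open(model_path, "wb") as handle:
        pickle.dump(pipeline, handle)
    with open(model_path, "rb") as handle:
        restored = pickle.load(handle)

restored_pred = restored.predict(unseen[FEATURES])
```

Because the encoder is fitted inside the pipeline, the restored object can be applied to raw processed-format rows without re-running any separate preparation code, which is what the unseen-test-data requirement calls for.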

 

Report & Questions – 15%
Write a report of at most 2 pages outlining the steps taken to complete the assignment. Identify any areas you feel are worth mentioning in the ETL, visualisation or modelling steps, including any insights developed.

Answer 2 exam-type questions (max 300 words each). Note: due to the “open-book” nature of this assignment, a clean, concise and well-thought-out answer giving your “own” viewpoint is expected; this is not a “cut and paste” exercise.
