$25
Overview
The purpose of this coursework is to create a normalised database and a data warehouse given a dataset. The coursework is formally assessed and is worth 10% of your final grade and is broken down to 10 points distributed below. You will receive some feedback as part of the marking of the coursework.
Part 1: Data Loading, Transformation & Normalization (7 out of 10 points)
(1.1) Backup Restoring (0.5 points)
(1.2) Data Loading (1.5 points)
(1.3) Data Preprocessing (3 points)
(1.4) Data Integration (0.5 points)
(1.5) Database Normalization (1.5 points)
Part 2: Constraint Implementation (2 out of 10 points)
(2.1) Status Consistency (1 point)
(2.2) Status Transition (1 point)
Part 3: Data Warehouse (1 out of 10 points)
(3.1) ER Diagram (0.5 point)
(3.2) SQL Statements (0.5 point)
Requirements
Working with raw crime data from a police department involves analysing a sequence of records about crimes. There are two different tables in this dataset. A table with crime data and a table with images of crime locations. The data is shipped in three different files: (i) crimes2015.sql containing information about crimes reported in 2015 and the images of the locations of all crimes, (ii) crimes2013.txt containing information about crimes reported in 2013, and (iii) crimes2014.csv containing information about crimes reported in 2014.
Each record in the crime table represents an crime, with the following information:
• date reported: date when the crime was reported to the police
• date occ: date when the crime occurred
• time occ: time when the crime occurred
• occurred: the timestamp when the crime occurred. This column replaces the date occ and time occ in the crimes2015.sql file.
• dr no: crime id
• area: code of the area where the crime occurred
• area name: name of the area where the crime occurred
• rd: road number
• crime no: code of the type of crime
• crime desc: description of the type of crime
• status: code of the crime status
• status desc: description of the crime status
• image no: code of the image of the place where the crime occurred
Each record in the image table contains the image of a crime location:
• image no: image id
• image: image stored a the BLOB value (it is a binary large object that can hold a variable amount of data)
In a big city, there is a large number of crimes, so being able to read and analyse the data efficiently is very important. For this assignment you will import raw crime data into a MySQL database. As is typical of big data, there are many inconsistencies and anomalies in the dataset, so you may need to pre-process and clean the data before it is usable.
Part 1: Data Loading, Transformation & Normalisation
Setup. In Part 1, you will use SQL commands to load, transform and normalise the crime data. On KEATs you will find a .zip file which contains template SQL files to edit for your Pa
Part 1.1: Backup Restoring. Write the required SQL statement/s to restore the data contained in the file crimes2015.sql in the provided template file: backupRestoring.sql
• 1.2.1. Table Creation. Write the required SQL DDL (Data Definition Language) statements (i.e. CREATE TABLE) to create two tables (named crimes2013 and crimes2014) that can held the data contained in the datasets crimes2013.txt and crimes2014.csv. Ensure that:
– table and attribute names do not conflict with SQL reserved words
– attribute data types are core SQL data types as described in the lectures
• 1.2.2. Data Load. Write the required SQL statements to load the data contained in the datasets crimes2013.txt and crimes2014.csv into the tables crimes2013 and crimes2014.
Write all of these SQL statements in the above order in the provided template file: dataLoading.sql
Part 1.3: Data Preprocessing. Write the required SQL statements to remove any duplicates, inconsistencies and anomalies in the data (that is in the tables crimes2013, crimes2014 and crimes2015). Among others, the following inconsistencies and anomalies need to be solved:
• The tables in the different datasets do not have the same number of columns.
• The column time occ in some of the tables may not be formatted properly (e.g., the value ’1’ should be interpreted as ’00:01’).
• The column image no in some of the tables may contain invalid values.
• The capitalization of the status codes (status) are not consistent in the three tables.
• Different formats may be used for the dates.
• The data about some area names are missing.
Write all of the SQL statements required to preprocess the data in the provided template file: dataPreprocesing.sql
Part 1.4: Data Integration. Write the required SQL statements to integrate the data in the tables crimes2015, crimes2014 andcrimes2013 in a single table named crimes. Write these SQL statements in the provided template file: dataIntegration.sql
Part 1.5: Database Normalization. Write the required SQL statements to normalize the crimes table into a set of tables in 3NF. Ensure that:
• every table has a primary key
• all foreign keys are properly declared
Write these SQL statements in the provided template file: normalization.sql
Part 2: Constraint Implementation
Setup. In Part 2, you will use SQL commands to implement domain constraints in the tables contained in the crime2015.sql backup. On KEATs you will find a .zip file which contains template SQL files to edit for your Part 2 submission. Again, be sure to:
1. Include your NAME at the top of the file in a SQL line comment.
2. Edit these files as text, not Word files or propriety SQL software.
3. Do NOT rename the files.
4. Include any comments using the SQL line comment (i.e. lines beginning with −− ).
You can assume that these files are executed in the order that they appear in the description that follows:
1. backupRestoring.sql
2. constraints.sql
Any file that is missing, renamed, or does not run will result in 0 points for that section. Write the SQL statements to perform the following operations:
Part 2.1: Status Description Consistency. Create MySQL stored program/s (e.g., trigger) to ensure that the status and status desc are consistent (e.g., it is not possible to have two different status descriptions for the same status code).
Part 2.2: Status Transition. Create MySQL stored program/s (e.g., trigger) to ensure that the status code of crimes behaves as indicated by the following transition diagram:
That is the crimes are created with the status UNK, then crimes can transition to status IC, from that status they can transition to AO or JO, etc. Crimes can only be deleted when they are in status AA or JA.
Write all of these SQL statements in the above order in the provided template file: constraints.sql