This week’s tutorial covers how to load data from a CSV file into a relational database such as PostgreSQL, and how to transform and clean the data using Python in Jupyter notebooks. We will also practise SQL and the data-analysis operations covered in this week’s lecture.
• hostname: soitpw11d59.shared.sydney.edu.au
• username: y21s1d2x01 yourUnikey
• password: your SID
• port: 5432
(Note: this is not your normal unikey login/password!) For example, if your unikey is abcd1234, then your username would be y21s1d2x01 abcd1234.
For today’s exercises, you should modify the db2x01.json file to match your credentials.
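As a sketch of what loading those credentials from db2x01.json can look like: the key names and placeholder values below are assumptions, so check the actual file for the exact fields it uses.

```python
import json

# Assumed shape of db2x01.json -- in practice you would read the file with
# open("db2x01.json") rather than embed the text like this.
raw = """{
    "host": "soitpw11d59.shared.sydney.edu.au",
    "port": 5432,
    "user": "y21s1d2x01 abcd1234",
    "password": "your-SID-here",
    "dbname": "postgres"
}"""
creds = json.loads(raw)

# With matching key names, a driver such as psycopg2 could then be called as
# psycopg2.connect(**creds) -- not executed here, since it needs the live server.
print(creds["host"], creds["port"])
```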
Exercise 1. SQL+Python - Resources for today
Exercise 2. Data Loading and Database Creation with Python
This week’s Jupyter notebook includes instructions on how to connect to a database from Python; these will guide you through loading the Organisations.csv file into the database.
Follow the instructions there, but this time load Measurements.csv into a new table ’MeasurementsWk4’ in PostgreSQL.
(Note: the data provided is slightly simpler than last week to make this task easier; every attribute has its own column here.)
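The loading step can be sketched as follows. To keep the example self-contained it uses Python’s built-in sqlite3 in place of PostgreSQL, and the column names of Measurements.csv are assumptions; in the lab you would open a psycopg2 connection with your db2x01.json credentials and use the same pattern.

```python
import csv
import io
import sqlite3

# Assumed sample of Measurements.csv -- real column names may differ.
sample = io.StringIO(
    "station,date,metric,value\n"
    "410001,2020-01-01,temperature,21.5\n"
    "410001,2020-01-02,temperature,22.0\n"
)
rows = list(csv.DictReader(sample))

# sqlite3 stands in for PostgreSQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE MeasurementsWk4 (
    station TEXT, date TEXT, metric TEXT, value REAL)""")

# Insert every CSV row; named placeholders pick the matching dict keys.
conn.executemany(
    "INSERT INTO MeasurementsWk4 VALUES (:station, :date, :metric, :value)",
    rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM MeasurementsWk4").fetchone()[0]
print(count)  # 2
```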
Exercise 3. Data Analysis in SQL + Querying a Database from Python
Still working in Jupyter, answer each of the following questions with an SQL query issued from Python, and show the result in the Jupyter notebook. The aim is to practise both analytical SQL queries and querying an existing database from a Python program.
(a) List the average water temperature per year.
(b) Find the minimum and the maximum water temperature per year.
(c) List the average water flow per station and year.
(d) List the number of temperature measurements per station, with the stations given by name and in descending order of the number of measurements.
(e) How many stations does each organisation have? List the organisations by name and in descending order of the number of associated stations.
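The query-from-Python pattern these questions call for looks like the sketch below, shown for question (a). It uses sqlite3 with a few made-up rows so it is self-contained; with psycopg2 you would run the same loop over a cursor, and in PostgreSQL you would extract the year with EXTRACT(YEAR FROM date) rather than strftime. The column names are assumptions about MeasurementsWk4.

```python
import sqlite3

# Stand-in table with a few assumed rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MeasurementsWk4 "
             "(station TEXT, date TEXT, metric TEXT, value REAL)")
conn.executemany("INSERT INTO MeasurementsWk4 VALUES (?,?,?,?)", [
    ("410001", "2019-06-01", "temperature", 10.0),
    ("410001", "2019-07-01", "temperature", 12.0),
    ("410001", "2020-06-01", "temperature", 14.0),
])

# (a) average water temperature per year
# (PostgreSQL: EXTRACT(YEAR FROM date) instead of strftime)
query = """
    SELECT strftime('%Y', date) AS year, AVG(value) AS avg_temp
    FROM MeasurementsWk4
    WHERE metric = 'temperature'
    GROUP BY year
    ORDER BY year
"""
for year, avg_temp in conn.execute(query):
    print(year, avg_temp)  # 2019 11.0 / 2020 14.0
```

The remaining questions follow the same shape, varying only the SELECT list, GROUP BY, joins, and ORDER BY.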
Exercise 4. Data Importing via command line (ADV)
Follow the instructions in the Jupyter notebook to load the data via psql.
The login process is similar to last week’s advanced exercises.
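For reference, a command-line load with psql typically uses the \copy meta-command, roughly as below. This is a sketch only: substitute your own credentials, and the file path and CSV options are assumptions about how you saved Measurements.csv.

```shell
# Connect with your lab credentials and bulk-load the CSV into the table;
# psql will prompt for your password (your SID).
psql -h soitpw11d59.shared.sydney.edu.au -p 5432 -U "y21s1d2x01 abcd1234" \
     -c "\copy MeasurementsWk4 FROM 'Measurements.csv' WITH (FORMAT csv, HEADER)"
```

Unlike server-side COPY, \copy reads the file from your own machine, which is why it works without file access on the database host.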
Exercise 5. Descriptive Statistics with SQL (ADV)
The following SQL questions are for students in the advanced stream (DATA2901). They refer back to the advanced SQL content covered in the advanced seminar.
(a) Find the average water temperatures per year and per station, as well as the averages per station and the overall temperature values per year – using a single SQL query.
(b) Find the five statistical values needed for multiple Tukey Boxplots on the value distributions of the water temperature measurements at station ’Murray River at Corowa’ per year.
(c) Are there any outliers of water temperature measurements at ’Murray River at Corowa’ per year? If yes, list them (per year).
(d) Is there a correlation between the annual water temperature measurements at ’Murray River at Albury’ and at ’Murray River at Barham’?
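The PostgreSQL features these questions point at can be sketched as the query strings below. They are shown as strings only, since none of them run outside PostgreSQL, and the table and column names are assumptions about MeasurementsWk4.

```python
# (a) per-year, per-station, and per-year-and-station averages in one query,
# via GROUPING SETS
q_a = """
    SELECT EXTRACT(YEAR FROM date) AS year, station, AVG(value) AS avg_temp
    FROM MeasurementsWk4
    WHERE metric = 'temperature'
    GROUP BY GROUPING SETS ((EXTRACT(YEAR FROM date), station),
                            (station),
                            (EXTRACT(YEAR FROM date)))
"""

# (b) quartiles for a boxplot via ordered-set aggregates
q_b = """
    SELECT EXTRACT(YEAR FROM date) AS year,
           MIN(value) AS lo,
           PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY value) AS q1,
           PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY value) AS median,
           PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) AS q3,
           MAX(value) AS hi
    FROM MeasurementsWk4
    GROUP BY EXTRACT(YEAR FROM date)
"""

# (d) Pearson correlation of the two stations' annual averages via CORR,
# pairing the stations on year with a self-join
q_d = """
    WITH annual AS (
        SELECT station, EXTRACT(YEAR FROM date) AS year, AVG(value) AS avg_temp
        FROM MeasurementsWk4
        WHERE metric = 'temperature'
        GROUP BY station, year
    )
    SELECT CORR(a.avg_temp, b.avg_temp)
    FROM annual a JOIN annual b USING (year)
    WHERE a.station = 'Murray River at Albury'
      AND b.station = 'Murray River at Barham'
"""
print(len(q_a), len(q_b), len(q_d))
```

For (c), the usual Tukey rule flags values outside [q1 − 1.5·IQR, q3 + 1.5·IQR]; one way is to join each measurement against the per-year quartiles from a query like q_b.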
Exercise 6. SQL Grok Exercises
Please wait for the announcement on Ed.