$30
Overview
· This exam consists of a ”theory” component, and a ”practical” component.
· This exam will count as 10% of your final course grade.
· All work must be done individually.
· This exam is open book and open notes. That includes any online documentation needed for the practical component.
The ”theory” component of the exam should be completed manually, either by printing this document, adding your solutions by hand, and submitting a scan or photo of your hand-written solutions, or by creating a document of some kind with your solutions. Please limit your submissions to text documents, pdf documents, or images (in other words, please refrain from submitting .doc or .docx type documents). The ”theory” component is worth 40 points.
The ”practical” component of the exam is described in the readme.md document included in the same directory of the course git repo as this file. The ”practical” component is worth 60 points.
For the purposes of this exam, assume you’ve been hired to work on a software application that will be used by a store selling used records and CD’s.
You’ve been informed that the database will need to use the following relations:
· Album(albumId, artistId, title, year, publisher)
· Song(title, trackNumber, albumId, length, genre, composer)
· Artist(artistId, name, birthdate)
· Inventory(albumId, numberInStock, price, upc)
· Streaming(songTitle, albumId, provider, url)
Album describes information related to an individual record or CD. Assume that 127 characters is sufficient for the title.
Song describes the individual song tracks on an Album. Length should be in seconds. Genre will be between one and four genre labels, separated by commas.
Artist describes the artist who performed the songs on the Albums. Assume that 255 characters might be needed for their names.
Inventory describes the inventory currently available in the store. upc is the “Universal Product Code”–it’s the numerical representation of the barcode you sometimes see on labels. The number is effectively random (there’s no ordering to the system). An Integer won’t be sufficient to hold it.
Streaming describes where the songs may be found (legally) online, should patrons be interested. Album.albumId and Artist.artistId are both 32 bit integers.
1. (a) (3 points) Because the primary key for Song is (title, albumId), Postgres automatically creates a B-Tree index for it. Assume that our database is running on a system with a block size of 8192 bytes and an address length of 8 bytes. How many key values can be represented in a leaf node of the B-Tree (remember that VARCHAR fields require an additional character for null-termination)?
(b) (5 points) Why is it preferable to define the key as PRIMARY KEY (title, albumId) rather than PRIMARY KEY(albumId, title)?
2. One of your coworkers wrote a Python web application to allow customers to find out what’s in stock based on album title and year. Part of the application sets up a connection to the database, creates a cursor on that connection, then passes the cursor to the following function along with values for the year and the name of the album (taken from user input on a webpage):
def find_albums(year, album_title, cursor):
query = \
"SELECT a.title, i.numberInStock, i.price" \
"FROM Album a join Inventory i on a.albumId = i.albumId" \ "WHERE a.title ilike '7" + album_title + "7'" \
"AND a.year > 1984"
cursor.execute(query, (year,))
return [
(title, number, price) for title, number, price in cursor.fetchall() ]
(a) (3 points) Which parameter of the find_albums function is vulnerable to SQL Injection?
(b) (5 points) Explain how you would fix your coworker’s code
3. Assume your database uses Redo logging for transaction management. (Postgres actually uses a form of Redo logging called ”Write-Ahead Logging,” but you should assume the simplified version covered in class and the textbook).
After a series of sales and processing of new inventory, a janitor accidentally unplugs the computer hosting the database server. When the system is powered back up, the Inventory relation and the log exist as defined below (upc and price have been intentionally omitted for simplicity). Note that the log update statements all have the form (t, a, n) where t is the transaction id, a the album id, and n the value for a.
Log
Start T1
Inventory
albumId numberInStock
(T1, 10, 3)
(T1, 18, 5)
Start T2
(T1, 22, 1)
(T2, 15, 3) Commit T1 (T2,30,7) Start T3
10
6
Start T4
15
2
Commit T2
22
5
(T4, 15, 4)
18
4
Start T5
30
6
(T5, 15, 2)
(T3, 22, 6)
(T3, 10, 6)
Start T6
Abort T4
(T5, 22, 5)
(T6, 18, 4) Commit T3
(T6, 30, 6)
(a) (3 points) Which transactions should have their actions should be reflected in the Inventory relation?
(b) (10 points) After the system uses the log to reconstruct the Inventory relation, what will the proper values be?
For the last two questions, don’t assume any connection to the Record Store scenario.
4. (4 points) Assume relations R and S stored on disk. R is clustered and fits in 1024 blocks. S is also clustered and fits in 4096 blocks. Your physical query plan requires a set union between R and S (R∪S). You have 256 blocks of memory available for your algorithm. How many Disk I/Os are required for the operation? Please show any work involved in arriving at your answer.
5. Assume the existence of a database for the purposes of gathering Census data.
(a) (4 points) Alice creates a schema including a table named Housing. She then delegates privileges to Bob by executing GRANT SELECT ON Housing TO Bob WITH GRANT OPTION;
What happens when Bob runs GRANT SELECT ON Housing TO Carol?
(b) (3 points) Bob leaves to take a job elsewhere, so Alice runs REVOKE SELECT ON Housing FROM Bob RESTRICT;
What happens, and why?