Starting from:

$25

B561 - Assignment 5 - Object-relational databases - Solved

Nested relational and semi-structured databases



For this assignment, you will need the material covered in the lectures

•    Lecture 13: Object-relational databases and queries

•    Lecture 14: Nested relational and semi-structured databases



1           Formulating Query in Object-Relational SQL
For the problems in the section, you will need to use the polymorphically defined functions and predicates that are defined in the document

SetOperationsAndPredicates.sql Functions

set union(A,B)
A ∪ B
set intersection(A,B)
A ∩ B
set difference(A,B)
A − B
add element(x,A)
{x} ∪ A
remove element(x,A)
A − {x}
make singleton(x)
{x}
choose element(A)
choose some element from A
bag union(A,B)
the bag union of A and B
bag to set(A)

Predicates
coerce the bag A to the corresponding set
is in(x,A)
x ∈ A
is not in(x,A)
x 6∈ A
is empty(A)
A = ∅
is not emptyset(A)
A 6= ∅
subset(A,B)
A ⊆ B
superset(A,B)
A ⊇ B
equal(A,B)
A = B
overlap(A,B)
A ∩ B 6= ∅
disjoint(A,B)
A ∩ B = ∅
We now turn to the problems in this section. You will need use the data provided for the Person, Company, companyLocation, worksFor, jobSkill, personSkill, and Knows relations. But before turning to the problems, we will introduce various object-relational views defined over these relations:[1]

•    The view companyHasEmployees(cname,employees) which associates with each company, identified by a cname, the set of pids of persons who work for that company.

create or replace view companyHasEmployees as

select cname, array(select pid from            worksfor w

where w.cname = c.cname order by 1) as employees

                 from         company c order by 1;

•    The view cityHasCompanies(city,companies) which associates with each city the set of cnames of companies that are located in that city.

create or replace view cityHasCompanies as

select city, array_agg(cname order by 1) as companies from        companyLocation group by city order by 1;

•    The view companyHasLocations(cname,locations) which associates with each company, identified by a cname, the set of cities in which that company is located.

create or replace view companyHasLocations as select cname, array(select city from               companyLocation cc

where c.cname = cc.cname order by 1) as locations from           company c order by 1;

•    The view knowsPersons(pid,persons) which associates with each person, identified by a pid, the set of pids of persons he or she knows.

create or replace view knowsPersons as select p.pid, array(select k.pid2 from     knows k where k.pid1 = p.pid order by pid2) as persons

                 from          person p order by 1;

•    The view isKnownByPersons(pid,persons) which associates with each person, identified by a pid, the set of pids of persons who know that person. Observe that there may be persons who are not known by any one.

create or replace view isKnownByPersons as

select distinct p.pid, array(select k.pid1

from           knows k where k.pid2 = p.pid) as persons

                 from          person p order by 1;

•    The view personHasSkills(pid,skills) which associates with each person, identified by a pid, his or her set of job skills.

create or replace view personHasSkills as

select distinct p.pid, array(select s.skill from personSkill s where s.pid = p.pid order by 1) as skills

                 from          person p order by 1;

•    The view skillOfPersons(skills,persons) which associates with each job skill the set of pids of persons who have that job skill.

create or replace view skillOfPersons as select js.skill, array(select ps.pid from    personSkill ps where ps.skill = js.skill order by pid) as persons

                   from              jobSkill js order by skill;

In the problems in this section, you are asked to formulate queries in objectrelational SQL. You should use the set operations and set predicates defined in the document SetOperationsAndPredicates.sql, the relations

Person
Company Skill worksFor

and the views

companyHasEmployees cityHasCompanies companyHasLocations knowsPersons isKnownByPersons personHasSkills skillOfPersons

However, you are not permitted to use the Knows, companyLocation, and personSkill relations in the object-relation SQL formulation of the queries. Observe that you actually don’t need these relations since they are encapsulated in these views.

Before listing the queries that you are asked to formulate, we present some examples of queries that are formulated in object-relational SQL using the assumptions stated in the previous paragraph. Your solutions need to be in the style of these examples. The goals is to maximize the utilization of the functions and predicates defined in document SetOperationsAndPredicates.sql.

Example 1 Consider the query “Find the pid of each person who knows a person who has a salary greater than 55000.”[2]

select distinct pk.pid from         knowsPersons pk, worksfor w where is in(w.pid, pk.persons) and w.salary > 55000 order by 1;

Note that the following formulation for this query is not allowed since it uses the relation Knows which is not permitted.

select distinct k.pid1 from          knows k, worksfor w where k.pid2 = w.pid and w.salary > 55000;

Example 2 Consider the query “Find the pid and name of each person p who (1) has both the AI and Programming and (2) knows at least 5 persons, and report the number of persons who know p.”[3]

select p.pid, p.pname, (select cardinality(kp.persons)

from            isKnownByPersons kp where kp.pid = p.pid) as ct_knownByPersons

from         person p where p.pid in (select ps.pid from personHasSkills ps

where subset(’"AI", "Programming"’, ps.skills)) and

cardinality((select kp.persons from knowsPersons kp where kp.pid = p.pid)) >= 5;

Example 3 Consider the query “Find the pid and name of each person along with the set of his of her skills that are not among the skills of persons who work for ‘Netflix’”.[4]

select p.pid, p.pname, set difference((select ps.skills from            personHasSkills ps where ps.pid = p.pid), array(select unnest(ps.skills) from   personHasSkills ps where is  in(ps.pid, (select employees from       companyHasEmployees where cname = ’Netflix’))))

from        person p;

1.    Formulate the following queries in object-relational SQL.

(a)    Find the cname and headquarter of each company that employs at least two persons who each have both the AI and the Programming job skills.

(b)    Find each skill that is not a job skill of any person who works for Yahoo or for Netflix.

(c)     Find the set of companies that employ at least 3 persons who each know at least five persons. (So this query returns only one object, i.e., the set of companies specified in the query.)

(d)    Find the pid and name of each person p along with the set of pids of persons who (1) know p and (2) who have the AI skill but not the Programming skill.

(e)    Find each pair (s1,s2) of different skills s1 and s2 such that the number of employees who have skill s1 and who make strictly more than 55000 is strictly less than the number of employees who have skill s2 and who make at most 55000.

(f)      (Practice Problem: not-graded).

Find each (c,p) pair where c is the cname of a company and p is the pid of a person who works for that company and who is known by all other persons who work for that company.

(g)    (Practice Problem: not-graded).

Find the pid and name of each person who has all the skills of the combined set of job skills of the highest paid persons who work for Yahoo.

2.    Find the following set of sets

{S | S ⊆ Skill ∧ |S| ≤ 3}.

I.e., this is the set consisting of each set of job skills whose size (cardinality) is at most 3.

3.    (Practice Problem: not-graded).

Reconsider Problem 2. Let

S = {S | S ⊆ Skill ∧ |S| ≤ 3}.

Find the following set of sets

{X | X ⊆ S ∧ |X| ≤ 2}.

4.    Let t be a number called a threshold. We say that a (unordered) pair of different person pids {p1,p2} co-occur with frequency at least t if there are at least t skills that are skills of both the person with pid p1 and the person with pid p2.

Write a function coOccur(t integer) that returns the (unordered) pairs {p1,p2} of person pid that co-occur with frequency at least t.

Test your coOccur function for t in the range [0,3].

5.    Let A and B be sets such that A ∪ B 6= ∅. The Jaccard index J(A,B) is defined as the quantity

 .

The Jaccard index is a frequently used measure to determine the similarity between two sets. Note that if A∩B = ∅ then J(A,B) = 0, and if A = B then J(A,B) = 1.

Let t be a number called a threshold. We assume that t is a float in the range [0,1].

Write a function JaccardSimilar(t float) that returns the set of unordered pairs {s1,s2} of different skills such that the set of persons who have skill s1 and the set of persons who have skill s2 have a Jaccard index of at least t.

Test your function JaccardSimilar for the following values for t: 0, 0.25, 0.5, 0.75, and 1.


2           Nested Relations and Semi-structured databases
Consider the lecture on Nested relational and semi-structured databases. In that lecture we considered the studentGrades nested relation and the jstudentGrades semi-structured database and we constructed these using a PostgreSQL query starting from the Enroll relation.

6. Write a PostgreSQL view courseGrades that creates the nested relation of type

(cno,gradeInfo{(grade,students{(sid)})})
This view should compute for each course, the grade information of the students enrolled in this course. In particular, for each course and for each grade, this relation stores in a set the sids students who obtained that grade in that course.

Test your view.

7.    Starting from the courseGrades view in Problem 6 solve the following queries:

(a)    Find each pair (c,S) where c is the cno of a course and S is the set of sids of students who received an ‘A’ or a ‘B’ in course c. The type of your answer relation should be (cno : text,Students : {(sid : text)}).

(b)    Find each (s,C) pairs where s is the sid of a students and C is the set of cnos of courses in which the student received an ‘A’. The type of your answer relation should be (sid : text,Courses : {(cno : text)}).

(c)    (Practice Problem: not-graded).

Find each cno c where c is a course in which all students received the same grade.

8.    Write a PostgreSQL view jcourseGrades that creates a semi-structured database which stores jsonb objects whose structure conforms with the structure of tuples as described for the courseGrades in Problem 6.

Test your view.

9.    Starting from the jcourseGrades view in Problem 8 solve the following queries. Note that the output of each of these queries is a nested relation.

(a)    Find each pair (c,s) where c is the cno of a course and s is the sid of a student who did not received an ‘A’ in course c. The type of your answer relation should be (cno:text, sid:text).

(b)    Find each pair ({c1,c2},S) where c1 and c2 are the course numbers of two different courses and S is the set of sids of students who received a ’B’ in both courses c1 and c2. The type of your answer relation should be (coursePair : {(cno : text)},Students : {(sid : text))}.


 
[1] The various order by clauses in these views are not essential: they simply aid to read the data more easily.
[2] In this example, focus on the is in predicate.
[3] In this example, focus on the set (array) construction ’{"AI", "Programming"}’ and the subset predicate. Also focus on the use of cardinality function.
[4] In this example, focus on (1) the set  difference operation and (2) the unnest operation followed by a set (array) construction.

More products