Starting from:

$30

EECS484- Homework 5 Solved

Location:

1017 Dow (Special considerations)

1670 BBB

STAMPS

(will announce the exact room assignment by Monday noon)

 

The exam will be closed book and closed notes. A single sheet is allowed per student (both sides can be used)
 

Requirements:

a.      Completely hand-written

b.      Your name should be on both sides

c.       Cannot make any copies or share with others until after the exam

d.      Cannot use other students’ sheets to make yours

 


Calculators Allowed
 

You are allowed to bring a SAT-like calculator. The use of any other electronics (cell phones, ipads, etc) is strictly prohibited and will be considered a violation of the honor code. You cannot share a calculator with other students during the exam.

 



Exam Format
 

There will be some multiple choice questions on the final exam. These questions will be answered on a scantron form and will be auto graded. Therefore, it is your responsibility to bring your own No. 2 pencil (and an eraser) for the scantron questions AS WELL AS  a pen for other questions. We will not be providing pens, pencils, erasers, etc.

 



The exam will cover: 

 

Everything we’ve done after the midterm but also includes indexing and B+ tree (lectures, discussion, h/w, projects)

 

 

 

1.1 Which of the following statements about static hashing is (are) true:

 

i)  number of primary bucket pages fixed

ii) allocated sequentially iii) never de-allocated iv) overflow pages if needed

v) use a directory of pointers to buckets vi) works in round

 

A.       i), ii), iii)

B.       i), ii), iii), iv)

C.      iii)

D.      iii), iv)

E.       iii), iv), v)

F.       iii), iv), vi)

1.1 Which of the following statements about static hashing is (are) true:

 

i)  number of primary bucket pages fixed

ii) allocated sequentially iii) never de-allocated iv) overflow pages if needed

v)    use a directory of pointers to buckets Extendible hashing

vi)   works in round Linear hashing

 

A.       i), ii), iii)

B.       i), ii), iii), iv)

C.      iii)

D.      iii), iv)

E.       iii), iv), v)

F.       iii), iv), vi)

1.1   Which of the following statements about static hashing is (are) true:


1.2   Which of the following statements is true about sorting?

 

A.      We need sorting for the first step to bulk-loading B+ tree, eliminating duplicates records, sort-merge join algorithm, the first step for grace hash join, etc.

B.      Both using more buffers and using replacement sort at each step can reduce the number of passes, thus making the external sorting faster.

C.     All of blocked I/O, double buffering and B+ tree indexing can make each pass cheaper, thus making external sorting faster. D.  None of above is true.

1.2 Which of the following statements is true about sorting?

 

A.      We need sorting for the first step to bulk-loading B+ tree, eliminating duplicates records, sort-merge join algorithm, the first step for grace hash join, etc.

B.      Both using more buffers and using replacement sort at each step can reduce the number of passes, thus making the external sorting faster.

C.     All of blocked I/O, double buffering and B+ tree indexing can make each pass cheaper, thus making external sorting faster. D.  None of above is true.

1.2   Which of the following statements is true about sorting?

                     


1.3   Consider: A relation with 1M tuples, 100 tuples on a page and 500 (key, rid) pairs on a page

 

A.      If the selectivity is 1%, the I/O cost by using non-clustered B+ tree index and the I/ O cost by using non-clustered B+ tree index and sorted RIDs are similar.

B.      If the selectivity is 10%, the I/O cost by using non-clustered B+ tree index and the I/ O cost by using non-clustered B+ tree index and sorted RIDs are similar.

C.     If non-clustered B+ tree index is used, no matter the selectivity is 1% or 10%, the I/ O cost is similar.

D.     When making choice of B+ tree index access plan, we consider the index selectivity and clustering. For hash index, the consideration is very different.

 

1.3 Consider: A relation with 1M tuples, 100 tuples on a page and 500 (key, rid) pairs on a page

                   


1.3 Consider: A relation with 1M tuples, 100 tuples on a page and 500 (key, rid) pairs on a page

 

A.      If the selection is 1%, the I/O cost by using non-clustered B+ tree index and the I/O cost by using non-clustered B+ tree index and sorted RIDs are similar.

B.      If the selection is 10%, the I/O cost by using non-clustered B+ tree index and the I/ O cost by using non-clustered B+ tree index and sorted RIDs are similar.

C.     If non-clustered B+ tree index is used, no matter the selection is 1% or 10%, the I/O cost is similar.

D.     When making choice of B+ tree index access plan, we consider the index selectivity and clustering. For hash index, the consideration is very different.

 

1.4 Which of the following statements is true about join algorithms?

 

A.      Sort-merge join can be optimized to 3(|R| + |S|) I/Os when smaller relation |S| <= B^2, while hash join costs 3(|R| + |S|) I/Os too when smaller relation |R| <= B^2.

B.      If the relation size differ greatly, or partitioning is skewed, hash join is better than sortmerge join.

C.      Both BNL join and INL join with B+tree index work when we have inequality conditions (e.g. R.rname < S.sname), and BNL is usually better.

D.      Hash join does not work if there are equalities over several attributes while BNL join,

INL join, and sort-merge join work.

1.4 Which of the following statements is true about join algorithms?

 

A.      Sort-merge join can be optimized to 3(|R| + |S|) I/Os when smaller relation |S| <= B^2, while hash join costs 3(|R| + |S|) I/Os too when smaller relation |R| <= B^2.

B.      If the relation size differ greatly, or partitioning is skewed, hash join is better than sortmerge join.

C.      Both BNL join and INL join with B+tree index work when we have inequality conditions (e.g. R.rname < S.sname), and BNL is usually better.

D.      Hash join does not work if there are equalities over several attributes while BNL join,

INL join, and sort-merge join work.

1.4 Which of the following statements is true about join algorithms?



1.4 Which of the following statements is true about join algorithms?

                 


1.4 Which of the following statements is true about join algorithms?

                   


1.4 Which of the following statements is true about join algorithms?

                  


1.4   Which of the following statements is true about join algorithms?

                 


1.5   Which of the following RA equivalence is wrong?

 

A.      σP1⋀P2 … ⋀Pn (R) ≡ σP1(σP2( … σPn(R)))  (cascading σ)

 

B.      ∏a1(R) ≡ ∏a1(∏a2(…∏ak (R)…)), ai+1 belongs to ai (cascading ∏)

 

C.     ΠA(σc (R)) ≡ σc (ΠA(R)) (if selection only involves attributes in the set A)

 

D.     σP (R X S) ≡ σP (R) X S (if p is only on R) 

 

1.5 Which of the following RA equivalence is wrong?

 

A.      σP1⋀P2 … ⋀Pn (R) ≡ σP1(σP2( … σPn(R)))  (cascading σ)

 

B.      ∏a1(R) ≡ ∏a1(∏a2(…∏ak (R)…)), ai+1 belongs to ai (cascading ∏)

 

C.     ΠA(σc (R)) ≡ σc (ΠA(R)) (if selection only involves attributes in the set A)

 

D.     σP (R X S) ≡ σP (R) X S (if p is only on R) 

 



1.          Is       it       recoverable? Is  it       in      ACA?

2.          Would        this   deadlock   under        2PL?

3.          Is       it       conflict      serializable?      

 

      



1.          Is       it       recoverable? Is  it       in      ACA? NO       NO

2.          Would        this   deadlock   under        2PL? NO 3.  Is       it       conflict serializable?       YES

      

4. Describe two ways to address or avoid deadlock among transactions.

 

 

 

 

4. Describe two ways to address or avoid deadlock among transactions.

 

Any two of:

1)  Simply abort and restart transactions that take too long

2)  Check the waits-for graph.  If there’s a cycle, abort one

3)  Wait-die:  Assign each tx a priority.  If Ti requests a lock held by Tj, then Ti only waits if it has a higher priority.  Otherwise Ti aborts 4) Wound-wait:  Assign each tx a priority.  If Ti has a higher priority, abort Tj. Otherwise Ti waits.

 

 

4. Describe two ways to address or avoid deadlock among transactions.

 



4.   Describe two ways to address or avoid deadlock among transactions.



5.   Imagine that an evil genie has told you that your database can be NO-FORCE or STEAL, but not both. Your entire database can easily fit into RAM. Which feature (NO-FORCE or STEAL) will likely yield the better performing system? Why?

 

 

 

 

 

 

 

 

5.  Imagine that an evil genie has told you that your database can be NO-FORCE or STEAL, but not both. Your entire database can easily fit into RAM. Which feature (NO-FORCE or STEAL) will likely yield the better performing system? Why?

 

You should choose NO-FORCE.

If your database can fit into memory, there will be little or no pressure on the buffer pool and thus so no need for STEALing.  NO-FORCE, on the other hand, will still be useful to make sure users’ transactions enjoy low latency.  

 

 

 

 

Why       is      wri%ng     to     the   log    faster        than simply      flushing    dirty pages       upon      commit?           Give 2       reasons.  

     

     

Why       is      wri%ng     to     the   log    faster        than simply      flushing    dirty pages       upon      commit?           Give 2       reasons.  

     

     

1)  Updates    to     log    are   smaller     than full   pages,      even when        considering      the  before/aOer    images    

2)  Log    addi%ons are   sequen%al      

Why is      wri%ng     to     the   log    faster        than simply      flushing    dirty pages       upon         commit?           Give 2       reasons.  



LSN    LOG   

1.       start      T1      

2.       start      T2      

3.       update: T1       writes to        P1      

4.       update: T2       writes to        P1      

5.       start      T3      

6.       update: T3       writes to        P2      

7.       update: T1       writes to        P1      

8.       update  T2       writes to        P2      

9.       T1           commit        

10.    T1           end    

11.    begin     checkpoint  

12.    end        checkpoint  

13.    update: T3       writes to        P1      

14.    update: T2       writes to        P3      

a)      Draw        the   transac%on      table         and  dirty page table         before      the   crash       


LSN    LOG   

1.       start      T1      

2.       start      T2      

3.       update: T1       writes to        P1      

4.       update: T2       writes to        P1      

5.       start      T3      

6.       update: T3       writes to        P2      

7.       update: T1       writes to        P1      

8.       update  T2       writes to        P2      

9.       T1           commit        

10.    T1           end    

11.    begin     checkpoint  

12.    end        checkpoint  

13.    update: T3       writes to        P1      

14.    update: T2       writes to        P3      

a)      Draw        the   transac%on      table         and  dirty page table         before      the   crash       

TX_ID
LSN
Status
T3
15
U
T2
14
U
Page ID
LSN
 
P1
3
 
P2
6
 
P3
14
 
LSN    LOG   

1.       start      T1      

2.       start      T2      

3.       update: T1       writes to        P1      

4.       update: T2       writes to        P1      

5.       start      T3      

6.       update: T3       writes to        P2      

7.       update: T1       writes to        P1      

8.       update  T2       writes to        P2      

9.       T1           commit        

10.    T1           end    

11.    begin     checkpoint  

12.    end        checkpoint  

13.    update: T3       writes to        P1      

14.    update: T2       writes to        P3      

b)      Draw        the   transac%on      table         and  dirty page table         aOer the   analysis    phase


LSN    LOG   

1.       start      T1      

2.       start      T2      

3.       update: T1       writes to        P1      

4.       update: T2       writes to        P1      

5.       start      T3      

6.       update: T3       writes to        P2      

7.       update: T1       writes to        P1      

8.       update  T2       writes to        P2      

9.       T1           commit        

10.    T1           end    

11.    begin     checkpoint  

12.    end        checkpoint  

13.    update: T3       writes to        P1      

14.    update: T2       writes to        P3      

b)       Draw        the   transac%on      table         and  dirty page table         aOer the   analysis    phase      

TX_ID
LSN
Status
T3
6
U
T2
8
U
Page ID
LSN
 
P1
3
 
P2
6
 

LSN  LOG   

1.       start   T1      

2.       start   T2      

3.       update:         T1       writes to        P1      

4.       update:         T2       writes to        P1      

5.       start   T3      

6.       update:         T3       writes to        P2      

7.       update:         T1       writes to        P1      

8.       update           T2       writes to        P2      

9.       T1       commit        

10.    T1       end    

11.    begin  checkpoint  

12.    end     checkpoint  

13.    update:         T3       writes to        P1       14.  update: T2       writes to        P3      

15.    update:         T3       writes to        P2      

16.    CRASH

 

c)        At     what LSN   will   the   ANALYSIS  phase        start processing         the   log?          

                                                     

d)       At     what LSN   will   the   REDO        phase        start processing         the   log? 

                                            

e)       Imagine    that  you   are   building    a       new  database  system      which        implements       ARIES-style        recovery.  Your tes%ng      reveals      that  recovery   is      taking        far    too       long. What         is      one   well-established        way  you   can   speed        up     the   ANALYSIS       phase?     

                                            

f)         Again,       you’re       building    a       new  database  system.     You   find  a       bug   in      the       buffer       manager   that  means       dirty pages        are   rarely        flushed.    Which       recovery   phase        ---     ANALYSIS, REDO,       OR    UNDO       ---     will   likely take  a       very       long  %me to     execute?  

                                                                       

         

LSN  LOG   

1.       start   T1      

2.       start   T2      

3.       update:         T1       writes to        P1      

4.       update:         T2       writes to        P1      

5.       start   T3      

6.       update:         T3       writes to        P2      

7.       update:         T1       writes to        P1      

8.       update           T2       writes to        P2      

9.       T1       commit        

10.    T1       end    

11.    begin  checkpoint  

12.    end     checkpoint  

13.    update:         T3       writes to        P1       14.  update: T2       writes to        P3      

15.    update:         T3       writes to        P2      

16.    CRASH

 

c)        At     what LSN   will   the   ANALYSIS  phase        start processing         the   log?          

                                                               LSN   11    

d)       At     what LSN   will   the   REDO        phase        start processing         the   log? 

                                                               LSN   3      

e)       Imagine    that  you   are   building    a       new  database  system      which        implements       ARIES-style        recovery.  Your tes%ng      reveals      that  recovery   is      taking        far    too       long. What         is      one   well-established        way  you   can   speed        up     the   ANALYSIS       phase?     

                                                               Checkpoint        more         frequently        

f)         Again,       you’re       building    a       new  database  system.     You   find  a       bug   in      the       buffer       manager   that  means       dirty pages        are   rarely        flushed.    Which       recovery   phase        ---     ANALYSIS, REDO,       OR    UNDO       ---     will   likely take  a       very       long  %me to     execute?  

                                                               REDO       

         

Ques%on 5:  Join  
Suppose we have two relations R and S we want to join on the condition R.a = S.a. R consists of 100 pages and S consists of 400 pages. Suppose we have 5 Buffer pages. Each page of R contains 75 tuples and each page of S contains 50 tuples. (using sort merge with no replacement sort optimization)

 

a)        What is the cost of sorting each relation?

 

b)        What is the cost of the join in the best case?

 

c)        What is the cost of the join in the worst case? When would this occur?

  

d)        What is one improvement to sort-merge join that would reduce the overall cost of the algorithm? How much is saved?

  

Ques%on 5:  Join  
Suppose we have two relations R and S we want to join on the condition R.a = S.a. R consists of 100 pages and S consists of 400 pages. Suppose we have 5 Buffer pages. Each page of R contains 75 tuples and each page of S contains 50 tuples. (using sort merge with no replacement sort optimization)

a)        What is the cost of sorting each relation?

 Sorting R= 2*(100) * (ceil(log4(100/5)) + 1) = 800 page I/Os

 Sorting S = 2*(400) * (ceil(log4(400/5)) + 1) = 4000 page I/Os

b)        What is the cost of the join in the best case?

 Best case would be only having to read through R and S once. 

         100 + 400 = 500 page I/Os

c)        What is the cost of the join in the worst case? When would this occur?

 Worst case would be when R.a and S.a are all equal.

 100*400 = 40,000 Page I/Os 

d)        What is one improvement to sort-merge join that would reduce the overall cost of the algorithm? How much is saved?

 Start the merge during the last pass of sorting rather than R/W the sorted relations. 

         Savings = 2*100+2*400 = 1000 I/Os (saves cost of last pass)

Thursday, April 20

7:00 - 9:00pm

 

 

 

ALL THE BEST FOR THE EXAM 
HW5 solution

T1
T2
T3
T4
 
 
R(Z)
 
 
 
W(Z)
 
 
R(X)
 
 
 
W(X)
 
 
 
R(Y)
 
 
R(X)
 
 
 
 
 
 
R(Y)
 
 
 
COMMIT
COMMIT
 
 
 
 
COMMIT
 
 
Question    1  

(A)  [4 points] Is this schedule recoverable? Explain.

(B)  [4 points] Does this schedule avoid cascading aborts? Explain.

(C)  [4 points] Could it be generated by non-strict 2PL? Explain.

(D)  [4 points] Could it be generated by strict 2PL? Explain.

 

Question    1  
A)     No. T1 read value from T2 and commits before T2 commits

B)     No. Not even recoverable. A schedule avoids cascading aborts if the transactions only read the changes of committed transactions.

C)    Yes. Every Tx can release its locks at the moment it completes its operation.

D)    No. T2 will not release its exclusive lock on X until it commits. So T1 could not have acquire shared lock.

Question    2  
Determine if the following schedules are: conflict serializable, and/or serializable. If it is, provide a possible equivalent serial schedule.

 


Question    2A

T1
T2
T3
R(A)
 
 
 
W(A)

COMMIT
 
 
 
R(A)
 
 
W(A)

COMMIT
W(A)

COMMIT
 
 
•          conflict serializable: no, T3 T1 cycle

•          serializable: no (Considering the final writes, only T2T3T1, T3T2T1 are possible. However both modify A so R(A) from T1 will be

affected)

 

Question    2B

T1
T2
T3
R(A)
 
 
 
 
R(B) R(A)
 
W(A)
 
R(B)
 
 
 
W(B)

COMMIT
 
W(B)

COMMIT
 
 
 
 
COMMIT
•          conflict serializable: no, T2 T1 cycle

•          serializable: no (final write on B is from T1, so T1 must be the last. But T2 will modify B that T1 read)

 

Question    2C

T1
T2
T3
 
 
R(A)
R(B)

W(B)

COMMIT
 
 
 
 
W(B)
 
R(B) W(A)
 
 
 
W(A)

COMMIT
 
ABORT
 
•          conflict serializable: Yes. Following the definition, ignore aborted transaction, and other Tx has no cycle in precedence graph T1 T3

•          serializable: yes T1 T3

 

Question    2D    

T1
T2
T3
T4
 
 
 
R(A)
R(B) R(D)
 
 
 
 
 
 
W(B)
 
 
R(B) W(C)
 
R(C)

W(B)

COMMIT
 
 
 
 
R(C)

W(B)

COMMIT
 
 
 
 
COMMIT
 
 
 
 
R(C)

W(C)

W(B)

COMMIT
•          conflict serializable: no, T3 T4 cycle, T1 T4 cycle

•          serializable: yes, [T1T2T3]T4

 


T1
T2
T3
T4
 
 
 
R(A)
R(B) R(D)
 
 
 
 
 
 
W(B)
 
 
R(B) W(C)
 
R(C)

W(B)

COMMIT
 
 
 
 
R(C)

W(B)

COMMIT
 
 
 
 
COMMIT
 
 
 
 
R(C)

W(C)

W(B)

COMMIT
Question    3  

•          B will not deadlock

•          C will not

•          D will deadlock, on T2, T3, T4


T1
T2
T3
T4
 
 
 
R(A)
R(B) R(D)
 
 
 
 
 
 
W(B)
 
 
R(B) W(C)
 
R(C)

W(B)

COMMIT
 
 
 
 
R(C)

W(B)

COMMIT
 
 
 
 
COMMIT
 
 
 
 
R(C)

W(C)

W(B)

COMMIT
Question    3  

•          B will not deadlock

•          C will not

•          D will deadlock, on T2, T3, T4


Question    4  
Consider the following example of operations on a DBMS:

•          Transaction 1 modifies Page 1 •       Transaction 1 modifies Page 3

•          Transaction 2 modifies Page 2

•          Crash!!!!

Assume that:

The memory can only accommodate 2 pages

Avoid writing pages back to disk if not necessary ( do not force )

No logging or any other kind of book keeping technique is used here

 

(A)  At timestamp 3, What does the DBMS have to do to make sure there is space to modify Page 2?

(B)  After timestamp 4, the DBMS came back without running any recovery protocol. Which property in ACID does it violates now? Explain it in 1~2 sentences.

Question    4  
Now consider another scenario:

•          Transaction 1 modifies Page 1

•          Transaction 1 modifies Page 3

•          Transaction 1 commits •   Transaction 2 modifies Page 2

•          Crash!!!!

Again, assume that:

The memory can accommodate 4 pages

Avoid writing pages back to disk if not necessary ( do not force )

No logging or any other kind of book keeping technique is used here

(A)  After timestamp 5., the DBMS came back without running any recovery protocol. Which property in ACID does it violates now? Explain it in 1~2 sentences.

(B)  How can we restore this property without using logging? Is there any tradeoff?

 

Question    4  
(A)  DBMS has to write a uncommitted page to disk(steal)

(B)  Atomicity, only a part of Tx1 persists 

(C) Durability, committed data loss

(D) Force write when a Tx commits. Higher cost.

 

 

LSN
LOG
1
T1 writes to P1
2
T2 writes to P2
3
T3 writes to P3
4
T1 writes to P1
5
T2 writes to P2
6
T1 commit
7
T1 end
8
begin checkpoint
9
end checkpoint (along with the dirty page table and transaction table)
10
T3 writes to P1
11
T3 commit
12
T3 end
13
T2 writes to P3
14
CRASH
Question    5  

Consider the following log file, which was found on disk, and the ARIES recovery protocol:

(A)  Describe the dirty page table at the end of the ANALYSIS phase of recovery.

(B)  Describe the transaction table at the end of the ANALYSIS phase of recovery

(C)  Which transactions are “loser” transactions

(D)  Are there any new compensation log records (CLRs) written during the REDO phase of recovery? Explain.

(E)  How many CLR records are added in the UNDO phase? Briefly describe them.

 

Question    5  
  Page ID
recLSN
P1
1
P2
2
P3
3
(A) 

 

 

 

 

XID
Last LSN
Status
T2
13
U
 

(B)  T2

(C)  No, REDO is applied only to logged actions, so there is no additional logging required. (A CLR is added only during the UNDO phase if there is a need to undo an update.)

(D)  3 - 3 CLRs for undoing T2’s update to P3, P2(twice)

More products