Starting from:

$30

CS2100 Tutorial 10-Pipelining  Solved




D1. Suppose the four stages in some 4-stage pipeline take the following timing: 2ns, 3ns, 4ns, and 2ns. Given 1000 instructions, what is the speedup (in two decimal places) of the pipelined processor compared to the non-pipelined single-cycle processor?

 

D2. Let’s try to understand pipeline processor by doing a detailed trace. Suppose the pipeline registers (also known as pipeline latches) store the following information:

IF / ID 
   

 
ID / EX 
   

 
EX / MEM    

 
MEM / WB 
 
 
MemToReg 
 
MToR 
 
MToR 
 
RegWr 
 
RegWr 
 
MemRd 
 
MemRd 
 
MemWr 
 
Branch 
 
MemWr 
 
RegWr 
 
RegDst 
 
ALUsrc 
 
Branch 
 
ALUop 
 
PC+4 
 
PC+4 
 
BrcTgt 
 
MemRes 
 
OpCode 
 
isZero? 
 
Rs 
 
ALUOpr1 
 
ALURes 
 
ALURes 
 
Rt 
 
ALUOpr2 
 
ALUOpr2 
 
Rd 
 
Rt 
 
DstRNum 
 
Funct 
 
Rd 
 
DstRNum 
 
Imm(16) 
 
Imm(32) 
 
 

 Show the progress of the following instructions through the pipeline stages by filling in the content of pipeline registers. Note that these are the same instructions from Tutorial #5 Question 1 so that you can reuse some of the answers here.  

                    i. 0x8df80000 # lw $24, 0($15)  #Inst.Addr = 0x100       ii. 0x1023000C # beq $1, $3, 12  #Inst.Addr = 0x100    iii.  0x0285c822 # sub $25, $20, $5 #Inst.Addr = 0x100 

 Assume that registers 1 to 31 have been initialized to a value that is equal to 101 + its register number. i.e. [$1] = 102, [$31] = 132 etc. You can put “X” in fields that are irrelevant for that instruction. Do note that in reality, these fields are actually generated but not utilized.  

              Part (i) has been worked out for you.

                       i. 0x8df80000 # lw $24, 0($15)      #Inst.Addr = 0x100 

IF / ID 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
ID / EX 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
EX / MEM 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
MEM / WB 
 
 
MToR 

MToR 

MToR 

RegWr 

RegWr 

MemRd 

MemRd 

MemWr 

Branch 

MemWr 

RegWr 

RegDst 

ALUsrc 

Branch 

ALUop 
00 
PC+4 
0x104 
PC+4 
0x104 
BrcTgt 

MemRes 
Mem(116) 
OpCode 
0x23 
isZero? 

Rs 
$15 
ALUOpr1 
116 
ALURes 
116 
ALURes 

Rt 
$24 
ALUOpr2 

ALUOpr2 

Rd 

Rt 
$24 
DstRNum 
$24 
Funct 

Rd 

DstRNum 
$24 
Imm(16) 

Imm(32) 

 

Tutorial Questions
1.      Given the following three formulas (See Lecture #20, Section 5 Performance):

𝑁

                                                                                              𝐶𝑇𝑠𝑒𝑞 = ∑        𝑇𝑘

𝑘=1

 

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 = max(𝑇𝑘) + 𝑇𝑑

 

𝐶𝑇𝑠𝑒𝑞 × 𝐼𝑛𝑠𝑡𝑁𝑢𝑚

𝑆𝑝𝑒𝑒𝑑𝑢𝑝𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 =   

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 × (𝑁 + 𝐼𝑛𝑠𝑡𝑁𝑢𝑚 − 1)

 

             For each of the following processor parameters, calculate CTseq, CTpipeline and

Speeduppipeline (to two decimal places) for 10 instructions and for 10 million instructions.

 
Stages Timing (for 5 stages, in ps)
Latency of pipeline register (in ps)
a.
300, 100, 200, 300, 100  (slow memory)
0
b.
200, 200, 200, 200, 200   
40
c.
200, 200, 200, 200, 200  (ideal)
0
        

2.      [AY2014/5 Semester 2 Exam]

             Refer to the following MIPS program:

     # register $s0 contains a 32-bit value 

     # register $s1 contains a non-zero 8-bit value      #       at the right most (least significant) byte      add  $t0, $s0, $zero     #inst A      add  $s2, $zero, $zero   #inst B  lp:  bne  $s2, $zero, done    #inst C       beq  $t0, $zero, done    #inst D      andi $t1, $t0, 0xFF      #inst E       bne  $s1, $t1, nt        #inst F       addi $s2, $s2, 1         #inst G  nt:  srl  $t0, $t0, 8         #inst H        j    lp                  #inst J  done: 
 

We assume that the register $s0 contains 0xAFAFFAFA and $s1 contains 0xFF.  

Given a 5-stage MIPS pipeline processor, for each of the parts below, give the total number of cycles needed for the first iteration of the execution from instructions A to H (i.e. excluding the “j lp” instruction). Remember to include the cycles needed for instruction H to finish the WB stage. Note that the questions are independent from each other.

a.    With only data forwarding mechanisms and no control hazard mechanism.  

 

b.    With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.  

[Recall that early branching means branch decision is made at stage 2 (Decode stage); no early branch means branch decision is made at stage 4 (Memory stage).]

 

c.    By swapping two instructions (from Instructions A to H), we can improve the performance of early branching (with all additional forwarding paths). Give the two instructions that can be swapped. You only need to indicate the instruction letters in your answer.  

 

Give the total number of cycles needed for the execution of the whole code in the worst case for each of the following assumptions. You may assume that the jump instruction (j) computes the address of the instruction to jump to in the MEM stage.

d.    With only data forwarding mechanisms and no control hazard mechanism.

e.    With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

        

3.  [AY2017/8 Semester 2 Exam]

Refer to the MIPS code below. A and B are integer arrays whose base addresses are in $s0 and $s1 respectively. The arrays are of the same size n (number of elements). $s2 contains the value n. For this question, we will focus on the code from Instruction 1 onwards.

 .data  

A: .word 11, 9, 31, 2, 9, 1, 6, 10  B: .word 3, 7, 2, 12, 11, 41, 19, 35 n: .word 8 .text main: la   $s0, A     # $s0 is the base address of array A       la   $s1, B     # $s1 is the base address of array B       la   $t0, n     # $t0 is the addr of n (size of array) 

                       # $s2 is the content of n        beq  $s2, $zero, End   # Inst1       addi $t8, $s2, -1      # Inst2       sll  $t8, $t8, 2       # Inst3 Loop: add  $t0, $s0, $t8     # Inst4       add  $t1, $s1, $t8     # Inst5       lw   $t2, 0($t0)       # Inst6       lw   $t3, 0($t1)       # Inst7       andi $t4, $t3, 3       # Inst8       addi $t4, $t4, -3      # Inst9       beq  $t4, $zero, A1    # Inst10       add  $t2, $t2, $t3     # Inst11       j    A2                # Inst12 A1:   addi $t2, $t2, 1       # Inst13 A2:   sw   $t2, 0($t0)       # Inst14       addi $t8, $t8, -8      # Inst15       slt  $t7, $t8, $zero   # Inst16       beq  $t7, $zero, Loop  # Inst17 End: 
 

 Assuming a 5-stage MIPS pipeline system with forwarding and early branching, that is, the branch decision is made at the ID stage. No branch prediction is made and no delayed branching is used. For the jump (j) instruction, the computation of the target address to jump to is done at the ID stage as well.  

              Assume also that the first beq instruction begins at cycle 1.

        

 

a.     Suppose arrays A and B now each contains 200 positive integers. What is the minimum number and maximum number of instructions executed? (Consider only the above code segment from Inst1 to Inst17.)

b.     List out the instructions where some stall cycle(s) are inserted in executing that instruction in the pipeline. These include delay caused by data dependency and control hazard. You may write the instruction number InstX instead of writing out the instruction in full.  

c.      How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 branches to A1? You have to count until the WB stage of Inst17.  

d.     How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 does not branch to A1? You have to count until the WB stage of Inst17.  

 

A blank pipeline chart is shown in the next page for your use. The Microsoft word version of it is also available on LumiNUS > Files and the CS2100 website.

 

        


 









1


1


1


1


1


1


1


1


1


1


2


2


2


2


2


2


2


2


2


2


3


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

More products