CS2100 Tutorial 10-Pipelining Solved

Starting from:

$30

D1. Suppose the four stages in some 4-stage pipeline take the following timing: 2ns, 3ns, 4ns, and 2ns. Given 1000 instructions, what is the speedup (in two decimal places) of the pipelined processor compared to the non-pipelined single-cycle processor?

D2. Let’s try to understand pipeline processor by doing a detailed trace. Suppose the pipeline registers (also known as pipeline latches) store the following information:

IF / ID


ID / EX


EX / MEM

MEM / WB

MemToReg

MToR

MToR

RegWr

RegWr

MemRd

MemRd

MemWr

Branch

MemWr

RegWr

RegDst

ALUsrc

Branch

ALUop

PC+4

PC+4

BrcTgt

MemRes

OpCode

isZero?

Rs

ALUOpr1

ALURes

ALURes

Rt

ALUOpr2

ALUOpr2

Rd

Rt

DstRNum

Funct

Rd

DstRNum

Imm(16)

Imm(32)

Show the progress of the following instructions through the pipeline stages by filling in the content of pipeline registers. Note that these are the same instructions from Tutorial #5 Question 1 so that you can reuse some of the answers here.

i. 0x8df80000 # lw $24, 0($15) #Inst.Addr = 0x100 ii. 0x1023000C # beq $1, $3, 12 #Inst.Addr = 0x100 iii. 0x0285c822 # sub $25, $20, $5 #Inst.Addr = 0x100

Assume that registers 1 to 31 have been initialized to a value that is equal to 101 + its register number. i.e. [$1] = 102, [$31] = 132 etc. You can put “X” in fields that are irrelevant for that instruction. Do note that in reality, these fields are actually generated but not utilized.

Part (i) has been worked out for you.

i. 0x8df80000 # lw $24, 0($15) #Inst.Addr = 0x100

IF / ID

ID / EX

EX / MEM

MEM / WB

MToR
1
MToR
1
MToR
1
RegWr
1
RegWr
1
MemRd
1
MemRd
1
MemWr
0
Branch
0
MemWr
0
RegWr
1
RegDst
0
ALUsrc
1
Branch
0
ALUop
00
PC+4
0x104
PC+4
0x104
BrcTgt
X
MemRes
Mem(116)
OpCode
0x23
isZero?
X
Rs
$15
ALUOpr1
116
ALURes
116
ALURes
X
Rt
$24
ALUOpr2
X
ALUOpr2
X
Rd
X
Rt
$24
DstRNum
$24
Funct
X
Rd
X
DstRNum
$24
Imm(16)
0
Imm(32)
0

Tutorial Questions
1. Given the following three formulas (See Lecture #20, Section 5 Performance):

𝑁

𝐶𝑇𝑠𝑒𝑞 = ∑ 𝑇𝑘

𝑘=1

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 = max(𝑇𝑘) + 𝑇𝑑

𝐶𝑇𝑠𝑒𝑞 × 𝐼𝑛𝑠𝑡𝑁𝑢𝑚

𝑆𝑝𝑒𝑒𝑑𝑢𝑝𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 =

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 × (𝑁 + 𝐼𝑛𝑠𝑡𝑁𝑢𝑚 − 1)

For each of the following processor parameters, calculate CTseq, CTpipeline and

Speeduppipeline (to two decimal places) for 10 instructions and for 10 million instructions.

Stages Timing (for 5 stages, in ps)
Latency of pipeline register (in ps)
a.
300, 100, 200, 300, 100 (slow memory)
0
b.
200, 200, 200, 200, 200
40
c.
200, 200, 200, 200, 200 (ideal)
0


2. [AY2014/5 Semester 2 Exam]

Refer to the following MIPS program:

# register $s0 contains a 32-bit value

# register $s1 contains a non-zero 8-bit value # at the right most (least significant) byte add $t0, $s0, $zero #inst A add $s2, $zero, $zero #inst B lp: bne $s2, $zero, done #inst C beq $t0, $zero, done #inst D andi $t1, $t0, 0xFF #inst E bne $s1, $t1, nt #inst F addi $s2, $s2, 1 #inst G nt: srl $t0, $t0, 8 #inst H j lp #inst J done:

We assume that the register $s0 contains 0xAFAFFAFA and $s1 contains 0xFF.

Given a 5-stage MIPS pipeline processor, for each of the parts below, give the total number of cycles needed for the first iteration of the execution from instructions A to H (i.e. excluding the “j lp” instruction). Remember to include the cycles needed for instruction H to finish the WB stage. Note that the questions are independent from each other.

a. With only data forwarding mechanisms and no control hazard mechanism.

b. With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

[Recall that early branching means branch decision is made at stage 2 (Decode stage); no early branch means branch decision is made at stage 4 (Memory stage).]

c. By swapping two instructions (from Instructions A to H), we can improve the performance of early branching (with all additional forwarding paths). Give the two instructions that can be swapped. You only need to indicate the instruction letters in your answer.

Give the total number of cycles needed for the execution of the whole code in the worst case for each of the following assumptions. You may assume that the jump instruction (j) computes the address of the instruction to jump to in the MEM stage.

d. With only data forwarding mechanisms and no control hazard mechanism.

e. With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.



3. [AY2017/8 Semester 2 Exam]

Refer to the MIPS code below. A and B are integer arrays whose base addresses are in $s0 and $s1 respectively. The arrays are of the same size n (number of elements). $s2 contains the value n. For this question, we will focus on the code from Instruction 1 onwards.

.data

A: .word 11, 9, 31, 2, 9, 1, 6, 10 B: .word 3, 7, 2, 12, 11, 41, 19, 35 n: .word 8 .text main: la $s0, A # $s0 is the base address of array A la $s1, B # $s1 is the base address of array B la $t0, n # $t0 is the addr of n (size of array)

# $s2 is the content of n beq $s2, $zero, End # Inst1 addi $t8, $s2, -1 # Inst2 sll $t8, $t8, 2 # Inst3 Loop: add $t0, $s0, $t8 # Inst4 add $t1, $s1, $t8 # Inst5 lw $t2, 0($t0) # Inst6 lw $t3, 0($t1) # Inst7 andi $t4, $t3, 3 # Inst8 addi $t4, $t4, -3 # Inst9 beq $t4, $zero, A1 # Inst10 add $t2, $t2, $t3 # Inst11 j A2 # Inst12 A1: addi $t2, $t2, 1 # Inst13 A2: sw $t2, 0($t0) # Inst14 addi $t8, $t8, -8 # Inst15 slt $t7, $t8, $zero # Inst16 beq $t7, $zero, Loop # Inst17 End:

Assuming a 5-stage MIPS pipeline system with forwarding and early branching, that is, the branch decision is made at the ID stage. No branch prediction is made and no delayed branching is used. For the jump (j) instruction, the computation of the target address to jump to is done at the ID stage as well.

Assume also that the first beq instruction begins at cycle 1.



a. Suppose arrays A and B now each contains 200 positive integers. What is the minimum number and maximum number of instructions executed? (Consider only the above code segment from Inst1 to Inst17.)

b. List out the instructions where some stall cycle(s) are inserted in executing that instruction in the pipeline. These include delay caused by data dependency and control hazard. You may write the instruction number InstX instead of writing out the instruction in full.

c. How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 branches to A1? You have to count until the WB stage of Inst17.

d. How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 does not branch to A1? You have to count until the WB stage of Inst17.

A blank pipeline chart is shown in the next page for your use. The Microsoft word version of it is also available on LumiNUS > Files and the CS2100 website.



1
2
3
4
5
6
7
8
9
1

0
1

1
1

2
1

3
1

4
1

5
1

6
1

7
1

8
1

9
2

0
2

1
2

2
2

3
2

4
2

5
2

6
2

7
2

8
2

9
3

0

More products

EE316-Lab7and8 Characteristics of BJT and Amplification Behavior Solved

$30

Add to cart

EE316-Lab5 AC Signals, Transformers and Bridge Rectifiers Solved

$30

Add to cart

EE316-Lab4 Implement the Digital to Analog Converter Using Op-Amp Solved

$30

Add to cart