1 Why Pipelining?
The datapath design that we implemented for Project 1 was, in fact, grossly inefficient. By overlapping the execution of several instructions, a pipelined processor increases throughput and completes more instructions per clock cycle. In the real world, that means higher performance, lower power draw, and most importantly, happy customers!
2 Project Requirements
In this project, you will make a pipelined processor that implements the Conte-200 ISA. There will be five stages in your pipeline:
IF - Instruction Fetch
ID/RR - Instruction Decode/Register Read
EX - Execute (ALU operations)
MEM - Memory (both reads and writes with memory)
WB - Writeback (writing to registers)
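To see how the five stages overlap, here is a small Python sketch (a behavioral model, not a Brandonsim circuit) that prints which instruction occupies each stage in every cycle. It assumes an ideal pipeline with no stalls or flushes; the labels I1, I2, ... are placeholders.

```python
STAGES = ["IF", "ID/RR", "EX", "MEM", "WB"]

def occupancy(num_instrs: int) -> list[list[str]]:
    """Row c lists, per stage, which instruction is in it during cycle c + 1."""
    cycles = num_instrs + len(STAGES) - 1
    table = []
    for c in range(cycles):
        row = []
        for s in range(len(STAGES)):
            i = c - s  # index of the instruction occupying stage s in cycle c
            row.append(f"I{i + 1}" if 0 <= i < num_instrs else "-")
        table.append(row)
    return table

for c, row in enumerate(occupancy(3), start=1):
    print(f"cycle {c}: " + "  ".join(f"{st}={x}" for st, x in zip(STAGES, row)))
```

With three instructions the pipeline needs 3 + 5 - 1 = 7 cycles; once it is full, one instruction finishes every cycle.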
Before you move on, read Appendix A: Conte-200 Instruction Set Architecture to understand the ISA that you will be implementing. We provide you with a Brandonsim file that has some of the structure already laid out.
3 Building the Pipeline
First, you will have to build the hardware to support all of your instructions. Each stage must be able to accommodate the actions of every instruction that passes through it. Use the book (Ch. 5) to get an idea of what the pipeline looks like and to understand the function of each stage before you start building your circuits.
1. IF Stage
The IF stage is responsible for:
Getting the instruction from I-MEM at location PC
Updating the PC
For normal sequential execution, we update the PC by incrementing it by 1. Notice, however, that this may not be the case when executing a SKP, CALL, RET, or GOTO instruction. Hence, you will likely need a multiplexer to select which value is written into the PC.
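A behavioral Python sketch of that PC multiplexer follows. The signal names redirect and target are hypothetical, and the 24-bit PC width is an assumption borrowed from the RAM address width in the MEM stage; check Appendix A for the real width. Holding the PC during a stall is handled separately.

```python
ADDR_BITS = 24  # assumption: PC is as wide as the 24-bit memory address
ADDR_MASK = (1 << ADDR_BITS) - 1

def next_pc(pc: int, redirect: bool, target: int) -> int:
    """Value written into the PC on the next clock edge.

    redirect: hypothetical select line, asserted by ID/RR when a SKP, CALL,
              RET, or GOTO needs the PC to go somewhere other than PC + 1
    target:   the address ID/RR computed for that instruction
    """
    sequential = (pc + 1) & ADDR_MASK  # normal case: PC + 1, wrapping at 24 bits
    return (target & ADDR_MASK) if redirect else sequential

print(next_pc(7, False, 0))   # 8
print(next_pc(7, True, 42))   # 42
```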
2. ID/RR Stage
The ID/RR stage is responsible for:
Decoding the instruction
Reading the appropriate registers
Resolving any CALL, RET, SKP, or GOTO instructions
Please look at Appendix A: Conte-200 Instruction Set Architecture in order to understand the instruction formats! You will have a dual-ported register file (DPRF), which allows you to read from two registers and write one register in the same clock cycle. The TAs have kindly built the DPRF and provided it to you.
Some instructions require both ALU inputs to be values read from the DPRF. Other instructions, however, carry a value within the instruction itself, such as an immval20, offset20, or PCAddr24 field. You may either pass all of these possible values to the next stage (which requires bigger buffer registers), or condense them into just the values needed to execute the instruction in the following cycles (which requires more logic, but lets you optimize the buffer size).
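A Python sketch of the second ("condense") option: ID/RR picks the single second operand that EX will need, so the buffer carries one value instead of a register value plus every immediate field. It assumes immval20 and offset20 are 20-bit two's-complement fields that get sign-extended; confirm this, and the field positions, against Appendix A. The uses_imm signal is hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def select_operand_b(reg_value: int, imm20: int, uses_imm: bool) -> int:
    """Second ALU operand chosen in ID/RR (uses_imm is a hypothetical signal)."""
    # assumption: 20-bit immediates/offsets are sign-extended before use
    return sign_extend(imm20, 20) if uses_imm else reg_value
```

Python integers are unbounded, so a negative result here stands in for the sign-extended bit pattern the hardware would pass down the pipeline.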
3. EX Stage
The EX stage is responsible for:
Performing all necessary arithmetic and logic calculations
In the Execute (EX) stage, you will perform any arithmetic computations required by the instruction. This stage should host a complete ALU to perform the actual adding or NANDing as required by the instruction. For memory access instructions, this stage will perform the Base + Offset computation required to determine the memory address to access.
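A behavioral Python sketch of the EX-stage ALU, covering only the ADD and NAND operations named above plus the Base + Offset address calculation. The 32-bit register width and the string operation names are assumptions; your ALU's real operation encoding comes from your control logic.

```python
WORD_BITS = 32  # assumption: 32-bit registers
WORD_MASK = (1 << WORD_BITS) - 1

def alu(a: int, b: int, op: str) -> int:
    """Combinational ALU result, truncated to the word width."""
    if op == "ADD":
        return (a + b) & WORD_MASK
    if op == "NAND":
        return ~(a & b) & WORD_MASK
    raise ValueError(f"unsupported ALU op: {op}")

def effective_address(base: int, offset: int) -> int:
    """Base + Offset for loads and stores reuses the ALU's adder."""
    return alu(base, offset, "ADD")
```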
4. MEM Stage
The MEM stage is responsible for:
Reading from or writing a result to memory
All you need to do is use the value calculated in the EX stage as the address for the RAM. Note that you must use the maximum address width of the RAM block, which is 24 bits. To accomplish this, simply take the lower 24 bits of the calculated address. Depending on the instruction, this stage will need to pass either the value read from memory or the value computed in EX to the WB stage.
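The MEM stage's behavior can be sketched in Python as follows. The data memory is modeled as a dictionary, and mem_read and mem_write are hypothetical control signals; in Brandonsim these would drive the RAM's control inputs.

```python
ADDR_MASK = (1 << 24) - 1  # the RAM block uses 24 address bits

def mem_stage(alu_result: int, store_value: int,
              mem_read: bool, mem_write: bool, dmem: dict[int, int]) -> int:
    """Return the value MEM passes on to WB."""
    addr = alu_result & ADDR_MASK  # keep only the lower 24 bits of the address
    if mem_write:
        dmem[addr] = store_value
    if mem_read:
        return dmem.get(addr, 0)  # unwritten locations read as 0 in this model
    return alu_result             # non-load instructions pass the EX result on
```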
5. WB Stage
The WB stage is responsible for:
Writing results back to the DPRF (dual-ported register file)
Depending on the instruction, you may need to write a value back to a register. To do this, your WB stage will connect to the data-in and write-enable inputs of the DPRF in ID/RR. Remember that the DPRF can write and read different registers in the same clock cycle, which is why WB and ID/RR can share the same register file. For instructions that do not write a register, the WB stage simply leaves write enable deasserted and does nothing else.
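A behavioral Python model of a dual-ported register file makes the WB/ID-RR sharing concrete: two read ports that ID/RR uses during the cycle, and one write port, driven by WB, that commits on the clock edge. The register count (16) and the assumption that register 0 is $zero are illustrative; the provided DPRF and Appendix A are authoritative.

```python
class DPRF:
    """Two combinational read ports, one write port committed at the clock edge."""

    def __init__(self, num_regs: int = 16):  # assumption: 16 registers
        self.regs = [0] * num_regs

    def read(self, ra: int, rb: int) -> tuple[int, int]:
        """Read ports used by ID/RR during the cycle."""
        return self.regs[ra], self.regs[rb]

    def clock(self, write_enable: bool, dst: int, value: int) -> None:
        """Write port driven by WB; takes effect at the rising edge."""
        if write_enable and dst != 0:  # assumption: register 0 is $zero
            self.regs[dst] = value

rf = DPRF()
before = rf.read(3, 4)         # ID/RR reads during the cycle...
rf.clock(True, 3, 42)          # ...while WB's write lands at the edge
print(before, rf.read(3, 4))   # (0, 0) (42, 0)
```

In this model a same-cycle read of the register being written sees the old value, which is one reason a forwarding path from WB can be useful (assuming the provided DPRF behaves the same way).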
4 General Advice
Subcircuits
For this project, we highly encourage using modular design and creating subcircuits when necessary. We strongly recommend using subcircuits when building your pipeline buffers as well as your forwarding unit.
Pipeline Buffers
For deciding what to pass through buffers, remember that we need to support the requirements of every possible instruction. Think of what each instruction needs to fulfill its duty, and pass the union of all those requirements. (By union we mean the mathematical union: for example, if I1 needs PC and Rx while I2 needs Rx and Ry, you should pass PC, Rx, and Ry through the buffer.) You can also implement your hardware so that it reuses space in the buffer for different purposes depending on the instruction, but this is not required.
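The union rule is easy to check with Python sets. The instructions below are just the hypothetical I1/I2 example from the paragraph, and the field widths are made up for illustration; substitute each Conte-200 instruction's real needs.

```python
# Which values each instruction needs from this buffer (the I1/I2 example)
needs = {
    "I1": {"PC", "Rx"},
    "I2": {"Rx", "Ry"},
}
# Hypothetical field widths in bits, used to size the buffer register
widths = {"PC": 24, "Rx": 32, "Ry": 32}

fields = set().union(*needs.values())          # mathematical union
buffer_bits = sum(widths[f] for f in fields)
print(sorted(fields), buffer_bits)             # ['PC', 'Rx', 'Ry'] 88
```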
Control Signals
In the Project 1 datapath, recall that we had one main ROM that was the single source of all the control signals on the datapath. Now that we are spreading out our work across different stages of the pipeline, you have a choice of how to implement your signals!
There are two options:
You can either have a single large main ROM in ID/RR which calculates all the control signals for every stage.
OR
you can have a small(er) ROM in each stage which takes in the opcode and asserts the proper signals for that operation.
Note that if you choose the first method, you will need to pass all the signals needed by later stages through the earlier stages; with the second method, you will need to pass the instruction opcode through all the stages so that each stage knows which signals to assert.
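For the second option, each stage's small ROM is effectively a lookup table indexed by the opcode carried in the buffer. The Python sketch below shows MEM and WB tables for a few opcodes; the opcode set (including the store name SW) and the signal names are assumptions for illustration, not the Conte-200 encoding.

```python
# Signals decoded by the MEM stage's ROM (opcode -> signals)
MEM_ROM = {
    "LW": {"mem_read": True, "mem_write": False},
    "SW": {"mem_read": False, "mem_write": True},   # hypothetical store opcode
}
NO_MEM = {"mem_read": False, "mem_write": False}

# Signals decoded by the WB stage's ROM
WB_ROM = {
    "ADD":  {"reg_write": True, "wb_from_mem": False},
    "NAND": {"reg_write": True, "wb_from_mem": False},
    "ADDI": {"reg_write": True, "wb_from_mem": False},
    "LW":   {"reg_write": True, "wb_from_mem": True},
}
NO_WB = {"reg_write": False, "wb_from_mem": False}

def mem_signals(opcode: str) -> dict:
    """MEM-stage decode: unknown opcodes do not touch memory."""
    return MEM_ROM.get(opcode, NO_MEM)

def wb_signals(opcode: str) -> dict:
    """WB-stage decode: unknown opcodes do not write a register."""
    return WB_ROM.get(opcode, NO_WB)
```

Only a few opcodes are listed; the remaining instructions would need their own entries.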
Stalling the Pipeline
One must stall the pipeline when an instruction cannot proceed to the next stage because a value it needs is not yet available. This usually happens because of a data hazard. For example, consider the following two instructions:
LW $t0, 5($t1)
ADDI $t0, $t0, 1
Without stalling the ADDI instruction in the ID/RR stage, it will read an out-of-date value of $t0 from the register file, because the correct value of $t0 isn’t known until the LW reaches the MEM stage! Therefore, we must stall. Consult the textbook (or your notes) for more information on data hazards. It is also important to note that data forwarding can shorten stalls or, in some cases, avoid them entirely. Data forwarding is discussed in the next section.
To stall the pipeline, the stages preceding the stalled stage should disable writes into their buffers (including the PC), i.e., they should continue to output their previous values into the next stage. The stalled stage itself will output NOOP instructions (for example, ADD $zero, $zero, $zero) down the pipeline until the cause of the stall is resolved.
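The stall mechanics can be modeled in Python as one rising clock edge at the front of the pipeline. This is a sketch under simplifying assumptions: instructions are represented by a small hypothetical Instr record instead of encoded words, the PC simply increments, and redirects and flushes are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    opcode: str
    dst: int = 0
    src1: int = 0
    src2: int = 0

NOOP = Instr("ADD", 0, 0, 0)  # ADD $zero, $zero, $zero

def clock_front_end(pc: int, if_id: Instr, fetched: Instr, stall: bool):
    """Return (next PC, next IF/ID contents, next ID/EX contents)."""
    if stall:
        # PC and IF/ID have their write enables off, so they keep their values.
        # ID/RR sends a bubble (NOOP) into ID/EX instead of the stalled instruction.
        return pc, if_id, NOOP
    # Normal edge: everything advances one stage.
    return pc + 1, fetched, if_id
```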
Data Forwarding
If you really liked the busy-bit/read-pending signal forwarding described in lecture and in your book, feel free to use that. We present an alternate way to do forwarding in this section.
Forwarding is one way to increase the performance of the pipeline. This allows us to get values computed in stages beyond ID/RR back to ID/RR so that we do not have to stall the instruction. I would strongly recommend against using the busy bit/read pending bit strategy suggested in the book - this has some very nasty edge cases and requires much more logic than necessary.
I would recommend that you make a forwarding unit that implements a small set of comparison rules. The forwarding unit should take in the two register values you are reading, the output values from the EX, MEM, and WB stages, and the destination register numbers of those stages. To forward a value from a later stage back to ID/RR, check whether the destination register number of that stage equals one of your source register numbers in the ID/RR stage (and whether the instruction in that stage actually writes a register). If so, forward the value from that stage to your ID/RR stage. If more than one stage matches, forward from the closest one (EX, then MEM, then WB), since it holds the most recent value in program order.
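Here is a minimal Python sketch of such a forwarding unit for one source operand (instantiate it twice, once per register read). StageResult and its fields are hypothetical names for what each later stage sends back. It checks EX first, then MEM, then WB, so the youngest matching producer wins, and it never forwards to $zero (assumed to be register 0). It does not handle a LW that is still in EX, whose loaded value does not exist yet; the hazard logic must stall in that case, so whatever this unit selects is discarded.

```python
from typing import NamedTuple

class StageResult(NamedTuple):
    dst: int         # destination register number of the instruction in this stage
    value: int       # value that instruction will eventually write back
    reg_write: bool  # does that instruction write a register at all?

def forward(src: int, rf_value: int,
            ex: StageResult, mem: StageResult, wb: StageResult) -> int:
    """Operand value ID/RR should use for source register src."""
    if src == 0:
        return rf_value  # $zero never changes, so never forward it
    for stage in (ex, mem, wb):  # youngest producer first
        if stage.reg_write and stage.dst == src:
            return stage.value
    return rf_value  # no later stage writes src: the DPRF value is current
```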
You shouldn’t update the value of the register when you forward the value back - writes to the register file should only occur in the WB stage. Of course, forwarding cannot save you from one situation: when the destination register of a LW instruction is the source register of an instruction immediately after it. In this case, you must stall the instruction in the ID/RR stage. I will leave it to you to flesh out all of the stall rules.
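The one stall rule stated above (a LW in EX whose destination is a source of the instruction in ID/RR) can be sketched as a Python predicate. The parameter names are hypothetical, id_srcs should list only the registers the ID/RR instruction actually reads so that unused fields do not cause false stalls, and the remaining stall rules are still yours to work out.

```python
def load_use_stall(ex_opcode: str, ex_dst: int, id_srcs: tuple[int, ...]) -> bool:
    """True if the instruction in ID/RR must stall for a LW currently in EX."""
    if ex_opcode != "LW":
        return False  # EX results of other instructions can be forwarded
    if ex_dst == 0:
        return False  # $zero (assumed register 0) never triggers a stall
    return ex_dst in id_srcs
```

Assuming the MEM stage's output carries the loaded value, as the forwarding inputs above suggest, this stall lasts one cycle: on the next edge the LW moves to MEM and its value can be forwarded.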
Keep in mind: the zero register can never change, so it should never trigger forwarding or a stall.
Flushing the Pipeline
For the CALL/RET/SKP/GOTO instructions, we calculate the target in the ID/RR stage of the pipeline. However, the next instruction the IF stage fetches while ID/RR is computing the target may not be the next instruction we want to execute. When this happens, we must have a hardware mechanism to “cancel” or “flush” the incorrectly-fetched instructions after we realize they are incorrect.
In implementing your flushing mechanism, we highly recommend avoiding the asynchronous clear feature of registers in Brandonsim, as this may cause timing issues. Instead, we suggest using a multiplexer to selectively send a NOOP into the buffer input.
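In Python terms, the recommended flush mechanism is just a 2-to-1 multiplexer in front of the IF/ID buffer's input; nothing is cleared asynchronously. NOOP_WORD assumes that ADD $zero, $zero, $zero encodes as all zeros, which you should verify against the Appendix A encoding.

```python
NOOP_WORD = 0x0000_0000  # assumption: encoding of ADD $zero, $zero, $zero

def if_id_input(fetched_word: int, flush: bool) -> int:
    """Value presented to the IF/ID buffer's input for the next clock edge.

    flush is asserted by ID/RR when it resolves a CALL, RET, GOTO, or SKP
    and the word IF just fetched is not the next instruction to execute.
    """
    return NOOP_WORD if flush else fetched_word
```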
Skip Prediction
When you encounter a SKP instruction, you should predict that the SKP is not taken. This means there is no stalling: Fetch simply continues and retrieves the next instruction at PC + 1.
Upon resolving the branch, the pipeline should continue normally in the case of a correct prediction, or flush the instruction following the SKP in the case of an incorrect prediction.
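Under the predict-not-taken policy, resolving a SKP in ID/RR reduces to choosing two control outputs. The Python sketch below uses hypothetical names (Redirect, resolve_skp) and takes the skip outcome and target address as given inputs, since how they are computed is defined by the SKP entry in Appendix A.

```python
from typing import NamedTuple

class Redirect(NamedTuple):
    flush_if_id: bool  # replace the instruction fetched after the SKP with a NOOP
    load_target: bool  # select target instead of PC + 1 for the next fetch
    target: int

def resolve_skp(taken: bool, target: int) -> Redirect:
    """ID/RR outcome for a SKP that IF predicted not taken."""
    if not taken:
        # Correct prediction: the fetched instruction is the right one; carry on.
        return Redirect(flush_if_id=False, load_target=False, target=target)
    # Misprediction: squash the wrong-path instruction and steer the PC.
    return Redirect(flush_if_id=True, load_target=True, target=target)
```

Because the SKP is resolved in ID/RR, IF has fetched only one instruction past it, so a misprediction costs one flushed fetch and a correct prediction costs nothing.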
5 Testing
When you have constructed your pipeline, you should test it instruction by instruction to see if you have all the necessary components to ensure proper execution.
Be careful to use only the instructions listed in the appendix, since having separate instruction and data memories introduces some subtleties. Load the assembled program into both the instruction memory and the data memory and let your processor execute it. Any writes to memory will affect only the data memory.