Advanced Computer Architecture
Lab 2: Dependency Tracking and Forwarding for
This is an individual assignment. You can discuss this assignment with other classmates but you should code your assignment individually. You are NOT allowed to see the code of (or show your code to) other students.
OBJECTIVE
The objective of the second programming assignment is to do a performance evaluation of a pipelined machine. In particular, you will equip the code to check data dependencies in a pipeline, implement a forwarding path, and extend your pipeline to be an “N-wide” superscalar machine. The second part of the assignment deals with integrating a branch predictor with the superscalar pipeline.
PROBLEM DESCRIPTION
The five-stage pipeline that we discussed in class is shown in the Figure above, and it consists of Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM) and Writeback (WB) stages. For this assignment we will assume that the register file employs a write on falling edge, which means it is possible to write to the register file in the first half of the clock cycle, and read from the register file in the second half of the clock cycle. Therefore, there is no need to do data forwarding from the WB stage to the ID stage.
We will use a trace-driven simulator that is strictly meant for timing simulation. To keep the framework simple, we will not do any functional simulation, which means the trace records that are fed to the pipelined machine do not contain any data values, and your pipeline will not track any data values (in Registers, Memory, or PC) either. Furthermore, the traces contain only the committed-path instructions. The purpose of our simulation is to figure out how many clock cycles it takes to execute the given instruction stream on a variety of machines, such as with or without forwarding and with varying superscalar width (N).
You will be provided with a trace reader, as well as a sample pipeline machine that simulates an N-wide superscalar machine, without any dependence tracking. Your job is to do the following:
A.1 (2 points) Implement data dependency tracking and related stalls for a scalar machine (N=1)
A.3 (2 points for ECE6100/CS6290, 3 points for ECE4100/CS4290) Implement Data Forwarding (from both MEM and EX). Note that the existence of a forwarding path does not necessarily mean that a value can be forwarded from an instruction in a later stage to a dependent instruction in an earlier stage. For example, a Load instruction does not have its value available until the MEM stage, so you cannot forward the value of a Load from the EX stage to the ID stage for an instruction that depends on this Load. We will test A.3 for N=2, although your program should work for any reasonable value of N.
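The stall rules for A.1 and A.3 can be sketched as a single check between the instruction in ID and one older instruction in EX or MEM. This is only a minimal illustration, not the provided framework: the Op struct, the Stage enum, the is_load flag, and the function names are hypothetical stand-ins (the real record is defined in trace.h and the latches in pipeline.h). An older instruction in WB is not checked because the register file is written in the first half of the cycle and read in the second half.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the trace record and latch stages.
enum class Stage { EX, MEM };

struct Op {
    bool src1_needed = false, src2_needed = false, dest_needed = false;
    uint8_t src1_reg = 0, src2_reg = 0, dest_reg = 0;
    bool is_load = false;  // assumption: derived from the op_type field
};

// True if `younger` reads the register that `older` writes (RAW dependence).
bool reads_result_of(const Op& younger, const Op& older) {
    if (!older.dest_needed) return false;
    return (younger.src1_needed && younger.src1_reg == older.dest_reg) ||
           (younger.src2_needed && younger.src2_reg == older.dest_reg);
}

// Decide whether the op in ID must stall because of one older op that is
// currently in EX or MEM.
bool must_stall(const Op& in_id, const Op& older, Stage older_stage,
                bool forwarding) {
    if (!reads_result_of(in_id, older)) return false;  // no dependence
    if (!forwarding) return true;  // wait until the producer reaches WB
    // With forwarding, only a load still in EX cannot supply its value yet.
    return older.is_load && older_stage == Stage::EX;
}
```

In a full pipeline this check would be repeated for every older instruction in the EX and MEM latches. When several older instructions write the same register, only the youngest of them is the true producer, so a real implementation may want to pick that one before applying the load rule.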
Part B: Extend your pipeline to support Branch Prediction. For this part, we will assume that the machine has an idealized Branch Target Buffer (BTB), which identifies the conditional branches (CBR) as soon as the instruction is fetched, and also provides the correct target address. Your job is to consult the direction predictor on instruction fetch. If the prediction is correct, the fetch unit continues to fetch subsequent instructions; otherwise, the fetch unit stalls until the branch resolves.
B.1 (2 points for ECE6100/CS6290, 3 points for ECE4100/CS4290) Implement an “AlwaysTaken” predictor, and integrate it with your pipeline. We will evaluate your machine from A.3 (with N=2).
B.2 (2 points) [Required for ECE6100/CS6290. Optional for ECE4100/CS4290] Implement a gshare predictor, shown in the following figure, with HistoryLength = 12 (we will assume that you use the bottom 12 bits of the Instruction Address to XOR with the Global History Register, GHR) and a PHT consisting of 2-bit counters, initialized to the weakly taken state (10).
Figure: gshare branch predictor
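The two predictors described in B.1 and B.2 can be sketched as follows. The class names and the predict/update method names are hypothetical (the actual interface is declared in bpred.h), and the choice to shift the outcome into the low bit of the GHR is an assumption; the history length of 12, the XOR with the bottom 12 bits of the instruction address, and the 2-bit counters initialized to weakly taken (10) come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical AlwaysTaken predictor: no state, always predicts taken.
struct AlwaysTaken {
    bool predict(uint32_t /*pc*/) const { return true; }
    void update(uint32_t /*pc*/, bool /*taken*/) {}
};

// Hypothetical gshare predictor with a 12-bit global history register.
class Gshare {
public:
    static constexpr unsigned kHistoryLength = 12;
    static constexpr uint32_t kMask = (1u << kHistoryLength) - 1;

    // 4096 two-bit counters, each initialized to weakly taken (binary 10).
    Gshare() : pht_(1u << kHistoryLength, 2), ghr_(0) {}

    // Predict taken when the counter is in one of the two taken states.
    bool predict(uint32_t pc) const { return pht_[index(pc)] >= 2; }

    // Train the counter that produced the prediction, then shift the
    // actual outcome into the history register.
    void update(uint32_t pc, bool taken) {
        uint8_t& ctr = pht_[index(pc)];
        if (taken) {
            if (ctr < 3) ++ctr;  // saturate at strongly taken (11)
        } else {
            if (ctr > 0) --ctr;  // saturate at strongly not taken (00)
        }
        ghr_ = ((ghr_ << 1) | (taken ? 1u : 0u)) & kMask;
    }

private:
    // Bottom 12 bits of the instruction address XORed with the GHR.
    uint32_t index(uint32_t pc) const { return (pc ^ ghr_) & kMask; }

    std::vector<uint8_t> pht_;
    uint32_t ghr_;
};
```

Note that update() computes the index before modifying the GHR, so the counter that is trained is the same one that was consulted at prediction time, provided no other branch updated the history in between.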
WHAT IS PROVIDED:
The simulator directory consists of sources and traces (note that these are different traces than Lab1, as we need to do dependency tracking). The src directory contains the source code that you will modify. The key files are as follows:
1. sim.cpp and trace.h
The sim.cpp file is responsible for opening the trace, initializing and instantiating the pipeline, and executing it to completion. The trace.h file serves a similar purpose as in Lab 1; however, it has a few additional fields needed for this assignment.
2. pipeline.cpp/.h
These files contain the Pipeline class, internal structures and methods implementing the pipeline functionality. The simulator is a series of latches storing the operands on completion of the pipeline stages. The functions pipe_cycle_IF() … pipe_cycle_WB() need to be implemented by the students for providing pipeline functionality and handling of dependencies. Any additional structures required by students can be added.
3. bpred.cpp/.h
These files contain the branch predictor interfaces. The interface contains only two functions, one for getting the predicted value and another for updating the predictor. Students need to implement these functions as per the branch prediction policy.
How to run the simulator:
1) Download the tarball and type “tar -xvzf Lab_2.tar.gz”
2) Type “cd Lab_2/src”
3) Type “make”
4) Type “./sim -h” for command line options (pipewidth, bpredpolicy etc.)
5) “./sim ../traces/mcf.ptr.gz” (to test the current pipeline for N=1)
6) “./sim -pipewidth 2 ../traces/mcf.ptr.gz” (to test the current pipeline for N=2)
For implementing Part A, your job is to modify the pipe_cycle_?? functions in pipeline.cpp.
For implementing Part B, you will need to add the data structures in bpred.h, functionality in bpred.cpp, and the pipe_check_bpred function in pipeline.cpp. You will also need to stall fetch on a branch misprediction and release the stall when the branch resolves. The branch resolves when it is in the MEM stage; however, fetch can resume only in the next cycle.
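The fetch stall described above can be sketched as a small piece of bookkeeping. Everything here is hypothetical (the FetchControl struct, its method names, and the end-of-cycle hook are not part of the provided code), and it assumes the trace record tells you the actual branch direction so that the prediction can be compared against it at fetch time. How this maps onto the order in which the provided pipe_cycle_?? functions are called is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fetch-stall bookkeeping for a mispredicted branch.
struct FetchControl {
    bool stalled = false;              // fetch is blocked while true
    uint64_t blocking_op_id = 0;       // op_id of the mispredicted branch
    bool resolved_this_cycle = false;  // branch reached MEM in this cycle

    // Call when a conditional branch is fetched and the predictor consulted.
    void on_branch_fetched(uint64_t op_id, bool predicted_taken,
                           bool actually_taken) {
        if (predicted_taken != actually_taken) {
            stalled = true;
            blocking_op_id = op_id;
        }
    }

    // Call for each op that is in the MEM stage during the current cycle.
    void on_op_in_mem(uint64_t op_id) {
        if (stalled && op_id == blocking_op_id) resolved_this_cycle = true;
    }

    // Fetch may proceed only when no unresolved misprediction is pending.
    bool can_fetch() const { return !stalled; }

    // Call once at the end of every cycle: the stall is released here so
    // that fetch resumes in the cycle after the branch was in MEM.
    void end_cycle() {
        if (resolved_this_cycle) {
            stalled = false;
            resolved_this_cycle = false;
        }
    }
};
```

The op_id is used to identify the branch because instruction addresses are not unique in the trace.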
WHAT to SUBMIT (on Canvas):
For Part A
- src_A.tar.gz (i.e., tarball of your src directory, no traces please)
- report_A.txt [rename report.txt to report_A.txt before uploading].
How to create the tarball:
cd Lab_2
tar -cvzf src_A.tar.gz src
For Part B
- src_B.tar.gz
- report_B.txt [rename report.txt to report_B.txt before uploading].
Note for ECE4100/CS4290 students: You are not required to do B.2. However, you can still choose to do it for Extra Credit worth 2 points.
REFERENCE MACHINE:
We will use ece-linlabsrv01.ece.gatech.edu as the reference machine for this course. (https://help.ece.gatech.edu/labs/names).
Before submitting your code, ensure that it compiles on this machine and generates the desired output (without any extra printf statements or calls to pipe_print_state). Please follow the submission instructions. If you do not follow the submission file names, you will not receive full credit.
NOTE: It is impractical for us to support other platforms such as Mac, Windows, Ubuntu, etc.
FAQ:
1. How should I get started?
We strongly recommend that you read through the header files first to get a sense of what data structures are available to you and what you must implement. The header files should provide documentation for everything you need to know to complete this lab.
2. Why aren't instruction addresses unique in the trace?
During trace generation, complex x86 instructions that perform multiple operations at a single address were converted into simpler operations of the types listed in the trace header file. These simpler operations all share the same instruction address. The instruction address is therefore not a unique identifier for an operation; use op_id for that purpose.
3. How do I implement Data Forwarding for operations with conditional codes or operations belonging to the OTHER op_type with a destination register?
Handling the data forwarding for the above is similar to the handling for ALU operations. Load instructions having cc_write can only forward their conditional codes in the MEM stage.
4. What are the *_needed fields in the trace structure?
These are binary (1 or 0) values indicating whether the src1_reg, src2_reg, and dest_reg fields are valid in an operation read from the trace file. If a *_needed field is ‘1’, the corresponding src1_reg, src2_reg, or dest_reg field holds the register being read from or written to.
5. What are cc_read and cc_write?
Consider the following operation:
if (condition operation)
The condition operation writes to a condition 'status' register. Such an operation would have cc_write set to 1. The following branch instruction based on the condition would have the cc_read set to 1.
cc_read and cc_write are 1 / 0 values. Only branches would perform a cc_read. The reading takes place in the Instruction Decode stage, similar to the source register values (Refer to: http://en.wikipedia.org/wiki/Status_register)
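Condition-code dependencies can be checked in the same way as register dependencies. The sketch below is illustrative only: the Op struct, the Stage enum, the is_load flag, and the function name are hypothetical, while cc_read and cc_write are the trace fields described above. It follows the rule from question 3 that a Load with cc_write can forward its condition codes only from the MEM stage.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the trace record and latch stages.
enum class Stage { EX, MEM };

struct Op {
    bool cc_read = false;   // op reads the condition codes (branches)
    bool cc_write = false;  // op writes the condition codes
    bool is_load = false;   // assumption: derived from the op_type field
};

// Decide whether the op in ID must stall on the condition codes written by
// one older op that is currently in EX or MEM.
bool cc_must_stall(const Op& in_id, const Op& older, Stage older_stage,
                   bool forwarding) {
    if (!(in_id.cc_read && older.cc_write)) return false;  // no dependence
    if (!forwarding) return true;  // wait until the writer reaches WB
    // With forwarding, only a load still in EX cannot supply its codes yet.
    return older.is_load && older_stage == Stage::EX;
}
```

A complete stall decision would OR this result with the register-dependence check for the same pair of operations.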
6. What is pipe_print_state()?
7. How can I test my code?
Reference outputs for gcc.ptr.gz and sml.ptr.gz for all five parts of the lab are provided in the ref directory as refoutput_gcc.pdf and refoutput_sml.pdf, respectively. You can run the script runtests.sh located in the scripts directory to compare your implementation’s output with these reference outputs.
8. I get an error when I try to execute the runall.sh script or the runtests.sh script.
Check that you have execute permissions on both scripts. Add execute permissions to these files using the command:
chmod +x runall.sh runtests.sh