Section 1. Handwritten (24%)
28 (12%)
29 (12%)
-
Section 2. Programming (86%)
Pipelined CPU (74%)
Report (12%)
-
Section 5. Supplementary
Homework introduction video
-
Section 6. Practice
Some handwritten practice questions on Chapter 4 are provided to help everyone prepare for the midterm exam.
1 Handwritten
Each 2%.
2 Programming (86%, including Pipelined CPU and report)
Pipelined CPU (74%)
In this section, you will implement a pipelined CPU.
The provided instruction memory is as follows:
Signal    I/O     Width  Functionality
i_clk     Input   1      Clock signal
i_rst_n   Input   1      Active-low asynchronous reset
i_valid   Input   1      Indicates that the PC address from the CPU is ready
i_addr    Input   64     64-bit address from the CPU
o_valid   Output  1      Asserted when the instruction is ready
o_inst    Output  32     32-bit instruction to the CPU
And the provided data memory is as follows:
Signal      I/O     Width  Functionality
i_clk       Input   1      Clock signal
i_rst_n     Input   1      Active-low asynchronous reset
i_data      Input   64     64-bit data to be stored
i_w_addr    Input   64     64-bit target address for writes
i_r_addr    Input   64     64-bit target address for reads
i_MemRead   Input   1      One-cycle signal that sets the current mode to reading
i_MemWrite  Input   1      One-cycle signal that sets the current mode to writing
o_valid     Output  1      One-cycle signal indicating that data is ready (used for ld)
o_data      Output  64     64-bit data from data memory (used for ld)
The test environment is as follows:
We will only test the instructions highlighted in the red boxes in the figures below.
One additional instruction must also be implemented:
i_inst                                Function  Description
32'b11111111111111111111111111111111  Stop      Stop and set o_finish to 1
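For a software reference model or testcase checker, the stop condition reduces to a single comparison against the all-ones word. The following is a minimal sketch, assuming instructions are held as Python integers; STOP_INST and is_stop are hypothetical names and not part of the provided files.

```python
STOP_INST = 0xFFFF_FFFF  # 32'b111...1: all 32 bits set


def is_stop(inst):
    """Return True if the 32-bit instruction word is the stop encoding."""
    return inst == STOP_INST


# The hexadecimal constant matches the 32-ones binary literal from the table.
assert STOP_INST == int("1" * 32, 2)
print(is_stop(0xFFFFFFFF), is_stop(0x00000013))
```

In the hardware design itself the same check would be a 32-bit equality comparison on the fetched instruction.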
All environment settings are the same as in HW3, except for the rules for accessing data_memory.v and instruction_memory.v; the module interfaces have also changed this time. See supplementary.pdf for more information.
You may want to refer to the pipelined CPU diagram in the textbook.
To make sure that pipelining is actually implemented in your design, we will use the open-source synthesis tool Yosys to check the timing of the critical path in your design. We will also use the FreePDK 45 nm process standard cell library provided here.
You can either build Yosys yourself or use the provided Docker image:
docker pull ntuca2020/hw4 # size ~ 1.28G
docker run --name=test -it ntuca2020/hw4
cd /root
ls
Folder structure for this homework:
HW4/
|-- testcases/
|   |-- generate.s
|   `-- generate.cpp
|-- codes/
|   |-- cpu.v
|   |-- data_memory.v          // provided data memory
|   `-- instruction_memory.v   // provided instruction memory
|-- testbench.v
|-- Makefile
|-- cpu.ys                     // synthesis command
`-- stdcells.lib               // FreePDK 45 nm standard cell library
List all the modules you use in the cpu.ys file, then run:
make // Compile
make test // Test all test cases
make time // Show the timing and area used in your design
Information about your design is shown when running make time:
ABC: WireLoad = "none" Gates = 13123 ( 14.8 %) Cap = 3.2 ff ( 1.9 %)
Area = 17519.56 ( 87.9 %) Delay = 1091.13 ps ( 5.1 %)
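As a worked example of reading this report, the sketch below extracts the area and delay from the sample output above and converts the delay into a clock frequency (frequency = 1 / delay). The names parse_abc_report and delay_ps_to_mhz and the regular expressions are illustrative assumptions, not part of the provided tooling; they only handle text shaped like the sample.

```python
import re

# Sample text copied from the `make time` output shown above.
SAMPLE = '''ABC: WireLoad = "none" Gates = 13123 ( 14.8 %) Cap = 3.2 ff ( 1.9 %)
Area = 17519.56 ( 87.9 %) Delay = 1091.13 ps ( 5.1 %)'''


def parse_abc_report(text):
    """Return (area, delay_ps) parsed from a summary shaped like SAMPLE."""
    area = re.search(r"Area\s*=\s*([\d.]+)", text)
    delay = re.search(r"Delay\s*=\s*([\d.]+)\s*ps", text)
    if area is None or delay is None:
        raise ValueError("Area or Delay field not found")
    return float(area.group(1)), float(delay.group(1))


def delay_ps_to_mhz(delay_ps):
    # 1 / (delay_ps * 1e-12 s) = 1e12 / delay_ps Hz = 1e6 / delay_ps MHz
    return 1e6 / delay_ps


area, delay_ps = parse_abc_report(SAMPLE)
freq_mhz = delay_ps_to_mhz(delay_ps)
print(f"area = {area}, delay = {delay_ps} ps, frequency = {freq_mhz:.1f} MHz")
```

For the sample report, the delay of 1091.13 ps corresponds to roughly 916.5 MHz.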
You may optimize the CPU for the 3 workloads (code address range, data address range, etc.), but the optimizations must not affect the other test cases.
Grading:
Correctness check (10%)
  - 10 testcases, each 2% for correctness check
Required area and frequency (inverse of delay) (32%)
  - Area < 25,000 µm², and frequency > 10 MHz (5%)
  - Area < 25,000 µm², and frequency > 100 MHz (5%)
  - Area < 25,000 µm², and frequency > 200 MHz (5%)
  - Area < 25,000 µm², and frequency > 500 MHz (5%)
  - Area < 25,000 µm², and frequency > 800 MHz (4%)
  - Area < 25,000 µm², and frequency > 1000 MHz (3%)
  - Area < 25,000 µm², and frequency > 1200 MHz (3%)
  - Area < 25,000 µm², and frequency > 1500 MHz (2%)
Required time (number of clock cycles × clock period) to finish the workloads from the last 3 testcases (32%)
  - Workload1 < 100,000 ns (5%)
  - Workload2 < 150,000 ns (5%)
  - Workload3 < 200,000 ns (5%)
  - Workload1 < 10,000 ns (5%)
  - Workload2 < 15,000 ns (4%)
  - Workload3 < 20,000 ns (3%)
  - Workload1 < 5,000 ns, and Workload2 < 20,000 ns, and Workload3 < 15,000 ns (3%)
  - Workload1 < 3,500 ns, and Workload2 < 9,000 ns, and Workload3 < 10,000 ns (2%)
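To make the arithmetic behind these tiers concrete, the sketch below computes a workload's execution time as the number of clock cycles times the clock period (equivalently, cycles divided by frequency) and tallies the area-and-frequency tiers. It assumes the tiers are cumulative, meaning every satisfied threshold earns its points; the percentages summing to 32% suggest this, but the grading text does not state it outright. The function names and the 3,000-cycle count are hypothetical.

```python
AREA_LIMIT = 25_000  # um^2, from the grading list

# (minimum frequency in MHz, points) for the area-and-frequency tiers
FREQ_TIERS = [(10, 5), (100, 5), (200, 5), (500, 5),
              (800, 4), (1000, 3), (1200, 3), (1500, 2)]


def frequency_points(area, delay_ps):
    """Sum the points of every tier met, ASSUMING tiers are cumulative."""
    if area >= AREA_LIMIT:
        return 0
    freq_mhz = 1e6 / delay_ps  # frequency is the inverse of the delay
    return sum(points for min_mhz, points in FREQ_TIERS if freq_mhz > min_mhz)


def workload_time_ns(cycles, delay_ps):
    """Execution time = number of clock cycles x clock period, in ns."""
    return cycles * delay_ps / 1000.0


# Area and delay come from the sample `make time` output;
# the cycle count of 3000 is a made-up value for illustration.
print(frequency_points(17519.56, 1091.13))
print(workload_time_ns(3000, 1091.13))
```

Under these assumptions, lowering the cycle count and shortening the critical path both reduce the measured time, so the two grading items reward the same optimizations only when the added pipeline logic stays within the area limit.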