CA2 Project 1 Solved

        In this homework, you will extend your Project 1 with a memory hierarchy. In Project 1, we assumed that a memory read/write completes in one cycle. In reality, however, data memory is several orders of magnitude slower than a CPU cycle. In this project, we use a 1KB L1 cache, whose read/write latency is still one CPU cycle, together with a larger off-chip data memory, which requires 10 cycles for a read/write operation. We will examine the correctness of your implementation by dumping the value of each register and the data memory after each cycle.

 

1.1.   System Architecture
        The major difference from Project 1 is that this project has an off-chip module, Data_Memory. The CPU contains its cache; Data_Memory, however, does not reside in the CPU. Instead, it is connected to the CPU in the testbench. As shown in Figure 1, the cache controller carries several control signals and data across the CPU boundary to the off-chip memory.

  

Figure 1 System Architecture
        In short, you have to replace the Data_Memory in your Project 1 with the cache controller, then complete the cache controller so that it correctly handles cache hits and misses and thereby reduces the latency of memory reads/writes.

             

  

Figure 2 Datapath: you need to replace Data_Memory with dcache, and connect the MemStall signal to the sequential circuit elements

1.2.   Specification of Cache and Data Memory
        In this project, memory addresses are still 32 bits wide, and the registers are also 32-bit. The cache capacity is 1KB, with 32 bytes (256 bits) per cache line. The cache is two-way set associative, and its replacement policy is LRU (least recently used). The cache therefore has 1KB / 32B = 32 lines organized as 16 sets of 2 ways, so the cache controller treats a 32-bit address as a 23-bit tag, a 4-bit block index, and a 5-bit byte offset. Note that you do not have to handle unaligned reads/writes in this project. The cache applies a “write back” policy to handle write hits, and “write allocate” as its write miss policy.
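The address split described above can be checked with a small software model. This is a minimal sketch in Python, not part of the provided Verilog; the function name `split_address` and the constant names are illustrative assumptions.

```python
OFFSET_BITS = 5                            # 32-byte line -> 5-bit byte offset
INDEX_BITS = 4                             # 1KB / 32B / 2 ways = 16 sets -> 4-bit index
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS   # remaining 23 bits form the tag


def split_address(addr):
    """Split a 32-bit address into (tag, index, offset)."""
    addr &= 0xFFFFFFFF
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

For example, address 0x124 falls at byte 4 of the line in set 9 with tag 0, while address 0x200 maps back to set 0 with tag 1.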

 
      | hit | miss
----- | --- | ----
read  | Fetch data from cache | Evict a block by LRU policy. Then bring the data from data memory into the cache.
write | Write only to the cache, and set up the dirty bit. | Evict a block by LRU policy. Then bring the data from data memory into the cache. Write to the cache, and set up the dirty bit.
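The four cases above can be sketched as a behavioural model that reports only hit or miss, which may help when predicting the expected hit/miss sequence for a test program. This is a minimal Python sketch under stated assumptions, not the reference implementation: the class name `TwoWayCache` and the `writebacks` counter are hypothetical, and the model assumes that a dirty victim is written back to data memory on eviction, as the write back policy implies.

```python
class TwoWayCache:
    """Behavioural model: 16 sets x 2 ways, LRU, write back, write allocate."""

    def __init__(self, num_sets=16, line_bytes=32):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set holds at most 2 lines, ordered least recently used first.
        self.sets = [[] for _ in range(num_sets)]
        self.writebacks = 0  # dirty lines written back on eviction (assumption)

    def access(self, addr, is_write):
        """Perform one load (is_write=False) or store; return "hit" or "miss"."""
        block = addr // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        for line in ways:
            if line["tag"] == tag:
                # Hit: move the line to the most recently used position.
                ways.remove(line)
                ways.append(line)
                line["dirty"] = line["dirty"] or is_write
                return "hit"
        # Miss: evict the LRU line if the set is full, then allocate.
        if len(ways) == 2:
            victim = ways.pop(0)
            if victim["dirty"]:
                self.writebacks += 1
        ways.append({"tag": tag, "dirty": is_write})
        return "miss"
```

Addresses 0x000, 0x200, and 0x400 all map to set 0, so accessing them in turn forces an eviction on the third access.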
 

  

Figure 3 2-way associative cache
        The access latency of the off-chip Data_Memory is 10 cycles. When the enable signal of Data_Memory is asserted, Data_Memory starts accessing the data and, after 10 cycles, sends back an ack signal together with the data at the corresponding address.
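The enable/ack handshake can be pictured with a cycle-level model. The Python sketch below is only an illustration: the class name `DataMemoryModel` is hypothetical, and the exact cycle in which ack appears relative to enable is an assumption here; the provided Data_Memory module is the authority on timing.

```python
class DataMemoryModel:
    """Cycle-level model of a memory that acknowledges after a fixed latency."""

    LATENCY = 10  # cycles per read/write, as stated in the specification

    def __init__(self):
        self.busy = False
        self.count = 0

    def tick(self, enable):
        """Advance one clock cycle; return True in the cycle ack is asserted."""
        if self.busy:
            self.count += 1
            if self.count == self.LATENCY:
                self.busy = False
                return True
            return False
        if enable:
            # Assumption: the access starts in the cycle enable is first seen.
            self.busy = True
            self.count = 0
        return False
```

Under these assumptions, a read miss that must first write back a dirty line pays for two memory accesses in sequence, so the cache controller has to keep the pipeline stalled for both.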

 

1.3.   Instructions (same as Project 1)
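funct7       | rs2      | rs1 | funct3 | rd          | opcode  | function
------------ | -------- | --- | ------ | ----------- | ------- | --------
0000000      | rs2      | rs1 | 111    | rd          | 0110011 | and
0000000      | rs2      | rs1 | 100    | rd          | 0110011 | xor
0000000      | rs2      | rs1 | 001    | rd          | 0110011 | sll
0000000      | rs2      | rs1 | 000    | rd          | 0110011 | add
0100000      | rs2      | rs1 | 000    | rd          | 0110011 | sub
0000001      | rs2      | rs1 | 000    | rd          | 0110011 | mul
imm[11:0]    |          | rs1 | 000    | rd          | 0010011 | addi
0100000      | imm[4:0] | rs1 | 101    | rd          | 0010011 | srai
imm[11:0]    |          | rs1 | 010    | rd          | 0000011 | lw
imm[11:5]    | rs2      | rs1 | 010    | imm[4:0]    | 0100011 | sw
imm[12,10:5] | rs2      | rs1 | 000    | imm[4:1,11] | 1100011 | beq

For addi and lw, imm[11:0] occupies both the funct7 and rs2 fields.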
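The table above can be turned into a small software decoder, which is handy for checking hand-written test programs. This is a minimal Python sketch; the names `R_TYPE` and `decode` are illustrative assumptions, and the function returns only the mnemonic (or `None` for an encoding that is not in the table).

```python
# Maps (funct7, funct3) to the R-type mnemonics listed in the table.
R_TYPE = {
    (0b0000000, 0b111): "and",
    (0b0000000, 0b100): "xor",
    (0b0000000, 0b001): "sll",
    (0b0000000, 0b000): "add",
    (0b0100000, 0b000): "sub",
    (0b0000001, 0b000): "mul",
}


def decode(word):
    """Return the mnemonic of a 32-bit instruction word, or None if unknown."""
    opcode = word & 0x7F
    funct3 = (word >> 12) & 0x7
    funct7 = (word >> 25) & 0x7F
    if opcode == 0b0110011:
        return R_TYPE.get((funct7, funct3))
    if opcode == 0b0010011:
        if funct3 == 0b000:
            return "addi"
        if funct3 == 0b101 and funct7 == 0b0100000:
            return "srai"
        return None
    if opcode == 0b0000011 and funct3 == 0b010:
        return "lw"
    if opcode == 0b0100011 and funct3 == 0b010:
        return "sw"
    if opcode == 0b1100011 and funct3 == 0b000:
        return "beq"
    return None
```

For instance, `decode(int("00000000101000000000010010010011", 2))` yields `"addi"`.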
 

1.4.   Input / Output Format  
        Besides the modules listed above, you are also provided “testbench.v” and “instruction.txt”. After you finish your modules and CPU, you should compile all of them including “testbench.v”. A recommended compilation command would be

$ iverilog *.v -o CPU.out

Then, by default, your CPU loads “instruction.txt”, which should be placed in the same directory as CPU.out, into the instruction memory. This part is written in “testbench.v”. You don’t have to change it. “instruction.txt” is a plain text file that consists of 32 bits (ASCII 0 or 1) per line, representing one instruction per line. For example, the first 3 lines in “instruction.txt” are

 

0000000_00000_00000_000_01000_0110011 //add  $t0,$0,$0 

000000001010_00000_000_01001_0010011  //addi $t1,$0,10 

000000001101_00000_000_01010_0010011  //addi $t2,$0,13 

 

        Note that underscores and text after “//” (i.e. comments) are ignored. They are inserted simply for human readability. Therefore, the CPU should take “00000000000000000000010000110011” and execute it in the first cycle, then “00000000101000000000010010010011” in the second cycle, and “00000000110100000000010100010011” in the third, and so on.
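The cleanup rule above (drop the comment, drop the underscores, keep 32 bits) can be sketched in Python. This only mirrors the behaviour described in the text for use in your own tooling; the function name `parse_instruction_line` is hypothetical and is not part of “testbench.v”.

```python
def parse_instruction_line(line):
    """Return the 32-character bit string encoded by one instruction.txt line."""
    text = line.split("//", 1)[0]          # discard the comment, if any
    bits = text.replace("_", "").strip()   # discard underscores and whitespace
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits, got: %r" % bits)
    return bits
```

Applied to the first example line, it returns “00000000000000000000010000110011”.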

 

        Also, if you include the unchanged “testbench.v” in the compilation, the program will generate a plain text file named “output.txt”, which dumps the values of all registers and the data memory at each cycle of execution. The file is self-explanatory.

 

        A difference from Project 1 is that there are two output files in this project, output.txt and cache.txt. output.txt dumps the values of the registers and some selected data memory locations at each cycle. cache.txt records each load/store operation and whether it is a hit or a miss.

 

        Note that your output does not have to be 100% the same as that of our reference program. We will only check the values of the last cycle in output.txt, and the number and order of hits and misses in cache.txt.

 

1.5.   Modules You Need to Add or Modify
1.5.1. dcache_controller

The controller determines whether the upcoming load/store is a hit or a miss. Then, following the write back and write allocate policies, it interacts with the CPU and Data_Memory accordingly.
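One possible way to organize such a controller is a small state machine. The Python sketch below shows only the state transitions and is an assumption throughout: the state names (IDLE, MISS, WRITEBACK, READMISS, READMISSOK), the function `next_state`, and its inputs are hypothetical and need not match the provided skeleton.

```python
def next_state(state, request, hit, dirty, mem_ack):
    """Return the next controller state (all names here are hypothetical).

    request: the CPU issues a load/store this cycle
    hit:     the requested block is present in the cache
    dirty:   the LRU victim in the selected set is dirty
    mem_ack: Data_Memory acknowledges the outstanding access
    """
    if state == "IDLE":
        return "MISS" if (request and not hit) else "IDLE"
    if state == "MISS":
        # Write back policy: a dirty victim goes to memory before the refill.
        return "WRITEBACK" if dirty else "READMISS"
    if state == "WRITEBACK":
        return "READMISS" if mem_ack else "WRITEBACK"
    if state == "READMISS":
        return "READMISSOK" if mem_ack else "READMISS"
    if state == "READMISSOK":
        # The refilled line is stored in the cache; the access then completes.
        return "IDLE"
    raise ValueError("unknown state: %r" % state)
```

In a design along these lines, the stall signal to the pipeline would stay asserted from the moment a miss is detected until the controller returns to IDLE.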

1.5.2. dcache_sram

This module stores the tags and data of the cache. You should add code to support 2-way associativity and the LRU replacement policy.

1.5.3. testbench

As in Project 1, you have to initialize the reg variables in your pipeline registers before any instruction is executed. If you initialized your pipeline registers in the testbench in Project 1, please remember to copy that code into the testbench here. Except for register initialization, please do not change the output format ($fdisplay part) of this file.

1.5.4. Others

You can add more modules than listed above if you want. You are free to change some details as long as your CPU can perform correctly.

1.5.5. CPU

Replace the Data_Memory part in your Project 1 with dcache_controller.

 

2.   Report
 

2.1. Modules Explanation

        You should briefly explain in the report how the modules you implement work. You have to explain them in human-readable sentences. Either English or Chinese is welcome, but no Verilog. Explaining Verilog modules in Verilog is nonsense. Simply pasting your code into the report with little or no explanation will get zero points for the report. You have to write in more detail than Section 1.5.

        Taking “PC.v” as an example, an acceptable report would be:

PC module reads clock signals, reset bit, start bit, and next cycle PC as input, and outputs the PC of the current cycle. This module changes its internal register “pc_o” at the positive edge of the clock signal. When the reset signal is set, PC is reset to 0. And PC will only be updated by next PC when the start bit is on.
 

        The following report will get zero points:

The inputs of PC are clk_i, rst_i, start_i, pc_i, and output pc_o. It works as follows:

 

always@(posedge clk_i or negedge rst_i) begin
    if(rst_i) begin
        pc_o <= 32'b0;
    end
    else begin
        if(start_i)
            pc_o <= pc_i;
        else
            pc_o <= pc_o;
    end
end

        You can draw an FSM (Finite State Machine) diagram to explain how your cache works. You need to explain your cache controller in detail, since it is the core of this project.

2.2. Members & Teamwork

        Specify your team members and your work division. For example, who writes the cache controller, who is in charge of debugging, etc.

2.3. Difficulties Encountered and Solutions in This Project

        Write down any difficulties you encountered in doing this project, and your final solutions to them.

2.4. Development Environment

        Please specify in the report the OS (e.g. macOS, Windows, Ubuntu 18.04) and the compiler (e.g. iverilog) or IDE (e.g. ModelSim) you use, in case we cannot reproduce the same result as the one on your computer.
