CMSC 411 Computer Architecture Project

The goal of the semester project is to design and simulate a pipelined RISC CPU. The major components are the pipelined ALU data path, the instruction decoder, hazard detection with the associated forwarding/stall logic, and the cache memory controller.

Submitting your Project

The project is to be submitted on GL as five transactions for five files:

    submit cs411 part1 part1.vhdl    # or submit cs411 part1 part1ce.vhdl
    submit cs411 part2 part2a.vhdl
    submit cs411 part2 part2b.vhdl
    submit cs411 part3 part3a.vhdl
    submit cs411 part3 part3b.vhdl

The files you submit are not the starter files but the starter files with your additions to make them work.

Note: DO NOT use "Blackboard" for turning in project or homework.

Five Part Project
part1
part2a
part2b
part3a
part3b
Other Links
Getting Started

Using Cadence VHDL on linux.gl.umbc.edu

If you have not already done this for HW4 and HW6:

First: ssh to linux.gl.umbc.edu, because the Cadence software is licensed to only some machines.

Next: Follow the instructions exactly, or work out your own variation. Be in your home directory on a Cadence machine and then type these commands. (Do not do this if you have worked on HW4 or HW6.)

    cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar .
    tar -xvf cs411.tar
    cd vhdl
    tcsh                  # not needed on Linux
    source vhdl_cshrc
    make                  # or gmake if make does not work
    more add32_test.out
    make clean            # saves a lot of disk quota

Then do your own thing with the Makefile for the parts of the project. You should use this directory for HW4, HW6, and the five parts of the project.

Each time you log on:

    cd vhdl
    tcsh
    source vhdl_cshrc

then work on your .vhdl files:

    make                  # then fix errors and check "diff" if no errors
    make clean            # just before you log off, save disk quota

Start the project by getting files

Starter files may be copied to your vhdl subdirectory on linux.gl.umbc.edu using commands such as:

    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1_start.vhdl .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411_opcodes.txt .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/add32.vhdl .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/bshift.vhdl .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.chk .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1ce.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1ce.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part1ce.chk .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/divcas16.vhdl .

Part1

PART1: Handle lw, sw, add, sub, or, addi, sll, srl, cmpl and nop (CMPE also do and, mul), with no hazards. (nop's are inserted in the part1.abs file to prevent hazards.)
See cs411_opcodes.txt for detailed instruction formats and definitions. See reglist.txt for register use conventions.

You should use part1_start.vhdl as a start for coding your circuit. You can do your own shift circuit or use the bshift.vhdl component. The instruction definitions and bit patterns for this semester are in cs411_opcodes.txt

Quick start steps:

1) Copy part1_start.vhdl to part1.vhdl, then work on the project in part1.vhdl.

2) Replace all strings "part1_start" with "part1".

3) Fill in VHDL for the ALU_32 architecture to implement sub, and, or, sll, srl, cmpl, mul. See diagram. All other instructions must do a plain add. Note that EX_IR coming into ALU_32 has the instruction in "inst". A possible schematic is alu_or.jpg and alu_or.ps. See also: Hints on coding the ALU.

4) Compute the signals

    RegDst        other input is rrop
    ALUSrc        other input is rrop
    MEMWrite      other input opcode for sw
    WB_write_enb  (needs 'or' of more opcodes)

   Use MEM_lw:entity WORK.equal6(...) as an example for setting a mux control based on opcode. In each stage **_IR is the instruction currently in that stage.

    **_IR(31 downto 26) is the six bit major op code. "100011" for lw
    **_IR(5 downto 0)   is the six bit minor op code. "100000" for add.

5) Compile, analyze, and run using commands in your Makefile:

    all: ... part1.out    # add part1.out to the list

    part1.out: part1.vhdl add32.vhdl bshift.vhdl part1.run part1.abs
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part1.vhdl   # renamed and modified part1_start.vhdl
            ncelab -v93 part1:schematic
            ncsim -batch -logfile part1.out -input part1.run part1
            diff -iw part1.out part1.chk

   There should be no differences. Part 1 has no stalls, so timing should be exact.

The CS411 Project Part 1 uses a schematic as shown in Lecture 18 and part1.ps

Check that your opcodes are from the latest cs411_opcodes.txt

For grading reasons, keep the signal names that are pipeline registers and the entity/memory names.

The resulting output should be as shown in the part1.chk file, based on part1.abs and part1.run.
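The opcode comparison described in step 4 can be sketched as follows. This is only an illustration, not part of part1_start.vhdl: the entity name opcode_sketch and the ports is_lw and is_add are hypothetical, and the "000000" major opcode for register-register instructions is an assumption that must be checked against cs411_opcodes.txt. Only the "100011" (lw) and "100000" (add) patterns come from the text above.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper for illustration; not one of the provided components.
entity opcode_sketch is
  port (IR     : in  std_logic_vector(31 downto 0);  -- e.g. EX_IR or MEM_IR
        is_lw  : out std_logic;    -- '1' when the major op code is lw
        is_add : out std_logic);   -- '1' for a register-register add
end entity opcode_sketch;

architecture behavior of opcode_sketch is
  constant LW_OP  : std_logic_vector(5 downto 0) := "100011"; -- from the text
  constant RR_OP  : std_logic_vector(5 downto 0) := "000000"; -- ASSUMED value
  constant ADD_FN : std_logic_vector(5 downto 0) := "100000"; -- from the text
begin
  -- major op code is bits 31 downto 26
  is_lw  <= '1' when IR(31 downto 26) = LW_OP else '0';
  -- minor op code is bits 5 downto 0, meaningful only for RR_OP instructions
  is_add <= '1' when IR(31 downto 26) = RR_OP and IR(5 downto 0) = ADD_FN
            else '0';
end architecture behavior;
```

A mux control such as WB_write_enb would then be an 'or' of several signals of this kind, one per opcode that writes a register.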
Check the results in part1.out to be sure the instructions worked. You can follow each instruction through the pipeline by following the instruction register, *_IR, and checking the *_* signals for correct values at each stage. It is possible that your part1.out does not agree with part1.chk, but you should be able to explain why. (Probably different don't care choices.)

You may want to copy part1.vhdl to another file and add more 'write' statements to print out more internal signal names in order to help debug your circuit. See debug.txt

Submit all components and your main circuit as one plain text file using submit. Do not include add32.vhdl or bshift.vhdl; they are provided by the instructor for testing. The file must be named "part1.vhdl". DO NOT EMail. You submit on GL using:

    submit cs411 part1 part1.vhdl

No makefiles or run files or output is to be submitted.

Partial credit will be given based on the number of instructions simulated correctly. The starter file part1_start.vhdl only simulates the lw instruction correctly.

Computer Engineering Majors only:

Create part1ce.vhdl with all requirements above. Change part1_start to part1ce everywhere. Also design and implement the multiply instruction in the ALU using your multiplier from homework 6. Use the bottom 16 bits of inA and the bottom 16 bits of inB and output mresult into the expanded mux.

Use part1ce.abs in place of part1.abs. Use part1ce.run in place of part1.run. You will have a value for register 12, your multiply result. Use part1ce.chk in place of part1.chk.

ALU diagram: alu_div.jpg (change part1 to part1ce if not already changed; sllop_and is just sllop "and"ed with rrop in the circuit).

Add ncvhdl -v93 pmul16.vhdl to your Makefile, which becomes:

CMPE Makefile

    all: ... part1ce.out    # add part1ce.out to the list

    part1ce.out: part1ce.vhdl add32.vhdl pmul16.vhdl bshift.vhdl part1ce.run part1ce.abs
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 pmul16.vhdl
            ncvhdl -v93 divcas16.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part1ce.vhdl   # renamed and modified part1_start.vhdl
            ncelab -v93 part1ce:schematic
            ncsim -batch -logfile part1ce.out -input part1ce.run part1ce
            diff -iw part1ce.out part1ce.chk

There should be no differences. There are no stalls, so timing should be exact.

It is OK to check that it compiles before doing the project.

Part2a:

Copy your part1.vhdl to part2a.vhdl
Substitute string "part2a" for every "part1"
CMPE copy your part1ce.vhdl to part2a.vhdl
CMPE Substitute string "part2a" for every "part1ce"

    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.chk .

Implement data forwarding and jump and branch.
CS411 does the branch and jump in the ID stage.
CS411 goes beyond the book by forwarding for beq.

    submit cs411 part2 part2a.vhdl    # before working part2b

You are upgrading part1.vhdl to part2a.jpg or part2a.ps

Data forwarding paths must cover at least those cases covered in class (see the class handout for details). Additional insight may be gained from a comparison of the pipeline stages with and without data forwarding in forward.txt

A possible implementation of forwarding is forward_mem.jpg

The EX stage forwarding may use entity mux_32_3, a multiplexor with three 32-bit inputs.

Note: jump and beq are followed by a delayed branch slot that contains an instruction that is always executed. jump cannot cause a stall. If beq does not get data forwarding, then it can stall, and stall, and stall.
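For orientation only, here is a textbook-style sketch of how a select for a three-input forwarding mux could be computed for one ALU operand. It is not the class solution: the entity forward_sketch, all port names, and the "00"/"01"/"10" select encoding are hypothetical, and treating register 0 as never forwarded is an assumption borrowed from MIPS conventions. The required cases are the ones in the class handout and forward.txt.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical select logic for one operand of a three-input forwarding mux.
entity forward_sketch is
  port (EX_rs      : in  std_logic_vector(4 downto 0); -- register read in EX
        MEM_rd     : in  std_logic_vector(4 downto 0); -- dest. of inst. in MEM
        WB_rd      : in  std_logic_vector(4 downto 0); -- dest. of inst. in WB
        MEM_writes : in  std_logic;  -- '1' if inst. in MEM writes a register
        WB_writes  : in  std_logic;  -- '1' if inst. in WB writes a register
        sel        : out std_logic_vector(1 downto 0)); -- ASSUMED encoding
end entity forward_sketch;

architecture behavior of forward_sketch is
  constant R0 : std_logic_vector(4 downto 0) := "00000"; -- ASSUMED: never forwarded
begin
  -- the newest producer (MEM stage) takes priority over the older one (WB stage)
  sel <= "01" when MEM_writes = '1' and MEM_rd /= R0 and MEM_rd = EX_rs else
         "10" when WB_writes  = '1' and WB_rd  /= R0 and WB_rd  = EX_rs else
         "00";  -- no forwarding, use the value read from the register file
end architecture behavior;
```

Note that a lw in the MEM stage does not yet have its loaded value available as an ALU result, so this kind of select alone does not remove every hazard; that case is the subject of the stalls in part2b.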
Add data forwarding for beq by adding two muxes in the ID stage that get inputs from the MEM stage, as shown in part2a.jpg or part2a.ps

Implement your circuit assuming that software has correctly filled the delayed branch slot, and implement the branch in the ID stage as modified for this class project. You may use the mux32_3

For grading reasons, keep the signal names that are pipeline registers and the component/memory names.

Download files part2a.abs and part2a.run and part2a.chk

Run the following commands to check your work.

    all: ... part2a.out    # add part2a.out to the list

    part2a.out: part2a.vhdl add32.vhdl bshift.vhdl part2a.run part2a.abs
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part2a.vhdl   # renamed and modified part1.vhdl
            ncelab -v93 part2a:schematic
            ncsim -batch -logfile part2a.out -input part2a.run part2a
            diff -iw part2a.out part2a.chk

CMPE copy from part1 to all future Makefiles the lines:

            ncvhdl -v93 pmul16.vhdl
            ncvhdl -v93 divcas16.vhdl

Part2b:

Copy your part2a.vhdl to part2b.vhdl
Substitute string "part2b" for every "part2a"

    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.chk .

Implement hazard detection and stall the minimum possible.

Handle hazards. Detect hazards and prevent wrong results by stalling when necessary. A stall is implemented by holding the instruction in the ID stage and letting the EX, MEM and WB stages proceed. The stall signal prevents the IF and ID stages from getting a clock signal.

A terse summary of the hazard detection is in hazard.txt

A possible implementation of hazards is stall_lw.jpg

The CS411 Project Part 2b uses a modified schematic handed out in class and shown in part2b.jpg and part2b.ps

Download files part2b.abs and part2b.run and part2b.chk

Run the following commands to check your work.

    all: ... part2b.out    # add part2b.out to the list

    part2b.out: part2b.vhdl add32.vhdl bshift.vhdl part2b.run part2b.abs
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part2b.vhdl   # renamed and modified part2a.vhdl
            ncelab -v93 part2b:schematic
            ncsim -batch -logfile part2b.out -input part2b.run part2b
            diff -iw part2b.out part2b.chk

CMPE copy from part2a to all future Makefiles the lines:

            ncvhdl -v93 pmul16.vhdl
            ncvhdl -v93 divcas16.vhdl

Part2b needs both data forwarding and hazards (stalls).

Submit all components and your main circuit as one plain text file using 'submit'. No makefiles or run files or output is to be submitted.

Partial credit will be given based on the number of data forwards, jump, beq, and hazard stalls handled correctly.

Your circuit will not be tested with jump or branch or data addresses greater than 10 bits. In other words, your instruction and data memories do not need to be bigger than 1024 words.

You may not get exactly the .chk results. Timing and stalls will be graded. Points will be deducted for memory or register differences or improper stalls.

Part3a:

Copy your part2b.vhdl to part3a.vhdl
Substitute "part3a" for every "part2b"

    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.chk .

Implement a cache in the instruction memory (read only).

    submit cs411 part3 part3a.vhdl

Put the cache inside the instruction memory component (entity and architecture). (You will need to pass a few extra signals in and out.) Use the existing shared memory data as the main memory.

Make a miss on the instruction cache cause a three cycle stall. A cycle is 10 ns, thus a three cycle stall is 30 ns. Previous stalls from part2b must still work.

The instruction cache holds 16 words organized as four blocks of four words.
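The organization just described, together with byte addressing, implies a split of the 32-bit address into fields. As a minimal sketch, assuming a direct-mapped cache: bits 1 downto 0 select the byte within a word, bits 3 downto 2 select one of four words, bits 5 downto 4 select one of four blocks, and the remaining 26 bits form the tag. The entity cache_addr_sketch and its port names are hypothetical and exist only to show the arithmetic.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical illustration of the address split; not a working cache.
entity cache_addr_sketch is
  port (addr  : in  std_logic_vector(31 downto 0);  -- byte address
        tag   : out std_logic_vector(25 downto 0);  -- compared on lookup
        index : out natural range 0 to 3;           -- which of four blocks
        word  : out natural range 0 to 3);          -- which word in the block
end entity cache_addr_sketch;

architecture behavior of cache_addr_sketch is
begin
  tag   <= addr(31 downto 6);                        -- 26 tag bits
  index <= to_integer(unsigned(addr(5 downto 4)));   -- 2 bits, 4 blocks
  word  <= to_integer(unsigned(addr(3 downto 2)));   -- 2 bits, 4 words
  -- addr(1 downto 0) is the byte offset within a word and is not used here
end architecture behavior;
```

For example, byte address 0x00000058 (binary 101 1000) gives word = 2, index = 1 and tag = 1. A hit then requires that the block at that index is valid and that its stored tag equals this tag.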
Remember that vhdl memory is addressed by word address, the MIPS/SGI memory is addressed by byte address, and a cache is addressed by block number.

The cache schematic for the instruction cache was handed out in class and is shown in icache.jpg

The cache may be implemented using behavioral VHDL, basically writing sequential code in VHDL, or by connecting hardware.

Possible behavioral, not required, VHDL to set up the start of a cache: (no partial credit for just putting this in your cache.)

    -- add in or out signals to entity instruction_memory as needed
    -- for example, 'clk' 'clear' 'miss'
    architecture behavior of instruction_memory is
      subtype block_type is std_logic_vector(154 downto 0);
      type cache_type is array (0 to 3) of block_type;
      signal cache : cache_type := (others=>(others=>'0'));
      -- now we have a cache memory initialized to zero
    begin -- behavior
      inst_mem: process ... -- whatever, does not have to be just 'addr'
        variable quad_word_address : natural; -- for memory fetch
        variable cblock  : block_type; -- the shaded block in the cache
        variable index   : natural;    -- index into cache to get a block
        variable word    : natural;    -- select a word
        variable my_line : line;       -- for debug printout
        variable W0 : std_logic_vector(31 downto 0);
        ...
      begin
        ...
        index  := to_integer(addr(5 downto 4));
        word   := to_integer(addr(3 downto 2));
        cblock := cache(index); -- has valid (154), tag (153 downto 128)
                                -- W0 (127 downto 96), W1 (95 downto 64)
                                -- W2 (63 downto 32),  W3 (31 downto 0)
                                -- cblock is the shaded block in handout
        ...
        quad_word_address := to_integer(addr(13 downto 4));
        W0 := memory(quad_word_address*4+0);
        W1 := memory(quad_word_address*4+1);
        -- ...
        -- fill in cblock with new words, then
        cache(index) <= cblock after 30 ns; -- 3 clock delay
        miss <= '1', '0' after 30 ns;       -- miss is '1' for 30 ns
        -- this "miss" signal gets ored into part2b "stall" signal
        ...
        -- the part3a.chk file has 'inst' set to zero while 'miss' is 1
        -- not required but cleans up the "diff"

More information, including debug print, is in Lecture 24 and debug.txt

For debugging your cache, you might find it convenient to add this 'debug' print process inside the instruction_memory architecture:

    debug: process -- used to print contents of I cache
      variable my_line : LINE; -- not part of working circuit
    begin
      wait for 9.5 ns; -- just before rising clock
      for I in 0 to 3 loop
        write(my_line, string'("line="));
        write(my_line, I);
        write(my_line, string'(" V="));
        write(my_line, cache(I)(154));
        write(my_line, string'(" tag="));
        hwrite(my_line, cache(I)(151 downto 128)); -- ignore top bits
        write(my_line, string'(" w0="));
        hwrite(my_line, cache(I)(127 downto 96));
        write(my_line, string'(" w1="));
        hwrite(my_line, cache(I)(95 downto 64));
        write(my_line, string'(" w2="));
        hwrite(my_line, cache(I)(63 downto 32));
        write(my_line, string'(" w3="));
        hwrite(my_line, cache(I)(31 downto 0));
        writeline(output, my_line);
      end loop;
      wait for 0.5 ns; -- rest of clock
    end process debug;

And add in front of the instruction_memory architecture:

    use STD.textio.all;
    use IEEE.std_logic_textio.all;

Then

    diff -iw part3a.out part3a_print.chk

See part3a_print.chk with debug.

You may print out signals such as 'miss' using prtmiss from debug.txt

For grading reasons, keep the signal names that are pipeline registers and the component/memory names.

Add the following commands to your Makefile.

    all: ... part3a.out

    part3a.out: part3a.vhdl part3a.run part3a.abs add32.vhdl bshift.vhdl
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part3a.vhdl   # renamed and modified part2b.vhdl
            ncelab -v93 part3a:schematic
            ncsim -batch -logfile part3a.out -input part3a.run part3a
            diff -iw part3a.out part3a.chk
            # or
            diff -iw part3a.out part3a_print.chk

CMPE copy from part2b to all future Makefiles the lines:

            ncvhdl -v93 pmul16.vhdl
            ncvhdl -v93 divcas16.vhdl

You submit on GL using:

    submit cs411 part3 part3a.vhdl

Part3b:

Copy your part3a.vhdl to part3b.vhdl
Substitute "part3b" for every "part3a"

    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.abs .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.run .
    cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.chk .

Implement a cache in the data memory (read/write).

    submit cs411 part3 part3b.vhdl

Put the cache inside the data memory entity and process. Almost all the code from the instruction cache, part3a, can be copied and used inside the data memory for the data cache. (You will need to pass a few extra signals in and out.) Use the existing shared memory data as the main memory.

Make a miss on the data cache cause a three cycle stall of all pipeline stages. (You will need another signal similar to sclk in order to stall the EX, MEM and WB stages.) A cycle is 10 ns, thus a three cycle stall is 30 ns. Previous stalls from part2b and part3a must still work.

Change

    MEMread : std_logic := '1';

to

    MEMread : std_logic := '0';

for part3b.

Do a write through cache for the data memory. (It must work to the point that results in main memory are correct at the end of the run and the timing is correct. Partial credit is given for partial functionality with correct timing for the stalls.)

Then test part3b.vhdl with the data cache.

Add the following commands to your Makefile.

    all: ... part3b.out

    part3b.out: part3b.vhdl part3b.run part3b.abs add32.vhdl bshift.vhdl
            ncvhdl -v93 add32.vhdl
            ncvhdl -v93 bshift.vhdl
            ncvhdl -v93 part3b.vhdl   # renamed and modified part3a.vhdl
            ncelab -v93 part3b:schematic
            ncsim -batch -logfile part3b.out -input part3b.run part3b
            diff -iw part3b.out part3b.chk
            # or
            diff -iw part3b.out part3b_print.chk

CMPE copy from part3a to the Makefile the lines:

            ncvhdl -v93 pmul16.vhdl
            ncvhdl -v93 divcas16.vhdl

    submit cs411 part3 part3b.vhdl

Submit all components and your main circuit as one plain text file by using 'submit'. No makefiles or run files or output is to be submitted.

Partial credit will be given based on correct timing and the number of instructions simulated correctly, the number of hazards handled correctly, and proper operation of the data cache. Of course, the instruction cache must work before the data cache is graded.

Files to download and other links
opcodes for this project
pipe1.vhdl - demo for PROJECT part1
pipe1.run - demo for PROJECT part1
pipe1.chk - demo for PROJECT part1
pipe1.jpg - demo for PROJECT part1
pipe2.vhdl - more demo for PROJECT part1
pipe2.run - more demo for PROJECT part1
pipe2.chk - more demo for PROJECT part1
pipe2.jpg - more demo for PROJECT part1
part1_start.vhdl - VHDL to start PROJECT part1
part1_start.chk - results of running starter file
part1.abs - memory for PROJECT part1
part1.run - control for PROJECT part1
part1.chk - results for PROJECT part1
part1lh.jpg - for PROJECT part1 left half
part1rh.jpg - for PROJECT part1 right half
part1.ps - for PROJECT part1
part2a.abs - for PROJECT part2a data forwarding
part2a.run - for PROJECT part2a data forwarding
part2a.chk - for PROJECT part2a data forwarding
forward.txt - conditions for data forwarding
part2a.ps - for PROJECT part2a
part2a.jpg - for PROJECT part2a 1 of 2
partc2.jpg - for PROJECT part2a 2 of 2
part2b.abs - for PROJECT part2b hazards and stalls
part2b.run - for PROJECT part2b hazards and stalls
part2b.chk - for PROJECT part2b hazards and stalls
hazard.txt - conditions that cause a stall
part2b.ps - for PROJECT part2b
part2b.jpg - for PROJECT part2b
part3a.abs - for project part3a instruction cache
part3a.run - for project part3a instruction cache
part3a.chk - for project part3a instruction cache
part3a_print.chk - for project part3a instruction cache printout
part3b.abs - for project part3b instruction and data cache
part3b.run - for project part3b instruction and data cache
part3b.chk - for project part3b instruction and data cache
part3b_print.chk - for project part3b data cache printout
icache.jpg - for PROJECT part3a
mipsasm.cpp - source code for creating *.abs from *.asm for project