Starting from:

$30

CEng331- Architecture Lab Solved

1       Introduction
In this homework, you will learn about the design and implementation of a pipelined Y86-64 processor, optimizing both it and a benchmark program to maximize performance. You are allowed to make any semantics-preserving transformation to the benchmark program, or to make enhancements to the pipelined processor, or both. When you have completed the homework, you will have a keen appreciation for the interactions between code and hardware that affect the performance of your programs.

The homework is organized into three parts, each with its own handin. In Part A you will write some simple Y86-64 programs and become familiar with the Y86-64 tools. In Part B, you will extend the SEQ simulator with a new instruction. These two parts will prepare you for Part C, the heart of the homework, where you will optimize the Y86-64 benchmark program and the processor design.

2       Logistics
You will work on this homework alone.

Any clarifications and revisions to the assignment will be posted on Piazza course page.

3       Handout Instructions
1.   Start by copying the file archlab-handout.tar to a (protected) directory in which you plan to do your work.

2.   Then give the command: tar xvf archlab-handout.tar. This will cause the following files to be unpacked into the directory: README, Makefile, sim.tar, archlab.pdf, and simguide.pdf.

3.   Next, give the command tar xvf sim.tar. This will create the directory sim, which contains the Y86-64 tools. You will be doing all of your work inside this directory.

4.   Finally, change to the sim directory and build the Y86-64 tools:

unix cd sim unix make clean; make

4       Part A
You will be working in directory sim/misc in this part.

Your task is to write and simulate the following three Y86-64 programs. The required behavior of these programs is defined by the example C functions in examples.c. Be sure to put your name and ID in a comment at the beginning of each program. You can test your programs by first assemblying them with the program YAS and then running them with the instruction set simulator YIS.

In all of your Y86-64 functions, you should follow the x86-64 conventions for passing function arguments, using registers, and using the stack. This includes saving and restoring any callee-save registers that you use.

sum.ys: Iteratively sum linked list elements
Write a Y86-64 program sum.ys that iteratively sums the elements of a linked list. Your program should consist of some code that sets up the stack structure, invokes a function, and then halts. In this case, the function should be Y86-64 code for a function (sumlist) that is functionally equivalent to the C sumlist function in Figure 1. Test your program using the following three-element list:

# Sample linked list

.align 8 ele1:

.quad 0x00a

.quad ele2 ele2:

.quad 0x0b0

.quad ele3 ele3:

.quad 0xc00

.quad 0

rsum.ys: Recursively sum linked list elements
Write a Y86-64 program rsum.ys that recursively sums the elements of a linked list. This code should be similar to the code in sum.ys, except that it should use a function rsumlist that recursively sums a list of numbers, as shown with the C function rsumlist in Figure 1. Test your program using the same three-element list you used for testing list.ys.

1                /* linked list element */

2                typedef struct ELE {

3                long val;

4                struct ELE *next; 5 } *list_ptr;

6

7                              /* sum_list - Sum the elements of a linked list */

8                              long sum_list(list_ptr ls)

9                              {

10                            long val = 0;

11                            while (ls) {

12                            val += ls-val;

13                            ls = ls-next;

14                            }

15                            return val;

16                            }

17

18                            /* rsum_list - Recursive version of sum_list */

19                            long rsum_list(list_ptr ls)

20                            {

21                            if (!ls)

22                            return 0;

23                            else {

24                            long val = ls-val;

25                            long rest = rsum_list(ls-next);

26                            return val + rest;

27                            }

28                            }

29

30                            /* copy_block - Copy src to dest and return xor checksum of src */

31                            long copy_block(long *src, long *dest, long len)

32                            {

33                            long result = 0;

34                            while (len 0) {

35                            long val = *src++;

36                            *dest++ = val;

37                            result ˆ= val;

38                            len--;

39                            }

40                            return result;

41                            }

Figure 1: C versions of the Y86-64 solution functions. Seesim/misc/examples.c

copy.ys: Copy a source block to a destination block
Write a program (copy.ys) that copies a block of words from one part of memory to another (nonoverlapping area) area of memory, computing the checksum (Xor) of all the words copied.

Your program should consist of code that sets up a stack frame, invokes a function copyblock, and then halts. The function should be functionally equivalent to the C function copyblock shown in Figure Figure 1. Test your program using the following three-element source and destination blocks:

.align 8 # Source block src:

.quad 0x00a

.quad 0x0b0

.quad 0xc00

# Destination block dest:

.quad 0x111

.quad 0x222

.quad 0x333

5       Part B
You will be working in directory sim/seq in this part.

Your task in Part B is to extend the SEQ processor to support the iaddq, described in Homework problems 4.51 and 4.52. To add this instructions, you will modify the file seq-full.hcl, which implements the version of SEQ described in the CS:APP3e textbook. In addition, it contains declarations of some constants that you will need for your solution.

Your HCL file must begin with a header comment containing the following information:

•   Your name and ID.

•   A description of the computations required for the iaddq instruction. Use the descriptions of irmovq and OPq in Figure 4.18 in the CS:APP3e text as a guide.

Building and Testing Your Solution
Once you have finished modifying the seq-full.hcl file, then you will need to build a new instance of the SEQ simulator (ssim) based on this HCL file, and then test it:

•   Building a new simulator. You can use make to build a new SEQ simulator:

unix make VERSION=full

This builds a version of ssim that uses the control logic you specified in seq-full.hcl. To save typing, you can assign VERSION=full in the Makefile.

•   Testing your solution on a simple Y86-64 program. For your initial testing, we recommend running simple programs such as asumi.yo (testing iaddq) in TTY mode, comparing the results against the ISA simulation:

unix ./ssim -t ../y86-code/asumi.yo

If the ISA test fails, then you should debug your implementation by single stepping the simulator in GUI mode: unix ./ssim -g ../y86-code/asumi.yo

•   Retesting your solution using the benchmark programs. Once your simulator is able to correctly execute small programs, then you can automatically test it on the Y86-64 benchmark programs in ../y86-code: unix (cd ../y86-code; make testssim)

This will run ssim on the benchmark programs and check for correctness by comparing the resulting processor state with the state from a high-level ISA simulation. Note that none of these programs test the added instructions. You are simply making sure that your solution did not inject errors for the original instructions. See file ../y86-code/README file for more details.

•   Performing regression tests. Once you can execute the benchmark programs correctly, then you should run the extensive set of regression tests in ../ptest. To test everything except iaddq and leave: unix (cd ../ptest; make SIM=../seq/ssim)

To test your implementation of iaddq: unix (cd ../ptest; make SIM=../seq/ssim TFLAGS=-i)

For more information on the SEQ simulator refer to the handout CS:APP3e Guide to Y86-64 Processor Simulators (simguide.pdf).

6       Part C
You will be working in directory sim/pipe in this part.

The ncopy function in Figure 2 copies a len-element integer array src to a non-overlapping dst, returning a count of the number of positive integers contained in src. Figure 3 shows the baseline Y86-64 version of ncopy. The file pipe-full.hcl contains a copy of the HCL code for PIPE, along with a declaration of the constant value IIADDQ.

1    /*

2    * ncopy - copy src to dst, returning number of positive ints 3 * contained in src array.

4                */

5                word_t ncopy(word_t *src, word_t *dst, word_t len)

6                {

7                word_t count = 0;

8                word_t val;

9

10                                        while (len 0) {

11                                        val = *src++;

12                                        *dst++ = val;

13                                        if (val 0)

14                                        count++;

15                                        len--;

16                                        }

17                                        return count;

18                                        }

Figure 2: C version of the ncopy function. Seesim/pipe/ncopy.c.

Your task in Part C is to modify ncopy.ys and pipe-full.hcl with the goal of making ncopy.ys run as fast as possible.

You will be handing in two files: pipe-full.hcl and ncopy.ys. Each file should begin with a header comment with the following information:

•   Your name and ID.

•   A high-level description of your code. In each case, describe how and why you modified your code.

Coding Rules
You are free to make any modifications you wish, with the following constraints:

•   Your ncopy.ys function must work for arbitrary array sizes. You might be tempted to hardwire your solution for 64-element arrays by simply coding 64 copy instructions, but this would be a bad idea because we will be grading your solution based on its performance on arbitrary arrays.

•   Your ncopy.ys function must run correctly with YIS. By correctly, we mean that it must correctly copy the src block and return (in %rax) the correct number of positive integers.

•   The assembled version of your ncopy file must not be more than 1000 bytes long. You can check the length of any program with the ncopy function embedded using the provided script check-len.pl:

unix ./check-len.pl < ncopy.yo

1 ################################################################## 2 # ncopy.ys - Copy a src block of len words to dst.

3    # Return the number of positive words (0) contained in src.

4    #

5    # Include your name and ID here.

6    #

7    # Describe how and why you modified the baseline code.

8    #

9    ##################################################################

10  # Do not modify this portion 11 # Function prologue.

12 # %rdi = src, %rsi = dst, %rdx = len 13 ncopy:

14

15                            ##################################################################

16                            # You can modify this portion

17                            # Loop header

18                            xorq %rax,%rax     # count = 0; 19       andq %rdx,%rdx   # len <= 0?

20                                    jle Done                                                             # if so, goto Done:

21

22 Loop:                mrmovq (%rdi), %r10           # read val from src... 23       rmmovq %r10, (%rsi)            # ...and store it to dst 24      andq %r10, %r10              # val <= 0?

25                            jle Npos  # if so, goto Npos:

26                            irmovq $1, %r10

27                            addq %r10, %rax  # count++

28                            Npos:      irmovq $1, %r10

29                            subq %r10, %rdx  # len--

30                            irmovq $8, %r10

31                            addq %r10, %rdi   # src++

32                            addq %r10, %rsi    # dst++

33                            andq %rdx,%rdx   # len 0?

34                            jg Loop   # if so, goto Loop:

35                            ##################################################################

36                            # Do not modify the following section of code 37 # Function epilogue.

38                            Done:

39                            ret

40                            ##################################################################

41                            # Keep the following label at the end of your function 42 End:

Figure 3: Baseline Y86-64 version of the ncopy function. Seesim/pipe/ncopy.ys.

• Your pipe-full.hcl implementation must pass the regression tests in ../y86-code and ../ptest (without the -i flag that tests iaddq).

Other than that, you are free to implement the iaddq instruction if you think that will help. You may make any semantics preserving transformations to the ncopy.ys function, such as reordering instructions, replacing groups of instructions with single instructions, deleting some instructions, and adding other instructions. You may find it useful to read about loop unrolling in Section 5.8 of CS:APP3e.

Building and Running Your Solution
In order to test your solution, you will need to build a driver program that calls your ncopy function. We have provided you with the gen-driver.pl program that generates a driver program for arbitrary sized input arrays. For example, typing

unix make drivers

will construct the following two useful driver programs:

•   sdriver.yo: A small driver program that tests an ncopy function on small arrays with 4 elements. If your solution is correct, then this program will halt with a value of 2 in register %rax after copying the src array.

•   ldriver.yo: A large driver program that tests an ncopy function on larger arrays with 63 elements. If your solution is correct, then this program will halt with a value of 31 (0x1f) in register %rax after copying the src array.

Each time you modify your ncopy.ys program, you can rebuild the driver programs by typing

unix make drivers

Each time you modify your pipe-full.hcl file, you can rebuild the simulator by typing

unix make psim VERSION=full

If you want to rebuild the simulator and the driver programs, type

unix make VERSION=full

To test your solution in GUI mode on a small 4-element array, type

unix ./psim -g sdriver.yo

To test your solution on a larger 63-element array, type

unix ./psim -g ldriver.yo

Once your simulator correctly runs your version of ncopy.ys on these two block lengths, you will want to perform the following additional tests:

•   Testing your driver files on the ISA simulator. Make sure that your ncopy.ys function works properly with YIS:

unix make drivers unix ../misc/yis sdriver.yo

•   Testing your code on a range of block lengths with the ISA simulator. The Perl script correctness.pl generates driver files with block lengths from 0 up to some limit (default 65), plus some larger sizes. It simulates them (by default with YIS), and checks the results. It generates a report showing the status for each block length:

unix ./correctness.pl

This script generates test programs where the result count varies randomly from one run to another, and so it provides a more stringent test than the standard drivers.

If you get incorrect results for some length K, you can generate a driver file for that length that includes checking code, and where the result varies randomly:

unix ./gen-driver.pl -f ncopy.ys -n K -rc driver.ys unix make driver.yo unix ../misc/yis driver.yo

The program will end with register %rax having the following value:

0xaaaa : All tests pass.

0xbbbb : Incorrect count

0xcccc : Function ncopy is more than 1000 bytes long.

0xdddd : Some of the source data was not copied to its destination.

0xeeee : Some word just before or just after the destination region was corrupted.

•   Testing your pipeline simulator on the benchmark programs. Once your simulator is able to correctly execute sdriver.ys and ldriver.ys, you should test it against the Y86-64 benchmark programs in ../y86-code: unix (cd ../y86-code; make testpsim)

This will run psim on the benchmark programs and compare results with YIS.

•   Testing your pipeline simulator with extensive regression tests. Once you can execute the benchmark programs correctly, then you should check it with the regression tests in ../ptest. For example, if your solution implements the iaddq instruction, then

unix (cd ../ptest; make SIM=../pipe/psim TFLAGS=-i)

•   Testing your code on a range of block lengths with the pipeline simulator. Finally, you can run the same code tests on the pipeline simulator that you did earlier with the ISA simulator

unix ./correctness.pl -p


9       Hints
•   By design, both sdriver.yo and ldriver.yo are small enough to debug with in GUI mode. We find it easiest to debug in GUI mode, and suggest that you use it.

•   In order to compile the simulator with GUI mode enabled, Tcl/Tk libraries are needeed. These libraries are installed on ineks. Also, TKLIBS and TKINC variables in the Makefiles are configured for ineks. If you want to work on your own personal computer you should have Tcl/Tk libraries installed on your system and configure TKLIBS and TKINC variables in the Makefiles.

•   If you want to work on ineks by connecting them from outside of the department and use GUI of the simulator, you should first connect to login machine with the following command (XXXXXXX is your 7 digit student ID)

unix ssh -X -p 8085 eXXXXXXX@login.ceng.metu.edu.tr

and then connect to an inek again with -X parameter (Command is given for inek5, but you can work on any ineks excluding inek0.):

unix ssh -X eXXXXXXX@inek5

•   With some X servers, the “Program Code” window begins life as a closed icon when you run psim or ssim in GUI mode. Simply click on the icon to expand the window.

•   With some Microsoft Windows-based X servers, the “Memory Contents” window will not automatically resize itself. You’ll need to resize the window by hand.

•   The psim and ssim simulators terminate with a segmentation fault if you ask them to execute a file that is not a valid Y86-64 object file.

More products