CAA- Homework 5 Solved

1 Programming
The programming part is for practice only; you do not need to hand it in.

If you are interested in this part, feel free to email the TAs to discuss it.

In this homework, we examine cache effects. The tool we use is rocket-chip. You can either build rocket-chip yourself or use the provided Docker image:

docker pull ntuca2020/hw5 # size ~ 8.28G
docker run --name=test -it ntuca2020/hw5
cd /root
ls

Folder structure for this homework:

emulator/                          // link to rocket-chip emulator
|-- benchmarks/                    // link to riscv-tests benchmark
|   |-- Makefile                   // compile all benchmarks
|   |-- qsort/                     // qsort benchmark folder
|   |-- qsort.riscv                // riscv executable
|   |-- qsort.riscv.dump           // objdump riscv executable
|   |-- mt-matmul/                 // mt-matmul benchmark
|   |-- mt-matmul.riscv            // riscv executable
|   |-- mt-matmul.riscv.dump       // objdump riscv executable
|   |-- mt-matmul_4/               // for part2
|   |   `-- matmul.c               <-- need to be handed in
|   |-- mt-matmul_4.riscv          // riscv executable
|   |-- mt-matmul_4.riscv.dump     // objdump riscv executable
|   |-- ...                        // other benchmarks
|   `-- common
|       |-- ...
|       `-- crt.S                  // specify number of cores available
|-- system/                        // link to rocket-chip system
|   |-- test.scala                 // first part SoC settings
|   |-- HW5.scala                  <-- used for matrix multiplication and need to be handed in
|   `-- *.scala                    // other default scala settings
|-- build.sh                       // build all settings
|-- test.sh                        // test all settings
|-- spike_test.sh                  // can test on spike first
|-- Config1                        // Configuration1
|-- generated-src_Config1          // Layout, RTL, mappings, dts, etc, for Config1
|-- ...
`-- Makefile                       // Build the configuration

Part 1: Observing cache behavior
Run test.sh and fill in the cycle count for each benchmark under each configuration in the table below.

Answer the following questions (answers should be based on your observations of the cache configurations and the program behavior). The numbers in parentheses refer to the cells marked with the same number in the table.

Why are the cycle counts marked (1) the same or different?
Why are the cycle counts marked (2) the same or different?
Why are the cycle counts marked (3) the same or different?
Why are the cycle counts marked (4) the same or different?
Why are the cycle counts marked (5) the same or different?
Look at pmp.c in /root/emulator/benchmarks/pmp. What does this program try to do, and how does it do it?
Change the number of available cores in crt.S (line 125) in /root/emulator/benchmarks/common and recompile the mt-matmul program (for this question, the matrix size is 32x32). Report the cycle counts of Configuration 17 on 1 core, Configuration 19 on 2 cores, and Configuration 20 on 4 cores. (1%)
Describe whether the cycle count decreases linearly with the number of cores, and explain why or why not.
 
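|                  | dhrystone | median | multiply | qsort | rsort   | towers | vvadd |
|------------------|-----------|--------|----------|-------|---------|--------|-------|
| Configuration 1  | (4)       |        |          |       | (3)     |        | (1)   |
| Configuration 2  |           |        |          |       |         |        | (1)   |
| Configuration 3  |           |        |          |       | (2),(3) |        |       |
| Configuration 4  |           |        |          |       | (2)     |        |       |
| Configuration 5  |           |        |          |       |         |        |       |
| Configuration 6  | (4)       |        |          |       |         |        |       |
| Configuration 7  | (4)       |        |          |       |         |        |       |
| Configuration 8  |           |        |          |       |         |        |       |
| Configuration 9  |           |        |          |       |         |        |       |
| Configuration 10 |           |        |          |       |         |        |       |
| Configuration 11 |           |        |          |       |         |        |       |
| Configuration 12 | (5)       |        |          |       |         |        |       |
| Configuration 13 | (5)       |        |          |       |         |        |       |

Table 1: Benchmark on different configurations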
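
For the multi-core question in Part 1, one way to judge whether the cycle count "decreases linearly" is to turn the measured cycle counts into speedup and parallel efficiency. The following is a minimal sketch; the function name scaling_report is hypothetical, and the cycle counts in the usage example are placeholders, not measurements from rocket-chip.

```python
def scaling_report(cycles_by_cores):
    """Map core count -> (speedup, efficiency) relative to the smallest core count.

    cycles_by_cores: dict such as {1: c1, 2: c2, 4: c4} holding measured cycle counts.
    Perfectly linear scaling gives an efficiency of 1.0 for every entry.
    """
    base_cores = min(cycles_by_cores)
    base_cycles = cycles_by_cores[base_cores]
    report = {}
    for cores, cycles in sorted(cycles_by_cores.items()):
        speedup = base_cycles / cycles
        efficiency = speedup / (cores / base_cores)
        report[cores] = (speedup, efficiency)
    return report


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own measurements.
    placeholder = {1: 1000, 2: 600, 4: 400}
    for cores, (speedup, eff) in scaling_report(placeholder).items():
        print(f"{cores} core(s): speedup {speedup:.2f}, efficiency {eff:.2f}")
```

An efficiency that drops as cores are added points to overheads that do not shrink with the core count, such as serial sections, synchronization, or contention for shared memory.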

Part 2: Cache and matrix multiplication 
In this part, we revisit matrix multiplication. You are asked to implement 64x64 matrix multiplication on 4 cores with a 128-B L1-D$ and a 128-B L1-I$ (no L2). The cache size is fixed, so the only thing you can change is the number of sets and ways in L1.
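
With the cache size fixed, the set and way counts are tied together by cache size = sets x ways x block size. The sketch below enumerates the combinations that satisfy this relation, assuming the number of sets must be a power of two. The function name set_way_options and the 4096-byte / 64-byte figures in the usage example are illustrative assumptions, not values taken from HW5.scala.

```python
def set_way_options(cache_bytes, block_bytes):
    """Return every (n_sets, n_ways) pair with n_sets * n_ways * block_bytes == cache_bytes.

    Only power-of-two set counts are listed (an assumption of this sketch).
    """
    if cache_bytes <= 0 or block_bytes <= 0 or cache_bytes % block_bytes:
        raise ValueError("cache_bytes must be a positive multiple of block_bytes")
    n_lines = cache_bytes // block_bytes
    options = []
    n_sets = 1
    while n_sets <= n_lines:
        if n_lines % n_sets == 0:
            options.append((n_sets, n_lines // n_sets))
        n_sets *= 2
    return options


if __name__ == "__main__":
    # Illustrative sizes only; use the sizes from your own configuration.
    for n_sets, n_ways in set_way_options(4096, 64):
        print(f"{n_sets} sets x {n_ways} ways")
```

The first entry (1 set) is fully associative and the last entry (1 way) is direct-mapped; everything in between trades conflict misses against lookup cost.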

Change the dataset in /root/emulator/benchmarks/mt-matmul/mt matmul.c to the 64x64 one (dataset2.h). The cache setting is specified in /root/emulator/system/HW5.scala, and you can build the simulator by running

make -j8 CONFIG=freechips.rocketchip.system.HW5Config

in /root/emulator.

The matrix multiplication program is located at /root/emulator/benchmarks/mt-matmul/matmul.c. Each thread enters this function with its thread id and local storage (128KB) and exits once its task is finished. You may want to look at the files under mt-matmul/ and common/.

Consider both the distribution of the workload across threads and the cache behavior when you implement matrix multiplication. Scoring is based on the cycle count obtained with your HW5.scala and matmul.c.
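
To get a feel for how loop order and work distribution interact with a small cache before running the real simulator, a toy model can help. The sketch below is a simplified set-associative cache with LRU replacement (an assumption; the replacement policy of the actual hardware may differ) that replays the data accesses of a row-interleaved matrix multiplication for one thread. All names (Cache, simulate_matmul) and the geometry in the usage example are hypothetical; the model ignores instruction fetches, writes versus reads, and other cores.

```python
from collections import OrderedDict


class Cache:
    """Toy set-associative cache with LRU replacement; counts hits and misses."""

    def __init__(self, n_sets, n_ways, block_bytes):
        self.n_sets, self.n_ways, self.block_bytes = n_sets, n_ways, block_bytes
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_bytes
        lines = self.sets[block % self.n_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) == self.n_ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = True


def simulate_matmul(n, order, n_sets, n_ways, block_bytes, tid=0, ncores=1, elem=4):
    """Replay the data accesses of C = A * B for the rows owned by thread tid.

    Rows are interleaved across threads: thread tid handles rows tid, tid + ncores, ...
    order is "ijk" (B walked down a column) or "ikj" (B and C walked along rows).
    """
    if order not in ("ijk", "ikj"):
        raise ValueError("order must be 'ijk' or 'ikj'")
    cache = Cache(n_sets, n_ways, block_bytes)
    a_base, b_base, c_base = 0, n * n * elem, 2 * n * n * elem
    for i in range(tid, n, ncores):
        if order == "ijk":
            for j in range(n):
                for k in range(n):
                    cache.access(a_base + (i * n + k) * elem)
                    cache.access(b_base + (k * n + j) * elem)
                cache.access(c_base + (i * n + j) * elem)
        else:
            for k in range(n):
                cache.access(a_base + (i * n + k) * elem)
                for j in range(n):
                    cache.access(b_base + (k * n + j) * elem)
                    cache.access(c_base + (i * n + j) * elem)
    return cache


if __name__ == "__main__":
    # Toy geometry for illustration only: 4 sets x 2 ways x 16-byte blocks.
    for order in ("ijk", "ikj"):
        c = simulate_matmul(16, order, n_sets=4, n_ways=2, block_bytes=16)
        print(order, "misses:", c.misses, "of", c.hits + c.misses, "accesses")
```

In this model the ikj order touches B and C along rows, so consecutive accesses fall in the same block, while the ijk order strides through B by a full row on every step. Real miss rates should be taken from spike or the emulator rather than from this sketch.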

Grading:

Correctness
Based on cycle count
    Ranking: Top 5
    Ranking: 6∼20
    Ranking: 21∼40
    Ranking: 41∼80
    Ranking: > 80
Report on how you implemented your matrix multiplication, optionally with cache miss rate statistics collected using spike
2 Architecture and Security (0%)
Although designing a high-performance architecture is important, designing a secure one is just as crucial. Read “Spectre Attacks: Exploiting Speculative Execution” (you may also want to consult the original paper) and answer the following questions.

How is the “exploiting conditional branch misprediction” attack performed?
How is the “poisoning indirect branches” attack performed?
How can Spectre attacks be mitigated? (Give at least 3 methods.)
