CS3210 Assignment 2: CUDA Implementation of Game of Invasions


Learning Outcomes
This assignment lets you explore the intricacies of building a parallel application using NVIDIA CUDA for a problem you are already familiar with.
1 Problem Scenario
In this assignment, you will re-implement Game of Invasions, described in Assignment 1, in CUDA.
1.1 Simulation Rules
The simulation rules are exactly the same as in Assignment 1. Refer to Assignment 1 write-up for further details.
1.2 Inputs and Outputs
Your program should accept eight command-line arguments.
• The third to eighth arguments specify the grid and block sizes that the program will run with in the following order: GRID_X, GRID_Y, GRID_Z, BLOCK_X, BLOCK_Y, BLOCK_Z.
The formats and constraints of the input and output files are the same as in Assignment 1, with one exception:
please remove all prints to stdout and stderr in your submission.
Sample Program Execution
$ ./goi_cuda sample_input.in output.out 1 2 3 4 5 6
Explanation of Command-line Arguments
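A minimal host-side sketch of reading the eight arguments is shown here. It assumes, based only on the sample execution above, that the first two arguments are the input and output file paths. The names Dim3, Args, parsePositive and parseArgs are hypothetical and not part of the starter code. Dim3 is a plain stand-in for CUDA's dim3 so that the sketch compiles without nvcc.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Host-side stand-in for CUDA's dim3 (hypothetical, for illustration only).
struct Dim3 {
    unsigned x, y, z;
};

// Hypothetical container for the eight command-line arguments.
struct Args {
    std::string inputPath;   // assumed to be the 1st argument
    std::string outputPath;  // assumed to be the 2nd argument
    Dim3 grid;               // GRID_X, GRID_Y, GRID_Z
    Dim3 block;              // BLOCK_X, BLOCK_Y, BLOCK_Z
};

// Parses one strictly positive decimal integer.
// Returns false for junk, trailing characters, zero, negatives or overflow.
static bool parsePositive(const char* text, unsigned& out) {
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || value <= 0) return false;
    if (static_cast<unsigned long>(value) > UINT_MAX) return false;
    out = static_cast<unsigned>(value);
    return true;
}

// Fills `out` and returns true only if exactly eight valid arguments follow
// the program name. Prints nothing, in line with the no-output requirement.
bool parseArgs(int argc, const char* const* argv, Args& out) {
    if (argc != 9) return false;  // program name + eight arguments
    out.inputPath = argv[1];
    out.outputPath = argv[2];
    unsigned* fields[6] = {&out.grid.x,  &out.grid.y,  &out.grid.z,
                           &out.block.x, &out.block.y, &out.block.z};
    for (int i = 0; i < 6; ++i) {
        if (!parsePositive(argv[3 + i], *fields[i])) return false;
    }
    return true;
}
```

In a .cu file, the parsed values could be copied into two dim3 variables and passed to the kernel launch. How to react to invalid arguments is not specified in this write-up, so the sketch only reports failure to the caller.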
1.3 Starter Code
We provide some utility functions and example usage code to export world states for use in the GOI visualizer. The code structure is shown in Table 1.
Files/Folders | Description
--- | ---
check_zip.sh | Script to check that your archive follows the required structure.
exporter.cu, exporter.h | These files contain the library we wrote to export world states to a format that the GOI visualizer can understand. As usual, feel free to use, ignore or delete these files as long as your program follows specifications. You will not receive credit for modifications to these files.
export_example.cu | This file shows example usage of the exporter module in a "CUDA" program.
Makefile | Contains one recipe example to build export_example.
README | Contains information about how to use the exporter module with CUDA. Feel free to delete after reading, like in a spy movie.
sb/ | This folder contains code for a string builder library imported to implement the exporter module. The same rules apply as in exporter.cu.
sample_inputs/, sample_outputs/ | These folders contain sample input and output files for you to test with, as in Assignment 1.

Table 1: Code Structure
1.3.1 GOI Visualizer
The same visualizer application from assignment 1 can be used for assignment 2, and can be found (in the same place) here. As usual, using or even downloading the visualizer is not necessary at all for completion of this assignment.
If you experience compatibility issues or have any feedback/suggestions, email Benedict (benedictkhoo.mw@u.nus.edu).
1.4 Your Task
Your task is to implement a parallel version of Game of Invasions using CUDA. Your parallel implementation should be bug-free, make reasonable effort to minimize memory leaks (i.e. do not forget to free memory you malloc) and should run faster than your OpenMP implementation for a large enough world size (otherwise there is no point using CUDA). You will also need to conduct some performance measurements and write a report.
Your parallel implementations should give the same result (output) as your OpenMP implementation (on the machines on the SoC compute cluster), and execute faster for a large enough world size.
1.5 Optimizing your Solution
While correctness is important in a parallel program, improving performance is the reason we parallelize. After implementing a working CUDA program, you should investigate various modifications of the code and how they affect different parallel performance metrics (e.g. speedup). These modifications include, but are not limited to:
• Different block and grid sizes. Your implementation should work on varying grid and block sizes.
• Different data/task distribution methods.
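One possible way to stay correct for any grid and block size is a grid-stride distribution: flatten each thread's position into a single ID, then let it handle every cell whose index is that ID plus a multiple of the total thread count. The sketch below reproduces that index arithmetic on the host so it can be checked without a GPU. Dim3 and all function names are hypothetical. In a real kernel the same expressions would use the CUDA built-ins gridDim, blockDim, blockIdx and threadIdx. This is one distribution method among several, not a prescribed one.

```cpp
#include <cstddef>
#include <vector>

// Host-side stand-in for CUDA's dim3 / uint3 (hypothetical).
struct Dim3 {
    unsigned x, y, z;
};

// Number of threads in one block.
std::size_t threadsPerBlock(Dim3 blockDim) {
    return static_cast<std::size_t>(blockDim.x) * blockDim.y * blockDim.z;
}

// Number of threads in the whole launch.
std::size_t totalThreads(Dim3 gridDim, Dim3 blockDim) {
    return static_cast<std::size_t>(gridDim.x) * gridDim.y * gridDim.z *
           threadsPerBlock(blockDim);
}

// Flattens (blockIdx, threadIdx) into one launch-wide ID, x varying fastest.
std::size_t globalThreadId(Dim3 gridDim, Dim3 blockDim,
                           Dim3 blockIdx, Dim3 threadIdx) {
    std::size_t blockId =
        blockIdx.x + static_cast<std::size_t>(gridDim.x) *
                         (blockIdx.y + static_cast<std::size_t>(gridDim.y) * blockIdx.z);
    std::size_t threadInBlock =
        threadIdx.x + static_cast<std::size_t>(blockDim.x) *
                          (threadIdx.y + static_cast<std::size_t>(blockDim.y) * threadIdx.z);
    return blockId * threadsPerBlock(blockDim) + threadInBlock;
}

// Cells visited by thread `tid` under a grid-stride loop over `numCells` cells.
std::vector<std::size_t> cellsForThread(std::size_t tid, std::size_t total,
                                        std::size_t numCells) {
    std::vector<std::size_t> cells;
    for (std::size_t c = tid; c < numCells; c += total) cells.push_back(c);
    return cells;
}
```

With the sample launch configuration (grid 1×2×3, block 4×5×6) there are 720 threads, so a world of 1000 cells gives the first 280 threads two cells each and the remaining threads one. Every cell is visited exactly once whatever sizes are chosen, which is the property the first bullet asks for.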
Distinguish any alternative implementations you include in your submission clearly from the final parallel implementations to be graded.
2 Admin Issues
2.1 Running your Programs
During development you might use your personal computer (if you have a CUDA-capable GPU) or any of the 14 machines (with one or two GPGPUs each) from the SoC Compute Cluster reserved for CS3210. Their hostnames are: xgpc0-7 and xgpd0-7.
Your code should successfully compile and run on the SoC Compute Cluster nodes mentioned above. Run your correctness tests and performance measurements on these machines.
2.2 Bonus

2.3 FAQ
If there are any questions regarding the assignment, please post on the LumiNUS forum or email Benedict (benedictkhoo.mw@u.nus.edu) or Brian (e0310531@u.nus.edu).
Useful resources for Assignment 2:
• CUDA Programming Guide
• CUDA nvprof Guide
2.4 Submission Instructions
Your CUDA implementation should:
• Make reasonable effort to minimize memory leaks (i.e. have a corresponding free for each malloc)
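One way to pair every allocation with a free automatically is to hand ownership to a smart pointer with a custom deleter. The sketch below does this for host memory with std::calloc and std::free. FreeDeleter, Buffer and allocBuffer are hypothetical names. The same pattern could wrap device memory by substituting cudaMalloc and cudaFree in a .cu file, which is an assumption about how you might structure your code rather than a requirement.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases memory obtained from malloc/calloc.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};

// Owning handle: the buffer is freed when the handle goes out of scope.
template <typename T>
using Buffer = std::unique_ptr<T[], FreeDeleter>;

// Allocates `n` zero-initialised elements; the handle is null on failure.
// Intended for trivially constructible types such as int or char.
template <typename T>
Buffer<T> allocBuffer(std::size_t n) {
    return Buffer<T>(static_cast<T*>(std::calloc(n, sizeof(T))));
}
```

A world grid could then be held as, for example, Buffer<int> world = allocBuffer<int>(rows * cols), with no explicit free call on any return path.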
Your report should include:
• A brief description of your program’s design and implementation assumptions, if any.
• A brief explanation of the parallel strategy you used in your CUDA implementation, e.g. synchronisation, work distribution, memory usage and layout, etc.
• Any special consideration or implementation detail that you consider non-trivial.
• Details on how to reproduce your results, e.g. inputs, execution time measurement, etc.
• Graphs, with explanations, showing how execution time and speedup (y-axis) vary with world size and with grid size (x-axis, for a fixed input size). Include measurements showing how block size and grid size (task granularity) affect execution time and speedup.
• Compare your CUDA implementation performance with your OpenMP implementation performance.
Use a world size of 3000×3000 and 10,000 steps.
• A description of the modifications made to your code (from your baseline correct CUDA implementation) and an analysis of their impact on performance.
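For the execution time and speedup measurements, a small host-side helper along the following lines may be useful. It is a sketch under stated assumptions: speedup is taken as baseline time divided by parallel time (here, OpenMP time over CUDA time), repeated runs are summarised by their median, and the names timeSeconds, median and speedup are hypothetical. Because kernel launches are asynchronous, the timed callable in a CUDA program would need to wait for the device (for example with cudaDeviceSynchronize) before returning.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double timeSeconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Median of a non-empty set of measurements.
// Takes a copy so the caller's vector is left untouched.
double median(std::vector<double> samples) {
    std::sort(samples.begin(), samples.end());
    std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Speedup of the parallel version relative to the baseline.
double speedup(double baselineSeconds, double parallelSeconds) {
    return baselineSeconds / parallelSeconds;
}
```

Taking the median of several runs is a design choice that reduces the effect of outliers on shared cluster machines. Whichever summary you use, state it in the report so the results can be reproduced.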
Tips:

• There could be many variables that contribute to performance, and studying every combination could be highly impractical and time-consuming. A report that investigates two or three variables sensibly, with explanations as to why these variables might affect performance (and are worth investigating) is better than a report that blindly tries every combination of variables. You will be graded more on the quality of your investigations, not so much on the quantity of things tried or even whether your hypothesis turned out to be correct.
There is no minimum or maximum page length for the report. Be comprehensive, yet concise.

Submit one zip archive named with your student number(s) (A0123456Z.zip - if you worked by yourself, or A0123456Z_A0173456T.zip - if you worked with another student) containing the following files and folders. Only one archive for both students must be submitted if you worked with another student. Do not add any additional folder structure.
1. Your C/C++ code for goi_cuda.cu and any source or header files needed to build them.
2. Makefile with a recipe named build that builds your implementation exactly as you intend it to be graded for correctness/performance. Also remember to remove unnecessary print/export statements if you think they will affect correctness/performance. The executable name produced should be goi_cuda. Be sure to include everything in your submission needed such that when make build is run on a SoC Compute Cluster machine, goi_cuda is built without issue.
3. Report in PDF format (A0123456Z_A0173456T_report.pdf or A0123456Z_report.pdf).
4. A folder, named testcases, containing any additional test cases (input and output) that you might have used.
5. An optional folder, named scripts, containing any additional scripts you used to measure the execution time and extract data for your report.
Once you have the zip file, you will be able to check it by doing:
$ chmod +x ./check_zip.sh
$ ./check_zip.sh A0123456Z_A0173456T.zip (replace with your zip file name)
During execution, the script prints whether each check was conducted successfully and which checks failed. Passing all the checks ensures that we can grade your assignment. You will receive 0.5% simply for having a valid submission file!
