CSC718 Operating Sys & Parallel Programming
Homework 4
MPI, OpenMP, and GPU Programming Assignments
Total: 100 points

The homework assignment will be graded based on the following criteria:
• Accuracy: 1) the solution meets the specific requirements in the problem description; 2) the solution produces correct results; 3) the procedures adopted in the solution are technically sound.
• Efficiency: efficiency is one of the grading criteria for programming assignments. The solution should produce the desired results efficiently.
• Effort/neatness: the solution shows excellent effort, and all related work is shown neatly and organized well.

Homework assignment feedback will be available through the Dropbox folder on D2L.

For all the programming assignments, you can choose any operating system you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use another language, e.g., Java, that is acceptable too.

A README.txt is required with every programming assignment submission. In the README.txt, you need to provide the following information:
1) How to compile your program?
2) How to run your program?
3) What is the average running time of your program?
4) What are the expected output results when I run your program?

Zip all your source code, project files, supporting files, and README.txt, and submit the all-in-one zip file to the D2L Dropbox.

If you have any questions about the homework, please let me know.

(10 points) What are the strengths and limitations of GPU programming?

(15 points) Among the four parallel frameworks we discussed in class (multithreaded programming, MPI programming, OpenMP programming, and GPU programming, e.g., CUDA), what strategies will you generally use when selecting a parallel framework for your applications?

(10 points) Benchmarking of a sequential program reveals that 95 percent of the execution time is spent inside functions that are amenable to parallelization.
What is the maximum speedup we could expect, according to Amdahl’s Law, from executing a parallel version of this program on 10 processors?

The “acceptable” identifier must meet the following constraints:
• The identifier is a six-digit combination of the numerals 0-9

1. (15 points) Write a sequential program to count the different identifiers given these constraints.
2. (20 points) Convert the sequential program to an MPI parallel program.
3. (20 points) (choose only one: A or B, not both)
   A. Convert the sequential program to an OpenMP parallel program.
   B. Convert the sequential program to a GPU program using a parallel framework of your choice (e.g., CUDA, OpenCL, etc.)
4. (10 points) Benchmark the three programs based on your choice in 3).

Problem                     | 1 process / 1 thread | 2 processes / 2 threads | 3 processes / 3 threads | 4 processes / 4 threads
----------------------------+----------------------+-------------------------+-------------------------+------------------------
Program1.c (sequential)     |                      | n/a                     | n/a                     | n/a
Program2.c (MPI)            |                      |                         |                         |
Program3.c (OpenMP or CUDA) |                      |                         |                         |