Task decomposition is an important phase in constructing a parallel algorithm. It divides the computation into smaller parts that can be executed concurrently. Different decompositions expose different degrees of parallelism. There are two main types of task decomposition:
1. Fine-grained decomposition: large number of small tasks.
2. Coarse-grained decomposition: small number of large tasks.
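To make the two granularities concrete for an m × n result matrix, here is a minimal sketch in C. It assumes that a fine-grained task computes a single element of the result and that a coarse-grained task computes a contiguous block of rows; both choices, and the names fine_grained_task_count and row_block_range, are illustrative assumptions rather than part of the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Fine-grained (assumed here): one task per element of the m x n result. */
size_t fine_grained_task_count(size_t m, size_t n) {
    return m * n;
}

/* Coarse-grained (assumed here): split the m rows into num_tasks
   contiguous blocks. Task `task` (0-based) owns rows [*first, *last).
   The first (m % num_tasks) tasks receive one extra row each, so the
   block sizes differ by at most one. */
void row_block_range(size_t m, size_t num_tasks, size_t task,
                     size_t *first, size_t *last) {
    assert(num_tasks > 0 && task < num_tasks);
    size_t base = m / num_tasks;   /* rows every task gets */
    size_t extra = m % num_tasks;  /* leftover rows to spread out */
    *first = task * base + (task < extra ? task : extra);
    *last = *first + base + (task < extra ? 1 : 0);
}
```

With m = 10 rows and 4 tasks, this sketch assigns rows 0-2, 3-5, 6-7 and 8-9, whereas the fine-grained variant on a 10 × 10 result would create 100 tasks.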
Problem definition
Given two matrices A and B, let C be the result of their multiplication, where:
• Matrix A (m × r) has m rows and r columns, and each of its elements is denoted aij with 1 ≤ i ≤ m and 1 ≤ j ≤ r.
• Matrix B (r × n) has r rows and n columns, and each of its elements is denoted bij with 1 ≤ i ≤ r and 1 ≤ j ≤ n.
• Matrix C = A × B, the product of A and B, has m rows and n columns; each of its elements is denoted cij with 1 ≤ i ≤ m and 1 ≤ j ≤ n, and is calculated as follows:
cij = ai1 b1j + ai2 b2j + … + air brj = Σ (k = 1 to r) aik bkj
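The standard definition, in which each cij is the sum over k of aik bkj, can be sketched in C as a triple loop. This is a minimal illustration, not a reference solution: it assumes the matrices are stored row-major in flat arrays of double and uses 0-based indices, and the function name matmul_serial is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C (m x n) = A (m x r) * B (r x n).
   Assumption: all three matrices are stored row-major in flat arrays,
   so element (i, j) of an x-by-y matrix lives at index i * y + j.
   Indices are 0-based here, whereas the text uses 1-based indices. */
void matmul_serial(size_t m, size_t r, size_t n,
                   const double *A, const double *B, double *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < m; ++i) {          /* row of A and C */
        for (size_t j = 0; j < n; ++j) {      /* column of B and C */
            double sum = 0.0;
            for (size_t k = 0; k < r; ++k) {  /* inner dimension */
                sum += A[i * r + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

Each iteration of the two outer loops writes a distinct element of C and reads A and B without modifying them, which is why the element computations are independent of one another.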
Project Tasks
1- Design a serial program for the matrix multiplication problem.
2- Apply the steps of Foster's methodology to convert the serial algorithm into a parallel one, showing the task communication graph.
3- Construct a parallel algorithm (steps or pseudocode) for matrix multiplication.
4- Implement the parallel algorithm with MPI API.
5- Implement the parallel algorithm with OpenMP API.
6- Evaluate the performance of each implementation in 4 and 5 using the following measures:
• Speedup
• Efficiency
• Scalability
7- Document test results for your programs (in 4 and 5) as follows:
i. Table of run times of the serial program for different matrix dimensions (up to 1024 x 1024 increasing with steps of 100).
ii. Table of run times of the parallel program using different numbers of processes/threads (from 1 to 100) for different matrix dimensions (up to 1024 x 1024 increasing with steps of 100).
iii. Table of speedups using different numbers of processes/threads (from 1 to 100) for different matrix dimensions (up to 1024 x 1024 increasing with steps of 100).
iv. Table of efficiencies using different numbers of processes/threads (from 1 to 100) for different matrix dimensions (up to 1024 x 1024 increasing with steps of 100).
v. Analysis of scalability (weak or strong).
8- Graph a comparison of the performance of the two solutions (MPI and OpenMP) and state your conclusions.
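The speedup and efficiency measures named in task 6 are commonly defined as S = T_serial / T_parallel and E = S / p, where p is the number of processes or threads. The sketch below assumes those textbook definitions and that both run times are measured in the same unit; the function names are hypothetical.

```c
#include <assert.h>

/* Speedup S = T_serial / T_parallel (both times in the same unit). */
double speedup(double t_serial, double t_parallel) {
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency E = S / p, where p is the number of processes or threads.
   E = 1.0 corresponds to ideal (linear) speedup. */
double efficiency(double t_serial, double t_parallel, int p) {
    assert(p > 0);
    return speedup(t_serial, t_parallel) / (double)p;
}
```

For example, a serial run of 8 seconds and a parallel run of 2 seconds on 4 threads give a speedup of 4 and an efficiency of 1; the same timings on 8 threads give an efficiency of 0.5.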