CSE6220 Assignment 3 Solution

Jacobi's Method
Jacobi's method is an iterative, numerical method for solving a system of linear equations. Formally, given a full rank n-by-n matrix A and an n-dimensional vector b, Jacobi's method iteratively approximates the solution x for:

A x = b

Given the decomposition of the matrix A = D + R, where D contains only the diagonal elements of A and R contains all remaining (off-diagonal) elements, Jacobi's method then follows these steps until convergence or termination:
1. x ← [0, ..., 0]
2. x ← D^(-1) (b − R x)
3. if ||A x − b|| > ε, go to step 2,
where ||·|| is the L2-norm, and ε is a parameter for the termination accuracy.
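The steps above can be sketched sequentially as follows (a minimal sketch; the function and variable names are illustrative and do not match the framework's declarations in jacobi.h):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sequential Jacobi iteration: x <- D^-1 (b - R x) until ||Ax - b|| <= eps
// or max_iter is reached. A is a dense n-by-n matrix, b an n-vector.
std::vector<double> jacobi(const std::vector<std::vector<double>>& A,
                           const std::vector<double>& b,
                           double eps = 1e-9, int max_iter = 1000000) {
    const std::size_t n = b.size();
    std::vector<double> x(n, 0.0);            // step 1: x <- [0, ..., 0]
    for (int iter = 0; iter < max_iter; ++iter) {
        // step 2: x <- D^-1 (b - R x), computed element-wise
        std::vector<double> x_new(n);
        for (std::size_t i = 0; i < n; ++i) {
            double rx = 0.0;                  // (R x)_i: off-diagonal part
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) rx += A[i][j] * x[j];
            x_new[i] = (b[i] - rx) / A[i][i];
        }
        x = x_new;
        // step 3: terminate once ||A x - b|| <= eps (L2 norm)
        double sq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double ax = 0.0;
            for (std::size_t j = 0; j < n; ++j) ax += A[i][j] * x[j];
            sq += (ax - b[i]) * (ax - b[i]);
        }
        if (std::sqrt(sq) <= eps) break;
    }
    return x;
}
```

For a diagonally dominant system such as A = [[4, 1], [1, 3]], b = [1, 2], the iteration converges to x = (1/11, 7/11).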
A matrix A is diagonally dominant if each of its diagonal elements is larger in absolute value than the sum of the absolute values of the remaining elements in the corresponding row, i.e., iff for all i:

|a_ii| > Σ_{j ≠ i} |a_ij|
It can be shown that for diagonally dominant matrices, Jacobi's method is guaranteed to converge. The goal of this assignment is to develop a parallel algorithm for Jacobi's method.
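A quick way to sanity-check generated inputs is a direct test of the dominance condition above (an illustrative helper, not part of the framework):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Checks strict (row-wise) diagonal dominance:
// |a_ii| > sum over j != i of |a_ij|, for every row i.
bool is_diagonally_dominant(const std::vector<std::vector<double>>& A) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        double off = 0.0;
        for (std::size_t j = 0; j < A[i].size(); ++j)
            if (j != i) off += std::fabs(A[i][j]);
        if (std::fabs(A[i][i]) <= off) return false;
    }
    return true;
}
```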
Parallel Algorithm
Data distribution
We use a 2-dimensional grid/mesh as the communication network for this problem. We assume that the number of processors p is a perfect square, p = q^2, arranged into a mesh of size q × q.
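If the q × q grid is laid out in row-major order (the default of MPI_Cart_create), a linear rank maps to grid coordinates with simple integer arithmetic (a sketch under that row-major assumption):

```cpp
#include <cassert>

// 2-D grid coordinates of a processor: row index i, column index j.
struct GridCoord { int row, col; };

// Maps a linear MPI rank to grid coordinates on a q-by-q mesh,
// assuming row-major ordering of the ranks.
GridCoord rank_to_coord(int rank, int q) {
    return {rank / q, rank % q};
}
```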
The inputs are distributed on the grid as follows:
The n-by-n matrix A is block distributed onto the grid. The rows of A are distributed onto the q rows of the processor grid such that the first n mod q processor rows receive ⌈n/q⌉ rows of A each and the remaining processor rows receive ⌊n/q⌋ rows each; the columns of A are distributed onto the q processor columns in the same fashion. The n-dimensional vectors (b and x) are block distributed among the processors of the first column of the grid. A processor with grid coordinates (i, j) will thus have a local block of size at most ⌈n/q⌉ × ⌈n/q⌉, with the exact number of local rows and columns determined by the distribution above.
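Assuming the common block-distribution convention in which the first n mod q parts receive one extra element, the local sizes and global offsets reduce to integer arithmetic (illustrative helpers; the framework may name these differently):

```cpp
#include <cassert>

// Number of rows (or columns) owned by grid row (or column) i when n items
// are block-distributed over q parts: the first n % q parts get ceil(n/q)
// items each, the remaining parts get floor(n/q).
int block_size(int n, int q, int i) {
    return n / q + (i < n % q ? 1 : 0);
}

// Global index of the first item owned by part i.
int block_offset(int n, int q, int i) {
    int f = n / q, r = n % q;
    return i * f + (i < r ? i : r);
}
```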
Parallel Matrix-Vector Multiplication
Let A be an n-by-n matrix and let x and y be n-dimensional vectors. We wish to compute y = A x. Assume that the square matrix A and the vector x are distributed on the processor grid as explained above. The answer y will again be distributed in the same fashion as the vector x. To do so, we will first "transpose" the vector x on the grid, such that a processor with index (j, j) will end up with the elements that were on processor (j, 0) according to the above distribution. We do this by first sending the elements from a processor (i, 0) to its corresponding diagonal processor (i, i), using a single MPI_Send and the sub-communicator for the row of processors. We then broadcast the received elements from (i, i) along each column of the grid using a sub-communicator for the column of processors.
Now, each processor has the elements of x it needs for its local matrix-vector multiplication. We multiply the local vector with the local matrix and then use a parallel reduction to sum up the resulting partial vectors along the rows of the grid, back onto the processors of the first column. The final result y of the multiplication thus ends up distributed among the first column in the same way that the input x was.
Parallel Jacobi
For parallelizing Jacobi's method, we distribute A and b as described above and use parallel matrix-vector multiplication for calculating R x. The vectors x, b, and thus also R x, are distributed among the first column of the processor grid. The diagonal elements of A need to be collected along the first column of the processor grid. We can thus update x by using only local operations.
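On the first column, the update itself is then purely element-wise (illustrative names; here Rx_local is assumed to be the locally held part of the parallel multiplication's result, and diag_local the collected local diagonal):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Local Jacobi update x <- D^-1 (b - R x) on a first-column processor,
// given same-length local slices of b, diag(A), and R*x. No communication
// is needed for this step.
void local_jacobi_update(std::vector<double>& x_local,
                         const std::vector<double>& b_local,
                         const std::vector<double>& diag_local,
                         const std::vector<double>& Rx_local) {
    for (std::size_t i = 0; i < x_local.size(); ++i)
        x_local[i] = (b_local[i] - Rx_local[i]) / diag_local[i];
}
```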
In order to detect termination, we calculate A x using parallel matrix-vector multiplication, and then subtract b locally on the first column of processors (where both the result of the multiplication and b are located). Next, we calculate the L2-norm ||A x − b|| by first calculating the sum of squares locally and then performing a parallel reduction along the first column of processors. We can now determine whether to terminate by checking the L2-norm against the termination accuracy ε and then broadcasting the result along the rows. Every processor then knows whether or not to continue with further iterations. In your implementation, you should not only check the termination criterion, but also terminate after a pre-set maximum number of iterations.
Task
In this assignment, you will implement Jacobi's method using MPI. A framework is provided with this programming assignment, which declares all functions that you should implement.
Code Framework
You are required to implement your solution within the provided framework in the attached zip file. In the event that we have to update the framework, we will publish the updates and notify the class via Canvas. The framework comes with some pre-implemented unit tests. You are encouraged to add more test cases to make sure that your implementation works as expected.
First, you should implement matrix-vector multiplication and Jacobi's method sequentially. You will later use this as the comparison for measuring speedup and as reference implementation for testing your parallel code.
Implement the sequential code in the file jacobi.cpp according to the function declarations in jacobi.h. Unit tests for the sequential code are implemented in seq_tests.cpp. Add your own test cases in this file. To compile and run the tests, run make test either locally on your own machine, or in an interactive job. Do NOT run the tests on the login node of the cluster.
Finally, test the performance of your parallel implementation against your sequential implementation for various input sizes. Use the executable jacobi for this and generate the input on-the-fly via the -n and -d (difficulty) parameters. Increasing the difficulty increases the number of iterations needed for Jacobi's method to converge. Report your achieved speedups for different input sizes and difficulties.
Guidelines/Rules
1. Your code should be written in the C or C++ language.
3. The expected output format (taken care of by the code framework) must be strictly adhered to, since grading is automated; any additional output (e.g., debug print statements you have forgotten to remove) will prevent you from getting full points.
4. We provide generate_input.py and check_output.py for you to test your code. We have also introduced unit testing for both your serial and parallel implementations. You are welcome to add your own unit tests to these files. Refer to the README.md for how to build and execute the unit tests.
5. Your solution should execute and give correct results for very large matrices, using as many nodes as allowed on the cluster. Your largest input should be at least as large as the size required by the assignment.
Team
You can form a team of up to three members. No matter how you decide to share the work among yourselves, each student is expected to have full knowledge of the submitted program. Only one member from each team should submit the required deliverables on Gradescope and use "Add Group Member" to add the other team members.
Grading [20pt]
The highest score you can get for this assignment is 20pt.
Code and report should be submitted separately, and each is worth 10 points.
Your submission will be graded based on:
Correct submission format (including comments in the code)
Successful compilation of the submitted code on the PACE-ICE cluster
Efficient parallel implementation
Successful execution on the PACE-ICE cluster and correctness of the final results
A submission made by a team will receive the same grade for all members of the team.
Submission Policies
The submission must be a zip file named assignment_3.zip for PA3: Code and a PDF file for PA3: Report on Gradescope.
The zip file should contain the following contents:
team.txt: A text file containing the names of all team members.
[Source files]: You should use good programming style and comment your program so that it is easy to read and understand. Make sure that your implementation does not rely on any extra files or functions implemented outside of the provided source files. (Please exclude the gtest directory from your submission)
The report should contain the following contents:
The names of all team members.
Short design description of your algorithm(s)
Runtime and speedup plots by varying problem size and difficulty, when using different numbers of processors. Give observations on how your algorithm behaves on varying the parameters. Show the results of your experiments, and the conclusions along with the evidence that led you to reach the conclusions.
Explain any ideas or optimizations of your own that you might have applied and how they might have affected the performance.
The program will be graded on correctness and use of the appropriate parallel programming features.
Your archive submission of code MUST look like the following:

assignment_3.zip
│───check_output.py
│───generate_input.py
│───io.h
│───jacobi.cpp
│───jacobi.h
│───main.cpp
│───Makefile
│───mpi_gtest.cpp
│───mpi_jacobi.cpp
│───mpi_jacobi.h
│───mpi_tests.cpp
│───README.md
│───seq_tests.cpp
│───utils.cpp
│───utils.h
│───team.txt
