Starting from:

$24.99

ENEL464 Group Assignment 2 Solution


1 Introduction
The purpose of this assignment is to implement a numerical algorithm on a computer to make the most efficient use of its caches and cores. You will find that you will get a large variation in program performance depending on how you implement your code and use compiler optimisations. A knowledge of computer architecture should help with this!
The algorithm is Jacobi relaxation. This is an iterative algorithm used to approximate differential equations, for example, Poisson’s and Laplace’s equations. Poisson’s equation can be used to find the electric potential given a specified charge distribution or temperature given a specified heat source. There are more efficient ways to solve this problem using Green’s functions and Fourier transforms but that is not the purpose of this assignment.
2 Jacobi relaxation
The discrete form of Poisson’s equation is
∇2Vi,j,k,n = fi,j,k,n, (1)
where f is the source (say, the electric charge) and V is the potential field to be determined. Poisson’s equation is a partial differential equation and thus requires boundary conditions. For the assignment Neumann boundary conditions are required. This is equivalent to enclosing the problem in an insulating box so that no current flows across the boundary.
Poisson’s equation can be solved iteratively, at each time-step n, using Jacobi relaxation, where

Here ∆ = ∆x = ∆y = ∆z is the spacing between voxels in metres, 0 ≤ i < N, 0 ≤ j < N, and 0 ≤ k < N. Voxels on the boundary (i = 0, i = N − 1, j = 0, j = N − 1, k = 0, k = N − 1) are defined such that the derivative over that boundary is zero (∂V/∂n = 0), see Fig. 1. For

x
Figure 1: Illustration of a Neumann boundary condition where the external ‘ghost’ point (red) is defined such that the gradient at the boundary is zero with respect to the internal points (blue).
example, consider the derivative in the x-direction, ∂V/∂x. This can be approximated by the central difference,
. (3)
On the boundary where i = 0, then
, (4)
and for this to be zero implies V−1,j,k = V1,j,k. Since these voltages are the same, no current flows normal to the boundary as expected for an insulating boundary. Similarly, on the other x-boundary, VN,j,k = VN−2,j,k.
3 Implementation
Your program to solve (2) must either use C or C++. Your goal is to find a fast implementation that will run on your (or an ECE lab) computer, making best use of the caches and multiple cores.
A good starting point is provided with poisson.c. You should modify this to solve the assignment. Instructions on building it are in the source code. Some suggested reading is located in the docs/ folder regarding compiler optimisations, multithreading, and profiling.
4 Testing
Test your algorithm with a single point charge in the centre of the volume, i.e.,
, (5)
where the volume is comprised of N ×N ×N voxels. Note, you algorithm must work with arbitrary source distributions.
A testing script (test.sh) has been provided along with some sample outputs at several cube sizes. This will automatically compare your output against a reference implementation.
5 Support
Only questions submitted via the ENCE464 assignment forum will be answered. Emails will be quietly ignored.
6 FAQ
• How do I find out the CPU version? Run lscpu. You can also run lshw but this needs root privileges to get the cache sizes. Note, Linux considers each thread of a multithreaded core to be a CPU.
7 Reports
Guidelines for writing a report are available at https://eng-git.canterbury.ac.nz/mph/report-guidelines/blob/master/report-guidelines. pdf.
Each report is to use a 12 point font and be no longer than five pages, including appendices. No title page please. Please use margins of at least 2cm.
8 Assessment
Your report will be marked in terms of:
• Written style. The writing should be concise technical writing.
• Presentation. The key here is consistency and clear diagrams and graphs.
• Architecture overview. This should describe your computer’s architecture (such as the cache organisation, memory size, CPU model, etc.).
• Multithreading analysis. This should discuss how your program takes advantage of multiple cores.
• Cache analysis. This should discuss how your program takes advantage of the cache.
• Profiling analysis. This should show which parts of your program take the most time.
• Optimisation analysis. This should discuss the affects of some of the compiler optimisations, such as loop unrolling, on your program.
• Overall excellence.
Each section is marked out of 5, giving a total of 40 marks. Five bonus marks will be awarded to any group who can beat our program when running on a CAE lab computer and give the correct results.
Your report should present the statistics for the time your program takes to run for the following problem sizes: N = 101,201,301,401,501,601,701,801,901. For N = 801 and N = 901, do not worry about performing many trials. If you have a really old computer, you can skip N = 901.
Your analyses should be backed up with experimental evidence, such as the output from profiling tools.
9 Code
Your fastest implementation must be an executable named poisson that is built by running make at the command line. For testing purposes, it must accept two command line flags:
• -i: the number of iterations
• -n: the edge length of the cube.
It must print out the resulting values from the middle slice of the cube in space-separated format to five decimal places. (You can just use the command line argument parsing and output formatting code from the provided poisson.c example.)

More products