FYS4150 - H20 Project 1 - Solved

Abstract
In this assignment we investigate and compare three different algorithms for a numerical solution to Poisson's equation with Dirichlet boundary conditions on the domain x ∈ (0, 1). The efficiency of the three algorithms is compared with regard to speed and memory. We will see that the efficiency of a purely library-based general algorithm for solving a system of linear equations is greatly surpassed when we take into account the tridiagonal character of the coefficient matrix, and that the efficiency is further increased when we take into account the fixed pattern of the diagonals in the equation. We will also see that the relative error of the numerical solution has a simple analytical representation which only depends on the step size h.

I have implemented the algorithms in C++ using the Armadillo library. See project1.cpp, the Makefile and a README.md at my repository https://github.com/fonstelien/FYS4150/tree/master/project1.

A: Establishing the system of linear equations
Poisson's equation is a second-order differential equation of the form −u″(x) = f(x), and the assignment already gives us the form of the numerical approximation v_i ≈ u_i to this problem on the uniform grid x_i = ih with step size h = 1/(n + 1):

    2v_i − v_{i−1} − v_{i+1} = h²f_i,    for i = 1, 2, ..., n.    (1)

With boundary values u_0 = u(0) = α and u_{n+1} = u(1) = β, we get the equations

    −α + 2v_1 − v_2 = h²f_1
    −v_1 + 2v_2 − v_3 = h²f_2
    ...
    −v_{n−1} + 2v_n − β = h²f_n.

Now, with α = β = 0 we can represent this as the system of linear equations Av = b̃, where A ∈ R^(n×n) is tridiagonal with diagonal weights −1, 2, −1, v = (v_1, v_2, ..., v_n)ᵀ, and b̃ = (h²f_1, h²f_2, ..., h²f_n)ᵀ:

    [  2  -1   0  ...   0 ] [ v_1 ]   [ h²f_1 ]
    [ -1   2  -1  ...   0 ] [ v_2 ]   [ h²f_2 ]
    [  0  -1   2  ...   0 ] [ v_3 ] = [ h²f_3 ]
    [  :           .   -1 ] [  :  ]   [   :   ]
    [  0  ...  0  -1    2 ] [ v_n ]   [ h²f_n ]

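The right-hand side of this system is easy to set up in code. As a minimal sketch (not part of the project's Armadillo implementation; the helper name build_rhs and the use of std::vector are my own), b̃_i = h²f(x_i) on the interior grid x_i = ih, h = 1/(n + 1), can be built as:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Builds b_tilde_i = h^2 f(x_i) for the n interior grid points x_i = i*h,
// with h = 1/(n+1) and boundary values alpha = beta = 0.
std::vector<double> build_rhs(int n, const std::function<double(double)> &f) {
  double h = 1.0/(n + 1);
  std::vector<double> b_tilde(n);
  for (int i = 1; i <= n; i++)
    b_tilde[i-1] = h*h*f(i*h);
  return b_tilde;
}
```

With n = 3 and f ≡ 1 this gives h = 1/4 and b̃_i = 1/16 for every interior point.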
B: General numerical solution to a tridiagonal system of linear equations
The first programming task is to find an algorithm for a solution to the general tridiagonal problem Av = b̃, where a, b, and c denote the lower, central, and upper diagonals of A:

    [ b_1  c_1   0   ...     0   |  b̃_1 ]
    [ a_2  b_2  c_2  ...     0   |  b̃_2 ]
    [  0   a_3  b_3  ...     :   |  b̃_3 ]
    [  :             .   c_{n-1} |   :  ]
    [  0   ...   0   a_n   b_n   |  b̃_n ]

To solve this, let's first subtract a_2/b_1 times the first row from the second row in A to get rid of the first non-zero coefficient:

    [ b_1        c_1            0   ...  |  b̃_1                 ]
    [  0   b_2 − (a_2/b_1)c_1  c_2  ...  |  b̃_2 − (a_2/b_1)b̃_1 ]
    [  :                        .        |   :                  ]

We will denote these new values in the central diagonal and constant vector b*_i and b̃*_i, with the general terms

    b*_i = b_i − (a_i/b*_{i−1})c_{i−1},    b̃*_i = b̃_i − (a_i/b*_{i−1})b̃*_{i−1},    for i = 2, 3, ..., n,

where b*_1 = b_1 and b̃*_1 = b̃_1.

Forward substitution of the rest of the rows will then give us v_n:

    [ b_1  c_1    0   ...      0     |  b̃_1  ]
    [  0   b*_2  c_2  ...      0     |  b̃*_2 ]
    [  0    0   b*_3  ...      :     |  b̃*_3 ]
    [  :              .     c_{n-1}  |   :   ]
    [  0   ...   0    0      b*_n    |  b̃*_n ]

⇒ v_n = b̃*_n/b*_n = b̃**_n.

Now that we have found v_n, finding the rest is easy. First we find v_{n−1} = (b̃*_{n−1} − c_{n−1}v_n)/b*_{n−1}, and then the remaining v_i in the same manner:

    v_i = (b̃*_i − c_i·v_{i+1})/b*_i = b̃**_i,    for i = n−1, n−2, ..., 1.

Counting the number of FLOPs needed to solve the tridiagonal Av = b̃ problem, we see that updating each term in the forward substitution requires one division, one multiplication, and one subtraction, and that we must update two terms (b*_i and b̃*_i) in each of the n − 1 rows; that is, N_fwd = 2 × 3 × (n − 1) = 6(n − 1). Solving for v_n requires one operation, and backward substitution for the remaining n − 1 unknowns requires 3 operations each, giving N_bwd = 1 + 3 × (n − 1) ≈ 3(n − 1) operations. All in all, the required number of FLOPs for a system of size n is

    N_tridiag = N_fwd + N_bwd = 9(n − 1).

We should note that the above algorithm allows us to reuse the space which originally contained the description of the problem, that is, A and b̃; and also that we have found an algorithm which solves a set of n equations in O(n) time, so it scales well. My C++ implementation:

 

/* General tridiagonal system solver */
colvec solve_tridiag(rowvec a, rowvec b, rowvec c, colvec b_tilde) {
  int n = b_tilde.n_rows;
  colvec solution = b_tilde;
  rowvec b_mod = b;

  // Forward substitution
  for (int i = 1; i < n; i++) {
    double fac = a[i]/b_mod[i-1];
    b_mod[i] -= fac*c[i-1];
    solution[i] -= fac*solution[i-1];
  }

  // Solving for the last row
  solution[n-1] /= b_mod[n-1];

  // Backward substitution gives solution
  for (int i = n-2; i > -1; i--)
    solution[i] = (solution[i] - c[i]*solution[i+1])/b_mod[i];

  return solution;
}

 

Let's move on and compare how the numerical approximation given in Equation 1 performs relative to the closed-form solution. The assignment gives us the source function f and the closed-form solution u of the differential equation:

    f(x) = 100e^(−10x),    u(x) = 1 − (1 − e^(−10))x − e^(−10x).

We insert f_i = f(x_i) = f(ih) into our numerical approximation for various step sizes h and observe from the plotted results in Figure 1 below that the approximation v_i approaches u_i for decreasing h, as expected.

 

Figure 1: Plot of the numerical solution to the Poisson equation vs the closed-form solution. When n > 100 we see that the numerical solution is very close to the closed-form.
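The convergence seen in Figure 1 can also be checked without plotting. The sketch below is my own (std::vector rather than the Armadillo types of the actual implementation): it specializes the Thomas algorithm from Section B to a_i = c_i = −1, b_i = 2 and returns the largest absolute deviation from the closed-form u:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Solves the -1,2,-1 system for f(x) = 100 e^(-10x) with the tridiagonal
// algorithm from Section B, specialized to a_i = c_i = -1, and returns the
// largest absolute deviation from the closed-form solution u(x).
double max_abs_error(int n) {
  double h = 1.0/(n + 1);
  std::vector<double> b(n, 2.0), v(n);
  for (int i = 0; i < n; i++)
    v[i] = h*h*100.0*std::exp(-10.0*(i + 1)*h);   // b~_i = h^2 f(x_i)

  // Forward substitution with a_i = c_i = -1
  for (int i = 1; i < n; i++) {
    b[i] = 2.0 - 1.0/b[i-1];     // b*_i = b_i - (a_i/b*_{i-1}) c_{i-1}
    v[i] += v[i-1]/b[i-1];       // b~*_i = b~_i - (a_i/b*_{i-1}) b~*_{i-1}
  }

  // Solve the last row, then backward substitution
  v[n-1] /= b[n-1];
  for (int i = n - 2; i >= 0; i--)
    v[i] = (v[i] + v[i+1])/b[i];

  // Compare with u(x) = 1 - (1 - e^(-10))x - e^(-10x)
  double max_err = 0.0;
  for (int i = 0; i < n; i++) {
    double x = (i + 1)*h;
    double u = 1.0 - (1.0 - std::exp(-10.0))*x - std::exp(-10.0*x);
    max_err = std::max(max_err, std::abs(v[i] - u));
  }
  return max_err;
}
```

The error shrinks roughly as h², consistent with the figure.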

C: Optimized numerical solution to the Poisson equation
Let's now take into consideration the fixed form of the diagonals of the coefficient matrix

        [  2  -1   0  ...   0 ]
        [ -1   2  -1  ...   0 ]
    A = [  0  -1   2  ...   0 ] .
        [  :           .   -1 ]
        [  0  ...  0  -1    2 ]

In contrast to the general case solved in Section B, this time we know everything there is to know about A, and the solution will only depend on the constant vector b̃. As soon as we have found out which steps are needed to transform A into I, we can concentrate on operations on b̃ alone, which will yield an algorithm that is optimized both with regard to speed and memory. To find the solution, we first multiply row 2 by 2 and add row 1:

 2

  0





 0

 ...
−1

3

−1

...
0

−2

2

...
··· ...

−1

...
 0

...

...

...
     ˜b1            

2˜b2 +˜b1  .

     ˜b3       

      ...         
We now see that to get rid of a3 = −1 in the next row, we must this time multiply row 3 with 3 and add row 2, which will mean that in row 4 we must multiply with 4, and so on until we get [A|b˜] on the wanted form, and we can solve for vn:

 2

 0



 ...















 ...





0
−1

3

...

···
0

−2

4
··· ...

−3

...
... ...
n

0
 0

...

...

−(n − 1) n + 1
˜b1

 

˜b∗3

...

˜∗










  ⇒ vn = ˜b∗n/(n + 1).












 
Here we have denoted the new values in the constant vector b˜ as ,  , with the general term:

 

We now find , and again on general form

 , which gives us the solution v = b˜∗∗.

The resulting C++ implementation of this algorithm closely resembles solve_tridiag() from Section B, but it is a bit simpler, since less work is needed per row in the optimized Poisson algorithm, as seen in solve_poisson() below:

 

/* Optimized Poisson system solver */
colvec solve_poisson(colvec b_tilde) {
  int n = b_tilde.n_rows;
  colvec solution = b_tilde;

  // Forward substitution
  for (int i = 1; i < n; i++)
    solution[i] = (i+1)*solution[i] + solution[i-1];

  // Solving for the last row
  solution[n-1] /= (n+1);

  // Backward substitution gives solution
  for (int i = n-2; i > -1; i--)
    solution[i] = (solution[i] + (i+1)*solution[i+1])/(i+2);

  return solution;
}

 

As mentioned above, this algorithm is optimized both with regard to speed and memory. The memory required to solve a given problem is limited to the memory needed to describe it. Counting the FLOPs, we see that since we know how the forward substitution will unfold, each of the n − 1 rows requires one multiplication and one addition, so N_fwd = 2(n − 1). Likewise, for backward substitution we only need one multiplication, one addition, and one division per row, so N_bwd = 3(n − 1). In total, the number of FLOPs required for a system of size n is

NPoisson = Nfwd + Nbwd = 5(n − 1).

Compared to the general case discussed in Section B, this of course is quite good, but the real benefit comes from the greatly decreased memory use. Numerical problems such as these are usually memory-bound, not CPU-bound. The general tridiagonal system required three vectors a, b, c for the coefficients and a fourth, b̃, for the constants, all of which must be streamed from main memory, or at least Level 3 cache, for a system of some size. The optimized Poisson system, however, only needs the constant vector b̃, and therefore requires far fewer reads and writes for solving the system. This is the reason for the approximately 2.5-3 times speedup that we see in Table 1 below.

Table 1: CPU time needed to solve the Poisson equation. The algorithm optimized for the Poisson equation has better speedup than expected from counting the FLOPs. The reason is fewer read/write operations. Results are averaged over 5 runs.

    n        General time [ms]   Optimized time [ms]   Speedup factor
    100      0.103               0.042                 2.452381
    1000     0.659               0.277                 2.379061
    10000    10.179              3.385                 3.007090
    100000   55.765              20.122                2.771345
    1000000  459.470             189.190               2.428617
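The single-array access pattern behind these timings can be made explicit. Below is a plain-C++ sketch of the same recurrences as solve_poisson() (std::vector in place of colvec is my own substitution; otherwise the loops are line for line the same), where the one vector b̃ is overwritten in place:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// The recurrences of solve_poisson(), operating in place on a single vector:
// forward sweep b~*_i = i*b~_i + b~*_{i-1}, then backward sweep
// v_i = (b~*_i + i*v_{i+1})/(i + 1) (1-indexed; the loops below are 0-indexed).
std::vector<double> poisson_optimized(std::vector<double> v) {
  int n = v.size();
  for (int i = 1; i < n; i++)            // forward substitution
    v[i] = (i + 1)*v[i] + v[i-1];
  v[n-1] /= (n + 1);                     // v_n = b~*_n/(n + 1)
  for (int i = n - 2; i >= 0; i--)       // backward substitution
    v[i] = (v[i] + (i + 1)*v[i+1])/(i + 2);
  return v;
}
```

For n = 2, with A = [[2, −1], [−1, 2]] and b̃ = (1, 1)ᵀ, the exact solution is v = (1, 1)ᵀ, which the recurrences reproduce.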
D: Relative error of the numerical solution to the Poisson equation
Given the closed-form solution u(x) to the Poisson equation −u″(x) = f(x), we can calculate the relative error of our approximation v_i to u_i:

    ε_i = (v_i − u_i)/u_i.

If we calculate this value for a sample of i, we will quickly notice that for a given step size h, ε_i seems to be constant. To investigate this further, let's assume for a moment that it is indeed constant, such that ε_i = ε, and

    v_i = (1 + ε)u_i.

If we insert this for v_{i−1}, v_i, v_{i+1} in the numerical approximation 2v_i − v_{i−1} − v_{i+1} = h²f_i, we get that the relative error is given as

    ε = h²f_i/(2u_i − u_{i−1} − u_{i+1}) − 1.

Now, with f_i = f(ih) = 100e^(−10ih) and u_i = u(ih) = 1 − (1 − e^(−10))ih − e^(−10ih), the linear terms of u cancel in the difference, leaving 2u_i − u_{i−1} − u_{i+1} = 2e^(−10ih)(cosh(10h) − 1), and we get an expression for ε that is independent of i, meaning that the relative error is indeed constant for any given step size h:

    ε = 50h²/(cosh(10h) − 1) − 1.    (2)
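Equation 2 is cheap to evaluate directly. A small sketch (the helper name relative_error is my own), with step size h = 1/(n + 1):

```cpp
#include <cassert>
#include <cmath>

// Evaluates the relative error from Equation 2:
// eps = 50h^2/(cosh(10h) - 1) - 1, with step size h = 1/(n + 1).
double relative_error(int n) {
  double h = 1.0/(n + 1);
  return 50.0*h*h/(std::cosh(10.0*h) - 1.0) - 1.0;
}
```

For n = 10 this gives log10|ε| ≈ −1.18, matching the first row of Table 2.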

It is also interesting to see what happens as h → 0. Since cosh 0 = 1 and sinh 0 = 0, applying L'Hôpital's rule twice shows that the limit of the relative error is, luckily, zero:

    lim_{h→0} ε = lim_{h→0} [50h²/(cosh(10h) − 1)] − 1 = lim_{h→0} [100h/(10 sinh(10h))] − 1 = lim_{h→0} [100/(100 cosh(10h))] − 1 = 1 − 1 = 0.

From Table 2 we see that this mathematical error fits well with the errors that we find when we calculate it from the numerical solution. The difference is due to round-off error stemming from the limited precision of a double-precision float. We also see that the computed error decreases up to about n = 10^6, before it starts to increase, meaning that we reach maximum accuracy in this region of n. The C++ implementation is given further below.
Table 2: Relative error for decreasing step size h on a log10 scale. The computed error follows the mathematical error from Equation 2 closely up until n = 10^5, after which it starts to deviate, and even increases from about n = 10^6. The mathematical error is calculated with the decimal module in Python.

    n          Computed error   Mathematical error
    10         -1.1797          -1.179698
    100        -3.0880          -3.088037
    1000       -5.0801          -5.080052
    10000      -7.0793          -7.079268
    100000     -9.0776          -9.079190
    1000000    -10.1230         -11.079182
    10000000   -9.0901          -13.080922
    100000000  -8.1294          -

/* Calculates the log10 relative error */
colvec log_eps(colvec exact, colvec approximated) {
  colvec eps;

  // Check if exact solution vector has zeros
  if (min(abs(exact)) == 0.) {
    cerr << "error: cannot calculate relative error; " \
         << "exact solution contains one or more zeros." << endl;
    return eps;
  }

  eps = abs((approximated - exact)/exact);
  return log10(eps);
}

E: Numerical solution by the LU decomposition
If det A ≠ 0, the decomposition of A into a lower-triangular matrix L and an upper-triangular matrix U, where A, L, U ∈ R^(n×n), exists, such that (illustrated here for n = 4)

         [ 1    0    0    0  ] [ u11  u12  u13  u14 ]   [ a11  a12  a13  a14 ]
    LU = [ l21  1    0    0  ] [ 0    u22  u23  u24 ] = [ a21  a22  a23  a24 ] = A.    (3)
         [ l31  l32  1    0  ] [ 0    0    u33  u34 ]   [ a31  a32  a33  a34 ]
         [ l41  l42  l43  1  ] [ 0    0    0    u44 ]   [ a41  a42  a43  a44 ]

A tridiagonal matrix with diagonals −1, 2, −1 is obviously invertible, and hence det A ≠ 0. We will not discuss how to decompose A into LU, since that is outside the scope of the assignment, but concentrate on the computational complexity of solving the set of equations which emerges from Equation 3. In general we have that Av = b̃, into which we insert A = LU, such that

    Av = LUv = L(Uv) = b̃    ⇒    Ly = b̃  and  Uv = y.

We begin by solving Ly = b̃. Let's first get rid of l21 in row 2, which requires one multiplication and one subtraction, so 2 FLOPs:

 1

 0



 l31 l41
0

1

l32 l42
0

0

1

l43
 0 0 0 1
      ˜b1         

˜b2 − l21˜b1 

      ˜b3     

˜b4
Next, we get rid of l31 and l32 in row 3, which requires two multiplications and two subtractions, or 4 FLOPs:

    [ 1    0    0    0 |  b̃*_1 ]
    [ 0    1    0    0 |  b̃*_2 ]
    [ 0    0    1    0 |  b̃*_3 ]
    [ l41  l42  l43  1 |  b̃_4  ]

where b̃*_1 = b̃_1, b̃*_2 = b̃_2 − l21·b̃*_1, and b̃*_3 = b̃_3 − l31·b̃*_1 − l32·b̃*_2.
The next row requires 6 FLOPs, and so on. For a problem of size n, we need N_L = 2(1 + 2 + ··· + (n − 1)) = n(n − 1) FLOPs for the "L" operation. Then we need to solve the next set of equations, which we do in much the same manner, except that each row requires an extra division operation:

    [ u11  u12  u13  u14 |  b̃*_1 ]
    [ 0    u22  u23  u24 |  b̃*_2 ]
    [ 0    0    u33  u34 |  b̃*_3 ]
    [ 0    0    0    u44 |  b̃*_4 ]

Starting from the last row, v_4 = b̃*_4/u44 = b̃**_4 costs one division; row 3 then gives v_3 = (b̃*_3 − u34·b̃**_4)/u33, which costs one multiplication, one subtraction, and one division; and so on upward. Continuing in the same fashion, we see that for the "U" operation we need N_U = Σ_{k=1}^{n} (2(k − 1) + 1) = n² FLOPs. Adding it all together, we

get that the number of FLOPs needed for solving a set of linear equations of size n with a known LU decomposition of A is

    N_LU = N_L + N_U = n(n − 1) + n² = n(2n − 1).
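The two triangular solves counted above can be sketched in plain C++ (the dense row-major std::vector layout and the helper name solve_lu are my own; the actual implementation below uses Armadillo):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Forward substitution Ly = b_tilde followed by backward substitution Uv = y.
// L is unit lower triangular, U upper triangular, both dense n x n matrices.
std::vector<double> solve_lu(const std::vector<std::vector<double>> &L,
                             const std::vector<std::vector<double>> &U,
                             std::vector<double> b) {
  int n = b.size();
  // "L" operation: row i needs 2i FLOPs
  for (int i = 0; i < n; i++)
    for (int j = 0; j < i; j++)
      b[i] -= L[i][j]*b[j];
  // "U" operation: row i needs 2(n - 1 - i) FLOPs plus one division
  for (int i = n - 1; i >= 0; i--) {
    for (int j = i + 1; j < n; j++)
      b[i] -= U[i][j]*b[j];
    b[i] /= U[i][i];
  }
  return b;
}
```

For the 2×2 factorization A = [[2, −1], [−1, 2]] = LU with L = [[1, 0], [−1/2, 1]] and U = [[2, −1], [0, 3/2]], solving with b̃ = (1, 1)ᵀ returns v = (1, 1)ᵀ.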

The C++ implementation of the algorithm with Armadillo is very simple, but to ensure that the library takes advantage of the triangular shape of the matrices, we need to pass this information to arma::solve() by wrapping the matrices in arma::trimatl() and arma::trimatu():

 

/* Solver for LU decomposed system without permutation matrix */
colvec solve_LU(mat L, mat U, colvec b_tilde) {
  colvec y;
  y = solve(trimatl(L), b_tilde);
  return solve(trimatu(U), y);
}

 

Here we should note that a problem of size n can be solved in O(n²) time by LU decomposition, which is a factor n better than straightforward row reduction, which runs in O(n³) time. However, as Table 3 shows, compared to the optimized Poisson algorithm discussed in Section C, the LU method is still slow. The description of the problem also requires O(n²) space, in contrast to O(n) for the specialized algorithm, and for problems of size n > 10^4 or maybe 10^5, the problem cannot be solved at all on a regular computer with the LU method due to memory restrictions.

Table 3: CPU time required for the solution by LU matrix decomposition algorithm relative to the optimized Poisson algorithm. Results are averaged over 5 runs.

    n      LU time [ms]   Poisson time [ms]   Speedup factor
    100    4.312          0.042               102.666667
    1000   219.650        0.277               792.960289
    10000  1649.400       3.385               487.267356
Discussion and conclusion
This exercise illustrates very well that great improvements, both in speed and in memory usage, can be achieved with optimized algorithms. We also see that if the problem at hand is large, a straightforward solution with standardized library functions may not even be possible due to resource limitations.
