$30
Graph Partitioning – Load Balancing on Parallel Architectures
New discoveries in applications often require the solution of discretized partial differential equations, nonlinear optimizations, eigenvalue computations, and management of massive data sets, such as in accelerator design, PDEconstrained optimization, or groundwater flow modeling. In such simulations, continuous infinite-dimensional mathematical models are approximated with finite-dimensional models. To obtain the required accuracy and to resolve the underlying physics, the finite-dimensional models are often large, thus requiring fast parallel computers and/or advanced algorithmic solutions. A typical application might require a sequence of numerical optimization problems to be solved. Furthermore eigenvalues and eigenvectors have to be computed, and/or nonlinear and linear systems of equations have to be solved. Porting such an application to a parallel computer would require the computational work to be equally distributed on the processors. In an adaptive simulation framework, the work distribution can also change during the computation. The work and data need to be redistributed among the compute cores to enable it to be solved quickly. Several computational tasks need to be scheduled to maximize the utilization of the processors and to reduce the idle time processors spend waiting for data or for synchronization.
A typical model in computational science and engineering is expressed using the language of continuous mathematics, such as PDEs and linear algebra, but techniques from discrete or combinatorial mathematics also play an important role in solving these models efficiently. Several discrete combinatorial problems and data structures, such as graph and hypergraph partitioning, supernodes and elimination trees, vertex and edge reordering, vertex and edge coloring, and bipartite graph matching, arise in these contexts. As an example, parallel graph partitioning tools can be used to ease the task of distributing the computational workload across the processors.
Over the past decade, parallel computers have been used with great success in many scientific simulations. While differing in their numerical methods and details of implementation, most applications successfully parallelized to date are static applications. Their data structures and memory usage do not change during the course of the computation. Their interprocessor communication patterns are predictable and nonvarying, and their processor workloads are predictable and roughly constant throughout the simulation. However, increasing use of dynamic simulation techniques is creating new challenges for developers of parallel software. For example, adaptive finite element methods refine localized regions of the mesh and adjust the order of the approximation on individual elements to obtain a desired accuracy in the numerical solution. As a result, memory must be allocated dynamically to allow creation of new elements or degrees of freedom. Communication patterns can vary as refinement creates new element neighbors. In addition, localized refinement can cause severe processor load imbalance as elemental and processor work loads change throughout a simulation. Particle simulations and N-body simulations are other examples of dynamic applications. In particle simulations, scalable parallel performance depends on a good assignment of particles to processors; grouping physically close particles within a single processor reduces interprocessor communication. Data structures and communication patterns change as particles and surfaces move. Repartitioning of the particles or surfaces is needed to maintain geometric locality of objects within processors. This scientific question can be solved with parallel graph partitioning algorithms and parallel coloring tools, e.g., based on discrete graphs and hypergraphs techniques.
High-performance graph partitioning libraries have been a very important part of scientific and engineering computing for years, and their importance continues to grow as microprocessor architectures become more complex and software libraries become better designed to integrate easily within applications. Despite the fact that there are various science
1
Figure 1: The solution of an application in computational science and engineering on a parallel computer requires scientific computing algorithms (the first row of the figure), and involves graph-theoretical research such as graph partitioning (the second row) and HPC tasks (the third row).
and engineering applications, the underlying algorithms typically have remarkable similarities, especially those algorithms that are most challenging to implement well in parallel. It is not too strong a statement to say that these graph partitioning libraries are essential to the broad success of scalable high-performance computing in computational science and engineering.
Graph Partitioning, Single-Core Optimization (100 points)
In computational science and high-performance computing, the graph partition problem is defined on data represented in the form of a graph G = (V,E) with V vertices and E edges, such that it is possible to partition G into smaller components with specific properties. For instance, a k-way partition divides the vertex set into k smaller components. In general, good solutions for the graph partitioning problem require that cuts are small and partitions have equal size. The problem arises, for instance, when assigning work to a parallel computer. In order to achieve efficiency, the workload (partition size) should be balanced evenly among the processors while communication between them (edge cut) should be kept to a minimum. Other important application domains of graph partitioning, and a detailed overview of advances in the field can be found at [?].
Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological, and biological networks. Since graph partitioning is an NP-hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well known local methods are the Kernighan–Lin algorithm, and Fiduccia–Mattheyses algorithms, which were the first effective 2-way cuts by local search strategies. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and not on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from the spectrum of the graph Laplacian matrix. In cases where the nodes of the graph are associated with a coordinate list, inertial bisection is also a frequent choice of partitioning method.
The spectral method was initially considered the standard to solve graph partitioning problems. It utilizes the spectral
0 0.1 0 0.2 0.3 0 0 0 n
W := := Xw 0 1.2 0 0 i∈A,j∈B j=1
0.2 0.7 0.8 0 0 0 0 1.7
0.3 −0.1 0 −0.2
L := D − W .
Figure 2: A weighted, undirected and connected graph G(V,E), with 4 vertices and 5 edges, with its degree D, adjacency W, and graph Laplacian L matrices.
graph theorem of linear algebra, that enables the decomposition of a real symmetric matrix into eigenvalues, within an orthonormal basis of eigenvectors. For an undirected graph G(V,E), with vertex set V = {v1,··· ,vn} and two complementary subset V1,V2, such that V1 ∪ V2 = V and V1 ∩ V2 = ∅, we consider an indicator vector x ∈ Rn such that
,
Some of the n nodes of the graph are connected via m edges, each of which has a positive weight associated with it, thus
(
> 0, if vi connected to vj, wij =
= 0 otherwise.
A bisection is computed using the eigenvector associated with the second smallest eigenvalue of the graph Laplacian matrix L ∈ Rn×n, which encodes the degree , i.e.; the number of connected edges, of each vertex along its diagonal, and the negative weighs −wij. For an undirected and connected graph G(V,E), as illustrated in Figure 2, with n vertices and m edges, the graph Laplacian can be realized in terms of the adjacency matrix W ∈ Rn×n and the degree matrix D ∈ Rn×n as follows. The graph Laplacian is a symmetric positive semidefinite matrix, with its smallest eigenvalue being λ(1) = 0, and the associated eigenvector v(1) = c1 being the constant one. The eigenvector v(2) associated with the second smallest eigenvalue λ(2) is Fiedler’s celebrated eigenvector, and is crucial for spectral graph partitioning. Each node vi of the graph is associated with one entry in v(2). Thresholding the values of v(2) around 0 results in two roughly balanced (equal sized) partitions, with minimum edgecut, while thresholding around the median value of v(2) produces two strictly balanced partitions. The procedure to compute a bisection using spectral partitioning is summarized in Algorithm 1. For a much more detailed description of the method we refer to [?].
Algorithm 1 Spectral bisection
Require: G(V,E),
Ensure: A bisection of G
1 function SPECTRALBISECTION(graph G(V,E)) 2 Form the graph Laplacian matrix L
3 Calculate the second smallest eigenvalue λ(2) and its associated eigenvector u(2).
4 Set 0 or the median of all components of u(2) as threshold.
5 Choose V1 := {vi ∈ V |ui < thres},V2 := {{vi ∈ V |ui ≥ thres}.
6 return V1,V2
7 end function
. bisection of G
Inertial bisection is a very different approach, relying on the geometric layout of the graph, i.e. geometric coordinates attached to the vertices of the graph. The main idea of the method is to find a hyperlane running though the ”center of gravity of the points.” In 2 dimensions (2D) such a line L is chosen such that the sum of squares of the distance of the nodes to the line is minimized. It is defined by a point P = (x,y) and a vector u = (u1,u2) with
such that L = {P + αu|α ∈ R} (Figure 3). We choose
, (1)
as the center of mass lies on the line L. Finally, we need to compute u in order to minimize the sum of distances
, (2)
where Sxx,Sxy,Syy are the sums as defined in the previous line. The resulting matrix M is symmetric, thus the minimum of (2) is achieved by choosing u to be the normalized eigenvector corresponding to the smallest eigenvalue of M. The procedure of bisecting a graph using inertial partitioning is summarized in Algorithm 2. We refer again to [?] and references therein for a thorough overview of the inertial method.
The last task of the inertial partitioning routine (line 4), i.e partitioning nodes associated with geometric coordinates around a direction normal to the partitioning plane, is already implemented in the toolbox provided to you in function partition.m. The eigenvalue computations, both for spectral and inertial bisection should be performed with the in-Matlab function eig. Type help eig, or help eigs in your command line for more information.
Partitioning metrics
In what follows we will use the number of cut edges between partitions to determine the quality of the partitioning result. The size of an edge separator, or edgecut, which partitions the graph into a vertex subset and its complement V1 ∪ V2 = V is defined as follows: cut(V1,V2) = X wij. (3)
i∈V1,j∈V2
Figure 3: Illustration of inertial bisection in 2D [?].
Algorithm 2 Inertial bisection.
Require: G(V,E),Pi = (xi,yi), i = 1,...,n the coordinates of the nodes
Ensure: A bisection of G
1 function INERTIALBISECTION(graph G(V,E),Pi) 2 Calculate the center of mass of the points
. acc. to (1)
3 Compute the eigenvector associated with the smallest eigenvalue of M
. M acc. to (2)
4 Partition the nodes of the graph around line L.
5 return V1,V2
6 end function
. bisection of G
The calculation of cut edges is implemented in the function cutsize.m of our toolbox. The cardinality of each subset (i.e the number of nodes in each partition) is given by the 2-norm of the indicator vector x
x . (4)
Good scalability of parallel distributed memory solvers requires equally sized (balanced) cardinalities, so that the waiting time of the individual threads is minimized. For a k-way partition the load imbalance bk is defined as the ratio of the highest partition cardinality over the average partition cardinality,
, (5)
where V˜ is the average partition weight. The load imbalance , where characterizes the deviation from obtaining an evenly balanced partitioning. The optimal value for is therefore zero, and it implies that the k partitions contain exactly the same number of nodes. In most applications it suffices to ensure that bkr is bounded from above by a desired threshold.
The purpose of this assignment is
1. construct graphs from lists describing node connectivity
2. to implement various graph partitioning algorithms in Matlab and to test these methods on a variety of graphs;
3. to use a standard software package for graph partitioning (such as METIS) and to use modern computational science software engineering tools such as GitHub and mex to interface external software libraries with Matlab;
4. to evaluate the quality of the various graph partitioning algorithm, and visualize the partitioning results.
Software tools
We will use one external partioning software tool for this HPC miniproject. METIS[1] [?], probably the most widely used graph partitioning software, was developed by Karypis and Kumar and is probably the most widely used graph partitioning software. It consists of a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. It is based on local techniques that iteratively improve starting solutions by swapping nodes between the partitions that lie in the neighborhood of the cut. Kernighan and Lin [?] were the first to propose a local search method in this fashion. Their partition refinement strategy, which swaps pairs of vertices that result in the largest decrease in the size of the edgecut, was later accelerated by Fiduccia and Mattheyses [?]. METIS is based on this method, and subsequent improvements of it. Finally, in terms of computational efficiency, numerical experiments have demonstrated that graphs with several millions of vertices can be partitioned to 256 parts in a few seconds on current generation workstations and laptops using METIS.
The assignment
You can find the graph partitioning toolbox PartToolbox in the iCorsi webpage. This toolbox contains Matlab code for the creation of several graph and graph partitioning methods, e.g., coordinate bisection, and has routines to generate recursive multiway partitions and visualize the partitioning results.
1. Install METIS 5.0.2, and the corresponding Matlab mex interface:
In order to use METIS you need the corresponding Matlab interface, since METIS is written in the C language. You can use the precompiled Matlab interface (metismex.mexmaci64) for Mac and a precompiled version for Linux (metismex.mexa64, Ubuntu, glibc 2.27). Then, all you need to do is to tell Matlab where to find the binary (see addpath command in Matlab). If you want to compile the package yourself (e.g., if you are using a different OS), read carefully the instruction on https://github.com/dgleich/metismex on how to install METIS 5.0.2, and how to use cmake, mex, and Matlab to build a mex interface between MATLAB and METIS.
Copy your Matlab interface metismex.mexa64 in the PartToolbox directory so that it can be used for the partitioning of meshes using METIS. Alternatively, specify the path where the interface is located (use the addpath command). Check that the interface works using the following code snippet:
>> A = blkdiag ( ones (5) , ones ( 5 ) ) ;
>> A(1 ,10) = 1; A(10 ,1) = 1; A(5 ,6) = 1; A(6 ,5) = 1;
>> [ p1 , p2 ] = metispart ( sparse (A) ) p1 =
1 2 3 4 5
p2 =
6 7 8 9 10
2. Construct adjacency matrices from connectivity data [10 points]:
Your first programming task in this assignment is to construct adjacency matrices from a collection of Comma Separated Value files (.csv), describing the edge structure and the node coordinates. These files are located in datasets/countrymeshes and follow the naming convention “CountryName-NumberOfNodesFileType.csv”. The countries considered here are Great Britain, Greece, Norway, Russia, Switzerland and Vietnam. The files describing the adjacency matrices contain a list of the nodes that are connected through an edge, and the files describing the coordinates of the graph contain a list with the x,y coordinates of each node. The resulting graphs correspond to the continental maps of the above-mentioned countries, with the overseas territories excluded since we are interested in connected graphs. Some examples are illustrated in Figure 4.
Run the Matlab script src/readcsvgraphs.m and complete the missing sections of the code. Your goal
is to
• read the .csv files in MATLAB,
• construct the adjacency matrix W ∈ Rn×n and the node coordinate list C ∈ Rn×[2], and Note that W must be symmetric and sparse 2.
• visualize the graphs of Norway and Vietnam using the function src/Visualization/gplotg.m
• save the sparse adjacency matrices and the corresponding coordinates in a new folder,
e.g. /datasets/countries/mat.
Figure 4: Graphs corresponding to the maps of various countries. Left: Greece with 3117 nodes and 3902 edges. Right: Switzerland with 4468 nodes and 15230 edges.
3. Implement various graph partitioning algorithms [30 points]:
• Run in Matlab the script Benchbisection.m and familiarize yourself with the Matlab codes in the directory PartToolbox. An overview of all functions and scripts is offered in Contents.m.
• Implement spectral graph bisection based on the entries of the Fiedler eigenvector. Use the incomplete Matlab file bisectionspectral.m for your solution.
• Implement inertial graph bisection. For a graph with 2D coordinates, this inertial bisection constructs a line such that half the nodes are on one side of the line, and half are on the other. Use the incomplete Matlab file bisectioninertial.m for your solution.
• Report the bisection edgecut for all toy meshes that are loaded in the script ”Bench bisection.m.” Use Table 1 to report these results.
Table 1: Bisection results
Mesh
Coordinate
Metis 5.0.2
Spectral
Inertial
mesh1e1
18
mesh2e1 netz4504 dual stufe
37
4. Recursively bisecting meshes [20 points]:
We will now partition 2D graphs emerging from structural engineering matrices of NASA, available in the SuiteSparse Matrix Collection (SSMC) [?]. A robust algorithm that allows us to create a 2l partition, with l being an integer, is presented in Algorithm 3. It uses the recursive function Recursion that takes as inputs the part C0 of the graph to partition, the number of parts p0 into which we will partition C0, and the index (an integer) of the first part of the partition of C0 into the final partitioning result Gp. In practize, only strongly balanced bisection routines are utilized within this algorithm [?].
Algorithm 3 Recursive bisection.
Require: G(V,E),
Ensure: A p-way partition of G
1 Gp = {C1,...,Cp}
2 p = 2l . p is a power of 2
3 function RECURSIVEBISECTION(graph G(V,E), number of parts p)
4 function RECURSION(C0,p0, index)
5 if then . If p0 is even, a bisection is possible
6
7bisection(C0)
8 RECURSION ,index)
9 RECURSION ,index+k0)
10 else . No more bisection possible, C0 is in a partition of G
11 Cindex ← C0
12 end if
13 end function
14 RECURSION(C,p,1)
15 return Gp . Gp is a partition of G in p = 2l parts
16 end function
This algorithm is implemented in the file recbisection.m of the toolbox. Utilize this function within the script Benchrecbisection.m to recursively bisect the finite element meshes loaded within the script in 8 and 16 subgraphs. Use your inertial and spectral partitioning implementations, as well as the coordinate partitioning and the METIS bisection routine. Summarize your results in 2. Finally, visualize the results for p = 16 for the case ”crack”. An example for spectral recursive partitioning is illustrated in Figure 5.
5. Comparing recursive bisection to direct k-way partitioning [10 points]:
Recursive bisection is highly dependent on the decisions made during the early stages of the process, and also suffers from the lack of global information. Thus, it may result in suboptimal partitions [?]. This necessitated the development of robust methods for direct k-way partitioning.
Besides recursive bi-partitioning, METIS also employs a multilevel k-way partitioning algorithm. The graph
Table 2: Edge-cut results for recursive bi-partitioning.
Case
Spectral
Metis 5.0.2
Coordinate
Inertial
mesh3e1
airfoil1 3elt barth4 crack
Figure 5: Left: The finite element mesh ”crack” with 10240 nodes and 30380 edges. Right: 16-way recursive bisection of the mesh using Metis 5.0.2. Number of cut edges: 1290.
G = (V,E) is initially coarsened down to a small number of vertices, a k-way partitioning of this much smaller graph is computed, and then this partitioning is projected back towards the original finer graph by successively refining the partitioning at each intermediate level [?].
We will compare the quality of the cut resulting from the application of recursive bipartitioning and direct multiway partitioning, as implemented in Metis 5.0.2. Our test cases will be the graphs presented in Figure 6. These graphs emerge from the road networks of Luxemburg and the US, with edges representing road segments and node intersections. Graph partitioning is crucial for computing driving directions in such networks, a fundamental problem of practical importance. It can be solved in almost linear time by Dijkstra’s shortest-path algorithm, but this is not fast enough for large scale road networks. Various lightweight alternatives have been suggested [?], that use graph partitioning as a preprocessing tool to define the reduced (partitioned) topology of the network. Additionally, we will consider the map-graphs that you created in the second task of this assignment.
Use the incomplete Benchmetis.m for your implementation. Compare the cut obtained from Metis 5.0.2 after applying recursive bisection and direct multiway partitioning for the graphs in question. Consult the Metis manual, and type help metismex in your MATLAB command line to familiarize yourself with the way the Metis recursive and direct multiway partitioning functionalities should be invoked. Summarize your results
Figure 6: Left: The road network of Luxemburg with 114599 nodes and 119666 edges. Right: A segment of the US road network with 126146 nodes and 161950 edges.
in Table 3 for 16 and 32 partitions. Comment on your results. Was this behavior anticipated? Visualize the partitioning results for the graphs of i) USA, ii) Luxemburg, and iii) Russia for 32 partitions.
Table 3: Comparing the number of cut edges for recursive bisection and direct multiway partitioning in Metis 5.0.2.
Partitions
Luxemburg
usroads-48
Greece
Switzerland
Vietnam
Norway
Russia
16
32
Figure 7: Partitioning the Airfoil graph based on the values of the Fiedler eigenvector. The two partitions are depicted in black and gray, while the cut edges in red respectively. The z-axis represents the value of the entries of the eigenvector.
6. Utilizing graph eigenvectors [30 points]:
Provide the following illustrative results. Use the incomplete script Bencheigenplot.m for your implementation.
a) Plot the entries of the eigenvectors associated with the first (λ1) and second (λ2) smallest eigenvalues of the graph Laplacian matrix L for the graph ”airfoil1.” Comment on the visual result. Is this behavior expected?
b) Plot the entries of the eigenvector associated with the second smallest eigenvalue λ2 of the Graph Laplacian matrix L. Project each solution on the coordinate system space of the following graphs: mesh3e1, barth4, 3elt, crack. An example is shown in Figure 7, for the graph ”airfoil1”.
Hint: You might have to modify the functions gplotg.m and gplotpart.m to get the desired result.
c) In this assignment we dealt exclusively with graphs G(V,E) that have coordinates associated with their nodes. This is, however, most commonly not the case when dealing with graphs, as they are in fact abstract structures, used for describing the relation E over a collection of entities V . These entities very often cannot be described in a Euclidean coordinate space. Therefore graph drawing is a tool to visualize relational information between nodes. The optimality of graph drawing is measured in terms of computation speed the ultimate usefulness of the resulting layout [?]. A successful layout should transmit the clearly the desired message, e.g the subsets of a partitioned graph. We will now see a spectral graph drawing method, which constructs the layout utilizing the eigenvectors of the graph Laplacian matrix L. Draw the graphs mesh3e1, barth4, 3elt, crack, and their spectral bi-partitioning results using the eigenvectors to supply coordinates. Locate vertex i at position:
xi = (v2(i),v3(i)),
where v2,v3 are the eigenvectors associated with the 2nd and 3rd smallest eigenvalues of L. Figure 8 illustrates these 2 ways of visualizing the partitions of the ”airfoil1” graph.
Figure 8: Visualizing the bipartitioning of the graph ”airfoil1” with 4253 nodes and 12289 edges. Left: Spatial coordinates. Right: Spectral coordinates.
Additional notes and submission details
Submit the source code files (together with your used Makefile) in an archive file (tar, zip, etc.), and summarize your results and observations for all exercises by writing an extended Latex report. Use the Latex template provided on the webpage and upload the Latex summary as a PDF to iCorsi.
• Your submission should be a gzipped tar archive, formatted like project number lastname firstname.zip or project number lastname firstname.tgz. It should contain:
– all the source codes of your MATLAB solutions.
– your write-up with your name project number lastname firstname.pdf,
• Submit your .zip/.tgz through iCorsi.
• You only need MATLAB for this mini-project.
References
[1] C.E. Bichot and P. Siarry. Graph Partitioning. ISTE. Wiley, 2013.
[2] Aydin Buluc, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz. Recent Advances in Graph Partitioning, pages 117–158. Springer International Publishing, Cham, 2016.
[3] Timothy A. Davis and Yifan Hu. The university of florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1–1:25, December 2011.
[4] Daniel Delling and Renato F. Werneck. Faster customization of road networks. In Vincenzo Bonifaci, Camil Demetrescu, and Alberto Marchetti-Spaccamela, editors, Experimental Algorithms, pages 30–42, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[5] U. Elsner. Graph Partitioning: A Survey. Technical report, Technische Universitat Chemnitz, Germany, 97-27,¨ 1997.
[6] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Proceedings of the 19th Design Automation Conference, DAC ’82, pages 175–181, Piscataway, NJ, USA, June 1982. IEEE
Press.
[7] George Karypis and Vipin Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. In Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, Supercomputing 96, page 35es, USA, 1996. IEEE Computer Society.
[8] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291–307, Feb 1970.
[9] Y. Koren. Drawing graphs by eigenvectors: Theory and practice. Comput. Math. Appl., 49(1112):18671888, June 2005.
[10] Horst D. Simon and Shang-Hua Teng. How good is recursive bisection? SIAM J. Sci. Comput., 18(5):14361445, September 1997.
[1] http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
[2] Check the symmetry of your resulting adjacency matrix with the function issymmetric.m. If there is an edge missing and the matrix is non-symmetric, apply the transformation Wsym .