
CSE6332 PROJECT-5 Solution

As in Project-3, your task is to write a Spark program that finds the connected components of any undirected graph and prints the size of each component. A connected component of a graph is a subgraph in which there is a path between any two vertices.

As you can see in the incomplete Graph.scala program, the variable graph has type RDD[(Long, Long, List[Long])]. That is, it is a dataset of vertices, where each vertex is a triple (group, id, adj): id is the vertex id, group is the group id (initially equal to id), and adj is the list of outgoing neighbors. For example, from the input line 8,5,6,7 you need to return the tuple (8, 8, List(5, 6, 7)).

Then, inside the for loop, you need to construct a dataset that generates group candidates for each vertex. For example, from the vertex (4, 8, List(5, 6, 7)) you generate the group candidates (8, 4), (5, 4), (6, 4), and (7, 4); that is, 8 can be in group 4, 5 can be in group 4, and so on. Then, for each vertex id, you select the minimum group candidate. This becomes the new group for that vertex at this iteration. These new groups are stored in the variable groups. Finally, you need to reconstruct the graph with the new groups so that your program can run further iterations (there are 5 iterations in total). After the loop, you print the size of each group.
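The steps above can be illustrated with a minimal sketch. It is not the contents of Graph.scala: plain Scala collections stand in for the RDD so the example runs without a Spark installation, and the names ComponentsSketch, parseLine, candidates, step, and componentSizes are hypothetical, chosen only for illustration. On Spark, the same three steps would presumably map onto flatMap, reduceByKey, and join.

```scala
// Sketch only: plain Scala collections stand in for Spark RDDs.
// All names here are hypothetical and do not come from Graph.scala.
object ComponentsSketch {
  // A vertex is (group, id, adj), as in the project description.
  type Vertex = (Long, Long, List[Long])

  // Parse "8,5,6,7" into (8, 8, List(5, 6, 7)); group starts equal to id.
  def parseLine(line: String): Vertex = {
    val ids = line.split(",").map(_.trim.toLong).toList
    (ids.head, ids.head, ids.tail)
  }

  // Emit (vertexId, candidateGroup) pairs for one vertex:
  // the vertex itself and each of its neighbors may join its group.
  def candidates(v: Vertex): List[(Long, Long)] = {
    val (group, id, adj) = v
    (id, group) :: adj.map(n => (n, group))
  }

  // One iteration: keep the minimum candidate per vertex id,
  // then rebuild the graph with the new groups.
  def step(graph: List[Vertex]): List[Vertex] = {
    val groups: Map[Long, Long] =
      graph.flatMap(candidates)
        .groupBy(_._1)
        .map { case (id, cs) => id -> cs.map(_._2).min }
    graph.map { case (_, id, adj) => (groups(id), id, adj) }
  }

  // Run a fixed number of iterations, then count vertices per group.
  def componentSizes(lines: Seq[String], iterations: Int = 5): Map[Long, Int] = {
    var graph = lines.map(parseLine).toList
    for (_ <- 1 to iterations) graph = step(graph)
    graph.groupBy(_._1).map { case (g, vs) => g -> vs.size }
  }
}
```

For instance, ComponentsSketch.componentSizes(Seq("1,2", "2,1,3", "3,2", "4,5", "5,4")) returns Map(1 -> 3, 4 -> 2): vertices 1, 2, and 3 end up in group 1, and vertices 4 and 5 in group 4. Because the minimum group id spreads by only one hop per iteration, a fixed count of 5 iterations may leave components with long paths split across several groups.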
