One of the goals of network analysis is to find mathematical models that characterize real-world networks and that can then be used to generate new networks with similar properties. In this problem, we will explore two famous models—Erdős–Rényi and Small-World—and compare them to real-world data from an academic collaboration network. Note that in this problem all networks are undirected. You may use the starter code in hw1-q1-starter.py for this problem.
• Erdős–Rényi Random graph (G(n, m) random network): Generate a random instance of this model by using n = 5242 nodes and picking m = 14484 edges at random. Write code to construct instances of this model, i.e., do not call a SNAP function (a minimal construction sketch for both random-graph models follows this list).
• Small-World Random Network: Generate an instance from this model as follows: begin with n = 5242 nodes arranged as a ring, i.e., imagine the nodes form a circle and each node is connected to its two direct neighbors (e.g., node 399 is connected to nodes 398 and 400), giving us 5242 edges. Next, connect each node to the neighbors of its neighbors (e.g., node 399 is also connected to nodes 397 and 401). This gives us another 5242 edges. Finally, randomly select 4000 pairs of nodes not yet connected and add an edge between them. In total, this will make m = 5242·2+4000 = 14484 edges. Write code to construct instances of this model, i.e., do not call a SNAP function.
• Real-World Collaboration Network: Download this undirected network from http://snap.stanford.edu/data/ca-GrQc.txt.gz. Nodes in this network represent authors of research papers on the arXiv in the General Relativity and Quantum Cosmology section. There is an edge between two authors if they have co-authored at least one paper together. Note that some edges may appear twice in the data, once for each direction. Ignoring repeats and self-edges, there are 5242 nodes and 14484 edges. (Note: Repeats are automatically ignored when loading an (un)directed graph with SNAP's LoadEdgeList function.)
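As a concrete illustration, here is a minimal sketch of how both random-graph models could be constructed by hand, representing a graph as a set of canonically ordered edge tuples; the function names and the seed parameter are our own choices, not part of the starter code:

```python
import random

def erdos_renyi_gnm(n=5242, m=14484, seed=0):
    """Sample a G(n, m) graph: m distinct edges chosen uniformly at random,
    with no self-edges."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))  # store each edge once, canonically
    return edges

def small_world(n=5242, extra=4000, seed=0):
    """Ring lattice with first- and second-neighbor edges, plus `extra`
    random edges between pairs that are not yet connected."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for d in (1, 2):  # direct neighbors, then neighbors of neighbors
            j = (i + d) % n
            edges.add((min(i, j), max(i, j)))
    target = len(edges) + extra  # 5242 + 5242 ring edges, then 4000 more
    while len(edges) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges
```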
1.1 Degree Distribution
Generate a random graph from both the Erdős–Rényi (i.e., G(n, m)) and Small-World models and read in the collaboration network. Delete all of the self-edges in the collaboration network (there should be 14,484 total edges remaining).
Plot the degree distribution of all three networks in the same plot on a log-log scale. In other words, generate a plot with the horizontal axis representing node degrees and the vertical axis representing the proportion of nodes with a given degree (by “log-log scale” we mean that both the horizontal and vertical axis must be in logarithmic scale). In one to two sentences, describe one key difference between the degree distribution of the collaboration network and the degree distributions of the random graph models.
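One possible way to produce such a plot with matplotlib, continuing with the edge-set representation sketched above (here collab_edges is assumed to be the loaded collaboration network, not a variable provided by the starter code):

```python
from collections import Counter
import matplotlib.pyplot as plt

def degree_distribution(edges, n):
    """(degree, fraction of nodes) pairs for a graph given as a set of edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = Counter(deg.values())  # degree-0 nodes never appear, which suits a log axis
    return sorted((d, cnt / n) for d, cnt in hist.items())

n = 5242
for name, edges in [("Erdos-Renyi", erdos_renyi_gnm()),
                    ("Small World", small_world()),
                    ("Collaboration", collab_edges)]:
    xs, ys = zip(*degree_distribution(edges, n))
    plt.loglog(xs, ys, linestyle="", marker="o", label=name)
plt.xlabel("node degree")
plt.ylabel("proportion of nodes")
plt.legend()
plt.show()
```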
1.2 Clustering Coefficient
Recall that the local clustering coefficient for a node $v_i$ was defined in class as
\[
C_i = \begin{cases} \dfrac{2 e_i}{k_i (k_i - 1)} & \text{if } k_i \geq 2, \\ 0 & \text{otherwise,} \end{cases}
\]
where $k_i$ is the degree of node $v_i$ and $e_i$ is the number of edges between the neighbors of $v_i$. The average clustering coefficient is defined as
\[
C = \frac{1}{n} \sum_{i=1}^{n} C_i .
\]
Compute and report the average clustering coefficient of the three networks. For this question, write your own implementation to compute the clustering coefficient, instead of using a built-in SNAP function.
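A sketch of one possible from-scratch implementation, again operating on an edge set rather than a SNAP graph:

```python
def average_clustering(edges, n):
    """Average local clustering coefficient over all n nodes.

    C_i = 2 e_i / (k_i (k_i - 1)) if k_i >= 2, else 0, where e_i counts the
    edges among the neighbors of node i."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0.0
    for i, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue  # C_i = 0 by definition, contributes nothing to the sum
        e_i = sum(1 for u in ns for w in nbrs[u] if w in ns and u < w)
        total += 2.0 * e_i / (k * (k - 1))
    return total / n  # degree-0 and degree-1 nodes still count in the average
```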
Which network has the largest clustering coefficient? In one to two sentences, explain. Think about the underlying process that generated the network.
2 Structural Roles: RolX and ReFeX
In this problem, we will explore the structural role extraction algorithm RolX and its recursive feature extraction method ReFeX. As part of this exploration, we will work with a dataset representing a scientist co-authorship network, which can be downloaded at http://www-personal.umich.edu/~mejn/netdata/netscience.zip. [1] Although the graph is weighted, for simplicity we treat it as undirected and unweighted in this problem.
Feature extraction consists of two steps: we first extract basic local features from every node, and we subsequently aggregate them along graph edges so that global features are also obtained. Collectively, feature extraction constructs a matrix $V \in \mathbb{R}^{n \times f}$, where for each of the $n$ nodes we have $f$ features covering local and global information. RolX extracts node roles from that matrix.
2.1 Basic Features
We begin by loading the graph G provided in the bundle and computing three basic features for the nodes. For each node v, we choose 3 basic local features (in this order):
1. the degree of v, i.e., deg(v);
2. the number of edges in the egonet of v, where egonet of v is defined as the subgraph of G induced by v and its neighborhood;
3. the number of edges that connect the egonet of v and the rest of the graph, i.e., the number of edges that enter or leave the egonet of v.
We use $\tilde{V}_u$ to represent the vector of the basic features of node $u$. For any pair of nodes $u$ and $v$, we can use cosine similarity to measure how similar the two nodes are according to their feature vectors $x$ and $y$:
\[
\mathrm{Sim}(x, y) = \frac{x \cdot y}{\|x\|_2 \, \|y\|_2}.
\]
When $\|x\|_2 = 0$ or $\|y\|_2 = 0$, we define $\mathrm{Sim}(x, y) = 0$.
Compute the basic feature vector for the node with ID 9, and report the top 5 nodes that are most similar to node 9 (excluding node 9). As a sanity check, no element of $\tilde{V}_9$ is larger than 10.
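A minimal sketch of how the basic features and the similarity measure could be implemented, assuming nbrs maps each node to its set of neighbors (a representation of our choosing, not part of the starter code):

```python
import numpy as np

def basic_features(nbrs, v):
    """[deg(v), edges inside egonet(v), edges leaving egonet(v)]."""
    ego = nbrs[v] | {v}
    inside = sum(1 for u in ego for w in nbrs[u] if w in ego and u < w)
    leaving = sum(1 for u in ego for w in nbrs[u] if w not in ego)
    return np.array([len(nbrs[v]), inside, leaving], dtype=float)

def cosine_sim(x, y):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    return 0.0 if nx == 0.0 or ny == 0.0 else float(x @ y) / (nx * ny)
```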
2.2 Recursive Features
In this next step, we recursively generate some more features. We use mean and sum as aggregation functions.
Initially, we have a feature vector $\tilde{V}_u^{(0)} \in \mathbb{R}^3$ for every node $u$. In the first iteration, we concatenate the mean of all of $u$'s neighbors' feature vectors to $\tilde{V}_u^{(0)}$, and do the same for the sum, i.e.,
\[
\tilde{V}_u^{(1)} = \Big[ \tilde{V}_u^{(0)} ;\; \tfrac{1}{|N(u)|} \textstyle\sum_{v \in N(u)} \tilde{V}_v^{(0)} ;\; \textstyle\sum_{v \in N(u)} \tilde{V}_v^{(0)} \Big],
\]
where $N(u)$ is the set of $u$'s neighbors in the graph. If $N(u) = \emptyset$, set the mean and sum to 0.
After $K$ iterations, we obtain the overall feature matrix $V = \tilde{V}^{(K)} \in \mathbb{R}^{n \times 3^{K+1}}$.
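One way the aggregation could look, continuing with the nbrs convention from the sketch in 2.1 (function and variable names are ours):

```python
import numpy as np

def recursive_features(nbrs, base, K=2):
    """Run K rounds of mean/sum neighbor aggregation.

    `base` maps node -> initial feature vector (dimension 3 here); after K
    rounds each vector has dimension 3 ** (K + 1)."""
    feats = dict(base)
    for _ in range(K):
        new = {}
        for u, fu in feats.items():
            if nbrs.get(u):
                stacked = np.stack([feats[v] for v in nbrs[u]])
                mean, total = stacked.mean(axis=0), stacked.sum(axis=0)
            else:
                mean = total = np.zeros_like(fu)  # no neighbors: mean = sum = 0
            new[u] = np.concatenate([fu, mean, total])
        feats = new
    return feats
```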
For this exercise, run K = 2 iterations, and report the top 5 nodes that are most similar to node 9 (excluding node 9). If there are ties (e.g., the 4th, 5th, and 6th nodes have the same similarity), report any of them to fill up the top-5 ranking. As a sanity check, the similarities between the reported nodes and node 9 are all greater than 0.9. [5 points]
Compare your obtained top 5 nodes with previous results from 2.1. In particular, are there common nodes? Are there different nodes? In one sentence, why would this change? [3 points]
2.3 Role Discovery
In this part, we explore the graph further using the recursive feature vectors of the nodes and node similarity.
(a) Produce a 20-bin histogram to show the distribution of cosine similarity between node 9 and every other node in the graph (according to their recursive feature vectors). Note here that the x-axis is cosine similarity with node 9, and the y-axis is the number of nodes. [3 points]
According to the histogram, can you spot some groups / roles? How many can you spot? (Clue: look for the spikes!) [2 points]
(b) For these groups / roles in the cosine similarity histogram, take one node u from each group, examine its feature vector, and draw the subgraph of that node based on its feature vector. You can draw the subgraph by hand, or you can use libraries such as networkx or graphviz.
For these drawings, you should use the local features for u, and pay attention to the features aggregated from its 1-hop neighbors, but feel free to ignore further features if they are difficult to incorporate. Also, you should not draw nodes that are more than 3 hops away from u. [6 points]
Briefly argue how different structural roles are captured in 1-2 sentences. [1 point]
3 Community detection using the Louvain algorithm
Note: For this question, assume all graphs are undirected and weighted.
Communities are a fundamental aspect of many networks. However, it is often not easy to come up with an "optimal" grouping of nodes into communities. In this problem, we will explore some properties of the Louvain algorithm for community detection (so named because the authors were all affiliated with the University of Louvain in Belgium at some point). The original paper from Blondel et al. is available here: https://arxiv.org/pdf/0803.0476.pdf. If you are stuck on this question at any point, please refer to the paper; there is a good chance that you will find what you seek there.
We will first explore the idea of modularity. The modularity of a weighted graph is a measure that compares the density of edges within a community to the density of edges between communities. Formally, we define the modularity Q for a given graph as follows:
\[
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \delta(c_i, c_j) \tag{1}
\]
Here $2m = \sum_{i,j} A_{ij}$ is the sum of all entries in the adjacency matrix, $A_{ij}$ is the $(i,j)$-th entry of the adjacency matrix, $d_i$ is the degree of node $i$, and $\delta(c_i, c_j)$ is 1 when $i$ and $j$ are in the same community ($c_i = c_j$) and 0 otherwise. Note that we treat communities as disjoint. In other words, a given node from a graph can only belong to one community in that graph.
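As a concreteness check, Equation (1) can be evaluated directly with a few lines of NumPy; this sketch assumes a dense adjacency matrix and a list of per-node community labels (both names are ours):

```python
import numpy as np

def modularity(A, comm):
    """Q from Equation (1) for adjacency matrix A and community labels comm."""
    A = np.asarray(A, dtype=float)
    two_m = A.sum()                    # 2m: the sum of all entries of A
    d = A.sum(axis=1)                  # (weighted) degrees
    same = np.equal.outer(comm, comm)  # delta(c_i, c_j) as a boolean matrix
    return (A - np.outer(d, d) / two_m)[same].sum() / two_m
```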
The modularity of a graph lies in the range [−1, 1]. Maximizing the modularity of a given graph is a computationally hard problem, so we try different heuristics for this purpose. One such heuristic is the Louvain algorithm, which outperforms many similar algorithms in terms of both speed and the maximum modularity obtained. We will run a few steps of the algorithm on a couple of example networks to gain some insights about its behavior and properties.
Each pass of the algorithm has two phases, and proceeds as follows:
• Phase 1 (Modularity Optimization): Start with each node in its own community.
• For each node i, go over all the neighbors j of i. Calculate the change in modularity when i is moved from its present community to j's community. Find the neighbor $j_m$ for which this move causes the greatest increase in modularity, and assign i to $j_m$'s community (break ties arbitrarily). If there is no positive increase in modularity during this process, keep i in its original community.
• Repeat the above process for each node (going over nodes multiple times if required) until there is no further maximization possible (that is, each node remains in its community). This is the end of Phase 1.
• Phase 2 (Community Aggregation): Once Phase 1 is done, we contract the original graph G to a new graph H. Each community found in G after Phase 1 becomes a node in H. The weight of the edge between two nodes in H is the sum of the weights of the edges between the two corresponding communities in G. The weights of edges within a community in G sum up to form a self-edge of the same weight in H (be careful while calculating self-edge weights; note that you will have to go once over each original node within the community in G). This is the end of Phase 2. Phase 1 and Phase 2 together make up a single pass of the algorithm.
• Repeat Phase 1 and Phase 2 again on H and keep proceeding this way until no further improvement is seen (you will reach a point where each node in the graph stays in its original community to maximize modularity). The final modularity value is a heuristic for the maximum modularity of the graph.
Figure 1: Example from Blondel et al. showing the two phases of the Louvain algorithm. Note how weights for self-edges are assigned in the Community Aggregation phase.
An example taken from Blondel et al. (Figure 1) illustrates the two phases of the algorithm. Note how weights for self-edges are assigned in the Community Aggregation phase - we will need to use this later. The weight of the self-edge formed by merging all nodes in a community $K$ is given by $\sum_{i \in K} \sum_{j \in K} A_{ij}$ - you can verify this for yourself by checking in Figure 1 that nodes 11 and 13 have an edge of weight 1 between them, but the corresponding self-edge has a weight of 2.
3.1 Modularity gain when an isolated node moves into a community
Consider a node $i$ that is in a community all by itself. Let $C$ represent an existing community in the graph. Node $i$ feels lonely and decides to move into the community $C$; we will inspect the change in modularity when this happens.
This situation can be modeled by a graph (Figure 2) with $C$ represented by a single node. $C$ has a self-edge of weight $\Sigma_{in}$. There is an edge between $i$ and $C$ of weight $k_{i,in}/2$ (to stay consistent with the notation of the paper). The total degree of $C$ is $\Sigma_{tot}$ and the degree of $i$ is $k_i$. As always, $2m = \sum_{i,j} A_{ij}$ is the sum of all entries in the adjacency matrix. To begin with, $C$ and $i$ are in separate communities (colored green and red respectively). Prove that the modularity gain seen when $i$ merges with $C$ (i.e., the change in modularity after they merge into one community) is given by:
\[
\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right] \tag{2}
\]
Hint: Using the community aggregation step of the Louvain method may make computation easier.
In practice, this result is used while running the Louvain algorithm (along with a similar related result) to make incremental modularity computations much faster.
Figure 2: Before merging, i is an isolated node and C represents an existing community. The rest of the graph can be treated as a single node for this problem.
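For reference, Equation (2) translates directly into code; a small helper like the following (names are ours) is the kind of incremental update a Louvain implementation would use:

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, two_m):
    """Equation (2): modularity gain when an isolated node i joins community C."""
    after = (sigma_in + k_i_in) / two_m - ((sigma_tot + k_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_i / two_m) ** 2
    return after - before
```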
3.2 Louvain algorithm on a 16 node network
Consider the graph G (Figure 3), with 4 cliques of 4 nodes each arranged in a ring. Assume all edges have weight 1. There exists exactly one edge between any two adjacent cliques. We will manually inspect the results of the Louvain algorithm on this network. The first phase of modularity optimization detects each clique as a single community (giving 4 communities in all). After the community aggregation phase, the new network H will have 4 nodes.
Figure 3: G is a graph with 16 nodes (4 cliques with 4 nodes per clique).
• What is the weight of any edge between two distinct nodes in H? [1 point]
• What is the weight of any self-edge in H? [2 points]
• What is the modularity of H (with each node in its own community)? The easiest way to calculate modularity would be to directly apply the definition from Equation (1) to H (this also holds for the upcoming questions of this type). [2 points]
Spoiler alert: In this network, this is the maximum modularity and the algorithm will terminate here. However, assume that we wanted to contract the graph further into a two node network (call it J) by grouping two adjacent nodes in H into one community and the other two adjacent nodes into another community, and then aggregating (following the same rules of the community aggregation phase).
• What is the weight of any edge between two distinct nodes in J? [1 point]
• What is the weight of any self-edge in J? [2 points]
• What is the modularity of J (with each node in its own community)? [2 points] As expected, the modularity of J is less than the modularity of H.
3.3 Louvain algorithm on a 128 node network
Now consider a larger version of the same network, with 32 cliques of 4 nodes each (arranged in a ring as earlier); call this network Gbig. Again, assume all edges have weight 1, and there exists exactly one edge between any two adjacent cliques. The first phase of modularity optimization, as expected, detects each clique as a single community. After aggregation, this forms a new network Hbig with 32 nodes.
• What is the weight of any edge between two distinct nodes in Hbig? [1 point]
• What is the weight of any self-edge in Hbig? [2 points]
• What is the modularity of Hbig (with each node in its own community)? [2 points]
After what we saw in the earlier example, we would expect the algorithm to terminate here. However (spoiler alert again), that doesn’t happen and the algorithm proceeds. The next phase of modularity optimization groups Hbig into 16 communities with two adjacent nodes from Hbig in each community. Call the resultant graph (after community aggregation) Jbig.
• What is the weight of any edge between two distinct nodes in Jbig? [1 point]
• What is the weight of any self-edge in Jbig? [2 points]
• What is the modularity of Jbig (with each node in its own community)? [2 points]
This particular grouping of communities, and not the one with one clique in each community, achieves the maximum modularity in this network. The modularity-maximizing grouping here is thus one that would not be considered intuitive based on the structure of the graph.
3.4 What just happened?
Explain (in a few lines) why you think the algorithm behaved the way it did for the larger network (you don’t need to prove anything rigorously, a rough argument will do). In other words, what might have caused modularity to be maximized with an “unintuitive” community grouping for the larger network?
4 Spectral clustering
This question derives a spectral clustering algorithm that we then use to analyze a real-world dataset. These algorithms use eigenvectors of matrices associated with the graph. You may find this handout https://bit.ly/2l0dXCL on graph clustering to be useful for additional background information.
Let’s first discuss some notation:
• Let $G = (V, E)$ be a simple (that is, no self- or multi-edges), undirected, connected graph with $n = |V|$ nodes and $m = |E|$ edges.
• $A$ is the adjacency matrix of the graph $G$, i.e., $A_{ij}$ is equal to 1 if $(i,j) \in E$ and equal to 0 otherwise.
• $D$ is the diagonal matrix of degrees: $D_{ii} = \sum_j A_{ij}$, the degree of node $i$.
• We define the graph Laplacian of $G$ by $L = D - A$.
For a set of nodes $S \subset V$, we will measure the quality of $S$ as a cluster with a "cut" value and a "volume" value. We define the cut of the set $S$ to be the number of edges that have one endpoint in $S$ and one endpoint in the complement set $\bar{S} = V \setminus S$:
\[
\mathrm{cut}(S) = \sum_{i \in S,\, j \in \bar{S}} A_{ij}.
\]
Note that the cut is symmetric in the sense that $\mathrm{cut}(S) = \mathrm{cut}(\bar{S})$. The volume of $S$ is simply the sum of degrees of nodes in $S$:
\[
\mathrm{vol}(S) = \sum_{i \in S} d_i,
\]
where $d_i$ is the degree of node $i$.
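Both quantities are one-liners given a dense adjacency matrix; in this sketch (names ours) S is a boolean mask over the nodes:

```python
import numpy as np

def cut_and_vol(A, S):
    """cut(S) and vol(S) for adjacency matrix A and boolean node mask S."""
    A = np.asarray(A, dtype=float)
    S = np.asarray(S, dtype=bool)
    cut = A[np.ix_(S, ~S)].sum()  # edges with one endpoint in S and one in S-bar
    vol = A[S].sum()              # sum of the degrees (row sums) of nodes in S
    return cut, vol
```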
4.1 A Spectral Algorithm for Normalized Cut Minimization: Foundations
We will try to find a set $S$ with a small normalized cut value:
\[
\mathrm{NCUT}(S) = \frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} + \frac{\mathrm{cut}(\bar{S})}{\mathrm{vol}(\bar{S})} \tag{3}
\]
Intuitively, a set S with a small normalized cut value must have few edges connecting to the rest of the graph (making the numerators small) as well as some balance in the size of the clusters (making the denominators large).
Define the assignment vector $x$ for some set of nodes $S$ such that
\[
x_i = \begin{cases} \sqrt{\dfrac{\mathrm{vol}(\bar{S})}{\mathrm{vol}(S)}} & \text{if } i \in S, \\[2mm] -\sqrt{\dfrac{\mathrm{vol}(S)}{\mathrm{vol}(\bar{S})}} & \text{if } i \in \bar{S}. \end{cases} \tag{4}
\]
Prove the properties below.
Note: There are many ways to prove the following properties, and we provide some hints for one of the ways. You do not necessarily need to use the provided hints for your proof.
(i) $L = \sum_{(i,j) \in E} (e_i - e_j)(e_i - e_j)^T$, where $e_k$ is an $n$-dimensional column vector with a 1 at position $k$ and 0's elsewhere. Note that we aren't summing over the entire adjacency matrix and only count each edge once.
(ii) $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$. Hint: Apply the result from part (i).
(iii) $x^T L x = c \cdot \mathrm{NCUT}(S)$ for some constant $c$ (in terms of the problem parameters). Hint: Rewrite the sum in terms of $S$ and $\bar{S}$.
(iv) $x^T D e = 0$, where $e$ is the vector of all ones.
(v) $x^T D x = 2m$.
4.2 Normalized Cut Minimization: Solving for the Minimizer
Since $x^T D x$ is just a constant ($2m$), we can formulate the normalized cut minimization problem in the following way:
\[
\begin{aligned}
\underset{S \subset V,\; x \in \mathbb{R}^n}{\text{minimize}} \quad & \frac{x^T L x}{x^T D x} \\
\text{subject to} \quad & x^T D e = 0, \quad x^T D x = 2m, \quad x \text{ as in Equation (4)}
\end{aligned} \tag{5}
\]
The constraint that $x$ takes the form of Equation (4) makes the optimization problem NP-hard. We will instead use the "relax and round" technique, where we relax the problem to make the optimization tractable and then round the relaxed solution back to a feasible point for the original problem. Our relaxation eliminates the constraint that $x$ take the form of Equation (4), which leads to the following relaxed problem:
\[
\begin{aligned}
\underset{x \in \mathbb{R}^n}{\text{minimize}} \quad & \frac{x^T L x}{x^T D x} \\
\text{subject to} \quad & x^T D e = 0, \quad x^T D x = 2m
\end{aligned} \tag{6}
\]
Show that the minimizer of Equation (6) is $D^{-1/2} v$, where $v$ is the eigenvector corresponding to the second smallest eigenvalue of the normalized graph Laplacian $\tilde{L} = D^{-1/2} L D^{-1/2}$. Finally, to round the solution back to a feasible point in the original problem, we can take the indices of all positive entries of the eigenvector to be the set $S$ and the indices of all negative entries to be $\bar{S}$.
Hint 1: Make the substitution $z = D^{1/2} x$.
Hint 2: Note that $e$ is the eigenvector corresponding to the smallest eigenvalue of $L$.
Hint 3: The normalized graph Laplacian $\tilde{L}$ is symmetric, so its eigenvectors are orthonormal and form a basis for $\mathbb{R}^n$. This means we can write any vector $x$ as a linear combination of orthonormal eigenvectors of $\tilde{L}$.
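Putting the pieces together, here is a sketch of the full relax-and-round procedure using NumPy's dense eigensolver; it assumes a connected graph (so all degrees are positive), and the function name is ours:

```python
import numpy as np

def spectral_partition(A):
    """Relax-and-round normalized-cut bipartition.

    Returns a boolean mask for S: the positive entries of x = D^{-1/2} v,
    where v is the eigenvector of the normalized Laplacian with the
    second-smallest eigenvalue."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)  # requires all degrees > 0 (connected graph)
    # normalized Laplacian: L~ = I - D^{-1/2} A D^{-1/2}
    L_norm = np.eye(len(d)) - d_isqrt[:, None] * A * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(L_norm)  # eigenvalues in ascending order
    x = d_isqrt * vecs[:, 1]             # relaxed minimizer D^{-1/2} v
    return x > 0                         # round: positive entries form S
```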
4.3 Relating Modularity to Cuts and Volumes
In Problem 3, we presented the modularity of a graph clustering in the context of the Louvain algorithm. Modularity actually relates to cuts and volumes as well. Consider a partitioning of our graph $A$ into 2 clusters, and let $y \in \{1, -1\}^n$ be an assignment vector for a set $S$:
\[
y_i = \begin{cases} 1 & \text{if } i \in S, \\ -1 & \text{if } i \in \bar{S}. \end{cases} \tag{7}
\]
Then, the modularity of the assignment $y$ is
\[
Q(y) = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \frac{y_i y_j + 1}{2}. \tag{8}
\]
(Note that $(y_i y_j + 1)/2 = \delta(c_i, c_j)$, so this agrees with Equation (1).)
Let $y$ be the assignment vector in Equation (7). Prove that
\[
Q(y) = -\frac{1}{m} \left( \mathrm{cut}(S) - \frac{\mathrm{vol}(S)\,\mathrm{vol}(\bar{S})}{2m} \right). \tag{9}
\]
Thus, maximizing modularity is really just minimizing the sum of the cut and the negative product of the partition's volumes (scaled by $1/2m$). As a result, we can use spectral algorithms similar to the one derived in parts 4.1-4.2 in order to find a clustering that maximizes modularity. While this might provide an intuitively "better" clustering after inspection than the Louvain algorithm, spectral algorithms are computationally intensive on large graphs and would only partition the graph into 2 clusters.
Note: You only need to prove the relationship between modularity and cuts; you do not need to derive the actual spectral algorithm.
[1] We provide a binary file named hw1-q2.graph that can be loaded directly into SNAP via G = snap.TUNGraph.Load(snap.TFIn("hw1-q2.graph")). You are welcome either to use this file or to load from the raw data yourself.