Starting from:

$30

CS224W-Homework SNAP Network analysis Solved

The purpose of these exercises is to get you started with network analysis and the SNAP software. For this homework, you need to install and try out the SNAP network analysis tools. We strongly encourage you to use Snap.py for Python (available from http://snap.stanford.edu/snappy/). However, you can also use SNAP for C++ (available from http://snap.stanford.edu/snap/ download.html)

1                      Analyzing the Wikipedia voters network
Download the Wikipedia voting network wiki-Vote.txt.gz: http://snap.stanford.edu/ data/wiki-Vote.html.

Using one of the network analysis tools above, load the Wikipedia voting network. Note that Wikipedia is a directed network. Formally, we consider the Wikipedia network as a directed graph G = (V,E), with node set V and edge set E ⊂ V × V where (edges are ordered pairs of nodes). An edge (a,b) ∈ E means that user a voted on user b.

To make our questions clearer, we will use the following small graph as a running example:

Gsmall = (Vsmall,Esmall), where Vsmall = {1,2,3} and Esmall = {(1,2),(2,1),(1,3),(1,1)}.

Compute and print out the following statistics for the wiki-Vote network:

1.   The number of nodes in the network. (Gsmall has 3 nodes.)

2.   The number of nodes with a self-edge (self-loop), i.e., the number of nodes a ∈ V where

(a,a) ∈ E. (Gsmall has 1 self-edge.)

                                                              CS224W: Analysis of Networks - Problem Set 0                                                   2



3.   The number of directed edges in the network, i.e., the number of ordered pairs (a,b) ∈ E for which a 6= b. (Gsmall has 3 directed edges.)

4.   The number of undirected edges in the network, i.e., the number of unique unordered pairs (a,b), a 6= b, for which (a,b) ∈ E or (b,a) ∈ E (or both). If both (a,b) and (b,a) are edges, this counts a single undirected edge. (Gsmall has 2 undirected edges.)

5.   The number of reciprocated edges in the network, i.e., the number of unique unordered pairs of nodes (a,b), a 6= b, for which (a,b) ∈ E and (b,a) ∈ E. (Gsmall has 1 reciprocated edge.)

6.   The number of nodes of zero out-degree. (Gsmall has 1 node with zero out-degree.)

7.   The number of nodes of zero in-degree. (Gsmall has 0 nodes with zero in-degree.)

8.   The number of nodes with more than 10 outgoing edges (out-degree 10).

9.   The number of nodes with fewer than 10 incoming edges (in-degree < 10).

Each sub-question is worth 3 points.

2                      Further Analyzing the Wikipedia voters network
For this problem, we use the Wikipedia voters network. If you are using Python, you might want to use NumPy, SciPy, and/or Matplotlib libraries.

1.   (18 points) Plot the distribution of out-degrees of nodes in the network on a log-log scale. Each data point is a pair (x,y) where x is a positive integer and y is the number of nodes in the network with out-degree equal to x. Restrict the range of x between the minimum and maximum out-degrees. You may filter out data points with a 0 entry. For the log-log scale, use base 10 for both x and y axes.

2.   (15 points) Compute and plot the least-square regression line for the out-degree distribution in the log-log scale plot. Note we want to find coefficients a and b such that the function log10 y = a · log10 x + b, equivalently, y = 10b · xa, best fits the out-degree distribution. What are the coefficients a and b? For this part, you might want to use the method called polyfit in NumPy with deg parameter equal to 1.

3     Finding Experts on the Java Programming Language on StackOverflow
Download the StackOverflow network stackoverflow-Java.txt.gz: http://snap.stanford. edu/class/cs224w-data/hw0/stackoverflow-Java.txt.gz. An edge (a,b) in the network means that person a endorsed an answer from person b on a Java-related question.

Using one of the network analysis tools above, load the StackOverflow network. Note that StackOverflow is a directed network.

Compute and print out the following statistics for the stackoverflow-Java network:

1.   The number of weakly connected components in the network. This value can be calculated in Snap.py via function GetWccs.

                                                              



2.   The number of edges and the number of nodes in the largest weakly connected component. The largest weakly connected component is calculated in Snap.py with function GetMxWcc.

3.   IDs of the top 3 most central nodes in the network by PagePank scores. PageRank scores are calculated in Snap.py with function GetPageRank.

4.   IDs of the top 3 hubs and top 3 authorities in the network by HITS scores. HITS scores are calculated in Snap.py with function GetHits.

More products