Starting from:

$29.99

NET4103-7431 Homework Solution

Network science and Graph Learning
Credits: Some of these problems have been adapted from the following classes:
• Prof. Aaron Clauset Network Analysis and Modeling, 2014
Submitting answers: Prepare answers to your homework in a single PDF/DOCX file and submit it via Moodle.
Submitting code: Provide a link to a github classroom repository containing all code assignment. Failure to submit your code will result in reduction of all points for that question from your project score.
Background Information
Facebook100 is a dataset of Facebook friendship connections from each of the 100 US universities during a single-time snapshot in fall 2005 (when one had to have a .edu account to be on Facebook).
The dataset is avalaible inside the following github repository
https://classroom.github.com/a/jm4seIEs
or you can download the dataset from the following link:
https://partage.imt.fr/index.php/s/iyFWSQPJNmc7AC7
Facebook history (this text is extracted from the sec. 2 of [1])
Harvard
Friendster launches
Columbia
Stanford
Yale
Cornell, Dartmouth
UPenn, MIT
LinkedIn launchesNYU, BU
Duke, Georgetown, UVA
BC, Tufts, Northeastern, Illinois
Florida, Wellesley, Michigan,
Michigan State, Northwestern
UCLA
Thefacebook.com Emory, UNC, Tulane, UChicago, Rice launches at HarvardWashU
UC Davis, UC San Diego
USC
Caltech, UC Santa Barbara
Rochester, Bucknell
Williams, Amherst, Swarthmore,
Facebook drops the "the" Wesleyan, Oberlin, Middlebury,Hamilton, Bowdoin
South Florida, Central Florida,
Florida State, GWU, Johns Hopkins
Syracuse, Notre Dame, Maryland
Maine, Smith, UC Irvine, Villanova, Virginia Tech,
UC Riverside, Cal Poly, Mississippi, Michigan Tech,
UCSC, Indiana, Vermont, Auburn, U San Fran, Facebook launches News Feed William & Mary, Miami, James Madison, UT Austin,Wake Forest, Santa Clara, American, Haverford,
Facebook open to everyone Simmons, Binghamton, Temple, Texas A&M, Vassar,
Pepperdine, Wisconsin, Colgate, Rutgers, Howard,
UConn, UMass, Baylor, Penn State, Tennessee,
Lehigh, Oklahoma, Reed, Brandeis
Trinity (and 9 others)
(a) Key milestone in the early history of Facebook
(Figure extracted from [1] (b) Fraction of undergraduate that adopted Facebook vs. network index
Figure 1
Description of the network dataset
For nearly all colleges, alumni made up about 10-25% of users, a number that increased with the age of the network. Vertices labeled as faculty, staff or students who were not regular undergraduates (graduate students and summer students) made up on average
4.1% of each population.
Question 1: Reading
Read the following documents [3, 2, 1]
Question 2: Social Network Analysis with the Facebook100 Dataset The smallest network (Caltech) has 762 nodes in the largest connected component (LCC), and the largest has more than 40000 nodes in the LCC.
Lets use three networks from the FB100: Caltech (with 762 nodes in the LCC), MIT (which has 6402 nodes in the LCC), and Johns Hopkins (which has 5157 nodes in the LCC).
(a) (1 point) For these three networks plot the degree distribution for each of the three networks that you downloaded. What are you able to conclude from these degree distributions?
(b) (1 point) Compute the global clustering coefficient and mean local clustering coefficient for each of the 3 networks. In addition compute the edge density of each network. Should either of these networks be construed as sparse? Based on the density information and the clustering information what can you said about the graph topology?
(c) (1 point) For each network, also draw a scatter plot of the degree versus local clustering coefficient. Based on these calculations as well as your previous ones, are you able to draw any conclusions about any similarities or differences between the tree networks? What other observations can you make?
Question 3: Assortativity Analysis with the Facebook100 Dataset
In this question we expect you will compute the assortativity on a large set of graphs (if possible all the graphs).
(a) (2 points) Of the FB100 networks, investigate the assortativity patterns for five vertex attributes: (i) student/faculty status, (ii) major, (iii) vertex degree, and (iiii) dorm, (iiiii) gender. Treat these networks as simple graphs in your analysis.
For each vertex attribute, make a scatter plot showing the assortativity versus network size n, with log-linear axes for all 100 networks, and a histogram or density plot showing the distribution of assortativity values. In both figures, include a line indicating no assortativity. Briefly discuss the degree to which vertices do or do not exhibit assortative mixing on each attribute, and speculate about what kind of processes or tendencies in the formation of Facebook friendships might produce this kind of pattern. For example, below are figures for assortativity by gender on these networks. The distribution of points spans the line of no assortativity, with some values nearly as far below 0 as others are above 0. However, the gender attributes do appear to be slightly assortative in these social networks: although all values are within 6% in either direction of 0, the mean assortativity is 0.02, which is slightly above 0. This suggests a slight amount of homophily by gender (like links with like) in the way people friend each other on Facebook, although the tendency is very weak. In some schools, we see a slight tendency for heterophily (like links with dislike), as one might expect if the networks reflected heteronormative dating relationships.

Network size Gender Assortativity
Figure 2: a) Gender assortativity as function of the networks size. b) Gender assortativity distribution.
Question 4: Link prediction
In this question we expect you will compute the link prediction algorithms on a large set of graphs (> 10).
(a) Read the following documents [4].
(b) (2 points) Implement the following link prediction metrics: common neighbors, jaccard, Adamic/Adar. We use the scikit-learn API as an example for our implementation of the link prediction metrics. Please use the implementation (in listing. 1) as an example. Your implementation should inherit from the class LinkPrediction defined in listing. 1. You should implement yourself the given metrics, don’t used the ones defined in Networkx (c) (2 points) Evaluating a link predictor:
1. Select graph Gfb(V,E) in the Facebook100 dataset
2. randomly remove a given fraction f ∈ [0.05,0.1,0.15,0.2] of edges Eremoved from the original graph Gfb.
3. For each node pair in the graph |V |×|V |, for each node pair compute the link predictor metrics of interest p, these are the predicted ”friendship” Epredict.
4. Sort in decreasing order of confidence as a function p from the node pair Epredict and then we take the first k pairs of nodes .
5. Compute the size of the intersection between the edge set of removed edges and the edge set of predicted node . Then compute the top@k, recall@k and precision@k (for k = 50,100,200,··· ,400) using the k best scored edges provided by link predictor algorithm (see [5] for more information). Where the top@k predictive rate is the percentage of correctly classified positive samples among the top k instances in the ranking produced by a link predictor P.

Prediction Terminology: TP stands for true positives, TN stands for true negatives, FP stands for false positives, and FN stands for false negatives.
(d) (2 points) Choose a couple of graphs in the facebook100 dataset run and evaluate each link predictor on them, and conclude on the efficiency of the following metrics: common neighbors, jaccard, Adamic/Adar.
from abc import ABC from abc import abstractmethod import networkx as nx import numpy as np import progressbar
class LinkPrediction(ABC):
def __init__(self, graph):
"""
Constructor
Parameters
---------graph: Networkx graph
""" self.graph = graph self.N = len(graph)
def neighbors(self, v):
"""
Return the neighbors list of a node
Parameters
---------v: int node id
Return
-----neighbors_list: python list
""" neighbors_list = self.graph.neighbors(v) return list(neighbors_list)
@abstractmethod def fit(self):
raise NotImplementedError("Fit must be implemented")
class CommonNeighbors(LinkPrediction):
def __init__(self, graph):
super(CommonNeighbors, self).__init__(graph)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Listing 1: Python implementation example of a Link prediction metric
Question 5: Find missing labels with the label propagation algorithms
In this question we expect you will compute the label propagation algorithm on a large set of graphs (> 10). We studied in class two algorithms with the name ”label propagation” that have different objective, choose wisely the one to implement.
(a) Read the following document [6].
(b) (2 points) Implement in python the label propagation algorithm [6], please consider pytorch and networkx for the development of your algorithm.
(c) (2 points) Choose a network from The Facebook100 dataset and randomly select 10%, 20%, and 30% of of the node attributes of the network to be removed. Use the label propagation algorithm you implemented to recover the missing attributes. Perform this operation for each of the following attributes : ”dorm”, ”major”, ”gender”.
fraction removed
0.1 0.2 0.3 0.4
Duke Major 0.282 0.265 0.255 0.241
Dorm 0.529 0.523 0.500 0.463
Y ear 0.913 0.905 0.898 0.891
Gender 0.675 0.684 0.682 0.679
Table 1: Accuracy of the label propagation algorithm
) (1)
(e) (1 point) Conclude on the accuracy of the label propagation algorithm for different labels, could you explain why is there such difference in the accuracy between each type of label ?
Question 6: Communities detection with the FB100 datasets
Formulate a research question about group formation among students in the FB100 dataset. To validate your hypothesis, use only a few universities and a community detection algorithm of your choice to extract the different groups of students. To help you formulate a research question, some of the following references might be useful [8, 9] and section 3.4 in [2].
(a) (1 point) Formulate a research question about group formation in FB100 and explain your hypothesis.
(b) (1 point) Write the code to validate your research question and show the result using a few selected community detection algorithms and graphs.
(c) (1 point) Explain the results and conclude whether your experiment confirms your initial hypothesis.
References
[1] Jacobs, A. Z., Way, S. F., Ugander, J. & Clauset, A. Assembling thefacebook: Using heterogeneity to understand online social network assembly. In Proceedings of the ACM Web Science Conference, WebSci ’15, 18:1–18:10 (2015). URL https://arxiv. org/abs/1503.06772.
[2] Traud, A. L., Kelsic, E. D., Mucha, P. J. & Porter, M. A. Comparing community structure to characteristics in online collegiate social networks. SIAM Review 53, 526–543 (2011). URL https://arxiv.org/abs/0809.0690.
[3] Traud, A. L., Mucha, P. J. & Porter, M. A. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications 391, 4165 – 4180 (2012). URL https://arxiv.org/abs/1102.2166.
[4] Liben-Nowell, D. & Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM ’03, 556–559 (2003). URL https://www.cs.cornell.
edu/home/kleinber/link-pred.pdf.
[5] Yang, Y., Lichtenwalter, R. N. & Chawla, N. V. Evaluating link prediction methods. Knowledge and Information Systems 45, 751782 (2014). URL http://dx.doi.org/ 10.1007/s10115-014-0789-0.
[6] Bhagat, S., Cormode, G. & Muthukrishnan, S. Node Classification in Social Networks. In Social Network Data Analytics, 115–148 (Springer US, 2011). URL https:// arxiv.org/abs/1101.3291v1.

More products