CSC4760 - Big Data Programming - Assignment 2 - Solved

PageRank

Dataset: 

The toy dataset is the following graph. Its PageRank values are already known, so they can be used to check your program.

  

Figure 1: A toy graph for computing PageRank. The number on the edge represents the transition probability from one node to another.

The PageRank values for decay factor $c = 0.85$ are given in the following table:

Nodes    PageRank Values
1        0.1556
2        0.1622
3        0.2312
4        0.2955
5        0.1556
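One way to use the table as a check is to compare your program's output against it within a small tolerance. The following is a minimal sketch, not part of the assignment: the class name `PageRankCheck`, its helper methods, and the tolerance of `1e-3` are assumptions made for illustration, and the `computed` array stands in for values you would parse from your own job's output.

```java
class PageRankCheck {
    // Reference values from the table above (nodes 1..5, c = 0.85).
    static final double[] EXPECTED = {0.1556, 0.1622, 0.2312, 0.2955, 0.1556};

    /** Returns the largest absolute difference between two rank vectors. */
    static double maxAbsDiff(double[] expected, double[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double max = 0.0;
        for (int i = 0; i < expected.length; i++) {
            max = Math.max(max, Math.abs(expected[i] - actual[i]));
        }
        return max;
    }

    /** True if every computed value is within the tolerance of the table. */
    static boolean matches(double[] actual, double tolerance) {
        return maxAbsDiff(EXPECTED, actual) <= tolerance;
    }

    public static void main(String[] args) {
        // Placeholder values (assumption): replace with your program's output.
        double[] computed = {0.1556, 0.1622, 0.2312, 0.2955, 0.1556};
        System.out.println(matches(computed, 1e-3) ? "PASS" : "FAIL");
    }
}
```

The table is rounded to four decimals and a run is limited to 30 iterations, so an exact equality test would likely be too strict; a tolerance keeps the check meaningful.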
 

PageRank: 

Compute the PageRank value of each node in the graph. Please refer to the slides for more details about the PageRank method. The key PageRank equation is as follows.

$$\mathbf{r} = c\,\mathbf{P}^{\top}\mathbf{r} + (1 - c)\,\frac{\mathbf{1}}{n}$$

where $\mathbf{r}$ is the $n \times 1$ PageRank vector, with each element $\mathbf{r}_i$ giving the PageRank value of node $i$; $n$ is the number of nodes in the graph; $\mathbf{P}$ is the $n \times n$ transition probability matrix, with each element $\mathbf{P}_{i,j} = 1/d_i$ giving the transition probability from node $i$ to node $j$; $d_i$ is the degree of node $i$; $\mathbf{P}^{\top}$ is the transpose of $\mathbf{P}$; $c \in (0, 1)$ is a decay factor; and $\mathbf{1}$ is an $n \times 1$ vector of all 1's.
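Written out per node, the matrix equation becomes a sum over incoming edges, which is the form a per-node computation usually works with. The second line is a sketch that assumes uniform transition probabilities, $\mathbf{P}_{i,j} = 1/d_i$ for each edge from node $i$ to node $j$ and $0$ otherwise; if the edge weights in the figure differ, use those values instead.

```latex
\mathbf{r}_j = c \sum_{i=1}^{n} \mathbf{P}_{i,j}\,\mathbf{r}_i + \frac{1 - c}{n},
\qquad j = 1, \dots, n

\mathbf{r}_j = c \sum_{i \,:\, i \to j} \frac{\mathbf{r}_i}{d_i} + \frac{1 - c}{n}
```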


In this assignment, we set the decay factor $c = 0.85$ and set the number of iterations to 30.
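Before moving to MapReduce, it can help to run the iteration on a small in-memory matrix. The sketch below applies the PageRank update a fixed number of times in plain Java. It is an illustration only: the class name `PageRankPowerIteration`, the uniform starting vector, and the three-node graph in `main` are assumptions, not the toy graph from the figure.

```java
import java.util.Arrays;

class PageRankPowerIteration {
    /**
     * Repeats r = c * P^T r + (1 - c) / n for a fixed number of iterations.
     * p[i][j] is the transition probability from node i to node j;
     * each row is assumed to sum to 1 (no dangling nodes).
     */
    static double[] pageRank(double[][] p, double c, int iterations) {
        int n = p.length;
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n); // assumption: start from the uniform vector
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += p[i][j] * r[i]; // (P^T r)_j
                }
                next[j] = c * sum + (1.0 - c) / n;
            }
            r = next;
        }
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        double[][] p = {
            {0.0, 0.5, 0.5},
            {0.0, 0.0, 1.0},
            {1.0, 0.0, 0.0}
        };
        System.out.println(Arrays.toString(pageRank(p, 0.85, 30)));
    }
}
```

Because each row of the matrix sums to 1, the ranks keep summing to 1 after every iteration, which is a quick sanity check on any implementation.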

Implementation: 

Design and implement a MapReduce program to compute the PageRank values.
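One common way to split the update into map and reduce steps is sketched below in plain Java, without the Hadoop API, so that the data flow can be run and tested locally. Everything here is an assumption made for illustration: the class and method names are hypothetical, each node is assumed to split its rank evenly among its out-neighbors, and every node is assumed to have at least one outgoing edge. This is not the structure of the provided file.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PageRankMapReduceSketch {
    /** Map step: a node emits (neighbor, share of its rank) for each out-edge. */
    static List<Map.Entry<Integer, Double>> map(int node, double rank, List<Integer> neighbors) {
        List<Map.Entry<Integer, Double>> out = new ArrayList<>();
        for (int target : neighbors) {
            double share = rank / neighbors.size(); // assumption: uniform split
            out.add(new AbstractMap.SimpleEntry<Integer, Double>(target, share));
        }
        return out;
    }

    /** Reduce step: sum the shares sent to one node, then apply the decay factor. */
    static double reduce(List<Double> contributions, double c, int n) {
        double sum = 0.0;
        for (double v : contributions) {
            sum += v;
        }
        return c * sum + (1.0 - c) / n;
    }

    /** One iteration: map every node, group emitted pairs by key, reduce each group. */
    static Map<Integer, Double> iterate(Map<Integer, Double> ranks,
                                        Map<Integer, List<Integer>> adjacency, double c) {
        int n = ranks.size();
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        for (Integer node : ranks.keySet()) {
            grouped.put(node, new ArrayList<Double>()); // nodes with no in-edges still get a rank
        }
        for (Map.Entry<Integer, Double> e : ranks.entrySet()) {
            List<Integer> neighbors =
                adjacency.getOrDefault(e.getKey(), Collections.<Integer>emptyList());
            for (Map.Entry<Integer, Double> kv : map(e.getKey(), e.getValue(), neighbors)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<Double>()).add(kv.getValue());
            }
        }
        Map<Integer, Double> next = new TreeMap<>();
        for (Map.Entry<Integer, List<Double>> e : grouped.entrySet()) {
            next.put(e.getKey(), reduce(e.getValue(), c, n));
        }
        return next;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        adjacency.put(1, Arrays.asList(2, 3));
        adjacency.put(2, Arrays.asList(3));
        adjacency.put(3, Arrays.asList(1));
        Map<Integer, Double> ranks = new TreeMap<>();
        for (int node = 1; node <= 3; node++) {
            ranks.put(node, 1.0 / 3);
        }
        for (int it = 0; it < 30; it++) {
            ranks = iterate(ranks, adjacency, 0.85);
        }
        System.out.println(ranks);
    }
}
```

In a real Hadoop job the grouping in `iterate` is done by the shuffle phase, and each iteration typically reads the previous iteration's output, so the mapper also needs access to the adjacency list on every pass.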

You are encouraged to implement the PageRank algorithm from scratch without using the provided “PageRankIncomplete.java” file.

The provided “PageRankIncomplete.java” file is incomplete. It will help you start programming with Hadoop. You need to understand the existing code and basic structure in order to complete the file.

Example command:

hadoop jar PageRank.jar file:///home/rob/pagerank/01InitialPRValues.txt file:///home/rob/pagerank/02AdjacencyList.txt file:///home/rob/pagerank/output 30
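Judging by the file names, the four arguments in the example command appear to be the initial PageRank values file, the adjacency list file, the output directory, and the number of iterations. The sketch below parses them under that assumption; the class name `PageRankArgs` and its field names are hypothetical, and the provided file may handle its arguments differently.

```java
class PageRankArgs {
    final String initialRanksPath;  // assumed: first argument
    final String adjacencyListPath; // assumed: second argument
    final String outputPath;        // assumed: third argument
    final int iterations;           // assumed: fourth argument

    PageRankArgs(String initialRanksPath, String adjacencyListPath,
                 String outputPath, int iterations) {
        this.initialRanksPath = initialRanksPath;
        this.adjacencyListPath = adjacencyListPath;
        this.outputPath = outputPath;
        this.iterations = iterations;
    }

    /** Validates and parses the four positional arguments. */
    static PageRankArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "Usage: <initial PR values> <adjacency list> <output dir> <iterations>");
        }
        int iterations = Integer.parseInt(args[3]);
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be at least 1");
        }
        return new PageRankArgs(args[0], args[1], args[2], iterations);
    }

    public static void main(String[] args) {
        PageRankArgs parsed = parse(new String[] {
            "file:///home/rob/pagerank/01InitialPRValues.txt",
            "file:///home/rob/pagerank/02AdjacencyList.txt",
            "file:///home/rob/pagerank/output",
            "30"
        });
        System.out.println(parsed.iterations + " iterations, output to " + parsed.outputPath);
    }
}
```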

