Starting from:

$25

CS69011-Assignment 2 Solved

Consider a social network like Facebook, which has users π‘ˆ = {𝑒%, … , 𝑒(}, and shared content 𝐢 = {𝑐%, … , 𝑐,}. The shared content may include images, videos, third party URLs etc., which signify user activity and interest on the social networks. Additionally, the social network has two more data sources: friendship links (undirected edges) between users 𝐸 ⊆ π‘ˆ × π‘ˆ, and sharing links (directed edges) 𝑆 ⊆ π‘ˆ × π‘‰, where each edge 2𝑒3, 𝑐45 ∈ 𝑆 denotes that user 𝑒3 has shared content 𝑐4. Together, they form the user-content graph for the social network. For the purpose of this assignment, you can assume that this graph is a connected graph.

 

Input File format: 

 

The user-content graph is input as follows:

 

n (number of users) 

<user id 1: <list of users connected to user 1 

… 

<user id n: <list of users connected to user n 

m (number of contents) 

<user id 1: <list of contents posted by user 1 

… 

<user id n: <list of contents posted by user n 

 

 

Task 1:                                                                                                                               

 

For the first task, consider the problem of content recommendation, a content 𝑐4 is shown to user 𝑒3 based on the distance (length of the shortest path) between 𝑒3 and 𝑐4. Note that, there are no paths through the content nodes since all share edges are directed. Consider 𝑒% and 𝑒7 are not friends (say connected through the social network with distance 5), but both have shared content 𝑐%, and 𝑒7 has also shared content 𝑐7. Then, the distance between 𝑒% and 𝑐7 is not 3 due to path (𝑒% - 𝑐% - 𝑒7 - 𝑐7).

 

Your task is to write a program which will list the first π‘˜ recommended contents, 𝑐39, … , 𝑐3:, in increasing order of their distances from user 𝑒3. The content which is already shared by 𝑒3 should not be displayed. Here the user 𝑒3 and π‘˜ should be taken as input, and name of the file containing user content graph should be taken as command line input.

 

Hint: Use a modification of Dijkstra’s single source shortest path algorithm.

 

Input: filename for the user content graph in the above format, the user 𝑒3 for whom the content will be recommended and π‘˜ the maximum number of recommended contents. These should be taken as command line input.

 

Output: The output for this task is a list of recommended content  π‘39, … , 𝑐3: along with the corresponding distances  π‘‘39, … , 𝑑3: from user 𝑒3.

 

Algorithm description: 

Dijkstra’s algorithm maintains, at each stage:

1.     Distances to all nodes in the graph, some of them infinity.

2.     A set Q of all vertices which can be explored next.

In every step, the minimum distance node in set Q is finalized (it’s distance is frozen as the shortest distance and the node is removed from Q), and its neighbors are processed or relaxed (added to the set Q and their distance are updated). You have to modify the above algorithm so that you keep a counter which counts the number of content nodes which are finalized. Once this number reaches π‘˜, you can stop the algorithm and output the π‘˜ content nodes that you have discovered along with their distances.

 

Data structures: Both the distance and the Q set can be maintained as arrays of size the number of vertices. Mindistance node can be found using a linear scan or a min-heap. The min-heap implementation is more efficient and will get more marks.

 

You can call this function int topKMinDist(Graph *G, int start, int * nodes, int * 

distances), where G is an input graph pointer (can be taken as an adjacency list), and nodes and distances are arrays for return top k node ids and distances. The return value from the function can be the actual number of content nodes (which could be less than k).

 

Task 2:                                                                                                                                                                 

 

In the second task, consider the problem of friend recommendation, using the number of common contents in the π‘˜-neighborhood. Specifically, given a user 𝑒3, you will rank users 𝑒4 using the count of contents 𝑐<, such that distances between 𝑒3 and 𝑐<, 𝑑(𝑒3, 𝑐<), and 𝑑(𝑒4, 𝑐<) are both less than π‘˜. Let 𝑄(𝑒3, 𝑒4) be the count of k-neighbors which are content nodes in the user-content graph. Write a program which given a user content graph, a user 𝑒3 and π‘˜, prints the list of top 𝑝 users 𝑒4 in decreasing order of 𝑄(𝑒3, 𝑒4).

 

Input: the user content graph as above, a user 𝑒3, neighborhood size π‘˜, list length 𝑙. The filename for the user content graph can be taken as first command line input, followed by other arguments in the order mentioned above.

 

Output: The list of top 𝑙 users along with their corresponding distances.

  

Algorithm description: 

In this problem, you have to find all content node whose distance is less than π‘˜ from the starting user node 𝑒3. Let that list be 𝐿. Next, find all user nodes 𝑒4 which are at distance at most π‘˜ from each of these content nodes. Since, there are not paths starting with content nodes, you have consider all paths from all users nodes which are directly connected to the content nodes. The algorithm for finding distance are same as described in task 1.

You need two functions:  

int distanceLContentNodes(Graph *G, int start, int * nodes, int * distances) and

int connectedUserNodes(Graph *G, int start, int * nodes, int * distances)

 

 

Task 3:                                                                                                                                                               

 

Now, consider that there are privacy settings for each share link (𝑒3, 𝑐4) where each link is marked as:

•        Friends: only friends can see the post.

•        Friends of friends: friends of friends can also see the post.

•        Public: everyone can see the post.

 

The privacy setting induces additional and implicit “view” edges, which connect each shared content 𝑐4 to friends of 𝑒3 (1-hop neighbors), friends of friends of 𝑒3 (both 1-hop and 2-hop neighbors), and public (all users of the system).

 

The third task is to recommend content on the basis of distance in the “view” graph, which consists of the original edge as well as view edges introduced due to privacy settings. However, you are not allowed to generate the full view graph and then use the algorithm developed in task 1. Your algorithm should only “expand” the relevant share edges. For example, if the top list ends with content at distance 3 in the view graph, there is no need to expand shares of content beyond distance 5 in the user-content graph, since distance can reduce by at most 2 in the view graph, unless the share is a public one.

 

Note that, here all content can be recommended to all users. Only post information is private.

 

Algorithm hint: You should treat public shares separately, since they will be at distance 1 from all users. While updating the distances in the shortest path algorithm, you should take care of privacy setting of the edge.

 

Input File format: 

 

The modified input format of privacy aware user-content graph is as follows: n (number of users) 

<user id 1: <list of users connected to user 1 

… 

<user id n: <list of users connected to user n 

m (number of contents) 

<user id 1: <posted content 1,<privacy flag 1; <posted content 2,<privacy flag 2; … 

… 

… 

… 

<user id n: <posted content 1,<privacy flag 1; <posted content 2,<privacy flag 2; … 

 

Input: filename for the privacy-aware user-content graph in the above format, the user 𝑒3 for whom the content will be recommended and π‘˜ the maximum number of recommended content. These should be taken as command line input.

 

Output: The output for this task is a list of recommended content  π‘39, … , 𝑐3: along with the corresponding distances  π‘‘39, … , 𝑑3: from user 𝑒3.

 

Algorithm Description: 

There are two cases:

1.     Content 𝑐4 has been posted publicly by some user. Hence, it is at distance 1 from all other users in the view graph. Hence it should be included in the list.

2.     If the number of such content is less than π‘˜. Then, note that all distance 𝑙 content nodes are:

a.      Either “friends” shares of distance 𝑙 − 1 user nodes from 𝑒3

b.     Or “friends of friends” shares of distance 𝑙 − 2 user nodes from 𝑒3.

Use the above logic to progressively enumerate all distance 𝑙 content nodes for 𝑙 = 1, … , 𝐷, where 𝐷 is the diameter of the graph. If all content node have been listed, or π‘˜-content nodes have been listed (whichever is earlier), exit.

More products