
LINMA2472 - Homework 1, Parts 1 and 2, module “Networks”

Algorithms in Data Science

This assignment is to be completed in groups of 2 or 3. Please form your groups on Moodle (using the activity called “Group choice for assignment 1”). If you need help finding teammates, use the “Teammate finder” forum.

Assignment 1: co-occurrence network of characters 

Please choose one of the following options:

Find an appealing book (for example, use Project Gutenberg https://www.gutenberg.org/ to find the text) and parse the textual information to reconstruct the co-occurrence network of characters. For example, two characters can be linked if they appear in the same paragraph. Examples for inspiration: Lord of the Rings, War and Peace, Les Miserables, etc.
Find a screenplay of your favorite movie (many can be found online, for example at https://thescriptsavant.com/free-moviescreenplays-am/). Convert the .pdf to text using any online tool and parse the textual information to reconstruct the co-occurrence network of characters, where two characters can be linked if they appear in the same scene. Scene headings are usually set in bold. Examples for inspiration: Harry Potter, Lord of the Rings, Zootopia, etc.
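A minimal sketch of the "same paragraph / same scene" linking rule, in Python with networkx. It assumes the text has already been split into units (paragraphs for a book, scenes for a screenplay). It also assumes you have built, by hand, a dictionary mapping each character to the names used for it in the text. The names `cooccurrence_graph` and `aliases` are hypothetical. Plain regular-expression matching misses pronouns and ambiguous names, so this is a starting point, not a full named-entity pipeline.

```python
import itertools
import re

import networkx as nx


def cooccurrence_graph(units, aliases):
    """Build a weighted character co-occurrence graph (hypothetical helper).

    units   : list of text chunks (paragraphs of a book or scenes of a script)
    aliases : dict {canonical name: [surface forms used in the text]},
              assumed to be written by hand for the chosen text
    """
    # One word-boundary pattern per character, matching any of its aliases.
    patterns = {
        name: re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b")
        for name, forms in aliases.items()
    }
    G = nx.Graph()
    G.add_nodes_from(aliases)
    for unit in units:
        present = sorted(name for name, pat in patterns.items() if pat.search(unit))
        # Link every pair of characters found in the same unit;
        # the edge weight counts how many units they share.
        for u, v in itertools.combinations(present, 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)
    # Design choice: drop characters that never co-occur with anyone.
    G.remove_nodes_from(list(nx.isolates(G)))
    return G
```

For a Project Gutenberg book, a rough paragraph split is `text.split("\n\n")`. For a screenplay, split on the scene headings instead. Using the number of shared units as an edge weight is optional; ignoring the `weight` attribute gives the unweighted network.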

The only requirement is to choose a book or a movie with many characters (ideally more than 50). Tools for text processing were discussed in the first lecture.


Compute the degree assortativity of the network and perform community detection using the Louvain algorithm. Visualise the results. What can you tell from them?
Write your own code for k-core decomposition (do not use the built-in implementation in networkx) and apply it to the network. What can you infer from it?
Generate a preferential attachment (Barabasi-Albert) network of similar size and average degree. Perform the same operations on this network. Describe any differences or similarities you can spot.

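A possible sketch for the assortativity, Louvain and Barabasi-Albert tasks. It assumes networkx 2.8 or later, where `nx.community.louvain_communities` is available. The helper names `describe` and `matching_ba_graph` are hypothetical. networkx's `barabasi_albert_graph(n, m)` produces m(n - m) edges, so its average degree 2m(n - m)/n is close to 2m. Choosing m as half the observed average degree, rounded, is one simple way to match the target. Louvain uses the `weight` edge attribute by default, while the assortativity below is computed on unweighted degrees.

```python
import networkx as nx


def describe(G, seed=0):
    """Summary statistics and Louvain partition of G (hypothetical helper)."""
    # Louvain is randomized: fix the seed so the partition is reproducible.
    communities = nx.community.louvain_communities(G, seed=seed)
    stats = {
        "nodes": G.number_of_nodes(),
        "avg_degree": 2 * G.number_of_edges() / G.number_of_nodes(),
        "assortativity": nx.degree_assortativity_coefficient(G),
        "n_communities": len(communities),
        "modularity": nx.community.modularity(G, communities),
    }
    return stats, communities


def matching_ba_graph(G, seed=0):
    """Barabasi-Albert graph with as many nodes as G and a similar average degree.

    BA(n, m) has m * (n - m) edges, i.e. average degree 2m(n - m)/n, close to 2m.
    """
    n = G.number_of_nodes()
    avg_k = 2 * G.number_of_edges() / n
    m = max(1, round(avg_k / 2))
    return nx.barabasi_albert_graph(n, m, seed=seed)
```

Running `describe` on the character network and on `matching_ba_graph(G)` gives the rows of a comparison table. Rerunning Louvain with several seeds shows how stable the partition is.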
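One way to write the k-core decomposition by hand is the classic peeling procedure. Repeatedly remove every node whose remaining degree is at most the current k, and give each removed node the value of k at which it left. The sketch below (the name `core_numbers` is hypothetical) uses networkx only for graph access, not `nx.core_number`. It assumes a simple undirected graph without self-loops. It is quadratic in the worst case, which is acceptable for a network of a few hundred characters.

```python
import networkx as nx


def core_numbers(G):
    """Return {node: core number} by peeling low-degree nodes (no nx.core_number)."""
    degree = dict(G.degree())  # residual degrees, updated as nodes are removed
    remaining = set(G)
    core = {}
    k = 0
    while remaining:
        # Current shell index; it never decreases while the graph is peeled.
        k = max(k, min(degree[v] for v in remaining))
        # Remove every node with residual degree <= k, cascading to neighbours.
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already removed (a node can be pushed twice)
            remaining.remove(v)
            core[v] = k
            for u in G[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return core
```

The k-shell is the set of nodes whose core number is exactly k, e.g. `[v for v, c in core.items() if c == k]`. The library function `nx.core_number` is useful only to check the hand-written result.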
Report guidelines: 

Write in a concise and structured manner. No long sentences, only relevant information.
You may present your data and the preprocessing steps, but remember that this is not the main goal of the report.
Any numerical result that can be presented in a table should be.
Round numbers to the 3rd digit unless more precision is really necessary. Do not copy-paste 10-digit floats.
Plots must be easy to read. They must include axis labels, a legend if more than one curve is shown, and a title or caption explaining what the plot is about.
Network properties (k-core shell, community index, etc.) can be visualized in color. When doing so, it is good practice to add a colourbar (k-core shell) or a summary of each community, or of the most representative ones.

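For the colourbar guideline, a sketch with matplotlib and networkx. It assumes the node property (for example the k-core shell index) is already stored in a dict `values` mapping each node to a number. `plot_node_property` is a hypothetical helper. The spring layout and the viridis colormap are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx


def plot_node_property(G, values, title, label, seed=0):
    """Draw G with nodes colored by a numeric property and add a colorbar."""
    pos = nx.spring_layout(G, seed=seed)  # fixed seed: reproducible layout
    fig, ax = plt.subplots(figsize=(8, 6))
    nodelist = list(G)
    nodes = nx.draw_networkx_nodes(
        G,
        pos,
        nodelist=nodelist,
        node_color=[values[v] for v in nodelist],
        cmap=plt.cm.viridis,
        node_size=80,
        ax=ax,
    )
    nx.draw_networkx_edges(G, pos, alpha=0.2, ax=ax)
    # The node collection is a mappable, so it can feed the colorbar directly.
    fig.colorbar(nodes, ax=ax, label=label)
    ax.set_title(title)
    ax.set_axis_off()
    return fig
```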
Assignment 1 (part 2): maximizing the influence in the network of characters 


Take the network of characters inferred in the first part of the assignment. Imagine there is an important rumour to spread in this network.


You want it to reach everyone quickly, so you need to solve the influence maximization problem. Implement the greedy algorithm from the lectures and identify the set MI of maximal influence, of size k = 5% of the nodes.
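A minimal sketch of greedy influence maximization under the independent cascade model. It assumes a uniform activation probability `p` on every edge; the assignment does not fix one, so 0.1 is only a placeholder. The expected spread is estimated by averaging `runs` Monte Carlo cascades. The function names are hypothetical, and the version from the lectures may differ in details. No lazy-evaluation (CELF) speedup is included.

```python
import random

import networkx as nx


def ic_spread(G, seeds, p, rng):
    """One independent cascade run; returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new = []
        for v in frontier:
            for u in G[v]:
                # Each newly active node gets one chance to activate each inactive neighbour.
                if u not in active and rng.random() < p:
                    active.add(u)
                    new.append(u)
        frontier = new
    return active


def greedy_im(G, k, p=0.1, runs=200, seed=0):
    """Greedily add the node with the largest estimated expected spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in G:
            if v in chosen:
                continue
            # Monte Carlo estimate of the expected spread of chosen + [v].
            spread = sum(len(ic_spread(G, chosen + [v], p, rng)) for _ in range(runs)) / runs
            if spread > best_spread:
                best, best_spread = v, spread
        chosen.append(best)
    return chosen
```

With k = 5% of the nodes, e.g. `k = max(1, round(0.05 * G.number_of_nodes()))`, the cost is roughly k · n · runs cascades, so `runs` trades accuracy for time. Maximizing the spread of chosen + [v] is equivalent to maximizing the marginal gain, since the spread of the already chosen set is the same for every candidate.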

Implement the independent cascade model on this network and use it to compare* the outcomes when the cascade starts from the obtained set MI, from a set of the same size made of the highest-degree nodes, and from a random selection of the same size.

(*The comparison can be based on the total number of people reached by a cascade or on the spreading curve (t, Y(t)), where t is discrete time and Y(t) is the total proportion of “infected” people at time t, averaged over cascades.)
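For the comparison, a sketch that records the cumulative number of active nodes at each discrete step of an independent cascade and averages it into Y(t). It assumes a uniform activation probability `p`, an arbitrary placeholder. `ic_curve`, `average_curve`, `top_degree` and `random_nodes` are hypothetical helpers. A cascade that stops early keeps its final size for the remaining time steps, so all runs share the same horizon.

```python
import random

import networkx as nx


def ic_curve(G, seeds, p, rng):
    """One cascade run; cumulative number of active nodes at t = 0, 1, 2, ..."""
    active = set(seeds)
    frontier = list(seeds)
    counts = [len(active)]
    while frontier:
        new = []
        for v in frontier:
            for u in G[v]:
                if u not in active and rng.random() < p:
                    active.add(u)
                    new.append(u)
        frontier = new
        if new:
            counts.append(len(active))
    return counts


def average_curve(G, seeds, p=0.1, runs=1000, seed=0):
    """Y(t): proportion of active nodes at time t, averaged over runs."""
    rng = random.Random(seed)
    curves = [ic_curve(G, seeds, p, rng) for _ in range(runs)]
    horizon = max(len(c) for c in curves)
    n = G.number_of_nodes()
    # A finished cascade keeps its final size for all later t.
    return [
        sum(c[min(t, len(c) - 1)] for c in curves) / (runs * n)
        for t in range(horizon)
    ]


def top_degree(G, k):
    """The k nodes of largest degree (ties broken by node order)."""
    return [v for v, _ in sorted(G.degree(), key=lambda x: x[1], reverse=True)[:k]]


def random_nodes(G, k, seed=0):
    """A uniformly random set of k distinct nodes."""
    return random.Random(seed).sample(list(G), k)
```

Calling `average_curve` with the greedy seed set, `top_degree(G, k)` and `random_nodes(G, k)` gives three curves to plot on the same axes. The last value of each curve, multiplied by the number of nodes, is the average total number of people reached. Averaging over several random seed sets makes the random baseline less noisy.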

