City Analytics on the Cloud
Background
Assignment Description
The teams should develop a Cloud-based solution that exploits a multitude of virtual machines (VMs) across the UniMelb Research Cloud for harvesting tweets through the Twitter APIs (using both the Streaming and the Search API interfaces). The teams should produce a solution that can be run (in principle) across any node of the UniMelb Research Cloud to harvest and store tweets and scale up/down as required. Teams have been allocated 4 servers (instances) with 8 virtual CPUs and 500 GB of volume storage. All students have access to the UniMelb Research Cloud as individual users and can test/develop their applications using their own (small) VM instances, e.g. personal instances such as pt-1234. (Remember that these small, free and dynamically allocated VMs offer no persistence.)
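Because tweets are harvested through both the Streaming and the Search interfaces, the same tweet may arrive more than once. The following is a minimal sketch of one way to avoid storing duplicates, assuming each tweet is a dictionary with a unique "id" field. The function name store_tweets and the sample data are hypothetical, and a plain Python dict stands in for the real database.

```python
from typing import Dict, Iterable


def store_tweets(db: Dict[str, dict], tweets: Iterable[dict]) -> int:
    """Store tweets keyed by their ID and return how many were new.

    `db` is a plain dict standing in for the real document store (assumption).
    """
    added = 0
    for tweet in tweets:
        doc_id = str(tweet["id"])  # the tweet ID doubles as the document key
        if doc_id not in db:       # skip tweets that were already harvested
            db[doc_id] = tweet
            added += 1
    return added


# Hypothetical sample data: tweet 2 arrives through both interfaces.
streamed = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
searched = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]

db: Dict[str, dict] = {}
new_from_stream = store_tweets(db, streamed)
new_from_search = store_tweets(db, searched)
```

Keying documents by tweet ID means that several harvesters running on different VMs can write to the same store without coordinating with each other, which may help when scaling up or down.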
Teams are expected to develop a range of analytic scenarios, e.g. using the MapReduce capabilities offered by CouchDB for social media analytics and comparing the data with official data from AURIN. Teams are free to explore any scenarios that connect “in some way” to the AURIN data. Teams are encouraged to be creative here. A prize will be awarded for the most interesting scenarios identified! Possible scenarios include:
● How many tweets mention Covid-19 or coronavirus, and are these clustered in certain areas, e.g. rich vs poor suburbs or statistical areas with more or fewer hospitals?
● What do the movement patterns of people look like before Covid-19, during and after lockdown etc?
● Is there a correlation between alcohol related tweets or crime and locations of places to buy alcohol (bottleshops)?
● Does language use, e.g. vulgar words on Twitter, occur more or less often in wealthy or poor areas?
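A scenario such as the Covid-19 one above maps naturally onto a MapReduce computation. CouchDB views are typically written in JavaScript, so the Python sketch below only mimics the map (emit key/value pairs) and reduce (combine values per key) steps. It assumes each stored tweet has a "text" field and a "suburb" field; the "suburb" field, the keyword list and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical keyword list for the Covid-19 scenario.
KEYWORDS = ("covid-19", "coronavirus")


def map_tweet(doc: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (suburb, 1) for each tweet mentioning a keyword."""
    text = doc.get("text", "").lower()
    if any(word in text for word in KEYWORDS):
        yield doc.get("suburb", "unknown"), 1


def reduce_counts(values: List[int]) -> int:
    """Reduce step: sum the values emitted for one key."""
    return sum(values)


def run_view(docs: Iterable[dict]) -> Dict[str, int]:
    """Group the mapped pairs by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_tweet(doc):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


# Hypothetical sample documents.
tweets = [
    {"text": "Coronavirus update", "suburb": "Carlton"},
    {"text": "Lunch time", "suburb": "Carlton"},
    {"text": "covid-19 testing queue", "suburb": "Carlton"},
    {"text": "COVID-19 cases", "suburb": "Footscray"},
]
counts = run_view(tweets)
```

The per-suburb counts produced this way could then be compared with AURIN data for the same areas, e.g. income or the number of hospitals.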
A front-end web application is required for visualising these data sets/scenarios.
For the implementation, teams are recommended to use a language commonly understood across team members – most likely Java or Python. Information on building and using Twitter harvesters can be found on the web, e.g. see https://dev.twitter.com/ and related links to resources such as Tweepy and Twitter4J. Teams are free to use any pre-existing software systems that they deem appropriate for the analysis and visualisation capabilities, e.g. JavaScript libraries, Google Maps etc.
Error Handling
Final packaging and delivery
You should collectively write a team report on the application developed and include the architecture, the system design and the discussions that led to the design. You should describe the role of the team members in the delivery of the system, where the team worked well, where issues arose and how they were addressed. The team should illustrate the functionality of the system through a range of scenarios and explain why the specific examples were chosen. Teams are encouraged to write this report in the style of a paper that can ultimately be submitted to a conference/journal.
Each team member is expected to complete a confidential report on their role in the project and the experiences in working with their individual team members. This will be handed in separately to the final team report. (This is not to be used to blame people, but to ensure that all team members are able to provide feedback and to ensure that no team has any member that does nothing!!!).
The length of the team report is not fixed. Given the complexity and total value of the assignment, a suitable estimate is a report in the range of 20-25 pages. A typical report will comprise:
● A description of the system functionalities, the scenarios supported and why, together with graphical results, e.g. pie-charts/graphs of Tweet analysis and snapshots of the web apps/maps displaying certain Tweet scenarios;
● A simple user guide for testing (including system deployment and end user invocation/usage of the systems);
● System design and architecture and how/why this was chosen;
● A discussion on the pros and cons of the UniMelb Research Cloud and tools and processes for image creation and deployment;
● Teams should also produce a video of their system that is uploaded to YouTube (these videos can last longer than the UniMelb deployments unfortunately!);
● Reports should also include a link to the source code (GitHub or Bitbucket). It is recommended that all students commit their code to the code repository rather than delegate this to a single team member. This can provide an evidence base if teams have “issues”.
It is important to put your collective team details (team, city, names, surnames, student ids):
● on the title page of the report;
● in a header in each of the files of the software project.
Implementation Requirements
Teams are expected to use:
● a version-control system such as GitHub or Bitbucket for sharing source code.
● MapReduce based implementations for analytics where appropriate, using CouchDB’s built in MapReduce capabilities.
● The entire system should have scripted deployment capabilities. This means that your team will provide a script which, when executed, will create and deploy one or more virtual machines and orchestrate the setup of all necessary software on those machines (e.g. CouchDB, the Twitter harvesters, web servers etc.) to create a ready-to-run system. Note that this setup need not populate the database, but it must demonstrate your ability to orchestrate the necessary software environment on the UniMelb Research Cloud. Teams should use Ansible (http://www.ansible.com/home) for this task.
Teams are also encouraged to describe:
● How fault-tolerant is your software setup? Is there a single point-of-failure?
● Can your application and infrastructure dynamically scale out to meet demand?
One copy of the team assignment is to be submitted through Canvas. The zip file must be named with your team, i.e. <CCC2021-TeamN>.zip.
Marking
The marking process will be structured by evaluating whether the assignment (application + report) is compliant with the specification given. This implies the following:
● Detailed documentation on the system architecture and design – 20%
The (confidential) assessment by your peers in your team on the Qualtrics system will be used to weight your individual scores accordingly. Timeliness in submitting the assignment in the proper format is important. A 10% deduction per day will be made for late submissions.
As a team, you are free to develop your system(s) wherever you are most comfortable (at home, on your PC/laptop, in the labs...) but obviously the demonstration should work on the UniMelb Research Cloud.