Table of Contents:
1. Objectives and Learning outcomes
2. Data
3. Tasks
3.1. Format
3.2. Tasks to do
3.3. Sanity checks
4. Submission
5. Marking Rubric
1. Objectives and learning outcomes
Assignments 1 and 2 walked you through what you learnt in Lectures 1 to 7, covering the “middle pipeline” of our Standard Value Chain: Collection, Wrangling, Analysis and Presentation. They gave you an introduction to the Data Science lifecycle. This assignment relates to the latter part of this unit: using the BASH Shell and the R programming language to work on larger datasets.
This assignment will test your ability to:
• Navigate the BASH Shell
• Process large files using the BASH Shell
  o Use online resources, the “man” pages or “--help” to assist with the commands
• Output a processed file to CSV format using the BASH Shell
• Read a processed file in R
  o Conduct visualisation using R
It also contributes to the following learning outcomes:
LO 4. Classify participants in a data science project: such as statistician, archivist, analyst, and systems architect;
LO 5. Classify the kinds of data analysis and statistical methods available for a data science project;
LO 6. Locate suitable resources, software and tools for a data science project.
2. Data
Format: compressed file (FB_dataset.gz) provided via Moodle site
Note: You will need a Unix-like shell for this purpose: Windows Subsystem for Linux (WSL) or Cygwin on a Windows machine, a Linux machine, or the terminal on a Mac.
3. Tasks
3.1. Format
3.2. Tasks
Part A: Investigating Facebook Data using shell commands
A1 (2 marks) What is the original file size? Decompress the file. How big is it?
How did you find this? (Do not ignore the case, i.e., lower/upper case)
[Challenge]
Note: Justify your answer. (You do not need to ask for clarification on this; you are to justify your interpretation of the question and your approach.)
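For A1, one possible way to compare the compressed and decompressed sizes is sketched below. It is only an illustration: it builds a tiny hypothetical file called sample_dataset instead of using FB_dataset.gz, and the file names and contents are assumptions made so that the commands can be run anywhere.

```shell
# Hypothetical stand-in for FB_dataset.gz so the sketch runs anywhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line one\nline two\nline three\n' > sample_dataset
gzip sample_dataset        # creates sample_dataset.gz and removes the original

# Size of the compressed file in bytes (tr strips padding some wc versions add).
compressed_bytes=$(wc -c < sample_dataset.gz | tr -d ' ')

# -d decompresses, -c writes to standard output so the .gz file is kept.
gzip -dc sample_dataset.gz > sample_dataset

# Size of the decompressed file in bytes.
uncompressed_bytes=$(wc -c < sample_dataset | tr -d ' ')

echo "compressed:   $compressed_bytes bytes"
echo "uncompressed: $uncompressed_bytes bytes"

# Human-readable sizes of both files, side by side.
ls -lh sample_dataset.gz sample_dataset
```

Whichever commands you choose, remember that your report must explain what each part of the command does.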
Part B: Investigating Facebook Data using shell commands
i.) To answer this question, you will need to extract the timestamps for all posts referring to “Dog” (ignore case) using the BASH Shell. You will then need to read them into R and generate a histogram.
[Hint: To read the data into R, first generate a file containing only the timestamp column as text. Then read the file into R as a CSV file.]
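The hint above can be sketched in the shell as follows. This is a minimal illustration under stated assumptions, not the expected answer: the file sample_posts.tsv, its tab-separated layout and the position of the timestamp in the second column are all hypothetical, so check the real column layout of the dataset before adapting it.

```shell
# Hypothetical sample data: id <TAB> timestamp <TAB> message
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\t2012-01-05 10:15:00\tMy dog loves the park\n'     >  sample_posts.tsv
printf '2\t2012-02-11 08:00:00\tElection results tonight\n'  >> sample_posts.tsv
printf '3\t2013-07-21 19:30:00\tDOG show this weekend\n'     >> sample_posts.tsv
printf '4\t2014-03-02 12:45:00\tHot Dog stand opens\n'       >> sample_posts.tsv

# grep -i : keep lines containing "dog" in any letter case
# cut -f2 : keep only the second tab-separated field (the timestamp in this sample)
grep -i 'dog' sample_posts.tsv | cut -f2 > dog_timestamps.csv

# Number of matching posts, then the extracted timestamps.
wc -l < dog_timestamps.csv
cat dog_timestamps.csv
```

Note that a plain grep -i 'dog' matches the letters anywhere on the line, including inside longer words and in columns other than the message. Adding -w restricts the match to whole words. Which interpretation is appropriate is part of what you need to justify.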
ii.) Once you have converted the timestamps, use the hist() function to plot the data in R.
Challenge: In this question, we want to look at how the type of content influences engagement on Facebook. To make this task easier, we will look specifically at the number of comments posted against each post type (event, link, photo, status and video) for “abcnews”.
i.) Draw a boxplot to show the distribution of comments made against each type of post (event, link, photo, status and video) created by “abc-news”. What can you infer from this plot? Which is the most engaging post type?
ii.)
iii.) Which type of post (event, link, photo, status or video) has, on average, been most effective for “abcnews”? In other words, which post_type has the highest median comment_count?
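The plots for this question are produced in R, but the median per post type can also be cross-checked in the shell. The sketch below is only one way to do this and rests on assumptions: the file sample_counts.csv, its two comma-separated columns (post_type, comment_count) and the helper name emit_median are hypothetical and stand in for whatever extract you produce from the real data.

```shell
# Hypothetical two-column extract: post_type,comment_count
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sample_counts.csv <<'EOF'
photo,10
video,40
photo,30
link,5
video,20
photo,20
link,15
video,60
video,80
EOF

# Sort by type, then numerically by count, so each group's values arrive in order.
sort -t, -k1,1 -k2,2n sample_counts.csv |
awk -F, '
  # Print the median of the values collected for the current type.
  function emit_median() {
    if (n == 0) return
    # middle value for odd n, mean of the two middle values for even n
    m = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
    print type "," m
  }
  $1 != type { emit_median(); type = $1; n = 0 }   # a new group starts
  { v[++n] = $2 }                                  # collect this count
  END { emit_median() }                            # last group
' > medians.csv

cat medians.csv

# Post type with the highest median in this sample.
sort -t, -k2,2nr medians.csv | head -n 1
```

Comparing such a result against the median lines of your boxplot is a quick sanity check that the data were read into R correctly.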
3.3. Sanity checks
● After you are done with the tasks, do sanity checks.
○ Even though you don’t need to submit the code script, you will need to double-check that your file contains the correct code (copied from your Bash Shell/R), your answers to the questions you attempted and images of your outputs.
● Make sure that your submission contains everything we've asked for.
4. Submission
For this assignment, you are to hand in your work via Moodle. Only one well-formatted PDF file (generated from your word processor, e.g. Microsoft Word) is needed.
Details for the submission:
1) Hand in a PDF file containing your answers to all the questions and numbered correspondingly.
2) Your report should include the following:
a) The screenshots/images of the outputs/graphs you generate in order to justify your answers to all the questions. Ensure that they are legible, such as making sure that the image resolution is sufficient.
Note: Do not screenshot the code; copy and paste your code into your file.
3) Please note that you need to explain what each part of a command does for all your answers. For instance, if the code you use is ‘unzip tutorial_data.zip’, you need to explain that it is used to uncompress the zip file.
5. Marking Rubric
General points:
• Zip (compressed) file submissions will be penalised: submitting a zip file (or any compressed file) will incur a penalty of 10%.
• Drafts (not submitted): Many of you have left your submissions in Draft mode. Please make sure to submit any assignment that is still in draft mode. Note: For this assignment, we reserve the right not to accept assignments that have not been submitted.
Coding Part (60 % of Part
Answers or justify answers (if applicable)
Coding and Visualisation
Answers or justify answers (if applicable)
Have Fun!
Clarifications
Congratulations!
You have completed your in-semester assessments for FIT1043. I hope that you have enjoyed the course assignments, starting from a very guided Assignment 1, to something with a little bit of flexibility for you to try out new stuff and compete in Assignment 2, and finally Assignment 3.