Objective
The objective of this assignment is to investigate and visualise data using Python in the Jupyter Notebook environment. This assignment will test your ability to:
● Read data from files in Python,
● Manipulate the data,
● Describe the data using basic statistics,
● Produce non-graphical and graphical visualizations to explore the data,
● Communicate your findings as insights, and
● Self-learn new techniques from other resources to complement what is taught in this unit.
Data
The data is presented in three comma-separated values (CSV) files sourced from Kaggle, The World Bank, and The United Nations. The files should be obtained from Moodle and saved in a folder (directory) called "data", located in the same place as your Jupyter notebook. The data is:
• “LifeExpectancyData-v2.csv” contains information related to life expectancy. The health factors for 193 countries were collected from the WHO data repository website, and the corresponding economic data was collected from the United Nations website (source: https://www.kaggle.com/kumarajarshi/life-expectancy-who ). As part of the exercise, you can get the description of the fields (columns) on the Kaggle site.
Most of the columns are self-explanatory but do participate in the Moodle forum to ask for clarifications or discussion on the data.
Note: For this assignment, DO NOT download the latest data from the sources. Because some of the columns have been removed, only utilize the provided data files.
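As a rough sketch of reading a file from the "data" folder and showing that it was read correctly, something like the following could be used. The helper names `load_csv` and `summarise` are hypothetical (they are not part of pandas or of this assignment), and you are free to structure your own notebook differently.

```python
from pathlib import Path

import pandas as pd

# Folder next to the notebook, as required by the brief.
DATA_DIR = Path("data")


def load_csv(filename, data_dir=DATA_DIR):
    """Read one CSV file from the data folder into a DataFrame."""
    path = Path(data_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"Expected to find {path}")
    return pd.read_csv(path)


def summarise(df):
    """Return a few quick facts that help confirm the file was read correctly."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
    }
```

In a notebook this might be used as `life = load_csv("LifeExpectancyData-v2.csv")`, followed by `life.head()` and `summarise(life)` to inspect the result.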
Submission
This assignment has to be done using the Jupyter Notebook only. Your Jupyter Notebook has to use the Markdown language for proper formatting of the report and answers, with inline Python code and graphs.
You are to hand in two files:
1. The Jupyter Notebook file (.ipynb) that contains a working copy of your report (using Markdown) and Python code that answers the questions.
2. A PDF file that is generated from your Jupyter Notebook. Execute your Python code and then download it as a PDF document. To do so (in Windows), you can do a “Print Preview”, then “Print” the document, and then select “Save as PDF”. Note that there are other ways to do this, depending on the environment that you are in. Alternatively, you can download as HTML and then “Print” that to PDF. Again, participate in the Moodle forum if you need assistance on this.
Clarifications
This assignment is not intended to provide step-by-step directions, and I anticipate some clarification questions. The questions can range from "What is a PDF file?" to something relating to the possible meaning of a column in the CSV file.
I would like you to post these questions on the Moodle Forum, and I strongly encourage interactions between all of you in the forum. Some of the questions probably don’t have a single or correct answer and are up to each individual’s interpretation. Just make sure that you do not post answers in the forum.
Link to Moodle Forum (https://edstem.org/au/courses/8243/discussion/ )
Assignment
Tasks
You should start your assignment by providing the title of the assignment and unit code, your name and student ID, e.g.
The tasks will involve:
• Importing the necessary libraries,
o ensure you explain each step (like the “hidden” example above)
• Read the files,
o do not change the location of the intended files, i.e. they should be in a folder called “data”
o make sure you show that you have read the data correctly
• Wrangle the data,
o sub-setting the necessary data,
▪ For the Life_expectancy related DataFrame, you are to keep the columns: country, Status, max_life_expectancy, mean_BMI, mean_income_composition_of_resources, mean_schooling, and mean_life_expectancy (aggregated from the respective columns).
o proper renaming of the columns (and indexing),
o this assignment only needs the South East Asian countries, including East Timor. A little geography is needed here, and a bit of general knowledge as well. For this, you are expected to create a list, tuple, or other data structure to store the names of the countries that are required (and explain why you selected that data structure).
• merge the files correctly,
• manage any data type issues or data issues,
• feature engineer (create) the column “perCapitaGDP”, and
o as a guide, your final DataFrame should have 11 rows (the countries) and 10
• provide some statistical description of the final data that you have.
o Interpret the data that you have obtained using basic statistics.
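The wrangling steps listed above can be sketched on toy data as follows. This is only an illustration under stated assumptions: the country names are fictional, all values are made up, the second table and its column names (`Country Name`, `GDP`, `Population`) are hypothetical, and computing “perCapitaGDP” as GDP divided by population is an assumption rather than something this brief specifies.

```python
import pandas as pd

# Toy stand-in for the life expectancy file: one row per country and year.
# Country names are fictional and all values are made up.
life = pd.DataFrame({
    "Country": ["Aland", "Aland", "Borduria", "Borduria", "Syldavia"],
    "Status": ["Developing"] * 5,
    "Life expectancy": [70.0, 72.0, 64.0, 66.0, 80.0],
    "BMI": [20.0, 22.0, 18.0, 19.0, 30.0],
    "Schooling": [10.0, 11.0, 8.0, 9.0, 15.0],
})

# Hypothetical second file with one row per country (assumed column names).
economy = pd.DataFrame({
    "Country Name": ["Aland", "Borduria", "Syldavia"],
    "GDP": [5.0e9, 1.2e10, 9.0e11],
    "Population": [1_000_000, 4_000_000, 30_000_000],
})

# A tuple suits a fixed collection of names that should not change at run time.
wanted = ("Aland", "Borduria")

# 1. Subset to the countries of interest.
subset = life[life["Country"].isin(wanted)]

# 2. Aggregate the yearly rows into one row per country (named aggregation),
#    then rename the key column.
summary = (
    subset.groupby(["Country", "Status"], as_index=False)
    .agg(
        max_life_expectancy=("Life expectancy", "max"),
        mean_life_expectancy=("Life expectancy", "mean"),
        mean_BMI=("BMI", "mean"),
        mean_schooling=("Schooling", "mean"),
    )
    .rename(columns={"Country": "country"})
)

# 3. Merge with the second table; validate raises if a key is duplicated.
merged = summary.merge(
    economy.rename(columns={"Country Name": "country"}),
    on="country",
    how="left",
    validate="one_to_one",
)

# 4. Coerce to numeric types (bad entries become NaN), then derive the new column.
merged["GDP"] = pd.to_numeric(merged["GDP"], errors="coerce")
merged["Population"] = pd.to_numeric(merged["Population"], errors="coerce")
merged["perCapitaGDP"] = merged["GDP"] / merged["Population"]

# 5. Basic descriptive statistics for the numeric columns.
stats = merged.describe()
```

Checking `merged.shape` and `merged.dtypes` after each step is a simple way to confirm that the subsetting and merging behaved as expected before interpreting `stats`.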
You are then to select the appropriate plots (graphs) and provide some basic insights to the following questions (referred to as Question 1, 2 & 3 in the rubrics):
3. For the final question, you will probably need the non-aggregated data from
As extras, you can answer the following (not graded)
• Should a country with a lower life expectancy value (<65) increase its healthcare expenditure in order to improve its average lifespan?
• What is the impact of schooling on the lifespan of humans?
• Does life expectancy have a positive or negative relationship with drinking alcohol?
• Do densely populated countries tend to have lower life expectancy?
• What is the impact of immunization coverage on life expectancy?
You can probably try to answer many other questions just from these datasets. Those who are interested can discuss among themselves using the Ed forum.
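For relationship questions such as the ones above, a correlation table is one possible starting point. The sketch below uses made-up records, and the column names are assumptions rather than the names in the provided file.

```python
import pandas as pd

# Made-up yearly records; column names are assumed for illustration.
records = pd.DataFrame({
    "Schooling": [8.0, 9.0, 10.0, 11.0, 12.0],
    "Life expectancy": [60.0, 63.0, 65.0, 70.0, 72.0],
    "Alcohol": [5.0, 4.0, 4.5, 2.0, 1.0],
})

# Pearson correlation of each column with life expectancy.
corr = records.corr()["Life expectancy"].drop("Life expectancy")

# Positive values suggest the two columns rise together, negative values the opposite.
direction = corr.apply(lambda r: "positive" if r > 0 else "negative")
```

A correlation only describes how two columns move together and does not show that one causes the other, so it is worth pairing it with a scatter plot and some domain reasoning.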
Marking Rubrics (Guideline ONLY)
Report (appropriately formatted using Markdown (and HTML), and content): 1 mark - Using at least 3 formatting codes (Markdown or HTML)
Wrangling, merging the files into one DataFrame: 1 mark - Using a list/tuple/other data structure to store the SEA countries, and explaining the choice.
Question 1: 1 mark - Appropriately explained choice of graphing.
Question 3: 1 mark - Code and graph for life expectancy (logical and executable)
Have Fun!
After completing this project, you should have a solid understanding of Drew Conway's Venn Diagram. By completing this assignment, you will have demonstrated your "hacking skills" (via your Python code), touched on some basic statistics (though you did not use them for Machine Learning), and hopefully persuaded us that you have some domain knowledge (e.g., South East Asian countries and life expectancy, which is useful to know if you don't already!).