DATA201 - Assignment 1 Solved



Download LifeExpectancy.csv from the course webpage and read it into Python, skipping the necessary rows and reading the header. Make the country name the index. Print the first few rows to check that it loaded correctly. [2 marks]
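A minimal sketch of the loading step. It assumes the file has four metadata rows above the header (hence `skiprows=4`); check the real file before relying on that number. A small inline sample stands in for the download so the snippet runs on its own: the metadata lines and the `load_life` helper are hypothetical, and the values come from the output shown for this step.

```python
import io

import pandas as pd


def load_life(source, skiprows=4):
    """Read the CSV, skip the leading metadata rows, and index by country name.

    skiprows=4 is an assumption about the layout of LifeExpectancy.csv.
    """
    return pd.read_csv(source, skiprows=skiprows, index_col="Country Name")


# Stand-in for the real file: four hypothetical metadata lines, then the header.
sample = io.StringIO(
    "metadata line 1 (hypothetical)\n"
    "metadata line 2 (hypothetical)\n"
    "metadata line 3 (hypothetical)\n"
    "metadata line 4 (hypothetical)\n"
    "Country Name,Country Code,Indicator Name,Indicator Code,1960,1961\n"
    'Aruba,ABW,"Life expectancy at birth, total (years)",SP.DYN.LE00.IN,65.662,66.074\n'
    'Andorra,AND,"Life expectancy at birth, total (years)",SP.DYN.LE00.IN,,\n'
)

life = load_life(sample)
print(life.head())
```

With the real file, the call would be `load_life("LifeExpectancy.csv")`.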

In [2]:



Out[2]:

Country Name  Country Code  Indicator Name                           Indicator Code  1960    1961    1962    1963    1964    ...
Aruba         ABW           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  65.662  66.074  66.444  66.787  67.113  ...
Afghanistan   AFG           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  32.446  32.962  33.471  33.971  34.463  ...
Angola        AGO           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  37.524  37.811  38.113  38.430  38.760  ...
Albania       ALB           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  62.283  63.301  64.190  64.914  65.463  ...
Andorra       AND           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  NaN     NaN     NaN     NaN     NaN     ...

5 rows × 63 columns

Drop rows that consist of NaN values. Be careful how you do this: the naive approach of calling dropna() without first looking at the data might not do what you expect. You might need to know that to delete a column you can use life.drop([list of column names], axis=1). [2 marks]
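A minimal sketch of why the naive call misbehaves and one way around it. It assumes the frame carries at least one column that is empty for every country (here a hypothetical `Unnamed: 6`), so a plain `dropna()` discards every row. Whether rows should go when all year values are missing (`how="all"`) or when any is missing (`how="any"`) depends on how the task is read; this sketch uses `how="all"`.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; "Unnamed: 6" is a hypothetical all-empty column.
life = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AND"],
        "1960": [65.662, 32.446, np.nan],
        "1961": [66.074, 32.962, np.nan],
        "Unnamed: 6": [np.nan, np.nan, np.nan],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Andorra"], name="Country Name"),
)

# Naive attempt: every row has a NaN in the empty column, so nothing survives.
naive = life.dropna()
print(len(naive))

# Step 1: remove columns that hold no data at all.
life = life.dropna(axis=1, how="all")

# Step 2: remove countries that have no value in any year column.
year_cols = [c for c in life.columns if c.isdigit()]
life = life.dropna(axis=0, how="all", subset=year_cols)
print(life)
```

Dropping the empty columns by name with `life.drop([...], axis=1)` would serve equally well for step 1 once the column names are known.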


In [3]:




Out[3]:

Country Name  Country Code  Indicator Name                           Indicator Code  1960       1961       1962       1963       ...
Aruba         ABW           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  65.662000  66.074000  66.444000  66.787000  ...
Afghanistan   AFG           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  32.446000  32.962000  33.471000  33.971000  ...
Angola        AGO           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  37.524000  37.811000  38.113000  38.430000  ...
Albania       ALB           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  62.283000  63.301000  64.190000  64.914000  ...
Arab World    ARB           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  46.546909  47.141621  47.731783  48.320432  ...
...           ...           ...                                      ...             ...        ...        ...        ...        ...
Samoa         WSM           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  56.902000  57.188000  57.472000  57.756000  ...
Yemen, Rep.   YEM           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  29.919000  30.163000  30.500000  30.943000  ...
South Africa  ZAF           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  48.406000  48.777000  49.142000  49.509000  ...
Zambia        ZMB           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  46.687000  47.084000  47.446000  47.772000  ...
Zimbabwe      ZWE           Life expectancy at birth, total (years)  SP.DYN.LE00.IN  53.019000  53.483000  53.946000  54.403000  ...

235 rows × 61 columns



Now plot the curves of life expectancy against time on one plot for the following countries: Afghanistan, Nepal, New Zealand, Netherlands. Include a legend, and make the labels on the x-axis readable. [2 marks]

In [4]:
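A minimal sketch of the multi-country plot. The `plot_countries` helper is hypothetical, and the demo frame holds only two rows copied from the earlier output (Afghanistan and Albania, 1960 to 1963), since the values for the other requested countries are not shown here. Converting the year labels to integers gives a numeric x-axis, which keeps the tick labels sparse and readable.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two rows copied from the earlier output; the real frame has one column per year.
life = pd.DataFrame(
    {
        "1960": [32.446, 62.283],
        "1961": [32.962, 63.301],
        "1962": [33.471, 64.190],
        "1963": [33.971, 64.914],
    },
    index=pd.Index(["Afghanistan", "Albania"], name="Country Name"),
)


def plot_countries(life, countries):
    """Draw one life-expectancy curve per country on shared axes."""
    year_cols = [c for c in life.columns if str(c).isdigit()]
    years = [int(c) for c in year_cols]  # numeric axis instead of string labels
    fig, ax = plt.subplots(figsize=(10, 5))
    for name in countries:
        ax.plot(years, life.loc[name, year_cols].astype(float), label=name)
    ax.set_xlabel("Year")
    ax.set_ylabel("Life expectancy at birth (years)")
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend()
    return ax


ax = plot_countries(life, ["Afghanistan", "Albania"])
```

On the full frame the call would be `plot_countries(life, ["Afghanistan", "Nepal", "New Zealand", "Netherlands"])`.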



Plot Rwanda separately, and explain briefly why it has that shape (hint: use Wikipedia). [2 marks]

In [5]:



Can you detect any other countries where the life expectancy drops significantly?

To compute this, write loops over each country and each year. If the next value is below 95% of the current value, print the name of the country. [4 marks]

In [6]:



# get_drop is not defined in the surviving code; it is assumed here to return
# the ratio of the next value to the current one.
def get_drop(prev, item):
    return item / prev

# Convert to a NumPy array, because iterating over a DataFrame is an anti-pattern
# and computationally expensive due to the constant creation of Series objects.
df_array = df.reset_index().to_numpy()
start_year = 1960
for row in df_array:
    prev = 0
    # Columns 0-3 are the country name, country code and indicator fields.
    for i, item in enumerate(row[4:]):
        if prev != 0:
            if get_drop(prev, item) < 0.95:
                year = start_year + i
                print(f"In {year} there is a significant (<95%) drop from {year - 1} "
                      f"in {row[0]}'s life expectancy, going from {prev} to {item}\n")
        prev = item

In 1972 there is a significant (<95%) drop from 1971 in Cambodia's life expectancy, going from 39.699 to 36.676

In 1973 there is a significant (<95%) drop from 1972 in Cambodia's life expectancy, going from 36.676 to 32.667

In 1974 there is a significant (<95%) drop from 1973 in Cambodia's life expectancy, going from 32.667 to 28.04

In 1975 there is a significant (<95%) drop from 1974 in Cambodia's life expectancy, going from 28.04 to 23.595

In 1976 there is a significant (<95%) drop from 1975 in Cambodia's life expectancy, going from 23.595 to 20.317

In 1977 there is a significant (<95%) drop from 1976 in Cambodia's life expectancy, going from 20.317 to 18.907

In 1987 there is a significant (<95%) drop from 1986 in Rwanda's life expectancy, going from 50.233000000000004 to 47.409

In 1988 there is a significant (<95%) drop from 1987 in Rwanda's life expectancy, going from 47.409 to 43.361000000000004

In 1989 there is a significant (<95%) drop from 1988 in Rwanda's life expectancy, going from 43.361000000000004 to 38.439

In 1990 there is a significant (<95%) drop from 1989 in Rwanda's life expectancy, going from 38.439 to 33.413000000000004

In 1991 there is a significant (<95%) drop from 1990 in Rwanda's life expectancy, going from 33.413000000000004 to 29.248

In 1992 there is a significant (<95%) drop from 1991 in Rwanda's life expectancy, going from 29.248 to 26.691

Compute the mean life expectancy for each of the countries over the whole 57 years. Plot a bar chart of this for the first 10 countries. [4 marks]
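A minimal sketch of the per-country mean and bar chart, assuming the year columns are the ones whose labels are all digits. The demo frame holds four rows and four years copied from the earlier output, so `head(10)` returns only four bars here; on the full frame it would return the first ten countries.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Rows copied from the earlier output; the real frame spans every year.
life = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB"],
        "1960": [65.662, 32.446, 37.524, 62.283],
        "1961": [66.074, 32.962, 37.811, 63.301],
        "1962": [66.444, 33.471, 38.113, 64.190],
        "1963": [66.787, 33.971, 38.430, 64.914],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania"], name="Country Name"),
)

# Average across the year columns only, leaving out the text columns.
year_cols = [c for c in life.columns if c.isdigit()]
mean_life = life[year_cols].mean(axis=1)

# Bar chart of the first ten countries (fewer if the frame is smaller).
ax = mean_life.head(10).plot.bar(rot=45)
ax.set_ylabel("Mean life expectancy (years)")
print(mean_life)
```

`mean(axis=1)` skips NaN values by default, so a country with a few missing years still gets a mean over the years it does have.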



Find the 5 countries with the highest mean life expectancy and the 5 with the lowest. You might find life.sort_values() helpful, as well as pd.concat and life.transpose . Plot a box and whisker plot of these countries. [4 marks]
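A minimal sketch of the ranking and box plot using `sort_values`, `pd.concat` and `transpose` as suggested. The demo frame holds six rows copied from the earlier output, so it picks `n = 2` from each end rather than 5; `n` is a hypothetical parameter introduced for illustration. Transposing puts one country per column, which is the layout the box plot needs to draw one box per country.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Six rows copied from the earlier output (year columns only).
life = pd.DataFrame(
    {
        "1960": [65.662, 32.446, 37.524, 62.283, 56.902, 29.919],
        "1961": [66.074, 32.962, 37.811, 63.301, 57.188, 30.163],
        "1962": [66.444, 33.471, 38.113, 64.190, 57.472, 30.500],
        "1963": [66.787, 33.971, 38.430, 64.914, 57.756, 30.943],
    },
    index=pd.Index(
        ["Aruba", "Afghanistan", "Angola", "Albania", "Samoa", "Yemen, Rep."],
        name="Country Name",
    ),
)

n = 2  # the assignment asks for 5; the demo frame is too small for that

# Sort countries by mean life expectancy, lowest first.
means = life.mean(axis=1).sort_values()
lowest = means.index[:n]
highest = means.index[-n:]

# Stack the selected rows, then transpose so each country becomes a column.
selected = pd.concat([life.loc[highest], life.loc[lowest]])
ax = selected.transpose().plot.box(rot=45)
ax.set_ylabel("Life expectancy at birth (years)")
print(selected.index.tolist())
```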



In [ ]:
