DSE5002 Assignment 8 Solution

1 Assignment 8

1.1 Question 1
Read in the two Netflix CSV files from /Data/Netflix as pandas dataframes. Print the number of unique genres. This is not as simple as it sounds. You cannot simply take the length of titles['genres'].unique(). You must convert the output of that code to a list, iterate over that list, and replace the following characters: []',. Once you have replaced them, you can split the individual strings into list items and flatten the list. I have already imported the chain() function for you to flatten the list. Look up the documentation to see its usage. There are 19 unique genres, but I want you to write the code to find them.
[ ]: # your code here
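
The cleaning steps described in Question 1 can be sketched on a toy dataframe as follows. This is only one possible approach, not the official solution. The sample genre strings are made up for illustration, and the sketch assumes every genre name is a single word, so that splitting on whitespace is enough.

```python
from itertools import chain

import pandas as pd

# Toy stand-in for the titles dataframe (made-up values, not the real CSV).
titles = pd.DataFrame(
    {"genres": ["['drama', 'crime']", "['comedy']", "[]", "['crime', 'comedy']"]}
)

# Each distinct cell is a string that only looks like a list.
raw = titles["genres"].unique().tolist()

# Remove brackets and quotes; turn commas into spaces so split() separates names.
table = str.maketrans({"[": None, "]": None, "'": None, ",": " "})
split_genres = [s.translate(table).split() for s in raw]

# chain.from_iterable flattens the list of lists; set() drops duplicates.
unique_genres = sorted(set(chain.from_iterable(split_genres)))

print(len(unique_genres))
print(unique_genres)
```

On the real data the same steps should report the 19 genres mentioned in the question, provided the genre strings follow this layout.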
1.2 Question 2
Print the release year and the average IMDb score for the year with the highest average score across all movies. This is trickier than it sounds. To do this you will need to aggregate the mean scores by year. If you use the simple method you will get a pandas Series. The Series will need to be converted to a dataframe, and its index (release year) will need to become a column. Once you have done that, you can find the numerical index of the row with the highest average IMDb score.
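
A minimal sketch of the aggregation described in Question 2, run on a toy dataframe. The column names type, release_year and imdb_score, as well as the "MOVIE" label, are assumptions about the CSV layout; adjust them to match the actual file.

```python
import pandas as pd

# Toy stand-in for the titles dataframe (made-up values, assumed column names).
titles = pd.DataFrame(
    {
        "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVIE"],
        "release_year": [1990, 1990, 2001, 2001, 2001],
        "imdb_score": [8.0, 7.0, 6.0, 9.9, None],
    }
)

# Keep movies only, so shows do not affect the yearly averages.
movies = titles[titles["type"] == "MOVIE"]

# The simple groupby returns a Series indexed by release_year.
yearly_mean = movies.groupby("release_year")["imdb_score"].mean()

# Convert the Series to a dataframe and move the index into a column.
yearly = yearly_mean.to_frame().reset_index()

# idxmax gives the numerical row index of the highest average score.
best = yearly.loc[yearly["imdb_score"].idxmax()]
print(int(best["release_year"]), best["imdb_score"])
```

Note that mean() skips missing scores by default, so a year is averaged over its rated movies only.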

1.3 Question 3
There were 208 actors in the movie with the most credited actors. What is the title of that movie? Nulls and NaN values do not count.
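
One way to approach Question 3 is to count the non-null actor names per title in the credits dataframe and then look up the title with the largest count. The sketch below uses made-up rows, and the column names id, title, type, name and role are assumptions about the CSV layout.

```python
import pandas as pd

# Toy stand-ins for the two dataframes (made-up values, assumed column names).
titles = pd.DataFrame(
    {
        "id": ["tm1", "tm2", "ts3"],
        "title": ["Alpha", "Beta", "Gamma"],
        "type": ["MOVIE", "MOVIE", "SHOW"],
    }
)
credits = pd.DataFrame(
    {
        "id": ["tm1", "tm1", "tm1", "tm2", "tm2", "tm2", "tm2", "ts3"],
        "name": ["A", "B", None, "C", "D", "E", "F", "G"],
        "role": ["ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR",
                 "DIRECTOR", "ACTOR", "ACTOR"],
    }
)

# Keep actors only and drop rows without a name (nulls do not count).
actors = credits[credits["role"] == "ACTOR"].dropna(subset=["name"])

# Restrict the credits to titles that are movies.
movie_ids = titles.loc[titles["type"] == "MOVIE", "id"]
actor_counts = actors[actors["id"].isin(movie_ids)].groupby("id")["name"].count()

# idxmax returns the id of the movie with the most credited actors.
top_id = actor_counts.idxmax()
top_title = titles.loc[titles["id"] == top_id, "title"].iloc[0]
print(top_title, actor_counts.max())
```

On the real data, the maximum count should equal the 208 stated in the question if the filtering matches the intended definition.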

1.4 Question 4
Which movie has the highest IMDB score for the actor Robert De Niro? What year was it made? Create a kdeplot (kernel density estimation) to show the distribution of his IMDB movie scores.
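
A possible sketch for Question 4: filter the credits by actor name, merge with the titles, and pick the row with the highest score. The film titles, years and scores below are placeholders, not real data, and the column names are assumptions about the CSV layout. The helper plot_score_density is a hypothetical name; it imports seaborn lazily, so the lookup part runs without a plotting backend.

```python
import pandas as pd

# Toy stand-ins for the two dataframes (placeholder values, assumed columns).
titles = pd.DataFrame(
    {
        "id": ["tm1", "tm2", "tm3"],
        "title": ["Film A", "Film B", "Film C"],
        "type": ["MOVIE", "MOVIE", "MOVIE"],
        "release_year": [1980, 1995, 2010],
        "imdb_score": [7.1, 8.3, 9.0],
    }
)
credits = pd.DataFrame(
    {
        "id": ["tm1", "tm2", "tm3"],
        "name": ["Robert De Niro", "Robert De Niro", "Someone Else"],
        "role": ["ACTOR", "ACTOR", "ACTOR"],
    }
)

# Select the actor's credits and attach the title information.
is_de_niro = (credits["name"] == "Robert De Niro") & (credits["role"] == "ACTOR")
de_niro = credits[is_de_niro].merge(titles, on="id", how="inner")
de_niro = de_niro[de_niro["type"] == "MOVIE"]

# Row with the highest IMDb score among his movies.
best = de_niro.loc[de_niro["imdb_score"].idxmax()]
print(best["title"], best["release_year"])


def plot_score_density(scores):
    """Draw a kernel density estimate of the given scores (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.kdeplot(scores.dropna())
    ax.set_xlabel("IMDb score")
    plt.show()
```

Calling plot_score_density(de_niro["imdb_score"]) draws the density curve. With only a couple of toy rows the curve is not meaningful, so it is best tried on the real data.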

1.5 Question 5
Create two new boolean columns in the titles dataframe that are True when the description contains "war" or "gangster", respectively. Call these columns war_movies and gangster_movies. How many movies are there in both categories? Which category has a higher average IMDB score? Show the IMDB score kernel density estimations of both categories.
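
The boolean columns in Question 5 can be built with pandas string matching, sketched here on made-up descriptions. Matching case-insensitively is an assumption, as are the column names description and imdb_score. A plain substring match also catches words such as "award", and the sketch does not filter by title type. The helper plot_category_densities is a hypothetical name, with plotting imports kept inside it.

```python
import pandas as pd

# Toy stand-in for the titles dataframe (made-up values, assumed column names).
titles = pd.DataFrame(
    {
        "description": [
            "A war story",
            "A gangster rises",
            "War between gangster families",
            None,
            "A quiet comedy",
        ],
        "imdb_score": [7.0, 8.0, 6.0, 5.0, 9.0],
    }
)

# na=False makes missing descriptions count as "does not contain".
desc = titles["description"]
titles["war_movies"] = desc.str.contains("war", case=False, na=False)
titles["gangster_movies"] = desc.str.contains("gangster", case=False, na=False)

# Size of each category, and the number of titles that fall in both.
n_war = int(titles["war_movies"].sum())
n_gangster = int(titles["gangster_movies"].sum())
n_both = int((titles["war_movies"] & titles["gangster_movies"]).sum())

# Average IMDb score per category.
war_mean = titles.loc[titles["war_movies"], "imdb_score"].mean()
gangster_mean = titles.loc[titles["gangster_movies"], "imdb_score"].mean()
print(n_war, n_gangster, n_both, war_mean, gangster_mean)


def plot_category_densities(df):
    """Overlay the score densities of both categories (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.kdeplot(df.loc[df["war_movies"], "imdb_score"].dropna(), label="war")
    sns.kdeplot(df.loc[df["gangster_movies"], "imdb_score"].dropna(), label="gangster")
    plt.xlabel("IMDb score")
    plt.legend()
    plt.show()
```

The sketch reports both the per-category counts and the overlap, since "in both categories" can be read either way.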
