COMP3162-Project 2 Solved

1. Tweet Data Analysis
a.       Merge the tweets collected from project 01 by each person in the team.                  

b.      Remove all duplicate tweets in the newly merged set of tweets. A tweet is a duplicate if the text is exactly the same as the text in another tweet. In removing the duplicate tweets, it might be useful to keep the one that has the highest retweet count.

c.       Explore the merged tweets and provide descriptive statistics.

d.      What are the dominant emotions associated with beverages in any two locations?

e.       What are the dominant emotions in the overall dataset?                                         

f.        What is the overall sentiment in tweets regarding “beverages” and “party or concert” (separately)?

g.       Conduct ONE additional analysis of your choice to discover any further useful insights.
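The deduplication rule in item b can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes each tweet is a dictionary with hypothetical keys `text` and `retweet_count` (the actual field names depend on how the tweets were collected in project 01), and the sample tweets are made up.

```python
def dedupe_tweets(tweets):
    """Keep one tweet per exact text, preferring the highest retweet count."""
    best = {}
    for tweet in tweets:
        text = tweet["text"]  # hypothetical field name
        kept = best.get(text)
        # Replace the stored tweet only if this copy has more retweets.
        if kept is None or tweet["retweet_count"] > kept["retweet_count"]:
            best[text] = tweet
    return list(best.values())


# Made-up sample data for illustration only.
merged = [
    {"text": "Love this mango juice", "retweet_count": 2},
    {"text": "Love this mango juice", "retweet_count": 7},
    {"text": "Concert was great", "retweet_count": 1},
]
unique = dedupe_tweets(merged)
```

Matching is on the exact text, as the brief specifies, so tweets that differ only in case or whitespace are treated as distinct.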


2. Collect, Explore, Prepare Structured Data
a.       Download the data file consumer_pt02_2021.csv from OurVLE.

b.      Explore the data and provide details on all fields retrieved. You should ensure all features in the dataset (each column) are reviewed and summarized to verify things such as value ranges, missing values etc. Be sure to generate relevant graphical representations where necessary to demonstrate your review and decision making.

c.       Fix noise, outlier and any other issues discovered (example: na values). You must provide discussion / explanation of all activities done and why each decision has been made.     

d.      Format/reformat the data as necessary. Please note that as you proceed through the project, you may need to do additional formatting to enable your analysis.                 
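As a starting point for the review of missing values in items b and c, the sketch below counts missing entries per column using only the standard library. The column names (`country`, `units_sold`, `unit_price`), the sample rows, and the list of tokens treated as missing are all assumptions for illustration; the real columns of consumer_pt02_2021.csv are not listed in this brief.

```python
import csv
import io


def missing_counts(rows, na_tokens=("", "na", "n/a", "nan", "null")):
    """Count, per column, how many values look missing (assumed NA tokens)."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            is_missing = value is None or value.strip().lower() in na_tokens
            counts[field] = counts.get(field, 0) + int(is_missing)
    return counts


# Made-up sample standing in for the real file; in practice the rows would
# come from open("consumer_pt02_2021.csv", newline="") instead.
sample = io.StringIO(
    "country,units_sold,unit_price\n"
    "A,10,2.5\n"
    "B,,3.0\n"
    "A,NA,\n"
)
rows = list(csv.DictReader(sample))
counts = missing_counts(rows)
```

Counts like these only locate the gaps; whether to drop or impute the affected rows is still a decision to justify in the discussion required by item c.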

 

3. Structured Data Analysis/Modeling  
Write code to conduct analysis that will answer the questions below. You are encouraged to use tables/graphs where necessary to visualize results. Additionally, your code should be shown along with each question, the result and notes that explain the results.

a.       What is the average spend on beverages in each country?                                        

b.      Which country has the highest spending on beverages?                                          

c.       Which country consumes the most beverages?                                                                   

d.      What is the average profit from the sale of beverages in each country?                   

e.       What has been the total revenue from beverages for each year since 2014?                        

f.        Plot a time series graph showing change in overall revenues from beverages for the last six months (in the dataset).

g.       What is the dominant sales channel for beverages?

h.      Determine whether the number of beverage units sold is above the overall average number of units sold for all other products.

i.         In which season (Spring, Summer, Autumn, Winter) do people spend the most on beverages?

j.        Is there a correlation between the season and the units sold for beverages? Explain the result.
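The seasonal questions above require deriving a season from each order date. One possible sketch is shown below. It assumes meteorological Northern Hemisphere seasons (March to May is Spring, and so on), which the brief does not specify, and it uses hypothetical field names `order_date` and `total_spend` with made-up sample orders.

```python
from collections import defaultdict
from datetime import date


def season_of(d):
    """Map a date to a season (assumed: meteorological, Northern Hemisphere)."""
    if d.month in (3, 4, 5):
        return "Spring"
    if d.month in (6, 7, 8):
        return "Summer"
    if d.month in (9, 10, 11):
        return "Autumn"
    return "Winter"


def spend_by_season(orders):
    """Sum spend per season; 'order_date' and 'total_spend' are hypothetical keys."""
    totals = defaultdict(float)
    for order in orders:
        totals[season_of(order["order_date"])] += order["total_spend"]
    return dict(totals)


# Made-up beverage orders for illustration only.
orders = [
    {"order_date": date(2020, 7, 4), "total_spend": 120.0},
    {"order_date": date(2020, 12, 25), "total_spend": 80.0},
    {"order_date": date(2021, 8, 1), "total_spend": 50.0},
]
totals = spend_by_season(orders)
top_season = max(totals, key=totals.get)
```

If the dataset covers countries in both hemispheres, the month-to-season mapping would need to vary by country, which is worth stating explicitly in the notes that accompany the result.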


4. Recommendation: 

a. Based on your analysis of both the tweet data and structured data, what would you recommend to Hard Knocks and why?                                                                                        

 

 

5. BONUS
a. Which features in the dataset can be used to predict the units sold for beverages?
