CSC4760 - Big Data Programming - Assignment 5 - Counting Tweets - Solved

Input Datasets:

Tweets (tweets.json):

| user | geo | tweet |
|---|---|---|
| Bob | Atlanta | It is a sunny day! |
| Susan | Athens | We have a football game today :) |
| David | Atlanta | Today is cold. |
| Lisa | Auburn | I love Auburn University |
| Ben | Birmingham | I will go to Atlanta today! |
| Paul | San Francisco | We watch a movie today! |
| Smith | San Diego | It is hot today. Summer comes. |
| Ethan | Log Angeles | Oscar ceremony is wonderful! |
| Emma | Log Angeles | I love Oscar ceremony! |
| Rolando | Orlando | I will go to the beach! |
| Mia | Miami | Sunny Day! |

City and State lookup table (cityStateMap.json):

| city | state |
|---|---|
| Atlanta | Georgia |
| Athens | Georgia |
| Miami | Florida |
| Orlando | Florida |
| Birmingham | Alabama |
| Auburn | Alabama |
| Log Angeles | California |
| San Francisco | California |
| San Diego | California |

Problem and Output Data:

We want to count the number of tweets published in each state. The following table shows the desired results.

| state | count |
|---|---|
| Georgia | 3 |
| Florida | 2 |
| Alabama | 2 |
| California | 4 |

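
As a quick sanity check on the expected counts, the lookup-and-count logic can be worked through in plain Python before moving to Spark. This is only an illustration: the rows are copied from the tables above (city names spelled exactly as in the data, since they act as the join key), and the variable names are made up for this sketch.

```python
from collections import Counter

# Rows copied from the tweets table above: (user, geo).
# The tweet text is not needed for counting, so it is omitted here.
tweets = [
    ("Bob", "Atlanta"), ("Susan", "Athens"), ("David", "Atlanta"),
    ("Lisa", "Auburn"), ("Ben", "Birmingham"), ("Paul", "San Francisco"),
    ("Smith", "San Diego"), ("Ethan", "Log Angeles"), ("Emma", "Log Angeles"),
    ("Rolando", "Orlando"), ("Mia", "Miami"),
]

# City-to-state lookup copied from the cityStateMap table above.
city_to_state = {
    "Atlanta": "Georgia", "Athens": "Georgia",
    "Miami": "Florida", "Orlando": "Florida",
    "Birmingham": "Alabama", "Auburn": "Alabama",
    "Log Angeles": "California", "San Francisco": "California",
    "San Diego": "California",
}

# Map each tweet's city to its state, then count tweets per state.
state_counts = Counter(city_to_state[geo] for _, geo in tweets)

for state, count in sorted(state_counts.items()):
    print(state, count)
```

Running this prints Alabama 2, California 4, Florida 2, Georgia 3, which agrees with the state counts listed in this assignment.
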
Implementation: 

Design and implement a PySpark program to solve the problem. No template Python file is provided this time, so you will need to create your own Python file from scratch.

You are required to use Spark DataFrames to implement this program.

