Querying Yellow Taxi Data Using Hive
Problem statement:
This case study gives a real-world example of how to use Hive on top of Hadoop for exploratory data analysis. It uses a predefined dataset, 2018_Yellow_Taxi_Trip_Data.csv, with more than 15 columns and more than 100,000 records. The dataset has the following attributes:
vendor_id string,
pickup_datetime string,
dropoff_datetime string,
passenger_count int,
trip_distance DECIMAL(9,6),
pickup_longitude DECIMAL(9,6),
pickup_latitude DECIMAL(9,6),
rate_code int,
store_and_fwd_flag string,
dropoff_longitude DECIMAL(9,6),
dropoff_latitude DECIMAL(9,6),
payment_type string,
fare_amount DECIMAL(9,6),
extra DECIMAL(9,6),
mta_tax DECIMAL(9,6),
tip_amount DECIMAL(9,6),
tolls_amount DECIMAL(9,6),
total_amount DECIMAL(9,6),
trip_time_in_secs int
Perform a taxi trip analysis by answering the following questions:
What is the total number of trips (equal to the number of rows)?
What is the total revenue generated by all the trips? The amount charged for each trip is stored in the column total_amount.
What fraction of the total revenue is paid for tolls? The toll is stored in tolls_amount.
What fraction of the total revenue is driver tips? The tip is stored in tip_amount.
What is the average trip amount?
What is the average distance of the trips? Distance is stored in the column trip_distance.
How many different payment types are used?
For each payment type, display the following details:
Average fare generated
Average tip
Average tax (stored in the column mta_tax)
On average, which hour of the day generates the highest revenue?
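The questions above can all be answered with ordinary aggregate queries. The sketch below is a hedged illustration, not the case study's own solution: it runs the SQL against an in-memory SQLite table through Python's sqlite3 module so the logic can be checked locally. The table name yellow_taxi and the four rows are hypothetical, the money columns are REAL here instead of DECIMAL(9,6), and pickup_datetime is assumed to use the 'yyyy-MM-dd HH:mm:ss' layout. The SELECT statements use only COUNT, SUM, AVG, COUNT(DISTINCT ...) and substr, which HiveQL also provides, so they should carry over to Hive with little or no change.

```python
import sqlite3

# Hypothetical toy rows standing in for 2018_Yellow_Taxi_Trip_Data.csv.
ROWS = [
    # pickup_datetime, payment_type, trip_distance, fare_amount,
    # mta_tax, tip_amount, tolls_amount, total_amount
    ("2018-01-01 08:15:00", "CRD", 2.0, 10.0, 0.5, 2.0, 0.0, 12.5),
    ("2018-01-01 08:40:00", "CSH", 1.0, 6.0, 0.5, 0.0, 0.0, 6.5),
    ("2018-01-01 18:05:00", "CRD", 5.0, 20.0, 0.5, 4.0, 5.5, 30.0),
    ("2018-01-02 18:30:00", "CSH", 3.0, 14.0, 0.5, 0.0, 0.0, 14.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_taxi (pickup_datetime TEXT, payment_type TEXT, "
    "trip_distance REAL, fare_amount REAL, mta_tax REAL, tip_amount REAL, "
    "tolls_amount REAL, total_amount REAL)"
)
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)


def one(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


total_trips = one("SELECT COUNT(*) FROM yellow_taxi")
total_revenue = one("SELECT SUM(total_amount) FROM yellow_taxi")
toll_fraction = one("SELECT SUM(tolls_amount) / SUM(total_amount) FROM yellow_taxi")
tip_fraction = one("SELECT SUM(tip_amount) / SUM(total_amount) FROM yellow_taxi")
avg_trip_amount = one("SELECT AVG(total_amount) FROM yellow_taxi")
avg_distance = one("SELECT AVG(trip_distance) FROM yellow_taxi")
payment_types = one("SELECT COUNT(DISTINCT payment_type) FROM yellow_taxi")

# Average fare, tip and tax for each payment type.
per_payment = conn.execute(
    "SELECT payment_type, AVG(fare_amount), AVG(tip_amount), AVG(mta_tax) "
    "FROM yellow_taxi GROUP BY payment_type ORDER BY payment_type"
).fetchall()

# Revenue per pickup hour divided by the number of distinct days that hour
# appears on. substr() relies on the assumed 'yyyy-MM-dd HH:mm:ss' layout;
# in Hive, hour() and to_date() are alternatives for that format.
best_hour = conn.execute(
    "SELECT substr(pickup_datetime, 12, 2) AS hr, "
    "SUM(total_amount) / COUNT(DISTINCT substr(pickup_datetime, 1, 10)) AS avg_rev "
    "FROM yellow_taxi GROUP BY substr(pickup_datetime, 12, 2) "
    "ORDER BY avg_rev DESC LIMIT 1"
).fetchone()

print(total_trips, total_revenue, round(toll_fraction, 4), round(tip_fraction, 4))
print(avg_trip_amount, avg_distance, payment_types)
print(per_payment)
print(best_hour)
```

The last query reads "on average" as the mean revenue per calendar day for each pickup hour; summing total_amount per hour without dividing by the number of days is another defensible reading.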
Q1) Creating the table:
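A minimal sketch of the Hive statements for this step, assuming the table is named yellow_taxi and the CSV is comma-delimited with a single header line and no quoted commas inside fields. Python is used only to assemble the DDL from the column list in the problem statement and print it; the printed CREATE TABLE and LOAD DATA statements are what would be run in the Hive shell or Beeline. The file path is a placeholder. Note that DECIMAL(9,6) leaves only three digits before the decimal point, so a value of 1000 or more (for example an outlier total_amount) will not fit; Hive typically reads such values as NULL.

```python
# Column names and Hive types as listed in the problem statement.
COLUMNS = [
    ("vendor_id", "STRING"),
    ("pickup_datetime", "STRING"),
    ("dropoff_datetime", "STRING"),
    ("passenger_count", "INT"),
    ("trip_distance", "DECIMAL(9,6)"),
    ("pickup_longitude", "DECIMAL(9,6)"),
    ("pickup_latitude", "DECIMAL(9,6)"),
    ("rate_code", "INT"),
    ("store_and_fwd_flag", "STRING"),
    ("dropoff_longitude", "DECIMAL(9,6)"),
    ("dropoff_latitude", "DECIMAL(9,6)"),
    ("payment_type", "STRING"),
    ("fare_amount", "DECIMAL(9,6)"),
    ("extra", "DECIMAL(9,6)"),
    ("mta_tax", "DECIMAL(9,6)"),
    ("tip_amount", "DECIMAL(9,6)"),
    ("tolls_amount", "DECIMAL(9,6)"),
    ("total_amount", "DECIMAL(9,6)"),
    ("trip_time_in_secs", "INT"),
]


def hive_create_table(table, columns):
    """Build Hive DDL for a comma-delimited text file with one header line."""
    body = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


ddl = hive_create_table("yellow_taxi", COLUMNS)
# Placeholder path; LOCAL copies the file from the client's local filesystem.
load = ("LOAD DATA LOCAL INPATH '/path/to/2018_Yellow_Taxi_Trip_Data.csv' "
        "OVERWRITE INTO TABLE yellow_taxi;")
print(ddl)
print(load)
```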
Q2) Finding the number of records:
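Counting records is a plain COUNT(*). The sketch below runs it on a hypothetical three-row SQLite table named yellow_taxi (an assumed name) via Python's sqlite3 module; the SELECT itself is the same in HiveQL.

```python
import sqlite3

# In-memory SQLite stand-in for the Hive table; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT, total_amount REAL)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [("V1", 12.5), ("V2", 6.5), ("V1", 30.0)])

# Every row is one trip, so the row count is the trip count.
num_records = conn.execute("SELECT COUNT(*) FROM yellow_taxi").fetchone()[0]
print(num_records)  # 3
```

If the Hive table was created without skipping the CSV header line, the header is loaded as an extra row and the count comes out one higher than the number of trips.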
Q3) Information about the table:
Q4) Listing the distinct vendors:
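Distinct vendors come from SELECT DISTINCT. As a hedged local illustration, the query below runs on an in-memory SQLite table with hypothetical vendor codes V1 and V2 and an assumed table name, yellow_taxi; the statement is also valid HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?)",
                 [("V2",), ("V1",), ("V2",), ("V1",)])

# DISTINCT collapses repeated vendor_id values; ORDER BY makes the output stable.
vendors = [row[0] for row in conn.execute(
    "SELECT DISTINCT vendor_id FROM yellow_taxi ORDER BY vendor_id")]
print(vendors)  # ['V1', 'V2']
```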
Q5) Maximum number of passengers carried by a vendor:
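One reading of this question, and it is an assumption, is the largest passenger_count recorded on a single trip for each vendor. A sketch using GROUP BY with MAX on a hypothetical SQLite stand-in table (assumed name yellow_taxi, toy rows) follows; the same SELECT should work in Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT, passenger_count INTEGER)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [("V1", 1), ("V1", 4), ("V2", 2), ("V2", 6)])

# Largest passenger_count seen on any single trip, per vendor.
max_by_vendor = conn.execute(
    "SELECT vendor_id, MAX(passenger_count) FROM yellow_taxi "
    "GROUP BY vendor_id ORDER BY vendor_id"
).fetchall()
print(max_by_vendor)  # [('V1', 4), ('V2', 6)]
```

If only the single largest load across all vendors is wanted, SELECT vendor_id, passenger_count ... ORDER BY passenger_count DESC LIMIT 1 without the GROUP BY gives that instead.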
Q6) How many times each vendor carried passengers:
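Reading "how many times" as the number of trips per vendor, the answer is COUNT(*) grouped by vendor_id. The sketch also sums passenger_count in case the total number of passengers is what is wanted; both readings are assumptions. The table name yellow_taxi and the rows are hypothetical, and SQLite stands in for Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT, passenger_count INTEGER)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [("V1", 1), ("V1", 3), ("V2", 2)])

# trips = number of rows per vendor; passengers = total people carried.
per_vendor = conn.execute(
    "SELECT vendor_id, COUNT(*) AS trips, SUM(passenger_count) AS passengers "
    "FROM yellow_taxi GROUP BY vendor_id ORDER BY vendor_id"
).fetchall()
print(per_vendor)  # [('V1', 2, 4), ('V2', 1, 2)]
```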
Q7) Maximum tip ever given to a vendor:
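The largest tip is a single MAX over tip_amount. A minimal sketch on a hypothetical SQLite table (assumed name yellow_taxi, toy rows); the SELECT is also valid HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT, tip_amount REAL)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [("V1", 2.0), ("V2", 0.0), ("V1", 7.25)])

# MAX ignores NULLs, so trips with a missing tip do not affect the result.
max_tip = conn.execute("SELECT MAX(tip_amount) FROM yellow_taxi").fetchone()[0]
print(max_tip)  # 7.25
```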
Q8) Which vendor received the maximum tip, and how much was it:
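To get both the vendor and the amount, one approach is to sort trips by tip in descending order and keep the first row. The sketch runs this on a hypothetical SQLite table (assumed name yellow_taxi, toy rows); ORDER BY ... LIMIT is also supported in HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (vendor_id TEXT, tip_amount REAL)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [("V1", 2.0), ("V2", 7.25), ("V1", 0.0)])

# Sort trips by tip, largest first, and keep only the top row.
vendor, tip = conn.execute(
    "SELECT vendor_id, tip_amount FROM yellow_taxi "
    "ORDER BY tip_amount DESC LIMIT 1"
).fetchone()
print(vendor, tip)  # V2 7.25
```

If several trips share the maximum tip, LIMIT 1 returns only one of them; filtering on tip_amount = (the result of SELECT MAX(tip_amount)) would list every tied vendor.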
Q9) Fraction of total_amount paid as tolls_amount:
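The fraction is computed here as a ratio of sums: all tolls divided by all money collected. Averaging per-trip ratios instead would weight a cheap trip as heavily as an expensive one and would fail on any trip whose total_amount is 0. The sketch uses a hypothetical SQLite table (assumed name yellow_taxi, toy rows); the SELECT should run unchanged in Hive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (tolls_amount REAL, total_amount REAL)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [(0.0, 12.5), (5.5, 30.0), (0.0, 14.5)])

# Ratio of sums: total tolls over total revenue.
toll_fraction = conn.execute(
    "SELECT SUM(tolls_amount) / SUM(total_amount) FROM yellow_taxi"
).fetchone()[0]
print(round(toll_fraction, 4))  # 0.0965
```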
Q10) Average mta_tax paid per trip:
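The average tax per trip is AVG(mta_tax). A minimal sketch on a hypothetical SQLite table (assumed name yellow_taxi, toy rows); the same SELECT works in HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (mta_tax REAL)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?)",
                 [(0.5,), (0.5,), (0.0,), (0.5,)])

# Mean of mta_tax over all trips (NULL values, if any, are skipped by AVG).
avg_mta_tax = conn.execute("SELECT AVG(mta_tax) FROM yellow_taxi").fetchone()[0]
print(avg_mta_tax)  # 0.375
```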
Q11) Average trip time:
Q12) How many different payment types are used:
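Q11 and Q12 can be answered in one pass: AVG over trip_time_in_secs and COUNT(DISTINCT ...) over payment_type. The sketch runs on a hypothetical SQLite table (assumed name yellow_taxi); the payment codes CRD, CSH and NOC are toy values, not taken from the dataset. Both aggregates also exist in HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (trip_time_in_secs INTEGER, payment_type TEXT)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?, ?)",
                 [(600, "CRD"), (300, "CSH"), (900, "CRD"), (420, "NOC")])

# Q11: mean trip duration in seconds; Q12: number of distinct payment codes.
avg_trip_secs, n_payment_types = conn.execute(
    "SELECT AVG(trip_time_in_secs), COUNT(DISTINCT payment_type) FROM yellow_taxi"
).fetchone()
print(avg_trip_secs, n_payment_types)  # 555.0 3
```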
Q13) Average passenger count:
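Average occupancy is AVG(passenger_count). In both SQLite and Hive the result is a floating-point value, so it is not truncated to an integer even though the column is INT. The table name yellow_taxi and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (passenger_count INTEGER)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?)", [(1,), (2,), (1,), (4,)])

# AVG returns a floating-point result here even though the column holds integers.
avg_passengers = conn.execute("SELECT AVG(passenger_count) FROM yellow_taxi").fetchone()[0]
print(avg_passengers)  # 2.0
```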
Q14) Maximum time spent by a passenger on a trip:
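Assuming "time spent" means the duration of the longest single trip, the answer is MAX(trip_time_in_secs). The sketch below uses a hypothetical SQLite table (assumed name yellow_taxi, toy rows) and converts the result to minutes in Python for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (trip_time_in_secs INTEGER)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?)", [(600,), (1800,), (300,)])

# Longest recorded trip, in seconds and in minutes.
longest_secs = conn.execute("SELECT MAX(trip_time_in_secs) FROM yellow_taxi").fetchone()[0]
print(longest_secs, longest_secs / 60)  # 1800 30.0
```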
Q15) Distinct rate codes:
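The distinct rate codes come from SELECT DISTINCT on rate_code. A minimal sketch on a hypothetical SQLite table (assumed name yellow_taxi, toy codes); the statement is also valid HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi (rate_code INTEGER)")
conn.executemany("INSERT INTO yellow_taxi VALUES (?)", [(1,), (1,), (2,), (5,)])

# One row per distinct rate_code, sorted for a stable listing.
rate_codes = [row[0] for row in conn.execute(
    "SELECT DISTINCT rate_code FROM yellow_taxi ORDER BY rate_code")]
print(rate_codes)  # [1, 2, 5]
```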