Data-Science SQL Project Solution

Project Description
You're working as an analyst for Zuber, a new ride-sharing company that's launching in Chicago. Your task is to find patterns in the available information. You want to understand passenger preferences and the impact of external factors on rides.
Working with a database, you'll analyze data from competitors and test a hypothesis about the impact of weather on ride frequency.
Description of Data
A database with info on taxi rides in Chicago:
neighborhoods table: data on city neighborhoods
- name: name of the neighborhood
- neighborhood_id: neighborhood code

cabs table: data on taxis
- cab_id: vehicle code
- vehicle_id: the vehicle's technical ID
- company_name: the company that owns the vehicle
Table Scheme

Note: there isn't a direct connection between the tables trips and weather_records in the database. But you can still use JOIN and link them using the time the ride started (trips.start_ts) and the time the weather record was taken (weather_records.ts).
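As a rough illustration of the note above, the sketch below joins the two tables on the hour in which a ride started. It runs against an in-memory SQLite database with made-up rows. Only trips.start_ts and weather_records.ts come from the description; the columns trip_id and description, the hourly granularity of the weather records, and the use of SQLite's strftime are assumptions (the project database may use a different SQL dialect).

```python
import sqlite3

# In-memory database with invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER, start_ts TEXT);
CREATE TABLE weather_records (ts TEXT, description TEXT);
INSERT INTO trips VALUES
    (1, '2017-11-04 10:15:00'),
    (2, '2017-11-04 11:40:00'),
    (3, '2017-11-04 13:05:00');
INSERT INTO weather_records VALUES
    ('2017-11-04 10:00:00', 'light rain'),
    ('2017-11-04 11:00:00', 'clear sky');
""")

# Link each ride to the weather record taken in the same hour.
# Trip 3 has no weather record for its hour, so the INNER JOIN drops it.
query = """
SELECT t.trip_id, t.start_ts, w.description
FROM trips AS t
INNER JOIN weather_records AS w
    ON strftime('%Y-%m-%d %H', t.start_ts) = strftime('%Y-%m-%d %H', w.ts)
ORDER BY t.trip_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the join is an inner join, rides without a matching weather record simply do not appear in the result.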
Instructions on Completing the Project
https://practicum-content.s3.us-west-1.amazonaws.com/data-analyst-eng/moved_chicago_weather_2017.html (see 'parsing_data.py')
Step 2: Exploratory Data Analysis
(see 'databases.sql')
Step 3: Test the hypothesis that the duration of rides from the Loop to O'Hare International Airport changes on rainy Saturdays.
1) Retrieve the identifiers of the O'Hare and Loop neighborhoods from the neighborhoods table.
Ignore rides for which data on weather conditions is not available.
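The identifier lookup in item 1 could look roughly like the sketch below, again using an in-memory SQLite database. The columns name and neighborhood_id come from the data description; the ID values, the extra neighborhood, and the exact spelling of the names stored in the real table are assumptions.

```python
import sqlite3

# In-memory stand-in for the neighborhoods table; the IDs are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE neighborhoods (name TEXT, neighborhood_id INTEGER);
INSERT INTO neighborhoods VALUES
    ('Loop', 1), ('O''Hare', 2), ('Near North Side', 3);
""")

# Parameterized query: fetch the identifiers of the two neighborhoods.
ids = dict(conn.execute(
    "SELECT name, neighborhood_id FROM neighborhoods WHERE name IN (?, ?)",
    ("Loop", "O'Hare"),
).fetchall())
print(ids)
```

The resulting identifiers can then be used to filter rides by pickup and drop-off neighborhood.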
Step 4: Exploratory Data Analysis (Python)
In addition to the data you retrieved in the previous tasks, you've been given a second file. You now have these two CSVs:
For these two datasets you now need to:
- import the files
- study the data they contain
- make sure the data types are correct
- identify the top 10 neighborhoods in terms of drop-offs
- make graphs: taxi companies and number of rides, top 10 neighborhoods by number of drop-offs
- draw conclusions based on each graph and explain the results
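A minimal sketch of the import, type check, and top-N ranking, using only the standard library. The column names dropoff_location_name and average_trips are assumptions, since the file layout is not given here, and the sample rows are invented. In practice a library such as pandas would be a common choice for this step.

```python
import csv
import io

# Hypothetical stand-in for the drop-offs CSV; names and values are invented.
sample_csv = """\
dropoff_location_name,average_trips
Neighborhood C,120.0
Loop,300.0
Neighborhood D,80.5
O'Hare,250.0
"""

def top_dropoffs(csv_file, n=10):
    """Return the n neighborhoods with the most drop-offs, highest first."""
    rows = [
        # float() doubles as a type check: it raises on non-numeric values.
        (row["dropoff_location_name"], float(row["average_trips"]))
        for row in csv.DictReader(csv_file)
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

top = top_dropoffs(io.StringIO(sample_csv), n=3)
for name, trips in top:
    print(f"{name}: {trips}")
```

With a real file, io.StringIO(sample_csv) would be replaced by an open file handle, and n left at its default of 10.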
Step 5: Testing Hypotheses (Python)
project_sql_result_07.csv: The result of the last query. It contains data on rides from the Loop to O'Hare International Airport.
Test the hypothesis: "The average duration of rides from the Loop to O'Hare International Airport changes on rainy Saturdays."
Set the significance level (alpha) value on your own.
Explain:
- how you formed the null and alternative hypotheses
- what criterion you used to test the hypotheses and why
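One way to frame the test, sketched under stated assumptions: the null hypothesis is that the mean ride duration is the same on rainy and non-rainy Saturdays, and the alternative is that the means differ. The sketch below uses a two-sided permutation test on the difference in means because it needs only the standard library; a two-sample t-test (for example Welch's) is another common criterion. The durations are invented, and alpha = 0.05 is an arbitrary choice.

```python
import random
from statistics import mean

def permutation_test(sample_a, sample_b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)

# Invented ride durations in seconds, for illustration only.
rainy = [2400, 2540, 2280, 2610, 2450, 2390, 2700, 2520]
clear = [1900, 2050, 1980, 2100, 1850, 2010, 1940, 2080]

alpha = 0.05  # significance level, chosen by the analyst
p_value = permutation_test(rainy, clear)
reject_null = p_value < alpha
print(f"p-value: {p_value:.4f}, reject null: {reject_null}")
```

If the p-value falls below alpha, the null hypothesis of equal mean durations is rejected; otherwise the data do not provide enough evidence of a difference.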
