Create a connection to the Adventureworks Data Warehouse (“aw”) on a shared server
• Examine the Adventureworks Data Warehouse to become familiar with the structure of a
realistic star schema data warehouse
• Using a SQL query editor (such as MySQL Workbench)
1. Study the database and answer questions about its structure
2. Create and run queries against the warehouse to answer questions about the
content of the warehouse
3. Create and run queries against the warehouse to analyze Adventureworks’
business.
Assignment Problems:
NOTE: This is a Linux server and Linux commands are typically case sensitive. If
you attempt a query and it looks right but doesn’t work, be sure that the case
used in naming database objects in your SQL exactly matches object names
on the server.
NOTE: Be sure to read “Hints regarding the data” on page 4 of this document prior
to attempting to answer these questions.
1. How many rows are there in each of the dimension tables in this star schema? List the
dimension table name and its row count.
2. Use the information_schema to find out how many rows there are in the
internet_sales_fact table.
Hints:
• use information_schema;
• There is a table within information_schema called TABLES.
• The TABLES table contains columns TABLE_NAME and TABLE_SCHEMA.
• The row count for each table is in TABLES.TABLE_ROWS.
3. What standard table naming convention did the AdventureWorksDW database designers use
to differentiate dimension tables from fact tables in this star schema data warehouse?
4. What do you think is the purpose of the recursive relationship on DimEmployee?
5. What are the three model types (EnglishProductSubCategoryName) of bikes sold
by AdventureWorks?
6. List and compare the total sales dollar amounts of the three model types of bikes sold by
AdventureWorks for the years 2001 – 2004. In what year were the sales of RoadBikes the
highest?
7. List and compare the total sales quantities of bikes sold (all model types) by customer gender
by year and month. In which year and month were bike sales to females the highest?
HINT: you can format the output of a numeric column like this:
format(sum(UnitPrice),2)
8. For the year 2003, which model of bike (DimProduct.ModelName) yielded the highest
margin for AdventureWorks?
Appendix A - Background on the Company:
The AdventureWorks data warehouse is based on a fictitious bicycle manufacturing
company named Adventure Works Cycles. Microsoft created this company and its
databases (an OLTP database and a Star Schema Data warehouse) to assist people in
learning about database technologies.
Adventure Works produces and distributes metal and composite bicycles to North American,
European, and Asian commercial and consumer markets. The base of operations is located
in Bothell, Washington with about 500 employees, and several regional sales teams are
located throughout their market base.
Adventure Works sells products wholesale to specialty shops and to individuals through the
Internet. For the data warehouse exercises, you will work with the AdventureWorksDW
Internet sales tables, which contain realistic patterns that work well for data warehousing
exercises.
The Internet Sales star schema contains a fact table with data regarding customer
purchases of bicycles via the web.
The Internet Sales star schema contains information on several thousand customers. These
customers live in several countries, which are combined into three regions:
• North America (83%)
• Europe (12%)
• Australia (7%)
The database contains sales data covering several fiscal years.
The products in the database are broken down by subcategory, model, and product.
AdventureWorks does business in multiple countries, so some attributes in the data
warehouse contain descriptions in multiple languages.
In 2000, Adventure Works Cycles bought a small manufacturing plant, Importadores
Neptuno, located in Mexico. Importadores Neptuno manufactures several critical
subcomponents for the Adventure Works Cycles product line. These subcomponents are
shipped to the Bothell location for final product assembly. In 2001, Importadores Neptuno,
became the sole manufacturer and distributor of the touring bicycle product group.
Coming off a successful fiscal year, Adventure Works Cycles is looking to broaden its market
share by targeting their sales to their best customers, extending their product availability
through an external Web site, and reducing their cost of sales through lower production
costs.
As a bicycle manufacturing company, Adventure Works Cycles has the following four
product lines:
• Bicycles that are manufactured at the Adventure Works Cycles company.
• Bicycle components that are replacement parts, such as wheels, pedals, or brake
assemblies.
• Bicycle apparel that is purchased from vendors for resale to Adventure Works Cycles
customers.
• Bicycle accessories that are purchased from vendors for resale to Adventure Works
Cycles customers.
At Adventure Works Cycles, the purchasing department buys raw materials and parts used
in the manufacture of Adventure Works Cycles bicycles. Adventure Works Cycles also
purchases products for resale, such as bicycle apparel and bicycle add-ons like water bottles
and pumps.
The AdventureWorks Data Warehouse is updated by an ETL process that periodically pulls
data from the OLTP database and loads it into the data warehouse.
Hints regarding the data:
Granularity: A row in the FactInternetSales table represents the sale of one item by
AdventureWorks. The item sold might be a bike, an article of clothing, a biking accessory
(like a helmet), or a repair/replacement part for a bike. For this exercise, we will only look
at bike sales.
For analysis of sales in FactInternetSales, use the column UnitPrice to reflect the dollar
amount of sales. Use the column OrderQuantity to reflect the number of items sold. Use
the OrderDateKey to determine the date of the sale.
The column ProductStandardCost holds the dollar amount that it cost AdventureWorks to
build or obtain a bike. UnitPrice minus ProductStandardCost represents how much profit
or margin AdventureWorks made on the sale.
DimProductSubcategory.EnglishProductSubcategoryName contains a description that
identifies which type of item was sold. This columns holds a grouping of model types.
DimProduct.ModelName and DimProduct.EnglishProductName further identify more detailed
product/model information regarding the item sold. DimProduct.ModelName holds the
specific model of bike sold.
Appendix B – Creating your connection to the aw data warehouse
Steps to Connect the aw MySQL instance to your query editor.
To complete this assignment, you must connect your MySQL Workbench (or
query editor of your choice) to an instance of MySQL running on a VM within the
CU Boulder Computer Science department. The server name is elra-04. You will
connect to the server using the SSH protocol. Then while connected to that server,
you will establish a connection to the instance of MySQL running on that server.
Prerequisites:
SSH Hostname: elra-04.cs.colorado.edu
SSH Username: your CU Identikey username (ex. niri0478)
SSH Password: your CU Identikey password
MySQL Username: your CU Identikey username (ex. niri0478)
MySQL Password: “password”
Step 1:
This is the first screen you will see when you open MySQL Workbench. Navigate to the upper
left corner and click on “+” icon to create a new connection.
Step 2:
The setup new connection dialog window will pop up. Navigate to the dropdown menu titled
“Connection Method” and select Standard TCP/IP over SSH
CSCI 3287 Design and Analysis of Data Systems
Assignment #5 Data Warehouse Analysis
Page 7
Step 3:
Fill in the SSH Hostname as elra-04.cs.colorado.edu and enter your identikey as your SSH
Username.
Then click the button labeled “Store in Keychain” for SSH Password and type in your identikey
password when prompted.
Step 4
Leave the first two fields in the red box as is. Then enter your MySQL login credentials:
Enter your CU Identikey name in the username. Then, select “Store in Keychain” and when
prompted, type in the password “password”.
Step 5
You must give your new connection a name. Call it anything you like. For example, “aw”,
“HW5” or “AdventureworksDW” or something similar.
CSCI 3287 Design and Analysis of Data Systems
Assignment #5 Data Warehouse Analysis
Page 10
Step 6:
Test the connection by clicking “Test Connection”. If you have followed the steps correctly the
window shown above will appear.
Once it does, click “OK” on the popup window, then “OK” on the configuration window.
CSCI 3287 Design and Analysis of Data Systems
Assignment #5 Data Warehouse Analysis
Page 11
Step 6:
Your list of connections will now display the new one you just created.
Click on the new connection to launch a MySQL Workbench query editor session against your
connection to the “aw” instance of MySQL on elra-04.
CSCI 3287 Design and Analysis of Data Systems
Assignment #5 Data Warehouse Analysis
Page 12
Appendix C – AdventureWorks Data Warehouse Data Model