Download - Data Warehouse and OLAP - Lear-Fabini
Data Warehouses,
OLAP, and the
CUBE operator
Jake Lear, Scott Fabini
CS586 - Winter 2015
OLTP/OLAP ComparisonOLAP: On-Line
Analytic Processing
OLTP: On-Line
Transaction Processing
OLTP Query for A Single
Sales Order in an SAP DB
OLAP Query for revenue
by Geography / Product-
Line in an IBM Cognos DB
OLTP/OLAP Comparison
OLTP
• Simple, frequent queries
• Up-to-the-minute information
• Fast response time needed
OLAP
• Few queries, but very complex
• Lots of indexes
• May run for hours
• Doesn’t need up-to-the-minute info
OLTP QueriesOLAP Query
Animation
OLAP Queries on a DB
OLAP Queries
• We have a long-running transaction
analyzing all sales data.
• What about the new sales data
coming into the database while our
transaction is running?
• Talk to your neighbor. Think in
terms of serializable transactions
and locking mechanisms…
• Recall, Reads get an S-Lock.
Writers forbidden from
getting their X-Lock
Stalled Transactions!
Animation
OLAP
Data Warehouse
OLTP
Database
Copy
Over
night
Support fast transactionsDedicated for Analytics of
Aggregate Data
Animation
The Data Warehouse Congregates
Multiple Data Sources
CRM
SCP
ERP
Data Warehouse
Supply Chain
(Vendor) Data
Factory
Planning Data
Sales/
Marketing Data
OLAP Queries
What is our worst-selling vehicle?
Do we need another factory to meet demand?
Should we focus sales on Europe or Japan?
Summarizing the
Motivation for OLAP• We want to add lots of Indexes for faster search, and perform
long-running transactions, but this slows down the DB
• We only need to look at the long term trends, not up-to-the-
second data
• We may want to aggregate data from multiple DBs
• Answer: Move the data to a Data Warehouse for targeted
analysis. Use OLAP techniques which we learn today to slice
& dice the massive aggregate into discrete actionable data.
Auto Sales Example
• Say we want to record in a data warehouse all auto sales
• We would create a Table with key info for the sale
• Serial Number, Dealer Name, Date of Sale, Price of Sale
• The relation becomes our Fact Table:
• sales(auto, dealer, date, price)
Visualizing Multidimensional
Data
• Fact Table: a central
relation or collection
of data
• E.g. all information
about a sale of an
automobile
• Dimensions:
Represent the
properties of the sale Dealer
Days
Auto
Dealer
Auto
Dicing the Data
• The Fact Table is ‘diced’
(divided) by its dimensions
- the Dimension Tables
Date
Boring Barney’s
Witty Wilma’s
Friendly Fred’s
Go
bi
Aa
rdva
rk
200120022003200420052006
Price
Dependent
AttributesDimension attributes
AutoDealerDate
ColorModelSerial#
Phone#CityNameYearMonthDay
Dealer Dimension
Auto Dimension
Days Dimension
Price
Sales Fact Table
Implementation: Star Schema
Date
Dimension
allows us to
aggregate by
year, month,
or week
Dealer
Dimension
allows us to
aggregate by
dealer
Auto
Dimension
allows us to
aggregate by
Model or Color
Animation
Dependent
Attributes help
group Aggregate
data, (like sum of
prices of autos
sold)
-
Working within a Star Schema
• A Star Schema example can be found in my personal
‘sfabini’ CAT database
• Connect by issuing the following command:
Please do not alter the database
ColorModelSerial#
Phone#CityName
YearMonthDay
Dealer Dimension
Auto Dimension
Days Dimension
Price
Implementation: Star Schema
Dimensions
-
Price
Dependent
AttributesDimension attributes
Auto DealerDate
ColorModelSerial# Phone#CityName
YearMonthDay
Dealer DimensionAuto Dimension
Date Dimension
Price
Sales Fact Table
Implementation: Star Schema
Fact Table
-
Dealer
Auto
Dicing the Data
• Recapping, does everyone see
the ‘dimensions’ and how
they divide the Fact Table?
• Date divided along x axis
• Different dealers on y axis
• Different autos on z axis
• Within the cube (fact table) are
the dots -- individual auto sales,
including how much it sold for (price)
Date
Boring Barney’s
Witty Wilma’s
Friendly Fred’s
Go
bi
Aa
rdva
rk
200120022003200420052006
Dealer
Auto
Exercise: Slicing the Data
• Find Average Sales Price
by State
SELECT state, AVG(price)
FROM sales, dealers
WHERE sales.dealer =
dealer.name
GROUP BY state;
Date
Boring Barney’sWitty Wilma’s
ArizonaN
ew M
exico
Friendly Fred’s
Try it yourself!
Which state’s dealers maximize ASP
(average selling price)?
Dealer
Auto
Exercise: Drilling Down
• Find Total Sales Price for Gobis at
Friendly Fred’s
SELECT color, SUM(price)
FROM sales,
NATURAL JOIN autos
JOIN dealer ON name = sales.dealer
WHERE sales.dealer = ‘Friendly Fred’
AND
Model = ‘Gobi’
GROUP BY color;
Date
Boring Barney’sWitty Wilma’s
ArizonaN
ew M
exico
Friendly Fred’s
Try it yourself!
Joining tables in your query, and then
Grouping ‘drills down’ the table
Rolling Up and Drilling Down
• Drill-Down: Focusing on specific values in certain
dimensions
• looking at the sales of red Gobis at Friendly Fred’s
• Roll-Up: Aggregate along one or more dimensions
• Looking at all colors as groups to find the best selling color overall
Foundations of the CUBE
Operator
Dealer
Date
AutoSum of price of
all autos, by a
specific dealer,
on all dates
Sum of price of a
specific auto, by a
specific dealer, on
all datesPrice of a specific
auto, by a specific
dealer, on a
specific date
The Cube
operator adds a
border of
aggregations to
the data cube
across all
combinations of
dimensions.
Sum of price of all
autos, by all
dealers, on all
dates
CUBE vs. ROLLUP Operators
• WITH CUBE: If we add the term WITH CUBE to a group-by
clause, then we get not only the tuples for the group , but
also the tuples that represent aggregation along one or
more dimensions along which we have grouped
• WITH ROLLUP: A variant of the CUBE operator that
produces the additional aggregated tuples over the tail of
the sequence of grouping attributes
Working with a Cube and
phpMyAdmin
• We need to use MySQL to demonstrate Cube/Rollup
• A Cube example can be found in my personal ‘sfabini’ CAT
database
• Connect via website: cat.pdx.edu/phpMyAdmin/
Working with a Cube
• We need to use MySQL to demonstrate Cube/Rollup
• A Cube example can be found in my personal ‘sfabini’ CAT
database
• Connect by issuing the following command:
ROLLUP Query in MySQL
• SELECT model, color, DATE, dealer, SUM( val ) ,
SUM( cnt )
FROM SalesCubeGROUP BY model, color, DATE, dealer
WITH ROLLUP
Results: Auto Sales WITH ROLLUP
in MySQL
Aggregate of blue Gobis sold
Aggregate of red Gobis sold
Aggregate of all Gobis sold
Aggregate of all cars sold
Aggregate of red Gobis sold by any dealer on this date
CUBE Query in SQL99
• SELECT model, color, DATE, dealer, SUM( val ) ,
SUM( cnt )
FROM SalesCubeGROUP BY model, color, DATE, dealer
WITH CUBE
• CUBE Query Result (Partial):
• Rollup Query Result:
Comparison: WITH ROLLUP vs.
WITH CUBE results
All colors of Gobis sold at Friendly Fred’s on 5/21
Aggregate of all Gobis sold
Aggregate of all autos sold
Aggregate of all Gobi sales by any dealer on 5/21
… all such combinations are listed WITH CUBE
Red Gobis sold at Friendly Fred’s on 5/21
Red Gobis sold at any dealer on 5/21
Red Gobis sold at any dealer on any dateAny Gobi sold at any dealer on any date
Aggregate of all autos sold
Homework
• 1a) Use the ‘sfabini’ mysql db to answer Exercise 10.7.2 (a)
and (b) from the book
• Notes:
• Execute your query FROM the SalesCube table
• Use the WITH ROLLUP operator, instead of WITH CUBE
• Smilin Sally is spelled without the apostrophe in the DB
• See Example 10.33 for further hints on using WITH ROLLUP