starring sakila mysql university 2009
TRANSCRIPT
![Page 1: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/1.jpg)
Starring Sakila
A Data Warehousing Primer
Roland Bouman (Strukton Rail)http://rpbouman.blogspot.com/
![Page 2: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/2.jpg)
Starring Sakila
Topics
![Page 3: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/3.jpg)
Data Warehousing Terminology
● Terminology– Business Intelligence– Data Warehouse– Dimensional Model– Star Schema– OLAP– Cube
![Page 4: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/4.jpg)
What is Business Intelligence?
● Business Intelligence (BI)– Skills, technologies, applications and
practices to acquire a better understanding of the commercial context of your business.
● Data Warehouse● Dimensional Model● Star Schema● OLAP● Cube
![Page 5: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/5.jpg)
What is a Data Warehouse?
● Business Intelligence● Data Warehouse
– A database designed to support Business Intelligence
● Dimensional Model● Star Schema● OLAP● Cube
![Page 6: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/6.jpg)
What is the Dimensional Model?
● Business Intelligence● Data Warehouse● Dimensional Model
– A logical data model that divides data in two kinds: Facts and Dimensions
● Star Schema● OLAP● Cube
![Page 7: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/7.jpg)
What is a Star Schema?
● Business Intelligence● Data Warehouse● Dimensional Model● Star Schema
– Physical implementation of the Dimensional Model on a RDBMS which maps a dimension to a single table
● OLAP● Cube
![Page 8: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/8.jpg)
What is OLAP?
● Business Intelligence● Data Warehouse● Dimensional Model● Star Schema● OLAP
– On-Line Analytical Processing: querying muli-dimensional data, cornerstone of most BI applications
● Cube
![Page 9: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/9.jpg)
What is a Cube
● Business Intelligence● Data Warehouse● Dimensional Model● Star Schema● OLAP● Cube
– Multi-dimensional data structure suitable for OLAP queries
![Page 10: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/10.jpg)
Understanding your Business
Business Intelligence
![Page 11: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/11.jpg)
Business Intelligence
● Front end Applications:– Reports– Charts and Graphs– OLAP Pivot tables– Data Mining– Dashboards
● Back end, Infrastructure– ETL
● Extract● Transformation● Load
– Data Warehouse– Data Mart– Metadata– ROLAP Cube
![Page 12: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/12.jpg)
High Level BI Architecture
Source Systems,External Data
Staging Area
Data Warehouse
Meta Data
Business IntelligenceApplications
Extract Transform Load Present
Back-end Front-end
![Page 13: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/13.jpg)
Business Intelligence Database
DataWarehouse
![Page 14: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/14.jpg)
Data Warehouse
● Ultimately, it's just a Relational Database– Tables, Columns, Keys...
● ...But designed for BI applications– Ease of use– Performance
● Data from various source systems– Integration, Standardization, Data cleaning– Add and maintain history
![Page 15: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/15.jpg)
OLTP vs OLAP: Application Characterization
● OLTP– Operational– 'Always' on– All kinds of users– Many users– Directly supports
business process– Keep a Record of
Current status
● OLAP– Tactical, Strategic– Periodically Available– Managers, Directors– Few(er) users– Decision support,
long-term planning– Maintain history
![Page 16: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/16.jpg)
OLTP vs OLAP: data processing
● OLTP– Subject Oriented– Add, Modify, Remove
single rows– Human data entry– Queries for small sets
of rows with all their details
– Standard queries
● OLAP– Aspect Oriented– Bulk load, rarely
modify, never remove– Automated ETL jobs– Scan large sets to
return aggregates over arbitrary groups
– Ad-hoc queries
![Page 17: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/17.jpg)
OLTP vs OLAP: database schema organization
● OLTP– Entity-Relationship
model– Entities, Attributes,
Relationships– Foreign key
constraints– Indexes to increase
performance– Normalized to 3NF or
BCNF
● OLAP– Dimensional
model– Facts, Dimensions,
Hierarchies– Ref. integrity ensured
in loading process– Scans on Fact table
obliterates indexes– Denormalized
Dimensions (<= 1NF)
![Page 18: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/18.jpg)
Organizing data to suit Business Intelligence
DimensionalModel
![Page 19: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/19.jpg)
The Dimensional Model
● Two kinds of data– Facts– Dimensions
![Page 20: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/20.jpg)
The Dimensional Model:Facts
● Facts– Measures/Metrics of a Business Process– Typical Metrics
● Cost, Units Sold, Profit
![Page 21: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/21.jpg)
The Dimensional Model: Dimensions
● Dimensions– Describe aspects of Business Process– Dimensions typically not inter-dependent– Who? What? Where? When? Why?– Typical Dimensions:
● Customer (who?), Product (what?), Date/Time (when?)
![Page 22: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/22.jpg)
The Dimensional Model: Navigating Facts with Dimensions
● Dimension Attributes organized in Hierarchies– Date dimension examples:
● Year, Quarter, Month, Day● Year, Week, Day
● Metrics typically numeric and additive● Navigate fact data
– Choose particular values for dimension– Aggregate fact data at chosen level of
hierarchy
![Page 23: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/23.jpg)
Dimensional Example: Crosstab
Date Dimension 2008 Q4
Location Dimension All Months October November December
All locations $ 3850 $ 1000 $ 1350 $ 1500
America All America $ 2050 $500 $ 750 $ 800
North $ 1275 $ 300 $ 500 $ 475
South $ 775 $ 200 $ 250 $ 325
Europe All Europe $ 1800 $ 500 $ 600 $ 700
East $ 800 $ 250 $ 250 $ 300
West $ 1000 $ 250 $ 350 $ 400
![Page 24: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/24.jpg)
Dimensional Model Implementation
Star Schema
![Page 25: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/25.jpg)
Stars SchemaCharacteristics
● Central Fact Table– Columns for storing Metrics– 'Foreign Key' columns point to Dimension– Typically normalized and not pre-aggregated
● Dimension maps to a Dimension table– Surrogate key– Descriptive attributes organized in hierarchies– No Foreign Keys to other tables– Typically heavily denormalized
![Page 26: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/26.jpg)
Rentals
Star Schema example: Sakila Rentals
StoreDate
TimeFilm
CustomerStaff
![Page 27: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/27.jpg)
Stars Schema Characteristics
● Star schema is 'just' an implementation– Optimized for simplicity– Optimized for performance (?)– Heavily denormalized dimensions
● Snowflake: Star Schema Alternative– Still a dimensional model– Still a central fact table– Normalized dimensions– Easier maintenance of dimensions
![Page 28: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/28.jpg)
Snow Flake example: Sakila Rentals
StoreDate
Minute
Film
Customer
Staff
Month
Hour
Quarter City
Country
City
Country
Language
Rating
Year
Week Rentals
![Page 29: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/29.jpg)
Starring Sakila
DesingingStar Schemas
![Page 30: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/30.jpg)
Dimensional Design
● Select Business Process– Sales, Purchase, Storage, Transport, ...
● Define Facts and Key Metrics– Facts: Key Event in Business Process– Metrics (Fact Attributes): Count or Amount
● Choose Dimensions and Hierarchies– What? When? Where?– Who? Why?
![Page 31: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/31.jpg)
Dimensional Model example
● MySQL Sample Database– http://dev.mysql.com/doc/sakila/en/sakila.html
● DVD rental business– Overly simplified database schema
● Typical OLTP database
![Page 32: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/32.jpg)
Dimensional Model example
● Rental Business Process– Customer visits store, picks DVD– DVD taken out of store inventory by staff member– Customer returns home and enjoys DVD– Customer returns to store with DVD– DVD returned to staff member– Staff member collects payment made by customer
![Page 33: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/33.jpg)
![Page 34: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/34.jpg)
3NF Source schema: Sakila Rentals
Rental Customer
Film
Store Address
Category Actor
StaffInventory
City
CountryLanguage
![Page 35: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/35.jpg)
Example Business Process:Rentals
● Select Business Process– Rentals
● Identify Facts– Count (number of rentals)– Rental Duration
● Choose Dimensions– What: Films– When: Rental, Return
– Who: Customer, Staff– Where: Store
![Page 36: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/36.jpg)
Target Star Schema
Fact: Rentals
Store
Date
Time
Film
When?
Where?
What?
CustomerStaff
Who?
![Page 37: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/37.jpg)
Rental Star Schema
![Page 38: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/38.jpg)
A star is born: Rentals 3NF
Rental
CustomerStaffInventory
![Page 39: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/39.jpg)
A star is born: Rentals 3NF
Rental
CustomerStaffInventory
StoreFilm
Category
Film Category
![Page 40: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/40.jpg)
A star is born: Denormalize
Rental
CustomerStaffInventory
StoreFilm
Category
Film Category
![Page 41: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/41.jpg)
A star is born: Denormalize
Rental
CustomerStaff
StoreFilm
StoreCategory
![Page 42: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/42.jpg)
A star is born
Rental
CustomerStaff
StoreFilm
Store
Address
Category
![Page 43: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/43.jpg)
A star is born: Denormalize
Rental
CustomerStaff
StoreFilm
Store
Address
Category
![Page 44: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/44.jpg)
A star is born: Denormalize
Rental
CustomerStaff
StoreFilm
Store
AddressAddress
Language
Category
![Page 45: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/45.jpg)
A star is born: Denormalize
Rental
CustomerStaff
StoreFilm
Store
AddressAddress
LanguageCityCity
Category
![Page 46: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/46.jpg)
A star is born: Rental Snowflake
Rental
CustomerStaff
StoreFilm
Store
AddressAddress
LanguageCityCity
CountryCountry
Category
![Page 47: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/47.jpg)
A star is born: Rental Star Schema
Rental
StoreLanguageFilm
CountryCity
What: Film Who: CustomerWhere: Store Who: Staff
AddressStore
StaffCountry
CityAddress
CustomerCategory
![Page 48: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/48.jpg)
Dimensional Design
● Something is missing....– Who ? (Customer, Staff)– What ? (Film)– Where ? (Store)– .... ?
![Page 49: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/49.jpg)
A star is born:Rental Date and Time
Rental
What: Film Who: CustomerWhere: Store Who: Staff
When: Date When: Time
![Page 50: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/50.jpg)
Role Playing: Date/Timefor both Rentals and Returns
Rental
What: Film Who: CustomerWhere: Store Who: Staff
When:Rental Date
When:Rental Time
When:Return Date
When:Return Time
![Page 51: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/51.jpg)
Denormalization through Joins
![Page 52: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/52.jpg)
Denormalization through Flattening (Repeating Group)
![Page 53: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/53.jpg)
ETL withPentaho Data Integration
Loading aData Warehouse
![Page 54: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/54.jpg)
Dimensional Design
● Pentaho Data Integration– sourceforge.net/projects/pentaho/
● ETL and much more● Transformations:
– Extract, Load and Transform● Jobs:
– Organize multiple transformations to a complete ETL process
● > 30 RDBMS-es, > 130 Transformation Steps
![Page 55: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/55.jpg)
Job: Rental ETL Process
● First load dimensions, finally load fact● Mail notification in case of success / failure
![Page 56: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/56.jpg)
Job: Rental ETL Process
● First load dimensions, finally load fact● Mail notification in case of success / failure
![Page 57: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/57.jpg)
Job: Rental ETL Process
● Get store, lookup address (subtransformation) and manager
● Load store dimension table
![Page 58: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/58.jpg)
Job: Rental ETL Process
● Get store, lookup address (subtransformation) and manager
● Load store dimension table
![Page 59: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/59.jpg)
Job: Rental ETL Process
● Get address, lookup city and country● Concatenate address if necessary
![Page 60: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/60.jpg)
Job: Rental ETL Process
● This was just a simple example● More complex example: importing XML<?xml version="1.0" encoding="UTF-8"?><result> <actors> <actor id="00000015">Anderson, Jeff</actor> <actor id="00000015">Anderson, Jeff</actor> .. </actors> <videos> <video> <title>The Fugitive</title> <genre>action</genre> .... </video> ... </videos></result>
![Page 61: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/61.jpg)
Job: Rental ETL Process
● This was just a simple example● More complex example: importing XML
![Page 62: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/62.jpg)
OLAP Pivot Table withPentaho Analysis Services
OLAP
![Page 63: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/63.jpg)
Dimensional Design
● Pentaho Analysis Services– Part of Pentaho BI Server– sourceforge.net/projects/pentaho/– Based on Mondrian ROLAP server– sourceforge.net/projects/mondrian/
![Page 64: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/64.jpg)
Dimensional Design
● Pentaho Schema Workbench– Map data warehouse tables to a logical Cube
![Page 65: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/65.jpg)
Dimensional Design
● Pentaho Analysis View:
![Page 66: Starring Sakila MySQL University 2009](https://reader034.vdocuments.us/reader034/viewer/2022050719/543a29aeafaf9fbe2e8b58bb/html5/thumbnails/66.jpg)
Upcoming Book: Pentaho Solutions
● Pentaho Solutions– Wiley– ISBN 978-0-470-48432-6– September 2009– 630+ page paperback– Amazon pre-order $31.50– Regular: $50.00