creating the dimensional model
DESCRIPTION
Creating the Dimensional Model. Data Warehouse Database Design Objectives. Easy to understand Optimum performance Adaptable. Data Warehouse Data Type. Description. Fact (detail, atomic, raw) base data. Business measures. Dimension data. Query drivers. Derived fact data. - PowerPoint PPT PresentationTRANSCRIPT
4Creating the Dimensional Model
Data Warehouse Database Design Objectives
• Easy to understand
• Optimum performance
• Adaptable
Data Warehouse Data Types
Data Warehouse Data Type DescriptionFact (detail, atomic, raw) base data Business measuresDimension data Query driversDerived fact data Calculated dataSummary (aggregate) data Pre-calculated dataMetadata Warehouse map
Designing the Dimensional Model
Phase III: Defining the dimensional model
Star Dimensional Modeling
History(Dimension
table)
Customer(Dimension
table)
Product(Dimension
table)
Channel(Dimension
table)
Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount…
(Fact table)
Order
Advantages of Using a Star Dimensional Model
• Supports multidimensional analysis
• Design improves performance
• Optimizers yield better execution plans
• Parallels end-user perceptions
• Provides an extensible design
• Broadens the choices for data access tools
Fact Tables
• Contain numerical metrics of the business
• Can hold large volumes of data
• Can grow quickly
• Can contain base, derived, and summarized data
• Are typically additive
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Factless Fact Tables
• Event tracking
• Coverage
More on Factless Fact Tables
Emp_FKSal_FKAge_FKEd_FKGrade_FK
Grade dimensionGrade_PK
Education dimensionEd_PK
Employee dimensionEmp_PK
Salary dimensionSal_PK
Age dimensionAge_PK
PK = Primary Key & FK = Foreign Key
Identify Base and Derived Measures
• Identify candidate facts
• Remove duplicate facts
• Discover and document the underlying calculations
• Cross reference base facts
• Obtain final derived fact approval
Base and Derived Data
Payroll table
Derived dataBase data
Emp_FK Month_FK Salary Comm Comp101 05 1,000 0 1,000102 05 1,500 100 1,600103 05 1,000 200 1,200104 05 1,500 1,000 2,500
Translating Business Measures into a Fact Table
Business measures
Facts
Business MeasuresNumber of ItemsAmountCostProfit
FactNumber of ItemsItem Amount
Item CostProfit
BaseBaseBaseDerived
Fact Table Measures
Nonadditive:Cannot be added
along any dimension
Semiadditive: Added along some
dimensions
Additive: Added across all
dimensions
Dimension Tables
• Contain descriptors of the business
• Contain relatively static data
• Contain textual and discrete data
• Are usually smaller than fact tables
• Are joined to a fact table through a foreign key reference
Translating Business Dimensions into Dimension Tables
Dimension TablesBusiness Dimensions
Customer
Product
Store
Day
Source System Information
Translating a Product Business Dimension into a
Product Dimension Table
Product_statusList_price
Business Dimension for Product
Product_Id (Natural Key)
Product_desc
Supplier_Id
Product_name
Category
Promotion_CodeWarehouse_location
Product_typeProduct_code
Catalog_Id
Business DimensionProductProductCategorySupplier
TypeWarehousePromotion
Catalog
Date Dimension
• Should contain the attributes required by all fact
tables.
• Can be categorized into 4 groups
• Date formats
• Calendar date attributes : day, month name,
year
• Fiscal attributes : fiscal week, fiscal period
• Indicator columns : Boolean values such as it
is a national holiday
Slowly Changing Dimensions
Where Product_key is a calculated number stored within the database
Business Dimension for Product
Product_Id (Natural Key)
Product_desc
Supplier_Id
Product_name
Category
Product_PK (surrogate key)
Product_typeProduct_code
Catalog_Id
Product_statusList_price
Valid_from_dateValid_to_date
Promotion_codeWarehouse_location
Business DimensionProductProductCategorySupplier
TypeWarehousePromotion
Catalog
Types of Database Keys
• Primary keys (PKs)
• Foreign keys (FKs)
• Composite keys
• Surrogate keys
Using Surrogate Keys
Advantages of surrogate keys include:
• Control over data
• Reduced fact table size
Avoid using the following as data warehouse keys:
• OLTP Production (natural) keys
• Smart keys (embedded meaning)
Product key: 38972
Surrogate key
Surrogate Keys Example
Emp_FK Salesperson_ID Salesperson_Name Manager_ID Emp_Change_Date
1 1 Smith 200 030199
2 2 Jones 300 050599
3 3 Harvey 300 060599
22 1 Smith 400 061001
Surrogate Keys
Prod_FK Prod_ID Prod_Name Prod_Grouping Brand_Code Prod_Change_Date
1 073258 Coffee Hot YUBN 032200
2 073258 Coffee Hot MAXH 110100
3 011172 Pop Cold SCHW 061001
4 011173 Tea Hot RRSE 061001
Adding a Surrogate Key
Channel, product, promotion, and time surrogate keys are added.
These can be used to build aggregate tables.
Business Dimensionfor Sales
Sales_channel_FK(surrogate key)
Sales_promotion_FK (surrogate key)
Units
Sales_product_FK (surrogate_key)
Cost
Discount
Margin
Sales_amt
Sales_time
Bracketed Dimensions
• Enhance performance and analytical capabilities
• Create groups of values for attributes with many unique values, such as income ranges and age brackets
• Minimize the need for full table scans by pre-aggregating data
Bracketing Dimensions
Customer_PKBracket_FK
Bracket_PK
Customer_PKBracket_FK
Bracket dimension
Customer dimension
Income fact
Bracket_PK Income (10Ks) Marital Status Gender Age
1 60-90 Single Male <21
2 60-90 Single Male 21-35
3 60-90 Single Male 35-55
4 60-90 Single Male >55
5 60-90 Single Female <21
6 60-90 Single Female 21-35
Models for Hierarchical Data
• Analytical activities using hierarchies are supported through different models:
– Business
– Multiple
– Multiple time
• Hierarchical data is stored in dimension tables.
• Dimensions can contain one or more hierarchies.
Identifying Analytical Hierarchies
Store dimension
Store IDStore DescLocationSizeTypeDistrict IDDistrict DescRegion IDRegion Desc
Business hierarchies describe organizational structure and logical parent-child relationships within the data.
Region
District
Store
Organization hierarchy
Multiple Hierarchies
Store IDStore DescLocationSizeTypeDistrict ID District DescRegion IDRegion DescCity IDCity DescCounty IDCounty DescState IDState Desc
Region
City
Store
Organization hierarchy
Store dimension
State
District
Store
Geography hierarchy
County
Multiple Time Hierarchies
Fiscal year
Fiscal quarter
Fiscal month
Fiscal time hierarchy
Fiscal week
Calendar year
Calendar quarter
Calendar month
Calendar time hierarchy
Calendar week
Store 5Store 1 Store 2
Region 2
District 2 District 4
Drilling Up and Drilling Down
Store 4
Group
Market Hierarchy
Region 1
District 1
Store 6Store 3
District 3
Region
District
Drilling Across
Stores > 20,000 sq. ft.
Group
Market hierarchy
Region
District
Store Store City
City
City hierarchy
Documenting the Granularity of Dimensions
• Is an important design consideration
• Determines the level of detail
• Is determined by business needs
Low-level grain (Transaction-level data)
High-level grain (Summary data)
Defining Time Granularity
Fiscal Time Hierarchy
Current dimension grain
Fiscal Year
Fiscal Quarter
Fiscal Month
Fiscal Week
Day Future dimension grain
History_PK
. . . .
A Star Dimensional Model
Denormalized Model
Customer
HistoryOrder
History_FKCustomer_FKProduct_FKChannel_FK
Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt…
Product
Channel
Channel_PK
. . . .
Customer_PK
. . . .
Product_PK
. . . .
Star Dimensional Model Characteristics
• The model is easy for users to understand.
• Primary keys represent a dimension.
• Nonforeign key columns are values.
• Facts are usually highly normalized.
• Dimensions are completely denormalized.
• Fast response to queries is provided.
• Performance is improved by reducing table joins.
• End users can express complex queries.
• Support is provided by many front-end tools.
Snowflake Model
Web
History_PK
. . . .
Customer
HistoryOrder
History_FKCustomer_FKProduct_FKChannel_FK
Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt…
Product
Channel
Channel_PK
Web_PKChannel_desc
Customer_PK
. . . .
Product_PK
. . . .
Web_PK
Web_url
Constellation Configuration
Atomic fact
Updating the Meta Data
• Dimensions and attributes detail (Primary key, attribute definition, and so on)
• Facts and measures detail (Measure description, additivity, and so on)
• Data source definitions (Business owner, platform, description, and so on)
• Source to target data mappings (Data type, length, target column description, and so on)