data warehouse modeling
DESCRIPTION
TRANSCRIPT
![Page 1: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/1.jpg)
Data Warehouse Modeling
Thijs Kupers
Vivek Jonnaganti
![Page 2: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/2.jpg)
Agenda• Introduction
• Data Warehousing Concepts
• OLAP
• Dimension Modeling
• Conceptual Modeling
• Indexing
• Conclusion
![Page 3: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/3.jpg)
Introduction
![Page 4: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/4.jpg)
The Evolution• 1960 - DSS processing using Fortron or COBOL
• 1970 - DBMS systems and the advent of DASD
• 1975 - OLTP systems facilitating faster access to data
• 1980 - PC/4GL technology and the advent of MIS
• 1985 - OLAP systems and separation of analytical processing from transactional processing
• 1994 - Architectured environments with integrated OLAP engines and tools
![Page 5: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/5.jpg)
What is a Data Warehouse?• A copy of transaction data specifically structured to Query
and Analysis (Ralph Kimball, 1996)
• A collection of integrated, subject oriented databases designed to support the DSS function where each unit of data is relevant at some moment of time (Bill Inmon, 1991)
• The data characteristics of a Data Warehouse are;• Subject-oriented
• Time-variant
• Non-volatile
• Integrated
![Page 6: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/6.jpg)
What is a Data Warehouse? (cont’d)
• A single, complete and consistent store of data obtained from a variety of different sources made available to end users, in what they can understand and use in a business context (Barry Devlin 1992)
• A process of transforming data into information and making it available to users in a timely enough manner to make a difference (Forrester Research 1996)
![Page 7: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/7.jpg)
Data Warehouse Goals/Characteristics• It must make an organization’s information easily accessible
(slicing and dicing)
• It must present the organization’s information consistently
• It must be adaptive and resilient to change
• It must be a secure bastion that protects our information assets
• It must serve as the foundation for improved decision making
• The business community must accept the DW, if it is to be deemed successful
![Page 8: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/8.jpg)
Data Warehouse Applications• Retail Industry
• Forecasting, Market research, Merchandising etc.
• Manufacturing and distribution• Sales history/trends, Market demand projects etc.
• Banks• Spot market trends, Marketing, Credit cards etc.
• Insurance Companies• Property and casualty fraud etc.
• Health Care Providers• Fraud detection, Patient matching etc.
![Page 9: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/9.jpg)
Data Warehouse Applications• Government Agencies
• Auditing tax records, information sharing across different agencies etc.
• Internet Companies• Analyzing shopping behavior, CRM etc.
• Telecommunications• Telemarketing, Product development etc.
• Sports• Analyzing strategies, Winning player combinations etc.
![Page 10: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/10.jpg)
Data Warehouse Sizes• Terabyte (10^12) - Walmart (24 TB)
• Petabyte (10^15) - Geographic Information Systems
• Exabyte (10^18) - National Medical Association
• Zettabyte (10^21) - Weather Images
• Zottabyte (10^24) - Intelligence Agency (Video)
![Page 11: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/11.jpg)
Data Warehousing Concepts
![Page 12: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/12.jpg)
Data Warehouse (OLAP) and OLTP
CharacteristicsOn-Line Transaction Processing (OLTP) Data Warehouse
Data Content Current values Historical data, summarized data, calculated data
Data Organization Application by application Subject areas across enterprise
Nature of Data Dynamic Static until refreshed, based on frequency
Data Manipulation Updated on a field-by-field basis
Accessed & manipulated usually no direct update
UsageHighly structured, repetitive processing (Clerical User)
Highly structured, analytical processing (Knowledge User)
Response TimeCritical (Sub-Second to several seconds) Several seconds to minutes
Updates vs. Reports
Real-time Updates, Batch Reporting
Batch Updates,Real-time Reporting
![Page 13: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/13.jpg)
Data Warehouse Architecture
Enterprise
Data
Warehouse
Enterprise
Data
Warehouse Data MartData Mart
Data MartData Mart
ExecutionSystems
• CRM• ERP• Legacy• e-Commerce
ExecutionSystems
• CRM• ERP• Legacy• e-Commerce
•Reporting Tools
•OLAP Tools
•Ad Hoc Query Tools
•Data Mining Tools
•Reporting Tools
•OLAP Tools
•Ad Hoc Query Tools
•Data Mining Tools
•External Data
• Purchased Market Data• Spreadsheets
•External Data
• Purchased Market Data• Spreadsheets
•Oracle•SQL Server•Teradata•DB2
•Custom Tools•HTML Reports•Cognos•Business Objects•MicroStrategy•Oracle Discoverer•Brio•Data Mining Tools•Portals
Data and Metadata Repository Layer
•Informatica PowerMart•Ab Initio•Data Stage•Oracle Warehouse Builder•Custom programs•SQL scripts
Extract, Transformation, and Load (ETL)
Layer
• Cleanse Data• Filter Records• Standardize Values• Decode Values• Apply Business Rules• Householding• Dedupe Records• Merge Records
Extract, Transformation, and Load (ETL)
Layer
• Cleanse Data• Filter Records• Standardize Values• Decode Values• Apply Business Rules• Householding• Dedupe Records• Merge Records
Presentation Layer
ETL LayerOperational
Source Systems
Technologies:
Metadata RepositoryMetadata Repository
ODSODS
•PeopleSoft•SAP•Siebel•Oracle Applications•Custom Systems
Data MartData Mart
![Page 14: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/14.jpg)
Data Warehouse Structure
DepartmentallyStructured
IndividuallyStructured
Data WarehouseData WarehouseOrganizationallyStructured
Data
Information
Highly Summarized
Lightly Summarized
Atomic/Detailed
![Page 15: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/15.jpg)
Data Warehouse Architecture DriversThe requirements that drive the DW architecture are;
• Granularity of data
• Data retention and timeliness
• Reporting capability
• Availability
• Scalability
![Page 16: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/16.jpg)
Data Mart Centric
Data Marts
Data Sources
Data Warehouse
![Page 17: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/17.jpg)
Data Mart Centric
If you end up creating multiple warehouses, integrating them is a problem
![Page 18: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/18.jpg)
Data Warehouse Centric
Data Marts
Data Sources
Data Warehouse
![Page 19: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/19.jpg)
OLAP
![Page 20: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/20.jpg)
OLAP: 3 Tier DSS
Data Warehouse
Database Layer
Store atomic data in industry standard Data Warehouse.
OLAP Engine
Application Logic Layer
Generate SQL execution plans in the OLAP engine to obtain OLAP functionality.
Decision Support Client
Presentation Layer
Obtain multi-dimensional reports from the DSS Client.
![Page 21: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/21.jpg)
OLAP Servers• Support multidimensional OLAP queries
• Characterized by how the underlying data is stored
• Multidimensional OLAP (MOLAP) Servers• Data stored in array based structures e.g. Hyperion
Essbase
• Relational OLAP (ROLAP) Servers• Data stored in relational tables e.g. Microstrategy, IBM
Informix
• Hybrid OLAP (HOLAP) Servers• Data distributed between relational and specialized
storage e.g. Cognos, Microsoft Analysis Services
![Page 22: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/22.jpg)
OLAP Operations• Rollup; summarize operations
• E.g. given sales data, summarize sales for last year by product category and region
• Drill down; get more details• E.g. given summarized sales as above, find breakup of
sales within each region
• Slice and dice; select and project• Sales of soft-drinks in Gothenburg over the last quarter
• Pivot; change the view of data
![Page 23: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/23.jpg)
Strengths of OLAP• It is a powerful visualization tool
• It provides fast, interactive response times
• It is good for analyzing time series
• It can be useful to find some clusters and outliners
• Many vendors offer OLAP tools
![Page 24: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/24.jpg)
Dimensional Modeling
![Page 25: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/25.jpg)
What is Dimensional Modeling?• Logical design technique that seeks to present the
data in a standard, intuitive framework that allows for high-performance access.
• Adheres to a discipline that uses the relational model with some important restrictions.
• Composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables.
![Page 26: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/26.jpg)
DM v/s ER Models
DM ERUsed to design database for Online Analytical Processing (OLAP)
Used to design database for Online Transaction Processing (OLTP)
Support ad hoc end-user queries Support defined queries
Intuitive & facilitates high-performance retrieval of data
Removes redundancy of data
De-normalized Normalized
![Page 27: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/27.jpg)
Fact Tables• Primary table in the DM
• Each row corresponds to a measurement
• Facts in the fact table are numeric and additive
• Narrow rows with a few columns
• Large number of rows (billions)
• Express many-to-many relationships between dimensions
![Page 28: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/28.jpg)
Dimension Tables• Define business in terms already familiar to users
• Implement the user interface to the DW
• Wide rows with lots of descriptive text
• Small tables (about a million rows)
• Joined to fact table by a foreign key
• Heavily indexed
• E.g. of typical dimensions• time periods, geographic region (markets, cities),
products, customers, salesperson, etc.
![Page 29: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/29.jpg)
Four Step Dimensional Design Process
• Step 1 - Select the business process to model• The first step in converting an ER diagram to a set of
DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately.
• Step 2 - Choose The Grain of the Business Process
• The grain is the fundamental atomic level of data to be represented in the fact table.
![Page 30: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/30.jpg)
Four Step Dimensional Design Process (cont’d)
• Step 3 - Designate the Fact Tables• The third step is to select those many-to-many
relationships in the ER model containing numeric and additive non-key facts and to designate them as fact tables.
• Step 4 - Choose the dimensions that will apply to each fact table record
• This involves de-normalizing all of the remaining tables into flat tables with single-part keys that connect directly to the fact tables.
![Page 31: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/31.jpg)
Classic Star Schema Model
![Page 32: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/32.jpg)
Snowflake Schema
![Page 33: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/33.jpg)
Fact Constellation Schema
![Page 34: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/34.jpg)
Slowly Changing Dimensions• Type 1: Overwrite the value
![Page 35: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/35.jpg)
Slowly Changing Dimensions (cont’d)• Type 2: Add a Dimension row
• Type 3: Add a Dimension column
![Page 36: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/36.jpg)
Conceptual Modeling
![Page 37: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/37.jpg)
Graph Theory
• Directed, acyclic, weakly connected graph
• Quasi-tree
![Page 38: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/38.jpg)
The Dimensional Fact Model
• Fact Schemes• Facts
• Measures
• Dimensions
• Hierarchies Dimension attributes Non-dimension attributes
![Page 39: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/39.jpg)
The Dimensional Fact Model
![Page 40: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/40.jpg)
Why Formalize?
![Page 41: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/41.jpg)
Why Formalize?
• Give meaning to the model
• Tool support• Transformation Algorithms
• CASE-Tool (Computer Aided Software Engineering)
![Page 42: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/42.jpg)
Fact Scheme SORNAMf ,,,,,
• M is a set of measures
• A is a set of dimension attributes
• N is a set of non-dimension attributes
• R is a set of ordered couples, having the form (ai, aj), indicating the ‘edges’ of the scheme
ji
j
i
aa
NAa
aAa
0
![Page 43: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/43.jpg)
Fact Scheme SORNAMf ,,,,,
• O is a set of optional relationships
• S is a set of aggregation statements, in the form (mj, di, Ω)
RO
,...,,,,, ORANDMAXCOUNTAVGSUM
fDimd
Mm
i
j
![Page 44: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/44.jpg)
Fact Scheme SORNAMf ,,,,,
• We call the set Dim(f) a dimension pattern. Each element in Dim(f) is a dimension
RaaAafDim ii ,0
![Page 45: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/45.jpg)
Fact Scheme SORNAMf ,,,,,
![Page 46: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/46.jpg)
Algorithm
From ER to Conceptual Design
1) Define Facts
2) For each facta) Build attribute tree
b) Prune & Graft
c) Define Dimensions
d) Define Measures
e) Define Hierarchies
![Page 47: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/47.jpg)
Sample Schema
![Page 48: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/48.jpg)
Define Facts• Entity F
• Relationship R between entities E1…En
• Transform R into an entity F
• Frequently updated archives are good candidates for defining facts• E.g. Sale
• Not: Store, City
• Each Fact becomes a root in a fact scheme
![Page 49: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/49.jpg)
Transform Relation
![Page 50: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/50.jpg)
Build Attribute Tree
• Each vertex corresponds to an attribute of the scheme
• Root corresponds to the identifier of F
![Page 51: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/51.jpg)
Build Attribute Tree
root=newVertex(identifier(F));
translate(F, root);
![Page 52: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/52.jpg)
Build Attribute Tree
translate(E,v) { for each attribute a E | a identifier(E) addChild(v, newVertex({a})); for each entity G connected to E by a relationship R | max(E,R) = 1 { for each attribute b R addChild(v, newVertex({b})); next=newVertex(identifier(G)); addChild(v, next); translate(G, next); }}
![Page 53: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/53.jpg)
Exampletranslate(E=SALE, v=sale)
addChild(v, qty);
addChild(v, unitPrice);
for G=PURCHASE TICKET
addChild(v, ticketNumber);
translate(PURCHASE TICKET, ticketNumber)
for G=PRODUCT
addChild(v, product);
translate(PRODUCT, product);
![Page 54: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/54.jpg)
Attribute Tree
![Page 55: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/55.jpg)
Attribute Tree
• Label the root with the name of the entity F instead of his identifier
• Optional relationships not in algorithm if min(E,R)=0
![Page 56: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/56.jpg)
From ER till Conceptual Design
a) Build attribute tree
b) Prune & Graft
c) Define Dimensions
d) Define Measures
e) Define Hierarchies
![Page 57: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/57.jpg)
Prune & Graft
• Prune or graft to eliminate unnecessary level of detail
• Pruning: Drop a subtree from the quasi-tree
• Grafting: Vertex contains uninteresting information but its descendants must be preserved
![Page 58: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/58.jpg)
Graftgraft(v) {
for each v’ | v’ is father of v
for each v’’ | v’’ is child of v
addChild(v’, v’’);
drop(v);
}
![Page 59: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/59.jpg)
Graft
• 1-to-1 relation is a good candidate
• When an optional vertex is grafted, all his children inherit the optional dash
![Page 60: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/60.jpg)
Prune & Graft
![Page 61: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/61.jpg)
Prune & Graft
![Page 62: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/62.jpg)
Dimensions
• Determines the granularity of fact instances
• Time is a key dimension• Snapshot
• Temporal
![Page 63: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/63.jpg)
Measures
• Numerical attributes of the attribute tree
• Glossary• How measure can be calculated from source
scheme
• e.g. qty sold, no. of customers
![Page 64: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/64.jpg)
Hierarchies
• Tree has already a kind of hierarchy• We can still prune/graft details
• Add new levels for aggregation• E.g. month-quarter-year
• Identify non-dimension attributes• E.g. address
![Page 65: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/65.jpg)
Aggregation
• Primary fact instances• Null assumption
• Zero assumption
• Roll-up
• Sum, Avg, Count, Min, Max, …
![Page 66: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/66.jpg)
Aggregation
• Graphical Notation• Sum
![Page 67: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/67.jpg)
Multi-Aggregation
![Page 68: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/68.jpg)
Multi-Aggregation
• Order matters• {week, product} {month, type}
• Time-Dimension: Min
• Product-Dimension: Sum
![Page 69: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/69.jpg)
Multi-Aggregation
![Page 70: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/70.jpg)
Multi-Aggregation
![Page 71: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/71.jpg)
typemonthtypeweekproductweek MINSUM ,,,
![Page 72: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/72.jpg)
typemonthtypeweekproductweek MINSUM ,,,
![Page 73: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/73.jpg)
typemonthtypeweekproductweek MINSUM ,,,
![Page 74: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/74.jpg)
typemonthproductmonthproductweek SUMMIN ,,,
![Page 75: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/75.jpg)
typemonthproductmonthproductweek SUMMIN ,,,
![Page 76: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/76.jpg)
typemonthproductmonthproductweek SUMMIN ,,,
![Page 77: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/77.jpg)
Indexing
![Page 78: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/78.jpg)
Cost Model
• Cost of answering a query is number of rows processed
• Subcubes• Powerset of the dimensions
![Page 79: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/79.jpg)
Cost Model
![Page 80: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/80.jpg)
Indexes
• B-tree indexes to speed up query processing
• E.g. for cube ps, we can construct the following indexes• Ips
• Isp
![Page 81: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/81.jpg)
Example
• Consider Q1:• Using subcube ps: 0,8M rows
• Using subcube psc: 6M rows
• What if we use index Isp on subcube ps?
• 80 rows
sp
s
ps
![Page 82: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/82.jpg)
Indexes
• Ideal situation• All subcubes
• All indexes
![Page 83: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/83.jpg)
Algorithms
• Balance space subcubes – indexes
• Greedy Algorithm• Given a set of queries
• Every step select index/subcube with the highest benefit
![Page 84: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/84.jpg)
?
![Page 85: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/85.jpg)
References• Text books
• Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996
• W.H. Inmon, Building the Data Warehouse, Second Edition, John Wiley and Sons, 1996
• Barry Devlin, Data Warehouse from Architecture to Implementation, Addison Wesley Longman, Inc 1997
• Research Papers/Whitepapers• M. Golfarelli, D. Maio, S. Rizzi, The Dimensional Fact Model: a Conceptual
Model for Data Warehouses, International Journal of Cooperative Information, Vol.7 (issue 2/3), pages 215-247, 1998.
• H. Gupta, V. Harinarayan, A. Rajaraman, J.D. Ullman, Index Selection for OLAP, Proceedings of the Thirteenth international Conference on Data Engineering, April 07 - 11, pages 208-219, 1997.
• S. Luján-Mora J. Trujillo. A comprehensive method for data warehouse design. Proc. DMDW, 2003.
![Page 86: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/86.jpg)
References (cont’d)• Luján-Mora, S., Trujillo, J., and Song, I. Extending the UML for
Multidimensional Modeling. Lecture Notes In Computer Science, Vol. 2460, pages 290-304., 2002.
• Husemann, B., Lechtenborger, J., Vossen, G.: Conceptual Data Warehouse Design.
• In: Proc. of the 2nd. Intl. Workshop on Design and Management of Data Warehouses (DMDW'2000), Stockholm, pages 3-9, 2000.
• Lehner, W., Albrecht, J., and Wedekind, H. 1998. Normal Forms for Multidimensional Databases. In Proceedings of the 10th international Conference on Scientific and Statistical Database Management (July 01 – 03), pages 63-72, 1998.
• Web Articles• http://en.wikipedia.org/wiki/Data_warehouse
• http://en.wikipedia.org/wiki/Online_analytical_processing
• http://en.wikipedia.org/wiki/OLTP
![Page 87: Data Warehouse Modeling](https://reader033.vdocuments.us/reader033/viewer/2022061306/54b418814a795990418b45ee/html5/thumbnails/87.jpg)
References (cont’d)• http://www.sidadelman.com/data_warehouse_applications.htm
• http://infolab.stanford.edu/infoseminar/Archive/FallY97/slides/ncr
• www.cdd.go.th/it/file/DataWarehousing_and_DataMining.pdf
• http://www.ciobriefings.com/whitepapers/StarSchema.asp