1
DATA WAREHOUSINGConcepts and Components
Chapters 1, 2, 7, 14, 15
2
Agenda
Background Data Warehouse vs Operational Data
Store Characteristics of a Data Warehouse Improvements in Data Warehousing Relationship to Business Intelligence
3
Evolution of Decision Support Technologies Business people need information to make plans,
decisions, and assess results 60's
Batch reports 70’s
DSSs 80’s
Info Centers 90’s
Early DWs 2000's
Business Intelligence Issues:
Dependency on IT resources Based on OLTP or extracts Functionality often pre-programmed "Big Data" Analytics
4
DW vs. Business Intelligence Short:
DW = populating structures with data BI = using DW data
Long: DW = body of historical data, separate from the
operations of the organization, used to create BI BI = the delivery of timely, accurate, and
useful information to decision-makers Broad:
BI = a broad category of applications, technologies, and organizational processes for gathering, storing, accessing, and analyzing data to help business users make better decisions
5
Need for Decision-Optimized Data Storage Business people need information to
make plans, decisions, and assess results What were sales volumes by region and
product category for the last 3 years? Which of two new medications will result in
the best outcomes (higher recovery rate and shorter hospital stay)?
Data captured by complex operational systems (OLTPs) optimized to support well-defined transaction requirements
Difficult to get needed information from data grounded in OLTPs
6
Operational vs. Informational Data Operational (ie,
OLTP)Informational (ie, OLAP)
Data Content Current Values Historical, derived, summarized
Data Structure
Optimized for transactions
Optimized for complex queries
Data Volume MB/GB of data GB/TB/PB… of data
Access Frequency
High Medium to low
Access Type Read, update, delete Read-only
Usage Predictable, repetitive Ad hoc, random, heuristic
Response Time
Sub-seconds Several seconds to minutes
Users Large number; operational & data workers
Relatively smaller number; data & knowledge workers
7
Data Warehouse“… a subject-oriented, integrated, nonvolatile,
and time variant collection of data in support of management decisions.” Managing the Data Warehouse, W. H. Inmon, John
Wiley & Sons, December, 1996.
“… a copy of transaction data specifically structured for query and analysis.” The Data Warehouse Toolkit, R. Kimball, John Wiley
& Sons, February, 1996.
Enterprise data, transformed, integrated, accumulated over time, optimized for decision-making, and accessible via analytical tools
8
Characteristics of a DW (ala Inmon) Subject-Oriented
As opposed to business-process oriented Integrated
Multiple sources, internal and external Critical part of DW implementation
Time-Variant History, time periods important
Non-Volatile DW data not changed once stored
9
Subject Orientation Data organized based on:
How users refer to it Subject areas of interest to users Areas important for tracking success, performance
Often based on transactions
Graphic Source: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=22&ved=0CHUQFjALOAo&url=http%3A%2F%2Fwww.csun.edu%2F~hcmgt004%2FDWPresSp05.ppt&ei=U1S9VMuOJsTCgwSes4HADg&usg=AFQjCNGvjoMayU2-79jot8f2zb4J_IbZGQ&bvm=bv.83829542,d.eXY&cad=rja
10
Characteristics of a DW, cont… Subject-Oriented
Needs are business subject-focused Integrated
Multiple sources, internal and external Time-Variant
History, time periods important Non-Volatile
DW data not changed once stored Data Granularity
11
Data Granularity*
Level of detail stored in database Operational focus Analytical focus
Examples: Life Insurance Policy vs. Life Insurance
Coverage Product Category vs. Product Sales
High granularity (eg, transactional grain) is most flexible
* See http://www.kimballgroup.com/2007/07/keep-to-the-grain-in-dimensional-modeling/ for excellent (and brief) description of granularity
12
Challenges in Early DW Implementation1. Improper or infeasible architectures,
approaches2. Insufficient attention to
organizational strategy and culture3. Early information delivery tools too
complex for business users4. Storage technology made it difficult
to store much detail or history, and slow to process
13
Improved Technology User-friendly tools for analysis,
visualization Excel Tableau Reporting Services, …
Improved technology for accessing, aggregating, partitioning data
Advances in processing technology Parallel processing In-Memory data warehouses
Advances in storage technology RAID Solid State
14
Improved Architectures
Based on Data Marts Conformed dimensions BI-emphasis
15
Data Warehouse vs. Data Marts
Enterprise Data Warehouse Information about ALL subjects
important to the organization
Pros Cons
Requires a corporate effort Requires a corporate effort
Single, central storage Longer (costlier) to implement
Centralized control, architecture
Higher risk of failure
16
Data Warehouse vs. Data Marts, cont…
Pros Cons
Faster implementation Can introduce redundant data
Earlier return on investment
Can make data mart integration more complex
Less risk of failure
Gives project team time to learn, grow
Data Marts Subsets of data warehouse that focus on
a selected subject area; typically departmental in nature
17
Data Warehouse Architecture: Basic
18
Data Warehouse Architecture: Types
19
BI Architecture
Source: Chaudhuri et. al. , An Overview of Business Intelligence Technology, Communications of the ACM, 54(8), August 2011, pp. 88-98.
20
BI Architecture, cont…
Source: Oracle Corporation. Information Management and Big Data: A Reference Architecture, Oracle White Paper, February 2013, p. 12.
21
BI Architecture as “LDW”
Source: http://skylandtech.net/2014/09/22/a-modern-data-warehouse-architecture-part-1-add-a-data-lake/
22
Architecture Components
1. Data Sources2. Data Staging (Movement)3. Data Storage (Warehouse) 4. Data Analysis/Discovery (Mid-tier)5. Information Delivery (Front-end
Presentation)
23
1. Data Sources
Identifying required business data from Production Internal, Personal Archived External
24
2. Data Staging Extract
From source systems Transform
Cleanse Supplement Convert Combine…
Load Populate data warehouse/mart tables
25
3. Data Storage
Data Warehouse / Data Mart Relational database for structured data Non-relational (e.g., Hadoop) data store
for "loosely-structured" data Metadata
Relational database Catalog Extended properties Custom tables
External products/tools Spreadsheets…
26
4. Data Analysis: Supporting Knowledge Discovery
Forms of Discovery
Know What Info to Look For
Know How to Explore Info
Layout-Led Discovery
X X
Data-Led Discovery
X
Model-Led Discovery
Layout-Led Discovery Pre-Designed Reports
Data-Led Discovery OLAP Analysis
Model-Led Discovery Data Mining
27
Pre-Defined Reports
Information pushed to user Content and layout pre-determined Can be parameter-driven Can support some drill-down May also include basic report
development
28
OLAP Online Analytical Processing
Providing On-Line Analytical Processing to User Analysts, E. F. Codd, Codd & Date, Inc 1993.
Short Definition: Class of applications or tools that support
ad-hoc analysis of multidimensional data Longer Definition:
“…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…” OLAP Council (www.olapcouncil.org)
29
Data Mining
Search for patterns in large amounts of data Making connections/associations with
data Predicting future outcomes
OLAP vs. Data Mining “Report on the past” vs. “Predict the
future”
Part of Knowledge Discovery…
30
5. Information Delivery
31
Examples of Uses of BI/DW
Industry Use
Retail Customer Loyalty
Customer Service Effectiveness
Financial Fraud Detection
Profitability by product/LOB
Airlines Route Profitability
Customer Profitability
Manufacturing Cost Reduction Opportunities
Product Shipments
Non Profits Giving Campaign Effectiveness
Salvation Army Bell Ringer Effectiveness
Government Manpower Planning
School Academic Performance
Medical Patient Risk for Disease
32
More Examples…
Industry Use
Marketing Churn Reduction
Brand/Product/Company Perception
Sales Cross-Selling Opportunities
Technology/Consulting
Estimate future engagements
33
Next Time…
Data Warehouse Design (Dimensional Modeling)