[db tech showcase tokyo 2015] data warehouse basics by wiliiam inmon
TRANSCRIPT
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
DATA WAREHOUSE BASICS
a presentation by
W H Inmon
The data warehouse
- a definition
A subject oriented, non volatile,
integrated, time variant collection
of data for the support of management’s
decisions
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Granular, detailed data and lots of it
Data that can be shaped and reshaped
A foundation of reconcilability
A basis for new, unknown analysis
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
key
time
primary data
secondary data
What a typical record of the data warehouse
looks like
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
key
An identifier
Unique or non unique
Often a compound key
May be natural or blind
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
time
Time variancy
- continuous
- from date/to date
- periodic discrete
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Name
Address
Phone
Zip
…….. A continuous
time span record
from
date
to
date
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Name
Address
Phone
Zip
……..
from
date
to
dateName
Address
Phone
Zip
……..
from
date
to
dateName
Address
Phone
Zip
……..
from
dateto
date
A sequence of time span records
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
No overlap
Discontinuity is a possibility
999000 From the beginning of time to the end of time
Continuous time span data
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Periodic discrete structure
Jan 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Feb 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Mar 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Apr 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
The notion of taking a snapshot as of some one
moment in timeForest Rim Technology
Copyright Inmon Consulting Services, 2008C
Periodic discrete structure
Jan 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Feb 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Mar 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
Apr 1Expenses
Revenues
No of employees
Stock price
Price per share
………………….
The structure says nothing about values as of any other
date
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Periodic discrete structure
For few variables
For slow changing variables
Continuous time span data
For many variables
For quickly changing variables
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Primary data
Primary data relates directly to the key
Example – key – ssno
- primary data – name, date of birth
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Secondary data
Secondary data relates directly to
the primary data
Example – key – ssno
- primary data – name, date of birth
- secondary data – address, zip, phoneForest Rim Technology
Copyright Inmon Consulting Services, 2008C
The granular data in the
data warehouse –
- serves as a basis for
many other forms of DSS
- is instantly available
- forms a foundation of
reconcilability
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Relational
structures Star joins
requirements
The data warehouse is shaped by the data model;
The star join world is shaped by requirements
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
Often called
Multi dimensional data
Often called
Atomic data
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
applications
Legacy data
Operational data
Transactional data
Atomic
data
Data
warehouse
The source of data warehouse data
is the operational environment
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
m/f
1/0
x/y
male/
female
gender
m/f
integration of data in the data warehouse
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
inches
cms
feet
miles
unit of
measure
cms
units of measurement need
to be integrated
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
ETL
Extract/transform/load
The integration and conversion of data
is the most difficult part of the data warehouse
processForest Rim Technology
Copyright Inmon Consulting Services, 2008C
Transformation code can
be generated manually or
automatically.
Automatically is always
preferred
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
The functions performed
by the ETL process are
not trivial -
Convert
Reformat
Add time element
Restructure
New key
Add default values
Change dbms
Change operating system
Summarize
Break into multiple records
Convert key structure
Merge records
Collect metadata
Conform to data model
Select data/reject data
Add indexes
Change encoding
Change hardware environments
Resequence data
Ascii to ebcdic;ebcdic to ascii
Partition data
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
ETL performed in host
environment
ETL performed in
source environment
ETL processing can be
performed in different places
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C
data warehouse –
at the center of the
decision making of
the corporation
Forest Rim Technology
Copyright Inmon Consulting Services, 2008C