Allan Peda
Enterprise Application Integration
Y52.3250Prof. Aldrich Wright
Allan PedaWeek 3, Kimball vs Inmon Data Warehouse Designs
Due Date: February 17, 2007, 10PM
Page 1 of 7
Allan Peda
AssignmentDescribe a Business Problem or set of business problems that the your company
is trying to solve through a data warehouse or data mart. Compare and contrast
the Kimball and Inmon approaches to data warehouse. Pick the one that best
solves your business problem and why.
Problem StatementThe chosen business problem for this data warehouse is the determination of
enrollment trends and subject interest within a university. This information
derived from these data is of tremendous value as it will be used to develop long
term growth plans for the university including campus expansion, departmental
funding levels and potential areas for enhanced research and development
funding. The initial focal point of this from a business unit perspective will be
student registration data and library collection access records. This will define
initial data feeds for ETL processing.
Problem Resolution
The chosen design is the relational (Inmon) database design as it is considered
to be quicker to develop, supports gradual expansion over time, is robust, flexible
and scalable. These attributes are considered more important than speed of
implementation or raw performance. Kimballs multidimensional system (also
known as BUS for reasons discussed below) is considered more rigid because
well defined conforming dimension and fact tables must be selected at the outset
of system development. The fact that the BUS architecture is modeled on the
analytical requirements of the end users creates difficulties when these
requirements change.
Page 2 of 7
Allan Peda
Relational Multidimensional
High Performance
Considered Quicker To Implement
Gradual Development Supported
Robust
Flexible
Scalable Performance
A more extensive discussion of each system is developed below.
Common Attributes among all data warehouse designs
No matter which design paradigm is used to approach the development of a data
warehouse design, the basic definition of a data warehouse remains the same. It
is a subject oriented, integrated, nonvolatile and time variant collection of data in
support of management's decisions. As such it can be viewed as one type of
read intensive online analytical processing (OLAP) system which must integrate
a large amount of data derived from multiple business units. These data must be
consistent and summarized at the appropriate level of granularity, consequently
an Extract, Transform and Load (ETL) step is necessary in drawing data from the
operational (OLTP) systems.1 Both designs emphasize the importance of the
ETL step in developing cleansed data which is loaded from a common Integrated
Data Store. There must be no direct dependence on legacy or operational data.
The Multidimensional (Kimball) Approach
As noted earlier, the Kimball design is developed along the requirements of the
end user. As such the data must be denormalized and refactored using a well
defined set of end user requirements. The underlying logical architecture is not
1 This step is sometimes altered slightly with data transformation done primarily within the database. This is referred to as Extract, Load and Transform (ELT).
http://www.b-eye-network.com/blogs/linstedt/archives/2005/05/elt_and_etl_can.php
Page 3 of 7
Allan Peda
relational, instead it is multidimensional (MOLAP, often referred to as a
multidimensional hypercube). It should be noted that star schemas can be
developed using tables within an RDBMS. In such cases it is possible to reduce
data redundancy and lower storage requirements by reintroducing some level of
normalization to the data. Such controlled reintroduction of normalization is
known as snowflaking of a star schema. Physical MOLAP database systems (as
opposed to the logical schema) are typically proprietary commercial off the shelf
(COTS) products, and as such there is little publicly information regarding the
actual implementation of such systems.
Kimball data warehouses are comprised of multiple subject oriented data marts
which each center on a few core fact tables. These tables contain data which are
easily aggregated. Surrounding these fact tables are multiple dimension tables,
which contain the associated characteristics of these facts (time being an almost
universal attribute for example). These dimension tables are considered to be the
entry points used to access and aggregate the associated facts. They are used
for filtering, grouping and labeling.
In order for queries to determine what correlations between facts and attributes, it
is desirable for dimension and fact tables to be shared among several subject
areas. The careful design and selection of this set of conforming fact and
dimension tables is referred to by Kimball as the BUS architecture.
Page 4 of 7
Allan Peda
The Relational (Inmon) Approach
In contrast with the architecture developed by Ralph Kimball, Bill Inmon's design
is based on a more flexible relational design, which is shaped by the enterprise
data model. The relational design is the more abstract and more flexible one,
however this flexibility is achieved at a cost, and it is generally accepted that
relational model is not optimal in terms of reporting performance requirements.
Data which are spread among several parent-child relationships will require
multiple lookups in order to fetch each record. In order to obtain acceptable
performance levels, the relational data warehouse is often systematically
denormalized in places. The following techniques are recommended by Inmon:
1. Tables may be physically merged (possibly by creating materialized views during load step).
2. Data may be refactored so they are no longer atomic; this may be accomplished via embedded arrays.
Page 5 of 7
Illustration 1: Data flow in the Kimball Data Warehouse Architecture (from The Data Warehouse Toolkit, Kimball 2nd edition,2003)
Allan Peda
3. Data which has a low probability of access may be placed in a separate tables.
4. Redundant data may be deliberately introduced to reduce the need to lookup other tables.
5. Derived data may be placed within the database. This is a special instance of data redundancy. This may be accomplished via regularly scheduled batch processes or by use of a creative index during the data loading step.
6. Most noteworthy perhaps is the fact that Inmon states that it is “patently incorrect” to enforce referential integrity within a data warehouse. As such any parent-child integrity within a data warehouse is referred to by Inmon as a relationship “artifact”.
Inmon's design also relies on star (and snowflake) joins, however these design
techniques are restricted to the summarized data presented at the data
warehouse level. In Ralph Inmon's data warehouse designs, data marts are not
linked together via a data bus of conforming fact and dimension tables, instead
the RDBMS system upstream of the data marts is considered the warehouse
proper, and it is there where relations are derived allowing queries across subject
areas (contrast his to the use of conforming dimensions and facts within Kimball's
architectures). As noted earlier Kimball's logical architectures naturally map
directly to a physical multidimensional online analytical processing (OLAP)
database system known as MOLAP, while Bill Inmon's designs are inherently
relational OLAP (ROLAP). Combinations, or hybrids of both systems are known
as HOLAP.
Page 6 of 7
Allan Peda
References
1. Inmon, W.H., Building The Data Warehouse, 4th edition. Indianapolis, Indiana: Wiley Publications, 2005.
2. Kimball, R and Ross, M. The Data Warehouse Toolkit, 2nd edition. Indianapolis, Indiana: Wiley Publications, 2003.
3. Website for The Kimball Grouphttp://www.kimballgroup.com/
4. Website for Bill Inmons Corporate Information Factory (CIF) http://www.inmoncif.com/home/
Page 7 of 7
Illustration 2: Data flow for the Inmon Relational Data Warehouse