chapter 31
DESCRIPTION
Chapter 31. Data Warehousing Design Transparencies. Chapter 31 - Objectives. Issues associated with designing a data warehouse. Technique for designing the database component of a data warehouse called dimensionality modeling. How a dimensional model (DM) differs from an ER model. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/1.jpg)
Chapter 31
Data Warehousing DesignTransparencies
![Page 2: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/2.jpg)
Chapter 31 - Objectives
• Issues associated with designing a data warehouse. • Technique for designing the database component of a
data warehouse called dimensionality modeling. • How a dimensional model (DM) differs from an ER
model. • A step-by-step methodology for designing a data
warehouse.• Criteria for assessing degree of dimensionality
provided by a data warehouse.
![Page 3: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/3.jpg)
Designing Data Warehouses
• To begin a data warehouse project, need to find answers for questions such as:
– Which user requirements are most important and which data should be considered first?
– Which data should be considered first? – Should project be scaled down into something more
manageable?– Should infrastructure for a scaled down project be
capable of ultimately delivering a full-scale enterprise-wide data warehouse?
![Page 4: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/4.jpg)
Designing Data Warehouses • For many enterprises, the way to avoid the
complexities associated with designing a data warehouse is to start by building one or more data marts.
• Data marts allow designers to build something that is far simpler and achievable for a specific group of users.
![Page 5: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/5.jpg)
Designing Data Warehouses
• Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time.
• Despite the interim solution of building data marts, goal remains same: ie., the ultimate creation of a data warehouse that supports the requirements of the enterprise.
![Page 6: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/6.jpg)
Designing Data Warehouses
• Requirements collection and analysis stage of a data warehouse project involves interviewing appropriate members of staff (such as marketing users, finance users, and sales users) to enable identification of prioritized set of requirements that data warehouse must meet.
![Page 7: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/7.jpg)
• At same time, interviews are conducted with members of staff responsible for operational systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over next few years.
Designing Data Warehouses
![Page 8: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/8.jpg)
Designing Data Warehouses
• Interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse.
• The database component of a data warehouse is described using a technique called dimensionality modeling.
![Page 9: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/9.jpg)
Dimensionality Modelling
• Logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access
• Uses the concepts of ER modeling with some important restrictions.
• Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.
![Page 10: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/10.jpg)
Dimensionality Modelling
• Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.
• Forms ‘star-like’ structure, which is called a star schema or star join.
![Page 11: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/11.jpg)
Dimensionality Modelling
• All natural keys are replaced with surrogate keys (branch Id instead of branch #). Means that every join between fact and dimension tables is based on surrogate keys, not natural keys.
• Surrogate keys allows data in the warehouse to have some independence from the data used and produced by the OLTP systems.
![Page 12: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/12.jpg)
Star schema for property sales of DreamHome Figure 31.1 –Page 1080
Propertyid(pk)
Fact Table
PropertySaletimeId keypropertyid keybranchid keyClinetid keyPromotionid keyStaffid keyOwnerid key
Time PropertyforSale
Time Id
Branchid (PK)
Owner
Ownerid (pk)
Branch
(PK)
Promotionid(PK)
PromotionStaffid
(pk)
Day od weekQuarteryear
Client
Staff
Clientid(pk)
![Page 13: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/13.jpg)
Dimensionality Modelling
• Star schema is a logical structure that has a fact table containing factual data in the center surrounded by dimension tables containing reference data, which can be denormalized.
• Facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed.
![Page 14: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/14.jpg)
Dimensionality Modelling
• Bulk of data in data warehouse is in fact tables, which can be extremely large.
• Important to treat fact data as read-only reference data that will not change over time.
• Most useful fact tables contain one or more numerical measures, or ‘facts’ that occur for each record and are numeric and additive.
![Page 15: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/15.jpg)
Dimensionality Modelling
• Dimension tables usually contain descriptive textual information.
• Dimension attributes are used as the constraints in data warehouse queries.
• Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table.
![Page 16: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/16.jpg)
Dimensionality Modelling
• Snowflake schema is a variant of the star schema where dimension tables do not contain denormalized data.
• Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas. Allows dimensions to be present in both forms to cater for different query requirements.
![Page 17: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/17.jpg)
Simple Hierarchies (Roll up) & Classes Simple Hierarchies (Roll up) & Classes Within Dimensions --Dimension HierarchiesWithin Dimensions --Dimension Hierarchies
Region Total
East West Central
Everett Renton Bellevue
![Page 18: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/18.jpg)
Some OLAP servers support multiple hierarchies withinone dimension. One child can have many parents
City SalesZone
Branch
State Sales Region
![Page 19: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/19.jpg)
Roll up
Region Zone Branch
State CityBranch
Without multiple hierarchies, the previous database would have to be represented with separate dimensions for each roll-up.
![Page 20: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/20.jpg)
Cube
sale p Branch
week
price
week 1
week 2
P1P4
p1p2p3p4
c1c2c3c1
C1 C2
C31221
1243 1
p2p3
c1 c2 c3
3
Fact table view multi-dimensional cube
Property sale
P = property #
24
![Page 21: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/21.jpg)
Property sales with normalized version of Branch dimension table Figure 31.2 – Page 1081
PropertySaletimeId keypropertyid keybranchid keyClinetid keyPromotionid KeyStaffid keyOwnerid key
Dimension Table
City ID(PK)Region ID (FK)
City
Region ID(PK)
Region
Branch Id (PK)Branch noBranch typeCity (FK)
Roll Up(Dimension Hierarchies)Roll Up(Dimension Hierarchies)
![Page 22: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/22.jpg)
Dimensionality Modelling
• Predictable and standard form of the underlying dimensional model offers important advantages:– Efficiency – Ability to handle changing requirements – Extensibility – Ability to model common business situations – Predictable query processing.
![Page 23: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/23.jpg)
Comparison of DM and ER models
• A single ER model normally decomposes into multiple DMs.
• Multiple DMs are then associated through ‘shared’ dimension tables.
![Page 24: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/24.jpg)
PropertySale Advertisement
Time
Branch
Promotion
PropertyFor sale
owner Newspaper
Fact Table Fact Table
Shared Dimension Tables
![Page 25: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/25.jpg)
Database Design Methodology for Data Warehouses
• Nine-Step Methodology’ includes following steps:– Choosing the process – Choosing the grain – Identifying and conforming the dimensions – Choosing the facts – Storing pre-calculations in the fact table – Rounding out the dimension tables – Choosing the duration of the database – Tracking slowly changing dimensions – Deciding the query priorities and the query modes.
![Page 26: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/26.jpg)
Step 1: Choosing the process (Subject Area)
• The process (function) refers to the subject matter of a particular data mart.
• First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
![Page 27: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/27.jpg)
ER model of an extended version of DreamHome Figure 31.3 – Page 1084
PropertuSale
LeaseProperty
Maintenance
PropertyViewing
Advert
Ward Mgt PatientMgt
Supply Mgt
Wellmeadow Case
![Page 28: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/28.jpg)
ER model of property sales business process of DreamHome Figure 31.4 – Page 1085
Process identified by subject area
![Page 29: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/29.jpg)
Step 2: Choosing the grain
• Decide what a record of the fact table is to represent.
• Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table.
• Also include time as a core dimension, which is always present in star schemas.
![Page 30: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/30.jpg)
Step 3: Identifying and conforming the dimensions
• Dimensions set the context for asking questions about the facts in the fact table.
• If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.
• A dimension used in more than one data mart is referred to as being conformed.
![Page 31: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/31.jpg)
Star schemas for property sales and property advertising
Figure 31.5 – Page 1086
Star Schemas for property sale and property advertisingWith Time, PropertyForSale, Branch, and PromotionAs Conformed (shared) dimension tables
![Page 32: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/32.jpg)
Step 4: Choosing the facts
• The grain of the fact table determines which facts can be used in the data mart.
• Facts should be numeric and additive.
• Unusable facts include:– non-numeric facts– non-additive facts– fact at different granularity from other facts in table.
![Page 33: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/33.jpg)
Fact Criteria
Weight the Fact attributes based upon the following criteria:• They exhibit measurable results to the Users and
Management.
• They are visible within the business and through management.
• They are manageable.
![Page 34: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/34.jpg)
Step 5: Storing pre-calculations in the fact table
• Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
![Page 35: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/35.jpg)
Step 6: Rounding out the dimension tables
• Text descriptions are added to the dimension tables.
• Text descriptions should be as intuitive and understandable to the users as possible.
• Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
![Page 36: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/36.jpg)
Step 7: Choosing the duration of the database
• Duration measures how far back in time the fact table goes. For ex. Insurance &Tax Considerations.
• Very large fact tables raise at least two very significant data warehouse design issues. – Often difficult to source increasing old data. – It is mandatory that the old versions of the important
dimensions be used, not the most current versions. Known as the ‘Slowly Changing Dimension’ problem.
![Page 37: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/37.jpg)
Time Based Criteria
Due to disk space constraint, data selected must be time relevant in terms of trend, predictability, and profitability for the enterprise.
![Page 38: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/38.jpg)
Step 8: Tracking slowly changing dimensions
• Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data.
• Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
![Page 39: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/39.jpg)
Step 8: Tracking slowly changing dimensions (Cont’d)
• Three basic types of slowly changing dimensions: – Type 1, where a changed dimension attribute is
overwritten– Type 2, where a changed dimension attribute causes
a new dimension record to be created – Type 3, where a changed dimension attribute causes
an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
![Page 40: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/40.jpg)
Step 9: Deciding the query priorities and the query modes
• Most critical physical design issues affecting the end-user’s perception includes:– physical sort order of the fact table on disk– presence of pre-stored summaries or aggregations.
• Additional physical design issues include administration, backup, indexing performance, and security.
![Page 41: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/41.jpg)
Database Design Methodology for Data Warehouses
• Methodology designs a data mart that supports requirements of particular business process and allows the easy integration with other related data marts to form the enterprise-wide data warehouse.
• A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.
![Page 42: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/42.jpg)
Fact and dimension tables for each business process of DreamHome
![Page 43: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/43.jpg)
Dimensional model (fact constellation) for the DreamHome data warehouse
Reference Figure 31.8 – Page 1090
A dimensional model, which contains more than one FactTable sharing one or more shared dimension tables, is referredTo as a Fact constellation.
![Page 44: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/44.jpg)
Criteria for assessing the dimensionality of a data warehouse
• Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing.
• Twenty criteria divided into three broad groups: architecture, administration, and expression.
![Page 45: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/45.jpg)
Criteria for assessing the dimensionality of a data warehouse
![Page 46: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/46.jpg)
Criteria for assessing the dimensionality of a data warehouse
• Architectural criteria describes way the entire system is organized.
• Administration criteria are considered to be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse.
• Expression criteria are mostly analytic capabilities that are needed in real-life situations.
![Page 47: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/47.jpg)
Data Warehouse Project Scope Document
I Executive Summary -- Business needsII Project Background -- How did the project start? -- Who is the sponsor?III Project Definition -- Project Objectives -- Project Organization -- Project Critical Success Factor -- Measurements of Success
![Page 48: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/48.jpg)
Data Warehouse Project Scope Document
IV Project Scope What’s in the Data Warehouse? What’s not in the Data Warehouse? Samples of Queries & Reports
V Methodology and Approach Methodology Employed Techniques Employed
![Page 49: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/49.jpg)
Data Warehouse Project Scope Document
VI Project Cost/BenefitsVII Project Schedule, Budget and Resources -- The plan should include the following milestones:• Logical Data Modeling• Data Warehouse Data Modeling• Data Warehouse Physical Model• Source System of Record• Extraction/Transformation Program• Populated Data Warehouse• Populated Metadata• End User Access Application• End User Training• Ongoing Support Plan
![Page 50: Chapter 31](https://reader035.vdocuments.us/reader035/viewer/2022062501/56816835550346895dddecd2/html5/thumbnails/50.jpg)
Data Warehouse Project Scope Document
VIII Project Planning Assumptions and Issues -- Project Assumptions -- Project Risks -- Project ContingenciesIX Expected Follow-on Projects