1 dimensional modeling dr. jerry rosenbaum [email protected] the rose tree group 410-764-8443...

69
1 Dimensional Modeling Dr. Jerry Rosenbaum [email protected] The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

Upload: jacquelyn-hamson

Post on 01-Apr-2015

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

1

Dimensional Modeling

Dr. Jerry [email protected]

The Rose Tree Group410-764-8443

Myriad Solutions301-476-9190

Page 2: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

2

Agenda

1. Dimensional Example

2. The Bigger Picture

3. Steps to Build a Dimensional Model

Page 3: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

3

An Understanding of the Problemis key to the solution

To rewire the Empire State Building you must•Understand the current wiring•Understand the goals for the new wiring•Design the new wiring system•Execute your design•Monitor the results

•The “Fire, Ready, Aim” approach can lead to a hole in your foot

Page 4: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

4

Business User Perspective on Data

• In the view of the business user, there are only two important things about data– Accessibility

• Can I get the data that I need• Am I allowed to get the data I need (security)

– Quality• When I get the data, can I trust it• Without quality, business intelligence devolves into

business stupidity

Page 5: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

5

Part IDimensional Model Example

Page 6: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

6

Up Front Points

• Dimensional Models are generally a star schema, but may be a snow flake

• Provides a slice of the total available data and is focused about the needs of a single department or user

• Contains a fact table and multiple dimension tables

• Attributes may be source data or derived data• A dimensional model is one type of data mart.

Page 7: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

7

Sales Analysis Star Schema

Customer Sales

Sales Rep Id (FK)Masnager Rep Id (FK)Product Id (FK)Day Id (FK)Address Id (FK)Customer Id (FK)Customer Demographics Id (FK)Internal Organization Id (FK)

QualtityGross SalesProduct Cost

Sales Rep

Sales Rep IdMasnager Rep Id

Sales Rep Last NameSales Rep First NameManager Rep Last NameManager Rep First Name

Product

Product Id

Product DescriptionCategory IdCategory Description Time By Day

Day Id

Fiscal YearQuarterMonthWeekDay

Address

Address Id

Address 1Address 2City NameState AbbrivPostal CodeCountry

Customer

Customer Id

Customer Name

Customer Demographics

Customer Demographics Id

Credit RatingMarital StatusAge

Internal Organization`

Internal Organization Id

Internal Organization Name

From Len SilverstonUniversal Data Models

Page 8: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

8

Dimensional Model• One Fact table

– Customer Sales• Fact is either one data item (or a group of tightly coupled

data items of the same granularity)• Multiple Dimension Tables (each with one or more

dimensions of a similar type)– Sales Rep– Product– Time by Day– Address– Customer– Customer Demographics– Internal Organization

Page 9: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

9

Business Points

• Data in Fact Table, and the Dimension Tables is based on– Business requirements– Data extracted from other databases

• Internal or external data

• Allows business users to slice and dice the data by any combination of dimensions to produce a business report

Page 10: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

10

Business Points (2)

• Design of the Dimensional Model is only one aspect, how about the data to be loaded– Are there any data quality issues with the data

that will be loaded– Are there alternate sources of the data that

differ from the data source we used– Are we violating any security or privacy issues

Page 11: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

11

Questions

• Where did the data come from• How often do we update the data

– Is it an update or a complete refresh– Are the rules different for internal and external data

sources

• Why didn’t we just use the original databases• Was any of the data transformed• Why did we choose those 7 dimensions• What is the quality of the data

Page 12: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

12

Truly Answering These Questions

• We must look at– Business drivers– Where dimensional modeling fits into the

bigger data picture– How do we do dimensional modeling

• In other words, it helps if we understand the bigger picture so we can build the right solution the first time

Page 13: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

13

Some Hidden Issues

• This dimensional model can not reasonably help us answer– What was the typical total check out amount for all

purchases by a customer– How many items did a typical customer purchase– Adding these items to every row of the fact table is

not a good solution

This requires a second, but related dimensional model.

Page 14: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

14

Hidden Issues (2)

• How does one keep a set of dimensional models in synch

• How does one ensure that for a given “fact” and set of dimensions that the sourcing of the data is consistent across multiple models

• If a transformation (the T in ETL) is used, how can one ensure that everyone uses a consistent transformation

• What about the data quality issues – were they resolved the same way every time.

Page 15: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

15

Part IIThe Bigger Picture

Page 16: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

16

Sagely Advice

“Where should I begin your majesty”

“Begin at the beginning”, the king said gravely

- Louis Carroll

Alice’s Adventure in Wonderland

Page 17: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

17

Business IntelligenceBusiness Intelligence

• On of the earliest applications in the history of computing was a program to generate reports on operating system performance

• FAST FORWARD TO TODAY

• Today we call this process Business Intelligence (BI) – we gather and analyze data to increase business process efficiency.

From J. O’Conner

Page 18: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

18

Business IntelligenceBusiness Intelligence• C - Level – Considerations

– Costly ERP and Major systems are implemented to provide information to management to make the best decisions relating to improving the “Bottom Line.”

– IT Customer satisfaction does not seem to meet expectations – manage expectations

– IT takes the blame for bad decisions due to the information available to the executives

– UNWARRANTED ???UNWARRANTED ??? ––

From J. O’Conner

Page 19: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

19

Business IntelligenceBusiness Intelligence

• C - Level Executive ConsiderationsC - Level Executive Considerations

– ERP or Major System’s Value is dependent on how the information is disseminated to management (regions, functions, divisions, products, etc.) – but IT needs to understand what data is available and assure its accuracy.

– IT must establish test and check points to assure data accuracy using standard data integrity methods.

– IT must provide training and support in development of reports for accuracy – become detectives – look for anomalies, bringing them to the attention of the clients.

From J. O’Conner

Page 20: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

20

Business IntelligenceBusiness Intelligence

• C - Level Executive C - Level Executive ConsiderationsConsiderations

Information Business Leaders need:

• Customer Information – trends in product selection, billing management, order tracking, fulfillment visibility, marketing planning, campaign management , telemarketing, lead generation, lead generation, and custom segmentation.

From J. O’Conner

Page 21: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

21

Business IntelligenceBusiness Intelligence

• C - Level Executive ConsiderationsC - Level Executive Considerations

Information Business Leaders need:

• Supplier Information – cost of purchased goods and services, optimization of supplier selection, compress cycle times, align the purchase of goods with the corporate strategy.

From J. O’Conner

Page 22: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

22

Business IntelligenceBusiness Intelligence

• C - Level Executive ConsiderationsC - Level Executive Considerations

Information Business Leaders need:

• Product Lifecycle Management data• Supply Chain Management data• Financial Data that links Business controls to Finance and

comply with Sarbanes Oxley (SOX404).

From J. O’Conner

Page 23: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

23

Key Business Driver: The Need to Improve Business Intelligence

• Nine out of 10 executives from the largest U.S. companies say they need stronger business intelligence capabilities that provide better analysis of, and insight into, their operations if they are to grow successfully in an uncertain economic and political environment

• Accenture survey of 150 senior executives of Fortune 1000 firms.

From J. O’Conner

Page 24: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

24

Top Needs

• 91% selected stronger analytical and business intelligence

• 84% selected an organizational culture that better accommodates change

• 74% selected a more robust information technology infrastructure

Page 25: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

25

Building a Business and IT Foundation for BI

• Organizations have never been so eager to adopt business intelligence (BI) technology. Unfortunately, lack of alignment between people, process, and technology has led to many misguided business intelligence deployments. Using a business-centric methodology and process improvement type of approach, organizations can leverage BI efficiently to enable Performance Management

• M.A.Smith – Data Management Review

Page 26: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

26

Zachman FrameworkList of Locations

Important to Business

Node=Major Business Location

Data Function Network People Time MotivationList of Things

Important to Business

Entity=Class ofBusiness Thing

List of Processes theBusiness Performs

Function=Class of Business Process

List of OrganizationsImportant to Business

Agent=Major Org Unit

List of EventsSignificant to Business

Time=Major BusinessEvent

List of BusinessGoals/Strategies

End/Means=MajorBusiness Goal/CSF

e.g., EntityRelationship

Diagram

Ent=Business EntityRel=Business Rule

e.g., Function FlowDiagram

Function=Business Process

e.g., Data Model

Entity=Data EntityRelationship= Data

Relationship

e.g., Structure Chart

Funct=Computer FunctArg=Screen/Device

Formats

e.g., System Architecture

Node=Hardware/System Software

Link=Line Specification

e.g., Logistics Network

Node=Business Location

Link=BusinessLinkage

e.g., Program

Funct=Language StmtsArg=Control Blocks

e.g., NetworkArchitecture

Node=AddressesLink=Protocols

e.g., OrganizationChart

Agent=Org UnitWork=Work Product

e.g., Business Plan

End=Business Objectives

Means=BusinessStrategy

e.g., Human InterfaceArchitecture

Agent=RoleWork=Deliverable

e.g., Security Architecture

Agent=IdentityWork=Transaction

e.g., Processing Structure

Time=System EventCycle=Processing Cycle

e.g., Control Structure

Time=ExecuteCycle=Component Cycle

e.g., Timing Definition

Time=InterruptCycle=Machine Cycle

e.g., KnowledgeArchitecture

End=CriterionMeans=Option

e.g., Knowledge Design

End=ConditionMeans=Action

e.g., KnowledgeDefinition

End=SubconditionMeans=Step

e.g., Data DefinitionDescription

Ent=FieldsRel=Addresses

e.g., Data Design

Entity=Segment/RowRelationship=Pointer/

Key

e.g., Data Flow Diagram

Funct=Appl FunctionArg=User Views

Analyst Engineer Secretary

e.g., Human/Technology Interface

Agent=UserWork=Job

Analyst Engineer

e.g., Master Schedule

Time= Business EventCycle=Business Cycle

e.g., DistributedSystem Architecture

Node=Info Sys FunctLink=Line Char

Secretary

Planner’sView

Owner’sView

Designer’sView

Builder’sView

Subcontractor’sView

FunctioningEnterprise

Page 27: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

27

DoDAF

Page 28: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

28

Row 1 - Planner

Data – What• List of Subject Areas

about which data is stored

• Often presented as a taxonomy tree or a multilevel outline

Process - How• Business functional

areas• Often presented as a

set of functional decompositions

•Also important to know•Relationship between data and process•Present, and proposed data and process•Transition plan / Road map

Page 29: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

29

Row 2 – Business View/OwnerWhat - Data• Conceptual (or Business) Data

Model. Has two components– Business data objects (forms,

reports, etc) and their decomposition into data components and

– A high level organization of the business data components and their relationships

How - System• Business work flow

– End to end (incl. people)– Includes all aspects of the

target system

• The business workflow is the structure for organizing the business steps

• There are rules for moving along the steps in the workflow

• The business steps communicate to each other via the data

Page 30: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

30

Row 3 – Logical View / Designer

What - Data• Logical Data Model

– Fully normalized (through fifth normal form)

– Determine which services use which data elements (entity level and attribute level)

How - System• For each business

object– Develop a set of

services to meet the business need

– Search for potential common services

– User interactions

Metadata for both data and systems should be collected,

Organized, and maintained

Page 31: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

31

What is in a Logical Data Model

• Graphic that depicts entities, attributes, primary keys, etc (Data items + structural rules + relationships) Plus

• Metadata for Entities, Attributes and Relationships (CRUCIAL FOR USING DATA MARTS)– Definition– Data Domain values set (including possible representation)– Units of Measure– Cardinality (and optionality)– Management of synonyms and antonyms– Semantic rules– Status of an entity, or attribute

Page 32: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

32

Logical Data Model (2)– Primary key, foreign key, uniqueness, use of nulls and default

values– Data integrity and business rules (Data Quality Rules)– Originating Data Source– System of Record/Authoritative Source– Lead and steward business domain– Usage of data in an information exchange– Security – Notes for physical data model designer– Example

• A data modeler may use abstract design templates (generalization and specialization) and bottom up design based current systems plus new requirements to build a complete model

• All other models (especially the process models) are used as potential sources of data elements

Page 33: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

33

Some Data Model Notes

• The Conceptual Data Model tends to be a very wide scope, but limited detail (sets the context for data sharing).

• Logical Data Model is developed in detail as needed

• Physical Data Model is based on all or part of the Logical Data Model AND it may have a similar or very different data structure – Structure is based on planned usage

Page 34: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

34

Row 4 – Physical View/Builder

What - Data• Physical Database

Design– Transaction Path

Analysis– Analysis of the need for

indexes to improve performance

– Determine Physical Data Structure

– Determine if any services will be stored procedures

How - System• Transformation of the

business flow to a physical flow– Determine groupings of

services for implementation

– Build physical flow based on business flow and services

– SOA

Add physical details to the metadata and discovery services

Page 35: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

35

Row 5 – Detail Representation / Subcontractor

• What - Data• Determine the layout of

the database tables across the disk farm

• Develop the DCL for the physical database structure

• Determine backup & recovery strategy

• Determine SAN strategy

How - System• Determine detail

specifications for each element of the physical flow

• Write the programs for implementing the flow as well as each element of the flow

Page 36: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

36

Row 6 – Information System(Actual physical system

• What• Performance

monitoring and adjustments

• Business continuity• Archiving and

retrieval• SPC & Audit• Data Stewardship

How• “Help Desk”• Change management

(including data)

Page 37: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

37

Corporate Information Factory

AR

AP

Order Entry

Etcetera

DataWarehouse

DataMart forAccounting

Data Mart forSales

Etcetera

Metadata

System Measurements

ETL ETL

Operational Systems Reporting & BI Systems

Based on work of C. Imhof

Page 38: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

38

Data Quality Points• You must measure your actual data quality

– Quality must start in the production systems– Your ETL processes along with the data warehouse

and data marts are not meant to be a sewerage treatment plant for bad data

• If the quality is not very high in the operational systems, then– Your quest for business intelligence devolves into

producing business stupidity• Solve your DQ problem in the operational

systems (root cause) and then go forward.• Bad data can be created faster than you can

correct it.

Page 39: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

39

System Measuresfor Data and Systems

• Frequency of use

• Pattern of use (monthly, weekly, daily, hourly)

• Resources consumed– CPU, Disk, Network– For Data both volume and rate of growth

• Performance Metrics

• Utilization Metrics

Page 40: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

40

Metadatafor Data and Systems

• Need Metadata about Data and Systems– Written in Business Terms + Technical Terms

• Metadata for data includes information about Entities, Attributes, and Relationships– Definition– Data Domain values set (including possible

representation)– Units of Measure– Cardinality (and optionality)

Page 41: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

41

Metadata (2)– Management of synonyms and antonyms– Semantic rules– Status of an Entity and Attribute – Primary key, foreign key, uniqueness, use of nulls and

default values– Data integrity and business rules– Originating Data Source– System of Record / Authoritative Source– Lead and steward business domain– Usage of data in an information exchange– Example– AND Business Rules for Data Validity

Page 42: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

42

Metadata (3)

• Without Metadata, – You may not be sure what you are looking at

• For example what does a length of 6.2 mean (a Mars Lander crashed because of this problem)

– You may not be sure what process you should use to execute a business process

– Etc

Page 43: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

43

Key Question

• Where do you get the data for the data marts.– Inmon: from the data warehouse– Kimball: directly from the operational systems– The problem is that source data may be available

from several sources• What do you do if the sources do not have identical data• For derived data, does everyone use the exact same method

(including data sources)• Are the semantics the same

Page 44: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

44

A man with one watch knows what time it isA man with two watches is never sure

-Louis Carroll

Page 45: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

45

Operational Systems

• These are the systems that run the business on a daily basis

• Most Customer interactions are with operational systems

AR

AP

Order Entry

Etcetera

Operational Systems

Page 46: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

46

Data Warehouse

• Serves as the single, officially accepted and approved, valid source for all data needed for business analysis /intelligence

• Helps insure that all Data Marts are reading from the same book and using the same rules.

• Much easier change management

• Fully normalized RDB with summary data added

DataWarehouse

----------------Archive

Page 47: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

47

Data Mart• Geared to meet the

business users need• Uses range from

– Simple reports to– Data delivered in a

manipulability form• Word• Excel• Small Data Warehouse• Star Schema

– Delivery media depends upon planned usage

DataMart forAccounting

Data Mart forSales

Etcetera

Reporting & BI Systems Reporting & BI Systems

Page 48: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

48

Getting the Data In• Vendors promise us that

the ETL process for moving data from operational systems to the Data Warehouse is “simple”

• But we must deal with– Conflict resolution– Data Quality Issues– Duplicate data– etc

AR

AP

Order Entry

Etcetra

DataWarehouse

----------------Archive

ETL

Operational Systems

Page 49: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

49

Getting the Data Out

• Generally use SQL queries

• Care must still be taken to keep dimensions consistent

• Marts sourced from a single Data Warehouse can be merged

DataWarehouse

----------------Archive

DataMart forAccounting

Data Mart forSales

Etcetra

ETL

Reporting & BI Systems

Page 50: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

50

Other Important Big PictureData Issues

Page 51: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

51

Net-Centricity

• Business Issue– Business Users may be located all over the place and

each may move around• Home, Office, Customer site, etc

– There are many computers in the network and each one

• Can execute a specified set of processes• Maintains a specified set of data bases• Is not necessarily the same as any other computer.

• The business user must be able to access any desired data and execute any desired process by connecting into any point on the network

Page 52: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

52

Key Net Centric Services

• Data services

• Process services

• Discovery services– Relies very heavily on metadata

• Minimize SPOFs

Page 53: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

53

Net-CentricData Related Issues

• If a piece of the network (or a processor) goes down, the overall system must still be functional

• Data Discovery Services• Stress on Data Quality• Data bottlenecks must be mitigated• Data replication must be planned to

– Guarantee latest version of data as soon as possible and reasonable

– Inform the user of “old” data being presented

Page 54: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

54

Governance• What are the key tasks• Who is responsible for the tasks

– Management roles and responsibilities– Worker bee roles and responsibilities

• What to do is things go wrong• Some key tasks include

– Data Stewardship– Data Quality– Business Continuity– Performance Monitoring and Tuning– Testing

Page 55: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

55

Part IIIBuilding a Dimensional Model

Page 56: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

56

Why Build a Data Mart

• OLTP systems are designed for rapid response and high transaction rates– Reporting from them is slow and degrades

performance

• Data Warehouse is big (hard for user to find things) and not designed for slicing and dicing. It is very well suited for general reporting

Page 57: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

57

Design Options for Data Marts

Design chosen must consider the needs and abilities of the user

• Excerpt from big Data Warehouse using the same or similar design

• Dimensional Model (Star or Snow Flake)

• Spreadsheet

• Word Document

etcetera

Page 58: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

58

Basic Steps for Building A Dimensional Data Mart

1. Determine the Business Needs

2. Determine Sources of Data in Data Warehouse

3. Identify the Granularity of the Data

4. Determine the data in the Fact Table(s)

5. Determine the Dimensions

6. Build the Dimensional Model

Page 59: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

59

Sales Data Warehouse

Page 60: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

60

1. Determining Business Needs

• Must be done in face to face meetings

• Ask for samples or mock up reports

• Probe for possible extensions

• Determine which type of data mart is best suited to the set of users.

Page 61: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

61

2. Determine Source of Data in DW

• Which table(s) in the data warehouse contain the data needed by the user for– Fact Tables– Dimension tables

• Is any of the data from external sources– How do you plan to get that data– How reliable is the external data

• Did you check it yourselfFortunately you have already solved your internal DQ

problems

Page 62: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

62

Dimensional Definitions

• A dimensional model requires– Fact table

• Contains the measurements or metrics or facts of business processes PLUS foreign keys to the several dimension table

– Dimension Table(s)• Context of the measurements (or facts) are

represented in dimension tables. Typically the what, where, when, when, who and how of the measurement (or fact)

Page 63: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

63

3. Identify Granularity Needed

• For the fact table– What is the lowest level of granularity (detail)

needed– Is it atomic data or summarized data– How many fact tables are needed (each will

have its own star schema)– If you wish to join data across star schemas,

you must have consistent sourcing of data and consistent dimensions. and appropriately matching granularities.

Page 64: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

64

4. Determine Data in Fact Table

• For each fact table,– What single fact (or tightly coupled group of

facts) will be included.– Determine how the fact will be extracted

from the source– What dimensions are available in the source

data (Foreign keys)– Does your source database support the

desired foreign keys.

Page 65: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

65

5. Determine the Dimensions

• Dimensions generally follow the standard who, what, where, when, why, and how– Who are there people or organizations

associated with the fact– What are the ways of categorizing the fact in

the fact table– Where: what are the locations associated with

the fact • May be physical locations or conceptual locations

Page 66: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

66

5. Dimensions continued

• When: What are the time factors involved– Day, week, month, quarter, day of week, etc

• Why is the fact included (not very common)

• How: Was any special process or method associated with the fact– E.g. store sale, phone sale, internet sale

Page 67: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

67

6. Build the Dimensional Model

• For the fact table, extract the data and all the associated dimensions– Often requires data from several tables

• For each dimension, determine the set of possible values (domain) and populate each dimension table– Note that all the needed data may not be in

the source database (e.g. suppose there were no sales from Wyoming)

Page 68: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

68

The Finale

• Test the dimensional database you built

• Deliver it to the user and train the user

Questions

Jerry Rosenbaum [email protected] 443-253-6054

Page 69: 1 Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

69

References

• Ralph Kimball and Margy Ross – The Data Warehouse Toolkit, John Wiley & Sons

• Len Silverston – The Data Model Resource Book, Vol 1 & 2 John Wiley & Sons

• Graeme Simsion – Data Modeling Essentials, 3rd EditionMorgan Kaufman