designing a data warehouse with sql 2008

22
© 2005-2009 Kimball Group. All rights reserved. Page 1 DESIGNING A DATA WAREHOUSE WITH SQL SERVER 2008 Joy Mundy, [email protected] Introductions and Background Presenter Joy Mundy, Kimball Group Kimball Group The authors of the Data Warehouse Toolkit series of books, including the Microsoft Data Warehouse Toolkit Kimball University DW / BI courses DW / BI strategic consulting 2

Upload: thomduclos

Post on 05-Dec-2014

2.021 views

Category:

Documents


6 download

DESCRIPTION

The right reason to build a DW/BI system

TRANSCRIPT

Page 1: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 1

DESIGNING A DATA

WAREHOUSE WITH SQL

SERVER 2008

Joy Mundy, [email protected]

Introductions and Background

� Presenter

� Joy Mundy, Kimball Group

� Kimball Group

� The authors of the Data Warehouse Toolkit series of books, including the Microsoft Data Warehouse Toolkit

� Kimball University DW / BI courses

� DW / BI strategic consulting

2

Page 2: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 2

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

3

Some Possible Reasons

A. The CIO told us to

B. It sounds like fun

C. It’s a great opportunity for us to add significant value to the business

D. We’re not building a DW/BI system, just an Executive Dashboard

Which one of these is “Right”?

4

Page 3: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 3

Answer: C.

� It is a great opportunity to add business value

� It is also a great opportunity to:

� Work with senior management

� Advance your career

� Play with fun, new technology

� However, there are a few risks…

5

DW / BI System Risks

� High profile� Success (and failure) is visible to senior management

� Business driven – can be hard for technologists

� Technology focus is rarely successful � “Build it and they will come” doesn't work

� Dashboards are appropriate for mature DW/BI systems, but are not a starting point

� Data quality and integration are hard problems, even if the technology works well

� The project is complex and politically challenging� Follow a proven approach

6

Page 4: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 4

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

7

Architecture Principles

� Business requirements determine architecture

� Listen to business requirements and translate them

into functional components

� This means your DW/BI system architecture will not

be the same as your neighbor’s

� Do not build major DW/BI components because you

are supposed to

8

Page 5: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 5

Architectural Approaches

� Build reports directly from the transaction systems

� Standalone marts

� Normalized data warehouse feeding downstream marts

� Kimball dimensional data warehouse

9

Others

ESRT

DIM

Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others

Sales Mart

Sales+

KPI View

CSAT old

PCD

CFR

RoBCSAT+

EMR Capacity Planning

Standalone Marts

Pros•Marts reflect business requirements•Get business value this year

Cons•Multiple extracts of the same data•Multiple transforms•Inconsistent versions of the same data•10th mart takes as long to build as first

Page 6: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 6

Others

ESRT

DIM

Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others

Sales+

KPI View

CSAT+

Capacity Planning

Normalized DW and Downstream Marts

Enterprise Data Warehouse

(not dimensional)•Integrated•Historical•Design reflects source systems

RoB

Pros•Data extracted and consolidated only once•Marts reflect business requirements

Cons•Takes too long to build a new mart•Too many business rules between EDW & marts; we still get inconsistencies•EDW is by (and for) IT, using its language and structures•Marts are for the business

Mart B

Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others

Mart A

Kimball Dimensional Data Warehouse

Kimball-style Dimensional

Enterprise Data Warehouse

•Integrated & historical•Design reflects analytic requirements•Built incrementally•Contains the most detailed data possible•Fact data hooks together via shared (conformed) dimensions•Presentation area is relational or OLAP•OLAP is recommended for Msft platform•(Still need relational DW)

Pros•Data extracted and consolidated only once•DW design meets business requirements•Data is structured to support easy analytic use with good perf•Data and terms are consistent•Once data is in the DW, building new KPIs or BI applications is much easier

Cons•Takes longer to get biz value than simply throwing together a mart

User applications

•Most “marts” become views into the enterprise system•Ad hoc use is supported and encouraged

We may supplement the main DW/BI system with a handful of custom BI apps that meet specific needs. These are the exception.

Page 7: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 7

Summary of Architectures

Approach Trxn system

burden

Ease of use Time to market

Report directly from trxn systems

Very high Very poor Poor

Departmental marts Moderate Good until you need something new. Navigation challenges

“90 days”, no economies of scale

Normalized DW + marts

Low DW = poorMarts = good until you need something newNavigation challenges

Huge up-front investment. Marts are“60 days”

Kimball dimensional DW

Low Very good Large up-front investment. Excellent economies of scale.

13

Sou

rce

Sys

tem

sS

ourc

e S

yste

ms

Bus

ines

s/E

xtra

ct

Bus

ines

s/E

xtra

ct

Rul

esR

ules

Bus

ines

s/E

xtra

ct

Bus

ines

s/E

xtra

ct

Rul

esR

ules RDBMSRDBMS

OLAPOLAP

MetadataMetadataMetadataMetadataB

usin

ess

Use

rsB

usin

ess

Use

rs

Dat

a Q

ualit

yD

ata

Qua

lity

Dim

ensi

onal

izat

ion

Dim

ensi

onal

izat

ion

The Microsoft DW/BI Technical Architecture

••SharePointSharePoint••Report BuilderReport Builder••Performance Performance PointPoint

Page 8: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 8

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

15

Kimball Method Basic Principles

� Business driven

� Iterative Lifecycle

� Dimensional model for data delivery

� Enterprise data framework

� Bus Matrix

� Conformed dimensions

� Full solution from extracts to business value

16

Page 9: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 9

The Kimball DW/BI Lifecycle

Key Concepts:- Business centric- Full solution- Enterprise aware

- Dimensional delivery- Iterative process- Incremental growth

TechnicalArchitecture

Design

BIApplicationSpecification

BIApplicationSpecification

BIApplicationDevelopment

ProjectPlanning

BusinessRequire-

mentsDefinition

BusinessRequire-

mentsDefinition

Maintenance

Project ManagementProject Management

PhysicalDesign

ETL Design &Development

DimensionalModeling

DimensionalModeling

Deployment

Growth

ProductSelection &Installation

17

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

18

Page 10: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 10

Business Requirements (1)

� Interview key people across the org

� Ask “What do you do?” not “What do you want?”

� It is our job to design the solution, not theirs

� Look for common analytic themes

� Better promotion response rate

� Improve sales performance

� Break themes down into business processes that generate needed data

� Promotions Responses Orders

19

Business Requirements (2)

� Design the data warehouse Enterprise Bus Matrix

� Prioritize themes with senior management

� Summarize finding in a Requirements Document

� Identify and recruit good business sponsor(s)

� Visionary

� Influential

� Reasonable

20

Page 11: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 11

Profile the Data

� Early and often

� Does the data exist to support the required analysis?

� Where are the problems affecting ETL design

� Primary keys

� Referential integrity

� NULL values

� Junk values

� The dreaded “Notes” field

� SSIS 2008 has useful data profiling functionality

21

Requirements Prioritization Based on Value and Feasibility

Low

High

Bu

sin

ess

Val

ue

/ Im

pac

t

HighLow Feasibility

Orders

CallTracking

OrdersForecast

Returns

ExchangeRates

ManufacturingCosts

ProductProfitability

CustomerProfitability

Shipping

Key Concepts:� Created in a meeting with Senior Mgmt.� Boxes come from Bus. Requirements

� Relative value is a business decision� Relative feasibility needs IT input

Promotions

Page 12: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 12

Enterprise Bus Matrix

Adventure Works <-- Conformed Dimensions -->Data Warehouse Bus Matrix

Business Process Bus

ines

s P

rior

ity

Dat

e (O

rder

, Sta

rt, S

hip)

Pro

duct

Pro

mot

ion

End

Cus

tom

er

Em

ploy

ee

Res

elle

r

Pag

e

Inte

rnet

Reg

iste

red

Use

r

Par

t

Ven

dor

Shi

pper

Pro

blem

Orders Forecasting 2 x x x x xReseller Orders 1 x x x x xInternet Orders 1 x x x x x xPurchasing x x x x x x xParts Inventory x x x x xManufacturing 6 x x xFinished Goods Inv. x x xShipping 3 x x x x x x xReturns 5 x x x x x xCustomer Calls 4 x x x x x x x xWeb Support 4 x x x x x x x x

Key Concepts:Key Concepts:� The high level DW/BI data architecture� Rows = Business Processes� Columns = Conformed Dimensions� DW/BI system implemented row by row based on business priority

23

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

24

Page 13: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 13

Microsoft Technology for the DW Back Room

� Integration Services is a competitive ETL tool� Great performance, solid toolbox

� Relational Database is strong BI platform

� Key BI-related features, including partitioning, compression, and star join optimization

� Analysis Services is OLAP market leader� Dimensional design is flexible

� More scalable and manageable

� Data Mining – strong mining platform, leverages AS for speed; good integration

25

Relational vs. OLAP (Why OLAP?)

� Relational strengths

� Data management

� Flexibility

� OLAP strengths

� Analytic language

� Ad hoc query performance

� Metadata layer

� Security, especially for ad hoc queries

26

Page 14: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 14

Microsoft Technology for the DW/BI Front Room

� Reporting Services� Good enterprise platform

� Programmer-oriented report designer

� Limited ad hoc query

� Data presentation� Office, SharePoint, [ProClarity]

� Integrated development (VS) and management environments

� Scale – technology can scale to multi-TBs� Plan to spend more time and $, including on significant consulting

expertise.

� Real-time features

27

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

28

Page 15: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 15

The Dimensional Model (the Target)

� Based on top business priority data area

� Fact table = measurement of business events

� Dimension tables = objects that participate in business events (Customer, Product, Date, …)

� Surrogate keys (meaningless integer)

� Slowly changing dimensions

� Type 1 = Overwrite old values with new

� Type 2 = Add a new row when values change

� Identify data quality issues now

29

Relational Dimensional Model

Product KeyProduct KeyCustomer KeyCustomer KeyDate KeyDate Key… other keys… other keys

Sales AmountSales AmountSales QuantitySales Quantity… other … other measuresmeasures

Sales FactSales Fact

ProductProductDateDate

CustomerCustomerOther dims…Other dims…

Page 16: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 16

Surrogate Keys

� Dimension PKs should be surrogate (meaningless) keys

� Managed by the DW

� Usually an integer type

� Usually populated via IDENTITY keyword in dimension table definition

� Why?� Small (int) keys are vital for performance

� The source system will re-use keys. They swear they won’t. But they will.

� Enables dimension attribute change tracking

31

Surrogate Keys and ETL

� Dimensions

� Carry source system key(s) as non-key attributes in the dimension

� New rows automatically get a new surrogate key

� Facts

� Fact table usually does not contain source system keys

� Final step of fact processing is to exchange the source system keys for DW surrogate keys

� Lookup to dimension tables based on source key, returning surrogate key

32

Page 17: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 17

Conformed Dimensions

� One master dimension table that all fact tables subscribe to

� Get agreement organization-wide on:

� What the dimensions are called

� Which hierarchies you have

� Similar-but-different attributes and hierarchies have different names

� Which attributes are managed by restating history and which by tracking history

� Create two sets of attributes if you need it both ways

� Why?

� Single version of the truth

� Flexibility of basic design

33

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

34

Page 18: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 18

The Need for BI Applications

� Approximately 10% of your user population will learn how to build ad hoc queries

� They must learn the tool AND the data

� This means you must build applications to provide access to the other 90%

� Structured

� Flexible (parameters, pick lists, formats)

� Well organized

35

BI Application Steps

1. BI Application design and specs

� Right after business requirements

� Template, mock-ups, specs, navigation framework

2. BI Application development

� Can’t start until data and tools are available

� Pull out your specs and get to work

� Best to do this as part of Beta testing

36

Page 19: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 19

Standard Reports

� We recommend going live with a modest number of reports (8-12)

� Enlist business users in creating and QA-ing reports

� Users don’t know what they want until you show them something

� Lots of reports are “theme and variations” –parameterize them!

� Build a BI portal to host the reports

� Brand it with the DW/BI logo

� Add useful info about operations, contents, and help

37

Advanced BI Applications

� Planning and forecasting applications� You need a decent history of fairly accurate data before you can

plan / forecast

� Planning and forecasting activities are highly analytic, with a little bit of writeback

� Heavy emphasis on “what-if”

� Data mining

� Collection of statistical techniques to identify trends and correlations

� Requires detailed (atomic) data

� Can be the most valuable thing you do with your DW/BI system

� Advanced BI apps are not Phase 1 projects

38

Page 20: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 20

Agenda

� The “Right” reason to build a DW/BI system

� The Kimball Architecture

� The Kimball Method and Lifecycle

� Business Requirements

� Technology Track

� Data Track

� BI Applications

� Operations, Maintenance, and Growth

39

Deployment, Maintenance, and Growth

� Deployment has two major components� Software and data availability (dev, test, prod)

� User preparedness (training, documentation, and support)

� Maintenance

� Monitor usage and performance

� Anticipate problems

� Growth� Iterate back through the Lifecycle with the next priority business process

40

Page 21: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 21

Session Summary

� The DW/BI system can be high value, but it is definitely high risk

� Reduce risk by using an approach based on

� business requirements

� a flexible data architecture

� delivering the full solution

� Microsoft SQL Server 2008 provides the full technology stack for DW/BI systems

� SQL Server 2008 is well suited for the Kimball Method

41

Next Steps

� Learn about your business

� Strategies, challenges, opportunities, terms

� Industry, competition, trends

� Learn the Kimball Method

� Learn about adding business value

� Learn the Lifecycle approach

� Learn the Microsoft SQL Server 2008 DW/BI toolset

� Get started!

� Do a high level requirements definition and prioritization

42

Page 22: Designing A Data Warehouse With Sql 2008

© 2005-2009 Kimball Group. All rights reserved. Page 22

For More Information…

� Kimball University� Next 4-day Microsoft class on 3/31 (Chicago). Stockholm in May.

� Other classes in modeling, lifecycle, and ETL throughout the year

� Websites� www.kimballgroup.com: articles and design tips

� forum.kimballgroup.com: the Kimball Forum

� Kimball Books� The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren

Thornthwaite with Ralph Kimball, Wiley, 2006 (the Microsoft book)

� The Data Warehouse Toolkit 2nd Edition, Ralph Kimball and Margy Ross, Wiley, 2002 (the modeling book)

� The Data Warehouse Lifecycle Toolkit 2nd Edition, Kimball, Ross and Thornthwaite, Wiley, 2008 (how to build a DW)

� The Data Warehouse ETL Toolkit, Kimball and Caserta, Wiley, 2004 (ETL theory and practice)

43