data warehouse prerequisites familiarity with microsoft sql server familiarity with microsoft sql...

23
Data Warehouse Prerequisites Data Warehouse Prerequisites Familiarity with Microsoft SQL Server Familiarity with Microsoft SQL Server System Administration for Microsoft SQL Server 7.0 System Administration for Microsoft SQL Server 7.0 and and Implementing a Database on Microsoft SQL Server 7.0 Implementing a Database on Microsoft SQL Server 7.0 Knowledge of Transact-SQL Usage in Knowledge of Transact-SQL Usage in Developing Developing OLTP Systems OLTP Systems Basic Understanding of Programming Basic Understanding of Programming Principles and Experience with a Scripting Principles and Experience with a Scripting Language Language Understanding of Basic Database Design, Understanding of Basic Database Design, Administration, and Implementation Concepts Administration, and Implementation Concepts

Upload: pearl-patrick

Post on 03-Jan-2016

242 views

Category:

Documents


2 download

TRANSCRIPT

Data Warehouse PrerequisitesData Warehouse Prerequisites Familiarity with Microsoft SQL ServerFamiliarity with Microsoft SQL Server System Administration for Microsoft SQL Server 7.0System Administration for Microsoft SQL Server 7.0

and and Implementing a Database on Microsoft SQL Implementing a Database on Microsoft SQL Server 7.0Server 7.0

Knowledge of Transact-SQL Usage in Developing Knowledge of Transact-SQL Usage in Developing OLTP SystemsOLTP Systems

Basic Understanding of Programming Principles and Basic Understanding of Programming Principles and Experience with a Scripting LanguageExperience with a Scripting Language

Understanding of Basic Database Design, Understanding of Basic Database Design, Administration, and Implementation ConceptsAdministration, and Implementation Concepts

What is Data Warehousing?What is Data Warehousing?

Special Thanks to Bill Inmon, the “grandfather” Special Thanks to Bill Inmon, the “grandfather”

of data warehousing.of data warehousing.

Peter RawsthornePeter Rawsthorne

OLTP vs. DSSOLTP vs. DSS Online Transaction Processing (OLTP)Online Transaction Processing (OLTP) Decision Support System (DSS)Decision Support System (DSS)

OLTPOLTP– ATM, Bank Teller, Ticket Master, POS…ATM, Bank Teller, Ticket Master, POS…

DSSDSS– Marketing, What if?, Inventory, Health (Walmart)…Marketing, What if?, Inventory, Health (Walmart)…– Click-through analysisClick-through analysis

In-class ExerciseIn-class Exercise

1.1. Break into teams of threeBreak into teams of three

2.2. Think of a business or subject areaThink of a business or subject area

3.3. Determine three OLTP systems required to support Determine three OLTP systems required to support

business or subjectbusiness or subject

4.4. Determine two DSS systems required to support Determine two DSS systems required to support

business or subjectbusiness or subject

5.5. Provide one example of how the DSS system could Provide one example of how the DSS system could

be used to predict the futurebe used to predict the future

Exercise ExampleExercise Example Business: Yacht ClubBusiness: Yacht Club OLTPOLTP

– Membership SystemMembership System– Accounting SystemAccounting System– Yacht Racing Results SystemYacht Racing Results System

DSSDSS– Quarterly and Yearly ExpensesQuarterly and Yearly Expenses– Race ResultsRace Results

How much beer will we need for next years regatta How much beer will we need for next years regatta for the male non-members who are crew on yachts for the male non-members who are crew on yachts over 40 feet?over 40 feet?

What then is a data warehouse?What then is a data warehouse?

A data warehouse is a:A data warehouse is a: subject oriented,subject oriented, integrated,integrated, time variant,time variant, non volatilenon volatile

collection of data in support of collection of data in support of management's decision management's decision making process.making process.

Subject OrientationSubject Orientation

Data is organized via Data is organized via

subject rather than process subject rather than process

or business function.or business function.

The application world is The application world is

concerned both with data concerned both with data

base design and process base design and process

design.design.

The data warehouse world The data warehouse world

focuses on data modeling focuses on data modeling

and database design and database design

exclusively.exclusively.

IntegrationIntegration Easily the most important Easily the most important

aspect of the data aspect of the data warehouse environment is warehouse environment is that data found within the that data found within the data warehouse is data warehouse is integrated. ALWAYS. WITH integrated. ALWAYS. WITH NO EXCEPTIONS.NO EXCEPTIONS.

consistent naming consistent naming conventions,conventions,

consistent measurement of consistent measurement of variables,variables,

consistent encoding consistent encoding structures,structures,

consistent physical consistent physical attributes of data,attributes of data,

and so forth.and so forth.

Time VariantTime Variant

OPERATIONALOPERATIONAL

Current valued dataCurrent valued data

Time horizon: 60 – 90 daysTime horizon: 60 – 90 days

Key fields may or may not Key fields may or may not

have an element of timehave an element of time

Data can be updatedData can be updated

DATA WAREHOUSEDATA WAREHOUSE

Snapshot dataSnapshot data

Time horizon: 5 – 10 daysTime horizon: 5 – 10 days

Keys do not have an Keys do not have an

element of timeelement of time

Once snapshot is made, Once snapshot is made,

records cannot be updatedrecords cannot be updated

non volatilenon volatile

Inserts, deletes, and changes - are done regularly to the Inserts, deletes, and changes - are done regularly to the

operational environment on a record by record basis.operational environment on a record by record basis.

There are only two kinds of operations that occur in the data There are only two kinds of operations that occur in the data

warehouse - the initial loading of data, and the access of warehouse - the initial loading of data, and the access of

data.data.

The structure of the warehouseThe structure of the warehouse

The different components of The different components of the data warehouse are:the data warehouse are:

metadata,metadata, current detail data,current detail data, old detail data,old detail data, lightly summarized data, lightly summarized data,

andand highly summarized data.highly summarized data.

Current Detail DataCurrent Detail Data

Most recent happeningsMost recent happenings

VoluminousVoluminous

Lowest level of granularityLowest level of granularity

Almost always stored on disk storageAlmost always stored on disk storage

Fast to accessFast to access

Expensive and complex to manageExpensive and complex to manage

Older detail dataOlder detail data

Stored on some form of mass storageStored on some form of mass storage

Infrequently accessedInfrequently accessed

Stored at a level of detail consistent with current Stored at a level of detail consistent with current

detailed datadetailed data

Often stored on an alternate storage mediumOften stored on an alternate storage medium

Anticipated large volumeAnticipated large volume

Lightly summarized dataLightly summarized data

Distilled from the low level of detailDistilled from the low level of detail

Almost always stored on diskAlmost always stored on disk

Design issues facing the data architect are;Design issues facing the data architect are;

– what unit of timewhat unit of time

– what contents – attributeswhat contents – attributes

Frequently mined data, a lot of “what if?”Frequently mined data, a lot of “what if?”

Highly summarized dataHighly summarized data

Compact and easily accessibleCompact and easily accessible

Sometimes found in the data warehouseSometimes found in the data warehouse

Sometimes found outside the data warehouseSometimes found outside the data warehouse

In any case, the highly summarized data is part of In any case, the highly summarized data is part of

the data warehousethe data warehouse

Yearly or multi year summariesYearly or multi year summaries

MetadataMetadata

Sits in a different dimensionSits in a different dimension

Contains no data directly taken from the Contains no data directly taken from the

operational environmentoperational environment

Special and very important roleSpecial and very important role

Metadata is used as:Metadata is used as:

– a directory to locate the contentsa directory to locate the contents

– a guide to the mapping of dataa guide to the mapping of data

– a guide to the algorithms used for summarizationa guide to the algorithms used for summarization

Metadata – levels of summarizationMetadata – levels of summarization

Flow of DataFlow of Data

data enters from the data enters from the

operational environment, it operational environment, it

is transformedis transformed

data goes into the current data goes into the current

detail level of detaildetail level of detail

It resides there and is used It resides there and is used

there until one of three there until one of three

events occurs:events occurs:

– it is purged,it is purged,

– it is summarized, and/orit is summarized, and/or

– it is archived.it is archived.

Using the Data WarehouseUsing the Data Warehouse

ExampleExample

SummarySummary A data warehouse is a A data warehouse is a

subject oriented, subject oriented,

integrated, integrated,

time variant, time variant,

non volatilenon volatile

collection of data in support of management's decision needs.collection of data in support of management's decision needs. Four levels of data warehouse data:Four levels of data warehouse data:

old detail,old detail,

current detail,current detail,

lightly summarized data, andlightly summarized data, and

highly summarized data.highly summarized data. Metadata is a very important partMetadata is a very important part

Lab deliverablesLab deliverables

W2KS InstallW2KS Install

SQL7.0 InstallSQL7.0 Install

SQL7.0 OLAP Services InstallSQL7.0 OLAP Services Install

MSPress installMSPress install

Complete MSPress Chapter 1Complete MSPress Chapter 1

Contact InformationContact Information

Peter Rawsthorne, B.Tech, MCSD, MCT, CCRPeter Rawsthorne, B.Tech, MCSD, MCT, CCR

President, Eclectic Endeavours Inc.President, Eclectic Endeavours Inc.

559A Artisan Lane559A Artisan Lane

PO Box 281 PO Box 281

Bowen Island, BCBowen Island, BC

CANADA V0N 1G0CANADA V0N 1G0

Phone: 604-947-2760Phone: 604-947-2760

Fax: 604-947-2715Fax: 604-947-2715

email: [email protected]: [email protected]

web: http://www.endeavours.comweb: http://www.endeavours.com