data warehouse prerequisites familiarity with microsoft sql server familiarity with microsoft sql...
TRANSCRIPT
Data Warehouse PrerequisitesData Warehouse Prerequisites Familiarity with Microsoft SQL ServerFamiliarity with Microsoft SQL Server System Administration for Microsoft SQL Server 7.0System Administration for Microsoft SQL Server 7.0
and and Implementing a Database on Microsoft SQL Implementing a Database on Microsoft SQL Server 7.0Server 7.0
Knowledge of Transact-SQL Usage in Developing Knowledge of Transact-SQL Usage in Developing OLTP SystemsOLTP Systems
Basic Understanding of Programming Principles and Basic Understanding of Programming Principles and Experience with a Scripting LanguageExperience with a Scripting Language
Understanding of Basic Database Design, Understanding of Basic Database Design, Administration, and Implementation ConceptsAdministration, and Implementation Concepts
What is Data Warehousing?What is Data Warehousing?
Special Thanks to Bill Inmon, the “grandfather” Special Thanks to Bill Inmon, the “grandfather”
of data warehousing.of data warehousing.
Peter RawsthornePeter Rawsthorne
OLTP vs. DSSOLTP vs. DSS Online Transaction Processing (OLTP)Online Transaction Processing (OLTP) Decision Support System (DSS)Decision Support System (DSS)
OLTPOLTP– ATM, Bank Teller, Ticket Master, POS…ATM, Bank Teller, Ticket Master, POS…
DSSDSS– Marketing, What if?, Inventory, Health (Walmart)…Marketing, What if?, Inventory, Health (Walmart)…– Click-through analysisClick-through analysis
In-class ExerciseIn-class Exercise
1.1. Break into teams of threeBreak into teams of three
2.2. Think of a business or subject areaThink of a business or subject area
3.3. Determine three OLTP systems required to support Determine three OLTP systems required to support
business or subjectbusiness or subject
4.4. Determine two DSS systems required to support Determine two DSS systems required to support
business or subjectbusiness or subject
5.5. Provide one example of how the DSS system could Provide one example of how the DSS system could
be used to predict the futurebe used to predict the future
Exercise ExampleExercise Example Business: Yacht ClubBusiness: Yacht Club OLTPOLTP
– Membership SystemMembership System– Accounting SystemAccounting System– Yacht Racing Results SystemYacht Racing Results System
DSSDSS– Quarterly and Yearly ExpensesQuarterly and Yearly Expenses– Race ResultsRace Results
How much beer will we need for next years regatta How much beer will we need for next years regatta for the male non-members who are crew on yachts for the male non-members who are crew on yachts over 40 feet?over 40 feet?
What then is a data warehouse?What then is a data warehouse?
A data warehouse is a:A data warehouse is a: subject oriented,subject oriented, integrated,integrated, time variant,time variant, non volatilenon volatile
collection of data in support of collection of data in support of management's decision management's decision making process.making process.
Subject OrientationSubject Orientation
Data is organized via Data is organized via
subject rather than process subject rather than process
or business function.or business function.
The application world is The application world is
concerned both with data concerned both with data
base design and process base design and process
design.design.
The data warehouse world The data warehouse world
focuses on data modeling focuses on data modeling
and database design and database design
exclusively.exclusively.
IntegrationIntegration Easily the most important Easily the most important
aspect of the data aspect of the data warehouse environment is warehouse environment is that data found within the that data found within the data warehouse is data warehouse is integrated. ALWAYS. WITH integrated. ALWAYS. WITH NO EXCEPTIONS.NO EXCEPTIONS.
consistent naming consistent naming conventions,conventions,
consistent measurement of consistent measurement of variables,variables,
consistent encoding consistent encoding structures,structures,
consistent physical consistent physical attributes of data,attributes of data,
and so forth.and so forth.
Time VariantTime Variant
OPERATIONALOPERATIONAL
Current valued dataCurrent valued data
Time horizon: 60 – 90 daysTime horizon: 60 – 90 days
Key fields may or may not Key fields may or may not
have an element of timehave an element of time
Data can be updatedData can be updated
DATA WAREHOUSEDATA WAREHOUSE
Snapshot dataSnapshot data
Time horizon: 5 – 10 daysTime horizon: 5 – 10 days
Keys do not have an Keys do not have an
element of timeelement of time
Once snapshot is made, Once snapshot is made,
records cannot be updatedrecords cannot be updated
non volatilenon volatile
Inserts, deletes, and changes - are done regularly to the Inserts, deletes, and changes - are done regularly to the
operational environment on a record by record basis.operational environment on a record by record basis.
There are only two kinds of operations that occur in the data There are only two kinds of operations that occur in the data
warehouse - the initial loading of data, and the access of warehouse - the initial loading of data, and the access of
data.data.
The structure of the warehouseThe structure of the warehouse
The different components of The different components of the data warehouse are:the data warehouse are:
metadata,metadata, current detail data,current detail data, old detail data,old detail data, lightly summarized data, lightly summarized data,
andand highly summarized data.highly summarized data.
Current Detail DataCurrent Detail Data
Most recent happeningsMost recent happenings
VoluminousVoluminous
Lowest level of granularityLowest level of granularity
Almost always stored on disk storageAlmost always stored on disk storage
Fast to accessFast to access
Expensive and complex to manageExpensive and complex to manage
Older detail dataOlder detail data
Stored on some form of mass storageStored on some form of mass storage
Infrequently accessedInfrequently accessed
Stored at a level of detail consistent with current Stored at a level of detail consistent with current
detailed datadetailed data
Often stored on an alternate storage mediumOften stored on an alternate storage medium
Anticipated large volumeAnticipated large volume
Lightly summarized dataLightly summarized data
Distilled from the low level of detailDistilled from the low level of detail
Almost always stored on diskAlmost always stored on disk
Design issues facing the data architect are;Design issues facing the data architect are;
– what unit of timewhat unit of time
– what contents – attributeswhat contents – attributes
Frequently mined data, a lot of “what if?”Frequently mined data, a lot of “what if?”
Highly summarized dataHighly summarized data
Compact and easily accessibleCompact and easily accessible
Sometimes found in the data warehouseSometimes found in the data warehouse
Sometimes found outside the data warehouseSometimes found outside the data warehouse
In any case, the highly summarized data is part of In any case, the highly summarized data is part of
the data warehousethe data warehouse
Yearly or multi year summariesYearly or multi year summaries
MetadataMetadata
Sits in a different dimensionSits in a different dimension
Contains no data directly taken from the Contains no data directly taken from the
operational environmentoperational environment
Special and very important roleSpecial and very important role
Metadata is used as:Metadata is used as:
– a directory to locate the contentsa directory to locate the contents
– a guide to the mapping of dataa guide to the mapping of data
– a guide to the algorithms used for summarizationa guide to the algorithms used for summarization
Flow of DataFlow of Data
data enters from the data enters from the
operational environment, it operational environment, it
is transformedis transformed
data goes into the current data goes into the current
detail level of detaildetail level of detail
It resides there and is used It resides there and is used
there until one of three there until one of three
events occurs:events occurs:
– it is purged,it is purged,
– it is summarized, and/orit is summarized, and/or
– it is archived.it is archived.
SummarySummary A data warehouse is a A data warehouse is a
subject oriented, subject oriented,
integrated, integrated,
time variant, time variant,
non volatilenon volatile
collection of data in support of management's decision needs.collection of data in support of management's decision needs. Four levels of data warehouse data:Four levels of data warehouse data:
old detail,old detail,
current detail,current detail,
lightly summarized data, andlightly summarized data, and
highly summarized data.highly summarized data. Metadata is a very important partMetadata is a very important part
Lab deliverablesLab deliverables
W2KS InstallW2KS Install
SQL7.0 InstallSQL7.0 Install
SQL7.0 OLAP Services InstallSQL7.0 OLAP Services Install
MSPress installMSPress install
Complete MSPress Chapter 1Complete MSPress Chapter 1
Contact InformationContact Information
Peter Rawsthorne, B.Tech, MCSD, MCT, CCRPeter Rawsthorne, B.Tech, MCSD, MCT, CCR
President, Eclectic Endeavours Inc.President, Eclectic Endeavours Inc.
559A Artisan Lane559A Artisan Lane
PO Box 281 PO Box 281
Bowen Island, BCBowen Island, BC
CANADA V0N 1G0CANADA V0N 1G0
Phone: 604-947-2760Phone: 604-947-2760
Fax: 604-947-2715Fax: 604-947-2715
email: [email protected]: [email protected]
web: http://www.endeavours.comweb: http://www.endeavours.com