1 teaching the data warehouse course paul gray isecon 2001

42
1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

Upload: cristal-marris

Post on 30-Mar-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

1

TEACHING THE DATA WAREHOUSE COURSE

Paul GrayISECON 2001

Page 2: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

2

Outline

Overview of what is Data Warehousing

The 5 and 10 week courses The Indiana University of

Pennsylvania course

Page 3: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

3

ORIGINS Data warehouses are the results of

two software solutions needing and finding one another: Data base firms developed data

warehouses and were looking for applications

EIS and DSS software developers and vendors needed to deal with ever-increasing data bases

About 10 years ago, the two groups started interacting with the results described here.

Page 4: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

4

ORIGINS

Database developers long understood that their software was required for both transactional and analytic processing

However, their principal developments were directed to ever-larger transactional data bases. This process occurred even through operational and analytic data are separate with different requirements and different user communities.

Page 5: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

5

ORIGINS Once these differences were

understood, new data bases were created specifically for analysis use.

Today, data warehouses have 3 major applications On-line analytic processing for business

intelligence Data Mining Customer Relationship Management

Page 6: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

6

WHAT IS DATA WAREHOUSING A data warehouse is typically a dedicated data

base system for decision making that is separate from the production data base(s) used operationally. It differs from production system in that:

it covers a much longer time horizon than transaction systems

it includes multiple data bases that have been processed so that the warehouse’s data are defined uniformly (i.e., ‘clean’ data)

it is optimized for answering complex queries from managers and analysts.

Page 7: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

7

WHAT IS DATA WAREHOUSING? In the last 5 years, data warehousing has

become a major industry within computing which has brought together the ideas of data bases and decision support. It has also been the foundation for efforts in data mining and in CRM

Data mining refers to finding answers about an organization from the information in the data warehouse that the executive or the analyst had not thought to ask. Data mining is made possible by the very presence of large databases in the data warehouse. It provides techniques that allow managers to obtain managerial information from their legacy systems. Its objective is to identify valid, novel, potentially useful, and understandable patterns in data.

Page 8: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

8

WHAT IS DATA WAREHOUSING?

The objective of a data warehouse is to create a “single truth”

Data warehousing is a major new application area. It rates extremely high salaries (up to $100,000 for specialists, $300,000 for consultants).

Page 9: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

9

DEFINITION

A data warehouse is a: Subject oriented Integrated Time-variant Non-volatileCollection of data in support of

management decision processes

Page 10: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

10

NOTE:

Data warehouse is physically separated from operational systems and operational data bases

Data warehouses hold both aggregated and detailed data for management separate from the databases used for On-Line Transaction Processing (OLTP)

Page 11: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

11

CHARACTERISTICS Subject oriented

Data are organized by how users refer to it

Integrated Inconsistencies are removed in both nomenclature and conflicting information; (i.e. data are ‘clean’)

Non-volatile Read-only data. Data do not change over time.

Time series Data are time series, not current status

Page 12: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

12

CHARACTERISTICSSummarized Operational data are

mapped into decision usable form

Larger Time series implies much more data is retained

Non normalized

Data can be redundant

Metadata =Data about data

Input Unintegrated, operational en-vironment (‘legacy systems’)

Page 13: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

13

SUBJECT ORIENTATION

Data is organized around major subjects of the enterprise

Example:OPERATIONAL DATA WAREHOUSE•Loans Customer•Savings Vendor•Bank card Product•Trust Activity An application A subjectorientation orientation

Page 14: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

14

USING THE WAREHOUSE

The higher the level of summarization, the more the data is used

The more summarized the data, the quicker it is to retrieve

However, the higher the level of summarization, the lower the level of detail

Page 15: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

15

DATA MARTS Data Mart: A scaled-down version of

the data warehouse A data mart is a small warehouse

designed for the SBU or department level.

It is often a way to gain entry and provide an opportunity to learn

Major problem: if they differ from department to department, they can be difficult to integrate enterprise-wide

Page 16: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

16

COST Data warehouses are not cheap Median cost to create (does not

include operating cost) = $2.2M Multimillion dollar costs are common Their design and implementation is

still an art and they require considerable time to create

Page 17: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

17

SIZE

Being designed for the enterprise so that everyone has a common data set, they are large and increase in size with time.

Typical storage sizes run from 50 Gigabytes to several Terabytes

Page 18: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

18

SIZE OF INDUSTRY Data warehouses are a major

industry within information systems. 6B$/year

Estimates vary but it is clear that many more than 90% of Fortune 1000 have data warehouse projects

Major players include: Oracle IBM+Informix Sybase NCR + BI companies (Brio, Cognos, Pilot,….)

Page 19: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

19

MARKET COMPONENTS

High end business intelligence (OLAP)

Low end query tools Data cleansing Data marts Data mining Customer relationship management

(CRM)

Page 20: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

20

APPLICATION – OLAP/BI

OLAP = On Line analytic processing Basic idea of OLAP: managers should

be able to manipulate enterprise data models across many dimensions to understand changes that are occurring

Vendors claim they are OLAP compliant even if they are not

Page 21: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

21

APPLICATION - DATA MINING

Also known as Knowledge Data Discovery (KDD)

Mining terminology refers to finding answers about a business from the data warehouse that the executive or analyst had not thought to ask

Page 22: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

22

APPLICATION - DATA MINING

KDD applies techniques mostly from artificial intelligence and statistics to discover new information.

It is designed to find information that queries and reports don’t reveal effectively

KDD uses AI and statistics to find pattern in data and to infer rules.

Page 23: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

23

APPLICATION - DATA MINING

Some successes: People who buy scuba gear take

Australian vacations Fraud detection, consumer loan analysis Optimizing production lines IBM’s SCOUT (apocryphal?) Men who buy diapers buy

beer

Page 24: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

24

APPLICATION - DATA MINING

Associations Things done together (buy groceries)

Sequences Events over time (house.refrigerator)

Classifications Pattern recognition (rules)

Clusters Define new groups

Forecasting Predictions from time series

Page 25: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

25

CUSTOMER RELATIONSHIP MANAGEMENT (CRM)

Successor to data base marketing Implies marketing to customers on

a 1:1 basis Requires data granularity at the

level of the individual customer large amounts of data

Data warehouse is only a part of the CRM concept.

Page 26: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

26

History of Course I

First given as PhD Seminar on Management of Information Systems 1996

Overview lecture by instructor Students present topics each week

– typically 2 or 3 Great source of material!

Page 27: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

27

History of Course II Attend Data Warehouse Institute

conference in San Diego. Agree to write book with Watson. January 1997

Complete book in late 1997. Desk-top published. Used sabbatical at UCI to do the writing.

Move course to regular MS course in Spring 1997.

Page 28: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

28

Two Versions:

14 week for Claremont Graduate University

5 Week for UC Irvine Taught 1/year for last 5 years WHY DO STUDENTS TAKE

COURSE? High salaries (100K specialist, $300K

for consultant)

Page 29: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

29

Both Versions

Text: Decision Support in the Data Warehouse by P. Gray and H.J. Watson

Prentice Hall 1998 PLUS readings PLUS hands-on exercises PLUS term paper

Page 30: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

30

5 WEEK VERSION

WkPart 1 Part 21 Overview of DW OLAP, applications2 Demo Startracker Strategic use,

software Framework 3 Data sources, Planning and

cleansing,metadata operating the DW4 Data marts,ODS Industry, economics5 Mining, BI CRM, Developments

Page 31: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

31

Date 14-WEEK COURSE9-9 Introduction to DW

9-16 Architecture, Data Input

9-23 Metadata, Multiple Dimensions

9-30 Interface, Strategic Uses

10-7 Design Methods; Data Marts

10-14 Progress Report; ODS; Maintaining the DW

10-21 Midterm

10-28 Economics of DW; Personnel

11-7 Querying, Data Mining

11-14 Progress Report; CRM

11-21 Business Intelligence ; Future Developments

11-28 Student Reports

12-2 Student Reports

12-9 Final Exam

Page 32: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

32

Using Software—Star Tracker

Simple DW The Data Warehouse Toolkit :

Practical Techniques for Building Dimensional Data Warehouses by Ralph Kimball (Wiley 1996)

Commercial version “Synchrony” from If… no longer available

Page 33: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

33

Using Software – Commercial

Red Brick (now part of Informix which is now owned by IBM)

Gift from Red Brick Mostly used with student projects

Page 34: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

34

Data Sources Journal of Data Warehousing DM Review

On-line (dmreview.com) Hard copy

Lots and lots on Internet e.g., www.Dw-institute.com Pwp.starnetinc.com/larryg www.datawarehouse.org

Page 35: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

35

Assignments

Find 10 articles on Internet teaches search techniques,

nomenclature, rapid course immersion

Exercise with software The DW industry

Page 36: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

36

Term Projects for DW

Next chart shows 22 of the 30 suggested topics. Last topic allows student to choose topic subject to instructor approval.

Page 37: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

37

DW ArchitectureData sources, loading, cleaning, summarizing, granularityMetadata in a DW Data Modeling and the DWMultiple Dimensions in DWs via ROLAP and MOLAPIndexing in DWsDW Interfaces –conventional, window, browserDesign principles for DWsStrategic use of the DWJustification of the DW including C/B analysis DW SecurityData MartsOperational Data StoresMaintaining the DWEconomics of DWsQuerying and reporting in a DWOLAP and the DWDecision Support Systems and the DWDatabase marketing and the DWData Mining and the DWPersonnel considerations for building the DW and for maintaining the DWOrganizing the DW Project (Project management etc.)

Page 38: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

38

Elizabeth Pierce’s Course at Indiana University of Penn.

Paper published in Communications of AIS (CAIS) Sept. 1999 (Vol. 2 Article 16)

Developing and Delivering a Data Warehousing and Mining Course

“Introduces students to the strategies, technologies, and techniques”

Page 39: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

39

Elizabeth Pierce’s Course at Indiana University of Penn.

“Students learn what is involved in planning, designing, building, using, and managing a data warehouse. Students also learn about how a data warehouse must fit into an over-all corporate data architecture that may include legacy systems, operational data stores, enterprise data warehouses, and data marts. In addition, students are exposed to the different data mining techniques used by organizations to derive information from the data warehouse for strategic and long-term business decision making.” (abstract)

Page 40: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

40

U. Of Indiana Course

14 weeks –7 weeks on DW Challenges:

Evolving nature of the subject Hands on experience for students Textbook selection—mostly

tradebooks

Page 41: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

41

Useful Tables in Pierce

List of Internet sources Available software demos Some on-line data warehouses

with public access Schools that offered data mining

courses in 1999

Page 42: 1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001

42

Conclusions Students come because it is a job

source A good 2nd course for data base (can

be taken w/o database course) Allows covering the new topics of the

90’s and 00’s—OLAP,CRM,Mining, in more depth than Intro to IS course.

It’s a fun topic and you can even do research on it.