data warehousing: a proven solution to sustaining a vibrant business jerry hammons art brooks...

37
Data Warehousing: A Proven Solution to Sustaining a Vibrant Business Jerry Hammons Art Brooks EDUCAUSE 2006 Dallas, Texas Poster Session Copyright Jerry Hammons 2006. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

Upload: darrin-roche

Post on 11-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Data Warehousing: A Proven Solution to Sustaining a

Vibrant Business

Jerry HammonsArt Brooks

EDUCAUSE 2006Dallas, TexasPoster Session

Copyright Jerry Hammons 2006. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

The Evolving Data Warehouse• UMR Started data warehouse adventure in 1986.UMR Started data warehouse adventure in 1986.• Over 18 years the data warehouse and reporting Over 18 years the data warehouse and reporting

has transitioned:has transitioned:– from local mainframe SQL/DS to local from local mainframe SQL/DS to local

client/server with Informixclient/server with Informix– from Informix to Oraclefrom Informix to Oracle– from native SQL for report creation to InfoMakerfrom native SQL for report creation to InfoMaker– through four strategic methodological changesthrough four strategic methodological changes– through three version upgrades in the Advance through three version upgrades in the Advance

systemsystem– through implementation of PeopleSoftthrough implementation of PeopleSoft– through one PeopleSoft version upgrade.through one PeopleSoft version upgrade.

• Is user oriented.Is user oriented.

• Is an entity separate from the Is an entity separate from the transactional system.transactional system.

• Must be insulated from transactional Must be insulated from transactional system changes.system changes.

• Must include a transitional bridge.Must include a transitional bridge.

• Can begin BEFORE a system conversion.Can begin BEFORE a system conversion.

ReportingReporting::

Strategic StatementStrategic Statement• ““If data can be converted from legacy to If data can be converted from legacy to

PeopleSoft, then data can be translated PeopleSoft, then data can be translated from PeopleSoft to legacy.” (We from PeopleSoft to legacy.” (We NEVERNEVER stated ALL of the data could be translated.)stated ALL of the data could be translated.)

• Terms:Terms:– ConvertedConverted – data in a legacy format, modified – data in a legacy format, modified

and loaded in PeopleSoft tablesand loaded in PeopleSoft tables– Translated Translated – data in a PeopleSoft format, – data in a PeopleSoft format,

modified and loaded in legacy defined tables.modified and loaded in legacy defined tables.

The UMR ApproachThe UMR Approach• FFree standingree standing

• AAmbiguouslymbiguously

• RRelatedelated

• EEntitiesntities

In contrast to such conventional In contrast to such conventional approaches as star schemas, fact approaches as star schemas, fact tables and dimension tables.tables and dimension tables.

The Reporting FoundationThe Reporting Foundation

• ‘‘Event oriented, functional tables’:Event oriented, functional tables’:relational tables designed for a specific relational tables designed for a specific reporting need that draws data from reporting need that draws data from multiple data warehouse tables, multiple data warehouse tables, focusing on the needs of the user and focusing on the needs of the user and not the technical staff.not the technical staff. GoalGoal – reduce – reduce technical requirements to formatting technical requirements to formatting the output page. the output page.

SIMPLICITY!!SIMPLICITY!!

‘‘Functional’ Table Concept was Functional’ Table Concept was developed to:developed to:

– 1. ‘Empower the users’1. ‘Empower the users’– 2. Simplify the data structure2. Simplify the data structure– 3. Reduce report development time3. Reduce report development time– 4. Reduce processing time for the 4. Reduce processing time for the

server (quicker response)server (quicker response)– 5. Improve programmer efficiency5. Improve programmer efficiency– 6. Provide another tool for 6. Provide another tool for

reportingreporting

YOU can do this!YOU can do this!

ConceptConcept– SimplicitySimplicity– Zero table joinsZero table joins– Zero ‘Where’ statementsZero ‘Where’ statements– Event orientedEvent oriented– Report design centricReport design centric– User takes less than five minutes to User takes less than five minutes to

develop querydevelop query– User concentrates on report appearanceUser concentrates on report appearance– Estimated eleven fold reduction in code Estimated eleven fold reduction in code

to create a reportto create a report

Functional TablesFunctional Tables

• Subsets of existing of existing application(s)Subsets of existing of existing application(s)

• Single table can contain data from disparate Single table can contain data from disparate applicationsapplications

• Do not HAVE to be homogeneousDo not HAVE to be homogeneous

• Created from pre-selection routinesCreated from pre-selection routines

• Create logic continuityCreate logic continuity

• May be a view or a physical tableMay be a view or a physical table

Functional Table -- Proven ApproachFunctional Table -- Proven Approach• Concept developed in 1998Concept developed in 1998

• First put into use in 1999 with conversion of First put into use in 1999 with conversion of University Advancement system from University Advancement system from mainframe to client/servermainframe to client/server

• Over 500 new reports created for that Over 500 new reports created for that department using this techniquedepartment using this technique

• Utilized in 2000 to retain orientation system Utilized in 2000 to retain orientation system when Admissions implemented PeopleSoftwhen Admissions implemented PeopleSoft

• An integral part of reporting solution at UMR An integral part of reporting solution at UMR prior to 2004 Registrar’s PeopleSoft prior to 2004 Registrar’s PeopleSoft implementationimplementation

• ProcessProcess– User identifies report needed (presents User identifies report needed (presents

sample or attempts to sketch idea)sample or attempts to sketch idea)– User and professional staff member User and professional staff member

meet to discuss new reportmeet to discuss new report– Programmer identifies transactional Programmer identifies transactional

data needed to create reportdata needed to create report– Programmer creates and loads tableProgrammer creates and loads table– User creates report.User creates report.

Legacy PeopleSoft

UMRDW

UMR Download ProcessUMR Download Process

Download scripts

Functional table scripts

Campussystems

Approximately 300 tables

Approaching 5 gigabytes

Report Prepared direct from PeopleSoftReport Prepared direct from PeopleSoft

PS table1

PS table2

PS table7

PS table3

PS table4

PS table5

PS table6

12 Processtables

Query

Report:Count number of

Freshmen forSpecified term

Sub Query1

Sub Query2

Sub Query3

Sub Query4

Sub Query5

12 Processtables

12 Processtables

12 Processtables

12 Processtables

12 Processtables

Report prepared from UMR Report prepared from UMR functional tablefunctional table

Report:Count number of

Freshmen forSpecified term

UMR functiona

l table

Query

The ChallengeThe Challenge• With modules being implemented over a With modules being implemented over a

six year period of time, applications and six year period of time, applications and reports had to continue to function when reports had to continue to function when some data was in the mainframe in a some data was in the mainframe in a legacy format and other data was in legacy format and other data was in PeopleSoft in a client server/relational PeopleSoft in a client server/relational environment. environment.

• This required applications and reports to This required applications and reports to draw data from tables that had diverse draw data from tables that had diverse sources.sources.

• Inventory indicated over 3,000 reports and Inventory indicated over 3,000 reports and 25 applications built over a 10+ year 25 applications built over a 10+ year period of time would cease to function. period of time would cease to function.

Report Creation Comparisons

select a.admajor, count(*)from admfresh a where a.adterm=’FS2001’and a.fraction=’A’and a.fractiondate<=sysdategroup by a.admajor”;

select f.acad_plan, count(*)from ps_pers_data_effdt a, ps_pers_dtef_sa_vw a1, ps_stdnt_career b,ps_adm_appl_data c, ps_adm_app_car_seq d,ps_adm_appl_prog e, ps_adm_appl_plan f,where a.emplid=a1.emplid andfrom ps_pers_data_effdt a_eda.effdt=a1.effdt and(a.effdt=(select max(a_ed.effdt)where a.emplid=a_ed.emplid anda_ed.effdt<=sysdate) anda.emplid=b.emplid and b.acad_career=c.acad_career and c.emplid=d.emplid and c.acad_career=d.acad_career and c.stdnt_car_nbr=d.stdnt_car_nbr and c.adm_appl_nbr=d.adm_appl_nbr and d.emplid=e.emplid and d.acad_career = e.acad_career and d.stdnt_car_nbr = e.stdnt_car_nbr and d.adm_appl_nbr = e.adm_appl_nbr and d.appl_prog_nbr = e.appl_prog_nbr and e.effdt=(select max(e_ed.effdt)from ps_adm_appl_prog e_edwhere e.emplid=e_ed.emplid ande.acad_career=e_ed.acad_career ande.stdnt_car_nbr=e_ed.stdnt_car_nbr ande.adm_appl_nbr=e_ed.adm_appl_nbr ande.appl_prog_nbr = e_ed.appl_prog_nbr and e_ed.effdt <=sysdate) andfrom ps_adm_appl_prog e_eswhere e.emplid=e_es.emplid and e.acad_career=e_es.acad_career ande.stdnt_car_nbr=e_es.stdnt_car_nbr and e.adm_appl_nbr=e_es.adm_appl_nbr ande.appl_prog_nbr=e_es.appl_prog_nbr and e.effdt=e_es.effdt ande.prog_status=e_es.prog_status) and e.admit_term=’FS2001’ ande.prog_status in (‘AC’, ‘AD’) and e.action_dt <= sysdate andc.admit_type=’FTC’ and d.acad_career=’UGRD’ ande.emplid = f.emplid ande.acad_career=f.acad_career and e.stdnt_car_nbr=f.stdnt_car_nbr and e.adm_appl_nbr=f.adm_appl_nbr and e.appl_prog_nbr = f.appl_prog_nbr and f.effdt=(select max(f_ed.effdt)from ps_adm_appl_plan f_ed where f.emplid=f_ed.emplid and f.acad_career=f_ed.acad_career andf.stdnt_car_nbr=f_ed.stdnt_car_nbr and f.adm_appl_nbr=f_ed.adm_appl_nbr andf.appl_prog_nbr=f_ed.appl_prog_nbr andf_ed.effdt<=e.effdt) andf.effseq=(select max(f_es.effseq)from ps_adm_appl_plan f_es where f.emplid=f_es.emplid andf.acad_career=f_es.acad_career and f.stdnt_car_nbr=f_es.stdnt_car_nbr and f.adm_appl_nbr=f_es.adm_appl_nbr and f.appl_prog_nbr=f_es.appl_prog_nbr and f.effdt=f_es.effdt))group by f.acad_plan;

1query

1 table

0 joins

3 'where' statements

6 lines

1 query

7 distinct tables

12 process tables

6 joins

5 sub queries

50 'where' statements

70 lines

UMR Functional Table Format PeopleSoft Format

Each of these queries counts

the number of freshmen

admitted for a specified

term. The results are the same.

Comparison of Relational Comparison of Relational ApproachesApproaches

Fact HR table

Appt data

Ed. data

Benefit data

Address data

Dept data

Bio data

Fact and dimension structure with Star Schema

Functional HR table with same data

UMR Approach

PeopleSoftReporting Instance

Original DataWarehouse tables

Phase 1Retain current Data Warehouse

model

Phase 2

Create Transitional Bridge

Insert PeopleSoft columns incurrent table shemas

Phase 3

Final Data Warehouse model

Cease populating original DataWarehouse columns with

translated data

Modified DataWarehouse tables

New DataWarehouse model

Maintaining the BusinessContinuity in a Major System

Implementation

Month 1 -2 Month 2 - Year 2Year 2 and

Beyond

Focus — retaining reports/applications Focus — re-engineering

downloads, modifyingreports/applications &creating new reports/

applications usingmixed forms of data

Focus — creating newreports/applications

using PS vales solely

Translate data valueswhere feasible

Hybrid tables, i.e. mix ofdata values

Remove original columnsif no longer used. New

tables created as needed

• Hybrid tablesHybrid tables– After further experience and discussion it was After further experience and discussion it was

realized the functional tables could be hybridized realized the functional tables could be hybridized to satisfy specific reporting needs and to provide to satisfy specific reporting needs and to provide a transitional bridge to the future.a transitional bridge to the future.

– Definition – a hybrid functional table is one that Definition – a hybrid functional table is one that has data derived from disparate systems. has data derived from disparate systems. (normally legacy and PeopleSoft)(normally legacy and PeopleSoft)

– Hybrid tables can become transitional tables.Hybrid tables can become transitional tables.– With time, hybrid tables can become normal With time, hybrid tables can become normal

functional tables. (When the legacy data is no functional tables. (When the legacy data is no longer required, the columns cease to be filled longer required, the columns cease to be filled or are removed.)or are removed.)

Evolution of a Hybrid Table

Stuno Gender Vet_code Citizenship Fin_Aid_Int Mar_Stat Eth_Org DOB

Studnt # M/F 1 Jpn Y/N S 6 MM/DD/YYYY

BIO Table

-- PeopleSoft value

Original -- UMR Data Warehouse Table Name/Structure and Column Names

StunoOld Stuno Gender

Vet_code

PS_Mil_St

Citizenship

Fin_Aid_Int Mar_Stat

PS_Mar_St Eth_Org DOB

Sp Interests

Emplid Studnt # M 1 Vet Jpn   S Single 6   Band

   -- translated values PS to UMR DW   -- new PeopleSoft data item

First Degree of Hybridization

Second Degree of HybridizationBIO Table

BIO Table

Stuno

Old Stuno

Gender

PS_Gender

Vet_code

PS_Mil_St

Citizenship

PS_Citzn

Mar_Stat

PS_Mar_St

Eth_Org

PS_Eth_Org DOB

Sp Interests

Emplid

Studnt # M Male 1 Vet Jpn Japan S Single 6

Caucasian   Band

Not needed or found at this level

Third Degree of HybridizationBIO Table

Stuno Gender PS_Mil_St PS_Citzn PS_Mar_StPS_Eth_Org DOB Sp Interests

Ambassador

Collection_ID

Emplid Male Vet Japan Single Caucasian YYYY/MM/DD Band YES 12345

* the BIO table draws its data from 11 PeopleSoft tables

Approach ComparisonsApproach ComparisonsFact and dimensionFact and dimension

1.1. Less complex downloads.Less complex downloads.

2.2. Better potential for Better potential for documentation.documentation.

3.3. Better potential to Better potential to electronically trace electronically trace schema.schema.

4.4. Tables and column Tables and column names more likely to be names more likely to be familiarfamiliar with power with power users.users.

5.5. More potential for More potential for system-wide assistance.system-wide assistance.

6.6. Better understood by Better understood by trained technicians.trained technicians.

Functional tablesFunctional tables

1.1. More simple More simple presentation.presentation.

2.2. Faster ramp up.Faster ramp up.

3.3. Less impact with Less impact with transactional changes.transactional changes.

4.4. Faster execution.Faster execution.

5.5. Extended user Extended user potential.potential.

6.6. Continues longitudinal Continues longitudinal studies.studies.

7.7. Tables and column Tables and column names more likely to names more likely to be understandable by be understandable by casual users.casual users.

UMR Startup Tactic ApproachUMR Startup Tactic Approach

Users did not see the difference.Users did not see the difference.

UMR DWUMR DW

ExistingReports &

Applications

Legacy Tables

Bridge tables

Hybrid tables

PeopleSoft System

NewReports

Multi-year reportsMulti-year reports

ExistingReportsLegacy Tables

Bridge tables

Hybrid tables

NewReports Longitudina

l studies/ trend

analysis requests

PeopleSoft tables

UMR DW

ResultsResults• Translated PeopleSoft admissions data for 3 years Translated PeopleSoft admissions data for 3 years

without interrupting production.without interrupting production.

• Integrated PeopleSoft HR with student legacy for 2 Integrated PeopleSoft HR with student legacy for 2 years without interrupting production.years without interrupting production.

• Integrated legacy and PeopleSoft grant data for a fiscal Integrated legacy and PeopleSoft grant data for a fiscal year report.year report.

• Amended HR data without affecting applications.Amended HR data without affecting applications.

• Had 400 reports and all applications needing student Had 400 reports and all applications needing student data in production by the end of the 4data in production by the end of the 4thth week of classes. week of classes.

• Integrated admissions PeopleSoft data with legacy Integrated admissions PeopleSoft data with legacy student data until student system went live with student data until student system went live with PeopleSoftPeopleSoft

Greek Life ReportsSet of 6 reports

UMR DW2 tables

1 PeopleSoft table

Greek Life Report Prepared from PeopleSoftPull ‘major and level

PeopleSoft — 12 tables,effective dated

500 lines of code,multiple steps

Pull GPA data

PeopleSoft — 5tables effective

dated

Pull Student Organization

Pull gender

CURRTERM tableWS2004

pull‘major’, level,

jurisdiction

PeopleSoft — 5 tableseffective dated

BIO tablepull

gender, studentorganization

TERM tableWS2004

Greek Life ReportsSet of 6 reports

Greek Life Reports Created from PeopleSoft and UMRData Warehouse

Pull GPA dataUMR data warehouseexisting

UMR Data warehouseexisting

UMR data warehouseexisting

AppropriateData Warehouse

TablesFAS

Legacy System

PeopleSoft System

Conversion to PeopleSoft

Translation scripts

Translation scripts

Table names could be legacy UMDW

names

Table names could be new PS names

Conceptual Approach to Providing Data to FAS

Campus conversions to PeopleSoft and PeopleSoft upgrades become transparent to FAS

Names of tables or columns are irrelevant in developing new

applications/reports. If existing applications/reports are involved, then legacy names could be

used and the tables become hybrid tables and transitional bridges.

What’s Next at UMRWhat’s Next at UMR

The Next Generation UMR Data Warehouse

PeopleSoft prod

PeopleSoft rpt

UMR server

Physical tables

Logical tables

Download scripts

Formatting scripts

Columbia

Rolla UMR DW

Reports

Applications / Feeds

Extract

Load

Transform

Current UMR Data Warehouse Process

6 -7 hours process-- points of failure

infrequent

inconsistent

completion

infrequent

Nighttime primarily

infrequent

Data can be out of dateNear daily load failures

Numerous points of failureNighttime connection not as stable

PS process unpredictable

Labor intensive

Lengthy process time

Vendor prod server

Vendor rpt server

Logical table scripts

Logical tables

Central

Local

Enterprise DW

Local data

Spec appl

Other data

Local Virtual Data Warehouse Nightly Load Concept

Hour or less process

Allows transparent and phased transition

Existing local scripts

-- points of failure

inconsistent

completion

infrequent

Very limited physical data space

Data same state as vendor rpt

Fewer points of failureMore flexible

ODS

Shorter processing time

Nightly process

Load process strictly at Central site

potential sources

primary source

firewall

Vendor rpt server

Logical tables

Central

LocalReports

Applications/Feeds

Enterprise DW

Local data

Spec appl

Other data

Virtual Local Data Warehouse Reporting Concept

Allows transparent and phased transition

-- points of failure

infrequent

Daytime primarily

Data same state as vendor rpt

More flexible

ODS

potential data sources

primary data source

Response time could be slightly longer

Quicker response to connectivity issues

Solution ComparisonsSolution Comparisons

Labor intensive

Highly vulnerable to process disruptions

Longer refresh window

Failures not found until start of bus. day

Protected against telecommunication issues

Faster response time

Slower response time

Vulnerable to telecommunication issues

Fewer points of failure

Greatly shortened refresh window

Data at same point as vendor source

More flexible

Requires less space

Traditional Solution “Virtual” Solution

The Transitional BridgeThe Transitional BridgeLegacy mainframe system

PeopleSoft – Std Adm/ver 7.6

Client/server

UMR Data Warehouse

~ 3,000 reports

23 applications

Converted from IMS legacy system Jan, 2004

ALL required reports and applications operating by second day of classes

400 reports and ALL applications functioning by end of fourth week of classes

And legacy removal

The Transitional BridgeThe Transitional Bridge

PeopleSoft – Std Adm/ver 7.6

Client/serverUMR Data Warehouse

~ 3,000 reports

23 applications

Upgraded to ver 8.0 Jul, 2004

Upgraded to ver 8.9 Jul, 2005

No impact observed with ANY report or application for either upgrade

And version upgrades

It works.It works.

It CAN work for you.

Jerry HammonsJerry HammonsSupervisor, Data Warehousing/ReportingSupervisor, Data Warehousing/ReportingInformation TechnologyInformation TechnologyUniversity of Missouri – RollaUniversity of Missouri – RollaE-mail: E-mail: [email protected]@umr.edu

Art BrooksArt BrooksAdjunct Professor/IT Appl Dir (retired) Adjunct Professor/IT Appl Dir (retired) University of Missouri – RollaUniversity of Missouri – RollaE-mail: E-mail: [email protected]@umr.edu

ContactContact

SORRY. We’re out of SORRY. We’re out of handouts.handouts.

If you would like a copy of our handout, please leave your business card, with email address and we will send you a copy.