data warehousing: a proven solution to sustaining a vibrant business jerry hammons art brooks...
TRANSCRIPT
Data Warehousing: A Proven Solution to Sustaining a
Vibrant Business
Jerry HammonsArt Brooks
EDUCAUSE 2006Dallas, TexasPoster Session
Copyright Jerry Hammons 2006. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
The Evolving Data Warehouse• UMR Started data warehouse adventure in 1986.UMR Started data warehouse adventure in 1986.• Over 18 years the data warehouse and reporting Over 18 years the data warehouse and reporting
has transitioned:has transitioned:– from local mainframe SQL/DS to local from local mainframe SQL/DS to local
client/server with Informixclient/server with Informix– from Informix to Oraclefrom Informix to Oracle– from native SQL for report creation to InfoMakerfrom native SQL for report creation to InfoMaker– through four strategic methodological changesthrough four strategic methodological changes– through three version upgrades in the Advance through three version upgrades in the Advance
systemsystem– through implementation of PeopleSoftthrough implementation of PeopleSoft– through one PeopleSoft version upgrade.through one PeopleSoft version upgrade.
• Is user oriented.Is user oriented.
• Is an entity separate from the Is an entity separate from the transactional system.transactional system.
• Must be insulated from transactional Must be insulated from transactional system changes.system changes.
• Must include a transitional bridge.Must include a transitional bridge.
• Can begin BEFORE a system conversion.Can begin BEFORE a system conversion.
ReportingReporting::
Strategic StatementStrategic Statement• ““If data can be converted from legacy to If data can be converted from legacy to
PeopleSoft, then data can be translated PeopleSoft, then data can be translated from PeopleSoft to legacy.” (We from PeopleSoft to legacy.” (We NEVERNEVER stated ALL of the data could be translated.)stated ALL of the data could be translated.)
• Terms:Terms:– ConvertedConverted – data in a legacy format, modified – data in a legacy format, modified
and loaded in PeopleSoft tablesand loaded in PeopleSoft tables– Translated Translated – data in a PeopleSoft format, – data in a PeopleSoft format,
modified and loaded in legacy defined tables.modified and loaded in legacy defined tables.
The UMR ApproachThe UMR Approach• FFree standingree standing
• AAmbiguouslymbiguously
• RRelatedelated
• EEntitiesntities
In contrast to such conventional In contrast to such conventional approaches as star schemas, fact approaches as star schemas, fact tables and dimension tables.tables and dimension tables.
The Reporting FoundationThe Reporting Foundation
• ‘‘Event oriented, functional tables’:Event oriented, functional tables’:relational tables designed for a specific relational tables designed for a specific reporting need that draws data from reporting need that draws data from multiple data warehouse tables, multiple data warehouse tables, focusing on the needs of the user and focusing on the needs of the user and not the technical staff.not the technical staff. GoalGoal – reduce – reduce technical requirements to formatting technical requirements to formatting the output page. the output page.
SIMPLICITY!!SIMPLICITY!!
‘‘Functional’ Table Concept was Functional’ Table Concept was developed to:developed to:
– 1. ‘Empower the users’1. ‘Empower the users’– 2. Simplify the data structure2. Simplify the data structure– 3. Reduce report development time3. Reduce report development time– 4. Reduce processing time for the 4. Reduce processing time for the
server (quicker response)server (quicker response)– 5. Improve programmer efficiency5. Improve programmer efficiency– 6. Provide another tool for 6. Provide another tool for
reportingreporting
YOU can do this!YOU can do this!
ConceptConcept– SimplicitySimplicity– Zero table joinsZero table joins– Zero ‘Where’ statementsZero ‘Where’ statements– Event orientedEvent oriented– Report design centricReport design centric– User takes less than five minutes to User takes less than five minutes to
develop querydevelop query– User concentrates on report appearanceUser concentrates on report appearance– Estimated eleven fold reduction in code Estimated eleven fold reduction in code
to create a reportto create a report
Functional TablesFunctional Tables
• Subsets of existing of existing application(s)Subsets of existing of existing application(s)
• Single table can contain data from disparate Single table can contain data from disparate applicationsapplications
• Do not HAVE to be homogeneousDo not HAVE to be homogeneous
• Created from pre-selection routinesCreated from pre-selection routines
• Create logic continuityCreate logic continuity
• May be a view or a physical tableMay be a view or a physical table
Functional Table -- Proven ApproachFunctional Table -- Proven Approach• Concept developed in 1998Concept developed in 1998
• First put into use in 1999 with conversion of First put into use in 1999 with conversion of University Advancement system from University Advancement system from mainframe to client/servermainframe to client/server
• Over 500 new reports created for that Over 500 new reports created for that department using this techniquedepartment using this technique
• Utilized in 2000 to retain orientation system Utilized in 2000 to retain orientation system when Admissions implemented PeopleSoftwhen Admissions implemented PeopleSoft
• An integral part of reporting solution at UMR An integral part of reporting solution at UMR prior to 2004 Registrar’s PeopleSoft prior to 2004 Registrar’s PeopleSoft implementationimplementation
• ProcessProcess– User identifies report needed (presents User identifies report needed (presents
sample or attempts to sketch idea)sample or attempts to sketch idea)– User and professional staff member User and professional staff member
meet to discuss new reportmeet to discuss new report– Programmer identifies transactional Programmer identifies transactional
data needed to create reportdata needed to create report– Programmer creates and loads tableProgrammer creates and loads table– User creates report.User creates report.
Legacy PeopleSoft
UMRDW
UMR Download ProcessUMR Download Process
Download scripts
Functional table scripts
Campussystems
Approximately 300 tables
Approaching 5 gigabytes
Report Prepared direct from PeopleSoftReport Prepared direct from PeopleSoft
PS table1
PS table2
PS table7
PS table3
PS table4
PS table5
PS table6
12 Processtables
Query
Report:Count number of
Freshmen forSpecified term
Sub Query1
Sub Query2
Sub Query3
Sub Query4
Sub Query5
12 Processtables
12 Processtables
12 Processtables
12 Processtables
12 Processtables
Report prepared from UMR Report prepared from UMR functional tablefunctional table
Report:Count number of
Freshmen forSpecified term
UMR functiona
l table
Query
The ChallengeThe Challenge• With modules being implemented over a With modules being implemented over a
six year period of time, applications and six year period of time, applications and reports had to continue to function when reports had to continue to function when some data was in the mainframe in a some data was in the mainframe in a legacy format and other data was in legacy format and other data was in PeopleSoft in a client server/relational PeopleSoft in a client server/relational environment. environment.
• This required applications and reports to This required applications and reports to draw data from tables that had diverse draw data from tables that had diverse sources.sources.
• Inventory indicated over 3,000 reports and Inventory indicated over 3,000 reports and 25 applications built over a 10+ year 25 applications built over a 10+ year period of time would cease to function. period of time would cease to function.
Report Creation Comparisons
select a.admajor, count(*)from admfresh a where a.adterm=’FS2001’and a.fraction=’A’and a.fractiondate<=sysdategroup by a.admajor”;
select f.acad_plan, count(*)from ps_pers_data_effdt a, ps_pers_dtef_sa_vw a1, ps_stdnt_career b,ps_adm_appl_data c, ps_adm_app_car_seq d,ps_adm_appl_prog e, ps_adm_appl_plan f,where a.emplid=a1.emplid andfrom ps_pers_data_effdt a_eda.effdt=a1.effdt and(a.effdt=(select max(a_ed.effdt)where a.emplid=a_ed.emplid anda_ed.effdt<=sysdate) anda.emplid=b.emplid and b.acad_career=c.acad_career and c.emplid=d.emplid and c.acad_career=d.acad_career and c.stdnt_car_nbr=d.stdnt_car_nbr and c.adm_appl_nbr=d.adm_appl_nbr and d.emplid=e.emplid and d.acad_career = e.acad_career and d.stdnt_car_nbr = e.stdnt_car_nbr and d.adm_appl_nbr = e.adm_appl_nbr and d.appl_prog_nbr = e.appl_prog_nbr and e.effdt=(select max(e_ed.effdt)from ps_adm_appl_prog e_edwhere e.emplid=e_ed.emplid ande.acad_career=e_ed.acad_career ande.stdnt_car_nbr=e_ed.stdnt_car_nbr ande.adm_appl_nbr=e_ed.adm_appl_nbr ande.appl_prog_nbr = e_ed.appl_prog_nbr and e_ed.effdt <=sysdate) andfrom ps_adm_appl_prog e_eswhere e.emplid=e_es.emplid and e.acad_career=e_es.acad_career ande.stdnt_car_nbr=e_es.stdnt_car_nbr and e.adm_appl_nbr=e_es.adm_appl_nbr ande.appl_prog_nbr=e_es.appl_prog_nbr and e.effdt=e_es.effdt ande.prog_status=e_es.prog_status) and e.admit_term=’FS2001’ ande.prog_status in (‘AC’, ‘AD’) and e.action_dt <= sysdate andc.admit_type=’FTC’ and d.acad_career=’UGRD’ ande.emplid = f.emplid ande.acad_career=f.acad_career and e.stdnt_car_nbr=f.stdnt_car_nbr and e.adm_appl_nbr=f.adm_appl_nbr and e.appl_prog_nbr = f.appl_prog_nbr and f.effdt=(select max(f_ed.effdt)from ps_adm_appl_plan f_ed where f.emplid=f_ed.emplid and f.acad_career=f_ed.acad_career andf.stdnt_car_nbr=f_ed.stdnt_car_nbr and f.adm_appl_nbr=f_ed.adm_appl_nbr andf.appl_prog_nbr=f_ed.appl_prog_nbr andf_ed.effdt<=e.effdt) andf.effseq=(select max(f_es.effseq)from ps_adm_appl_plan f_es where f.emplid=f_es.emplid andf.acad_career=f_es.acad_career and f.stdnt_car_nbr=f_es.stdnt_car_nbr and f.adm_appl_nbr=f_es.adm_appl_nbr and f.appl_prog_nbr=f_es.appl_prog_nbr and f.effdt=f_es.effdt))group by f.acad_plan;
1query
1 table
0 joins
3 'where' statements
6 lines
1 query
7 distinct tables
12 process tables
6 joins
5 sub queries
50 'where' statements
70 lines
UMR Functional Table Format PeopleSoft Format
Each of these queries counts
the number of freshmen
admitted for a specified
term. The results are the same.
Comparison of Relational Comparison of Relational ApproachesApproaches
Fact HR table
Appt data
Ed. data
Benefit data
Address data
Dept data
Bio data
Fact and dimension structure with Star Schema
Functional HR table with same data
UMR Approach
PeopleSoftReporting Instance
Original DataWarehouse tables
Phase 1Retain current Data Warehouse
model
Phase 2
Create Transitional Bridge
Insert PeopleSoft columns incurrent table shemas
Phase 3
Final Data Warehouse model
Cease populating original DataWarehouse columns with
translated data
Modified DataWarehouse tables
New DataWarehouse model
Maintaining the BusinessContinuity in a Major System
Implementation
Month 1 -2 Month 2 - Year 2Year 2 and
Beyond
Focus — retaining reports/applications Focus — re-engineering
downloads, modifyingreports/applications &creating new reports/
applications usingmixed forms of data
Focus — creating newreports/applications
using PS vales solely
Translate data valueswhere feasible
Hybrid tables, i.e. mix ofdata values
Remove original columnsif no longer used. New
tables created as needed
• Hybrid tablesHybrid tables– After further experience and discussion it was After further experience and discussion it was
realized the functional tables could be hybridized realized the functional tables could be hybridized to satisfy specific reporting needs and to provide to satisfy specific reporting needs and to provide a transitional bridge to the future.a transitional bridge to the future.
– Definition – a hybrid functional table is one that Definition – a hybrid functional table is one that has data derived from disparate systems. has data derived from disparate systems. (normally legacy and PeopleSoft)(normally legacy and PeopleSoft)
– Hybrid tables can become transitional tables.Hybrid tables can become transitional tables.– With time, hybrid tables can become normal With time, hybrid tables can become normal
functional tables. (When the legacy data is no functional tables. (When the legacy data is no longer required, the columns cease to be filled longer required, the columns cease to be filled or are removed.)or are removed.)
Evolution of a Hybrid Table
Stuno Gender Vet_code Citizenship Fin_Aid_Int Mar_Stat Eth_Org DOB
Studnt # M/F 1 Jpn Y/N S 6 MM/DD/YYYY
BIO Table
-- PeopleSoft value
Original -- UMR Data Warehouse Table Name/Structure and Column Names
StunoOld Stuno Gender
Vet_code
PS_Mil_St
Citizenship
Fin_Aid_Int Mar_Stat
PS_Mar_St Eth_Org DOB
Sp Interests
Emplid Studnt # M 1 Vet Jpn S Single 6 Band
-- translated values PS to UMR DW -- new PeopleSoft data item
First Degree of Hybridization
Second Degree of HybridizationBIO Table
BIO Table
Stuno
Old Stuno
Gender
PS_Gender
Vet_code
PS_Mil_St
Citizenship
PS_Citzn
Mar_Stat
PS_Mar_St
Eth_Org
PS_Eth_Org DOB
Sp Interests
Emplid
Studnt # M Male 1 Vet Jpn Japan S Single 6
Caucasian Band
Not needed or found at this level
Third Degree of HybridizationBIO Table
Stuno Gender PS_Mil_St PS_Citzn PS_Mar_StPS_Eth_Org DOB Sp Interests
Ambassador
Collection_ID
Emplid Male Vet Japan Single Caucasian YYYY/MM/DD Band YES 12345
* the BIO table draws its data from 11 PeopleSoft tables
Approach ComparisonsApproach ComparisonsFact and dimensionFact and dimension
1.1. Less complex downloads.Less complex downloads.
2.2. Better potential for Better potential for documentation.documentation.
3.3. Better potential to Better potential to electronically trace electronically trace schema.schema.
4.4. Tables and column Tables and column names more likely to be names more likely to be familiarfamiliar with power with power users.users.
5.5. More potential for More potential for system-wide assistance.system-wide assistance.
6.6. Better understood by Better understood by trained technicians.trained technicians.
Functional tablesFunctional tables
1.1. More simple More simple presentation.presentation.
2.2. Faster ramp up.Faster ramp up.
3.3. Less impact with Less impact with transactional changes.transactional changes.
4.4. Faster execution.Faster execution.
5.5. Extended user Extended user potential.potential.
6.6. Continues longitudinal Continues longitudinal studies.studies.
7.7. Tables and column Tables and column names more likely to names more likely to be understandable by be understandable by casual users.casual users.
UMR Startup Tactic ApproachUMR Startup Tactic Approach
Users did not see the difference.Users did not see the difference.
UMR DWUMR DW
ExistingReports &
Applications
Legacy Tables
Bridge tables
Hybrid tables
PeopleSoft System
NewReports
Multi-year reportsMulti-year reports
ExistingReportsLegacy Tables
Bridge tables
Hybrid tables
NewReports Longitudina
l studies/ trend
analysis requests
PeopleSoft tables
UMR DW
ResultsResults• Translated PeopleSoft admissions data for 3 years Translated PeopleSoft admissions data for 3 years
without interrupting production.without interrupting production.
• Integrated PeopleSoft HR with student legacy for 2 Integrated PeopleSoft HR with student legacy for 2 years without interrupting production.years without interrupting production.
• Integrated legacy and PeopleSoft grant data for a fiscal Integrated legacy and PeopleSoft grant data for a fiscal year report.year report.
• Amended HR data without affecting applications.Amended HR data without affecting applications.
• Had 400 reports and all applications needing student Had 400 reports and all applications needing student data in production by the end of the 4data in production by the end of the 4thth week of classes. week of classes.
• Integrated admissions PeopleSoft data with legacy Integrated admissions PeopleSoft data with legacy student data until student system went live with student data until student system went live with PeopleSoftPeopleSoft
Greek Life ReportsSet of 6 reports
UMR DW2 tables
1 PeopleSoft table
Greek Life Report Prepared from PeopleSoftPull ‘major and level
PeopleSoft — 12 tables,effective dated
500 lines of code,multiple steps
Pull GPA data
PeopleSoft — 5tables effective
dated
Pull Student Organization
Pull gender
CURRTERM tableWS2004
pull‘major’, level,
jurisdiction
PeopleSoft — 5 tableseffective dated
BIO tablepull
gender, studentorganization
TERM tableWS2004
Greek Life ReportsSet of 6 reports
Greek Life Reports Created from PeopleSoft and UMRData Warehouse
Pull GPA dataUMR data warehouseexisting
UMR Data warehouseexisting
UMR data warehouseexisting
AppropriateData Warehouse
TablesFAS
Legacy System
PeopleSoft System
Conversion to PeopleSoft
Translation scripts
Translation scripts
Table names could be legacy UMDW
names
Table names could be new PS names
Conceptual Approach to Providing Data to FAS
Campus conversions to PeopleSoft and PeopleSoft upgrades become transparent to FAS
Names of tables or columns are irrelevant in developing new
applications/reports. If existing applications/reports are involved, then legacy names could be
used and the tables become hybrid tables and transitional bridges.
PeopleSoft prod
PeopleSoft rpt
UMR server
Physical tables
Logical tables
Download scripts
Formatting scripts
Columbia
Rolla UMR DW
Reports
Applications / Feeds
Extract
Load
Transform
Current UMR Data Warehouse Process
6 -7 hours process-- points of failure
infrequent
inconsistent
completion
infrequent
Nighttime primarily
infrequent
Data can be out of dateNear daily load failures
Numerous points of failureNighttime connection not as stable
PS process unpredictable
Labor intensive
Lengthy process time
Vendor prod server
Vendor rpt server
Logical table scripts
Logical tables
Central
Local
Enterprise DW
Local data
Spec appl
Other data
Local Virtual Data Warehouse Nightly Load Concept
Hour or less process
Allows transparent and phased transition
Existing local scripts
-- points of failure
inconsistent
completion
infrequent
Very limited physical data space
Data same state as vendor rpt
Fewer points of failureMore flexible
ODS
Shorter processing time
Nightly process
Load process strictly at Central site
potential sources
primary source
firewall
Vendor rpt server
Logical tables
Central
LocalReports
Applications/Feeds
Enterprise DW
Local data
Spec appl
Other data
Virtual Local Data Warehouse Reporting Concept
Allows transparent and phased transition
-- points of failure
infrequent
Daytime primarily
Data same state as vendor rpt
More flexible
ODS
potential data sources
primary data source
Response time could be slightly longer
Quicker response to connectivity issues
Solution ComparisonsSolution Comparisons
Labor intensive
Highly vulnerable to process disruptions
Longer refresh window
Failures not found until start of bus. day
Protected against telecommunication issues
Faster response time
Slower response time
Vulnerable to telecommunication issues
Fewer points of failure
Greatly shortened refresh window
Data at same point as vendor source
More flexible
Requires less space
Traditional Solution “Virtual” Solution
The Transitional BridgeThe Transitional BridgeLegacy mainframe system
PeopleSoft – Std Adm/ver 7.6
Client/server
UMR Data Warehouse
~ 3,000 reports
23 applications
Converted from IMS legacy system Jan, 2004
ALL required reports and applications operating by second day of classes
400 reports and ALL applications functioning by end of fourth week of classes
And legacy removal
The Transitional BridgeThe Transitional Bridge
PeopleSoft – Std Adm/ver 7.6
Client/serverUMR Data Warehouse
~ 3,000 reports
23 applications
Upgraded to ver 8.0 Jul, 2004
Upgraded to ver 8.9 Jul, 2005
No impact observed with ANY report or application for either upgrade
And version upgrades
Jerry HammonsJerry HammonsSupervisor, Data Warehousing/ReportingSupervisor, Data Warehousing/ReportingInformation TechnologyInformation TechnologyUniversity of Missouri – RollaUniversity of Missouri – RollaE-mail: E-mail: [email protected]@umr.edu
Art BrooksArt BrooksAdjunct Professor/IT Appl Dir (retired) Adjunct Professor/IT Appl Dir (retired) University of Missouri – RollaUniversity of Missouri – RollaE-mail: E-mail: [email protected]@umr.edu
ContactContact