data warehousing, methodologies and challenges by prof. chiedu f. mafiana director, quality...
TRANSCRIPT
DATA WAREHOUSING,METHODOLOGIES AND
CHALLENGES BY
PROF. CHIEDU F. MAFIANA
DIRECTOR, QUALITY ASSURANCE
NATIONAL UNIVERSITIES COMMISSION
Being a paper presented at the 2013 Training Workshop of the Committee of Directors of Academic
Planning of Nigerian Universities (CODAPNU) at the National Mathematical Centre, Kwali, Abuja from 28 – 31 October 2013
FELICITATIONS!
CODAPNU NUCPARTNERING FOR PROGRESS & QUALITATIVE DEVELOPMENT
21ST CENTURY CHALLENGES
THE WORLD IS NOW A GLOBAL VILLAGE!
HOW ARE UNIVERSITIES COPING WITH 21ST CENTURY CHALLENGES ?WE ARE THE
WORLD
21ST CENTURY CHALLENGES
NATION BUILDING
ACCESS
RELEVANCE OF CURRICULA
FUNDING
CROSS BORDER EDUCATION
NUC
GLOBALIZATION
QA ISSUES
ETC ETC
21ST CENTURY CHALLENGES
DATA, DATA EVERYWHERE!
HELP!
DAP
21ST CENTURY CHALLENGES
BUT ARE WE MAKING GOOD USE OF ALL THE DATA?
ANALYTICS
MR ANALYTICS AT YOUR SERVICE
QUALITY INPUT + EFFECTIVE PROCESSES + COMPLIANCE WITH NUC GUIDELINES = QUALITY OUTPUT
DATA WAREHOUSING
DATA WAREHOUSING? NEVER HEARD
OF IT BUT MUSTN’T
DISPLAY MY IGNORANCE!
CONCEPTS & DEFINITIONS
MEET BILL INMON FATHER OF WAREHOUSING 1970s
“It is a subject oriented, nonvolatile, integrated,
time variant collection of data in support of
management's decisions”
What is a Data Warehouse?
Centralized Data warehouses large
size for big businesses
CONCEPTS & DEFINITIONS
MEET RALPH KIMBALL 2ND FATHER OF WAREHOUSING1998
What is a Data Warehouse?
“A database of snapshots and aggregations of data
throughout an enterprise to be used for querying and
decision-making”
INTEGRATED SYSTEMS, SMALLER
DATA MARTS FOR SMALLER
BUSINESSES AND BUDGETS
CONCEPTS & DEFINITIONS
HMMM! IT’S GOING TO BE
A LONG, LONG DAY
OLAP
STAR SCHEMA
SNOWFLAKE SCHEMA
HYPERCUBE
METADATA
LEGACY CODE
DATA MART
?
??
?
?
?
CONCEPTS AND DEFINITIONS
Hmmm! At least now I know what Data
Warehouse is. I hope the process will be
clearer after the Case Study .
Oga teacher U try.Thank U Sir.
TABLE I: COMPARISON OF DW DATA AND OP DATA
OPERATIONAL DATA DW DATA
Application oriented Subject oriented
Detailed Summarized, otherwise refined
Accurate, as of the moment of access
Represents values over time, snapshots
Serves the clerical community
Serves the managerial community
TABLE I: COMPARISON OF DW DATA AND OP DATA
OPERATIONAL DATA DW DATA
Can be updated Is not updated
Run repetitively and non reflectively
Run heuristically
Requirements for processing understood before initial development
Requirements for processing not completely understood before development
Compatible with the Software Development Life Cycle
Completely different life cycle
TABLE I: COMPARISON OF DW DATA AND OP DATA
OPERATIONAL DATA DW DATA
Performance sensitive (immediate response required when entering a transaction)
Performance relaxed (immediacy not required)
Accessed a unit at a time (limited number of data elements for a single record)
Accessed a set at a time (many records of many data elements)
Transaction driven Analysis driven
Control of update a major concern in terms of ownership
Control of update no issue
TABLE I: COMPARISON OF DW DATA AND OP DATA
OPERATIONAL DATA DW DATA
High availability Relaxed availability
Managed in its entirety Managed by subsets
Non-redundancy Redundancy is a fact of life
Static structure; variable contents
Flexible structure
Small amount of data used in process
Large amount of data used in a process
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com)
Q1Do I Need a Data
Warehouse?
No Idea. Do You?
CASE STUDY
Q2What Specific
Problems Will It Solve?
Leave me alone. I have
my own problems
CASE STUDY
Q3What are my available
resources (time, money, & personnel ?
Am I your P.A?
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com)
Q4What criteria will I use
to measure my success ?
Your Hummer Jeep. Remember me in
your kingdom.
CASE STUDY
Q5Should I outsource all,
some or none of the development and
operation?
Just make sure you don’t get lynched in
the process
CASE STUDY
Q6Am I upgrading an existing system, converting from a
legacy system, or developing from scratch ?
If your DICT does not know, sack him
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com) PROCESSES
1
•REQUIREMENTS ANALYSIS
2
•INFORMATION/DATA MODELLING
3
•DESIGN & PROTOTYPING
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com) PROCESSES
a.
•RAPID PROTOTYPING
b.
• STRUCTURED DEVELOPMENT
4.
•DEVELOPMENT & DOCUMENTATION
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com) PROCESSES
5.
•TEST & REVIEW
6.
• DEPLOYMENT & TRAINING
7.
•OPERATION
CASE STUDY
A Logical Approach to Data Warehouse Design (Horsburgh.com) PROCESSES
8.
• ENHANCEMENT
9.
•HELP DESK
CHALLENGES PART I
CHALLENGES PART I
1
•ENSURING ACCEPTABLE DATA QUALITY
a.
•DISPARATE DATA SOURCES
b.
•UNSTABILIZED SOURCE SYSTEMS
CHALLENGES PART I
2
•ENSURING ACCEPTABLE PERFORMANCE
a.
•PRIORITIZING PERFORMANCE
b.
•SETTING REALISTIC GOALS
CHALLENGES PART I
c.
•PERFORMANCE BY DESIGN
3.
•TESTING THE DATA WAREHOUSE
a.
•TEST PLANNING
CHALLENGES PART I
b.
•NO AUTOMATED TESTING
4.
•RECONCILIATION OF DATA
a.•CO
MPLEX
CHALLENGES PART I
5.
•USER ACCEPTANCE
a.
•RELUCTANT USERS
b.
•TRAIN THEM
CHALLENGES PART 2
CHALLENGES PART 2
1.
•USER EXPECTATION
2.
•SYSTEMS OPTIMIZATION
3.
•DATA STRUCTURING
CHALLENGES PART 2
4.
•PREFABRICATED VS CUSTOM WAREHOUSE
5.•DATA
STRUCTURING
THE LARGEST CHALLENGE OF THEM ALL
GARBAGE IN
GARBAGE OUT
QUALITY OF DATA
SOURCE: PRACTICAL LIFE CHALLENGES IN IMPLEMENTING DATA WHAREHOUSE – Subramanya Hoysala 17th June 2011
LOW QUALITY DATA
• Accuracy
• Completeness
• Consistency
• Correctness
LOW QUALITY DATA
• Integrity
• Timelines
• Uniqueness
LOW QUALITY DATA
FACTORS • Lack of data validation in ERP-Systems;• Older technologies do not support eg. drop
down lists;• The precise business rules are not known
to many; • People are busy;
LOW QUALITY DATA
FACTORS • Psychology: Correct data are not
important to the user, who registers them, eg. the car sales person;
• Data may be loaded / corrected in one-off batch runs; and
• The rules have changed, but users are not aware of it.
LOW QUALITY DATA
WHAT DO WE DO
ABOUT IT?
LOW QUALITY DATA
CORRECTIVE STEPS
PROFILING
CLEANSING & ENHANCEMENT
CONSOLIDATION
LOW QUALITY DATA
CORRECTIVE STEPS
AUDITING
HETEROGENEOUS DATADATA INTEGRATION
SAMPLE UDW
Source: New York University, Reporting Home page
EMERGING TRENDS
CLOUD INTEGRATION
REAL TIME DATA WAREHOUSING
IN-MEMORY DBMSs
BIG DATA
OVERVIEW: ICT IN NUS
2015
2013
2014
WORLD SUMMIT INFORMATION SOCIETY GLOBAL TARGET ICT FOR ALL 2015
ICT INITIATIVES IN NUS
NUC NUS
MY DEAR SON I
HAVE SO MANY GIFTS FOR YOU DADDY!
THANK YOU SIR
NUTALP
NgRENPRESSID
NUSMAP
NLOP
VLP
ACE
ICT INITIATIVES IN NUS
• NUSMAP – NUC DATA WAREHOUSING INITIATIVE
1
•STUDENT INFORMATION
•STAFF INFORMATION
2
•UNIVERSITIES PROGRAMME ACCREDITATION
•BUDGETS AND EXPENDITURE
3
•ENROLMENT/STATISTICS
•PHYSICAL PLANNING & DEVELOPMENT
ICT INITIATIVES IN NUS
• NUSMAP – NUC DATA WAREHOUSING INITIATIVE
4
•ACADEMI C PROGRAMMES
5
•UNIVERSITY GEOGRAPHIC INFORMATION SYSTEM
6
•RESEARCH AND INNOVATIONS
ICT INITIATIVES IN NUS
• NUSMAP – FEATURES
1
•ROBUST ADMINISTRATION
2
•SECURED LOGIN SYSTEM
3
•UNIQUE IDENTITY FOR EACH SYSTEM
ICT INITIATIVES IN NUS
• NUSMAP – NUC DATA WAREHOUSING INITIATIVE
4
•USER FRIENDLY
5
•CAPTURES A VAST RANGE OF INFORMATION
6
•AUTOMTIC POPULATION OF STUDENTS & STAFF RECORDS
ICT INITIATIVES IN NUS
• NUSMAP – FEATURES
7
•GENERATION OF PRINTALE REPORTS IN NUC-SPECIFIED FORMAT
8•EXP
ORT FEATURE
9
•PROVISION FOR NUC TO VIEW SUBMITTED REPORTS
ICT INITIATIVES IN NUS
• NUSMAP – FEATURES
10
•GENERATION OF PRINTALE REPORTS IN NUC-SPECIFIED FORMAT
11
•AUTOMATIC CALCULTION OF FULL TIME EQUIVALENT (FTE)
12
•AUTOMATIC SUMMATION/CALCULATION CAPABILITY
ICT INITIATIVES IN NUS
• NUSMAP – FEATURES
13
•AUTOMATIC NOTIFICATION ON EXPIRY OF A PROJECT’S MASTERPLAN
14
•AUTOMATIC GENERATION OF ANNUAL PAYROLL REPORT
15
•STAFF PROGRESSION/STAFF AREA OF SPECIALIZATION AT A GLANCE
ICT INITIATIVES IN NUS
• NUSMAP – FEATURES
16
•FACILITIES ONLINE UNIVERSITIES PROGRAMME ACCREDITATION
17
•USARM NOW ONLINE
ICT INITIATIVES IN NUS
• NUSMAP – CHALLENGES
1•INITIAL
RESISTANCE FROM SOME UNIVERSITIES
2
•FORGOTTEN PASSWORDS
•PROBLEM OF UPLODING DUE TO WRONG ARRANGEMENT OF SOME EXCEL FILES
LEVEL OF ICT DEPLOYMENT IN NUS
UNIVERSITY PORTAL POSTING OF VITAL INFORMATION ON-LINE
ICT PARKS
LAPTOPS FOR STAFF
INTERNET CONNECTIVITY,
AREAS GIVEN HIGH PRIORITY BY UNIVERSITIES
LEVEL OF ICT DEPLOYMENT IN NUS
NUC
ICT TEACHING AIDS?
DATA BASES?
DIGITAL VIRTUAL LBRARY?
SIRS, WHAT ABOUT THESE ?
PUBLISHING PAPERS ON-LINE?
SATELLITE?VIDEO CONFERENCING?
LOWER PRIORITY AREAS
DIGITAL AUDIOS?
ICT DRIVEN OPERATIONS?
E-LEARNING SUPPORT & DEVELOPMENT ?
Web Presence, Visibility and Access?.
IP(Internet Protocol) Phones?
AREAS GIVEN PRIORITY BY UNIVERSITIES
ICT InfrastructureInternet ConnectivityICT ParksLaptops for Staff
ICT Integration University Portal Posting of vital information on-line
LOWER PRIORITY AREAS
Low level of ICT infrastructure Across Board
Digital Virtual Library SatelliteAutomated overhead projectorsDigital audios IP(Internet Protocol) Phones
LOWER PRIORITY AREAS
Low Level ICT Integration Across Board TeachingPublishing papers on-line Advertising research proposals on-lineWork stationsData bases
LEVEL OF ICT DEPLOYMENT IN NUS
• Findings
NUS: LESS THAN 5% ICT OPERATIONS IN PROCESSES(Vision 20:2020 ICT Technical Report (2009) )
ICT CAPACITY & INFRASTRUCTURE
ICT INTEGRATION
ICT DEPLOYMENT IN OPERATIONS & SERVICE DELIVERY
REASONS FOR LOW LEVEL ICT DEPLOYMENT
• HIGH COSTS OF INFRASTRUCTURE, EQUIPMENT & ELECTRICITY;
• INADEQUATE FUNDING;• LOW ICT SKILLS;• INSUFFICIENT TRAINING;• LOW PRIORITY RATING; • RESISTANCE TO SOME ICT INITIATIVES;• UNSUSTAINED INITIATIVES; &• RESISTANCE TO CHANGE.
RECOMMENDATIONS
• Policies for higher ICT integration • Serious investment in ICT infrastructures;
• Infrastructural support }• Massive training and }• Deployment of ICT skilled manpower }
(Government to Universities)
RECOMMENDATIONS
• Collaboration with:• Private Sector;• International donors; and • Civil Society
(affordable and sustainable access to ICT infrastructure)
• In-service training programmes on effective ICT integration
RECOMMENDATIONS
• ICT integration competencies for promotion of staff;
• Improved Power Supply; • Functional Records Management
Programme; • University Archive; and• Benchmarking of functional Records
Management Programme and University Archives by NUC.
THE WAY FORWARD
Data Warehousing Initiative • Realistic?• Reserve for Affluent Universities?• K.I.V.?
THE WAY FORWARD
• Questions • Data Warehousing Initiative• Realistic?• Purview of Affluent Universities?• Kept-in-View
• Questions • Data Warehousing Initiative• Realistic?• Purview of Affluent Universities?• Kept-in-View
LETS DO IT NUC
THE WAY FORWARD
PLEASE LISTEN TO THE FOLLOWING ANNOUNCEMENTS 1. THE DATA WAREHOUSE PROJECT HAS NUC’S BACKING
DAPS NUS
THE WAY FORWARD 2. IMMEDIATE STRATEGIES FOR TAKEOFF ARE AS FOLLOWS:A. LOBBY YOUR RESPECTIVE VICE-CHANCELLORS
DAPS NUS
THE WAY FORWARD
VICE-CHANCELLOR’S OFFICE UNIVERSITY OF DOGGED COMPLIANCE & INNOVATION (UDCI)
WAREHOUSE? WHAT ABOUT THE ONE WE BUILT FOR
THE CONSULT LAST YEAR?
VC, SIR, I’VE COME TO
DISCUSS THE ISSUE OF THE
DATA WAREHOUSE
PROJECT
SORRY SIR, I MEAN THE ICT
DATA WAREHOUSE
PROJECT. IT’S VERY USEFUL
FOR SRATEGIC PLANNING
PURPOSES SIR.
COME UP WITH A
PROPOSAL PLEASE
THE WAY FORWARD 2. IMMEDIATE STRATEGIES FOR TAKEOFF :B. SOLICIT FOR SUPPORT FROM YOUR DICT & DQA
GENTLEMEN PLS I NEED YOUR KIND ASSISTANCE AND
SUPPORT IN PACKAGING A
VIABLE PROPOSAL FOR THE DATA WAREHOUSE
PROJECT
DICT DQA
SOME BENEFITS OF DATA WAREHOUSING
1
•Has a subject area orientation
2
•Integrates data from multiple, diverse sources
3
•Allows for analysis of data over time
SOME BENEFITS OF DATA WAREHOUSING
4
•Adds ad hoc reporting and enquiry
5
•Provides analysis capabilities to decision makers
6
• Relieves the development burden on IT
SOME BENEFITS OF DATA WAREHOUSING
7
•Provides improved performance for complex analytical queries
8
•Relieves processing burden on transaction oriented databases
9
•Allows for a continuous planning process
COSTS OF DATA WAREHOUSING
1
•Time spent in careful analysis of measurable
2
•Design and implementation effort
3
• Hardware costs
COSTS OF DATA WAREHOUSING
4
•Software costs
5
•On-going support and maintenance
6
•Resulting re-engineering effort
THE WAY FORWARD 2. IMMEDIATE STRATEGIES FOR TAKEOFF :C. CONSTRUCTION OF AN INTEGRATED INFORMATION ARCHITECTURE
BRAINSTORMING COMMITTEE
LADIES AND GENTS WE HAVE 3 TASKS
a. Building an integrated information architecture
b. Taxonomy of Administrative Activitiesc. Data Dictionary .
THE WAY FORWARD 2. IMMEDIATE STRATEGIES FOR TAKEOFF :D. CONSULTATION WITH THE DICT
DICT DAP
MY DICT SIR,SO WHEN ARE WE GOING TO
MEET TO DISCUSS THE
DATA WAREHOUSE WORKSHOP MATRIALS
MY ABLE DAP SIR.
LET’S MEET ON FRIDAY 2 PM IN
MY OFFICE
MR. ANALYTICS FINDS A HOME
CONGRATS! I HOPE YOU LIKE THE
HOUSE MAGT. BOUGHT FOR YOU.
I’M SO GRATEFUL MADAM DAP. NOW I
CAN REALLY CONCENTRATE ON MY JOB. I WON’T LET YOU
DOWN.
DATA WAREHOUSEHOME SWEET
HOME
MR. ANALYTICS DAP, XYZ UNIVERSITY
FIVE YEARS LATER
UNI. OF DOGGED COMPLIANCE & INNOVATION (UDCI)MOST ICT COMPLIANT UNI. IN NUS 2018BEST ALL ROUNDUNIV. IN NUS 2018
MY DYNAMIC VC SIR. UDIC IS NOW RATED AS A WORLD CLASS UNIV. HOW DID YOU
PERFORM THIS FEAT IN JUST 3 YEARS.
MY AMIABLE VC SIR. I GIVE GOD ALL THE
GLORY. MY SECRET IS EFFECTIVE STRATEGIC PLANNING THRU DATA WAREHOUSE BASED
ANALYSIS
GOODWILL MESSAGE
WISHING ALL UNIVERSITIES AS MUCH SUCCESS AS UDCI!
YES WE CAN
NUS
WORLD CLASS STATUS
2020
YES WE CAN
THE SKY IS THE LIMIT
CLOSING REMARKS
CODAPNU NUC PARTNERING TO ATTAIN WORLD CLASS STATUS
THE END
LEVEL OF ICT DEPLOYMENT IN NUS
UNIVERSITY PORTAL POSTING OF VITAL INFORMATION ON-LINE
ICT PARKS
LAPTOPS FOR STAFF
INTERNET CONNECTIVITY,
AREAS GIVEN HIGH PRIORITY BY UNIVERSITIES
LEVEL OF ICT DEPLOYMENT IN NUS
NUC
ICT TEACHING AIDS?
DATA BASES?
DIGITAL VIRTUAL LBRARY?
SIRS, WHAT ABOUT THESE ?
PUBLISHING PAPERS ON-LINE?
SATELLITE?VIDEO CONFERENCING?
LOWER PRIORITY AREAS
DIGITAL AUDIOS?
ICT DRIVEN OPERATIONS?
E-LEARNING SUPPORT & DEVELOPMENT ?
Web Presence, Visibility and Access?.
IP(Internet Protocol) Phones?
AREAS GIVEN PRIORITY BY UNIVERSITIES
ICT InfrastructureInternet ConnectivityICT ParksLaptops for Staff
ICT Integration University Portal Posting of vital information on-line
LOWER PRIORITY AREAS
Low level of ICT infrastructure Across Board
Digital Virtual Library SatelliteAutomated overhead projectorsDigital audios IP(Internet Protocol) Phones
LOWER PRIORITY AREAS
Low Level ICT Integration Across Board TeachingPublishing papers on-line Advertising research proposals on-lineWork stationsData bases
LEVEL OF ICT DEPLOYMENT IN NUS
• Findings
NUS: LESS THAN 5% ICT OPERATIONS IN PROCESSES(Vision 20:2020 ICT Technical Report (2009) )
ICT CAPACITY & INFRASTRUCTURE
ICT INTEGRATION
ICT DEPLOYMENT IN OPERATIONS & SERVICE DELIVERY
REASONS FOR LOW LEVEL ICT DEPLOYMENT
• HIGH COSTS OF INFRASTRUCTURE, EQUIPMENT & ELECTRICITY;
• INADEQUATE FUNDING;• LOW ICT SKILLS;• INSUFFICIENT TRAINING;• LOW PRIORITY RATING; • RESISTANCE TO SOME ICT INITIATIVES;• UNSUSTAINED INITIATIVES; &• RESISTANCE TO CHANGE.
RECOMMENDATIONS
• Policies for higher ICT integration • Serious investment in ICT infrastructures;
• Infrastructural support }• Massive training and }• Deployment of ICT skilled manpower }
(Government to Universities)
RECOMMENDATIONS
• Collaboration with:• Private Sector;• International donors; and • Civil Society
(affordable and sustainable access to ICT infrastructure)
• In-service training programmes on effective ICT integration
RECOMMENDATIONS
• ICT integration competencies for promotion of staff;
• Improved Power Supply; • Functional Records Management
Programme; • University Archive; and• Benchmarking of functional Records
Management Programme and University Archives by NUC.
THE WAY FORWARD
Data Warehousing Initiative • Realistic?• Reserve for Affluent Universities?• K.I.V.?
THE WAY FORWARD
• Questions • Data Warehousing Initiative• Realistic?• Purview of Affluent Universities?• Kept-in-View
• Questions • Data Warehousing Initiative• Realistic?• Purview of Affluent Universities?• Kept-in-View
LETS DO IT NUC
THE WAY FORWARD
PLEASE LISTEN TO THE FOLLOWING ANNOUNCEMENTS 1. THE DATA WAREHOUSE PROJECT HAS NUC’S BACKING
DAPS NUS
THE WAY FORWARD 2. IMMEDIATE STRATEGIES FOR TAKEOFF ARE AS FOLLOWS:A. LOBBY YOUR RESPECTIVE VICE-CHANCELLORS
THE WAY FORWARD
MR. ANALYTICS FINDS A HOME
CONGRATS! I HOPE YOU LIKE THE
HOUSE MAGT. BOUGHT FOR YOU.
I’M SO GRATEFUL MADAM DAP. NOW I
CAN REALLY CONCENTRATE ON MY JOB. I WON’T LET YOU
DOWN.
DATA WAREHOUSEHOME SWEET
HOME
MR. ANALYTICS DAP, XYZ UNIVERSITY
FIVE YEARS LATER
UNI. OF DOGGED COMPLIANCE & INNOVATION (UDCI)MOST ICT COMPLIANT UNI. IN NUS 2018BEST ALL ROUNDUNIV. IN NUS 2018
MY DYNAMIC VC SIR. UDIC IS NOW RATED AS A WORLD CLASS UNIV. HOW DID YOU
PERFORM THIS FEAT IN JUST 3 YEARS.
MY AMIABLE VC SIR. I GIVE GOD ALL THE
GLORY. MY SECRET IS EFFECTIVE STRATEGIC PLANNING THRU DATA WAREHOUSE BASED
ANALYSIS
GOODWILL MESSAGE
WISHING ALL UNIVERSITIES AS MUCH SUCCESS AS UDCI!
YES WE CAN
NUS
WORLD CLASS STATUS
2020
YES WE CAN
THE SKY IS THE LIMIT
CLOSING REMARKS
CODAPNU NUC PARTNERING TO ATTAIN WORLD CLASS STATUS
THE END
LEVEL OF ICT COMPLIANCE