promoting accuracy through data quality: the uc data
TRANSCRIPT
![Page 1: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/1.jpg)
October 17, 2016
Promoting Accuracy Through Data Quality: The UC Data Validation Framework
University of California Office of the President OFFICE OF INSTITUTIONAL RESEARCH & ACADEMIC PLANNING
[IRAP] CAIR 2016 Conference
OLA POPOOLA – Director: Reporting & Analytics
1
![Page 2: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/2.jpg)
2
Dealing with bad data?
1
2
![Page 3: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/3.jpg)
Agenda • Desired Presentation Outcomes • Data Quality
– Attributes of Data Quality – Causes & Costs of Poor Quality Data
• The UC Data Validation Framework • Creating Your Own Data Quality
Management Program • Final Thoughts
UCOP-IRAP/CAIR 2016 3
![Page 4: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/4.jpg)
Presentation Outcomes
4
![Page 5: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/5.jpg)
Desired Presentation Outcomes
• A better understanding of data quality from an IR perspective
• Exposure to data quality principles, methods and techniques that enable continuous improvement in data quality
• How to conduct simple data quality audits by implementing a successful Data Quality Program (DQP)
UCOP-IRAP/CAIR 2016 5
![Page 6: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/6.jpg)
Quality 6
![Page 7: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/7.jpg)
What is Data Quality?
Definition 1 The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)
Definition 2 The quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data. (Chrisman, 1991)
UCOP-IRAP/CAIR 2016 7
![Page 8: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/8.jpg)
Attributes of Data Quality
Accurate Complete Flexible
Timely Consistent Available
UCOP-IRAP/CAIR 2016 8
![Page 9: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/9.jpg)
Causes of Poor Quality Data • Lack of data governance • User errors – manual data entry • Lack of identified “authoritative” data sources • Complex IT infrastructure • Bad business processes • Silo-driven solutions • Multiple disconnected processes • Tactical initiatives to “re-solve” data accuracy
rather than understanding and addressing root cause UCOP-IRAP/CAIR 2016
9
![Page 10: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/10.jpg)
The Cost of Poor Data Quality • Wasted revenue - $3.1 trillion in US alone
(2016) • Mistrust • Bad or delayed decisions • Impacted funding • Constant rework • Missed opportunities
UCOP-IRAP/CAIR 2016 10
![Page 11: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/11.jpg)
The UC Implementation
11
![Page 12: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/12.jpg)
The UC Story
• Challenge associated with data submission from 10 different campus locations, central office, three laboratories and ANR with diverse transactional systems
• Implementation of a new data warehouse called for an extensive review of data quality processes
• Selection of a data quality methodology that involved business practice review and change.
UCOP-IRAP/CAIR 2016 12
![Page 13: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/13.jpg)
UC Quality Program Guidelines • UC Applicable
– For UC business; based on user needs • Flexible
– Adaptable to evolving data content areas • Scalable
– Could be expanded or reduced in scale – Could be deployed across multiple UC locations
• Prudent – Minimal implementation costs
• Complementary – Compatible with UC Standards
UCOP-IRAP/CAIR 2016 13
![Page 14: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/14.jpg)
Elements of UC DQM
• Technology • People
• Processes • Governance
Define goals
Identify areas
for change
Build Solutions
Monitor and plan
updates
UCOP-IRAP/CAIR 2016 14
![Page 15: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/15.jpg)
UC Data Infrastructure
Input Files Data Processing
Reporting &
Analytics
Staging Layer Reporting Layer
Data Marts
UCOP-IRAP/CAIR 2016
15
![Page 16: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/16.jpg)
UC Data Validation Framework
Review data validation reports
Standard input files
Review data audit reports
Data Staging Layer
Reporting Layer
N1
N2
N3
N4
N5
N6
UCOP-IRAP/CAIR 2016
16
![Page 17: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/17.jpg)
Data Collection & File Specs… • File specifications: data collection
instrument • Proper data collection instrumental to
integrity of research • A good file specification:
– Clarifies how you expect all institutions to submit their data
– Clarifies the length, format, error levels and valid values
– Has an accompanying overview and file characteristics that contains:
• File submission schedule • File physical characteristics • Any special conventions
– Has an accompanying code book
UCOP-IRAP/CAIR 2016 17
![Page 18: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/18.jpg)
Error Groups • Error Framework
– Database Tables • Rejected Files (R)
– Header Record Type • Severe Errors (S)
– Invalid Campus Code – Invalid Student ID
• Element Errors (E) – Invalid Sex Code
• Group Errors (G) – Campus-College-Major
combinations
UCOP-IRAP/CAIR 2016 18
![Page 19: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/19.jpg)
Our Toolset
19
![Page 20: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/20.jpg)
UC Data Quality Toolset
• Atlassian JIRA • IBM DB2 Database • IBM DataStage • IBM Cognos • Microsoft Excel
UCOP-IRAP/CAIR 2016 20
![Page 21: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/21.jpg)
Creating Your Data Quality Program
21
![Page 22: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/22.jpg)
Key Requirements for a DQP A Data Quality Vision
A Data Quality Strategy
UCOP-IRAP/CAIR 2016 22
![Page 23: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/23.jpg)
Develop A Data Quality Vision • Every organization needs:
– A vision with respect to having good quality data – An accompanying policy to implement that vision – A strategy for implementation
• Every organization should look: – For efficiencies in data collection and quality
control processes – Beyond immediate use and examine user
requirements – For ways to build networks and partnerships
UCOP-IRAP/CAIR 2016 23
![Page 24: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/24.jpg)
Define a Data Quality Strategy
Strategy
Business Process Review
Data Quality Assessment
Review Results
Business Practice Change
UCOP-IRAP/CAIR 2016 24
![Page 25: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/25.jpg)
Review Your Business Processes
• How and when is data collected?
• Where is data stored? • Is the same data stored
in more than one system?
• Who creates the data? • Who uses the data? • What kind of quality
checks already exist?
UCOP-IRAP/CAIR 2016 25
![Page 26: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/26.jpg)
Do a Data Quality Assessment • What are the quality
criteria? • What are the
acceptable range of values?
• What kind of thresholds should be in place?
• What are your business rules?
UCOP-IRAP/CAIR 2016 26
![Page 27: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/27.jpg)
Review Your Results
• Develop a systematic approach to reviewing results
• Develop a process for data cleaning or correction
• Identify source of data problems
• Communicate!
UCOP-IRAP/CAIR 2016 27
![Page 28: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/28.jpg)
Implement Necessary Changes • Implement changes to
improve data quality – Centralize reference
data codes – Consolidate data
collection and storage • Adopt ongoing data
quality review process – Review data regularly – Communicate quality
improvements
UCOP-IRAP/CAIR 2016 28
![Page 29: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/29.jpg)
Where Are We Now?
![Page 30: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/30.jpg)
Where are we now? • Standardizing input file specifications across
all content areas • Implementation of a requirement
managements tool • Documenting business related quality rules • Promoting data governance through the
creation of a data operations website • Improving communication between the data
creators and IRAP • Improving relationship between IRAP and IT
UCOP-IRAP/CAIR 2016 30
![Page 31: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/31.jpg)
Final Thoughts • Quality data is
achievable if you are willing to: – Take a critical look at
your existing data – Implement changes to
how you collect and manage data
– Invest the time to educate and communicate with data creators and users
– Make data quality improvements an ongoing process
UCOP-IRAP/CAIR 2016 31
![Page 32: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/32.jpg)
32
It’s not the things you don’t know that matter, it’s the things you know that aren’t so. Will Rogers, Famous Okie GI specialist
![Page 33: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/33.jpg)
33
Fast is fine but accuracy is final. Wyatt Earp - Officer of the law, gambler and saloon keeper in the Wild West
![Page 34: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/34.jpg)
34
Good data are the data you already have. Dr. Edgar Horwood - Founder of the Urban and Regional Information Systems
![Page 35: Promoting Accuracy Through Data Quality: The UC Data](https://reader031.vdocuments.us/reader031/viewer/2022012100/6169e32211a7b741a34c7769/html5/thumbnails/35.jpg)
35