TRANSCRIPT
The Management System for Data Quality (IDQ Tutorial)
Thomas C. Redman, Ph.D. “the Data Doc”
Navesink Consulting Group www.navesinkconsultinggroup.com
Information and Data Quality Conference, Nov 4-7, 2013, Little Rock, AR
/Redman-IDQ-DQtutorial-Nov2013 T.C. Redman, Page 1 © Navesink Consulting Group LLC, 2000-2013
“Customers” for this presentation
o Practitioners: Who want to know what they need to do (and a bit of “how to do it”).
o Managers: Who want to know where they fit.
o Senior Leaders: Who wonder what all the chatter is about. And if it is real, what they need to do to get in front.
o Live heads: Who want to understand why we think about data quality the way we do.
o Good hearts/Agents of Change: Who want to sort out what’s best for their companies.
The Big Ideas
o Data quality, done right, is a huge win/win!
o You have to get in front! This means you have to find and eliminate the root causes of error.
o Quality is in the eyes of the customer.
o You have to do the technical work tolerably well.
o Data quality is done in the line (e.g., business process).
o You have to overcome organizational momentum, so actively manage change.
o Get responsibility for DQ out of Tech. Build human capability.
o Sooner or later, you’re going to need a data strategy.
o The quality program only goes as far as the senior team (perceived to be) leading the effort insists.
Data Quality Done Right
o One data set for a market data vendor
o Quality improvement in a data-intensive department
o A long-term, company-wide effort
Market Data Vendor
Background: Financial service companies purchase market data from companies such as Reuters, Bloomberg, etc.
o Lack of trust causes them to purchase basic data from multiple sources.
o Bank request: Far better data, so it could reduce its vendor base and eliminate downstream costs of bad data.
Work conducted:
o Clear statement of customer needs.
o Measurement against those needs.
o Root causes identified and addressed, one at a time.
o Statistical control.
o In the course of day-in, day-out work.
Market Data Example: Results
Each error not made saves an average of $500. Quickly millions!
Summary message: Even in enormous organizations, the day-in, day-out work of data quality management (customer needs, measurement, improvement, control) is conducted at the work group level.
[Figure: “First-time, on-time results.” Fraction of perfect records (0.5 to 1) by month (0 to 20). Series: accuracy rate, average, lower control limit, upper control limit, target. Annotations: Program start; 1. Rqmts defined; 2. First Meas; 3. Improvements; 4. Control.]
Access Financial Assurance at AT&T
Background: AT&T expenditures for “access” about $20B/yr.
o Access Financial Assurance aims to ensure integrity of access bills, through parallel “billing.”
Key Idea: Get the bill right the first time.
Work conducted:
o Dissatisfied middle manager, seeking a better way.
o Top-down deployment.
o Staff group defined series of deliverables, then audited (regional) compliance.
o Supplier and process management.
o Customer needs, measurement, improvement, control.
Results: Access Financial Assurance
o Data accuracy improved 90%.
o Billing errors reduced 98%.
o Cycle time (bill period closure) reduced 67%.
o AT&T costs (of financial assurance) reduced 73% ($100M/year).
o LEC costs (of access billing) reduced 20%.
Summary message: There is much hidden “non-value-added work” built in to accommodate bad data.
Enterprise Programme at BT* (British Telecom)
o Revenue: $33 billion/yr
o Employees: 95,000
o Operates in 170 countries
o 22 million customers (4 million business)
Enterprise Data Quality Improvement Programme (10-year effort)
o Recognized the inherent complexity of people, process, technology issues (e.g., data quality problems masquerading as “systems issues”).
o Explicit linkage of data (quality improvement) to strategic business objectives (e.g., business transformation).
o Over time, magnitude of DQ problem understood and exposed.
o Governance structure starting at the very top.
o Consolidated expertise in data quality improvement in IT.
o Estimated and delivered benefits vetted by Finance.
*This summary largely courtesy of Nigel Turner, who led BT’s programme and is now at Trillium Software. He has vetted this summary.
BT – Results
Enterprise Data Quality Improvement Programme, cont’d
o Dual focus on reducing capital expenditure and the rework that results from searching for “lost network facilities.”
o Problem discovery, measurement, audits, new controls (hold the gains) enhanced by Trillium DQ tool suite.
o Focused on “big improvement projects” (delivered 75 over ten years).
o Dual focus on data clean-up and process improvement.
Business Benefits:
o > $1B (verified and conservative)
o Also improved customer satisfaction, better regulatory compliance, reversed brand damage and revenue leakage, and contributed to business transformation. These benefits not quantified.
Summary Message: More than you might think, data permeate everything. Bad data are “silent killers.”
For Data, Only Two Moments Really Matter
The moment of use
The moment of creation
The whole point of data quality management is to connect the two!
To Clean Up The Lake, One Must First Eliminate The Sources Of Pollutant
The Non-delegatable Choice
Unmanaged
It is so easy for accountability to shift downstream!!!
Here’s how you do number 3!
The Management Systems for Data and Data Quality touch every other management system
[Figure labels: Overall management system; Management System for data; Management System for DQ; Mgmt Sys for People; Mgmt Sys for Capital Assets; Other Mgmt Systems.]
The management system for data embraces the totality of effort directed at acquiring data, ensuring their quality, putting them to work, competing with them, etc.
Management System for Data
o “Structure follows strategy” (Alfred Chandler)
Objective/Goal/Strategy, Organization*, Process, Technology:
o Strategy: Goals; Scope; Comp Advantage; Achievable
o Organization: People; Org Structure; Governance; Culture
o D4 Process: Data; Discovery; Delivery; Dollars; DQ Mgmt; Metadata
o Technology: to increase scale/decrease unit cost
* After Roberts, The Modern Firm
Who Does the Work?*
1. Focus on the most important needs (of customers)
2. Manage processes that create data (so they do so correctly)
3. Manage “suppliers” (both inside and out) of data
4. Measure quality levels against customer needs
5. Deploy controls, at all levels, to remain error-free
6. Improvement: Find and eliminate root causes of error
7. Set and meet aggressive targets for improvement: top-to-bottom
8. Formalize management accountabilities for data
9. Broad, informed, demanding leadership
10. Advance a culture that values data and data quality
[Figure labels: Senior Leadership; Middle Management (Command); Everyone who touches data = Four Basic “Tasks”.]
Work is highly interconnected.
Taken together, the tasks define an overall “Management System for Data Quality.”
*Ten habits of those with the best data from Redman, Data Driven: Profiting from Your Most Important Business Asset, Harvard Business Press, 2008.
Quality Means Customer Satisfaction
o Why we define quality the way we do.
o Data defined.
o Understand and communicate the Voice of the Customer.
Debates about “what quality means” are age-old
Circa 1980: Which is of higher quality: The feature-rich Cadillac or the defect-free Volkswagen bug?
o Defect-free; reliable; inexpensive to drive and operate
o Comfort; room for plenty of suitcases; status
Dr. Juran resolved the issue
o He recognized that different “customers” had different needs and desires.
o He recognized that each customer makes different tradeoffs between features and freedom from defects.
o He proposed the term “fitness for use” (or “fitness for purpose”) to unite the two concepts.
o Perhaps most importantly, he recognized customers decide whether a product, service, or collection of data is of high quality.
Data Quality
Data are of high quality if they are fit for their intended uses (by customers) in operations, decision-making, and planning (after Juran).
Data that’s fit for use:
o Are free of defects (mostly, “are the data right?”): accessible, accurate, up-to-date, kept secure, etc.
o Possess desired features (largely, “the right data”): relevant, comprehensive, proper level of detail, easy-to-interpret, etc.
Customers are the ultimate arbiters of quality!!
Data Quality - aspirational
“Exactly the right data and information in exactly the right place at the right time and in the right format to complete an operation, serve a client, make a decision, or set and execute strategy.”*
*based on Redman, Data Driven: Profiting from Your Most Important Business Asset, Harvard Business Press, 2008
Data Quality – day-in, day-out
Meeting the most important needs of the most important clients.
In quality management, we use the term “customer” quite broadly
o Paying customers
o Other external customers, including:
  - Shareholders
  - Regulators and government agencies
  - Other “stakeholders,” who may be impacted by the data
o Internal customers, including your manager (next slide).
Internal customers
Most people and their work don’t impact external customers directly. But everyone:
o Impacts internal customers directly.
  - It is critical that everyone understand who their customers are and what they need.
o Impacts external customers indirectly, through a “chain of customers.”
  - It is also important that everyone understand this chain.
A chain of customers
[Figure: the links in the chain.]
o Set up a master supplier record
o AP: Authorize and make payment
o Supplier: Receive payment
o Factory: Acquire supplier raw materials
o End Customer: Receive finished product
Let’s dig into the data layer, starting with a data item
An item of data consists of three elements (entity, attribute, value):
o Entity: The thing of interest in the real world
o Attribute: The particular of interest
o Value: The value assigned to the attribute for the entity
(entity, attribute) = data model
Example: (Jane Doe, Service Record Date = July 1, 1996)
Note that, as defined, data are abstract. “Customers” see them as they are presented in tables, databases, graphs, etc., via applications.
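The three-element definition can be sketched in a few lines of Python. This is only an illustration: the class name `DataItem` and the helper `describe` are hypothetical, and the single example is the one given above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class DataItem:
    """One item of data: (entity, attribute, value). Hypothetical name."""
    entity: str      # the thing of interest in the real world
    attribute: str   # the particular of interest
    value: Any       # the value assigned to the attribute for the entity


def describe(item: DataItem) -> str:
    """Render the abstract item in the (entity, attribute = value) notation."""
    return f"({item.entity}, {item.attribute} = {item.value})"


# The example from the text; the date prints in ISO form.
item = DataItem("Jane Doe", "Service Record Date", date(1996, 7, 1))
print(describe(item))
```

Note that `describe` is already a "presentation" of the item: the item itself stays abstract, and what a customer sees depends on how it is rendered.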
Note that “you” are an “entity”
o Your employer is interested in you as an “Employee.” Attributes include:
  - Date of Birth
  - Service Record Date
  - Department
  - Manager
o Your doctor is interested in you as a “Patient.” Attributes include:
  - Gender
  - Height
  - Weight
  - Blood Pressure
o The taxing authority is interested in you as a Taxpayer.
o This list can go on and on.
The way an organization chooses to “model” the real world is critically important
o An organization must decide:
  - What entities (and entity classes) are most important.
  - What attributes it needs about these.
o (After Twain): “The difference between the ‘right data’ and ‘almost right data’ is like the difference between lightning and a lightning bug.”
o From a quality perspective, the data must be:
  - Relevant to the task at hand
  - At the proper level of detail
  - Clearly defined
  - Etc.
o All of us must make sure we understand exactly what the data we’re using mean!
Choosing the right data can be difficult
Football:
And consequential! Data are subtle and nuanced.
1. As noted, people pay more for exactly the right data!
2. Internally they have become our lingua franca.
Data “presentation”
As we’ve defined them, data are “abstract.” To use them, we must view physical representations in tables and graphs, on paper and computer screens, via reports and applications.
There are quality dimensions associated with presentation as well
It is easier to understand this:
[Figure: “Sales in $M US.” Sales, in $M, US (0 to 3.5) by year (1995 to 2005).]
Than this:
Year   Sales, $M, US
1995   2
1996   1.8
1997   2.2
1998   2
1999   2.4
2000   2.2
2001   2.6
2002   2.4
2003   2.8
2004   2.6
2005   3
The most important is “ease of interpretation”
Understanding customer needs is a tall order*
o Sometimes it is not even clear who the customers are.
o Customers don’t know what they want.
o Their needs keep changing.
o There are many customers and their needs conflict.
o They don’t have time to spend with you.
o You don’t have time to spend with them.
o Etc.
Of course, as we’ve seen, there is really no choice.
*The Customer Needs Cookbook provides a way of doing so.
1. Listen to people. Hear the broader message
Companies use a variety of means to understand customer satisfaction:
o Surveys
o Interviews
o Customer complaints
o Focus groups
Available to you:
o Your line manager
o Co-workers
o Logged errors
o People who ask something of you
o Complaints
o “Thank yous”
Make understanding customers and their needs part of your daily work.
2. Hear the “Voice of the Customer”
Those who drive family cars don’t say: “We need the glass to be polarized, .17” thick, tempered for 40 hours in an 800 degree Thermaflex kiln, with a 26% blue-green tint, cut within 14 mils of specification, and with the edges sanded with 500 grit paper.”
They say: “We need to be able to see out in all kinds of weather. We want to be kept safe. We don’t want to be blinded by the sun. We need the door to sound solid when it closes.”
3. Use the Customer Needs Analysis Cookbook/Communicate the VOC far and wide
1. Name most important customers
2. Learn how they use the data
3. Determine required features and quality levels
4. Document and communicate “Customer Requirements”
5. Identify and prioritize “gaps”
6. Communicate the “Voice of the Customer”
(One step is marked “Optional” in the figure.)
You Can’t Manage What You Don’t Measure
o Why measuring accuracy is so difficult.
o General framework for measuring DQ.
o A framework for measuring data accuracy.
o An example using business rules.
o Time series and Pareto plots are the workhorses.
o A word of caution!
Five metrics come up frequently
1. Accuracy of data as created by the process
2. Accuracy of embedded data
3. Timeliness of delivery (to customers)/Process cycle time
4. Customer satisfaction (especially for meta-data)
5. Cost-of-poor quality $$
[Figure labels: Process that creates data; Database.]
To be clear, other metrics (completion of improvement projects, incorporation of new features) are sometimes important.
A deeper look at “accuracy”
o All customers want their data to be “accurate enough.”
o “Accuracy is a measure of the degree of agreement between a data value or collection of data values and a standard, taken as correct (Field Guide, p 221).”
o Most investors are satisfied if: Mega-company’s annual revenue is stated within one million dollars.
A deeper look at “accuracy”, contd.
Consider a piece of direct mail sent to:
Jane Doe
10 Main Street
instead of:
Jane Doe
12 Main Street
Accurate enough?
o YES: the mail will very likely be delivered safe and sound.
o NO: Who would trust a company that sends mail to the wrong address?
Those responsible for data quality measurement must work through such issues!
A deeper look at “accuracy”, contd.
o Note that a data value is “abstract.” It has no physical properties, such as height, acidity, or voltage.
o So there is no “accurometer” in the same sense that there is a ruler, pH meter, or voltmeter.
o Thus all accuracy measurements have limitations.
o Those responsible for data quality measurement must understand these limitations!
Developing and using measurements
Data Quality Measurement Process
1. Understand customer/customer needs of measurement.
2. Answer basic questions. For accuracy, select from Accuracy Framework.
3. Document “protocol” for making the measurement.
4. Take measurement, according to protocol.
5. Summarize results (on time-series and Pareto plots).
6. Review results/Take appropriate actions.
Data Accuracy Measurement Framework
[Figure labels: Suppliers; Bus proc; Customers; DB; inputs; outputs; feedback; rqmts.]
There are dozens of candidate accuracy measures based on four factors (e.g., answers to the questions of step 2):
o What data to include: Key attributes / All attributes
o Where: As received / At DB entry / On DB “exit” / As delivered / In customer eyes / Across process
o Measurement devices: Expert Opinion / Real-world / Allowed Domains / Complaints / Surveys / Tracking
o Reporting scales: Attribute-level / Record level / Failed Business Rules / Customer-Sat / Six-sigma scale / COPDQ
Candidate Accuracy Measurement Devices
o Comparison to business rules: (As previously noted), data values that fail business rules cannot be correct.
o Expert opinion: Those with deep knowledge of the data can spot (many) errors easily.
o Comparison to real-world: Check the value(s) against their real-world counterpart(s).
o Customer complaints: People (often) complain when the data are wrong.
o Surveys: Ask customers.
o Data tracking: “Track” a data value as it moves across the process and support systems.
All have strengths and weaknesses, which we will take up later on.
First-time, On-time measurement
Who’s Leading:
o Owner of a data warehouse who is implementing supplier management for the first time.
Customers/Needs:
Leader:
o Is a selected supplier any good?
o Are the data getting better/worse?
o What are the key areas of improvement?
Supplier:
o Are the data I’m providing any good?
First-time, on-time: Key Ideas
o Make measurements that will help drive accountability back to the supplier.
o Frequent measurement (to help get a feel for variation).
o Repeatable, low-cost, comprehensive measurement.
o Measure both accuracy and on-time delivery.
“First-time, on-time” measurement
[Figure labels: Supplier; Business Process; Customers; DB; inputs; outputs; feedback; rqmts.]
o Where: As data are created/delivered to DB.
o What data to include: Key attributes (fields) only.
o Measurement: “Count” the records that fail business rules or aren’t delivered on schedule.
o Reporting: Summarize at the record level. Identify opportunities at attribute level.
A data value that fails a business rule cannot be correct
A “business rule” is a rule that constrains one or more data values.
EXAMPLES:
o “SUPPLIER NAME = Null” (rule: required attribute)
o “SEX = X” (rule: Sex = M, F, or NA only)
o REVENUE = 10,000€, EXPENSE = 8,000€, PROFIT = 4,000€ (rule: Profit = Revenue - Expense)
o HEIGHT = 2.6 metres (possibly correct, but can’t tell for sure)
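The first three example rules can be sketched as simple checks in Python. This is a minimal illustration, not a production rule engine: the field names and the function `failed_business_rules` are hypothetical. The height example is deliberately left out, because a rule cannot settle whether 2.6 metres is correct.

```python
ALLOWED_SEX = {"M", "F", "NA"}


def failed_business_rules(record):
    """Return the names of the business rules that a record fails.

    Assumes a dict with hypothetical keys: supplier_name, sex,
    revenue, expense, profit.
    """
    failures = []
    # Required attribute: supplier name must be present and non-empty.
    if not record.get("supplier_name"):
        failures.append("SUPPLIER NAME is required")
    # Allowed domain: Sex = M, F, or NA only.
    if record.get("sex") not in ALLOWED_SEX:
        failures.append("SEX must be M, F, or NA")
    # Cross-field rule: Profit = Revenue - Expense.
    if record["profit"] != record["revenue"] - record["expense"]:
        failures.append("PROFIT must equal REVENUE - EXPENSE")
    return failures


bad = {"supplier_name": None, "sex": "X",
       "revenue": 10_000, "expense": 8_000, "profit": 4_000}
good = {"supplier_name": "Acme", "sex": "F",
        "revenue": 10_000, "expense": 8_000, "profit": 2_000}

print(failed_business_rules(bad))
print(failed_business_rules(good))
```

A record with an empty list of failures is only "valid" in the sense that it passed the rules; whether it is correct is a separate question.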
To be clear
o Customers generally want “accurate data”:
  Passed business rules implies “valid.”
  But “valid” does not imply “correct.”
o Indeed, DQ (validity) ≥ DQ (correct)
o Keep this in mind when interpreting results.
o Final note: I usually find the depth of business rules and usability of measurements based on them to be highly correlated.
Process for making process accuracy measurements using business rules
1. Develop business rules.
2. Apply the business rules to the data delivered (or expected) in the current time period.
3. Identify failed rules/erred data (spreadsheet, next slide).
4. Summarize results.
Data Quality Measurement Spreadsheet
Attributes 1 to 5, checked by Rules i to x. An X marks a failed rule.

Rec#   Failed rules (X)   Attributes passed (of 5)   Record pass
A      1                  4                          n
B      1                  4                          n
C      1                  4                          n
D      1                  4                          n
E      1                  4                          n
F      0                  5                          y
G      1                  4                          n
H      0                  5                          y
I      2                  3                          n
J      1                  4                          n

Rules passed (of 10 attempts), Rule i to Rule x: 9 9 7 10 9 9 10 9 10 9

Accuracy Measure         Calculation                                       Fraction
Valid fields             (4+4+4+4+4+5+4+5+3+4)/50 (next to last column)    0.82
All valid records        2 of 10 (see last column)                         0.2
Passed business rules    (9+9+7+10+9+9+10+9+10+9)/100 (last row)           0.91
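The three accuracy measures on the spreadsheet can be reproduced with a short Python sketch. Two things here are assumptions and not taken from the slide: that each attribute is checked by two consecutive rules (Rules i and ii for Attribute 1, and so on), and the exact placement of the failed rules, which is chosen only so that the row and column totals match those shown.

```python
RULES = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"]

# Assumption: two rules per attribute, in order (i, ii -> attribute 1, ...).
RULE_TO_ATTRIBUTE = {rule: idx // 2 + 1 for idx, rule in enumerate(RULES)}

# Hypothetical placement of the X marks, consistent with the totals above.
FAILED_RULES = {
    "A": {"iii"}, "B": {"iii"}, "C": {"iii"}, "D": {"i"}, "E": {"ii"},
    "F": set(), "G": {"v"}, "H": set(), "I": {"viii", "x"}, "J": {"vi"},
}


def accuracy_measures(failed_rules, rules, rule_to_attribute):
    """Compute the three fractions reported on the measurement spreadsheet."""
    n_records = len(failed_rules)
    n_attributes = len(set(rule_to_attribute.values()))
    valid_fields = valid_records = passed_rules = 0
    for failed in failed_rules.values():
        # An attribute (field) is erred if any of its rules failed.
        erred_attributes = {rule_to_attribute[r] for r in failed}
        valid_fields += n_attributes - len(erred_attributes)
        valid_records += 0 if failed else 1
        passed_rules += len(rules) - len(failed)
    return {
        "valid_fields": valid_fields / (n_records * n_attributes),
        "valid_records": valid_records / n_records,
        "passed_rules": passed_rules / (n_records * len(rules)),
    }


measures = accuracy_measures(FAILED_RULES, RULES, RULE_TO_ATTRIBUTE)
print(measures)
```

The same data give three quite different numbers, so a measurement protocol should state which reporting scale is in use.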
Data Quality Measurement Spreadsheet: “First-time, on-time” measurement
All critical attributes
DQ measurements are made every time period and plotted as a time-series
[Figure: “Fraction valid records, XYZ Supplier, 1Q2011.” Fraction perfect records (0.500 to 0.900) by week ending. Series: fraction perfect, average, requirement.]
Important features of a time series plot
1. Clear title and labels on axes.
2. “This way good” indicator.
3. Average line.
4. Requirements line.
Support with a simple Pareto plot that describes “error”
Categories ordered by frequency.
Pareto chart aims to help identify “opportunities for improvement.”
[Figure: “Pareto chart, weeks 1-25.” Fraction of records (0 to 0.35) by error category, in order: attribute 2, attribute 3, late, attribute 1.]
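The ordering behind a Pareto chart is simple to compute. The sketch below is illustrative only: the function name `pareto` and the error log are hypothetical, and the counts are made up to show the mechanics, not taken from the chart.

```python
from collections import Counter


def pareto(error_categories, n_records):
    """Return (category, fraction of records) pairs, most frequent first."""
    counts = Counter(error_categories)
    return [(category, count / n_records)
            for category, count in counts.most_common()]


# Hypothetical error log: one entry per erred record, naming its error category.
error_log = ["attribute 2"] * 3 + ["attribute 3"] * 2 + ["late"]

for category, fraction in pareto(error_log, n_records=10):
    print(f"{category}: {fraction:.2f}")
```

The categories at the top of the list are the first candidates for improvement projects.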
Now, answer the questions
Leader questions:
o Is a selected supplier any good? Answer: Absolutely not.
o Are the data getting better/worse? Answer: No, there is considerable week-to-week variation, but no evidence of any trend.
o What are the key areas of improvement? Answer: Attributes 2 and 3 and late delivery.
Supplier question:
o Do the data I supply meet the customers’ needs? Answer: No. Average performance is well below requirements.
Business Rules:
Pros:
o Simple, easy-to-use.
o Once set up, can be applied almost for free.
o Many good, commercially-available tools.
o Business rules can also be used for control.
o When the data are poor (as in this case), there is no need for more elaborate measurement.
Cons:
o People may find ways to beat the business rules.
o This method may overestimate true accuracy. That can become an issue when DQ nears 98%.
o CAUTION: Business rules can seduce!
Top-line pros and cons of various data accuracy measurement instruments
Comparison of Accuracy Measurement Devices

Device                   Advantages                     Disadvantages
Data Tracking            most powerful                  expensive
Expert Opinion           quick and easy                 experts aren’t always right
Compare to Real-World    best accurometer               very expensive
Business Rules           can be applied to entire DB    deceptively hard
Complaints               eyes of customer               hard, if not already set up
Surveys                  can yield potent insights      often hard to interpret

Good Controls Mean Predictable Performance
o Control defined
o Why control matters
o Types of data controls
o Simple controls using business rules
o Trusted data
Control - Background
o Formally, control is the managerial activity of comparing actual performance against standards and taking action on the difference (after Dr. Juran).
o We wish to emphasize the managerial aspect of control. “Controls” without clear management accountability do not qualify.
o There are many, many types of controls, from budget controls, to automatic controls, to quality controls.
The generic control process*
1. Set standard (standard/goal).
2. Compare the data/measurement of interest, taken from the underlying process/data, against the standard.
3. Pass? If yes, done. If no, take (corrective) action.
4. Log results in the control log.
*based on the work of Dr. Juran
A thermostat effects control
o Set standard: Set desired temperature (18°C ± 1°C).
o Measurement: Thermometer measures continuously (temperature 18°C).
o Pass? If yes, done = leave furnace or AC alone. If no, turn furnace or AC on or off.
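The generic control process and the thermostat example can be sketched together in Python. This is a minimal illustration under stated assumptions: the class `Control`, its field names, and the action strings are hypothetical, and the thermostat is reduced to a single reading per check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Control:
    """Generic control: set standard, compare, take action, log."""
    meets_standard: Callable[[float], bool]       # steps 1-2: standard and comparison
    corrective_action: Callable[[float], str]     # step 3: what to do on a failure
    log: List[Tuple[float, bool, str]] = field(default_factory=list)  # step 4

    def check(self, measurement: float) -> bool:
        passed = self.meets_standard(measurement)
        action = "none" if passed else self.corrective_action(measurement)
        self.log.append((measurement, passed, action))
        return passed


# Thermostat instance: standard is 18 degrees C plus or minus 1.
TARGET_C, TOLERANCE_C = 18.0, 1.0

thermostat = Control(
    meets_standard=lambda t: abs(t - TARGET_C) <= TOLERANCE_C,
    corrective_action=lambda t: "turn furnace on" if t < TARGET_C else "turn AC on",
)

for reading in (18.0, 15.5, 20.0):
    thermostat.check(reading)
print(thermostat.log)
```

The same `Control` shape fits a data validation: swap the temperature reading for a record and the furnace for "alert the data creator and seek resolution."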
Control in your personal life
Teenager to father: “You’re not the boss of me.”
STEP 1: Consider a situation in your everyday life where you apply a control. Examples: Weight, budget, children’s bedtime.
STEP 2: Describe each element of the control.
STEP 3: Evaluate the control. Does it work? Can you improve it?
Why control matters
As these examples make clear, control is fundamental to all management.
o For data quality, various types of control:
  - Help identify and correct errors
  - Help prevent them
  - Help ensure that the gains of process improvement are sustained.
  - Establish a basis for prediction and so help manage processes
  - Ensure that the data quality program is working as designed
o Net, net, good controls help ensure better quality at lower cost.
Data Quality Controls
o “Validation” controls (also called “edit” controls) are based on business rules.
  - In-process controls detect/correct errors and keep them from being passed further downstream.
  - Clean-up controls correct errors that have already been made.
o Statistical process controls establish a basis for predicting future performance.
o Quality Assurance = Audit controls ensure the quality program is working as designed.
o Customer error correction controls ensure that customer-discovered issues are addressed.
o Calibration controls make sure equipment works correctly.
A simple data validation: Age and years with company
o Standard: Age - Years > 18 (labor laws: can’t start work until 18).
o Data/measurement: Employee data; Age – years with company.
o Pass? If yes, done. If no, alert employee of error and seek resolution.
o Log error and resolution in the control log.
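This validation control can be sketched in Python as follows. The names `validate_employee`, `MIN_START_AGE`, and the record fields are hypothetical. One assumption deserves attention: the slide writes the standard as "Age - Years > 18", while the stated labor rule is "can't start work until 18"; the sketch treats a start age of exactly 18 as passing, which is an interpretation and not something the slide settles.

```python
MIN_START_AGE = 18  # labor-law minimum; assumption: starting at exactly 18 passes


def validate_employee(employee, control_log):
    """Compare (age - years with company) to the standard; log any failure."""
    age_at_start = employee["age"] - employee["years_with_company"]
    passed = age_at_start >= MIN_START_AGE
    if not passed:
        # Corrective action per the slide: alert the employee, seek resolution,
        # and record the error in the control log.
        control_log.append({
            "employee": employee["name"],
            "age_at_start": age_at_start,
            "action": "alert employee of error; seek resolution",
        })
    return passed


control_log = []
ok = validate_employee(
    {"name": "hypothetical-1", "age": 40, "years_with_company": 10}, control_log)
erred = validate_employee(
    {"name": "hypothetical-2", "age": 30, "years_with_company": 15}, control_log)
print(ok, erred, control_log)
```

As with any business rule, a pass means the record is valid, not that the age and service dates are correct.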
Can attest that data can be trusted when the plot looks like this:
o LCL exceeds requirements
o Long period of stability
[Figure: “p-chart, first-time, on-time.” Fraction passing rules and on-time (0.2 to 1) by week (1 to 81).]
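The two conditions can be checked numerically with the standard p-chart limits, the average fraction plus or minus three standard errors. The sketch below is a simplification under stated assumptions: every week has the same number of records, "stability" is reduced to "every weekly point lies within the control limits," and the weekly counts and function names are hypothetical.

```python
import math


def p_chart_limits(passed_counts, n_per_week):
    """Return (average, LCL, UCL) for a p-chart with equal weekly sample sizes."""
    p_bar = sum(passed_counts) / (len(passed_counts) * n_per_week)
    sigma = math.sqrt(p_bar * (1 - p_bar) / n_per_week)
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl


def can_be_trusted(passed_counts, n_per_week, requirement):
    """LCL meets or exceeds the requirement and all points are within limits."""
    _, lcl, ucl = p_chart_limits(passed_counts, n_per_week)
    fractions = [count / n_per_week for count in passed_counts]
    stable = all(lcl <= p <= ucl for p in fractions)
    return stable and lcl >= requirement


# Hypothetical weekly counts of records passing rules and delivered on time.
weekly_passed = [96, 97, 95, 96, 98, 94]
p_bar, lcl, ucl = p_chart_limits(weekly_passed, n_per_week=100)
print(round(p_bar, 3), round(lcl, 3), round(ucl, 3))
print(can_be_trusted(weekly_passed, 100, requirement=0.90))
```

A real assessment would use far more than six weeks and additional run rules for stability; the point here is only how the lower control limit is compared with the requirement.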
Customer Controls
o Set standard: e.g., all customer questions answered correctly within 24 hours.
o Customer issue: “This doesn’t look right.” Call/email “Help Desk.”
o Assign responsibility to address. Research issue.
o Decision: Cust corr? (yes/no). Actions: Correct error/resolve issue; advise customer.
o Log results in the control log. Done.
Customer controls aim to make sure questions from customers are answered in a timely fashion. Customer controls are similar to other controls, except the issue is brought by a customer.
Quality Improvement
o Quality improvement and the scientific method
o The Quality Improvement Cycle
o A (stunningly real) example
o Common opportunities for improvement
What is Quality Improvement?
Quality improvement is a structured, team-based approach for identifying and completing specific “projects” to eliminate or mitigate root causes of error and keep them from coming back!
We’ll use the term “project” to mean an opportunity scheduled for completion (after Juran).
The Scientific Method
There are many, many specific approaches to improvement.
o The six-sigma technique known as DMAIC (Define, Measure, Analyze, Improve, Control) is most popular.
o All good methods of improvement stem from the scientific method.
o You should use the method that is best suited to your organization.
o I teach the Quality Improvement Cycle, because I like its emphasis on properly chartering projects.
Quality Improvement Cycle (QIC)
1. Select project
2. Form and Charter project team
3. Conduct root cause analysis
4. Identify and trial solution
5. Implement solution
Complete project/ Dissolve QIT
6. Hold the gains
A Simple QIC Example
Step 1: Select project
Background: A process management team (PMT) reviews 4Q2012 attribute error rates, obtaining the following:
[Figure: “Percent Erred Attributes, 4Q2012.” Bar chart, 0% to 30%, by attribute: E, C, D, A, all others.]
The team decides to address Attribute E.
Step 2: Form and charter the team
The PMT:
o Forms a project team composed of one person from each major step of the process producing this data.
o Selects a junior manager to lead the team.
The PMT and improvement team agree to the following charter:
o “Reduce the error rate in Attribute E by 50% in three months. Virtually eliminate it in three additional months.”
The improvement team:
o Agrees to provide a monthly status report to the PMT.
o NOTE: This clarity about roles and responsibilities is the reason I teach QIC.
Step 3: Conduct Root Cause Analysis
¨ The improvement team learns that Attribute E is added to the data record by a clerk reporting into Finance.
¨ A member of the improvement team meets with this clerk.
¨ The clerk admits that “he never knew what Attribute E was” (Note: A white-space problem). So he just entered anything that the “system” would allow.
¨ Further investigation reveals that:
§ Finance’s procedures do not specify how data is to be populated.
§ The clerk’s manager assumed that “the clerk would get help if it was needed.”
Several candidate solutions were proposed
Two received serious consideration:
¨ Move responsibility for entering Attribute E to the Field Engineering Group (currently a field engineer creates Attribute E and emails it to Finance). Clarify exactly when and what is expected.
¨ Clarify the clerk’s responsibility. Clarify exactly when and what is expected.
Note: There are pros and cons to both. In this case, the PMT pointed out that proposed solution 2 was simpler.
Step 4: Identify and Trial Solution
¨ The Improvement Team, working with the clerk, drafted a Finance Procedure for Creating and Entering Attribute E.
¨ The clerk agreed to give it a try.
¨ One improvement team member agreed to check in with the clerk, answer questions, etc.
¨ The improvement team conducted a follow-up study near the end of its original three-month deliverable. Results suggested the problem was solved.
[Figure: bar chart, “Attribute E Improvement Project.” Error rate (0 to 0.3), Before vs. After.]
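The follow-up study can be read against the charter ("reduce the error rate in Attribute E by 50% in three months") with a simple relative-reduction check. The sketch below is illustrative: the before and after rates are hypothetical, since the chart's exact values are not stated, and the function names are assumptions rather than anything from the tutorial.

```python
def relative_reduction(before, after):
    """Fraction by which the error rate fell, relative to the starting rate."""
    if before <= 0:
        raise ValueError("before must be a positive error rate")
    return (before - after) / before


def meets_charter(before, after, target=0.50):
    """True if the relative reduction meets or exceeds the charter target."""
    return relative_reduction(before, after) >= target


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    before_rate, after_rate = 0.27, 0.02
    print(f"Reduction: {relative_reduction(before_rate, after_rate):.0%}")
    print("Charter met:", meets_charter(before_rate, after_rate))
```

The charter's second clause ("virtually eliminate") is a judgment call, so it is deliberately left out of the check.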
Step 5: Implement Solution
¨ The Improvement Team meets with the clerk’s manager to review results and suggest that their draft procedure be made standard.
¨ She agrees and also points out that clerks turn over rapidly. So training of new clerks is essential.
¨ The manager maintains a New Clerks’ Introduction Guide. She agrees to include the Attribute E standard in the Guide.
¨ The manager wonders if there are other similar issues that should be standardized.
¨ The Improvement Team agrees that this may be the case, and advises her that she is free to charter an improvement project to investigate.
Step 6: Hold the gains
A new control is defined:
o PMT: Check measurements every month. If there is more than a single erred Attribute E, they will advise the manager and clerk directly.
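The control in Step 6 is a simple threshold rule applied to each month's measurements. A minimal sketch, assuming the PMT has monthly counts of erred Attribute E values, is shown below. The month labels, the counts, and the function name are all hypothetical.

```python
def months_needing_action(monthly_erred_counts, threshold=1):
    """Return the months in which erred Attribute E values exceed the threshold.

    With the default threshold of 1, a month is flagged only when there is
    more than a single erred value, matching the control described above.
    """
    return [
        month
        for month, count in monthly_erred_counts.items()
        if count > threshold
    ]


if __name__ == "__main__":
    # Hypothetical monthly counts, for illustration only.
    counts = {"2013-07": 0, "2013-08": 1, "2013-09": 3}
    for month in months_needing_action(counts):
        print(f"{month}: advise the manager and clerk")
```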
To Complete the Project
¨ The improvement team makes its final report in the form of a storyboard and delivers it to the PMT.
¨ The PMT accepts the report and closes out the project.
¨ The improvement team is disbanded, with thanks.
¨ While the QIC does not require it, a senior leader should recognize the efforts of the improvement team.
Attribute E Improvement Project: Story Board
Attribute E Improvement Project, April – June, 2013
Step 1: Project selection
Motivation: Eliminate the highest error rate.
[Figure: bar chart, “Percent Erred Attributes, 4Q2012.” Attributes: E, C, D, A, all others. Vertical axis: 0% to 30%.]
Step 2: Charter Project
Team Members: Jim Henson (Finance), Bev Miller (leader), Paul Thorgood (Engineering)
Charter: Reduce Attribute E errors by 50% in three months and virtually eliminate them three months later.
Step 3: Root Cause Analysis
Suspected Cause: Clerk doesn’t understand Attribute E.
Deeper Cause: High turnover and clerks poorly trained.
Step 4: Identify/Trial solutions
Solution, part A: Attribute E Data Creation and Entry Standard
[Figure: bar chart, “Attribute E Improvement Project.” Error rate (0 to 0.3), Before vs. After.]
Step 5: Implement Solution
Solution, part B: Include standard in “New Clerk Training Manual”
Step 6: Control
PMT to advise manager and clerk if there is ever more than one Attribute E error in a month.
Experience confirms that many improvement projects are quite simple
Look for:
¨ Lack of communication between customers and creators
¨ People not understanding what is expected
¨ Broken interfaces between steps
¨ Overly complex steps
¨ Uncalibrated measurement devices
All Work is Part of a Process
¨ The Customer-Supplier Model helps build communications channels
¨ The Process Management Cycle wraps the workhorses (customer needs, measurement, control, improvement)
¨ Smart process owners focus on communications and interfaces between steps.
¨ The Supplier Management Cycle extends the thinking to external suppliers
¨ Why process and supplier management “work.”
Processes and their importance
A process is a set of inter-related work activities, usually characterized by specific inputs and repeated value-added steps, which produce a specific set of outputs.*
Processes are the means by which businesses deliver added value to their customers, whether the added value is data, a service or a tangible product.
The basic idea is to define, implement, operate, and improve processes that meet customer needs:
n Consistently
n In a cost-effective manner
n In ways that evolve as customer needs evolve.
*Redman, Data Quality: The Field Guide, 2001.
The Customer-Supplier Model
[Diagram: Suppliers, Your Process, and Customers, all within a larger process. Inputs and outputs link them, with requirements and feedback channels on both the supplier side and the customer side.]
Data suppliers can be inside or outside the business.
Failure Of Communications Channels
[Diagram: the Customer-Supplier Model again (Suppliers, Your Process, Customers, with inputs, outputs, requirements, and feedback).]
Failure to understand customer requirements has contributed to every quality issue I’ve ever worked on.
More generally and fundamentally, the root cause is failure to build all needed communications channels.
Process Management Cycle
1. Establish management responsibilities
2. Understand customer needs
3. Describe process
4. Establish measurement system
5. Establish control & check conformance to requirements
6. Identify & select improvement opportunities
7. Make improvements & sustain gains
The process management cycle provides a powerful, repeatable means to bring the tasks (or habits) of data quality management to bear.
1. Establish management responsibilities
Generally, the “process team” (executive, owner, manager, support personnel) must:
o Be accountable for overall performance (meeting data customer requirements at reasonable cost) of the process. An excellent starting point is: “Halve the overall error rate every year.”
o Have the authority to effect changes (e.g., budget, staffing, process design).
o Have the skills to do the work called for throughout the process management cycle.
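The “halve the overall error rate every year” target compounds quickly, which is easy to see with a small projection. The sketch below assumes a hypothetical starting error rate. Only the halving factor comes from the slide.

```python
def projected_error_rate(initial_rate, years, factor=0.5):
    """Error rate after a number of years if it is multiplied by `factor` each year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_rate * factor ** years


if __name__ == "__main__":
    start = 0.20  # hypothetical starting error rate, for illustration only
    for year in range(5):
        print(f"Year {year}: {projected_error_rate(start, year):.2%}")
```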
Composition of a Process Management Team
1. Representatives of departments that contribute to the process.
n Detailed knowledge (or the ability to get it) of the data their department uses and creates.
n Take an end-to-end view.
n Volunteers provided the time to effectively contribute.
n Empowered representatives of their departments, qualified to evaluate and implement any necessary changes.
2. Staff with expertise in data quality
3. Formed into sub-teams charged with:
n Understanding customers and their needs
n Making and interpreting measurements
n Starting and completing improvement projects (e.g., lean, DMAIC)
n Defining and implementing controls
4. Processes that depend on outside data should name a “data supplier manager.” For data created/sourced outside, this should be a formal position.
5. NOTE: Highly “data dependent” processes, “large” processes, and those which “share” data almost certainly require a dedicated, full-time data quality professional.
Talents of the Process Owner
o For data, many liken process management to “herding cats.”
o So the ideal process manager possesses the following skills/traits:
o The ability to lead through influence, rather than control.
o Nerves of steel and steely optimism
o Communications – written and verbal
o Analysis and synthesis
o Decision-making
o Ability to negotiate across organizational boundaries
o The ability to build and coordinate a diverse team
o Perseverance
Successful process owners…
o Bring data front and center.
o Emphasize communications channels.
o Align the work in the direction of the customer. Importantly, they recognize that people’s bosses are customers.
o Focus less on the details of how departments and groups do their work and more on the interfaces between them.
o Gain needed authority, support, and resources.
Common situation
o It is natural to expect conflict between “line” and “process” management.
o Good process managers avoid most of it by focusing where they can do the most good, usually the interfaces between steps.
o Said differently, good process owners focus less on the work and more on how the work fits together.
o They also focus less on their “span of control” and more on their “span of influence.”
Example: To shorten cycle time, process owners focus first on “queue time” between steps
[Diagram: four value-added steps (1 to 4) separated by queue (wait) time on a timeline from T=0 to T=7 days, shown before and after cutting wait time.]
BEFORE: “Value-added time:” 1 day. “Queue time:” 6 days. Total time: 7 days.
AFTER (cutting wait time): “Value-added time:” 1 day. “Queue time:” 3 days. Total time: 4 days.
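The arithmetic behind this example is just total time = value-added time + queue time, which is why attacking queue time pays off first. A minimal sketch using the slide's own figures follows. The function names are illustrative.

```python
def cycle_time(value_added_days, queue_days):
    """Total cycle time is value-added time plus queue (wait) time."""
    return value_added_days + queue_days


def queue_share(value_added_days, queue_days):
    """Fraction of the total cycle time spent waiting between steps."""
    return queue_days / cycle_time(value_added_days, queue_days)


if __name__ == "__main__":
    before = cycle_time(1, 6)  # figures from the BEFORE case
    after = cycle_time(1, 3)   # figures from the AFTER case
    print(f"Before: {before} days, {queue_share(1, 6):.0%} of it waiting")
    print(f"After:  {after} days, {queue_share(1, 3):.0%} of it waiting")
```

In both cases the value-added work is unchanged at one day. All of the improvement comes from the interfaces between steps.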
The Supplier Management Cycle is a tuned-up version for external suppliers
SUPPLIER MANAGEMENT CYCLE
1. Assign Supplier management responsibilities and engage selected Supplier
2. Develop Customer requirements
3. Baseline Supplier performance
4. Define Planning (new feature), Control, and Improvement projects
5. Complete projects
6. Track performance
You can’t expect your suppliers to understand your needs and requirements unless you’ve explained them
[Cartoon dialogue:]
“Be honest with me. Have we really done a good job explaining our business and making our needs clear?”
“I think you have. But explain them one more time so I can be sure!”
Why Process and Supplier Management “Work”
o Synthesize four basic tasks (customers, measurement, control, improvement) into a powerful whole.
o Directly address political, organizational, and cultural issues:
n White space
n Communication
n Interfaces
n “Right things” get measured and improved
n Best hope (so far) for working across silos
Actively Manage Change
¨ Everything about data quality management is political!
¨ My five-step method for addressing social issues.
The Cold Brutal Reality
No data issue is so trivial that it doesn’t generate enormous political heat!
Advancing a culture that values data and data quality
1. Understand the political realities
2. Understand the realities surrounding change (and understand that you MUST be an agent of change)
3. Understand both positives and negatives (you are more likely to succeed building on strengths)
4. “Bring lawyers, guns, and money…”
5. All change is top-down
A-1. Power/data sharing/ownership: In the Information Age, Possession of Data Conveys Power!
[Cartoon caption:] “Sweeney’s Database has two terabytes and ours only has one! Get me two more teras!”
A-2. Though it is Universally Praised, Data Sharing is the Exception!
NOTE: Many of The 48 Laws of Power (Greene and Elffers, Viking, 1998) seem to argue against sharing data.
[Cartoon caption:] “Of course you can have our data. Just get your 30-11 form signed by the Head of Legal, the Head of Accounting, and the Head of HR! Then we’ll run it up the line here!!”
G-4. Since the Data are “In The Warehouse,” the CIO must be responsible!
[Cartoon caption:] “I’ve told that #*%! CIO about these data problems a million times! Why can’t they get them right?”
A Model for Managing Change
Four components for successful change:
Sense of urgency + Clear, shared vision + Capacity for change + Actionable first steps = Successful change
When a component is missing, the result is one of:
o Low-priority, no action
o Fast start that fizzles, directionless
o Anxiety, frustration
o Haphazard efforts, false starts
Rate each component on the “RAG” Scale
Sense of urgency + Clear, shared vision + Capacity for change + Actionable first steps = Successful change
People are motivated by fear, fame, fun, and fortune. Bone-numbing fear seems to work best for organizations.
HINT: Attach the data quality program to the organization’s top priorities.
The “shared” portion of this component is the most demanding.
HINT: Engage as many people as possible, as early as possible.
Evaluate “intellectual,” “financial,” and “emotional” capabilities separately.
HINT: Educate, educate, educate.
HINT: After an initial investment, set the DQ effort up to be “self-funding.”
HINT: “Carry the wounded. But shoot the laggards.”
HINT: Think globally, plan regionally, act locally.
HINT: Pilot studies are essential, but no panacea.
HINT: No one ever comes to work saying “today, I’m going to do something to foul up a customer.” People want to do a good job. It’s your job to align them to the effort.
A Force-Field Analysis can help summarize current state
[Force-field diagram centered on “Quality of critical data,” with these forces:]
o Complacency with current results and position
o Entrenched QA group
o Senior management distracted by acquisitions
o Solid record of innovation
o Process initiative gaining traction
o Talented staff. People seem to care about consumers and quality
o Lack of connection between inward- and outward-facing groups
o Lack quantitative thinking
o Many groups have close customer connections
Step 4: Bring Everything You Can to Bear
¨ Lead the quality initiative from the business side (Landauer)
o Build organizational capabilities
o Educate, educate, educate
o Build political capital
o Avoid “insurmountable opportunities” barriers
o Persist
Step 5: “Eventually, all change is top-down”
o Probably not literally true, but the advice is solid: Over time, engage increasingly senior leadership.
o Almost all are very smart: “If they don’t get the joke, you’re not telling it right.”
o I’ve not found it to be the case that: “A big success in Department A leads Department B to pick DQ up.”
o On the other hand, a big success in Department A is a prerequisite.
Organizing for Data Quality
¨ How many people?
¨ Take responsibility for DQ out of IT
¨ Fundamental organizational unit for DQ
¨ Federated structure
The Five Most Common Things I Hear
o “We’re data rich and information poor.”
o “I’ve been in this industry twenty-five years. Trust me. These data are as good as they can possibly be.”
o “Tom, you’ve got to keep in mind that we are much more siloed than the other companies (industries, etc.) you work with.”
o “Of course my customers like what I give them. I’ve still got a job, don’t I?”
o “If it’s in the computer, it must be IT’s responsibility.”
Current State: Today’s organizations are unfit for data
o Lack talent, up and down the organization chart.
o Quality is essentially unmanaged.
o Responsibility for data buried in the bowels of IT. Step one: Move it out!
o Silos impede data sharing. More generally, the politics is brutal.
o Organizations have not thought through how to compete with data, nor gained enough experience to do so in a sensible fashion.
How Many People Will it Take?
Base organization only:
o Informal study of the effort devoted to managing other assets (e.g., people and capital):
n 1-2% with most concentrated in the upper ranges
o Informal study of effort applied to quality:
n 2% overall (data plus)
n Up to 4% in a “surge”
o Far more for “data providers.”
Who is responsible for data quality?
Since the data is “in the warehouse,” it must be the CIO!
[Cartoon caption:] “I’ve told that #*%! CIO about these data problems a million times! Why can’t she get it right?”
Several lines of reasoning and the data confirm: STEP ONE: GET LEAD RESPONSIBILITY FOR DATA OUT OF IT!
A Federated Organization Model for Data
Day-in, day-out:
People Management: “Regular” people and managers. HR role: Policy setting and admin.
Data Assets: High-quality data creation and novel use of data is the responsibility of people, processes and departments. DG role: Policy setting and admin.
Departmental:
People Management (Departmental HR): Help their units find and advance the talent they need.
Data Assets (Departmental DG): Help their units find and/or create the high-quality data they need. Home for quality facilitators, analysts.
Corporate:
People Management (Corporate HR): Succession planning, pay scales, etc.
Data Assets (Corporate DG): “Metadata” processes, special provision for unique data, etc.
Fundamental Organizational Unit for Data Quality
[Organization chart with boxes: Leadership; Tech Management; Supplier Management; Process Management; Requirements Team; Customer Team; Measurement Team (two); Control Team (two); Improvement Teams.]
Data creators and customers work within processes!
Base Capability: Departmental Data Group
[Organization chart with boxes: Departmental Data Officer; Training Team; Metrics Team; Business Case Team; Strategy Team; Quality Team; Ext Supplier Team; Change Mgmt/Comms Team; Improvement Facilitators; Metadata Processes.]
Federated Structure for Data Quality
[Organization chart with boxes: Top Data Job; Data Council; Unique Data Team; Change Mgmt/Comms; Strategy Team; Fundamental Orgs for DQ; Quality Team; Policy Team; Corporate Metadata; Departmental DGs.]
Data Quality and Strategy
¨ Sooner or later, you’re going to need a data strategy.
¨ Four basic strategies
¨ Data may be your ultimate proprietary asset
So far, I’ve identified eighteen distinct ways to “put data to work”
Provide (Sell) Content
o New Content
o Re-package
o Informationalization
o Unbundling
o Exploiting Asymmetries
o Closing Asymmetries
Facilitators
o Own the Identifiers
o Infomediation
o Big Data/Advanced Analytics
o Privacy and security
o Training
o New Marketplaces
o Infrastructure technologies
o Information appliances
o Tools
Internally
o Improve operational efficiency
o 360°-view
o Data-Driven Culture
Working out “what’s right for us” is the key challenge for senior leadership!
Four basic strategies (with dozens of variants)
o Innovation (Big Data/Advanced Analytics): Find hidden nuggets in the data and,…
o Content: Provide or exploit content that others don’t have.
n Informationalization
n Infomediation (e.g., Google)
n Asymmetry (e.g., Hedge fund)
o Build a Data-Driven Culture: Make better decisions, bottom-to-top and across the company.
o Be the low-cost provider: Superior data quality keeps costs down!
Data Quality and Data Strategy
Strategy:
Innovation/Big Data: Bad data is highly leveraged in unknown ways.
Content: Markets demand high-quality data (e.g., Apple Maps).
Data-Driven Culture: People discount data they don’t trust in favor of their intuitions (rightly so).
Low-cost Provider: Eliminating the cost of finding and fixing errors provides enormous cost savings.
Collateralized Debt Obligations
o The promise in Finance: “Slice and dice” mortgages to create products with pre-defined risk-reward profiles.
o Billions made in the early 2000s.
o Other benefits: More people could purchase homes, because the mortgage originator didn’t have to hold it.
o Everyone knows what happened.
Content Providers Beware: External customers are less tolerant of poor quality data than internal ones
¨ Drivers sent the wrong way blame the whole car
©Jimromensko.com
Data Doc’s Rule of Ten
[Flowchart: Input data, then the check “Data ok?” If yes, complete our value-added work. If no, fix errors first.]
“It costs ten times as much to complete a unit of work when the data are flawed in any way as it does when they are perfect!”
Said differently, if the “straight-through path” costs a dollar, then the “fix errors path” costs ten.
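The Rule of Ten implies that the average cost per unit of work rises steeply with the share of flawed data. One way to sketch this is as a weighted average of the two paths. The formula below is an illustration built on the rule, not something the slide states, and the flawed fractions shown are hypothetical.

```python
def expected_unit_cost(flawed_fraction, clean_cost=1.0, multiplier=10.0):
    """Average cost per unit of work when some units hit the 'fix errors' path.

    clean_cost is the cost of the straight-through path; flawed units cost
    clean_cost * multiplier (ten times as much under the Rule of Ten).
    """
    if not 0.0 <= flawed_fraction <= 1.0:
        raise ValueError("flawed_fraction must be between 0 and 1")
    flawed_cost = clean_cost * multiplier
    return (1.0 - flawed_fraction) * clean_cost + flawed_fraction * flawed_cost


if __name__ == "__main__":
    # Hypothetical flawed fractions, for illustration only.
    for fraction in (0.0, 0.05, 0.10, 0.25):
        cost = expected_unit_cost(fraction)
        print(f"{fraction:.0%} flawed: average cost {cost:.2f} per unit")
```

Under these assumptions, a 10% flawed rate nearly doubles the average cost of a unit of work.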
“IT Doesn’t Matter” (Carr, HBR, 2003)
Proprietary vs. Infrastructure Technologies
PROPRIETARY: Can be “owned” by a single organization. Examples: patented drug, unique process. Protected. Basis for sustained advantage.
INFRASTRUCTURE: (Eventually) part of general business infrastructure. Examples: railroads, electric grid. Become commoditized. Not a basis for sustained advantage.
Advantage Stems from Scarcity…
o Carr argues that basic IT capabilities (storage, processing, and transport technologies) are now readily available to all.
o Carr does not argue that IT isn’t important. Only that it is not strategic.
o While there are many implications, our primary interest is data!
Data May Be Your “Ultimate Proprietary Technology”
Note that:
o Your dollars are exactly the same as your competitors’.
o Your employees walk out the door every night. And your competitors can hire the best ones away.
Not so with regards to your data: Unless you let someone “steal” them, they are yours and yours alone.
Further:
o Data you create are uniquely yours. And you make more each day.
o We have already noted that data are subtle and nuanced (e.g., you can define “football” in your own way).
Of course: Some, maybe most data, become standardized to facilitate communications.
Your Unique Data Merit Special Attention
o Difficult to sustain an advantage from the same stuff everyone else has (e.g., publicly available data)
o Unique data offer opportunity for sustained advantage
o These data merit special attention!
o You must:
n Know which are “special”
n Be careful what you allow to become standardized
n Broaden and deepen the advantage
As a practical matter, the quality program only goes as far as the senior team (perceived to be) leading the effort insists
“Non-delegatable” Roles for the Leaders (at any level)
o Become very intolerant of poor quality in the data they need to manage.
o Become very intolerant of poor quality in the data they need to compete.
o Focus, focus, focus, on the most important data.
o Get management accountabilities right.
o Set really high targets.
o Build the organizational capabilities needed to effect the above.
o Advance a “data culture.”
Why They Are “Non-Delegatable”
“They thought they could make the right speeches, establish broad goals, and leave everything else to subordinates... They didn’t realize that fixing quality meant fixing whole companies, a task that can’t be delegated.”
Dr. Juran, 1993
Experience so far is that “data” is even tougher than the factory floor.
Policy for Management Responsibility for Data
“Model” Data Policy:
Don’t take junk data from the guy upstream. And don’t pass junk data on to the next guy!
Said differently:
Be a demanding data customer and a great data creator!
Quality Planning/Targets for improvement
o On one level, targets for improvement are nothing more than:
“Halve the error rate every year. Forever.”
“Add two significant new features every year.”
o On another, quality planning may include “positioning”:
“We aspire to be, and be perceived as, the highest quality data provider in our industry.”
The Big Ideas
o Data quality, done right, is a huge win/win!
o You have to get in front! This means you have to find and eliminate the root causes of error.
o Quality is in the eyes of the customer.
o You have to do the technical work tolerably well.
o Data quality is done in the line (e.g., business process).
o You have to overcome organizational momentum, so actively manage change.
o Get responsibility for DQ out of Tech. Build human capability.
o Sooner or later, you’re going to need a data strategy.
o The quality program only goes as far as the senior team (perceived to be) leading the effort insists.
Questions?
Thomas C. Redman, Ph.D. “the Data Doc”
+1 732-933-4669
www.navesinkconsultinggroup.com
Thomas C. Redman, “the Data Doc”
o Ph.D., Statistics, Florida State, 1980.
o Conceived and led the Data Quality Lab at AT&T Bell Labs.
o Formed Navesink Consulting Group in 1996.
o Helped dozens of companies think through, define, and advance their data and data quality programs.
o Led development of most of today’s best-practice data quality management methods & techniques.
o Latest and greatest: Data Driven: Profiting from Your Most Important Business Asset, Harvard Business School Press, 2008.
o Known bias: “Data are quite obviously the key asset of the Information Age. Yet today’s organizations are singularly ill-designed for data. This leads me to conclude that organizing for data is THE management challenge of the 21st century.”
o Much current work focuses on “organizing for data.”