Information Quality in Practice: The Good, the Bad and the Ugly
Leon Schwartz, Ph.D., Informed Decisions Group, November 16, 2005
Informed Decisions Group, Copyright 2005


Page 1: Information Quality in Practice: The Good, the Bad and the Ugly

Leon Schwartz, Ph.D.
Informed Decisions Group
November 16, 2005

Page 2: What Me Worry?

Business Champions for TDQM Programs are scarce, because Data Quality is difficult to define & measure, even though Poor Data Quality costs Billions of dollars.

Page 3: Information Quality in Practice

- Prolog: Poor Data Costs $Billions
- The Good: You Can Clean it Up
- The Bad: The Cost of Avoidance
- The Ugly: The Pogo Effect
- Epilogue: What is Data Quality, anyway?

Page 4: Poor Data Quality Costs $Billions

- Data quality problems cost U.S. businesses $611 billion a year.
- 40% of firms have suffered losses.
- 2% of customer records are obsolete in one month.
- Customer duplication rates range from 5% to 20%.
- The Web is increasing data entry errors.

Source: Data Warehouse Institute Study, 2002

Page 5: Effects of Bad Customer Data

- Low credibility among customers & suppliers
- Poor decision making
- Lost customers/clients
- Unnecessary printing & postage
- Poor customer service
- Lost business opportunities
- Inefficient utilization of staff

Page 6: Data Affects Your Success

[Figure: relative influence of Process, Algorithm, DATA, People and Politics on an OR/MS project]

Page 7: Room for Improvement

- Only 11% have implemented a DQ program*
  – 48% have no plan for a program
- 26% purchased a data quality tool*
  – 52% have no plans
- Still very far from 6 Sigma!
- Easy to improve Quality, if…

*Source: Data Warehouse Institute Study, 2002

Page 8: Information must be Useful

…You Can Answer the Following:

- How good is good enough?
- How often is often enough?
- How much is it worth?

Page 9: Information Quality in Practice

- Prolog: Poor Data costs $billions
- The Good: You can Clean it up
- The Bad: The Cost of Avoidance
- The Ugly: The Pogo Effect
- Epilogue: What is Data Quality, anyway?

Page 10: Data Quality Starts with Access

- Data does not exist anywhere
- Exists, but you can’t find it
- You found it, but you can’t get to it
- You can get to it, but you don’t have authority to use it
- You can use it, but it is a total MESS: “I never realized HOW BAD!”
- Data Warehouse NIRVANA! It’s dirty, but useful.

Page 11: Data Quality & the Data Warehouse

Integrating data can improve Quality, if you…

- Quality Control the Match
- Measure & Improve Integrity
- Flag “out of range” Values
- Manually examine BIG “leftovers”
- Audit a random sample of Customers

“I never realized HOW BAD our data is!”

Page 12: Matching Improves Quality

- Name
- Address
- Phone
- Rules
  – Group ID, Account ID, Account ID, Duns
  – Operations: Cleanse, Transform, Consolidate
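The cleanse, transform and consolidate operations named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the matching process used in the talk: the record fields, the function names and the exact-key matching rule are all assumptions made for the example, and real matching engines use fuzzier comparisons.

```python
import re
from collections import defaultdict


def cleanse(value):
    """Cleanse: lowercase, drop punctuation, collapse whitespace."""
    value = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(value.split())


def transform(record):
    """Transform: build a comparable match key from name, address and phone."""
    phone_digits = re.sub(r"\D", "", record["phone"])
    return (cleanse(record["name"]), cleanse(record["address"]), phone_digits)


def consolidate(records):
    """Consolidate: give one group id to all accounts that share a match key."""
    by_key = defaultdict(list)
    for record in records:
        by_key[transform(record)].append(record["account_id"])
    return {f"G{i}": ids for i, ids in enumerate(by_key.values(), start=1)}


# Hypothetical records, invented for this sketch.
records = [
    {"account_id": "A1", "name": "Acme Corp.", "address": "12 Main St", "phone": "(212) 555-0100"},
    {"account_id": "A2", "name": "ACME CORP", "address": "12 Main St.", "phone": "212-555-0100"},
    {"account_id": "A3", "name": "Bolt Ltd", "address": "9 Elm Ave", "phone": "312 555 0199"},
]
groups = consolidate(records)
print(groups)
```

Accounts A1 and A2 differ only in case and punctuation, so they land in the same group, while A3 stays on its own.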

Page 13: Establish Q. A. Procedures

- Use a common sample
- Establish replicable process
- Document carefully
- Realize the subjectivity
- Train the Vendor
- Audit the Vendor

Page 14: Quality Control Your Match

Match Quality Statistics

Data      Stats      By      GROUPS/                                 Sample
matched   published  who     UNIQUES   "Good"   "Marginal"   "Bad"   size
Jun-99    Dec-99     PB      GROUPS    96.8%    1.5%         1.7%    411
                             UNIQUES   96.3%    1.6%         2.1%    816
Mar-00    May-00     Vendor  GROUPS    98.0%    1.0%         1.0%    300
                             UNIQUES   **
Jun-00    Nov-00     Vendor  GROUPS    98.0%    2.0%         0.0%    300
                             UNIQUES   **
Jan-01    Mar-01     Vendor  GROUPS    99.0%    0.5%         0.5%    700
                             UNIQUES   99.5%    0.3%         0.2%    4900
Jan-01    Mar-01     PB      GROUPS    97.3%    0.6%         2.1%    700
                             UNIQUES   99.2%    0.2%         0.6%    482
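Because each row above comes from an audited sample, the "Good" percentages carry sampling uncertainty. The sketch below uses the standard normal-approximation margin of error for a proportion; this calculation is not in the slides, and it assumes the audited records were drawn as a simple random sample.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Jun-99 GROUPS row from the table: 96.8% "Good" in a sample of 411.
moe = margin_of_error(0.968, 411)
print(f"96.8% +/- {100 * moe:.1f} percentage points")
```

For that row the margin is roughly 1.7 percentage points, so differences of a point or so between two audits of this size may be no more than sampling noise.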

Page 15: Document Integrity Rules

Integrity Rules for IMT Database

Version   Changes
V1.2      6b. added
V1.3      2a. updated
V1.4      2a, 4c, 5d updated; 7i, 7j deleted.
V1.5      Descriptive headings added, 5b updated.

1. Each Duns group should have a primary Duns account.
Every distinct duns:groupid has one record for which duns:groupid = duns.accountid.

2.&3. Duns groups and establishments should be consistent.

2a. Each active Duns-linked establishment should have a primary Duns account.
Currently (3/6/96),

            Means:
Groupbu     At Dun & Bradstreet   On our Duns table   We call it
DB          Current               Exists              Duns
DO          No longer exists      Retained            Duns Obsolete
DM          No longer exists      Missing             Duns Missing
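Rule 1 lends itself to an automated check. The sketch below is one possible reading, assuming the Duns table is available as a list of rows with `groupid` and `accountid` fields and that "has one record" means exactly one; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def groups_without_primary(duns_rows):
    """Return group ids that do not have exactly one record with groupid == accountid."""
    primaries = defaultdict(int)
    for row in duns_rows:
        # Count the rows in each group that act as the primary account.
        primaries[row["groupid"]] += int(row["accountid"] == row["groupid"])
    return sorted(group for group, count in primaries.items() if count != 1)


# Hypothetical rows: group 100 has a primary account, group 200 does not.
duns_rows = [
    {"groupid": 100, "accountid": 100},
    {"groupid": 100, "accountid": 101},
    {"groupid": 200, "accountid": 201},
    {"groupid": 200, "accountid": 202},
]
violations = groups_without_primary(duns_rows)
print(violations)
```

The length of the returned list is the kind of per-rule error count that can be tracked from one quarter to the next.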

Page 16: Measure & Reduce Violations

Integrity Rules: Number of Errors

Rule  Shorthand description     1996q2    1996q4    Change     How caused / comment
1     d groupid/accountid       0         0         0
2a.   e:d groupid/accountid     10,628    11,037
      plus DM accounts          551       0
      is total                  11,179    11,037    -142       Result of 1996q2. Duns removed data from their database. Data dropped 96q2.
2b.   e:d groupid/groupid       16,008    16,008    0          Result of 1996q2. Problem in 1996 q2 Duns update.
3     d groupid/accountid       0         0         0
4a.   Definition: starduns      NA        NA        NA
4b.   d starduns groupid        16,008    16,008    0          Result of 1996q2. Problem in 1996 q2 Duns update.
4c.   e starduns data           82        126       44         Duns input changes. Okay. Can naturally change as Duns data changes.
5a.   a:e                       93,604    40,302    -53,302    Process on 1996q2 data. Process recorded meters immediately removed as dups, not estabs.
5b.   p:a                       3,474     4,106     632        Rejected addresses (cum). Bad input data. Address rejected by match vendor.
5c.   natlaccct:a               1,582     0         -1,582     Process. Completely fixed with new update.
5d.   lease:a                   0         4,670     4,670      Rejected addresses (cum). Process did not fully adjust for new Colonial Pacific data.
5e.   mgmtsvs:a                 17        23        6          Rejected addresses (cum).
5f.   contact:a                 33,663    9,112     -24,551    Process. Mal-adjustment of tables.
5g.   custsummary:a             2,563     0         -2,563     Process. Corrected.
6a.   e:a                       0         0         0          Process.
6b.   e:a prime                 147,370   1         -147,369   Process.
7a.   e:a not null              1         4,566     4,565      Process on new data sources. Data was dropped in 1996q2, causing no integrity error then.
7b.   e:a null                  0         0         0          Process.
7c.   CP,FX,ML,PM:a not null    0         15,752    15,752     Process on changed data. Data was dropped in 1996q2, causing no integrity error then.
7d.   CP,FX,ML,PM:a null        UNK       UNK                  Process.
7e.   PC,CL:a not null          UNK       0                    Process on new data source.
7f.   PC,CL:a null              UNK       1,123                Process on new data source. Process did not fully adjust for new Colonial Pacific data.
7g.   MG:a not null             0         0         0          Process on new data source.
7h.   MG:a null                 0         483       483        Process on new data source. Process did not account for new MG updating.

24    Total So Far                                  -203,357

Page 17: Flag “out of range” Values

Leases -- Percent Change

Table      BU     CO     Total    Current  History  Active    Inactive
lease97q1  CL     L      0.0%     0.0%     NA       NA        0.0%
lease97q1  CL     L0     1.8%     0.0%     NA       -3.0%     4.3%
lease97q1  CL     L1     4.8%     3.3%     38.0%    1.8%      60.4%
lease97q1  CL     L2     0.1%     0.1%     NA       0.1%      0.0%
lease97q1  CL     L3     97.3%    97.4%    72.7%    101.5%    25.9%
lease97q1  CL     L4     1.2%     1.0%     80.0%    -0.3%     21.5%
lease97q1  CL     L5     1.3%     1.1%     21.6%    -1.8%     14.8%
lease97q1  CL     L6     2.5%     0.3%     52.9%    -20.0%    32.0%
lease97q1  CL     L7     1.7%     0.6%     60.8%    -7.5%     34.7%
lease97q1  CL     L8     1.8%     0.8%     39.4%    -5.5%     19.9%
lease97q1  CL     L9     2.9%     0.9%     76.3%    -8.4%     84.5%
lease97q1  PC     10     6.8%     6.8%     NA       3.5%      17.8%
lease97q1  PC     15     7.3%     7.3%     NA       3.5%      16.2%
lease97q1  PC     20     2.6%     2.6%     NA       -2.5%     13.9%
lease97q1  PC     30     23.2%    13.5%    NA       -58.8%    106.9%
lease97q1  PC     32     NA       NA       NA       NA        NA
lease97q1  PC     33     NA       NA       NA       NA        NA
lease97q1  PC     34     NA       NA       NA       NA        NA
lease97q1  PC     35     2.6%     2.6%     NA       -96.5%    299.4%
lease97q1  PC     40     0.9%     0.9%     NA       0.3%      1.8%
lease97q1  PC     50     14.7%    0.2%     NA       -72.4%    83.0%
lease97q1  PC     55     0.0%     0.0%     NA       -100.0%   255.6%
lease97q1  PC     60     10.0%    10.0%    NA       1.1%      50.2%
lease97q1  PC     65     5.7%     5.6%     NA       -3.7%     50.6%
lease97q1  PC     70     0.0%     0.0%     NA       -100.0%   1400.0%
lease97q1  PC     72     0.0%     0.0%     NA       -100.0%   NA
Total      CL     Total  6.3%     5.0%     44.5%    1.1%      37.7%
Total      PC     Total  7.0%     6.8%     NA       2.3%      21.3%
Total      Total  Total  6.9%     6.7%     77.9%    2.2%      22.1%

Looking at counts saves the day
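A percent-change report like the one above can be produced from record counts in two successive loads. The following is a minimal sketch under stated assumptions: the 25% threshold, the function names and the sample counts are illustrative and do not come from the slides.

```python
def percent_change(previous, current):
    """Percent change between two counts; None (shown as NA) when there is no baseline."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous


def flag_out_of_range(counts, threshold=25.0):
    """Return (key, change) pairs whose change is undefined or exceeds the threshold."""
    flagged = []
    for key, (previous, current) in counts.items():
        change = percent_change(previous, current)
        if change is None or abs(change) > threshold:
            flagged.append((key, change))
    return flagged


# Hypothetical (previous, current) record counts keyed by (BU, CO).
counts = {
    ("CL", "L3"): (1000, 1973),
    ("PC", "40"): (1000, 1009),
    ("PC", "32"): (0, 0),
}
flagged = flag_out_of_range(counts)
for key, change in flagged:
    print(key, "NA" if change is None else f"{change:.1f}%")
```

Keeping the raw counts next to the percentages matters: a 1400% jump can come from a handful of records, which is one way to read "looking at counts saves the day".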

Page 18: Manually Examine BIG “Leftovers”

Products identified by simple "ACE" as belonging to ABC Investment Corp:

                 ACTIVE PRODUCTS                                        ALL PRODUCTS
                 total        incorrect  %incorrect  corrected total    total        incorrect  %incorrect
Establishments   84           22         26.19%      62                 97           23         23.71%
Accounts         107          24         22.43%      83                 114          24         21.05%
Products         531          67         12.62%      464                792          88         11.11%
Product $        $1,658,729   $65,670    3.96%       $1,593,059         $2,098,254   $95,052    4.53%

Products caught by simple "ACE" as bogus (NID="FDL", wrong Duns Ult):

                 ACTIVE PRODUCTS   ALL PRODUCTS
Establishments   33                37
Accounts         37                38
Products         190               301
Product $        $355,207          $535,612

Products found by simple "ACE" that were missed by National Account ID (NID):

                 ACTIVE PRODUCTS   ALL PRODUCTS
Establishments   11                13
Accounts         16                19
Products         35                54
Product $        $78,373           $118,407

Products found by simple "ACE" that were missed by Duns ultimate and/or match:

                 ACTIVE PRODUCTS   *Kentucky "Mail Factory"   ALL PRODUCTS
Establishments   27                1                          31
Accounts         33                3                          36
Products         281               167                        420
Product $        $1,135,701        $942,829                   $1,347,495

Pareto

Page 19: Ensuring Data Quality

- Focus on the PROCESS (TQM)
- Define Quality Metrics (KPIs)
- Use Data Cleansing Tools
  – NCOA
  – Type “data cleansing” in Google for list
- Document everything
- Audit regularly
- Test, test, test
- Who is using? How?

Begins and ends with the CUSTOMER

Page 20: Information Quality in Practice

- Prolog: Poor Data costs $billions
- The Good: You can Clean it up
- The Bad: The Cost of Avoidance
- The Ugly: The Pogo Effect
- Epilogue: What is Data Quality, anyway?

Page 21: Who’s Cleaning Up?

- Data Quality Software Vendors
  – IBM (acquired Ascential, which acquired Vality)
  – SAS (acquired DataFlux)
  – Harte-Hanks (acquired Trillium)
  – Firstlogic, Unitech, Innovative Systems
  – Similarity Systems (acquired Evoke SW)
- Address Matching & Cleansing Vendors
  – Pitney Bowes acquired Group 1 (4/05) and Firstlogic (???)
  – Plus 100s of service bureaus
- Specialty houses
  – E.g., Comanage for telecom companies

…and the data is still dirty.

Page 22: Information Requirements are Relative

- Strategic objectives or goals
- Who are the clients (THEY)
- What THEY need
- When they need it
- Where they need it
- How they need it

Page 23: Data Quality Programs are Rare

- Scope the Effort
  – Information Inventory
  – “As-is” processes
  – Information Priorities
- Data Discovery
  – Data Description
  – Simple Data Checks
  – Data Mining
- Categorize Data Defects, Develop DQ rules
  – Integrity, retention, refresh, reliability
  – Classify defects & causes
- Define DQ Program
  – Metrics, KPIs
- Launch & Track

Page 24: Dealing with DENIAL is Daunting

- Expose shoddy business processes
- Change business practices
- Agree on common definitions, rules, roles
- Train employees
- Tackle political/cultural issues

Page 25: Information Quality in Practice

- Prolog: Poor Data costs $billions
- The Good: You can Clean it up
- The Bad: The Cost of Avoidance
- The Ugly: The Pogo Effect
- Epilogue: What is Data Quality, anyway?

Page 26: Sources of Errors

- Technical
  – Careless calculations
  – Poor programming
- Process
  – Human error
  – Negligence
  – Intent (policy)
- Political

Page 27: Fix the Basics: Customer Master

KPI
- Cleanse 6.9 million root records
- Eliminate duplicate customer records (est x %)
- Eliminate inactive customer records (est x%)
- Reduce business processes creating incorrect Customer Information
- Populate and interface SAP Customer Master

Target – level 1
- Customer Master live by Dec. 31, 2002

Target – level 3
- Customer Master live by 1Q 03

Actual / Forecast
- To-Be Business processes complete
- Data quality activities
  – 1.6 MM obsolete identified and purged
  – 2.3 MM duplicates identified, 325K identified for elimination (Customers confirmed need for 1.95 MM duplicates, based upon current capabilities)
  – Cleansed 3.6 M U.S. records (via Finalist, Customer Contact)
  – D&B DUNS linkage in process. Identified 577 K duplicates, 2.9 M unique DUNS customers
- Analyzed and Improving processes which create bad data
  – Identified and documented sources of create / update / delete to legacy customer records.
  – Removed change authorization from 2,940 employees, primarily Sales, Service, Product Supply, and PBCC New Business Operations
  – Identified and corrected 4 significant (and numerous minor) legacy systems problems creating incorrect and/or duplicate customer information
- Conversion to SAP environment
  – Production environment complete
  – 34 interface and conversion development activities
  – Customer Master Live (Converted from IMS to SAP) on track for December 6
- User Training
  – User and Power user training developed
  – Power User Boot Camp training completed November 22
  – End user training (1,300 users) scheduled for January

Page 28: Avoiding Errors

- Technical
  – Error Trapping
  – CMM program
- Process
  – Edit checks
  – Training
  – Streamlining
- Political
  – Culture change

“This customer already is in our database.”

Page 29: Unreliable Cancellation Data Creates a “Lose-Lose-Lose”

I. Suspect cancellations identified
  • Audit reports sent to field
  • VP, Sales fired
II. Customer Retention
  • Executive focus
  • The Pogo Effect
III. Fix the Basics
  • “Software enhancement”
IV. Order to Cash
  • “All fixed for 2005”

Page 30: Taking I.Q. to the Next Level

- Merge/Purge/Address Hygiene no longer good enough
- Move from Repair to Correct to Prevent
- Organizational Change, Compromise and Accountability impact program budget
- How to JUSTIFY $$ when I.Q. is so fuzzy??

Page 31: Information Quality in Practice

- Prolog: Poor Data costs $billions
- The Good: You can Clean it up
- The Bad: The Cost of Avoidance
- The Ugly: The Pogo Effect
- Epilogue: What is Data Quality, anyway?

Page 32: It’s All About Perception

- We’ve had this problem for 20 years.
- We know we had this problem for 10 years
- Every organization has the problem
- We know it will cost to improve it
- How much of an improvement can I buy?
- What is the ROI?
- Can I believe what you tell me?

Page 33: Wang & Strong ID 179 I.Q. Attributes

- Ponniah defines 17
- Redman defines 27
- Marakas defines 11

Page 34: Where to Start?

- Too many definitions: no clarity
- Need to focus!
- Most include ACCURACY as one dimension
- Even Accuracy is a fuzzy concept
  – What are “errors”?
  – What are “true” values? “false”? “suspect”?
- Can we even measure accuracy “accurately”?

Page 35: Even the Lexicon of Terms is Fuzzy

- Direct observation of “errors”
  – Subjective
  – Unreliable
  – Impractical even with moderate size data sets
  – $ High cost
- Automated error reports
  – Who creates the rules?
  – Needs to be audited
  – Misses subtleties
  – Lower cost

Quality >> Accuracy >> Error-free

A Major Research Challenge

Page 36: Find the Errors

You be the JUDGE

Custname Street City State Zip Phone

Alec Gomez and Sons 1 Hyde Park Village Chicago Ilinois 56750 (312) 299-3111

Bill Able 191 York Ave, Apt. 19K New York New York 10028 (212) 333-6666

Kyle Costner 1993 Michigan Avenue Chicago Illinois 56723 (212) 423-4441

Joe Diehr 110 W.90th Street Chicago Illinois 56750 (312) 299-3333

Sandra Cimino Interiors 99 Sunset Bld Solana Beach California 92119 (710) 000-1212

Randy Shay 00 Bay Shore Drive San Francisco California 92013 (410) 345-7890

Center Street Catering Yonkers New York New York 10123 (914) 449-1919

Gene Mastow and Partners 155 W. 80 Street New York New York 10028 (646) 484-4482

George Jenkins, Inc. 1442 Columbus Avenue Nee York New York 10023 (212) 422-4102

Blaire Wallace 60 Cerntal Avenue Solana Beach California 92119 (710) 000-1414

Ron Johnson Tourss 000 Marine Drive Chicago Illinois 56700 (312) 222-9999

Jane Smith 113 Creative Place San Francisco California 92001 (410) 355-5555

Richard Green, LLP 112 W. 87th Street Chicago Illinois 56750 (312) 111-0000

Cresent Designs 2 Execution Suffering New 7ork 99999 (045) 369-6690
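An automated error report for a sample like this one can be sketched as a small set of field rules. Everything in the block below is an illustrative assumption rather than the study's method: the field boundaries chosen for each record, the short list of valid states (only those appearing in the sample) and the rules themselves. The phone rule relies on the North American convention that area codes and exchange codes do not begin with 0 or 1.

```python
import re

# Only the states that appear in the sample; a real rule set would list all of them.
VALID_STATES = {"Illinois", "New York", "California"}
PHONE_PATTERN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def find_errors(record):
    """Return the names of the fields that break one of the simple rules."""
    errors = []
    if record["state"] not in VALID_STATES:
        errors.append("state")
    if not re.fullmatch(r"\d{5}", record["zip"]):
        errors.append("zip")
    match = PHONE_PATTERN.fullmatch(record["phone"])
    if match is None or match.group(1)[0] in "01" or match.group(2)[0] in "01":
        errors.append("phone")
    if re.match(r"0+\b", record["street"]):  # street number made only of zeros
        errors.append("street")
    return errors


# Three records from the slide, with field boundaries assumed for this sketch.
sample = [
    {"custname": "Alec Gomez and Sons", "street": "1 Hyde Park Village", "city": "Chicago",
     "state": "Ilinois", "zip": "56750", "phone": "(312) 299-3111"},
    {"custname": "Sandra Cimino Interiors", "street": "99 Sunset Bld", "city": "Solana Beach",
     "state": "California", "zip": "92119", "phone": "(710) 000-1212"},
    {"custname": "Randy Shay", "street": "00 Bay Shore Drive", "city": "San Francisco",
     "state": "California", "zip": "92013", "phone": "(410) 345-7890"},
]
report = {record["custname"]: find_errors(record) for record in sample}
print(report)
```

The rules catch the misspelled state, the all-zero exchange and the all-zero street number, but they accept 56750 as a ZIP code for a Chicago address, which illustrates how rule-based reports miss subtleties.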

Page 37: The Impact of Context is Clear

[Figure: Transfer Function - Graduates only. x-axis: error count (0 to 16); y-axis: perceived accuracy (0 to 35). Linear fit: y = -1.0063x + 22.718, R² = 0.2803]

[Figure: Transfer Function - Undergrads. x-axis: error count (0 to 16); y-axis: perceived accuracy (0 to 35). Linear fit: y = -0.2397x + 18.216, R² = 0.03]
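To make the two fitted lines on this slide concrete, the sketch below simply evaluates them at chosen error counts. The coefficients are the ones reported on the slide; the function and dictionary names are hypothetical.

```python
# (slope, intercept) of the linear fits reported on the slide.
FITS = {
    "Graduates only": (-1.0063, 22.718),
    "Undergrads": (-0.2397, 18.216),
}


def perceived_accuracy(group, error_count):
    """Perceived accuracy predicted by the group's fitted line."""
    slope, intercept = FITS[group]
    return slope * error_count + intercept


for group in FITS:
    at_zero = perceived_accuracy(group, 0)
    at_fifteen = perceived_accuracy(group, 15)
    print(f"{group}: {at_zero:.1f} at 0 errors, {at_fifteen:.1f} at 15 errors")
```

Going from 0 to 15 errors, the graduate fit drops from about 22.7 to about 7.6, while the undergraduate fit only drops from about 18.2 to about 14.6. With R² values of 0.2803 and 0.03, both lines leave most of the variation in the ratings unexplained.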

Page 38: What about Cognition?

[Figure: Transfer Function - Business Professionals only. x-axis: error count (0 to 16); y-axis: perceived accuracy (5 to 35). Linear fit: y = -1.2199x + 24.444, R² = 0.4011. Cubic fit: y = -0.0017x³ + 0.0855x² - 2.1068x + 25.893, R² = 0.4120]
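A fitted line and R² of the kind shown on these charts can be computed with ordinary least squares. The sketch below is a generic textbook calculation, not the tool used in the study, and the ratings in the example are invented for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns slope, intercept and R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical ratings: error count of each sample and the accuracy score it received.
error_counts = [0, 2, 5, 8, 11, 15]
ratings = [27, 22, 24, 15, 17, 9]
slope, intercept, r_squared = linear_fit(error_counts, ratings)
print(f"y = {slope:.4f}x + {intercept:.3f}, R2 = {r_squared:.4f}")
```

R² is the share of the variation in ratings that the line accounts for, so a value near 0.4 means that error count alone explains well under half of how accurate the data looks to respondents.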

Page 39: The 3 C’s

[Diagram labels: Cognition, Context, Content; Preference, Perception, Performance; Analytical Aptitude; Functional Experience; SME]

Page 40: The Data Quality Perception Research Website

http://www.xkimo.com/dqpresearch/

Leon Schwartz
www.informeddecisionsgroup.com

Thank you for your time

Page 41: Omit the Analysts

[Figure: Transfer Function - Management only. x-axis: error count (0 to 16); y-axis: perceived accuracy (0 to 35). Linear fit: y = -0.8056x + 19.809, R² = 0.1732]

Page 42: Research Design

- Samples created with 0-15 errors (17% max)
- Samples randomly presented (see website)
- Practice session (6 samples)
- Respondents asked to rate 16 samples on 1-30 scale (modified Magnitude Estimation)
- Double anchors used
- 63 students (grad & undergrad) attempted

Page 43: The Simple Task

Please examine the data/report above, and estimate the accuracy of the information by placing your cursor and clicking on the line below:

Error Prone (Too many mistakes to be useful) ... Error Free (No discernable mistakes)
Low Accuracy ... High Accuracy

Anchor Study Fiasco!

Page 44: The Perceptual Transfer Function

[Figure: x-axis: number of errors (objective) / error rate (objective), 0 to 1; y-axis: perceived accuracy (subjective), 0 to 1; plot labels 1, 2 and 3]

Page 45: All Graduate Students

[Figure: x-axis: error count (0 to 15); y-axis: perceived accuracy (0 to 30)]

Page 46: Business Professionals

[Figure: x-axis: error count (0 to 15); y-axis: perceived accuracy (0 to 30)]