Customer Needs for Data Quality, by Irene Polikoff
© Copyright 2016 TopQuadrant Inc. Slide 1
Customer Needs for Data Quality
Irene Polikoff, CEO
Ralph Hodgson, CTO
TopQuadrant
Slide 2
TopBraid Enterprise Solutions
– Search / Content Enrichment through the use of Taxonomies and Ontologies
– Data Governance: Reference Data Management / Metadata Management / Data Lineage
Your Enterprise Solutions
– Customize / Configure
– Your Own
Solutions and Platform
IDE, TopBraid Platform, Solution Engine
Data Layer
Slide 4
What is Data Quality
The five C’s:
– Consistency
– Completeness
– Correctness
– Conformance
– Comprehensibility
Plus:
– Precision
– Temporality
Slide 5
Examples of where TopQuadrant has met the needs for Data Quality
• Consumer Products
– Clearance in different markets
• Production Reporting
– Oil & Gas
• Asset Management
– V-CON project
• Regulatory Compliance
– Finance Sector
Slide 6
Common Issues with ‘self created’ RDF Data
• Careless URIs, e.g., skos:label
• Incorrect use of predicates, e.g., skos:broader with a text value
• Missing rdf:type statements
• Inconsistent literals, e.g., text versus integer
• Malformed strings
• Conflated values
• Inconsistent Units of Measure
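Several of the issues listed above can be detected mechanically. The following is a minimal sketch, not TopQuadrant's implementation: triples are modeled as plain Python tuples rather than with an RDF library, and the names `Literal`, `KNOWN_SKOS`, `OBJECT_PREDICATES`, and `find_issues` are hypothetical. The SKOS term list is a deliberately small subset chosen for illustration.

```python
from dataclasses import dataclass

# Assumed subset of genuine SKOS terms; "skos:label" is not a SKOS term.
KNOWN_SKOS = {"skos:Concept", "skos:prefLabel", "skos:altLabel",
              "skos:broader", "skos:narrower"}
# Predicates whose object should be a resource, never a text value.
OBJECT_PREDICATES = {"skos:broader", "skos:narrower"}


@dataclass(frozen=True)
class Literal:
    """A literal value with a datatype (hypothetical stand-in for an RDF literal)."""
    value: str
    datatype: str = "xsd:string"


def find_issues(triples):
    """Return (kind, subject_or_predicate, detail) tuples for common problems."""
    issues = []
    typed = {s for s, p, _ in triples if p == "rdf:type"}
    datatypes = {}  # predicate -> set of datatypes seen on its literals
    for s, p, o in triples:
        # Careless URIs: a skos: name that is not in the known vocabulary.
        for term in (p, o):
            if isinstance(term, str) and term.startswith("skos:") \
                    and term not in KNOWN_SKOS:
                issues.append(("unknown-term", s, term))
        # Incorrect predicate use: a resource-valued predicate given text.
        if p in OBJECT_PREDICATES and isinstance(o, Literal):
            issues.append(("literal-object", s, p))
        if isinstance(o, Literal):
            datatypes.setdefault(p, set()).add(o.datatype)
    # Missing rdf:type statements.
    for s in sorted({s for s, _, _ in triples} - typed):
        issues.append(("missing-type", s, "rdf:type"))
    # Inconsistent literals: one predicate used with several datatypes.
    for p, kinds in sorted(datatypes.items()):
        if len(kinds) > 1:
            issues.append(("mixed-datatypes", p, tuple(sorted(kinds))))
    return issues


sample = [
    ("ex:Dog", "rdf:type", "skos:Concept"),
    ("ex:Dog", "skos:label", Literal("Dog")),
    ("ex:Dog", "skos:broader", Literal("Animal")),
    ("ex:Dog", "ex:legs", Literal("4", "xsd:integer")),
    ("ex:Cat", "ex:legs", Literal("four")),
]
for issue in find_issues(sample):
    print(issue)
```

Malformed strings, conflated values, and inconsistent units of measure are not covered here; they usually need domain-specific rules.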
Slide 7
After initial load, data quality is about enforcing “required practices”
• Each organization will have its own
• Common themes are:
– Requiring some fields
– Capitalizing names
– Enforcing certain patterns (what characters are allowed)
– Enforcing “permissible” values
– Complex rules with dependencies between fields
– Totally “closed world”
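The common themes above map naturally onto a small rule table. This is an illustrative sketch only: the field names, the `RULES` table, the country/state dependency, and the `validate` function are all invented for the example and are not taken from the slides or from any TopBraid product.

```python
import re

# Hypothetical rule table: one entry per field.
RULES = {
    # Required, and must start with a capital letter (capitalized names).
    "name": {"required": True, "pattern": r"[A-Z][A-Za-z' -]*"},
    # Required, and restricted to a list of permissible values.
    "country": {"required": True, "allowed": {"US", "CA", "MX"}},
    # Optional, but limited to two upper-case letters when present.
    "state": {"required": False, "pattern": r"[A-Z]{2}"},
}


def validate(record, rules=RULES, closed=True):
    """Return a list of human-readable problems found in one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None or value == "":
            if rule.get("required"):
                errors.append(f"{field}: required")
            continue
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: does not match pattern")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: not a permissible value")
    # A rule with a dependency between fields (assumed for illustration).
    if record.get("country") == "US" and not record.get("state"):
        errors.append("state: required when country is US")
    # "Closed world": any field not described by a rule is rejected.
    if closed:
        for field in sorted(set(record) - set(rules)):
            errors.append(f"{field}: unknown field")
    return errors


print(validate({"name": "acme", "country": "FR", "nickname": "x"}))
```

Passing `closed=False` relaxes the last check, which may suit organizations that only want to police the fields they know about.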
Slide 8
Quality-enabling tool support
• Form generation based on:
– class definition
– SHACL constraints
• Auto-completion of entries
• Cardinality enforcement
• Data types enforcement
– SHACL + QUDT
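To illustrate how a form and its checks could both be derived from the same property constraints, here is a rough sketch. The `PropertyConstraint` class loosely mirrors the SHACL notions of a path, datatype, minimum count, and maximum count, but it is a hypothetical Python stand-in, not SHACL itself; the widget names and the functions `build_form` and `check_values` are likewise assumptions. Units of measure (the QUDT part) are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyConstraint:
    """Hypothetical stand-in for one property constraint on a class."""
    path: str
    datatype: str = "xsd:string"
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


# Assumed mapping from datatype to an input widget.
WIDGETS = {"xsd:string": "text", "xsd:integer": "number",
           "xsd:date": "date", "xsd:boolean": "checkbox"}


def build_form(constraints):
    """Derive one form-field description per property constraint."""
    return [
        {
            "label": c.path,
            "widget": WIDGETS.get(c.datatype, "text"),
            "required": c.min_count >= 1,
            "repeatable": c.max_count is None or c.max_count > 1,
        }
        for c in constraints
    ]


def check_values(c, values):
    """Enforce cardinality and (for integers only) the datatype."""
    problems = []
    if len(values) < c.min_count:
        problems.append(f"{c.path}: fewer than {c.min_count} value(s)")
    if c.max_count is not None and len(values) > c.max_count:
        problems.append(f"{c.path}: more than {c.max_count} value(s)")
    if c.datatype == "xsd:integer":
        for v in values:
            try:
                int(v)
            except ValueError:
                problems.append(f"{c.path}: {v!r} is not an integer")
    return problems


name = PropertyConstraint("ex:name", min_count=1, max_count=1)
employees = PropertyConstraint("ex:employees", datatype="xsd:integer")
print(build_form([name, employees]))
print(check_values(employees, ["12", "twelve"]))
```

Driving the form and the validation from one definition means a change to the constraint shows up in both places without separate UI work.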
Slide 9
As an example – definition of a class
Slide 10
As an example – resulting ‘instance’ form
Slide 11
In some cases, enforcement is “soft”
Data Validation happens in real time, but also “after the fact”
For information governed by EDG e.g., reference data, glossary terms, etc.
It is an ongoing process summarized in dashboards and metrics
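One way to picture “soft” enforcement together with after-the-fact reporting is sketched below. It assumes each validation result carries a severity, in the spirit of SHACL's violation and warning levels; the result dictionaries, rule names, and the functions `can_save` and `summarize` are hypothetical and do not describe how EDG computes its dashboards.

```python
from collections import Counter


def can_save(results):
    """Soft enforcement: warnings are recorded but only violations block a save."""
    return not any(r["severity"] == "violation" for r in results)


def summarize(results):
    """Roll validation results up into simple dashboard-style metrics."""
    by_severity = Counter(r["severity"] for r in results)
    by_rule = Counter(r["rule"] for r in results)
    return {
        "total": len(results),
        "by_severity": dict(by_severity),
        "top_rules": by_rule.most_common(3),
    }


# Invented sample results from a batch validation run.
results = [
    {"resource": "ex:A", "rule": "required-name", "severity": "violation"},
    {"resource": "ex:B", "rule": "capitalized-name", "severity": "warning"},
    {"resource": "ex:C", "rule": "capitalized-name", "severity": "warning"},
]
print(can_save(results))
print(summarize(results))
```

Running `summarize` on a schedule and tracking the totals over time gives the kind of trend a quality dashboard would show.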
Slide 14
We have been transitioning to using SHACL for class definitions and UI customizations
• Users can now create not only classes and properties, but also SHACL constraints
Ask
Thank You
Ralph Hodgson
E-mail: [email protected]
Twitter: @ralphtq, @topquadrant
Irene Polikoff
E-mail: [email protected]