PERFORMANCEthat
empowers
www.knightsbridge.com © copyright 2001 Knightsbridge Solutions LLC
K N I G H T S B R I D G E
Practical Meta Data Solutions For the Large Data Warehouse
DAMA - MN
October 16, 2002
2Tom Gransee, Knightsbridge
Agenda• Introduction
• Enterprise meta data strategy
• Data warehousing meta data strategy
• Project approach for a practical solution
• Meta data architecture
• Defining ROI
• Tools/options for moving forward
• Meta data summary
• The data quality cycle
• Questions
3Tom Gransee, Knightsbridge
Everyone knows
So why isn’t everyone doing it?
MetaData
– Valuable
– The right thing to do
– Important for long-term success
Meta data is:
Introduction
4Tom Gransee, Knightsbridge
• Don’t know how to demonstrate the ROI
• Too complex or we don’t know where to begin
• Can’t agree on what should be done
• Market is not mature enough – we’ll wait until it settles down
• We’ve tried and failed
Why isn’t meta data being addressed?
Where do we start?
How do we justify it?
Introduction
5Tom Gransee, Knightsbridge
What is the cost of dirty data?• “The cost of poor data may be 10-25 percent of total
revenues” - Larry English
Real Life Insurance Example - $10 million annually• 2 million annual claims with 377 data items each• Error rate of .001 generates more than 754,000 errors per month and over 9.04
million annually• If 10% are critical to fix, there are still over 1 million errors to correct• Even at a conservative estimate of $10 per error, the companies risk exposure to
poor claim information is $10 million a yearSource: TDWI Data Quality and the Bottom Line
Introduction
• The Data Warehouse Institute (TDWI) estimates that poor quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead
• Data quality issues torpedoed a $38 million CRM project - Fleet Bank 1996
6Tom Gransee, Knightsbridge
• Greatly increased exposure to the data
• The data warehouse and Operational Data Stores have revealed the impact of data quality problems on the business
• Meta data is the component that ties the data warehouse and ODS to the legacy systems and exposes the data quality problems
Legacy applications
• Limited exposure to the data
Understanding Data Quality Issues
Introduction
7Tom Gransee, Knightsbridge
Setting boundaries
Enterprise meta data strategy
You can’t do it all at once!
8Tom Gransee, Knightsbridge
Establishing a meta data strategy
Select a practical starting point and build on your success!Enterprise meta data strategy
Data Warehouse
and Business
Intelligence
Enterprise Architecture
- EAI- ERM
Business Rules
Component Management
Document Management
Content Management
andPortals
CDISCHL7
Clinical TrialsClinical
Patient Care
9Tom Gransee, Knightsbridge
Starting with the data warehouse
A practical strategy with real business benefits!Data Warehousing meta data strategy
Data QualitySystems
10Tom Gransee, Knightsbridge
Why is the DW a good starting point?
Built for today - architected for tomorrow
• DW typically focuses on the data that most needs to be shared
• DW presents the greatest need to understand the data because it is cross-functional
• Real business benefit can be obtained for a practical investment
• Existing DW are being re-architected
• Meta data standards and tools are beginning to have an impact in this area
• Challenges are created in a best-of-breed development environment
Data Warehousing meta data strategy
11Tom Gransee, Knightsbridge
MD integration challenges in the DW architecture
Data Warehousing meta data strategy
12Tom Gransee, Knightsbridge
Defining meta data for the DWThe formal approach to managing the processes and information needed by both business and technical
associates to define, build, administer and navigate the DW
Data Warehousing meta data strategy
BuildingBlocks
Example Benefit User Source
DefineBusiness MeaningCalculationsLineage
RecognitionUnderstandingTrust
Casual UserPower UserNew User
Heads and documentsSpreadsheetsETL mappings
Build andAdminister
UsageKey AttributesMappings
PerformanceIntegrityScalability
OperatorModelerDesigner
ETL Jobs StatisticsData ModelETL Tool
NavigateAliasCanned ReportsRefresh Data
LocationExpedienceAccuracy
New/Casual UserExecutiveFrequent User
Data ModelBusiness IntelligenceJob Schedules / logs
Robust, integrated meta data solutions will aid in using, developing and operating the data warehouse source: META Group
Meta data solutions are also referred to as: Information Catalogs or digital DNA
13Tom Gransee, Knightsbridge
The need for a complete project lifecycle• Document today’s meta data environment
– Identify meta data users and its sources
– Identify the business drivers • Problems, opportunities and associated costs
– Identify requirements• How does meta data address the business drivers• What are the savings
– Define objectives and benefits
• Develop a meta data architecture– Processes and disciplines
– Integration requirements
– Delivery and usage requirements
– Technology component / tool
– Change management process priorities
• Develop project plan and cost/benefits
• Build with an Iterative release approach
Process LayerProcesses and disciplines required to generate and sustain complete and accurate meta data
Integration LayerExchange of meta data between multiple tools across the data warehousing framework
Technology LayerAutomate processes and integration and provide a single Web based view consolidated across tools
Building an Architecture
Project Approach for a practical solution
14Tom Gransee, Knightsbridge
What are successful projects addressing?1. Lineage
– What data is available, where it came from and how it’s transformed– Including definition, currency and accuracy
2. Appropriate information by user type– Easy access to meaningful meta data– “How is it different from what I’m used to seeing?”
3. Impact analysis across tools and platforms– Impossible to do without a formalized meta data technology solution
4. Versioning– How has it changed over time?– Moving from development, to test, to production
5. Live meta data– Meta data is a natural part of the process– A function fails if the meta data is not complete and accurate
Project Approach for a practical solution
2
1
3
4
15Tom Gransee, Knightsbridge
Laptop computer
Business and technicaluser Interface
Physical RepositoryRDBMS
Oracle, DB2, others
Meta DataRepository
Engine
Object Model
Modeling ETL RDBMSFile
SystemsJob
ExecutionJob
Execution
Collection Points:
Consolidated view across tools
WEB
DW & ETLDesign
MigrationMgmt
MigrationMgmt
DataQualityData
Quality
Source systems
Data warehouse
Staging area / ODS
Mappings Bus Req
Centralized Meta Data Repository• Manage redundancy• Provide one view across tools
BIBI
Change management
Source Control
Configuration management
1. Collection 2. Integration
3. Usage
Meta data architecture for the DW
Auditing
Balancing and controls
Data to support lineage
Data Cleansing
Householding
Data to support lineage
•Processes and disciplines•Live meta data concepts•System of Record
Meta Data Architecture
Data ValidationData Profiling
5
6
7
16Tom Gransee, Knightsbridge
Components of a well-architected DW solution• Short- and long-term requirements are
well-defined displaying clear business benefits
• Meta tags required to support lineage, balancing and controls, etc., are built into the DW architecture
• Live meta data concepts are rigorously followed
• A plug-and-play architecture is used– Support for multiple tools in a category, i.e.,
Informatica and Ab Initio for ETL
– Simplifies future transitions to new technology and tools
Built for today - architected for tomorrow
Meta Data Architecture
17Tom Gransee, Knightsbridge
How is a meta data investment in the DW justified• Reduced total cost of ownership
– Impact analysis -- 50 percent of development efforts are spent assessing what is impacted by the change
– Configuration and migration management– Eliminate redundant work
• Improved user acceptance– What’s available and how to access it– Business user understands the data and
where it came from -- how is it different from the operation informational systems?
• Risk avoidance– What’s the impact of not delivering the
business benefits used to justify the DW
• Industry or government regulations
• Best practicesDefining ROI
8
18Tom Gransee, Knightsbridge
The CWM standard as an enabler1.The Common Warehouse Metamodel
2.Model driven architecture• Based on object oriented modeling and
development
• Building blocks: UML, MOF, XMI and OCL
• The model generates:
– Repository data structure changes
– APIs for models to interoperate
– APIs to load and retrieve meta data
– CORBA components
3.Meta data exchange format• Based on the CWM and Standard DTDs
• XML data streams following the XMI standard
CWM is supported by:
• Oracle
• IBM
• SAS
• Adaptive
• Hyperion
Tools / Options for moving forward
19Tom Gransee, Knightsbridge
Meta data management repositoriesSelection Criteria
• Manage associations across tools• Search and retrieval across tools – ability to display a single consolidated view
– Lineage– Impact analysis– Plain English definitions– Subject areas
• Ease of Extensibility• Customizable user interface using an industry standards web solution• Reduce integration cost in a best-of-breed development environment• Enable Plug-and-play tool strategy• Available automated bi-directional meta data exchange bridges / adapters• Template driven retrieval of meta data• Group / role based security• Interoperability between metamodels• Support of industry standards – CWM, HL7, etc.• Support of Federated repositories• Ability to expand beyond the DW• Versioning – extracting a time slice
Tools / Options for moving forward
20Tom Gransee, Knightsbridge
• Enterprise – Supports a broad range of functionality including enterprise architecture, data warehousing, business intelligence, component management and others– CA – Advantage (formerly Platinum)
– ASG - Rochade– Adaptive Foundation (formerly Unisys)
• DW Suite solutions – meta data solutions that are integrated into a suite of tools primarily from a single vendor designed to build and maintain the complete data warehouse framework– Microsoft Repository
• DWSoft – Navigator web browser– SAS –Warehouse Administrator– Oracle Warehouse Builder
• OWB Repository• Enterprise Data Warehousing and ETL – supports data warehousing, ETL and business
intelligence activities typically in a best-of-breed toolset environment– Ab Initio– Informatica
• Data Advantage Group – MetaCenter– Ascential – Meta Stage
• Modeling – supports development and versioning activates for ER and object modeling– ERwin Suite– Rational Rose– Oracle Designer– Popkin System Architect
Types of Meta data management repositories
Tools / Options for moving forward
21Tom Gransee, Knightsbridge
• Build from scratch using an RDBMS and custom web delivery application
• Implement a repository tool and extend it as needed
– Enterprise repository tool
– Warehouse suite solutions
– ETL tool repository
Options for moving forward with a DW solution
No complete solution exists today. Establish a foundation and gradually develop a complete solution through a series of iterative releases!
Tools / Options for moving forward
22Tom Gransee, Knightsbridge
Development lifecycle
• Define business objectives, requirements and benefits
• Understand standards
• Research repository tools
• Define technical objectives/requirements
• Document capabilities
• Relate standards, tools, requirements, architecture
• Develop scope and priorities
StrategyDevelopment
ArchitectureDevelopment
Design and Construction
• Define meta data sources
• Define repository
• Define hardware platform and software requirements
• Define meta data integration ETL process
• Define meta data delivery/display mechanisms
• Iterative release approach
• Design and construction of meta data integration ETL
• Design and construct logical and physical meta data models
• Hardware platform implementation
• Test
• Rollout
4 - 6 weeks 3 - 4 weeks 6 - 10 months
Tools / Options for moving forward
23Tom Gransee, Knightsbridge
Pitfalls to avoid• Selecting a repository tool without
defining requirements first
• Expecting the repository tool to solve process problems
• Selecting an architecture that is not extensible or can’t scale
• Underestimating the effort
• Relying too much on manual entry
• Selecting an initial project that does not deliver adequate business benefit
• Selecting an initial project that is too complex to be practical
Many meta data projects fail because they are either too big to be practical or too small to deliver real benefits
Meta data summary
24Tom Gransee, Knightsbridge
OLAP Cubes
Integrated API
3 rd Party/ API
API
Business Definitions for DW AttributesLogical & Physical DB Models
Ab Initio
ETL
Pro ClarityER/Win 3.5
Oracle 8i
Web Based Access
DWSoft Web BrowserXML and ASP
SQL Server2000
Source-to-Target Mappings, Code Mappings, Business Rules, Data Quality, Contacts and
Document Definitions
Code & DataMapping
Custom API
Access BasicOracle 8i
DATA Modeling
Meta ModelSD
Professional Workstation 6000
PRO
Microsoft Meta Data ServicesSQL Server 2000
StagingArea
Data Warehouse
Physical RDBMSInfo through OLE DBCodes FrequencyMin/Max/Averages
Interfaced
API
DML Information
One example of a hybrid solutionCubes, Hierarchy and LevelsField-to-Field Derivation
Fast, inexpensive, low-risk approachimplemented at a major insurance company
BusinessIntelligence
Custom
AP
I
Meta data summary
25Tom Gransee, Knightsbridge
The DW meta data solution within the enterprise meta data architecture
Flat Files
Laptop computer
PersonalizedPortal Access
PDA
Tower box
Taxonomy
Meta data
Tower box
Taxonomy
Meta data
Tower box
MetadataRepository
Knowledge Management Engine• Business rules • Content management• Automatic taxonomy generation• Neural net search engine/adaptive learning• Text mining• Personalization• XML, Java, HTML LDAP:support
ETL Data Management Platform• Centralized meta model• Complex data transformations• Meta data repository• Real-time extractions• XML, Java, HTML, LDAP
Tower box
EIP Engine• Browser-based access• Personalization• Common Authentication Proxy• Automatic taxonomy generation• Structured/unstructured info integration• XML, Java, HTML LDAP support
MicroStrategy BI Platform• Analytical processing• Graphical visualization• Reports
Tower box
Data Warehouse Platform• Operational data store• Customer information• Historical information• Security
UnstructuredData
Document/Email Platforms• Documents• Catalogs/digital content• Email
Oracle
Oracle
Oracle
Oracle
Oracle
Oracle
Analytical data/application launch
Click stream capture
Metadata
Cat
alog
s, c
onte
nt,
docu
men
ts,
web
link
s…
Clic
k st
ream
cap
ture
Data
Dat
a
Custom
er profile information
Unstructured text and digital assets
Dat
a
KM Vendors• Autonomy• Intraspect• Documentum• BRS Rule Track
EIP Vendors• Viador• Hummingbird• TopTier
OperationalData
Enterprise Architecture EAI ERM
MetadataRepository
Oracle
Meta data summary
26Tom Gransee, Knightsbridge
A data warehouse project without a formalized meta data facility has only a one in four chance of being highly successful; still, in the heat of the DW battle, rarely is
meta data seen on the front line!
Is it worth the risk not to do it?
Source: META Group’s industry study: Data Warehouse Scorecard: Cost of Ownership and Successes in Application of Data Warehouse Technology
Increasing the odds of success
Meta data summary
27Tom Gransee, Knightsbridge
• Correctness Defects
• Integrity Defects
• Presentation Defects
• Application Defects *
* How the user applies the data
Detect
Prevent
Correct / Repair
Measure /Make visible
The data quality cycle
com
mun
icate
com
mun
icate
communicate
communicate
1. Identify business drivers2. Set scope3. Define metrics
Data quality cycle
28Tom Gransee, Knightsbridge
Focus on data content
Focus on data structure
Examine and understand data
Change the data
Correctness (value based)
Integrity (structured based)
Inductive Rules
Deductive Rules
Data Quality
Rules
Data Cleansing
and
Transformation
Rules
Content and structure
Governing the Processes
Data Profiling
Column profiling Dependency profiling Redundancy profiling
Data quality rules progression
Data quality cycle
9
29Tom Gransee, Knightsbridge
The data quality cycleAdditional activities to deliver data quality
• Consolidate source systems to reduce collection points and minimize system interfaces
– Consolidate multiple non-integrated legacy application
– Source independent data marts from the data warehouse
• Consolidate shared data
– Use an ODS to shared data across systems
– Use reference tables and keys to logically integrate data that must remain distributed
• Implement a hub / ODS for data integration
– Provide a single source of clean data
– Reduce system interfaces
Data quality cycle
30Tom Gransee, Knightsbridge Data quality cycle
• The meta data solution manages many components needed to support a sustained data quality effort– Navigation meta data– Lineage– Plain English Definitions– Calculations and transformations– Currency– Identify owners and stewards– Business rules– Help identify redundency
• The repository captures and exposes data quality statistics to a wide audience
• The repository web interface can provide a mechanism for soliciting feedback
How meta data supports data quality
31Tom Gransee, Knightsbridge Data quality cycle
“Companies that manage their data as a strategic resource and invest in its quality are already pulling ahead in terms of reputation and profitability from those that fail to do so.”
Source: Global Data Management Survey 2001 PricewaterhouseCoopers
What is the Value?
32Tom Gransee, Knightsbridge
Questions
33Tom Gransee, Knightsbridge
Meta data usage – development samplesData lineage – from source or target
List the fields in a source file
Example 1a
Easily drill to:• Target data• Transformations• Additional source data
34Tom Gransee, Knightsbridge
Meta data usage – development samplesData lineage – data quality review
Example 1b
35Tom Gransee, Knightsbridge
Meta data usage – development samplesData lineage – understanding the data
Example 2
Atlas Source System
1,016,575 Monthly Billing Codes57% - Atlas Billing Codes24% - Total Data Warehouse
Cyberlife IL
437,296 Monthly Billing Codes57% - Cyberlife IL Billing Codes10% - Total Data Warehouse
36Tom Gransee, Knightsbridge
Meta data usage – development samplesImpact analysis
Display all the transformations For a Column in the warehouse or a field from a source system
Example 3
37Tom Gransee, Knightsbridge Example 4
Meta data usage – development samplesLive meta data
RDBMS ETLDDL
Tables
Columns
RDBMS
ETL
DDL
TablesColumns
ETL DesignCodes and data mapping application
Code mappings
Invalid or missing code mappingsSuspense
38Tom Gransee, Knightsbridge
Meta data usage – development samplesIntegrated view of meta data from multiple sources
Example 5a
• Extracted from the repository• Includes data from Oracle• Includes data from ERwin
39Tom Gransee, Knightsbridge
Meta data usage – development samplesIntegrated view of meta data from multiple sources
Example 5b
40Tom Gransee, Knightsbridge
Row Claim Source System Date Source SystemTransaction Identifier Updated Code Value
When is meta data not meta data?
Column
Row level information should be captured, managed and displayed by the application, i.e., the Data warehouse, data mart or other collection points
Column level meta data should be incorporated into the centralized meta data solution for easy display to a wide audience
• Column definition• % of records from each source system• Counts and % of unique values for a code
Audit and Meta Tags
Example 6
41Tom Gransee, Knightsbridge Example 7
• Limited exposure to the data• An awareness of data quality issues• Impact of problems not easy to see• Few sustained activities to correct the problems
• Greatly increased exposure to the data• Impact of problems clearly felt• Few sustained activities to correct data problems• Awareness of meta data problems• Few sustained activities to correct meta data
• Greatly increase the exposure to meta data• Navigation meta data increases data access• Lineage• Plain English Definitions• Calculations and transformations• Identify owners and stewards• Business rules
Legacy applications
The importance of a meta data repository
42Tom Gransee, Knightsbridge
Issue Benefit
• Business Analysts told us they spend up to 70% of their time locating the right information source and resolving multiple versions of the truth
• At least 20% improvement in the time required to locate and validate information
• At least 20% improvement of application development and maintenance activities
• Technical analyst’s and developer’s told us they spend up to 80% of their time finding the data needed to satisfy a request
• Reduced learning curve for new associates by 3 months and lower the mentoring required from key resources
• Business and technical analysts told us it requires six to nine months for a new associate to become proficient in using data
• At least 20% reduction in the time required to perform a complete impact analysis and reduce risk of errors when migrating changes to production
• Lack of complete automated Impact analysis within and across tools creates a serious risk when implementing changes
Quote from the META Group’s series of white papers addressing application delivery strategies
“meta data interchange will improve Application Development and maintenance efficiency by up to 30%, and real-time meta data interoperability will enable up to 50% improvement in Application Development and maintenance efficiency.”
ROI for meta data
Example 8
43Tom Gransee, Knightsbridge
Correctness
Rules• Accuracy
• Consistency
• Completeness
• Balancing
• Continuity
• Precedence
• Currency
• Duration
• Retention
• Precision
• Granularity
Validation
Does it match the rules
Verification
Does it make sense as applied to other reliable sources
Accurate
Data can be valid but still not be accurate
Inspection
Can be as simple as spot checking or as thorough data-driven discovery inspection using techniques like: pattern recognition, classification and probability
What does correct mean
Activities
Example 9