© Copyright 2009 - 2017 Compact Solutions LLC. All Rights Reserved. The material in this document is for the consumption of the recipient only. It may not be forwarded/shared with anyone else without express written permission of Compact Solutions LLC.
TECHNICAL METADATA INTEGRATION & TRUE DATA LINEAGE
SID BANERJEE, VP – WW PRODUCT SALES
DAWID DUDA, VP – PRODUCTS
INTRODUCTION
Founded in 2002, privately held
Presence in
Chicago (US) – Worldwide HQ
London (Western Europe)
Krakow (Eastern Europe/Poland) – Innovation labs
Ahmedabad (APAC/India) – Dual shore services
Solutions
MetaDex™ – Metadata Integration
TestDrive™ – Testing solution for Data Warehouse/ETL
System Integration Capabilities/Alliances
WHO ARE WE – ORGANIZATION AND CAPABILITIES
Top Companies trust Compact Software
BFSI
Life Sciences
Technology
All logos are trademarks and owned by their respective companies and affiliates
Retail
Media
Logistics
ETL Developer
Business Analyst
Business Partner
Data Steward
Project Manager
Information Architect
BI Developer
DBA/DDA/DA
“In reviewing this report I have a question about…”
“Let me look into it, I will get back to you”
“I’ll need to take resources from another project to get those answers…”
“Let’s look at the table…”
“That is calculated by…”
“The data comes in from… and then… finally…”
“The mapping rules tell us that this should…”
“The most reliable source is…”
Today a single question often requires talking to several different resources because the answers are only found by looking across disparate locations
The AS-IS State?
Data Governance – Business Challenges
Expensive missteps - Action is taken - only to find out later that information was wrong or incomplete
Higher costs – unclear change impact and creation of redundant processes and information
Slow response – lack of information clarity slows the decision process and reduces agility for mergers and regulatory initiatives (DFAST, CCAR, Basel III)
Productivity loss – those who don’t understand data burden the few that do
Lack of standards – no global codes, definitions or data formats exist
Application specific definitions – term definitions differ across divisions and LOB
No single source of truth – unless vetted, it's not trusted.
No ownership / governance for the problem – system and process “work-arounds” are created.
Difficult to find and understand data – reliance on key knowledge workers.
Root cause analysis – data quality issues are time-consuming to understand and verify.
Problems Governing & Managing Data
Cost of Misunderstanding
FRONT LINE APPLICATIONS, REGULATORY
OLAP
DATA INTEGRATION / DATA QUALITY / ETL
Analytics (e.g. SAS, Cognos, Business Objects, etc.)
ETL Tools (e.g. DataStage, Informatica, etc.)
Data Warehouse Appliance (e.g. IBM PS, Teradata, Netezza)
SOURCE SYSTEMS
Mainframes
ERP
External
End to end data lineage
Business context and meaning for IT assets
Catalog of information assets
Risk data analysis and dependency management
Shared metadata repository for Business and Technical users
Data Extraction (e.g. SQL Scripts, COBOL, JCL)
STG
EDW
Landing Zone
HDS
Data Governance – Technical Metadata Challenges
KEY DATA GOVERNANCE ELEMENTS
• What data do I have – catalog of my data assets
• What language do I use to speak about it – my business glossary
• What does my data mean – the assets-glossary relationship
• How is my data sourced and transformed – the data lineage
WHAT ARE THE KEY ELEMENTS OF DATA GOVERNANCE
THERE ARE TWO TYPES OF METADATA OUT THERE...
Technical Metadata
Business Metadata
Successful Data Governance
• Assets catalog
• Data lineage
• Operational metadata
• Data profiling and quality results
• Testing reports
• And more...
TECHNICAL METADATA ELEMENTS
THE HOLY GRAIL OF LINEAGE
• Connections between data assets
• Shows how data moves from one asset to another
• Exposes transformations used to derive data elements
WHAT IS DATA LINEAGE
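The three points above can be pictured as a small directed graph. The following is a minimal sketch only, not how any particular lineage product stores its data: the asset names, the transformations and the `upstream` helper are all hypothetical, chosen to show connections between assets, the direction of data movement, and the transformation attached to each step.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source asset, target asset, transformation).
EDGES = [
    ("src.orders.amount", "stg.orders.amount", "copy"),
    ("stg.orders.amount", "edw.sales.net_amount", "amount - discount"),
    ("stg.orders.discount", "edw.sales.net_amount", "amount - discount"),
    ("edw.sales.net_amount", "report.revenue.total", "SUM(net_amount)"),
]


def build_upstream_index(edges):
    """Map each target asset to the (source, transformation) pairs feeding it."""
    index = defaultdict(list)
    for source, target, transformation in edges:
        index[target].append((source, transformation))
    return index


def upstream(asset, edges):
    """Return every asset that directly or indirectly feeds `asset`."""
    index = build_upstream_index(edges)
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for source, _ in index.get(current, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen
```

Calling `upstream("report.revenue.total", EDGES)` walks the graph backwards from the report column to the source system, which is the question a business user typically asks of lineage.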
• Correct
• Up to date
• Usable
THE KEY REQUIREMENTS FOR DATA LINEAGE
• Prepare manually
  • Read the code
  • Write source to target mappings
  • Maintain over time!
• Extract automatically
  • Identify the right extractor
  • Configure for your environment
  • Set up automatic refresh
WHERE TO TAKE THE LINEAGE FROM
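To make the "extract automatically" option concrete, here is a deliberately naive sketch of deriving a table-level source-to-target mapping from a single SQL statement. It is an illustration under stated assumptions, not a real extractor: the `table_level_mapping` function and the sample statement are hypothetical, and a regular expression only copes with very simple `INSERT ... SELECT` statements. Production extractors rely on full parsers for each technology.

```python
import re


def table_level_mapping(sql):
    """Return (sources, target) for one simple INSERT ... SELECT statement."""
    target_match = re.search(r"insert\s+into\s+([\w.]+)", sql, re.IGNORECASE)
    if target_match is None:
        raise ValueError("not an INSERT ... SELECT statement")
    # Every table named after FROM or JOIN is treated as a source.
    sources = re.findall(r"(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    return sorted(set(sources)), target_match.group(1)


# Hypothetical statement used only for illustration.
SQL = """
INSERT INTO edw.sales
SELECT o.id, o.amount - o.discount
FROM stg.orders o
JOIN stg.customers c ON c.id = o.customer_id
"""

sources, target = table_level_mapping(SQL)
```

Re-running such an extraction on a schedule is what keeps the mapping current, in contrast to a manually maintained mapping document.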
REAL LIFE ENVIRONMENTS ARE COMPLEX...
MSSQL
Oracle
SSIS
Oracle
DataStage
Oracle
TD
DataStage
FastLoad
BTEQ
TD(EDW)
Cognos
TD(Views)
MicroStrategy
QlikView
MSSQL
Oracle
Informatica
PLSQL
Hadoop Hive
Netezza
• When to prepare manually
  • Technology is used in limited scope
  • Processes do not change over time
  • Detailed low-level lineage is (not yet) required
• When to extract automatically
  • Technology is used at large scale
  • Processes change on a regular basis
  • Detailed low-level lineage (with transformations) is required
WHERE TO TAKE THE LINEAGE FROM – LOOK AT TECHNOLOGIES
TWO CLASSES OF APPLICATIONS
Static code/processes
• Different types of code (SQL, SAS, COBOL, DataStage, Informatica, SSIS, ...) that are available directly are relatively easy to derive lineage from
Dynamic code calculated based on parameters
• Dynamic code generated at runtime, based on the parameters of procedures and programs, can also be analyzed once the parameter values have been identified.
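The dynamic case can be sketched as follows: once parameter values are known, a parameterized statement expands into ordinary static statements that can be analyzed like any other code. This is a minimal illustration only. The template, the parameter names and the `resolve` helper are hypothetical, and `string.Template` stands in for whatever parameter syntax a given ETL tool actually uses.

```python
from string import Template

# Hypothetical parameterized statement, as an ETL job might store it.
TEMPLATE = Template(
    "INSERT INTO ${target_schema}.sales "
    "SELECT * FROM ${source_schema}.orders_${region}"
)

# Hypothetical parameter sets, e.g. recovered from job run metadata.
RUNS = [
    {"target_schema": "edw", "source_schema": "stg", "region": "emea"},
    {"target_schema": "edw", "source_schema": "stg", "region": "apac"},
    {"target_schema": "edw", "source_schema": "stg", "region": "emea"},
]


def resolve(template, runs):
    """Expand the template once per parameter set, dropping duplicates in order."""
    statements = []
    for params in runs:
        sql = template.substitute(params)
        if sql not in statements:
            statements.append(sql)
    return statements


RESOLVED = resolve(TEMPLATE, RUNS)
```

Each resolved statement names concrete tables, so the lineage for one job becomes the union of the lineage of all statements it has been observed to produce.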
DYNAMIC CODE EXAMPLES
• Fragments of SQL extracted from database tables
• Actual query construction happening only at runtime
• Highly parameterized ETL processes
THE NOT-THAT-EASY PARTS – APPROACHES
• Using operational metadata to identify parameter values for ETL processes
• Using logs to capture the actually executed transformations
• Automatic analysis of metadata-driven code generation
• If you are interested in the low-level details of how such challenges are solved, with real-life examples, we will be happy to discuss offline
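As a rough illustration of the log-based approach, the sketch below collects the statements that actually ran from a job log. Everything here is an assumption made for the example: the log format (`<timestamp> <job> EXECUTED: <sql>`), the job name and the `executed_statements` helper are hypothetical and do not reflect the log layout of any specific tool.

```python
import re

# Hypothetical log format: "<timestamp> <job> EXECUTED: <sql>"
LOG = """\
2017-03-01T02:00:01 load_sales START
2017-03-01T02:00:02 load_sales EXECUTED: INSERT INTO edw.sales SELECT * FROM stg.orders_emea
2017-03-01T02:00:09 load_sales EXECUTED: INSERT INTO edw.sales SELECT * FROM stg.orders_apac
2017-03-01T02:00:15 load_sales END
"""

LINE = re.compile(r"^\S+\s+(?P<job>\S+)\s+EXECUTED:\s+(?P<sql>.+)$")


def executed_statements(log_text):
    """Group the SQL statements that actually ran by job name."""
    by_job = {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            by_job.setdefault(match.group("job"), []).append(match.group("sql"))
    return by_job


STATEMENTS = executed_statements(LOG)
```

The captured statements are fully resolved, so they can be fed to the same analysis used for static code.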
SAMPLE LINEAGE DIAGRAM
IT IS COMPLEX ITSELF, BUT THERE ARE STILL MORE LEVELS...
A ZOOM INTO A SINGLE STEP OF THE PROCESS
LESSONS LEARNED
• Data governance does not happen overnight across all your assets...
• Clearly identify your priorities, based on the regulatory requirements and/or internal drivers
• Work on particular applications/areas one after another
DO NOT TRY TO BOIL THE OCEAN
• Any large company has policies to protect the assets
• What you need to do to get your governance project off the ground often does not follow the typical patterns
• Introducing governance/quality tools/lineage involves working with
  • Particular LoB
  • Appropriate administration teams
BRING PEOPLE ON-BOARD BEFORE THE ASSETS
• Different types of users
  • Business users
  • Power users
  • Engineers or analysts
• Ask yourself a few questions:
  • How many users of each type do you have?
  • How important is it (in the short and long term) to satisfy their needs? How do you prioritize?
• Keep in mind – there are some regulations you may need to follow!
SOME PEOPLE NEED TO SEE IT ALL... WHILE OTHERS DON’T
• These are highly technical projects, with a relatively high number of “surprises”
• Gather as much intel as you can before you start, but do not assume you know it all – the life of your data will surprise you
• You need a solid data governance platform/tooling to work on, but also a tiny Swiss Army knife to solve some smaller problems that may be specific to your environment
EXPECT THE UNEXPECTED
• Data governance projects usually fail for a reason
• More often than not that reason is
  • Either lack of sufficient IT engagement
  • Or lack of sufficient Business engagement
IT TAKES TWO TO TANGO
WHERE ARE WE GOING WITH DATA LINEAGE
• Big data is just one more technology – the same rules apply
• What is so special about that? The sandbox approach...
• If you want your big data to be governed, use it responsibly
BIG DATA IS COMING INTO THE PICTURE
• Even when initial lineage (especially for CCAR or BCBS 239) is prepared manually, sooner or later keeping it up to date becomes unmanageable
• More technologies now have ready-made lineage extractors available (COBOL/JCL, for example)
• Lineage becomes more complete, but at the same time more complex
MORE AUTOMATION
• Last year about 20% of our projects were related to custom solutions aimed at enriching and enhancing standard technical metadata (to the extent that we have launched a set of generic tools to automate that)
• Demand for operational metadata from various technologies is growing
• Again – metadata becomes more complete and richer, but at the same time more complex to consume
MORE RICH METADATA
• Creating data lineage, when using proper extractors, is not that difficult these days
• What is more challenging is consumption - pick a random table and your lineage will have 500+ objects
• Make sure you use a repository that will allow you to work with this complexity (filtering and reporting are the two key features)
• In some environments we are at the point where lineage must be pre-aggregated before populating the repository (user demand is driving the technology)
CONSUMPTION AND USABILITY BECOME THE MAIN CHALLENGE
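One way to read "pre-aggregated" is collapsing column-level lineage into table-level lineage before it is loaded, so a user sees a handful of table-to-table links instead of hundreds of column links. The sketch below assumes that interpretation. The `to_table_level` helper and the `schema.table.column` naming are hypothetical, and a real repository may aggregate along other dimensions.

```python
def to_table_level(column_edges):
    """Collapse column-level lineage edges into distinct table-level edges."""
    table_edges = set()
    for source_column, target_column in column_edges:
        # Drop the trailing column name: "stg.orders.amount" -> "stg.orders".
        source_table = source_column.rsplit(".", 1)[0]
        target_table = target_column.rsplit(".", 1)[0]
        if source_table != target_table:
            table_edges.add((source_table, target_table))
    return sorted(table_edges)


# Hypothetical column-level lineage, named as schema.table.column.
COLUMN_EDGES = [
    ("stg.orders.amount", "edw.sales.net_amount"),
    ("stg.orders.discount", "edw.sales.net_amount"),
    ("stg.orders.id", "edw.sales.order_id"),
    ("edw.sales.net_amount", "report.revenue.total"),
]

TABLE_EDGES = to_table_level(COLUMN_EDGES)
```

Here four column-level edges reduce to two table-level edges. The detailed edges can still be kept elsewhere for the engineers and analysts who need them.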
Q&A / DISCUSSION
Let us discuss how Compact can assist your organization's information management objectives
• For more information please visit www.compactbi.com or
• Contact us [email protected]