In Metadata We Trust
Keys to Successfully Building, Managing and Governing an
Enterprise Data Catalog
Anthony J. Algmin 2019 DAMA Chicago
October 16, 2019
If You Remember Nothing Else
• Data Catalogs are like Black Holes:
o Massive, built from many pieces, and difficult to put in a box
• Data Catalogs are a Tool, Not a Solution
o They consolidate many data management functions
o Data Catalogs need strong process and people
• Data Value is the Most Important Thing
o This fundamental Data Leadership principle connects Data Catalogs to what matters most to the business
Agenda
• Introduction
• Why Data Catalogs
• What Data Catalogs Do
• Implementing Data Catalogs
• APPENDIX: Sustaining Long-Term
About Me
• Founder and CEO of Algmin Data Leadership, LLC
• Helping organizations of all sizes maximize the value created from their data assets
• Many years of data management/strategy consulting
• Former CDO, with earlier roles as a technical data architect and developer in the financial industry
• Thought Leadership
• DataLeadershipBook.com
• DATAVERSITY Online Training Courses
• Frequent Public Speaking Engagements
• Quarterly Column at TDAN.com
[email protected] 630-403-8348 algmin.com
Why Data Catalogs
The Data Governance Breakdown
• A Question: Whose Data Governance is Wildly Successful?
o Why not?
• A Better Question: What does “Wildly Successful Data Governance” Look Like?
o Is it hard to even imagine?
o Are our visions of this objectively ridiculous?
The Value of Data
• The Value of Data
o The realized difference between what you do with it versus what you would do without it
• Measuring Value
o Increase Revenue
o Decrease Cost
o Manage Risk
• These are the ONLY ways data creates actual value
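The definition above can be read as a simple comparison of two outcomes. The sketch below is one possible way to express it, not a method from the talk: the `Outcome` class, its field names, and all the figures are hypothetical, and risk is reduced to a single expected-loss number for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """A hypothetical business outcome for one period."""
    revenue: float        # money earned
    cost: float           # money spent
    expected_loss: float  # risk exposure, as a probability-weighted loss

    def net(self) -> float:
        return self.revenue - self.cost - self.expected_loss


def data_value(with_data: Outcome, without_data: Outcome) -> float:
    """Realized difference between acting with the data and acting without it."""
    return with_data.net() - without_data.net()


# Illustrative figures only.
baseline = Outcome(revenue=1_000_000, cost=700_000, expected_loss=50_000)
informed = Outcome(revenue=1_080_000, cost=690_000, expected_loss=30_000)
print(data_value(informed, baseline))
```

Each of the three measures shows up as one field: more revenue, less cost, or less expected loss all raise the difference, and nothing else does.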
Data Leadership Framework
• Access: Preparing Data For Use
o Data Security
o Data Architecture
o Data Wrangling
o Development
o Support, Operations, DevOps
• Refinement: Optimize Data Potential
o Metadata
o Data Quality
o Master Data
o Enrichment
o Curation
• Adoption: Acting from Data Insights
o Data Modeling and Warehousing
o Traditional Reporting
o Interactive Dashboards and Visualizations
o Systems Integration
o Emerging Data Technologies
• Impact: Maximize Business Outcomes
o Measurements, Metrics, KPIs
o Regression Analysis and Predictive Modeling
o Machine Learning and Artificial Intelligence
o Business Process Automation
o Data Monetization
• Alignment: Engage Stakeholders
o Strategy, Standards, and Policies
o Project and Program Management
o Marketing and Communications
o Training and Building Quantitative Culture
o Regulatory Compliance
What a Data Catalog Solves
• So What is the Point?… “So What” IS the Point!
o Data Catalogs are like the ERP of the data space: they can do pretty much anything
o If we try to do everything at once, we’re going to fail
• What is Most Important?
o Differs for each organization
o If we think of a Data Catalog as a solution, we’re in trouble
o We must think of it as a tool
What Data Catalogs Do
Data Catalog Functions
• Development
o Profiling
o Ad Hoc Analytics / Data Science
o SQL Repositories
o ETL Facilitator
• Metadata
o Business Glossary
o Master Data
o Data Quality
o Lineage
o Dictionary
• Data Governance
o Collaboration
o Policies and Standards
o Workflow Facilitation
o Compliance
o Data Lifecycle
• Reference
o Search
o Taxonomy
o Documentation Repository
o Data Lake Navigation
• … any more?
Development
• Profiling
o Determining characteristics of data acquisition
• Ad Hoc Analytics / Data Science
o Navigate data used for analysis
• SQL Repositories
o Database “code” for reference/re-use
• ETL Facilitator
o Source/target mappings, aggregation rules, etc.
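As a rough illustration of what profiling means in practice, the sketch below computes a few basic characteristics of a single column. It is a minimal example under stated assumptions: the `profile_column` function and the sample values are hypothetical, and a real catalog would gather far more (types, ranges, patterns).

```python
from collections import Counter


def profile_column(values):
    """Return basic characteristics of one column of raw values."""
    # Treat None and empty strings as missing (an assumption for this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample: a state-code column with two missing entries.
sample = ["IL", "IL", "WI", None, "", "IN", "IL"]
print(profile_column(sample))
```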
Metadata
• Business Glossary
o Agreed definitions for broad use
• Master Data
o Most important data, frequently referenced
• Data Quality
o Rules and policies to determine and improve suitability-for-use
o Data Catalogs generally do NOT perform Data Quality Management directly
• Lineage
o The story of data over time
• Dictionary
o Detailed and technical metadata for integrations
Data Governance
• Collaboration
o Discussion boards, messaging, etc.
• Policies and Standards
o The underlying rules framework to guide Data Governance
• Workflow Facilitation
o Performing approvals and other processes
• Compliance
o Ensure adherence to regulatory or audit mandates
• Data Lifecycle
o Proactively guiding data acquisition, transformation, use, and deprecation
Reference
• Search
o Rapid text-based lookups
• Taxonomy
o Structured, relationship-driven reference
• Documentation Repository
o A “more findable” library of user and technical systems documentation
• Data Lake Navigation
o Ease the difficulty of finding desired information in data lakes
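A “rapid text-based lookup” can be as simple as an inverted index over asset names and descriptions. The sketch below is a toy version under stated assumptions: the `build_index` and `search` functions and the catalog entries are hypothetical, and real catalog search adds ranking, stemming, and permissions.

```python
from collections import defaultdict


def build_index(entries):
    """Map each lowercase word to the set of entry names that mention it."""
    index = defaultdict(set)
    for name, description in entries.items():
        for word in f"{name} {description}".lower().split():
            index[word.strip(".,")].add(name)
    return index


def search(index, query):
    """Return entry names containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical catalog entries.
entries = {
    "customer_master": "Golden record for each customer.",
    "orders_raw": "Unprocessed order events from the web store.",
    "orders_clean": "Validated order events, one row per order.",
}
index = build_index(entries)
print(search(index, "order events"))
```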
Reality Check
• Doesn’t This All Sound Like a Bit Much?
o What’s the viability of a do-nothing option?
• Can our organizations support this much activity?
o If not, what’s the alternative?
o What are we doing now?
• Is This the Data-Equivalent of ERP?
o Are Data Catalogs the future, or a repeat of the past?
Implementing Data Catalogs
Architecture
• Human Interface
o Help people relate to data structures and objects
• Qualitative Insights
o Provides information to use data appropriately
• Catalogs Provide Some Analytics, but are More Metadata-Layer Focused
o Despite the marketing, catalogs are advanced metadata tools more so than they are data analytics tools
o They just happen to be frequently used by Data Scientists
Building Quantitative Culture
• Engaging the Business
o If we build it…they won’t even notice
o If we build it badly…then we will get the wrong kind of attention
o If others see their own likely success in using it…then we have hope
• Data Value is All that Matters
o Throw away the catalog if it doesn’t foster Data Value
o If data is used without creating value, stop using the data
• Encourage Change
o Without change, data is simply added costs
Implementation Strategy
• Pick a Style
o Big Bang
o Little Engine That Could
o Grow Roots, then Tree
• How to Decide?
o Data Value. It’s always Data Value.
• Who to Decide?
o Bad English, important idea
o The wrong “who” group will cause problems
• How to Decide Who?
o Right! Data Value! Great job!
Projects
• Installing a Data Catalog Is Pretty Easy
o Security challenges
o Technical challenges
o Permissions and governance considerations
• Data Catalog Implementations are Tricky
o Rarely anybody’s full-time job
o Stakeholders across the organization
o Outcomes are NOT immediate
o Sometimes our strategy isn’t perfect
Integration
• Technical Integration
o Yes, do that. That’s not what really matters.
o But also don’t mess it up.
• Integration With Business Processes
o Data Catalogs must increase efficiency
o People must know how to use it, actually use it, and derive measurable benefit from the use
o Breakdowns in the above will ruin everything
Catalog-Supported Governance
• Data Governance is Usually a Hot Mess
• Why Do We Have Such Trouble?
o Too process-focused
o Or even worse, data-focused (Really? Yes, really.)
o We must be business-outcome focused
• Data Catalogs CAN Help
o Capable, flexible, able to facilitate workflows: a lot to like!
o …but if our Data Governance is poor, we will be amplifying noise
Summary
• Data Catalogs are Worth Being Excited About
o They unite a heckuva lot of things that often get orphaned
• But Don’t Believe the Hype
o Different Data Catalog providers will market different approaches
o And software companies don’t always fully understand the applications of what they’ve built
o It’s your tool and your business; you can do what you want with them both
Recall: If You Remember Nothing Else
• Data Catalogs are like Black Holes:
o Massive, built from many pieces, and difficult to put in a box
• Data Catalogs are a Tool, Not a Solution
o They consolidate many data management functions
o Data Catalogs need strong process and people
• Data Value is the Most Important Thing
o This fundamental Data Leadership principle connects Data Catalogs to what matters most to the business
Thank You for Attending [email protected]
630-403-8348 algmin.com
Abstract
Data Governance is difficult. It takes sustained efforts coordinated across many stakeholders, all moving towards a shared vision. It’s a marathon! Too often, Data Governance efforts fall down in the last mile of that journey by failing to connect to the rest of the business in a meaningful way. This is not because what Data Governance does is unimportant. It is because the outcomes of Data Governance are often hard to find, even for the folks who know to look for them, and most people don’t!
Enter the Data Catalog. This is where many of the artifacts of Data Governance activities can connect to the rest of the business. The Data Catalog provides a clearinghouse of insights about data, including definitions, lineage, owners, appropriate use, restrictions, and all of the technical metadata you can shake a stick at! This crucial information hub should become a well-integrated complement to the operations of your organization, and if it isn’t today, come to this tutorial to learn how to make it so! Key points include:
• Establishing a Data Catalog as the foundation of Data Governance efforts
• How a Data Catalog fits into your existing data architecture
• Which capabilities matter most in the early days
• Establishing a beachhead to build from
• Taking it to the streets: gaining broad adoption of your Data Catalog
• Creating long-term sustainability in the enhancement, support, and usage of your Data Catalog
Appendix
Sustaining Long-Term
Developing a Business Case
• Recall Data Value
o Any successful business case is going to need it
• Know How Your Company Makes Investments
o Develop a pitch that will resonate
• Market and Sell It
o Good ideas fail all the time
o People are busy, distracted, and sometimes lazy
• A Lesson from the Underpants Gnomes:
o https://www.youtube.com/watch?v=3zc4bGkU05o
Early Days
• A Hard Push to Critical Mass
o This is not to say “quantity” over “quality”, but scale is necessary
o Load the information that will be used most
o Do not try to have “perfection” or “completeness”
o An MVP-like software development approach is appropriate
• Track Usage or Don’t Even Bother
o Important to build success stories in addition to actual success
o Without metrics, people will forget how helpful the catalog is
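Tracking usage does not need to be elaborate to be useful. The sketch below assumes the catalog can export an access log of (day, user, asset) records; that log format, the `usage_summary` function, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date


def usage_summary(events):
    """Summarize (day, user, asset) tuples from a hypothetical catalog access log."""
    events = list(events)
    return {
        "lookups": len(events),
        "active_users": len({user for _, user, _ in events}),
        "top_assets": Counter(asset for _, _, asset in events).most_common(2),
    }


# Illustrative log entries only.
log = [
    (date(2019, 10, 1), "ana", "customer_master"),
    (date(2019, 10, 1), "raj", "orders_clean"),
    (date(2019, 10, 2), "ana", "orders_clean"),
    (date(2019, 10, 3), "lee", "orders_clean"),
]
print(usage_summary(log))
```

Numbers like these are the raw material for the success stories: who used the catalog, how often, and for which assets.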
Playing Politics
• Driving Change is Threatening to Some
o Some folks may resist, which is fine and expected
o Encourage people to participate in ways that benefit them
• Resist “Data Catalog as a Passive Reference Tool”
o One more poorly-named data management concept
o But…if your data scientists are demanding it as a reference tool, and they have the ear of the money people, go with it
• Connect the Dots for People
o Even with metrics, you need to sell the results
Remaining Relevant
• Easy to Peak Early and Crash
o Sounds like Data Governance
• We Must Know How the Data Catalog Creates Data Value
o This is the only way to avoid stagnation
• Quantify
o The importance of this increases as excitement wanes
o Excitement WILL wane
Motivating Further Investment
• Start with Proof-of-Concept Level Commitment
o Small asks, prove viability
o A Data Catalog should be able to provide positive value at any scale
• Ask: “What’s the Burning Platform?”
o Just “trying our doggone best to be better with data” is a lousy foundation
o Something that will keep the business viable, people out-of-jail, executives keeping their cushy jobs: these are the kinds of motivators to find
• Remember this General Rule:
o Reducing Costs < Increasing Revenue < Risk Management