linkedin links analysts to collaborate on analysis · 2016. 9. 13. · the analytics team at...
TRANSCRIPT
#TDPARTNERS16 #datacatalog GEORGIA WORLD CONGRESS CENTER
LinkedIn Links Analysts to Collaborate on Analysis
Rohit Jonnalagadda
Business Operations, LinkedIn
Stephanie McReynolds
VP of Marketing @Alation
A little about me & LinkedIn
2
• Investment banker turned data junkie
• Work as part of a cross-functional team to support our Marketing Solutions (advertising) business
• Expert in manipulating data with SQL but goal is always to deliver insights that actually drive business decisions
• Mix of numerous home-grown, open source, & procured products
• Offline analytics starts with Kafka → Hadoop
• Distributed team of users (“anyone” can learn SQL)
• Primary DW Environment
• Used by hundreds of data analysts across Finance, Operations, Product, HR, etc.
• (Most) ETL in Hadoop
• Built at LinkedIn
• Hundreds of Billions of Messages Routed Daily
• Petabytes of storage across the grid
• Writing > 75TB+ Daily
• Spread across 3 DC’s
• Thousands of Nodes
• Hive, Presto, Spark
Supported by a robust environment
3
The analytics team at LinkedIn
4
• Different types of “data discovery” happen in different teams of analysts
• All analysts need access to data, but individual workflows can be quite different
• Data catalogs are a central point of reference for all data consumers
1000s of Data Consumers
Executive Reporting: 10s
Business Ops: 100s
Ad Hoc Analysis: 1000s
• 3 data industry trends are driving Data Catalogs in the enterprise
• Challenges of linking analysts & data
• How data cataloging helps
• LinkedIn example
Linking Analysts to Collaborate
5
Trend #1: Data Proliferation
6
Data-driven organizations demand data proliferation:
• All new products released with new data structures
• A new data set every week
• Deeper and wider data is being produced than ever before
- Typical weblog has hundreds of attributes/columns
“Big” data’s challenge is human
7
Volume is not our challenge; the speed of analysis is
• Impossible for any one analyst to keep up with the continual stream of new data updates
• Documentation is often light by design
• Rough conclusions are easy, accurate insights are hard
Remember: insights come from analysis, not from keeping up with the data
Trend #2: Data Discovery
8
36% of end users are now preparing their own data
- Late-binding, discovery-oriented style of analysis wins over predictable, well-structured BI queries
Source: TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
What can be cataloged for re-use?
9
86% of organizations are looking for re-use options to make data prep efficient
- Data catalogs help immensely with re-use & consistency
Source: TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
Trend #3: Collaboration
10
Analysis has become a team sport
“According to data we have collected over the past two decades, the time spent by managers and employees in collaborative activities has ballooned by 50% or more.”
Source: Harvard Business Review, Collaborative Overload, January 2016
But Collaborative Overload is a Risk
11
Data on leaders across 20 organizations show that those regarded by colleagues as the best information sources & most desirable collaborators have the lowest career satisfaction.
Source: Harvard Business Review, Collaborative Overload, January 2016
Challenges of the new era of analysis
12
Data-driven orgs drive Data Proliferation
• A new product, a new dataset
• Every product launches new datasets
• Unboxing process is often one of discovery without documentation
More ad-hoc analysis challenges Human Productivity & System Performance
• Performance: cost of trying a query out for the first time
• Analysts & tools must be productive cross-system
Analysis is now a team sport where Collaborative Efficiency & Overload must be managed
• Effective collaboration requires some organizing structure/documentation
• Best analysts are overloaded/burnt out
• New analysts take 6 months to learn LinkedIn data
Scenario: Onboarding new users
13
Scenario: New employee needs to learn about our vast data footprint
• No single place to learn
• Unlike “source code”, queries are decentralized and live on a mix of desktops and servers
• Difficult to discern “source of truth” when questions can have multiple answers
• Need to come up to speed quickly due to rapid growth and constant product innovation
Step 1: Build an Inventory
14
• What data sources exist?
• What data is available?
• What do the columns mean?
• Where does the data come from (ETL, lineage)?
• What is sensitive/protected?
• What promises do we make to our users about their private data and how we can use it for advertising purposes?
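The inventory questions above can be pictured as fields on a catalog record. The following is only a minimal sketch in Python, not how Alation or LinkedIn store their inventory; the class names, the table name, and the column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    description: str = ""     # what the column means
    sensitive: bool = False   # sensitive/protected flag


@dataclass
class TableEntry:
    source: str               # which data source holds the table
    name: str
    upstream: List[str] = field(default_factory=list)    # lineage: where the data comes from
    columns: List[Column] = field(default_factory=list)

    def undocumented(self) -> List[str]:
        """Names of columns that still lack a description."""
        return [c.name for c in self.columns if not c.description]

    def sensitive_columns(self) -> List[str]:
        """Names of columns flagged as sensitive/protected."""
        return [c.name for c in self.columns if c.sensitive]


# Hypothetical entry; every name below is made up for illustration.
entry = TableEntry(
    source="warehouse",
    name="ad_clicks",
    upstream=["kafka.ad_click_events"],
    columns=[
        Column("member_id", "Member who clicked the ad", sensitive=True),
        Column("campaign_id", "Advertising campaign identifier"),
        Column("ts"),
    ],
)

print(entry.undocumented())
print(entry.sensitive_columns())
```

Keeping the sensitivity flag on the column rather than the table is one possible design choice: it lets a single table mix protected and unprotected fields.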
Step 2: Enrich the Catalog
15
An inventory without a sense of usage is not very informative; we need to know:
• Who used it?
• How was it used?
• Why was that data helpful?
Samples of common queries:
• What is the growth rate in Country X?
• How is our sales pipeline tracking for the quarter?
• What customers are at risk for churning?
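The "who used it" and "how was it used" questions can, in principle, be answered from a log of past queries. The sketch below is illustrative only: the log format, the analyst names, and the table names are assumptions, and the regular expression is a naive stand-in for a real SQL parser. The "why" usually still needs a human note.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log: (analyst, SQL text). All names are made up.
QUERY_LOG = [
    ("ana", "SELECT country, COUNT(*) FROM members GROUP BY country"),
    ("raj", "SELECT * FROM sales_pipeline WHERE quarter = 'Q3'"),
    ("ana", "SELECT m.id FROM members m JOIN churn_scores c ON m.id = c.member_id"),
]

# Naive pattern: the identifier right after FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def summarize_usage(log):
    """Return (query count per table, set of users per table)."""
    counts = Counter()
    users = defaultdict(set)
    for user, sql in log:
        # set() so a table named twice in one query counts once
        for table in set(TABLE_RE.findall(sql)):
            counts[table] += 1
            users[table].add(user)
    return counts, users


counts, users = summarize_usage(QUERY_LOG)
for table, n in counts.most_common():
    print(table, n, sorted(users[table]))
```

Counting queries per table gives a rough popularity signal, and the per-table user sets point a new analyst to the people who already know that data.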
Step 3: Support Human Adoption
16
• Training
• Support
• Adapting
Value of Alation for LinkedIn Analysts
17
Productivity:
• Collaboration: Teams around the world can quickly share insights with one another
ROI:
• Teams are spending more time disseminating knowledge and less time writing queries. This shortens product release cycles, drives faster deal closings, and increases overall productivity.
Benefits:
• Onboarding has been greatly simplified as Alation has generated an organic repository of up-to-date knowledge
Alation Delivers a GPS for Analysts
18
Data Catalog links data and analysts together for collaboration
• Automates the inventory
• Maintains a rich catalog based on actual analyst behaviors
• Reinforces best practices
- SmartSuggest recommendations
- Behavioral interventions for governance
- Monitors wide & deep usage
Table Explorer
19
Popularity Indicators
20
Data Profiling
21
Lineage
22
Articles to Collaborate on Definitions
23
Data Catalogs address complexity
24
A platform for efficient & effective human collaboration
• Proactive recommendations
• Inline documentation
• Details to navigate Data proliferation
- Table Explorer
- Data Profiling
- Interactive Query Editor
- Lineage
Find out more about Data Catalogs
25
alation.com/resources
• TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
Alation Booth #729
Thank You
26
Questions/Comments
Email: [email protected] & [email protected]
Join Us At
Alation Booth #729
Follow Us
Twitter: @slangenfeld @alation
Rate This Session #739 with the PARTNERS Mobile App
Remember To Share Your Virtual Passes