Best Practices for the Data Lake
Who is using it and how can you get the most out of it?
© Attunity, Hortonworks, and Teradata
Agenda
We’ll discuss the research findings, including:
• The two biggest Data Lake adoption issues
• Common use cases for Data Lakes
• Rethinking how data is used across an organization
• Data Lake best practices and pitfalls
• How business leaders gain C-level buy-in for Data Lake projects
Survey Demographics
[Chart: respondents by role (IT/Business Manager, CXO/Executive, Academic) and by company size (Large Enterprise, Midsize, SMB)]
Source: Data Lake Adoption and Maturity Survey Findings Report
High-level findings
• The Data Lake is increasingly recognized as part of a data strategy
• Clear early use cases exist for the Data Lake
• Governance and security are still top of mind, both as challenges and as success factors for the Data Lake
What is your familiarity with the term “Data Lake”?
[Chart: percentage of respondents selecting each answer: Currently researching and learning about it; Actively involved with it; Heard of it, but don't know what it is; Have not heard of it]
Source: Data Lake Adoption and Maturity Survey Findings Report
What is a Data Lake?
A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, and from which multiple downstream facilities may draw.
[Figure: data sources (sensors, email, transactions, machine logs, geolocation, media) feed the Data Lake, which in turn serves downstream consumers (BI tools, IDW, data marts, analysis, apps, other)]
Value from Data Lakes
• New insights from unknown or underappreciated data
• New forms of analytics
• Expanded corporate memory retention
• Data integration optimization
Data Manufacturing
[Figure: three workloads, Data R&D, Data Lake, and Data Products]
Data Manufacturing: Logical View of Workloads
DATA R&D
• Goal: analytic agility, flexibility
• Exploratory tools, algorithms, skills
• Finding new high-value questions
• Light governance, no SLAs
• Data scientists, data miners
DATA LAKE
• Goal: original raw data at low cost
• Refinery feeds data R&D and data products
• Medium governance, SLAs
• Low business value density
• Programmers and data scientists
DATA PRODUCTS
• Goal: consumable analytic results
• Integrated, cleansed, with metadata
• High governance, SLAs, cost
• High business value density
• Shared by many users, roles, skills
Data Lake Architecture
[Figure: layered Data Lake architecture]
• Sources: sensors, social, telemetry, mobile, tabular data, machine logs
• Data Lake: acquisition, preparation, and access layers over distributed storage (functions shown include message queues, streams, feeds, cleansing, aggregations, experiments, search, access, and governance), with security, metadata/lineage, and administration spanning the layers
• Analytic tools & apps: math and stats, data mining, business intelligence, applications, languages, marketing
• Users: marketing, executives, operational systems, frontline workers, customers, partners, engineers, data scientists, business analysts
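To make the acquisition, preparation, and metadata/lineage functions above more concrete, here is a minimal, hypothetical sketch in Python. It assumes a local directory stands in for distributed storage, with a `raw/` zone that keeps source data exactly as received, a `refined/` zone produced by a cleansing step, and a `lineage.jsonl` log that records where each file came from. The zone names, the class `MiniLake`, and the cleansing rule (keep only valid JSON lines) are illustrative assumptions, not part of the survey or of any vendor product; security and governance are not modeled.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class MiniLake:
    """Toy data lake on a local directory (hypothetical layout, not a real product)."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.raw = self.root / "raw"              # acquisition: original data, unchanged
        self.refined = self.root / "refined"      # preparation: cleansed copies
        self.lineage = self.root / "lineage.jsonl"  # metadata/lineage log, one JSON entry per line
        for zone in (self.raw, self.refined):
            zone.mkdir(parents=True, exist_ok=True)

    def _record_lineage(self, target: Path, sources: list, step: str) -> None:
        # Append an entry linking the written file to the files it was derived from.
        entry = {
            "target": str(target.relative_to(self.root)),
            "sources": [str(s.relative_to(self.root)) for s in sources],
            "step": step,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        with self.lineage.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def ingest(self, source: str, payload: bytes) -> Path:
        """Store raw bytes as-is under raw/<source>/, named by a content hash."""
        digest = hashlib.sha256(payload).hexdigest()[:16]
        path = self.raw / source / f"{digest}.raw"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
        self._record_lineage(path, [], "ingest")
        return path

    def cleanse(self, raw_path: Path) -> Path:
        """Hypothetical cleansing rule: keep non-empty lines that parse as JSON."""
        rows = []
        for line in raw_path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # a real pipeline might quarantine bad records instead of dropping them
        out = self.refined / raw_path.parent.name / (raw_path.stem + ".json")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(rows), encoding="utf-8")
        self._record_lineage(out, [raw_path], "cleanse")
        return out

    def read_lineage(self) -> list:
        """Access: let consumers trace refined data back to its raw source."""
        if not self.lineage.exists():
            return []
        with self.lineage.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = MiniLake(Path(tmp))
        raw = lake.ingest("sensors", b'{"id": 1, "temp": 21.5}\nnot json\n\n{"id": 2, "temp": 22.0}\n')
        refined = lake.cleanse(raw)
        print(json.loads(refined.read_text(encoding="utf-8")))
        print([e["step"] for e in lake.read_lineage()])
```

The design choice worth noting is that the raw zone is never modified: cleansing writes a new file and logs which raw file it came from, so refined data can always be traced back to, and rebuilt from, the original.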
Do you have budget for a Data Lake initiative?
[Chart: percentage of respondents selecting each answer: Have an approved budget; Have submitted a budget; Still researching; Already have an initiative]
Source: Data Lake Adoption and Maturity Survey Findings Report
What use cases are you primarily using Hadoop clusters for currently?
[Chart: percentage of respondents selecting each use case: Data discovery/Data Science/Big Data; Real-time analytics/Operationalized insights; Decentralized data acquisition or staging for other systems; Offloading data from other systems]
Source: Data Lake Adoption and Maturity Survey Findings Report
What are the key challenges you have experienced in making the Data Lake concept a reality?
[Chart: percentage of respondents selecting each challenge: Governance; Metadata; Security; End-user skills; Ingest]
Source: Data Lake Adoption and Maturity Survey Findings Report
What are the obstacles for your company in achieving Data Lake goals?
[Chart: percentage of respondents selecting each obstacle: Lack of an agreed-upon definition and strategy; Budget-oriented challenges; Data integration challenges; Organizational challenges; Technology challenges]
Source: Data Lake Adoption and Maturity Survey Findings Report
Survey summary
• The Data Lake is a viable component of a data strategy
• Companies of all sizes are interested in Data Lakes
• Critical success factors (Hadoop skill sets, budget, and data integration) will persist as adoption increases
• Data Lake maturity will increase as additional use cases emerge
Download the research today
To get your own copy of the “Data Lake Adoption and Maturity Survey Findings Report”, click here.
Thanks!
hortonworks.com
To view the recorded version of this webinar, click here.