Best Practices for the Data Lake: Who is using it and how can you get the most out of it?

Upload: attunity

Post on 13-Jan-2017

Category: Software



Page 1: Best Practices for the Data Lake

Best Practices for the Data Lake

Who is using it and how can you get the most out of it?

Page 2: Best Practices for the Data Lake

© Attunity, Hortonworks, and Teradata

Agenda

We’ll discuss the research findings, including:

• The two biggest Data Lake adoption issues
• Common use cases for Data Lakes
• Rethinking how data is used across an organization
• Data Lake best practices and pitfalls
• How business leaders gain C-level buy-in for Data Lake projects

Page 3: Best Practices for the Data Lake

Survey Demographics

Roles: IT/Business Manager, CXO/Executive, Academic

Company Size: Large Enterprise, Midsize, SMB

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 4: Best Practices for the Data Lake

High-level findings

• The Data Lake is increasingly recognized within a data strategy
• Clear early use cases exist for the Data Lake
• Governance and security are still top of mind as challenges and success factors for the Data Lake

Page 5: Best Practices for the Data Lake

What is your familiarity with the term “Data Lake”?

[Bar chart; axis: 0% to 60%. Response options:]

• Currently researching and learning about it
• Actively involved with it
• Heard of it, but don't know what it is
• Have not heard of it

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 6: Best Practices for the Data Lake

What is a Data Lake?

A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, from which multiple downstream facilities may draw.

[Diagram: Data sources feed the Data Lake, which feeds Downstream facilities]

Data sources: Sensors, email, Transactions, Machine logs, Geolocation, Media

Downstream: BI Tools, IDW, Data Marts, Analysis, Apps, Other

Page 7: Best Practices for the Data Lake

Value from Data Lakes

• New insights from unknown or underappreciated data

• New forms of analytics

• Expanded corporate memory retention

• Data integration optimization

Page 8: Best Practices for the Data Lake

Data Manufacturing

[Diagram labels:]

DATA R&D

DATA LAKE

DATA PRODUCTS

Page 9: Best Practices for the Data Lake

Data Manufacturing: Logical View of Workloads

DATA R&D

• Goal: analytic agility, flexibility
• Exploratory tools, algorithms, skills
• Finding new high value questions
• Light governance, no SLAs
• Data scientists, data miners

DATA LAKE

• Goal: original raw data at low cost
• Refinery feeds data R&D, data products
• Medium governance, SLAs
• Low business value density
• Programmers and data scientists

DATA PRODUCTS

• Goal: consumable analytic results
• Integrated, cleansed, + metadata
• High governance, SLAs, cost
• High business value density
• Shared by many users, roles, skills

Page 10: Best Practices for the Data Lake

Data Lake Architecture

[Diagram: sources feed the data lake, which serves analytic tools, apps, and users]

SOURCES: Sensors, email, Social, Telemetry, Mobile, Tabular Data, Machine logs

DATA LAKE

• Stages: Acquisition, Preparation, Access
• Components: Streams, Search, Aggregations, Msg. queues, Cleansing, Access, Experiments, Governance, Feeds
• Security, Metadata/Lineage, Administration
• Distributed Storage

ANALYTIC TOOLS & APPS: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing

USERS: Marketing Executives, Operational Systems, Frontline Workers, Customers/Partners, Engineers, Data Scientists, Business Analysts

Page 11: Best Practices for the Data Lake

Do you have budget for a Data Lake initiative?

[Bar chart; axis: 0% to 40%. Response options:]

• Have an approved budget
• Have submitted a budget
• Still researching
• Already have an initiative

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 12: Best Practices for the Data Lake

What use cases are you primarily using Hadoop clusters for currently?

[Bar chart; axis: 0% to 80%. Response options:]

• Data discovery/Data Science/Big Data
• Real-time analytics/Operationalized insights
• Decentralized data acquisition or staging for other systems
• Offloading data from other systems

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 13: Best Practices for the Data Lake

What are the key challenges you have experienced in making the Data Lake concept a reality?

[Bar chart; axis: 0% to 80%. Response options:]

• Governance
• Metadata
• Security
• End user skills
• Ingest

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 14: Best Practices for the Data Lake

What are the obstacles for your company in achieving Data Lake goals?

[Bar chart; axis: 0% to 50%. Response options:]

• Lack of agreed upon definition and strategy
• Budget-oriented challenge
• Data integration challenge
• Organizational challenges
• Technology challenges

Source: Data Lake Adoption and Maturity Survey Findings Report

Page 15: Best Practices for the Data Lake

Survey summary

• The Data Lake is a viable component of a data strategy
• Companies of all sizes are interested in Data Lakes
• Critical success factors (Hadoop skillsets, budget, and data integration) will persist as adoption increases
• Data Lake maturity will increase with additional use cases

Page 16: Best Practices for the Data Lake

Download the research today

To get your own copy of the “Data Lake Adoption and Maturity Survey Findings Report”, click here.

Page 17: Best Practices for the Data Lake

Thanks!

hortonworks.com

To view the recorded version of this webinar, click here.