Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Upload: cask-data-inc

Post on 13-Apr-2017

Category: Software


TRANSCRIPT

Page 1: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP
Page 2: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

PROPRIETARY & CONFIDENTIAL

Why Cask?

@jgrayla

Page 3: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

SIMPLE ACCESS TO POWERFUL TECHNOLOGY

Cask’s goal is to enable every developer and enterprise to quickly and easily build and run modern data applications using open source big data technologies like Hadoop.

Page 4: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Introduction to Data Applications

Data Applications, also known as Operational Analytics, integrate analytics into applications in order to drive actionable intelligence, turning insights into action.

Apps that utilize data insights to enhance the customer experience, achieve a business objective, improve a business process or enable new products or lines of business.

Page 5: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

The Path to Data Apps

ENTERPRISE DATA LAKES

• Batch and Realtime Data Ingestion: Any type of data from any type of source in any volume
• Batch and Streaming ETL: Code-free self-service creation and management of pipelines
• SQL Exploration and Data Science: All data is automatically accessible via SQL and client SDKs
• Data as a Service: Easily expose generic or custom REST APIs on any data

BIG DATA ANALYTICS

• 360° Customer View: Integrate data from any source and expose through queries and APIs
• Realtime Dashboards: Perform realtime OLAP aggregations and serve them through REST APIs
• Time Series Analysis: Store, process and serve massive volumes of time-series data
• Realtime Log Analytics: Ingestion and processing of high-throughput streaming log events

PRODUCTION DATA APPS

• Recommendation Engines: Build models in batch using historical data and serve them in realtime
• Anomaly Detection Systems: Process streaming events and predictably compare them in realtime to historical data
• NRT Event Monitoring: Reliably monitor large streams of data and perform defined actions within a specified time
• Internet of Things: Ingestion, storage and processing of events that is highly available, scalable and consistent
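As an illustration of the Anomaly Detection Systems use case listed above, the following is a minimal sketch of comparing streaming values against historical data. It is not CDAP code: the function names, the z-score rule and the threshold of 3.0 are assumptions chosen for the example.

```python
import math
from typing import Iterable, List, Tuple


def historical_stats(values: List[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of past values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def find_anomalies(history: List[float],
                   stream: Iterable[float],
                   threshold: float = 3.0) -> List[float]:
    """Flag streamed values whose z-score against the history exceeds threshold.

    The z-score rule and the default threshold are illustrative assumptions.
    """
    mean, std = historical_stats(history)
    anomalies = []
    for value in stream:
        if std == 0:
            # A flat history: any value that differs from it is unusual.
            is_anomaly = value != mean
        else:
            is_anomaly = abs(value - mean) / std > threshold
        if is_anomaly:
            anomalies.append(value)
    return anomalies


if __name__ == "__main__":
    history = [10, 12, 11, 9, 10, 11, 10, 9]
    print(find_anomalies(history, [10, 11, 25, 9, 0]))
```

In a production system the historical statistics would typically be built in batch and looked up while events stream in; here both live in memory to keep the sketch self-contained.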

Page 6: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data App Customer Examples

• E-Commerce: Mature company with existing dependence on SaaS services and legacy apps, moving to a Data Lake and custom Apps
• Enterprise: Large global enterprise with multiple Lakes and Apps attempting to centralize platform and extend to LoB
• SaaS: Technically advanced company with product teams focused on minimizing the time-to-market for Apps

Page 7: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data App in SaaS

Deep Technical Talent with Product Focus

Reduce time-to-market

NRT Event Monitoring

High-throughput, real-time event ingestion

Consistent processing of events with persistence

Scalable and guaranteed NRT event processing

Page 8: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data App in E-Commerce

Unfamiliar Technical Talent with Shift to DIY

Address skill gaps

Consumer Intelligence

Ingestion of multiple customer data sources

Real-time analytics and machine learning

Targeting and optimization for consumer web

Page 9: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data App in the Enterprise

Highly variable talent with organizational gaps and executive pressure

Provide reference architecture and abstractions

Enterprise Data Hub

Centralized data storage & management in Data Lakes

Tools and APIs that extend data into lines of business

Multi-tenant PaaS for governance and accessibility

Page 10: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Big Data Application Example

Page 11: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Big Data Technology Explosion

2006: Core Hadoop (HDFS, MR)

2008: HBase, ZooKeeper, Core Hadoop

2009: Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

2010: Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

2011: Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

2012: Spark, Impala, Solr, Kafka, Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

Present: Sentry, Tez, Parquet, Ranger, Knox, Spark, YARN, Impala, Solr, Kafka, Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

Hadoop Alone = 47+ projects across 6 distros — Merv Adrian, Gartner Analyst

Page 12: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data App Challenges

E-Commerce

• Emphasis and time spent drastically skewed towards ops and integration logic
• Difficult for existing Java devs to use APIs like HBase and build apps without dev lifecycle tools
• Significant time and effort to bring applications from development into production

Enterprise

• Lack of standards and best practices leads to divergent architectures across org
• Gaps between IT and LoB make it difficult to cooperate and separate areas of concern
• Multiple OSS projects and Hadoop distributions make ops and governance a challenge

SaaS

• Real-time, scale-out data ingestion into HDFS and HBase means building infrastructure
• Difficult to provide guarantees such as high-availability and data consistency
• Tight coupling of ingestion pipeline and processing pipeline creates operational complexity

Page 13: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

The Power of Abstraction

Hadoop is a diverse collection of open source infrastructure projects, and the universe of open source infrastructure continues to expand and accelerate.

This stuff is already too hard.

There is a dire need for abstraction, to provide standardization and encapsulation, to enable accessibility, flexibility and reusability.

Page 14: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Analogy of Hadoop to RDBMS

[Chart: Market Size by year, 1970 to 2030. Annotations: RDBMS Created; BI Market, $30-$40bn. Layers: Infrastructure, Middleware & Management, Applications. Value accrual over time, from $ to $$$.]

Page 15: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Analogy of Hadoop to RDBMS

[Chart: Market Maturity by year, 2007 to 2016. Annotations: Hadoop Created; Pioneers Adopt and extend Hadoop; Early Adopters Adopt; Fast Followers.]

Still early days…

Page 16: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

CASK DATA APPLICATION PLATFORM

Integrated Framework for Building and Running Data Applications on Hadoop

• Integrates the Latest Big Data Technologies
• Supports All Major Hadoop Distributions
• Fully Open Source and Highly Extensible

Page 17: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Key Features

CASK DATA APPLICATION PLATFORM

• Infrastructure INTEGRATION: Provide an integrated product experience with out-of-the-box capabilities
• Architecture STANDARDS: Define a reference architecture to standardize support for mixed infrastructure
• Programming ABSTRACTIONS: Utilize abstraction layers to encapsulate complex patterns and insulate developers
• Production SERVICES: Provide development tools and runtime services to enable production apps and data

Page 18: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

• Application Container Architecture

• Reusable Programming Abstractions

• Global User and Machine Metadata

Applications

Programs: MapReduce, Spark, Tigon, Workflow, Service, Worker

Datasets: Table, Avro, Parquet, Timeseries, OLAP Cube, Geospatial, ObjectStore

Metadata is shown for Applications, Programs and Datasets.

Page 19: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Page 20: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Self-Service Ingestion and ETL for Hadoop Data Lakes

• Built for Production on CDAP
• Rich Drag-and-Drop User Interface
• Open Source & Highly Extensible

Page 21: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

• DISCOVER data using user and machine generated metadata
• INGEST any data from any source in real-time and batch
• BUILD drag-and-drop ETL/ELT pipelines that run on Hadoop
• EGRESS any data to any destination in real-time and batch

Page 22: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Data Apps on CDAP

E-Commerce

• CDAP is an integrated platform that accelerates onboarding and keeps focus on business logic
• Dataset Patterns and App Templates provide higher-level, domain-specific APIs
• Tools and runtime services enable a true dev lifecycle and faster time to production

Enterprise

• CDAP defines a set of standards and embeds best practices to drive a common architecture
• Tools, reusable abstractions and self-service interfaces enable non-IT users to access Hadoop
• CDAP provides portability across many versions of major distros and is going beyond Hadoop

SaaS

• CDAP Streams capability provides scale-out, highly available real-time data ingest
• Tephra transaction engine integrates with Datasets and Programs for data consistency
• Streams separate ingestion from processing and combine support for real-time, batch and SQL
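The idea of separating ingestion from processing can be sketched in a few lines. This is a conceptual illustration only and does not use or mirror the CDAP Streams API: the `EventStream` class, its methods and the consumer names are hypothetical. Writers only append to a log, and each consumer tracks its own read offset, so a real-time reader and a batch reader can process the same events independently.

```python
from typing import Dict, List


class EventStream:
    """Hypothetical append-only event log with per-consumer read offsets."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> None:
        """Ingest one event; writers never wait on any consumer."""
        self._events.append(event)

    def poll(self, consumer: str, max_events: int = 100) -> List[str]:
        """Return up to max_events unread events for this consumer."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        # Advance only this consumer's offset; others are unaffected.
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    stream = EventStream()
    for event in ["e1", "e2", "e3"]:
        stream.append(event)

    # A real-time consumer reads one event at a time.
    print(stream.poll("realtime", max_events=1))  # ['e1']
    # A batch consumer later reads everything it has not yet seen.
    print(stream.poll("batch"))  # ['e1', 'e2', 'e3']
```

Because the log is the only shared state, a slow or failed consumer does not block ingestion, which is the operational benefit the slide attributes to decoupling the two pipelines.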

Page 23: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Demo

Page 24: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

CDAP Community 100% Open Source (ASL2)

Website: http://cdap.io

Mailing List: [email protected] [email protected]

IRC: #cdap on freenode.net

CDAP Enterprise 100% Commercially Supported

Website: http://cask.co

Contact Sales: [email protected]

Contact Me: [email protected] or @jgrayla

Accelerate Your Big Data Journey

Tap In @ cask.co

Download Now or Learn More at cask.co

Page 25: Strata+Hadoop NY 2015: Use case examples of building applications on Hadoop with CDAP

Thank You!

Jonathan Gray

[email protected] @jgrayla

Questions?