big data with nosql, hadoop, spark, and kafka – couchbase connect 2016


Upload: couchbase

Post on 15-Feb-2017


TRANSCRIPT

©2016 Couchbase Inc.

The Couchbase Connect16 mobile app: Take our in-app survey!

Big Data with NoSQL, Hadoop, Spark and Kafka

Will Gardella, Director of Product Management


Will Gardella
Director of Product Management
[email protected]

@WillGardella


Agenda

• The Big Data Big Picture

• Spark & Hadoop

• Kafka

• Couchbase Analytics (Sneak Peek)


Where does “big” data come from?

• Mobile

• Web / Cloud

• Internet of Things


Couchbase is addressing the requirements of Digital Economy businesses


Spark & Hadoop


NoSQL versus Hadoop

• Overlap: NoSQL or Hadoop?

• Complement: NoSQL and Hadoop.


Big Data at a Glance

                      Couchbase                  Spark                                 Hadoop
Category              OPERATIONAL                ANALYTICAL                            ANALYTICAL
Use cases             Operational; Web / Mobile  Analytics; Machine Learning           Analytics; Machine Learning
Processing mode       Online; Ad Hoc             Ad Hoc; Batch; Streaming (+/-)        Batch; Ad Hoc (+/-)
Low latency =         < 1 ms ops                 Seconds                               Seconds
Performance           Highly predictable         Variable                              Variable
Users are typically…  Millions of customers      100’s of analysts or data scientists  100’s of analysts or data scientists
                      Memory-centric             Memory-centric                        Disk-centric
Big data =            10s of Terabytes           Petabytes                             Petabytes


Couchbase + Spark use cases

Operations

§ Catalog
§ Customer 360 + IoT
§ Personalization
§ Mobile applications

Analysis

§ Recommendations
§ Next gen data warehousing
§ Predictive analytics
§ Fraud detection


Use Case 1: Operationalize Analytics / ML

Examples: recommend content and products, spot fraud or spam

• Data scientists train machine learning models

• Load results into Couchbase so end users can interact with them online

[Diagram labels: Hadoop, Machine Learning Models, Data Warehouse, Historical Data]
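The two bullets above (train offline, then load results for online use) can be sketched roughly as follows. This is a minimal illustration only: the purchase history, the co-occurrence "model", the `recs::` key prefix, and the in-memory `KeyValueStore` class are all hypothetical stand-ins, not Couchbase or Hadoop APIs.

```python
from collections import Counter, defaultdict

# Hypothetical purchase history, standing in for historical data kept offline.
HISTORY = [
    ("alice", "book"), ("alice", "lamp"),
    ("bob", "book"), ("bob", "pen"),
    ("carol", "lamp"), ("carol", "pen"), ("carol", "book"),
]


def train_recommendations(history, top_n=2):
    """Offline step: score items owned by users who share at least one item."""
    items_by_user = defaultdict(set)
    for user, item in history:
        items_by_user[user].add(item)

    results = {}
    for user, owned in items_by_user.items():
        scores = Counter()
        for other, other_items in items_by_user.items():
            if other != user and owned & other_items:
                # Count only items the user does not already own.
                scores.update(other_items - owned)
        # Highest score first; ties broken by name so output is deterministic.
        ranked = sorted(scores, key=lambda item: (-scores[item], item))
        results[user] = ranked[:top_n]
    return results


class KeyValueStore:
    """In-memory stand-in for an operational key-value database (assumption)."""

    def __init__(self):
        self._docs = {}

    def upsert(self, key, doc):
        self._docs[key] = doc

    def get(self, key, default=None):
        return self._docs.get(key, default)


def load_results(store, results):
    """Load step: write one precomputed document per user for online lookup."""
    for user, items in results.items():
        store.upsert(f"recs::{user}", {"user": user, "items": items})


store = KeyValueStore()
load_results(store, train_recommendations(HISTORY))
```

At request time the application only performs a key lookup such as `store.get("recs::alice")`; the expensive computation has already happened offline.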


Use Case 2: Spark connects to everything

DCP, KV, N1QL, Views

Adapted from: Databricks – Not Your Father’s Database https://www.brighttalk.com/webcast/12891/196891


Lambda Architecture

[Diagram labels]

1 DATA: New Data Stream

2 BATCH: Batch Layer (All Data, Precompute Views (Map Reduce), Batch Recompute)

3 SPEED: Speed Layer (Process Stream, Incremental Views, Real-Time Increment)

4 SERVE: Serving Layer

5 QUERY: Analysis
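As a rough illustration of how the batch and speed layers in this diagram fit together, the sketch below counts page views. The event shape, the `SpeedLayer` class, and the `query` function are hypothetical names chosen for this example; a real deployment would use Hadoop or Spark for the batch layer and a stream processor for the speed layer.

```python
from collections import Counter


def batch_recompute(all_data):
    """Batch layer: recompute the whole view from the complete data set."""
    return Counter(event["page"] for event in all_data)


class SpeedLayer:
    """Speed layer: incrementally updated view of events since the last batch run."""

    def __init__(self):
        self.view = Counter()

    def process(self, event):
        self.view[event["page"]] += 1

    def reset(self):
        # Called once a new batch view covers the events seen so far.
        self.view.clear()


def query(batch_view, speed_view, page):
    """Serving layer: answer queries by merging the batch and real-time views."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)


# Data already covered by the last batch run.
all_data = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
batch_view = batch_recompute(all_data)

# New events go to both the master data set and the speed layer.
speed = SpeedLayer()
for event in [{"page": "home"}, {"page": "checkout"}]:
    all_data.append(event)
    speed.process(event)
```

Before the next batch run, `query(batch_view, speed.view, "home")` combines two batch hits with one real-time hit. After the batch layer recomputes over `all_data`, the speed view can be reset and the answer stays the same.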


Database Change Protocol (DCP)

• Innovative protocol for data sync in Couchbase Server
  • Efficient data sync, memory to memory
  • Removes slower disk I/O from the data sync path
  • Improves replication latency for data durability

• Powers data replication & XDCR for HA / DR, maintains indexes, and more

• Big data connectors use this as a fast sync mechanism
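The idea of one in-memory change stream feeding several consumers (replicas, indexes, connectors) can be pictured with a toy model. To be clear, this is not the DCP wire protocol or any Couchbase API: `ChangeFeed`, `Consumer`, and the single global sequence number are simplifying assumptions made for illustration.

```python
class ChangeFeed:
    """Toy in-memory change log: every mutation gets the next sequence number."""

    def __init__(self):
        self._log = []  # entries are (seqno, key, value); value None means deletion

    def mutate(self, key, value):
        seqno = len(self._log) + 1
        self._log.append((seqno, key, value))
        return seqno

    def stream_from(self, last_seqno):
        """Return all changes after the sequence number a consumer last saw."""
        return self._log[last_seqno:]


class Consumer:
    """Toy consumer (for example a replica) that tracks its own position."""

    def __init__(self):
        self.last_seqno = 0
        self.state = {}

    def catch_up(self, feed):
        for seqno, key, value in feed.stream_from(self.last_seqno):
            if value is None:
                self.state.pop(key, None)
            else:
                self.state[key] = value
            self.last_seqno = seqno


feed = ChangeFeed()
replica = Consumer()

feed.mutate("a", 1)
feed.mutate("b", 2)
replica.catch_up(feed)       # replica now holds a and b

feed.mutate("a", None)       # deletion
feed.mutate("c", 3)
replica.catch_up(feed)       # only the two new changes are streamed

late_joiner = Consumer()
late_joiner.catch_up(feed)   # a new consumer replays from the start
```

Because each consumer remembers only its last sequence number, the same feed can serve any number of consumers without writing to disk in this model.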


Couchbase Spark Connector 2.0

• Spark 2 support
  • Structured streaming
  • New Databricks cloud analytics support

• Efficiency
  • Improved DCP handling: memory allocation creates less garbage

• Easier management
  • Tolerates Couchbase cluster topology changes (e.g. add nodes & rebalance)
  • …except rollbacks


Couchbase Spark 2.0 Connector

Features

• Automatic cluster & resource management
• Create RDDs from KV, N1QL, Views
• Create DStreams from DCP feeds
• Persist RDDs and DStreams
• Support SparkSQL, Datasets, DataFrames, and Structured Streaming


Kafka


You might need Kafka if…

Photo Credit: Cory Doctorow https://www.flickr.com/photos/doctorow/14638938602


Kafka as an industrial data sharing “backbone”

[Figure: Before Kafka vs. After Kafka]


Couchbase & Kafka Use Cases

• Couchbase as the Master Database
  • Changes in the bucket update data elsewhere

• Triggers / Event Handling
  • Handle events like deletions / expirations externally
  • E.g. expiration & replicated session tokens

• Real-time Data Integration
  • Extract from Couchbase, transform and load data into another system

• Real-time Data Processing
  • Extract from a bucket, process in real-time and load back to another Couchbase bucket
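The trigger / event-handling use case above (react externally when session documents expire) might look roughly like this. The event dictionaries, the `session::` key convention, the topic name, and the `InMemoryBroker` class are all assumptions for illustration; no real Kafka client or Couchbase connector API is used.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for a message broker: maps a topic name to a list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)


def keep_session_expirations(event):
    """Filter: pass only expirations of session documents."""
    return event["type"] == "expiration" and event["key"].startswith("session::")


def to_logout_message(event):
    """Transform: turn a raw change event into a message other systems understand."""
    return {"user": event["key"].split("::", 1)[1], "action": "logout"}


def forward(events, broker, topic, event_filter, transform):
    """Stream events through the filter and transform, then publish them."""
    sent = 0
    for event in events:
        if event_filter(event):
            broker.publish(topic, transform(event))
            sent += 1
    return sent


# Hypothetical change events as they might arrive from a bucket.
EVENTS = [
    {"type": "mutation", "key": "user::ann"},
    {"type": "expiration", "key": "session::ann"},
    {"type": "deletion", "key": "cart::42"},
    {"type": "expiration", "key": "session::raj"},
]

broker = InMemoryBroker()
sent = forward(EVENTS, broker, "session-logouts",
               keep_session_expirations, to_logout_message)
```

Keeping the filter and the transform as separate functions lets the same forwarding loop serve the other use cases on this slide by swapping in different functions.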


Couchbase Kafka Connector 3.0 (DP now – GA Q4 2016)

Available Now: 2.0 GA

• Kafka Producer or Consumer

• Stream events

• Filters

• Transform events

• Sample Producer & Consumer

• Improved DCP – less garbage collection, more memory efficient

Code: https://github.com/couchbase/couchbase-kafka-connector/

3.0 (DP now - GA Q4 2016)

§ Adopts Kafka Connect (Apache Kafka 0.9+)
§ Dynamic topology support / rebalance

Future

§ Rollback handling


New in Apache Kafka 0.9

• One service to manage

• Unified connector config, control, monitoring, metrics

• Easy to set up as a self-service system for developers, ETL team

• Confluent dashboards visualize the complete data pipeline


Lambda + Hadoop + Spark + Storm + Kafka

[Diagram labels]

New Data Stream

Batch Layer: All Data, Precompute Views (Map Reduce), Batch Recompute

Speed Layer: Process Stream, Incremental Views, Real-Time Increment

Serving Layer: Merge, Merged View


Analytics


Sneak Peek: Couchbase Analytics (DP1)


One stop shopping for both operations and analytics

Couchbase Query

• Optimized for operational (narrow) queries
• Many queries
• Each touches a little data

Couchbase Analytics

• Optimized for analytical (big) queries
• Fewer queries
• Each touches a lot of data


What is Couchbase Analytics?

• Extend Couchbase Platform to power real-time analytics

• Ad-hoc queries (“Ask me anything!”)

• Workload isolation

• Independent scaling

• Common programming model & data model

• Unified management

• Fast data synchronization

Data, Query, Index, Search, Analytics, Transport

Unified Administration

Unified Declarative Programming Interface


Couchbase Analytics and friends

[Diagram: a spectrum running from Operations to Analytics]

• Online: “Hurry! The user is waiting!”
• Batch: “Better cache this in Couchbase…”
• Engines, left to right: Key Value, CB Query, CB Analytics, Spark, Hadoop
• Latency scale: μs, ms, 30s, Minutes+
• Data scale: 1 record to trillions of records
• Other labels: Start up overhead, Job-based, Parallel query


Thank You!


Share your opinion on Couchbase

1. Go here: http://gtnr.it/2eRxYWn

2. Create a profile

3. Provide feedback (~15 minutes)