NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

NoSQL in a Hadoop world: Couchbase, Hadoop, Spark, Kafka and more Will Gardella | Director of Product Management, Couchbase

Upload: couchbase

Post on 23-Jan-2018

TRANSCRIPT

Page 1: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

NoSQL in a Hadoop world: Couchbase, Hadoop, Spark, Kafka and more

Will Gardella | Director of Product Management, Couchbase

Page 2: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

©2015 Couchbase Inc.

Agenda

Intro – NoSQL, Couchbase, and what’s new in 4.0

Analytics & Data Integration

The Big Data Big Picture

Hadoop, Spark, Kafka, and Storm

Page 3: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Intro – NoSQL, Couchbase, and what’s new in 4.0

Page 4: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Where does “big” data come from?

Page 5: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Where does “big” data come from?

Mobile

Web/Cloud

Internet of Things

Page 6: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

This is where Couchbase comes in…

NoSQL Data Management for a broad range of apps and use cases

High availability cache

Key-value store

Document database

Embedded database

Sync management

Couchbase Server

Couchbase Lite

Couchbase Sync Gateway

Page 7: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Couchbase meets today’s & tomorrow’s requirements

Flexible data model

Consistent performance at scale

High availability

Easy, affordable scalability

24x365

Page 8: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Oh, and Couchbase is efficient w/ Hardware…

http://googlecloudplatform.blogspot.com/2015/05/Couchbase-Server-Hits-One-Million-Writes-Per-Second-with-Just-50-Nodes-of-Google-Compute-Engine.html

1.1M writes/sec

1/6 the hardware of NoSQL competitor

3 Billion records
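
A quick back-of-the-envelope check of these figures, as a sketch only. The node count of 50 is taken from the title of the linked Google Cloud Platform post; the competitor node count is merely what the slide's "1/6 the hardware" ratio implies and is not a figure stated on the slide.

```python
# Figures from the slide and from the title of the linked post.
writes_per_sec = 1_100_000   # "1.1M writes/sec"
couchbase_nodes = 50         # "Just 50 Nodes" in the post title
hardware_ratio = 6           # "1/6 the hardware of NoSQL competitor"

# Average write throughput each node would have to sustain.
writes_per_node = writes_per_sec / couchbase_nodes

# Assumption: the ratio applies to node count; this is implied, not stated.
implied_competitor_nodes = couchbase_nodes * hardware_ratio

print(f"{writes_per_node:,.0f} writes/sec per node")
print(f"{implied_competitor_nodes} competitor nodes implied by the 1/6 ratio")
```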

Page 9: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Couchbase Server 4.0

Download now: www.couchbase.com/download

Multi-Dimensional Scaling

– Option to separate, isolate, and scale querying, indexing, and data as independent services

N1QL: SQL for JSON

– Powerful query language based on SQL and global secondary indexes with support for JOINs and more

New Storage Engine: ForestDB

– High performance storage engine engineered for multi-core processors and solid state drives
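
To make "SQL for JSON" with JOINs a little more concrete, here is a minimal sketch. The bucket names `orders` and `customers`, the document keys, and the field names are all hypothetical. The N1QL text is held as a plain string for comparison and is not sent to a server; the Python below it only mimics, in memory, what a key-based JOIN over JSON documents does.

```python
# Hypothetical N1QL statement, shown for comparison only; it is not executed.
N1QL_EXAMPLE = """
SELECT c.name, o.total
FROM orders o
JOIN customers c ON KEYS o.customer_id
WHERE o.total > 100
"""

# JSON documents keyed by document ID, as a document store would hold them.
customers = {
    "cust::1": {"name": "Ada"},
    "cust::2": {"name": "Grace"},
}
orders = {
    "order::1": {"customer_id": "cust::1", "total": 250},
    "order::2": {"customer_id": "cust::2", "total": 40},
    "order::3": {"customer_id": "cust::2", "total": 120},
}

def join_orders_to_customers(orders, customers, min_total):
    """Filter orders, then look up each order's customer by document key."""
    rows = []
    for order in orders.values():
        if order["total"] > min_total:
            customer = customers.get(order["customer_id"])
            if customer is not None:  # inner join: skip orders with no match
                rows.append({"name": customer["name"], "total": order["total"]})
    return rows

result = join_orders_to_customers(orders, customers, min_total=100)
print(result)
```

The point of the sketch is the shape of the operation: the join is resolved by looking up a document key stored in another document, rather than by scanning both collections.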

Page 10: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Analytics and Big Data Integration

Powered by N1QL

Page 11: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

N1QL – Enterprise Tool / Application Ecosystem

[Diagram: App, ETL, BI, and Visualization tools each connect through ODBC / JDBC to a cluster of Couchbase (CB) nodes.]

Standards-based drivers

Integrations, partnerships

Page 12: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

N1QL – Enterprise Tool / Application Ecosystem

4:35 PM - 5:20 PM

Introduction to BI with Couchbase Server Using Tableau, Informatica, Excel and More

Perry Krug, Couchbase

Page 13: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

The Big Data Big Picture

What’s Hadoop got to do with it?

Page 14: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

NoSQL versus Hadoop

Overlap: NoSQL or Hadoop?

Complement: NoSQL and Hadoop.

Page 15: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

What is Hadoop?

Hadoop

Process: MapReduce, Hive, …

Store: HDFS

Page 16: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Spark

Fast and general engine for big data processing with libraries for advanced analytics

Spark Core:

• task scheduling

• memory management

• fault recovery

• interacting with storage systems

Page 17: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Big Data at a Glance

                     | Couchbase                  | Spark                      | Hadoop (Hive)
Use cases            | Operational; Web / Mobile  | Analytics; Machine Learning | Analytics; Machine Learning
Processing mode      | Online; Ad Hoc (New!)      | Streaming; Ad Hoc; Batch   | Batch; Ad Hoc
Low latency =        | < 1ms ops                  | Seconds                    | Minutes
Performance          | Highly predictable         | Variable                   | Variable
Users are typically… | Millions of customers      | 100’s of analysts          | 100’s of analysts
                     | Memory-centric             | Memory-centric             | Disk-centric
Big data =           | 10s of Terabytes           | Petabytes(?)               | Petabytes

ANALYTICAL / OPERATIONAL

Page 18: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Why is Spark popular?

Compute engine for Hadoop & other platforms (e.g. SAP HANA)

Fast

– Claims to run up to 100x faster than MapReduce in memory, 10x faster on disk

Sophisticated

– can run most advanced algorithms

Easy to develop

– Well-designed APIs in Java, Scala, Python, and now R

– Supports SQL, DataFrames, and many other formats

– Interactive shell

Unified Lambda architecture (mostly)

– Same code for streaming and batch
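
The "same code for streaming and batch" idea can be sketched in plain Python. This is not the Spark API; the function names and the event data are hypothetical. It only illustrates the design point: one piece of business logic is applied once to a full data set in batch mode, and repeatedly to small micro-batches in streaming mode.

```python
from collections import Counter
from typing import Iterable, Iterator, List

def count_events(events: Iterable[str]) -> Counter:
    """Shared business logic: count occurrences of each event type."""
    return Counter(events)

def run_batch(all_events: List[str]) -> Counter:
    """Batch mode: apply the logic once to the full data set."""
    return count_events(all_events)

def run_streaming(micro_batches: Iterator[List[str]]) -> Counter:
    """Streaming mode: apply the same logic per micro-batch and accumulate."""
    running: Counter = Counter()
    for batch in micro_batches:
        running.update(count_events(batch))
    return running

events = ["click", "view", "click", "buy", "view", "click"]
batch_result = run_batch(events)
stream_result = run_streaming(iter([events[:2], events[2:4], events[4:]]))

# Both modes arrive at the same counts because they share count_events().
print(batch_result == stream_result)
```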

Page 19: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Couchbase: Full range of Connectors

Page 20: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Database Change Protocol (DCP)

Innovative protocol for data sync in Couchbase Server

– Increases data sync efficiency with massive data footprints

– Removes slower disk-IO from the data sync path

– Improves latencies to replication for data durability

Powers many critical functions

– Data replication

– XDCR (Cross Datacenter Replication) for HA / DR

– Maintains indexes

– Connectors
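
A rough mental model of a change feed that serves replication, indexing, and connectors from one in-memory stream is sketched below. This is not the DCP wire protocol or any Couchbase API; `ChangeFeed`, `Mutation`, and the sequence-number scheme are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List

@dataclass
class Mutation:
    seqno: int   # monotonically increasing sequence number
    key: str
    value: Any

class ChangeFeed:
    """Hypothetical in-memory change log; not the real DCP implementation."""

    def __init__(self) -> None:
        self._log: List[Mutation] = []

    def write(self, key: str, value: Any) -> int:
        seqno = len(self._log) + 1
        self._log.append(Mutation(seqno, key, value))
        return seqno

    def stream(self, since: int = 0) -> Iterator[Mutation]:
        """Yield mutations with seqno > since, straight from memory."""
        for m in self._log:
            if m.seqno > since:
                yield m

feed = ChangeFeed()
feed.write("user::1", {"name": "Ada"})
feed.write("user::2", {"name": "Grace"})

# Consumer 1: a replica that starts from scratch.
replica: Dict[str, Any] = {}
replica_seqno = 0
for m in feed.stream(since=replica_seqno):
    replica[m.key] = m.value
    replica_seqno = m.seqno

feed.write("user::1", {"name": "Ada L."})

# The replica resumes from its last seen seqno and gets only the new change.
catch_up = list(feed.stream(since=replica_seqno))
for m in catch_up:
    replica[m.key] = m.value
    replica_seqno = m.seqno

# Consumer 2: an index on the "name" field built from the same feed.
name_index: Dict[str, str] = {}
for m in feed.stream():
    name_index[m.key] = m.value["name"]
```

Because every consumer tracks its own position in the stream, a replica, an indexer, and an external connector can each read at their own pace without going through disk.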

Page 21: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Hadoop, Spark, Kafka, and Storm

Page 22: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Lambda Architecture

[Diagram: the Lambda Architecture in five numbered stages. 1 DATA: New Data Stream. 2 BATCH: Batch Layer (All Data, Precompute Views (MapReduce), Batch Recompute). 3 SPEED: Speed Layer (Process Stream, Incremental Views, Real-Time Increment). 4 SERVE: Serving Layer. 5 QUERY: Analysis.]

Page 23: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Lambda Architecture

[Diagram: Lambda Architecture in detail. Labels: New Data Stream; Batch Layer; All Data; Precompute Views (MapReduce); Batch Recompute; Batch Views; Partial Aggregate (x3); Speed Layer; Process Stream; Incremental Views; Real-Time Increment; Real-Time Data; Real-Time Views; Serving Layer; Merge; Merged View.]
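
The merge step in this architecture can be sketched in a few lines of plain Python. The data, the page-view counting use case, and the function names are hypothetical; the sketch only shows how a batch view and a real-time view combine into a merged view at query time, and how a batch recompute later absorbs the recent events.

```python
from collections import Counter

# Batch layer: a view precomputed over all data up to the last batch run.
all_data = ["page_a", "page_b", "page_a", "page_c"]
batch_view = Counter(all_data)

# Speed layer: an incremental view over events that arrived after that run.
new_events = ["page_a", "page_c", "page_c"]
realtime_view = Counter()
for event in new_events:
    realtime_view[event] += 1   # real-time increment

def merged_view(batch: Counter, realtime: Counter) -> Counter:
    """Serving layer: merge batch and real-time views at query time."""
    return batch + realtime

def batch_recompute(all_data, new_events):
    """Next batch run absorbs the recent events; the speed layer resets."""
    return Counter(all_data + new_events), Counter()

before = merged_view(batch_view, realtime_view)
next_batch_view, next_realtime_view = batch_recompute(all_data, new_events)
after = merged_view(next_batch_view, next_realtime_view)

# The answer to a query is the same before and after the batch recompute.
print(before == after)
```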

Page 24: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Lambda + Couchbase

[Diagram: Lambda Architecture with Couchbase. Labels: New Data Stream; Batch Layer; All Data; Precompute Views (MapReduce); Batch Recompute; Batch Views; Partial Aggregate (x3); Speed Layer; Process Stream; Incremental Views; Real-Time Increment; Real-Time Data; Real-Time Views; Serving Layer; Merge; Merged View.]

Page 25: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Lambda + Hadoop + Spark + Storm

[Diagram: Lambda Architecture with Hadoop, Spark, and Storm. Labels: New Data Stream; Batch Layer; All Data; Precompute Views (MapReduce); Batch Recompute; Batch Views; Partial Aggregate (x3); Speed Layer; Process Stream; Incremental Views; Real-Time Increment; Real-Time Data; Real-Time Views; Serving Layer; Merge; Merged View; Couchbase Hadoop Connector (Sqoop).]

Page 26: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Hadoop, Spark, and Storm

1:15 PM - 2:00 PM

Bank with Big Data - Data Science Use Cases in Finance with Hortonworks and Couchbase

Vamsi Chemitiganti, Hortonworks

2:45 PM - 3:30 PM (Developer Track)

Spark with Couchbase to Electrify Your Data Processing

Michael Nitschinger, Couchbase

Page 27: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

New: Couchbase Spark Connector

Available Now: Beta

Spark Core

– Create RDDs from Documents, Views and N1QL Queries

– Write RDDs into Couchbase

– Automatic cluster and resource management

Spark SQL

– DataFrames based on N1QL

Spark Streaming

– Persisting DStreams

– Experimental support: use DCP feeds to create Spark streams

github.com/couchbaselabs/couchbase-spark-connector

Page 28: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Kafka

Data broker w/ publish / subscribe system

Massively scalable, well decoupled

Messages queued until the recipient can retrieve them

https://github.com/couchbase/couchbase-kafka-connector
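
The publish / subscribe behaviour described on this slide can be sketched as a small in-memory model. This is not the Kafka client API or the Couchbase Kafka connector; the `Broker` class, the topic name, and the subscriber names are hypothetical. The sketch shows two properties: messages wait in the topic until a subscriber retrieves them, and each subscriber keeps its own read position, so publishers and subscribers stay decoupled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class Broker:
    """Hypothetical in-memory broker; not the Kafka client API."""

    def __init__(self) -> None:
        # Each topic is an append-only list of messages.
        self._topics: Dict[str, List[str]] = defaultdict(list)
        # Each (subscriber, topic) pair has its own read offset.
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        self._topics[topic].append(message)

    def poll(self, subscriber: str, topic: str) -> List[str]:
        """Return messages this subscriber has not yet seen, then advance."""
        key = (subscriber, topic)
        start = self._offsets[key]
        messages = self._topics[topic][start:]
        self._offsets[key] = start + len(messages)
        return messages

broker = Broker()

# Messages are published before any subscriber is ready to read them.
broker.publish("mutations", "user::1 updated")
broker.publish("mutations", "user::2 created")

first = broker.poll("analytics", "mutations")    # receives both waiting messages
broker.publish("mutations", "user::1 deleted")
second = broker.poll("analytics", "mutations")   # receives only the new message
late = broker.poll("archiver", "mutations")      # a late subscriber sees all three
```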

Page 29: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Lambda + Hadoop + Spark + Storm + Kafka

[Diagram: Lambda Architecture with Hadoop, Spark, Storm, and Kafka. Labels: New Data Stream; Batch Layer; All Data; Precompute Views (MapReduce); Batch Recompute; Batch Views; Partial Aggregate (x3); Speed Layer; Process Stream; Incremental Views; Real-Time Increment; Real-Time Data; Real-Time Views; Serving Layer; Merge; Merged View.]

Page 30: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Build Your App TODAY!

Download Couchbase Server 4.0 at www.couchbase.com/download

Page 31: NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Thank you

[email protected]

Twitter: @WillGardella