NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015

Post on 23-Jan-2018

NoSQL in a Hadoop world: Couchbase, Hadoop, Spark, Kafka and more

Will Gardella | Director of Product Management, Couchbase

©2015 Couchbase Inc. 2

Agenda

Intro – NoSQL, Couchbase, and what’s new in 4.0

Analytics & Data Integration

The Big Data Big Picture

Hadoop, Spark, Kafka, and Storm

Intro – NoSQL, Couchbase, and what’s new in 4.0

Where does “big” data come from?

Mobile, Web/Cloud, Internet of Things

This is where Couchbase comes in…

NoSQL Data Management for a broad range of apps and use cases

High availability cache

Key-value store

Document database

Embedded database

Sync management

Couchbase Server, Couchbase Lite, Couchbase Sync Gateway

Couchbase meets today’s & tomorrow’s requirements

Flexible data model

Consistent performance at scale

High availability

Easy, affordable scalability

24x365

Oh, and Couchbase is efficient w/ Hardware…

http://googlecloudplatform.blogspot.com/2015/05/Couchbase-Server-Hits-One-Million-Writes-Per-Second-with-Just-50-Nodes-of-Google-Compute-Engine.html

1.1M writes/sec

1/6 the hardware of NoSQL competitor

3 billion records

Couchbase Server 4.0

Download now: www.couchbase.com/download

Multi-Dimensional Scaling: option to separate, isolate, and scale querying, indexing, and data as independent services

N1QL (SQL for JSON): powerful query language based on SQL and global secondary indexes with support for JOINs and more

New storage engine (ForestDB): high performance storage engine engineered for multi-core processors and solid state drives

Analytics and Big Data Integration

Powered by N1QL

N1QL – Enterprise Tool / Application Ecosystem

[Diagram: App, ETL, BI, and Visualization tools each connect through ODBC / JDBC to a cluster of CB Nodes]

Standards-based drivers

Integrations, partnerships

4:35 PM - 5:20 PM

Introduction to BI with Couchbase Server Using Tableau, Informatica, Excel and More

Perry Krug, Couchbase

The Big Data Big Picture

What’s Hadoop got to do with it?

NoSQL versus Hadoop

Overlap: NoSQL or Hadoop?

Complement: NoSQL and Hadoop.

What is Hadoop?

Hadoop

Process: MapReduce, Hive, …

Store: HDFS
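The MapReduce processing model named on this slide can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are assumptions made for illustration, and a real job would run the map and reduce steps in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one line of text.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the grouped values for one key into a single result.
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["NoSQL and Hadoop", "Hadoop and Spark"]
word_counts = run_job(lines)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread across many machines, which is what makes the model suit batch processing of very large data sets.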

Spark

Fast and general engine for big data processing with libraries for advanced analytics

Spark Core:

• task scheduling

• memory management

• fault recovery

• interacting with storage systems

Big Data at a Glance

| | Couchbase | Spark | Hadoop (Hive) |
|---|---|---|---|
| Use cases | Operational; Web / Mobile | Analytics; Machine Learning | Analytics; Machine Learning |
| Processing mode | Online; Ad Hoc (New!) | Streaming; Ad Hoc; Batch | Batch; Ad Hoc |
| Low latency = | < 1ms ops | Seconds | Minutes |
| Performance | Highly predictable | Variable | Variable |
| Users are typically… | Millions of customers | 100’s of analysts | 100’s of analysts |
| | Memory-centric | Memory-centric | Disk-centric |
| Big data = | 10s of Terabytes | Petabytes(?) | Petabytes |

OPERATIONAL: Couchbase. ANALYTICAL: Spark, Hadoop (Hive).

Why is Spark popular?

Compute engine for Hadoop & other platforms (e.g. SAP HANA)

Fast

– Claims 100x better than MapReduce when in-memory, 10x on disk

Sophisticated

– can run most advanced algorithms

Easy to develop

– Well designed APIs in Java, Scala, Python, now R

– Supports SQL, DataFrames, and many other formats

– Interactive shell

Unified Lambda architecture (mostly)

– Same code for streaming and batch

Couchbase: Full range of Connectors

Database Change Protocol (DCP)

Innovative protocol for data sync in Couchbase Server

– Increases data sync efficiency with massive data footprints

– Removes slower disk-IO from the data sync path

– Lowers replication latency, which improves data durability

Powers many critical functions

– Data replication

– XDCR (Cross Datacenter Replication) for HA / DR

– Maintains indexes

– Connectors
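The idea behind a change protocol like DCP can be sketched as an ordered, in-memory stream of mutations that consumers such as replicas, indexers, and connectors follow. The sketch below is a toy model only and not the DCP wire protocol or any Couchbase API: the ChangeFeed, Mutation, and Replica classes, the sequence numbering, and the document keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Mutation:
    seqno: int     # position of this change in the ordered stream
    key: str
    value: object  # None marks a deletion in this toy model


@dataclass
class ChangeFeed:
    """Toy in-memory, ordered change log (hypothetical, not real DCP)."""
    log: list = field(default_factory=list)

    def write(self, key, value):
        mutation = Mutation(len(self.log) + 1, key, value)
        self.log.append(mutation)
        return mutation.seqno

    def stream(self, since_seqno=0):
        # A consumer resumes from the last sequence number it processed.
        return [m for m in self.log if m.seqno > since_seqno]


class Replica:
    """One consumer of the feed; an indexer or connector would look similar."""

    def __init__(self):
        self.data = {}
        self.last_seqno = 0

    def catch_up(self, feed):
        for m in feed.stream(self.last_seqno):
            if m.value is None:
                self.data.pop(m.key, None)
            else:
                self.data[m.key] = m.value
            self.last_seqno = m.seqno


feed = ChangeFeed()
replica = Replica()
feed.write("user::1", {"name": "Ada"})
feed.write("user::2", {"name": "Lin"})
replica.catch_up(feed)
feed.write("user::1", None)  # delete user::1
replica.catch_up(feed)
```

Since every consumer tracks its own position in the same stream, replication, index maintenance, and connectors can all be driven from one feed without each of them reading from disk.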

Hadoop, Spark, Kafka, and Storm

Lambda Architecture

[Diagram: numbered stages 1 DATA (New Data Stream), 2 BATCH, 3 SPEED, 4 SERVE, 5 QUERY (Analysis). Batch Layer: All Data, Precompute Views (MapReduce), Batch Recompute. Speed Layer: Process Stream, Incremental Views, Real-Time Increment. Serving Layer.]

Lambda Architecture

[Diagram: a New Data Stream feeds the Batch Layer (All Data, Precompute Views (MapReduce), Batch Recompute) and the Speed Layer (Process Stream, Incremental Views, Real-Time Increment). The Serving Layer holds Batch Views (Partial Aggregates) and Real-Time Views (Real-Time Data), which a Merge step combines into a Merged View.]
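The flow in this diagram can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not code from the talk: page-view counting is a made-up workload, and batch_recompute, SpeedLayer, and merged_view are hypothetical names standing in for the batch layer, speed layer, and serving-layer merge.

```python
from collections import Counter


def batch_recompute(all_data):
    # Batch layer: rebuild the batch view from the complete data set.
    return Counter(event["page"] for event in all_data)


class SpeedLayer:
    # Speed layer: update a real-time view incrementally, one event at a time.
    def __init__(self):
        self.realtime_view = Counter()

    def process(self, event):
        self.realtime_view[event["page"]] += 1


def merged_view(batch_view, realtime_view):
    # Serving layer: answer queries from the merge of both views.
    return batch_view + realtime_view


all_data = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
batch_view = batch_recompute(all_data)

speed = SpeedLayer()
for event in [{"page": "/home"}, {"page": "/download"}]:
    speed.process(event)  # events that arrived after the last batch run

view = merged_view(batch_view, speed.realtime_view)
```

In a fuller design the real-time view would be discarded or trimmed once a new batch run has absorbed the same events, so that nothing is counted twice; the sketch leaves that step out.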

Lambda + Couchbase

[Diagram: the Lambda Architecture (New Data Stream in; Batch Layer, Speed Layer, and Serving Layer; Merged View out) annotated with Couchbase.]

Lambda + Hadoop + Spark + Storm

[Diagram: the Lambda Architecture (New Data Stream in; Batch Layer, Speed Layer, and Serving Layer; Merged View out) annotated with Hadoop, Spark, and Storm, plus the Couchbase Hadoop Connector (Sqoop).]

Hadoop, Spark, and Storm

1:15 PM - 2:00 PM

Bank with Big Data - Data Science Use Cases in Finance with Hortonworks and Couchbase

Vamsi Chemitiganti, Hortonworks

2:45 PM - 3:30 PM (Developer Track)

Spark with Couchbase to Electrify Your Data Processing

Michael Nitschinger, Couchbase

New: Couchbase Spark Connector

Available Now: Beta

Spark Core

– Create RDDs from Documents, Views and N1QL Queries

– Write RDDs into Couchbase

– Automatic cluster and resource management

Spark SQL

– DataFrames based on N1QL

Spark Streaming

– Persisting DStreams

– Experimental support: use DCP feeds to create Spark streams

github.com/couchbaselabs/couchbase-spark-connector

Kafka

Data broker w/ publish / subscribe system

Massively scalable, well decoupled

Messages queued until the recipient can retrieve them

https://github.com/couchbase/couchbase-kafka-connector
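The publish / subscribe behavior described on this slide, where messages wait until each recipient retrieves them, can be sketched with a toy in-memory topic. This is an illustration only and does not use the Kafka client API or the Couchbase Kafka connector: the Topic class, its publish and poll methods, and the subscriber names are assumptions.

```python
from collections import defaultdict


class Topic:
    """Toy append-only topic with per-subscriber offsets (not the Kafka API)."""

    def __init__(self):
        self.messages = []
        self.offsets = defaultdict(int)  # subscriber name -> next index to read

    def publish(self, message):
        # Publishers only append; they never wait for subscribers.
        self.messages.append(message)

    def poll(self, subscriber, max_messages=10):
        # Each subscriber reads from its own offset, at its own pace.
        start = self.offsets[subscriber]
        batch = self.messages[start:start + max_messages]
        self.offsets[subscriber] = start + len(batch)
        return batch


topic = Topic()
topic.publish("doc-1 changed")
topic.publish("doc-2 changed")

first = topic.poll("indexer")    # receives both messages published so far
topic.publish("doc-3 changed")
second = topic.poll("indexer")   # receives only the new message
late = topic.poll("archiver")    # a late subscriber still sees everything
```

Keeping the offset per subscriber is what decouples producers from consumers: a slow or late consumer does not block anyone else, it simply reads from further back in the log.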

Lambda + Hadoop + Spark + Storm + Kafka

[Diagram: the Lambda Architecture (New Data Stream in; Batch Layer, Speed Layer, and Serving Layer; Merged View out) annotated with Hadoop, Spark, Storm, and Kafka.]

Build Your App TODAY!

Download Couchbase Server 4.0 at www.couchbase.com/download

Thank you.

will.gardella@couchbase.com

Twitter: @WillGardella
