NoSQL in a Hadoop World: Couchbase, Hadoop, Spark, Kafka and More – Couchbase Live New York 2015
TRANSCRIPT
NoSQL in a Hadoop world: Couchbase, Hadoop, Spark, Kafka and more
Will Gardella | Director of Product Management, Couchbase
©2015 Couchbase Inc. 2
Agenda
Intro – NoSQL, Couchbase, and what’s new in 4.0
Analytics & Data Integration
The Big Data Big Picture
Hadoop, Spark, Kafka, and Storm
Intro – NoSQL, Couchbase, and what’s new in 4.0
Where does “big” data come from?
Mobile, Web/Cloud, Internet of Things
This is where Couchbase comes in…
NoSQL Data Management for a broad range of apps and use cases
High availability cache
Key-value store
Document database
Embedded database
Sync management
Couchbase Server, Couchbase Lite, Couchbase Sync Gateway
Couchbase meets today’s & tomorrow’s requirements
Flexible data model
Consistent performance at scale
High availability
Easy, affordable scalability
24x365
Oh, and Couchbase is efficient w/ Hardware…
http://googlecloudplatform.blogspot.com/2015/05/Couchbase-Server-Hits-One-Million-Writes-Per-Second-with-Just-50-Nodes-of-Google-Compute-Engine.html
1.1M writes/sec
1/6 the hardware of NoSQL competitor
3 Billion records
Couchbase Server 4.0
Download now: www.couchbase.com/download
Multi-Dimensional Scaling
Option to separate, isolate, and scale querying, indexing, and data as independent services
N1QL: SQL for JSON
Powerful query language based on SQL and global secondary indexes, with support for JOINs and more
New Storage Engine: ForestDB
High-performance storage engine engineered for multi-core processors and solid state drives
Analytics and Big Data Integration
Powered by N1QL
N1QL – Enterprise Tool / Application Ecosystem
[Diagram: App, ETL, BI, and Visualization tools each connect through ODBC / JDBC to CB Nodes]
Standards-based drivers
Integrations, partnerships
N1QL – Enterprise Tool / Application Ecosystem
4:35 PM - 5:20 PM
Introduction to BI with Couchbase Server Using Tableau, Informatica, Excel and More
Perry Krug, Couchbase
The Big Data Big Picture
What’s Hadoop got to do with it?
NoSQL versus Hadoop
Overlap: NoSQL or Hadoop?
Complement: NoSQL and Hadoop.
What is Hadoop?
Hadoop
Process: Map Reduce, Hive, …
Store: HDFS
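As a rough illustration of the "process" half of that stack, the sketch below imitates the MapReduce model in plain Python: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. It does not use any Hadoop API; the function names and the sample input are assumptions made for this example.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values that share a key.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["NoSQL and Hadoop", "Hadoop and Spark"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on the nodes that store the data; here everything runs in one process to keep the shape of the model visible.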
Spark
Fast and general engine for big data processing with libraries for advanced analytics
Spark Core:
• task scheduling
• memory management
• fault recovery
• interacting with storage systems
Big Data at a Glance
| | Couchbase (OPERATIONAL) | Spark (ANALYTICAL) | Hadoop (Hive) (ANALYTICAL) |
| Use cases | Operational; Web / Mobile | Analytics; Machine Learning | Analytics; Machine Learning |
| Processing mode | Online; Ad Hoc (New!) | Streaming; Ad Hoc; Batch | Batch; Ad Hoc |
| Low latency = | < 1 ms ops | Seconds | Minutes |
| Performance | Highly predictable | Variable | Variable |
| Users are typically… | Millions of customers | 100s of analysts | 100s of analysts |
| | Memory-centric | Memory-centric | Disk-centric |
| Big data = | 10s of Terabytes | Petabytes(?) | Petabytes |
Why is Spark popular?
Compute engine for Hadoop & other platforms (e.g. SAP HANA)
Fast
– Claims to be up to 100x faster than MapReduce in memory, 10x on disk
Sophisticated
– Can run most advanced algorithms
Easy to develop
– Well-designed APIs in Java, Scala, Python, and now R
– Supports SQL, DataFrames, and many data formats
– Interactive shell
Unified Lambda architecture (mostly)
– Same code for streaming and batch
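The "same code for streaming and batch" point can be sketched without Spark at all. In the plain-Python example below, one function holds the logic, and it is applied either to the full data set or to a sequence of micro-batches. The function names and sample events are assumptions for illustration; this is not the Spark API.

```python
from collections import Counter

def count_events(events):
    # Shared logic: count events per type.
    return Counter(event["type"] for event in events)

def run_batch(all_events):
    # Batch mode: one pass over the complete data set.
    return count_events(all_events)

def run_streaming(micro_batches):
    # Streaming mode: apply the same function to each micro-batch
    # and fold the partial results into a running total.
    total = Counter()
    for batch in micro_batches:
        total.update(count_events(batch))
    return total

events = [
    {"type": "click"}, {"type": "view"}, {"type": "click"},
    {"type": "view"}, {"type": "view"},
]
batch_result = run_batch(events)
stream_result = run_streaming([events[:2], events[2:]])
print(batch_result == stream_result)
```

Because both paths call `count_events`, a change to the logic applies to batch and streaming results alike, which is the appeal of a unified engine.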
Couchbase: Full range of Connectors
Database Change Protocol (DCP)
Innovative protocol for data sync in Couchbase Server
– Increases data sync efficiency with massive data footprints
– Removes slower disk-IO from the data sync path
– Reduces replication latency, which improves data durability
Powers many critical functions
– Data replication
– XDCR (Cross Datacenter Replication) for HA / DR
– Maintains indexes
– Connectors
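The general idea of one ordered, in-memory change stream feeding several consumers (replicas, indexes, connectors) can be sketched as follows. This is a toy model only: it is not the DCP wire protocol or any Couchbase SDK call, and the `ChangeFeed` class, its methods, and the sample keys are hypothetical names chosen for the example.

```python
class ChangeFeed:
    """Toy in-memory change feed: ordered mutations, many consumers."""

    def __init__(self):
        self._log = []  # (seqno, key, value) tuples held in memory

    def mutate(self, key, value):
        # Every write is assigned the next sequence number.
        seqno = len(self._log) + 1
        self._log.append((seqno, key, value))
        return seqno

    def stream(self, since=0):
        # Yield every mutation after `since`, so a consumer can resume.
        for entry in self._log:
            if entry[0] > since:
                yield entry

feed = ChangeFeed()
feed.mutate("user::1", {"name": "Ann"})
feed.mutate("user::2", {"name": "Bob"})
feed.mutate("user::1", {"name": "Ann", "city": "NYC"})

# A replica applies the full stream in order.
replica = {}
for seqno, key, value in feed.stream():
    replica[key] = value

# An index consumer that already processed seqno 1 resumes from there.
index_keys = [key for seqno, key, _ in feed.stream(since=1)]
print(replica, index_keys)
```

Sequence numbers let each consumer track its own position, so a slow or restarted consumer can catch up without the producer rereading data from disk.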
Hadoop, Spark, Kafka, and Storm
Lambda Architecture
[Diagram: five numbered stages]
1 DATA: New Data Stream
2 BATCH: Batch Layer (All Data, Precompute Views (Map Reduce), Batch Recompute)
3 SPEED: Speed Layer (Process Stream, Incremental Views, Real-Time Increment)
4 SERVE: Serving Layer
5 QUERY: Analysis
Lambda Architecture
[Diagram: lambda architecture data flow]
New Data Stream
Batch Layer: All Data; Precompute Views (Map Reduce); Batch Recompute
Speed Layer: Process Stream; Incremental Views; Real-Time Increment
Serving Layer: Batch Views; Real-Time Views; Merge; Merged View
Other labels: Partial Aggregate (x3); Real-Time Data
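The three layers on this slide can be sketched in a few lines of plain Python. The batch layer recomputes a view from all data, the speed layer increments a view as new events arrive, and the serving layer merges the two at query time. The function names, the page-view counting task, and the sample events are assumptions for illustration, not part of the talk.

```python
from collections import Counter

def precompute_batch_view(all_data):
    # Batch layer: recompute the view from the complete data set.
    return Counter(event["page"] for event in all_data)

def increment_realtime_view(view, event):
    # Speed layer: update the incremental view one event at a time.
    view[event["page"]] += 1
    return view

def merged_view(batch_view, realtime_view):
    # Serving layer: merge batch and real-time views at query time.
    return batch_view + realtime_view

all_data = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
new_stream = [{"page": "/home"}, {"page": "/download"}]

batch_view = precompute_batch_view(all_data)
realtime_view = Counter()
for event in new_stream:
    increment_realtime_view(realtime_view, event)

result = merged_view(batch_view, realtime_view)
print(dict(result))
```

After the next batch recompute absorbs the streamed events, the real-time view would be reset, which is how the batch layer eventually corrects any approximation made in the speed layer.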
Lambda + Couchbase
[Diagram: the same lambda architecture data flow as on the preceding slide]
Lambda + Hadoop + Spark + Storm
[Diagram: the same lambda architecture data flow as on the preceding slides]
Couchbase Hadoop Connector (Sqoop)
Hadoop, Spark, and Storm
1:15 PM - 2:00 PM
Bank with Big Data - Data Science Use Cases in Finance with Hortonworks and Couchbase
Vamsi Chemitiganti, Hortonworks
2:45 PM - 3:30 PM (Developer Track)
Spark with Couchbase to Electrify Your Data Processing
Michael Nitschinger, Couchbase
New: Couchbase Spark Connector
Available Now: Beta
Spark Core
– Create RDDs from Documents, Views and N1QL Queries
– Write RDDs into Couchbase
– Automatic cluster and resource management
Spark SQL
– DataFrames based on N1QL
Spark Streaming
– Persisting DStreams
– Experimental support: use DCP feeds to create Spark streams
github.com/couchbaselabs/couchbase-spark-connector
Kafka
Data broker with a publish/subscribe system
Massively scalable, well decoupled
Messages queued until the recipient can retrieve them
https://github.com/couchbase/couchbase-kafka-connector
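The "queued until the recipient can retrieve them" behavior can be sketched with an append-only log and one read position per subscriber. This is a toy model in plain Python, not the Kafka client API or the Couchbase Kafka connector; the `Topic` class, the subscriber names, and the messages are hypothetical.

```python
class Topic:
    """Toy publish/subscribe log; not the Kafka client API."""

    def __init__(self):
        self._messages = []  # append-only message log
        self._offsets = {}   # next unread position per subscriber

    def publish(self, message):
        self._messages.append(message)

    def subscribe(self, name):
        # New subscribers start at the beginning of the log.
        self._offsets.setdefault(name, 0)

    def poll(self, name):
        # Return everything this subscriber has not read yet.
        start = self._offsets[name]
        batch = self._messages[start:]
        self._offsets[name] = len(self._messages)
        return batch

topic = Topic()
topic.subscribe("indexer")
topic.subscribe("archiver")
topic.publish("doc-1 changed")
topic.publish("doc-2 changed")

first = topic.poll("indexer")    # reads the two messages published so far
topic.publish("doc-3 changed")
second = topic.poll("indexer")   # reads only the new message
late = topic.poll("archiver")    # a slower subscriber still gets all three
print(first, second, late)
```

The publisher never waits for a subscriber, and each subscriber reads at its own pace, which is the decoupling the slide refers to.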
Lambda + Hadoop + Spark + Storm + Kafka
[Diagram: the same lambda architecture data flow as on the preceding Lambda slides]
Build Your App TODAY!
Download Couchbase Server 4.0 at www.couchbase.com/download
Thank you
[email protected]
Twitter: @WillGardella