developinginnosqlwith couchbase*files.meetup.com/1775479/chicagobigdata.pdf · •architect and...

Developing in NoSQL with Couchbase

Raghavan “Rags” Srinivas Developer Advocate

Simple. Fast. Elastic.

•  Architect and Evangelist working with developers •  Speaker at JavaOne, RSA conferences, Sun Tech Days, JUGs and other

developer conferences •  Taught undergrad and grad courses •  Technology Evangelist at Sun Microsystems for 10+ years •  Still trying to understand and work more effectively on Java and distributed

systems •  Couchbase Developer Advocate working with Java and Ruby developers •  Philosophy: “Better to have an unanswered question than a unquestioned

answer”

Speaker Introduction

>  Introduction and Getting Started with the APIs >  Demonstration: Topology changes and xDCR >  Hadoop Sqoop connector and use cases >  Secondary Indexing with Query and Views >  Roadmap and conclusions

Agenda

A BIT OF NOSQL

Intro. Demos. Hadoop Connector

Views Summary

NoSQL (from a Couchbase perspective)

Short for Not Only SQL Not single class of database (i.e., NoSQL is not analogous to the RDMBS

classification) Generally refers to databases that neither expose a SQL interface nor use the

relational data model >  Perhaps better thought of as “non-relational databases” Typically don’t offer common RDBMS features for performance gains >  Often schema-less >  No key constraints >  No multi-step transactions >  No concept of a “join” Well suited for cloud deployments

The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

CAP Theorem/Brewer’s Conjecture

•  Consistency means that each node/client always has the same view of the data,

•  Availability means that all clients can always read and write,

•  Partition tolerance means that the system works well across physical network partitions.

CAP Systems

CA: Consistency and Availability >  Consider a system that uses a two phase commit for distributed transactions >  The commit is impeded if a network segment is unavailable CP: Consistency and Partition Tolerance >  Consider a system that uses pessimistic locking for concurrency control >  Node failures hinder availability AP: Availability and Partition Tolerance >  Consider a system that always responds (e.g., DNS) >  Reads might be dirty as data awaits propagation

NoSQL Taxonomies

NoSQL Databases Come in Many Flavors* >  XML (myXMLDB, Tamino, Sedna) >  Tabular (Hbase, Big Table) >  Key/Value (Couchbase, Redis, Riak, Cassandra) >  Object (db4o, JADE) >  Graph (Trinity, neo4j, InfoGrid) >  Document store (Couchbase Server 2.0, CouchDB, MongoDB)

* These are loose taxonomies

Web Application Architecture and Performance

Application Scales Out Just add more commodity web servers

Database Scales Up Get a bigger, more complex server

www.facebook.com/animalparty

Web Servers

RelaConal Database

Load Balancer

-‐ Expensive and disrupCve sharding -‐ Doesn’t perform at Web Scale

Couchbase data layer scales like application logic tierData layer now scales with linear cost and constant performance.

Application Scales Out Just add more commodity web servers

Database Scales Out Just add more commodity data servers

Scaling out flattens the cost and performance curves.

Couchbase Servers

www.facebook.com/animalparty

Web Servers Load Balancer

Horizontally scalable, schema-‐less, auto-‐sharding, high-‐performance at Web Scale

WHAT IS COUCHBASE?


Views Summary

Couchbase Server Overview

Open source, Apache 2.0 licensed Database Commercial support provided by Couchbase Free community edition for developers Enterprise edition free for 2 nodes, support license required for larger clusters Administered via web console or RESTful API Available for Windows, Linux and Mac (development only) Written in C/C++ with some components in Erlang Both a distributed key/value store and a document oriented database >  Implements Memcached API (version 1.8) >  Implements most of CouchDB RESTful view API (version 2.0)

Membase

CouchDB

Couchbase memcached

1,000s of community deployments More than 150 paying customers Over 3,500 nodes deployed in production by customers

Rapidly Growing Community

Customers with Production Deployments…

Pictionary Like Game

OMGPOP’s DrawSomething Went Viral

#1 Paid & Free App in 84 Countries 35m Downloads

Couchbase Server 1.8 Architecture

Heartbeat

Process m

onito

r

Glob

al singleton supe

rviso

r

Confi

guraCo

n manager

on each node

Rebalance orchestrator

Nod

e he

alth m

onito

r

one per cluster

vBucket state and

replicaC

on m

anager

hOp RE

ST m

anagem

ent A

PI/W

eb UI

HTTP 8091

Erlang port mapper 4369

Distributed Erlang 21100 -‐ 21199

Erlang/OTP

Cluster Manager

Persistence Layer

storage interface

Memcached

Couchbase EP Engine

11210 Memcapable 2.0

Moxi


Data Manager

Component Architecture

Hea

rtbea

t

Pro

cess

mon

itor

Con

figur

atio

n M

anag

er

Glob

al singleton supe

rviso

r

on each node

Rebalance orchestrator

Nod

e he

alth m

onito

r

one per cluster

vBucket state and

replicaC

on m

anager

http

RES

T M

anag

emen

t /W

ebU

I

HTTP 8091

Erlang port mapper 4369

Distributed Erlang 21100 -‐ 21199

Erlang/OTP

Persistence Layer

storage interface

Memcached

Couchbase EP Engine

Moxi



A Distributed Hash Table Document Store

Application

set(key, json) get(key) returns json

DData

Cluster

Database

{ !“id": ”brewery_Legacy”,!“type” : “brewery”,!"name" : "Legacy Brewing”,!“address": "525 Canal Street Reading", "updated": "2010-07-22 20:00:20", }!

Couchbase Distributed Key/Value Store Features

All documents/items are stored and retrieved using standard hashtable operations (i.e, get, add, etc.)

Keys are distributed evenly across the nodes of a cluster by way of vBuckets >  A vBucket is a level of indirection between a key and value, and the node on

which it is stored Constant time access to items via keys Single dimension hashtable with no secondary keys

Couchbase Document Store Features

Items can be stored as JSON documents Non-JSON items are stored as binary attachments to CouchDB JSON documents Secondary, B-tree indexes are created by way of views >  Views are accessed via RESTful API >  Indexes are incrementally updated after access – not insert >  Caller specifies whether to accept stale data Views may be queried by key or range Indexes are stored across the cluster and accumulated from all nodes when

requested

Workflow (including replication)

User action results in the need to change the VALUE of KEY

Application updates key’s VALUE, performs SET operation

Couchbase driver hashes KEY, identifies KEY’s master server SET request sent over

network to master server

Couchbase replicates KEY-VALUE pair, caches it in memory and stores it to disk

1

2

3 4

5

GETTING STARTED


Views Summary

•  Operating systems supported

•  RedHat – RPM

•  Ubuntu – dpkg

•  Windows

•  Mac OS

•  GUI Installer

•  Unattended installation

•  Quickstart once installed (invokes the Admin Console) •  http://<hostname>:8091

Installation of the Server for development

COUCHBASE CLIENT INTERFACE


Views Summary

Client Setup: Getting Cluster Configuration

26

Server Node

Couchbase Client

http://myserver:8091/

{ … "bucketCapabilities": [ "touch", "sync", "couchapi" ], "bucketCapabilitiesVer": "sync-1.0", "bucketType": ”couchbase", "name": "default", "nodeLocator": "vbucket", "nodes": [ ….

Cluster Configuration over REST

Server Node

Server Node

Server Node

Client at Runtime: Adding a node

27

Couchbase Client

Server Node

Server Node

Server Node

Server Node

Server Node

Cluster ��Topology��Update

New node coming online

Connect to the Cluster URI of any cluster node

Opening a Connection (Java)

List<URI> uris = new LinkedList<URI>(); uris.add(URI.create("http://127.0.0.1:8091/pools")); try { client = new CouchbaseClient(uris, "default", ""); } catch (Exception e) { System.err.println("Error connecting to Couchbase: " + e.getMessage()); System.exit(0); }

PROTOCOL OVERVIEW

Intro. Demos Hadoop Connector

Views Summary

Store Operations

Operation Description add() Adds new value if key does not exist or returns error. replace() Replaces existing value if specified key already exists. set() Sets new value for specified key.

TTL allow for ‘expiry’ time specified in seconds:

Expiry Value Description Values < 30*24*60*60 Time in seconds to expiry Values > 30*24*60*60 Absolute time from epoch for expiry

Retrieve Operations

Operation Description get() Get a value. getAndTouch() Get a value and update the expiry time. getBulk() Get multiple values simultaneously, more efficient. gets() Get a value and the CAS value.

ASYNCHRONOUS OPERATIONS

Intro. Advanced opertations

Doc. Design Views Summary

•  Synchronous commands –  Force wait on application until return from server

•  Asynchronous commands –  Allow background operation until response available –  Operations return Future object

Couchbase Java Client Asynchronous Functions

Set Operation is asynchronous // Do an asynchronous set!OperationFuture<Boolean> setOp = ! client.set(KEY, EXP_TIME, VALUE);!!// Do something not related to set …!!// Check to see if our set succeeded!// We block when we call the get()!if (setOp.get().booleanValue()) {! System.out.println("Set Succeeded");!} else {! System.err.println("Set failed: " + ! setOp.getStatus().getMessage());!}!

It’s Asynchronous

Compare and Swap Operations >  Often referred to as “CAS” >  Optimistic concurrency control >  Available with many mutation operations, depending on client.

Get with Lock >  Often referred to as “GETL” >  Pessimistic concurrency control >  Locks have a short TTL >  Locks released with CAS operations >  Useful when working with object graphs

Distributed System Design: Concurrency Controls

35

Actor 1 Actor 2

Couchbase Server

CAS mismatch Success

A

B

F

C D

E

CAS Example CASValue<Object> casv = client.gets(KEY);! Thread.sleep(random.nextInt(1000)); // a random workload! ! // Wake up and do a set based on the previous CAS value! Future<CASResponse> setOp =! client.asyncCAS(KEY, casv.getCas(), random.nextLong());!! // Wait for the CAS Response! try {! if (setOp.get().equals(CASResponse.OK)) {! System.out.println("Set Succeeded");! } else {! System.err.println("Set failed: ");! }! }! …!

CAS Operation

CONCURRENCY


Views Summary

KEYS AND RELATIONSHIPS


Views Summary

Couchbase >  Distributed Key, Value pairs >  De-normalized Data >  Fast access Session store Hadoop >  Unstructured/semi-structured data >  Analytics and frequent data analysis >  Leverage the Hadoop Ecosystem (Pig, Hive, Hbase, etc.)

Utilizing the strengths of the Platform

Ad Targeting >  Decisions have to be made in milliseconds based on historical trends, location,

user preferences, etc. Virtual Reality >  Sub-second response time based on data available from number of different

sources Gaming applications >  Responsive applications that integrate with data in the back end

Use cases

•  Easily Import/Export Data into Hadoop

•  Generate Datatypes for use in MapReduce Applications

•  Integrate with Pig, Hive and Hbase

•  Easily export Data from Hadoop

Sqoop HDFS Hive HBase

Hadoop Pig

Introducing Sqoop

COUCHBASE PLUGIN

•  Based on the Couchbase Tap Interface •  Allows importing and exporting of entire database key mutations

Couchbase HDFS

1.  Data imported via Tap mechanism 2. Hadoop

Processing

3. Data exported back to Couchbase

Couchbase Plugin

$ sqoop import –-connect http://localhost:8091/pools --table DUMP

$ sqoop import –-connect http://localhost:8091/pools --table BACKFILL_5

$ sqoop export --connect http://localhost:8091/pools

--table DUMP –export-dir DUMP

For Imports, table must be: >  DUMP: All keys currently in Couchbase >  BACKFILL_n: All key mutations for n minutes Specified –username maps to bucket >  By default set to “default” bucket

Couchbase Import and Export

Couchbase >  System of Record >  Provides fast access to users and being able to record events Hadoop >  Analytics >  Generates a count of the # of clicks

Ad Targeting Application

Couchbase HDFS

1.  User/Event Data is imported to HDFS

3. Consolidated data Is used to target ads.

2. Event data is consolidated with Map/Reduce

Importing to Hadoop

Counting the clicks

Analyze using Pig

Counting the ads

Analyze and Consolidate

THE VIEW AND QUERY API


Views Summary

Identical feature set to Couchbase Server 1.8 >  Cache-layer >  High-performance >  Cluster >  Elastic >  Core interface Adds >  Document based storage (JSON) >  Views with materialized indexes >  Querying

Couchbase Server 2.0

JSON (JavaScript Object Notation) >  Lightweight Data-interchange format >  Easy for humans to read and manipulate >  Easy for machines to parse and generate (many parsers are available) >  JSON is language independent (although it uses similar constructs)

Why use JSON?

•  Views create perspectives on a collection of documents •  Views can cover a few different use cases

–  Simple secondary indexes (the most common) –  Aggregation functions

•  Example: count the number of North American Ales

–  Organizing related data •  Use Map/Reduce

–  Map defines the relationship between fields in documents and output table –  Reduce provides method for collating/summarizing

•  Views materialized indexes –  Views are not create until accessed –  Data writes are fast (no index) –  Index updates all changes since last update

What Are Views

•  Extract fields from JSON documents and produce an index of the selected information

View Processing

map function creates a mapping between input data and output reduce function provides a summary (such as count, sum, etc.)

View Creation using Incremental Map/Reduce

•  Map outputs one or more rows of data •  Map outputs:

>  Document ID >  View Key (user configurable) >  View Value

•  View Key Controls >  SorCng >  Querying

•  Key + Value stored in the Index •  Maps provide query base

Map Func:ons

A map function to get breweries by province function (doc) { if (doc.type == "brewery" && doc.province) { emit(doc.province, doc.name); } }

Map/Reduce – Map Function

The conceptual output of the map function is

[ {"key":"Connecticut", "value":"Cottrell Brewing"}, {"key":"Connecticut","value":"Thomas Hooker Brewing Co."}, {"key":"Massachusetts","value":"Harpoon Brewing Company"}, {"key":"New York","value":"Brewery Ommegang"}, {"key":"Vermont","value":"Long Trail Brewing Company"} ]

Map/Reduce – Map Results

A reduce function to count breweries by province >  Or simply use the build in _count function function (keys, values) { return values.length; } The conceptual input to the reduce function is: [ {"keys":["Connecticut", "Connecticut"], "values": ["Cottrell Brewing", "Thomas Hooker Brewing Co."]}, {"keys":["Massachusetts"],"values":["Harpoon Brewing Co."]}, {"keys":["New York"],"values":["Brewery Ommegang"]}, {"keys":["Vermont"],"values":["Long Trail Brewing Company“]} ]

Map/Reduce – Reduce Function

The conceptual result of the reduce is {"key":"Connecticut","value":2}, {"key":"Massachusetts","value":1}, {"key":"New York","value":1}, {"key":"Vermont","value":1}

Map/Reduce – Reduce Results

THE QUERY API


Views Summary

Query APIs

// map function!function (doc) {! if (doc.type == "beer") {! emit(doc._id, null);! }!}!!// Java code!Query query = new Query();!!query.setReduce(false);!query.setIncludeDocs(true);!query.setStale(Stale.FALSE);!!

THE VIEW API


Views Summary

View APIs with Java

// map function!function (doc) {! if (doc.type == "beer") {! emit(doc._id, null);! }!}!!// Java code!View view = client.getView("beers", "beers");!!ViewResponse result = client.query(view, query);!!Iterator<ViewRow> itr = result.iterator();! !while (itr.hasNext()) {! row = itr.next();! doc = (String) row.getDocument();! // do something!}!!!

USING THE QUERY API FOR CALCULATION

Intro. Advanced opertations

Doc. Design Views Summary

Querying with Java – Custom Reduce

// map function!function(doc) {! if (doc.type == "beer") {! if(doc.abv) {! emit(doc.abv, 1);! }}}!!//reduce function!_count!!// Java code!View view = client.getView("beers", "beers_count_abv");!query.setGroup(true);!!while (itr.hasNext()) {! row = itr.next();! System.out.println(String.format("%s: %s”, ! row.getKey(), row.getValue()));!}!

View Calculation Result

10: 2!9.6: 5!9.1: 1!9: 3!8.7: 2!8.5: 1!8: 2!7.5: 4!7: 4!6.7: 1!6.6: 1!6.2: 1!6: 2!5.9: 1!5.6: 1!5.2: 1!5: 1!4.8: 2!4.5: 1!4: 2!

RESOURCES AND SUMMARY


Views Summary

Server 1.8 Compatible >  1.0.1/2.8.0 >  1.0.2/2.8.0 (Memcache Node imbalance) >  1.0.x/2.8.0 (Replica Read) Server 2.0 Compatible >  1.1-dp/2.8.1 (Views) >  1.1-dp2/2.8.x (Observe) >  1.1/2.8.x (Final) Other Features >  Spring Integration >  Your feedback?

!

Java Client Library Roadmap

Couchbase Server Downloads >  http://www.couchbase.com/downloads-all >  http://www.couchbase.com/couchbase-server/overview Views >  http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views.html Developing with Client libraries >  http://www.couchbase.com/develop/java/current Couchbase Java Client Library wiki – tips and tricks >  http://www.couchbase.com/wiki/display/couchbase/Couchbase+Java+Client

+Library Open Beer Database >  http://openbeerdb.com Couchbase Internals >  http://horicky.blogspot.com

Resources and Call For Action

THANKS -‐ Q&A

73

Raghavan “Rags” Srinivas, Couchbase Inc. ([email protected], @ragss)

developing*in*nosql*with* couchbase*files.meetup.com/1775479/chicagobigdata.pdf · •architect and...

Documents

developinginnosqlwith couchbase*files.meetup.com/1775479/chicagobigdata.pdf · •architect and...