developing*in*nosql*with* couchbase*files.meetup.com/1775479/chicagobigdata.pdf · •architect and...
TRANSCRIPT
Developing in NoSQL with Couchbase
Raghavan “Rags” Srinivas Developer Advocate
Simple. Fast. Elastic.
• Architect and Evangelist working with developers • Speaker at JavaOne, RSA conferences, Sun Tech Days, JUGs and other
developer conferences • Taught undergrad and grad courses • Technology Evangelist at Sun Microsystems for 10+ years • Still trying to understand and work more effectively on Java and distributed
systems • Couchbase Developer Advocate working with Java and Ruby developers • Philosophy: “Better to have an unanswered question than a unquestioned
answer”
Speaker Introduction
> Introduction and Getting Started with the APIs > Demonstration: Topology changes and xDCR > Hadoop Sqoop connector and use cases > Secondary Indexing with Query and Views > Roadmap and conclusions
Agenda
A BIT OF NOSQL
Intro. Demos. Hadoop Connector
Views Summary
NoSQL (from a Couchbase perspective)
Short for Not Only SQL Not single class of database (i.e., NoSQL is not analogous to the RDMBS
classification) Generally refers to databases that neither expose a SQL interface nor use the
relational data model > Perhaps better thought of as “non-relational databases” Typically don’t offer common RDBMS features for performance gains > Often schema-less > No key constraints > No multi-step transactions > No concept of a “join” Well suited for cloud deployments
The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
CAP Theorem/Brewer’s Conjecture
• Consistency means that each node/client always has the same view of the data,
• Availability means that all clients can always read and write,
• Partition tolerance means that the system works well across physical network partitions.
CAP Systems
CA: Consistency and Availability > Consider a system that uses a two phase commit for distributed transactions > The commit is impeded if a network segment is unavailable CP: Consistency and Partition Tolerance > Consider a system that uses pessimistic locking for concurrency control > Node failures hinder availability AP: Availability and Partition Tolerance > Consider a system that always responds (e.g., DNS) > Reads might be dirty as data awaits propagation
NoSQL Taxonomies
NoSQL Databases Come in Many Flavors* > XML (myXMLDB, Tamino, Sedna) > Tabular (Hbase, Big Table) > Key/Value (Couchbase, Redis, Riak, Cassandra) > Object (db4o, JADE) > Graph (Trinity, neo4j, InfoGrid) > Document store (Couchbase Server 2.0, CouchDB, MongoDB)
* These are loose taxonomies
Web Application Architecture and Performance
Application Scales Out Just add more commodity web servers
Database Scales Up Get a bigger, more complex server
www.facebook.com/animalparty
Web Servers
RelaConal Database
Load Balancer
-‐ Expensive and disrupCve sharding -‐ Doesn’t perform at Web Scale
Couchbase data layer scales like application logic tierData layer now scales with linear cost and constant performance.
Application Scales Out Just add more commodity web servers
Database Scales Out Just add more commodity data servers
Scaling out flattens the cost and performance curves.
Couchbase Servers
www.facebook.com/animalparty
Web Servers Load Balancer
Horizontally scalable, schema-‐less, auto-‐sharding, high-‐performance at Web Scale
WHAT IS COUCHBASE?
Intro. Demos. Hadoop Connector
Views Summary
Couchbase Server Overview
Open source, Apache 2.0 licensed Database Commercial support provided by Couchbase Free community edition for developers Enterprise edition free for 2 nodes, support license required for larger clusters Administered via web console or RESTful API Available for Windows, Linux and Mac (development only) Written in C/C++ with some components in Erlang Both a distributed key/value store and a document oriented database > Implements Memcached API (version 1.8) > Implements most of CouchDB RESTful view API (version 2.0)
Membase
CouchDB
Couchbase memcached
1,000s of community deployments More than 150 paying customers Over 3,500 nodes deployed in production by customers
Rapidly Growing Community
Customers with Production Deployments…
Pictionary Like Game
OMGPOP’s DrawSomething Went Viral
#1 Paid & Free App in 84 Countries 35m Downloads
Couchbase Server 1.8 Architecture
Heartbeat
Process m
onito
r
Glob
al singleton supe
rviso
r
Confi
guraCo
n manager
on each node
Rebalance orchestrator
Nod
e he
alth m
onito
r
one per cluster
vBucket state and
replicaC
on m
anager
hOp RE
ST m
anagem
ent A
PI/W
eb UI
HTTP 8091
Erlang port mapper 4369
Distributed Erlang 21100 -‐ 21199
Erlang/OTP
Cluster Manager
Persistence Layer
storage interface
Memcached
Couchbase EP Engine
11210 Memcapable 2.0
Moxi
11211 Memcapable 1.0
Data Manager
Component Architecture
Hea
rtbea
t
Pro
cess
mon
itor
Con
figur
atio
n M
anag
er
Glob
al singleton supe
rviso
r
on each node
Rebalance orchestrator
Nod
e he
alth m
onito
r
one per cluster
vBucket state and
replicaC
on m
anager
http
RES
T M
anag
emen
t /W
ebU
I
HTTP 8091
Erlang port mapper 4369
Distributed Erlang 21100 -‐ 21199
Erlang/OTP
Persistence Layer
storage interface
Memcached
Couchbase EP Engine
Moxi
11210 Memcapable 2.0
11211 Memcapable 1.0
A Distributed Hash Table Document Store
Application
set(key, json) get(key) returns json
DData
Cluster
Database
{ !“id": ”brewery_Legacy”,!“type” : “brewery”,!"name" : "Legacy Brewing”,!“address": "525 Canal Street Reading", "updated": "2010-07-22 20:00:20", }!
Couchbase Distributed Key/Value Store Features
All documents/items are stored and retrieved using standard hashtable operations (i.e, get, add, etc.)
Keys are distributed evenly across the nodes of a cluster by way of vBuckets > A vBucket is a level of indirection between a key and value, and the node on
which it is stored Constant time access to items via keys Single dimension hashtable with no secondary keys
Couchbase Document Store Features
Items can be stored as JSON documents Non-JSON items are stored as binary attachments to CouchDB JSON documents Secondary, B-tree indexes are created by way of views > Views are accessed via RESTful API > Indexes are incrementally updated after access – not insert > Caller specifies whether to accept stale data Views may be queried by key or range Indexes are stored across the cluster and accumulated from all nodes when
requested
Workflow (including replication)
User action results in the need to change the VALUE of KEY
Application updates key’s VALUE, performs SET operation
Couchbase driver hashes KEY, identifies KEY’s master server SET request sent over
network to master server
Couchbase replicates KEY-VALUE pair, caches it in memory and stores it to disk
1
2
3 4
5
GETTING STARTED
Intro. Demos. Hadoop Connector
Views Summary
• Operating systems supported
• RedHat – RPM
• Ubuntu – dpkg
• Windows
• Mac OS
• GUI Installer
• Unattended installation
• Quickstart once installed (invokes the Admin Console) • http://<hostname>:8091
Installation of the Server for development
COUCHBASE CLIENT INTERFACE
Intro. Demos. Hadoop Connector
Views Summary
Client Setup: Getting Cluster Configuration
26
Server Node
Couchbase Client
http://myserver:8091/
{ … "bucketCapabilities": [ "touch", "sync", "couchapi" ], "bucketCapabilitiesVer": "sync-1.0", "bucketType": ”couchbase", "name": "default", "nodeLocator": "vbucket", "nodes": [ ….
Cluster Configuration over REST
Server Node
Server Node
Server Node
Client at Runtime: Adding a node
27
Couchbase Client
Server Node
Server Node
Server Node
Server Node
Server Node
Cluster ���Topology���Update
New node coming online
Connect to the Cluster URI of any cluster node
Opening a Connection (Java)
List<URI> uris = new LinkedList<URI>(); uris.add(URI.create("http://127.0.0.1:8091/pools")); try { client = new CouchbaseClient(uris, "default", ""); } catch (Exception e) { System.err.println("Error connecting to Couchbase: " + e.getMessage()); System.exit(0); }
PROTOCOL OVERVIEW
Intro. Demos Hadoop Connector
Views Summary
Store Operations
Operation Description add() Adds new value if key does not exist or returns error. replace() Replaces existing value if specified key already exists. set() Sets new value for specified key.
TTL allow for ‘expiry’ time specified in seconds:
Expiry Value Description Values < 30*24*60*60 Time in seconds to expiry Values > 30*24*60*60 Absolute time from epoch for expiry
Retrieve Operations
Operation Description get() Get a value. getAndTouch() Get a value and update the expiry time. getBulk() Get multiple values simultaneously, more efficient. gets() Get a value and the CAS value.
ASYNCHRONOUS OPERATIONS
Intro. Advanced opertations
Doc. Design Views Summary
• Synchronous commands – Force wait on application until return from server
• Asynchronous commands – Allow background operation until response available – Operations return Future object
Couchbase Java Client Asynchronous Functions
Set Operation is asynchronous // Do an asynchronous set!OperationFuture<Boolean> setOp = ! client.set(KEY, EXP_TIME, VALUE);!!// Do something not related to set …!!// Check to see if our set succeeded!// We block when we call the get()!if (setOp.get().booleanValue()) {! System.out.println("Set Succeeded");!} else {! System.err.println("Set failed: " + ! setOp.getStatus().getMessage());!}!
It’s Asynchronous
Compare and Swap Operations > Often referred to as “CAS” > Optimistic concurrency control > Available with many mutation operations, depending on client.
Get with Lock > Often referred to as “GETL” > Pessimistic concurrency control > Locks have a short TTL > Locks released with CAS operations > Useful when working with object graphs
Distributed System Design: Concurrency Controls
35
Actor 1 Actor 2
Couchbase Server
CAS mismatch Success
A
B
F
C D
E
CAS Example CASValue<Object> casv = client.gets(KEY);! Thread.sleep(random.nextInt(1000)); // a random workload! ! // Wake up and do a set based on the previous CAS value! Future<CASResponse> setOp =! client.asyncCAS(KEY, casv.getCas(), random.nextLong());!! // Wait for the CAS Response! try {! if (setOp.get().equals(CASResponse.OK)) {! System.out.println("Set Succeeded");! } else {! System.err.println("Set failed: ");! }! }! …!
CAS Operation
CONCURRENCY
Intro. Demos Hadoop Connector
Views Summary
KEYS AND RELATIONSHIPS
Intro. Demos Hadoop Connector
Views Summary
Couchbase > Distributed Key, Value pairs > De-normalized Data > Fast access Session store Hadoop > Unstructured/semi-structured data > Analytics and frequent data analysis > Leverage the Hadoop Ecosystem (Pig, Hive, Hbase, etc.)
Utilizing the strengths of the Platform
Ad Targeting > Decisions have to be made in milliseconds based on historical trends, location,
user preferences, etc. Virtual Reality > Sub-second response time based on data available from number of different
sources Gaming applications > Responsive applications that integrate with data in the back end
Use cases
SQOOP
• Easily Import/Export Data into Hadoop
• Generate Datatypes for use in MapReduce Applications
• Integrate with Pig, Hive and Hbase
• Easily export Data from Hadoop
Sqoop HDFS Hive HBase
Hadoop Pig
Introducing Sqoop
COUCHBASE PLUGIN
• Based on the Couchbase Tap Interface • Allows importing and exporting of entire database key mutations
Couchbase HDFS
1. Data imported via Tap mechanism 2. Hadoop
Processing
3. Data exported back to Couchbase
Couchbase Plugin
$ sqoop import –-connect http://localhost:8091/pools --table DUMP
$ sqoop import –-connect http://localhost:8091/pools --table BACKFILL_5
$ sqoop export --connect http://localhost:8091/pools
--table DUMP –export-dir DUMP
For Imports, table must be: > DUMP: All keys currently in Couchbase > BACKFILL_n: All key mutations for n minutes Specified –username maps to bucket > By default set to “default” bucket
Couchbase Import and Export
Couchbase > System of Record > Provides fast access to users and being able to record events Hadoop > Analytics > Generates a count of the # of clicks
Ad Targeting Application
Couchbase HDFS
1. User/Event Data is imported to HDFS
3. Consolidated data Is used to target ads.
2. Event data is consolidated with Map/Reduce
Importing to Hadoop
Counting the clicks
Analyze using Pig
Counting the ads
Analyze and Consolidate
THE VIEW AND QUERY API
Intro. Demos Hadoop Connector
Views Summary
Identical feature set to Couchbase Server 1.8 > Cache-layer > High-performance > Cluster > Elastic > Core interface Adds > Document based storage (JSON) > Views with materialized indexes > Querying
Couchbase Server 2.0
JSON (JavaScript Object Notation) > Lightweight Data-interchange format > Easy for humans to read and manipulate > Easy for machines to parse and generate (many parsers are available) > JSON is language independent (although it uses similar constructs)
Why use JSON?
• Views create perspectives on a collection of documents • Views can cover a few different use cases
– Simple secondary indexes (the most common) – Aggregation functions
• Example: count the number of North American Ales
– Organizing related data • Use Map/Reduce
– Map defines the relationship between fields in documents and output table – Reduce provides method for collating/summarizing
• Views materialized indexes – Views are not create until accessed – Data writes are fast (no index) – Index updates all changes since last update
What Are Views
• Extract fields from JSON documents and produce an index of the selected information
View Processing
map function creates a mapping between input data and output reduce function provides a summary (such as count, sum, etc.)
View Creation using Incremental Map/Reduce
• Map outputs one or more rows of data • Map outputs:
> Document ID > View Key (user configurable) > View Value
• View Key Controls > SorCng > Querying
• Key + Value stored in the Index • Maps provide query base
Map Func:ons
A map function to get breweries by province function (doc) { if (doc.type == "brewery" && doc.province) { emit(doc.province, doc.name); } }
Map/Reduce – Map Function
The conceptual output of the map function is
[ {"key":"Connecticut", "value":"Cottrell Brewing"}, {"key":"Connecticut","value":"Thomas Hooker Brewing Co."}, {"key":"Massachusetts","value":"Harpoon Brewing Company"}, {"key":"New York","value":"Brewery Ommegang"}, {"key":"Vermont","value":"Long Trail Brewing Company"} ]
Map/Reduce – Map Results
A reduce function to count breweries by province > Or simply use the build in _count function function (keys, values) { return values.length; } The conceptual input to the reduce function is: [ {"keys":["Connecticut", "Connecticut"], "values": ["Cottrell Brewing", "Thomas Hooker Brewing Co."]}, {"keys":["Massachusetts"],"values":["Harpoon Brewing Co."]}, {"keys":["New York"],"values":["Brewery Ommegang"]}, {"keys":["Vermont"],"values":["Long Trail Brewing Company“]} ]
Map/Reduce – Reduce Function
The conceptual result of the reduce is {"key":"Connecticut","value":2}, {"key":"Massachusetts","value":1}, {"key":"New York","value":1}, {"key":"Vermont","value":1}
Map/Reduce – Reduce Results
THE QUERY API
Intro. Demos Hadoop Connector
Views Summary
Query APIs
// map function!function (doc) {! if (doc.type == "beer") {! emit(doc._id, null);! }!}!!// Java code!Query query = new Query();!!query.setReduce(false);!query.setIncludeDocs(true);!query.setStale(Stale.FALSE);!!
THE VIEW API
Intro. Demos Hadoop Connector
Views Summary
View APIs with Java
// map function!function (doc) {! if (doc.type == "beer") {! emit(doc._id, null);! }!}!!// Java code!View view = client.getView("beers", "beers");!!ViewResponse result = client.query(view, query);!!Iterator<ViewRow> itr = result.iterator();! !while (itr.hasNext()) {! row = itr.next();! doc = (String) row.getDocument();! // do something!}!!!
USING THE QUERY API FOR CALCULATION
Intro. Advanced opertations
Doc. Design Views Summary
Querying with Java – Custom Reduce
// map function!function(doc) {! if (doc.type == "beer") {! if(doc.abv) {! emit(doc.abv, 1);! }}}!!//reduce function!_count!!// Java code!View view = client.getView("beers", "beers_count_abv");!query.setGroup(true);!!while (itr.hasNext()) {! row = itr.next();! System.out.println(String.format("%s: %s”, ! row.getKey(), row.getValue()));!}!
View Calculation Result
10: 2!9.6: 5!9.1: 1!9: 3!8.7: 2!8.5: 1!8: 2!7.5: 4!7: 4!6.7: 1!6.6: 1!6.2: 1!6: 2!5.9: 1!5.6: 1!5.2: 1!5: 1!4.8: 2!4.5: 1!4: 2!
RESOURCES AND SUMMARY
Intro. Demos Hadoop Connector
Views Summary
Server 1.8 Compatible > 1.0.1/2.8.0 > 1.0.2/2.8.0 (Memcache Node imbalance) > 1.0.x/2.8.0 (Replica Read) Server 2.0 Compatible > 1.1-dp/2.8.1 (Views) > 1.1-dp2/2.8.x (Observe) > 1.1/2.8.x (Final) Other Features > Spring Integration > Your feedback?
!
Java Client Library Roadmap
Couchbase Server Downloads > http://www.couchbase.com/downloads-all > http://www.couchbase.com/couchbase-server/overview Views > http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views.html Developing with Client libraries > http://www.couchbase.com/develop/java/current Couchbase Java Client Library wiki – tips and tricks > http://www.couchbase.com/wiki/display/couchbase/Couchbase+Java+Client
+Library Open Beer Database > http://openbeerdb.com Couchbase Internals > http://horicky.blogspot.com
Resources and Call For Action