CouchDB

King Chung Huang
Information Technologies

University of Calgary

Today’s Talk

Document-oriented Databases

CouchDB Overview

Demonstrations

Document-oriented Databases

Databases

• Flat
• Hierarchical
• Network
• Relational
• Dimensional
• Object
• Post-Relational
• Document-oriented

Document-oriented Databases

• Comparable to documents in the real world

• Records are stored as schema-less documents
  ■ Each document is uniquely named
  ■ Documents are the primary unit of storage

• Structures are not explicitly defined
  ■ No tables with uniform, pre-defined fields
  ■ Every document can have varying fields of different types

• Documents are self-contained
  ■ Data is not decomposed into tables with relations
  ■ Documents contain the context needed to understand them

Document-oriented Databases

• Examples
  ■ Lotus Notes
  ■ Amazon SimpleDB
  ■ CouchDB

• Key-Value Stores
  ■ Amazon S3
    ■ Dynamo: Amazon’s Highly Available Key-value Store, DeCandia, et al., 2007
  ■ Facebook Cassandra
    ■ Recently accepted as an Apache incubation project
  ■ Google BigTable
    ■ Bigtable: A Distributed Storage System for Structured Data, Chang, et al., 2006

CouchDB Overview

What is CouchDB?

Document database server

REST API

JSON documents

Views with MapReduce

Highly Scalable

Document Database Server

• Implemented in Erlang
  ■ Ericsson Language
  ■ Highly concurrent, functional programming language

• Designed with modern web applications in mind

• Atomic Consistent Isolated Durable (ACID)

• “Crash-only” design

• Supports external handlers
  ■ Change notification
  ■ Custom processing

REST HTTP API

• Representational State Transfer
  ■ A set of principles about how resources are defined and addressed

• World Wide Web (HTTP) is RESTful
  ■ Uniform interface for accessing resources
  ■ Resources identified by URI
  ■ Actions transmitted in HTTP methods
  ■ Status communicated in status codes

REST HTTP API
CRUD

• Create, Read, Update, and Delete

• In HTTP
  ■ POST /some/resource/id
  ■ GET /some/resource/id
  ■ PUT /some/resource/id
  ■ DELETE /some/resource/id
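The CRUD-to-HTTP mapping above can be sketched in a few lines of Python. This is only an illustration: the host and port in the URL and the `build_request` helper are assumptions made for the example, and the requests are built but never sent.

```python
from urllib.request import Request

# CRUD operation -> HTTP method, following the mapping listed on the slide.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, body=None):
    """Build (but do not send) an HTTP request for a CRUD operation."""
    data = body.encode("utf-8") if body is not None else None
    return Request(url, data=data, method=CRUD_TO_HTTP[operation])


# Hypothetical resource URL, used only for illustration.
url = "http://localhost:5984/some/resource/id"
built = {op: build_request(op, url) for op in CRUD_TO_HTTP}

for op, req in built.items():
    print(op, "->", req.get_method(), req.full_url)
```

The point of the sketch is that the resource URL stays the same for every operation; only the HTTP method changes.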

JSON Documents

• JavaScript Object Notation
  ■ Considered language-independent

• CouchDB stored XML documents before version 0.8
  ■ Suitable if content is already in XML
  ■ Human readable, but can be onerous to type
  ■ Markup language, requires transformation from/to data structures

• Represents primitive data types and structures
  ■ Strings, numbers, booleans
  ■ Arrays, dictionaries
  ■ Null

• Documents can have attachments

JSON Documents
Example

{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
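As a rough illustration of how such a document maps onto ordinary data structures, the sketch below parses the example with Python's standard `json` module. The document text is the slide's example written as strict JSON (quoted keys, straight quotes); nothing here is specific to CouchDB.

```python
import json

# The example document from the slide, written as strict JSON.
raw = """
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
"""

doc = json.loads(raw)

# JSON types map directly onto native types:
# strings -> str, numbers -> int/float, booleans -> bool,
# arrays -> list, dictionaries (objects) -> dict.
for field, value in doc.items():
    print(f"{field}: {type(value).__name__}")
```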

JSON Documents
Example

{
  "_id": "post1",
  "_rev": "123456",
  …
  "_attachments": {
    "picture.png": {
      "stub": true,
      "content_type": "image/png",
      "length": 384
    }
  }
}

Views

• Used to sort and filter through data

• Lazily evaluated, highly efficient
  ■ Similar to indexing in relational databases

• Defined in design documents
  ■ Documents named _design/…

• Consist of map and reduce functions
  ■ Language independent
  ■ JavaScript supported by default
    ■ Mozilla SpiderMonkey included
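A design document is itself a JSON document whose view functions are stored as source-code strings. The sketch below builds one in Python. The names `_design/blog` and `by_date` are hypothetical, chosen only for this example, and the JavaScript map function is a minimal guess at what a by-date view might look like.

```python
import json

# A minimal design document. "_design/blog" and "by_date" are
# hypothetical names used only for illustration.
design_doc = {
    "_id": "_design/blog",
    "language": "javascript",
    "views": {
        "by_date": {
            # The map function is stored as a string of JavaScript source.
            "map": "function(doc) { emit(doc.post_date, doc); }",
        },
    },
}

# Serialized form, as it would be stored alongside ordinary documents.
print(json.dumps(design_doc, indent=2))
```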

Data Processing with MapReduce

• Programming model for processing and generating large data sets

• Related, but not equivalent, to map and reduce operations in functional languages

• Take and produce key/value pairs with map and reduce functions

• Map functions
  ■ Take input key/value pairs and produce an intermediate set of key/value pairs

• Reduce functions
  ■ Take an intermediate key and the set of values for that key, and merge them into a possibly smaller set of values

• MapReduce: Simplified Data Processing on Large Clusters, Jeff Dean, Sanjay Ghemawat, Google Inc.
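The model described above can be sketched as a single-process toy in Python. This is not how CouchDB or Google's system is implemented; `map_reduce`, `count_words`, and `total` are hypothetical names, and word counting is used only because it is the customary small example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce each group."""
    intermediate = defaultdict(list)
    # Map phase: each input key/value pair yields intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Reduce phase: merge the values collected for each intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}


def count_words(doc_id, text):
    """Map function: emit (word, 1) for every word in the text."""
    for word in text.split():
        yield word, 1


def total(word, counts):
    """Reduce function: merge the counts for one word into a single number."""
    return sum(counts)


records = [("d1", "blue glue blue"), ("d2", "glue clue")]
result = map_reduce(records, count_words, total)
print(result)
```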

Data Processing with MapReduce
Example

“post1” = {
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”
}

Data Processing with MapReduce
Emit Posts by post_date

“post1” = { … }

1239910768 = { … }


Data Processing with MapReduce
Emit Posts by post_date

1208456184 {title: “A bloody long time ago”, …}

1215421546 {title: “A blue moon ago”, …}

1222654641 {title: “Just Yesterday”, …}

1239910768 {title: “A Blog Post”, …}

1246816518 {title: “That was Then”, …}

1251687980 {title: “This is Now”, …}

1264836981 {title: “When Will Then Be Now?”, …}
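A listing like the one above can be reproduced with a small Python sketch: a map function emits each post under its `post_date`, and the rows are kept sorted by key so that range lookups are cheap. `build_view` and `query` are hypothetical helpers for this illustration, not CouchDB functions, and only a few of the posts are included.

```python
# A few of the posts from the listing (other fields omitted).
posts = [
    {"title": "A Blog Post", "post_date": 1239910768},
    {"title": "Just Yesterday", "post_date": 1222654641},
    {"title": "This is Now", "post_date": 1251687980},
    {"title": "A bloody long time ago", "post_date": 1208456184},
]


def map_by_date(doc):
    """Map function: emit the document keyed by its post_date."""
    yield doc["post_date"], doc


def build_view(docs, map_fn):
    """Apply the map function to every document and sort rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def query(view, startkey, endkey):
    """Return the values whose keys fall within [startkey, endkey]."""
    return [value for key, value in view if startkey <= key <= endkey]


view = build_view(posts, map_by_date)
for key, doc in view:
    print(key, doc["title"])

in_range = query(view, 1220000000, 1240000000)
```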

Data Processing with MapReduce
Emit Posts by tag

“post1” = { … }

“blue” = { title: “A Blog Post”, … }

“glue” = { title: “A Blog Post”, … }

Data Processing with MapReduce
Emit Posts by tag

blue {title: “Just Yesterday”, …}

blue {title: “A Blog Post”, …}

clue {title: “Just Yesterday”, …}

flue {title: “When Will Then Be Now?”, …}

flue {title: “This is Now”, …}

glue {title: “A Blog Post”, …}

wazoo {title: “That was Then”, …}

Data Processing with MapReduce
Emit Posts by tag, Reduced

blue {title: “Just Yesterday”, …}, {title: “A Blog Post”, …}

clue {title: “Just Yesterday”, …}

flue {title: “When Will Then Be Now?”, …}, {title: “This is Now”, …}

glue {title: “A Blog Post”, …}

wazoo {title: “That was Then”, …}
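The tag example can be sketched the same way: the map step emits one row per tag, and the reduce step groups the rows that share a key. The tags assigned to each post below are inferred from the listings in the slides, and the grouping "reduce" simply mirrors the slide's output; it is an illustration, not CouchDB's reduce API.

```python
from itertools import groupby

# Posts and tags as they appear in the slide listings.
posts = [
    {"title": "Just Yesterday", "tags": ["blue", "clue"]},
    {"title": "A Blog Post", "tags": ["blue", "glue"]},
    {"title": "When Will Then Be Now?", "tags": ["flue"]},
    {"title": "This is Now", "tags": ["flue"]},
    {"title": "That was Then", "tags": ["wazoo"]},
]


def map_by_tag(doc):
    """Map function: emit one (tag, title) row for every tag on the post."""
    for tag in doc["tags"]:
        yield tag, doc["title"]


# Map phase: collect all rows and sort them by key (the sort is stable).
rows = sorted(
    (row for doc in posts for row in map_by_tag(doc)),
    key=lambda row: row[0],
)

# Reduce phase: merge the rows that share a key into one entry per tag.
reduced = {
    tag: [title for _, title in group]
    for tag, group in groupby(rows, key=lambda row: row[0])
}

for tag, titles in reduced.items():
    print(tag, titles)
```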

Scalability

• Incremental MapReduce

• Multiversion Concurrency Control (MVCC)
  ■ Achieves serializability through multiversioning instead of locking
  ■ Eliminates waits to access objects
  ■ Updates create new document revisions
  ■ Tradeoff point: no waits, increased data storage

• Incremental Distributed Replication

• Eventual Consistency
  ■ Changes eventually propagate through distributed systems
  ■ Tradeoff point: increased availability and tolerance, decreased freshness
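The MVCC idea can be sketched with a toy in-memory store in Python. Everything here is an assumption made for illustration: `VersionedStore` and `Conflict` are hypothetical names, and integer revisions are a simplification rather than CouchDB's actual revision format. Writers never wait on a lock; an update based on a stale revision is rejected, and every accepted update is kept as a new version.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class VersionedStore:
    """Toy multiversion store: updates append new versions, never overwrite."""

    def __init__(self):
        self._versions = {}  # doc id -> list of (revision, body)

    def get(self, doc_id):
        """Return the latest (revision, body) without taking any lock."""
        rev, body = self._versions[doc_id][-1]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        """Store a new version if `rev` matches the current revision."""
        history = self._versions.setdefault(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            raise Conflict(f"expected revision {current}, got {rev}")
        new_rev = (current or 0) + 1
        history.append((new_rev, dict(body)))
        return new_rev

    def history(self, doc_id):
        """All stored versions: the storage cost of never waiting."""
        return list(self._versions[doc_id])


store = VersionedStore()
rev1 = store.put("post1", {"title": "A Blog Post"})
rev2 = store.put("post1", {"title": "A Better Blog Post"}, rev=rev1)

try:
    # A second writer still holding revision 1 is rejected, not blocked.
    store.put("post1", {"title": "Stale edit"}, rev=rev1)
    conflict_raised = False
except Conflict:
    conflict_raised = True
```

The tradeoff from the slide shows up directly: no operation waits, but `history("post1")` keeps growing with every update.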

Demonstrations
