CouchDB King Chung Huang Information Technologies University of Calgary

Post on 17-May-2015

DESCRIPTION

An overview of CouchDB. Originally presented internally at University of Calgary IT.

TRANSCRIPT

Page 1: CouchDB

CouchDB

King Chung Huang
Information Technologies
University of Calgary

Page 2: CouchDB

Relax

Page 3: CouchDB

Today’s Talk

Document-oriented Databases

CouchDB Overview

Demonstrations

Page 4: CouchDB

Document-oriented Databases

Page 5: CouchDB

Databases

Page 6: CouchDB

Databases

Flat

Hierarchical

Network

Relational

Page 7: CouchDB

Databases

Dimensional

Object

Post-Relational

Document-oriented

Page 8: CouchDB

Document-oriented Databases

• Comparable to documents in the real world
• Records are stored as schema-less documents
  ■ Each document is uniquely named
  ■ Documents are the primary unit of storage
• Structures are not explicitly defined
  ■ No tables with uniform, pre-defined fields
  ■ Every document can have varying fields of different types
• Documents are self-contained
  ■ Data is not decomposed into tables with relations
  ■ Documents contain the context needed to understand them

Page 9: CouchDB

Document-oriented Databases

• Examples
  ■ Lotus Notes
  ■ Amazon SimpleDB
  ■ CouchDB
• Key-Value Stores
  ■ Amazon S3
    ■ Dynamo: Amazon’s Highly Available Key-value Store, DeCandia, et al., 2007
  ■ Facebook Cassandra
    ■ Recently accepted as an Apache incubation project
  ■ Google BigTable
    ■ Bigtable: A Distributed Storage System for Structured Data, Chang, et al., 2006

Page 10: CouchDB

CouchDB Overview

Page 11: CouchDB

What is CouchDB?

Document database server

REST API

JSON documents

Views with MapReduce

Highly Scalable

Page 12: CouchDB

Document Database Server

• Implemented in Erlang
  ■ Ericsson Language
  ■ Highly concurrent, functional programming language
• Designed with modern web applications in mind
• Atomic, Consistent, Isolated, Durable (ACID)
• “Crash-only” design
• Supports external handlers
  ■ Change notification
  ■ Custom processing

Page 13: CouchDB

REST HTTP API

• Representational State Transfer
  ■ A set of principles about how resources are defined and addressed
• World Wide Web (HTTP) is RESTful
  ■ Uniform interface for accessing resources
  ■ Resources identified by URI
  ■ Actions transmitted in HTTP methods
  ■ Status communicated in status codes

Page 14: CouchDB

REST HTTP API

CRUD

• Create, Read, Update, and Delete
• In HTTP
  ■ POST /some/resource/id
  ■ GET /some/resource/id
  ■ PUT /some/resource/id
  ■ DELETE /some/resource/id
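
As a rough sketch of the CRUD mapping above, the following Python builds one HTTP request per action without sending anything, so no server is needed. The base URL is an assumption (a local server on CouchDB's default port 5984 and a hypothetical database named "blog"), and `build_request` is an illustrative helper, not part of any CouchDB client. The mapping follows the slide; CouchDB itself also accepts PUT to create a document under a chosen ID.

```python
import json
import urllib.request

# Assumption: a local server on CouchDB's default port and a hypothetical
# database named "blog". Neither appears in the slides.
BASE_URL = "http://localhost:5984/blog"

# CRUD-to-HTTP mapping exactly as listed on the slide.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(action, doc_id, body=None):
    """Build (but do not send) the HTTP request for one CRUD action."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are JSON-encoded, matching the JSON documents shown later.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{doc_id}",
        data=data,
        headers=headers,
        method=CRUD_TO_HTTP[action],
    )


if __name__ == "__main__":
    for action in CRUD_TO_HTTP:
        body = {"title": "A Blog Post"} if action in ("create", "update") else None
        request = build_request(action, "post1", body)
        print(f"{action:<6} -> {request.get_method()} {request.full_url}")
```

Each action addresses the same URI; only the HTTP method changes, which is the uniform interface the previous slide describes.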

Page 15: CouchDB

JSON Documents

• JavaScript Object Notation
  ■ Considered language-independent
• CouchDB stored XML documents before version 0.8
  ■ Suitable if content is already in XML
  ■ Human readable, but can be onerous to type
  ■ Markup language, requires transformation from/to data structures
• Represents primitive data types and structures
  ■ Strings, numbers, booleans
  ■ Arrays, dictionaries
  ■ Null
• Documents can have attachments

Page 16: CouchDB

JSON Documents

Example

{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
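
To illustrate how the JSON types in this example map onto ordinary data structures, here is a small sketch using Python's standard `json` module. The document text is the slide's example written as strict JSON (quoted keys, straight quotes); nothing here talks to a CouchDB server.

```python
import json

# The slide's example document, written as strict JSON.
raw = """
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
"""

doc = json.loads(raw)

# Show which native type each JSON value was decoded into.
for field, value in doc.items():
    print(f"{field:<13} {type(value).__name__}")

# Encoding and decoding again yields an equal structure, so no separate
# transformation layer is needed between the stored form and the data.
round_tripped = json.loads(json.dumps(doc))
```

Strings, numbers, booleans, and arrays arrive as the corresponding native types, which is the contrast with XML drawn on the previous slide.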

Page 20: CouchDB

JSON Documents

Example

{
  "_id": "post1",
  "_rev": "123456",
  …
  "_attachments": {
    "picture.png": {
      "stub": true,
      "content_type": "image/png",
      "length": 384
    }
  }
}

Page 21: CouchDB

Views

• Used to sort and filter through data
• Lazily evaluated, highly efficient
  ■ Similar to indexing in relational databases
• Defined in design documents
  ■ Documents named _design/…
• Consist of map and reduce functions
  ■ Language independent
  ■ JavaScript supported by default
    ■ Mozilla SpiderMonkey included
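
As a sketch of what a design document holding a view might look like, the following builds one as a Python dictionary and prints it as JSON. The names `_design/posts` and `by_date` are hypothetical and chosen only for illustration; the map function is stored as JavaScript source text, since JavaScript is the default view language.

```python
import json

# Hypothetical names: the design document "_design/posts" and the view
# "by_date" are illustrative and do not appear in the slides.
design_doc = {
    "_id": "_design/posts",
    "language": "javascript",
    "views": {
        "by_date": {
            # The map function is kept as JavaScript source in a string;
            # the database runs it against each document.
            "map": "function(doc) { emit(doc.post_date, doc); }",
        },
    },
}

# A design document is itself a JSON document, so it serializes like any other.
print(json.dumps(design_doc, indent=2))
```

Because the view lives in an ordinary document, it is stored and versioned in the same way as the data it indexes.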

Page 22: CouchDB

Data Processing with MapReduce

• Programming model for processing and generating large data sets
• Related, but not equivalent, to map and reduce operations in functional languages
• Take and produce key/value pairs with map and reduce functions
• Map functions
  ■ Take input key/value pairs and produce an intermediate set of key/value pairs
• Reduce functions
  ■ Take an intermediate key and the set of values for that key, and merge them into a possibly smaller set of values
• MapReduce: Simplified Data Processing on Large Clusters, Jeff Dean, Sanjay Ghemawat, Google Inc.
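
The model described above can be sketched in a few lines of single-process Python. This is a toy illustration of the map, group, and reduce steps, not CouchDB's view engine or Google's implementation. The helper names (`map_reduce`, `count_tags_map`, `sum_reduce`) and the document ID "post2" are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run the map phase, group intermediate values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key, value in records:
        # Map: each input pair may produce any number of intermediate pairs.
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Reduce: merge all values that share an intermediate key.
    return {key: reduce_fn(key, values) for key, values in sorted(intermediate.items())}


def count_tags_map(doc_id, doc):
    """Emit (tag, 1) for every tag on a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_reduce(key, values):
    """Collapse the list of counts for one tag into a single total."""
    return sum(values)


# Input key/value pairs: document ID and document body.
records = [
    ("post1", {"title": "A Blog Post", "tags": ["blue", "glue"]}),
    ("post2", {"title": "Just Yesterday", "tags": ["blue", "clue"]}),
]

tag_counts = map_reduce(records, count_tags_map, sum_reduce)
print(tag_counts)
```

Here the reduce step shrinks each group of values to one number, which is the "possibly smaller set of values" mentioned above.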

Page 23: CouchDB

Data Processing with MapReduce

Example

{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}

Page 24: CouchDB

Data Processing with MapReduce

Example

"post1" = {
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}

Page 25: CouchDB

Data Processing with MapReduce

Example

"post1" = {
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…"
}

Page 26: CouchDB

Data Processing with MapReduce

Emit Posts by post_date

"post1" = {
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…"
}

1239910768 = {
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…"
}

Page 27: CouchDB

Data Processing with MapReduce

Emit Posts by post_date

1208456184 {"title": "A bloody long time ago", …}
1215421546 {"title": "A blue moon ago", …}
1222654641 {"title": "Just Yesterday", …}
1239910768 {"title": "A Blog Post", …}
1246816518 {"title": "That was Then", …}
1251687980 {"title": "This is Now", …}
1264836981 {"title": "When Will Then Be Now?", …}
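
The listing above can be reproduced with a short simulation in plain Python: a map function emits each post under its post_date, and the emitted rows are then ordered by key. The titles and dates come from the slide; `map_by_date` is a hypothetical stand-in for a view's map function, and the input order is deliberately shuffled to show that the ordering comes from the key.

```python
# Posts from the slide, listed out of order on purpose.
posts = [
    {"title": "This is Now", "post_date": 1251687980},
    {"title": "A Blog Post", "post_date": 1239910768},
    {"title": "When Will Then Be Now?", "post_date": 1264836981},
    {"title": "A bloody long time ago", "post_date": 1208456184},
    {"title": "That was Then", "post_date": 1246816518},
    {"title": "Just Yesterday", "post_date": 1222654641},
    {"title": "A blue moon ago", "post_date": 1215421546},
]


def map_by_date(doc):
    """Emit one row per post, keyed by its post_date."""
    yield doc["post_date"], doc


# Collect every emitted (key, value) row and order the rows by key.
rows = sorted(
    (row for doc in posts for row in map_by_date(doc)),
    key=lambda row: row[0],
)

titles = [value["title"] for _, value in rows]
for key, value in rows:
    print(key, value["title"])
```

Choosing a different key in the map function changes the sort order without touching the stored documents.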

Page 28: CouchDB

Data Processing with MapReduce

Emit Posts by tag

"post1" = {
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…"
}

"blue" = {"title": "A Blog Post", …}

"glue" = {"title": "A Blog Post", …}

Page 29: CouchDB

Data Processing with MapReduce

Emit Posts by tag

blue {"title": "Just Yesterday", …}
blue {"title": "A Blog Post", …}
clue {"title": "Just Yesterday", …}
flue {"title": "When Will Then Be Now?", …}
flue {"title": "This is Now", …}
glue {"title": "A Blog Post", …}
wazoo {"title": "That was Then", …}

Page 30: CouchDB

Data Processing with MapReduce

Emit Posts by tag, Reduced

blue {"title": "Just Yesterday", …}, {"title": "A Blog Post", …}
clue {"title": "Just Yesterday", …}
flue {"title": "When Will Then Be Now?", …}, {"title": "This is Now", …}
glue {"title": "A Blog Post", …}
wazoo {"title": "That was Then", …}
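
A minimal simulation of the tag example, again in plain Python rather than a real CouchDB view: the map step emits one row per tag, the rows are sorted by key, and the reduce step gathers the rows that share a tag. The tag assignments are inferred from the rows shown on the slides, and for brevity the sketch emits only each post's title instead of the whole document. `map_by_tag` is a hypothetical name.

```python
from itertools import groupby

# Tag assignments inferred from the emitted rows on the slides.
posts = [
    {"title": "Just Yesterday", "tags": ["blue", "clue"]},
    {"title": "A Blog Post", "tags": ["blue", "glue"]},
    {"title": "When Will Then Be Now?", "tags": ["flue"]},
    {"title": "This is Now", "tags": ["flue"]},
    {"title": "That was Then", "tags": ["wazoo"]},
]


def map_by_tag(doc):
    """Emit one (tag, title) row for every tag on a post."""
    for tag in doc["tags"]:
        yield tag, doc["title"]


# Map phase: one post can contribute several rows. The sort is stable, so
# rows with the same tag keep the order in which they were emitted.
rows = sorted(
    (row for doc in posts for row in map_by_tag(doc)),
    key=lambda row: row[0],
)

# Reduce phase: merge all rows that share a key into one entry per tag.
reduced = {
    tag: [title for _, title in group]
    for tag, group in groupby(rows, key=lambda row: row[0])
}

for tag, titles in reduced.items():
    print(tag, titles)
```

Seven emitted rows collapse into five entries, one per distinct tag, matching the reduced listing above.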

Page 31: CouchDB

Scalability

• Incremental MapReduce
• Multiversion Concurrency Control (MVCC)
  ■ Achieves serializability through multiversioning instead of locking
  ■ Eliminates waits to access objects
  ■ Updates create new versions of documents
  ■ Tradeoff point: no waits, increased data storage
• Incremental Distributed Replication
• Eventual Consistency
  ■ Changes eventually propagate through distributed systems
  ■ Tradeoff point: increased availability and tolerance, decreased freshness
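
The MVCC tradeoff can be illustrated with a toy in-memory store. This is a sketch of the general idea only, not CouchDB's implementation: `ToyStore` and `ConflictError` are hypothetical names, and revisions are simplified to integers. Writers never wait on a lock; an update names the revision it was based on, each accepted update is stored as a new version, and an update based on a stale revision is rejected.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    """In-memory toy model of revision-checked, lock-free updates."""

    def __init__(self):
        # doc_id -> list of (rev, body) versions, oldest first.
        self._history = {}

    def get(self, doc_id):
        """Return the latest revision number and a copy of its body."""
        rev, body = self._history[doc_id][-1]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        """Store a new version if rev matches the current revision."""
        versions = self._history.get(doc_id, [])
        current_rev = versions[-1][0] if versions else None
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        # Old versions are kept: no waiting, at the cost of extra storage.
        self._history[doc_id] = versions + [(new_rev, dict(body))]
        return new_rev

    def version_count(self, doc_id):
        return len(self._history[doc_id])


store = ToyStore()
rev1 = store.put("post1", {"title": "A Blog Post"})

# Two writers read the same revision without blocking each other.
rev_a, draft_a = store.get("post1")
rev_b, draft_b = store.get("post1")

draft_a["title"] = "A Better Blog Post"
rev2 = store.put("post1", draft_a, rev=rev_a)  # first writer succeeds

try:
    store.put("post1", draft_b, rev=rev_b)  # second writer holds a stale rev
    conflict = False
except ConflictError:
    conflict = True

print(rev1, rev2, conflict, store.version_count("post1"))
```

The second writer is told about the conflict instead of being made to wait, and both earlier versions remain stored, which is the "no waits, increased data storage" tradeoff.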

Page 32: CouchDB

Demonstrations