couchdb
DESCRIPTION
An overview of CouchDB. Originally presented internally at University of Calgary IT.
TRANSCRIPT
CouchDB
King Chung Huang
Information Technologies
University of Calgary
Relax
Today’s Talk
Document-oriented Databases
CouchDB Overview
Demonstrations
Document-oriented Databases
Databases
• Flat
• Hierarchical
• Network
• Relational
• Dimensional
• Object
• Post-Relational
• Document-oriented
Document-oriented Databases
• Comparable to documents in the real world
• Records are stored as schema-less documents
  ■ Each document is uniquely named
  ■ Documents are the primary unit of storage
• Structures are not explicitly defined
  ■ No tables with uniform, pre-defined fields
  ■ Every document can have varying fields of different types
• Documents are self-contained
  ■ Data is not decomposed into tables with relations
  ■ Documents contain the context needed to understand them
Document-oriented Databases
• Examples
  ■ Lotus Notes
  ■ Amazon SimpleDB
  ■ CouchDB
• Key-Value Stores
  ■ Amazon S3
    ■ Dynamo: Amazon’s Highly Available Key-value Store, DeCandia, et al., 2007
  ■ Facebook Cassandra
    ■ Recently accepted as an Apache incubation project
  ■ Google BigTable
    ■ Bigtable: A Distributed Storage System for Structured Data, Chang, et al., 2006
CouchDB Overview
What is CouchDB?
Document database server
REST API
JSON documents
Views with MapReduce
Highly Scalable
Document Database Server
• Implemented in Erlang
  ■ Developed at Ericsson (the name is often read as “Ericsson Language”)
  ■ Highly concurrent, functional programming language
• Designed with modern web applications in mind
• Atomic, Consistent, Isolated, Durable (ACID)
• “Crash-only” design
• Supports external handlers
  ■ Change notification
  ■ Custom processing
REST HTTP API
• Representational State Transfer
  ■ A set of principles about how resources are defined and addressed
• World Wide Web (HTTP) is RESTful
  ■ Uniform interface for accessing resources
  ■ Resources identified by URI
  ■ Actions transmitted in HTTP methods
  ■ Status communicated in status codes
REST HTTP API: CRUD
• Create, Read, Update, and Delete
• In HTTP
  ■ POST /some/resource/id
  ■ GET /some/resource/id
  ■ PUT /some/resource/id
  ■ DELETE /some/resource/id
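The CRUD-to-HTTP mapping above can be written down as a small lookup. This is a minimal sketch: `CRUD_TO_HTTP` and `describeRequest` are hypothetical names used only for illustration, and the code builds request descriptions without sending anything over the network.

```javascript
// Hypothetical lookup: CRUD operation -> HTTP method, as listed on the slide.
const CRUD_TO_HTTP = new Map([
  ["create", "POST"],
  ["read", "GET"],
  ["update", "PUT"],
  ["delete", "DELETE"],
]);

// Builds a plain request description; performs no network I/O.
function describeRequest(operation, resourcePath) {
  const method = CRUD_TO_HTTP.get(operation);
  if (method === undefined) {
    throw new Error(`Unknown CRUD operation: ${operation}`);
  }
  return { method, path: resourcePath };
}

const requests = ["create", "read", "update", "delete"].map((op) =>
  describeRequest(op, "/some/resource/id")
);
console.log(requests.map((r) => `${r.method} ${r.path}`).join("\n"));
```

The resource path is the same for all four requests; only the HTTP method changes, which is the uniform-interface idea from the previous slide.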
JSON Documents
• JavaScript Object Notation
  ■ Considered language-independent
• CouchDB stored XML documents before version 0.8
  ■ Suitable if content is already in XML
  ■ Human readable, but can be onerous to type
  ■ Markup language, requires transformation from/to data structures
• Represents primitive data types and structures
  ■ Strings, numbers, booleans
  ■ Arrays, dictionaries
  ■ Null
• Documents can have attachments
JSON Documents: Example
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
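A minimal sketch of how the example document maps onto native data types, assuming it is written as strict JSON (quoted keys, straight double quotes). Only the standard `JSON.parse` and `JSON.stringify` functions are used.

```javascript
// The slide's example document as strict JSON text.
const text = `{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}`;

// Parsing yields ordinary strings, numbers, booleans, arrays, and objects.
const doc = JSON.parse(text);
console.log(typeof doc.title);        // "string"
console.log(typeof doc.post_date);    // "number"
console.log(typeof doc.is_published); // "boolean"
console.log(Array.isArray(doc.tags)); // true

// Serializing and parsing again gives back an equivalent document.
const roundTripped = JSON.parse(JSON.stringify(doc));
console.log(roundTripped.tags.join(", "));
```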
JSON Documents: Example
{
  "_id": "post1",
  "_rev": "123456",
  …
  "_attachments": {
    "picture.png": {
      "stub": true,
      "content_type": "image/png",
      "length": 384
    }
  }
}
Views
• Used to sort and filter through data
• Lazily evaluated, highly efficient
  ■ Similar to indexing in relational databases
• Defined in design documents
  ■ Documents named _design/…
• Consist of map and reduce functions
  ■ Language independent
  ■ JavaScript supported by default
    ■ Mozilla Spidermonkey included
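As a rough illustration of the points above, a design document is itself a JSON document whose `views` field holds the map (and optionally reduce) functions as source text. The names `blog` and `by_date` below are hypothetical, chosen only for this sketch.

```javascript
// Map function source in the style of CouchDB's JavaScript view server:
// it receives one document and calls emit(key, value) zero or more times.
// It is stored as a string, since JSON has no function type.
const mapSource = `function (doc) {
  if (doc.post_date) {
    emit(doc.post_date, doc.title);
  }
}`;

// Hypothetical design document; "blog" and "by_date" are illustrative names.
const designDoc = {
  _id: "_design/blog",
  language: "javascript",
  views: {
    by_date: { map: mapSource },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```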
Data Processing with MapReduce
• Programming model for processing and generating large data sets
• Related, but not equivalent, to the map and reduce operations in functional languages
• Takes and produces key/value pairs with map and reduce functions
• Map functions
  ■ Take input key/value pairs and produce an intermediate set of key/value pairs
• Reduce functions
  ■ Take an intermediate key and the set of values for that key, and merge them into a possibly smaller set of values
• MapReduce: Simplified Data Processing on Large Clusters, Jeff Dean, Sanjay Ghemawat, Google Inc., 2004
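The model described above can be sketched in a single process. The `mapReduce` helper below is a hypothetical illustration, not CouchDB's or Google's implementation, and the two input strings are made-up sample data for a word count.

```javascript
// Runs mapFn over every input pair, groups the intermediate pairs by key,
// then calls reduceFn once per key. Single-process illustration only.
function mapReduce(inputs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const [key, value] of inputs) {
    mapFn(key, value, emit);
  }
  const results = new Map();
  for (const [key, values] of groups) {
    results.set(key, reduceFn(key, values));
  }
  return results;
}

// Made-up sample inputs: (document name, document text) pairs.
const inputs = [
  ["doc1", "relax on the couch"],
  ["doc2", "the couch"],
];

// Map: emit (word, 1) for every word. Reduce: sum the ones for each word.
const wordCounts = mapReduce(
  inputs,
  (name, text, emit) => text.split(" ").forEach((word) => emit(word, 1)),
  (word, ones) => ones.reduce((sum, n) => sum + n, 0)
);
console.log([...wordCounts]);
```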
Data Processing with MapReduce: Example
“post1” = {
  _id: “post1”,
  _rev: “123456”,
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”,
  is_published: true
}
Data Processing with MapReduce: Example
“post1” = {
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”
}
Data Processing with MapReduce: Emit Posts by post_date
“post1” = {
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”
}
1239910768 = {
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”
}
Data Processing with MapReduce: Emit Posts by post_date
1208456184 {title: “A bloody long time ago”, …}
1215421546 {title: “A blue moon ago”, …}
1222654641 {title: “Just Yesterday”, …}
1239910768 {title: “A Blog Post”, …}
1246816518 {title: “That was Then”, …}
1251687980 {title: “This is Now”, …}
1264836981 {title: “When Will Then Be Now?”, …}
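A sketch of how such a date-ordered listing could be produced. The map function emits `post_date` as the key, and the rows are then sorted by key. The `buildView` helper is a hypothetical stand-in for the view engine, `emit` is passed as a parameter here rather than provided globally, and the `_id` values other than `post1` are made up. Titles and dates come from the slide.

```javascript
// Posts in arbitrary order; titles and dates are taken from the slide.
const posts = [
  { _id: "post1", title: "A Blog Post", post_date: 1239910768 },
  { _id: "post2", title: "Just Yesterday", post_date: 1222654641 },
  { _id: "post3", title: "A blue moon ago", post_date: 1215421546 },
];

// Hypothetical stand-in for a view engine: run the map function over every
// document, collect the emitted rows, and order them by (numeric) key.
function buildView(docs, mapFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    mapFn(doc, emit);
  }
  rows.sort((a, b) => a.key - b.key);
  return rows;
}

// Map function: key each post by its post_date.
const byDate = buildView(posts, (doc, emit) => {
  if (typeof doc.post_date === "number") {
    emit(doc.post_date, { title: doc.title });
  }
});

for (const row of byDate) {
  console.log(row.key, row.value.title);
}
```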
Data Processing with MapReduce: Emit Posts by tag
“post1” = {
  title: “A Blog Post”,
  tags: [“blue”, “glue”],
  post_date: 1239910768,
  body: “Once upon a time…”
}
“blue” = { title: “A Blog Post”, … }
“glue” = { title: “A Blog Post”, … }
Data Processing with MapReduce: Emit Posts by tag
blue {title: “Just Yesterday”, …}
blue {title: “A Blog Post”, …}
clue {title: “Just Yesterday”, …}
flue {title: “When Will Then Be Now?”, …}
flue {title: “This is Now”, …}
glue {title: “A Blog Post”, …}
wazoo {title: “That was Then”, …}
Data Processing with MapReduce: Emit Posts by tag, Reduced
blue {title: “Just Yesterday”, …}, {title: “A Blog Post”, …}
clue {title: “Just Yesterday”, …}
flue {title: “When Will Then Be Now?”, …}, {title: “This is Now”, …}
glue {title: “A Blog Post”, …}
wazoo {title: “That was Then”, …}
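The tag example can be sketched the same way: the map step emits one row per tag, and the reduce step collects the values that share a key. `mapByTag` and `groupByKey` are hypothetical helper names, and the three posts are a subset of the slide's data. Grouping whole lists mirrors the slide for illustration. In practice a CouchDB reduce function more typically returns a small aggregate such as a count.

```javascript
// A subset of the posts from the slide, with their tags.
const taggedPosts = [
  { title: "A Blog Post", tags: ["blue", "glue"] },
  { title: "Just Yesterday", tags: ["blue", "clue"] },
  { title: "That was Then", tags: ["wazoo"] },
];

// Map step: one (tag, title) row per tag on the document.
function mapByTag(doc, emit) {
  for (const tag of doc.tags || []) {
    emit(tag, doc.title);
  }
}

// Reduce step (illustrative): merge all values that share a key into one list.
function groupByKey(rows) {
  const grouped = new Map();
  for (const { key, value } of rows) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  return grouped;
}

const rows = [];
for (const doc of taggedPosts) {
  mapByTag(doc, (key, value) => rows.push({ key, value }));
}
// Order rows by key, as in the slide's listing.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

const postsByTag = groupByKey(rows);
for (const [tag, titles] of postsByTag) {
  console.log(tag, titles);
}
```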
Scalability
• Incremental MapReduce
• Multiversion Concurrency Control (MVCC)
  ■ Achieves serializability through multiversioning instead of locking
  ■ Eliminates waits to access objects
  ■ Updates create new document revisions
  ■ Tradeoff: no waits, increased data storage
• Incremental Distributed Replication
• Eventual Consistency
  ■ Changes eventually propagate through distributed systems
  ■ Tradeoff: increased availability and tolerance, decreased freshness
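The MVCC point can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not CouchDB's implementation: `TinyStore` is a hypothetical class, and revisions are plain counters rather than CouchDB's real revision strings. A write must name the revision it was based on. A stale write is rejected as a conflict instead of waiting on a lock, and every accepted write is kept as a new revision.

```javascript
// Hypothetical in-memory store illustrating revision checks without locks.
class TinyStore {
  constructor() {
    this.current = new Map(); // id -> latest stored revision of the document
    this.history = [];        // every revision ever written (the storage cost)
  }

  // Accepts the write only if doc._rev matches the latest stored revision
  // (or is absent for a brand-new document); otherwise reports a conflict.
  put(doc) {
    const latest = this.current.get(doc._id);
    const latestRev = latest ? latest._rev : undefined;
    if (doc._rev !== latestRev) {
      return { ok: false, error: "conflict" };
    }
    const nextRev = String((latest ? Number(latest._rev) : 0) + 1);
    const stored = { ...doc, _rev: nextRev };
    this.current.set(doc._id, stored);
    this.history.push(stored);
    return { ok: true, id: doc._id, rev: nextRev };
  }

  get(id) {
    return this.current.get(id);
  }
}

const store = new TinyStore();
const created = store.put({ _id: "post1", title: "A Blog Post" });
const updated = store.put({ _id: "post1", _rev: created.rev, title: "A Better Post" });
// A second writer still holding the first revision is rejected, not blocked.
const stale = store.put({ _id: "post1", _rev: created.rev, title: "Too Late" });

console.log(created, updated, stale);
console.log(store.history.length, "revisions stored");
```

Readers are never blocked in this sketch: `get` always returns the latest complete revision, while older revisions remain in `history`, which is the storage side of the tradeoff.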
Demonstrations