drop acid
DESCRIPTION
Session on NoSQL Databases and MongoDB. I stole the title from someone who deserves credit but, unfortunately, I can't remember who. I blame the acid.
TRANSCRIPT
NoSQL - Death to Relational Databases
Mike Feltman
F1 Technologies
Agenda
• The NoSQL Movement
• MongoDB Discussion & Demo
• Discussion
The NoSQL Movement
NoSQL Databases:
• Non-relational
• Less ACID, More BASE
• CAP Trading
• Highly Scalable
• Highly Performant
NoSQL = Not Only SQL
Less ACID
• Atomic
– Basically means it supports transactions: a group of changes is applied all or nothing
• Consistent
– Has hard constraints & rejects non-conforming data
• Isolated
– No peeking at incomplete commits
• Durable
– Once a commit is finished, it lasts forever
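To make the "all or nothing" idea behind atomicity concrete, here is a minimal sketch in plain JavaScript. It is not how any particular database implements transactions; the `runTransaction`, `debit`, and `credit` names and the account data are invented for illustration.

```javascript
// Apply every operation to a working copy; commit only if all of them succeed.
function runTransaction(state, operations) {
  const working = { ...state }; // shallow copy is enough for flat numeric fields
  try {
    for (const op of operations) op(working);
  } catch (err) {
    return state; // roll back: the untouched original is returned
  }
  return working; // commit
}

// Hypothetical operations on a made-up accounts object.
const debit = (name, amount) => (s) => {
  if (s[name] < amount) throw new Error("insufficient funds");
  s[name] -= amount;
};
const credit = (name, amount) => (s) => {
  s[name] += amount;
};

const accounts = { alice: 100, bob: 50 };

// Both steps succeed, so both are applied.
const committed = runTransaction(accounts, [debit("alice", 30), credit("bob", 30)]);

// The second step fails, so the first step's credit is discarded as well.
const rolledBack = runTransaction(accounts, [credit("bob", 500), debit("alice", 500)]);

console.log(committed, rolledBack);
```

A system that is "less ACID" gives up some of these guarantees in exchange for the availability and scale goals described on the following slides.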
More BASE
• Basically Available
• Soft-state
• Eventually consistent
CAP Trading
• Consistency (client perceives a set of operations as completed)
• Availability (operations terminate with an expected result)
• Partition tolerance (operations will complete, even if a required resource is unavailable)
• Only 2 of the 3 are possible at once in distributed systems. (Eric Brewer)
The NoSQL Movement
Why:
• SQL is tedious and difficult
• Strongly typed schemas are inflexible and painful to maintain
• Inadequate performance of RDBMS on huge data stores
• Poor scalability of RDBMS
• Poor replication support
Types of NoSQL Databases
• Document Stores
• Graph
• Key/Value Store
• Object Database
• Tabular
Major Players
• MongoDB (10gen)
• CouchDB (Apache)
• Cassandra (Apache, formerly Facebook)
• BigTable (Google)
• Berkeley DB (Oracle)
• Dynamo (Amazon)
• MObStor (Yahoo)
• Haystack (Facebook)
• Voldemort (LinkedIn)
• HBase/Hadoop (Apache & Microsoft)
MongoDB
Combining the best features of document databases, key-value stores, and RDBMSes.
• Scalable
• High-Performance
• Open Source
• Schema-free
• Document Oriented
MongoDB Features
• Document-oriented storage (BSON)
• Dynamic Queries
• Full index support (including embedded objects & arrays)
• Fast, in-place updates
• Efficient Blob storage
• Replication
• Auto-sharding
• MapReduce
• Driver support for many languages
• Cross-Platform
• Admin Tools
Document Oriented Storage
• Data is stored in BSON
– Binary-encoded serialization of JSON-like documents
– Lightweight, traversable & efficient
– Supports embedded objects & arrays
– Document = Record
{
  firstName: "Nicklas",
  lastName: "Lidstrom",
  team: "Red Wings",
  stanleyCups: [1997, 1998, 2002, 2008],
  norrisTrophies: [2001, 2002, 2003, 2006, 2007, 2008]
}
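As a rough illustration of "Document = Record" with embedded arrays, the document above can be written as a plain JavaScript object and traversed directly. This is ordinary JavaScript rather than a MongoDB API call, and the `player`, `cupCount`, and `wonBothIn` names are made up for the example.

```javascript
// The slide's document as a plain JavaScript object.
const player = {
  firstName: "Nicklas",
  lastName: "Lidstrom",
  team: "Red Wings",
  stanleyCups: [1997, 1998, 2002, 2008],
  norrisTrophies: [2001, 2002, 2003, 2006, 2007, 2008]
};

// Embedded arrays are part of the record itself, so no join is needed to read them.
const cupCount = player.stanleyCups.length;

// Years in which both lists have an entry.
const wonBothIn = player.stanleyCups.filter((year) =>
  player.norrisTrophies.includes(year)
);

console.log(cupCount, wonBothIn);
```

In a relational design the two year lists would typically live in separate child tables; here they travel with the record.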
Dynamic Queries
• No indexes required to find data.
• RDBMSes all support this as well.
Examples
• All records:
db.players.find({})
• All Red Wings:
db.players.find({"team": "Red Wings"})
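What "no indexes required" implies can be sketched without a server: with no index to consult, a query is answered by checking every document against the filter. The code below is a simplified in-memory stand-in, not the MongoDB driver. The `matches` and `find` helpers are hypothetical, only top-level equality is supported, and the extra sample players are illustrative data.

```javascript
// Hypothetical in-memory stand-in for the players collection.
const players = [
  { firstName: "Nicklas", lastName: "Lidstrom", team: "Red Wings" },
  { firstName: "Pavel", lastName: "Datsyuk", team: "Red Wings" },
  { firstName: "Sidney", lastName: "Crosby", team: "Penguins" }
];

// Equality-only matcher: every key in the filter must equal the document's value.
// An empty filter matches every document.
function matches(doc, filter) {
  return Object.keys(filter).every((key) => doc[key] === filter[key]);
}

// Without an index, answering a query means scanning the whole collection.
function find(collection, filter) {
  return collection.filter((doc) => matches(doc, filter));
}

const everyone = find(players, {});
const redWings = find(players, { team: "Red Wings" });

console.log(everyone.length, redWings.map((p) => p.lastName));
```

The scan cost grows with the size of the collection, which is the motivation for the index support covered next.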
Index Support
• B-Tree format
• Default index on PK (the _id field)
• Supports unique, compound, document indexes (indexes on nested documents) and multikey indexes (allows indexing of arrays of values)
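The idea behind a multikey index can be sketched as one index entry per array element, each pointing back to the documents that contain it. The sketch below uses a JavaScript `Map` in place of a B-Tree, so it shows the lookup idea only, not MongoDB's storage format. `buildMultikeyIndex` and the sample data are assumptions made for the example.

```javascript
// Sample documents; the values are illustrative.
const players = [
  { lastName: "Lidstrom", stanleyCups: [1997, 1998, 2002, 2008] },
  { lastName: "Crosby", stanleyCups: [2009] },
  { lastName: "Datsyuk", stanleyCups: [2002, 2008] }
];

// Build an index with one entry per array element.
// Each entry lists the documents whose array contains that value.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc);
    }
  }
  return index;
}

const byCupYear = buildMultikeyIndex(players, "stanleyCups");

// Lookup by a single array value, without scanning every document.
const winners2008 = (byCupYear.get(2008) || []).map((p) => p.lastName);

console.log(winners2008);
```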
Fast in-place updates
• Updates are made to existing documents within a collection.
• Many “NoSQL” databases (such as CouchDB) do not support updates and instead store versions of records.
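The contrast between the two approaches can be sketched in a few lines. This is a conceptual model only: `updateInPlace` and `updateVersioned` are hypothetical helpers, and the `rev` counter is an invented field that loosely models revision-keeping stores rather than reproducing any product's implementation.

```javascript
// In-place update: the existing document is modified where it lives.
function updateInPlace(collection, id, changes) {
  const doc = collection.find((d) => d._id === id);
  if (doc) Object.assign(doc, changes);
  return collection;
}

// Versioned store: each change appends a new revision and keeps the old ones.
function updateVersioned(revisions, id, changes) {
  const latest = [...revisions].reverse().find((d) => d._id === id);
  if (latest) revisions.push({ ...latest, ...changes, rev: latest.rev + 1 });
  return revisions;
}

const inPlace = updateInPlace(
  [{ _id: 1, team: "Red Wings" }],
  1,
  { team: "Retired" }
);

const versioned = updateVersioned(
  [{ _id: 1, rev: 1, team: "Red Wings" }],
  1,
  { team: "Retired" }
);

console.log(inPlace.length, versioned.length);
```

The in-place store still holds one record after the update, while the versioned store holds two and needs some later cleanup step to reclaim space.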
Efficient Blob Storage
• Blob = Binary Large Object
• Up to 4MB within document
• GridFS specification is followed for larger items and external files
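GridFS handles items larger than the per-document limit by splitting them into chunks that are stored as separate documents. The sketch below shows only that splitting and reassembly idea using Node's `Buffer`. The `toChunks` and `fromChunks` helpers are hypothetical, and the tiny chunk size is chosen for demonstration, not taken from MongoDB's defaults.

```javascript
// Split a binary payload into numbered chunk documents.
// chunkSize is an illustrative parameter, not a MongoDB default.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId,
      n: n,
      data: data.subarray(offset, offset + chunkSize)
    });
  }
  return chunks;
}

// Reassemble by sorting on the chunk number and concatenating the pieces.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const original = Buffer.from("a large binary object, pretend it is many megabytes");
const chunks = toChunks("file-1", original, 16);
const restored = fromChunks(chunks);

console.log(chunks.length, restored.equals(original));
```

Because each chunk is an ordinary document, a large file benefits from the same replication and sharding as the rest of the data.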
Replication
• Enhanced master-slave configuration
– One server active for writes at a time
– Provides failover and redundancy
– Implemented with Replica Pairs
  • When master fails, slave takes over
  • When slave fails, control reverts to master
• Limited master-master
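The replica pair behavior described above (one writable node at a time, with the other taking over on failure) can be modeled as a toy state machine. Nothing here is MongoDB code: the node names, the `makePair`, `write`, and `fail` helpers, and the synchronous copy to the second node are all simplifying assumptions.

```javascript
// A hypothetical two-node pair; node names and fields are invented for this sketch.
function makePair() {
  return {
    nodes: { a: { up: true, data: [] }, b: { up: true, data: [] } },
    master: "a"
  };
}

// Writes go to the master only, then are copied to the other node if it is up.
// (The copy is synchronous here purely to keep the sketch short.)
function write(pair, doc) {
  const master = pair.nodes[pair.master];
  if (!master.up) throw new Error("no master available");
  master.data.push(doc);
  for (const [name, node] of Object.entries(pair.nodes)) {
    if (name !== pair.master && node.up) node.data.push(doc);
  }
}

// When a node fails, the surviving node becomes (or remains) the master.
function fail(pair, name) {
  pair.nodes[name].up = false;
  const survivor = Object.keys(pair.nodes).find((n) => pair.nodes[n].up);
  if (survivor) pair.master = survivor;
}

const pair = makePair();
write(pair, { x: 1 });
fail(pair, "a"); // the master fails, so "b" takes over
write(pair, { x: 2 });

console.log(pair.master, pair.nodes.b.data.length);
```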
Auto-Sharding
• Sharding:
– Breaking a database down into “shards” and spreading those across distributed/commodity servers
– Highly scalable approach for increased throughput and performance of high-transaction, large database applications
– MongoDB manages data storage and retrieval behind the scenes
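"Behind the scenes" amounts to routing: something maps each document's key to the shard that owns it, for both writes and reads. A minimal range-based sketch follows. The shard names, the `playerId` key, the range boundaries, and the helper functions are all assumptions for illustration, and keys are assumed to be non-negative numbers.

```javascript
// Hypothetical shard map: each shard owns a half-open range [min, max) of the key.
const shards = [
  { name: "shard0", min: 0, max: 100, docs: [] },
  { name: "shard1", min: 100, max: 200, docs: [] },
  { name: "shard2", min: 200, max: Infinity, docs: [] }
];

// Route a key to the shard whose range contains it.
function shardFor(key) {
  return shards.find((s) => key >= s.min && key < s.max);
}

// Writes and reads both go through the router, so callers never pick a shard.
function insert(doc) {
  shardFor(doc.playerId).docs.push(doc);
}

function findById(playerId) {
  return shardFor(playerId).docs.find((d) => d.playerId === playerId);
}

insert({ playerId: 5, lastName: "Lidstrom" });
insert({ playerId: 140, lastName: "Datsyuk" });
insert({ playerId: 287, lastName: "Crosby" });

console.log(shardFor(140).name, findById(287).lastName);
```

Adding capacity then means splitting a range and moving part of it to a new server, without changing how the application issues its queries.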
MapReduce
• Term comes from Google.
– Patented framework for processing huge datasets on certain kinds of distributable problems using a large number of servers.
– MongoDB applies it to single server instances as well.
• Useful for batch operations
• Aggregation: NoSQL answer to GROUP BY
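The GROUP BY comparison can be sketched as a map step that emits a key and value per document, followed by a reduce step that combines the values for each key. The driver below is a tiny single-process stand-in written for this example, not MongoDB's implementation. In MongoDB the map function calls `emit(key, value)`, whereas this sketch returns the pair to stay self-contained.

```javascript
// Illustrative sample data.
const players = [
  { lastName: "Lidstrom", team: "Red Wings" },
  { lastName: "Datsyuk", team: "Red Wings" },
  { lastName: "Crosby", team: "Penguins" }
];

// Map step: produce a [key, value] pair for each document.
function map(doc) {
  return [doc.team, 1];
}

// Reduce step: combine all values produced for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical driver: group mapped values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    const [key, value] = mapFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

// Roughly: SELECT team, COUNT(*) FROM players GROUP BY team
const playersPerTeam = mapReduce(players, map, reduce);

console.log(playersPerTeam);
```

Because the map step looks at one document at a time and the reduce step only needs the values for one key, the work can be spread across many servers.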
Drivers
• .NET (C#)
• JavaScript
• Python
• PHP
• Ruby
• Java
• C++
• Perl
• JVM
– Clojure
– Groovy
– Scala
Cross-Platform
• 32 bit & 64 bit versions available for:
– Windows
– OS X
– Linux
– Solaris
Admin Tools
• Command Shell
• Simple, limited REST (HTTP) interface
• mongostat
• mongosniff (Unix only; use tcpdump on Windows)
• Backup & Restore
MongoDB Terminology (Traditional RDBMS → MongoDB)
• Database → Database
• Table → Collection
• Record → Document
• Field → Key
Demo!
• Start the server (if it’s not running).
C:\mongodb\bin\mongod
• Start the shell
C:\mongodb\bin\mongo
The MongoDB Shell
Database Commands
• Open Database
– use (database name)
• Create Database
– use (database name)
– Same command: the database is created when data is first saved to it
How it works
• Focused on documents
– Document = sequence of key-value pairs in BSON
  • Value can be another document
  • Additional types vs. JSON, e.g. dates, regexp
• Messages (passed over TCP/IP) are in BSON; drivers convert code to BSON
• Memory mapped storage engine (MMSE): all disk access takes place through MMSE
• Query Optimizer:
– find({x: 10, y: "foo"})
– Launches multiple simultaneous queries based on indexes & table scan. Stops when one finishes and remembers which one was the fastest for future similar queries. Can use the hint option to specify which index to use.
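The optimizer behavior described above (try the candidate plans, keep the winner, reuse it for similar queries, and let a hint override it) can be modeled in a few lines. This is a toy model and not MongoDB's optimizer. Plans are run one after another instead of simultaneously, the `examined` counts are invented stand-ins for elapsed time, and every name below is hypothetical.

```javascript
// Run every candidate plan and return the cheapest one.
// (A real race would run them concurrently and stop at the first to finish.)
function runPlans(plans) {
  return plans
    .map((plan) => ({ name: plan.name, ...plan.run() }))
    .sort((a, b) => a.examined - b.examined)[0];
}

// Remembered winners, keyed by query shape.
const planCache = new Map();

// Query "shape" = the sorted list of field names, ignoring the values.
function shapeOf(query) {
  return Object.keys(query).sort().join(",");
}

function chooseAndRun(query, plans, hint) {
  const shape = shapeOf(query);
  const preferred = hint || planCache.get(shape);
  const known = plans.find((p) => p.name === preferred);
  if (known) return { name: known.name, ...known.run() };

  // No hint and no remembered winner: try all plans and remember the fastest.
  const winner = runPlans(plans);
  planCache.set(shape, winner.name);
  return winner;
}

// Invented candidate plans for a query on x and y.
const plans = [
  { name: "index_x", run: () => ({ examined: 40 }) },
  { name: "index_y", run: () => ({ examined: 12 }) },
  { name: "table_scan", run: () => ({ examined: 1000 }) }
];

const first = chooseAndRun({ x: 10, y: "foo" }, plans); // races the plans
const second = chooseAndRun({ y: "bar", x: 99 }, plans); // same shape, reuses winner
const hinted = chooseAndRun({ x: 10, y: "foo" }, plans, "index_x"); // hint overrides

console.log(first.name, second.name, hinted.name);
```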
Why?
• Applications where schema gets in the way
• Performance
• Scalability
• RAD
• More natural fit with OO Languages
Resources
• www.mongodb.org