Rapid and Scalable Development with MongoDB, PyMongo, and Ming
DESCRIPTION
This intermediate-level talk teaches techniques for using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer, from basic PyMongo queries to high-level object-document mapping setups in Ming.
TRANSCRIPT
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rick Copeland @rick446
Arborian Consulting, LLC
Getting Acquainted
http://www.flickr.com/photos/fazen/9079179/
Roadmap
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ODM: When a dict just won’t do
PyMongo: Getting Started
>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
PyMongo: Insert / Update / Delete
>>> db = conn.test
>>> id = db.foo.insert({'bar':1, 'baz':[ 1, 2, {'k':5} ] })
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}]
>>> db.foo.update({'_id':id}, {'$set': { 'bar':2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}
>>> db.foo.remove({'_id':id})
>>> list(db.foo.find())
[]
Auto-Generated _id
Cursors are Python generators
Remove uses same query language as find()
PyMongo: Queries, Indexes
>>> db.foo.insert([ dict(x=x) for x in range(10) ])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({ 'x': {'$gt': 3} }))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } ))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } )
...     .skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([
...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
Range Query
Partial Retrieval
Compound Indexes
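To make the range query, partial retrieval, and skip/limit steps above concrete without a running server, here is a toy in-memory sketch. The `matches` and `find` helpers and the `exclude` parameter are hypothetical names made up for illustration; this is not how PyMongo or MongoDB implement queries, only a model of the results shown on the slide.

```python
import operator

# Toy stand-ins for a few MongoDB comparison operators (illustration only).
OPS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
}


def matches(doc, spec):
    """Return True if doc satisfies every clause in spec."""
    for field, cond in spec.items():
        if isinstance(cond, dict):
            # Operator form, e.g. {'x': {'$gt': 3}}
            if field not in doc:
                return False
            if not all(OPS[op](doc[field], arg) for op, arg in cond.items()):
                return False
        elif doc.get(field) != cond:
            # Plain equality form, e.g. {'x': 3}
            return False
    return True


def find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter, drop excluded fields, then apply skip and limit, in that order."""
    hits = [d for d in docs if matches(d, spec)]
    hits = [{k: v for k, v in d.items() if k not in exclude} for d in hits]
    end = None if limit is None else skip + limit
    return hits[skip:end]


docs = [{'_id': i, 'x': i} for i in range(10)]
result = find(docs, {'x': {'$gt': 3}}, exclude=('_id',), skip=1, limit=2)
print(result)  # [{'x': 5}, {'x': 6}]
```

The sketch scans every document; on a real collection the compound index is what lets the server avoid that scan.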
PyMongo and Locking
One Rule (for now): Avoid JavaScript
http://www.flickr.com/photos/lizjones/295567490/
PyMongo: Aggregation et al.
- You gotta write JavaScript (for now)
- It’s pretty slow (single-threaded JS engine)
- JavaScript is used by
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, …)
- Sharding gives some parallelism with .map_reduce() (and possibly ‘$where’). Otherwise you’re single-threaded.
- MongoDB 2.2 with New Aggregation Framework: Coming Real Soon Now™
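For readers unfamiliar with what `.group(key, condition, initial, reduce)` computes, the following pure-Python sketch models the idea: one accumulator per distinct key, updated in place by `reduce`. It is an assumption-laden toy: `condition` is a Python predicate here rather than a query document, `reduce` is a Python function rather than JavaScript, and `finalize` is omitted.

```python
import copy


def group(docs, key, condition, initial, reduce):
    """Toy model of group: one accumulator per distinct key value."""
    groups = {}
    for doc in docs:
        if not condition(doc):
            continue
        k = tuple(doc.get(field) for field in key)
        if k not in groups:
            # Each group starts from its own copy of `initial`,
            # plus the key fields that identify the group.
            acc = copy.deepcopy(initial)
            acc.update(zip(key, k))
            groups[k] = acc
        # reduce mutates the accumulator in place.
        reduce(doc, groups[k])
    return list(groups.values())


def add_n(doc, acc):
    acc['total'] += doc['n']


docs = [
    {'kind': 'a', 'n': 1},
    {'kind': 'b', 'n': 2},
    {'kind': 'a', 'n': 3},
    {'kind': 'a', 'n': -1},
]
totals = group(docs, key=['kind'], condition=lambda d: d['n'] > 0,
               initial={'total': 0}, reduce=add_n)
print(totals)
```

The single loop also mirrors the slide's point: without sharding, every document passes through one thread.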
PyMongo: GridFS
>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x2cae910>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'
Arbitrary data can be stored in the ‘fp’ object; it’s just a Document (but please put it in ‘fp.metadata’)
- MIME type, links to other docs, etc.
Python context manager
Retrieve file by _id
PyMongo: GridFS Versioning
>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
Create file by filename
“2nd from the last”
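The versioning behavior in the session above (several files sharing one filename, negative version numbers counting back from the newest, the name disappearing from the listing only when its last version is deleted) can be modeled with a small in-memory class. `ToyVersionedStore` is a hypothetical stand-in written for this note, not GridFS itself; in particular it orders versions by an incrementing integer id, which is an assumption of the sketch.

```python
class ToyVersionedStore:
    """In-memory model of filename-based versioning (illustration only)."""

    def __init__(self):
        self._files = {}   # file_id -> (filename, data)
        self._next_id = 0

    def put(self, data, filename):
        # Every put creates a new file, even if the filename already exists.
        file_id = self._next_id
        self._next_id += 1
        self._files[file_id] = (filename, data)
        return file_id

    def _ids(self, filename):
        # Oldest first, so index -1 is the newest and -2 the one before it.
        return sorted(i for i, (name, _) in self._files.items()
                      if name == filename)

    def get_version(self, filename, version=-1):
        return self._files[self._ids(filename)[version]][1]

    def last_id(self, filename):
        return self._ids(filename)[-1]

    def delete(self, file_id):
        del self._files[file_id]

    def list(self):
        # A filename is listed while at least one version remains.
        return sorted({name for name, _ in self._files.values()})


fs = ToyVersionedStore()
fs.put('Moar data!', filename='foo.txt')
fs.put('Even moar data!', filename='foo.txt')
newest = fs.get_version('foo.txt', -1)
previous = fs.get_version('foo.txt', -2)
fs.delete(fs.last_id('foo.txt'))
after_one_delete = fs.list()
fs.delete(fs.last_id('foo.txt'))
after_two_deletes = fs.list()
```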
Roadmap
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ODM: When a dict just won’t do
Why Ming?
- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- “Unit of work”: Queuing up all your updates can be handy
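As an illustration of the lazy versus eager migration distinction, the sketch below upgrades a document to the newest shape whenever it is read. The function name, the `schema_version` field, and the old `body` field are all hypothetical, invented for the example; this is not Ming's migration API.

```python
def migrate_wiki_doc(doc):
    """Return a copy of doc upgraded to the newest shape (lazy migration)."""
    doc = dict(doc)
    version = doc.get('schema_version', 1)
    if version < 2:
        # Hypothetical change: version 1 stored the page text under 'body'.
        doc['text'] = doc.pop('body', '')
        doc['schema_version'] = 2
    return doc


old = {'title': 'Cats', 'body': 'I can haz cheezburger?'}
new = migrate_wiki_doc(old)

# An eager migration would apply the same function to every stored
# document up front instead of waiting for each one to be read.
stored = [old, {'title': 'Dogs', 'text': 'woof', 'schema_version': 2}]
migrated = [migrate_wiki_doc(d) for d in stored]
```

Keeping the function idempotent means it is safe to run on documents that were already upgraded.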
Ming: Models, DataStores, & Sessions
Model (schema)
Datastore (database)
Session
Ming: DataStores & Sessions
>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{
... 'ming.main.master':'mongodb://localhost:27017',
... 'ming.main.database':'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')
Connection + Database
Optimized for config files
Surprising Data
http://www.flickr.com/photos/pictureclara/5333266789/
Ming: Define Your Schema
from ming import collection, Field, schema

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))
Index created on import
Shorthand for schema.String
Ming: Define Your Schema… Once more, with feeling
from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = [ ('title') ]
    title = Field(str)
    text = Field(str)
Old declarative syntax continues to exist and be supported, but it’s not being actively improved
Sometimes nice when you want additional methods/attrs on your document class
Ming: Use Your Schema
>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])
Documents are dict subclasses
Exception pinpoints problem
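The "Extra keys" failure above is the kind of check a schema layer adds on top of plain dicts. The sketch below is a minimal stand-in written for this note, assuming a schema is just a mapping of field name to Python type; `Invalid`, `validate`, and `WIKI_FIELDS` are hypothetical names here and this is not Ming's validation code.

```python
class Invalid(Exception):
    """Raised when a document does not fit the declared fields."""


def validate(doc, fields):
    """Check doc against a {name: type} mapping and return it if valid."""
    extra = set(doc) - set(fields)
    if extra:
        # Misspelled or unknown keys are reported by name.
        raise Invalid('Extra keys: %r' % sorted(extra))
    for name, expected in fields.items():
        value = doc.get(name)
        if value is not None and not isinstance(value, expected):
            raise Invalid('%s: expected %s' % (name, expected.__name__))
    return doc


WIKI_FIELDS = {'title': str, 'text': str}

good = validate({'title': 'Cats', 'text': 'I can haz cheezburger?'}, WIKI_FIELDS)

try:
    validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, WIKI_FIELDS)
    message = None
except Invalid as exc:
    message = str(exc)
```

Without a check like this, the misspelled key would be stored silently and the document would simply lack a title.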
Ming Bonus: Mongo-in-Memory
>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)
- MongoDB is (generally) fast
  - … except when creating databases
  - … particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling
Roadmap
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ODM: When a dict just won’t do
Ming ODM: Classes and Collections
from ming import schema, Field
from ming.odm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection('wiki_page', session, … )
CommentDoc = collection('comment', session, … )

class WikiPage(object): pass
class Comment(object): pass

odmsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
odmsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))
Plain Old Python Classes
Map classes to collection + session
“Relations”
Ming ODM: Classes and Collections (declarative)
class WikiPage(MappedClass):
    class __mongometa__:
        session = main_odm_session
        name = 'wiki_page'
        indexes = [ 'title' ]

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')
Ming ODM: Sessions and Queries
- Session → ODMSession
- My_collection.m… → My_mapped_class.query…
- ODMSession actually does stuff
  - Track object identity
  - Track object modifications
  - Unit of work flushing all changes at once
>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_odm_session.flush()
>>> session.db.wiki_page.count()
1
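The count going from 0 to 1 only after `flush()` is the unit-of-work pattern: new objects are queued in the session and written together. A minimal sketch of that idea, using a hypothetical `ToySession` whose `stored` list stands in for the collection (this is not Ming's ODMSession):

```python
class ToySession:
    """Queues new objects and writes them all at once on flush()."""

    def __init__(self):
        self.stored = []   # stands in for the database collection
        self._new = []     # created but not yet written

    def add(self, obj):
        self._new.append(obj)

    def flush(self):
        # Write everything queued so far in one step, then clear the queue.
        written = len(self._new)
        self.stored.extend(self._new)
        self._new = []
        return written


session = ToySession()
session.add({'title': 'MyPage', 'text': 'is here'})
count_before = len(session.stored)
written = session.flush()
count_after = len(session.stored)
```

In the slide's example the mapped class registers itself with the session on construction, so no explicit `add` call appears there; the toy makes that step visible.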
Ming Plugins
http://www.flickr.com/photos/39747297@N05/5229733647/
Ming ODM: Extending the Session
Various plug points in the session
- before_flush
- after_flush
Some uses
- Logging changes to sensitive data or for analytics
- Full-text search indexing
- “last modified” fields
- Performance instrumentation
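The plug points listed above can be pictured as hooks that the session calls around its flush. The sketch below shows a “last modified” extension and a change log under that assumption. All class names are hypothetical, and the hook signatures (each receives the list of pending documents) are a simplification chosen for this toy, not the signatures of Ming's session extension interface.

```python
import datetime


class ToyExtension:
    """Base class: hooks do nothing unless overridden."""

    def before_flush(self, pending):
        pass

    def after_flush(self, pending):
        pass


class LastModified(ToyExtension):
    def before_flush(self, pending):
        # Stamp every document that is about to be written.
        now = datetime.datetime.now(datetime.timezone.utc)
        for doc in pending:
            doc['last_modified'] = now


class ChangeLog(ToyExtension):
    def __init__(self):
        self.entries = []

    def after_flush(self, pending):
        # Record what was written, e.g. for auditing or analytics.
        self.entries.extend(doc['title'] for doc in pending)


class ToyExtensibleSession:
    def __init__(self, extensions=()):
        self.extensions = list(extensions)
        self.stored = []
        self._pending = []

    def add(self, doc):
        self._pending.append(doc)

    def flush(self):
        pending, self._pending = self._pending, []
        for ext in self.extensions:
            ext.before_flush(pending)
        self.stored.extend(pending)
        for ext in self.extensions:
            ext.after_flush(pending)


log = ChangeLog()
session = ToyExtensibleSession([LastModified(), log])
session.add({'title': 'Cats'})
session.flush()
```

Because the hooks live on the session, they apply to every model flushed through it.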
Ming ODM: Extending the Mapper
Various plug points in the mapper
- before_/after_: Insert, Update, Delete, Remove
Some uses
- Collection/model-specific logging (user creation, etc.)
- Anything you might want a SessionExtension for but would rather do per-model
Related Projects
Ming
http://sf.net/projects/merciless/
MIT License
PyMongo
http://api.mongodb.org/python
Apache License
Questions?
Rick Copeland @rick446
Arborian Consulting, LLC
http://www.flickr.com/photos/f-oxymoron/5005673112/
Feedback? http://www.surveymonkey.com/s/5DLCYKN