Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Rick Copeland, @rick446, Arborian Consulting, LLC

Getting Acquainted

http://www.flickr.com/photos/fazen/9079179/

Roadmap

- Get started with PyMongo

- Sprinkle in some Ming schemas

- ODM: When a dict just won’t do

PyMongo: Getting Started

>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
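The session above reaches databases and collections through either attribute access (`conn.test.foo`) or item access (`conn['test-db']`); item access is what you need for names like `test-db` that are not valid Python identifiers, and a dotted name such as `foo.bar.baz` is simply a collection whose name contains dots. (Later PyMongo releases replaced `Connection` with `MongoClient`.) A minimal sketch of that navigation pattern follows; `Client`, `Database`, `Collection` and `full_name` here are hypothetical stand-ins, not PyMongo's classes.

```python
# Hypothetical stand-ins, not PyMongo's classes: they only model how
# attribute access and item access both resolve to child objects.

class Collection(object):
    def __init__(self, database, name):
        self.database = database
        self.name = name

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Private names are refused so
        # protocols such as copy/pickle never turn into "collections".
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]

    def __getitem__(self, name):
        # A sub-collection is just a collection whose name contains a dot.
        return Collection(self.database, self.name + '.' + name)

    @property
    def full_name(self):
        return self.database.name + '.' + self.name


class Database(object):
    def __init__(self, client, name):
        self.client = client
        self.name = name

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]

    def __getitem__(self, name):
        return Collection(self, name)


class Client(object):
    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]

    def __getitem__(self, name):
        return Database(self, name)


conn = Client()
print(conn.test.foo.bar.baz.full_name)              # test.foo.bar.baz
print(conn['test-db']['foo-collection'].full_name)  # test-db.foo-collection
```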

PyMongo: Insert / Update / Delete

>>> db = conn.test
>>> id = db.foo.insert({'bar':1, 'baz':[ 1, 2, {'k':5} ] })
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'),
  u'baz': [1, 2, {'k': 5}]}]
>>> db.foo.update({'_id':id}, {'$set': { 'bar':2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'),
 u'baz': [1, 2, {'k': 5}]}
>>> db.foo.remove({'_id':id})
>>> list(db.foo.find())
[]

Auto-Generated _id

Cursors are Python iterators

Remove uses the same query language as find()
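The insert/update/remove cycle above relies on three behaviors: a missing `_id` is generated for you, `$set` overwrites only the named fields, and `remove()` selects documents with the same spec format as `find()`. The in-memory sketch below models just those behaviors; `MemCollection` is hypothetical, uses an integer counter where MongoDB would create an `ObjectId`, and supports only equality specs and `$set`.

```python
import itertools

class MemCollection(object):
    """Hypothetical in-memory stand-in for a collection (not PyMongo)."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)  # plays the role of ObjectId generation

    def insert(self, doc):
        doc = dict(doc)
        doc.setdefault('_id', next(self._ids))  # auto-generate _id when absent
        self._docs.append(doc)
        return doc['_id']

    @staticmethod
    def _matches(doc, spec):
        # Equality-only matching; the real query language is far richer.
        return all(k in doc and doc[k] == v for k, v in spec.items())

    def find(self, spec=None):
        spec = spec or {}
        # A generator expression: documents are produced lazily on iteration.
        return (dict(d) for d in self._docs if self._matches(d, spec))

    def update(self, spec, changes):
        # Only {'$set': {...}} is handled: overwrite the named fields of the
        # first match and leave every other field untouched.
        for d in self._docs:
            if self._matches(d, spec):
                d.update(changes['$set'])
                return

    def remove(self, spec):
        # Same spec format as find(): "what to delete" is written as a query.
        self._docs = [d for d in self._docs if not self._matches(d, spec)]


foo = MemCollection()
doc_id = foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
foo.update({'_id': doc_id}, {'$set': {'bar': 2}})
print(next(foo.find()))  # {'bar': 2, 'baz': [1, 2, {'k': 5}], '_id': 1}
```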

PyMongo: Queries, Indexes

>>> db.foo.insert([ dict(x=x) for x in range(10) ])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({ 'x': {'$gt': 3} }))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } ))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } )
...          .skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([
...     ('x',pymongo.ASCENDING),('y',pymongo.DESCENDING)])
u'x_1_y_-1'

Range Query

Partial Retrieval

Compound Indexes
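The queries above chain a range filter, a projection, and skip/limit, and `ensure_index` reports a generated name built from each field and its direction. A minimal pure-Python sketch of those semantics over a list of dicts follows; `find` and `index_name` are hypothetical helpers, the filter understands only `$gt`, and the projection handles only the exclusion form.

```python
ASCENDING, DESCENDING = 1, -1  # the values pymongo.ASCENDING / DESCENDING hold

def find(docs, spec, projection=None, skip=0, limit=0):
    """Hypothetical in-memory find() that understands only {'field': {'$gt': v}}."""
    def matches(doc):
        # Range query: every field named in spec must exist and exceed its bound.
        return all(f in doc and doc[f] > cond['$gt'] for f, cond in spec.items())

    hits = [d for d in docs if matches(d)]
    # skip/limit run after filtering; limit=0 means "no limit", as in PyMongo.
    hits = hits[skip:skip + limit] if limit else hits[skip:]
    if projection:
        # Partial retrieval, exclusion form only: {'field': 0} drops 'field'.
        dropped = {f for f, keep in projection.items() if not keep}
        hits = [{k: v for k, v in d.items() if k not in dropped} for d in hits]
    return hits

def index_name(keys):
    """Default-style index name, e.g. [('x', 1), ('y', -1)] -> 'x_1_y_-1'."""
    return '_'.join('%s_%s' % (field, direction) for field, direction in keys)

docs = [{'_id': n, 'x': n} for n in range(10)]
print(find(docs, {'x': {'$gt': 3}}, {'_id': 0}, skip=1, limit=2))  # [{'x': 5}, {'x': 6}]
print(index_name([('x', ASCENDING), ('y', DESCENDING)]))           # x_1_y_-1
```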

PyMongo and Locking

One Rule (for now): Avoid Javascript

http://www.flickr.com/photos/lizjones/295567490/

PyMongo: Aggregation et al.

You gotta write Javascript (for now)

It’s pretty slow (single-threaded JS engine)

Javascript is used by:

- $where in a query

- .group(key, condition, initial, reduce, finalize=None)

- .map_reduce(map, reduce, out, finalize=None, …)

Sharding gives some parallelism with .map_reduce() (and possibly ‘$where’). Otherwise you’re single threaded.

MongoDB 2.2 with New Aggregation Framework Coming Real Soon Now™
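For reference, here is what `.map_reduce(map, reduce, out, finalize)` computes, shown as a single-process sketch. It is hypothetical: it takes Python callables where MongoDB expects JavaScript functions, ignores `out`, and runs in one process, so it says nothing about the performance points above, only the semantics.

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn, finalize=None):
    """Hypothetical single-process map/reduce using Python callables in
    place of the JavaScript functions MongoDB expects; no 'out' handling."""
    groups = defaultdict(list)
    for doc in docs:
        # map_fn yields (key, value) pairs, playing the role of emit().
        for key, value in map_fn(doc):
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        # Like MongoDB, reduce is skipped for keys with a single emitted value,
        # one reason reduce_fn must return values shaped like the emitted ones.
        value = reduce_fn(key, values) if len(values) > 1 else values[0]
        results[key] = finalize(key, value) if finalize else value
    return results

pages = [{'tags': ['cats', 'lol']}, {'tags': ['cats']}, {'tags': ['dogs']}]
counts = map_reduce(pages,
                    map_fn=lambda d: ((t, 1) for t in d['tags']),
                    reduce_fn=lambda key, values: sum(values))
print(counts)  # {'cats': 2, 'lol': 1, 'dogs': 1}
```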

PyMongo: GridFS

>>> import gridfs

>>> fs = gridfs.GridFS(db)

>>> with fs.new_file() as fp:

... fp.write('The file')

...

>>> fp

<gridfs.grid_file.GridIn object at 0x2cae910>

>>> fp._id

ObjectId('4e727f64eb03300c0b000003')

>>> fs.get(fp._id).read()

'The file'

Arbitrary data can be stored in the ‘fp’ object, since it’s just a document (but please put it in ‘fp.metadata’): MIME type, links to other docs, etc.

Python context manager

Retrieve file by _id
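Under the hood, GridFS stores each file as one document in a files collection plus numbered documents in a chunks collection (`fs.files` and `fs.chunks` by default), which is why a file is "just a document" you can attach `metadata` to. The sketch below models that layout with plain dicts; `split_file`, `read_file` and the `mime_type` key are hypothetical, and the tiny chunk size is only there to make the split visible.

```python
def split_file(file_id, data, chunk_size, filename=None, metadata=None):
    """Hypothetical sketch of GridFS's layout: one 'files' document plus
    numbered 'chunks' documents. Not the gridfs module's API."""
    files_doc = {
        '_id': file_id,
        'length': len(data),
        'chunkSize': chunk_size,
        'filename': filename,
        # Application data (MIME type, links to other docs, ...) goes here.
        'metadata': metadata or {},
    }
    chunks = [
        {'files_id': file_id, 'n': n, 'data': data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks

def read_file(files_doc, chunks):
    # Reassemble in chunk order; 'n' is each chunk's sequence number.
    ordered = sorted((c for c in chunks if c['files_id'] == files_doc['_id']),
                     key=lambda c: c['n'])
    return b''.join(c['data'] for c in ordered)

files_doc, chunks = split_file('f1', b'The file', chunk_size=3,
                               filename='demo.txt',
                               metadata={'mime_type': 'text/plain'})
print([c['data'] for c in chunks])          # [b'The', b' fi', b'le']
print(read_file(files_doc, chunks[::-1]))   # b'The file'
```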

PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')

>>> fs.get_last_version('foo.txt').read()

'Moar data!'

>>> file_id = fs.put('Even moar data!', filename='foo.txt')

>>> fs.get_last_version('foo.txt').read()

'Even moar data!'

>>> fs.get_version('foo.txt', -2).read()

'Moar data!'

>>> fs.list()

[u'foo.txt']

>>> fs.delete(fs.get_last_version('foo.txt')._id)

>>> fs.list()

[u'foo.txt']

>>> fs.delete(fs.get_last_version('foo.txt')._id)

>>> fs.list()

[]

Create file by filename

“2nd from the last”
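The versioning session above treats every `put()` with the same filename as a new version: `-1` is the newest, `-2` the one before it, and the name stays in `fs.list()` until its last version is deleted. A minimal in-memory model of those rules follows; `VersionedStore` is hypothetical and returns `(file_id, filename, data)` tuples rather than file objects.

```python
class VersionedStore(object):
    """Hypothetical in-memory model of GridFS-style versioning by filename."""

    def __init__(self):
        self._files = []   # (file_id, filename, data) tuples in upload order
        self._next_id = 0

    def put(self, data, filename):
        # Re-using a filename adds a new version; nothing is overwritten.
        self._next_id += 1
        self._files.append((self._next_id, filename, data))
        return self._next_id

    def get_version(self, filename, version=-1):
        # 0 is the oldest upload, -1 the newest, -2 "2nd from the last", etc.
        # An out-of-range version raises IndexError here (gridfs raises NoFile).
        return [f for f in self._files if f[1] == filename][version]

    def get_last_version(self, filename):
        return self.get_version(filename, -1)

    def delete(self, file_id):
        self._files = [f for f in self._files if f[0] != file_id]

    def list(self):
        # One entry per filename, however many versions remain.
        return sorted({f[1] for f in self._files})


fs = VersionedStore()
fs.put(b'Moar data!', 'foo.txt')
fs.put(b'Even moar data!', 'foo.txt')
print(fs.get_version('foo.txt', -2)[2])   # b'Moar data!'
fs.delete(fs.get_last_version('foo.txt')[0])
print(fs.list())                          # ['foo.txt']
```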




Roadmap

Why Ming?

- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code

- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager

- “Unit of work:” Queuing up all your updates can be handy
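To make "sometimes lazy, sometimes eager" concrete: the same upgrade function can run either as each document is read or over the whole collection at once. The sketch below is hypothetical throughout (the `version` field, the `name` to `title` rename, and the helper names) and does not use Ming's own migration hooks.

```python
SCHEMA_VERSION = 2

def upgrade(doc):
    """Hypothetical v1 -> v2 migration: the 'name' field was renamed 'title'."""
    if doc.get('version', 1) < SCHEMA_VERSION:
        doc = dict(doc)                  # don't mutate the caller's copy
        doc['title'] = doc.pop('name')
        doc['version'] = SCHEMA_VERSION
    return doc

def load(raw_doc):
    # Lazy: fix each document as it is read; stored data catches up over time.
    return upgrade(raw_doc)

def migrate_all(stored_docs):
    # Eager: rewrite everything now, e.g. before building a unique index
    # on the new field.
    return [upgrade(d) for d in stored_docs]

stored = [{'name': 'Cats'}, {'title': 'Dogs', 'version': 2}]
print(load(stored[0]))  # {'title': 'Cats', 'version': 2}
```

The lazy path keeps deploys cheap but leaves old documents on disk; the eager path costs a full pass but makes the stored data uniform, which matters for things like unique indexes.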

Ming: Models, DataStores, & Sessions

Diagram: Model (schema), Datastore (database), Session

Ming: DataStores & Sessions

>>> import ming.datastore

>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')

>>> ds.db

Database(Connection('localhost', 27017), u'test')

>>> session = ming.Session(ds)

>>> session.db

Database(Connection('localhost', 27017), u'test')

>>> ming.configure(**{

... 'ming.main.master':'mongodb://localhost:27017',

... 'ming.main.database':'test'})

>>> ming.Session.by_name('main').db

Database(Connection(u'localhost', 27017), u'test')

Connection + Database

Optimized for config files
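The `ming.configure` call above takes flat `ming.<name>.<option>` keys, the shape an INI-style config file naturally produces, and the session is then looked up by name. The sketch below shows only that grouping-and-registry idea; `NamedSession` and this `configure` are hypothetical and open no connection.

```python
class NamedSession(object):
    """Hypothetical registry illustrating lookup by name (not Ming's Session)."""
    _registry = {}

    def __init__(self, name, uri, database):
        self.name, self.uri, self.database = name, uri, database
        NamedSession._registry[name] = self

    @classmethod
    def by_name(cls, name):
        return cls._registry[name]

def configure(**settings):
    # Flat 'ming.<name>.<option>' keys, as they might appear in an INI-style
    # config file, are grouped into one session per <name>.
    grouped = {}
    for key, value in settings.items():
        parts = key.split('.', 2)
        if len(parts) == 3 and parts[0] == 'ming':
            _, name, option = parts
            grouped.setdefault(name, {})[option] = value
    for name, opts in grouped.items():
        NamedSession(name, opts['master'], opts['database'])

configure(**{'ming.main.master': 'mongodb://localhost:27017',
             'ming.main.database': 'test'})
print(NamedSession.by_name('main').database)  # test
```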

Surprising Data

http://www.flickr.com/photos/pictureclara/5333266789/

Ming: Define Your Schema

from ming import collection, schema, Field

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

Index created on import

Shorthand for schema.String
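Two things happen in those field definitions: a bare Python type like `str` stands in for a full validator, and `index=True` marks a field to be indexed. The sketch below models both ideas; `String`, `SHORTHANDS`, `FieldSpec` and `indexes_for` are hypothetical names, not Ming's `Field` or `schema` module.

```python
class String(object):
    """Hypothetical string validator standing in for a schema type."""
    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError('expected a string, got %r' % (value,))
        return value

# Bare Python types used in a field definition expand to validator instances.
SHORTHANDS = {str: String}

class FieldSpec(object):
    """Hypothetical field declaration: a name, a type, and an index flag."""
    def __init__(self, name, type_, index=False):
        self.name = name
        # Accept either a validator instance or a shorthand Python type.
        self.validator = SHORTHANDS[type_]() if type_ in SHORTHANDS else type_
        self.index = index

def indexes_for(fields):
    # The fields flagged index=True: what a collection definition would
    # ask the database to index when it is set up.
    return [f.name for f in fields if f.index]

wiki_fields = [FieldSpec('title', str, index=True), FieldSpec('text', str)]
print(indexes_for(wiki_fields))  # ['title']
```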

Ming: Define Your Schema…Once more, with feeling

from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = ['title']
    title = Field(str)
    text = Field(str)

Old declarative syntax continues to exist and be supported, but it’s not being actively improved

Sometimes nice when you want additional methods/attrs on your document class
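The declarative form reads the schema and the inner `__mongometa__` options off the class body, which is also what lets you hang ordinary methods on the document class. A minimal sketch of that collection step using Python 3's `__init_subclass__` follows; `FieldDecl`, `DeclarativeDoc` and `summary` are hypothetical, not Ming's `Document` machinery.

```python
class FieldDecl(object):
    """Hypothetical field marker used in a class body."""
    def __init__(self, type_):
        self.type_ = type_

class DeclarativeDoc(dict):
    """Hypothetical base (not Ming's Document): when a subclass is defined,
    collect its FieldDecl attributes and its inner __mongometa__ options."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        meta = cls.__dict__.get('__mongometa__')
        cls.collection_name = getattr(meta, 'name', cls.__name__.lower())
        cls.indexes = list(getattr(meta, 'indexes', []))
        cls.fields = {k: v for k, v in vars(cls).items() if isinstance(v, FieldDecl)}

class WikiDoc(DeclarativeDoc):
    class __mongometa__:
        name = 'wiki_page'
        indexes = ['title']

    title = FieldDecl(str)
    text = FieldDecl(str)

    # The payoff of a class: ordinary methods live next to the schema.
    def summary(self):
        return '%s: %s' % (self['title'], self['text'][:10])

print(WikiDoc.collection_name, WikiDoc.indexes)            # wiki_page ['title']
print(WikiDoc(title='Cats', text='I can haz').summary())   # Cats: I can haz
```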

Ming: Use Your Schema

>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])

Documents are dict subclasses

Exception pinpoints problem
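The failing `save()` above is schema validation at work: a key the schema does not declare is rejected, and the error names it. A minimal sketch of that check follows; `Invalid` and `validate` are hypothetical stand-ins, and the schema is a plain `{field: type}` dict rather than a Ming schema object.

```python
class Invalid(Exception):
    """Hypothetical stand-in for a schema validation error."""

def validate(doc, schema):
    """Check a dict against {field_name: python_type}, rejecting unknown keys."""
    extra = set(doc) - set(schema) - {'_id'}
    if extra:
        # Naming the offending keys is what makes a typo like 'tietul' obvious.
        raise Invalid('Extra keys: %s' % sorted(extra))
    for name, type_ in schema.items():
        if name in doc and not isinstance(doc[name], type_):
            raise Invalid('%s: expected %s' % (name, type_.__name__))
    return doc

wiki_schema = {'title': str, 'text': str}
try:
    validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, wiki_schema)
except Invalid as exc:
    print(exc)  # Extra keys: ['tietul']
```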

Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db

mim.Database(test)

MongoDB is (generally) fast … except when creating databases … particularly when you preallocate

Unit tests like things to be isolated

MIM gives you isolation at the expense of speed & scaling
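With Ming itself, pointing a `DataStore` at `mim://` as above is how you get the in-memory database. The pattern it enables is a fresh, empty store per test, sketched below with a plain dict so it runs without Ming installed; `InMemoryDB` and `WikiPageTests` are hypothetical.

```python
import unittest

class InMemoryDB(dict):
    """Hypothetical in-memory 'database': collection name -> list of docs."""
    def __getitem__(self, name):
        # Collections spring into existence on first use, as in MongoDB.
        return self.setdefault(name, [])

class WikiPageTests(unittest.TestCase):
    def setUp(self):
        # A brand-new store for every test: no state leaks between tests and
        # there are no database files to create or preallocate.
        self.db = InMemoryDB()

    def test_insert_is_visible(self):
        self.db['wiki_page'].append({'title': 'Cats'})
        self.assertEqual(len(self.db['wiki_page']), 1)

    def test_starts_empty(self):
        # Passes in any test order because setUp isolated it.
        self.assertEqual(self.db['wiki_page'], [])
```

Saved as a module, this runs under `python -m unittest`.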




Roadmap

Ming ODM: Classes and Collections

from ming import collection, schema, Field
from ming.odm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection('wiki_page', session, … )
CommentDoc = collection('comment', session, … )

class WikiPage(object): pass
class Comment(object): pass

odmsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
odmsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))

Plain Old Python Classes

Map classes to collection + session

“Relations”
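The two relations above hinge on one foreign id: `Comment.page_id` holds a page's `_id`, so `page.comments` means "comments whose `page_id` matches" and `comment.page` means "the page with that `_id`" (the `index=True` on `page_id` in the collection definition is what keeps the first lookup cheap). The sketch below resolves both directions over in-memory lists; `Relations` and these plain classes are hypothetical, not Ming's `RelationProperty`.

```python
class WikiPage(object):
    def __init__(self, _id, title):
        self._id, self.title = _id, title

class Comment(object):
    def __init__(self, _id, page_id, text):
        self._id, self.page_id, self.text = _id, page_id, text

class Relations(object):
    """Hypothetical resolver (not Ming's RelationProperty): follows the
    page_id foreign id in either direction over in-memory lists."""

    def __init__(self, pages, comments):
        self.pages, self.comments = pages, comments

    def comments_for(self, page):
        # One-to-many: every Comment whose page_id points at this page.
        return [c for c in self.comments if c.page_id == page._id]

    def page_for(self, comment):
        # Many-to-one: the single WikiPage whose _id matches the foreign id.
        return next((p for p in self.pages if p._id == comment.page_id), None)

cats, dogs = WikiPage(1, 'Cats'), WikiPage(2, 'Dogs')
rel = Relations([cats, dogs], [Comment(10, 1, 'lol'), Comment(11, 1, 'meh'),
                               Comment(12, 2, 'woof')])
print([c.text for c in rel.comments_for(cats)])  # ['lol', 'meh']
```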

Ming ODM: Classes and Collections (declarative)

from ming import schema as S
from ming.odm import MappedClass, FieldProperty, RelationProperty

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_odm_session
        name = 'wiki_page'
        indexes = [ 'title' ]

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')

Ming ODM: Sessions and Queries

Session vs. ODMSession:

- Session: My_collection.m…

- ODMSession: My_mapped_class.query…

ODMSession actually does stuff:

- Track object identity

- Track object modifications

- Unit of work flushing all changes at once

>>> pg = WikiPage(title='MyPage', text='is here')

>>> session.db.wiki_page.count()

0

>>> main_odm_session.flush()

>>> session.db.wiki_page.count()

1
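The count going from 0 to 1 only after `flush()` is the unit of work in action, and identity tracking means one live object per `_id`. A minimal sketch of both follows; `UnitOfWorkSession` is hypothetical, uses a dict as the "collection", and asks you to call `mark_dirty()` where a real ODM would detect modifications itself.

```python
class UnitOfWorkSession(object):
    """Hypothetical sketch of identity tracking plus a unit-of-work flush
    (not Ming's ODMSession). 'store' is a dict standing in for a collection."""

    def __init__(self, store):
        self.store = store          # _id -> saved document
        self.identity_map = {}      # _id -> live object (one object per _id)
        self.new = []               # objects created since the last flush
        self.dirty = set()          # _ids modified since the last flush

    def add(self, obj):
        self.identity_map[obj['_id']] = obj
        self.new.append(obj)

    def get(self, _id):
        # Identity: repeated lookups return the very same object.
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.store[_id])
        return self.identity_map[_id]

    def mark_dirty(self, obj):
        self.dirty.add(obj['_id'])

    def flush(self):
        # Nothing touches the store until flush(); then all changes go at once.
        for obj in self.new:
            self.store[obj['_id']] = dict(obj)
        for _id in self.dirty:
            self.store[_id] = dict(self.identity_map[_id])
        self.new, self.dirty = [], set()

store = {}
session = UnitOfWorkSession(store)
session.add({'_id': 1, 'title': 'MyPage', 'text': 'is here'})
print(len(store))  # 0
session.flush()
print(len(store))  # 1
```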

Ming Plugins

http://www.flickr.com/photos/39747297@N05/5229733647/

Ming ODM: Extending the Session

Various plug points in the session:

- before_flush

- after_flush

Some uses:

- Logging changes to sensitive data or for analytics

- Full-text search indexing

- “last modified” fields

- Performance instrumentation
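Two of the uses listed above, "last modified" fields and logging, map naturally onto `before_flush` and `after_flush`. The sketch below shows a session that calls every registered extension around each flush; `SessionExtension`, `LastModified`, `FlushLog` and `ExtensibleSession` are hypothetical classes that mirror the idea, not Ming's own API.

```python
import datetime

class SessionExtension(object):
    """Hypothetical base class: subclasses override only the hooks they need."""
    def before_flush(self, objs):
        pass

    def after_flush(self, objs):
        pass

class LastModified(SessionExtension):
    def __init__(self, clock=None):
        # Injectable clock keeps the example testable; defaults to "now" in UTC.
        self.clock = clock or (lambda: datetime.datetime.now(datetime.timezone.utc))

    def before_flush(self, objs):
        now = self.clock()
        for obj in objs:
            obj['last_modified'] = now  # stamped just before the write

class FlushLog(SessionExtension):
    def __init__(self):
        self.entries = []

    def after_flush(self, objs):
        # Audit/analytics style hook: record how many objects were written.
        self.entries.append(len(objs))

class ExtensibleSession(object):
    """Hypothetical session that calls every extension around each flush."""
    def __init__(self, extensions=()):
        self.extensions = list(extensions)
        self.pending = []
        self.written = []  # stand-in for the database

    def add(self, obj):
        self.pending.append(obj)

    def flush(self):
        objs, self.pending = self.pending, []
        for ext in self.extensions:
            ext.before_flush(objs)
        self.written.extend(dict(o) for o in objs)
        for ext in self.extensions:
            ext.after_flush(objs)

log = FlushLog()
session = ExtensibleSession([LastModified(), log])
session.add({'title': 'MyPage'})
session.flush()
print(log.entries)  # [1]
```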

Ming ODM: Extending the Mapper

Various plug points in the mapper, before_/after_:

- Insert

- Update

- Delete

- Remove

Some uses:

- Collection/model-specific logging (user creation, etc.)

- Anything you might want a SessionExtension for but would rather do per-model
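The difference from a session-level hook is scope: a mapper extension is attached to one model, so it fires only for that model's writes, which suits something like logging user creation. A minimal sketch with insert hooks only; `MapperExtension`, `UserCreationLog` and `ModelMapper` are hypothetical, not Ming's classes.

```python
class MapperExtension(object):
    """Hypothetical per-model hooks; no-ops by default."""
    def before_insert(self, obj):
        pass

    def after_insert(self, obj):
        pass

class UserCreationLog(MapperExtension):
    def __init__(self):
        self.created = []

    def after_insert(self, obj):
        # Fires only for the model this extension is attached to.
        self.created.append(obj['username'])

class ModelMapper(object):
    """Hypothetical mapper for one model, calling its own extensions."""
    def __init__(self, collection_name, extensions=()):
        self.collection_name = collection_name
        self.extensions = list(extensions)
        self.rows = []  # stand-in for the mapped collection

    def insert(self, obj):
        for ext in self.extensions:
            ext.before_insert(obj)
        self.rows.append(dict(obj))
        for ext in self.extensions:
            ext.after_insert(obj)

user_log = UserCreationLog()
users = ModelMapper('user', [user_log])
pages = ModelMapper('wiki_page')
users.insert({'username': 'rick'})
pages.insert({'title': 'Cats'})
print(user_log.created)  # ['rick']
```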

Related Projects

Ming: http://sf.net/projects/merciless/ (MIT License)

PyMongo: http://api.mongodb.org/python (Apache License)

Questions?


http://www.flickr.com/photos/f-oxymoron/5005673112/

Feedback? http://www.surveymonkey.com/s/5DLCYKN
