Rapid and Scalable Development with MongoDB, PyMongo, and Ming Rick Copeland @rick446 Arborian Consulting, LLC

Upload: rick-copeland

Post on 29-Nov-2014


DESCRIPTION

This intermediate-level talk will teach you techniques using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.

TRANSCRIPT

Page 1: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Rick Copeland @rick446

Arborian Consulting, LLC

Page 2: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Getting Acquainted

http://www.flickr.com/photos/fazen/9079179/

Page 3: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Roadmap

- Get started with PyMongo

- Sprinkle in some Ming schemas

- ODM: When a dict just won’t do

Page 4: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: Getting Started

>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
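The session above shows attribute access and item access both yielding databases and collections, with chained attributes producing a dotted collection name. The following is a minimal sketch of how that kind of lookup can be built in plain Python. It is a toy model, not PyMongo's implementation; the names ToyConnection, ToyDatabase and ToyCollection are hypothetical.

```python
class ToyCollection:
    """Stand-in for a collection; it only tracks its (possibly dotted) name."""

    def __init__(self, database, name):
        self.database = database
        self.name = name

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails. A further attribute
        # extends the collection name with a dot, as in conn.test.foo.bar.baz.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ToyCollection(self.database, f"{self.name}.{attr}")

    def __repr__(self):
        return f"ToyCollection({self.database!r}, {self.name!r})"


class ToyDatabase:
    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ToyCollection(self, attr)

    def __getitem__(self, name):
        # Item access handles names that are not valid Python identifiers.
        return ToyCollection(self, name)

    def __repr__(self):
        return f"ToyDatabase({self.name!r})"


class ToyConnection:
    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ToyDatabase(attr)

    def __getitem__(self, name):
        return ToyDatabase(name)


conn = ToyConnection()
nested = conn.test.foo.bar.baz
dashed = conn["test-db"]["foo-collection"]
```

Item access exists for the same reason as in the slide: a name such as 'test-db' cannot be written as a Python attribute.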

Page 5: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: Insert / Update / Delete

>>> db = conn.test
>>> id = db.foo.insert({'bar':1, 'baz':[ 1, 2, {'k':5} ] })
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}]
>>> db.foo.update({'_id':id}, {'$set': { 'bar':2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}
>>> db.foo.remove({'_id':id})
>>> list(db.foo.find())
[ ]

Auto-Generated _id

Cursors are Python iterators (consumed lazily, like generators)

Remove uses same query language as find()
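To make the three annotations concrete, here is a small in-memory sketch in which insert generates an _id, find returns a lazy iterator, and update and remove reuse the matcher that find uses. It assumes equality-only queries and a dict-per-document list. ToyCollection is a hypothetical stand-in and does not reflect how PyMongo or the server work internally.

```python
import itertools


class ToyCollection:
    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    @staticmethod
    def _matches(doc, query):
        # Equality-only matching: every key in the query must equal the
        # corresponding value in the document.
        return all(doc.get(key) == value for key, value in query.items())

    def insert(self, doc):
        doc = dict(doc)
        doc.setdefault("_id", next(self._ids))  # auto-generate a missing _id
        self._docs.append(doc)
        return doc["_id"]

    def find(self, query=None):
        query = query or {}
        # A generator expression: documents are produced as the caller iterates.
        return (dict(d) for d in self._docs if self._matches(d, query))

    def update(self, query, change):
        for doc in self._docs:
            if self._matches(doc, query):
                doc.update(change.get("$set", {}))
                return  # only the first match is changed in this sketch

    def remove(self, query):
        # Same matcher as find(): whatever find() would return is removed.
        self._docs = [d for d in self._docs if not self._matches(d, query)]


coll = ToyCollection()
doc_id = coll.insert({"bar": 1, "baz": [1, 2, {"k": 5}]})
coll.update({"_id": doc_id}, {"$set": {"bar": 2}})
after_update = next(coll.find())
coll.remove({"_id": doc_id})
after_remove = list(coll.find())
```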

Page 6: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: Queries, Indexes

>>> db.foo.insert([ dict(x=x) for x in range(10) ])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({ 'x': {'$gt': 3} }))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } ))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } )
...      .skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([
...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'

Range Query

Partial Retrieval

Compound Indexes
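The range query, partial retrieval, and skip/limit steps shown above compose as a pipeline: filter, then project, then slice. The sketch below reproduces that pipeline over a plain list of dicts. It is an illustration only; the function names matches and find and the exclude parameter are hypothetical, and only four comparison operators are modelled.

```python
from itertools import islice

# Comparison operators understood by this sketch.
OPERATORS = {
    "$gt": lambda value, bound: value > bound,
    "$gte": lambda value, bound: value >= bound,
    "$lt": lambda value, bound: value < bound,
    "$lte": lambda value, bound: value <= bound,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {'x': {'$gt': 3}}
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(docs, query, exclude=(), skip=0, limit=None):
    """Filter, drop excluded fields, then apply skip and limit lazily."""
    hits = (d for d in docs if matches(d, query))
    projected = ({k: v for k, v in d.items() if k not in exclude} for d in hits)
    stop = None if limit is None else skip + limit
    return islice(projected, skip, stop)


docs = [{"_id": i, "x": i} for i in range(10)]
page = list(find(docs, {"x": {"$gt": 3}}, exclude={"_id"}, skip=1, limit=2))
```

With the ten documents above, page holds the same two documents as the skip(1).limit(2) call on the slide.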

Page 7: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo and Locking

One Rule (for now): Avoid JavaScript

http://www.flickr.com/photos/lizjones/295567490/

Page 8: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: Aggregation et al.

- You gotta write JavaScript (for now)
- It’s pretty slow (single-threaded JS engine)
- JavaScript is used by:
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, …)
- Sharding gives some parallelism with .map_reduce() (and possibly ‘$where’). Otherwise you’re single-threaded.

MongoDB 2.2 with New Aggregation Framework: Coming Real Soon Now™
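For readers who have not used .map_reduce(), the computation it describes can be sketched in a few lines of Python: a map function emits (key, value) pairs per document and a reduce function folds the values collected for each key. This is only a model of the semantics, run locally in one pass. On the server the functions are JavaScript, and reduce may be invoked more than once per key, so it should be written to tolerate that. The helper names are hypothetical.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group the values emitted by map_fn by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def map_by_parity(doc):
    # Emit one (key, 1) pair per document, keyed by whether x is even or odd.
    yield ("even" if doc["x"] % 2 == 0 else "odd", 1)


def count(key, values):
    # Summing is safe to re-apply to partial results.
    return sum(values)


docs = [{"x": x} for x in range(10)]
counts = map_reduce(docs, map_by_parity, count)
```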

Page 9: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: GridFS

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x2cae910>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'

Arbitrary data can be stored on the ‘fp’ object, since it’s just a document (but please put it in ‘fp.metadata’): MIME type, links to other docs, etc.

Python context manager

Retrieve file by _id

Page 10: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]

Create file by filename

“2nd from the last”
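The versioning behaviour in the session above (several files sharing one filename, negative indexes counting back from the newest, and the filename staying listed until its last version is deleted) can be modelled with a small in-memory store. This is a sketch of the observable behaviour only; ToyVersionedStore is hypothetical and stores whole strings rather than chunked files.

```python
class ToyVersionedStore:
    def __init__(self):
        self._files = {}    # file id -> (filename, data), in insertion order
        self._next_id = 0

    def put(self, data, filename):
        # Every put creates a new file, even if the filename already exists.
        self._next_id += 1
        self._files[self._next_id] = (filename, data)
        return self._next_id

    def _ids_for(self, filename):
        # Dicts preserve insertion order, so ids come back oldest first.
        return [fid for fid, (name, _) in self._files.items() if name == filename]

    def get_version(self, filename, version=-1):
        # Negative versions count back from the newest, like list indexing.
        file_id = self._ids_for(filename)[version]
        return file_id, self._files[file_id][1]

    def get_last_version(self, filename):
        return self.get_version(filename, -1)

    def list(self):
        return sorted({name for name, _ in self._files.values()})

    def delete(self, file_id):
        del self._files[file_id]


fs = ToyVersionedStore()
fs.put("Moar data!", filename="foo.txt")
fs.put("Even moar data!", filename="foo.txt")
second_from_last = fs.get_version("foo.txt", -2)[1]
fs.delete(fs.get_last_version("foo.txt")[0])
names_after_one_delete = fs.list()
remaining = fs.get_last_version("foo.txt")[1]
fs.delete(fs.get_last_version("foo.txt")[0])
names_after_two_deletes = fs.list()
```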

Page 11: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Roadmap

- Get started with PyMongo

- Sprinkle in some Ming schemas

- ODM: When a dict just won’t do

Page 12: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Why Ming?

- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- “Unit of work:” Queuing up all your updates can be handy
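The difference between a lazy and an eager migration can be shown with a dict standing in for a collection. In this sketch the schema change (a field renamed from 'name' to 'title') and the 'schema_version' marker are invented for illustration; they are not part of Ming or of the talk.

```python
def upgrade(doc):
    """Return doc in the current shape (hypothetical v1 used 'name', v2 uses 'title')."""
    if doc.get("schema_version", 1) < 2:
        doc = dict(doc)
        doc["title"] = doc.pop("name")
        doc["schema_version"] = 2
    return doc


def load_lazily(store, doc_id):
    # Lazy: a document is upgraded only when it is read, then written back.
    doc = upgrade(store[doc_id])
    store[doc_id] = doc
    return doc


def migrate_eagerly(store):
    # Eager: walk the whole collection once and upgrade everything up front.
    for doc_id in list(store):
        store[doc_id] = upgrade(store[doc_id])


store = {1: {"name": "Cats"}, 2: {"name": "Dogs"}}
first = load_lazily(store, 1)          # document 1 is upgraded on read
untouched = dict(store[2])             # document 2 is still in the old shape
migrate_eagerly(store)                 # now every document is upgraded
```

The lazy path spreads the cost over normal reads but leaves mixed shapes in the store for a while; the eager path pays once and leaves a uniform collection.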

Page 13: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming: Models, DataStores, & Sessions

Model (schema)

Datastore (database)

Session

Page 14: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming: DataStores & Sessions

>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{
...     'ming.main.master':'mongodb://localhost:27017',
...     'ming.main.database':'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')

Connection + Database

Optimized for config files
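The flat, dotted keys passed to ming.configure() are the shape that an INI-style config file naturally produces. As a rough illustration of why that shape is convenient, the sketch below groups such keys by session name. It does not show Ming's own parsing; group_session_config and the extra 'debug' key are hypothetical.

```python
def group_session_config(flat, prefix="ming."):
    """Turn {'ming.<name>.<option>': value} into {name: {option: value}}."""
    sessions = {}
    for key, value in flat.items():
        if not key.startswith(prefix):
            continue  # ignore settings that belong to other components
        name, _, option = key[len(prefix):].partition(".")
        sessions.setdefault(name, {})[option] = value
    return sessions


settings = {
    "ming.main.master": "mongodb://localhost:27017",
    "ming.main.database": "test",
    "debug": "true",
}
grouped = group_session_config(settings)
```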

Page 15: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Surprising Data

http://www.flickr.com/photos/pictureclara/5333266789/

Page 16: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming: Define Your Schema

from ming import collection, Field, schema

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

Index created on import

Shorthand for schema.String

Page 17: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming: Define Your Schema… Once more, with feeling

from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = ['title']
    title = Field(str)
    text = Field(str)

Old declarative syntax continues to exist and be supported, but it’s not being actively improved

Sometimes nice when you want additional methods/attrs on your document class

Page 18: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming: Use Your Schema

>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])

Documents are dict subclasses

Exception pinpoints problem
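The failure above comes from validating the document against its declared fields before saving. A stripped-down version of that idea, assuming a schema expressed as a plain mapping from field name to Python type, might look like the sketch below. The Invalid class and validate function here are local stand-ins, not Ming's ming.schema objects.

```python
class Invalid(Exception):
    """Raised when a document does not fit the schema (local stand-in)."""


def validate(schema, doc):
    # Reject keys the schema does not know about; this catches typos early.
    extra = set(doc) - set(schema)
    if extra:
        raise Invalid(f"Extra keys: {sorted(extra)}")
    clean = {}
    for field, expected_type in schema.items():
        value = doc.get(field)  # missing fields default to None in this sketch
        if value is not None and not isinstance(value, expected_type):
            raise Invalid(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        clean[field] = value
    return clean


WIKI_SCHEMA = {"title": str, "text": str}

good = validate(WIKI_SCHEMA, {"title": "Cats", "text": "I can haz cheezburger?"})

try:
    validate(WIKI_SCHEMA, {"tietul": "LOL", "text": "Invisible bicycle"})
except Invalid as exc:
    error_message = str(exc)
```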

Page 19: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)

- MongoDB is (generally) fast
  - … except when creating databases
  - … particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling

Page 20: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Roadmap

- Get started with PyMongo

- Sprinkle in some Ming schemas

- ODM: When a dict just won’t do

Page 21: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming ODM: Classes and Collections

from ming import collection, Field, schema
from ming.odm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection('wiki_page', session, … )
CommentDoc = collection('comment', session, … )

class WikiPage(object): pass
class Comment(object): pass

odmsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
odmsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))

Plain Old Python Classes

Map classes to collection + session

“Relations”

Page 22: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming ODM: Classes and Collections (declarative)

from ming import schema as S
from ming.odm import FieldProperty, RelationProperty
from ming.odm.declarative import MappedClass

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_odm_session
        name = 'wiki_page'
        indexes = ['title']

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')

Page 23: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming ODM: Sessions and Queries

- Session → ODMSession
- My_collection.m… → My_mapped_class.query…
- ODMSession actually does stuff:
  - Track object identity
  - Track object modifications
  - Unit of work: flushing all changes at once

>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_odm_session.flush()
>>> session.db.wiki_page.count()
1
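The three jobs listed for the ODMSession (identity tracking, modification tracking, and flushing as a unit of work) can be sketched with a dict standing in for the collection. ToyODMSession below is a hypothetical teaching model: it detects changes by comparing against a saved snapshot, which is an assumption made for this sketch and may differ from how Ming tracks state.

```python
import copy


class ToyODMSession:
    def __init__(self, store):
        self.store = store       # dict standing in for a MongoDB collection
        self._identity = {}      # _id -> the single in-memory object for it
        self._snapshots = {}     # _id -> copy taken when last in sync with store

    def add(self, doc):
        # New objects are only remembered here; nothing is written yet.
        self._identity[doc["_id"]] = doc

    def get(self, doc_id):
        # Identity map: repeated loads of the same _id return the same object.
        if doc_id not in self._identity:
            doc = copy.deepcopy(self.store[doc_id])
            self._identity[doc_id] = doc
            self._snapshots[doc_id] = copy.deepcopy(doc)
        return self._identity[doc_id]

    def flush(self):
        """Write every new or modified object; return how many were written."""
        written = 0
        for doc_id, doc in self._identity.items():
            if self._snapshots.get(doc_id) != doc:
                self.store[doc_id] = copy.deepcopy(doc)
                self._snapshots[doc_id] = copy.deepcopy(doc)
                written += 1
        return written


store = {}
odm = ToyODMSession(store)
odm.add({"_id": 1, "title": "MyPage", "text": "is here"})
count_before_flush = len(store)
first_flush = odm.flush()
count_after_flush = len(store)
```

As in the session on the slide, the collection stays empty until flush() is called.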

Page 24: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming Plugins

http://www.flickr.com/photos/39747297@N05/5229733647/

Page 25: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming ODM: Extending the Session

- Various plug points in the session
  - before_flush
  - after_flush
- Some uses
  - Logging changes to sensitive data or for analytics
  - Full-text search indexing
  - “last modified” fields
  - Performance instrumentation
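Two of the uses listed above, "last modified" fields and change logging, map naturally onto hooks that run before and after a flush. The sketch below shows the pattern with a toy session. The class names and the hook signatures (each hook receives the list of pending documents) are assumptions for illustration and do not match Ming's SessionExtension interface exactly.

```python
class ToyExtension:
    """Base class: override either hook; both default to doing nothing."""

    def before_flush(self, pending):
        pass

    def after_flush(self, pending):
        pass


class Timestamper(ToyExtension):
    def __init__(self, clock):
        self.clock = clock  # injected so tests can use a fixed time

    def before_flush(self, pending):
        # Runs before the write, so the stamp is saved with the document.
        for doc in pending:
            doc["last_modified"] = self.clock()


class ChangeLog(ToyExtension):
    def __init__(self):
        self.entries = []

    def after_flush(self, pending):
        # Runs after the write, so only successfully flushed ids are logged.
        self.entries.extend(doc["_id"] for doc in pending)


class ToySession:
    def __init__(self, store, extensions=()):
        self.store = store
        self.extensions = list(extensions)
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)

    def flush(self):
        pending = list(self.pending)
        for ext in self.extensions:
            ext.before_flush(pending)
        for doc in pending:
            self.store[doc["_id"]] = dict(doc)
        for ext in self.extensions:
            ext.after_flush(pending)
        self.pending.clear()


log = ChangeLog()
session = ToySession({}, extensions=[Timestamper(clock=lambda: 123), log])
session.add({"_id": "a", "title": "Cats"})
session.flush()
```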

Page 26: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Ming ODM: Extending the Mapper

- Various plug points in the mapper, before_/after_:
  - Insert
  - Update
  - Delete
  - Remove
- Some uses
  - Collection/model-specific logging (user creation, etc.)
  - Anything you might want a SessionExtension for but would rather do per-model
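The per-model variant of the same idea attaches hooks to one mapper rather than to the whole session, so only that model's inserts trigger them. A minimal sketch, assuming a dict-backed store per model and hooks passed as a plain dict of callables (ToyMapper and the hook names used as dict keys are hypothetical):

```python
class ToyMapper:
    """Maps one model to one dict-backed collection, with optional hooks."""

    def __init__(self, store, hooks=None):
        self.store = store
        self.hooks = hooks or {}

    def _fire(self, event, doc):
        hook = self.hooks.get(event)
        if hook is not None:
            hook(doc)

    def insert(self, doc):
        self._fire("before_insert", doc)
        self.store[doc["_id"]] = dict(doc)
        self._fire("after_insert", doc)


audit_log = []

# Only the user mapper logs creations; the page mapper has no hooks at all.
user_mapper = ToyMapper(
    {}, hooks={"after_insert": lambda doc: audit_log.append(f"user created: {doc['_id']}")}
)
page_mapper = ToyMapper({})

user_mapper.insert({"_id": "rick"})
page_mapper.insert({"_id": "Cats"})
```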

Page 27: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Related Projects

Ming
http://sf.net/projects/merciless/
MIT License

PyMongo
http://api.mongodb.org/python
Apache License

Page 28: Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Questions?

Rick Copeland @rick446

Arborian Consulting, LLC

http://www.flickr.com/photos/f-oxymoron/5005673112/

Feedback? http://www.surveymonkey.com/s/5DLCYKN