Migrating from LAMP to App Engine
An Audio Visual Experience featuring Andy Smith
photo credit: pinksherbert
It comes with a handout for examples!
^W^W^W URL! http://term.ie/data/examples.txt
Quick Background
cool!
Jaiku, mobile presence sharing in Finland, PHP frontend, Python backend, standard LAMP, 50k lines of code
October 07, bought by Google
April 08, App Engine launches, Google App Engine Helper for Django released
March 09, Jaiku re-launches on App Engine, Django, 25k lines of code, Open Source!
photo credit: jussi
No More JOIN
or GROUP BY or nested queries or much else that can’t be accessed in constant time
Your Mind
koolaid: but you probably didn’t want to anyway
google is used to building big things that need to scale massively; in order to do that, certain constraints on your way of thinking are required
as joe and mike have mentioned, you probably don’t want to after a certain point anyway
No More Surprises
no more intangible thresholds where query time suddenly goes exponential
koolaid: explicit over implicit
depending on the shape of your individual entities it can still take time to load, but otherwise no query you write will become slower over time
and you could also just try to fetch way too many things, but that’s your own damn fault
No More Slowness
everything you do hits an index, right away
koolaid: it’s quick, dude
quick enough to fake joins
no non-indexed result set filtering or temporary tables
nice article on indexes in app engine: http://code.google.com/appengine/articles/index_building.html
literally cannot make a non-indexed call
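a rough sketch of a faked join, assuming plain dicts in place of the datastore: collect the referenced keys, do one batched lookup by key, then stitch the results together. ACTORS, ENTRIES, get_multi and stream_with_actors are made-up names for illustration, not App Engine API.

```python
# Hypothetical in-memory stand-ins for two kinds of entities.
ACTORS = {
    "alice": {"nick": "alice", "full_name": "Alice Example"},
    "bob": {"nick": "bob", "full_name": "Bob Example"},
}
ENTRIES = [
    {"key": "entry/1", "actor": "alice", "title": "hello"},
    {"key": "entry/2", "actor": "bob", "title": "hi"},
    {"key": "entry/3", "actor": "alice", "title": "again"},
]


def get_multi(store, keys):
    """Batch lookup by key; missing keys come back as None."""
    return [store.get(k) for k in keys]


def stream_with_actors(entries):
    """Fake a join: collect the referenced keys, fetch them once, stitch."""
    actor_keys = sorted(set(e["actor"] for e in entries))
    actors = dict(zip(actor_keys, get_multi(ACTORS, actor_keys)))
    return [dict(e, actor_ref=actors[e["actor"]]) for e in entries]


rows = stream_with_actors(ENTRIES)
```

three entries reference two actors, so the lookup fetches two keys no matter how long the stream gets.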
No More ALTER TABLE
object store means never having to say you’re sorry
koolaid: it’s not like you ever were able to alter a large table anyway
protocol buffers means you can add attributes to your entities at will without breaking old entities
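a minimal sketch of what “add attributes at will” looks like from the application side, assuming entities are just bags of properties. Entity and UserV2 are hypothetical classes, not the App Engine model API: an entity written before the new attribute existed simply picks up the declared default when it is loaded.

```python
class Entity(object):
    """Hypothetical schemaless entity: declared defaults fill in gaps."""

    defaults = {}

    def __init__(self, **stored):
        # Start from the declared defaults, then overlay whatever was stored.
        props = dict(self.defaults)
        props.update(stored)
        self.__dict__.update(props)


class UserV2(Entity):
    # "avatar" was added later; old entities are never rewritten.
    defaults = {"nick": None, "avatar": "default.png"}


old_row = {"nick": "alice"}                    # written before "avatar" existed
new_row = {"nick": "bob", "avatar": "bob.png"}

old_user = UserV2(**old_row)
new_user = UserV2(**new_row)
```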
No More Long Requests
you haven’t got much time, so make it count
koolaid: keeps you honest
taking an iterative approach to problems
a task queue will help this along, it’s on the roadmap; until then, a simple queue + cron
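one way to picture the queue + cron approach, as a sketch only: each short request does a bounded slice of the work and leaves the rest for the next run. QUEUE, enqueue and process_batch are hypothetical names, and a deque stands in for whatever datastore-backed queue you would actually use.

```python
from collections import deque

QUEUE = deque()   # hypothetical stand-in for a datastore-backed queue
DONE = []


def enqueue(item):
    QUEUE.append(item)


def process_batch(max_items=3):
    """Do a bounded amount of work per request; return how much is left."""
    for _ in range(min(max_items, len(QUEUE))):
        DONE.append(QUEUE.popleft().upper())
    return len(QUEUE)


for name in ["a", "b", "c", "d", "e"]:
    enqueue(name)

# Each call stands in for one short request triggered by cron.
remaining_after_first = process_batch()
remaining_after_second = process_batch()
```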
No More Slashdot Effect
that’s “getting dugg” for you youngsters
koolaid: you’re not going to crash this database or web server
things don’t change under load
no more logging in to your server and frantically watching top or SHOW PROCESSLIST
Denormalization
you’ll want it for counting, preferences and sometimes referenced data
Your Data Model
Denormalization
if you’re not going to index the data don’t bother making it a property
Pro Tip #1: Use a Dictionary Property
the app engine django helper implements a reasonable DictProperty based on pickle
Bonus Tip!
JaikuEngine has a reasonable implementation of DictProperty to copy
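the idea behind a pickle-based dictionary property, sketched without the SDK: the whole dict is stored as a single unindexed blob and unpickled on the way back out. DictPropertySketch and its to_datastore / from_datastore methods are hypothetical names, not the helper’s or JaikuEngine’s actual code.

```python
import pickle


class DictPropertySketch(object):
    """Hypothetical property that stores a dict as one pickled blob."""

    def to_datastore(self, value):
        # None and {} are both stored as an empty dict.
        return pickle.dumps(dict(value or {}), protocol=2)

    def from_datastore(self, blob):
        if blob is None:
            return {}
        return pickle.loads(blob)


prop = DictPropertySketch()
prefs = {"theme": "dark", "notify": {"sms": True, "email": False}}
blob = prop.to_datastore(prefs)
restored = prop.from_datastore(blob)
```

nothing inside the blob can be queried, which is the point: it is for data you read back whole, like preferences.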
Denormalization
if you have to use a counter you’ll need to compute it at write-time, but they aren’t usually worth the pain
Pro Tip #2: Avoid counting
things like counts aren’t really that important anyway
the bigger the number, the less accurate it needs to be
Bonus Tip!
don’t bother trying to keep counts super accurate past a certain point, and don’t compute them as often
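a sketch of one way to let accuracy slide as the number grows, assuming a cheap buffer (something like memcache) sits in front of the stored count. LazyCounter and the roughly 1% slack are illustrative choices, not anything from JaikuEngine.

```python
class LazyCounter(object):
    """Hypothetical write-time counter that flushes less often as it grows."""

    def __init__(self):
        self.stored = 0    # stands in for the value kept in the datastore
        self.pending = 0   # stands in for a cheap buffer such as memcache
        self.writes = 0    # how many "expensive" writes were made

    def flush_every(self):
        # Exact while the count is small, then allow roughly 1% of slack.
        return max(1, self.stored // 100)

    def increment(self):
        self.pending += 1
        if self.pending >= self.flush_every():
            self.stored += self.pending
            self.pending = 0
            self.writes += 1


counter = LazyCounter()
for _ in range(10000):
    counter.increment()
```

the stored value lags the true count by at most about 1%, and the number of expensive writes is a small fraction of the number of increments.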
Denormalization
you can do a few hundred (likely cached) get()s but Query() can be quite a bit slower
Pro Tip #3: Aim for one Query() per page
sometimes this means duplicating data to prevent additional lookups
Think Like This:
at scale your data starts looking quite a lot like your presentation, organize your data into the shapes you’ll be using it in
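a sketch of data shaped like the page, using made-up names (USERS, STREAM, post_entry, render_page) and lists in place of the datastore: the avatar is copied onto each entry at write time so rendering needs a single query and no follow-up lookups.

```python
USERS = {"alice": {"nick": "alice", "avatar": "alice.png"}}   # hypothetical
STREAM = []                                                   # hypothetical
QUERIES = {"count": 0}


def post_entry(nick, text):
    """Copy what the page needs onto the entry at write time."""
    user = USERS[nick]
    STREAM.append({"nick": nick, "avatar": user["avatar"], "text": text})


def render_page(limit=20):
    """One query; every row already has the shape of the page."""
    QUERIES["count"] += 1
    rows = STREAM[-limit:]
    return ["%(avatar)s %(nick)s: %(text)s" % row for row in rows]


post_entry("alice", "hello")
post_entry("alice", "still here")
page = render_page()
```

the cost is that a changed avatar has to be copied to old entries, or left stale, which is the usual denormalization trade.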
Integrity
transactions aren’t going to save you
transactions in app engine currently only cover fairly simplistic use cases such as updating a single entity while preventing race conditions
Integrity
you don’t have UNIQUE so you need to handle your own duplicate prevention
Pro Tip #1: Be idempotent
you’re going to want to use guessable key names to avoid needing Query()s to check for duplicate work
What that means is:
anything unique goes in your primary key
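a small sketch of guessable key names, with a dict standing in for the datastore. key_name, get_or_insert and add_contact here are hypothetical helpers written for this example; on App Engine the check-and-create step would need to run inside a transaction.

```python
STORE = {}   # hypothetical stand-in for the datastore, keyed by key name


def key_name(kind, *parts):
    """Build a guessable key name from the values that must be unique."""
    return "%s/%s" % (kind, "/".join(parts))


def get_or_insert(name, **props):
    """Create the entity only if its key name is not already taken."""
    if name not in STORE:
        STORE[name] = dict(props)
    return STORE[name]


def add_contact(owner, target):
    name = key_name("contact", owner, target)
    return get_or_insert(name, owner=owner, target=target)


first = add_contact("alice", "bob")
second = add_contact("alice", "bob")   # a retry or a double click
```

because the key is derived from the unique values, the duplicate check is a get by key rather than a Query().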
Integrity
if something breaks halfway through, trying it again should fix it
Pro Tip #2: Roll forward
for example, adding somebody as a contact may have some side effects; if the user tries to add the contact again, ensure those side effects were completed
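a sketch of rolling forward, assuming three separate writes that are each safe to repeat. CONTACTS, FOLLOWER_LISTS, NOTIFICATIONS and the fail_after switch are invented for the example; the failure is simulated.

```python
CONTACTS = set()        # hypothetical stand-ins for three separate writes
FOLLOWER_LISTS = {}
NOTIFICATIONS = set()


def add_contact(owner, target, fail_after=None):
    """Every step is safe to repeat, so a retry finishes the job."""
    steps = [
        lambda: CONTACTS.add((owner, target)),
        lambda: FOLLOWER_LISTS.setdefault(target, set()).add(owner),
        lambda: NOTIFICATIONS.add(("new_contact", owner, target)),
    ]
    for index, step in enumerate(steps):
        if fail_after is not None and index >= fail_after:
            raise RuntimeError("simulated failure before step %d" % index)
        step()


try:
    add_contact("alice", "bob", fail_after=1)   # dies after the first write
except RuntimeError:
    pass
half_done = ("alice", "bob") in CONTACTS and "bob" not in FOLLOWER_LISTS

add_contact("alice", "bob")                     # the user tries again
```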
Integrity
if you get something that is broken, fix it or forget it
Pro Tip #3: Clean up after yourself
a good example would be if some initial data that was supposed to be created when a user joins your site does not exist
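“fix it or forget it” as a sketch, with dicts in place of the datastore and made-up names (PROFILES, SETTINGS, DEFAULT_SETTINGS, get_settings): missing initial data is recreated the first time it is asked for, and orphaned data is dropped.

```python
PROFILES = {"alice": {"nick": "alice"}}   # hypothetical: her settings never got created
SETTINGS = {}
DEFAULT_SETTINGS = {"notify": True, "language": "en"}


def get_settings(nick):
    """Repair on read: recreate what is missing, drop what is orphaned."""
    if nick not in PROFILES:
        # Forget it: settings without a user are of no use to anybody.
        SETTINGS.pop(nick, None)
        return None
    if nick not in SETTINGS:
        # Fix it: the initial data should have existed, so create it now.
        SETTINGS[nick] = dict(DEFAULT_SETTINGS)
    return SETTINGS[nick]


SETTINGS["ghost"] = {"notify": False}     # orphan left by a failed signup
alice_settings = get_settings("alice")
ghost_settings = get_settings("ghost")
```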
Pit Falls
ordering thinks like a computer, not a human: separate your display from your storage and index
Pit Fall #1: Case Sensitivity
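a sketch of separating display from storage and index: keep what the human typed, and store a normalized copy to sort and look up on. make_user, nick_sort and find_user are illustrative names, and list sorting stands in for index ordering.

```python
def make_user(display_nick):
    """Keep what the human typed, and index a normalized copy."""
    return {"nick": display_nick, "nick_sort": display_nick.lower()}


USERS = [make_user(n) for n in ["bob", "Zed", "alice", "Carol"]]

# Sorting on the stored value orders by character code: capitals first.
computer_order = [u["nick"] for u in sorted(USERS, key=lambda u: u["nick"])]

# Sorting on the normalized field gives what a person expects.
human_order = [u["nick"] for u in sorted(USERS, key=lambda u: u["nick_sort"])]


def find_user(typed):
    """Look up by the normalized field so 'ALICE' finds 'alice'."""
    wanted = typed.lower()
    return next((u for u in USERS if u["nick_sort"] == wanted), None)
```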
Pit Falls
don’t think you can just ask for the 100th result: you need to use inequality and ordering to skip the line
Pit Fall #2: Paging
you can also page queries with a special __key__ field, there’s a nice post explaining approaches to it
Bonus Tip!
__key__ post: http://groups.google.com/group/google-appengine/browse_thread/thread/ee5afbde20e13cde
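paging by inequality and ordering, sketched over a plain list: each page starts strictly after the last key of the previous one, so there is never an offset to count past. ENTRIES, fetch_page and the bookmark convention are assumptions made for the example.

```python
# Hypothetical entries, with keys that sort the way an index would hold them.
ENTRIES = [{"key": "entry/%03d" % i, "title": "post %d" % i} for i in range(1, 8)]


def fetch_page(after_key=None, page_size=3):
    """Ordered scan that starts strictly after the last key already seen."""
    rows = sorted(ENTRIES, key=lambda e: e["key"])
    if after_key is not None:
        rows = [e for e in rows if e["key"] > after_key]   # the inequality
    page = rows[:page_size]
    # Hand back a bookmark only when there is more to read.
    bookmark = page[-1]["key"] if len(rows) > page_size else None
    return page, bookmark


page1, mark1 = fetch_page()
page2, mark2 = fetch_page(after_key=mark1)
page3, mark3 = fetch_page(after_key=mark2)
```

the bookmark goes into the “next” link; the last page returns no bookmark.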
Pit Falls
they’re kind of weird and tricky and probably don’t do what you want anyway: don’t bother
Pit Fall #3: Entity Groups
if you do bother, they’re useful for data that is frequently edited together, but you’ll want to do some reading
Bonus Tip!
entity group reading: http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html
Pit Falls
generally a datastore timeout, they happen infrequently but often unpredictably: be prepared
Pit Fall #4: Timeouts
this is where all that data integrity stuff comes in
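being prepared can be as simple as trying again, provided the operation is idempotent. a sketch, with DatastoreTimeout as a hypothetical stand-in for whatever timeout exception the datastore raises and flaky_put simulating two failures:

```python
import time


class DatastoreTimeout(Exception):
    """Hypothetical stand-in for the timeout the datastore can raise."""


def with_retries(operation, attempts=3, delay=0.0):
    """Run an idempotent operation again if it times out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DatastoreTimeout:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)   # back off a little more each time


calls = {"count": 0}


def flaky_put():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DatastoreTimeout("simulated")
    return "stored"


result = with_retries(flaky_put)
```

keep the number of attempts small: the request deadline is still ticking while you retry.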
under race-condition load, with transactions on similar entities, you _can_ lock your tables long enough to cause timeouts; avoid this by using memcache for locks
Bonus Tip!
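a sketch of a memcache-style lock, relying on the usual cache “add” behaviour of succeeding only when the key is not already set. FakeMemcache is a hypothetical in-memory stand-in so the example runs anywhere; update_with_lock and the lock key format are made up too.

```python
class FakeMemcache(object):
    """Hypothetical stand-in exposing add() and delete() only."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # Like a cache "add": succeeds only when the key is not there yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


cache = FakeMemcache()


def update_with_lock(entity_key, work):
    """Skip the datastore entirely if somebody else holds the lock."""
    lock_key = "lock:" + entity_key
    if not cache.add(lock_key, 1):
        return False
    try:
        work()
        return True
    finally:
        cache.delete(lock_key)


log = []


def competing_request():
    # Simulates a second request arriving while the first holds the lock.
    log.append(update_with_lock("user/alice", lambda: log.append("never")))


outer = update_with_lock("user/alice", competing_request)
later = update_with_lock("user/alice", lambda: log.append("ok"))
```

with a real cache you would also give the lock an expiry so a crashed request cannot hold it forever, and treat it as best effort since cache entries can be evicted.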
Pit Falls
syncing and locking problems happen everywhere, even on app engine: proper memcache helps though
Pit Fall #5: Parallel Execution
it’s not a new problem to any of us, but it isn’t going to magically go away
... helper for Django
google-app-engine-django vs app-engine-patch: not all that different
A quick word on libraries
Your App
helper tries to get the models to work with existing django architecture
patch is more interested in the additional tools
i’ve never been the type to use many of the “features” of django, probably largely because i was never able to make use of the orm, so my interest in the helper was more towards core functionality and testing
... helper for Django
you’ll need to base your models on a new class that wraps the App Engine and Django models together
Step #1: Port your models
it’d be great for this to be handled automatically for most simple cases, it’s open source, come help out
Bonus Tip!
Warning!
complex sql queries that don’t make sense in the datastore will have to be re-written and re-thought :/
... helper for Django
most pure python libraries will work fine, but you’ll probably have to make zip files for deployment
Step #2: Package your dependencies
we’ve built some tools for this in JaikuEngine, check out build.py, expect them to be added to the helper library as well
Bonus Tip!
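the zip trick in miniature, using only the standard library: python can import pure python packages straight from a zip file on sys.path. this is a sketch, not JaikuEngine’s build.py; the tinydep package and the temp directory layout are invented for the example.

```python
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# A tiny pure python "dependency" to package; the name is made up.
pkg_dir = os.path.join(workdir, "tinydep")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write("def greet(name):\n    return 'hello ' + name\n")

# Zip it so the archive root contains the package directory.
zip_path = os.path.join(workdir, "tinydep.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
    for root, _dirs, files in os.walk(pkg_dir):
        for filename in files:
            full_path = os.path.join(root, filename)
            archive.write(full_path, os.path.relpath(full_path, workdir))

# Python imports straight from zip files that are on sys.path.
sys.path.insert(0, zip_path)
import tinydep

greeting = tinydep.greet("app engine")
```

this only works for pure python code; anything with compiled extensions cannot be loaded from a zip.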
... helper for Django
if you are trying to use a lot of django-* apps that do complex database stuff you may have to port them as well, some database thinking just doesn’t transfer
Step #3: Check your “apps”
Disclaimer
I’m not really the biggest fan of the “app” metaphor in Django to begin with, but we can argue later :p
... helper for Django
the App Engine SDK comes with a pretty decent admin that, amongst other things, includes a console and data viewer
Step #4: Use /_ah/admin
Call to Arms
a great resource for django developers; zero to launch in seconds
Why it should be done
Let’s Fix It
google-app-engine-django and app-engine-patch
We already have some libraries
a bunch of good code, but still relatively hacky
but they are hacking around django instead of working with it
Let’s Fix It
most database features have obvious translations; some may have to be cut, and the db code needs to expect this
A new database backend
approaches that are efficient in one style of database are not necessarily efficient in another
Let’s Fix It
django needs to be zipped to fit under app engine’s file limit, it’s an easy tool to write
Some lightweight support for packaging
Let’s Fix It
for deployment, for testing under app engine sdk, simple stuff that has already mostly been written
New manage.py commands
Questions?
photo credit: sombraala
Andy Smith <[email protected]>