django multi-db in anger
DESCRIPTION
A talk given at Django-NYC on March 21st, 2011, on how Django MultiDB capabilities are implemented at Saaspire (www.saaspire.com).TRANSCRIPT
MultiDB In AngerProfessional Driver. Closed Course.
Hi.
Sean OConnor in background, with superior monitor.
World Headquarters
“Checkpoint Charlie” T-
Shirt
Django MultiDBAs It Was Probably Intended
The Legacy DB Problem
New DB, built for New
Web App
Legacy DB that you are Unable to
Eliminate. :(
Django Website
Django ORM, Before.
RDBMS
ORM API
Database Backend
Python Database
Driver
MultiDB Mode
RDBMS
ORM API
Database Backend
Python Database
Driver
RDBMS
Database Backend
Python Database
Driver
RDBMS
Database Backend
Python Database
Driver
DB Routing
DB Definition
DB Definition
DB Definition
Out of the Box
• Database definitions.
• --database=
• Database Routers.
• using(), using=
DB Definitions
DATABASES = { 'default': { 'NAME': 'app_data', 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'USER': 'postgres_user', 'PASSWORD': 's3krit' }, 'users': { 'NAME': 'user_data', 'ENGINE': 'django.db.backends.mysql', 'USER': 'mysql_user', 'PASSWORD': 'priv4te' }}
settings.py
--database=
./manage.py syncdb
./manage.py syncdb --database=legacy
./manage.py migrate
./manage.py migrate --database=legacy
Database Routersclass MyAppRouter(object): """A router to control all database operations on models in the myapp application"""
def db_for_read(self, model, **hints): "Point all operations on myapp models to 'other'" if model._meta.app_label == 'myapp': return 'other' return None
def db_for_write(self, model, **hints): "Point all operations on myapp models to 'other'" if model._meta.app_label == 'myapp': return 'other' return None
Database Routers
DATABASE_ROUTERS = [‘path.to.MyAppRouter’,‘path.to.OtherRouter’]
settings.py
API: “using”
Author.objects.using(‘other’).all()
p = Person(name=‘Fred’)p.save(using=‘db0’)p.save(using=‘db1’)
And now, the real world...
MultiDB Meets
The Multi-Tenant Hydra
• SaaS
• Multiple accounts on a cluster.
• Each account with “substantial” data.
• Need to think about sharding.
Core DB
Account DB
Account DB
Account DB
Account DB
Account DB
Account DB
Account DB
Account DB
OMG It’s Full of Stars
Core vs. Account• Core DB
• Account Data
• User Data
• Immutable Data
• Account DBs
• Everything Else
More Requirements
• Easy, declarative specification of core vs account for any model.
• No explicit DB references in code.
• Must route account models to correct account db.
• Must support web and Celery task usage.
...and even worse:
• No a priori knowledge of account databases. Seriously.
Oh, boy...
The Hack Solution
Declarative Model Routing
class ApplicationSeat(models.Model): """ A seat for an application. """ app = models.ForeignKey(Application) user = models.ForeignKey(UserProfile,related_name='application_seats')
global_model = True
def __unicode__(self): return '%s - %s' % (unicode(self.app),unicode(self.user))
class Meta: unique_together = (('app','user'),)
Declarative Model Routing
class AccountRouter(object): """ Routes models to account database, unless they are flagged global. """ def _is_saaspire(self,model): return model.__module__.startswith('saaspire') def _global(self,model): return hasattr(model,'global_model') and model.global_model
Routing to Coreclass AccountRouter(object): # ... def db_for_read(self,model,**hints): """ Gets the db for read. """ from saaspire.accounts.utils import current_account if self._is_saaspire(model): if self._global(model): return 'default' else: self._init_dbs() ca = current_account() if ca: return ca.slug else: return None else: return None
Routing to Accountclass AccountRouter(object): # ... def db_for_read(self,model,**hints): """ Gets the db for read. """ from saaspire.accounts.utils import current_account if self._is_saaspire(model): if self._global(model): return 'default' else: self._init_dbs() ca = current_account() if ca: return ca.slug else: return None else: return None
What account am I using?
def current_account(): """ Gets the current account. """ from environment import env # Look for explicit account specification if hasattr(env,'account') and env.account: return env.account
# ... # Look for current user if hasattr(env,'user') and not env.user.is_anonymous(): profile = env.user.get_profile() if profile: return profile.account return None
Warning:The following scene contains threadlocals.
HTTP Request
Django-Environment
Environment Generators
View executes
Threadlocal is cleared.
Start
Stop
Values associated with
threadlocal "env"
Django-Environment
What account am I using?
from environment.standard import RequestAttributeGeneratorfrom saaspire.core.envgen import AccountGenerator, InstanceGenerator
entries = { 'user':RequestAttributeGenerator('user'), 'account':AccountGenerator(), 'instance':InstanceGenerator(),}
main.env
What account am I using?
class AccountGenerator(object): """ Populates the environment with the current account. """ def generate(self,key,request): """ Generates the variable. """ if not request.user.is_anonymous(): return request.user.get_profile().account else: return None
What account am I using?
def current_account(): """ Gets the current account. """ from environment import env # Look for explicit account specification if hasattr(env,'account') and env.account: return env.account
But what about Celery?
• Celery: An async message queue.
• No HTTP request/response cycle.
• Django-environment won’t work.
use_account() context manager
@periodic_task(run_every=timedelta(minutes=60))def run_proactive_service_subinferences(): for account in Account.objects.active(): with use_account(account.name): # etc...
use_account() context managerclass use_account(object): """ Context manager that sets a specific account. """ def __init__(self,account_name): from saaspire.accounts.models import Account self.account = Account.objects.get(Q(name=account_name) | Q(slug=account_name)) self.current_account = None def __enter__(self): """ Called at beginning of with block. """ from environment import env if hasattr(env,'account'): self.current_account = env.account env.account = self.account def __exit__(self,*args,**kwargs): """ Called at end of with block. """ from environment import env env.account = self.current_account
No a priori knowledge of account databases.
This is cached in memory.
DATABASES = { 'default': { 'NAME': 'app_data', 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'USER': 'postgres_user', 'PASSWORD': 's3krit' }, 'users': { 'NAME': 'user_data', 'ENGINE': 'django.db.backends.mysql', 'USER': 'mysql_user', 'PASSWORD': 'priv4te' }}
What we need.
There is no database "X".
Someone signs up for
Saaspire account "X".
Database "X" is automatically
created.
Account "X" traffic automatically
routed to DB "X".
DBMap
from saaspire.core.db import DBMap
DATABASES = DBMap() # Dictionary-like thingie
settings.py
{% rant %}
The way DATABASES works.
MultiDB-aware code looks for a dictionary
assigned to DATABASES setting.
MultiDB-aware code proceeds to cache
dictionary entries in it's own memory store.
MultiDB-aware code ignores the object at the
DATABASES setting, henceforth.
The way DATABASES should work.
MultiDB-aware code asks for DB definition from the DATABASE_DEFINITION_PROVIDER class.
If no provider has been specified, it falls back to the included default provider, which happens to
look for a dictionary under the DATABASES setting.
Memory caching is LEFT UP TO THE PROVIDER. MultiDB-aware code does NOT do
its own memory caching.
{% endrant %}
Right, back to DBMap
class DBMap(object):
def __init__(self, core_name='core', core_engine='django.db.backends.postgresql_psycopg2', core_user='', core_password='', core_host='localhost', core_port='5432'): self.data = {'default':{'NAME':core_name, 'ENGINE':core_engine, 'USER':core_user, 'PASSWORD':core_password, 'HOST':core_host, 'PORT':core_port}}
More DBMap def __getitem__(self,key): """ Dictionary accessor. """ if not key in self.data: self.load_db(key) return self.data[key]
def load_db(self,key): """ Loads the specific database info. """ from saaspire.accounts.models import Account try: account = Account.objects.get(slug=key) self.data[key] = {'NAME':account.db_name, 'ENGINE':account.db_engine, 'USER':account.db_user, 'PASSWORD':account.db_password, 'HOST':self.data['default']['HOST'], 'PORT':self.data['default']['PORT']} except Account.DoesNotExist: raise KeyError
Dynamic Creation of DBsclass Account(models.Model):
def create_database(self):
db_info = settings.DATABASES['default']
host = db_info['HOST'] cx_params = {'host':host} if db_info.get('USER',None): cx_params['user'] = db_info['USER'] if db_info.get('PASSWORD',None): cx_params['password'] = db_info['PASSWORD']
connection = psycopg2.connect(**cx_params) connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT) cursor = connection.cursor()
cursor.execute('CREATE DATABASE "%s" OWNER "%s"' % (slugify(self.db_name), slugify(self.db_user),))
# TO BE CONTINUED...
Dynamic Creation of DBs
class Account(models.Model): def create_database(self):
# ...THE CONTINUATION...
call_command('syncdb', database=self.db_name, interactive=False, verbosity=0)
settings.DATABASES.load_db(self.db_name) # Force DBMap to load this database reload(south.db) call_command('migrate', database=self.db_name, interactive=False, verbosity=0)
Questions?