pycon de 2013 - table partitioning with django
DESCRIPTION
Table partitioning can be thought of as a division of one large table into several smaller tables which represent that original table. Table partitioning is "transparent", that means that in theory you don't need to change any code to work with partitioned tables. We will talk about table partitioning theory in general and implementations in different database servers. Why and when we need to do table partitioning. What problems we can face and how we can solve them. Django provides us with great database abstraction and ORM, but how can we use it with table partitioning ? We will talk about existing libraries for Django to work with table partitioning, their differences, which is the best (if any) and why.TRANSCRIPT
PyCon.DE 2013 1 / 52
Table partitioningwith Django
Max Tepkeev17 October 2013Cologne, Germany
PyCon.DE 2013 2 / 52
Partitioning Theory
Django PackagesRealization
PyCon.DE 2013 3 / 52
Partitioning Theory
PyCon.DE 2013 4 / 52
Definition
Table partitioning - division of one table into several tables, called partitions, which still represent original table.
PyCon.DE 2013 5 / 52
Why
• Performance• Manageability• Availability
PyCon.DE 2013 6 / 52
When
• Tables greater than 2GB• Tables with historical data• Table need to be distributed across
different types of storage devices• Queries ALWAYS contain a filter on the
partition field
PyCon.DE 2013 7 / 52
Methods
• Horizontal partitioning• Vertical partitioning
PyCon.DE 2013 8 / 52
Strategies
• Range partitioning• List partitioning• Hash partitioning• Composite partitioning
PyCon.DE 2013 9 / 52
Strategies
PyCon.DE 2013 10 / 52
Strategies
PyCon.DE 2013 11 / 52
Example
id user_id entry added
1 345 Login 2013-08-22 17:24:43
2 345 Went to Store section 2013-08-22 17:25:01
3 345 Ordered a book 2013-08-22 17:33:28
4 345 Payed for a book 2013-08-22 17:35:54
5 345 Logout 2013-08-22 17:38:32
PyCon.DE 2013 12 / 52
Example
INSERT INTO user_actions (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08')
Goes to user_actions_y2013m08
INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42')
Goes to user_actions_y2013m09
PyCon.DE 2013 13 / 52
Example
SELECT * FROM user_actions
id user_id entry added
1 237 Login 2013-08-21 11:54:08
2 198 Logout 2013-09-01 08:43:42
Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.
PyCon.DE 2013 14 / 52
Realization
PyCon.DE 2013 15 / 52
RDBMS
• PostgreSQL• MySQL
PyCon.DE 2013 16 / 52
PostgreSQL
Methods:• Horizontal partitioning
Strategies:• Range partitioning• List partitioning
PyCon.DE 2013 17 / 52
PostgreSQL
Implementation:• Inheritance
Available:• >= 8.1
PyCon.DE 2013 18 / 52
PostgreSQL
Steps:• Master table• Child tables• Correct partition insertion function• Trigger that calls partition insertion function• Function to delete duplicate rows from master• Trigger that calls delete duplicate rows function
PyCon.DE 2013 19 / 52
PostgreSQL
CREATE TABLE logs ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id))
Master table:
PyCon.DE 2013 20 / 52
PostgreSQL
CREATE TABLE logs_y2013m08 ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp )) INHERITS (logs);
Child table:
PyCon.DE 2013 21 / 52
PostgreSQL
CREATE FUNCTION "logs_insert_child"() RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$LANGUAGE plpgsql;
Correct partition insertion function:
PyCon.DE 2013 22 / 52
PostgreSQL
CREATE TRIGGER "before_insert_logs_trigger"BEFORE INSERT ON "logs"FOR EACH ROW EXECUTE PROCEDURE
"logs_insert_child"();
Trigger that calls partition insertion function:
PyCon.DE 2013 23 / 52
PostgreSQL
CREATE FUNCTION "logs_delete_master"() RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$LANGUAGE plpgsql;
Function to delete duplicate rows from master:
PyCon.DE 2013 24 / 52
PostgreSQL
CREATE TRIGGER "after_insert_logs_trigger" AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"();
Trigger that calls delete duplicate rows function:
PyCon.DE 2013 25 / 52
Code for automatic new partition creation
PostgreSQL
DECLARE start_date TIMESTAMP;start_date := date_trunc('month', NEW.added);
IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');';END IF;
PyCon.DE 2013 26 / 52
MySQL
Methods:• Horizontal partitioning
Strategies:• Range partitioning• List partitioning• Hash partitioning• Composite partitioning
PyCon.DE 2013 27 / 52
MySQL
Implementation:• Native (PARTITION BY)
Available:• >= 5.1
PyCon.DE 2013 28 / 52
How that works
MySQL
CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL)PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE);
PyCon.DE 2013 29 / 52
MySQL
Limitations
• From lowest to highest (range)• Foreign Key• No real-time partition creation
PyCon.DE 2013 30 / 52
Django Packages
PyCon.DE 2013 31 / 52
Packages
• django-parting• django-db-parti
PyCon.DE 2013 32 / 52
django-parting
RDBMS:• PostgreSQL
PyCon.DE 2013 33 / 52
django-parting
Features:• Partition tables with Foreign Keys
PyCon.DE 2013 34 / 52
django-parting
From pypi:
$ pip install django-parting
or clone from github:
$ git clone git://github.com/danfairs/django-parting.git
PyCon.DE 2013 35 / 52
django-parting
Add parting to PYTHONPATH and installed applications:
INSTALLED_APPS = ( ... 'parting' )
PyCon.DE 2013 36 / 52
django-partingfrom django.db import modelsfrom django.utils import timezone
class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now())
class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()
PyCon.DE 2013 37 / 52
django-partingfrom django.utils import timezonefrom parting import PartitionManagerfrom dateutil.relativedelta import relativedelta
def _key_for_date(dt): return dt.strftime('%Y%m')
class TweetPartitionManager(PartitionManager):
def current_partition(self): return _key_for_date(timezone.now())
def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)
PyCon.DE 2013 38 / 52
django-partingclass Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now())
partitions = TweetPartitionManager()
class Meta: abstract = True
class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField()
partitions = TweetPartitionManager()
class Meta: abstract = True
PyCon.DE 2013 39 / 52
django-partingimport jsonfrom django.utils.timezone import make_aware, utc
tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy'}
partition_key = _key_for_dt(tweet_data['created_at'])partition = Tweet.partitions.get_partition(partition_key)tweet = partition(**tweet_data)tweet.save()
PyCon.DE 2013 40 / 52
django-partingCREATE TABLE "testapp_tweet_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "json" text NOT NULL, "created" datetime NOT NULL);
CREATE TABLE "testapp_star_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "tweet_id" integer NOT NULL REFERENCES "testapp_tweet_2013_03" ("id"), "user" text NOT NULL);
PyCon.DE 2013 41 / 52
django-parting
Problems:• Not database-level partitioning• No django admin support• No active development
PyCon.DE 2013 42 / 52
django-db-parti
RDBMS:• MySQL• PostgreSQL
PyCon.DE 2013 43 / 52
django-db-parti
Features:• Real database-level partitioning• Automatic new partition creation in real-time• Django admin support
PyCon.DE 2013 44 / 52
django-db-parti
From pypi:
$ pip install django-db-parti
or clone from github:
$ git clone git://github.com/maxtepkeev/django-db-parti.git
PyCon.DE 2013 45 / 52
django-db-parti
Add dbparti to PYTHONPATH and installed applications:
INSTALLED_APPS = ( ... 'dbparti' )
PyCon.DE 2013 46 / 52
django-db-parti
In models.py add import statement:
from dbparti.models import Partitionable
Make your model to inherit from Partitionable:
class YourModelName(Partitionable):
PyCon.DE 2013 47 / 52
django-db-parti
Add a Meta class to your model with a few settings:
class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added'
Lastly initialize some database stuff with the command:
$ python manage.py partition app_name
PyCon.DE 2013 48 / 52
django-db-parti
Possible model settings
partition_type:• range
partition_subtype:• date
partition_range:• day• week• month• year
PyCon.DE 2013 49 / 52
django-db-parti
Customize how data will be displayed in the Django admin
In admin.py add import statement:
from dbparti.admin import PartitionableAdmin
Make your admin to inherit from PartitionableAdmin:
class YourModelAdminName(PartitionableAdmin): partition_show = 'all'
PyCon.DE 2013 50 / 52
django-db-parti
Possible model admin settings
partition_show:• all (default)• current• previous
PyCon.DE 2013 51 / 52
django-db-parti
Problems:• Only range partitioning (datetime)• Database backend limitations
PyCon.DE 2013 52 / 52
Question time
https://www.github.com/maxtepkeev/django-db-parti
email: [email protected]: max.tepkeev