pycon de 2013 - table partitioning with django

52
PyCon.DE 2013 1 / 52 Table partitioning with Django Max Tepkeev 17 October 2013 Cologne, Germany

Upload: max-tepkeev

Post on 23-Jan-2015

284 views

Category:

Software


0 download

DESCRIPTION

Table partitioning can be thought of as a division of one large table into several smaller tables which represent that original table. Table partitioning is "transparent", that means that in theory you don't need to change any code to work with partitioned tables. We will talk about table partitioning theory in general and implementations in different database servers. Why and when we need to do table partitioning. What problems we can face and how we can solve them. Django provides us with great database abstraction and ORM, but how can we use it with table partitioning ? We will talk about existing libraries for Django to work with table partitioning, their differences, which is the best (if any) and why.

TRANSCRIPT

Page 1: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 1 / 52

Table partitioningwith Django

Max Tepkeev17 October 2013Cologne, Germany

Page 2: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 2 / 52

Partitioning Theory

Django PackagesRealization

Page 3: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 3 / 52

Partitioning Theory

Page 4: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 4 / 52

Definition

Table partitioning - division of one table into several tables, called partitions, which still represent original table.

Page 5: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 5 / 52

Why

• Performance• Manageability• Availability

Page 6: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 6 / 52

When

• Tables greater than 2GB• Tables with historical data• Table need to be distributed across

different types of storage devices• Queries ALWAYS contain a filter on the

partition field

Page 7: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 7 / 52

Methods

• Horizontal partitioning• Vertical partitioning

Page 8: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 8 / 52

Strategies

• Range partitioning• List partitioning• Hash partitioning• Composite partitioning

Page 9: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 9 / 52

Strategies

Page 10: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 10 / 52

Strategies

Page 11: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 11 / 52

Example

id user_id entry added

1 345 Login 2013-08-22 17:24:43

2 345 Went to Store section 2013-08-22 17:25:01

3 345 Ordered a book 2013-08-22 17:33:28

4 345 Payed for a book 2013-08-22 17:35:54

5 345 Logout 2013-08-22 17:38:32

Page 12: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 12 / 52

Example

INSERT INTO user_actions (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08')

Goes to user_actions_y2013m08

INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42')

Goes to user_actions_y2013m09

Page 13: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 13 / 52

Example

SELECT * FROM user_actions

id user_id entry added

1 237 Login 2013-08-21 11:54:08

2 198 Logout 2013-09-01 08:43:42

Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.

Page 14: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 14 / 52

Realization

Page 15: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 15 / 52

RDBMS

• PostgreSQL• MySQL

Page 16: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 16 / 52

PostgreSQL

Methods:• Horizontal partitioning

Strategies:• Range partitioning• List partitioning

Page 17: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 17 / 52

PostgreSQL

Implementation:• Inheritance

Available:• >= 8.1

Page 18: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 18 / 52

PostgreSQL

Steps:• Master table• Child tables• Correct partition insertion function• Trigger that calls partition insertion function• Function to delete duplicate rows from master• Trigger that calls delete duplicate rows function

Page 19: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 19 / 52

PostgreSQL

CREATE TABLE logs ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id))

Master table:

Page 20: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 20 / 52

PostgreSQL

CREATE TABLE logs_y2013m08 ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp )) INHERITS (logs);

Child table:

Page 21: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 21 / 52

PostgreSQL

CREATE FUNCTION "logs_insert_child"() RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$LANGUAGE plpgsql;

Correct partition insertion function:

Page 22: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 22 / 52

PostgreSQL

CREATE TRIGGER "before_insert_logs_trigger"BEFORE INSERT ON "logs"FOR EACH ROW EXECUTE PROCEDURE

"logs_insert_child"();

Trigger that calls partition insertion function:

Page 23: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 23 / 52

PostgreSQL

CREATE FUNCTION "logs_delete_master"() RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$LANGUAGE plpgsql;

Function to delete duplicate rows from master:

Page 24: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 24 / 52

PostgreSQL

CREATE TRIGGER "after_insert_logs_trigger" AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"();

Trigger that calls delete duplicate rows function:

Page 25: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 25 / 52

Code for automatic new partition creation

PostgreSQL

DECLARE start_date TIMESTAMP;start_date := date_trunc('month', NEW.added);

IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');';END IF;

Page 26: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 26 / 52

MySQL

Methods:• Horizontal partitioning

Strategies:• Range partitioning• List partitioning• Hash partitioning• Composite partitioning

Page 27: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 27 / 52

MySQL

Implementation:• Native (PARTITION BY)

Available:• >= 5.1

Page 28: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 28 / 52

How that works

MySQL

CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL)PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE);

Page 29: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 29 / 52

MySQL

Limitations

• From lowest to highest (range)• Foreign Key• No real-time partition creation

Page 30: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 30 / 52

Django Packages

Page 31: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 31 / 52

Packages

• django-parting• django-db-parti

Page 32: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 32 / 52

django-parting

RDBMS:• PostgreSQL

Page 33: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 33 / 52

django-parting

Features:• Partition tables with Foreign Keys

Page 34: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 34 / 52

django-parting

From pypi:

$ pip install django-parting

or clone from github:

$ git clone git://github.com/danfairs/django-parting.git

Page 35: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 35 / 52

django-parting

Add parting to PYTHONPATH and installed applications:

INSTALLED_APPS = ( ... 'parting' )

Page 36: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 36 / 52

django-partingfrom django.db import modelsfrom django.utils import timezone

class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now())

class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()

Page 37: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 37 / 52

django-partingfrom django.utils import timezonefrom parting import PartitionManagerfrom dateutil.relativedelta import relativedelta

def _key_for_date(dt): return dt.strftime('%Y%m')

class TweetPartitionManager(PartitionManager):

def current_partition(self): return _key_for_date(timezone.now())

def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)

Page 38: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 38 / 52

django-partingclass Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now())

partitions = TweetPartitionManager()

class Meta: abstract = True

class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField()

partitions = TweetPartitionManager()

class Meta: abstract = True

Page 39: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 39 / 52

django-partingimport jsonfrom django.utils.timezone import make_aware, utc

tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy'}

partition_key = _key_for_dt(tweet_data['created_at'])partition = Tweet.partitions.get_partition(partition_key)tweet = partition(**tweet_data)tweet.save()

Page 40: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 40 / 52

django-partingCREATE TABLE "testapp_tweet_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "json" text NOT NULL, "created" datetime NOT NULL);

CREATE TABLE "testapp_star_2013_03" ( "id" integer NOT NULL PRIMARY KEY, "tweet_id" integer NOT NULL REFERENCES "testapp_tweet_2013_03" ("id"), "user" text NOT NULL);

Page 41: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 41 / 52

django-parting

Problems:• Not database-level partitioning• No django admin support• No active development

Page 42: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 42 / 52

django-db-parti

RDBMS:• MySQL• PostgreSQL

Page 43: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 43 / 52

django-db-parti

Features:• Real database-level partitioning• Automatic new partition creation in real-time• Django admin support

Page 44: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 44 / 52

django-db-parti

From pypi:

$ pip install django-db-parti

or clone from github:

$ git clone git://github.com/maxtepkeev/django-db-parti.git

Page 45: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 45 / 52

django-db-parti

Add dbparti to PYTHONPATH and installed applications:

INSTALLED_APPS = ( ... 'dbparti' )

Page 46: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 46 / 52

django-db-parti

In models.py add import statement:

from dbparti.models import Partitionable

Make your model to inherit from Partitionable:

class YourModelName(Partitionable):

Page 47: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 47 / 52

django-db-parti

Add a Meta class to your model with a few settings:

class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added'

Lastly initialize some database stuff with the command:

$ python manage.py partition app_name

Page 48: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 48 / 52

django-db-parti

Possible model settings

partition_type:• range

partition_subtype:• date

partition_range:• day• week• month• year

Page 49: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 49 / 52

django-db-parti

Customize how data will be displayed in the Django admin

In admin.py add import statement:

from dbparti.admin import PartitionableAdmin

Make your admin to inherit from PartitionableAdmin:

class YourModelAdminName(PartitionableAdmin): partition_show = 'all'

Page 50: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 50 / 52

django-db-parti

Possible model admin settings

partition_show:• all (default)• current• previous

Page 51: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 51 / 52

django-db-parti

Problems:• Only range partitioning (datetime)• Database backend limitations

Page 52: PyCon DE 2013 - Table Partitioning with Django

PyCon.DE 2013 52 / 52

Question time

https://www.github.com/maxtepkeev/django-db-parti

email: [email protected]: max.tepkeev