caching techniques in python, europython2010

Post on 12-May-2015

DESCRIPTION

Slides from the EuroPython 2010 conference in Birmingham on the subject of caching in Python.

TRANSCRIPT

Caching techniques in Python

Michael Domanski
EuroPython 2010

Thursday, 22 July 2010

who I am

• python developer, professionally for a few years now

• also experienced in C and Objective-C

• currently working for 10clouds.com

Interesting intro

• a bit of theory

• common patterns

• common problems

• common solutions

How I think about cache

• imagine a giant dict storing all your data

• you have to manage all data manually

• or provide some automated behaviour

similar to....

• manual memory management in C

• cache is memory

• and you have to control it manually

profits

• improved performance

• ...?

problems

• managing any type of memory is hard

• automation often has to be custom-built each time

common patterns

memoization

• very old pattern (circa 1968)

• we owe the name to Donald Michie

how it works

• we associate input with output and store it somewhere

• based on the assumption that for a given input, the output is always the same
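
The idea above can be sketched as a decorator that keys the stored output on the call arguments. This is a minimal illustration rather than code from the slides: the names `memoize`, `slow_square` and `calls` are made up for the example, and it assumes every argument is hashable.

```python
import functools

def memoize(func):
    """Cache results of func, keyed by its (hashable) arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in results:
            results[key] = func(*args, **kwargs)
        return results[key]

    return wrapper

calls = []  # records every real computation, for illustration only

@memoize
def slow_square(x):
    calls.append(x)
    return x * x

slow_square(4)
slow_square(4)  # served from the cache; the function body does not run again
```

After both calls, `calls` holds a single entry, which is the whole point of the pattern: the same input never triggers the computation twice.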

code example

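CACHE_DICT = {}

def cached(key):
    """Cache the decorated function's result under a fixed key."""
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            # only call the function when nothing is stored under this key
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper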

what if output can change?

• our pattern is still useful

• we simply need to add something

cache invalidation

There are only two hard problems in Computer Science: cache invalidation and naming things

Phil Karlton

• basically, we update data in cache

• we need to know when and what to change

• the more granular you want to be, the harder it gets
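
A minimal sketch of the "when and what" question, assuming a plain dict stands in for the database: the write path drops exactly the one cache entry it affects. The names `DB`, `CACHE`, `read` and `write` are hypothetical and do not come from the slides.

```python
DB = {1: "old", 2: "other"}  # hypothetical stand-in for a database
CACHE = {}

def read(record_id):
    """Return a record, consulting the cache first."""
    if record_id not in CACHE:
        CACHE[record_id] = DB[record_id]
    return CACHE[record_id]

def write(record_id, value):
    """Update the record and drop only the cache entry it affects."""
    DB[record_id] = value
    CACHE.pop(record_id, None)  # granular: other entries stay cached

read(1)
read(2)
write(1, "new")  # record 2 stays cached, record 1 is re-read on next access
```

Being this granular is only easy here because each cache entry maps to exactly one record; the problems described later appear once entries are derived from several records.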

code example

def invalidate(key):
    """Remove a key from the cache, reporting keys that were never cached."""
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)

common problems

invalidating too much/not enough

• flushing all data any time something changes

• not flushing cache at all

• tragic effects

# db_get / db_set are database access helpers that are not shown on the slides

@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()

    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"

invalidating too soon/too late

• your cache has to be synchronised with your db

• sometimes very hard to spot

• leads to tragic mistakes
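
The ordering problem can be shown with a small sketch, assuming a dict plays the role of the database. Invalidating before the write leaves a window in which a read re-caches the old value; writing first and invalidating afterwards closes it. All names here (`DB`, `CACHE`, `cached_get`, `update_too_soon`, `update_in_order`) are hypothetical, and the in-between read is simulated inline rather than coming from another thread.

```python
DB = {1: "old"}  # hypothetical stand-in for a database
CACHE = {}

def cached_get(record_id):
    if record_id not in CACHE:
        CACHE[record_id] = DB[record_id]
    return CACHE[record_id]

def update_too_soon(record_id, value):
    CACHE.pop(record_id, None)  # invalidated before the write...
    cached_get(record_id)       # ...so a read in between re-caches the old value
    DB[record_id] = value

def update_in_order(record_id, value):
    DB[record_id] = value       # write first
    CACHE.pop(record_id, None)  # then invalidate

update_too_soon(1, "new")
stale = cached_get(1)           # the cache still holds the pre-write value
update_in_order(1, "newer")
fresh = cached_get(1)           # the cache is refilled from the updated record
```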

@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()

superposition of dependencies

• a somewhat less obvious problem

• eventually you will start caching the results of computations

• you have to know very precisely what your data depends on
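
One way to keep track of what a cached value depends on is to record the dependencies explicitly and invalidate them in cascade. The following is only a sketch under stated assumptions: `DEPENDENTS`, `depends_on` and `invalidate_cascade` are hypothetical names, the cache is a plain dict, and the dependency graph is assumed to contain no cycles.

```python
from collections import defaultdict

CACHE = {}
DEPENDENTS = defaultdict(set)  # key -> keys whose values were computed from it

def depends_on(key, *sources):
    """Declare that `key` is derived from each key in `sources`."""
    for source in sources:
        DEPENDENTS[source].add(key)

def invalidate_cascade(key):
    """Drop `key` and, recursively, everything derived from it."""
    CACHE.pop(key, None)
    for dependent in list(DEPENDENTS[key]):
        invalidate_cascade(dependent)

depends_on("report", "key1", "key2")
CACHE.update({"key1": "a", "key2": "b", "report": "a+b"})
invalidate_cascade("key1")  # also drops "report", which was built from key1
```

The bookkeeping moves the burden from remembering every derived key at each write to declaring the dependency once, where the derived value is defined.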

@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3)
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"

summing up

• know your data....

• be aware what and when you cache

• take care when using cached data in computation

common solutions

process level cache

why?

• very fast access

• simple to implement

• very effective as long as you’re using a single process
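
For comparison, the standard library offers a ready-made process-level cache in `functools.lru_cache`. It joined the standard library in Python 3.2, so it postdates these slides and is not what the talk used; the sketch below only illustrates the same idea with today's tooling. The function `fib` is a made-up example.

```python
import functools

@functools.lru_cache(maxsize=128)
def fib(n):
    """Naive Fibonacci; the cache makes the recursion linear in n."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)
info = fib.cache_info()  # hit/miss counters kept by the decorator
fib.cache_clear()        # the coarse "flush everything" form of invalidation
```

Note that `cache_clear()` empties the whole cache for that function; per-key invalidation still has to be built by hand, as in the dict-based examples.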

clever tricks with dicts

code example

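CACHE_DICT = {}

def cached(key):
    """Cache the decorated function's result under a fixed key."""
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            # only call the function when nothing is stored under this key
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper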

invalidation

code example

def invalidate(key):
    """Remove a key from the cache, reporting keys that were never cached."""
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)

application level cache

memcache

• battle tested

• scales

• fast

• supports a few cool features

• behaves a lot like dict

• supports time-based expiration
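
Time-based expiration can be illustrated without a running memcached server. The sketch below is an in-process stand-in, not memcached's implementation: `ExpiringCache` is a hypothetical class, the clock is injectable so the example is deterministic, and a `ttl` of 0 is taken to mean "never expire", the same convention memcached uses for its expiration time.

```python
import time

class ExpiringCache:
    """Dict-like cache whose entries expire after a time-to-live in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=0):
        # ttl == 0 means "never expire" in this sketch
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict on read
            return None
        return value

now = [0.0]  # fake clock so the example does not have to sleep
cache = ExpiringCache(clock=lambda: now[0])
cache.set("greeting", "hello", ttl=10)
before = cache.get("greeting")  # still fresh
now[0] = 11.0
after = cache.get("greeting")   # past its time-to-live, treated as a miss
```

Expiration trades precision for simplicity: nothing has to remember to invalidate the key, but readers may see data that is up to `ttl` seconds old.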

libraries?

• python-memcache

• python-libmemcache

• python-cmemcache

• pylibmc

why no benchmarks

• not the point of this talk :)

• benchmarks are generic, caching is specific

• pick your flavour, think for yourself

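code example

import memcache

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    """Cache the decorated function's result in memcached under a fixed key."""
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # get() returns None on a miss; testing "is None" keeps falsy results (0, '') cached
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper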

invalidation

code example

def mem_invalidate(key):
    # overwrite the entry with None, which the memcached decorator treats as a miss
    cache.set(str(key), None)

batch key management

• what if I don’t want to expire each key manually?

• that’s a lot to remember

• and we have to be careful :(

groups?

• group keys into sets

• which are tied to one key per set

• expire one key, instead of twenty

how to get there?

• store some extra data

• you can store dicts in cache

• and cache behaves like dict

• so it’s a case of comparing keys and values

# we start with specified key and group
key = 'some_key'
group = 'some_group'

# now retrieve some data from memcached (get_multi takes a list of keys)
data = memcached_client.get_multi([key, group])
# now data is a dict that should look like
# {'some_key': {'group_key': '1234',
#               'value': 'some_value'},
#  'some_group': '1234'}
#
# (fragment: the return below belongs inside the lookup function)
if data and (key in data) and (group in data):
    if data[key]['group_key'] == data[group]:
        return data[key]['value']

# cache, tools.make_key and make_group_value come from the surrounding
# project and are not shown on the slides

def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key)] + [tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() so that a missing group key counts as a mismatch
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same transformed key that is used when writing
                value = cache.get(tools.make_key(key))
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:
                    # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({
                        tools.make_key(key): data_dict,
                        tools.make_key(group_key): group_value,
                    })
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
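
The group mechanism can be tried out without memcached. The sketch below is a simplified illustration, not the decorator above: a plain dict (`STORE`) stands in for the memcached client, and `new_group_value`, `set_grouped`, `get_grouped` and `invalidate_group` are hypothetical helpers. It shows the one step the slides leave implicit, namely that expiring a whole group only requires changing the value stored under the group key.

```python
import itertools

STORE = {}                       # in-memory stand-in for memcached
_versions = itertools.count(1)

def new_group_value():
    """Return a value that has never been used before."""
    return next(_versions)

def set_grouped(key, value, group):
    """Store value tagged with the group's current value."""
    if group not in STORE:
        STORE[group] = new_group_value()
    STORE[key] = {"value": value, "group_value": STORE[group]}

def get_grouped(key, group):
    """Return the value only if its tag still matches the group's value."""
    entry = STORE.get(key)
    if entry is not None and entry["group_value"] == STORE.get(group):
        return entry["value"]
    return None

def invalidate_group(group):
    """Expire every key in the group by changing a single value."""
    STORE[group] = new_group_value()

set_grouped("user:1", "alice", "users")
set_grouped("user:2", "bob", "users")
hit = get_grouped("user:1", "users")    # tags match, value is returned
invalidate_group("users")
miss = get_grouped("user:2", "users")   # tag is now outdated, treated as a miss
```

The stale entries are never deleted explicitly; they simply stop matching. In memcached they would eventually be evicted to make room for new data.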

questions?

follow me

twitter: mdomans

blog: blog.mdomans.com
