advanced redis data structures

Advanced Redis data structures by Amir Salihefendic

Upload: amix3k

Post on 15-Jul-2015


Advanced Redis data structures

by Amir Salihefendic

About me

Founder

Millions of data items

Co-founder, former CTO

Billions of data items

Redis: Greatness

Everything is in memory, data is persistent

Amazing Performance

The Hacker’s database

Redis: Greatness

Great lead dev

Amazing progress (Sentinel, Cluster, …)

Redis Rich Datatypes

• Relational databases: schemas, tables, columns, rows, indexes, etc.
• Column databases (BigTable, HBase, etc.): schemas, columns, column families, rows, etc.
• Redis: key-value, sets, lists, hashes, bitmaps, etc.

Redis datatypes resemble datatypes in programming languages.

They are natural to us!

redis_wrap

A wrapper for Redis datatypes, so they mimic the datatypes found in Python

https://github.com/Doist/redis_wrap

redis_wrap: usage

from redis_wrap import get_list, get_hash, get_set

# Mimic of Python lists
bears = get_list('bears')
bears.append('grizzly')

assert len(bears) == 1
assert 'grizzly' in bears

# Mimic of hashes
villains = get_hash('villains')
assert 'riddler' not in villains

villains['riddler'] = 'Edward Nigma'
assert 'riddler' in villains
assert len(villains.keys()) == 1

del villains['riddler']
assert len(villains) == 0

# Mimic of Python sets
fishes = get_set('fishes')
assert 'nemo' not in fishes

fishes.add('nemo')
assert 'nemo' in fishes

for item in fishes:
    assert item == 'nemo'
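One way such a wrapper could work is to map Python's container protocol onto Redis list commands. The sketch below is not redis_wrap's actual code: `RedisListSketch` and `FakeRedis` are hypothetical names, and `FakeRedis` is an in-memory stand-in that borrows only the `rpush`, `llen` and `lrange` method names from redis-py so the example runs without a server.

```python
class FakeRedis:
    """In-memory stand-in (assumption) exposing three redis-py list methods."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        self._lists.setdefault(name, []).extend(values)
        return len(self._lists[name])

    def llen(self, name):
        return len(self._lists.get(name, []))

    def lrange(self, name, start, end):
        items = self._lists.get(name, [])
        # Redis treats `end` as inclusive and -1 as "last element".
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


class RedisListSketch:
    """Hypothetical list wrapper: each Python operation becomes one command."""

    def __init__(self, name, client):
        self.name = name
        self.client = client

    def append(self, value):
        self.client.rpush(self.name, value)                 # RPUSH key value

    def __len__(self):
        return self.client.llen(self.name)                  # LLEN key

    def __iter__(self):
        return iter(self.client.lrange(self.name, 0, -1))   # LRANGE key 0 -1

    def __contains__(self, value):
        return value in self.client.lrange(self.name, 0, -1)


bears = RedisListSketch('bears', FakeRedis())
bears.append('grizzly')
```

With a real client in place of `FakeRedis`, the same wrapper object would keep no state of its own; every call would go to the server.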

redis_graph

A simple graph database in Python

https://github.com/Doist/redis_graph

redis_graph: Usage

from redis_graph import *

# Adding an edge between nodes
add_edge(from_node='frodo', to_node='gandalf')
assert has_edge(from_node='frodo', to_node='gandalf') == True

# Getting neighbors of a node
assert list(neighbors('frodo')) == ['gandalf']

# Deleting edges
delete_edge(from_node='frodo', to_node='gandalf')

# Setting node values
set_node_value('frodo', '1')
assert get_node_value('frodo') == '1'

# Setting edge values
set_edge_value('frodo_baggins', '2')
assert get_edge_value('frodo_baggins') == '2'

redis_graph: Implementation

from redis_wrap import *

#--- Edges ----------------------------------------------
def add_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    edges.add(to_node)

def delete_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)

    key_node_y = to_node
    if key_node_y in edges:
        edges.remove(key_node_y)

def has_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    return to_node in edges

def neighbors(node_x, system='default'):
    return get_set(node_x, system=system)

#--- Node values ----------------------------
def get_node_value(node_x, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).get(node_key)

def set_node_value(node_x, value, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).set(node_key, value)

#--- Edge values -----------------------------
def get_edge_value(edge_x, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).get(edge_key)

def set_edge_value(edge_x, value, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).set(edge_key, value)
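The implementation above stores one Redis set per node, holding the nodes it points to. A rough in-memory model of that layout, with a dict of Python sets standing in for Redis (an assumption made so the example runs without a server), shows that edges are directed and that a traversal needs one set lookup per visited node. `reachable` is a hypothetical helper and is not part of redis_graph.

```python
from collections import defaultdict, deque

# node -> set of successor nodes; stands in for one Redis set per node
edges = defaultdict(set)


def add_edge(from_node, to_node):
    edges[from_node].add(to_node)


def has_edge(from_node, to_node):
    return to_node in edges.get(from_node, set())


def reachable(start):
    """Breadth-first walk over the adjacency sets (hypothetical helper)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


add_edge('frodo', 'gandalf')
add_edge('gandalf', 'aragorn')
```

Because only the source node's set is updated, `has_edge('gandalf', 'frodo')` is false here; an undirected graph would need both directions added.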

redis_simple_queue

A simple queue in Python using Redis

https://github.com/Doist/redis_simple_queue

redis_queue: usage

from redis_simple_queue import *

delete_jobs('tasks')

put_job('tasks', '42')

assert 'tasks' in get_all_queues()
assert queue_stats('tasks')['queue_size'] == 1

assert reserve_job('tasks') == '42'
assert queue_stats('tasks')['queue_size'] == 0

redis_queue: Implementation

from redis_wrap import *

def put(queue, job_data, system='default'):
    get_list(queue, system=system).append(job_data)

def reserve(queue, system='default'):
    return get_list(queue, system=system).pop()

def delete_jobs(queue, system='default'):
    get_redis(system).delete(queue)

def get_all_queues(system='default'):
    # keys() already returns a list of key names in current redis-py
    return get_redis(system).keys('*')

def queue_stats(queue, system='default'):
    return {
        'queue_size': len(get_list(queue, system=system))
    }

Cohort/Retention Tracking

How bitmapist was born

bitmapist: The idea

MixPanel looks great!

bitmapist: Problem with MixPanel

MixPanel would cost $2,000+ USD per month

bitmapist + bitmapist.cohort

• Implements an advanced analytics library on top of Redis bitmaps
• https://github.com/Doist/bitmapist
• Hundreds of millions of events for Todoist
• O(1) execution

bitmapist: Features

• Has user 123 been online today? This week?
• Has user 123 performed action "X"?
• How many users have been active this month?
• How many unique users have performed action "X" this week?
• What percentage of users who were active last week are still active?
• What percentage of users who were active last month are still active this month?

• O(1)! Using very small amounts of memory.
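The questions above reduce to bit tests and population counts once each event and time period has its own bitmap indexed by user id. The sketch below models that idea with plain Python integers instead of Redis; the key layout and the helper names `mark`, `has` and `count` are assumptions for illustration, not bitmapist's API.

```python
# key -> integer used as a bitmap; bit n is set when user n triggered the event
bitmaps = {}


def mark(event, period, user_id):
    key = f'{event}:{period}'          # hypothetical key layout
    bitmaps[key] = bitmaps.get(key, 0) | (1 << user_id)


def has(event, period, user_id):
    return bool(bitmaps.get(f'{event}:{period}', 0) >> user_id & 1)


def count(bitmap):
    return bin(bitmap).count('1')      # population count, like BITCOUNT


for user in (1, 2, 3, 123):
    mark('active', '2015-06', user)
for user in (2, 123, 500):
    mark('active', '2015-07', user)

last_month = bitmaps['active:2015-06']
this_month = bitmaps['active:2015-07']
retained = last_month & this_month     # like BITOP AND
retention_pct = 100.0 * count(retained) / count(last_month)
```

In this toy data two of the four users active in June are also active in July, so `retention_pct` is 50.0. Each user costs one bit per event and period, which is where the small memory footprint comes from.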

bitmapist: Bitmaps?

• SETBIT, GETBIT, BITCOUNT, BITOP

• SETBIT somekey 8 1

• GETBIT somekey 8

• BITOP AND destkey somekey1 somekey2

• http://en.wikipedia.org/wiki/Bit_array
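To make the four commands concrete, here is a small pure-Python model of them on a `bytearray`. It follows the Redis convention that offset 0 is the most significant bit of the first byte, but it is only an illustration: the function names are hypothetical and nothing here talks to a Redis server.

```python
def setbit(buf, offset, value):
    """Set bit `offset` in bytearray `buf`, growing it as needed; return old bit."""
    byte, shift = offset // 8, 7 - offset % 8
    if byte >= len(buf):
        buf.extend(b'\x00' * (byte + 1 - len(buf)))
    old = buf[byte] >> shift & 1
    if value:
        buf[byte] |= 1 << shift
    else:
        buf[byte] &= ~(1 << shift) & 0xFF
    return old


def getbit(buf, offset):
    """Bits past the end of the buffer read as 0."""
    byte, shift = offset // 8, 7 - offset % 8
    return buf[byte] >> shift & 1 if byte < len(buf) else 0


def bitcount(buf):
    return sum(bin(b).count('1') for b in buf)


def bitop_and(a, b):
    """AND two bitmaps; the shorter one is treated as zero-padded."""
    size = max(len(a), len(b))
    a, b = a.ljust(size, b'\x00'), b.ljust(size, b'\x00')
    return bytearray(x & y for x, y in zip(a, b))


somekey1, somekey2 = bytearray(), bytearray()
setbit(somekey1, 8, 1)
setbit(somekey1, 3, 1)
setbit(somekey2, 8, 1)
destkey = bitop_and(somekey1, somekey2)
```

Setting bit 8 grows the value to two bytes, and only bit 8 survives the AND because it is the only bit set in both inputs.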

bitmapist: Usage

from datetime import datetime, timedelta
from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

now = datetime.utcnow()
last_month = datetime.utcnow() - timedelta(days=30)

# Mark user 123 as active and has played a song
mark_event('active', 123)
mark_event('song:played', 123)

# Answer if user 123 has been active this month
assert 123 in MonthEvents('active', now.year, now.month)
assert 123 in MonthEvents('song:played', now.year, now.month)

# How many users have been active this week?
print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

# Perform bit operations. How many users that
# have been active last month are still active this month?
active_2_months = BitOpAnd(
    MonthEvents('active', last_month.year, last_month.month),
    MonthEvents('active', now.year, now.month)
)
print(len(active_2_months))

bitmapist.cohort: Visualization

Read more http://amix.dk/blog/post/19718

fixedlist

How fixedlist was born

fixedlist: Problem

Timelines: Exponential data growth

fixedlist: The Easy Solution

Throw money at the problem

fixedlist: Cheating!

• Fixed timeline size
• O(1) insertion
• O(1) update
• O(1) get
• Cacheable

Solution that Facebook and Twitter use
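The "cheat" is to cap every timeline at a fixed length so that storage per user stops growing. A minimal in-memory sketch of that idea uses `collections.deque` with `maxlen`; fixedlist itself keeps its lists in Redis and its internals may differ. `Timeline` and `MAX_ITEMS` are hypothetical names.

```python
from collections import deque

MAX_ITEMS = 3  # hypothetical cap; a real timeline would keep far more


class Timeline:
    """Keeps only the newest MAX_ITEMS entries; older ones fall off the end."""

    def __init__(self):
        self.items = deque(maxlen=MAX_ITEMS)

    def add(self, value):
        self.items.appendleft(value)   # O(1); evicts the oldest entry when full

    def get(self):
        return list(self.items)        # newest first


timeline = Timeline()
for post_id in range(1, 6):
    timeline.add(post_id)
```

After five inserts the timeline still holds three items, newest first, and posts 1 and 2 have been dropped.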

fixedlist

2.5x faster than pure Redis solution

1.4x less memory than pure Redis solution

https://github.com/Doist/fixedlist

fixedlist: Usage

import fixedlist

# Add a value to a list
fixedlist.add('hello', 'world')

# Add multiple values to multiple keys at once
fixedlist.add(['hello1', 'hello2'], ['world1', 'world2'])

# Get values from a list
assert fixedlist.get('hello') == ['world', 'world1', 'world2']

# Remove a value
fixedlist.remove('hello', 'world1')

Saved Plurk tens of thousands of $

Redis+Lua+Python

When you want:

More complex data types

Better performance

Redis+Python: Incr implementation

def incr_python(key, delta=1, system='default'):
    client, scripts = get_redis(system)

    with client.pipeline() as p:
        p.watch(key)
        value = delta
        old = p.get(key)
        if old:
            value = int(old) + delta
        p.set(key, value)
        p.unwatch()
        return value

Redis+Lua: Incr implementation

scripts = {
    'incr': client.register_script(_load_lua_script('incr.lua'))
}
...

def incr_lua(key, delta=1, system='default'):
    client, scripts = get_redis(system)
    return scripts['incr'](keys=['key', 'delta'], args=[key, delta])

-- incr.lua
local delta = tonumber(ARGV[2])
local value = delta
local old = tonumber(redis.call('get', ARGV[1]))
if old then
    value = value + old
end
if not redis.call('set', ARGV[1], value) then
    return nil
end
return value

Performance: Lua 3x faster

Python

time python test_incr_python.py 300000
python test_incr_python.py 300000  37.77s user 12.00s system 73% cpu 1:07.73 total

Lua

time python test_incr_lua.py 300000
python test_incr_lua.py 300000  10.76s user 2.85s system 66% cpu 20.513 total
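The "3x" figure can be checked from the wall-clock totals reported by `time`. The short calculation below uses only the numbers printed for those two runs and assumes that the argument 300000 is the number of increments each script performs.

```python
OPERATIONS = 300000              # assumed meaning of the script argument

python_total_s = 1 * 60 + 7.73   # "1:07.73 total"
lua_total_s = 20.513             # "20.513 total"

speedup = python_total_s / lua_total_s
python_ops_per_s = OPERATIONS / python_total_s
lua_ops_per_s = OPERATIONS / lua_total_s

print(f'speedup: {speedup:.2f}x')
print(f'Python: {python_ops_per_s:.0f} incr/s, Lua: {lua_ops_per_s:.0f} incr/s')
```

The ratio comes out at roughly 3.3, consistent with the headline claim.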

https://github.com/amix/demo-redis-python-lua

fixedlist in Lua

Proof of concept

Tokyo Tyrant example

https://gist.github.com/amix/f15508ac6a8b534c3290

Q & A

More questions:

[email protected]

@amix3k