Advanced Python, Part 2
TRANSCRIPT
© 2014 Zaar Hai tech.zarmory.com
More topics in Advanced Python:
● Generators
● Async programming
Appetizer – Slots vs Dictionaries
(Almost) every Python object has a built-in __dict__ dictionary. This can be memory-wasteful when numerous objects carry only a small number of attributes.

class A(object):
    pass

class B(object):
    __slots__ = ["a", "b"]

>>> A().c = 1
>>> B().c = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'B' object has no attribute 'c'
Slots come to save memory (and CPU). But do they really?
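A minimal runnable sketch of the point above, in Python 3 syntax (the class names here are illustrative, not from the slides): a __slots__ class has no per-instance __dict__, and assigning an undeclared attribute raises AttributeError.

```python
class WithDict:
    pass                      # instances get a __dict__

class WithSlots:
    __slots__ = ["a", "b"]    # instances get fixed slots, no __dict__

d = WithDict()
d.c = 1                       # fine: stored in d.__dict__

s = WithSlots()
s.a = 1                       # fine: "a" is a declared slot
try:
    s.c = 1                   # not a slot -> AttributeError
except AttributeError as e:
    print(e)
```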
Slots vs Dictionaries - competitors
class A(object):
    # __slots__ = ["a", "b", "c"]
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True

l = []
for i in xrange(50000000):
    l.append(A())

import resource
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
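The slide's benchmark needs a 50-million-object run and the Unix-only resource module. As a hedged, cross-platform sketch of the same comparison, tracemalloc can measure a much smaller run; absolute numbers will differ from the charts, but the slots-vs-dict gap should still show.

```python
# Compare peak traced memory for dict-backed vs slots-backed instances.
# Numbers vary by interpreter and version; the object count is kept small.
import tracemalloc

class DictA:
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True

class SlotsA:
    __slots__ = ["a", "b", "c"]
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True

def measure(cls, n=100_000):
    tracemalloc.start()
    objs = [cls() for _ in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return peak

dict_peak = measure(DictA)
slots_peak = measure(SlotsA)
print(f"dict:  {dict_peak / 1e6:.1f} MB")
print(f"slots: {slots_peak / 1e6:.1f} MB")
```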
Slots vs Dictionaries – memory
[Chart: memory in megabytes (up to ~400 MB) vs. number of objects (1000 to 10000000) for Py 2.7 slots, Py 3.4 slots, PyPy slots, Py 2.7 dict, Py 3.4 dict, PyPy dict]
Slots vs Dictionaries – MEMORY
[Chart: memory in megabytes (up to ~20000 MB) vs. number of objects (1000 to 500000000) for Py 2.7 slots, Py 3.4 slots, PyPy slots, Py 2.7 dict, Py 3.4 dict, PyPy dict]
Slots vs Dictionaries – cpu
[Chart: time in seconds (up to ~1.6 s) vs. number of objects (1000 to 10000000) for Py 2.7 slots, Py 3.4 slots, PyPy slots, Py 2.7 dict, Py 3.4 dict, PyPy dict]
Slots vs Dictionaries – CPU
[Chart: time in seconds (up to ~70 s) vs. number of objects (1000 to 500000000) for Py 2.7 slots, Py 3.4 slots, PyPy slots, Py 2.7 dict, Py 3.4 dict, PyPy dict]
Slots vs Dictionaries - conclusions
Slots vs dicts – and the winner is... PyPy.
Seriously – forget the slots and just move to PyPy if performance becomes an issue. As a bonus you get performance improvements in other areas as well.
Most important – run your own micro-benchmarks before jumping into new stuff.
The magic yield statement
A function becomes a generator if it contains a yield statement:

def gen():
    yield 1
    yield 2

When invoked, "nothing" happens – i.e. the function code does not run yet:

>>> g = gen()
>>> g
<generator object gen at 0x7f423b1b3f00>

The next() method runs the function until the next yield statement and returns the yielded value:

>>> g.next()
1
>>> g.next()
2
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Generator exceptions
StopIteration is raised when the generator is exhausted; the for statement catches StopIteration automagically:

>>> for i in gen():
...     print i
...
1
2

If the generator function raises an exception, the generator stops:

def gen2():
    yield 1
    raise ValueError
    yield 2

>>> g = gen2()
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in gen2
ValueError
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
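The same behavior, runnable in Python 3 (where g.next() became the built-in next(g)):

```python
def gen2():
    yield 1
    raise ValueError("boom")
    yield 2          # never reached

g = gen2()
assert next(g) == 1
try:
    next(g)          # the generator raises ValueError here...
except ValueError:
    pass
try:
    next(g)          # ...and is finished afterwards: StopIteration
except StopIteration:
    print("exhausted")
```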
Stopping generator prematurely
def producer():
    conn = db.connection()
    for row in conn.execute("SELECT * FROM t LIMIT 1000"):
        yield row
    conn.close()

def consumer():
    rows = producer()
    print "First row %s" % rows.next()

In the example above the connection will never be closed. Fix:

def producer():
    conn = db.connection()
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()

def consumer():
    rows = producer()
    print "First row %s" % rows.next()
    rows.close()  # Will raise GeneratorExit in producer code
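A hedged, self-contained sketch of the fix (Python 3), with the database replaced by a plain list so the cleanup is observable without a real connection:

```python
events = []

def producer():
    events.append("open")          # stands in for db.connection()
    try:
        for row in [1, 2, 3]:
            yield row
    finally:
        events.append("close")     # runs on close() via GeneratorExit

rows = producer()
first = next(rows)
rows.close()                       # raises GeneratorExit inside producer
print(first, events)               # 1 ['open', 'close']
```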
Syntactic sugar
Most of us use generators without even knowing about them:

>>> [i for i in [1, 2, 3]]
[1, 2, 3]

The parenthesized form of the same expression is a generator:

>>> ( i for i in [1, 2, 3] )
<generator object <genexpr> at 0x7f423b1b3f00>

list's constructor accepts any iterable – including a generator – and iterates through it to build the list.

More goodies:

>>> [i for i in range(6, 100) if i % 6 == i % 7]
[42, 43, 44, 45, 46, 47, 84, 85, 86, 87, 88, 89]
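The payoff of the parenthesized form is laziness – nothing runs until iteration starts, so no intermediate list is built:

```python
# A generator expression over a million numbers allocates no list up front.
squares = (i * i for i in range(10**6))

# Only the values we actually pull are ever computed.
first_three = [next(squares) for _ in range(3)]
print(first_three)   # [0, 1, 4]
```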
Generators produce stuff on demand
Writing a Fibonacci series generator is a piece of cake:

def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

● No recursion
● O(1) memory
● Generates as much as you want to consume
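The "as much as you want to consume" part composes nicely with itertools.islice, which slices a lazy stream without exhausting it:

```python
from itertools import islice

def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:              # infinite, but only runs on demand
        a, b = b, a + b
        yield b

# Pull exactly ten values from the infinite stream.
print(list(islice(fibogen(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```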
Returning value from a generator
Until Python 3.3, only None can be returned from a generator. Since 3.3 you can:

def gen():
    yield 1
    yield 2
    return 3

>>> g = gen()
>>> next(g)
1
>>> next(g)
2
>>> try:
...     next(g)
... except StopIteration as e:
...     print(e.value)
...
3

In earlier versions:

class Return(Exception):
    def __init__(self, value):
        self.value = value

Then raise it from the generator and catch it outside.
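A runnable sketch of that pre-3.3 workaround – a custom exception that smuggles a return value out of the generator:

```python
class Return(Exception):
    def __init__(self, value):
        self.value = value

def gen():
    yield 1
    yield 2
    raise Return(3)          # the "return value"

values = []
try:
    for v in gen():
        values.append(v)
except Return as e:          # propagates out of the for loop
    result = e.value
print(values, result)        # [1, 2] 3
```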
Consumer generator
You can send stuff back to a generator:

def db_stream():
    conn = db.connection()
    try:
        while True:
            try:
                row = yield
                conn.execute("INSERT INTO t VALUES(%s)", row)
            except ConnCommit:
                conn.commit()
            except ConnRollBack:
                conn.rollback()
            except GeneratorExit:
                conn.commit()
                return
    finally:
        conn.close()

>>> g = db_stream()
>>> g.next()  # prime the generator: send() only works after the first yield
>>> g.send([1])
>>> g.throw(ConnCommit)
>>> g.close()
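A self-contained consumer generator in the spirit of db_stream (Python 3), with the database replaced by plain lists so send()/throw()/close() are observable. Commit and sink are names invented here, not from the slides:

```python
class Commit(Exception):
    pass

def sink(committed):
    """Receives rows via send(); commits staged rows on throw(Commit) or close()."""
    staged = []
    try:
        while True:
            try:
                row = yield
                staged.append(row)
            except Commit:             # raised at the yield by g.throw()
                committed.extend(staged)
                staged = []
    except GeneratorExit:              # raised at the yield by g.close()
        committed.extend(staged)       # implicit commit on close

committed = []
g = sink(committed)
next(g)            # prime: advance to the first yield before send()
g.send("row-1")
g.send("row-2")
g.throw(Commit)    # commits the two staged rows
g.send("row-3")
g.close()          # commits the remaining row
print(committed)   # ['row-1', 'row-2', 'row-3']
```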
Async in a nutshell
Technion CS "Introduction to Operating Systems", HW 2. Setup:

import socket, select, time
from collections import defaultdict, deque

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("", 1234))
sock.listen(20000)
sock.setblocking(0)

rqueue = set([sock])
wqueue = set()
pending = defaultdict(deque)
Async in a nutshell – event loop
Technion CS "Introduction to Operating Systems", HW 2:

while True:
    rq, wq, _ = select.select(rqueue, wqueue, [])
    for s in rq:
        if s == sock:
            new_sock, _ = sock.accept()
            new_sock.setblocking(0)
            rqueue.add(new_sock)
            continue
        data = s.recv(1024)
        if not data:
            s.close()
            rqueue.remove(s)
        else:
            pending[s].append(data)
            wqueue.add(s)
    for s in wq:
        if not pending[s]:
            wqueue.remove(s)
            continue
        data = pending[s].popleft()
        sent = s.send(data)
        if sent != len(data):
            data = data[sent:]
            pending[s].appendleft(data)
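The core mechanism in that loop – select() reporting readiness instead of a thread blocking per connection – can be seen in isolation with socketpair(), which yields two already-connected sockets (Python 3; socketpair exists on Windows only since 3.5):

```python
import select
import socket

a, b = socket.socketpair()

b.send(b"ping")                                  # data travels b -> a
readable, _, _ = select.select([a], [], [], 1.0)
got = a.recv(1024) if a in readable else None    # a is now readable
print(got)                                       # b'ping'

empty, _, _ = select.select([a], [], [], 0)      # nothing left to read
print(empty)                                     # []
a.close()
b.close()
```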
Why bother with async?
Less memory resources:
Stack memory is allocated for each spawned thread – 2 MB on x86 Linux. For a server to handle 10k connections with a thread each, 20 GB of memory is required just for starters!

Less CPU resources:
Context-switching 10k threads is expensive. Async moves the switching logic from OS / interpreter level to application level – which is always more efficient.
C10k problem
The art of managing a large number of connections.

Why is that a problem? Long polling / websockets: with modern live web applications, each client / browser holds an open connection to the server.

Gmail has 425 million active users, i.e. Gmail's servers have to handle ~400 million active connections at any given time.
Concurrency vs Parallelism
Concurrency:
● Dealing with several tasks simultaneously
● But executing only one task at a time
● All Intel processors up to the Pentium were concurrent

Parallelism:
● Dealing with several tasks simultaneously
● Executing several tasks at any given time
● All Intel processors since the Pentium can execute more than one instruction per clock cycle

(C)Python is always concurrent – either with threads or with the async approach.
Thread abuse
Naive approach – spawn a thread for every tiny task:
● Resource waste
● Burden on the OS / interpreter

Good single-threaded code can saturate a single core, so usually you don't need more than one thread / process per CPU.

In the web world your application needs to scale beyond a single machine, i.e. you'll have to run multiple isolated processes anyway.
Explicit vs Implicit context switching
Implicit context switching:
● OS / interpreter decides when to switch
● The coder needs to assume control can be taken away at any time
● Synchronization required – mutexes, etc.

Explicit context switching:
● The coder decides when to give up execution control
● No synchronization primitives required!
Explicit vs Implicit context switching
Threads (implicit switching):

def transfer(acc_f, acc_t, sum):
    acc_f.lock()
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        acc_f.commit_balance()
        acc_t.commit_balance()
    acc_f.release()

Explicit async:

def transfer(acc_f, acc_t, sum):
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        yield acc_f.commit_balance()
        yield acc_t.commit_balance()
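The explicit version can be driven end-to-end by a trivial round-robin scheduler. This is a hedged sketch: Account, commit_balance and run are stand-ins invented here, and yield marks the only points where other tasks may run – which is why no locks are needed:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
    def commit_balance(self):
        pass   # would persist the balance; a no-op in this sketch

def transfer(acc_f, acc_t, amount):
    # No lock: between yields, nothing else can touch the balances.
    if acc_f.balance > amount:
        acc_f.balance -= amount
        acc_t.balance += amount
        yield acc_f.commit_balance()
        yield acc_t.commit_balance()

def run(tasks):
    """Round-robin scheduler: step each generator until all are done."""
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)   # not done yet: back of the queue
        except StopIteration:
            pass                 # task finished

a, b = Account(100), Account(0)
run([transfer(a, b, 30), transfer(b, a, 10)])
print(a.balance, b.balance)   # 80 20
```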
Practical approach
Traditionally, the async approach was implemented through callbacks. In JavaScript it can get as nasty as this:

button.on("click", function() {
    jQuery.ajax("http://...", {
        success: function(data) {
            // do something
        }
    });
});

Thankfully, Python's support for anonymous functions is not that good.
Back to fun – Async frameworks in python
Explicit:
● Tornado
● Twisted
● asyncio (codename "Tulip") – part of the Python standard library since 3.4

Implicit:
● Gevent (for Python < 3)
Tornado Hello World
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

So far everything is synchronous.
Tornado + database = async magic
from tornado.gen import coroutine
import momoko

db = momoko.Pool(host=...)

class MainHandler(tornado.web.RequestHandler):
    @coroutine
    def get(self):
        cursor = yield db.execute("SELECT * FROM greetings")
        for row in cursor.fetchall():
            self.write(str(row))
        self.finish()
Demystifying the magic
Future – a proxy to an object that will be available later. AKA "promise" in JavaScript, "deferred" in Twisted.

Traditional thread-related usage:

future = r.invoke("model_get")
res = future.get_result()

Under the hood, invoke() looks roughly like:

def invoke(...):
    future = Future()
    def worker():
        r = _invoke(...)
        future.set_result(r)
    new_thread(worker)
    return future
Futures in async
@coroutine
def get(self):
    rows = yield db.execute(...)

A simplified sketch of what the coroutine decorator does:

def coroutine(func):
    def wrapper(*args):
        gen = func(*args)
        future = gen.next()
        Runner(gen, future)
    return wrapper

from tornado.ioloop import IOLoop

class Runner(object):
    def __init__(self, gen, future):
        self.ioloop = IOLoop.instance()
        self.gen = gen
        self.future = future
        self.handle_yield()

    def run(self):
        value = self.future.result()
        next_future = self.gen.send(value)  # check StopIteration
        self.future = next_future
        self.handle_yield()

    def handle_yield(self):
        if self.future.done():
            self.run()
        else:
            self.ioloop.add_future(self.future, self.run)
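To see the mechanism without Tornado, here is a dependency-free sketch in Python 3 – not Tornado's actual implementation. Future, coroutine, async_fetch and pending are all names invented here; a callback list stands in for the IOLoop:

```python
class Future:
    """Minimal future: holds a result and fires one callback when resolved."""
    def __init__(self):
        self._result, self._done, self._callback = None, False, None
    def set_result(self, value):
        self._result, self._done = value, True
        if self._callback:
            self._callback(self)
    def done(self):
        return self._done
    def result(self):
        return self._result
    def add_done_callback(self, cb):
        cb(self) if self._done else setattr(self, "_callback", cb)

def coroutine(func):
    """Drive the generator: resume it each time the awaited future resolves."""
    def wrapper(*args):
        gen = func(*args)
        def step(value=None):
            try:
                fut = gen.send(value)      # run to the next yield
            except StopIteration:
                return                     # coroutine finished
            fut.add_done_callback(lambda f: step(f.result()))
        step()
    return wrapper

pending = []   # stands in for the IOLoop's bookkeeping

def async_fetch():
    fut = Future()
    pending.append(fut)   # "I/O" completes when someone resolves this
    return fut

results = []

@coroutine
def main():
    rows = yield async_fetch()   # suspends here until the future resolves
    results.append(rows)

main()                                # runs up to the yield, then returns
pending.pop().set_result([1, 2, 3])   # simulate I/O finishing
print(results)                        # [[1, 2, 3]]
```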
Now the magical db.execute(...)
class Connection(object):
    def __init__(self, host=...):
        self.sock = ...

    def execute(self, query):
        self.future = Future()
        self.query = query
        self.ioloop.add_handler(self.sock, self.handle_write, IOLoop.WRITE)
        return self.future

    def handle_write(self):
        self.sock.write(self.query)
        self.ioloop.add_handler(self.sock, self.handle_read, IOLoop.READ)

    def handle_read(self):
        rows = self.sock.read()
        self.future.set_result(rows)
Writing async-ready libraries
You have a library that uses, let's say, sockets, and you want to make it async-compatible. Two options:
● Choose which ioloop implementation to use (Tornado IOLoop, Python 3.4 Tulip, etc.). But it's a hard choice that limits your users.
● Implement the library in a poll-able way, so that it can be plugged into any ioloop.
(dumb) Pollable example: psycopg2 async mode
The following example is dumb because it uses async in a sync way, but it demonstrates the principle:

from psycopg2.extensions import POLL_OK, POLL_WRITE, POLL_READ

def wait(conn):
    while 1:
        state = conn.poll()
        if state == POLL_OK:
            break
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("...")

>>> aconn = psycopg2.connect(database='test', async=1)
>>> wait(aconn)
>>> acurs = aconn.cursor()
>>> acurs.execute("SELECT pg_sleep(5); SELECT 42;")
>>> wait(acurs.connection)
>>> acurs.fetchone()[0]
42
Pollable example – the goal
class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass

class Connection(object):
    ...

conn = Connection(host, port, ...)
conn.read(10)
wait(conn)  # poll, poll, poll
print "Received: %s" % conn.buff
Pollable example - implementation
class Connection(object):
    def __init__(self, ...):
        self.async_queue = deque()

    def _read(self, total):
        buff = []
        left = total
        while left:
            yield POLL_READ
            data = self.sock.recv(left)
            left -= len(data)
            buff.append(data)
        raise Return("".join(buff))

    def _read_to_buff(self, total):
        self.buff = yield self._read(total)

    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))
Pollable example – implementation cont
def poll(self, value=None):
    try:
        if value:
            value = self.async_queue[0].send(value)
        else:
            # Because we can't send non-None values to not-started generators
            value = next(self.async_queue[0])
    except (Return, StopIteration) as err:
        value = getattr(err, "value", None)
        self.async_queue.popleft()

    if not len(self.async_queue):
        return POLL_OK  # All generators are done – operation finished

    if value in (POLL_READ, POLL_WRITE):
        return value  # Need to wait for socket

    if isinstance(value, types.GeneratorType):
        self.async_queue.appendleft(value)
        return self.poll()  # Continue "pulling" the next generator

    # Pass the return value to the previous (caller) generator
    return self.poll(value)
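To see the design work end-to-end, here is a hedged, runnable simulation in Python 3. The names mirror the slides, but the socket is replaced by an in-memory FakeSock (invented here), so "data arriving" is just an assignment and no wait() loop is needed:

```python
import types
from collections import deque

class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass

class Return(Exception):
    def __init__(self, value):
        self.value = value

class FakeSock:
    """Stand-in for a socket: recv() drains an in-memory byte buffer."""
    def __init__(self):
        self.inbox = b""
    def recv(self, n):
        data, self.inbox = self.inbox[:n], self.inbox[n:]
        return data

class Connection(object):
    def __init__(self):
        self.sock = FakeSock()
        self.async_queue = deque()

    def _read(self, total):
        buff, left = [], total
        while left:
            yield POLL_READ              # tell the caller to wait for data
            data = self.sock.recv(left)
            left -= len(data)
            buff.append(data)
        raise Return(b"".join(buff))

    def _read_to_buff(self, total):
        self.buff = yield self._read(total)

    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))

    def poll(self, value=None):
        try:
            if value:
                value = self.async_queue[0].send(value)
            else:
                value = next(self.async_queue[0])
        except (Return, StopIteration) as err:
            value = getattr(err, "value", None)
            self.async_queue.popleft()
        if not self.async_queue:
            return POLL_OK               # all generators done
        if value in (POLL_READ, POLL_WRITE):
            return value                 # need to wait for the socket
        if isinstance(value, types.GeneratorType):
            self.async_queue.appendleft(value)
            return self.poll()           # pull the nested generator
        return self.poll(value)          # pass result to the caller generator

conn = Connection()
conn.read(4)
print(conn.poll() is POLL_READ)   # True: no data yet
conn.sock.inbox = b"ping"         # "data arrives on the socket"
print(conn.poll() is POLL_OK)     # True: read completed
print(conn.buff)                  # b'ping'
```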