Advanced Python, Part 2


Page 1: Advanced Python, Part 2

© 2014 Zaar Hai tech.zarmory.com

More topics in Advanced Python:
● Generators
● Async programming


Page 2: Advanced Python, Part 2


Appetizer – Slots vs Dictionaries

(Almost) every Python object has a built-in __dict__ dictionary. That can be wasteful in memory when you have lots of objects that each carry only a small number of attributes.

class A(object):
    pass

class B(object):
    __slots__ = ["a", "b"]

>>> A().c = 1
>>> B().c = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'B' object has no attribute 'c'

Slots are meant to save memory (and CPU). But do they really?
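
Before the full benchmark, a quick back-of-the-envelope check (not from the slides) of the per-instance overhead using sys.getsizeof. It only measures structural overhead, not the attribute values themselves, but it shows where the saving comes from:

import sys

class WithDict(object):
    def __init__(self):
        self.a, self.b, self.c = "foo", 2, True

class WithSlots(object):
    __slots__ = ("a", "b", "c")
    def __init__(self):
        self.a, self.b, self.c = "foo", 2, True

d, s = WithDict(), WithSlots()
# Rough structural cost only -- referenced values are not counted
print sys.getsizeof(d) + sys.getsizeof(d.__dict__)  # instance plus its attribute dict
print sys.getsizeof(s)                              # slotted instance, no attribute dict
print hasattr(s, "__dict__")                        # False -- __slots__ removes __dict__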

Page 3: Advanced Python, Part 2


Slots vs Dictionaries - competitors

class A(object):
    # __slots__ = ["a", "b", "c"]
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True

l = []
for i in xrange(50000000):
    l.append(A())

import resource
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
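
The CPU numbers on the following slides presumably come from timing this same loop; the deck does not show how the timings were taken, but a minimal way to do it would be:

import time

start = time.time()
l = []
for i in xrange(50000000):
    l.append(A())
print "%.2f seconds" % (time.time() - start)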

Page 4: Advanced Python, Part 2


Slots vs Dictionaries – memory

[Chart: memory (megabytes) vs. number of objects (1,000 – 10,000,000) for six series: Py 2.7 slots, Py 3.4 slots, PyPy slots, Py 2.7 dict, Py 3.4 dict, PyPy dict]

Page 5: Advanced Python, Part 2


Slots vs Dictionaries – MEMORY

[Chart: memory (megabytes) vs. number of objects over the full range of the 50,000,000-object run, for the same six series]

Page 6: Advanced Python, Part 2


Slots vs Dictionaries – cpu

[Chart: time (seconds) vs. number of objects (1,000 – 10,000,000) for the same six series]

Page 7: Advanced Python, Part 2


Slots vs Dictionaries – CPU

[Chart: time (seconds) vs. number of objects over the full range of the run, for the same six series]

Page 8: Advanced Python, Part 2


Slots vs Dictionaries - conclusions

Slots vs dicts – and the winner is... PyPy

Seriously – forget about slots and just move to PyPy if performance becomes an issue. As a bonus you get performance improvements in other areas as well.

Most important – run your own micro-benchmarks before jumping into new stuff.

Page 9: Advanced Python, Part 2


Generators

Page 10: Advanced Python, Part 2


The magic yield statement

A function becomes a generator if it contains a yield statement:

def gen():
    yield 1
    yield 2

When invoked, "nothing" happens – i.e. the function body does not run yet:

>>> g = gen()
>>> g
<generator object gen at 0x7f423b1b3f00>

The next() method runs the function until the next yield statement and returns the yielded value (in Python 3 you call next(g) instead of g.next()):

>>> g.next()
1
>>> g.next()
2
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Page 11: Advanced Python, Part 2


Generator exceptions

>>> for i in gen():
...     print i
...
1
2

StopIteration is raised when the generator is exhausted; the for statement catches StopIteration automagically.

If the generator function raises an exception, the generator stops:

def gen2():
    yield 1
    raise ValueError
    yield 2

>>> g = gen2()
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in gen2
ValueError
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Page 12: Advanced Python, Part 2


Stopping generator prematurely

def producer():
    conn = db.connection()
    for row in conn.execute("SELECT * FROM t LIMIT 1000"):
        yield row
    conn.close()

def consumer():
    rows = producer()
    print "First row %s" % rows.next()

In the example above the connection will never be closed – the consumer never exhausts the generator, so the code after the loop never runs. Fix:

def producer():
    conn = db.connection()
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()

def consumer():
    rows = producer()
    print "First row %s" % rows.next()
    rows.close()  # Will raise GeneratorExit in producer code
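
Since generators expose close(), the consumer can also be written with contextlib.closing, so the generator is closed even if the consumer raises (a sketch, reusing the hypothetical db-backed producer from above):

from contextlib import closing

def consumer():
    with closing(producer()) as rows:
        print "First row %s" % rows.next()
    # closing() calls rows.close() on exit, raising GeneratorExit inside producer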

Page 13: Advanced Python, Part 2


Syntactic sugar

Most of us use generators without even knowing about them:

>>> [i for i in [1, 2, 3]]
[1, 2, 3]

However, there is a generator hiding inside the [...] above:

>>> ( i for i in [1, 2, 3] )
<generator object <genexpr> at 0x7f423b1b3f00>

list's constructor detects that its input argument is iterable and walks through it to build the list.

More goodies:

>>> [i for i in range(6, 100) if i % 6 == i % 7]
[42, 43, 44, 45, 46, 47, 84, 85, 86, 87, 88, 89]
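
When a generator expression is the only argument of a function call, the extra parentheses can even be dropped, and the sequence is consumed lazily, one item at a time (a tiny illustration, values worked out by hand):

>>> sum(i * i for i in xrange(4))   # 0 + 1 + 4 + 9
14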

Page 14: Advanced Python, Part 2


Generators produce stuff on demand

Writing a Fibonacci series generator is a piece of cake:

def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

● No recursion
● O(1) memory
● Generates only as much as you want to consume
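
For example, itertools.islice can pull just the first few values out of the infinite stream (a small usage sketch, not from the slides):

>>> from itertools import islice
>>> list(islice(fibogen(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]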

Page 15: Advanced Python, Part 2


Returning value from a generator

Until Python 3.3 only None can be returned from a generator. Since 3.3 you can:

def gen():
    yield 1
    yield 2
    return 3

>>> g = gen()
>>> next(g)
1
>>> next(g)
2
>>> try:
...     next(g)
... except StopIteration as e:
...     print(e.value)
...
3

In earlier versions you can roll your own:

class Return(Exception):
    def __init__(self, value):
        self.value = value

Then raise it from the generator and catch it outside.
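
Putting that workaround together – a minimal Python 2 sketch of the pattern just described:

class Return(Exception):
    def __init__(self, value):
        self.value = value

def gen():
    yield 1
    yield 2
    raise Return(3)

g = gen()
try:
    while True:
        print g.next()
except Return as e:
    print "returned:", e.value    # returned: 3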

Page 16: Advanced Python, Part 2


Consumer generator

You can send stuff back to a generator:

def db_stream():
    conn = db.connection()
    try:
        while True:
            try:
                row = yield
                conn.execute("INSERT INTO t VALUES(%s)", row)
            except ConnCommit:
                conn.commit()
            except ConnRollBack:
                conn.rollback()
            except GeneratorExit:
                conn.commit()
                break          # let the generator finish so close() succeeds
    finally:
        conn.close()

>>> g = db_stream()
>>> g.next()           # prime the generator: advance it to the first yield
>>> g.send([1])
>>> g.throw(ConnCommit)
>>> g.close()
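
A consumer generator has to be advanced to its first yield before send() can deliver a value. A common companion pattern (not in the slides) is a small priming decorator so callers don't have to remember the initial next():

def primed(func):
    """Run the generator up to its first yield so send() works immediately."""
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        gen.next()          # advance to the first `row = yield`
        return gen
    return wrapper

# Usage: decorate db_stream with @primed, then g = db_stream(); g.send([1]) works right away.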

Page 17: Advanced Python, Part 2


Async programming approach

Page 18: Advanced Python, Part 2


Async in a nutshell

Technion CS "Introduction to Operating Systems", HW 2. Setup:

import socket, select, time
from collections import defaultdict, deque

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("", 1234))
sock.listen(20000)
sock.setblocking(0)

rqueue = set([sock])
wqueue = set()
pending = defaultdict(deque)

Page 19: Advanced Python, Part 2


Async in a nutshell – event loop

Technion CS "Introduction to Operating Systems", HW 2:

while True:
    rq, wq, _ = select.select(rqueue, wqueue, [])
    for s in rq:
        if s == sock:
            new_sock, _ = sock.accept()
            new_sock.setblocking(0)
            rqueue.add(new_sock)
            continue
        data = s.recv(1024)
        if not data:
            s.close()
            rqueue.remove(s)
        else:
            pending[s].append(data)
            wqueue.add(s)
    for s in wq:
        if not pending[s]:
            wqueue.remove(s)
            continue
        data = pending[s].popleft()
        sent = s.send(data)
        if sent != len(data):
            data = data[sent:]
            pending[s].appendleft(data)
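
To poke at the server above – which simply echoes back whatever it receives – a minimal client (not part of the slides; assumes the server is running locally on port 1234):

import socket

c = socket.create_connection(("localhost", 1234))
c.sendall("hello\n")
print c.recv(1024)   # the server writes the same bytes back
c.close()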

Page 20: Advanced Python, Part 2


Why bother with async?

● Less memory – stack memory is allocated for every spawned thread: about 2 MB on x86 Linux. For a server handling 10k connections that is 20 GB of memory just for starters!

● Less CPU – context switching between 10k threads is expensive. Async moves the switching logic from the OS / interpreter level to the application level, which is always more efficient.

Page 21: Advanced Python, Part 2


C10k problem

The art of managing a large number of connections.

Why is that a problem? Long polling / websockets: with modern live web applications, each client / browser holds an open connection to the server.

Gmail has 425 million active users, i.e. Gmail servers have to handle ~400 million active connections at any given time.

Page 22: Advanced Python, Part 2


Concurrency vs Parallelism

Concurrency
● Dealing with several tasks simultaneously
● But working on one task at a time
● All Intel processors up to the Pentium were concurrent

Parallelism
● Dealing with several tasks simultaneously
● And working on several tasks at any given time
● All Intel processors since the Pentium can execute more than one instruction per clock cycle

(C)Python is always concurrent – either with threads or with the async approach.

Page 23: Advanced Python, Part 2


Thread abuse

Naive approach – spawn a thread for every tiny task:
● Resource waste
● Burden on the OS / interpreter

Good single-threaded code can saturate a single core, so usually you don't need more than one thread / process per CPU.

In the web world your application needs to scale beyond a single machine anyway, i.e. you'll have to run multiple isolated processes in any case.

Page 24: Advanced Python, Part 2


Explicit vs Implicit context switching

Implicit context switching
● The OS / interpreter decides when to switch
● The coder must assume control can be taken away at any time
● Synchronization is required – mutexes, etc.

Explicit context switching
● The coder decides when to give up execution control
● No synchronization primitives required!

Page 25: Advanced Python, Part 2


Explicit vs Implicit context switching

Threads (implicit switching):

def transfer(acc_f, acc_t, sum):
    acc_f.lock()
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        acc_f.commit_balance()
        acc_t.commit_balance()
    acc_f.release()

Explicit async:

def transfer(acc_f, acc_t, sum):
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        yield acc_f.commit_balance()
        yield acc_t.deposit(sum)

Page 26: Advanced Python, Part 2


Practical approach

Traditionally, the async approach was implemented through callbacks. In JavaScript it can get as nasty as this:

button.on("click", function() {
    jQuery.ajax("http://...", {
        success: function(data) {
            // do something
        }
    });
});

Thankfully, Python's support for anonymous functions is not that good

Page 27: Advanced Python, Part 2


Back to fun – Async frameworks in python

Explicit
● Tornado
● Twisted
● Tulip (asyncio) – part of the Python standard library since 3.4

Implicit
● Gevent (for Python < 3)

Page 28: Advanced Python, Part 2


Tornado Hello World

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

So far everything is synchronous

Page 29: Advanced Python, Part 2


Tornado + database = async magic

from tornado.gen import coroutine
from momoko.connections import Pool

db = Pool(host=...)

class MainHandler(tornado.web.RequestHandler):

    @coroutine
    def get(self):
        cursor = yield db.execute("SELECT * FROM greetings")
        for row in cursor.fetchall():
            self.write(str(row))
        self.finish()

Page 30: Advanced Python, Part 2


Demystifying the magic

A Future is a proxy to an object that will be available later. AKA "promise" in JavaScript, "deferred" in Twisted. Traditional thread-related usage:

# Caller side:
future = r.invoke("model_get")
res = future.get_result()        # blocks until the result is set

# Inside invoke() – pseudocode:
future = Future()
new_thread(lambda: future.set_result(_invoke(...)))   # run _invoke() in a new thread
return future
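
For a concrete, runnable version of that thread-backed pattern (not from the slides), the standard concurrent.futures module (Python 3.2+, or the `futures` backport on Python 2) does exactly this:

from concurrent.futures import ThreadPoolExecutor

def model_get():
    return {"answer": 42}

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(model_get)   # returns a Future immediately
print future.result()                 # blocks until the worker thread finishes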

Page 31: Advanced Python, Part 2


Futures in async

@coroutine
def get(self):
    rows = yield db.execute(...)

def coroutine(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        future = gen.next()
        Runner(gen, future)
    return wrapper

from tornado.ioloop import IOLoop

class Runner(object):
    def __init__(self, gen, future):
        self.ioloop = IOLoop.instance()
        self.gen = gen
        self.future = future
        self.handle_yield()

    def run(self):
        value = self.future.result()
        next_future = self.gen.send(value)  # should also catch StopIteration here
        self.future = next_future
        self.handle_yield()

    def handle_yield(self):
        if self.future.done():
            self.run()
        else:
            self.ioloop.add_future(self.future,
                                   lambda f: self.run())  # run() once the future resolves

Page 32: Advanced Python, Part 2


Now the magical db.execute(...)

class Connection(object):
    def __init__(self, host=...):
        self.sock = ...

    def execute(self, query):
        self.future = Future()
        self.query = query
        self.ioloop.add_handler(self.sock, self.handle_write, IOLoop.WRITE)
        return self.future

    def handle_write(self):
        self.sock.write(self.query)
        self.ioloop.add_handler(self.sock, self.handle_read, IOLoop.READ)

    def handle_read(self):
        rows = self.sock.read()
        self.future.set_result(rows)

Page 33: Advanced Python, Part 2


Writing async-ready libraries

You have a library that uses, let's say, sockets, and you want to make it async compatible. Two options:

● Choose which ioloop implementation you target (Tornado IOLoop, Python 3.4 Tulip, etc.). But it's a hard choice that limits your users.
● Implement the library in a pollable way. This way it can be plugged into any ioloop.

Page 34: Advanced Python, Part 2


(dumb) Pollable example: psycopg2 async mode

The following example is dumb, because it uses async in a sync way. But it demonstrates the principle:

import select
import psycopg2
from psycopg2.extensions import POLL_OK, POLL_WRITE, POLL_READ

def wait(conn):
    while 1:
        state = conn.poll()
        if state == POLL_OK:
            break
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("...")

>>> aconn = psycopg2.connect(database='test', async=1)
>>> wait(aconn)
>>> acurs = aconn.cursor()
>>> acurs.execute("SELECT pg_sleep(5); SELECT 42;")
>>> wait(acurs.connection)
>>> acurs.fetchone()[0]
42

Page 35: Advanced Python, Part 2


Pollable example – the goal

class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass

class Connection(object):
    ...

conn = Connection(host, port, ...)
conn.read(10)
wait(conn)  # poll, poll, poll
print "Received: %s" % conn.buff

Page 36: Advanced Python, Part 2


Pollable example - implementation

class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass

class Connection(object):
    def __init__(self, ...):
        self.async_queue = deque()

    def _read(self, total):
        buff = []
        left = total
        while left:
            yield POLL_READ
            data = self.sock.recv(left)
            left -= len(data)
            buff.append(data)
        raise Return("".join(buff))

    def _read_to_buff(self, total):
        self.buff = yield self._read(total)

    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))

Page 37: Advanced Python, Part 2


Pollable example – implementation cont

def poll(self, value=None):
    try:
        if value:
            value = self.async_queue[0].send(value)
        else:
            # Because we can't send non-None values to not-yet-started generators
            value = next(self.async_queue[0])
    except (Return, StopIteration) as err:
        value = getattr(err, "value", None)
        self.async_queue.popleft()

    if not len(self.async_queue):
        return POLL_OK  # All generators are done - operation finished

    if value in (POLL_READ, POLL_WRITE):
        return value  # Need to wait for the socket

    if isinstance(value, types.GeneratorType):
        self.async_queue.appendleft(value)
        return self.poll()  # Continue "pulling" the next generator

    # Pass the return value to the previous (caller) generator
    return self.poll(value)
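
A wait() helper for this Connection would mirror the psycopg2 one shown earlier (a sketch; it assumes the Connection exposes a fileno() method for its socket, which the slides do not show):

import select

def wait(conn):
    while True:
        state = conn.poll()
        if state == POLL_OK:
            break                                  # operation finished
        elif state == POLL_READ:
            select.select([conn.fileno()], [], []) # block until readable
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], []) # block until writable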

Page 38: Advanced Python, Part 2


Thank you …