Concurrency and Distributed systems
... With Python today.
Jesse Noller
Saturday, March 28, 2009
30,000 Foot View
• Introduction
• Concurrency/Parallelism
• Distributed Systems
• Where Python is today
• Ecosystem
• Where can we go?
• Questions
Hello there!
• Who am I?
• Why am I doing this?
• Email: [email protected]
• Blog - http://www.jessenoller.com
• Pycon - http://jessenoller.com/category/pycon-2009/
Most of all, it’s fun!
No Code, Why?
Bike sheds
Concurrency
• What is it?
• Doing many things “at once”
• Typically local to the machine running the app.
• Implementation Options:
• threads / multiple processes
• cooperative multitasking
• coroutines
• asynchronous programming
... vs Parallelism
• What is it?
• Doing many things simultaneously
• Implementation options:
• threads
• multiple processes
• distributed systems
... vs Distributed Systems
• What is it?
• Doing many things, across multiple machines, simultaneously
• Many cores, on many machines
• There are many designs
• There are eight fallacies...
8 fallacies of distributed systems
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn’t change
There is only one administrator
Transport cost is zero
The network is homogeneous
Summary
• All 3 are related to one another, the fundamental goals of which are to:
• Decrease latency
• Increase throughput
• Applications start simple, progress to concurrent systems and evolve into parallel, distributed systems
• As the system evolves, the fallacies become more pertinent, so you have to account for them early
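Accounting for the fallacies early often starts with the simplest case: a remote call can fail, so callers need bounded retries with backoff instead of assuming the network is reliable. A minimal sketch, assuming failures surface as `ConnectionError`; `call_with_retry` and the simulated `make_flaky` call are hypothetical names, not taken from any library mentioned in the talk:

```python
import random
import time


def call_with_retry(operation, attempts=4, base_delay=0.05, max_delay=1.0):
    """Call operation(), retrying on ConnectionError with exponential backoff.

    Hypothetical helper: the network is not reliable and latency is not zero,
    so a remote call gets a bounded number of retries, not an infinite loop.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff capped at max_delay, with random jitter so
            # many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))


def make_flaky(failures):
    """Return a fake remote call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("simulated dropped connection")
        return "ok"

    return operation


if __name__ == "__main__":
    print(call_with_retry(make_flaky(failures=2)))  # → ok
```

The cap on attempts is the design point: a caller that retries forever just turns "the network is reliable" into "the network is eventually reliable", which is the same fallacy.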
Where is (C)Python?
• We have threads. Shiny, real OS ones
• Except for the Global Interpreter Lock
• The GIL makes the interpreter easier to maintain
• ...And it simplifies extension module code
Is the GIL a problem?
• Yes. Sorta. Maybe. It depends.
• Blocking I/O calls and C extensions release it!
• Most applications are I/O bound
• The GIL still has non-zero overhead
• The GIL is not going away*
• You can build concurrent applications regardless of the GIL
* ... more on this in a moment, dun dun dun.
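The I/O-bound point can be checked directly: CPython releases the GIL around blocking calls, so threads that are waiting overlap even though only one thread runs Python bytecode at a time. A minimal sketch, using `time.sleep` as a stand-in for a blocking network or disk read (`fake_io` and `run_threads` are hypothetical names):

```python
import threading
import time


def fake_io(results, index, seconds):
    # time.sleep stands in for a blocking socket or file read; like real
    # blocking I/O in CPython, it releases the GIL while it waits.
    time.sleep(seconds)
    results[index] = index


def run_threads(count=4, seconds=0.2):
    results = [None] * count
    threads = [threading.Thread(target=fake_io, args=(results, i, seconds))
               for i in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threads()
    # The waits overlap, so elapsed is close to 0.2s rather than 4 * 0.2s.
    print(results, round(elapsed, 2))
```

Swap the sleep for a pure-Python loop and the speedup disappears, which is the CPU-bound case where the GIL does bite.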
Multiprocessing!
• Added in the 2.6/3.0 timeline, PEP 371
• Processes and IPC (via pipes) to allow parallelism
• Same(ish) API as threading and queue
• Includes Pool, remote Managers for data sharing over a network, etc.
• Multiprocessing “outperforms” threading for CPU-bound work
• IPC requires picklable objects, which incurs overhead
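A minimal sketch of the threading-like API in practice, using `multiprocessing.Pool` to spread a CPU-bound function over worker processes (`cpu_bound` and `parallel_map` are hypothetical names). It also shows the pickling constraint: the function and its arguments travel to the workers by pickle.

```python
import multiprocessing


def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL, so threads would serialize here;
    # each worker process has its own interpreter and its own GIL.
    return sum(i * i for i in range(n))


def parallel_map(inputs, processes=4):
    # Arguments and results cross the process boundary via pickle, so the
    # function must be picklable: a module-level function is, a lambda is not.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(cpu_bound, inputs)


if __name__ == "__main__":
    # The __main__ guard matters: with the "spawn" start method, child
    # processes re-import this module and must not start pools themselves.
    print(parallel_map([10, 100, 1000]))  # → [285, 328350, 332833500]
```

For inputs this small, the pickling and process start-up overhead outweighs the gain; the pattern pays off when each task does substantial work.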
Summary
• We have the Global Interpreter Lock
• We also have multiprocessing (no GIL)
• Threads (as an approach) are good for some problems
• They’re not impossible to use correctly
• While hampered, Python threads are still useful
• Python still allows you to leverage other approaches to concurrency
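Using threads correctly mostly comes down to guarding shared mutable state. A minimal sketch with `threading.Lock`; the `Counter` class and `hammer` helper are illustrative names, not from the talk:

```python
import threading


class Counter:
    """A shared counter; the lock makes the read-modify-write atomic."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self._value += 1` is several bytecodes; without the lock, a thread
        # switch between the read and the write can lose updates, GIL or not.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


if __name__ == "__main__":
    counter = Counter()
    workers = [threading.Thread(target=hammer, args=(counter, 10_000))
               for _ in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # → 80000
```

Keeping the lock private to the class, rather than asking callers to remember it, is what keeps this kind of code from becoming impossible to use correctly.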
(remember that asterisk?)*
Jython
• Python on the JVM (in Java)
• 2.5-Compatible
• Frank and the others are awesome for resurrecting this project
• May allow Python in the Java door
• Pros:
• Unrestricted threading
• Hooray java.util.concurrent!
• Cons:
• No C extensions
IronPython
• Python on the .NET CLR
• 2.5.2 Compatible
• Matured rapidly, highly usable
• Great for Windows environments
• Pros:
• Unrestricted threading
• Some C extensions via Ironclad
• Cons:
• Mostly Windows-only, barring Mono
Stackless
• Modified CPython interpreter
• Offers Coroutines, Channels - “lightweight threads”
• Cooperative multitasking (single thread executes)
• (mostly) Still alive courtesy of CCP Games
• Still has a GIL
• “Stackless is dead, long live PyPy”
PyPy
• Python written in (R)Python
• Getting close to 2.5-Compatibility
• Complete “rethink” of the interpreter
• Focusing on JIT/interpreter speed right now
• Still has the GIL
• Some Stackless features (e.g. coroutines, channels)
• Not mature
The Ecosystem
That’s a lot of nuts!
• When I started, I had around 40 libraries on my list
• Coroutines, messaging, frameworks, etc
• Python has a huge ecosystem of “stuff”
• Unfortunately, much of it is long in the tooth, or of beta quality
• New libraries/frameworks/approaches are coming out every week
Concurrency Frameworks
Twisted
• “OK, who hasn’t tried Twisted?”
• Asynchronous, Event Driven multitasking
• Vast networking library, large ecosystem
• Supports thread usage, but Twisted code may not be thread safe
• Supports using processes (not multiprocessing).
• Can be mind-bending
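Twisted's model is a single-threaded event loop that resumes work when I/O is ready. The sketch below shows the same idea with the standard library's `asyncio`, a swapped-in stand-in rather than Twisted's API (Twisted uses its reactor and Deferreds); `fetch` is a hypothetical coroutine simulating a network request:

```python
import asyncio
import time


async def fetch(name, delay):
    # `await` hands control back to the event loop while this "request"
    # waits, so other tasks run in the meantime on the same thread.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # gather() runs all three coroutines concurrently on one event loop and
    # returns their results in argument order.
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)  # → ['a done', 'b done', 'c done']
    print(f"{elapsed:.1f}s")  # roughly 0.2s, not 0.6s: the waits overlap
```

The mind-bending part is the inversion of control: nothing blocks, so any code that does block (a slow loop, a synchronous library call) stalls every task on the loop.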
Kamaelia
• Came out of BBC Research
• Uses an easy-to-understand “components talking via mailboxes” approach
• Cooperative multitasking via generators by default.
• Honkin’ library of cool things
• Supports thread-based components as well
• Very easy to get up and running
• Abstracts IPC, Process, Threads, etc “away”
Frameworks
• Both Kamaelia and Twisted have nice networking support
• Both use schedulers which allow scheduled items to schedule other items
• Two different approaches to thinking about the problem
• Both can be used to build distributed apps
• Like all frameworks, you adopt the methodology
New: Concurrence
• New on the scene (’09), version 0.3
• Lightweight tasks-with-message passing
• Has a main scheduler/dispatcher
• Built on top of Stackless/greenlets/libevent
• Network-oriented (HTTP, WSGI servers)
• Still raw (more docs please)
• Very promising (minus compilation problems)
Coroutines
• Coroutines are essentially lightweight threads of control; think micro/green threads
• Typically use explicit task switching (cooperative)
• Most implementations have a scheduler, and some communications method (e.g. pipes)
• Not parallel unless used in a distributed fashion
• Both Kamaelia and Twisted “fit” here
• Enhanced generators make these easy to build
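Generators (enhanced by PEP 342 with `send()`) can be paused at `yield` and resumed later, which is all a cooperative scheduler needs. A minimal round-robin sketch; the `Scheduler` class and `worker` task are hypothetical, and real libraries add channels and I/O waiting on top of this:

```python
from collections import deque


class Scheduler:
    """Round-robin scheduler for generator-based tasks (cooperative)."""

    def __init__(self):
        self._ready = deque()

    def spawn(self, task):
        self._ready.append(task)

    def run(self):
        while self._ready:
            task = self._ready.popleft()
            try:
                task.send(None)  # resume the task until its next yield
            except StopIteration:
                continue  # task finished; drop it
            self._ready.append(task)  # task yielded; requeue it


def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit task switch: give other tasks a turn


if __name__ == "__main__":
    log = []
    sched = Scheduler()
    sched.spawn(worker("a", 3, log))
    sched.spawn(worker("b", 2, log))
    sched.run()
    print(log)  # → ['a0', 'b0', 'a1', 'b1', 'a2']
```

Everything runs in one thread, which is why this is concurrency without parallelism: a task that never yields starves all the others.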
Coroutine libraries
• Fibra: microthreads, tubes, scheduler
• Greenlet: C based, microthreads, no scheduler
• Eventlet: Network “framework” layer on top of greenlet. Has an Actor implementation \o/
• Circuits: Event-based, components/microthreads
• Cogen: network oriented, scheduler, microthreads
• Multitask: microthreads, no channels (it’s dead jim)
Actors
• Isolated, self-reliant components
• Can spawn other Actors
• Communicate via message passing only (by value)
• Operate in parallel
• Communication is asynchronous
• A good model to overcome the fallacies
• See also: Erlang, Scala
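The actor properties above can be sketched with a thread and a mailbox queue: private state, asynchronous sends, and messages copied so the sender cannot mutate them afterwards. This is a hypothetical minimal `Actor` class, not the API of Dramatis, Parley, or Candygram, and because it is thread-backed it gives concurrency, not CPU parallelism, under CPython's GIL:

```python
import copy
import queue
import threading

_STOP = object()  # private sentinel that tells an actor's thread to exit


class Actor:
    """Minimal thread-backed actor: private state plus a mailbox (a sketch)."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Asynchronous: enqueue a deep copy (approximating pass-by-value)
        # and return immediately without waiting for the actor.
        self._mailbox.put(copy.deepcopy(message))

    def stop(self):
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        # Only this thread ever calls receive(), so actor state needs no lock.
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Totaler(Actor):
    """Hypothetical example actor that sums numbers and reports on request."""

    def __init__(self, reply_to):
        self.total = 0
        self.reply_to = reply_to  # a queue.Queue the actor answers on
        super().__init__()  # start the thread only after state exists

    def receive(self, message):
        kind, value = message
        if kind == "add":
            self.total += value
        elif kind == "report":
            self.reply_to.put(self.total)


if __name__ == "__main__":
    replies = queue.Queue()
    totaler = Totaler(replies)
    for n in (1, 2, 3):
        totaler.send(("add", n))
    totaler.send(("report", None))
    print(replies.get(timeout=1))  # → 6
    totaler.stop()
```

Because the only way in is a message, the same interface could later be backed by a process or a network connection, which is why the model holds up against the fallacies.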
Actor Libraries
• Dramatis (alpha quality)
• Great start, excellent base to start working with them
• Parley (alpha quality)
• Another excellent start, supports actors in threads, greenlets or stackless tasklets
• Candygram (2004)
• Old, implements Erlang primitives, spawns in threads
• Kamaelia components can fit here(ish)
(local) Parallelism
• Multiprocessing
• Processes and IPC via the threading API, in Python-Core as of 2.6
• Parallel Python
• Allows local parallelism, but also distributed parallelism in a “full” package
• pprocess
• Another easy to use fork/process based package
• Has IPC mechanisms
Distributed Systems
• Many technologies to help build something
• communications libraries
• socket/networking libraries
• message queues
• some shared memory implementations
• No “full stack” approach
• Most users end up rolling their own, using some combination of libraries and tools
Distributed Processing
• Frameworks:
• Parallel Python is the closest for a processing cluster
• The Disco Project is an Erlang-based (with Python bindings) map-reduce framework
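The map-reduce model Disco implements has three phases: a map function emits key/value pairs, the framework groups them by key, and a reduce function combines each group. The sketch below runs all three in one process purely to show the shape; it is not Disco's API, and `map_words`, `reduce_counts`, and `map_reduce` are hypothetical names:

```python
from collections import defaultdict


def map_words(document):
    # Map phase: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Reduce phase: combine all values emitted for one key.
    return word, sum(counts)


def map_reduce(documents, mapper, reducer):
    groups = defaultdict(list)
    for doc in documents:  # in a cluster, each doc could go to a different node
        for key, value in mapper(doc):
            groups[key].append(value)  # "shuffle": group values by key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["the network is reliable", "the network is secure"]
    print(map_reduce(docs, map_words, reduce_counts))
    # → {'the': 2, 'network': 2, 'is': 2, 'reliable': 1, 'secure': 1}
```

Because mappers only see their own record and reducers only see one key's values, a framework is free to run them on different machines; the grouping step in the middle is where the network cost lands.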
RPC/Messaging
• RPC:
• Pyro
• rPyc
• Thrift
• Messaging:
• pySage
• python-spread
• XMPP
• Protocol Buffers
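The RPC options above share one idea: make a remote call look like a local function call. The standard library's `xmlrpc` modules show the pattern end to end; they are a stand-in here, not Pyro, RPyC, or Thrift code. The server runs in a background thread on an OS-assigned local port, and `add` is a hypothetical remote procedure:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(x, y):
    return x + y


def start_server():
    # Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # The call looks local, but it is an HTTP request: it can fail or be
    # slow, exactly as the fallacies warn.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.add(2, 3))  # → 5
    server.shutdown()
    server.server_close()
```

That local-looking syntax is both the appeal and the trap of RPC: nothing at the call site reminds you that latency is not zero.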
Shared Memory/Message Qs
• Shared Memory
• Posh (dead)
• Memcached
• posix_ipc
• Message Queues
• Apache ActiveMQ
• RabbitMQ
• Stomp
• MemcacheQ
• Beanstalkd
So...
• Where the hell do we point new users?
• While good, Twisted and Kamaelia have a documentation problem
• The rest is a mish-mash of technologies
• Concurrency is hard, let’s go shopping!
Where does this leave us?
• The GIL is here for the foreseeable future
• Not entirely a bad thing (extensions!)
• Python-Core is not the right place for much of this, but can provide some basics
• Actor implementation
• Java.util.concurrent-like abstractions
• Anything going in must make this work safe
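For a sense of what java.util.concurrent-like abstractions can look like in Python: Python 3.2 later added `concurrent.futures` (PEP 3148), whose executor-and-future design the PEP credits to java.util.concurrent. A minimal sketch with a thread pool; `slow_square` and `squares` are hypothetical names:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n):
    time.sleep(0.05)  # stands in for blocking work
    return n * n


def squares(numbers, workers=4):
    results = {}
    # The executor owns the threads; submit() returns a Future immediately.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(slow_square, n): n for n in numbers}
        for future in as_completed(futures):  # yields futures as they finish
            # result() returns the value, or re-raises the task's exception.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(sorted(squares(range(5)).items()))
    # → [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same calling code, which is the kind of safe, swappable basic that could live in Python-Core.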
Where does this leave us?
• Lots of great community work
• Continued room for growth, adoption of other languages’ technologies
• If we can build a stack of reusable, swappable components for all three areas: everyone wins
• Anyone for a “distributed Django”?
• “loose coupling and tight cohesion”
• Must take the fallacies into account
Django?
• The point of a framework is to make the easy things easy, and the hard things easier
• The abstractions must be leaky
• Go see abstractions as leverage!
• It must be safe
• It cannot ignore the fallacies
• I shall call it Mustaine (Megadeth)
Questions?
Fin.