event machine
Scalable Ruby Processing with EventMachine
Mike Perham
And you are...?
Developer at OneSpot
memcache-client maintainer
data_fabric author
Scalable Processing?
Map/Reduce (Hadoop)
Message Queues
Efficient Processing!
Focus on maximizing machine utilization
Google tries for ~80% utilization
Status Quo
Typical Message Queue processing in Ruby:
Single Threaded
200MB (or more!) to process one message/sec?!
Load Average: 0.10 0.12 0.09
Blocking IO sucks
Rule of Thumb
Your code will spend 90% of its time waiting for IO, 10% doing actual work
Blocking
Why do you add indexes to a database table?
Why do you put data in memcached?
Blocking IO
File
Database
memcached
Net::HTTP
DNS lookups
system()
Solutions?
How do we maximize the blue?
Threading?
Create 10 threads, each process a message concurrently
10% CPU * 10 = 100% CPU!
Java: good at threading
Ruby: not so much...
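A minimal sketch of the ten-thread idea, with `sleep` standing in for a blocking IO call. The `process_message` helper and the 0.1 second delay are illustrative assumptions, not code from the talk. Waits overlap because Ruby lets other threads run while one thread sleeps or blocks on IO, even though Ruby code itself does not run in parallel on MRI.

```ruby
require "benchmark"

# Hypothetical message handler: sleep stands in for blocking IO.
def process_message(id)
  sleep 0.1
  "processed #{id}"
end

# One message at a time: total time is the sum of all the waits.
def process_serially(ids)
  ids.map { |id| process_message(id) }
end

# One thread per message: the waits overlap.
def process_with_threads(ids)
  ids.map { |id| Thread.new { process_message(id) } }.map(&:value)
end

ids = (1..10).to_a
serial   = Benchmark.realtime { process_serially(ids) }
threaded = Benchmark.realtime { process_with_threads(ids) }
puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

This only shows the best case, where the work is pure waiting. It says nothing about thread-unsafe libraries or CPU-bound work.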
Threading?
Thread-unsafe extensions / libraries
Poor thread implementation
Ruby 1.8: Green Threads
Ruby 1.9: GIL
JRuby: the only good threading solution
Alternative?
What if we could...
Have Ruby work on one operation while another waited on I/O?
Fill in the green gaps?
Without threads?
Evented IO rules
EventMachine
Ruby implementation of the Reactor pattern
Single threaded by default
Lets us overlap multiple IO ops with a single CPU op
Reactor pattern: a concurrent programming pattern for handling service requests delivered concurrently to a service handler by one or more inputs. The service handler then demultiplexes the incoming requests and dispatches them synchronously to the associated request handlers.
How does it work?
IO.select(rd, wr, ex)
select
epoll on Linux 2.6
kqueue on BSD
/dev/poll on Solaris
All bets are off on Windows
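The `IO.select` call can be exercised with nothing but the standard library. Below is a toy reactor loop, a sketch rather than how EventMachine is implemented: pipes stand in for sockets, and `run_select_loop` and `select_demo` are names made up for this example.

```ruby
# Wait on several IOs at once and dispatch each readable one to its handler.
def run_select_loop(handlers)
  pending = handlers.keys
  until pending.empty?
    readable, _writable, _errored = IO.select(pending, nil, nil, 1)
    break if readable.nil? # timed out: nothing became ready
    readable.each do |io|
      begin
        handlers[io].call(io.read_nonblock(4096))
      rescue IO::WaitReadable
        next # spurious wakeup: try again on the next select
      rescue EOFError
        io.close
        pending.delete(io)
      end
    end
  end
end

def select_demo
  received = []
  pipes = 3.times.map { IO.pipe }
  handlers = {}
  pipes.each_with_index do |(reader, _writer), i|
    handlers[reader] = ->(data) { received << "pipe #{i}: #{data}" }
  end
  # Write in reverse order, then close so each reader eventually sees EOF.
  pipes.each_with_index.to_a.reverse.each do |(_reader, writer), i|
    writer.write("hello #{i}")
    writer.close
  end
  run_select_loop(handlers)
  received.sort
end

p select_demo
```

A single thread services all three pipes. The loop never blocks on any one of them, only on "whichever is ready next".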
Issues
Inversion of Control
Application code becomes callbacks
makes error handling difficult
somewhat solved by Fibers
Inversion of Control
Without Fibers
With Fibers
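The slide's code did not survive in this transcript. As a stand-in, here is a sketch of the two styles against a toy event queue. `ToyReactor`, `async_fetch` and `sync_fetch` are hypothetical names for this example, not EventMachine APIs.

```ruby
require "fiber" # needed for Fiber.current on older Rubies; harmless on newer ones

# A toy event queue standing in for the reactor.
class ToyReactor
  def initialize
    @queue = []
  end

  # Pretend async lookup: the callback fires later, from the run loop.
  def async_fetch(key, &callback)
    @queue << -> { callback.call("value-for-#{key}") }
  end

  def run
    @queue.shift.call until @queue.empty?
  end
end

# Without fibers: each step nests inside the previous callback.
def callback_style(reactor, log)
  reactor.async_fetch(:user) do |user|
    reactor.async_fetch(:orders) do |orders|
      log << [user, orders]
    end
  end
end

# With fibers: suspend the current fiber until the callback resumes it.
def sync_fetch(reactor, key)
  fiber = Fiber.current
  reactor.async_fetch(key) { |value| fiber.resume(value) }
  Fiber.yield
end

# The same two steps now read top to bottom.
def fiber_style(reactor, log)
  Fiber.new do
    user = sync_fetch(reactor, :user)
    orders = sync_fetch(reactor, :orders)
    log << [user, orders]
  end.resume
end

def inversion_demo
  log = []
  reactor = ToyReactor.new
  callback_style(reactor, log)
  fiber_style(reactor, log)
  reactor.run
  log
end

p inversion_demo
```

Both styles do the same work on the same reactor. The fiber version just hides the callback inside `sync_fetch`, so the calling code looks sequential again.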
Coding
Difficult to understand
Sparse, poor documentation
Learning curve for newbies
Testing
Global context: reactor
Each test must set up and tear down a reactor
Whack-A-Mole
Blocking IO is everywhere
Easy to lose parallelism
Code
Evented
My EventMachine sample code repository
http://github.com/mperham/evented
Thumbnailer
Rack middleware to dynamically create thumbnails
Thin, EventMachine, ImageScience, em-http-request
Thumbnailer
Qanat
SQS processing daemon
Event-based S3, SimpleDB and SQS APIs
Uses Fibers with Ruby 1.9
EventMagick
system ==> EM.system
Execute ‘identify <JPEG>’ 640 times
system: 10 sec
EM.system: 5 sec
Example: system()
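The slide's code is not in this transcript. The sketch below illustrates the underlying idea in plain Ruby rather than with `EM.system`: start every child process first, then read whichever pipe is ready, instead of waiting for each command in turn. The `sh -c "sleep ...; echo ..."` command is a placeholder for `identify <JPEG>` and assumes a Unix-like shell. All method names are made up for this example.

```ruby
require "benchmark"

# Placeholder command standing in for `identify <JPEG>`.
def command_for(i)
  ["sh", "-c", "sleep 0.2; echo job-#{i}"]
end

# Blocking: each child must exit before the next one starts.
def run_blocking(count)
  count.times.map { |i| IO.popen(command_for(i), &:read).strip }
end

# Overlapped: start all children, then collect output as it arrives.
def run_overlapped(count)
  pipes = count.times.map { |i| IO.popen(command_for(i)) }
  outputs = {}
  pipes.each { |io| outputs[io] = String.new }
  pending = pipes.dup
  until pending.empty?
    readable, = IO.select(pending)
    readable.each do |io|
      begin
        outputs[io] << io.read_nonblock(4096)
      rescue IO::WaitReadable
        next # not actually ready: try again on the next select
      rescue EOFError
        io.close # closing a popen pipe also reaps the child
        pending.delete(io)
      end
    end
  end
  pipes.map { |io| outputs[io].strip }
end

p run_overlapped(3)
```

The speedup here comes purely from overlapping the waits. How much a real workload gains depends on how long each command spends waiting versus computing.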
em_postgresql
ActiveRecord driver for PostgreSQL with EM
http://github.com/mperham/em_postgresql
Requires Ruby 1.9
MySQL? Use mysqlplus.
em_postgresql
Conclusions
Threading sucks
Blocking IO is everywhere
Use EM for IO to peg a single core
Use multiple processes for multi-core
Ruby 1.9 makes evented code nicer
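A sketch of the "multiple processes for multi-core" point: fork one worker per core and have the parent wait for them. `do_work` is a hypothetical placeholder for "run a reactor and process messages", and the example assumes a platform where `Process.fork` is available.

```ruby
require "etc"

# Placeholder for the real per-process job, e.g. running an event loop.
def do_work(worker_id)
  worker_id * 2
end

# Fork `count` workers and report whether each one exited cleanly.
def run_workers(count)
  pids = count.times.map do |i|
    Process.fork do
      do_work(i)
      exit!(0) # leave the child immediately, skipping at_exit hooks
    end
  end
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p run_workers(Etc.nprocessors)
```

Each worker is an independent single-threaded process, so the evented code inside it needs no locking. The operating system spreads the processes across cores.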