work queue systems

Uploaded by: david-butler

Posted on 29-Nov-2014

DESCRIPTION

An overview of work queue systems with an emphasis on Ruby libraries

TRANSCRIPT

WORK QUEUE SYSTEMS

MOTIVATIONS

Do work in the background

Parallelize tasks

Distribute work among many machines

DESIGN CONSIDERATIONS

Expect failure and design accordingly (process crashes, machine reboots, network partitions)

Break work into small, bite-size tasks

Idempotency: ensure nothing bad will happen if your job runs multiple times
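A minimal Ruby sketch of an idempotent job, assuming every enqueued job carries a unique job ID. `ChargeOrderJob` and its arguments are hypothetical, and the in-memory set of processed IDs only stands in for the durable record (for example a unique database constraint) a real system would need.

```ruby
require "set"

# Hypothetical job that charges an order. The set of processed job IDs is
# in memory only; a real system would need a durable record instead.
class ChargeOrderJob
  def initialize(ledger)
    @ledger = ledger       # order_id => total amount charged
    @processed = Set.new   # job IDs that have already run
    @lock = Mutex.new      # makes "check, charge, record" one atomic step
  end

  # Running perform twice with the same job_id has the same effect as
  # running it once, so a retried or duplicated job cannot double-charge.
  def perform(job_id, order_id, amount)
    @lock.synchronize do
      return :skipped if @processed.include?(job_id)

      @ledger[order_id] = @ledger.fetch(order_id, 0) + amount
      @processed << job_id
      :charged
    end
  end
end

ledger = {}
job = ChargeOrderJob.new(ledger)
job.perform("job-1", 42, 100)   # first delivery
job.perform("job-1", 42, 100)   # duplicate delivery after a retry
puts ledger[42]                 # prints 100
```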

WORK DISTRIBUTION STRATEGIES

SINGLE MACHINE

Distribute work to multiple worker threads or forked worker processes.

Work is easy to parallelize, but queued jobs are lost if the process restarts

Cannot distribute work to multiple machines this way

IPC (Inter-Process Communication) is difficult to do right

Big no-no for web apps (you want to offload work to a separate machine)
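For illustration, a single-process worker pool can be built from Ruby's thread-safe `Queue` and a handful of threads. `InProcessPool` is a hypothetical name; the sketch mainly shows that the jobs exist only in this one process's memory.

```ruby
# Hypothetical single-process worker pool built on Ruby's thread-safe
# Queue. Jobs live only in this process's memory: if the process
# restarts, anything still queued is lost.
class InProcessPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until something is available; :stop tells the
        # worker to exit. A job that raises would kill its thread here;
        # real libraries rescue, log and retry instead.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  def enqueue(&block)
    @jobs << block
  end

  # Queues one :stop marker per worker (after any pending jobs), then
  # waits for every worker to finish.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = InProcessPool.new(4)
10.times { |i| pool.enqueue { results << i * i } }
pool.shutdown
puts results.size   # prints 10
```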

MULTIPLE MACHINES

Distribute work directly over the network to workers on other machines

Ruby’s DRb can distribute work, but is unstable under high load

A dedicated messaging system can be used to distribute work reliably

Jobs are (usually) not persistent, so they can be lost if something crashes

PERSISTENT QUEUE

Workers pull jobs from a persistent backend queue

Suitable when many jobs need to be queued up and worked over time

Jobs can still be lost if workers crash or the database hiccups

“Reliable” queueing can recover jobs if workers crash
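A sketch of “reliable” queueing, assuming the common Redis pattern of atomically moving a job from the pending list to a per-worker in-progress list (Redis offers `RPOPLPUSH`, and `LMOVE` in newer versions, for this move). Here plain Ruby arrays behind a `Mutex` stand in for Redis, so nothing is actually persistent; `ReliableQueue` and its methods are hypothetical.

```ruby
# Hypothetical reliable queue. A claimed job is always in exactly one of
# the two lists (pending or in-progress), so a worker crash between
# claim and ack leaves it recoverable rather than lost.
class ReliableQueue
  def initialize
    @pending = []
    @in_progress = Hash.new { |hash, worker| hash[worker] = [] }   # worker_id => jobs
    @lock = Mutex.new   # makes each move atomic, as RPOPLPUSH/LMOVE is in Redis
  end

  def push(job)
    @lock.synchronize { @pending.push(job) }
  end

  # Atomically moves the oldest pending job to this worker's in-progress list.
  def claim(worker_id)
    @lock.synchronize do
      job = @pending.shift
      @in_progress[worker_id] << job if job
      job
    end
  end

  # Called after the job completes successfully.
  def ack(worker_id, job)
    @lock.synchronize { @in_progress[worker_id].delete(job) }
  end

  # Called once a worker is known to be dead: its unfinished jobs go back
  # to the front of the pending list. Returns how many were recovered.
  def recover(worker_id)
    @lock.synchronize do
      jobs = @in_progress.delete(worker_id) || []
      @pending.unshift(*jobs)
      jobs.size
    end
  end

  def pending_size
    @lock.synchronize { @pending.size }
  end
end

q = ReliableQueue.new
q.push("send-email-1")
q.push("send-email-2")
q.claim("worker-a")        # worker-a takes send-email-1, then crashes before ack
q.recover("worker-a")      # a supervisor notices and requeues its work
puts q.claim("worker-b")   # prints send-email-1
```

Detecting that a worker is dead (heartbeats, timeouts) is the hard part in practice and is left out of this sketch.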

CAPABILITIES OF A (GOOD) WORK QUEUE SYSTEM

RETRIES

Things go wrong all the time. You want jobs to be retried automatically.
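One common retry policy is exponential backoff: wait longer after each failure and give up after a fixed number of attempts. A minimal sketch; `with_retries` is a hypothetical helper, and the attempt limit and base delay are illustrative assumptions, not any library's defaults.

```ruby
# Runs the block, retrying with exponential backoff on StandardError.
# max_attempts and base_delay are illustrative values. `sleeper` is
# injectable so the delays can be checked without actually waiting.
def with_retries(max_attempts: 5, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts                 # give up and surface the error
    sleeper.call(base_delay * (2**(attempt - 1)))    # 1s, 2s, 4s, ...
    retry                                            # re-run the begin block
  end
end

delays = []
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise "flaky network" if attempt < 3   # fail twice, then succeed
  "done on attempt #{attempt}"
end
puts result   # prints: done on attempt 3
p delays      # prints: [1.0, 2.0]
```

Production retry schemes commonly add random jitter as well, so that many failed jobs do not all retry at the same moment.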

RELIABILITY

Messages / jobs should never be lost.

SCHEDULING

Schedule a job to run at a certain time instead of running immediately.
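A hypothetical scheduled-job set, assuming a polling design: jobs are kept ordered by their run-at time, and a poller periodically removes the ones that are due and hands them to the normal queue. The clock is injected so the example runs deterministically.

```ruby
# Hypothetical scheduled-job set, ordered by run-at time.
class Scheduler
  Entry = Struct.new(:run_at, :job)

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @entries = []   # kept sorted by run_at
  end

  # Binary-search for the insertion point so the list stays sorted.
  def schedule(job, at:)
    index = @entries.bsearch_index { |e| e.run_at > at } || @entries.size
    @entries.insert(index, Entry.new(at, job))
  end

  # Removes and returns every job whose run-at time has passed.
  def due_jobs
    now = @clock.call
    due = @entries.take_while { |e| e.run_at <= now }
    @entries.shift(due.size)
    due.map(&:job)
  end
end

now = 1_000.0
scheduler = Scheduler.new(clock: -> { now })
scheduler.schedule("daily-report", at: 1_060.0)
scheduler.schedule("reminder-email", at: 1_010.0)
p scheduler.due_jobs   # prints []
now = 1_030.0
p scheduler.due_jobs   # prints ["reminder-email"]
```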

STATUS

Report back to the application on the job’s completion percentage and whether it succeeded or failed.
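Status reporting can be sketched as a small store keyed by job ID that the worker writes and the application reads. `JobStatus` and `import_rows` are hypothetical; in a real deployment the store has to live in a shared backend (for example a Redis hash or a database row), not in one process's memory.

```ruby
# Hypothetical status store keyed by job ID. The Hash behind a Mutex
# stands in for a shared backend that other processes could read.
class JobStatus
  def initialize
    @statuses = {}
    @lock = Mutex.new
  end

  def update(job_id, state:, percent:)
    @lock.synchronize { @statuses[job_id] = { state: state, percent: percent } }
  end

  def fetch(job_id)
    @lock.synchronize { @statuses.fetch(job_id, { state: :unknown, percent: 0 }) }
  end
end

# Hypothetical job that reports its completion percentage after each
# row, then records whether it succeeded or failed.
def import_rows(job_id, rows, status)
  rows.each_with_index do |_row, i|
    # ... process the row here ...
    status.update(job_id, state: :working, percent: (i + 1) * 100 / rows.size)
  end
  status.update(job_id, state: :succeeded, percent: 100)
rescue StandardError
  status.update(job_id, state: :failed, percent: status.fetch(job_id)[:percent])
  raise
end

status = JobStatus.new
import_rows("import-7", %w[a b c d], status)
puts status.fetch("import-7")[:state]   # prints succeeded
```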

PRIORITY

If your queue fills up, important jobs might be waiting at the back of the queue. A priority queue allows important jobs to go to the top so they can be executed ASAP.
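A minimal strict-priority sketch, assuming a fixed set of named levels (`:critical`, `:default` and `:low` are made-up names): each level is its own FIFO, and a worker always drains higher levels first.

```ruby
# Hypothetical strict-priority queue: one FIFO per level, and pop always
# takes from the highest non-empty level first.
class PriorityJobQueue
  LEVELS = %i[critical default low].freeze   # highest priority first

  def initialize
    @queues = LEVELS.to_h { |level| [level, []] }
    @lock = Mutex.new
  end

  def push(job, priority: :default)
    raise ArgumentError, "unknown priority: #{priority}" unless @queues.key?(priority)

    @lock.synchronize { @queues[priority] << job }
  end

  # Returns the oldest job from the highest non-empty level, or nil if
  # every level is empty.
  def pop
    @lock.synchronize do
      level = LEVELS.find { |l| !@queues[l].empty? }
      level && @queues[level].shift
    end
  end
end

q = PriorityJobQueue.new
q.push("resize-thumbnail", priority: :low)
q.push("nightly-stats")
q.push("password-reset-email", priority: :critical)
order = Array.new(3) { q.pop }
p order   # prints ["password-reset-email", "nightly-stats", "resize-thumbnail"]
```

Strict ordering can starve low-priority jobs while the queue stays full; checking each level with a weighted probability is a common alternative.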

TYPES OF QUEUING BACKENDS

DEDICATED QUEUING SYSTEM

Backend built specifically for the purpose of queueing

Natively supports desired properties of queues

Gearman: One of the originals. Out of date, and not as fully featured as modern alternatives

Beanstalkd: Very fully featured and well-maintained

GENERAL-PURPOSE DATABASE

Simple to use if you’re already using a standard database

May not scale to massive / high-throughput workloads

SQL: May have locking / concurrency issues

Document Store: Probably won’t provide reliability

Redis: The Swiss Army knife of key-value stores, used by Resque and Sidekiq. Everything has to fit in memory.

MESSAGING SYSTEM

Provides generic message-passing capabilities (queues are just a special case)

Very scalable and high-throughput

Can be very complex to set up and use (topics, consumers, exchanges, brokers, OH MY)

ActiveMQ, RabbitMQ, ZeroMQ, HornetQ

Apache Kafka - distributed commit log

BATCH PROCESSING SYSTEM

MapReduce on huge volumes of data

Apache Hadoop

Apache Spark

Amazon Elastic MapReduce - hosted Hadoop
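To show the shape of a MapReduce program, here is the classic word count run in a single Ruby process. Hadoop and Spark spread the same map, shuffle and reduce phases across many machines; none of their APIs appear here, and the method names are hypothetical.

```ruby
# Map: emit a [word, 1] pair for every word in one document.
def map_phase(document)
  document.downcase.scan(/\w+/).map { |word| [word, 1] }
end

# Reduce: combine all the values emitted for one key.
def reduce_phase(_word, counts)
  counts.sum
end

def word_count(documents)
  pairs = documents.flat_map { |doc| map_phase(doc) }   # map every document
  grouped = pairs.group_by(&:first)                      # shuffle: group pairs by word
  grouped.to_h do |word, word_pairs|                     # reduce each group
    [word, reduce_phase(word, word_pairs.map(&:last))]
  end
end

counts = word_count(["the quick brown fox", "The lazy dog", "the end"])
puts counts["the"]   # prints 3
```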

REALTIME PROCESSING SYSTEM

Continual stream of input (firehose), need results within seconds or minutes

Apache Storm
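A realtime system emits results continuously instead of after a batch finishes. Below is a minimal sketch of one common building block, a tumbling-window counter, assuming events arrive in timestamp order; the class name, window size and event names are illustrative, and Storm itself is not used.

```ruby
# Hypothetical tumbling-window counter: events are bucketed into fixed,
# non-overlapping windows, and each window's counts are emitted as soon
# as an event from a later window arrives.
class TumblingWindowCounter
  def initialize(window_seconds, &on_window_closed)
    @window = window_seconds
    @on_closed = on_window_closed
    @current_start = nil
    @counts = Hash.new(0)
  end

  # Events must arrive in timestamp order in this simple sketch.
  def observe(timestamp, key)
    start = timestamp - (timestamp % @window)
    flush if @current_start && start != @current_start
    @current_start = start
    @counts[key] += 1
  end

  # Emits the counts for the open window and starts a fresh one.
  def flush
    return if @current_start.nil?

    @on_closed.call(@current_start, @counts)
    @counts = Hash.new(0)
    @current_start = nil
  end
end

results = []
counter = TumblingWindowCounter.new(60) { |start, counts| results << [start, counts] }
counter.observe(0, "login")
counter.observe(30, "login")
counter.observe(45, "signup")
counter.observe(70, "login")   # next window begins, so the first one is emitted
counter.flush                  # emit the last, still-open window
puts results.size              # prints 2
```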

THIRD-PARTY SERVICE

A reliable message queue offered as a service

Amazon SQS: Scalable, but very bare-bones (lacks a good Ruby worker client)

IronMQ / IronWorker

RUBY WORK QUEUE LIBRARIES

A backend isn’t very useful without a good worker library to run the jobs. Often the library can provide capabilities that the backend does not.

RESQUE VS SIDEKIQ

Resque forks workers, Sidekiq uses threads via Celluloid

Both use Redis for the backend and are mostly compatible with each other

Very fully featured (often via a separate gem)

Both come with a web UI that makes it easier to monitor job status

Sidekiq has a performance edge, and Sidekiq Pro offers reliability and batches

DELAYED JOB

Uses Active Record, so it is easy to plug into an existing Rails app

Fairly well supported in the community

Alternatives that take advantage of advanced PostgreSQL features: Queue Classic, Que, Toro

IN-MEMORY

Sucker Punch and Threaded In Memory Queue run workers in the same process (in background threads) and distribute the jobs directly to these workers.

HONORABLE MENTIONS

Sneakers - RabbitMQ

Backburner - Beanstalkd

TorqueBox Backgroundable (JRuby-only)

Qu - Supports multiple backends (Redis, MongoDB, SQS). Not as well maintained or fully featured.

ADAPTERS

You may want to change queueing backends / libraries without rewriting all your jobs.

MultiWorker - Adapts all the libraries mentioned in this presentation

ActiveJob - Built into Rails 4.2.0 (beta), but can be used as a separate gem
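The adapter idea in miniature: jobs are written against one small interface, and a swappable adapter decides how they run. Only the `perform_later` name mirrors ActiveJob; `JobQueue`, `InlineAdapter`, `ThreadAdapter` and `BaseJob` are hypothetical and are not the ActiveJob or MultiWorker API.

```ruby
# Runs the job immediately in the caller's thread (handy in tests).
class InlineAdapter
  def enqueue(job_class, *args)
    job_class.new.perform(*args)
  end
end

# Runs each job on a background thread and returns that Thread.
class ThreadAdapter
  def enqueue(job_class, *args)
    Thread.new { job_class.new.perform(*args) }
  end
end

# Hypothetical global setting that picks the adapter.
module JobQueue
  class << self
    attr_writer :adapter

    def adapter
      @adapter ||= InlineAdapter.new
    end
  end
end

# Jobs only know about perform_later; they never name a backend.
class BaseJob
  def self.perform_later(*args)
    JobQueue.adapter.enqueue(self, *args)
  end
end

class GreetJob < BaseJob
  def perform(name)
    "Hello, #{name}"
  end
end

puts GreetJob.perform_later("Ada")          # runs inline; prints Hello, Ada
JobQueue.adapter = ThreadAdapter.new
puts GreetJob.perform_later("Grace").value  # runs on a thread; prints Hello, Grace
```

Switching the adapter changes where jobs run without touching `GreetJob`.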