dopplr: it's made of messages - matt biddulph

Upload: carsonified-team

Post on 19-Aug-2015

TRANSCRIPT

DOPPLR

Where next?

made of messages
matt biddulph

I’m here today to talk to you about an important systems architecture concept that most people are going to need sooner or later.

&

This is my favourite thing about Unix: “Do it in the background.”

Why does that matter? Because of an important part of the Unix philosophy: “Write programs to work together.”

Modern web systems need gearing—because their parts don’t all run at the same speeds. It’s important to think about serverside architecture as an asynchronous system.

[Diagram labels: Send IMs, IM Bot, Receive IMs, Activity, Work, Precalc, Facebook, Web API, Flickr, Cronjob, Send Email Alerts, AMEE, Friendfeed, GMail, iCal Sources, Incoming Emails, Batch Calculations, Mr & Mrs Smith, Dopplr, Cache, Outgoing Emails]

Everything in this diagram is something that *doesn’t* happen as part of an HTTP request.

“The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queueing system to do the rest.”

Myles Grant
http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/

Enterprise Integration Patterns
by Gregor Hohpe, Bobby Woolf et al

Here comes the science bit. Concentrate...

This is a great book. Really. Ignore the name.

“Messaging enables high-speed, asynchronous, program-to-program communication with reliable delivery.”

Here’s a technical definition. Asynchronous: it's not just for the clientside.

It’s important to note that I’m talking about a mechanism that sits inside your systems architecture. Not internet-scale messaging such as IM, XMPP or Comet.

“A message is an atomic packet of data that can be transmitted on a channel.”

{"id"=>377826, "model"=>"Trip", "method"=>"calculate_coincidences!", "created_at"=>"2008-10-07T15:40:18"}

Here’s an actual message from inside the Dopplr system. It’s just a piece of JSON.
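As a rough illustration of how small such a message is, here is a minimal Ruby sketch that builds a hash with the same four keys as the slide and round-trips it through JSON. The helper name build_message is hypothetical and is not taken from the Dopplr codebase.

```ruby
require "json"

# Hypothetical helper: builds a hash with the same four keys as the
# message on the slide. Only the key names come from the talk.
def build_message(id, model, method_name, created_at = Time.now.utc)
  {
    "id" => id,
    "model" => model,
    "method" => method_name,
    "created_at" => created_at.strftime("%Y-%m-%dT%H:%M:%S")
  }
end

message = build_message(377826, "Trip", "calculate_coincidences!",
                        Time.utc(2008, 10, 7, 15, 40, 18))

wire = JSON.generate(message)  # the string that would travel on the channel
decoded = JSON.parse(wire)     # what a consumer would get back

puts wire
puts decoded == message
```

The message carries only an identifier and an instruction, not the trip itself, so whatever consumes it is assumed to look the record up for itself.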

“Does it matter—to this person, at this moment—whether it shows up simultaneously in a friend’s inbox, the public timeline, a global tag page, or even an RSS or Atom feed?”

Leslie Orchard
http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone

{"id"=>377826, "model"=>"Trip", "method"=>"calculate_coincidences!", "created_at"=>"2008-10-07T15:40:18"}

Back to the message. What do we do with it? It gets sent on a queue we call “Precalc” and it’s an instruction to do the heavy-lifting calculations for the coincidences on a trip.

This message is sent when someone adds a trip. We calculate the coincidences from their point of view to render the web page they see, but we save the calculations from everyone else’s point of view for background processing. Sending a message takes almost no time, so removing this processing from the HTTP request cycle gains a lot of subjective performance for the user.
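The split between the request and the background work might look roughly like the following sketch. It uses Ruby's in-process Queue as a stand-in for a real broker queue, and the names PRECALC, add_trip and run_worker are hypothetical, chosen only for illustration.

```ruby
require "json"

# In-process stand-in for a broker queue. A real deployment would send
# to a message broker instead; this only illustrates the shape.
PRECALC = Queue.new

# Hypothetical request handler: do the cheap work, enqueue the rest.
def add_trip(trip_id, queue = PRECALC)
  message = {
    "id" => trip_id,
    "model" => "Trip",
    "method" => "calculate_coincidences!"
  }
  queue << JSON.generate(message)  # sending takes almost no time
  "trip #{trip_id} saved"          # the response returns right away
end

# Hypothetical worker loop: pops messages until it sees the :stop sentinel.
def run_worker(queue, handled)
  while (raw = queue.pop) != :stop
    msg = JSON.parse(raw)
    # The heavy-lifting calculation would happen here.
    handled << "#{msg["method"]} on #{msg["model"]} #{msg["id"]}"
  end
end

handled = []
worker = Thread.new { run_worker(PRECALC, handled) }
puts add_trip(377826)
PRECALC << :stop
worker.join
puts handled.inspect
```

The handler never waits for the worker. It only pays the cost of serializing and enqueuing one small message.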

AMEE
http://amee.cc

We use AMEE to calculate the carbon impact of our travellers’ trips. AMEE have fast servers, but it still takes a while to upload a whole profile of trips from our database to theirs.

So in this case, not only do we send the trips data over our message queue, we also give the user feedback while doing so. We do this by passing a unique ID along with the instructions. This ID is used to make a memcache key, and progress reports are stored under that key. We then use a very cheap-to-serve memcache wrapper and have some JavaScript in the client poll our server every half a second (of course, we could also use Comet-style long polling for that part).
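The progress-reporting idea can be sketched as follows, with a plain Hash standing in for memcached and an in-process Queue standing in for the message queue. The key format and the names start_upload, process_upload and poll_progress are assumptions made for this example.

```ruby
require "json"
require "securerandom"

# A plain Hash stands in for memcached here; a real setup would use a
# memcache client. CACHE_LOCK guards it because threads share it.
CACHE = {}
CACHE_LOCK = Mutex.new

# Hypothetical key format derived from the unique ID.
def progress_key(job_id)
  "upload_progress:#{job_id}"
end

# Web side: create a unique ID, send it along with the instructions,
# and hand the same ID back to the client for polling.
def start_upload(trip_ids, queue)
  job_id = SecureRandom.hex(8)
  queue << JSON.generate({ "job_id" => job_id, "trip_ids" => trip_ids })
  job_id
end

# Worker side: process each trip and store a progress report under the key.
def process_upload(raw, cache = CACHE)
  msg = JSON.parse(raw)
  total = msg["trip_ids"].length
  msg["trip_ids"].each_with_index do |_trip_id, index|
    # The upload of one trip to the remote service would happen here.
    report = { "done" => index + 1, "total" => total }
    CACHE_LOCK.synchronize { cache[progress_key(msg["job_id"])] = report }
  end
end

# Polling endpoint: nothing more than a cheap cache read.
def poll_progress(job_id, cache = CACHE)
  CACHE_LOCK.synchronize { cache[progress_key(job_id)] } ||
    { "done" => 0, "total" => nil }
end
```

In this sketch the web side keeps no per-upload state of its own. The ID travels with the client, and the only shared state is the cache entry.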

The nice thing about this is that it’s stateless in our Rails app. The client has the unique ID to do the polling, so it

N>1
Joe Gregorio

http://bitworking.org/news/218/N-1

This is the hardest thing about scaling a new web app.

“The problem with current data storage systems, with rare exception, is that they are all "one box native" applications, i.e. from a world where N = 1. From Berkeley DB to MySQL, they were all designed initially to sit on one box.”

You can also use message topics for monitoring. Every time our Rails appservers finish a request, they broadcast a little packet of JSON describing the request just processed. It includes details about the client, the processing time taken, and the current user. We can dip into this stream at any time with a few lines of code.
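A topic differs from a queue in that every subscriber receives its own copy of each message. The sketch below imitates that in-process. The Topic class, the report_request hook and the field names are hypothetical, and a real system would rely on the broker's own topic support.

```ruby
require "json"

# Minimal in-process topic: every subscriber gets a copy of each message.
class Topic
  def initialize
    @subscribers = []
    @lock = Mutex.new
  end

  # Returns a private queue that receives everything published afterwards.
  def subscribe
    queue = Queue.new
    @lock.synchronize { @subscribers << queue }
    queue
  end

  def publish(payload)
    raw = JSON.generate(payload)
    @lock.synchronize { @subscribers.each { |q| q << raw } }
  end
end

# Hypothetical hook an appserver might call after finishing a request.
def report_request(topic, path:, user:, millis:)
  topic.publish({ "path" => path, "user" => user, "millis" => millis })
end

# Dipping into the stream: drain what has arrived so far and keep
# only the requests slower than the threshold.
def slow_requests(subscription, threshold_ms)
  slow = []
  until subscription.empty?
    event = JSON.parse(subscription.pop)
    slow << event if event["millis"] > threshold_ms
  end
  slow
end

topic = Topic.new
monitor = topic.subscribe
report_request(topic, path: "/trips", user: "alice", millis: 42)
report_request(topic, path: "/home", user: "bob", millis: 310)
puts slow_requests(monitor, 100).inspect
```

Publishers do not need to know who is listening, so a monitoring script can attach or detach without touching the appservers.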

In the cloud

Splitting up work across a cluster of machines is a great way to scale. It’s going to be a necessary part of any serious scale cloud computing. If you have the ability to fire up 100 new servers in a matter of minutes, you need a way to distribute work over them. Messages can encapsulate this work and provide a reliable delivery mechanism that’s low in coordination cost.
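The low coordination cost comes from workers simply competing for messages on a shared queue. Here is a minimal sketch of that pattern using threads in one process as a stand-in for separate machines. The function name run_pool and the squaring job are made up for the example.

```ruby
# Threads stand in for machines here; the queue stands in for the broker.
# Each job is taken by exactly one worker, whichever is free first.
def run_pool(jobs, worker_count, &work)
  queue = Queue.new
  jobs.each { |job| queue << job }
  worker_count.times { queue << :stop }  # one stop marker per worker

  results = Queue.new
  workers = Array.new(worker_count) do |worker_id|
    Thread.new do
      while (job = queue.pop) != :stop
        results << [worker_id, job, work.call(job)]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

outcome = run_pool((1..10).to_a, 3) { |n| n * n }
outcome.sort_by { |_worker, job, _result| job }.each do |worker, job, result|
  puts "worker #{worker} handled job #{job} and produced #{result}"
end
```

Adding capacity means starting more consumers on the same queue. The producer does not change, and which worker handles which job is not decided in advance.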

beanstalkd

starling

... and lots more

Here are some technologies to look into—mostly free software. At Dopplr we use ActiveMQ and it suits us just fine. It’s good that there are several to choose from, and I believe the most important thing is to think about serverside architecture as an asynchronous system. Choice of software is an implementation detail, mostly because of language-agnostic protocols like STOMP and AMQP that have libraries in most languages.
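To give a feel for why a text protocol like STOMP is easy to support from many languages, here is a sketch that builds a SEND frame by hand: a command line, header lines, a blank line, the body, and a terminating NUL byte. The destination name /queue/precalc is an assumption for illustration, and in practice a client library would handle framing and the connection.

```ruby
require "json"

# Builds a STOMP SEND frame as a string. This covers only the framing;
# connecting, subscribing and acknowledgements are left out.
def stomp_send_frame(destination, body)
  headers = {
    "destination" => destination,
    "content-type" => "application/json",
    "content-length" => body.bytesize.to_s
  }
  header_lines = headers.map { |name, value| "#{name}:#{value}" }.join("\n")
  "SEND\n#{header_lines}\n\n#{body}\0"
end

body = JSON.generate({ "id" => 377826, "model" => "Trip",
                       "method" => "calculate_coincidences!" })
frame = stomp_send_frame("/queue/precalc", body)  # hypothetical destination
puts frame.inspect
```

Because the frame is plain text with a JSON body, the producer and the consumer can be written in different languages without sharing any code.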

And I haven’t even mentioned the fun you can get into with Internet-scale messaging using XMPP, and making “push APIs”. See Rabble, Kellan and Blaine passim.

thank you
matt biddulph

[email protected]

special thanks to:
http://www.flickr.com/photos/riotjane/2880420762/
http://www.flickr.com/photos/hitherto/2365952880/
http://www.flickr.com/photos/sophie-/366116173/
http://www.flickr.com/photos/tim_d/155441805/
http://www.flickr.com/photos/nicholas_t/293413649/
http://www.flickr.com/photos/ennor/527510576/
