Spotify services (SDC 2013)

The whole is greater than the sum of the parts. Spotify services. Niklas Gustavsson. Monday, 27 May 2013.

Post on 17-Oct-2014

Page 1: Spotify services (SDC 2013)

The whole is greater than the sum of the parts
Spotify services
Niklas Gustavsson

Monday, 27 May 2013

Page 2: Spotify services (SDC 2013)

Me

Distributed systems geek
Spotify since [email protected]
@protocol7

Page 3: Spotify services (SDC 2013)

Last year

Architectural overview
Lots of questions!

Page 4: Spotify services (SDC 2013)

Today

Spotify has more than a hundred backend services. They handle enormous amounts of data. They should always be available. How are they built?

Page 5: Spotify services (SDC 2013)

In praise of small services

Page 6: Spotify services (SDC 2013)

In praise of small services

A small code base is simpler to understand and reason about
Doing one thing and one thing only means no compromises

[Diagram: nodes labeled C, AP, and S]

Page 7: Spotify services (SDC 2013)

“Rule of Modularity: Developers should build a program out of simple parts connected by well defined interfaces, so problems are local, and parts of the program can be replaced in future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable.”

Eric S. Raymond, The Art of Unix Programming

Page 8: Spotify services (SDC 2013)

Decouple

“Decouple until it breaks, and then back off just a little”
Strive to make services autonomous
Watch your latency, though the added latency is commonly not significant

[Diagram: nodes labeled C, AP, and S]

Page 9: Spotify services (SDC 2013)

Simple codebases

Use scaffolding to quickly get the basic service structure
Reuse in libraries
Don’t overuse patterns. Don’t use layers upon layers. Keep it simple

Page 10: Spotify services (SDC 2013)

Languages and runtimes

We build services in Python and Java
Python is awesome for quick development and beautiful code
The JVM is stable, performant and transparent

Page 11: Spotify services (SDC 2013)

Performance at scale

Page 12: Spotify services (SDC 2013)

Performance at scale

Care about your performance. Set clear goals. Measure, measure, measure.
Have an architecture that allows for scale. Build out as needed. Measure, measure, measure.

http://www.bbc.co.uk/programmes/b01qzdc1

Page 13: Spotify services (SDC 2013)

Prefer stateless services

Prefer stateless services when possible
They scale out linearly
Isolate mutating operations

Page 14: Spotify services (SDC 2013)

Efficient protocols

Fast, efficient, RESTful protocols
Connection pools are hard. Overloaded TCP servers are complicated
Use queues. Proper pushback. Naturally asynchronous.

Page 16: Spotify services (SDC 2013)

Hermes

ZeroMQ. Light-weight, fast as hell, queue based
Protobuf. Small, fast, schema-based, simple binary format
Request-reply and pub/sub

Page 17: Spotify services (SDC 2013)

Drop requests

Don’t be afraid to drop requests (and replies) when overloaded
Use shallow queues
Use short timeouts
Use small thread pools
Use small connection pools
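The shallow-queue idea can be sketched in a few lines of Python. This is only an illustration, not Spotify's implementation: the class name `SheddingQueue`, the depth of 8 and the 50 ms timeout are all made-up values. The point is that a full queue rejects new work immediately instead of letting latency build up.

```python
import queue


class SheddingQueue:
    """A shallow request queue that rejects new work once it is full (hypothetical helper)."""

    def __init__(self, max_depth=8):
        # A small bound keeps queueing delay short; 8 is an arbitrary example value.
        self._queue = queue.Queue(maxsize=max_depth)
        self.dropped = 0

    def offer(self, request):
        """Enqueue a request, or drop it immediately if the queue is full."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self, timeout=0.05):
        """Return the next request, or None if nothing arrives within the short timeout."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


def demo():
    """Offer five requests to a queue of depth two, then drain it."""
    q = SheddingQueue(max_depth=2)
    accepted = [q.offer(n) for n in range(5)]
    return accepted, q.dropped, q.take(), q.take(), q.take(timeout=0.01)
```

In `demo()` the first two requests are accepted and the other three are dropped straight away, so a caller gets a fast rejection rather than a slow timeout.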

Page 18: Spotify services (SDC 2013)

Page 19: Spotify services (SDC 2013)

Scaling storage

We use the best tool for each case from a small, carefully selected set of options
PostgreSQL as the default mutable storage
Cassandra for large scale (heavy writes) or multi-site services
Various read-only key-value stores
http://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/

Page 20: Spotify services (SDC 2013)

Always fail, never fail

Page 21: Spotify services (SDC 2013)

Always fail, never fail

Stuff is always broken. Deal with it.
Always design for redundancy
Always keep an eye on your world
Don’t DDoS yourself

Page 22: Spotify services (SDC 2013)

Many commodity servers

Build your system to run on multiple servers
Use service discovery everywhere. We use DNS SRV records.
Make deployment and configuration automated and repeatable
Make sure your service is actually running
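As a rough illustration of SRV-based discovery, the sketch below picks one endpoint from a set of already-resolved SRV records: lowest priority first, then a weighted random choice. It is a simplification of the selection described in RFC 2782, not the code Spotify runs. The DNS lookup itself is left out, and the `SrvRecord` tuple, the `pick_endpoint` helper and the example hostnames are hypothetical.

```python
import random
from collections import namedtuple

# Fields mirror a DNS SRV record; the lookup that produces them is not shown.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def pick_endpoint(records, rng=random):
    """Pick one record: lowest priority wins, ties are broken by weight."""
    if not records:
        raise LookupError("no SRV records for service")
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example hostnames are invented for illustration only.
EXAMPLE_RECORDS = [
    SrvRecord(10, 5, 8080, "svc-a.example.net"),
    SrvRecord(10, 0, 8080, "svc-b.example.net"),
    SrvRecord(20, 100, 8080, "svc-c.example.net"),
]
```

Calling `pick_endpoint(EXAMPLE_RECORDS)` never returns the priority-20 record while a priority-10 record exists, which is how SRV records express primary and fallback servers.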

Page 23: Spotify services (SDC 2013)

Measure everything

Instrument your code with metrics everywhere
We use our own library for Python and http://metrics.codahale.com for Java
Monitor your infrastructure. JVMs, OS, network, storage
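To make "instrument your code" concrete, here is a minimal in-process metrics sketch in Python. It is not Spotify's internal library, which the slides do not describe; the `Metrics` class and its method names are assumptions chosen for illustration. It keeps counters and timing samples in memory and reports a simple percentile.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """In-memory counters and timers (hypothetical, for illustration only)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, amount=1):
        """Increase a named counter."""
        self.counters[name] += amount

    @contextmanager
    def timer(self, name):
        """Record how long the wrapped block takes, even if it raises."""
        start = time.monotonic()
        try:
            yield
        finally:
            self.timings[name].append(time.monotonic() - start)

    def percentile(self, name, fraction):
        """Return the sample at the given fraction (0..1), or None without data."""
        samples = sorted(self.timings[name])
        if not samples:
            return None
        index = min(len(samples) - 1, int(fraction * len(samples)))
        return samples[index]
```

A handler would wrap its work in `with metrics.timer("lookup"):` and call `metrics.incr("requests")`. A real setup would also ship these values to a graphing system instead of keeping them in memory.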

Page 24: Spotify services (SDC 2013)

Graph

Graph your important metrics, and strive for a latency of seconds from measurement to graph
We use a heavily extended derivative of Munin

Page 25: Spotify services (SDC 2013)

Log what’s important

It is hard to know beforehand what matters, so err on the side of logging too much (within reason)
Use a structured format
Use syslog
Collect your logs in a central place
Store your logs and make them analyzable
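One possible shape for structured logging, using only the Python standard library, is shown below. The slides do not say which format Spotify uses, so JSON is an assumption here, as are the `JsonFormatter` and `build_logger` names and the `fields` key for extra data.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields are passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name, handler):
    """Attach the JSON formatter to a handler and return a ready logger."""
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger
```

To send these lines to syslog, pass `logging.handlers.SysLogHandler(address="/dev/log")` as the handler on a Linux host; a `logging.StreamHandler` works for local testing. A call then looks like `logger.info("request served", extra={"fields": {"service": "example", "ms": 12}})`.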

Page 26: Spotify services (SDC 2013)

Automate deployment

Consistently build to some form of packages. Keep track of dependencies
We build everything* to Debian packages and use package dependencies
Debian is awesome. Use it.

* Except Maven dependencies

Page 27: Spotify services (SDC 2013)

Automate configuration

Keep everything under version control
Use a provisioning tool
We use Puppet and store every configuration in Git. Everything*.
250 modules, 880 classes

* Everything

Page 28: Spotify services (SDC 2013)

Development

Trust your developers and ops. Let your teams be autonomous
Long-term ownership
Minimize interruptions (aka meetings)
Favor asynchronous communication. We coordinate over IRC and use mail
Ship.

Page 29: Spotify services (SDC 2013)

We’re hiring → spotify.com/jobs ([email protected])
Questions?
