Spotify: Playing for millions, tuning for more

Barcelona Developers Conference, Friday 18 November 2011. Playing for millions, tuning for more. David Poblador i Garcia (@davidpoblador) and Nick Barkas (@snb).

Upload: nick-barkas

Post on 28-Nov-2014

DESCRIPTION

Barcelona Developers Conference presentation by Nick Barkas and David Poblador i Garcia, 18 November 2011. How we manage a huge collection of servers and some of the technologies we use for building a scalable, high performance music streaming service.

TRANSCRIPT

Page 1: Spotify: Playing for millions, tuning for more

Barcelona Developers Conference - 18 November 2011

Playing for millions, tuning for more

David Poblador i Garcia - @davidpoblador

Nick Barkas - @snb

Friday 18 November 2011

Page 2: Spotify: Playing for millions, tuning for more

Spotifiera (Swedish: "to Spotify"), anyone?

Page 3: Spotify: Playing for millions, tuning for more

Outline

Growth

Deploying lots of servers

Backend architecture overview

Communication protocols

Storage

Monitoring

Future improvements

Page 4: Spotify: Playing for millions, tuning for more

We’re kind of big

Over ten million registered users

Over two million paying subscribers

Launched in 12 countries

Over 15 million tracks*

Over 400 million playlists

Three datacentres

Over 1300 servers

* Number of tracks licensed globally. Catalogue size varies in each country.

Page 5: Spotify: Playing for millions, tuning for more

We’re getting bigger!

More countries

• Added US (July) and Denmark (October) this year

• Austria, Switzerland, and Belgium added this week

More users

• Sign-up via Facebook

• From one to two million paying subscribers in six months

More music!

• Adding over 20,000 tracks each day

Page 6: Spotify: Playing for millions, tuning for more

How to manage so many servers?

ServerDB

FAI

Debian packaging

Puppet (yes, we also hate it sometimes)

Monitoring

Page 7: Spotify: Playing for millions, tuning for more

ServerDB

In-house tool

An authoritative database of equipment

• Locations

• Datacentres

• Hostnames

Aiming to make it the single source of information for

• DNS config

• What server does what

• Puppet classes

• FAI classes
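
As a rough illustration of the single-source idea, the sketch below derives DNS records and Puppet node definitions from one inventory. The talk does not describe ServerDB's schema or interfaces, so the `Host` fields, hostnames, addresses, and class names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # All fields are hypothetical; the talk does not describe ServerDB's schema.
    hostname: str
    ip: str
    datacentre: str
    puppet_classes: tuple = ()


def dns_a_records(hosts, ttl=3600):
    """Render one zone-file style A record per host."""
    return [f"{h.hostname}. {ttl} IN A {h.ip}" for h in hosts]


def puppet_nodes(hosts):
    """Render a Puppet node block per host from the same inventory."""
    blocks = []
    for h in hosts:
        includes = "".join(f"  include {c}\n" for c in h.puppet_classes)
        blocks.append(f"node '{h.hostname}' {{\n{includes}}}")
    return blocks


# Example inventory with made-up hosts in documentation address space.
inventory = [
    Host("frob1.example.com", "192.0.2.10", "dc1", ("base", "frobnicator")),
    Host("frob2.example.com", "192.0.2.11", "dc2", ("base", "frobnicator")),
]

for line in dns_a_records(inventory):
    print(line)
for block in puppet_nodes(inventory):
    print(block)
```

Because both outputs are generated from the same list, a host added to the inventory shows up in DNS and in Puppet without a second manual edit.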

Page 8: Spotify: Playing for millions, tuning for more

FAI and Puppet

FAI installs all the basic stuff on TFTP boot

• Partitions based on server type (and FAI class)

• Installs base packages (.deb, of course)

• Sets the basic network configuration

• Bootstraps Puppet

Puppet takes over

• Installs packages based on Puppet recipes

• Our devs write Puppet manifests

• We hate it (sometimes)

Page 9: Spotify: Playing for millions, tuning for more

Let’s install a server!

Page 10: Spotify: Playing for millions, tuning for more

Overview of Spotify components

accesspoint

storage

search

playlist

user

web api

browse

...

Backend services

Clients

www.spotify.com

ads

social

key

Facebook

Amazon S3

CDN

Content ingestion, indexing, and transcoding

Log analysis (Hadoop)

Record labels

Page 11: Spotify: Playing for millions, tuning for more

Reducing bandwidth: P2P and caching

Page 12: Spotify: Playing for millions, tuning for more

DNS: finding services and resources

What’s the hostname and port for the service I want?

• SRV record:

_frobnicator._http.example.com. 3600 SRV 10 50 8081 frob1.example.com.

Fields: name, TTL, record type (SRV), priority, weight, port, host
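
To make the field layout concrete, here is a minimal sketch that parses records written like the SRV example on this slide and picks an instance. It uses a simplified form of the selection described in RFC 2782 (lowest priority value first, then a weighted random choice). It does no real DNS lookups, and the `frob2` and `frob3` hosts are hypothetical additions.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    host: str


def parse_srv(line):
    """Parse 'name ttl SRV prio weight port host' as laid out on the slide."""
    name, ttl, rtype, prio, weight, port, host = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(prio), int(weight), int(port), host.rstrip("."))


def pick_instance(records, rng=random):
    """Choose among the lowest-priority records, weighted by the weight field."""
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


records = [
    parse_srv("_frobnicator._http.example.com. 3600 SRV 10 50 8081 frob1.example.com."),
    # frob2 and frob3 are hypothetical extra instances added for illustration.
    parse_srv("_frobnicator._http.example.com. 3600 SRV 10 50 8081 frob2.example.com."),
    parse_srv("_frobnicator._http.example.com. 3600 SRV 20 100 8081 frob3.example.com."),
]
chosen = pick_instance(records)
print(f"{chosen.host}:{chosen.port}")
```

With these records, `frob3` is only a fallback: it has a higher priority value, so it is never chosen while a priority-10 record is present.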

Which service instance should I ask for a resource?

• Distributed hash tables (DHT). Ring configuration:

config._frobnicator._http.example.com. 3600 TXT "slaves=0"

config._frobnicator._http.example.com. 3600 TXT "slaves=2 redundancy=host"

• Mapping ring segment to service instance:

tokens.8081.frob1.example.com. 3600 TXT "00112233445566778899aabbccddeeff"
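
The ring lookup could work roughly as sketched below. The slide gives only the token format (32 hex digits, so 128 bits). The hash function (MD5), the rule that a key belongs to the first token at or after its hash, and the extra instances are assumptions made for illustration. The `slaves` and `redundancy` settings are not modelled.

```python
import bisect
import hashlib


class Ring:
    def __init__(self, tokens_by_instance):
        # Sort (token, instance) pairs by the token's integer value.
        self._ring = sorted(
            (int(token, 16), instance)
            for instance, token in tokens_by_instance.items()
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key):
        """Return the instance whose token is the first at or after hash(key)."""
        # MD5 is an assumed choice; it simply yields a 128-bit value like the tokens.
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        # Wrap around to the first token when the hash is past the last one.
        i = bisect.bisect_left(self._tokens, h) % len(self._ring)
        return self._ring[i][1]


ring = Ring({
    "frob1.example.com:8081": "00112233445566778899aabbccddeeff",
    # Hypothetical extra instances and tokens.
    "frob2.example.com:8081": "55555555555555555555555555555555",
    "frob3.example.com:8081": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
})
print(ring.owner("spotify:track:example"))
```

Every client that reads the same token records computes the same owner for a key, so no central lookup service is needed at request time.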

Page 13: Spotify: Playing for millions, tuning for more

Communication between services

Clients -> AP (access point): proprietary protocol

AP -> service and service <-> service

• HTTP

‣ Originally all services used this

‣ Simple, well known, battle tested

‣ Each service defines its own (usually) RESTful protocol

• Splat: Service Platform

‣ Custom-built by Spotify devs

‣ Protocol defined with Thrift

‣ Provides replication and load balancing

Page 14: Spotify: Playing for millions, tuning for more

New communication framework: hermes

Thin layer on top of ØMQ

Data in messages are serialized as protobuf

• Services define their APIs partly as protobuf messages

Hermes messages embedded in client <-> AP protocol

• AP doesn’t need to translate protocols; acts as ØMQ router

In addition to request/reply, we get pub/sub
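
The two messaging patterns can be illustrated with a toy in-process model. This is not hermes and does not use ØMQ or protobuf: the `ToyRouter` class, the service name, and the topic strings are hypothetical, and raw bytes stand in for serialized messages. It shows only that a router can forward payloads without decoding them, and that subscribers receive messages whose topic starts with their subscribed prefix.

```python
from collections import defaultdict


class ToyRouter:
    """In-process stand-in for a message router; not ØMQ and not hermes."""

    def __init__(self):
        self._services = {}                    # service name -> request handler
        self._subscribers = defaultdict(list)  # topic prefix -> callbacks

    def register(self, service, handler):
        self._services[service] = handler

    def request(self, service, payload: bytes) -> bytes:
        # The router looks only at the service name; the payload stays opaque bytes.
        return self._services[service](payload)

    def subscribe(self, prefix, callback):
        self._subscribers[prefix].append(callback)

    def publish(self, topic, payload: bytes):
        # Deliver to every subscriber whose prefix matches the topic.
        for prefix, callbacks in self._subscribers.items():
            if topic.startswith(prefix):
                for cb in callbacks:
                    cb(topic, payload)


router = ToyRouter()
router.register("playlist", lambda payload: b"reply:" + payload)

received = []
router.subscribe("playlist/", lambda topic, payload: received.append((topic, payload)))

print(router.request("playlist", b"get 123"))
router.publish("playlist/123", b"changed")
router.publish("user/42", b"ignored by the playlist subscriber")
print(received)
```

Request/reply returns a value to the caller, while publish returns nothing and may reach zero or many subscribers.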

Page 15: Spotify: Playing for millions, tuning for more

Storage technologies

Critical, consistency important: PostgreSQL

• User info required for authentication

Huge, growing, eventual consistency OK: Cassandra

• Playlists, other user info, social

Fast, small, read-only key-value: Tokyo Cabinet

• Track/artist/album metadata, encryption keys

Large files, read-only: Nginx caching proxy + Amazon S3

• Music files, album cover art
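
The caching-proxy arrangement for large read-only files can be sketched as a read-through cache in front of a slow origin. This is a generic illustration, not Nginx's actual cache logic: the least-recently-used eviction, the item-count limit, and the `fake_origin` function are assumptions chosen for brevity. Because the files never change, the cache needs no invalidation.

```python
from collections import OrderedDict


class ReadThroughCache:
    def __init__(self, fetch_from_origin, max_items=2):
        self._fetch = fetch_from_origin
        self._max_items = max_items
        self._items = OrderedDict()
        self.origin_hits = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        # Cache miss: fetch from the origin and remember the result.
        self.origin_hits += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return value


def fake_origin(key):
    # Stand-in for a slow object-store read; returns fabricated bytes.
    return f"contents of {key}".encode()


cache = ReadThroughCache(fake_origin, max_items=2)
cache.get("track-a")
cache.get("track-a")   # served from cache
cache.get("track-b")
cache.get("track-c")   # evicts track-a
cache.get("track-a")   # fetched from the origin again
print(cache.origin_hits)
```

Popular files stay in the cache and are served locally, so only the less popular tail of requests reaches the origin store.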

Page 16: Spotify: Playing for millions, tuning for more

Monitoring

We graph all our systems

• Munin plugins to collect data

‣ Server related figures (CPU, disk...)

‣ Systems related figures (latency, playbacks...)

• We use our own frontend to display the data

Alerts are handled using Zabbix

• We classify alerts by severity

• High severity alerts are delivered to our pagers

‣ Currently we only get a handful per week
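
Severity-based alert routing might look like the sketch below. It does not use Zabbix or its API. The severity levels, metric names, and threshold values are hypothetical, since the talk gives none, and thresholds are assumed to rise with severity.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    HIGH = 3


def classify(metric, value, thresholds):
    """Return the highest severity whose threshold the value meets, or None."""
    result = None
    for severity, limit in sorted(thresholds[metric].items()):
        if value >= limit:
            result = severity
    return result


def route(severity):
    """High severity pages someone; everything else only goes to a dashboard."""
    if severity is None:
        return "none"
    return "pager" if severity >= Severity.HIGH else "dashboard"


# Hypothetical thresholds; the talk gives no actual values.
THRESHOLDS = {
    "request_latency_ms": {Severity.WARNING: 200, Severity.HIGH: 1000},
    "disk_used_percent": {Severity.INFO: 70, Severity.WARNING: 85, Severity.HIGH: 95},
}

for metric, value in [("request_latency_ms", 1500), ("disk_used_percent", 88)]:
    sev = classify(metric, value, THRESHOLDS)
    print(metric, value, sev.name, route(sev))
```

Keeping the pager threshold separate from the warning threshold is what keeps the number of pages low: most deviations are recorded without waking anyone.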

Page 17: Spotify: Playing for millions, tuning for more

Future (and current) challenges

Self-recovery

• Diagnose

• Take measures

Auto notification

• Do not bother ops, bother our suppliers

Auto scaling

• Bring up new servers

Better way to register services than DNS

• ZooKeeper? Faster to update, always consistent

Page 18: Spotify: Playing for millions, tuning for more

Thank you!

Questions?

Nick Barkas @snb

David Poblador i Garcia @davidpoblador
