Barcelona Developers Conference - 18 November 2011
Playing for millions, tuning for more
David Poblador i Garcia - @davidpoblador
Nick Barkas - @snb
Spotifiera anyone?
Outline
Growth
Deploying lots of servers
Backend architecture overview
Communication protocols
Storage
Monitoring
Future improvements
We’re kind of big
Over ten million registered users
Over two million paying subscribers
Launched in 12 countries
Over 15 million tracks*
Over 400 million playlists
Three datacentres
Over 1300 servers
* Number of tracks licensed globally. Catalogue size varies in each country.
We’re getting bigger!
More countries
• Added US (July) and Denmark (October) this year
• Austria, Switzerland, and Belgium added this week
More users
• Sign-up via Facebook
• From one to two million paying subscribers in six months
More music!
• Adding over 20,000 tracks each day
How to manage so many servers?
ServerDB
FAI
Debian Packaging
Puppet (yes, we also hate it sometimes)
Monitoring
ServerDB
In-house tool
An authoritative database of equipment
• Locations
• Datacentres
• Hostnames
Aiming to have it as the single source of truth for
• DNS config
• What server does what
• Puppet classes
• FAI classes
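ServerDB itself is in-house and not public, so purely to illustrate the idea, a record tying one machine to everything derived from it might look like this Python sketch (all field names are invented):

    # Hypothetical ServerDB entry: one authoritative record per machine,
    # from which DNS config, Puppet classes, and FAI classes are derived.
    # All field names are invented; the real in-house schema is not public.
    server = {
        "hostname": "frob1.example.com",
        "datacentre": "lon2",                      # physical location
        "rack": "a12",
        "role": "frobnicator",                     # what the server does
        "puppet_classes": ["base", "frobnicator"],
        "fai_class": "FROBNICATOR",                # partitioning + base install
    }

    # Everything else is generated from this single source of truth.
    print("%(hostname)s in %(datacentre)s runs %(role)s" % server)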
FAI and Puppet
FAI installs all the basic stuff on TFTP boot
• Partitions based on server type (and FAI class)
• Installs base packages (.deb, of course)
• Sets the basic network configuration
• Bootstraps Puppet
Puppet takes over
• Installs packages based on Puppet recipes
• Our devs write Puppet manifests
• We hate it (sometimes)
Let’s install a server!
Overview of Spotify components
[Architecture diagram: clients connect to access points, behind which sit the backend services (storage, search, playlist, user, web api, browse, ads, social, key, ...). Record labels feed content ingestion, indexing, and transcoding; music files are served via a CDN and Amazon S3. www.spotify.com and Facebook integrate with the backend, and logs flow into Hadoop for analysis.]
Reducing bandwidth: P2P and caching
DNS: finding services and resources
What's the hostname and port for the service I want?
• SRV record:
  _frobnicator._http.example.com. 3600 SRV 10 50 8081 frob1.example.com.
  (fields: name, TTL, priority, weight, port, host)
Which service instance should I ask for a resource?
• Distributed hash tables (DHT). Ring configuration:
  config._frobnicator._http.example.com. 3600 TXT "slaves=0"
  config._frobnicator._http.example.com. 3600 TXT "slaves=2 redundancy=host"
• Mapping ring segment to service instance:
  tokens.8081.frob1.example.com. 3600 TXT "00112233445566778899aabbccddeeff"
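These zone-file snippets translate directly into ordinary DNS queries. A minimal sketch of both lookups, assuming the dnspython library and the slide's placeholder _frobnicator names:

    # Minimal sketch of the DNS-based discovery above, using dnspython.
    # The _frobnicator._http.example.com names are the slide's placeholder
    # examples, not real Spotify services.
    import dns.resolver

    def find_service_instances(service="_frobnicator._http.example.com"):
        """Return (host, port) pairs from SRV records, highest priority first
        (proper weighted random selection within a priority is omitted)."""
        answers = dns.resolver.resolve(service, "SRV")
        ordered = sorted(answers, key=lambda r: (r.priority, -r.weight))
        return [(str(r.target).rstrip("."), r.port) for r in ordered]

    def ring_tokens(host, port):
        """Read the DHT tokens an instance owns from its tokens TXT record."""
        answers = dns.resolver.resolve("tokens.%d.%s" % (port, host), "TXT")
        return [b"".join(r.strings).decode() for r in answers]

    for host, port in find_service_instances():
        print(host, port, ring_tokens(host, port))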
Communication between services
Clients -> AP: proprietary protocol
AP -> service and service <-> service
• HTTP
‣ Originally all services used this
‣ Simple, well known, battle tested
‣ Each service defines its own (usually) RESTful protocol (toy sketch after this list)
• Splat: Service Platform
‣ Custom-built by Spotify devs
‣ Protocol defined with Thrift
‣ Provides replication and load balancing
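As a toy illustration of that per-service HTTP style (the playlist URL scheme and host below are invented for this example, not a real Spotify endpoint):

    # Toy illustration of a per-service RESTful HTTP protocol. The host,
    # port, and URL layout are invented; each real service defines its own.
    import json
    import urllib.request

    def get_playlist(playlist_id, host="playlist1.example.com", port=8081):
        url = "http://%s:%d/playlist/%s" % (host, port, playlist_id)
        with urllib.request.urlopen(url) as response:
            return json.load(response)

A caller would first resolve the host and port with the SRV lookup shown earlier, then issue a plain HTTP request.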
New communication framework: hermes
Thin layer on top of ØMQ
Data in messages are serialized as protobuf
• Services define their APIs partly as protobuf messages
Hermes messages embedded in client <-> AP protocol
• AP doesn’t need to translate protocols; acts as ØMQ router
In addition to request/reply, we get pub/sub
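A minimal sketch of the request/reply half of such a setup with pyzmq; the socket layout is the standard ØMQ broker pattern, and the plain-bytes payload stands in for what would really be a protobuf-serialized Hermes message:

    # Request/reply through a router, sketched with pyzmq. In Hermes the
    # payload would be a protobuf message defined by the service's API;
    # plain bytes are used here to keep the sketch self-contained.
    import threading
    import zmq

    ctx = zmq.Context.instance()

    def service():
        # A backend service (e.g. playlist): answer one request.
        sock = ctx.socket(zmq.REP)
        sock.connect("tcp://127.0.0.1:5560")
        request = sock.recv()
        sock.send(b"reply to " + request)

    def access_point():
        # The AP does no protocol translation; it just routes frames
        # between clients and services.
        frontend = ctx.socket(zmq.ROUTER)
        frontend.bind("tcp://127.0.0.1:5559")
        backend = ctx.socket(zmq.DEALER)
        backend.bind("tcp://127.0.0.1:5560")
        zmq.proxy(frontend, backend)

    threading.Thread(target=access_point, daemon=True).start()
    threading.Thread(target=service, daemon=True).start()

    client = ctx.socket(zmq.REQ)
    client.connect("tcp://127.0.0.1:5559")
    client.send(b"get playlist 42")
    print(client.recv())  # b'reply to get playlist 42'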
Storage technologies
Critical, consistency important: PostgreSQL
• User info required for authentication
Huge, growing, eventual consistency OK: Cassandra
• Playlists, other user info, social
Fast, small, read-only key-value: Tokyo Cabinet
• Track/artist/album metadata, encryption keys
Large files, read-only: Nginx caching proxy + Amazon S3
• Music files, album cover art
Monitoring
We graph all our systems
• Munin plugins to collect data (minimal plugin sketch below)
‣ Server-related figures (CPU, disk...)
‣ System-related figures (latency, playbacks...)
• We use our own frontend to display the data
Alerts are handled using Zabbix
• We classify alerts by severity
• High severity alerts are delivered to our pagers
‣ Currently we only get a handful per week
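Munin plugins are just executables that print graph metadata when called with "config" and current values otherwise. A minimal sketch of one, with an invented playbacks-per-minute metric:

    #!/usr/bin/env python
    # Minimal Munin plugin sketch. The playbacks-per-minute metric is
    # invented for illustration; a real plugin would read the figure
    # from the service or its logs.
    import sys

    def read_playbacks_per_minute():
        return 12345  # placeholder value

    if len(sys.argv) > 1 and sys.argv[1] == "config":
        # Munin asks for graph metadata first.
        print("graph_title Playbacks per minute")
        print("graph_category spotify")
        print("playbacks.label playbacks/min")
    else:
        # Then polls for the current value.
        print("playbacks.value %d" % read_playbacks_per_minute())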
Future (and current) challenges
Self-recovery
• Diagnose
• Take measures
Auto notification
• Do not bother ops, bother our suppliers
Auto scaling
• Bring up new servers
Better way to register services than DNS
• ZooKeeper? Faster to update, always consistent
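The attraction of the ZooKeeper idea is ephemeral registration: a node vanishes when its owner dies, so lookups stay current without waiting on DNS TTLs. A hedged sketch using the kazoo client library (the /services path layout is invented):

    # Sketch of service registration in ZooKeeper via the kazoo library.
    # The /services path layout and instance naming are invented.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")
    zk.start()

    # Ephemeral + sequential: the node disappears automatically if this
    # process loses its ZooKeeper session, and each instance gets a
    # unique name, so registrations never go stale.
    zk.create("/services/frobnicator/instance-",
              b"frob1.example.com:8081",
              ephemeral=True, sequence=True, makepath=True)

    # Clients discover live instances by listing the children.
    print(zk.get_children("/services/frobnicator"))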
Gràcies! (Thank you!)
Preguntes? (Questions?)
Nick Barkas - @snb
David Poblador i Garcia - @davidpoblador