Spotify infrastructure
Behind the scenes
Javier Ubillos, 2013-11-25
What is Spotify?
► Lightweight on-demand streaming
► Over 24 Million active users
► Over 6 Million subscribers
► 20 Million tracks (fully licensed; catalogue size varies per country)
► Available in 32 countries
► Convenient (more so than piracy)
► Legal
More numbers
► Over 20,000 new tracks added each day
► Over 1 billion playlists created
► Over $500M has now been paid to rights holders since launch in October 2008
► Spotify is the overall number two digital service in Europe, after iTunes. (IFPI Digital Music Report, Jan 2011)
Work setting
► Spotify's version of Scrum or Kanban
► Three-week sprints (varies greatly; shorter is encouraged)
► Features are developed in squads of ~3 to ~10 people
► Each squad should feel like a mini start-up company
► Tech offices in Stockholm, Gothenburg, New York and San Francisco
► Hack days: 1.5 days every sprint (~10% of work time)
About me
► Started at Spotify in 2011 as a backend-infrastructure developer
► Previously a PhD student in computer communications at Mälardalen University (MDH), employed at the Swedish Institute of Computer Science (SICS)
► Mainly research at (OSI) layer 3
► Now working as a "technical product owner" for the team developing our tools for operational monitoring
► Create the alignment necessary to give the squad the autonomy and freedom to act and develop (both technically and personally)
The infrastructure
[Diagram: clients connect to the Spotify backend through an access point; clients also exchange data with each other over a P2P overlay]
Latency is everything
► Google web search: adding 100 ms to 400 ms of latency to serving search results increased the drop in searches from 0.2% to 0.6% (Speed Matters for Google Web Search, Jake Brutlag, 2009)
► Amazon reports that every 100 ms of latency costs 1% of profit
► Google Maps found that an increase from 400 ms to 900 ms to render search results dropped traffic by 20%
► A blink of an eye takes between 300 ms and 400 ms
► Median playback latency in Spotify is 265 ms (feels immediate)
Caching
► Player caches tracks it has played
► Default policy is to use 10% of free space (capped at 10 GB)
► Caches are large (56% are over 5GB)
► Least Recently Used policy for cache eviction
► Over 50% of data comes from local cache
► Cached files are served in P2P overlay
Note: Numbers are from the desktop client
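Since eviction is plain LRU over whole tracks, here is a minimal sketch of the idea (Python; illustrative only, the real client manages a proprietary on-disk cache):

```python
from collections import OrderedDict

class LRUTrackCache:
    """Bounded cache that evicts the least recently used tracks."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # track_id -> size in bytes

    def touch(self, track_id):
        """Mark a track as recently used (e.g. on playback)."""
        if track_id in self.entries:
            self.entries.move_to_end(track_id)

    def put(self, track_id, size_bytes):
        """Insert a track, evicting least recently used tracks as needed."""
        if track_id in self.entries:
            self.used -= self.entries.pop(track_id)
        self.entries[track_id] = size_bytes
        self.used += size_bytes
        while self.used > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)  # oldest first
            self.used -= evicted_size
```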
Data sources (2012)
► 55.4% from local caches
► 35.8% from P2P
► 8.8% from Spotify servers
► Median latency: 265 ms
► 75th percentile: 515 ms
► 90th percentile: 1047 ms
► Less than 1% of playbacks had stutter occurrences
Transfer strategy
[Diagram: the client fetches data both from the Spotify backend via an access point and from other clients over P2P]
► Fetch first piece from Spotify servers
► Meanwhile; search P2P for remainder
► Switch back and forth between Spotify servers and peers as needed
► Towards end of a track, start prefetching the next track
► Spotify traffic is very bursty
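A hedged sketch of that switching logic (Python; the server, p2p and buffer objects are hypothetical stand-ins for client internals):

```python
def stream_track(track, server, p2p, buffer):
    """Sketch: server-first start, then peer-assisted streaming."""
    # Fetch the first piece from Spotify servers for low startup latency.
    buffer.add(server.fetch(track, piece=0))

    # Meanwhile, search the P2P overlay for peers holding the rest.
    peers = p2p.search(track)

    for piece in range(1, track.num_pieces):
        if buffer.running_low() or not peers:
            # Behind on playback (or no peers found): use the servers.
            buffer.add(server.fetch(track, piece))
        else:
            buffer.add(p2p.fetch(peers, track, piece))

    # Towards the end of the track, prefetching of the next track
    # would start here.
```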
The parts
► The streaming protocol
► The P2P net
► The backend
Streaming
► Proprietary protocol
► Designed for on-demand streaming
► Only Spotify can add tracks
► 96-320 kbps audio streams (most are Ogg Vorbis q5, 160 kbps)
► Peer-assisted streaming (YAY!)
Protocol use
► Everything over TCP (almost)
► Everything encrypted (almost)
► Multiplex messages over a single TCP connection
► TCP_NODELAY (disables Nagle's algorithm)
► Avoids repeated TCP slow starts (one long-lived connection instead of many short ones)
► Persistent TCP connection to the server while logged in
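At the socket layer, TCP_NODELAY is a one-line option; a minimal Python illustration (the access point hostname and port are made up):

```python
import socket

# Long-lived connection to an access point; endpoint is illustrative.
sock = socket.create_connection(("ap.example.com", 4070))

# Disable Nagle's algorithm so small multiplexed messages go out
# immediately instead of being coalesced into larger segments.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```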
Intermission: NAT and DNS error reporting
► Even when 'normal' internet access is blocked, you can often get DNS packets through
► We use this for error signaling, to find out why our users are having problems
► Other implementations of this kind of communication are OzymanDNS and iodine, both of which tunnel traffic over DNS
Send: TXT query for com.domain.sub.<base32-encoded data>
Receive: TXT reply containing <base32-encoded data>
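A hedged sketch of the client side of such a channel (Python; assumes the third-party dnspython package, and err.example.com is a hypothetical reporting zone, not Spotify's):

```python
import base64
import dns.resolver  # third-party package: dnspython

REPORT_ZONE = "err.example.com"  # hypothetical error-reporting domain

def report_error(payload: bytes) -> bytes:
    """Send an error report as a DNS TXT query and read the TXT reply."""
    # Encode the report into a DNS label (labels are limited to 63 bytes,
    # so a real implementation would split larger payloads).
    label = base64.b32encode(payload).decode().rstrip("=")
    answer = dns.resolver.resolve(f"{label}.{REPORT_ZONE}", "TXT")
    # The server's response rides back in the TXT record data.
    txt = b"".join(answer[0].strings)
    return base64.b32decode(txt + b"=" * (-len(txt) % 8), casefold=True)
```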
Why P2P?
► Offloading our servers
► Less bandwidth
► Fewer servers to buy & support
► Somewhat higher reliability
P2P overlay
► One P2P overlay for all tracks (actually three)
► Unstructured network (not DHT)
► All peers are equal (no supernodes)
► Nodes that are not actively playing drop out of the overlay
Peer coordination
[Diagram: clients find each other through a rendezvous service in the Spotify backend]
► Server-side tracker (BitTorrent style)
► No overlay routing (e.g. Kademlia)
► Neighborhood in overlay (Gnutella style)
► Peers are found via the server-side tracker, Gnutella-style queries (search, 2 hops), and LAN peer discovery
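A minimal sketch of combining the three discovery mechanisms (Python; the tracker, neighbor, and LAN objects are hypothetical stand-ins):

```python
def find_peers(track_id, tracker, neighbors, lan):
    """Merge the three peer-discovery mechanisms into one candidate set."""
    peers = set(tracker.lookup(track_id))        # ask the backend tracker
    peers |= set(lan.broadcast_query(track_id))  # peers on the same LAN

    # Gnutella-style search, limited to 2 hops in the overlay.
    for neighbor in neighbors:
        peers |= set(neighbor.query(track_id, ttl=2))
    return peers
```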
Peer coordination
► Fixed maximum degree (60)
► Fairness is not enforced (e.g. no tit-for-tat)
► Neighbor eviction by heuristic evaluation of utility
► No pro-active caching! Be kind to the client
► Nodes that are not actively playing drop out; the tracker returns the last few hundred observed plays of each specific track
► A balance between getting collaboration from nodes and not being abusive to our customers
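A hedged sketch of utility-based neighbor eviction (Python; the scoring terms are plausible inputs, not Spotify's actual heuristic):

```python
MAX_DEGREE = 60  # fixed maximum number of neighbors

def utility(peer):
    """Illustrative utility score; the real heuristic is not public."""
    return (peer.bytes_served_recently
            + 0.5 * peer.bytes_received_recently
            - peer.seconds_since_last_useful_data)

def enforce_degree(neighbors):
    """Evict the least useful neighbors once the degree cap is exceeded."""
    while len(neighbors) > MAX_DEGREE:
        worst = min(neighbors, key=utility)
        neighbors.remove(worst)
        worst.disconnect()
```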
Rendezvous?
► We use TCP (almost) everywhere
► Both sender and receiver are instructed by the rendezvous service to connect to each other
► Typically, one of the two sides will be connectable
► We try to open ports using UPNP
► In some regions, UPNP and connectivity totally sucks! [1]
► On average, 65% … failure.
[1] Measurements on the Spotify Peer-Assisted Music-on-Demand Streaming System, Mikael Goldmann & Gunnar Kreitz, IEEE P2P 2011
Peer finding efficiency
Sources for peers   Fraction of searches
Tracker and P2P     75.1%
Only tracker         9.0%
Only P2P             7.0%
No peers found       8.9%
► Most efficient during
► Weekends
► Peak hours
Javier Ubillos
Peer finding efficiency
Type                 Fraction
Music data, used     94.80%
Music data, unused    2.38%
Search overhead       2.33%
Other overhead        0.48%
► Measured at the socket layer
► Unused data means cancelled or duplicate requests
When should we start playing?
► Model the user's bandwidth using a Markov chain
► Feed the Markov model with bandwidth-transition observations made every 100 ms
► Simulate 100 times
► If more than 1 run results in stuttering, download more before starting playback
[Diagram: Markov chain over bandwidth states (10 KB/s, 20 KB/s, 30 KB/s) with transition probabilities x%, y%, z%, plus an absorbing "Finished" state]
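A hedged sketch of that decision rule (Python; the state space and transition probabilities below are illustrative placeholders, whereas the real model is fed with observed transitions):

```python
import random

# Illustrative transition table over bandwidth states (KB/s).
TRANSITIONS = {
    10: ([10, 20, 30], [0.6, 0.3, 0.1]),
    20: ([10, 20, 30], [0.2, 0.6, 0.2]),
    30: ([10, 20, 30], [0.1, 0.3, 0.6]),
}

def simulate_once(state, buffered_kb, track_kb, playback_kbps):
    """Walk the chain in 100 ms steps; return True if playback stutters."""
    downloaded, played = float(buffered_kb), 0.0
    while played < track_kb:
        states, probs = TRANSITIONS[state]
        state = random.choices(states, probs)[0]
        downloaded = min(track_kb, downloaded + state * 0.1)
        played = min(track_kb, played + playback_kbps * 0.1)
        if played > downloaded:
            return True  # buffer underrun: the player would stutter
    return False

def safe_to_start(state, buffered_kb, track_kb, playback_kbps):
    """Start playback only if at most 1 of 100 simulated runs stutters."""
    runs = (simulate_once(state, buffered_kb, track_kb, playback_kbps)
            for _ in range(100))
    return sum(runs) <= 1
```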
Cellphones, my nemesis
► Spotify on handheld devices has proven to be very popular.
► But it breaks my baby (P2P)!
► Giving life to P2P on the cellphone is a very hard problem
[Diagram: the device reaches the Spotify backend through the radio base station (RBS) and radio network controller (RNC); the cellular hop brings latency, packet drops, congestion, and cost]
Cellphones, discoveries so far
► PhD student Raul Jimenez (KTH) did a study on battery consumption
► The radio is up most of the time
► The radio & screen are the biggest battery consumers
► We can optimize for radio downtime by buffering communication between the device and the backend
► The current situation, where the radio is up most of the time, suggests that P2P could be a viable option
The backend
► Built on (lots of) small services
► All traffic enters and exits via the Access Points
► Try to keep service interdependence low
Backend internals
[Diagram: the Access Point talks to services such as Playlist, Search, Storage and User; the client side uses the proprietary protocol, while services talk to each other over HTTP]
► The request-reply style of RPC is limiting
► A (bad) alternative would be HTTP long-polling
► A change to a better protocol would allow us to:
► Route on service (not a specific host)
► Message-passing style communication
► Push and publish-subscribe will actually work
Backend internals
[Diagram: the same services now talk to each other over Hermes, a subset of the backend protocol; the client side remains proprietary]
► So, we did that: a "service-oriented protocol" called Hermes
► Can be routed on the service name (~DNS)
► Request-reply, push/pull, pub/sub
► Message-passing style communication
► Push and publish-subscribe actually work
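Hermes itself is proprietary; as a loose illustration of routing requests on a service name rather than a host, here is a sketch using ZeroMQ (pyzmq), with a made-up resolver and endpoint:

```python
import zmq

# Hypothetical resolver mapping service names to endpoints (~DNS).
SERVICES = {"playlist": "tcp://playlist.example.internal:5555"}

def request(service: str, message: bytes) -> bytes:
    """Send a request addressed to a service name, not a specific host."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.connect(SERVICES[service])  # resolve service -> endpoint
    sock.send(message)
    return sock.recv()

# Usage: reply = request("playlist", b"GET playlist/123")
```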
Playlist
► Our most complex service!
► Simultaneous writes with automatic conflict resolution
► Publish-subscribe to clients
► Changes are automatically versioned; deltas are transmitted
► Terabytes of data; over 500 million playlists
► Cassandra database
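A hedged sketch of merging concurrent playlist edits (Python; this mirrors the idea of replaying versioned change-sets, not Spotify's actual algorithm):

```python
def merge_concurrent_edits(base, edits_a, edits_b):
    """Replay two concurrent edit lists on a common base version.

    Each edit is ('add', track) or ('del', track). Both sides' edits
    are applied; on a direct conflict the edit replayed last wins.
    """
    result = list(base)
    for op, track in edits_a + edits_b:
        if op == "add" and track not in result:
            result.append(track)
        elif op == "del" and track in result:
            result.remove(track)
    return result

# Usage: two clients edit the same base version concurrently.
# merge_concurrent_edits(["t1", "t2"], [("add", "t3")], [("del", "t2")])
# -> ["t1", "t3"]
```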
Data collection
► Four general monitoring mechanisms
► Plain logging of events
► Probabilistic logging (mainly to avoid data transfer overhead in the client)
► Munin (one data sample every five minutes)
► In-house statistics library
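For the probabilistic case, the trick is that logging each event with probability p and scaling counts by 1/p keeps totals unbiased while cutting client traffic. A minimal sketch (Python; the sampling rate is illustrative):

```python
import random

SAMPLE_RATE = 0.01  # illustrative: log roughly 1% of events

def maybe_log(event, sink):
    """Log an event with probability SAMPLE_RATE."""
    if random.random() < SAMPLE_RATE:
        sink.write(event)

def estimate_total(sampled_count):
    """Scale a sampled count back up to an unbiased estimate."""
    return sampled_count / SAMPLE_RATE
```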
Data collection
► Business data and operational data live in two separate systems
► Business data takes on average ~4 h to propagate from producer to consumer
► Operational data takes 5-10 minutes from production to display
► Such delays make it unreasonable to use the data as a foundation for decisions when developing in tight iterations
Accesspoint data
► Access points produce both business data and operational data
► They produce about half of all data (most likely more, as this includes all user-produced logs as well)
► ~2 TB every day
► Played songs, user logins/logouts, P2P data, user views, user events (e.g. clicks)
► All data flies in as log files
Operational data
► For services in the backend we use Munin and RRDs*
► Every data source is sampled once every five minutes
► Collected by site-local data collectors
► Forwarded to a central data repository
► Graphed and presented to service owners and operations
* Round Robin Database: a read/write data store of constant size
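A minimal sketch of the round-robin idea behind an RRD (Python): a fixed-size ring of samples, so the store never grows no matter how long it runs:

```python
class RoundRobinStore:
    """Fixed-size sample store: new samples overwrite the oldest ones."""

    def __init__(self, slots):
        self.samples = [None] * slots
        self.next = 0

    def record(self, value):
        self.samples[self.next] = value
        self.next = (self.next + 1) % len(self.samples)

# Usage: one sample every five minutes, keeping one day of history.
store = RoundRobinStore(slots=24 * 60 // 5)
store.record(42.0)
```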
Merging the datastreams
► How can we merge these data streams?
► Pipe logs through Kafka (publish-subscribe infrastructure developed by LinkedIn)
► Push to Storm (developed by Twitter)
► Back to Kafka, deliver to subscribers (graphing systems)
► Have all operational metrics pushed to a narrow waist (collectd)
► Publish to Kafka, deliver to subscribers
► Graphed and presented to service owners and operations
Gotchas encountered along the way:
► Linux kernel send-window limitations
► Completely different message-passing paradigm
► UDP too spammy, need to batch with TCP
► WebSockets? Browsers don't handle a high rate of packets very well
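A hedged sketch of the log-shipping step (Python; assumes the third-party kafka-python package, and the broker and topic names are made up):

```python
from kafka import KafkaProducer  # third-party package: kafka-python

# Hypothetical broker and topic names.
producer = KafkaProducer(bootstrap_servers=["kafka.example.internal:9092"])

def ship_log_line(line: str) -> None:
    """Publish one log line; subscribers (e.g. Storm topologies and
    graphing systems) consume it from the topic."""
    producer.send("service-logs", line.encode("utf-8"))

ship_log_line('{"event": "song_played", "track": "t123"}')
producer.flush()  # batch over TCP rather than spamming small packets
```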