Spotify: Horizontal Scalability for Great Success

EuroPython, 22 June 2011. Horizontal Scalability for Great Success. Nick Barkas (@snb). Wednesday, 22 June 2011.

Upload: nick-barkas

Post on 23-Jan-2015


DESCRIPTION

Talk for EuroPython 2011 by Nick Barkas from Spotify. Discussion of some things to consider when building a scalable network service, including details about how we handle the challenges that come along with this at Spotify.

TRANSCRIPT

Page 1: Spotify: Horizontal Scalability for Great Success

EuroPython - 22 June 2011

Horizontal Scalability for Great Success

Nick Barkas, @snb

Wednesday, 22 June 2011

Page 2: Spotify: Horizontal Scalability for Great Success

Outline

Introduction

• Spotify

• Kinds of scalability

Designing scalable network applications

• Distributing work

• Handling shared data

Related Spotify tools and methods

• Supervision

• Round-robin DNS and SRV records

• Distributed hash tables (DHT)

Page 3: Spotify: Horizontal Scalability for Great Success

Introduction

Page 4: Spotify: Horizontal Scalability for Great Success

What is Spotify?

On-demand music streaming service

Also play your music files, or buy mp3 downloads

Premium subscriptions with mobile and offline support, or free with ads

Create playlists available on any computer or device with Spotify

Connect with friends via Facebook and send songs to each other

Sync downloads and your own music files with iPods

Available today in Sweden, Norway, Finland, UK, Netherlands, France, and Spain

Page 5: Spotify: Horizontal Scalability for Great Success

Overview of Spotify network

[Diagram labels: Clients; accesspoint; Backend services: playlist, search, storage, user, web api, browse, ...]

Page 6: Spotify: Horizontal Scalability for Great Success

Scaling vertically: “bigger” machines

+ Maybe no or only small code changes

+ Fewer servers is easier operationally

- Hardware prices don’t scale linearly

- Servers can only get so big

- Multithreading can be hard

- Single point of failure (SPOF)

Page 7: Spotify: Horizontal Scalability for Great Success

Scaling horizontally: more machines

+ You can always add more machines!

+ No threads (maybe)

+ Possible to run in “the cloud” (EC2, Rackspace)

- Need some kind of load balancer

- Data sharing/synchronization can be hard

- Complexity: many pieces, maybe hidden SPOFs

± Fundamental to the application’s design

Page 8: Spotify: Horizontal Scalability for Great Success

Why horizontal for Spotify?

We are too big

• Over 13 million songs

• And over 10 million users, who have lots of playlists

CPython kind of doesn’t give us a choice anyway

• Global interpreter lock (GIL) = no threads running Python code simultaneously

Page 9: Spotify: Horizontal Scalability for Great Success

Why the GIL is kind of a good thing

Forces horizontally scalable design

Multiple cores require multiple Python processes

• Basically the same when scaling to multiple machines

Multi-process apps encourage share-nothing design

• Sharing nothing avoids difficult, slow synchronization
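A minimal sketch of the multi-process, share-nothing idea on this slide, using the standard library's multiprocessing module. The handle and serve functions and their inputs are hypothetical illustrations, not code from Spotify.

```python
import multiprocessing
import os


def handle(request):
    """Hypothetical CPU-bound handler; it touches only its own argument."""
    return sum(i * i for i in range(request))


def serve(requests, workers=None):
    """Spread requests over worker processes (default: one per core).

    Each worker is a separate Python process with its own GIL, and no
    state is shared between them: inputs and results are passed by copy.
    """
    workers = workers or os.cpu_count() or 1
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(handle, requests)
```

Calling serve([10, 100, 1000]) from under an `if __name__ == "__main__":` guard runs the three requests in separate processes. Because handle needs nothing but its argument, the same function could just as well run on another machine.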

Page 10: Spotify: Horizontal Scalability for Great Success

Designing scalable network applications

Page 11: Spotify: Horizontal Scalability for Great Success

Separate services for separate features

The UNIX way: small, simple programs doing one thing well

• Can do the same with network services

• Simple applications are easier to scale

• Can focus on services with high usage/availability needs

• Development is fast and scalable too

‣ If well-defined interfaces between services

Page 12: Spotify: Horizontal Scalability for Great Success

Many instances of each service

N instances/machine where N <= # cores, many machines

Need a way to spread requests amongst instances

• Hardware load balancers

• Round-robin DNS

• Proxy servers (Varnish, Nginx, Squid, lighttpd, Apache...)

Page 13: Spotify: Horizontal Scalability for Great Success

Sharding data

Each server/instance responsible for subset of data

Can be easy if you share nothing

Must direct client to instance that has its data

Harder if you want things like replication
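A minimal sketch of directing a client to the instance that has its data, assuming the simplest possible scheme: a stable hash of the key, modulo the number of instances. The function name and the keys are hypothetical, and the talk does not present this as Spotify's method.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index in range(num_shards).

    hashlib is used rather than the built-in hash(), which is randomized
    per process for strings and so would disagree between instances.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_count, new_count):
    """Return the keys whose shard changes when the shard count changes."""
    return [k for k in keys if shard_for(k, old_count) != shard_for(k, new_count)]
```

With this scheme every client computes the same shard for the same key without any coordination. The cost shows up when instances are added or removed: moved_keys will typically report that most keys change shard, which is one reason to look at ring-based schemes instead.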

Page 14: Spotify: Horizontal Scalability for Great Success

Brewer’s CAP theorem

You only get to have one or two

Consistency

Partition Tolerance

Availability

Image: http://thecake.info/

Page 15: Spotify: Horizontal Scalability for Great Success

Brewer’s CAP theorem

You only get to have one or two. The cake is a lie.

Consistency

Partition Tolerance

Availability

Image: http://thecake.info/

Page 16: Spotify: Horizontal Scalability for Great Success

Eventual consistency

Lots of NoSQLish options work this way

• Reads of just written data not guaranteed to be up-to-date

Example: Cassandra

• Combination of ideas from Dynamo and BigTable

• Available (fast writes, replication)

• Partition tolerant (retries later if replica node unreachable)

• Also can get consistency if willing to sacrifice the other two

• But rather young project, big learning curve

Page 17: Spotify: Horizontal Scalability for Great Success

Sometimes you need consistency

Locking, atomic operations

• Creating globally unique keys, e.g. usernames

• Transactions, e.g. billing

PostgreSQL (and other RDBMSs) are great at this

• Availability via replication, hot standby masters

• Store only what you absolutely must in global databases
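A small sketch of creating globally unique usernames by letting the database enforce uniqueness atomically. To keep it self-contained it uses the standard library's sqlite3 as a stand-in for PostgreSQL; the table and function names are hypothetical.

```python
import sqlite3


def create_user(conn, username):
    """Return True if the username was claimed, False if already taken.

    The UNIQUE constraint makes "check and insert" a single atomic step,
    so two concurrent attempts cannot both succeed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")
```

The first create_user(conn, "nick") succeeds and a second call with the same name is rejected by the database rather than by application code. This is the kind of guarantee that is hard to get from an eventually consistent store.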

Page 18: Spotify: Horizontal Scalability for Great Success

Tips for many instances of a service

Processor affinity

Watch out for connection limits (e.g. in RDBMS)

Lots of processes can share memcached

OS page cache for read-heavy data

Page 19: Spotify: Horizontal Scalability for Great Success

Related Spotify tools and methods

Page 20: Spotify: Horizontal Scalability for Great Success

Supervision

Spotify-developed daemon that launches other daemons

• Usually as many instances as cores - 1

• Restarts supervised instance if one fails

• Also restarts all instances of an application on upgrade

See also: systemd
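A rough sketch of what such a supervisor does, built on the standard library's subprocess module. This is not Spotify's daemon; the class, its methods, and the polling approach are assumptions made for illustration.

```python
import os
import subprocess


def default_instances():
    """As many instances as cores - 1, but always at least one."""
    return max(1, (os.cpu_count() or 2) - 1)


class Supervisor:
    """Keep a fixed number of instances of one command running."""

    def __init__(self, argv, instances):
        self.argv = list(argv)
        self.restarts = 0
        self.procs = [subprocess.Popen(self.argv) for _ in range(instances)]

    def poll_once(self):
        """Restart every instance that has exited since the last check."""
        for i, proc in enumerate(self.procs):
            if proc.poll() is not None:
                self.procs[i] = subprocess.Popen(self.argv)
                self.restarts += 1

    def stop(self):
        """Terminate all instances and wait for them to exit."""
        for proc in self.procs:
            proc.terminate()
        for proc in self.procs:
            proc.wait()
```

A real supervisor would call poll_once in a loop (or react to SIGCHLD), and an upgrade would amount to stop followed by starting fresh instances of the new version.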

Page 21: Spotify: Horizontal Scalability for Great Success

Finding services and load balancing

Each service has an SRV DNS record

• One record with same name for each service instance

• Clients (AP) resolve to find servers providing that service

• Among the records with the lowest priority value, one is chosen by weighted shuffle

• Clients must retry other instances in case of failures
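The selection and retry behaviour described above can be sketched as follows. The DNS lookup itself is left out; the records are assumed to be already resolved into tuples. SrvRecord, order_targets, call_with_retry and the send callback are hypothetical names, not part of any Spotify library.

```python
import random
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port host")


def order_targets(records, rng=random):
    """Order records: lowest priority value first, weighted shuffle within."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.randrange(len(group))  # all weights zero: uniform
            else:
                pick = rng.choices(range(len(group)), weights=weights)[0]
            ordered.append(group.pop(pick))
    return ordered


def call_with_retry(records, send, rng=random):
    """Try each target in order until one succeeds."""
    last_error = None
    for rec in order_targets(records, rng):
        try:
            return send(rec.host, rec.port)
        except OSError as err:  # e.g. connection refused or timed out
            last_error = err
    raise last_error or LookupError("no SRV records")
```

Here send is whatever function actually talks to the service. A record with a higher priority value is only tried after every record with a lower value has failed, and heavier weights get proportionally more first picks within a priority level.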

Page 22: Spotify: Horizontal Scalability for Great Success

Finding services and load balancing

Sometimes also use Varnish or Nginx for HTTP services

• Can have caching too

Example SRV record:

_frobnicator._http.example.com. 3600 SRV  10   50     8081 frob1.example.com.
name                            TTL  type prio weight port host

Page 23: Spotify: Horizontal Scalability for Great Success

Distributed hash tables (DHT) in DNS

• Distributes data among service instances

‣ Instance B owns keys in (14, 3f], E owns (9e, c1]

• Redundancy? Hash again, write to replica instance

• Must transition data when ring changes

• Good also for non-sharded data: cache locality

[Ring diagram: service instances A, B, C, D, E, F around a hash ring with tokens 14, 3f, 68, 9e, c1, e7; example keys k = 1c and j = bd]

Service instances correspond to a range of hash keys
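A minimal sketch of looking up the owner of a key on the ring from this slide. The slide states only that B owns (14, 3f] and E owns (9e, c1]; pairing the remaining tokens with A, C, D and F in order is an assumption, as are the names RING and owner.

```python
import bisect

# Each entry is (last key owned, instance). Only B -> 3f and E -> c1 come
# from the slide; the other pairings are assumed for illustration.
RING = [(0x14, "A"), (0x3F, "B"), (0x68, "C"), (0x9E, "D"), (0xC1, "E"), (0xE7, "F")]
TOKENS = [token for token, _ in RING]


def owner(key):
    """Return the instance whose range (previous token, token] holds key.

    bisect_left finds the first token >= key; keys above the highest
    token wrap around to the first instance on the ring.
    """
    index = bisect.bisect_left(TOKENS, key)
    return RING[index % len(RING)][1]
```

With these tokens, the slide's key k = 1c lands on B and key j = bd lands on E. Adding an instance only splits one existing range, so only the keys in that range have to move.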

Page 24: Spotify: Horizontal Scalability for Great Success

DHT DNS record examples

Ring segment, per instance:

tokens.8081.frob1.example.com. 3600 TXT  "00112233445566778899aabbccddeeff"
name: tokens.port.host.        TTL  type last key

Configuration of DHT:

config._frobnicator._http.example.com. 3600 TXT  "slaves=0"
name: config.srv_name.                 TTL  type no replication

config._frobnicator._http.example.com. 3600 TXT  "slaves=2 redundancy=host"
name: config.srv_name.                 TTL  type three replicas on separate hosts
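A short sketch of how a client might interpret the TXT payloads shown on this slide, once they have been fetched from DNS. The function names are hypothetical, and the reading of the fields (space-separated key=value pairs, a hexadecimal token) is inferred from the examples rather than from a published format.

```python
def parse_dht_config(txt):
    """Parse a payload such as 'slaves=2 redundancy=host' into a dict."""
    config = dict(item.split("=", 1) for item in txt.split())
    config["slaves"] = int(config.get("slaves", 0))
    return config


def replica_count(config):
    """Total copies of each key: the primary plus its slaves."""
    return 1 + config["slaves"]


def parse_token(txt):
    """Interpret a tokens record as the integer value of the last key owned."""
    return int(txt, 16)
```

For "slaves=2 redundancy=host", replica_count gives 3, matching the slide's "three replicas on separate hosts"; for "slaves=0" it gives 1, which is the no-replication case.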

Page 25: Spotify: Horizontal Scalability for Great Success

Further reading about DHTs

“Chord: A scalable peer-to-peer lookup service for internet applications”

• http://portal.acm.org/citation.cfm?id=964723.383071

“Dynamo: Amazon’s Highly Available Key-value Store”

• http://portal.acm.org/citation.cfm?id=1294281

Page 26: Spotify: Horizontal Scalability for Great Success

One last thing to remember

/dev/null is web scale!

Image: http://www.xtranormal.com/watch/6995033/

Page 27: Spotify: Horizontal Scalability for Great Success

Questions?

Or write me something: [email protected], @snb on Twitter

Page 28: Spotify: Horizontal Scalability for Great Success

Thank you
