scaling dropbox - qconf sf 11/08/2016 - qconsf.com

Scaling Dropbox PRESLAV LE, NOVEMBER 7TH, 2016


Page 1: Scaling Dropbox - QConf SF 11/08/2016 - qconsf.com

Scaling Dropbox

PRESLAV LE, NOVEMBER 7TH, 2016

Page 2

Page 3

Page 4

[Diagram: block.dropbox.com and three zones: west, east, central.]

Page 5

Page 6

Page 7

Fear of the unknown

Page 8

MEMORY LEAK

Page 9

SYNCHRONIZATION EVENT

Page 10

Success story

Page 11

TODAY’S TALK

• 2012

• SCALING CHALLENGES

• 2016

• Q&A

Page 12

PRESLAV LE

• At Dropbox since 2013

• Projects: Magic Pocket, Infrastructure Performance, Traffic team

Page 13

FILE, SYNC & SHARE

Page 14

500 MILLION USERS

Page 15

2012

[Diagram: the 2012 architecture. Components: clients, load balancers (LB), nginx, metaservers, blockservers, notification server, Memcached, MySQL DBs and async processing in Dropbox’s datacenters; S3 in AWS.]

Page 16

BLOCK DATA IN S3

[Diagram: the 2012 architecture, with block data stored in S3 in AWS.]

Page 17

METADATA IN MYSQL

[Diagram: the 2012 architecture, with metadata stored in MySQL DBs in Dropbox’s datacenters.]

Page 18

1. FETCH METADATA

[Diagram: the 2012 architecture, highlighting clients, LB, metaserver, Memcached and DB.]

Page 19

2. DOWNLOAD BLOCKS

[Diagram: the 2012 architecture, highlighting clients, LB, blockserver and S3.]

Page 20

3. WAIT FOR NOTIFICATIONS

[Diagram: the 2012 architecture, highlighting clients, notification server and metaserver.]

Page 21

PYTHON EVERYWHERE

[Diagram: the 2012 architecture.]

Page 22

CLUSTER ISOLATION

[Diagram: metaserver clusters in Dropbox’s datacenters, split into meta-client, meta-web, meta-api and meta-mobile.]
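A minimal sketch of the idea on this slide, assuming requests are routed to a dedicated metaserver cluster according to where they come from. The source labels and the `CLUSTERS` / `pick_cluster` names are hypothetical; only the cluster names come from the slide.

```python
# Hypothetical mapping from request source to a dedicated metaserver cluster.
CLUSTERS = {
    "desktop": "meta-client",
    "web": "meta-web",
    "api": "meta-api",
    "mobile": "meta-mobile",
}


def pick_cluster(source: str) -> str:
    """Return the cluster that serves requests from this source."""
    try:
        return CLUSTERS[source]
    except KeyError:
        raise ValueError(f"unknown request source: {source!r}") from None
```

Because each traffic type has its own machines, a bad deploy or a load spike from one kind of client stays contained to its cluster.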

Page 23

• Scaling Databases
• Scaling as an Organization
• Scaling Software
• Managing Complexity

Page 24

SCALING DATABASES

[Diagram: metaserver and Memcached in front of a single MySQL master with two replicas; beside it, sharded MySQL (shard0, shard1, …, shardN), each shard with a master and two replicas.]

Page 25

HORIZONTAL SCALING

[Diagram: many metaservers in front of shard0 … shardN, each shard a master with two replicas.]
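One way to picture horizontal scaling with this shard layout, as a sketch under stated assumptions: users are placed on shards by `user_id % num_shards`, writes go to the shard master and reads go to a random replica. The placement rule, host names and helper functions are hypothetical, not Dropbox's scheme.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shard:
    master: str
    replicas: Tuple[str, ...]


def build_shards(n: int) -> List[Shard]:
    # Hypothetical host names mirroring the slide: one master plus two replicas per shard.
    return [
        Shard(master=f"shard{i}-master",
              replicas=(f"shard{i}-replica-a", f"shard{i}-replica-b"))
        for i in range(n)
    ]


def shard_for(user_id: int, shards: List[Shard]) -> Shard:
    # Simple modulo placement (an assumption for illustration).
    return shards[user_id % len(shards)]


def host_for(user_id: int, shards: List[Shard], write: bool) -> str:
    shard = shard_for(user_id, shards)
    # Writes must go to the master; reads can be spread across the replicas.
    return shard.master if write else random.choice(shard.replicas)
```

Modulo placement is the simplest choice but forces data movement whenever the shard count changes; a lookup table from key ranges to shards avoids that. Reads from replicas can also lag behind the master.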

Page 26

CONNECTIONS

[Diagram: many metaservers connecting to shard0 … shardN, each shard a master with two replicas.]

Page 27

SQL PROXY

[Diagram: metaservers reaching shard0 … shardN through a tier of SQL Proxy instances.]
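A back-of-the-envelope sketch of why connections become a problem, assuming every metaserver worker process keeps its own connection to every database host. All fleet sizes below are made up for illustration.

```python
def direct_connections_per_db(metaservers: int, procs_per_server: int) -> int:
    # Every worker process holds its own connection to every database host.
    return metaservers * procs_per_server


def proxied_connections_per_db(proxies: int, pool_per_proxy: int) -> int:
    # Workers talk to the proxies; only the proxies' pooled connections reach MySQL.
    return proxies * pool_per_proxy


# Hypothetical fleet: 1,000 metaservers x 32 processes vs. 20 proxies x 50 pooled connections.
print(direct_connections_per_db(1000, 32))  # 32000
print(proxied_connections_per_db(20, 50))   # 1000
```

Direct connections grow with the number of web processes, while behind a proxy tier each database host only sees the proxies' pools, which can be sized independently of the web fleet.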

Page 28

• Scaling as an Organization
• Scaling Software
• Managing Complexity
• Scaling Databases

Page 29

GLOBAL DATABASE

Page 30

AVAILABILITY ISSUES

Page 31

PLAYBOOK

1. Check for ongoing deployments or newly enabled features

Page 32

PLAYBOOK

1. Check for ongoing deployments or newly enabled features
2. Check for recently started background jobs

Page 33

PLAYBOOK

1. Check for ongoing deployments or newly enabled features
2. Check for recently started background jobs
3. DBA on-call, please help!

Page 34

Dropbox grew from 100 to 500 employees

Page 35

• Slow queries would adversely impact performance across the board

Page 36

• Slow queries would adversely impact performance across the board
• More features => more independent MySQL databases to manage

Page 37

• Slow queries would adversely impact performance across the board
• More features => more independent MySQL databases to manage
• Reactively (re)sharding individual databases as they hit capacity

Page 38

• Slow queries would adversely impact performance across the board
• More features => more independent MySQL databases to manage
• Reactively (re)sharding individual databases as they hit capacity
• Impacted developer productivity

Page 39

SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY

2013 — Present

Page 40

SHARDING AND CACHING BEHIND THE SCENES
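A minimal sketch of what "sharding and caching behind the scenes" can look like from a caller's point of view, assuming a cache-aside strategy. Plain dicts stand in for Memcached and the MySQL shards, and the `MetadataClient` class is hypothetical.

```python
import zlib
from typing import Dict, List, Optional


class MetadataClient:
    """Hypothetical client: callers use get/put; shard choice and caching are hidden."""

    def __init__(self, num_shards: int):
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]  # stand-ins for MySQL shards
        self.cache: Dict[str, str] = {}  # stand-in for Memcached

    def _shard(self, key: str) -> Dict[str, str]:
        # Stable hash (unlike str hash(), which is randomized per process).
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:              # cache hit
            return self.cache[key]
        value = self._shard(key).get(key)  # cache miss: read from the owning shard
        if value is not None:
            self.cache[key] = value        # populate the cache for later reads
        return value

    def put(self, key: str, value: str) -> None:
        self._shard(key)[key] = value      # write the source of truth first
        self.cache.pop(key, None)          # then invalidate the cached copy
```

Invalidating rather than updating the cache on write keeps the shard as the single source of truth; the next read repopulates the cache.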

Page 41

ENTITIES AND ASSOCIATIONS
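A toy model of entities and associations, assuming entities carry an id, a type and attributes, and associations are typed, directed edges between two entity ids. The `Entity` and `GraphStore` names are hypothetical and do not reflect the real store's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    id: int
    type: str
    attrs: Dict[str, str] = field(default_factory=dict)


class GraphStore:
    """Toy in-memory store of entities and typed, directed associations."""

    def __init__(self):
        self.entities: Dict[int, Entity] = {}
        # (source id, association type) -> destination ids, in insertion order
        self.assocs: Dict[Tuple[int, str], List[int]] = defaultdict(list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def add_assoc(self, src: int, assoc_type: str, dst: int) -> None:
        if src not in self.entities or dst not in self.entities:
            raise KeyError("both endpoints must exist")
        self.assocs[(src, assoc_type)].append(dst)

    def neighbors(self, src: int, assoc_type: str) -> List[Entity]:
        # .get() does not trigger the defaultdict factory, so lookups add no keys.
        return [self.entities[d] for d in self.assocs.get((src, assoc_type), [])]
```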

Page 42

FIRST GO SERVICE

Page 43

• Scaling Software
• Scaling as an Organization
• Managing Complexity
• Scaling Databases

Page 44

Page 45

PERFECT STORM

Page 46

SHARDING

Page 47

PHOTO ALBUMS

Page 48

TEAM ADMIN CONSOLE

Page 49

REQUEST FANOUT

[Diagram: a single request fanning out to many backend calls.]
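Why fanout hurts: a request finishes only when its slowest sub-request does. Assuming each shard call is independently slow with probability p, a request touching n shards is slow with probability 1 - (1 - p)^n. A quick sketch with a hypothetical p of 1%:

```python
def p_slow(per_shard_p: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent shard calls is slow."""
    return 1.0 - (1.0 - per_shard_p) ** fanout


print(round(p_slow(0.01, 1), 3))    # 0.01
print(round(p_slow(0.01, 100), 3))  # 0.634
```

A slowdown that hits only 1% of single-shard requests touches most requests that fan out to 100 shards.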

Page 50

GLOBAL ID

Layout (16 bytes): Colocation ID (8 bytes) | Counter (8 bytes)

• Colocation ID: identifies a shard
• Counter: unique ID within the shard
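A sketch of the 16-byte global ID layout using Python's `struct` module, assuming both fields are unsigned 64-bit integers stored big-endian. The helper names are hypothetical.

```python
import struct
from typing import Tuple

# Two unsigned 64-bit big-endian integers: colocation ID, then counter.
_LAYOUT = struct.Struct(">QQ")


def make_global_id(colocation_id: int, counter: int) -> bytes:
    """Pack 8 bytes of colocation ID followed by 8 bytes of counter."""
    return _LAYOUT.pack(colocation_id, counter)


def split_global_id(gid: bytes) -> Tuple[int, int]:
    """Return (colocation_id, counter) from a 16-byte global ID."""
    return _LAYOUT.unpack(gid)


def colocated(a: bytes, b: bytes) -> bool:
    # IDs with the same colocation ID identify the same shard.
    return split_global_id(a)[0] == split_global_id(b)[0]
```

Big-endian packing means that sorting the raw bytes orders IDs by colocation ID first, so IDs on the same shard sort next to each other.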

Page 51

Lack of colocation also hurts performance

Page 52

NEW SERVICE: FILE JOURNAL

[Diagram: metaservers calling a tier of File Journal instances, which sit in front of shard0 … shardN (each a master with two replicas).]

Page 53

SHARD FAILURE

[Diagram: the File Journal topology, with shard1’s master highlighted as failed.]

Page 54

SHARDING (PART II)

Page 55

LONG TIMEOUTS

[Diagram: the File Journal topology, with shard1’s master highlighted.]

Page 56

RUN OUT OF WORKERS

[Diagram: the File Journal topology, with shard1’s master and the File Journal instances highlighted.]
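Why long timeouts exhaust workers, via Little's law (requests in flight = arrival rate × time in system). The numbers are hypothetical: 100 requests per second to one shard, answered in 10 ms when healthy but hanging until a 10 s timeout once its master is gone.

```python
def busy_workers(requests_per_sec: float, latency_sec: float) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return requests_per_sec * latency_sec


print(busy_workers(100, 0.010))  # 1.0    (healthy shard)
print(busy_workers(100, 10.0))   # 1000.0 (every call waits for the timeout)
```

The same traffic now ties up a thousand times more workers, so a shared pool is drained by a single bad shard and requests for healthy shards start queuing behind it.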

Page 57

CASCADING FAILURE

[Diagram: the File Journal topology, with shard1’s master, the File Journal instances and the metaservers highlighted.]

Page 58

SHARD ISOLATION

Limit resources dedicated to processing a single shard
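A minimal sketch of limiting the resources one shard can consume, assuming a per-shard cap on concurrent in-flight requests that rejects excess work immediately instead of queuing it. The `ShardLimiter` class, `ShardOverloaded` exception and cap value are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")


class ShardOverloaded(Exception):
    """Raised when a shard already has its maximum number of in-flight requests."""


class ShardLimiter:
    """Cap concurrent requests per shard so one hung shard cannot hold every worker."""

    def __init__(self, per_shard_limit: int):
        self._lock = threading.Lock()
        self._sems = defaultdict(lambda: threading.BoundedSemaphore(per_shard_limit))

    def _sem(self, shard: int) -> threading.BoundedSemaphore:
        with self._lock:  # guard lazy creation so two threads don't build two semaphores
            return self._sems[shard]

    def try_acquire(self, shard: int) -> bool:
        # Non-blocking: waiting here would tie up the very worker we want to protect.
        return self._sem(shard).acquire(blocking=False)

    def release(self, shard: int) -> None:
        self._sem(shard).release()

    def call(self, shard: int, fn: Callable[[], T]) -> T:
        if not self.try_acquire(shard):
            raise ShardOverloaded(f"shard {shard} is at its concurrency limit")
        try:
            return fn()
        finally:
            self.release(shard)
```

Failing fast for the one overloaded shard keeps the remaining workers available for requests to healthy shards.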

Page 59

• Managing Complexity
• Scaling as an Organization
• Scaling Software
• Scaling Databases

Page 60

MAGIC POCKET BLOCK STORAGE SYSTEM

500PB+ user block data

3+ geographic regions

500+ million users

Page 61

[Diagram: three zones (west, east, central), each handling put and get requests.]

Page 62

[Diagram: one component labeled “simple” ☺; the others labeled “complicated!” ☹.]

Page 63

PYTHON, GO & RUST

Page 64

https://blogs.dropbox.com/tech/

Page 65

2016

[Diagram: the 2016 architecture. Metaserver clusters (meta-client, meta-web, meta-api, meta-mobile) alongside services including File Journal, Search, Auth service, Block Routing, Edgestore, Presence & Notifications, blockservers, Blockservice, Magic Pocket, Riviera, Thumbnail service, Cape…]

Page 66

HOW TO PREVENT CASCADING FAILURE?

[Diagram: the 2016 architecture, with Search highlighted.]

Page 67

BANDAID: PER-ROUTE ISOLATION

Page 68

QUEUE PRIORITIZATION
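The slide names queue prioritization without details. A generic sketch, assuming each queued request carries a numeric priority (lower is more urgent) and that requests of equal priority keep their arrival order; the class name is hypothetical and this is not Bandaid's implementation.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PriorityRequestQueue:
    """Dequeue the most urgent request first; FIFO among equal priorities."""

    def __init__(self):
        self._heap: List[Tuple[int, int, Any]] = []
        # Monotonic sequence number: keeps FIFO order within a priority and
        # stops heapq from ever comparing the request objects themselves.
        self._seq = itertools.count()

    def push(self, priority: int, request: Any) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def pop(self) -> Any:
        _priority, _seq, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)
```

Under overload, urgent requests keep flowing while lower-priority work waits or is shed.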

Page 69

Partition & Isolate (data or services)

Page 70

cluster isolation: Metaserver
data model isolation: Edgestore
shard isolation: File Journal
region isolation: Magic Pocket
route isolation: Bandaid

Page 71

Isolation

Page 72