[Chart: 500 Million Active Users, growth in active users by year, 2004 to 2009]
#2 property on the Internet (time on site)
100s of billions of monthly page views
>4 trillion feed actions processed per day
100s of millions of cache queries per second
Over 2 trillion objects cached
Over 500 million active users, half log in every day
100 billion photo files stored
Over 20 billion minutes spent every day
Over 2 billion pieces of content uploaded every week
Move Fast
Server Scaling
Reliability
Over 1 million active users per engineer
[Chart: active users per engineer. Facebook: 1,100,000, compared with Google*, Amazon*, and Microsoft*]
* Conservative estimates based on publicly available data
Few external deadlines... but
The site can’t go down
Frequent small changes
Never a delay waiting for a push
Easier to isolate bugs
Major changes dark launched
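The slide only says that major changes are dark launched. A minimal sketch of the idea, assuming a dark launch means running a new code path in production for a slice of users while they keep seeing the old result: users are bucketed by a stable hash, so ramping the percentage up only ever adds users. The names `in_rollout`, `handle_request`, and the feature string `"new_backend"` are hypothetical.

```python
import hashlib
from typing import Callable, TypeVar

T = TypeVar("T")


def in_rollout(feature: str, user_id: int, percent: float) -> bool:
    """Return True if user_id falls in the first `percent`% of buckets for `feature`."""
    # Stable hash of feature + user, so a user's bucket never changes between requests.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100  # e.g. percent=10 admits buckets 0..999


def handle_request(user_id: int, old_impl: Callable[[int], T],
                   new_impl: Callable[[int], object], dark_percent: float) -> T:
    """Serve the old result; shadow-call the new path for a slice of users."""
    result = old_impl(user_id)  # users always see the existing behavior
    if in_rollout("new_backend", user_id, dark_percent):  # hypothetical feature name
        try:
            new_impl(user_id)  # exercises the new path under real load; output discarded
        except Exception:
            pass  # a bug in the dark path must never break the live request (log it in practice)
    return result
```

Because buckets are stable, moving `dark_percent` from 1 to 10 to 100 widens exposure gradually, and a failing new path is invisible to users until the switch is flipped for real.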
Traditional websites
[Diagram: each user (Bob, Julie, Dan, Beth, Sue, Erin) connects only to their own data]
Facebook: the data is interconnected
[Diagram: Bob, Erin, and Beth's data linked to one another]
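Why interconnected data is harder to partition than per-user data can be sketched in a few lines. This assumes each user's data lives on one database shard chosen by hashing the user name (the slides do not say how data is placed); `shard_for`, `NUM_SHARDS`, and the friend list are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of database servers


def shard_for(user: str) -> int:
    """Pick the shard holding a user's data via a stable hash of the name."""
    return zlib.crc32(user.encode()) % NUM_SHARDS


def shards_for_own_page(user: str) -> set[int]:
    # Traditional site: a page shows only the viewer's own data, so one shard answers it.
    return {shard_for(user)}


def shards_for_friends_page(user: str, friends: dict[str, list[str]]) -> set[int]:
    # Interconnected data: the page combines every friend's data, so the request
    # has to reach whichever shards those friends happen to live on.
    return {shard_for(friend) for friend in friends[user]}


friends = {"Bob": ["Erin", "Beth", "Julie", "Dan", "Sue"]}
print("own page:", shards_for_own_page("Bob"))
print("friends' page:", shards_for_friends_page("Bob", friends))
```

The own-page request always touches exactly one shard, while the friends' page touches as many shards as Bob's friends are spread across, so there is no single partition that keeps Bob's page local.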
Servers
Scale Horizontally
[Diagram: the Web Server, Memcache, and Database tiers replicated across more and more machines]
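The slides do not say how cache keys are assigned once the Memcache tier spans many machines. One standard technique, assumed here purely for illustration, is consistent hashing: each server owns several points on a hash ring, and adding a server moves only the keys that land on its new points. `HashRing` and the server names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping cache keys to servers.

    Each server is placed at `replicas` points; a key belongs to the first
    server point at or after the key's hash, wrapping around the ring.
    """

    def __init__(self, servers: list[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: list[tuple[int, str]] = []  # sorted (hash, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        i = bisect.bisect(self._points, (_hash(key), ""))
        return self._points[i % len(self._points)][1]  # wrap past the last point


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    ring = HashRing(["mc1", "mc2", "mc3"])
    before = {k: ring.server_for(k) for k in keys}
    ring.add("mc4")
    moved = [k for k in keys if ring.server_for(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to mc4")
```

With plain `hash(key) % n`, adding a fourth server would remap most keys and empty most of the cache at once; with the ring, only the keys claimed by the new server change owner.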
Network Incast
[Diagram: a PHP Client sends many small get requests through a Switch to several Memcache servers, which all answer at once with many big data packets converging on the client's switch port]
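Incast happens when many responses converge on one client's switch port at the same moment. A common mitigation, sketched here under the assumption that the client issues its gets asynchronously, is client-side flow control: cap the number of outstanding requests with a sliding window so only a bounded number of replies can be in flight. `multiget`, `FakeMemcache`, and the window of 8 are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def multiget(keys: list[str], fetch: Callable[[str], Awaitable[str]],
                   window: int = 8) -> dict[str, str]:
    """Fetch all keys concurrently, keeping at most `window` requests outstanding."""
    slots = asyncio.Semaphore(window)  # one slot per permitted in-flight request

    async def fetch_one(key: str) -> tuple[str, str]:
        async with slots:  # waits here while `window` gets are already in flight
            return key, await fetch(key)

    return dict(await asyncio.gather(*(fetch_one(k) for k in keys)))


class FakeMemcache:
    """Stand-in for a pool of cache servers that records peak concurrency."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def get(self, key: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.001)  # simulated network round trip
        self.in_flight -= 1
        return f"value-of-{key}"


if __name__ == "__main__":
    cache = FakeMemcache()
    values = asyncio.run(multiget([f"key{i}" for i in range(100)], cache.get, window=8))
    print(len(values), "values fetched; peak outstanding gets:", cache.peak)
```

The window trades a little latency for a hard bound on burst size; too small a window serializes the fetch, too large a window brings the packet loss back.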
Reliability
Single Points of Failure
Software can be a SPOF
Don’t make small problems big
Don’t push problems upstream
Be wary of “smart” failover
Shed load when you’re in trouble
If you lose half of your machines, you’re doing well if you’re serving half of your traffic
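“Shed load when you’re in trouble” can be made concrete with a simple admission check. A minimal sketch, assuming each server knows how many requests it can serve concurrently (`max_in_flight`, a hypothetical tuning value): excess requests are rejected immediately instead of queuing, so the requests that are admitted still finish on time. `LoadShedder`, `serve`, and the `"503 overloaded"` reply are hypothetical.

```python
import threading
from typing import Callable


class LoadShedder:
    """Reject new work once `max_in_flight` requests are already being served."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self._lock = threading.Lock()  # request handlers may run on many threads

    def try_acquire(self) -> bool:
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                return False  # shed: caller should fail fast rather than queue
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.in_flight -= 1


def serve(shedder: LoadShedder, work: Callable[[], str]) -> str:
    if not shedder.try_acquire():
        return "503 overloaded"  # hypothetical cheap rejection response
    try:
        return work()
    finally:
        shedder.release()  # always free the slot, even if work() raises
```

With a per-machine cap, losing half the machines means the survivors reject the excess and keep serving roughly their own capacity, which is the "serving half of your traffic" outcome, instead of every machine slowing down until all requests time out.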
Measure Everything
p(! vs p((
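Measuring distributions matters because an average can hide a slow tail. The sketch below uses the nearest-rank percentile definition (one of several in common use) on hypothetical latency samples in which 5% of requests are slow: the median looks healthy, the mean is pulled up but still far from the slow requests, and only a tail percentile such as p99 shows them.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("percentile of empty sample set")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in ms: 95 fast requests and 5 very slow ones.
latencies = [10] * 95 + [1000] * 5
mean = sum(latencies) / len(latencies)
print(f"mean={mean} p50={percentile(latencies, 50)} p99={percentile(latencies, 99)}")
# prints: mean=59.5 p50=10 p99=1000
```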
Culture
Always do a post-mortem
Release often
Control and Responsibility
Facebook Platform
Lessons Learned
Federate everything
Keep failures contained
Measure distributions
Understand every problem
Make a person responsible
(c) $!!( Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.