Netflix Story of Embracing the Cloud
DESCRIPTION
Neil Hunt and Yury Izrailevsky talk at the 2012 AWS re:Invent conference about embracing the cloud
TRANSCRIPT
![Page 1: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/1.jpg)
Netflix: Embracing the Cloud
Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
![Page 2: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/2.jpg)
Embracing the Cloud: Confronting the Challenge
Neil Hunt
![Page 3: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/3.jpg)
Motivation
Netflix – Service Unavailable – Database Crashed
Rest assured that the right people are losing sleep to fix this problem!
We expect to resume service in approximately 72h
12 Aug 2008 03:12am
![Page 4: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/4.jpg)
A Business in Transition
OLD – DVD delivery
• Value from DVDs at home
• Website load small and predictable
• Traditional DC technology:
  • Linux, Apache, Oracle, Java
NEW – Streaming
• Value via Internet delivery
• Website and APIs high load and rapidly growing
• Need more robustness
• Cloud as opportunity for fresh start
![Page 5: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/5.jpg)
Mission: Cloud – High Level Goals
Availability: 4 x nines
Scale: Unconstrained horizontal scaling
Performance: Unlimited compute
![Page 6: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/6.jpg)
Forklift, or Rewrite?
OLD: Monolithic App, Oracle
NEW: Service Assembly, NoSQL
![Page 7: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/7.jpg)
Old Style – A large 18 wheeler
• Big
• Reliable
• Efficient (when full)
• Expensive
• Inflexible capacity
• Many single points of failure
![Page 8: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/8.jpg)
New Style – A fleet of leased pickups with drivers
• Scalable to small or large loads
• Reliability through redundancy
• Requires rethinking the whole problem
![Page 9: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/9.jpg)
SQL or NoSQL?
MySQL/RDB:
• Developer familiarity
• Developers imagine transactional consistency requirements in every scenario
NoSQL
• Availability & Scale
• Avoid overhead and risk of managing SQL
• Experimented with both
• Ended up with NoSQL for almost everything important
![Page 10: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/10.jpg)
Service Oriented Architecture
• Optimizes for small independent teams with well-defined interfaces
• Better independence from subsystem failures
• Scaling applied to each tier separately
NoSQL
![Page 11: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/11.jpg)
How to Manage the Migration?
Rebuilding a complex system while in operation
NoSQL
Monolithic App
Oracle
![Page 12: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/12.jpg)
Transitional Infrastructure: “Roman Riding”
![Page 13: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/13.jpg)
Transitional Infrastructure: Create a read-only copy
NoSQL
Source of Truth
Display only
Example: Membership records
Monolithic App
Oracle
![Page 14: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/14.jpg)
Transitional Infrastructure: Move the master copy
NoSQL
Source of Truth
Display only
Example: AB Test Data (account tags controlling test experience)
Monolithic App
Oracle
![Page 15: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/15.jpg)
Transitional Infrastructure: Full Multi-Master duplicate
NoSQL
Multi-master
Example: Queue
Monolithic App
Oracle
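The three transitional patterns on the preceding slides (read-only copy, moved master copy, full multi-master) can be illustrated with a small routing sketch. This is only an illustration under stated assumptions: the `Store` and `TransitionalRecord` names, the mode strings, and the synchronous replication are all hypothetical and are not taken from the talk. Real multi-master replication also needs conflict resolution, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory key-value store standing in for Oracle or the NoSQL store."""
    name: str
    rows: dict = field(default_factory=dict)


# Which side may accept writes in each (hypothetical) migration mode.
WRITABLE_SIDES = {
    "read_only_copy": {"legacy"},          # legacy DB is the source of truth
    "moved_master": {"cloud"},             # cloud store is the source of truth
    "multi_master": {"legacy", "cloud"},   # both sides accept writes
}


class TransitionalRecord:
    """Routes reads and writes for one data set while both systems are live."""

    def __init__(self, legacy, cloud, mode):
        if mode not in WRITABLE_SIDES:
            raise ValueError(f"unknown mode: {mode}")
        self.legacy = legacy
        self.cloud = cloud
        self.mode = mode

    def write(self, side, key, value):
        """Accept a write arriving at `side` and copy it to the other store."""
        if side not in WRITABLE_SIDES[self.mode]:
            raise PermissionError(f"{side} copy is display-only in mode {self.mode}")
        # Replication is synchronous here purely to keep the sketch short.
        self.legacy.rows[key] = value
        self.cloud.rows[key] = value

    def read(self, side, key):
        """Read from whichever store the caller is attached to."""
        store = self.legacy if side == "legacy" else self.cloud
        return store.rows.get(key)


# Membership records: written in the legacy system, displayed from the cloud copy.
membership = TransitionalRecord(Store("oracle"), Store("nosql"), "read_only_copy")
membership.write("legacy", "member-1", "active")
print(membership.read("cloud", "member-1"))  # prints active
```

Switching a data set from `read_only_copy` to `moved_master` in this sketch only changes which side is allowed to write, which is one way to read the slides' progression.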
![Page 16: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/16.jpg)
Organizational Challenges
IT Ops: A gradually diminishing role
• Initial extensive role managing legacy DC
• Raised visibility during transition
• New DC vulnerabilities and dependencies to manage
DevOps: A rapidly expanding role
• Components at a higher level abstraction
• More opportunities for automation
  • Automated build-push tools
  • Autoscaling
  • Monitoring and automatic cutouts and failover
![Page 17: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/17.jpg)
The Journey
| Phase | Components | Data & Prerequisites |
| --- | --- | --- |
| Trial (2009) | Streaming Player | Content keys (RO); Membership status (RO) |
| Development (2010-11) | Member product pages and APIs | Content catalog (RW); Personalization data (RW) & recs algorithms; AB Test data (RW) |
| Follow-through (2011-12) | Account and membership | Membership data (RW) |
| Final (2013) | Payments | PCI and SOX data |
![Page 18: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/18.jpg)
Lessons Learned…
• Embrace the whole concept: Take the opportunity to build a modern architecture rather than forklifting SQL and monolithic apps
• Plan to discard your first experiments: You’ll learn so much that you’ll be glad to redo it right
• Invest in transitional infrastructure: Migration will take a while, and it’s worth the effort to make it easy
• Expect your team to learn new ways … but some won’t make the transition
![Page 19: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/19.jpg)
Embracing the Cloud: Delivering the Cloud Solution
Yury Izrailevsky
![Page 20: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/20.jpg)
Mission: Cloud – High Level Goals
Availability: 4 x nines
Scale: Unconstrained horizontal scaling
Performance: Unlimited compute
![Page 21: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/21.jpg)
Performance, Scalability, Availability
![Page 22: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/22.jpg)
Performance, Scalability, Availability
![Page 23: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/23.jpg)
Scaling Netflix Streaming Service: Weekly Streaming Starts
[Chart: weekly streaming starts, 1/4/2009 to 8/8/2012]
![Page 24: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/24.jpg)
Netflix Cross-Regional Cloud Architecture
![Page 25: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/25.jpg)
Goal: Regional Failover
![Page 26: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/26.jpg)
Building Global Netflix Streaming Product
![Page 27: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/27.jpg)
Performance, Scalability, Availability
![Page 28: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/28.jpg)
Weekly Cloud Cost Per Streaming Start (last 12 months)
![Page 29: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/29.jpg)
Simian Army: Cloud Efficiency Automation
Janitor Monkey
• Regularly scrape unused capacity
• Clean up instances, ASGs, ELBs, SGs, etc.
Efficiency Monkey
• AI-based resource under-usage detection (CPU, memory, etc.)
Automated Deletion of Old Data
• TTL for S3 (using ObjectExpiration)
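As a rough illustration of the S3 TTL idea, the sketch below builds an expiration rule in the shape S3 lifecycle configurations use. The helper name, rule ID, prefix, and retention period are hypothetical; the slides do not show Netflix's actual configuration.

```python
def expiration_rule(prefix, days, rule_id="expire-old-data"):
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    All argument values are illustrative assumptions, not settings from the talk.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},   # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},   # delete objects older than this
            }
        ]
    }


# Hypothetical example: expire objects under "logs/" after 30 days.
config = expiration_rule("logs/", 30)
print(config["Rules"][0]["Expiration"])  # prints {'Days': 30}

# With boto3, a dict like this could be passed as LifecycleConfiguration to the
# S3 client's put_bucket_lifecycle_configuration call (not executed here).
```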
![Page 30: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/30.jpg)
Cyclical Streaming Usage Pattern
![Page 31: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/31.jpg)
Load-Based Auto Scaling
Move to Load-Based Scaling
Scale up/down by 70%+
50%+ Cost Saving
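Load-based scaling can be sketched as sizing the fleet from the current request rate instead of provisioning for peak all day. The function name, the headroom factor, the per-instance capacity, and the traffic numbers below are all made-up assumptions for illustration; they are not figures from the talk.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=2, max_instances=1000, headroom=0.25):
    """Return how many instances to run for the current load.

    `headroom` adds spare capacity on top of measured load; the result is
    clamped between `min_instances` and `max_instances`.
    """
    needed = requests_per_sec * (1 + headroom) / capacity_per_instance
    return max(min_instances, min(max_instances, math.ceil(needed)))


# Hypothetical daily cycle: evening peak versus overnight trough.
peak = desired_instances(10_000, capacity_per_instance=100)
trough = desired_instances(3_000, capacity_per_instance=100)
print(peak, trough)  # prints 125 38
```

With these invented numbers the fleet shrinks by roughly 70% between peak and trough, which is the kind of swing a cyclical usage pattern makes possible.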
![Page 32: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/32.jpg)
Performance, Scalability, Availability
![Page 33: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/33.jpg)
A Truly Great Service…
Has To Just Work!
Availability Goal: 99.99% (30 secs/week at peak traffic)
![Page 34: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/34.jpg)
Using Redundancy in AWS Infrastructure to Survive Failures
[Chart: Historical Streaming Availability (13wkMA), 7/17/2011 to 11/4/2012; annotations: June 29th, 2012 AWS / Netflix Outage; Other AWS Outages]
![Page 35: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/35.jpg)
Cascading Failures
API
InstantQueue
SimpleDB
![Page 36: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/36.jpg)
Netflix Cloud Architecture
![Page 37: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/37.jpg)
Cascading Failures
99% Availability × 99% Availability × … × 99% Availability
(99%)^300 = 4.90%
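The arithmetic on this slide can be checked directly. The sketch assumes 300 dependencies that fail independently, each 99% available, with a request needing all of them to succeed; the function name is illustrative.

```python
def serial_availability(per_component, count):
    """Availability of a request that needs `count` independent components,
    each available with probability `per_component`."""
    return per_component ** count


print(f"{serial_availability(0.99, 300):.2%}")  # prints 4.90%
```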
![Page 38: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/38.jpg)
Strategies to Improve Availability
Graceful Degradation
Redundancy
![Page 39: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/39.jpg)
Graceful Degradation
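One common way to degrade gracefully is to fall back to a simpler response when a dependency fails, rather than failing the whole request. The sketch below shows that general technique; the function names and the fallback content are hypothetical and are not Netflix's implementation.

```python
def with_fallback(primary, fallback):
    """Wrap `primary` so that any failure is answered by `fallback` instead."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Degrade instead of propagating the failure to the caller.
            return fallback(*args, **kwargs)
    return call


def personalized_rows(member_id):
    """Hypothetical dependency that is currently failing."""
    raise TimeoutError("recommendation service unavailable")


def popular_rows(member_id):
    """Hypothetical non-personalized default served when personalization is down."""
    return ["Popular on Netflix"]


get_rows = with_fallback(personalized_rows, popular_rows)
print(get_rows("member-1"))  # prints ['Popular on Netflix']
```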
![Page 40: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/40.jpg)
Redundancy
Redundancy Across Availability Zones
Zone A, Zone B, Zone C
Storage Redundancy Across Regions, Vendors
Cassandra (A, B, C)
S3 Backup
Secure Cloud Backup
![Page 41: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/41.jpg)
Testing Fault Tolerance: Simian Army
Chaos Monkey
Latency Monkey
Chaos Gorilla
![Page 42: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/42.jpg)
Open Source Portal at http://netflix.github.com
![Page 43: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/43.jpg)
Superstorm Sandy
AWS Infrastructure Held Up
>2x Netflix Streaming Usage in East Coast Markets
Boston
New York
Philadelphia
Baltimore
D.C.
![Page 44: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/44.jpg)
Focus on Building a Great Streaming Product
![Page 45: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/45.jpg)
Netflix at 2012 re:Invent
| Date/Time | Presenter | Topic |
| --- | --- | --- |
| Wed 8:30-10:00 | Reed Hastings | Keynote with Andy Jassy |
| Wed 1:00-1:45 | Coburn Watson | Optimizing Costs with AWS |
| Wed 2:05-2:55 | Kevin McEntee | Netflix’s Transcoding Transformation |
| Wed 3:25-4:15 | Neil Hunt / Yury I. | Netflix: Embracing the Cloud |
| Wed 4:30-5:20 | Adrian Cockcroft | High Availability Architecture at Netflix |
| Thu 10:30-11:20 | Jeremy Edberg | Rainmakers – Operating Clouds |
| Thu 11:35-12:25 | Kurt Brown | Data Science with Elastic Map Reduce (EMR) |
| Thu 11:35-12:25 | Jason Chan | Security Panel: Learn from CISOs working with AWS |
| Thu 3:00-3:50 | Adrian Cockcroft | Compute & Networking Masters Customer Panel |
| Thu 3:00-3:50 | Ruslan M./Gregg U. | Optimizing Your Cassandra Database on AWS |
| Thu 4:05-4:55 | Ariel Tseitlin | Intro to Chaos Monkey and the Simian Army |
![Page 46: Netflix Story of Embracing the Cloud](https://reader038.vdocuments.us/reader038/viewer/2022102922/5462d5b4af79594d4d8b6fc2/html5/thumbnails/46.jpg)
We are sincerely eager to hear your feedback on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.