Cloud Capacity Planning..an Oxymoron? - South Bay SRE Meetup, Aug-09-2016
Post on 10-Apr-2017
● Cloud Capacity Planning..an Oxymoron?
● Santa Cloud: How Netflix Does Holiday Capacity Planning
● The Data Behind the Planning
Presenting...
● > 83M households
● 190 Countries
● 35% of Internet traffic in US at peak
● Entirely on Cloud*, three regions
● Evacuate a region monthly...for 24 hours
● Capacity planning ~ 5 people! (in the room :-)
* Content served from homegrown OpenConnect CDN
Capacity Planning Concerns
● Facility considerations (Space, Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
(Cloud) Capacity Planning Concerns
● Facility considerations (Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
Cloud-specific CP Factors
● Capacity bounds..unknown (-)
● Vendor Decisions (-/+)
○ Hardware/Offering Evolution Timeline
○ Resource Demand (CPU/Mem/Disk/Net) Matrix
● On-Demand Capability (+)
Netflix Model
● Depend on the AWS on-demand pool for elasticity
● Monitor insufficient capacity exceptions (ICEs) for boundaries
● Invest heavily in 3-year reservations
● Maintain relatively few, large reserved pools
● Cloud Capacity Analytics team develops tools for insight
● Leverage cross-account resource borrowing
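
The ICE-monitoring bullet above can be illustrated with a minimal sketch. It is not Netflix's tooling: the event format, the function names, and the threshold are assumptions made for illustration. The only real detail is the EC2 error code InsufficientInstanceCapacity, which AWS returns when a pool cannot satisfy a launch request.

```python
from collections import Counter

# Real EC2 error code returned when a pool cannot satisfy a launch request.
ICE_CODE = "InsufficientInstanceCapacity"


def tally_ices(launch_events):
    """Count ICEs per (instance_type, zone) pool.

    launch_events is an iterable of dicts with hypothetical keys
    'instance_type', 'zone', and 'error_code' (None on success).
    """
    counts = Counter()
    for event in launch_events:
        if event.get("error_code") == ICE_CODE:
            counts[(event["instance_type"], event["zone"])] += 1
    return counts


def pools_near_boundary(counts, threshold=3):
    """Return pools whose ICE count reached the (assumed) threshold."""
    return sorted(pool for pool, n in counts.items() if n >= threshold)
```

Fed with launch results collected over a time window, the tally gives a rough signal of which instance-type/zone pools are approaching their unknown capacity bounds, which is where reservations or alternative instance types might be considered.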
Considerations of Scale
● Capacity required for critical footprint might require “guarantees”
● API-based observability has limits
● All resources have capacity limits/throttles
● Resource limits by default set for lowest common denominator
● Get creative with unused but paid-for capacity
● Billing file size!
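
Because every resource and API has limits and throttles, tooling that polls a cloud API at scale usually needs to retry politely. The following is a minimal sketch of exponential backoff with jitter, a common general technique and not something the slides describe. ThrottledError, call_with_backoff, and all parameter defaults are hypothetical names and values chosen for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ThrottledError with jittered exponential delays.

    sleep and rng are injectable so the retry schedule can be tested
    without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the ceiling.
            sleep(rng() * ceiling)
```

Randomizing the delay spreads retries from many clients over time, so a fleet of pollers is less likely to hit the same throttle in lockstep.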
Coburn Watson
● Director of Performance and Reliability at Netflix
○ Site Reliability Engineering, Performance and OS Engineering, Traffic Management, Chaos Engineering, Capacity Planning, Cloud Network Engineering
● @coburnw, cwatson@netflix.com
● Looking for some great capacity planning-minded folks
● Performance and Reliability YouTube Channel