TRANSCRIPT
How Service Optimization Keeps ViaSat Flying
Mike Craft - @crafty_house · Brian Eckblad - @brianeckblad · Travis Newhouse - @travis_newhouse
Outline
Who is ViaSat?
Why OpenStack? Yussa lika some cloudz?
Current state of the cloud
Key areas of interest to be successful
Challenges and solutions
ViaSat
ViaSat is a global broadband services and technology company.
We provide consumer, commercial, and government customers with communications services and systems that exceed expectations for performance, anywhere in the world.
We think big, we act intelligently, and we’re not done…we’re just beginning.
www.viasat.com
Why OpenStack at ViaSat?
Motivation:
● On-demand infrastructure with self-service
● Reduce infrastructure capital and operational costs
● Transparent capacity scaling
Range of Applications …
● Customer-facing (Airline, Government, Residential, Commercial)
● Employee-facing enterprise applications
● Internal development and test environments
○ Enterprise, Service Delivery w/NFV
History of OpenStack at ViaSat
Internal POC environment went into service mid-2014, used by select development teams - Havana w/OVS
Production design started in Sept 2014; buildout started in early Nov 2014, with the first production cloud going live in Dec 2014 - Icehouse w/Linux Bridge
Production clouds are running Juno, Kilo and Liberty
For our production releases, we partnered with Rackspace for both professional services and support, allowing us to bring the cloud to the enterprise more quickly
Evolving toward in-house supported deployments
OpenStack Operations at ViaSat
5 private clouds deployed using OpenStack Ansible (OSA)
200+ hosts
7000+ instances
2+ PB storage between Cinder and Swift
Linux Bridge ML2 plugin
6 member devops team, no silos!
300+ internal users/customers
Unique aspects
Providing self-service IT to users
Supporting projects spanning private and public platforms
Lean operations team: 1 operator per 1750+ instances
Highly dynamic workloads: 100+ instances created/deleted per day
Oversubscribed compute ratio of up to 4:1 depending on workload
Densest cloud supports NFV development: 50+ hosts, 2500+ instances, up to 80 instances per host
Network underlay is vendor agnostic
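The oversubscription figure above can be made concrete with a small sketch. This is a hypothetical illustration, not ViaSat's tooling; the host names and vCPU counts are invented.

```python
# Hypothetical sketch: computing the compute oversubscription ratio per host.
# Host names and core counts below are illustrative, not a real inventory.

def oversubscription_ratio(allocated_vcpus, physical_cores):
    """Ratio of vCPUs promised to instances vs. physical cores on the host."""
    return allocated_vcpus / physical_cores

hosts = {
    "compute01": {"allocated_vcpus": 96, "physical_cores": 32},   # 3:1
    "compute02": {"allocated_vcpus": 128, "physical_cores": 32},  # 4:1 cap
}

for name, h in hosts.items():
    ratio = oversubscription_ratio(h["allocated_vcpus"], h["physical_cores"])
    print(f"{name}: {ratio:.1f}:1")
```

An operator would feed this from the scheduler's real allocation data; the point is simply that a 4:1 ratio means four promised vCPUs share each physical core, which only works when workloads are bursty rather than CPU-bound.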
Operating a lean team
Availability
Visibility
Cost Management
Capacity Planning
Self-service all the things
Availability is key
Understand your customer
Operator must know if resources are available
Real-time data + history for comparison against baseline
Visibility into hypervisor and instances
Where does the problem exist: hypervisor or instance?
Is hypervisor overloaded?
Which instance is generating IOPS load?
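Answering "which instance is generating the IOPS load?" comes down to sampling per-instance I/O counters twice and comparing. A minimal sketch, assuming counters like those reported by `virsh domblkstat` (instance names and numbers here are made up):

```python
# Hypothetical sketch: pinpointing the instance driving disk load on a
# hypervisor from two samples of per-instance read/write request counters
# (e.g. as reported by `virsh domblkstat`). All values are invented.

def iops(before, after, interval_s):
    """Average I/O operations per second between two counter samples."""
    ops = (after["rd_req"] + after["wr_req"]) - (before["rd_req"] + before["wr_req"])
    return ops / interval_s

samples_t0 = {"web-1": {"rd_req": 1000, "wr_req": 500},
              "db-1":  {"rd_req": 9000, "wr_req": 4000}}
samples_t1 = {"web-1": {"rd_req": 1100, "wr_req": 550},
              "db-1":  {"rd_req": 15000, "wr_req": 9000}}

rates = {name: iops(samples_t0[name], samples_t1[name], 10) for name in samples_t0}
busiest = max(rates, key=rates.get)
print(busiest, rates[busiest])  # the instance responsible for the load
```

With real-time data plus history, the same calculation against a baseline tells the operator whether the busiest instance is merely busy or actually anomalous.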
Manage infrastructure cost
Private cloud differs from the pay-per-use model of public cloud
Organization must collaborate to utilize resources efficiently
Reclaim unused and under-utilized resources
What instances are under-utilized?
Does a user require a specific flavor size for an instance? Make the legos fit!
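Reclaiming under-utilized resources starts with flagging candidates. A hypothetical sketch of that filter; the thresholds and utilization figures are illustrative assumptions, not ViaSat's policy:

```python
# Hypothetical sketch: flagging under-utilized instances for reclamation.
# Thresholds and per-instance utilization figures are invented examples.

CPU_THRESHOLD = 5.0   # average CPU % below which an instance looks idle
MEM_THRESHOLD = 10.0  # average memory % below which an instance looks idle

instances = [
    {"name": "build-7",  "avg_cpu_pct": 1.2,  "avg_mem_pct": 4.0},
    {"name": "api-prod", "avg_cpu_pct": 38.0, "avg_mem_pct": 62.0},
]

def underutilized(inst):
    """An instance is a reclaim candidate only if both CPU and memory are idle."""
    return inst["avg_cpu_pct"] < CPU_THRESHOLD and inst["avg_mem_pct"] < MEM_THRESHOLD

reclaim = [i["name"] for i in instances if underutilized(i)]
print(reclaim)  # candidates to hand back to the shared pool
```

In a private cloud without a pay-per-use price signal, a report like this is what prompts the conversation with the owning team before anything is resized or deleted.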
Capacity Planning
What is the utilization of the infrastructure? Memory? Disk? CPU?
What is the utilization trend over time?
When will infrastructure resources be exhausted?
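The "when will resources be exhausted?" question is a trend extrapolation. A minimal sketch under the assumption of linear growth; the sample data points are invented:

```python
# Hypothetical sketch: extrapolating a linear utilization trend to estimate
# when a resource pool will be exhausted. Sample data points are invented.

def days_until_exhaustion(samples, capacity):
    """Least-squares fit of (day, used) points; days until the fit hits capacity."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None  # flat or shrinking usage: no exhaustion in sight
    return (capacity - intercept) / slope

# e.g. memory used (GB) sampled weekly on days 0, 7, 14, 21
usage = [(0, 400), (7, 430), (14, 460), (21, 490)]
print(days_until_exhaustion(usage, capacity=1000))
```

Real capacity planning would use more history and account for bursts, but even a straight-line fit over recorded utilization gives a lean team an early-warning date to order hardware against.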
Self-Service for Users
Partnered with AppFormix to provide monitoring as a service for projects and instances
Expose underlying infrastructure data points
Enables users to answer questions about their resource utilization
Users can understand issues around hypervisor and storage health
Transparency is key to empower users
Standardized network design
Challenge: Right-sizing Instances
Started with custom flavors for everything.
● Not scalable for operations team
● Inefficient workload placement, e.g., CPU exhaustion vs disk
Now, standardized on flavor sizes:
● Avoids resource fragmentation
● Improves capacity planning
Users need data to choose right size
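With a standardized catalog, "making the legos fit" reduces to picking the smallest flavor that satisfies a request. A hypothetical sketch; the flavor names and sizes are illustrative, not ViaSat's actual catalog:

```python
# Hypothetical sketch: fitting a user's request into the smallest standard
# flavor. Flavor names and sizes below are illustrative examples only.

FLAVORS = [  # (name, vcpus, ram_gb), sorted smallest to largest
    ("m1.small",  1,  2),
    ("m1.medium", 2,  4),
    ("m1.large",  4,  8),
    ("m1.xlarge", 8, 16),
]

def pick_flavor(vcpus_needed, ram_gb_needed):
    """Return the first (smallest) standard flavor that satisfies the request."""
    for name, vcpus, ram in FLAVORS:
        if vcpus >= vcpus_needed and ram >= ram_gb_needed:
            return name
    return None  # request exceeds the largest standard size

print(pick_flavor(3, 6))  # a 3-vCPU / 6 GB ask lands in m1.large
```

Because every flavor is a fixed multiple of the smallest, instances pack onto hosts without leaving stranded slivers of CPU or RAM, which is exactly the fragmentation the custom-flavor approach suffered from.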
Challenge: Hypervisor health
Virtual memory thrashing
- Is it instance memory oversubscription?
- Is it disk block cache exhaustion?
CPU contention
Disk I/O
- Latency issues
- Tenant misusing software RAID over Cinder LVM
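The memory-thrashing questions above can be turned into a simple triage rule from sampled hypervisor counters. This is a hypothetical sketch; the thresholds and sample values are invented assumptions, not measured figures:

```python
# Hypothetical sketch: distinguishing instance memory oversubscription from
# disk block cache exhaustion on a hypervisor, using sampled memory counters.
# Thresholds and sample values are illustrative assumptions.

SWAP_IN_THRESHOLD = 100   # pages/sec: sustained swap-in suggests thrashing
CACHE_FLOOR_MB = 512      # below this, the block cache is being squeezed out

def diagnose(swap_in_rate, free_cache_mb):
    if swap_in_rate > SWAP_IN_THRESHOLD:
        return "memory oversubscription: instances are actively swapping"
    if free_cache_mb < CACHE_FLOOR_MB:
        return "block cache exhaustion: little memory left for disk caching"
    return "memory looks healthy"

print(diagnose(swap_in_rate=250, free_cache_mb=2048))
```

The distinction matters because the two problems have different fixes: oversubscription calls for live-migrating or resizing instances, while cache exhaustion calls for lowering the host's memory commitment.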
Challenge: Right-sizing network
Initially allowed each project to request a network of any size or design. Too many snowflakes.
Standardized on L2/L3 project design
Standardized IP project allocations
Re-architected the underlay to simplify the design and provide better tenant isolation
Resulted in a better end user experience
Questions?