DevOps Chicago - The Game of Operations and the Operation of Games
DESCRIPTION
Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques he has been putting into practice at KIXEYE, including migrating to the cloud, organizing around services, and focusing on automation. He illustrates his points with war stories from operating large-scale services at Google and eBay. Please see companion video at https://vimeo.com/95841677.
TRANSCRIPT
![Page 1: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/1.jpg)
The Game of Operations and
The Operation of Games
Randy Shoup @randyshoup
linkedin.com/in/randyshoup
DevOps Chicago Meetup, May 19 2014
![Page 2: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/2.jpg)
Background
CTO at KIXEYE
• Real-time strategy games for web and mobile
Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service
Chief Engineer at eBay
• Multiple generations of eBay’s real-time search infrastructure
![Page 3: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/3.jpg)
Real-Time Strategy Games are …
• Real-time
• Spiky
• Computationally-intensive
• Constantly evolving
• Constantly pushing boundaries
Technically and operationally demanding
![Page 4: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/4.jpg)
Operating Games: Goals
Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business for long
• Fun includes game mechanics, feature set, quality, performance
Studio Velocity
• 8 *highly independent* game studios
• Different tech stacks, tool chains, phases of development
Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build, buy, borrow
Cost Efficiency
• More output for less
![Page 5: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/5.jpg)
The Game of Operations
Cloud
• All studios and services moving to AWS
• Strong focus on automation
Services
• Small, focused teams
• Clean, well-defined interface to customers
DevOps
• Developers behave like Ops
• Ops behaves like Developers
![Page 6: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/6.jpg)
The Game of Operations
Cloud
Services
DevOps
![Page 7: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/7.jpg)
Why Cloud? (The Obvious)
Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load
Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision
Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down
![Page 8: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/8.jpg)
Why Cloud? (The Less Obvious)
Instance Optimization Opportunities
• Instance shapes to fit most parts of the solution space (compute-intensive, IO-intensive, etc.)
• If the shape does not fit, try another
Service Quality
• Amazon and Google know how to run data centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric and external peering
![Page 9: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/9.jpg)
Why Cloud? (The Fundamentals)
Right Side of History
• Almost impossible to beat Google / Amazon buying power or operating efficiencies
• 2010s in computing are like 1910s in electric power
• Soon it will be just as common to run your own data center as it is to run your own electric power generation (!)
Easy and Fun
• It Just Works ™
• Makes it easy to fall in love with infrastructure
![Page 10: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/10.jpg)
Autoscaling
Games are very spiky
• Very unpredictable
• Huge variability between peak and trough
• Hits are self-reinforcing
Services and clients have to “flex”
• Clients back off in response to latency
• Services grow / shrink based on load
Service Cluster == AWS Auto-Scale Group
• Scale up or down based on predefined metrics, thresholds
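The "flex" idea on this slide can be sketched in a few lines. This is a minimal illustration, not KIXEYE's configuration: the `ScalingPolicy` thresholds, the doubling rule, and the `client_backoff_seconds` helper are all assumptions made for the example. A real AWS Auto Scaling group would express the same decision through its own scaling policies and alarms.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds; real values would come from measurement."""
    scale_up_cpu: float = 0.70    # add capacity above this average CPU
    scale_down_cpu: float = 0.30  # remove capacity below this average CPU
    min_instances: int = 2
    max_instances: int = 100


def desired_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the target cluster size for one evaluation period."""
    if avg_cpu > policy.scale_up_cpu:
        target = current * 2      # grow quickly, since spikes are self-reinforcing
    elif avg_cpu < policy.scale_down_cpu:
        target = current - 1      # shrink slowly to avoid flapping
    else:
        target = current
    # Clamp to the configured bounds of the cluster.
    return max(policy.min_instances, min(policy.max_instances, target))


def client_backoff_seconds(observed_latency: float, latency_budget: float,
                           current_delay: float, max_delay: float = 30.0) -> float:
    """Clients flex too: double the delay between requests while the service is slow."""
    if observed_latency > latency_budget:
        return min(max_delay, max(0.5, current_delay * 2))
    return 0.0                    # service is healthy again, stop backing off
```

Growing by doubling and shrinking by one instance is a deliberate asymmetry in this sketch: under-provisioning during a spike hurts players, while shedding capacity a little late only costs money.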
![Page 11: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/11.jpg)
Automation Work at KIXEYE
Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• No-downtime red-black deployment
• Futures: canarying, auto-rollback
Manageability
• Flume -> ElasticSearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and alerting
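A red-black deployment brings up a complete new cluster beside the old one and moves traffic only once the new cluster is healthy. The sketch below shows that control flow with plain Python objects. `Router`, `red_black_deploy`, and the `is_healthy` callback are hypothetical names for illustration; they are not Asgard's API.

```python
from typing import Callable, Dict, List


class Router:
    """Hypothetical stand-in for a load balancer that sends traffic to one cluster."""

    def __init__(self, live: str) -> None:
        self.live = live


def red_black_deploy(router: Router,
                     clusters: Dict[str, List[str]],
                     new_cluster: str,
                     new_instances: List[str],
                     is_healthy: Callable[[str], bool]) -> bool:
    """Launch the new cluster next to the old one, then cut over only if healthy.

    Returns True if traffic was switched, False if the old cluster stays live.
    """
    clusters[new_cluster] = new_instances      # old cluster keeps serving meanwhile
    if not new_instances or not all(is_healthy(i) for i in new_instances):
        del clusters[new_cluster]              # abandon the new cluster, no downtime
        return False
    router.live = new_cluster                  # single switch of traffic
    return True                                # old cluster is kept for rollback
```

Because the old cluster is left in `clusters` after a successful cutover, rolling back in this sketch is just setting `router.live` back to the previous name.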
![Page 12: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/12.jpg)
The Game of Operations
Cloud
Services
DevOps
![Page 13: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/13.jpg)
Service Teams
• Give teams autonomy
• Freedom to choose technology, methodology, working environment
• Responsibility for the results of those choices
• Hold them accountable for *results*
• Give a team a goal, not a solution
• Let team own the best way to achieve the goal
![Page 14: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/14.jpg)
KIXEYE Service Chassis
• Goal: Produce a “chassis” for building scalable game services
• Minimal resources, minimal direction
• 3 people x 1 month
• Consider building on open source projects
Team exceeded expectations
• Co-developed chassis, transport layer, service template, build pipeline, red-black deployment, etc.
• Operability and manageability from the beginning
• Heavy use of Netflix open source projects
• 15 minutes from no code to running service in AWS (!)
• Plan to open-source several parts of this work
![Page 15: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/15.jpg)
Micro-Services
Simple
Well-defined interface
Single-purpose
Modular and independent
Small teams
Autonomy and responsibility
[Diagram: services A, B, C, D, E]
![Page 16: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/16.jpg)
Transition to Building Services
Common Chassis
• Make it trivially easy to build and maintain a service
Define Service Interface (Formally!)
• Propose, Discuss, Agree
Prototype Implementation
• Simplest thing that could possibly work
• Client can integrate with prototype
• Implementor can learn what works and what does not
Real Implementation
• Throw away the prototype (!)
Rinse and Repeat
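One way to picture "define the interface formally, then prototype" is with a typed interface and a throwaway implementation behind it. The leaderboard service below is invented for illustration; the slides do not name a specific service, and `LeaderboardService`, `PrototypeLeaderboard`, and `show_podium` are hypothetical.

```python
from typing import Dict, List, Protocol, Tuple


class LeaderboardService(Protocol):
    """Hypothetical interface, agreed between provider and client before coding."""

    def submit_score(self, player_id: str, score: int) -> None: ...

    def top(self, n: int) -> List[Tuple[str, int]]: ...


class PrototypeLeaderboard:
    """Simplest thing that could possibly work: in-memory, no persistence."""

    def __init__(self) -> None:
        self._best: Dict[str, int] = {}

    def submit_score(self, player_id: str, score: int) -> None:
        # Keep only each player's best score.
        self._best[player_id] = max(score, self._best.get(player_id, score))

    def top(self, n: int) -> List[Tuple[str, int]]:
        # Highest score first; ties broken by player id for a stable order.
        ranked = sorted(self._best.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


def show_podium(service: LeaderboardService) -> List[str]:
    """Client code depends only on the interface, so the prototype can be replaced."""
    return [player for player, _ in service.top(3)]
```

The client function never mentions `PrototypeLeaderboard`, which is what makes it cheap to throw the prototype away and substitute the real implementation later.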
![Page 17: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/17.jpg)
Transition to Service Relationships
Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)
Service-Level Agreement (SLA)
• Promise of service levels by the service provider
• Customer needs to be able to rely on the service, like a utility
Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and provider
• Motivates both sides to optimize
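Usage-based charging and an SLA check both reduce to simple arithmetic. The sketch below assumes cost is split in proportion to request counts and that the SLA is phrased as "a given fraction of requests complete within a latency threshold". Both are illustrative assumptions, and the function names are hypothetical; the slides do not specify how KIXEYE measures either.

```python
from typing import Dict, Sequence


def allocate_cost(total_cost: float, usage: Dict[str, int]) -> Dict[str, float]:
    """Split a service's cost across customers in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {customer: 0.0 for customer in usage}
    return {customer: total_cost * count / total_usage
            for customer, count in usage.items()}


def meets_sla(latencies_ms: Sequence[float], threshold_ms: float,
              target_fraction: float) -> bool:
    """True if at least target_fraction of requests finished within threshold_ms."""
    if not latencies_ms:
        return True               # no traffic, nothing violated
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms) >= target_fraction
```

With proportional allocation, a studio that halves its request volume sees its bill fall, and a provider that lowers total cost lowers every customer's bill, which is the incentive alignment the slide describes.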
![Page 18: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/18.jpg)
The Game of Operations
Cloud
Services
DevOps
![Page 19: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/19.jpg)
Instrumentation and Measurement
Instrument Everything
• Machine / instance stats: CPU, memory, I/O
• Software infrastructure stats: database, message queue
• Application stats: game client, game server, services
Make It Easy to Do the Right Thing ™
• Easy, reliable, low-latency
• Auto-tagged and searchable
Why?
• Measurement beats intuition every time; my own intuition is usually wrong
• If you need to ssh into a box, instrumentation failed you
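"Auto-tagged and searchable" can be illustrated with a tiny in-process recorder that stamps every data point with default tags, so callers only supply what is specific to the measurement. The `Metrics` class and its tag names are hypothetical; a production version would ship points to a metrics store rather than keep them in a list.

```python
import os
import socket
import time
from typing import Dict, List, Optional


class Metrics:
    """Hypothetical recorder that adds service, host, and pid tags automatically."""

    def __init__(self, service: str) -> None:
        self.default_tags: Dict[str, str] = {
            "service": service,
            "host": socket.gethostname(),
            "pid": str(os.getpid()),
        }
        self.points: List[dict] = []

    def record(self, name: str, value: float,
               tags: Optional[Dict[str, str]] = None) -> None:
        """Store one data point; caller-supplied tags override the defaults."""
        self.points.append({
            "name": name,
            "value": value,
            "ts": time.time(),
            "tags": {**self.default_tags, **(tags or {})},
        })

    def search(self, **tags: str) -> List[dict]:
        """Return every point whose tags match all of the given key/value pairs."""
        return [p for p in self.points
                if all(p["tags"].get(k) == v for k, v in tags.items())]
```

A call such as `Metrics("matchmaking").record("queue_depth", 12, {"region": "us-east"})` is a one-liner for the developer, which is the point of making the right thing easy.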
![Page 20: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/20.jpg)
One Team (!)
• Act as one team across development, product, operations, etc.
• Solve problems instead of blaming and pointing fingers
• Political games are not as fun as real-time strategy games
![Page 21: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/21.jpg)
Everyone Is Responsible for Prod
Everyone’s incentives are aligned
Everyone is strongly motivated to have solid instrumentation and monitoring
![Page 22: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/22.jpg)
Organization: Learning Culture
Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it
Encourage iteration and velocity
• “Failure is not falling down but refusing to get back up” – Theodore Roosevelt
![Page 23: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/23.jpg)
Google Blame-Free Post-Mortems
Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong
Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?
Engineers compete to take personal responsibility (!)
![Page 24: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/24.jpg)
Transition to DevOps
Organization
• Studios make user-visible games
• Services provide common endpoints
Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs
You Build It, You Run It
• Transition on-call
• Use primary / secondary on-call as apprenticeship
![Page 25: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/25.jpg)
Recap: The Game of Operations
Cloud
Services
DevOps
![Page 26: DevOps Chicago - The Game Of Operations and the Operation of Games](https://reader033.vdocuments.us/reader033/viewer/2022060107/554bb2b0b4c90594278b45c3/html5/thumbnails/26.jpg)
Come Join Us!
KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam
@randyshoup
[email protected]
linkedin.com/in/randyshoup
slideshare.net/randyshoup