DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations and The Operation of Games Randy Shoup @randyshoup linkedin.com/in/randyshoup DevOpsDays Silicon Valley, June 28 2014

Upload: randy-shoup

Post on 06-May-2015


DESCRIPTION

Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques his team has put into practice at KIXEYE: Cloud infrastructure, Service teams, and DevOps Culture. He talks about elastic workloads, micro-services, configuration automation, and a common service "chassis". He further discusses the organizational and technical disciplines of team autonomy, internal vendor-customer relationships, and, of course, "you build it, you run it"!

TRANSCRIPT

Page 1: DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations and The Operation of Games

Randy Shoup @randyshoup

linkedin.com/in/randyshoup

DevOpsDays Silicon Valley, June 28 2014

Page 2: DevOpsDays Silicon Valley 2014 - The Game of Operations

Background

CTO at KIXEYE
• Real-time strategy games for web and mobile

Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service

Chief Engineer at eBay
• Multiple generations of eBay’s real-time search infrastructure

Page 4: DevOpsDays Silicon Valley 2014 - The Game of Operations

40 Years Later …

tomeimmortalarena.com

Page 5: DevOpsDays Silicon Valley 2014 - The Game of Operations

Real-Time Strategy Games are …

• Real-time
• Spiky
• Computationally-intensive
• Constantly evolving
• Constantly pushing boundaries

Technically and operationally demanding

Page 6: DevOpsDays Silicon Valley 2014 - The Game of Operations

Operating Games: Goals

Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business for long
• Fun includes game mechanics, feature set, uptime, performance

Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build, buy, borrow

Cost Efficiency
• More output for less

Page 7: DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations

Cloud
• All studios and services moving to AWS
• Strong focus on automation

Services
• Small, focused teams
• Clean, well-defined interface to customers

DevOps Culture
• One team across development and ops

Page 8: DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations

Cloud

Services

DevOps Culture

Page 9: DevOpsDays Silicon Valley 2014 - The Game of Operations

Why Cloud? (The Obvious)

Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load

Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision

Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down
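A rough illustration of "utilization risk", as a sketch only. The load profile, per-instance capacity, and hourly price below are hypothetical and do not come from the talk. Fixed provisioning pays for peak capacity around the clock, while elastic provisioning pays only for the instances each hour needs.

```python
import math

# Hypothetical numbers for illustration only; none come from the talk.
REQUESTS_PER_INSTANCE = 1000       # assumed capacity of one instance (req/s)
PRICE_CENTS_PER_INSTANCE_HOUR = 10  # assumed price, in cents to avoid float rounding


def instances_needed(load):
    """Smallest number of instances that can serve `load` req/s (at least 1)."""
    return max(1, math.ceil(load / REQUESTS_PER_INSTANCE))


def fixed_cost(hourly_load):
    """Cost in cents when capacity is sized for the peak hour and never spun down."""
    peak = max(instances_needed(load) for load in hourly_load)
    return peak * len(hourly_load) * PRICE_CENTS_PER_INSTANCE_HOUR


def elastic_cost(hourly_load):
    """Cost in cents when each hour pays only for the instances that hour needs."""
    instance_hours = sum(instances_needed(load) for load in hourly_load)
    return instance_hours * PRICE_CENTS_PER_INSTANCE_HOUR


# A spiky day: quiet for 20 hours, then a 4-hour spike at 10x the load.
day = [2000] * 20 + [20000] * 4
print("fixed:", fixed_cost(day), "cents; elastic:", elastic_cost(day), "cents")
```

The gap between the two totals grows with the ratio of peak to trough, which is why spiky workloads benefit most from spinning capacity down.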

Page 10: DevOpsDays Silicon Valley 2014 - The Game of Operations

Why Cloud? (The Less Obvious)

Instance Shaping
• Instance shapes to fit most parts of the solution space (compute-intensive, IO-intensive, etc.)
• If one shape does not fit, try another

Service Quality
• Amazon and Google know how to run data centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric and external peering

Page 11: DevOpsDays Silicon Valley 2014 - The Game of Operations

Why Cloud? (Fundamental Forces)

Economics
• Nearly impossible to beat Google / Amazon buying power or operating efficiencies
• 2010s in computing are like 1910s in electric power

Developer Adoption
• It Just Works ™
• Makes it easy to fall in love with infrastructure

Page 12: DevOpsDays Silicon Valley 2014 - The Game of Operations

“Soon it will be just as common to run your own data center as it is to run your own electric power generation”

-- me

Page 13: DevOpsDays Silicon Valley 2014 - The Game of Operations

Autoscaling

Games are very spiky
• Very unpredictable
• Huge variability between peak and trough

Hits are self-reinforcing

Page 14: DevOpsDays Silicon Valley 2014 - The Game of Operations

Automation Work at KIXEYE

Resilient Clients
• Clients back off in response to latency
• Clients continue gameplay despite network disruption

Elastic Services
• Services grow / shrink based on load
• Service Cluster == AWS Auto Scaling Group
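One common way for a client to back off in response to latency is capped exponential backoff with jitter. The sketch below is an assumption about how such a client might behave, not KIXEYE's implementation; the threshold, base delay, and cap are hypothetical values.

```python
import random

# Hypothetical tuning values; the talk does not specify any.
BASE_DELAY = 0.1          # seconds to wait after the first slow response
MAX_DELAY = 30.0          # never wait longer than this
LATENCY_THRESHOLD = 0.5   # responses slower than this count as "slow"
MAX_EXPONENT = 16         # bounds 2 ** n so the arithmetic stays small


def update_streak(consecutive_slow, observed_latency):
    """Count slow responses in a row; a fast response resets the count.

    observed_latency is the last round-trip time in seconds, or None if
    the request failed or timed out.
    """
    slow = observed_latency is None or observed_latency > LATENCY_THRESHOLD
    return consecutive_slow + 1 if slow else 0


def backoff_delay(consecutive_slow, jitter=random.random):
    """Seconds to wait before the next request.

    The ceiling doubles with each consecutive slow response, up to
    MAX_DELAY. Jitter spreads the wait between 50% and 100% of the
    ceiling so that many clients do not retry in lockstep.
    """
    if consecutive_slow <= 0:
        return 0.0
    exponent = min(consecutive_slow - 1, MAX_EXPONENT)
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** exponent)
    return ceiling * (0.5 + 0.5 * jitter())


# Usage: feed each observed latency into the streak, then sleep for the delay.
streak = 0
for latency in [0.1, 0.9, 1.2, None, 0.2]:
    streak = update_streak(streak, latency)
    print(latency, "->", round(backoff_delay(streak), 3), "s")
```

Passing `jitter` as a parameter keeps the function deterministic under test while defaulting to real randomness in use.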

Page 15: DevOpsDays Silicon Valley 2014 - The Game of Operations

Automation Work at KIXEYE

Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• Zero-downtime red-black deployment
• Futures: canarying, auto-rollback

Manageability
• Puppet for configuration management
• Flume -> ElasticSearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and alerting
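The idea behind a red-black deployment can be sketched with an in-memory model: launch the new version beside the live cluster, verify it, switch traffic in one step, and keep the old cluster around for rollback. This is not Asgard's API; `Cluster`, `LoadBalancer`, `red_black_deploy`, and `rollback` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cluster:
    version: str
    size: int = 0  # number of running instances


@dataclass
class LoadBalancer:
    live: Cluster                      # cluster currently receiving traffic
    standby: Optional[Cluster] = None  # previous cluster, kept for rollback


def red_black_deploy(lb: LoadBalancer, new_version: str,
                     health_check: Callable[[Cluster], bool]) -> bool:
    """Deploy new_version with no downtime; return True if traffic was switched."""
    # 1. Launch the new cluster at full size while the old one keeps serving.
    candidate = Cluster(version=new_version, size=lb.live.size)
    # 2. Verify the new cluster before it takes any traffic.
    if not health_check(candidate):
        candidate.size = 0  # tear down; the live cluster was never touched
        return False
    # 3. Switch traffic in one step and keep the old cluster as standby.
    lb.standby, lb.live = lb.live, candidate
    return True


def rollback(lb: LoadBalancer) -> None:
    """Send traffic back to the previous cluster."""
    if lb.standby is None:
        raise RuntimeError("no standby cluster to roll back to")
    lb.live, lb.standby = lb.standby, lb.live


lb = LoadBalancer(live=Cluster("v1", size=4))
red_black_deploy(lb, "v2", health_check=lambda cluster: cluster.size > 0)
print("live:", lb.live.version, "standby:", lb.standby.version)
```

An auto-rollback, listed on the slide as future work, would amount to calling `rollback` when monitoring detects a regression after the switch.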

Page 16: DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations

Cloud

Services

DevOps Culture

Page 17: DevOpsDays Silicon Valley 2014 - The Game of Operations

Service Teams

• Give teams autonomy
• Freedom to choose technology, methodology, working environment
• Responsibility for the results of those choices

• Hold them accountable for *results*
• Give a team a goal, not a solution
• Let team own the best way to achieve the goal

Page 18: DevOpsDays Silicon Valley 2014 - The Game of Operations

KIXEYE Service Chassis

• Goal: “chassis” for building scalable game services

• Minimal resources, minimal direction
• 3 people x 1 month
• Consider building on NetflixOSS

Team exceeded expectations
• Co-developed chassis, transport layer, service template, build pipeline, red-black deployment, etc.
• Operability and manageability from the beginning
• 15 minutes from no code to running service in AWS (!)
• Open-sourced at github.com/kixeye

Page 19: DevOpsDays Silicon Valley 2014 - The Game of Operations

Micro-Services

Single-purpose
Simple, well-defined interface
Modular and independent
Small teams
Autonomy and responsibility

[Diagram: services A, B, C, D, E]

Page 20: DevOpsDays Silicon Valley 2014 - The Game of Operations

Transition to Service Relationships

Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)

Service-Level Agreement (SLA)
• Promise of service levels by the provider
• Customer needs to be able to rely on the service, like a utility
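The talk does not say which service levels the SLA covers. Assuming, purely for illustration, that one of them is an availability target, the promise can be turned into a concrete downtime budget that both provider and customer can check. The function names and the 30-day period are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_target, period_days=30):
    """Minutes of downtime allowed in the period while still meeting the target.

    availability_target is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return (1.0 - availability_target) * period_days * MINUTES_PER_DAY


def sla_met(downtime_minutes, availability_target, period_days=30):
    """True if the observed downtime stays within the budget."""
    return downtime_minutes <= downtime_budget_minutes(availability_target, period_days)


# Compare the budget implied by a few hypothetical targets.
for target in (0.99, 0.999, 0.9999):
    print(target, "->", round(downtime_budget_minutes(target), 1), "minutes per 30 days")
```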

Page 21: DevOpsDays Silicon Valley 2014 - The Game of Operations

Transition to Service Relationships

Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and provider
• Motivates both sides to optimize
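Charging for usage can be as simple as splitting a service's cost across its customers in proportion to what each consumed. The sketch below assumes a single usage metric (for example, request count) and hypothetical studio names; the talk does not describe KIXEYE's actual allocation scheme.

```python
def allocate_cost(total_cost_cents, usage_by_customer):
    """Split total_cost_cents across customers in proportion to usage.

    Works in whole cents. Any cents left over from integer division go to
    the heaviest user so the shares always add up to the total. If nobody
    used the service, every share is zero and the cost stays unallocated.
    """
    total_usage = sum(usage_by_customer.values())
    if total_usage == 0:
        return {name: 0 for name in usage_by_customer}
    shares = {
        name: total_cost_cents * used // total_usage
        for name, used in usage_by_customer.items()
    }
    remainder = total_cost_cents - sum(shares.values())
    if remainder:
        heaviest = max(usage_by_customer, key=usage_by_customer.get)
        shares[heaviest] += remainder
    return shares


# Hypothetical month: a $1,000.00 service bill and per-studio request counts.
usage = {"studio_a": 3_000_000, "studio_b": 1_000_000}
print(allocate_cost(100_000, usage))
```

Because each customer's bill tracks its own usage, a studio that trims wasteful calls sees its share drop, and the provider has a reason to lower the cost per unit.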

Page 22: DevOpsDays Silicon Valley 2014 - The Game of Operations

The Game of Operations

Cloud

Services

DevOps Culture

Page 23: DevOpsDays Silicon Valley 2014 - The Game of Operations

One Team (!)

• Act as one team across development, product, operations, etc.

• Solve problems instead of blaming and pointing fingers

• Political games are not as fun as real-time strategy games

Page 24: DevOpsDays Silicon Valley 2014 - The Game of Operations

Everyone Is Responsible for Prod

Everyone’s incentives are aligned

Everyone is strongly motivated to have solid instrumentation and monitoring

Page 25: DevOpsDays Silicon Valley 2014 - The Game of Operations

“DevOps is a reorg”

– Adrian Cockcroft

Page 26: DevOpsDays Silicon Valley 2014 - The Game of Operations

Blame-Free Post-Mortems

Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it

Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong

Page 27: DevOpsDays Silicon Valley 2014 - The Game of Operations

Blame-Free Post-Mortems

Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?

Engineers compete to take responsibility (!)

Page 28: DevOpsDays Silicon Valley 2014 - The Game of Operations

“Failure is not falling down but refusing to get back up”

– Theodore Roosevelt

Page 29: DevOpsDays Silicon Valley 2014 - The Game of Operations

Transition to DevOps

Organization
• Studios make user-visible games
• Services provide common endpoints

Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs

Transition On-call
• Use primary / secondary on-call as apprenticeship

Page 30: DevOpsDays Silicon Valley 2014 - The Game of Operations

“You Build It, You Run It”

– Everyone

Page 31: DevOpsDays Silicon Valley 2014 - The Game of Operations

Recap: The Game of Operations

Cloud

Services

DevOps

Page 32: DevOpsDays Silicon Valley 2014 - The Game of Operations

Come Join Us!

DevOps Whiskey Tasting, July 22

333 Bush St., San Francisco

kixeyeloveswhiskey.eventbrite.com

Hiring in SF, Seattle, Victoria, Brisbane, Amsterdam

www.kixeye.com/jobs