DevOpsDays Silicon Valley 2014 - The Game of Operations

Post on 06-May-2015

DESCRIPTION

Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques his team has put into practice at KIXEYE: Cloud infrastructure, Service teams, and DevOps Culture. He talks about elastic workloads, micro-services, configuration automation, and a common service "chassis". He further discusses the organizational and technical disciplines of team autonomy, internal vendor-customer relationships, and, of course, "you build it, you run it"!

TRANSCRIPT

The Game of Operations and the Operation of Games

Randy Shoup @randyshoup

linkedin.com/in/randyshoup

DevOpsDays Silicon Valley, June 28 2014

Background

CTO at KIXEYE
• Real-time strategy games for web and mobile

Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service

Chief Engineer at eBay
• Multiple generations of eBay’s real-time search infrastructure

40 Years Later …

tomeimmortalarena.com

Real-Time Strategy Games are …

• Real-time
• Spiky
• Computationally intensive
• Constantly evolving
• Constantly pushing boundaries

Technically and operationally demanding

Operating Games: Goals

Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business for long
• Fun includes game mechanics, feature set, uptime, performance

Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build, buy, borrow

Cost Efficiency
• More output for less

The Game of Operations

Cloud
• All studios and services moving to AWS
• Strong focus on automation

Services
• Small, focused teams
• Clean, well-defined interface to customers

DevOps Culture
• One team across development and ops

The Game of Operations

Cloud

Services

DevOps Culture

Why Cloud? (The Obvious)

Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load

Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision

Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down
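To make the "pay for what you use" point concrete, here is a minimal sketch comparing a fleet sized for peak load against one that is resized every hour. The load profile, per-instance capacity, and hourly price are made-up illustration values, not KIXEYE or AWS figures.

```python
import math


def instances_needed(load, capacity_per_instance):
    """Smallest instance count that can serve `load` (at least one instance)."""
    return max(1, math.ceil(load / capacity_per_instance))


def fixed_cost(hourly_load, capacity_per_instance, price_per_hour):
    """Cost of sizing the fleet for the peak hour and running it every hour."""
    peak = instances_needed(max(hourly_load), capacity_per_instance)
    return peak * len(hourly_load) * price_per_hour


def elastic_cost(hourly_load, capacity_per_instance, price_per_hour):
    """Cost of resizing the fleet each hour to match that hour's load."""
    instance_hours = sum(
        instances_needed(load, capacity_per_instance) for load in hourly_load
    )
    return instance_hours * price_per_hour


# Hypothetical spiky day: 20 quiet hours, then a 4-hour peak at 10x the load.
hourly_load = [100] * 20 + [1000] * 4
print(fixed_cost(hourly_load, capacity_per_instance=100, price_per_hour=0.5))
print(elastic_cost(hourly_load, capacity_per_instance=100, price_per_hour=0.5))
```

The spikier the workload, the wider the gap between the two numbers, which is the "utilization risk" the slide refers to.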

Why Cloud? (The Less Obvious)

Instance Shaping
• Instance shapes to fit most parts of the solution space (compute-intensive, IO-intensive, etc.)
• If one shape does not fit, try another

Service Quality
• Amazon and Google know how to run data centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric and external peering

Why Cloud? (Fundamental Forces)

Economics
• Nearly impossible to beat Google / Amazon buying power or operating efficiencies
• 2010s in computing are like 1910s in electric power

Developer Adoption
• It Just Works ™
• Makes it easy to fall in love with infrastructure

“Soon it will be just as common to run your own data center as it is to run your own electric power generation”

-- me

Autoscaling

Games are very spiky
• Very unpredictable
• Huge variability between peak and trough

Hits are self-reinforcing

Automation Work at KIXEYE

Resilient Clients
• Clients back off in response to latency
• Clients continue gameplay despite network disruption
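One common way for a client to "back off in response to latency" is exponential backoff with jitter. The sketch below is an illustration of that general technique, not KIXEYE's client code; the function names, thresholds, and delay limits are all assumptions.

```python
import random


def next_delay(current_delay, observed_latency, *, latency_threshold=0.5,
               base_delay=0.1, max_delay=30.0):
    """Return the pause (seconds) before the next request.

    If the last response was slower than the threshold, double the pause
    (capped at max_delay); otherwise drop back to the base pause.
    """
    if observed_latency > latency_threshold:
        return min(max_delay, max(base_delay, current_delay * 2))
    return base_delay


def with_jitter(delay, rng=random):
    """Full jitter: wait a random time in [0, delay] so that many clients
    do not all retry at the same instant."""
    return rng.uniform(0, delay)


# A slow server (1.0 s responses) makes the client progressively more patient.
delay = 0.1
for _ in range(4):
    delay = next_delay(delay, observed_latency=1.0)
print(delay)  # the pause has doubled four times from the 0.1 s base
```

Resetting to the base delay as soon as latency recovers is one policy among several; a gentler alternative is to halve the delay on each fast response.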

Elastic Services
• Services grow / shrink based on load
• Service Cluster == AWS Auto Scale Group
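The "grow / shrink based on load" rule can be sketched as a proportional scaling decision: pick the cluster size that would bring the average load per instance back to a target. This is a simplified model for illustration only; the metric (CPU), target, and size limits are assumptions, and a real Auto Scaling policy also involves cooldowns and alarm evaluation that are omitted here.

```python
import math


def desired_capacity(current_instances, avg_cpu_percent, *,
                     target_cpu_percent=60.0, min_size=2, max_size=50):
    """Return the instance count that would move average CPU to the target,
    clamped to the group's minimum and maximum size."""
    raw = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    return max(min_size, min(max_size, raw))


print(desired_capacity(10, 90.0))  # overloaded cluster: scale out
print(desired_capacity(10, 30.0))  # underused cluster: scale in
```

The minimum size keeps some redundancy during quiet periods, and the maximum bounds the bill if a metric misbehaves.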

Automation Work at KIXEYE

Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• Zero-downtime red-black deployment
• Futures: canarying, auto-rollback
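The idea behind a red-black deployment can be sketched with a toy in-memory model: the new cluster is brought up next to the old one, traffic is switched only if the new cluster is healthy, and the old cluster is kept around so that rollback is a single switch. The `Cluster` and `Router` classes below are hypothetical stand-ins and do not reflect Asgard's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    version: str
    healthy: bool = True  # stand-in for a real health check


@dataclass
class Router:
    active: Cluster                      # cluster currently receiving traffic
    previous: Optional[Cluster] = None   # kept warm for rollback


def red_black_deploy(router, new_cluster):
    """Switch traffic to new_cluster only if it is healthy.

    Returns True if traffic was switched, False if the old cluster kept serving.
    """
    if not new_cluster.healthy:
        return False
    router.previous = router.active
    router.active = new_cluster
    return True


def rollback(router):
    """Send traffic back to the previously active cluster."""
    if router.previous is None:
        raise RuntimeError("no previous cluster to roll back to")
    router.active, router.previous = router.previous, router.active


router = Router(active=Cluster("v1"))
red_black_deploy(router, Cluster("v2"))
print(router.active.version)  # v2
rollback(router)
print(router.active.version)  # v1
```

Because the old cluster keeps serving until the switch, users see no downtime; an auto-rollback, listed on the slide as future work, would call `rollback` when post-switch monitoring detects a regression.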

Manageability
• Puppet for configuration management
• Flume -> ElasticSearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and alerting

The Game of Operations

Cloud

Services

DevOps Culture

Service Teams

• Give teams autonomy
• Freedom to choose technology, methodology, working environment
• Responsibility for the results of those choices

• Hold them accountable for *results*
• Give a team a goal, not a solution
• Let team own the best way to achieve the goal

KIXEYE Service Chassis

• Goal: “chassis” for building scalable game services

• Minimal resources, minimal direction
• 3 people x 1 month
• Consider building on NetflixOSS

Team exceeded expectations
• Co-developed chassis, transport layer, service template, build pipeline, red-black deployment, etc.
• Operability and manageability from the beginning
• 15 minutes from no code to running service in AWS (!)
• Open-sourced at github.com/kixeye

Micro-Services

• Single-purpose
• Simple, well-defined interface
• Modular and independent
• Small teams
• Autonomy and responsibility

Transition to Service Relationships

Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)

Service-Level Agreement (SLA)
• Promise of service levels by the provider
• Customer needs to be able to rely on the service, like a utility
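A service-level promise is usually expressed as a measurable target. As a minimal sketch, assuming the SLA is stated as an availability percentage over a period (the slides do not say how KIXEYE defines its SLAs), a service can be checked against its target like this:

```python
def availability(total_requests, failed_requests):
    """Fraction of requests served successfully (1.0 when there was no traffic)."""
    if total_requests == 0:
        return 1.0
    return 1.0 - failed_requests / total_requests


def meets_sla(total_requests, failed_requests, target):
    """True if measured availability is at least the promised target."""
    return availability(total_requests, failed_requests) >= target


def allowed_downtime_minutes(target, period_days=30):
    """Downtime budget implied by an availability target over the period."""
    return (1.0 - target) * period_days * 24 * 60


print(meets_sla(10_000, 5, target=0.999))          # True: 99.95% measured
print(round(allowed_downtime_minutes(0.999), 1))   # 43.2 minutes per 30 days
```

Publishing the target and the measurement side by side is what lets a customer treat the service "like a utility".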

Transition to Service Relationships

Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and provider
• Motivates both sides to optimize
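Usage-based charging can be as simple as splitting a service's cost in proportion to each customer's measured usage. The sketch below assumes that proportional model and uses hypothetical studio names and numbers; the slides do not describe KIXEYE's actual allocation formula.

```python
def allocate_cost(total_cost, usage_by_customer):
    """Split total_cost across customers in proportion to their usage.

    usage_by_customer maps a customer name to a usage count
    (requests, instance-hours, or whatever unit the service meters).
    """
    total_usage = sum(usage_by_customer.values())
    if total_usage == 0:
        return {name: 0.0 for name in usage_by_customer}
    return {
        name: total_cost * used / total_usage
        for name, used in usage_by_customer.items()
    }


# Hypothetical month: the service cost 1000.0 and two studios used it.
print(allocate_cost(1000.0, {"studio_a": 750, "studio_b": 250}))
# {'studio_a': 750.0, 'studio_b': 250.0}
```

A studio that halves its request volume sees its bill drop, and a provider that cuts the service's total cost lowers everyone's bill, which is how charging motivates both sides to optimize.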

The Game of Operations

Cloud

Services

DevOps Culture

One Team (!)

• Act as one team across development, product, operations, etc.

• Solve problems instead of blaming and pointing fingers

• Political games are not as fun as real-time strategy games

Everyone Is Responsible for Prod

Everyone’s incentives are aligned

Everyone is strongly motivated to have solid instrumentation and monitoring

“DevOps is a reorg”

– Adrian Cockcroft

Blame-Free Post-Mortems

Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it

Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong

Blame-Free Post-Mortems

Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?

Engineers compete to take responsibility (!)

“Failure is not falling down but refusing to get back up”

– Theodore Roosevelt

Transition to DevOps

Organization
• Studios make user-visible games
• Services provide common endpoints

Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs

Transition On-call
• Use primary / secondary on-call as apprenticeship

“You Build It, You Run It”

– Everyone

Recap: The Game of Operations

Cloud

Services

DevOps

Come Join Us!

DevOps Whiskey Tasting, July 22
333 Bush St., San Francisco

kixeyeloveswhiskey.eventbrite.com

Hiring in SF, Seattle, Victoria, Brisbane, Amsterdam
www.kixeye.com/jobs
