gameday - agile australia · 2019-05-20 · a gameday manifesto? dr gamedays driver process...

Post on 07-Jun-2020

TRANSCRIPT

"GameDay": Achieving resilience through Chaos Engineering

Matt Fellows @matthewfellows

#AAGameDay #ChaosTesting

Pete Cohen @petecohen

MATT

PETE

MATT

What is the common thread for these catastrophes?

#1 They all combined Technology with People + Process

#2 They all had multiple causes

Overview

■ Why GameDay exercises?
■ Case Studies
■ How you can run one for yourself

Classes of issues

■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process

Pace layered architecture

[Diagram labels: User Interface, Mobile, API Gateway, Middleware / APIs, Mainframe / DB, Vertical Slice]

Bug

[Diagram: the same layered architecture, annotated "bug"]

Integration Issues

[Diagram: the same layered architecture, annotated "integration"]

Distributed Failures

[Diagram: the same layered architecture, annotated "distributed"]

Catastrophes

[Diagram: the same layered architecture. Labels: bug, integration, distributed, ...; Customers, Engineers, Call Centre, Public Relations]

Classes of issues

■ Bugs
■ Integration issues
■ Distributed failure
■ The squishy stuff: People + Process

So how do we avoid becoming front page news?

(the bad kind)

Fragility vs Resilience

Resilience vs Antifragility

Embracing Failure

■ We need to practice failure
■ Software Engineering needs its Fire Drill

GameDay

An exercise where we place our systems - technology, people + processes - under stress in order to learn and improve resilience.

A GameDay manifesto?

              DR                                   GameDays
Driver        Process                              Continuous Improvement
Approach      Run sheet + requirements             Loose plan + a little chaos
Focus         Infrastructure                       Customer
Who           Operations                           Cross functional, multi-disciplinary team
Assumption    System is built to a robust design   System is hazardous

Once you finally start succeeding at agile…

Iterative software development
Independent feature teams
Nimble architectures
Distributed, scalable infrastructure

We want to inspire you to give GameDays a go

Case Studies

Case Study: SEEK & nib

MATT

Logistics - how to plan a GameDay

dius.com.au/resources/game-day

■ People and roles to get involved
■ Preparation workshops and planning
■ Templates and checklist
■ Physical space set up

Get buy in
Find the right people
Run workshops
Logistical preparation
Run the GameDay
Communicate and act on outcomes

Scenario and hypothesis generation workshop

Decide which broad areas to test
Identify scenarios
Capture hypotheses
Formulate an action plan to set up scenarios
Get a common view of the stack

MATT

Post Mortem

[Diagram labels: Load Balancer; API ×4; Load Balancer ×4]

MATT

[Diagram labels: Load Balancer; API ×4; Load Balancer ×4; X ×4, "No visibility!"; ✅ ×4; X; Release Dashboard]

Ingredients for catastrophe

MATT

✓ Introduction of a change to the system
✓ Human error
✓ Missing local controls (tests) to prevent syntax issue
✓ Lack of salient information for operator (monitoring and alerting)
✓ Opportunity to misinterpret data
✓ Distance between expert and operator (process)
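One ingredient listed above is a missing local control (a test) that would have caught a syntax issue before the change shipped. As a minimal sketch of what such a control could look like, assuming purely for illustration that the change is a JSON configuration file (the talk does not say what format was involved; `validate_config` is a hypothetical name):

```python
import json
from typing import List


def validate_config(text: str) -> List[str]:
    """Return a list of syntax problems in a JSON config; empty means it parsed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the author can fix it locally.
        return [f"line {err.lineno}, column {err.colno}: {err.msg}"]
    return []


if __name__ == "__main__":
    good = '{"upstreams": ["api-1", "api-2"]}'   # hypothetical config content
    bad = '{"upstreams": ["api-1", "api-2"'      # missing closing brackets
    print(validate_config(good))  # no problems reported
    print(validate_config(bad))   # one problem, with its position
```

Run as a pre-commit or pipeline step, a check like this closes the gap between the person making the change and the operator who would otherwise discover the error in production.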

What did we learn?

■ Just getting teams together to discuss resilience was worthwhile
■ We always found something
■ Our experiments reduced the impact of hindsight bias

PETE

What matters:
■ Cross-functional team
■ Planning
■ Open to exposing failure
■ Customer focus
■ Bake it in - do GameDays frequently

What doesn’t matter:
■ Size of team/company
■ Waterfall/Agile
■ Language, technology...

PETE

Are GameDays the new hack days?

■ Collaboration
■ Problem solving
■ Creates business value

The journey towards automated resilience testing

MATT

Pre-Production:

■ Create local experiments in Docker
■ Manual chaos in integrated environments
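The shape of a local experiment (inject a fault, then check a hypothesis about how the system copes) can be tried before any containers are involved. The following is a minimal sketch in plain Python rather than Docker; `FlakyDependency`, `call_with_retries` and the failure pattern are hypothetical stand-ins, not something taken from the talk:

```python
class FlakyDependency:
    """Stand-in for a downstream service that fails its first N calls."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("injected failure")
        return "ok"


def call_with_retries(dependency: FlakyDependency, max_attempts: int) -> str:
    """Client behaviour under test: retry on connection errors, then fall back."""
    for _ in range(max_attempts):
        try:
            return dependency.fetch()
        except ConnectionError:
            continue
    return "fallback"


if __name__ == "__main__":
    # Hypothesis 1: the client survives two consecutive failures with three attempts.
    print(call_with_retries(FlakyDependency(2), max_attempts=3))
    # Hypothesis 2: when the dependency stays down, the client degrades gracefully.
    print(call_with_retries(FlakyDependency(5), max_attempts=3))
```

The same two hypotheses could later be replayed against real containers, with the failure injected at the network or process level instead of inside the code.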

Production:

■ Start small!
■ Metrics-driven approach
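One way to read "start small" together with "metrics-driven" is to widen an experiment step by step and stop as soon as a metric crosses an agreed threshold. This is a minimal sketch under that assumption; `run_experiment`, the blast-radius steps and the error-rate function are hypothetical and not described in the talk:

```python
from typing import Callable, List


def run_experiment(
    blast_radii: List[float],
    measure_error_rate: Callable[[float], float],
    abort_threshold: float,
) -> List[float]:
    """Widen the experiment step by step; stop as soon as the metric breaches.

    Returns the blast radii that completed within the threshold.
    """
    completed: List[float] = []
    for radius in blast_radii:
        error_rate = measure_error_rate(radius)
        if error_rate > abort_threshold:
            # The metric says customers are being hurt: stop widening.
            break
        completed.append(radius)
    return completed


if __name__ == "__main__":
    # Hypothetical metric: error rate grows with the share of traffic affected.
    def fake_metric(radius: float) -> float:
        return radius * 0.1

    print(run_experiment([0.01, 0.05, 0.25, 1.0], fake_metric, abort_threshold=0.02))
```

In a real setting `measure_error_rate` would read from the monitoring system, and the abort branch would also trigger a rollback of the injected fault.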

Chaos Kong

pumba

Matt Fellows @matthewfellows mfellows@dius.com.au
Pete Cohen @petecohen pcohen@dius.com.au

For links, references, templates and your GameDay toolkit, head to:

dius.com.au/resources/game-day

Thank you!