Managing Deployments in the Age of the Microservice samschaevitz at google dot com Gmail & Calendar SRE @ Google Prepared for SREcon EMEA 2017 Dublin, Ireland 2017-08-31

Managing Deployments in the Age of the Microservice

samschaevitz at google dot com
Gmail & Calendar SRE @ Google

Prepared for SREcon EMEA 2017
Dublin, Ireland
2017-08-31

Agenda

1. Types of Service Changes
2. Why Change Things At All?
3. What Can Go Wrong?
4. Best Practices
5. A Brief Interlude on Naming
6. Payoff

Caveat: This talk is targeted at those who work on applications composed of layered, replicated, sharded, lightweight, and fine-grained services.

Types of Service Changes

Control Surfaces

Type                            Time to Recovery
Client Change                   O(Day)
Server Binary Change            O(Hour)
Static Configuration Change     O(Hour)
Dynamic Configuration Change    O(Minute)
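One way to put the table to work is to encode it as data, so that tooling can ask which control surface recovers fastest. This is only a sketch: the dictionary keys, the helper names, and the choice of one day, one hour, and one minute as stand-ins for the O(...) orders of magnitude are all assumptions made for illustration.

```python
from datetime import timedelta

# Order-of-magnitude recovery times from the table above.
# The exact durations are illustrative stand-ins for O(Day), O(Hour), O(Minute).
TIME_TO_RECOVERY = {
    "client change": timedelta(days=1),
    "server binary change": timedelta(hours=1),
    "static configuration change": timedelta(hours=1),
    "dynamic configuration change": timedelta(minutes=1),
}


def fastest_to_recover(candidates):
    """Return the candidate change type with the shortest recovery time."""
    return min(candidates, key=TIME_TO_RECOVERY.__getitem__)


def slower_than(budget):
    """List the change types whose recovery time exceeds the given budget."""
    return sorted(name for name, ttr in TIME_TO_RECOVERY.items() if ttr > budget)
```

For example, slower_than(timedelta(hours=1)) returns only the client change, which is the one surface that cannot be recovered within an hour.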

Why Change Things At All?

This is reliability engineering. The safest thing to do is to never change anything!

“A ship in harbor is safe, but that’s not what ships are built for.”

- John A. Shedd

The most optimistic and least rewarding thing to do is to never change anything.

Is a Change Freeze Right For You?

➔ your feature set is complete, forever and ever
➔ your system never ever breaks, and will never break
➔ even if it did, you totally already understand every single failure mode of your system
➔ your infrastructure or external dependencies never change
➔ you cannot possibly optimize your costs better than you already have
➔ you wouldn’t ever want to adopt any new technologies to make your infrastructure more reliable than it already is

“You never know what’s hiding inside these systems. You never know exactly what’s going to trigger it.”

- Bill Curtis, SVP & Chief Scientist at CAST

“Following the major IT system failure experienced earlier today, with regret we have had to cancel all flights leaving from Heathrow and Gatwick for the rest of Saturday,” a spokeswoman said.

image: pexels, cc0

“The Delta Air Lines technology glitch that canceled hundreds of flights Sunday and Monday was smaller than other episodes in recent months that cost airlines tens of millions of dollars, but it still served as a reminder of how fragile airline computers are.”

image: pixabay, cc0

“A computer problem forced United Airlines to ground all domestic flights for about an hour on Sunday evening, causing a cascade of delays and annoying customers throughout the United States.”

image: pexels, cc0

Ideal Change Velocity, the Dev organization’s perspective

[Chart: version plotted against time]

Ideal Change Velocity, an SRE’s perspective

[Chart: version plotted against time]

How do we manage these conflicting incentives?

Align increases in risk exposure with decreases in risk profile.

What Could Go Wrong?

Examples of System Badness

➔ the client doesn’t show the user the error and silently/automatically retries and succeeds
➔ end user-visible latency when you load your Gmail inbox increases by 10% at the 99th percentile
➔ a common Java library has been updated to use a new implementation of HashMap and it costs 10% more CPU cycles for our main application service to run
➔ Google Wallet integration is broken in the compose box of Gmail
➔ the UI layout is redesigned and users hate it
➔ every user trying to access mail.google.com receives a Server 500 error

Cost of Badness

[Chart: cost plotted against magnitude. Labeled factor: number of users affected]

Cost of Badness

[Chart: cost plotted against magnitude. Labeled factors: number of users affected, severity of badness]

Cost of Badness

[Chart: cost plotted against magnitude. Labeled factors: number of users affected, severity of badness, max (user risk aversion)]

Calculating Cost of System Badness

Total Disruption Cost ≈ (Number of Affected Users * Disruption Severity) - Max. User Risk Aversion
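The approximation can be written as a small function to compare hypothetical incidents. This is a minimal sketch that takes the formula at face value; the function name, the parameter names, and the idea of expressing severity and risk aversion in one shared cost unit are assumptions, since the slide does not define units.

```python
def total_disruption_cost(affected_users, severity, max_risk_aversion):
    """Approximate disruption cost as stated on the slide.

    affected_users: number of users who see the badness
    severity: per-user disruption severity (unit is an assumption)
    max_risk_aversion: the "Max. User Risk Aversion" term (unit is an assumption)
    """
    return affected_users * severity - max_risk_aversion


# Hypothetical numbers: the same bug caught at an early stage versus a late one.
early = total_disruption_cost(affected_users=50, severity=0.5, max_risk_aversion=10)
late = total_disruption_cost(affected_users=100_000, severity=0.5, max_risk_aversion=10)
```

With these made-up inputs the late-stage cost dwarfs the early one, because the number of affected users dominates the product.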

Penalty Buckets

➔ immediate lost revenue
➔ user trust
➔ contract violations
➔ increased infrastructure costs
➔ engineering time

User Types

➔ developer team testers
➔ internal testers
➔ trusted external testers
➔ free-tier consumers
➔ paying customers

Best Practices

The Optimal Rollout Is...

➔ staged
➔ progressive
➔ revertable
➔ transparent
➔ automatic
➔ well-understood

Developer Team Testing

● Buildable
● Runnable
● Obvious feature breakages

Internal Testing

● More subtle feature breakages
● Obvious performance regressions

Trusted External Testing

● Do our paying customers (with higher risk tolerances) notice anything they don’t like about a new release?

Some Free-Tier Users

● Nuanced performance regressions
● Subtle bugs

Remaining Free-Tier Users

● Subtler performance regressions
● Subtler feature regressions

Paying Users

● Time in earlier stages is enough to deploy to the most risk-averse, most expensive-to-lose customers
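The stage-by-stage progression above can be sketched as a loop that deploys to one user population at a time and reverts automatically when a health check fails. This is an illustration under stated assumptions, not a description of any real deployment system: the deploy, is_healthy, and revert callbacks, the RolloutResult type, and the stage strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage names follow the slides above, ordered from most to least risk tolerant.
STAGES = [
    "developer team testers",
    "internal testers",
    "trusted external testers",
    "some free-tier users",
    "remaining free-tier users",
    "paying users",
]


@dataclass
class RolloutResult:
    completed: List[str]  # stages that passed their health check
    reverted: bool        # True if the rollout was rolled back


def staged_rollout(
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    revert: Callable[[str], None],
    stages: List[str] = STAGES,
) -> RolloutResult:
    """Deploy stage by stage; on the first unhealthy stage, revert everything."""
    completed: List[str] = []
    for stage in stages:
        deploy(stage)
        if not is_healthy(stage):
            # Revert the failing stage first, then earlier stages, newest first.
            for done in reversed(completed + [stage]):
                revert(done)
            return RolloutResult(completed=completed, reverted=True)
        completed.append(stage)
    return RolloutResult(completed=completed, reverted=False)
```

Because paying users come last, a problem caught at any earlier stage never reaches the most risk-averse population.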

To Detect Badness as Early As Possible...

● Have high-quality (unit, integration) test coverage
● Build your targets as often as possible
● A/B experiment everything
● Automate to limit toil, risk of human error
● Hold back some vocal internal users for the duration of the launch
● Deploy often; limit change surface
● Make it easy to roll back
● Encourage backwards compatibility in code

A Brief Interlude on Naming

Less Important: Actual Names

image: pxhere, cc0

More Important: Consistent Names

image: pxhere, cc0

More Important: Consistent Names

➔ server name schemas
➔ binary release stages
➔ machines
➔ load-balancing domains
➔ user populations

Payoff

In Return for a Principled, Engineered Deployment System...

➔ Detect problems long before the exposure costs you time or money
➔ More, better automation systems
➔ Easier debugging
➔ Automatic rollbacks of bad deployments
➔ Frequent production deployments
➔ Happier, more productive organizations

A developer is happy because her code has launched.

Questions?

Thanks!

Extras

Glossary

binary push - installing a different binary version of a long-lived server, starting it, and migrating traffic previously handled by the old version of the process to the new version.

client update - releasing a new version of web, native or mobile application code to users, giving them the option to update their native or web applications.

static configuration change - a change that requires restarting the same version of your application with, e.g., new command-line flags read at startup time.

runtime configuration change - a change to a long-lived server that does not require a restart to take effect, e.g. an update to the list of healthy backend services that are able to receive traffic from this service.

feature launch - the enabling of new functionality in your application, such that it is usable by your customers.

dark launch - the exercise of new code in your application without affecting the user’s experience of your application in order to gain confidence in new features before the cost of rolling back is too high.
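The dark launch entry can be illustrated with a small wrapper that always serves the current implementation's response while also exercising the candidate code and recording any disagreement. This is a sketch only: dark_launch, current_impl, candidate_impl, and report_mismatch are hypothetical names, and the candidate is called inline here for simplicity, whereas a real system would more likely run it asynchronously so that it adds no user-visible latency.

```python
def dark_launch(request, current_impl, candidate_impl, report_mismatch):
    """Serve with current_impl while exercising candidate_impl in the dark."""
    response = current_impl(request)
    try:
        shadow = candidate_impl(request)
        if shadow != response:
            # Record the disagreement for later analysis; the user never sees it.
            report_mismatch(request, response, shadow)
    except Exception as exc:
        # A crash in the new code must never break the user-facing path.
        report_mismatch(request, response, exc)
    return response
```

The caller always receives the current implementation's response, so the new code can be turned off or fixed without any rollback visible to users.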

image: pxhere, cc0