upside of downtime - lenny rachitsky

http://gapingvoid.com / Sunday, June 20, 2010


Page 1: Upside of Downtime - Lenny Rachitsky

http://gapingvoid.com/

Sunday, June 20, 2010

Page 2: Upside of Downtime - Lenny Rachitsky

The Upside of Downtime: Turning disaster into opportunity

Sunday, June 20, 2010

Page 3: Upside of Downtime - Lenny Rachitsky

Who’s had a site go down?

Sunday, June 20, 2010

Page 4: Upside of Downtime - Lenny Rachitsky

Who hasn’t had a site go down?

Sunday, June 20, 2010

Page 5: Upside of Downtime - Lenny Rachitsky

There’s always that one guy!

Sunday, June 20, 2010

Page 15: Upside of Downtime - Lenny Rachitsky

Downtime sucks

Source: http://www.motivatedphotos.com/?id=8080

Sunday, June 20, 2010

Page 16: Upside of Downtime - Lenny Rachitsky

Why downtime sucks

Business

[Chart: Sales; vertical axis $0 to $3,000, horizontal axis 0 to 22]

Sunday, June 20, 2010

Page 17: Upside of Downtime - Lenny Rachitsky

Why downtime sucks

Business

Brand

Sunday, June 20, 2010

Page 18: Upside of Downtime - Lenny Rachitsky

Why downtime sucks

Business

Brand

You

Sunday, June 20, 2010

Page 19: Upside of Downtime - Lenny Rachitsky

Why downtime sucks

Business

Brand

You

Users

Sunday, June 20, 2010

Page 20: Upside of Downtime - Lenny Rachitsky

Downtime = Bad! (Duh)

Sunday, June 20, 2010

Page 21: Upside of Downtime - Lenny Rachitsky

Approach #1: Don’t fail

Sunday, June 20, 2010

Page 22: Upside of Downtime - Lenny Rachitsky

Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg

Sunday, June 20, 2010

Page 23: Upside of Downtime - Lenny Rachitsky

“Everything fails all the time”

-- Werner Vogels (Amazon, CTO)

Sunday, June 20, 2010

Page 25: Upside of Downtime - Lenny Rachitsky

Your site will fail

Werner Vogels (Amazon, CTO)

Sunday, June 20, 2010

Page 26: Upside of Downtime - Lenny Rachitsky

Why?!?

Sunday, June 20, 2010

Page 27: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Why Failure Happens

Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg

Sunday, June 20, 2010

Page 28: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Black Swan

Why Failure Happens

Source: Amazon.com

Sunday, June 20, 2010

Page 29: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Black Swan

Unknown unknowns

Why Failure Happens

Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg

Sunday, June 20, 2010

Page 30: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Why Failure Happens

Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg

Sunday, June 20, 2010

Page 31: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Many small failures

Why Failure Happens

Source: http://www.biojobblog.com/uploads/image/dominos.jpg

Sunday, June 20, 2010

Page 32: Upside of Downtime - Lenny Rachitsky

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Many small failures

Humans

Why Failure Happens

Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg

Sunday, June 20, 2010

Page 35: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 36: Upside of Downtime - Lenny Rachitsky

Not unusual Not expected

Polisher blocked

Moisture leaks into air system

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 37: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected Not good

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 38: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 39: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Doh!

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 40: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Doh!

Dammit

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 41: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Gauge broken

Doh!

Dammit

WTF

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 42: Upside of Downtime - Lenny Rachitsky

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Meltdown

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Gauge broken

Doh!

Dammit

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 45: Upside of Downtime - Lenny Rachitsky

“accidental power failure”

Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/

Sunday, June 20, 2010

Page 46: Upside of Downtime - Lenny Rachitsky

“traffic accident damaged a nearby utility transformer”

Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/

Sunday, June 20, 2010

Page 47: Upside of Downtime - Lenny Rachitsky

“unfortunate code change”

Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/

Sunday, June 20, 2010

Page 49: Upside of Downtime - Lenny Rachitsky

“Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business”

-- Clay Shirky

Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/

Sunday, June 20, 2010

Page 56: Upside of Downtime - Lenny Rachitsky

http://labs.webmetrics.com/crowdsourceduptime

Sunday, June 20, 2010

Page 61: Upside of Downtime - Lenny Rachitsky

Recap

Sunday, June 20, 2010

Page 62: Upside of Downtime - Lenny Rachitsky

Your site will fail

Sunday, June 20, 2010

Page 63: Upside of Downtime - Lenny Rachitsky

Your site will fail + Downtime is bad

Sunday, June 20, 2010

Page 64: Upside of Downtime - Lenny Rachitsky

Your site will fail + Downtime is bad + Everyone will find out

Sunday, June 20, 2010

Page 65: Upside of Downtime - Lenny Rachitsky

Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack

Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg

Sunday, June 20, 2010

Page 66: Upside of Downtime - Lenny Rachitsky

“Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.”

-- John Allspaw, VP Tech. Ops at Etsy

Sunday, June 20, 2010

Page 67: Upside of Downtime - Lenny Rachitsky

Approach #2: Prepare for downtime

Sunday, June 20, 2010

Page 68: Upside of Downtime - Lenny Rachitsky

Disclaimer: Try hard to avoid downtime

Sunday, June 20, 2010

Page 69: Upside of Downtime - Lenny Rachitsky

Learning by example...

Sunday, June 20, 2010

Page 70: Upside of Downtime - Lenny Rachitsky

Case Study #1: Facebook

Sunday, June 20, 2010

Page 77: Upside of Downtime - Lenny Rachitsky

“The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.”

-- N.Y. Times

Sunday, June 20, 2010

Page 78: Upside of Downtime - Lenny Rachitsky

Downtime Disturbing

Facebook

Sunday, June 20, 2010

Page 80: Upside of Downtime - Lenny Rachitsky

Case Study #2: Google App Engine

Sunday, June 20, 2010

Page 95: Upside of Downtime - Lenny Rachitsky

Downtime Kudos

Google App Engine

Sunday, June 20, 2010

Page 96: Upside of Downtime - Lenny Rachitsky

Case Study #3: Atlassian

Sunday, June 20, 2010

Page 108: Upside of Downtime - Lenny Rachitsky

Downtime

Atlassian

Bravo

Sunday, June 20, 2010

Page 109: Upside of Downtime - Lenny Rachitsky

http://atlassian.com/

Sunday, June 20, 2010

Page 110: Upside of Downtime - Lenny Rachitsky

Downtime: Opportunity to Build Trust

Sunday, June 20, 2010

Page 111: Upside of Downtime - Lenny Rachitsky

Downtime: Opportunity to Destroy Trust

Sunday, June 20, 2010

Page 112: Upside of Downtime - Lenny Rachitsky

How To: Prepare for Downtime

Sunday, June 20, 2010

Page 113: Upside of Downtime - Lenny Rachitsky

Something > Nothing

Sunday, June 20, 2010

Page 114: Upside of Downtime - Lenny Rachitsky

Upside of Downtime Framework 1.0

Life is good | Oh crap | That sucked

Time

Sunday, June 20, 2010

Page 115: Upside of Downtime - Lenny Rachitsky

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Time

Sunday, June 20, 2010

Page 119: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 120: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel

Sunday, June 20, 2010

Page 121: Upside of Downtime - Lenny Rachitsky

1. Communication channel

Something is wrong

Can’t tell if it’s me or you

I’ll assume it’s you

You suck

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 122: Upside of Downtime - Lenny Rachitsky

Something is wrong

Can’t tell if it’s me or you

I’ll assume it’s you

I know it’s you

Tell me when you’re back

You suck a lot less

Prepare | Communicate | Explain

1. Communication channel

Sunday, June 20, 2010

Page 131: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find

Sunday, June 20, 2010

Page 132: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site

Sunday, June 20, 2010

Page 133: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site
- Real-time / automated

Sunday, June 20, 2010

Page 134: Upside of Downtime - Lenny Rachitsky

7 keys for public health dashboards

1. Must show current status for each “service”

2. Data must be accurate and timely

3. Must be easy to find

4. Must provide details for events in real time

5. Provide historical uptime and performance data

6. Provide a way to be notified of status changes

7. Provide details on how the data is gathered

Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
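A minimal sketch of the data a status page needs in order to cover keys 1, 4 and 5 above (current status per service, event details, historical uptime). The names `Status`, `Event`, `Service` and `uptime_fraction` are hypothetical and not from the talk; the sketch assumes a service counts as up until an event says otherwise, and that "degraded" does not count as up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


@dataclass
class Event:
    at: datetime
    status: Status
    detail: str  # human-readable detail shown on the dashboard (key 4)


@dataclass
class Service:
    name: str
    events: List[Event] = field(default_factory=list)

    def report(self, at, status, detail):
        """Record a status change and keep the history ordered by time."""
        self.events.append(Event(at, status, detail))
        self.events.sort(key=lambda e: e.at)

    def current_status(self):
        """Key 1: current status; assumed UP when nothing was ever reported."""
        return self.events[-1].status if self.events else Status.UP


def uptime_fraction(service, start, end):
    """Key 5: share of the window [start, end) spent in Status.UP."""
    status, cursor, up = Status.UP, start, timedelta(0)
    for ev in service.events:
        if ev.at <= start:
            status = ev.status  # state carried into the window
            continue
        if ev.at >= end:
            break
        if status is Status.UP:
            up += ev.at - cursor
        status, cursor = ev.status, ev.at
    if status is Status.UP:
        up += end - cursor
    return up / (end - start)
```

A one-hour outage inside a ten-hour window gives an uptime fraction of 0.9 under these assumptions.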

Sunday, June 20, 2010

Page 135: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site
- Real-time / automated

2. Process

Sunday, June 20, 2010

Page 136: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site
- Real-time / automated

2. Process
- Authority

Sunday, June 20, 2010

Page 137: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site
- Real-time / automated

2. Process
- Authority
- Mean-Time-To-Communicate (MTTC)

Sunday, June 20, 2010

Page 138: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communication channel
- Easy to find
- Hosted off-site
- Real-time / automated

2. Process
- Authority
- Mean-Time-To-Communicate (MTTC)
- On-call/drills/escalations/etc.

Sunday, June 20, 2010
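The slides name Mean-Time-To-Communicate (MTTC) without defining it. One plausible reading, assumed here, is the average delay between the start of an incident and the first public update about it. The function name and the input shape are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_communicate(incidents):
    """Average delay from incident start to first public update.

    `incidents` is a list of (started_at, first_update_at) datetime pairs.
    Raises statistics.StatisticsError when the list is empty.
    """
    delays = [(told - started).total_seconds() for started, told in incidents]
    return timedelta(seconds=mean(delays))


history = [
    (datetime(2010, 6, 1, 9, 0), datetime(2010, 6, 1, 9, 5)),      # 5 minutes
    (datetime(2010, 6, 12, 22, 0), datetime(2010, 6, 12, 22, 15)),  # 15 minutes
]
mttc = mean_time_to_communicate(history)
```

Tracking this number per quarter gives the process something concrete to improve, in the same way MTTR does for repair.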

Page 139: Upside of Downtime - Lenny Rachitsky

Your servers

Sunday, June 20, 2010

Page 140: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate

Sunday, June 20, 2010

Page 141: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel

Sunday, June 20, 2010

Page 142: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC

Sunday, June 20, 2010

Page 143: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC
- Who/what is affected

Sunday, June 20, 2010

Page 144: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC
- Who/what is affected
- When the incident started

Sunday, June 20, 2010

Page 145: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC
- Who/what is affected
- When the incident started
- ETA

Sunday, June 20, 2010

Page 146: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC
- Who/what is affected
- When the incident started
- ETA
- Update regularly

Sunday, June 20, 2010

Page 147: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Communicate
- Use communication channel
- MTTC
- Who/what is affected
- When the incident started
- ETA
- Update regularly

2. Fix it!

Sunday, June 20, 2010
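The checklist for an in-progress update (who/what is affected, when it started, ETA, regular updates) can be sketched as a small message template. `IncidentUpdate` and its fields are hypothetical names, and the sketch assumes all times are already in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentUpdate:
    affected: str                # who/what is affected
    started_at: datetime         # when the incident started (assumed UTC)
    eta: Optional[datetime]      # expected resolution, None if not yet known
    next_update_in: timedelta    # promise of the next regular update

    def render(self):
        """Build the text to post on the communication channel."""
        if self.eta is not None:
            eta = self.eta.strftime("%H:%M UTC")
        else:
            eta = "unknown, still investigating"
        minutes = int(self.next_update_in.total_seconds() // 60)
        return (
            f"Affected: {self.affected}\n"
            f"Started: {self.started_at.strftime('%Y-%m-%d %H:%M UTC')}\n"
            f"ETA to resolution: {eta}\n"
            f"Next update in {minutes} minutes"
        )


update = IncidentUpdate(
    affected="checkout API",
    started_at=datetime(2010, 6, 20, 14, 5),
    eta=None,
    next_update_in=timedelta(minutes=30),
)
message = update.render()
```

Making every field mandatory means an update cannot go out without stating what is affected and when the next update is due, even when the ETA is still unknown.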

Page 148: Upside of Downtime - Lenny Rachitsky

Phew, close one!

Sunday, June 20, 2010

Page 149: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem

Sunday, June 20, 2010

Page 150: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure

Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/

Sunday, June 20, 2010

Page 151: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human

Source: http://www.bureauofcommunication.com/compose/apology

Sunday, June 20, 2010

Page 152: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

“We apologize for any inconvenience this may have caused”

Sunday, June 20, 2010

Page 153: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human
- Start time and end time

Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf

Sunday, June 20, 2010

Page 154: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human
- Start time and end time
- Who/what was impacted

Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/

Sunday, June 20, 2010

Page 155: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human
- Start time and end time
- Who/what was impacted
- What went wrong

Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html

Sunday, June 20, 2010

Page 156: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human
- Start time and end time
- Who/what was impacted
- What went wrong
- Lessons learned

Source: http://graysky.org/2010/02/downtime-postmortem/

Sunday, June 20, 2010

Page 158: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

“I was completely overwhelmed by the amount of positive feedback and support I received.”

Sunday, June 20, 2010

Page 159: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

1. Postmortem
- Admit failure
- Sound like a human
- Start time and end time
- Who/what was impacted
- What went wrong
- Lessons learned

2. Improve for the future

Sunday, June 20, 2010
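The postmortem checklist lends itself to a simple pre-publication check. The sketch below assumes a draft is a plain dictionary of section name to text; the section keys and the function name are hypothetical. It flags missing sections and the canned apology quoted earlier in the deck as an example of not sounding like a human.

```python
# Sections taken from the checklist; the key names are assumptions.
REQUIRED_SECTIONS = (
    "start_time",
    "end_time",
    "impact",
    "what_went_wrong",
    "lessons_learned",
)

# Boilerplate that fails the "sound like a human" test.
CANNED_PHRASES = ("we apologize for any inconvenience",)


def review_postmortem(draft):
    """Return a list of problems found in a postmortem draft (empty if none)."""
    problems = []
    for key in REQUIRED_SECTIONS:
        if not str(draft.get(key, "")).strip():
            problems.append(f"missing section: {key}")
    body = " ".join(str(value) for value in draft.values()).lower()
    for phrase in CANNED_PHRASES:
        if phrase in body:
            problems.append(f"canned phrase: {phrase}")
    return problems


draft = {
    "summary": "We apologize for any inconvenience this may have caused.",
    "start_time": "14:05 UTC",
    "end_time": "15:10 UTC",
    "impact": "Checkout failed for all users.",
    "what_went_wrong": "A config change disabled the backup database.",
}
problems = review_postmortem(draft)
```

A check like this cannot judge tone or honesty; it only catches the omissions that are easy to detect mechanically.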

Page 160: Upside of Downtime - Lenny Rachitsky

“Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.”

Prepare | Communicate | Explain

Source: http://news.ycombinator.com/item?id=1168493

Sunday, June 20, 2010

Page 161: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf

Sunday, June 20, 2010

Page 162: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Be human

Sunday, June 20, 2010

Page 163: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Be authentic

Sunday, June 20, 2010

Page 164: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Be transparent

Sunday, June 20, 2010

Page 165: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Accept responsibility

Sunday, June 20, 2010

Page 166: Upside of Downtime - Lenny Rachitsky

Prepare | Communicate | Explain

Learn and improve

Sunday, June 20, 2010

Page 167: Upside of Downtime - Lenny Rachitsky

Trust

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 168: Upside of Downtime - Lenny Rachitsky

Upside of Downtime Framework 1.0

Prepare

1. Communication channel
- Easy to find
- Off-site
- Real-time

2. Process
- Give authority
- M.T.T.C.
- On-call/escalations

Communicate

1. Communicate
- Use channel
- M.T.T.C.
- Who/what affected
- When started
- ETA to resolution
- Update regularly

2. Fix it!

Explain

1. Post-mortem
- Admit failure
- Sound like a human
- Start time and end time
- Who/what was impacted
- What went wrong
- Lessons learned

2. Learn and improve

Sunday, June 20, 2010

Page 169: Upside of Downtime - Lenny Rachitsky

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Be Prepared + Be Transparent + Be Human

Sunday, June 20, 2010

Page 170: Upside of Downtime - Lenny Rachitsky

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Be Prepared + Be Transparent + Be Human = Trust

Sunday, June 20, 2010

Page 171: Upside of Downtime - Lenny Rachitsky

Disclaimer: Don’t screw up too often

Sunday, June 20, 2010

Page 173: Upside of Downtime - Lenny Rachitsky

Transparent Not Transparent

Caught

Not Caught

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 174: Upside of Downtime - Lenny Rachitsky

Transparent Not Transparent

Caught

Not Caught Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 175: Upside of Downtime - Lenny Rachitsky

Transparent Not Transparent

Caught

Not Caught

Big Loss

Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 176: Upside of Downtime - Lenny Rachitsky

Transparent Not Transparent

Caught

Not Caught

Big Win Big Loss

Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 177: Upside of Downtime - Lenny Rachitsky

Downtime Prisoner’s Dilemma

           | Transparent | Not Transparent
Caught     | Big Win     | Big Loss
Not Caught | Win         | Win
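The matrix argues that being transparent is never worse than hiding an outage and is sometimes much better. A small sketch makes that dominance argument explicit. The numeric payoffs are assumptions chosen only to preserve the ordering Big Loss < Win < Big Win; the slides give no numbers.

```python
# Ordinal payoffs (assumed): Big Win = 2, Win = 1, Big Loss = -2.
PAYOFF = {
    ("transparent", "caught"): 2,           # Big Win
    ("transparent", "not caught"): 1,       # Win
    ("not transparent", "caught"): -2,      # Big Loss
    ("not transparent", "not caught"): 1,   # Win
}

OUTCOMES = ("caught", "not caught")


def weakly_dominates(a, b):
    """True if strategy `a` is never worse than `b` and strictly better once."""
    never_worse = all(PAYOFF[(a, o)] >= PAYOFF[(b, o)] for o in OUTCOMES)
    sometimes_better = any(PAYOFF[(a, o)] > PAYOFF[(b, o)] for o in OUTCOMES)
    return never_worse and sometimes_better


transparent_wins = weakly_dominates("transparent", "not transparent")
```

Any payoff values with the same ordering give the same result, which is why the exact numbers do not matter here.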

Sunday, June 20, 2010

Page 179: Upside of Downtime - Lenny Rachitsky

Benefits

Gain trust

Reduce churn, increase loyalty

Reduce support costs

Ability to control the message

Competitive advantage

More time to focus on the actual problem

Reduce stress

Sunday, June 20, 2010

Page 180: Upside of Downtime - Lenny Rachitsky

Change != Easy

Sunday, June 20, 2010

Page 181: Upside of Downtime - Lenny Rachitsky

Change != Impossible

Sunday, June 20, 2010

Page 182: Upside of Downtime - Lenny Rachitsky

Keys to Adoption

Getting past a culture of “hide the problem”

Sunday, June 20, 2010

Page 183: Upside of Downtime - Lenny Rachitsky

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Sunday, June 20, 2010

Page 184: Upside of Downtime - Lenny Rachitsky

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Sunday, June 20, 2010

Page 185: Upside of Downtime - Lenny Rachitsky

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Pain

Sunday, June 20, 2010

Page 186: Upside of Downtime - Lenny Rachitsky

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Pain

Buy-in

Sunday, June 20, 2010

Page 187: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 188: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Default: Let’s wait for complaints

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 189: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Default: Let’s wait for complaints

Reality: Proactiveness => Forgiveness

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 190: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 191: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 192: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 193: Upside of Downtime - Lenny Rachitsky

Engineering/Operations

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Sales/Marketing

Default: Let’s wait for complaints

Sunday, June 20, 2010

Page 194: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Default: I don’t want my customers to know

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 195: Upside of Downtime - Lenny Rachitsky

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Default: I don’t want my customers to know

Reality: They’ll find out, better from us

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 197: Upside of Downtime - Lenny Rachitsky

Source: http://delicious.com/lennysan/healthdashboard

Sunday, June 20, 2010

Page 198: Upside of Downtime - Lenny Rachitsky

Simple as that!

Sunday, June 20, 2010

Page 199: Upside of Downtime - Lenny Rachitsky

Your site will still fail!

Sunday, June 20, 2010

Page 200: Upside of Downtime - Lenny Rachitsky

“The measure of a society is how well it transforms pain and suffering into something worthwhile.”

-- Friedrich Nietzsche

Sunday, June 20, 2010

Page 201: Upside of Downtime - Lenny Rachitsky

“The measure of a company is how well it transforms pain of downtime into something worthwhile.”

-- Lenny Rachitsky

Source: Original quote inspired by Friedrich Nietzsche

Sunday, June 20, 2010

Page 202: Upside of Downtime - Lenny Rachitsky

Bare minimum: Register a Twitter account

Sunday, June 20, 2010

Page 203: Upside of Downtime - Lenny Rachitsky

Lenny Rachitsky
@lennysan
http://www.transparentuptime.com/

Webmetrics/Neustar
@webmetrics
http://www.webmetrics.com/

Slides: http://bit.ly/upside-of-downtime

Thank You

Sunday, June 20, 2010

Page 204: Upside of Downtime - Lenny Rachitsky

Bonus

Sunday, June 20, 2010

Page 210: Upside of Downtime - Lenny Rachitsky

"Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities"

-- Exploration and environmental impact plan

Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion

Page 219: Upside of Downtime - Lenny Rachitsky

“Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.”

-- Borrowed from William Shakespeare

Page 229: Upside of Downtime - Lenny Rachitsky

Making change

1. Find the bright spots - (this presentation has a bunch)
2. Script the critical moves - (framework)
3. Point to the destination - (W.W.G.D.)
4. Find the feeling - (how would you feel?)
5. Shrink the change - (start small)
6. Grow your people - (everyone is learning as they go)
7. Tweak the environment - (create a simple process)
8. Build habits - (build process organically)
9. Rally the herd - (get buy-in, the rest will follow)
