Cloud Expo - Flying Two Mistakes High

Flying Two Mistakes High: A Guide to Not Crashing. Lee Atchison, Principal Cloud Architect and Advocate at New Relic, Inc. ©2008-16 New Relic, Inc. All rights reserved.

Upload: lee-atchison

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Cloud Expo - Flying Two Mistakes High

Flying Two Mistakes High
A Guide to Not Crashing
Lee Atchison, Principal Cloud Architect and Advocate at New Relic, Inc.

Page 2: Cloud Expo - Flying Two Mistakes High

Safe Harbor

This document and the information herein (including any information that may be incorporated by reference) is provided for informational purposes only and should not be construed as an offer, commitment, promise or obligation on behalf of New Relic, Inc. (“New Relic”) to sell securities or deliver any product, material, code, functionality, or other feature. Any information provided hereby is proprietary to New Relic and may not be replicated or disclosed without New Relic’s express written permission.

Such information may contain forward-looking statements within the meaning of federal securities laws. Any statement that is not a historical fact or refers to expectations, projections, future plans, objectives, estimates, goals, or other characterizations of future events is a forward-looking statement. These forward-looking statements can often be identified as such because the context of the statement will include words such as “believes,” “anticipates,” “expects” or words of similar import.

Actual results may differ materially from those expressed in these forward-looking statements, which speak only as of the date hereof, and are subject to change at any time without notice. Existing and prospective investors, customers and other third parties transacting business with New Relic are cautioned not to place undue reliance on this forward-looking information. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause the actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect such forward-looking statements is included in the filings we make with the SEC from time to time. Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at http://ir.newrelic.com or the SEC’s website at www.sec.gov.

New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law. New Relic makes no warranties, expressed or implied, in this document or otherwise, with respect to the information provided.

Page 3: Cloud Expo - Flying Two Mistakes High

Who am I?

Specialize in:

Cloud computing

Services & Microservices

Scalability, Availability

28 years in industry
7 in Amazon Retail & AWS (Built SW/VG App Store, AWS Elastic Beanstalk)

4 in New Relic (Architecture Lead, Cloud, Service Migration)

@leeatchison leeatchison

Page 4: Cloud Expo - Flying Two Mistakes High

I want to tell you a story…

Page 5: Cloud Expo - Flying Two Mistakes High

I want to tell you a story…

You tell me if this is ok or not…

Page 6: Cloud Expo - Flying Two Mistakes High

I want to tell you a story…

This was a recently overheard conversation…

You tell me if this is ok or not…

Page 7: Cloud Expo - Flying Two Mistakes High

Is this ok?

“We were wondering how changing a setting on our MySQL database might impact our performance…

Page 8: Cloud Expo - Flying Two Mistakes High

Is this ok?

“We were wondering how changing a setting on our MySQL database might impact our performance…

…but we were worried that the change may cause our production database to fail…”

Page 9: Cloud Expo - Flying Two Mistakes High

Is this ok?

…but we were worried that the change may cause our production database to fail…”

“…Since we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead…

Under Construction

Page 10: Cloud Expo - Flying Two Mistakes High

Is this ok?

“…Since we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead…

…After all, it wasn’t being used for anything at the moment.”

Under Construction

Page 11: Cloud Expo - Flying Two Mistakes High

Is this ok?

Until, of course, the backup was needed…

Under Construction X

Page 12: Cloud Expo - Flying Two Mistakes High

Is this ok?

Until, of course, the backup was needed…

This was a true story

Under Construction!!!! X X

Page 13: Cloud Expo - Flying Two Mistakes High

I fly radio controlled model airplanes

There’s an old adage:

“Keep your plane at least two mistakes high.”

Page 14: Cloud Expo - Flying Two Mistakes High

But Why?

“Keep your plane at least two mistakes high.”

Page 15: Cloud Expo - Flying Two Mistakes High

Why Two Mistakes High?

You perform some stunt, and it fails… You lose altitude

You always want to be high enough to make a mistake, even if you’ve just made a mistake…

Page 16: Cloud Expo - Flying Two Mistakes High

Why Two Mistakes High?

You perform some stunt, and it fails… You lose altitude

Now, you are lower, and you are trying to recover

You always want to be high enough to make a mistake, even if you’ve just made a mistake…

Page 17: Cloud Expo - Flying Two Mistakes High

Why Two Mistakes High?

You perform some stunt, and it fails… You lose altitude

Now, you are lower, and you are trying to recover

You want to still be high enough, so that if you make another mistake, you won’t crash

You always want to be high enough to make a mistake, even if you’ve just made a mistake…


Page 19: Cloud Expo - Flying Two Mistakes High

Put another way…

…flying two mistakes high, you can always have a backup plan for recovering from a mistake

…even if you are currently recovering from a mistake

Page 20: Cloud Expo - Flying Two Mistakes High

Don’t screw up...

…while you are screwing up

Page 21: Cloud Expo - Flying Two Mistakes High

The same applies when building highly available, high-scale applications

Page 22: Cloud Expo - Flying Two Mistakes High

How do we keep “Two Mistakes High” in an application?

Walk through ramifications and recovery plan

Page 23: Cloud Expo - Flying Two Mistakes High

How do we keep “Two Mistakes High” in an application?

Walk through ramifications and recovery plan

Make sure recovery plan works
- Has no mistakes
- Has its own recovery plan

Page 24: Cloud Expo - Flying Two Mistakes High

How do we keep “Two Mistakes High” in an application?

Walk through ramifications and recovery plan

Make sure recovery plan works
- Has no mistakes
- Has its own recovery plan

If recovery plan doesn’t work… it’s not a good recovery plan

Page 25: Cloud Expo - Flying Two Mistakes High

EXAMPLE: How many nodes do we need?

Page 26: Cloud Expo - Flying Two Mistakes High

EXAMPLE: How many nodes do we need?

How many nodes do I need to handle my traffic demands?

Building a Service
- Designed to handle 1,000 req/sec (assume single node = 300 req/sec)

Page 27: Cloud Expo - Flying Two Mistakes High

EXAMPLE: How many nodes do we need?

- ceil[1,000/300] = 4 nodes
- With four nodes, can handle our traffic
- PLUS we have enough nodes that we can lose one! We have redundancy!

Right???

Page 28: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well no…

You think 4 nodes gives you redundancy, but it doesn’t...

If you lose one of those nodes:
- Remaining nodes can only handle 300 * 3 = 900 req/sec
- Cannot handle the 1,000 req/sec load

Page 29: Cloud Expo - Flying Two Mistakes High

EXAMPLE: How many do we need?

4 nodes... allows handling our traffic, but we cannot handle a node failure

5 nodes... allows handling a single node failure
But… no upgrading

6 nodes... allows handling a multi-node failure,
Or… handling a failure during an upgrade
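The arithmetic on these slides can be sketched in a few lines of Python. This is only an illustration under the slide's own assumptions (1,000 req/sec peak, 300 req/sec per node); the function names `base_nodes`, `nodes_for`, and `can_serve` are made up for this sketch and do not come from the talk.

```python
import math


def base_nodes(peak_rps: int, per_node_rps: int) -> int:
    """Minimum nodes that can carry the peak load with nothing out of service."""
    return math.ceil(peak_rps / per_node_rps)


def nodes_for(peak_rps: int, per_node_rps: int, mistakes: int) -> int:
    """Nodes to provision so the load is still served with `mistakes` nodes down."""
    return base_nodes(peak_rps, per_node_rps) + mistakes


def can_serve(total_nodes: int, nodes_down: int, peak_rps: int, per_node_rps: int) -> bool:
    """True if the nodes still in service can carry the peak load."""
    return (total_nodes - nodes_down) * per_node_rps >= peak_rps


if __name__ == "__main__":
    # Slide assumptions: 1,000 req/sec peak, 300 req/sec per node.
    for total in (4, 5, 6):
        survivable = [down for down in range(total + 1)
                      if can_serve(total, down, 1000, 300)]
        print(f"{total} nodes: load is served with up to {max(survivable)} node(s) down")
```

Here a "mistake" is any node out of service, whether it failed or was taken down for an upgrade, which is why two mistakes of headroom means six nodes rather than four.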

Page 30: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Even if you think you have redundancy…
- Think through the failure modes
- …and make sure

Page 31: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Rolling upgrades

Page 32: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Rolling upgrades

You need 10 nodes to run your application

You have 11 nodes, so that you can do rolling upgrades
- Bring one node down at a time to upgrade…
- Always at least 10 available...

Are you safe?

Page 33: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well no…

- What if that node fails during upgrade?
- What if you now have to roll back?

With the failed server to contend with… you have no room to do an upgrade or rollback, and you are at risk for another failure
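A minimal sketch of the headroom check behind this example, using the slide's numbers (10 nodes required, 11 provisioned). The helper names `spare_nodes` and `safe_to_upgrade` are hypothetical, and the model assumes every node is interchangeable.

```python
def spare_nodes(total: int, required: int, upgrading: int = 0, failed: int = 0) -> int:
    """Nodes left over after upgrades and failures; negative means under capacity."""
    return total - upgrading - failed - required


def safe_to_upgrade(total: int, required: int, batch: int = 1,
                    tolerated_failures: int = 1) -> bool:
    """True if upgrading `batch` nodes at a time still survives `tolerated_failures`."""
    return spare_nodes(total, required, upgrading=batch, failed=tolerated_failures) >= 0


if __name__ == "__main__":
    print(spare_nodes(11, 10, upgrading=1))            # headroom during the upgrade
    print(spare_nodes(11, 10, upgrading=1, failed=1))  # headroom if a node also fails
    print(safe_to_upgrade(11, 10), safe_to_upgrade(12, 10))
```

With 11 nodes the upgrade itself uses up the only spare, so a single failure during the upgrade leaves the application one node short. Under this model a twelfth node is what restores the second mistake.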

Page 34: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Make sure you can handle failures
- Even during “exceptional” events, such as upgrades
- Exceptional events can cause failures

Page 35: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Unknown dependencies

Page 36: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Unknown dependencies

You have your application running on 20 servers…
- You can run on 15 servers if necessary
- Plenty of redundancy

Are you safe?

Page 37: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well, depends…

Are any of the 20 servers in the same rack?

Page 38: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well, depends…

Are any of the 20 servers in the same rack?

Share the same power supply?

Page 39: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well, depends…

Are any of the 20 servers in the same rack?

Share the same power supply?

Share the same power source?

Page 40: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Well, depends…

Are any of the 20 servers in the same rack?

Share the same power supply?

Share the same power source?

Share the same A/C system?

Page 41: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Redundancy is not redundancy when the resources are not independent
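One way to check this kind of independence is to group servers by each shared resource and ask what remains if the largest group disappears. The sketch below assumes a hypothetical inventory in which each server is a dict with `rack` and `power` fields. The field names, the layout, and both function names are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

Server = Dict[str, str]


def largest_group(servers: List[Server], domain: str) -> int:
    """Size of the biggest group of servers sharing one value of `domain`."""
    return max(Counter(server[domain] for server in servers).values())


def survives_single_domain_failure(servers: List[Server], min_required: int,
                                   domains: Sequence[str]) -> bool:
    """True if losing any one rack, power feed, etc. still leaves `min_required` servers."""
    return all(len(servers) - largest_group(servers, domain) >= min_required
               for domain in domains)


if __name__ == "__main__":
    # Hypothetical layout: 20 servers across 4 racks but only 2 power feeds.
    fleet = [{"rack": f"r{i % 4}", "power": f"p{i % 2}"} for i in range(20)]
    print(survives_single_domain_failure(fleet, 15, ["rack"]))
    print(survives_single_domain_failure(fleet, 15, ["rack", "power"]))
```

In this made-up layout the fleet tolerates the loss of any one rack, yet a single power feed takes out ten servers at once, so the apparent five-server margin is not real.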

Page 42: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Failure loop

Page 43: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Failure loop

Are you safe from power outages?

You live in an apartment…
- The apartment provides an enclosed garage to store things in
- The power goes out in your place a lot…
- ...you buy a generator, store it in the garage

Page 44: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Failure loop

Oops… the garage:
- Has a single door, the big garage door
- It has a garage door opener
- That requires electricity to open...
- The generator is only available... when you already have power…

Page 45: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Make sure your recovery plans actually are operational when you are in a failure mode
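The garage example is a dependency cycle: the recovery resource depends, indirectly, on the very thing that failed. A rough sketch of how that could be checked follows, assuming the dependencies are written down as a simple mapping. The component names and the `depends_on` structure are illustrative only.

```python
from typing import Dict, List, Set


def transitive_deps(component: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Everything `component` needs, directly or indirectly."""
    seen: Set[str] = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def recovery_usable(recovery: str, failed: str, depends_on: Dict[str, List[str]]) -> bool:
    """True if the recovery resource does not rely on the failed component."""
    return failed not in transitive_deps(recovery, depends_on)


if __name__ == "__main__":
    garage = {
        "generator": ["garage_door"],      # must open the door to reach it
        "garage_door": ["door_opener"],    # the only door is motorized
        "door_opener": ["grid_power"],     # the opener needs electricity
    }
    print(recovery_usable("generator", "grid_power", garage))
```

The same check applies to software recovery paths, for example a failover script stored on the system it is meant to recover.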

Page 46: Cloud Expo - Flying Two Mistakes High

EXAMPLE: High redundancy in action

Page 47: Cloud Expo - Flying Two Mistakes High

EXAMPLE: A real system…

Highly independent

Multi-level error recovery

Highly recoverable system

Redundant

Page 48: Cloud Expo - Flying Two Mistakes High

EXAMPLE: A real system…

In fact, one of the very first large scale software applications utilizing extreme redundancy and failure management

Highly independent

Multi-level error recovery

Highly recoverable system

Redundant

Page 49: Cloud Expo - Flying Two Mistakes High

EXAMPLE: What is this system?

Page 50: Cloud Expo - Flying Two Mistakes High

EXAMPLE: US Space Shuttle Program

- They had problems… serious mechanical problems...
- But the software system utilized state of the art:
  - Redundancy techniques
  - Error recovery techniques

Page 51: Cloud Expo - Flying Two Mistakes High

EXAMPLE: US Space Shuttle System

Five onboard computers
- Four were identical (we'll talk about the fifth later)
- All four:
  - Ran the exact same program during critical periods
  - Given same data
  - Expected to generate the same result

Page 52: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Four computers

If any one computer did not generate the same results:

Computers voted on the proper outcome

Page 53: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Four computers

If any one computer did not generate the same results:

Computers voted on the proper outcome

Those that disagreed with the outcome were turned off for remainder of the flight

Page 54: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Four computers

If any one computer did not generate the same results:

Computers voted on the proper outcome

Those that disagreed with the outcome were turned off for remainder of the flight

Ultimate in democratic systems…

Page 55: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Four computers

Could FLY with only THREE computers working

Could LAND with only TWO computers working

Page 56: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Ties

What if the four computers couldn’t decide?

(software bug or multiple failures)

Page 57: Cloud Expo - Flying Two Mistakes High

EXAMPLE: Ties

What if the four computers couldn’t decide?

(software bug or multiple failures)

Fifth computer was used as a tiebreaker
- Much simpler version of software… only used for key decisions
- Software written by independent software team, unconnected with rest of software developers
- (In theory) would not introduce same software errors…
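The voting scheme described on these slides can be illustrated with a toy model. This is a sketch of the general idea only (majority vote, dissenters flagged for shutdown, an independent tiebreaker when there is no majority) and makes no claim about how the Shuttle's flight software was actually implemented. The computer names and the `vote` function are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def vote(
    results: Dict[str, int],
    tiebreaker: Optional[Callable[[], int]] = None,
) -> Tuple[int, List[str]]:
    """Return (agreed result, names of computers that disagreed with it)."""
    if not results:
        raise ValueError("no results to vote on")
    ranked = Counter(results.values()).most_common()
    winner, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        # No clear majority: defer to the independently written tiebreaker.
        if tiebreaker is None:
            raise ValueError("tie with no tiebreaker available")
        winner = tiebreaker()
    dissenters = sorted(name for name, value in results.items() if value != winner)
    return winner, dissenters


if __name__ == "__main__":
    # Three computers agree, one disagrees and would be switched off.
    print(vote({"c1": 42, "c2": 42, "c3": 42, "c4": 7}))
    # Two against two: the fifth, simpler computer decides.
    print(vote({"c1": 42, "c2": 42, "c3": 7, "c4": 7}, tiebreaker=lambda: 42))
```

The tiebreaker is passed in as a separate callable to mirror the point on the slide: it only helps if it is independent of the code that produced the tie.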

Page 58: Cloud Expo - Flying Two Mistakes High

Highly Successful

30-year operation of Space Shuttle:
- Never a case where a serious life-threatening problem occurred that was a result of a software problem
- Even though software was the most complex software ever built for a space program

Page 59: Cloud Expo - Flying Two Mistakes High

US Space Shuttle

This is extreme (not needed by most projects)
- Shows what is possible...
- Independence is critical to high availability

Page 60: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Use availability solution consistent with the risk

Page 61: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Use availability solution consistent with the risk

Higher the risk, higher the focus on availability

Page 62: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Use availability solution consistent with the risk

Higher the risk, higher the focus on availability

Don’t over invest, don’t under invest

Page 63: Cloud Expo - Flying Two Mistakes High

LESSON: Fly Two Mistakes High

Use availability solution consistent with the risk

Higher the risk, higher the focus on availability

Don’t over invest, don’t under invest

But think ahead, avoid the surprise

Page 64: Cloud Expo - Flying Two Mistakes High

And remember…

“Keep your plane at least two mistakes high.”

Page 65: Cloud Expo - Flying Two Mistakes High

Want to Learn More?

Architecting for Scale
By: Lee Atchison
Published by: O’Reilly Media
Available: June 2016
www.architectingforscale.com

Page 66: Cloud Expo - Flying Two Mistakes High

Thank you.

Lee Atchison
Principal Cloud Architect and Advocate at New Relic, Inc.

Architecting for Scale
Published by: O’Reilly Media
Available: June 2016
www.architectingforscale.com

@leeatchison leeatchison