Keeping up with the architects. - IEEE Computer Society… · Andrew Warfield, UBC and Coho Data. …
Keeping up with the architects.
Andrew Warfield, UBC and Coho Data
About this keynote. (And the things I'm not going to talk about.)
Not going to talk about any of this stuff right now (but happy to in the hallway track)
• Finished PhD at Cambridge in 2006
• Worked in industrial research (AT&T and Intel)
• Two startups (XenSource and Coho Data)
• Associate prof at UBC
• Three kids
• I went heli-skiing last Friday.
Here's what I am going to do
• Make some pretty obvious observations about technology directions.
• Draw some dodgy and highly speculative conclusions from those observations.
• Try to influence your research.
• Disclaimer: this is not a conference talk, nor is it 5 stapled-together conference talks.
• Another disclaimer: I'm going to give you more problems than solutions.
So let's go…
Section 5: Evaluation.
• (At the end of the day, all systems papers are about performance.)
• Probably because it's one of the only things we know how to measure.
• There are two types of performance results:
1. Small improvements in a very large system.
2. Speedups that are so significant that they change functionality.
• Google and Facebook and Amazon and Microsoft are probably a lot better at solving meaningful problems with their systems than you are.
Here are the high-level trends/ideas behind this talk
1. Diminishing scarcity.
2. Practical/sensible to own your own hardware again.
3. The software we have is turning out to be a bigger, slower, more onerous burden than the hardware it runs on.
• It is a poor match for changing performance and failure characteristics of hardware.
• It is a poor match for the operational needs of users.
Consequences of these ideas
• The goalposts are moving in terms of what we design systems for.
• Human costs associated with running our systems are a bigger expense and inconvenience, at all levels, than the piecewise performance of components.
• They are actually a barrier.
• The end of scarcity marks the beginning of a push for efficient predictability.
• This is why storage customers buy flash. It’s also a hard systems problem.
So what do we need to understand, as systems researchers, to help?
One significant hardware change: Rack scale
This is a Google data center circa 2001.
GFS (2003): largest deployments had over 1,000 storage nodes, hundreds of clients, and 300 TB of storage space.
http://itq.nl/intels-take-open-compute-project-rack-scale-architecture/
https://www.supermicro.com/solutions/SRSD.cfm
What is "rack scale"?
• Everything in a rack will share a high performance bus.
• Within a rack, optical interconnects are expected to reach terabit bandwidth in the near term with sub-microsecond latencies.
• The server as we know it will be completely disaggregated.
• CPUs, GPUs, storage, network interfaces, and volatile memory will each move to independent physical enclosures. Arbitrary composition and independent scale.
• Rack resources will be very dense.
• Like, really dense.
• As a ballpark, within a rack we are likely to see thousands of cores, tens of petabytes of persistent memory, and terabytes of RAM.
• In short, a single data center rack with a capital value in the low millions of dollars will be as capable as entire first-generation (e.g. 2003-era) "warehouse" data centers from public cloud providers.
Consequences of the rack scale trend on software design.
What’s changing?
1. Storage is becoming dense.
• Problematically dense!
2. The memory hierarchy is having an identity crisis.
3. Application latency is a cruel taskmaster.
Trend 1: Dense nonvolatile storage capacity.
Dense Nonvolatile Capacity
• Flash vendors have finally started to relax about the durability problem.
• The jaw-dropping bit: we will see 4 PB in 1U in a small number of years.
• At a price that approaches spinning disk.
• The bad news: in the immediate term, interconnection will be a problem.
• And in the longer term it may not get a whole lot better.
Trends
SSD      Cap/1U   Xput per data
2 TB     64 TB    312 MB/s/TB
8 TB     256 TB   78 MB/s/TB
32 TB    1 PB     20 MB/s/TB
128 TB   4 PB     5 MB/s/TB
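The last column looks like a fixed per-enclosure throughput divided by a growing capacity. A minimal sketch of that arithmetic follows. Both constants are inferred from the table rather than stated on the slide: 32 drives per 1U (64 TB / 2 TB) and roughly 20 GB/s of deliverable throughput per 1U (64 TB × 312 MB/s/TB).

```python
# Assumptions inferred from the table, not stated on the slide.
DRIVES_PER_1U = 32          # 64 TB per 1U / 2 TB per SSD
ENCLOSURE_MBPS = 20_000     # ~20 GB/s deliverable per 1U, held constant


def xput_per_tb(ssd_tb: int) -> float:
    """MB/s available per TB stored when enclosure throughput stays fixed."""
    capacity_tb = ssd_tb * DRIVES_PER_1U
    return ENCLOSURE_MBPS / capacity_tb


def full_scan_hours(ssd_tb: int) -> float:
    """Hours to read every byte in the enclosure once at full throughput."""
    capacity_mb = ssd_tb * DRIVES_PER_1U * 1_000_000
    return capacity_mb / ENCLOSURE_MBPS / 3600


if __name__ == "__main__":
    for ssd_tb in (2, 8, 32, 128):
        print(f"{ssd_tb:>4} TB SSD: {xput_per_tb(ssd_tb):6.1f} MB/s/TB, "
              f"full scan ~{full_scan_hours(ssd_tb):5.1f} h")
```

Under these assumptions, reading a full 4 PB enclosure once takes roughly 57 hours, which is one way to see why the data becomes hard to reach as density grows.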
NVMe device: x4 PCIe
Broadwell CPU: 40 PCIe lanes
TOR cross-rack links typically oversubscribed at 3 or 4:1
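The two annotations point at the same squeeze from both sides: drives want more PCIe lanes than a socket offers, and traffic leaving the rack is shared again at the top-of-rack switch. A rough sketch of both ratios follows. The lane counts come from the slide. The 32 drives per 1U (inferred from 64 TB / 2 TB), the single socket, and the 40 Gb/s host link are illustrative assumptions.

```python
NVME_LANES = 4        # from the slide: NVMe device is x4 PCIe
CPU_LANES = 40        # from the slide: Broadwell CPU has 40 PCIe lanes
DRIVES_PER_1U = 32    # assumption: inferred from 64 TB per 1U / 2 TB per SSD


def lane_oversubscription(drives: int, sockets: int = 1) -> float:
    """Ratio of PCIe lanes the drives want to lanes the CPUs provide."""
    return (drives * NVME_LANES) / (sockets * CPU_LANES)


def drives_at_full_width(sockets: int = 1, lanes_reserved_for_nic: int = 0) -> int:
    """How many x4 drives fit with no PCIe switch in between."""
    return (sockets * CPU_LANES - lanes_reserved_for_nic) // NVME_LANES


def cross_rack_share(host_link_gbps: float, tor_ratio: float) -> float:
    """Worst-case per-host bandwidth leaving the rack under TOR oversubscription."""
    return host_link_gbps / tor_ratio


if __name__ == "__main__":
    print("lane oversubscription:", lane_oversubscription(DRIVES_PER_1U))
    print("drives at full width (16 lanes kept for NICs):",
          drives_at_full_width(lanes_reserved_for_nic=16))
    print("cross-rack Gb/s per host at 4:1:", cross_rack_share(40, 4))
```

With these numbers the drives ask for 128 lanes against 40 available, about 3.2:1, before any lanes are set aside for the network.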
This is very different from all the storage systems that we've built in the past.
• No seek penalty.
• Means that background I/O is actually reasonable to do.
• Migration for performance.
• Alternate representations (e.g. materialized views, intentional duplication) often for performance.
• Metadata all day long.
• Sprinkler heads are a problem.
• 4 PB is an awfully scary failure domain.
• Sensible application of erasure coding needs five or more nodes.
• East/west traffic is constrained.
• Capacity-motivated deletion is silly in most cases.
• But real deletion probably needs to be encryption based.
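To put numbers on the failure-domain and erasure-coding bullets, here is a back-of-envelope sketch. The link speeds and the 3+2 code are illustrative assumptions. The slide says only that five or more nodes are needed, not which code is meant.

```python
def rebuild_hours(capacity_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to re-create capacity_tb of data over a single link of link_gbps."""
    bytes_total = capacity_tb * 1e12
    bytes_per_s = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_s / 3600


def ec_overhead(k: int, m: int) -> float:
    """Raw-to-usable capacity ratio for a k+m erasure code (k data, m parity)."""
    return (k + m) / k


def ec_min_nodes(k: int, m: int) -> int:
    """One fragment per node, so losing a node loses at most one fragment."""
    return k + m


if __name__ == "__main__":
    for gbps in (40, 100):   # assumed link speeds
        print(f"4 PB over {gbps} Gb/s: ~{rebuild_hours(4000, gbps):.0f} h")
    print("3+2 code:", ec_min_nodes(3, 2), "nodes,",
          f"{ec_overhead(3, 2):.2f}x raw capacity (vs 3.00x for triple replication)")
```

At line rate, a lost 4 PB enclosure takes roughly 89 hours to re-create over one 100 Gb/s link, and constrained east/west traffic only stretches that.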
Mirador (FAST ’17)
Centralized three-stage pipeline continuously optimizes placement
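The slide does not name the three stages, so the sketch below is not Mirador's algorithm. It is a toy illustration of what a centralized placement loop can look like, with hypothetical stage names (`observe`, `plan`, `actuate`) and a simple greedy rule that moves one object at a time from the most loaded node to the least loaded one.

```python
from dataclasses import dataclass


@dataclass
class Move:
    obj: str
    src: str
    dst: str


def observe(nodes, placement, obj_load):
    """Stage 1 (hypothetical name): roll per-object load up to per-node load."""
    node_load = {n: 0.0 for n in nodes}
    for obj, node in placement.items():
        node_load[node] += obj_load[obj]
    return node_load


def plan(placement, obj_load, node_load):
    """Stage 2 (hypothetical name): propose one move, hottest node to coldest."""
    hot = max(node_load, key=node_load.get)
    cold = min(node_load, key=node_load.get)
    gap = node_load[hot] - node_load[cold]
    # Moving an object lighter than the gap always narrows the gap.
    movable = [o for o, n in placement.items() if n == hot and 0 < obj_load[o] < gap]
    if not movable:
        return []
    best = min(movable, key=lambda o: abs(obj_load[o] - gap / 2))
    return [Move(best, hot, cold)]


def actuate(placement, moves):
    """Stage 3 (hypothetical name): apply moves; a real system copies data here."""
    for m in moves:
        placement[m.obj] = m.dst


def rebalance(nodes, placement, obj_load, max_rounds=100):
    """Run the three stages repeatedly until no improving move is left."""
    for _ in range(max_rounds):
        moves = plan(placement, obj_load, observe(nodes, placement, obj_load))
        if not moves:
            break
        actuate(placement, moves)
    return observe(nodes, placement, obj_load)
```

The absence of a seek penalty is what makes the `actuate` step affordable as background I/O. On spinning disks the same loop would compete with foreground requests for the disk arm.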
Trend 2: The magic of persistent memory.
Persistent Memory
• Everyone is excited about 3D XPoint.
• (What the heck is 3D XPoint?)
• Bad news: persistent RAM is a total PITA.
• Because it's not really persistent RAM: RAM as you think about it is a total illusion.
• It's really a super-duper fast disk.
• In fact, it's a super-duper fast *single* *unreliable* disk. But more on this in a sec.
• But wait, this doesn't mean that XPoint isn't a spectacularly good idea.
• With it, RAM is about to break through the memory wall (core to capacity ratio).
• Technologies like XPoint will give us a multiplier on working set.
• Persistence will massively speed up restart times, especially for read-only data.
One more spanner: Disaggregation.
• Some significant amount of memory is about to move off host.
• Nobody seems to agree on how this is going to happen.
• "Remote" memory vs shared physical bus vs rack-scale CC-NUMA.
• All of these things are interesting in two ways.
• First, failure domains are very different... in ways that apps and OSes are NOT used to reasoning about.
• Second, they afford an entirely new (and exciting!) form of dynamism.
• MapReduce and Spark have a good but very coarse-grained notion of partitioning.
• These systems have the potential to be so much more dynamic.
• Same for scale-out data stores.
• Same for state replication and HA.
So what's going to happen here
• Total chaos.
• Persistent memory looks like a really fast disk. Disaggregated memory looks like an extension of the cache hierarchy.
• Our view of memory, locality, and persistence is in trouble.
• Interfaces and abstractions really need to change in support of this.
• One prediction: file system and virtual memory will merge.
• Loads of reasons to do this: serialization overheads, reboots, sharing.
• But still many open questions.
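A memory-mapped file is one existing approximation of that merge: the program updates a structure with ordinary memory writes, and the same bytes are the stored format, so there is no serialization step and the state survives a restart. A minimal sketch using Python's standard `mmap` and `struct` modules follows. The one-counter layout and the function name `bump` are illustrative assumptions.

```python
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")   # one little-endian 64-bit counter at offset 0


def bump(path: str) -> int:
    """Increment a counter that lives in a file-backed mapping and return it."""
    # Create the backing file on first use, sized to hold the counter.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            (value,) = COUNTER.unpack_from(m, 0)
            COUNTER.pack_into(m, 0, value + 1)   # a plain in-memory store...
            m.flush()                             # ...flushed to the backing file
            return value + 1


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "counter.bin")
    print(bump(path))   # first "boot"
    print(bump(path))   # after a "restart": the value is simply still there
```

The open questions show up even in this toy: nothing here handles a torn update, a second process mapping the same file, or a mapping whose backing device fails independently of the host.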
Trend 3: Application latency.
Latency
• Tell me if you’ve heard this one before: CPUs aren't getting faster.
• I/O is getting faster and wider.
• Latency is becoming a dominant metric.
• Direct impact on e.g. purchase probability.
• But it's a much harder metric to work with than throughput.
• Shrinking I/O latencies result in increased computational density.
• Because I/O wait goes away (e.g. DBMS).
• But a latency focus imposes a lot of constraints on software design.
• Especially tail-latency SLOs.
• Need to reason about the slow path as a common case.
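One way to see why the slow path becomes a common case: a request that fans out to many components is slow if any one of them is slow. The sketch below assumes leaves are independent and that each is slow with the same probability. Both are simplifying assumptions, and the function names are illustrative.

```python
def p_slow_request(p_slow_leaf: float, fanout: int) -> float:
    """Chance that a request touching `fanout` leaves waits on at least one slow leaf."""
    return 1.0 - (1.0 - p_slow_leaf) ** fanout


def leaf_budget(p_slow_target: float, fanout: int) -> float:
    """Per-leaf slow probability that keeps the whole request within p_slow_target."""
    return 1.0 - (1.0 - p_slow_target) ** (1.0 / fanout)


if __name__ == "__main__":
    for fanout in (1, 10, 100, 1000):
        print(f"fanout {fanout:>4}: "
              f"{p_slow_request(0.01, fanout):6.1%} of requests hit a slow leaf; "
              f"a 1% request-level budget allows {leaf_budget(0.01, fanout):.5%} per leaf")
```

With a 1% slow path per leaf and a fan-out of 100, roughly 63% of requests see at least one slow leaf, so a p99 target for the request turns into something closer to a p99.99 target for each component.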
[Figure: The cost of contention. Throughput (K IOPS, axis 0 to 2000) vs. number of cores (1 to 16), comparing Contention Free and Single Lock.]
Decibel (NSDI ‘17)
[Figure: Decibel architecture diagram. Labels: Core, DPDK, TCP, Decibel Logic, SPDK, Block I/O, Userspace, Kernel, Hardware Queues.]
• How should we structure a storage system to provide virtual local disks?
• Partition like crazy, crusade against latency, push all unnecessary functionality up the stack.
• This generalizes to applications.
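A small sketch of what "partition like crazy" can mean at application level. It shows the structure only, since Python threads will not reproduce the throughput curves from the contention figure. The first version funnels every update through one lock. The second gives each worker sole ownership of a shard of the key space, so the data path has nothing to lock. The names and the CRC-based shard function are illustrative assumptions, not Decibel's design.

```python
import threading
import zlib
from collections import Counter


def _run(fn, args):
    """Start one thread per argument and wait for all of them."""
    threads = [threading.Thread(target=fn, args=(a,)) for a in args]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shard_of(key: str, n_shards: int) -> int:
    """Stable key-to-shard mapping (any stable hash would do)."""
    return zlib.crc32(key.encode()) % n_shards


def count_single_lock(requests, n_workers):
    """Every worker updates one shared table behind one lock."""
    table, lock = Counter(), threading.Lock()

    def worker(chunk):
        for key in chunk:
            with lock:                  # all workers serialize here
                table[key] += 1

    _run(worker, [requests[i::n_workers] for i in range(n_workers)])
    return table


def count_partitioned(requests, n_workers):
    """Each worker owns one shard of the key space; no shared mutable state."""
    inboxes = [[] for _ in range(n_workers)]
    for key in requests:                # route each request to its owner
        inboxes[shard_of(key, n_workers)].append(key)
    tables = [Counter() for _ in range(n_workers)]

    def worker(idx):
        for key in inboxes[idx]:
            tables[idx][key] += 1       # private table: nothing to lock

    _run(worker, range(n_workers))
    return sum(tables, Counter())
```

Both functions return the same counts. The difference is that the partitioned version's correctness follows from ownership rather than from getting the locking right.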
Decibel Performance (70/30 Mixed Workload)
[Figure: two panels, Throughput (K IOPS) and Latency (μs), each vs. number of cores (1 to 16), comparing Local, Decibel (DPDK), and Decibel (Legacy). Latency annotation: 422 vs 450 vs 490 μs.]
Everything hurts latency
• Redundancy is a good example of why this gets hard.
• For in-memory, network RTT will approach media store time.
• So a remote write doubles the cost.
• Worse: replication at lower layers of the system is invariably amplified.
• This is why emerging data stores don't do it.
• A real latency focus drives software architecture in a very specific direction.
• Contention is a source of hard-to-reason-about performance variance.
• So avoid contention at all costs. Design it out up front.
• (If you do this right, you benefit from not having to hire developers that understand locking.)
• Doing this right means designing data and code-level partitioning very carefully.
• Less academically rewarding than OCC and lock freedom, but see parenthetic point above.
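The redundancy bullets on this slide reduce to two small pieces of arithmetic, sketched below. The 10 μs figures and the replication factors are placeholder assumptions chosen only to make the ratios visible.

```python
from math import prod


def remote_write_us(media_us: float, rtt_us: float) -> float:
    """Latency of a write that must cross the network before it is stored."""
    return rtt_us + media_us


def physical_copies(replication_per_layer) -> int:
    """Copies written when each layer replicates whatever the layer above hands it."""
    return prod(replication_per_layer)


if __name__ == "__main__":
    media_us, rtt_us = 10.0, 10.0        # assumed: RTT comparable to media time
    local = media_us
    remote = remote_write_us(media_us, rtt_us)
    print(f"local {local} us, remote {remote} us, ratio {remote / local:.1f}x")
    # Assumed example: the application keeps 3 replicas on storage that keeps 2.
    print("physical copies per logical write:", physical_copies([3, 2]))
```

When the media was a spinning disk, the network round trip disappeared into milliseconds of seek time. Once the two are comparable, every layer that adds a remote copy is visible in the latency the application sees.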
And with that, I’m mostly done.
Here are the high-level trends/ideas
1. Diminishing scarcity.
2. Practical/sensible to own your own hardware again.
3. Software needs to change.
Closing thought.
• Nobody is going to adopt your stuff unless you make it as easy as heck for them to do it.
• Expose your research results as a service, or as something as close to a service as is possible.
• Put them in containers, host them on AWS.
• Solve application problems.
• Early experiences working with physical scientists.