high-performance video streaming - acm sigcomm · 2017. 10. 27. · high-performance video...
![Page 1: High-performance video streaming - acm sigcomm · 2017. 10. 27. · High-performance video streaming IliasMarinos, Robert Watson (Cambridge), Mark Handley (UCL), ... a kernel-bypass](https://reader034.vdocuments.us/reader034/viewer/2022051604/600260273eb8de35c83f8de5/html5/thumbnails/1.jpg)
Disk|Crypt|Net: High-performance video streaming
Ilias Marinos, Robert Watson (Cambridge), Mark Handley (UCL), Randall Stewart (Netflix)
Modern Video Streaming
• Just lots of HTTP requests for video chunks.
• Client picks chunks to adapt rate.
• Server is pretty dumb: it just has to go fast.
• HTTP/1.1 persistent connections.
• TLS becoming important (95% of YouTube traffic).
• More than 50% of US Internet traffic.
• Important to make good use of expensive hardware.
How fast can you go?
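The client-side rate adaptation in the bullets above can be sketched with a simple throughput-based rule: estimate bandwidth from recent chunk downloads and request the highest bitrate that fits under a safety margin. This is a minimal illustration, not a rule from the talk; the bitrate ladder, the `safety` factor and `pick_bitrate` are hypothetical, and real players combine throughput with buffer occupancy.

```python
from statistics import harmonic_mean

# Hypothetical bitrate ladder in kb/s, lowest to highest.
LADDER_KBPS = [400, 1_000, 2_500, 5_000, 8_000]


def pick_bitrate(recent_throughputs_kbps, safety=0.8):
    """Choose the bitrate for the next chunk request.

    Bandwidth is estimated as the harmonic mean of recent per-chunk
    download throughputs (dominated by the slow samples), and the highest
    rung that fits under `safety` times that estimate is chosen.
    """
    if not recent_throughputs_kbps:
        return LADDER_KBPS[0]  # no history yet: start conservatively
    budget = safety * harmonic_mean(recent_throughputs_kbps)
    fitting = [rate for rate in LADDER_KBPS if rate <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]
```

For example, `pick_bitrate([6000, 7000, 6500])` estimates roughly 6.5 Mb/s, leaves a 20% margin, and requests the 5000 kb/s rendition. All of the adaptation logic lives in the client, which is why the server only has to serve chunks quickly.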
New iPlayer setup, Dec 2015:
• nginx on Linux, 24 cores on two Intel Xeon E5-2680 v3 processors, 512 GB DDR4 RAM, 8.6 TB RAID array of SSDs.
• 20 Gb/s per server. ← Can we improve performance?
Case study: Netflix
• FreeBSD, but tweaked.
  – Asynchronous sendfile()
    • Non-blocking zero copy from disk buffer cache to the network.
  – VM scaling
    • Fake NUMA domains to avoid lock contention.
    • Proactive cleanup of disk buffer cache.
  – RSS-assisted LRO.
    • Sort incoming packets into buckets based on 5-tuple hash to optimize LRO engine efficacy.
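Why sorting by flow hash helps LRO can be shown with a toy model: LRO can only merge back-to-back, in-sequence segments of the same flow, so packets from many interleaved flows defeat it, while grouping them by 5-tuple hash first restores long same-flow runs. This is a simplified sketch, not Netflix's FreeBSD code; the `Packet` layout and helper names are hypothetical, and Python's built-in `hash()` stands in for the NIC's RSS hash (real NICs typically use a Toeplitz hash).

```python
from collections import namedtuple

Packet = namedtuple("Packet", "five_tuple seq length")


def lro_merge(packets):
    """Coalesce runs of adjacent, in-sequence segments of the same flow.

    Returns the number of aggregated segments passed up the stack.
    """
    merged = []
    for pkt in packets:
        last = merged[-1] if merged else None
        if (last is not None and last.five_tuple == pkt.five_tuple
                and last.seq + last.length == pkt.seq):
            merged[-1] = last._replace(length=last.length + pkt.length)
        else:
            merged.append(pkt)
    return len(merged)


def rss_sort(packets):
    """Stable-sort packets into buckets keyed by their 5-tuple hash.

    The sort is stable, so each flow keeps its arrival order; tie-breaking
    on the tuple itself keeps colliding flows grouped as well.
    """
    return sorted(packets, key=lambda p: (hash(p.five_tuple), p.five_tuple))
```

With two flows whose 1448-byte segments arrive alternately, `lro_merge` on the raw arrival order produces one aggregate per packet, whereas `lro_merge(rss_sort(...))` produces one aggregate per flow.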
Let's Do Some Experiments
• 8-core Haswell server, 2x 40GbE NICs, 128 GB RAM, 4x Intel P3700 NVMe disks.
• Linux clients.
• Synthetic workload, middlebox for realistic RTT.
[Figure: testbed topology; components: streamer, middlebox, 40GbE switch, clients (C); delay labels in ms and μs]
Unencrypted video streaming workload
[Figure: two cases, data comes from the disk buffer cache vs. data NOT in the disk buffer cache]
CPU utilization doubles (~2x) when fetching from disk (~350% -> ~700%).
Conclusions:
• Netflix improvements are good.
• CPU utilization is a problem.
Encryption Problem:
Sendfile:
• Zero copy from disk buffer cache.
TLS:
• Different encrypted stream per user.
• Kernel is unaware of TLS.
Sendfile and TLS are fundamentally incompatible!
• Conventional TLS stack took Netflix from 20 to 8.5 Gb/s.
• Netflix implemented in-kernel TLS support for sendfile!
sendfile() is NOT zero copy anymore!
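The incompatibility comes down to per-session keys: the same cached plaintext chunk becomes different ciphertext for every viewer, so one shared page in the buffer cache can no longer be handed to the NIC for all of them, and each connection needs its own encrypted copy. The toy sketch below uses a SHA-256 counter-mode keystream purely as a stand-in for AES; it is not secure, is not how TLS records are built, and the function names are hypothetical.

```python
import hashlib
import os


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream from SHA-256 (toy stand-in for AES-CTR/GCM)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_for_session(chunk: bytes, session_key: bytes, nonce: bytes) -> bytes:
    """XOR the chunk with a per-session keystream into a NEW buffer (a copy)."""
    ks = toy_keystream(session_key, nonce, len(chunk))
    return bytes(p ^ k for p, k in zip(chunk, ks))


# One cached plaintext video chunk, as sendfile() would serve it...
chunk = os.urandom(16 * 1024)
# ...and two viewers, each with its own session key.
alice_key, bob_key = os.urandom(32), os.urandom(32)
nonce = bytes(12)  # fixed only because each key encrypts once in this demo
to_alice = encrypt_for_session(chunk, alice_key, nonce)
to_bob = encrypt_for_session(chunk, bob_key, nonce)
```

`to_alice` and `to_bob` differ even though they carry the same video bytes, so the kernel must write a separate ciphertext buffer per connection; that extra pass over memory is what takes sendfile() off its zero-copy path.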
Encrypted video streaming workload
Performance loss (~30%) when content is fetched from SSDs.
CPU is saturated. Memory read throughput is ~3x the network throughput!
What's happening?
[Figure: data path through NVMe, DRAM (buffer cache), LLC, CPU and NIC; CPU stages Copy, AES and TCP, numbered 1 to 3; copied data and encrypted data]
The stack is too asynchronous. Data keeps getting flushed from the LLC and re-loaded. The system is bottlenecked on memory.
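The flush-and-reload effect can be reproduced with a toy LRU model of the LLC. Each step, the SSD DMAs a new buffer into the cache (as DDIO would) while the CPU encrypts the buffer that arrived `in_flight` steps earlier; once the working set of in-flight buffers outgrows the cache, every buffer is evicted before the CPU gets to it and must be re-read from DRAM. The cache size, run length and names below are illustrative assumptions, not measurements from the talk.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache tracking which buffers are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, key):
        """Access `key`; return True on a hit, False on a miss (which fills it)."""
        if key in self.lines:
            self.lines.move_to_end(key)
            return True
        self.lines[key] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        return False


def miss_ratio(in_flight, llc_buffers=64, n_buffers=10_000):
    """Fraction of CPU (encrypt-step) accesses that miss in the LLC."""
    llc = LRUCache(llc_buffers)
    misses = accesses = 0
    for i in range(n_buffers):
        llc.touch(i)              # SSD DMA writes buffer i into the cache
        j = i - in_flight         # buffer the CPU processes this step
        if j >= 0:
            accesses += 1
            if not llc.touch(j):  # CPU reads it for AES
                misses += 1
    return misses / accesses
```

With a tight pipeline, `miss_ratio(8)` returns 0.0: every buffer is still cached when encrypted. With a deep, asynchronous one, `miss_ratio(100)` returns 1.0: every buffer has been flushed and must come back from DRAM.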
Production Netflix Workload
• 192 GB for buffer cache, but only 10% hit ratio.
• Streamers bottlenecked on memory bandwidth.
✓ Modern NVMe SSDs have low latency & high throughput.
✓ Modern Intel CPUs DMA directly to L3 cache.
Can we eliminate the disk buffer cache completely, and fetch everything from the SSDs on demand?
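A back-of-the-envelope version of this argument: write $T$ for the network throughput a streamer delivers and $h$ for the buffer-cache hit ratio (symbols introduced here for illustration, not taken from the talk). The SSDs must already supply every byte that misses the cache:

$$
R_{\text{SSD}} = (1 - h)\,T = (1 - 0.10)\,T = 0.9\,T
$$

So the 192 GB cache spares the SSDs only about a tenth of the read traffic, while the earlier measurements show what it costs in memory bandwidth.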
Ideal Stack
[Figure: NVMe, DRAM, LLC, CPU (AES, TCP) and NIC; re-use buffer]
To achieve this, we must:
• Fetch on demand from the SSD when TCP needs data.
• As soon as the SSD returns data, process it to completion and DMA it to the NIC.
Solution Outline
1. A TCP ACK arrives, freeing up congestion window.
2. This triggers the stack to request more data from the SSDs to fill that congestion window.
3. The SSDs return data, placing it in the LLC.
4. The read completion event causes the application to encrypt the data in place, add TCP headers, and trigger the transmission of the packets.
5. The network completion event frees the buffer, allowing it to be reused for a later disk read.
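The five steps can be sketched as a single-threaded event loop. This is a minimal model, not Atlas code: `BufferPool`, `stream`, the event names and the 16 KB `CHUNK` are hypothetical, SSD reads and NIC transmissions complete immediately, the client ACKs each chunk as soon as it is sent, and TCP is reduced to counting congestion-window bytes (no retransmission). The point it illustrates is that a disk read is issued only when the window has room, each read completion is processed straight through to the NIC, and the buffer goes back to the pool on transmit completion.

```python
from collections import deque

CHUNK = 16 * 1024  # hypothetical bytes per SSD read and per TCP send


class BufferPool:
    """Fixed set of DMA buffer indices, recycled in LIFO order."""

    def __init__(self, n):
        self.free = list(range(n))

    def get(self):
        return self.free.pop() if self.free else None

    def put(self, buf):
        self.free.append(buf)


def stream(file_size, cwnd, n_buffers):
    """Simulate on-demand streaming of one file (size a multiple of CHUNK).

    Returns the transmitted file offsets, in order, and the number of
    buffers back in the pool at the end.
    """
    pool = BufferPool(n_buffers)
    events = deque()
    in_flight = 0    # bytes requested or sent but not yet ACKed
    next_offset = 0  # next file offset to fetch from the SSD
    sent = []

    def fill_window():
        # Step 2: request just enough data from the SSD to fill the window.
        nonlocal in_flight, next_offset
        while next_offset < file_size and in_flight + CHUNK <= cwnd:
            buf = pool.get()
            if buf is None:
                break  # no free buffer until a TX completion returns one
            in_flight += CHUNK
            # Step 3 (simulated): the SSD read completes immediately.
            events.append(("disk_done", next_offset, buf))
            next_offset += CHUNK

    fill_window()
    while events:
        kind, offset, buf = events.popleft()
        if kind == "disk_done":
            # Step 4: encrypt in place, add headers, hand the buffer to the NIC.
            sent.append(offset)
            events.append(("tx_done", offset, buf))
        elif kind == "tx_done":
            # Step 5: the NIC is done with the buffer; recycle it.
            pool.put(buf)
            events.append(("ack", offset, None))  # simulated client ACK
        elif kind == "ack":
            # Step 1: the ACK frees congestion window, so fetch more data.
            in_flight -= CHUNK
            fill_window()
    return sent, len(pool.free)
```

For example, `stream(10 * CHUNK, 4 * CHUNK, 8)` sends the ten chunks in file order and ends with all eight buffers back in the pool. With only two buffers the same file still goes out in order; the pool, not the window, then limits how far ahead the SSD reads run.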
Conventional OS stack NOT suitable:
➢ Highly asynchronous; storage and network stacks are loosely coupled and rely on the VFS & buffer cache.
➢ Introduces overheads related to abstraction layers (VFS, POSIX etc.), redundant memory copies and domain transitions (user <-> kernel).
The Atlas Streaming Stack
Atlas: a complete user-space stack
➢ TCP/IP stack based on a modified version of Sandstorm (SIGCOMM’14) and netmap (ATC’12).
➢ Storage handled using diskmap (no buffer cache, no sophisticated FS).
➢ Lockless, full zero-copy stack from disk <-> NIC.
➢ Tight pipeline to reduce asynchrony, and ideally save memory bandwidth (w/ DDIO).
Diskmap Architecture
[Figure: PCIe NVMe disk; kernel/user split; admin qpairs; per-core SQ/CQ pairs nvme0-1 and nvme0-2 (cores C0, C1), each driven by a libnvme app; DMA via the I/OMMU into memory-mapped buffers]
Diskmap: a kernel-bypass I/O framework for NVMe disks
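The SQ/CQ pairs in the figure follow the standard NVMe queueing model. The host writes commands into a submission queue (SQ) and rings a doorbell. The device posts entries to the paired completion queue (CQ), whose phase tag flips on every pass around the ring, so the host can tell new entries from stale ones by polling memory. The sketch below simulates one queue pair in Python. The class and method names are hypothetical and are not diskmap's or libnvme's API, and the "controller" completes commands synchronously.

```python
from dataclasses import dataclass


@dataclass
class CQEntry:
    command_id: int = 0
    phase: int = 0  # NVMe phase tag; queue memory starts zeroed


class QueuePair:
    """Toy model of one NVMe submission/completion queue pair."""

    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth
        self.sq_tail = 0            # host writes new commands here
        self.cq = [CQEntry() for _ in range(depth)]
        self.cq_head = 0            # host reads completions from here
        self.expected_phase = 1     # controller posts phase 1 on the first pass
        # Simulated controller-side state.
        self._dev_sq_head = 0
        self._dev_cq_tail = 0
        self._dev_phase = 1

    def submit(self, command_id, lba, buf):
        """Host: queue a read of block `lba` into DMA buffer `buf`."""
        # Shortcut: a real host learns the SQ head from completion entries.
        if (self.sq_tail + 1) % self.depth == self._dev_sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = (command_id, lba, buf)
        self.sq_tail = (self.sq_tail + 1) % self.depth

    def ring_doorbell(self):
        """Simulated controller: execute everything up to the SQ tail."""
        while self._dev_sq_head != self.sq_tail:
            command_id, _lba, _buf = self.sq[self._dev_sq_head]
            # A real controller would DMA the data into `_buf` here.
            self.cq[self._dev_cq_tail] = CQEntry(command_id, self._dev_phase)
            self._dev_sq_head = (self._dev_sq_head + 1) % self.depth
            self._dev_cq_tail = (self._dev_cq_tail + 1) % self.depth
            if self._dev_cq_tail == 0:
                self._dev_phase ^= 1  # phase flips each time the CQ wraps

    def poll(self):
        """Host: reap new completions, identified by a matching phase tag."""
        done = []
        while self.cq[self.cq_head].phase == self.expected_phase:
            done.append(self.cq[self.cq_head].command_id)
            self.cq_head = (self.cq_head + 1) % self.depth
            if self.cq_head == 0:
                self.expected_phase ^= 1
        return done
```

Because each core owns its own queue pair, as with nvme0-1 and nvme0-2 in the figure, submit and poll need no locks. A core can issue a read and busy-poll its CQ for the completion that triggers the next pipeline stage.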
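The Atlas Execution Pipeline
[Figure: NVMe disk (SQ/CQ) and NIC (RX/TX) below the kernel/user line; in user space, the webserver and TCP/IP stack sit on libnmio and libnvme and share buffers; pipeline steps numbered 1 to 7]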
Atlas vs. Netflix, Unencrypted Content
[Figure: throughput (Gb/s) and LLC misses/s (×10^7), Atlas vs. Netflix]
Netflix needs 8 cores, Atlas only needs 4.
15% better throughput than Netflix when the cache hit ratio is low.
Almost no CPU stalls: data is in the LLC when we want it.
Atlas vs. Netflix, Encrypted Content
[Figure: throughput (Gb/s) and memory read throughput, Atlas vs. Netflix]
When the cache hit ratio is low, 50% more throughput using half the cores.
Almost half the memory reads for each packet sent.
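Reading the low-hit-ratio result as per-core efficiency, and assuming both figures describe the same operating point, 50% more throughput $T$ on half the cores $n$ gives roughly three times the throughput per core. The symbols are introduced here for illustration:

$$
\frac{T_{\text{Atlas}} / n_{\text{Atlas}}}{T_{\text{Netflix}} / n_{\text{Netflix}}}
= \frac{1.5\,T_{\text{Netflix}} / (0.5\,n_{\text{Netflix}})}{T_{\text{Netflix}} / n_{\text{Netflix}}}
= \frac{1.5}{0.5} = 3
$$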
Atlas memory usage
[Figure: two panels, "When LLC/CPU is NOT saturated" and "When LLC/CPU is saturated"; each shows NVMe, DRAM, LLC, CPU (AES, TCP), NIC, TCP packets and buffer re-use]
Netmap doesn't provide a low-delay, fine-grained way to communicate DMA completions. We can't reuse buffers fast enough (no LIFO stack), and this contributes some extra cache pressure.
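Why LIFO reuse matters can be shown with a small simulation. Buffers are recycled on transmit completion. If the next disk read takes the most recently freed buffer (LIFO), the same few buffers are reused over and over and are likely still warm in the LLC. FIFO reuse instead cycles through the whole pool, so the working set spans every buffer. The pool size, the number of buffers in flight and the oldest-first completion order are assumptions for illustration.

```python
from collections import deque


def distinct_buffers_touched(policy, pool_size=64, in_flight=4, reads=1_000):
    """Count how many distinct buffers a run touches under a reuse policy.

    policy: "lifo" reuses the most recently freed buffer,
            "fifo" reuses the least recently freed one.
    """
    free = deque(range(pool_size))
    outstanding = deque()
    touched = set()
    for _ in range(reads):
        if len(outstanding) == in_flight:
            # Transmit completion for the oldest packet: recycle its buffer.
            free.append(outstanding.popleft())
        buf = free.pop() if policy == "lifo" else free.popleft()
        touched.add(buf)
        outstanding.append(buf)
    return len(touched)
```

With the defaults, LIFO touches only 4 buffers while FIFO touches all 64. Slow or coarse-grained completion reporting behaves like a larger `in_flight`, which also enlarges the LIFO working set.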
Summary
• Netflix addressed all the low-hanging fruit.
  – Very fast, but now bottlenecked on memory.
• Atlas is a specialized stack.
  – Puts the SSD directly in the TCP control loop.
  – Immediately processes disk reads to completion and transmits.
  – 50% throughput improvement with encrypted content, close to 50% reduction in memory reads.
• Netflix inspired by Atlas.
  – Now experimenting with how to directly trigger encryption off of disk DMA completions in their FreeBSD stack.