network stack specialization for performance presented by donghwi kim (some figures are brought from...
Network Stack Specialization for Performance
Presented by Donghwi Kim
(Some figures are taken from the paper)
Objective
• The authors try to show an upper bound on network application performance that specialization can achieve. (In fact, not only the network stack but also the application’s implementation is specialized.)
• A particular class of applications is chosen: those that serve the same content to multiple users.
  • Sandstorm: a web server that serves static web pages
  • Namestorm: a DNS server
Keys to performance
• A complete zero-copy stack
• Aggressive amortization
  • Pre-packetized data
  • Batching to mitigate system-call overhead
• Synchronous processing, clocked from received packets
  • Improves cache locality
  • Minimizes the latency of sending the first packet of the response
• Intel’s DDIO
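As a rough illustration of "synchronous, clocked from received packets" combined with batching, the sketch below drains a batch of received packets and produces each reply in the same pass. It is a minimal sketch, not Sandstorm's code: `run_once`, the two ring queues, and the `handle` callback are hypothetical names chosen for this example.

```python
from collections import deque

def run_once(rx_ring, tx_ring, handle, batch_size=32):
    """Drain up to batch_size received packets and queue their replies.

    Hypothetical helper: each reply is generated immediately after its
    request is read, so the request data is still likely to be in cache.
    """
    processed = 0
    while rx_ring and processed < batch_size:
        packet = rx_ring.popleft()
        reply = handle(packet)
        if reply is not None:
            tx_ring.append(reply)
        processed += 1
    # A real stack would flush tx_ring here with a single system call,
    # spreading that call's cost over the whole batch.
    return processed

# Toy usage: three queued requests, batch size of two.
rx = deque([b"GET /a", b"GET /b", b"GET /c"])
tx = deque()
n = run_once(rx, tx, lambda p: b"200 " + p[4:], batch_size=2)
```

After this call, two requests have been answered and one remains in `rx` for the next batch.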
Network stack
• libnmio: data-movement and event-notification primitives
• libeth: a lightweight Ethernet layer
• libtcpip: an optimized TCP/IP layer
• libudpip: a UDP/IP layer
A complete zero-copy stack
• Receiving a packet
  • Done by DMA
• Transmitting a packet
  • Aggressive amortization
  • Modify one of the prepared copies of the packet and send it by DMA
  • The modifications are performed in a single pass to use the CPU’s L1 cache efficiently
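The transmit path above can be sketched as patching a pre-built packet in place. This is only an illustration under stated assumptions: the buffer holds an option-less 20-byte IPv4 header followed by a 20-byte TCP header (no Ethernet header), `patch_packet` is a hypothetical helper, and checksum updates, which a real stack or the NIC must handle, are left out.

```python
import struct

IP_DST_OFF = 16   # IPv4 destination address offset in an option-less header
TCP_OFF = 20      # TCP header follows the 20-byte IPv4 header (assumed layout)

def patch_packet(template: bytearray, dst_ip: bytes, dst_port: int,
                 seq: int, ack: int) -> None:
    """Rewrite the per-connection fields of a pre-packetized segment in place.

    Fields are written in ascending offset order, i.e. one pass over the
    headers. Checksums are NOT recomputed in this sketch.
    """
    struct.pack_into("!4s", template, IP_DST_OFF, dst_ip)
    # TCP destination port (offset 2), sequence number (4), ack number (8).
    struct.pack_into("!HII", template, TCP_OFF + 2, dst_port, seq, ack)

# Toy usage: a zeroed 40-byte header block followed by a 4-byte payload.
template = bytearray(40) + bytearray(b"data")
patch_packet(template, bytes([10, 0, 0, 7]), 51000, 1000, 2000)
```

The payload bytes are never touched, which is the point of pre-packetizing: only a few header fields change per response.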
A complete zero-copy stack
• Pre-copy method
  • Maintains more than one copy of each packet
  • Has the potential to thrash the CPU’s L3 cache
• memcpy method
  • Maintains one long-term copy and creates ephemeral copies
  • Requires more work
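The difference between the two methods can be sketched as two buffer-management strategies. The class names and methods below are hypothetical and only illustrate the trade-off described above: pre-copy spends memory (several resident copies), while memcpy spends work (a fresh copy per transmission).

```python
class PreCopy:
    """Keep several ready-made copies of a packet and hand out a free one."""

    def __init__(self, packet: bytes, copies: int):
        self.free = [bytearray(packet) for _ in range(copies)]

    def acquire(self):
        # Returns None when every copy is still in flight.
        return self.free.pop() if self.free else None

    def release(self, buf):
        # Called once the copy has been transmitted and can be reused.
        self.free.append(buf)


class MemcpyOnDemand:
    """Keep one long-term copy and create an ephemeral copy per transmission."""

    def __init__(self, packet: bytes):
        self.master = bytes(packet)

    def acquire(self):
        return bytearray(self.master)  # the extra copy is the extra work


pre = PreCopy(b"abc", copies=2)
first, second = pre.acquire(), pre.acquire()
exhausted = pre.acquire()          # no free copy left
pre.release(first)

mc = MemcpyOnDemand(b"abc")
scratch = mc.acquire()
scratch[0] = 0                     # modifying the ephemeral copy leaves the master intact
```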
How do the optimizations work?
• Batching increases the TCP RTT
• Amortization reduces per-request processing
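The amortization effect can be made concrete with a small cost model. This is a simplified assumption, not a measurement from the paper: each batch pays one fixed cost (for example a system call) and each request pays its own processing cost. The numbers in the usage lines are made up for illustration.

```python
def per_request_cost(fixed_cost_us: float, per_request_us: float,
                     batch_size: int) -> float:
    """Average cost per request when one fixed cost is shared by a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_cost_us / batch_size + per_request_us

# Hypothetical figures: 2.0 us fixed cost per batch, 0.5 us per request.
unbatched = per_request_cost(2.0, 0.5, batch_size=1)
batched = per_request_cost(2.0, 0.5, batch_size=32)
```

The other side of the trade-off is the first bullet above: a request may have to wait for the rest of its batch, which shows up as a longer RTT.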
Intel’s DDIO
• Direct Data I/O
• On transmission
  • Data is pulled from the L3 cache without a detour through system memory
• On reception
  • DMA can place data directly in the processor’s L3 cache
Evaluation
DDIO
• Pre-copy case: DDIO pulls untouched incoming data into the cache, so the file data cannot be cached
• memcpy case: the CPU loads the file data into the cache
Discussion
• mTCP vs. Sandstorm
Discussion
• mTCP
  • Provides a UNIX-like socket programming interface
  • Provides fairness
• Sandstorm’s TCP
  • Higher-level layers do not wrap lower-level layers
  • Each layer is a stand-alone service
  • For example, an application interacts directly with libnmio
  • With amortization, no queueing, and an inaccurate timer, correctness cannot be guaranteed
  • Applicable only to a limited range of applications