System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O
Kshitij Sudan*, Saisanthosh Balakrishnan§, Sean Lie§, Min Xu§, Dhiraj Mallick§, Gary Lauterbach§, Rajeev Balasubramonian*
HPCA-2013
Exec Summary
• Focus on web-scale applications
• Contribution 1: use of simple cores
  – This amplifies the power/cost contribution of the I/O subsystem
• Contribution 2: virtualize I/O, e.g., a single disk shared by many cores
• Contribution 3: software stack optimizations
• Contribution 4: evaluations on a production-quality real design
Web-Scale Applications
• Targeting datacenter platforms
• Focus on power and cost (OpEx and CapEx)
• Web-scale applications have large datasets, high concurrency, heavy communication, and high I/O, e.g., MapReduce
• Typically, performance increases as the cluster grows, but so do power and cost
Energy-Efficient CPUs
• For embarrassingly parallel workloads, energy per instruction (EPI) is the key efficiency metric
• For a given power/energy budget, many low-EPI cores can deliver higher throughput than a few high-EPI cores
• Hence, use many light-weight, energy-efficient CPUs (e.g., an Atom CPU at 8.5 W)
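For a perfectly parallel job, a power budget P buys roughly P/EPI instructions per second, so the core with the lower EPI wins once enough copies fit in the budget. The sketch below illustrates that arithmetic under idealized assumptions: linear scaling, and no power spent outside the CPUs. The `Core` class and both instruction rates are hypothetical; only the 8.5 W figure comes from the slide.

```python
import math
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    watts: float  # power drawn by one core at full load (W)
    gips: float   # instructions retired per second, in billions (hypothetical)

    @property
    def epi_nj(self) -> float:
        # W / (1e9 instructions/s) = nanojoules per instruction
        return self.watts / self.gips


def throughput_under_budget(core: Core, budget_watts: float) -> float:
    """Aggregate GIPS when as many identical cores as fit in the power
    budget all run a perfectly parallel workload."""
    n_cores = math.floor(budget_watts / core.watts)
    return n_cores * core.gips


# Hypothetical cores: only the 8.5 W figure is taken from the slide.
big = Core("big", watts=100.0, gips=10.0)   # EPI = 10 nJ
small = Core("small", watts=8.5, gips=2.0)  # EPI = 4.25 nJ

for core in (big, small):
    print(core.name, core.epi_nj, throughput_under_budget(core, 1000.0))
# big 10.0 100.0
# small 4.25 234.0
```

The model deliberately ignores non-CPU power. The next slide questions exactly that simplification: once cores are this cheap, disks, NICs, and fans take a much larger share.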
Contribution of the I/O Subsystem
• With light-weight cores, the energy and cost contributions of “other” components grow
  – Intel Atom CPU + chipset = 11 W
  – Typical disk or Ethernet card = 5-25 W
  – Fans, power supplies, etc.
• The application uses only 20-60 MB/s of disk bandwidth, while the disk's peak read bandwidth is 120 MB/s
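The slide's numbers support a quick back-of-the-envelope check. First, what fraction of a node's power goes to a single I/O device? Second, how many nodes' worth of steady demand could one disk serve? This is a simplified sketch. It assumes one I/O device per node and ignores fans, power supplies, and memory. The function names are illustrative only.

```python
import math


def io_power_share(cpu_chipset_w: float, io_device_w: float) -> float:
    """Fraction of a node's power spent on one I/O device, ignoring fans,
    power supplies, and memory."""
    return io_device_w / (cpu_chipset_w + io_device_w)


def nodes_per_disk(disk_peak_mb_s: float, per_node_mb_s: float) -> int:
    """How many nodes with a given steady demand one disk can serve
    before its peak bandwidth is exhausted."""
    return math.floor(disk_peak_mb_s / per_node_mb_s)


# Figures from the slide: 11 W Atom + chipset, 5-25 W per device,
# 20-60 MB/s application demand, 120 MB/s peak disk read bandwidth.
for io_w in (5.0, 25.0):
    print(f"I/O share at {io_w} W: {io_power_share(11.0, io_w):.0%}")
for demand in (60.0, 20.0):
    print(f"{demand} MB/s per node -> {nodes_per_disk(120.0, demand)} nodes/disk")
```

At the low-demand end, the result (120 / 20 = 6 nodes per disk) happens to match the summary slide's ratio of 384 server nodes to 64 disks.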
[Figure: Atom TeraSort, aggregate disk bandwidth (MB/s) over time; read and write, each with a moving average]
Wasting energy on over-provisioned resources
Cluster-in-a-Box with Virtualized I/O
• Use energy-efficient CPUs
  – ~10x more CPUs in the same power budget than with typical server-class CPUs
• Virtualize I/O devices (disk and Ethernet)
  – Balanced resource provisioning and lower cost/power
• Amortize fixed server overheads by sharing components
  – Fans, power supplies, etc.
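The sketch below makes "virtualize I/O" concrete with a hypothetical static mapping. Each server node sees a private virtual disk, which is really a contiguous slice of a shared physical disk. In the real system, the fabric and FPGAs perform this mapping in hardware. The round-robin placement, the equal-slice policy, and every name below are assumptions made for illustration. None of them describes the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualDisk:
    node: int        # server node that sees this as its local disk
    disk: int        # physical disk that backs it
    first_lba: int   # first block of the slice on the physical disk
    num_blocks: int  # size of the slice


def carve_virtual_disks(num_nodes: int, num_disks: int, blocks_per_disk: int) -> list[VirtualDisk]:
    """Spread nodes round-robin over disks and give each node an equal,
    contiguous slice of its disk (hypothetical policy)."""
    nodes_per_disk = -(-num_nodes // num_disks)  # ceiling division
    slice_blocks = blocks_per_disk // nodes_per_disk
    vdisks = []
    for node in range(num_nodes):
        disk = node % num_disks   # which physical disk
        slot = node // num_disks  # which slice on that disk
        vdisks.append(VirtualDisk(node, disk, slot * slice_blocks, slice_blocks))
    return vdisks


def translate(vdisk: VirtualDisk, lba: int) -> tuple[int, int]:
    """Map a node-local block address to (physical disk, physical block)."""
    if not 0 <= lba < vdisk.num_blocks:
        raise ValueError(f"block {lba} outside virtual disk of node {vdisk.node}")
    return vdisk.disk, vdisk.first_lba + lba


# 384 nodes sharing 64 disks, as on the summary slide; 600 blocks per disk is made up.
vdisks = carve_virtual_disks(384, 64, 600)
print(translate(vdisks[65], 7))  # (1, 107): node 65 uses slice 1 of disk 1
```

A static split like this is the simplest possible policy. A production design could just as plausibly allocate space on demand or add caching, so treat the sketch as an illustration of address translation, not of the system's allocator.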
Compute Cards
Compute card: 6 CPUs share 4 ASICs over PCIe; the ASICs implement the fabric; 4 GB of DDR2 memory per CPU, on the back of the card
Logical Organization
• Compute cards: each CPU + chipset connects to a fabric ASIC
• 3D-torus interconnect formed by the ASICs
• E-cards (Ethernet FPGA): up to 8 per system, each with 8x1 GbE or 2x10 GbE
• S-cards (storage FPGA): up to 8 per system, each with 8x SATA HDD/SSD
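In a 3D torus, every ASIC links to six neighbors, and each dimension has wraparound links. As a result, no node is more than about half of each dimension away. Below is a small sketch of neighbor and hop-count computation. It assumes a regular torus with hypothetical 4x4x4 dimensions (the slides do not give the real ones) and shortest-path routing.

```python
from itertools import product

Coord = tuple[int, int, int]


def torus_neighbors(c: Coord, dims: Coord) -> list[Coord]:
    """The six nodes one hop away; links wrap around in every dimension."""
    x, y, z = c
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),
    ]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count: per dimension, go whichever way round is shorter."""
    return sum(min(abs(ai - bi), n - abs(ai - bi)) for ai, bi, n in zip(a, b, dims))


dims = (4, 4, 4)  # hypothetical size; the slides do not state the torus dimensions
diameter = max(torus_hops((0, 0, 0), c, dims) for c in product(*(range(n) for n in dims)))
print(torus_neighbors((0, 0, 0), dims))
print(diameter)
```

Short worst-case paths like these make it plausible to carry all internal traffic on the fabric rather than through a top-of-rack switch.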
Physical Organization
[Figure: physical organization, with labels S-Card, E-Card, Compute Card, Midplane Interconnect, HDD/SSD]
Cluster-in-a-Box Summary
• 768 CPU cores interconnected by a high-bandwidth fabric in a 3D torus topology
  – Low-latency distributed fabric architecture based on low-power ASICs
• FPGAs implement the disk and Ethernet controllers
• The fabric and FPGAs implement I/O virtualization
  – Up to 64 disks shared by 384 server nodes
• Server nodes don't need a top-of-rack switch to communicate
  – All internal cluster communication goes through the fabric
• The entire cluster consumes < 3.5 kW under full load
System Software Improvements
• Use large SATA packet sizes to reduce disk seek overheads
• Other OS/Ethernet configuration knobs: no filesystem journaling, jumbo Ethernet frames, interrupt coalescing
• MapReduce configuration: designate the few nodes nearest the S-cards as DataNodes
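Larger transfers help because, when many nodes share one disk, their requests interleave and each request tends to pay a seek. A simple model captures this: each request pays one seek, then streams at peak bandwidth. Under that model, effective bandwidth recovers quickly as request size grows. The 10 ms seek time is an assumption, not a figure from the slides. The 120 MB/s peak is the figure quoted on the I/O slide.

```python
def effective_bw_mb_s(request_mb: float, seek_ms: float, peak_mb_s: float) -> float:
    """Bandwidth seen when every request pays one seek before streaming
    request_mb at the disk's peak rate."""
    transfer_s = request_mb / peak_mb_s
    return request_mb / (seek_ms / 1000.0 + transfer_s)


PEAK = 120.0  # MB/s peak read bandwidth, from the I/O slide
SEEK = 10.0   # ms per seek: an assumed figure, not from the slides

for size_mb in (0.0625, 1.0, 8.0):  # 64 KB, 1 MB, 8 MB requests
    print(f"{size_mb:>7} MB -> {effective_bw_mb_s(size_mb, SEEK, PEAK):6.1f} MB/s")
```

Under these assumptions, 64 KB requests see only about 6 MB/s, while 8 MB requests exceed 100 MB/s.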
Methodology
• Compare two cluster designs with comparable power envelopes to evaluate the TCO and power of each architecture
  – Baseline: 17-node Core i7-based cluster (4 kW)
  – 384-node Atom cluster-in-a-box (3.5 kW)
  – Four Apache Hadoop benchmarks
  – TCO calculations based on Hamilton's model
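The two designs spend similar budgets very differently. Dividing each cluster's power by its node count gives a per-node figure that folds in each node's share of everything else in the cluster. This sketch treats 3.5 kW, which the summary slide gives as an upper bound, as the actual draw.

```python
clusters = {
    # name: (total power in W, node count), from the methodology slide
    "Core i7 baseline": (4000.0, 17),
    "Atom cluster-in-a-box": (3500.0, 384),
}

for name, (watts, nodes) in clusters.items():
    print(f"{name}: {watts / nodes:.1f} W per node")

ratio = clusters["Atom cluster-in-a-box"][1] / clusters["Core i7 baseline"][1]
print(f"node-count ratio: {ratio:.1f}x")
```

The result is roughly 235 W per Core i7 node versus about 9 W per Atom node, with about 22.6x as many nodes in the cluster-in-a-box.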
Execution Time Results
Execution time (minutes), Atom cluster-in-a-box vs. Core i7 cluster:
• TeraGen: Atom 9.5, Core i7 23.68
• TeraSort: Atom 34.26, Core i7 98
• WordCount: Atom 6.11, Core i7 5.66
• GridMix: Atom 34.48, Core i7 65.63
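Converting the execution times into speedups (Core i7 time divided by Atom time) makes the spread easier to see. Values above 1 mean the cluster-in-a-box finished first. The sketch assumes the chart's values pair with the benchmarks in series order (all Atom values, then all Core i7 values).

```python
# Execution times in minutes, (Atom, Core i7), as read from the execution-time chart.
times = {
    "TeraGen": (9.5, 23.68),
    "TeraSort": (34.26, 98.0),
    "WordCount": (6.11, 5.66),
    "GridMix": (34.48, 65.63),
}

speedup = {bench: i7 / atom for bench, (atom, i7) in times.items()}
for bench, s in speedup.items():
    print(f"{bench}: {s:.2f}x")
```

WordCount is the one case where the Atom cluster is slightly slower (about 0.93x). This is consistent with its negative energy and EDP results on the following slides.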
Improvement in EDP (energy-delay product)
% change (chart axis label: "% Change in Perf./W-h"):
• TeraGen: 329%
• TeraSort: 606%
• WordCount: -34%
• GridMix: 273%
Improvement in Energy
% change in perf./watt:
• TeraGen: 75.50%
• TeraSort: 147.75%
• WordCount: -15.38%
• GridMix: 46.96%
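The energy and EDP charts can be read through two standard definitions. Energy is E = P·T. The energy-delay product is EDP = E·T = P·T², which rewards speed twice. For a fixed job, performance per watt is (1/T)/P = 1/E, so a perf/W improvement is simply a ratio of energies. The sketch below assumes that improvements are computed as baseline/new - 1; the slides do not say how they were computed. It uses made-up power and runtime values, not the measured ones.

```python
def energy_j(avg_power_w: float, runtime_s: float) -> float:
    """Energy in joules: average power times runtime."""
    return avg_power_w * runtime_s


def edp(avg_power_w: float, runtime_s: float) -> float:
    """Energy-delay product in joule-seconds: energy times runtime."""
    return energy_j(avg_power_w, runtime_s) * runtime_s


def pct_improvement(baseline: float, new: float) -> float:
    """Percent improvement for a lower-is-better metric (baseline/new - 1)."""
    return (baseline / new - 1.0) * 100.0


# Hypothetical run: baseline at 4000 W for 600 s, new system at 3500 W for 300 s.
base_p, base_t = 4000.0, 600.0
new_p, new_t = 3500.0, 300.0
# For a fixed job, perf/W = (1/T)/P = 1/E, so the perf/W gain is the energy ratio.
energy_gain = pct_improvement(energy_j(base_p, base_t), energy_j(new_p, new_t))
edp_gain = pct_improvement(edp(base_p, base_t), edp(new_p, new_t))
print(f"energy (perf/W): {energy_gain:.1f}%")
print(f"EDP:             {edp_gain:.1f}%")
```

Halving the runtime doubles the energy gain's contribution from delay in EDP. This is why the EDP improvements are much larger than the energy improvements whenever the Atom cluster is also faster.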
Performance/TCO vs. Number of Disks and Number of Cores
Conclusions
• Datacenter power and cost are limiting factors when scaling web-scale apps
  – Build clusters from light-weight, low-power CPUs
• Balanced resource provisioning can improve utilization, cost, and power
  – Virtualize I/O (disk and Ethernet)
  – Amortize the overheads of fans, power supplies, etc.
• The cluster-in-a-box system yields up to a 6x improvement in EDP relative to a traditional cluster
Questions?
Thank You
CPU and Disk Utilization
[Figure panels: 768 CPUs, 64 disks; 64 CPUs, 32 disks]