Datacenter Computing Trends and Problems: A Survey
Partha Kundu, Sr. Distinguished Engineer
Corporate CTO Office
Special Session, May 3, NOCS 2011
Pittsburgh, PA, USA
Data center computing is a new paradigm!
Outline of talk
Power & Energy in Data Centers
Network architecture
Protocol interactions
Conclusions
Power & Energy in the Data Center
Figures: data center energy breakdown (Source: ASHRAE); server peak power usage profile (Source: Google 2007)
• Power delivery and cooling overheads are quantified in the PUE metric
• Cooling is the most significant source of energy inefficiency
CPU power contribution is less than 1/3 of server power
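To make the two observations above concrete, here is a minimal Python sketch of the PUE arithmetic and of how small the CPU's slice of facility power ends up being. All wattages are illustrative assumptions, not figures from the talk; only "PUE" and "CPU < 1/3 of server power" come from the slide.

# Illustrative PUE arithmetic (assumed numbers).
it_power_kw = 1000                       # power delivered to IT equipment
cooling_kw = 700                         # chillers, CRAC units
power_dist_kw = 150                      # UPS/PDU and distribution losses
total_facility_kw = it_power_kw + cooling_kw + power_dist_kw

pue = total_facility_kw / it_power_kw
print(f"PUE = {pue:.2f}")                # ~1.85: ~0.85 W of overhead per watt of IT load

# Within a server, CPUs draw less than a third of the power, so a
# CPU-only optimization touches only a minority of the facility's draw.
cpu_fraction_of_server = 0.30            # assumption consistent with "< 1/3"
print(f"CPU share of facility power ~ {cpu_fraction_of_server / pue:.0%}")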
Energy Efficiency
Most of the time, server load is around 30%.
But servers are least energy efficient in their most common operating region!
Source: Barroso & Hölzle, The Datacenter as a Computer, Morgan & Claypool, 2009
Servers are never completely idle
Dynamic Power Range
The CPU's contribution to server power (both peak and idle) has declined over the years
Dynamic power range (a small sketch follows below):
• CPU power range is ~3x for servers
• DRAM range is ~2x
• Disk and networking are < 1.2x
Disks and network switches need to learn from the CPU's power-proportionality gains.
Source: Barroso & Hölzle, The Datacenter as a Computer, Morgan & Claypool, 2009
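A minimal sketch of what "dynamic power range" means here: the ratio of peak to idle power per component. The idle/peak wattages below are assumed for illustration; only the rough ratios (3x, 2x, <1.2x) come from the slide.

# Dynamic range = peak power / idle power, per component (assumed watts).
components = {
    "CPU":     (50, 150),    # ~3x range
    "DRAM":    (25, 50),     # ~2x range
    "Disk":    (9, 10),      # ~1.1x range
    "Network": (18, 20),     # ~1.1x range
}
for name, (idle, peak) in components.items():
    print(f"{name:8s} dynamic range = {peak / idle:.1f}x")
# Components with a small range burn nearly peak power even when idle,
# which is why disks and switches lag the CPU in power proportionality.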
Energy Proportionality
Goal: achieve the best energy efficiency (~80%) in the common operating region (20-30% load); a sketch follows the list below.
Challenges to proportionality:
• Most proportionality tricks used in embedded/mobile devices are not usable in the data center due to huge activation penalties
• The distributed structure of data and applications does not allow powering down during low use
• Disk drives spin >50% of the time even when there is no activity; [Sankar et al., ISCA '08] explore smaller rotational speeds and multiple heads
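To see why the 20-30% operating region is so costly today, here is a hedged sketch comparing a typical non-proportional server against an ideal energy-proportional one at 30% load. The linear power model and the idle/peak watts are assumptions, not numbers from the talk.

# Efficiency proxy: useful work (proportional to utilization) per watt,
# normalized so that a server at 100% load scores 100%.
def efficiency(util, p_idle, p_peak):
    power = p_idle + (p_peak - p_idle) * util    # simple linear power model
    return (util / power) / (1.0 / p_peak)

P_PEAK = 300.0
print("typical server     :", f"{efficiency(0.30, 0.5 * P_PEAK, P_PEAK):.0%}")
print("proportional server:", f"{efficiency(0.30, 0.0, P_PEAK):.0%}")
# With ~50% idle power, a server at 30% load reaches only ~46% of its peak
# efficiency; an energy-proportional design would stay near 100%, which is
# the gap the ~80% goal above is trying to close.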
Source: Kozyrakis et al., IEEE Micro 2010
Application Behavior in Data Centers
• Cosmos is similar to a data-mining workload
• Bing preloads the web index in memory
• But peak disk bandwidth can be high
Significant variation in disk, memory, and network capacity and bandwidth usage across applications
Dynamic Resource Requirements in the Data Center
Figures: intra-server variation, server memory allocation per TPC-H query (Q1-Q12) spanning 0.1 MB to 100 GB on a log scale; inter-server variation, memory allocation over time across a rendering farm.
Huge variations even within a single application running on a large cluster
Motivating Disaggregated Memory*
Figure: conventional blade systems; each blade pairs its CPUs with locally attached DIMMs, and blades communicate only over the backplane.
*Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA 2009
Disaggregated Memory*
Break CPU-memory co-location
Leverage fast, shared communication fabrics
Blade systems with disaggregated memory
Figure: compute blades keep their CPUs and a small amount of local DIMMs, and share a separate memory blade full of DIMMs reached over the backplane.
*Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA 2009
Disaggregated Memory*
Authors claim:
• 8x improvement in memory-constrained environments
• 80+% improvement in performance per dollar
• 3x consolidation
*Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA 2009
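A rough sketch of the trade-off behind those claims: memory-blade capacity is slower per access than local DRAM, but for memory-constrained workloads it replaces a far slower miss path (paging to disk). Every latency and hit rate below is an assumption for illustration, not a number from Lim et al.

# Average access time with a local DRAM tier, a backplane-attached
# memory-blade tier, and disk paging as the miss path (assumed latencies).
def avg_access_ns(local_hit, remote_hit, local_ns=100, remote_ns=2_000,
                  disk_ns=5_000_000):
    miss = 1.0 - local_hit - remote_hit
    return local_hit * local_ns + remote_hit * remote_ns + miss * disk_ns

# Memory-constrained app: without disaggregation, the overflow pages to disk.
print("local DRAM only  :", f"{avg_access_ns(0.90, 0.00):,.0f} ns")
# With a memory blade, most of that overflow is served over the backplane.
print("with memory blade:", f"{avg_access_ns(0.90, 0.099):,.0f} ns")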
Disaggregated Server
High-density, low-power SM10000 servers*
• Designed to replace 40 1-RU servers in a single 10-RU system
• 512 1.66 GHz 64-bit x86 Intel Atom cores in 10 RU; 2,048 CPUs per rack
• 1.28 Terabit interconnect fabric
• Up to 64 1 Gbps or 16 10 Gbps uplinks
• 0-64 SATA SSDs/hard disks
• Integrated load balancing, Ethernet switching, and server management
• Uses less than 2.5 kW of power
SeaMicro SM10000 server*
Claim: achieves 4x space & power consolidation (see the arithmetic below)
*Source: SeaMicro, http://www.seamicro.com/?q=node/102
Figure: servers with consolidated DRAM, disk drives, power supply, and fabric connectivity.
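The consolidation claim is easy to sanity-check with the numbers on the slide; only the per-server power of the conventional 1-RU baseline is an assumption here.

# Space: one 10-RU SM10000 replaces 40 conventional 1-RU servers.
conventional_ru = 40 * 1
sm10000_ru = 10
print("space consolidation:", conventional_ru / sm10000_ru, "x")   # 4x

# Power: the SM10000 draws < 2.5 kW; assume ~250 W per conventional server.
conventional_kw = 40 * 0.250
sm10000_kw = 2.5
print("power consolidation:", conventional_kw / sm10000_kw, "x")   # 4x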
Network Architecture
Requirements of a Cloud-enabled Data Center
Economic & technical motivations:
• Capacity re-allocation
• Economies of scale
• Use commodity hardware & components
• Dynamically distribute compute resources
Status Quo: Conventional DC Network
Ref: “Data Center: Load balancing Data Center Services”, Cisco 2004
Figure: conventional DC network; the Internet feeds an L3 layer of core routers (CR) and access routers (AR), below which an L2 domain of Ethernet switches (S) connects racks of application servers (A).
Key:
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app servers
~1,000 servers per pod; each pod is one IP subnet
Conventional DC Network Problems
• Cost of network equipment is prohibitive
• Limited server-to-server capacity
Figure: the same core/access/switch hierarchy annotated with oversubscription ratios of roughly 5:1, 40:1, and 200:1 at successive layers from the racks toward the core.
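A hedged sketch of what those ratios mean for server-to-server capacity. The NIC speed and the mapping of each ratio to a named layer are assumptions; only the ratios themselves come from the figure.

# Worst-case per-server bandwidth when traffic must cross a given layer,
# given that layer's oversubscription ratio (1 Gbps NICs assumed).
nic_gbps = 1.0
for layer, ratio in [("edge", 5), ("aggregation", 40), ("core", 200)]:
    print(f"{layer:12s} ~{ratio}:1 -> ~{nic_gbps / ratio * 1000:.0f} Mbps per server")
# At 200:1, two servers in different pods may see only ~5 Mbps of the
# 1 Gbps their NICs could deliver: the limited server-to-server capacity above.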
And More Problems …
Figure: two IP subnets (VLANs #1 and #2) hang off different access routers, with ~200:1 oversubscription between them.
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
And More Problems …
Figure: the same two IP subnets (VLANs #1 and #2), again separated by ~200:1 oversubscription.
• Server IP address assignments are topological
• Moving an IP address out of its containing VLAN is hard: it requires complicated manual L2/L3 re-configuration
What We Need Is ...
1. L2 semantics
2. Uniform High capacity
3. Performance isolation
Achieve Uniform High Capacity: Clos Network Topology*
Figure: Clos topology; each ToR switch connects 20 servers and attaches to K aggregation (Aggr) switches with D ports each, which in turn connect to intermediate (Int) switches, supporting 20*(DK/4) servers in total.
• Large bisection bandwidth
• Multiple paths at modest cost
• Tolerates fabric failure
*Ref: Al-Fares et al., A Scalable, Commodity Data Center Network Architecture, SIGCOMM 2008
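A small calculator for the scaling formula on the slide. The parameter values (D=48, K=16) are illustrative assumptions; the formula 20*(DK/4) is the one shown above.

# Scale of the Clos fabric described above (illustrative parameters).
def clos_servers(D, K, servers_per_tor=20):
    tors = D * K // 4                 # ToR switches the fabric can support
    return tors * servers_per_tor

print(clos_servers(D=48, K=16))       # 48*16/4 = 192 ToRs -> 3,840 servers
# Each ToR has many equal-cost paths through the aggregation and
# intermediate layers, which is what provides the large bisection
# bandwidth and the tolerance to individual switch or link failures.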
Addressing and Routing: Name-Location Separation
• Servers use flat names
• Switches run link-state routing and maintain only switch-level topology
Figure: a directory service maps each flat name to its current ToR (x -> ToR2, y -> ToR3, z -> ToR4); the sender looks up the destination's ToR (lookup & response) and the packet is carried across the fabric addressed to that ToR, with the flat destination name inside.
*VL2: A Scalable and Flexible Data Center Network, Greenberg et al, SIGCOMM 2009
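A toy sketch (not VL2's actual implementation) of name-location separation: application-visible names stay flat, and only a directory maps them to the current ToR. The dictionary API and field names below are assumptions for illustration.

# Toy directory service: flat server names -> current ToR location.
directory = {"x": "ToR2", "y": "ToR3", "z": "ToR4"}

def send(src, dst, payload):
    tor = directory[dst]                        # lookup & response
    return {"outer_dst": tor, "inner_dst": dst, "payload": payload}

print(send("x", "z", b"hello"))                 # carried toward ToR4

# If z migrates to a different rack, only the directory entry changes;
# z's flat name, and every application talking to z, stay untouched.
directory["z"] = "ToR3"
print(send("x", "z", b"hello"))                 # now carried toward ToR3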
Addressing and Routing: Name-Location Separation (continued)
Figure: the same diagram one build later; the directory entry for z has been updated from ToR4 to ToR3, so when a server moves only its directory mapping changes while its flat name stays the same.
*VL2: A Scalable and Flexible Data Center Network, Greenberg et al, SIGCOMM 2009
VL2 Fabric: Objectives and Solutions
Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Clos-based network, Valiant LB flow routing
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP
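A hedged sketch of the Valiant load balancing idea in the middle row of the table: each flow is bounced off a randomly chosen intermediate switch, which spreads any traffic matrix roughly uniformly across the Clos fabric. The switch names and path shape are placeholders, not VL2's actual forwarding code.

import random

# Valiant load balancing over a Clos fabric: per-flow random intermediate.
intermediate_switches = [f"Int{i}" for i in range(10)]

def route(flow_id, src_tor, dst_tor):
    rng = random.Random(flow_id)                # keep one flow on one path
    via = rng.choice(intermediate_switches)     # random "bounce" point
    return [src_tor, "Aggr", via, "Aggr", dst_tor]

print(route(flow_id=42, src_tor="ToR1", dst_tor="ToR7"))
# Randomizing the intermediate hop per flow is what lets the fabric offer
# uniform (hose-model) capacity without per-application traffic engineering.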
Protocol Interactions
TCP Incast Collapse: Problem
Affects key datacenter applications with barrier-synchronization boundaries, e.g. DFS, web search, MapReduce
Source: Nagle et al., The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage, SC2004
New Cluster Based Storage System
Incast: Application Overfills Buffers
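A back-of-the-envelope view of why barrier-synchronized responses overfill a shallow ToR buffer. All sizes below are assumptions chosen for illustration.

# N storage servers answer one barrier-synchronized request at once.
senders = 40
response_kb = 256               # per-server block of the striped read
switch_buffer_kb = 4 * 1024     # shared packet buffer on a commodity ToR

in_flight_kb = senders * response_kb
print(f"arriving burst: {in_flight_kb} KB vs shared buffer: {switch_buffer_kb} KB")
# 40 * 256 KB = 10 MB slams into a ~4 MB buffer; the tail of the burst is
# dropped, the affected flows stall in retransmission timeout, and goodput
# collapses until the barrier finally completes.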
Solution: TCP with ms-RTO*
*Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication, Vasudevan et al., SIGCOMM 2009
• Little adverse effect on WAN traffic
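A minimal sketch of why lowering the RTO floor helps at datacenter RTTs. The formula roughly mirrors TCP's standard RTO computation (SRTT + 4*RTTVAR with a minimum); the RTT values are assumptions.

# TCP retransmission timeout, clamped to a configurable minimum.
def rto(srtt_s, rttvar_s, rto_min_s):
    return max(srtt_s + 4 * rttvar_s, rto_min_s)

srtt, rttvar = 200e-6, 50e-6      # ~200 us RTT inside the datacenter
print("default min RTO  :", rto(srtt, rttvar, rto_min_s=200e-3), "s")  # 0.2 s
print("fine-grained RTO :", rto(srtt, rttvar, rto_min_s=1e-3), "s")    # 0.001 s
# With a 200 ms floor, one incast drop idles the bottleneck for ~1000 RTTs;
# a ~1 ms floor lets the sender recover quickly and restores goodput, while
# WAN flows (whose RTTs already exceed the floor) are largely unaffected.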
Incast Collapse: an Unsolved Problem at Scale*
*Understanding TCP Incast Throughput Collapse in Datacenter Networks, Griffith et al., WREN 2009
The solution space is complex:
• Network conditions can impact RTT
• Switch buffer management strategies matter
• Goodput can be unstable with load and number of senders
Conclusions
Data Center Computing
• Opportunities to realize energy efficiency, particularly in I/O sub-systems
• Data center fabrics need to be re-architected for application scalability and cost
• WAN artifacts can create bottlenecks
NOCs in the Data Center
• Energy efficiency: local (distributed) energy management decisions & coordination by the NOC
• Fabric communication: the NOC can reduce intra-chip/socket communication latencies between VMs
• Congestion management: the NOC can assist in traffic orchestration across VMs
Thank you!