![Page 1: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/1.jpg)
Data Centre Networks
Gaurav Gautam, Preetam Patil, Parimal Parag
Centre for Networked Intelligence
![Page 2: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/2.jpg)
Standing on the shoulders of giants
![Page 3: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/3.jpg)
Data Centre Networks
• A physical facility of networked compute and storage systems to enable the delivery of shared applications and data
• Components:
  • Network: routers, switches, firewalls
  • Storage systems
  • Compute system: servers, application-delivery controllers
• Scale:
  • Moderate-scale enterprise DCN
  • Hyperscale public cloud (as of 2017, Microsoft had 1M servers in 100 data centers)
![Page 4: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/4.jpg)
Public Cloud Revenue Growth
Figure: public cloud revenue by service segment, 2019 to 2022 (millions of US dollars; vertical axis from 0 to 400,000). Values for 2019, 2020, 2021, 2022:
• Cloud Business Process Services (BPaaS): 45,212; 43,438; 46,287; 49,509
• Cloud Application Infrastructure Services (PaaS): 37,512; 43,498; 57,337; 72,022
• Cloud Application Services (SaaS): 102,064; 104,672; 120,990; 140,629
• Cloud Management and Security Services: 12,836; 14,663; 16,089; 18,387
• Cloud System Infrastructure Services (IaaS): 44,457; 50,393; 64,294; 80,980
• Desktop as a Service (DaaS): 616; 1,203; 1,951; 2,535
Source: Gartner, 23 July 2020
![Page 5: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/5.jpg)
![Page 6: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/6.jpg)
DCN Design
• Architecture design
  • Spine vs spineless? Multicasting? Distributed vs centralized control?
  • Modularity?
  • Programmability and flexibility vs cost?
• Data placement
  • Availability through redundancy
  • Congestion hotspot?
• Energy management
  • Cross-system optimization?
• Telemetry
  • Sampling in time, network elements, features
• Traffic control
![Page 7: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/7.jpg)
Architecture
![Page 8: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/8.jpg)
Architecture Design
• Goals:
  • Scalability
  • Availability (redundancy in data and paths) and flexibility for fault tolerance (rerouting flows without downtime)
  • Adaptability to load fluctuations (congestion control) and predictability (service guarantees)
  • Observability (DCN telemetry)
• Proposed architectures
  • Leaf-spine topology (better for warehouse-scale DCN)
  • Spineless topology (better for moderate-scale DCN)
![Page 9: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/9.jpg)
Leaf-Spine and Spineless Topologies
Figure panels: 4-ary Fat-Tree; Level-1 DCell with n=4
Source: "A Survey on Data Center Networking (DCN): Infrastructure and Operations", IEEE Communications Surveys & Tutorials, vol. 19, no. 1, First Quarter 2017
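To give a feel for the two example topologies named on this slide, here is a minimal sizing sketch. It assumes the standard k-ary fat-tree counts (k pods, (k/2)^2 core switches, k^3/4 hosts) and the standard DCell recursion t_k = t_{k-1} (t_{k-1} + 1) with t_0 = n. The function names are illustrative and not from the slides.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a k-ary fat-tree (k must be even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k^3/4 hosts
    }


def dcell_servers(n: int, level: int) -> int:
    """Servers in a level-`level` DCell built from n-server DCell_0 units."""
    if n <= 0 or level < 0:
        raise ValueError("n must be positive and level non-negative")
    servers = n  # a DCell_0 is n servers on one mini-switch
    for _ in range(level):
        # a DCell_k is made of (t_{k-1} + 1) copies of DCell_{k-1}
        servers = servers * (servers + 1)
    return servers


if __name__ == "__main__":
    print(fat_tree_size(4))     # the 4-ary fat-tree from the figure
    print(dcell_servers(4, 1))  # the level-1 DCell with n=4 from the figure
```

Under these formulas the 4-ary fat-tree connects 16 hosts through 20 switches, while the level-1 DCell with n=4 connects 20 servers using only 5 mini-switches plus direct server-to-server links.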
![Page 10: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/10.jpg)
Data Placement
![Page 11: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/11.jpg)
Data placement
• Goals
  • High data availability
  • High fault tolerance
  • Low storage overhead
  • Low repair and regeneration bandwidth
  • Low repair locality
  • Low access latency
• Challenges
  • Server failures
  • System overload
  • Streaming data
  • Hotspot generation
  • Large data migration
Source: "A Survey on Data Center Networking (DCN): Infrastructure and Operations", IEEE Communications Surveys & Tutorials, vol. 19, no. 1, First Quarter 2017
![Page 12: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/12.jpg)
Availability through redundancy
Figure: two ways to store fragments A and B on four nodes: replication (A, A, B, B) and coding (A, B, A+B, A+2B)
• Redundant storage of data
• Availability under finite failures
• Coding techniques for efficient storage
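The coded layout in the figure (A, B, A+B, A+2B) can be read as a small erasure code in which any two of the four nodes are enough to rebuild both fragments. The sketch below works through that under an assumption of arithmetic modulo a prime; the modulus `P` and the function names are chosen for illustration, and the slides do not specify a field.

```python
from itertools import combinations

P = 257  # prime modulus (an assumption; any prime larger than the symbols works)

# Node i stores c0*A + c1*B (mod P), matching A, B, A+B, A+2B.
NODES = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a: int, b: int) -> list:
    """Return the four stored symbols for data symbols a and b."""
    return [(c0 * a + c1 * b) % P for c0, c1 in NODES]


def decode(i: int, yi: int, j: int, yj: int) -> tuple:
    """Recover (a, b) from the symbols held by two distinct nodes i and j."""
    (a1, b1), (a2, b2) = NODES[i], NODES[j]
    det = (a1 * b2 - a2 * b1) % P
    inv = pow(det, -1, P)  # modular inverse, Python 3.8+
    # Cramer's rule on the 2x2 system over the integers mod P
    a = ((yi * b2 - yj * b1) * inv) % P
    b = ((a1 * yj - a2 * yi) * inv) % P
    return a, b


if __name__ == "__main__":
    stored = encode(100, 200)
    for i, j in combinations(range(4), 2):
        print((i, j), decode(i, stored[i], j, stored[j]))
```

By contrast, the replicated layout (A, A, B, B) loses data when both copies of the same fragment fail, even though it uses the same four nodes.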
![Page 13: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/13.jpg)
Fork-Join Scheduling
• Lower latency due to parallel access
• Higher latency due to redundant access
• Optimal redundancy selection
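The "parallel access" side of this trade-off can be illustrated with a simplified model: a request is forked to n servers and completes when the fastest k respond. Assuming independent exponential service times with a common rate and no queueing (both are assumptions of this sketch, not claims from the slides), the mean latency is the expected k-th order statistic, (1/rate) * sum of 1/i for i from n-k+1 to n.

```python
from fractions import Fraction


def expected_fork_join_latency(n: int, k: int, rate=1) -> Fraction:
    """Mean time until k of n parallel i.i.d. exponential tasks finish."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    harmonic_tail = sum(Fraction(1, i) for i in range(n - k + 1, n + 1))
    return harmonic_tail / Fraction(rate)


if __name__ == "__main__":
    # Waiting for 2 responses: more forked copies means lower mean latency.
    for n in (2, 3, 4):
        print(n, expected_fork_join_latency(n, 2))
```

This model leaves out the "redundant access" cost: every extra copy adds load to the servers, so in a queueing system the latency gain can shrink or reverse, which is why the redundancy level has to be chosen carefully.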
![Page 14: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/14.jpg)
Telemetry
![Page 15: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/15.jpg)
Telemetry
• Goal
  • Network state estimation with small overhead
• Measurements
  • Sampling in time
  • Sampling in network elements
  • Sampling in features
  • Raw observation vs statistics
• Typically important features
  • Latency statistics (per-flow, class, tenant for public clouds)
  • Throughput and buffer utilization
  • Energy usage vs load
• Control
  • Telemetry for control decisions such as admission/path selection
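As a toy illustration of sampling in time, the sketch below keeps every n-th latency observation and compares the resulting estimate with the full-data mean, along with the fraction of measurements collected. The function name and the synthetic data are hypothetical; real telemetry systems use more elaborate sampling policies.

```python
import statistics


def sampled_latency_report(latencies: list, every_nth: int) -> dict:
    """Estimate mean latency from every n-th observation."""
    if every_nth < 1 or not latencies:
        raise ValueError("need a non-empty list and every_nth >= 1")
    sampled = latencies[::every_nth]
    return {
        "estimate": statistics.fmean(sampled),
        "true_mean": statistics.fmean(latencies),
        "overhead": len(sampled) / len(latencies),  # fraction of data exported
    }


if __name__ == "__main__":
    # Synthetic per-packet latencies in microseconds (illustrative only).
    observations = [100 + (i % 7) * 5 for i in range(1000)]
    for n in (1, 10, 100):
        print(n, sampled_latency_report(observations, n))
```

Increasing `every_nth` lowers the overhead but typically makes the estimate noisier, which is the state-estimation versus overhead trade-off named in the goal.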
![Page 16: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/16.jpg)
Telemetry
Source: "Flexible sampling-based in-band network telemetry in programmable dataplane" ICT Express Volume 6, Issue 1, March 2020, Pages 62-65
In-band network telemetry
![Page 17: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/17.jpg)
Energy Management
Source: Bloomberg
![Page 18: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/18.jpg)
Energy Management
• Goal
  • Reduce the energy expenditure and carbon footprint of servers and cooling systems
• Challenges
  • Design of energy-efficient storage devices
  • Design of cooling systems
• Observations
  • Hyperscale DCNs are more energy efficient
Figure: energy use by component (percentage)
• Cooling systems: 43
• Servers: 43
• Storage drives: 11
• Network: 3
Source: Lawrence Berkeley National Lab 2016
![Page 19: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/19.jpg)
Energy vs performance tradeoff
• Cores with several speed levels
• Faster speed means higher energy consumption
• Control of server speed for energy management
  • Based on network telemetry
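One simple way to picture this control loop: given a measured load from telemetry, pick the slowest speed level that still keeps utilization under a target. The sketch below assumes a cubic dynamic-power model, which is a common textbook approximation; the speed levels, coefficients, and function names are hypothetical.

```python
SPEED_LEVELS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical core speed levels


def power_watts(speed_ghz: float, static: float = 10.0, coeff: float = 2.0) -> float:
    """Assumed power model: static power plus a term cubic in speed."""
    return static + coeff * speed_ghz ** 3


def pick_speed(load_rps: float, capacity_rps_per_ghz: float,
               target_util: float = 0.8) -> float:
    """Slowest level whose capacity keeps utilization at or below the target."""
    for speed in sorted(SPEED_LEVELS_GHZ):
        if load_rps <= target_util * capacity_rps_per_ghz * speed:
            return speed
    return max(SPEED_LEVELS_GHZ)  # overloaded: run at full speed


if __name__ == "__main__":
    for load in (50, 100, 200, 1000):
        speed = pick_speed(load, capacity_rps_per_ghz=100)
        print(load, speed, power_watts(speed))
```

Because power grows faster than speed in this model, running just fast enough for the observed load saves energy at the cost of less headroom for bursts.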
![Page 20: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/20.jpg)
Traffic Control
![Page 21: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/21.jpg)
DCN Traffic Control
• Goal: Optimally utilize the available bandwidth and adapt to the network dynamics
• Control decisions
  • Admission control and priority management
  • Path selection (SDN) vs routing (distributed)
  • End-to-end congestion control
• Observations
  • Complexity and overhead trade off against performance
![Page 22: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/22.jpg)
Priority management
• Goal: Manage service guarantees for flows
• Flow classification
  • SLAs (tenants)
  • QoS guarantees (application requirements)
  • Elephant vs mice (best effort)
• Typical mechanisms
  • Allocate more resources by giving more scheduling opportunities
  • Priority queueing through P4
  • Priority flows go to priority buffer
  • Differential priority flows using ToS/DSCP markings in packets
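The marking-plus-priority-buffer idea can be sketched in software as a strict-priority scheduler keyed on DSCP. DSCP 46 (Expedited Forwarding) and 0 (best effort) are standard code points, and the ToS byte carries the DSCP in its upper six bits. The mapping from DSCP to priority class and the class names are assumptions made for this example; a P4 switch would implement the equivalent in its queueing stage.

```python
import heapq
from itertools import count

DSCP_EF = 46           # Expedited Forwarding
DSCP_BEST_EFFORT = 0   # default class

# Hypothetical policy: lower number is served first.
PRIORITY_OF_DSCP = {DSCP_EF: 0, DSCP_BEST_EFFORT: 1}


def dscp_to_tos(dscp: int) -> int:
    """ToS/Traffic Class byte with the DSCP in the upper six bits."""
    return dscp << 2


class StrictPriorityScheduler:
    """Always serves the highest-priority packet; FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker preserving arrival order

    def enqueue(self, packet, dscp: int) -> None:
        priority = PRIORITY_OF_DSCP.get(dscp, PRIORITY_OF_DSCP[DSCP_BEST_EFFORT])
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    sched = StrictPriorityScheduler()
    sched.enqueue("bulk-1", DSCP_BEST_EFFORT)
    sched.enqueue("voice-1", DSCP_EF)
    sched.enqueue("bulk-2", DSCP_BEST_EFFORT)
    while (pkt := sched.dequeue()) is not None:
        print(pkt)
```

Strict priority is the simplest policy and can starve best-effort traffic; weighted schedulers give lower classes a guaranteed share instead.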
![Page 23: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/23.jpg)
Path Selection vs Routing
• Goal: Find an optimal path/route
• Topology discovery
  • Spanning tree algorithms
  • Multicast trees
• Centralized
  • Path selection in SDN
  • Challenges: How much network information is needed?
• Decentralized
  • Routing algorithms (BGP, OSPF, IS-IS)
  • Challenges: Sub-optimal since only local view
  • Multipath routing: SPB, TRILL (single TCP flow on a single path only)
  • Splitting a TCP flow: FLARE, MPTCP
  • Challenges: which flows to split, when to split (imbalance threshold), packet reordering
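The remark that multipath routing keeps a single TCP flow on a single path follows from how flows are usually mapped to paths: a hash of the flow's 5-tuple selects one of the equal-cost paths, so all packets of the flow take the same route and arrive in order. The sketch below shows that idea in a generic form; the hash choice, path names, and function name are illustrative and not taken from SPB or TRILL.

```python
import hashlib


def pick_path(flow: tuple, paths: list):
    """Deterministically map a 5-tuple to one of the candidate paths."""
    if not paths:
        raise ValueError("no candidate paths")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]


if __name__ == "__main__":
    spine_paths = ["via-spine-1", "via-spine-2", "via-spine-3", "via-spine-4"]
    flows = [
        ("10.0.0.1", "10.0.1.1", 6, 40000 + i, 443)  # src, dst, proto, sport, dport
        for i in range(8)
    ]
    for flow in flows:
        print(flow, pick_path(flow, spine_paths))
```

Per-flow hashing avoids reordering but lets a few large flows collide on one path, which is the imbalance that flow-splitting schemes try to correct.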
![Page 24: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/24.jpg)
End-point congestion control
• Goal: Adapt to the current network conditions to meet service guarantees
• Challenges
  • Congestion hotspots
  • TCP incast problem
  • Unfairness
  • Respond to higher flow completion times, throughput collapse, or a low Jain fairness index
• Rate control
  • TCP flow control
    • DCTCP (utilises ECN of hardware switches)
    • ICTCP (incast TCP)
  • Network rate control at end-points
    • Linux: DPDK/XDP, tc; Windows: NDIS
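Two quantities from this slide are easy to make concrete. Jain's fairness index for rates x_1..x_n is (sum x_i)^2 / (n * sum x_i^2), which equals 1 for equal shares and 1/n when one flow takes everything. DCTCP keeps a running estimate alpha of the fraction of ECN-marked packets and scales the window by (1 - alpha/2) when marks are seen. The sketch below is a simplified per-window update, assuming the commonly cited gain g = 1/16; the function names are illustrative.

```python
def jain_index(rates: list) -> float:
    """Jain's fairness index of a list of non-negative rates."""
    squares = sum(r * r for r in rates)
    if not rates or squares == 0:
        raise ValueError("need at least one non-zero rate")
    total = sum(rates)
    return (total * total) / (len(rates) * squares)


def dctcp_update(cwnd: float, alpha: float, marked_fraction: float,
                 g: float = 1 / 16) -> tuple:
    """One per-window DCTCP step: update alpha, then shrink cwnd if marked."""
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        # Reduce in proportion to the extent of congestion, not by half.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return cwnd, alpha


if __name__ == "__main__":
    print(jain_index([10, 10, 10, 10]))
    print(jain_index([40, 0, 0, 0]))
    cwnd, alpha = 100.0, 0.0
    for fraction in (0.0, 0.1, 0.5, 1.0):
        cwnd, alpha = dctcp_update(cwnd, alpha, fraction)
        print(fraction, round(cwnd, 2), round(alpha, 4))
```

With light marking alpha stays small and the window barely shrinks, while persistent marking drives alpha toward 1 and the reduction toward the halving of standard TCP.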
![Page 25: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/25.jpg)
DCN research platform implementation
• Vendor solutions:
  • DC switches (with support for segment routing, P4, MPLS)
  • Integrated analytics framework (e.g., Tetration), or custom scripts?
• Simulators: ns-3, DCNs-2, DCNSim, mtCloudSim, MATLAB
• Open-source platforms:
  • ONF Trellis switching fabric with programmable switches/switching ASICs
  • Emulation: NG-SDN stack by ONF (ONOS, Stratum, bmv2)
  • Combined H/W+S/W: NG-SDN stack along with switching ASICs and NetFPGA card (network processor), XDP-supported NICs
• VM/container/microVM platforms (OpenStack/OpenShift/OpenNebula/Kubernetes) for workload generation and NFV
![Page 26: Data Centre Networks](https://reader035.vdocuments.us/reader035/viewer/2022081503/62a2a1aee219d561b86b7433/html5/thumbnails/26.jpg)
Thanks!