Network overlays - events.static.linuxfound.org - 04/07/12 - Vivek Kashyap ([email protected])
Vivek Kashyap ([email protected]) 04/07/12
Cloud/DataCenter
● Virtual Machines (VM) deployed on physical nodes in the DataCenter
─ Domain: rack, VLAN, site
● Network isolation a must for a viable multi-tenant solution
─ Each tenant (company, department) has security and privacy requirements
─ Private view of network: load-balancers, firewalls
● Collaborative applications on same virtual network
[Figure: a Cloud Controller with an interface sits above per-domain controllers (Domain1 Control ... DomainN Control); each domain (Domain 1 ... Domain n) contains Host 1, Host 2, Host 3. A virtualization layer over compute, storage, memory, and network hosts virtual servers, each built from an image containing an operating system and application, drawn from an image repository.]
Virtualization increases network complexity
● VM deployable anywhere in the data-center without subnet boundary, host, rack server, or VLAN constraints
─ VM IP and MAC addresses migrate with the VM
─ VM network security profile migrates with the VM
● Physical network must be dynamically configured
─ Network state must be coordinated between hypervisor and switch port
> Physical network modified as the VM migrates
[Figure: a virtual machine migrating across switches/routers and routers within the data center]
Layer-2 and massive scaling of data centers
● Broadcast/multicast is problematic in flat layer-2 networks
─ Every device in the network must process the packets
─ Broadcast storms may bring network down
● Address resolution
─ ARP: broadcast and periodic flushing of entries
─ IPv6 Neighbour Discovery: the solicited-node multicast address reduces load significantly
─ Routers connected to L2 broadcast domains must handle ARP traffic
● Virtual machine placement is constrained
─ Limiting broadcast domain to TOR limits placement of workloads to the rack
─ Extending layer-3 to aggregation layer may cause broadcast on all VLANs
> Layer-2 access switches enable VLAN on all ports to allow reachability
─ Layer-3 to core suffers maximum broadcast load
● Large number of Virtual Machines and the resultant broadcast/multicast frames require much larger table sizes in the switches
[Figure: server racks, each behind a ToR switch, connected through the aggregation layer to the core]
Multi-tenancy and large data center networks
● Each tenant expects features akin to a physical data center
─ Security and isolation, control, workload balancing and placement
● Network resources must be provisioned on-demand
─ New customer, additional resources
─ Workload/VM migration without being constrained by subnet boundaries
● Tenant virtual network spans multiple boundaries
─ For optimal utilization, the VMs from one tenant may be placed in the same rack or host
─ Tenant traffic must stay isolated
> VLANs and broadcast domain
> Firewalls
● Tenant virtual network must interact with non-virtualized resources where needed
─ Firewalls, intrusion detection, load-balancers, etc.
Overlay Networks
[Figure: a virtual network spanning Domain 1 and Domain 2 over switches/routers and routers, with a virtual machine migrating between the domains]
● Solution: Create overlay network over the physical network
─ Decouple physical and logical configuration
─ VMs are deployed dynamically without affecting switch/router table sizes
● Overlay network == Tunnel
─ Encapsulate the frame at first hop network device (vswitch, switch, router)
─ Send packet to the target decapsulating device
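The two tunnel steps above can be sketched as follows. This is a minimal illustration, not the format of any particular proposal: the OverlayPacket type, its field names, and the example addresses are hypothetical, and a real device would serialize an outer header rather than pass an object around.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OverlayPacket:
    """Hypothetical tunnel packet: outer addresses plus the untouched inner frame."""
    outer_src: str      # address of the encapsulating device
    outer_dst: str      # address of the decapsulating device
    vnid: int           # virtual network identifier carried in the overlay header
    inner_frame: bytes  # original frame from the VM, byte for byte


def encapsulate(frame: bytes, vnid: int, local_ip: str, remote_ip: str) -> OverlayPacket:
    """Wrap a VM frame at the first-hop device.

    The physical network forwards on outer_src/outer_dst only, so it never
    has to learn the VM's own MAC or IP address.
    """
    return OverlayPacket(local_ip, remote_ip, vnid, frame)


def decapsulate(packet: OverlayPacket, local_ip: str) -> Tuple[int, bytes]:
    """Unwrap at the target device and return (vnid, original frame)."""
    if packet.outer_dst != local_ip:
        raise ValueError("packet is not addressed to this device")
    return packet.vnid, packet.inner_frame


if __name__ == "__main__":
    pkt = encapsulate(b"vm-frame", vnid=42, local_ip="192.0.2.1", remote_ip="192.0.2.2")
    print(decapsulate(pkt, "192.0.2.2"))
```

Because switches and routers see only the outer addresses, their table sizes depend on the number of encapsulating devices rather than the number of VMs.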
Virtual networks
[Figure: Virtual Cloud 1, Virtual Cloud 2, and Virtual Cloud 3 spanning Data Center 1 and Data Center 2 over switches/routers and routers, with virtual machines migrating between the data centers]
● Each tenant's traffic is completely isolated from other tenants
─ Frames carry a Virtual Network Identifier (24 bits == 16 million networks)
> For comparison: there are only 4K VLANs
● Virtual network address space is isolated from other virtual networks
─ Same IP and MAC may be used in different virtual networks
─ VM can be placed anywhere irrespective of layer 2 constraints
● Packets do not exit a virtual network except through controlled 'gateways'
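The identifier arithmetic and the address-space isolation above can be checked with a short sketch. The table layout keyed by (VNID, MAC) is an assumption made for illustration; the MAC and IP values are example addresses only.

```python
VNID_BITS = 24     # virtual network identifier width quoted above
VLAN_ID_BITS = 12  # width of the IEEE 802.1Q VLAN ID field

vnid_space = 2 ** VNID_BITS     # number of distinct virtual network IDs
vlan_space = 2 ** VLAN_ID_BITS  # number of distinct VLAN IDs

# Hypothetical end-station table. Including the VNID in the key is what lets
# two tenants reuse the same MAC and IP without colliding.
stations = {}


def register(vnid: int, mac: str, ip: str) -> None:
    """Record an end station inside one virtual network."""
    stations[(vnid, mac)] = ip


register(1, "00:23:45:67:00:01", "10.0.3.1")  # tenant in virtual network 1
register(2, "00:23:45:67:00:01", "10.0.3.1")  # same MAC and IP, virtual network 2

if __name__ == "__main__":
    print(vnid_space, vlan_space, len(stations))
```

Both registrations survive as separate entries, which is the property that allows overlapping tenant address plans.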
Multi-tenancy with overlapping addresses
[Figure, shown in two renderings: two overlay networks (labelled "Coke Overlay Network" and "Pepsi Overlay Network" in one rendering, "Overlay Network 1" and "Overlay Network 2" in the other) span Site1 and Site2 across Server1 and Server2. The end stations (HTTP, APP, and Database servers in the first rendering; VM1 to VM5 in the second) use addresses in 10.0.3.x and 10.0.5.x with MACs of the form 00:23:45:67:00:xx, and the same IP and MAC addresses appear in both overlays.]
Overlay network structure
● The encapsulation and decapsulation of packets is done at the edge boundary of the overlay (Overlay Boundary Point)
─ The OBP may be a vswitch, access switch, or a network appliance
● OBP keeps a per-VN mapping of end-station to remote OBP address
● OBP maintains per-VN state for delivering multicast packets
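The per-VN state an OBP holds can be sketched as two small tables. The class and method names are hypothetical, and the multicast state is modelled simply as the set of remote OBPs known to participate in each VN; a real OBP might instead join an IP multicast group per VN.

```python
from collections import defaultdict
from typing import Dict, List, Set


class OverlayBoundaryPoint:
    """Hypothetical OBP holding per-VN forwarding state."""

    def __init__(self, address: str) -> None:
        self.address = address
        # vnid -> end-station MAC -> address of the remote OBP serving it
        self.remote_obp: Dict[int, Dict[str, str]] = defaultdict(dict)
        # vnid -> remote OBPs participating in that VN (multi-destination state)
        self.members: Dict[int, Set[str]] = defaultdict(set)

    def learn(self, vnid: int, mac: str, obp_address: str) -> None:
        """Record that `mac` in virtual network `vnid` sits behind `obp_address`."""
        self.remote_obp[vnid][mac] = obp_address
        self.members[vnid].add(obp_address)

    def next_hops(self, vnid: int, dst_mac: str) -> List[str]:
        """Return the OBP(s) that should receive a frame for `dst_mac`.

        A known unicast destination maps to one OBP. Unknown or broadcast
        destinations go to every OBP in the VN, and never outside it.
        """
        known = self.remote_obp[vnid].get(dst_mac)
        if known is not None:
            return [known]
        return sorted(self.members[vnid])


if __name__ == "__main__":
    obp = OverlayBoundaryPoint("192.0.2.1")
    obp.learn(7, "00:23:45:67:00:01", "192.0.2.2")
    obp.learn(7, "00:23:45:67:00:02", "192.0.2.3")
    print(obp.next_hops(7, "00:23:45:67:00:01"))
    print(obp.next_hops(7, "ff:ff:ff:ff:ff:ff"))
```

Keying every lookup by VNID first keeps one tenant's broadcast traffic from ever reaching an OBP that serves only other tenants.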
[Figure: two OBP placements, each showing VMs attached to a bridge and a NIC. Captions: "OBP extending Linux vswitch" and "Access switch with embedded OBP" (reached over a VLAN).]
Standardization required
● Define a header format
─ A VNID must be supported in each frame
─ Payload may be an Ethernet frame or an IP packet
● Fragmentation ?
─ Encapsulation may lead to exceeding the link MTU
> Fragmentation of packet at IP layer, done in overlay, or prevented
● Checksum & FCS
─ Need not duplicate checksum or FCS in both inner and outer headers
● Control plane
─ How to populate the forwarding table of a virtual network instance?
─ How to handle multi-destination frames within a virtual network instance?
─ How to associate an end-point with a virtual network instance?
─ How to de-associate an end-point from a virtual network instance?
IETF Proposals on overlay networks
● VxLAN: Virtual eXtensible Local Area Network
─ http://datatracker.ietf.org/doc/draft-mahalingam-dutt-dcops-vxlan
─ 24-bit VN ID
─ Encapsulates frames in UDP (MAC in UDP)
─ Utilizes IP multicast for address resolution and broadcast
● NVGRE: Network Virtualization using Generic Routing Encapsulation
─ https://datatracker.ietf.org/doc/draft-sridharan-virtualization-nvgre
─ 24-bit VN ID
─ Encapsulates frames in GRE
─ Utilizes IP multicast for address resolution and broadcast
● STT: Stateless Transport Tunneling Protocol for Network Virtualization
─ http://tools.ietf.org/html/draft-davie-stt-01
─ Utilizes TCP segmentation offload capabilities of NIC
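As one concrete illustration of a 24-bit VN ID in a header, the sketch below packs and parses a VxLAN-style header. It assumes the 8-byte layout later published as RFC 7348 (an 8-bit flags field with the 0x08 "VNI valid" bit, 24 reserved bits, the 24-bit VN ID, and 8 reserved bits); the draft cited above should be checked before relying on it. The function names are hypothetical, and NVGRE and STT use different headers that are not shown.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VN ID field carries a valid value
VXLAN_HEADER_LEN = 8


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header that precedes the inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VN ID must fit in 24 bits")
    # Word 0: flags byte followed by 24 reserved bits.
    # Word 1: 24-bit VN ID followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_vni(header: bytes) -> int:
    """Extract the VN ID, rejecting headers without the valid flag."""
    if len(header) < VXLAN_HEADER_LEN:
        raise ValueError("header too short")
    word0, word1 = struct.unpack("!II", header[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VN ID flag not set")
    return word1 >> 8


if __name__ == "__main__":
    hdr = pack_vxlan_header(5000)
    print(hdr.hex(), unpack_vxlan_vni(hdr))
```

In the MAC-in-UDP scheme this header sits inside the UDP payload, immediately before the encapsulated Ethernet frame.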
Address resolution
● VxLAN/NVGRE rely on IP multicast
─ Packets are flooded to all OBPs participating in the VN if target unknown
● Can we avoid flooding?
─ http://tools.ietf.org/html/draft-shah-armd-arp-reduction-01
─ Suggests ToR/switch caching mappings and responding
─ Multiple caches in network may provide similar function
─ Extend to support an 'ARP reduction module'
> Receives ARP requests from the bridge. If the target is not in its cache, it forwards the request to the ARM agent.
> ARM agent consults “Address database” to return target OBP mapping
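The cache-then-agent lookup described above can be sketched as follows. All names are hypothetical; the agent is modelled as a plain callable backed by a dictionary standing in for the "Address database", and returning the target MAC alongside the OBP address is an assumption about what the reply would carry.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]      # (vnid, target IP address)
Mapping = Tuple[str, str]  # (target MAC, address of the OBP serving it)


class ArpReductionModule:
    """Hypothetical ARM: answers ARP requests locally instead of flooding."""

    def __init__(self, ask_agent: Callable[[int, str], Optional[Mapping]]) -> None:
        self._ask_agent = ask_agent
        self._cache: Dict[Key, Mapping] = {}
        self.agent_queries = 0  # how often the agent had to be consulted

    def resolve(self, vnid: int, target_ip: str) -> Optional[Mapping]:
        """Return the mapping for an ARP target, or None if nobody knows it."""
        key = (vnid, target_ip)
        if key not in self._cache:
            self.agent_queries += 1
            answer = self._ask_agent(vnid, target_ip)
            if answer is None:
                return None  # the caller may then fall back to flooding
            self._cache[key] = answer
        return self._cache[key]


if __name__ == "__main__":
    # Stand-in for the address database consulted by the ARM agent.
    database = {(7, "10.0.3.1"): ("00:23:45:67:00:01", "192.0.2.2")}
    arm = ArpReductionModule(lambda vnid, ip: database.get((vnid, ip)))
    print(arm.resolve(7, "10.0.3.1"), arm.resolve(7, "10.0.3.1"), arm.agent_queries)
```

The second lookup for the same target is served from the cache, so neither the agent nor the multicast group is involved again. Cache expiry is left out of the sketch.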
[Figure: two hosts, each with VMs (VM1, VM2, VM3) attached to a bridge with an ARM, an OBP, and a NIC; the ARM agent runs as a KVM guest, container, or daemon]
Fragmentation?
● Overlay encapsulation may cause a packet to be fragmented
─ A loss of fragment implies loss of entire data-segment
● Proposal
─ vSwitch/OBP considers IP_DF bit to be set for each packet implicitly
─ If the packet is to be fragmented due to addition of overlay header
> Generate an ICMP error “Datagram too big” to the VM
» This is the case with IPv6 by default
─ VM will reduce its view of the MTU for that path as a result
> And no need to fragment in OBP
─ If IP packet is larger than MTU, the VM's IP stack will fragment
> Each fragment will be encapsulated as a separate overlay packet
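The proposal above amounts to a size check at the OBP. The sketch below assumes a VxLAN-style encapsulation over IPv4, where the overlay consumes 50 bytes of the link's IP MTU (20 outer IP + 8 UDP + 8 overlay header + 14 inner Ethernet header); the constant, the function names, and the returned action strings are all illustrative, and other encapsulations have different overheads.

```python
from typing import Optional, Tuple

# Assumed overhead: outer IPv4 (20) + UDP (8) + overlay header (8)
# + inner Ethernet header (14). Adjust for other encapsulations.
ENCAP_OVERHEAD = 50


def inner_mtu(link_mtu: int, overhead: int = ENCAP_OVERHEAD) -> int:
    """Largest inner IP packet that still fits after encapsulation."""
    return link_mtu - overhead


def handle_vm_packet(ip_length: int, link_mtu: int,
                     overhead: int = ENCAP_OVERHEAD) -> Tuple[str, Optional[int]]:
    """Decide what the OBP does with a packet from a VM.

    The OBP behaves as if the don't-fragment bit were set on every packet:
    it either encapsulates the packet whole or reports the usable MTU back
    to the VM, and it never fragments the packet itself.
    """
    limit = inner_mtu(link_mtu, overhead)
    if ip_length <= limit:
        return ("encapsulate", None)
    # Corresponds to the ICMP "Datagram too big" error; the VM lowers its path MTU.
    return ("icmp_too_big", limit)


if __name__ == "__main__":
    print(handle_vm_packet(1500, 1500))
    print(handle_vm_packet(1450, 1500))
```

After receiving the error, the VM sends packets of at most the reported size, so later packets pass through the OBP as single, unfragmented overlay packets.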
Additional references
● Network Virtualization Overlay Control Protocol Requirements
─ http://www.ietf.org/id/draft-kreeger-nvo3-overlay-cp-00.txt
● ARP Broadcast reduction for large data centers
─ http://tools.ietf.org/html/draft-shah-armd-arp-reduction-01
● Problem statement for ARMD
─ http://tools.ietf.org/html/draft-ietf-armd-problem-statement-02
● Problem Statement: Overlays for Network Virtualization
─ http://tools.ietf.org/html/draft-narten-nvo3-overlay-problem-statement-01
Legal Statement
This work represents the view of the author and does not necessarily represent the view of IBM.
IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.