networking is not free: lessons in network design
DESCRIPTION
An in-depth critique of the existing OpenStack networking approach, with a focus on how the Nova network controller is more of a hindrance than a help. Discusses the gaps in Quantum's functionality that must be closed, and alternative solutions. How can we make networking in OpenStack robust, high-performance, and fault-tolerant? What do typical large-scale networks look like and what lessons can we learn from them? Is there an approach to networking we can take that is the same with a handful of servers as it is with hundreds of racks?
TRANSCRIPT
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses
Dan Sneddon
Member Technical Staff
Twitter: @dxs
Download: http://engineering.cloudscaling.com/portland13
Networking is NOT Free: Lessons In Network Design
Presenter Bio
• 20 years of network engineering and systems design
• Lead Global Network Engineer for Apple
• Network Security Architect for SLAC National Laboratory
• IT Architect for division of Schneider Electric
• Financial sector networking (banks and trading floors)
• Major startups, including Twitter
Our Journey Today
1. Datacenter Networking: Historical Perspective
2. Rise and Fall Of The VLANs
3. Networking At Cloud Scale
4. OpenStack Networking Models
5. Room For Improvement In OpenStack Networking
Datacenter Networks: Historical Perspective
Datacenter Networking Timeline
1980's
• Client/Server
• 10 Mb Ethernet
• Token Ring
• Serial Cables
1990's
• 100 Mb
• Switched Ethernet
• Bonded Interfaces
• Spanning-Tree
2000's
• 1 Gb+ Servers
• 10 Gb Uplinks
• VLANs
• Virtual Machines
2010+
• 10 Gb+ Servers
• 40/100 Gb Uplinks
• Virtual Networks
• SDN
1980’s: Shared Media and Serial
[Diagram labels: Token Ring, 10M Hub (x2), Serial Link, User]
1990’s: 100 Megabits Switched!
[Diagram labels: User (x3), Database, Switch]
2000’s: Rise Of the Gigabit VLANs!
[Diagram labels: VLAN 10, VLAN 20, VLAN 30, Etc...; Database, Server VLAN, Administration, Accounting, Everyone Else]
2010’s: Everything Gets Simple!
[Diagram label: User]
Rise And Fall Of the VLANs
Datacenter VLAN Segregation
[Diagram labels: VLAN 10, VLAN 20, VLAN 30; Layer 2/3 Boundary]
VLAN Physical Separation
VLAN Pros and Cons
Pros:
• Provide a level of isolation
• Reduction in size of broadcast domain
• Manageable, up to a certain size (especially with VTP, etc.)
Cons:
• Each VLAN can only reach other VLANs through routers
• Spanning-tree (when it breaks, everything breaks)
• 4096 VLAN limit; assigning in blocks uses this up faster
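The 4096 figure in the cons list comes from the 12-bit VLAN ID field in the 802.1Q tag. The arithmetic can be sketched in a few lines of Python. The block size of 16 VLANs per tenant below is a made-up number for illustration and does not come from the talk.

```python
# 802.1Q carries a 12-bit VLAN ID, which is where the 4096 figure comes from.
VLAN_ID_BITS = 12
TOTAL_VLAN_IDS = 2 ** VLAN_ID_BITS          # 4096

# IDs 0 and 4095 are reserved by the standard, so slightly fewer are usable.
USABLE_VLAN_IDS = TOTAL_VLAN_IDS - 2        # 4094


def max_tenants(block_size: int) -> int:
    """Tenants that fit when each tenant is handed a fixed block of VLAN IDs."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return USABLE_VLAN_IDS // block_size


# ASSUMPTION: 16 VLANs per tenant is an illustrative block size only.
print(max_tenants(1))    # one VLAN per tenant
print(max_tenants(16))   # a block of 16 VLANs per tenant
```

Handing out IDs in blocks divides the usable space by the block size, which is why block assignment exhausts the range so much sooner.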
Death Of the VLANs
VLANs Only Scale So Far
• In the late 2000’s, high-density (1U) servers became standard
• There is no way to make spanned VLANs work for many thousands of servers
• A new model takes over: small layer 2 domains with layer 3 routing
Breaking Through The Scale Barrier
VLANs Only Scale So Far
VLAN Locally, Route Globally
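One way to picture "VLAN locally, route globally" is to give every rack its own small layer 2 domain with its own subnet, and to route between racks at layer 3. The following is a minimal sketch using Python's standard `ipaddress` module. The 10.0.0.0/16 supernet, the /24 per rack, the rack names, and the choice of the first host address as gateway are all assumptions for illustration.

```python
import ipaddress

# ASSUMPTION: the supernet and per-rack prefix length are hypothetical examples.
DATACENTER_SUPERNET = ipaddress.ip_network("10.0.0.0/16")
RACK_PREFIX_LEN = 24


def plan_rack_subnets(rack_names):
    """Give each rack its own subnet, i.e. its own small layer 2 domain."""
    available = 2 ** (RACK_PREFIX_LEN - DATACENTER_SUPERNET.prefixlen)
    if len(rack_names) > available:
        raise ValueError("more racks than subnets in the supernet")
    subnets = DATACENTER_SUPERNET.subnets(new_prefix=RACK_PREFIX_LEN)
    plan = {}
    for name, subnet in zip(rack_names, subnets):
        # ASSUMPTION: the first usable address is set aside for the rack's router.
        gateway = next(subnet.hosts())
        plan[name] = {"subnet": subnet, "gateway": gateway}
    return plan


def same_l2_domain(plan, ip_a, ip_b):
    """True if both addresses share a rack subnet; otherwise traffic is routed."""
    a, b = ipaddress.ip_address(ip_a), ipaddress.ip_address(ip_b)
    return any(a in e["subnet"] and b in e["subnet"] for e in plan.values())


plan = plan_rack_subnets(["rack-01", "rack-02", "rack-03"])
for name, entry in plan.items():
    print(name, entry["subnet"], "gateway", entry["gateway"])
```

Under this plan, adding a rack adds one subnet and one route rather than extending a VLAN across the datacenter.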
Hierarchical Internetworking Model
Core
Distribution
Access
Hosts
Scale-Out Networking
Networking At the Scale of Cloud
Two Cloud Infrastructure Models
1. Enterprise Virtualization: Legacy Apps
2. Elastic Infrastructure: New Dynamic Apps
Elastic Cloud vs. Enterprise Virtualization
                       Enterprise Virtualization                              Elastic Cloud
Applications           Traditional & Legacy                                   Dynamic
Scaling Architecture   Managed Silos                                          Horizontal
Technology Stack       Heavy & Proprietary                                    Distributed & Open
Price/Performance      Low                                                    High (4-7x better)
Failure Domains        Large                                                  Small
Provisioning           Slower & Manual                                        Faster & 100% API
Best For               Server consolidation and lower datacenter mgmt costs   On-demand, scale-out infrastructure for new apps
Nova-Network
Classic OpenStack Networking, With That Old-Timey Feel
4 Modes: Flat, Flat DHCP, VlanManager, FlatDHCP Multi-host HA
• Flat/Flat DHCP only support a single VLAN for everything
• VlanManager is the most feature-rich for multi-tenant
• VlanManager requires trunking all VLANs down to each host
• In a public cloud, max of 4096 VLANs limits tenants
OCS Nova-Networking L3 Plugin
Cloudscaling Exclusive Solution
• Layer 3 networking for VMs, with DHCP and NAT service
• Each VM is on its own Linux bridge, no shared layer 2
• Quantum not required
• DHCP service is local to each compute host
• AWS-like: floating IPs, elastic netblocks, and now VPC
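Floating IPs are commonly implemented as a one-to-one NAT mapping between a public address and a VM's private address. The Python sketch below models only that mapping. It is not Cloudscaling's plugin code, and the class name and addresses are hypothetical.

```python
import ipaddress


class FloatingIPTable:
    """Toy one-to-one NAT table: floating (public) address <-> private VM address."""

    def __init__(self):
        self._public_to_private = {}
        self._private_to_public = {}

    def associate(self, floating_ip, vm_ip):
        pub = ipaddress.ip_address(floating_ip)
        priv = ipaddress.ip_address(vm_ip)
        if pub in self._public_to_private:
            raise ValueError(f"{pub} is already associated")
        if priv in self._private_to_public:
            raise ValueError(f"{priv} already has a floating IP")
        self._public_to_private[pub] = priv
        self._private_to_public[priv] = pub

    def disassociate(self, floating_ip):
        pub = ipaddress.ip_address(floating_ip)
        priv = self._public_to_private.pop(pub)
        del self._private_to_public[priv]

    def translate_inbound(self, dst_ip):
        """Destination NAT: rewrite a floating address to the VM's address."""
        return self._public_to_private.get(ipaddress.ip_address(dst_ip))

    def translate_outbound(self, src_ip):
        """Source NAT: rewrite the VM's address to its floating address."""
        return self._private_to_public.get(ipaddress.ip_address(src_ip))


# ASSUMPTION: documentation and private example addresses, not real allocations.
table = FloatingIPTable()
table.associate("203.0.113.10", "10.0.1.5")
print(table.translate_inbound("203.0.113.10"))
print(table.translate_outbound("10.0.1.5"))
```

Because the mapping is a lookup rather than a property of the VM, a floating address can be moved to another VM by disassociating and associating again.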
Brokerless Messaging With ZeroMQ
Avoiding RabbitMQ’s Single Point Of Failure
[Diagram: RabbitMQ (Brokered), where Nova-Compute, Nova-Scheduler, and Nova-API all pass through the RabbitMQ broker, a single point of failure, vs. ZeroMQ (Peer To Peer), where Nova-Compute, Nova-Scheduler, and Nova-API have no broker between them]
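The difference between the two messaging layouts can be sketched as a small reachability model. This Python example models topology only: it does not use or imitate the RabbitMQ or ZeroMQ libraries, it assumes a single unclustered broker, and the node names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: three services and one unclustered broker, named for illustration.
SERVICES = ["nova-api", "nova-scheduler", "nova-compute"]


def brokered_links(services, broker="broker"):
    """Star topology: every service talks only to the broker."""
    return {frozenset((s, broker)) for s in services}


def peer_to_peer_links(services):
    """Full mesh: every service talks directly to every other service."""
    return {frozenset(pair) for pair in combinations(services, 2)}


def can_talk(links, a, b, failed):
    """Graph search from a to b over links, skipping failed nodes."""
    if a in failed or b in failed:
        return False
    seen, frontier = {a}, [a]
    while frontier:
        node = frontier.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in failed and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return b in seen


def surviving_pairs(links, services, failed):
    """Service pairs that can still exchange messages after the failures."""
    return [p for p in combinations(services, 2) if can_talk(links, *p, failed)]


# Broker down: how many service pairs can still communicate?
print(len(surviving_pairs(brokered_links(SERVICES), SERVICES, {"broker"})))
# One peer down in the mesh: the remaining peers are unaffected.
print(len(surviving_pairs(peer_to_peer_links(SERVICES), SERVICES, {"nova-scheduler"})))
```

In the star layout the broker's failure disconnects every pair, while in the mesh a failure removes only the pairs that include the failed peer.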
OpenStack Networking
APIs For All Your Networking Things
• “Quantum” is now known as “OpenStack Networking”
• Pluggable architecture, with APIs for all network functionality
• Basic L3 plugin (finally!), but designed for L3 on flat L2 network
• nova-network process still performs some very basic functions
• Some plugins are more complete/stable than others
OpenStack Networking
[Architecture diagram: OpenStack Network Service. Components: Quantum API Service, Quantum DB, Quantum Agent(s), Horizon, Nova (Quantum Plugin), Keystone, Ceilometer, compute node, Hypervisor, Virtual Network Plugin, Provider Network Plugin, DHCP Agent, SDN Solution, Physical Hardware. Connection labels: REST, REST over HTTP(S), RPC, SQL, Notifications, Varies]
OpenStack Networking Modes
• VLAN networks are supported using provider network plugins
• Layer 3 plugin
• GRE tunnel support using virtual network plugins
• May be used with Linux Namespaces to isolate tenants from one another within a hypervisor
• Many commercial vendor plugins
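Since these modes are all driven through the REST API, a network-creation request might look roughly like the following. This Python sketch only builds the request and never sends it. The endpoint host, token, network names, physical network name, and segment IDs are placeholders, and the `provider:*` attribute names follow the provider networks extension as I understand it, so check them against the API reference for the release in use.

```python
import json
import urllib.request

# ASSUMPTIONS: endpoint, token, and all names/IDs below are placeholders.
QUANTUM_ENDPOINT = "http://controller.example.com:9696"
AUTH_TOKEN = "replace-with-keystone-token"


def build_create_network_request(name, network_type,
                                 physical_network=None, segmentation_id=None):
    """Build (but do not send) a POST that asks for a new network."""
    network = {
        "name": name,
        "admin_state_up": True,
        "provider:network_type": network_type,
    }
    if physical_network is not None:
        network["provider:physical_network"] = physical_network
    if segmentation_id is not None:
        network["provider:segmentation_id"] = segmentation_id
    body = json.dumps({"network": network}).encode("utf-8")
    return urllib.request.Request(
        url=f"{QUANTUM_ENDPOINT}/v2.0/networks",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": AUTH_TOKEN},
    )


# A VLAN-backed network and a GRE-backed network differ only in the attributes.
vlan_request = build_create_network_request(
    "tenant-a-vlan", "vlan", physical_network="physnet1", segmentation_id=100)
gre_request = build_create_network_request(
    "tenant-a-gre", "gre", segmentation_id=1001)

print(vlan_request.get_method(), vlan_request.full_url)
print(json.loads(vlan_request.data))
```

Passing a request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live service.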
Quantum Compatibility
Lots Of Choices For Virtual Network/SDN Providers
• Open vSwitch. http://www.openvswitch.org/openstack/documentation
• Nicira NVP. quantum/plugins/nicira/nicira_nvp_plugin/README and http://www.nicira.com/support
• Midokura. http://www.midokura.com/midonet/openstack/
• BigSwitch. http://www.bigswitch.com/sites/default/files/sdn_resources/openstack_aag.pdf
• Cisco. quantum/plugins/cisco/README and http://wiki.openstack.org/cisco-quantum
• Linux Bridge. quantum/plugins/linuxbridge/README and http://wiki.openstack.org/Quantum-Linux-Bridge-Plugin
• Ryu. quantum/plugins/ryu/README and http://www.osrg.net/ryu/using_with_openstack.html
• NEC OpenFlow. http://wiki.openstack.org/Quantum-NEC-OpenFlow-Plugin
Room For Improvement
Default Layer 3 Design
VLANs
OpenStack Networking Won’t Magically Configure Routing
* Diagram taken from OpenStack Networking official documentation
Gaps In Functionality
• VLAN networks are still problematic; Quantum doesn’t fix that
• Layer 3 network plugin still gets deployed on shared layer 2
• Dynamic routing protocols are not supported by L3 plugin
• Overlay networks are great unless something goes wrong: GRE tunnels are hard to troubleshoot, and we need tooling and diagnostics
• Load-balancer-, firewall-, and VPN-as-a-service are still in the design phase and may not be production-ready until the I or J release
How Can We Make Things Better?
• Further work needed on the “metaplugin” that allows more than one plugin simultaneously
• ZeroMQ support (there are known problems with DHCP, etc.)
• Better high-availability, including active-active DHCP
• Better support for custom tenant networks with overlapping IPs
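The overlapping-IP problem in the last bullet is easy to demonstrate with Python's standard `ipaddress` module. The tenant names and CIDRs below are hypothetical. The sketch shows that once tenants pick their own ranges, an address alone no longer identifies a VM, so any lookup has to be scoped by tenant.

```python
import ipaddress

# ASSUMPTION: tenant names and address ranges are made up for illustration.
tenant_networks = [
    ("tenant-a", ipaddress.ip_network("192.168.0.0/24")),
    ("tenant-b", ipaddress.ip_network("192.168.0.0/24")),
    ("tenant-c", ipaddress.ip_network("192.168.0.128/25")),
    ("tenant-d", ipaddress.ip_network("10.1.0.0/16")),
]


def find_overlaps(networks):
    """Return pairs of tenants whose address ranges overlap."""
    overlaps = []
    for i, (tenant_x, net_x) in enumerate(networks):
        for tenant_y, net_y in networks[i + 1:]:
            if net_x.overlaps(net_y):
                overlaps.append((tenant_x, tenant_y))
    return overlaps


def lookup_key(tenant, ip):
    """Scope an address by tenant so identical IPs stay distinguishable."""
    return (tenant, ipaddress.ip_address(ip))


print(find_overlaps(tenant_networks))
print(lookup_key("tenant-a", "192.168.0.5") == lookup_key("tenant-b", "192.168.0.5"))
```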
There Are Plenty Of Ways To Contribute
Questions