2014 OpenStack Summit - Neutron OVS to LinuxBridge migration
DESCRIPTION
Presentation titled 'Migrating production workloads from OVS to LinuxBridge', presented at the Fall 2014 OpenStack Summit in Paris. The slide deck introduced the possibility of migrating live workloads from Open vSwitch to LinuxBridge with minimal downtime.
TRANSCRIPT
Migrating production workloads from OVS to LinuxBridge
Kevin Stevens
James Denton
We’re operators!
Kevin Stevens
• RPC Engineer since 2012 (Essex)
• IRC: k_stev on irc.freenode.net
James Denton
• RPC Network Engineer since 2012 (Essex)
• IRC: busterswt on irc.freenode.net
What are we doing here?
1. A history of networking in Rackspace Private Cloud
2. Our experiences with Open vSwitch
3. Swapping out OVS with LinuxBridge
4. What to expect with each
2011: Started building OpenStack-powered private clouds
2012: Began architecting, building and supporting private clouds in customer DCs
2013: Over 100 customers running Rackspace Private Clouds
2014: Released RPC v9 based on Icehouse. 99.99% API uptime SLA.
The Evolution of Networking in RPC
                    RPC v2.0/3.0      RPC v4.0/4.1      RPC v4.2              RPC v9.0
OpenStack release   Folsom            Grizzly           Havana                Icehouse
Network stack       nova-network      Quantum           Neutron               Neutron
L2 connectivity     flatDHCP          Open vSwitch      Open vSwitch          LinuxBridge (ML2)
L3 agent support    N/A               No                Yes                   Yes
Host OS             Ubuntu 12.04 LTS  Ubuntu 12.04 LTS  CentOS 6.5,           Ubuntu 14.04 LTS
                                                        RHEL 6.5,
                                                        Ubuntu 12.04 LTS
Why Neutron?
Why Neutron w/ Open vSwitch?
• Open vSwitch pushed by community
• Open vSwitch pushed by packagers
• Wanted overlay networks
The problems
• Kernel panics (1.10)
• ovs-vswitchd segfaults (1.11)
• Broadcast storms
• Data corruption (2.0.1)
Why Linux Bridge?
Why move to LinuxBridge?
• Looking for reliability and stability
• Fewer moving parts
• Easier to troubleshoot
• Supported by the community
What do we lose by moving?
• Flexibility provided by overlay networking (if not using VXLAN)
• Neutron Distributed Virtual Routers (Juno)
• Any customization OVS offers that Neutron itself does not implement
www.rackspace.com 12
Planning
Plan A: Scorch the earth!
• Snapshot and delete all instances
• Delete all networks
• Change from OVS -> LB
• Recreate all networks
• Boot instances
• …
• It works, but…
But wait… these are production environments!
Plan B: Migration Environment
• Deploy a new LinuxBridge environment
• Snapshot all instances
• Import images into the new environment
• Build new instances
• Cutover
• …
• It works, but… $$$
Plan C: Switch it out!
• Stop services
• Update the database
• Change the configuration from OVS -> LB
• Restart services
• …
• Profit!
Issues with migrating
• Neutron OVS DB schema != Neutron LinuxBridge DB schema
– Migration to the ML2 DB schema is required first
• Overlay networks may not be supported
– LinuxBridge uses VXLAN rather than GRE
– Requires kernel >= 3.9
• This means GRE networks must be converted to VLAN networks
– Didn’t want to introduce additional complexity
– VLANs are easier to troubleshoot if something goes wrong
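To see which hosts could use LinuxBridge VXLAN at all, the kernel requirement above can be checked per host. A minimal Python sketch, assuming kernel release strings in the usual `major.minor.patch-extra` form; the 3.9 threshold is the one on the slide, and the function names are hypothetical.

```python
import re

# Minimum kernel version quoted on the slide for LinuxBridge VXLAN support.
MIN_VXLAN_KERNEL = (3, 9)


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_lb_vxlan(release: str) -> bool:
    """True if the kernel meets the slide's >= 3.9 requirement."""
    # Tuple comparison ranks (3, 13) above (3, 9); a plain string compare would not.
    return kernel_version(release) >= MIN_VXLAN_KERNEL
```

On a host, `supports_lb_vxlan(platform.release())` gives the answer. Ubuntu 12.04's stock 3.2 kernel fails the check, which fits the decision to convert GRE networks to VLAN rather than VXLAN.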
The Process
Preparation
• Determine what’s needed:
– Dependencies
– Some method of converting the database to the ML2 schema
– Some method of converting data from OVS to LB
– Which configuration files need mangling
– Which services need disabling
– Which services need restarting
– Roll-back plan
Define a successful outcome
• Can instances gain a DHCP lease?
• Do instances have internal/external connectivity?
• Are security groups/other functions still operational?
• Were instances placed into the correct bridge?
• Will the changes survive a reboot?
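The bridge-placement question can be answered from `brctl show` output. The sketch below is a hypothetical helper, not part of Neutron: it parses that output, assuming the standard four-column `brctl show` layout, and flags taps that sit in a bridge without the LinuxBridge agent's `brq` prefix (for example an OVS hybrid `qbr` bridge).

```python
"""Check that instance taps ended up in LinuxBridge-agent (brq*) bridges.

Sketch only: parses the text printed by `brctl show`.
"""


def parse_brctl_show(output: str) -> dict[str, list[str]]:
    """Map each bridge name to the interfaces listed for it by `brctl show`."""
    bridges: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines()[1:]:  # skip the column-header line
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():
            # New bridge row: name, bridge id, STP flag, optional first interface.
            current = fields[0]
            bridges[current] = fields[3:4]
        elif current is not None:
            # Continuation row: one more interface for the current bridge.
            bridges[current].append(fields[0])
    return bridges


def misplaced_taps(bridges: dict[str, list[str]]) -> list[str]:
    """Return 'bridge:tap' entries for taps that are not in a brq* bridge."""
    return [
        f"{bridge}:{iface}"
        for bridge, ifaces in bridges.items()
        for iface in ifaces
        if iface.startswith("tap") and not bridge.startswith("brq")
    ]
```

The text can come from `subprocess.run(["brctl", "show"], capture_output=True, text=True).stdout`. A tap that is in no bridge at all does not appear in this output, so it needs a separate check.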
Normal OVS Operation (Network Node)
Normal OVS Operation (Compute Node)
First steps: Database manipulation
• Backup! Backup! Backup!
• Use migrate_to_ml2.py (modified) to change the DB schema
• Update segments, ports and VLAN tables
– Change GRE to VLAN
– Change segmentation ID to real VLAN ID
– Set a provider bridge
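The segment, port and VLAN updates can run as one transaction so a partial change never lands. This is a sketch against simplified SQLite stand-ins for the ML2 tables, not the modified migrate_to_ml2.py. The column sets, the GRE-to-VLAN mapping and the `physnet1` label are assumptions, and a MySQL version would need `REPLACE INTO` or `ON DUPLICATE KEY UPDATE` instead of SQLite's `INSERT OR REPLACE`.

```python
import sqlite3

# Simplified stand-ins for the ML2 tables; the real Neutron tables have more
# columns, so treat these column sets as assumptions for illustration.
SCHEMA = """
CREATE TABLE ml2_network_segments (
    id TEXT PRIMARY KEY, network_id TEXT, network_type TEXT,
    physical_network TEXT, segmentation_id INTEGER);
CREATE TABLE ml2_port_bindings (port_id TEXT PRIMARY KEY, host TEXT, vif_type TEXT);
CREATE TABLE ml2_vlan_allocations (
    physical_network TEXT, vlan_id INTEGER, allocated INTEGER,
    PRIMARY KEY (physical_network, vlan_id));
"""


def convert_gre_to_vlan(conn: sqlite3.Connection, gre_to_vlan: dict[int, int],
                        physnet: str) -> None:
    """Rewrite GRE segments as VLAN segments in a single transaction.

    gre_to_vlan maps each GRE tunnel ID to the real VLAN ID trunked to the
    hosts; physnet is the provider network label (both hypothetical).
    """
    with conn:  # commits on success, rolls back if anything raises
        for gre_id, vlan_id in gre_to_vlan.items():
            # GRE -> VLAN, tunnel ID -> VLAN ID, and attach a provider network.
            cur = conn.execute(
                "UPDATE ml2_network_segments SET network_type = 'vlan', "
                "physical_network = ?, segmentation_id = ? "
                "WHERE network_type = 'gre' AND segmentation_id = ?",
                (physnet, vlan_id, gre_id))
            if cur.rowcount == 0:
                raise ValueError(f"no GRE segment with ID {gre_id}")
            # Mark the VLAN as allocated so Neutron will not hand it out again.
            conn.execute(
                "INSERT OR REPLACE INTO ml2_vlan_allocations "
                "(physical_network, vlan_id, allocated) VALUES (?, ?, 1)",
                (physnet, vlan_id))
        # Ports bound via the OVS agent carry vif_type 'ovs'; LinuxBridge uses 'bridge'.
        conn.execute("UPDATE ml2_port_bindings SET vif_type = 'bridge' "
                     "WHERE vif_type = 'ovs'")
```

Rolling everything back on the first unknown GRE ID keeps the database in either the old or the new state, which makes the roll-back plan simpler.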
Next steps: Install and Configure
• Install the LinuxBridge plugin
• Update SQL connection strings
• Configure ml2_conf.ini / linuxbridge_conf.ini
• Change driver from OVS to ML2 in Neutron and Nova conf files
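A minimal ML2 configuration for VLAN tenant networks with the LinuxBridge mechanism driver might look like the output of the sketch below. The section and option names are standard ML2 and LinuxBridge agent options; the physical network label, VLAN range and NIC are hypothetical placeholders that must match the values written to the database.

```python
import configparser
import io

# Hypothetical site values: these must match the physical_network and VLAN IDs
# written into the database during the conversion step.
PHYSNET = "physnet1"
VLAN_RANGE = "100:199"
DATA_NIC = "eth1"


def render_ml2_conf() -> str:
    """Render an ML2 config using VLAN tenant networks and the LinuxBridge driver."""
    cfg = configparser.ConfigParser()
    cfg["ml2"] = {
        "type_drivers": "flat,vlan",
        "tenant_network_types": "vlan",
        "mechanism_drivers": "linuxbridge",
    }
    cfg["ml2_type_vlan"] = {"network_vlan_ranges": f"{PHYSNET}:{VLAN_RANGE}"}
    # LinuxBridge agent options; some deployments keep these in linuxbridge_conf.ini.
    cfg["linux_bridge"] = {"physical_interface_mappings": f"{PHYSNET}:{DATA_NIC}"}
    cfg["vxlan"] = {"enable_vxlan": "False"}  # GRE was replaced by VLAN, not VXLAN
    cfg["securitygroup"] = {
        "firewall_driver":
            "neutron.agent.linux.iptables_firewall.IptablesFirewallDriver",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

As the slide notes, neutron.conf still needs its core plugin switched to ML2, and the Nova configuration needs the matching driver change.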
Next steps: Pull ports from bridges
• Stop Neutron services on all nodes
• Remove the host data-plane port from the OVS bridge(s)
• Pull instance taps out of the OVS-related Linux bridges
• Remove router and DHCP interfaces from the OVS integration bridge
• Stop Open vSwitch
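Because these steps run on live hosts, it helps to build the exact command list and review it before running anything. The sketch below only constructs argv lists. The bridge, NIC and port names passed in are hypothetical, `br-int` is the usual integration bridge default, and `openvswitch-switch` is the Ubuntu service name, which may differ on other distributions.

```python
# Dry-run sketch: build (not run) the commands for the "pull ports" step.
# Collect the real names from `ovs-vsctl show` and `brctl show` beforehand.


def pull_port_commands(data_bridge: str, data_nic: str,
                       qbr_taps: dict[str, str],
                       router_dhcp_ports: list[str],
                       int_bridge: str = "br-int") -> list[list[str]]:
    """Return argv lists, in order, for detaching ports from OVS."""
    cmds: list[list[str]] = []
    # 1. Detach the physical data-plane NIC from its OVS provider bridge.
    cmds.append(["ovs-vsctl", "del-port", data_bridge, data_nic])
    # 2. Pull each instance tap out of its OVS hybrid qbr* Linux bridge.
    for qbr, tap in qbr_taps.items():
        cmds.append(["brctl", "delif", qbr, tap])
    # 3. Remove router (qr-) and DHCP (tap) ports from the integration bridge.
    for port in router_dhcp_ports:
        cmds.append(["ovs-vsctl", "del-port", int_bridge, port])
    # 4. Stop Open vSwitch last (Ubuntu service name; an assumption elsewhere).
    cmds.append(["service", "openvswitch-switch", "stop"])
    return cmds
```

Each entry can be reviewed with `shlex.join(cmd)` and then passed to `subprocess.run(cmd, check=True)` once the list looks right.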
Interfaces removed from bridges
Stop openvswitch services
Finally: Restart services
• Start Neutron services
• Restart compute services
Post Service Restart (Network Node)
Post Service Restart (Compute Node)
Failure Scenarios
• Instances unresponsive?
– Check traffic from tap -> bridge -> physical interface
– Verify VLANs are properly trunked through (and that the VLANs are created on the switch)
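The tap -> bridge -> physical interface check can be partly automated on the host side. This hypothetical helper takes a bridge-to-interfaces map (for example parsed from `brctl show`) and assumes the LinuxBridge agent's naming, where a VLAN uplink is the subinterface `<nic>.<vlan_id>` inside the network's `brq` bridge. Trunking on the physical switch still has to be verified on the switch itself.

```python
def check_vlan_path(bridges: dict[str, list[str]], tap: str,
                    nic: str, vlan_id: int) -> list[str]:
    """Return problems found on the tap -> bridge -> physical interface path."""
    owners = [b for b, ifaces in bridges.items() if tap in ifaces]
    if not owners:
        return [f"{tap} is not in any bridge"]
    bridge = owners[0]
    problems: list[str] = []
    # The LinuxBridge agent names its per-network bridges brq<network-id prefix>.
    if not bridge.startswith("brq"):
        problems.append(f"{tap} is in {bridge}, not a brq* bridge")
    # Assumed uplink naming: VLAN subinterface <nic>.<vlan_id> in the same bridge.
    uplink = f"{nic}.{vlan_id}"
    if uplink not in bridges[bridge]:
        problems.append(f"{bridge} has no uplink {uplink}")
    return problems
```

An empty list means the host side looks right, which narrows an unresponsive instance down to the switch configuration.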
Failure Scenarios (Cont’d)
• IPs disappear or taps placed in QBR bridges?
– Check the Nova instance_info_caches table
– The cache can be regenerated with a hard reboot of the instance, or by adding an interface to the instance
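Stale cache entries can be spotted before they cause trouble. This sketch assumes the `network_info` column holds Nova's JSON-serialized list of VIFs with `id`, `type` and `network.bridge` keys, and that OVS-era entries carry `type` `ovs` or the `br-int` bridge. Rows might be fetched with `SELECT instance_uuid, network_info FROM instance_info_caches`; the helper name is hypothetical.

```python
import json


def stale_vifs(network_info_json: str) -> list[str]:
    """Return IDs of cached VIFs that still describe the OVS data plane."""
    stale = []
    for vif in json.loads(network_info_json or "[]"):
        network = vif.get("network") or {}
        # Assumed markers of an OVS-era entry: VIF type 'ovs' or the br-int bridge.
        if vif.get("type") == "ovs" or network.get("bridge") == "br-int":
            stale.append(vif.get("id", "<unknown>"))
    return stale
```

Instances with stale VIFs can then be hard rebooted, or given an extra interface, to regenerate the cache as described on the slide.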
Failure Scenarios (Cont’d Cont’d)
• Unable to boot new instances?
– Use the usual troubleshooting techniques
• DHCP binding_failed error messages?
– Check that /etc/default/neutron-server references the ML2 configuration file
• BRQ bridges not built?
– Verify the new agents are checking in
– Verify the LinuxBridge agent is installed and running
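The neutron-server check can be scripted as well. This sketch assumes Ubuntu's packaging, where `/etc/default/neutron-server` sets `NEUTRON_PLUGIN_CONFIG` to the plugin configuration file; other distributions configure this differently, and the helper names are hypothetical.

```python
import shlex


def plugin_configs(defaults_text: str) -> list[str]:
    """Return paths assigned to NEUTRON_PLUGIN_CONFIG in a shell-style defaults file."""
    paths: list[str] = []
    for line in defaults_text.splitlines():
        # shlex strips quotes and, with comments=True, ignores commented-out lines.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep and key == "NEUTRON_PLUGIN_CONFIG":
                paths.append(value)
    return paths


def uses_ml2(defaults_text: str) -> bool:
    """True if neutron-server is pointed at an ML2 configuration file."""
    return any(path.endswith("ml2_conf.ini") for path in plugin_configs(defaults_text))
```

A `False` result on a migrated network node matches the binding_failed symptom above.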
Benchmarks
Compare all the things
[Chart: iPerf3 Benchmarks (TCP / 1500 MTU / 10G Data), Intel X520* (ixgbe driver). Aggregate throughput (Gbps, 0 to 10) vs. number of threads (1, 2, 4, 8) for Open vSwitch (VXLAN), LinuxBridge (VXLAN), Open vSwitch (GRE), Open vSwitch (VLAN) and LinuxBridge (VLAN).]
* Host-to-host testing; no virtualization. Longer is better.
Compare all the things
[Chart: SCP File Transfers (10G file)*. Transfer speed (MBps) for LB VLAN, OVS VLAN, OVS GRE, LB VXLAN and OVS VXLAN. Transfer times shown on the chart: 59.75, 61.50, 110.50, 104.00 and 115.00 seconds.]
* Host-to-host testing; no virtualization. Longer is better.
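Since the chart's axis is in MBps while its labels are in seconds, converting the reported times into average rates makes them easier to compare. This sketch assumes "10G file" means 10 GiB (10,240 MiB) and keeps the times in the order listed, without pairing them to configurations.

```python
# Assumption: "10G file" means 10 GiB, i.e. 10,240 MiB, to match the MBps axis.
FILE_SIZE_MB = 10 * 1024


def throughput_mbps(seconds: float, size_mb: float = FILE_SIZE_MB) -> float:
    """Average transfer rate in MB/s for a file of size_mb copied in `seconds`."""
    return size_mb / seconds


# Times as listed on the slide.
times = [59.75, 61.50, 110.50, 104.00, 115.00]
rates = [round(throughput_mbps(t), 1) for t in times]
```

Under this assumption the fastest listed transfer averages roughly 171 MBps and the slowest roughly 89 MBps.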
In Summary
• OVS provides a great deal of functionality
• Network stability is more important for our customers than being on the cutting edge
• Linux bridge provides almost all of the features we might want to use
• How to migrate existing environments to LinuxBridge
• Improved stability and comparable performance with OVS achieved
Questions?
Download: https://github.com/busterswt/openstackparis2014