2014 OpenStack Summit - Neutron OVS to LinuxBridge migration

Post on 21-Apr-2017


Migrating production workloads from OVS to LinuxBridge

Kevin Stevens (kevin.stevens@rackspace.com)

James Denton (james.denton@rackspace.com)

Kevin Stevens

• RPC Engineer since 2012 (Essex)

• IRC: k_stev on irc.freenode.net

We’re operators!

James Denton

• RPC Network Engineer since 2012 (Essex)

• IRC: busterswt on irc.freenode.net

What are we doing here?

1. A history of networking in Rackspace Private Cloud
2. Our experiences with Open vSwitch
3. Swapping out OVS with LinuxBridge
4. What to expect with each

2011: Started building OpenStack-powered private clouds

2012: Began architecting, building and supporting private clouds in customer DCs

2013: Over 100 customers running Rackspace Private Clouds

2014: Released RPC v9 based on Icehouse. 99.99% API uptime SLA.

The Evolution of Networking in RPC

                   RPC v2.0/3.0   RPC v4.0/4.1   RPC v4.2       RPC v9.0
OpenStack Release  Folsom         Grizzly        Havana         Icehouse
Network Stack      nova-network   Quantum        Neutron        Neutron
L2 Connectivity    flatDHCP       Open vSwitch   Open vSwitch   LinuxBridge (ML2)
L3 Agent Support   N/A            No             Yes            Yes

Host OS (in slide order): Ubuntu 12.04 LTS; Ubuntu 12.04 LTS; CentOS 6.5 / RHEL 6.5; Ubuntu 12.04 LTS; Ubuntu 14.04 LTS

Why Neutron?

Why Neutron w/ Open vSwitch?

• Open vSwitch pushed by community
• Open vSwitch pushed by packagers
• Wanted overlay networks

“If it dies. It dies.” - Ivan Drago, OpenStack Operator

The problems

• Kernel panics (OVS 1.10)
• ovs-vswitchd segfaults (OVS 1.11)
• Broadcast storms
• Data corruption (OVS 2.01)

Why Linux Bridge?

Why move to LinuxBridge?

• Looking for reliability and stability
• Fewer moving parts
• Easier to troubleshoot
• Supported by the community


What do we lose by moving?

• Flexibility provided by overlay networking (if not using VXLAN)
• Neutron Distributed Virtual Routers (Juno)
• Any customizability provided by OVS not implemented by Neutron itself

www.rackspace.com

Planning

Plan A: Scorch the earth!

• Snapshot and delete all instances
• Delete all networks
• Change from OVS -> LB
• Recreate all networks
• Boot instances
• …
• It works, but…

But wait… these are production environments!

Plan B: Migration Environment

• Deploy LinuxBridge environment
• Snapshot all instances
• Import images into new environment
• Build new instances
• Cutover
• …
• It works, but… $$$

Plan C: Switch it out!

• Stop services
• Update the database
• Change the configuration from OVS -> LB
• Restart services
• …
• Profit!

Issues with migrating

• Neutron OVS DB schema != Neutron LinuxBridge DB schema
  – Migration to OVS ML2 DB schema is required first
• Overlay networks may not be supported
  – LinuxBridge uses VXLAN rather than GRE
  – Requires kernel >= 3.9
• Means GRE networks must be converted to VLAN networks
  – Didn’t want to introduce additional complexity
  – VLANs are easier to troubleshoot if something goes wrong

The Process

Preparation

• Determine what’s needed:
  – Dependencies
  – Some method of converting database to ML2 schema
  – Some method of converting data to LB from OVS
  – Which configuration files need mangling
  – Which services need disabling
  – Which services need restarting
  – Roll-back plan

Define a successful outcome

• Can instances gain a DHCP lease?
• Do instances have internal/external connectivity?
• Are security groups/other functions still operational?
• Were instances placed into the correct bridge?
• Will the changes survive a reboot?

Normal OVS Operation (Network Node)

Normal OVS Operation (Compute Node)

First steps: Database manipulation

• Backup! Backup! Backup!
• Use migrate_to_ml2.py (modified) to change the DB schema
• Update segments, ports and vlan tables
  – Change GRE to VLAN
  – Change segmentation id to real VLAN ID
  – Set a provider bridge
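The segment update above can be sketched roughly as follows. This is an illustration only, run against a throwaway in-memory SQLite database rather than a real Neutron database. The table and column names (`ml2_network_segments`, `network_type`, `physical_network`, `segmentation_id`) are modeled on the ML2 schema as we understand it and should be checked against your release. The GRE-to-VLAN mapping and the `physnet1` label are hypothetical, and the ports and vlan allocation tables mentioned above are not covered here.

```python
import sqlite3


def convert_gre_segments_to_vlan(conn, gre_to_vlan, physical_network):
    """Rewrite GRE segments as VLAN segments; return the number of rows changed.

    gre_to_vlan maps an existing GRE tunnel ID to the real VLAN ID that
    should replace it (hypothetical mapping, chosen by the operator).
    """
    # Validate every target VLAN ID before touching any row.
    for vlan_id in gre_to_vlan.values():
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID: {vlan_id}")

    changed = 0
    for gre_id, vlan_id in gre_to_vlan.items():
        cur = conn.execute(
            "UPDATE ml2_network_segments "
            "SET network_type = 'vlan', physical_network = ?, segmentation_id = ? "
            "WHERE network_type = 'gre' AND segmentation_id = ?",
            (physical_network, vlan_id, gre_id),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


def demo():
    """Build a simplified stand-in table and convert two GRE segments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ml2_network_segments ("
        "id TEXT PRIMARY KEY, network_id TEXT, network_type TEXT, "
        "physical_network TEXT, segmentation_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ml2_network_segments VALUES (?, ?, ?, ?, ?)",
        [("seg-1", "net-1", "gre", None, 1), ("seg-2", "net-2", "gre", None, 2)],
    )
    changed = convert_gre_segments_to_vlan(conn, {1: 101, 2: 102}, "physnet1")
    rows = conn.execute(
        "SELECT network_type, physical_network, segmentation_id "
        "FROM ml2_network_segments ORDER BY id"
    ).fetchall()
    return changed, rows
```

Restricting the `WHERE` clause to `network_type = 'gre'` keeps an already-converted row from being matched again if a new VLAN ID happens to equal another network's old GRE ID.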

Next steps: Install and Configure

• Install the LinuxBridge plugin
• Update SQL connection strings
• Configure ml2_conf.ini / linuxbridge_conf.ini
• Change driver from OVS to ML2 in Neutron and Nova conf files
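As a rough idea of what the ML2 / LinuxBridge settings might look like for a VLAN-only deployment, here is a small generator sketch. The option names (`type_drivers`, `tenant_network_types`, `mechanism_drivers`, `network_vlan_ranges`, `physical_interface_mappings`) are the ones we believe the ML2 plugin and LinuxBridge agent read; the values (`physnet1`, `eth1`, the `100:199` range) are placeholders, and which file each section lives in depends on how your packages start the agent.

```python
import configparser
import io


def build_ml2_config(physnet="physnet1", interface="eth1", vlan_range="100:199"):
    """Return INI text for a VLAN-only ML2 + LinuxBridge setup (placeholder values)."""
    cfg = configparser.ConfigParser()
    cfg["ml2"] = {
        "type_drivers": "vlan",
        "tenant_network_types": "vlan",
        "mechanism_drivers": "linuxbridge",
    }
    # VLAN IDs Neutron may hand out on this provider label.
    cfg["ml2_type_vlan"] = {"network_vlan_ranges": f"{physnet}:{vlan_range}"}
    # Maps the provider label to the host interface carrying the trunk.
    cfg["linux_bridge"] = {"physical_interface_mappings": f"{physnet}:{interface}"}

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_ml2_config())
```

Generating the file from one function keeps the provider label consistent between the VLAN range and the interface mapping, which is an easy place to introduce a mismatch by hand.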

Next steps: Pull ports from bridges

• Stop Neutron services on all nodes
• Remove host data-plane port from the OVS bridge(s)
• Pull instance taps out of the OVS-related linux bridges
• Remove router and dhcp interfaces from OVS integration bridge
• Stop Open vSwitch
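The port-pulling steps above could be scripted along these lines. This sketch only builds the command list (a dry run) and executes nothing. It relies on `ovs-vsctl del-port BRIDGE PORT` and `brctl delif BRIDGE IFACE`; every bridge, port and tap name passed in is hypothetical and would have to be discovered on each host.

```python
def plan_port_removal(provider_ports, tap_ports, integration_ports,
                      integration_bridge="br-int"):
    """Return the commands (as argv lists) that would detach ports from OVS.

    provider_ports:    (ovs_bridge, host_port) pairs for data-plane ports
    tap_ports:         (linux_bridge, tap) pairs for instance taps
    integration_ports: router/dhcp interface names on the integration bridge
    """
    commands = []
    # Host data-plane ports leave their OVS provider bridge.
    for bridge, port in provider_ports:
        commands.append(["ovs-vsctl", "del-port", bridge, port])
    # Instance taps leave the OVS-related linux bridges.
    for bridge, tap in tap_ports:
        commands.append(["brctl", "delif", bridge, tap])
    # Router and dhcp interfaces leave the integration bridge.
    for port in integration_ports:
        commands.append(["ovs-vsctl", "del-port", integration_bridge, port])
    return commands


if __name__ == "__main__":
    plan = plan_port_removal(
        provider_ports=[("br-eth1", "eth1")],
        tap_ports=[("qbr11111111-aa", "tap11111111-aa")],
        integration_ports=["qr-22222222-bb", "tap33333333-cc"],
    )
    for argv in plan:
        print(" ".join(argv))
```

Printing the plan first makes it possible to review it per host before anything touches the data plane; each list could later be passed to `subprocess.run` once it has been checked.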

Interfaces removed from bridges

Stop openvswitch services

Finally: Restart services

• Start Neutron services
• Restart compute services

Post Service Restart (Network Node)

Post Service Restart (Compute Node)

Failure Scenarios

• Instances unresponsive?
  – Check traffic from tap -> bridge -> physical interface
  – Verify VLANs are properly trunked through (and VLANs are created on the switch)

Failure Scenarios (Cont’d)

• IPs disappear or taps placed in QBR bridges
  – Check Nova instance_info_caches table.
  – Cache can be regenerated with a hard reboot of the instance, or by adding an interface to the instance
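One way to spot taps that ended up in a QBR bridge is to parse `brctl show` output. The sketch below assumes the usual layout of that output (a header line, one line per bridge, and indented continuation lines for extra interfaces); the sample text and all interface names in it are made up for illustration.

```python
def parse_brctl_show(output):
    """Map bridge name -> list of member interfaces from `brctl show` text."""
    bridges = {}
    current = None
    for line in output.splitlines()[1:]:  # first line is the column header
        if not line.strip():
            continue
        parts = line.split()
        if line[0].isspace():
            # Indented line: another interface on the bridge seen last.
            if current is not None:
                bridges[current].extend(parts)
        else:
            # name, bridge id, STP flag, then an optional first interface
            current = parts[0]
            bridges[current] = parts[3:]
    return bridges


def misplaced_taps(bridges):
    """Return taps still attached to qbr* bridges instead of brq* bridges."""
    return sorted(
        iface
        for bridge, ifaces in bridges.items()
        if bridge.startswith("qbr")
        for iface in ifaces
        if iface.startswith("tap")
    )


# Illustrative sample only; not captured from a real host.
SAMPLE = (
    "bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
    "brq1a2b3c4d-5e\t\t8000.0025b5000001\tno\t\teth1.101\n"
    "\t\t\t\t\t\t\ttap11111111-aa\n"
    "qbr22222222-bb\t\t8000.0025b5000002\tno\t\ttap22222222-bb\n"
)

if __name__ == "__main__":
    print(misplaced_taps(parse_brctl_show(SAMPLE)))
```

Any tap this reports is a candidate for the instance_info_caches check described above.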

Failure Scenarios (Cont’d Cont’d)

• Unable to boot new instances?
  – Usual troubleshooting techniques should be used
• DHCP binding_failed error messages?
  – Check that /etc/default/neutron-server references the ML2 configuration file
• BRQ bridges not built?
  – Verify that the new agents are checking in
  – Verify the LinuxBridge agent is installed and running

Benchmarks

Compare all the things

[Chart: iPerf3 Benchmarks (TCP / 1500 MTU / 10G Data), Intel X520* (ixgbe driver). Aggregate throughput (Gbps, axis 0.00 to 10.00) by number of threads (1, 2, 4, 8) for Open vSwitch (VXLAN), LinuxBridge (VXLAN), Open vSwitch (GRE), Open vSwitch (VLAN) and LinuxBridge (VLAN).]

* Host-to-host testing; no virtualization. Longer is better.

Compare all the things

[Chart: SCP File Transfers (10G file)*. Transfer speed (MBps, axis 0.00 to 200.00). Series, in the order listed: LB VLAN, OVS VLAN, OVS GRE, LB VXLAN, OVS VXLAN. Transfer times, in the order listed: 59.75, 61.50, 110.50, 104.00 and 115.00 seconds.]

* Host-to-host testing; no virtualization. Longer is better.
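The SCP chart reports transfer times in seconds against an axis in MBps. A quick way to relate the two is below, assuming "10G" means 10 GiB (10240 MiB); that size interpretation is ours, not stated on the slide, and the times are deliberately not paired with a particular backend.

```python
def transfer_speed_mbps(size_mb, seconds):
    """Average transfer speed in MB/s for a file of size_mb sent in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


# Assumption: a "10G" file is 10 GiB = 10240 MiB.
FILE_SIZE_MB = 10 * 1024

# Transfer times from the chart, in the order they appear.
TIMES_SECONDS = [59.75, 61.50, 110.50, 104.00, 115.00]

if __name__ == "__main__":
    for seconds in TIMES_SECONDS:
        speed = transfer_speed_mbps(FILE_SIZE_MB, seconds)
        print(f"{seconds:7.2f} s -> {speed:6.1f} MB/s")
```

Under that assumption every result falls within the chart's 0 to 200 MBps axis, with the two roughly one-minute transfers near the top of it.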


In Summary

• OVS provides a great deal of functionality
• Network stability is more important for our customers than being on the cutting edge
• Linux bridge provides almost all of the features we might want to use
• How to migrate existing environments to LinuxBridge
• Achieved improved stability and performance comparable to OVS

“OpenStack is hard.” - Albert Einstein, Original Cloud Architect

Questions?

Download @ https://github.com/busterswt/openstackparis2014
