high availability in neutron · high availability in neutron sylvain afchain software engineer...

28
High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014

Upload: others

Post on 03-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

High Availability in Neutron

Sylvain AfchainSoftware Engineer

Assaf MullerSoftware Engineer

Getting the L3 Agent Right

November 2014

Page 2: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

SITUATION IN ICEHOUSE

Page 3: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN
Page 4: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Single Point of Failure

R2

L3 Agent 1 L3 Agent 2

R1

External Network

Tenant Network

VM VM

Page 5: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

L3 Agent Healthcheck

● Developed by eNovance

● Open-Source● Works and tested

on Grizzly, Havana and Icehouse

R2

Controller L3 Agent 1 L3 Agent 2

L3 Agent State

Healthcheck Healthcheck

R1

RPC

Page 6: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

L3 Agent Healthcheck

● Developed by eNovance

● Open-Source● Works and tested

on Grizzly, Havana and Icehouse

R2

Controller L3 Agent 1 L3 Agent 2

L3 Agent State

Healthcheck Healthcheck

R1

RPC

Page 7: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

L3 Agent Healthcheck

ConsPros

● Does not affect deployment

● Remove a node if isolated● Distributed service● Works since Grizzly● Lightweight

● Still no full HA● Not stateful● Long downtime● Out of tree● Still not the right way

Page 8: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

SITUATION IN JUNO

Page 9: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Controller Rescheduling

Controller L3 Agent 1 L3 Agent 2

Rescheduler Loop R1

allow_automatic_l3agent_failover = True

Authored by Kevin Benton

R2

Page 10: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Controller Rescheduling

Controller L3 Agent 1 L3 Agent 2

Rescheduler Loop

R1

allow_automatic_l3agent_failover = True

Authored by Kevin Benton

R2

Page 11: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Another Approach: Keepalived

Features

● Configuration determines default, admin can overrule

● Works within tenant networks

● Plays nice with: FWaaS, VPNaaS, LBaaS

● Failover independent from RPC layer

● Routers in active / passive● Floating IP in active /

passive● All L3 agents are active● Uses keepalived (VRRP)● Rough failover time: c + 2/29 * n

Page 12: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: Pre-emptive Router Scheduling

R1 R1

R2R2

R3

R3

L3 Agent 1 L3 Agent 2 L3 Agent 3

SLAVE

MASTER

Page 13: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: HA Networks & VRID

R1 - VRID 1 R1 - VRID 1

R2 - VRID 2R2 - VRID 2

R3 - VRID 1

R3 - VRID 1

L3 Agent 2 L3 Agent 3

R1, R2 - Tenant A

R3 - Tenant B

L3 Agent 1

Page 14: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: Implementation

L3 AgentNeutron Server

● Per tenant network to accommodate VRRP traffic

● New virtual router ID attribute uniquely identifies router clusters

● New keepalived manager● IPs as VIPs, only present

on the master instance

Page 15: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: Implementation

R1

L3 Agent 1

QG

HA

QR

169.254.192.1/18KEEPALIVED

VIPVIP

VIP

R1

L3 Agent 2

QG

HA

QR

169.254.192.2/18 KEEPALIVED

Page 16: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: CLI/API

neutron net-list

As an admin :

+---------------------+----------------------------------------------------------+| id | name | subnets |+---------------------+-------------------+--------------------------------------+| 27b2e2b7-22c0-4f74- | private | f472d8c8-1dd0-4272- 10.0.0.0/24 || be1a07de-9d7b-4823- | public | 7f3d69a6-bd50-4cd5- 172.24.4.0/24 || 62917a72-0576-422e- | HA network tenant | 12f466f4-6c51-4726- 169.254.192.0/18 |+---------------------+-------------------+--------------------------------------+

As a user :

+---------------------+-------------------------------------------------------+| id | name | subnets |+---------------------+-------------------+-----------------------------------+| 27b2e2b7-22c0-4f74- | private | f472d8c8-1dd0-4272- 10.0.0.0/24 || be1a07de-9d7b-4823- | public | 7f3d69a6-bd50-4cd5- 172.24.4.0/24 |+--------------------------------------+--------------------------------------+

Page 17: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: CLI/API

neutron port-list

As an admin :

+-----------+---------+-----------------------------------------------------------+| id | name | fixed_ips |+-----------+---------+-----------------------------------------------------------+| 3e7268c6- | HA port | {"subnet_id": "12f466f4-", "ip_address": "169.254.192.2"} || d87a6c9c- | HA port | {"subnet_id": "12f466f4-", "ip_address": "169.254.192.1"} || 5105bd78- | | {"subnet_id": "f472d8c8-", "ip_address": "10.0.0.1"} |+-----------+---------+-----------------------------------------------------------+

As a user :

+-----------+------+------------------------------------------------------+| id | name | fixed_ips | +-----------+------+------------------------------------------------------+| 5105bd78- | | {"subnet_id": "f472d8c8-", "ip_address": "10.0.0.1"} |+-----------+------+------------------------------------------------------+

Page 18: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: CLI/API

Where are my HA router instances?

neutron l3-agent-list-hosting-router router1

+--------------------------------------+--------+----------------+-------+| id | host | admin_state_up | alive |+--------------------------------------+--------+----------------+-------+| 0dad7203-cba8-4b79-bec3-16ddf55d6a5a | ops-1 | True | :-) || 7d7afb99-b522-442a-8acd-f1548a1dea19 | ops-2 | True | :-) |+--------------------------------------+--------+----------------+-------+

Page 20: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP : Future Work

Improvements

● Administration:● Where is the master?

Can I move it?● Log state transitions

● Conntrackd for stateful SNAT traffic

● Migrate legacy routers to HA

● In Juno, a router may be distributed or HA, but not both

Page 21: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

VRRP: Limitations

Limitations

● 255 virtual routers per HA network thus per tenant

● Can be removed by allowing more than one HA network per tenant

● East-west traffic could be improved, no need to go through a L3 node

● North-south traffic could be improved as well, especially for floating IPs

Improvements

Page 22: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Another Approach: DVR

Features

● Distributes virtual routers on all compute nodes

● No more east-west traffic through L3 node

● No single point of failure

● Floating IPs hosted by the compute node hosting the VM

● Some service could be distributed as well, like FWaaS

● SNAT support through a L3 node

Page 23: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Without DVR

R1

VM1 VM2 VM3 VM4

L3 Agent

Page 24: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

East-West with DVR

VM1 VM2

VM3 VM4

DVR DVR

Page 25: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Floating IPs with DVR

VM1 VM2

VM3 VM4

DVR + NAT DVR + NAT

FLOATING-IPFLOATING-IP

Page 26: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

SNAT with DVR

VM1 VM2

VM3 VM4

DVR + NAT

FLOATING-IP

VRRP + SNAT

DVR + NAT

Page 27: High Availability in Neutron · High Availability in Neutron Sylvain Afchain Software Engineer Assaf Muller Software Engineer Getting the L3 Agent Right November 2014. SITUATION IN

Healthcheck Rescheduling L3 HA DVR

Advantages Mature In-tree, simple Failover is quick and indifferent to management plane

Removes bottleneck from network node

Release Grizzly Juno Juno Juno

Segmentation Technology

All All All Tunneling + L2pop

Topology Extra agent to install on network nodes

Enable configuration option

Enable configuration option

Compute nodes connected to external network(s)

Summary