openstack discovery and networking assurance - koren lev - meetup


Upload: openstack-israel

Post on 21-Jan-2018


Category:

Technology



TRANSCRIPT

Page 1: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

self marketing slide coming next …

Page 2: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

OpenStack Discovery and Assurance

Koren Lev

DC Operator, IT Developer, Entrepreneur, Dev/Ops manager etc…

• I’ve been using OpenStack since Diablo (~ 6 years)

• I’ve been operating and supporting SP and ENT deployments in Europe and the Middle East

Page 3: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

General observations and thoughts…

• I believe OpenStack infrastructure is not very easy to operate (post installation, that is …)

• I believe it is a bit hard to maintain and troubleshoot

• The community’s focus is on fulfilment (“make it work”), provisioning (“configure it”) and abstraction (“end users don’t care about the details”). Therein lies the problem (IMHO)

• We neglected the cloud operator’s operational needs (IMHO)

• According to Mirantis (example): running 5000 OpenStack nodes was failing mostly because of issues around Neutron

• I’ll use the networking charter to illustrate it; the points made fit all charters

Page 4: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Thought backed up by some investigation

Page 5: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Controllers and Agents vs Workers/Plugins

• Most OpenStack modules operate using controllers and agents.

• Here is an example:

Controllers

Agents

Workers

APIs: for fulfilment and provisioning - abstracted

https://docs.openstack.org/developer/neutron/#neutron-stadium

Page 6: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Neutron controller data (current API):

“instance”

Very simple, abstracted, awesome for the cloud user …

“port” “network”

“router”

…and be assured: the network is active!

Page 7: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

The views of cloud operations team…

• Let’s say a ‘vm200’ instance on ‘network100’ can’t communicate (it happens…)

• Troubleshooting with premium knowledge (good support personnel)

• Assuming: Mirantis 8.0 (Liberty), Mechanism: OVS and LXB, Type: VXLAN

• Assuming: only RegionOne

• Assuming: you found the nova instance-to-host mapping (nova API)

• Assuming: you found the nova instance-name-to-uuid mapping (nova API)

Since liberty *
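The two nova lookups assumed above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it works on records shaped like the nova API's detailed server listing (the `id`, `name` and `OS-EXT-SRV-ATTR:host` keys), and the sample UUID and the helper name `build_instance_maps` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_instance_maps(servers: List[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (instance name -> uuid, uuid -> host) from server detail records."""
    name_to_uuid: Dict[str, str] = {}
    uuid_to_host: Dict[str, str] = {}
    for server in servers:
        name_to_uuid[server["name"]] = server["id"]
        # The host attribute is normally only returned to admin users,
        # so fall back to a marker value when it is absent.
        uuid_to_host[server["id"]] = server.get("OS-EXT-SRV-ATTR:host", "unknown")
    return name_to_uuid, uuid_to_host


# Hypothetical sample record; instance and host names follow the slides.
SAMPLE_SERVERS = [
    {
        "id": "11111111-2222-3333-4444-555555555555",
        "name": "vm200",
        "OS-EXT-SRV-ATTR:host": "node-6",
    },
]

names, hosts = build_instance_maps(SAMPLE_SERVERS)
print(hosts[names["vm200"]])
```

Instance names are not guaranteed to be unique in a cloud, so a real lookup would have to handle several UUIDs per name.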

Page 8: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Running on host ‘node-6’, OVS agent there, host and agent reachable.

• We need more details before going down to the hosts level …

• DHCP server and a gateway/router running on this network, find out where:

The views of cloud operations team…
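Finding where the DHCP server for a network runs can be sketched like this. The sketch assumes a response shaped like the Neutron API's list of DHCP agents hosting a network (an `agents` list whose entries carry `host` and `alive`); the sample hosts and the helper name `agent_hosts` are hypothetical.

```python
from typing import List


def agent_hosts(response: dict, only_alive: bool = True) -> List[str]:
    """Return the sorted host names of the agents listed in the response."""
    hosts = set()
    for agent in response.get("agents", []):
        if only_alive and not agent.get("alive", False):
            continue  # skip agents that stopped reporting
        hosts.add(agent["host"])
    return sorted(hosts)


# Hypothetical sample response for one network.
SAMPLE_DHCP_AGENTS = {
    "agents": [
        {"agent_type": "DHCP agent", "host": "node-1", "alive": True},
        {"agent_type": "DHCP agent", "host": "node-3", "alive": False},
    ]
}

dhcp_hosts = agent_hosts(SAMPLE_DHCP_AGENTS)
print(dhcp_hosts)
```

The same helper would apply to the list of L3 agents hosting a router, since that response has the same outer shape.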

Page 9: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• More details are missing, available through MariaDB, not exposed in API (partial list):

• Is this really important data for troubleshooting ?

• Well…depends what’s wrong in the network ( if not being ‘active’ or ‘;-)’ )

Workers/plugins vendors are placing their details in MariaDB (no ops API)

The views of cloud operations team…
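As one illustration of reading such details straight from the database, the sketch below looks up the segment type and ID of a network. It is hedged in several ways: the table and column names (`ml2_network_segments`, `network_id`, `network_type`, `segmentation_id`) are assumptions to verify against your release's schema, an in-memory SQLite database stands in for MariaDB, and the sample row is made up.

```python
import sqlite3

# In-memory stand-in for the Neutron database (assumed schema, sample data).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ml2_network_segments ("
    "network_id TEXT, network_type TEXT, segmentation_id INTEGER)"
)
db.execute(
    "INSERT INTO ml2_network_segments VALUES (?, ?, ?)",
    ("network100-uuid", "vxlan", 4),
)


def segment_for_network(conn, network_id):
    """Return (network_type, segmentation_id) for a network, or None."""
    cursor = conn.execute(
        "SELECT network_type, segmentation_id "
        "FROM ml2_network_segments WHERE network_id = ?",
        (network_id,),
    )
    return cursor.fetchone()


print(segment_for_network(db, "network100-uuid"))
```

Against a real MariaDB server the query text would stay the same, but the driver's placeholder style usually differs (`%s` instead of `?`).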

Page 10: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• So based on the findings so far, moving to hosts level (yes, MariaDB data is not enough !):

The views of cloud operations team…

Page 11: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Ever wondered what’s going on in hypervisor interface list ? (partial list here):

The views of cloud operations team…

Page 12: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Let’s skip vNIC model type details for now, move down to the linux bridge:

The instance representation of a network ‘port’ inside that specific hypervisor (assuming linux bridge plugin)

The bridge-side network ‘port’ inside that specific hypervisor (assuming linuxbridge plugin)

Thought : is it ‘active’ ?

The views of cloud operations team…
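Matching a Neutron port to its devices on the hypervisor is easier with the naming pattern in mind. The sketch below assumes the OVS hybrid plug convention, where each device name is a three-letter prefix followed by the first 11 characters of the port UUID; the sample UUID and the helper name are hypothetical, and other plugins name their devices differently.

```python
from typing import Dict


def hybrid_plug_devices(port_id: str) -> Dict[str, str]:
    """Derive the per-port device names from a Neutron port UUID."""
    suffix = port_id[:11]  # keeps the name within the Linux 15-character limit
    return {
        "tap": "tap" + suffix,  # instance-side vNIC
        "qbr": "qbr" + suffix,  # per-port Linux bridge
        "qvb": "qvb" + suffix,  # veth end attached to the Linux bridge
        "qvo": "qvo" + suffix,  # veth end attached to the OVS integration bridge
    }


devices = hybrid_plug_devices("a1b2c3d4-e5f6-7890-abcd-ef0123456789")
for role, name in devices.items():
    print(role, name)
```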

Page 13: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Let’s skip monitoring details for now, move down to Open vSwitch:

The views of cloud operations team…

The ovs-side network ‘port’ inside that specific hypervisor (assuming ovs plugin)

The tunneling bridge inside OVS in-charge of isolation and segmentation

Tunneling used for this specific case (vxlan)

The integration bridge inside OVS in-charge of isolation and encapsulation

The ovs-side representation of the instance ‘port’
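Collecting this view by script mostly means parsing CLI output. The following sketch groups port names by bridge from text in the style of `ovs-vsctl show`; the sample text is hand-written for illustration, not captured from a real host, and the helper name `ports_by_bridge` is hypothetical.

```python
from typing import Dict, List

# Hand-written sample in the style of `ovs-vsctl show` output.
SAMPLE_SHOW = '''
    Bridge br-int
        Port "qvoa1b2c3d4-e5"
            tag: 1
            Interface "qvoa1b2c3d4-e5"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a80202"
            Interface "vxlan-c0a80202"
                type: vxlan
                options: {in_key=flow, local_ip="192.168.2.1", out_key=flow, remote_ip="192.168.2.2"}
'''


def ports_by_bridge(show_output: str) -> Dict[str, List[str]]:
    """Map each bridge name to the list of its port names, in order."""
    bridges: Dict[str, List[str]] = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            current = line[len("Bridge "):].strip('"')
            bridges[current] = []
        elif line.startswith("Port ") and current is not None:
            bridges[current].append(line[len("Port "):].strip('"'))
    return bridges


print(ports_by_bridge(SAMPLE_SHOW))
```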

Page 14: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Now, which communication is broken? To which destinations? Depending on the answers, we can go across to the specific tunnel destinations.

• Let’s assume vm200 has no ip address assigned , so investigating the tunnel to node-6 (neutron-agent dhcp is over there, see slide 7):

The views of cloud operations team…

Node-1 192.168.2.1 as source and node-6 192.168.2.2 as destination (assuming in this example there is no routing needed between the source and destination of the tunnel)
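To pick the right tunnel port on the tunneling bridge, it helps to know how the port is named. The sketch below assumes the convention used by the Neutron OVS agent, where the port name is the tunnel type followed by the remote endpoint's IPv4 address in hexadecimal; treat that convention as an assumption to check on your own deployment. The helper name is hypothetical.

```python
import ipaddress


def tunnel_port_name(network_type: str, remote_ip: str) -> str:
    """Build a tunnel port name such as 'vxlan-<remote ip as 8 hex digits>'."""
    remote = int(ipaddress.IPv4Address(remote_ip))
    return "%s-%08x" % (network_type, remote)


# Tunnel endpoints from the example: node-1 is the source, node-6 the destination.
print(tunnel_port_name("vxlan", "192.168.2.2"))
```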

Page 15: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Finding the physical NICs used for the segmentation/tunneling from node-1 to node-6:

The views of cloud operations team…

“br-mesh” bridge in this hypervisor is holding the ip for the vxlan-sys tunneling inside the ovs

“br-mesh” bridge in this hypervisor is connected through pNIC ens160, sub-interface 103 (vlan for the tunnel endpoint)

vi /etc/network/interfaces.d/ifcfg-ens160.103:

Page 16: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• Moving to node-1 for the L3, DHCP and Meta investigations :

The views of cloud operations team…

Find the UUID of the DHCP service run by that specific DHCP agent on that specific node

The dhcp server has this vNIC port connected down at node-1
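On the node itself, the DHCP and router services live in network namespaces. A minimal sketch, assuming the usual Neutron naming (`qdhcp-<network uuid>` and `qrouter-<router uuid>`) and text in the style of `ip netns list`; the sample UUIDs and the helper name are made up.

```python
from typing import Dict, List

# Hand-written sample in the style of `ip netns list` output.
SAMPLE_NETNS = """\
qrouter-0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0 (id: 1)
qdhcp-9a8b7c6d-5e4f-3a2b-1c0d-efefefefefef (id: 0)
"""


def services_from_netns(netns_output: str) -> Dict[str, List[str]]:
    """Group namespace UUIDs by service kind ('dhcp' or 'router')."""
    services: Dict[str, List[str]] = {"dhcp": [], "router": []}
    for line in netns_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]  # ignore the optional "(id: N)" part
        if name.startswith("qdhcp-"):
            services["dhcp"].append(name[len("qdhcp-"):])
        elif name.startswith("qrouter-"):
            services["router"].append(name[len("qrouter-"):])
    return services


print(services_from_netns(SAMPLE_NETNS))
```

The UUID after `qdhcp-` identifies the network being served, which ties the namespace back to the network seen in the API.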

Page 17: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• vServices vNIC interfaces connections on node-1 (dhcp - a quick summary):

The views of cloud operations team…

Page 18: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• vServices vNIC interfaces connections on node-1 (l3- a quick summary):

The views of cloud operations team…

Page 19: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• What if we change distribution/mechanism/types ? (guess what - different discovery/collection logic and different details per object), dpdk/fd.io example:

The views of cloud operations team…

Page 20: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• What if more than 1 VM? What if HA? What if DVR?

The views of cloud operations team…

• Discovery x VMs x 2 , Discovery x 2 , Discovery x Hosts

• Post discovery you can start finding a fix …

Page 21: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Yes, we are a small team that spent the last year developing a possible offering to start solving the networking charter, focused on ‘Networking Operations API’ (see next).

..not a cure for cancer …but it’s pretty good, tested with real IT operations teams

We call it ‘Calipso’

Point made (!?) stop bitching…any solution ?

Possible OpenStack attachments: ‘Monasca’, ‘Vitrage’, ‘Ceilometer’, ‘Neutron’, ‘Tacker’

Others: ‘Barometer’

Page 22: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

• OpenStack “Operations APIs” – let’s get started…

• Exposing up the needed details for the Cloud operations team

• To be developed for any module suffering from lack of workers/plugins visibility

Our ‘Networking Operations API’:

• Modeled for Multi distribution, any mechanism driver / type drivers variances

• Includes smart discovery logic, a visualization solution , monitoring, analysis

Proposition: a possible starting point

Visibility = Predictability = Stability

Page 23: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Project ‘Calipso’

OSDNA modules: Inventory Discovery, Graph, Monitor, Failure Detection, Failure Analysis, Report

Users: CNA (Cloud Network Administrator), TNA (Tenant Network Administrator)

Use cases: Maintenance, Troubleshooting

Outputs: show connections, dependencies, state and impact; show failure, root cause

Interfaces: API, DB, CLI for Hypervisors/Containers discovery

Page 24: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Calipso objects - examples

OSDNA Object | Object Details | Example 1 | Example 2 | Example 3
vService | Services Overlay (virtual) | DHCP (ip netns) | L3 GW (ip netns) | FWaaS
vNIC | VMs NIC, Container CNI | Instance/vService vNIC | Tap to linux-bridge | VPP Virtual-Ethernet
vConnector | L2 inside a host (isolation) | Linux Bridge | VPP bridge-domain | VMware Port-Group
vEdge | Virtual to Physical Edge | OVS | VPP | Midonet
pNIC / Bond | Physical Underlay | Fabric Edge Ports | EPGs in ACI | Servers Eth / Ether-channels
Network Segment | Virtual Segments (for any tunneling overlay) | VLAN | VXLAN Segment-ID | GRE segments
OTEP | Overlay Tunnel | VXLAN | Geneve | GRE

OSDNA Views | Details | Example 1 | Example 2 | Example 3
Virtual Topology | Modular links graph in Calipso discovery | vService to Network | Instance to Network | All virtual2physical per network
Policy Topology | Data from the APP driving OpenStack | App VM to DB VM | VNF to end-user | VNF chaining

Page 25: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Calipso object model: adaptive, simple

Calipso

Page 26: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

[Diagram: Calipso Discovery Logic reaches Environment A, Environment B and Environment C through API, DB and CLI; each environment has its own Environment_Config (A, B, C) with initial scan logic.]

Environment_Config example (fragment):

"name" : "MyENV3",
"host" : "10.56.20.239",
"port" : "5673",
"user" : "nova",
"password" : "YVWMiKMshZhlxxxxqFu5PdT9d"
},
{
"Mon" : "Monitoring3",
"type" : "Sensu",
"host" : "korlev-nsxe1.cisco.com",
"port" : "4567"
[removed]
],
"distribution" : "Mirantis-8.0",
"last_scanned" : "5/8/16",
"name" : "Mirantis-Liberty",
"mechanism_drivers" : ["OVS"],
"type_drivers" : "vxlan",
"operational" : "yes",
"type" : "environment"

Calipso hierarchical, modeled inventory: regions, projects, hosts, aggregates / zones, networks, ports, instances, vNICs, vConnectors, vEdges, vServices, pNICs, OTEPs etc.

Links and relationships analysis: Instance-vNIC, vNIC-vConnector, vConnector-vEdge, vEdge-pNIC, pNIC-OTEP, OTEP-vConnector, vService-vNIC, Network-Port etc.

Calipso cliques and topologies (cliques): Focal_point_type (ex): instance; Clique_type: [array of links]

RabbitMQ CRUD events: real-time updates

Environment_Listener A, Environment_Listener B, Environment_Listener C: event-based scan logic

Object scan: SSH, parsing, caching
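The clique idea (a focal point type plus an ordered array of link types) can be sketched as a simple walk over discovered links. This is an illustration under assumptions, not Calipso's actual implementation: the `Link` structure, the lower-case link type strings, the object names and the function name are all hypothetical.

```python
from typing import List, NamedTuple, Set


class Link(NamedTuple):
    link_type: str  # e.g. "instance-vnic"
    source: str     # id of the object the link starts from
    target: str     # id of the object the link ends at


def build_clique(focal_id: str, link_types: List[str], links: List[Link]) -> List[Link]:
    """Follow link types in order, starting from the focal point object."""
    frontier: Set[str] = {focal_id}
    found: List[Link] = []
    for link_type in link_types:
        step = [l for l in links if l.link_type == link_type and l.source in frontier]
        if not step:
            break  # the chain is broken here, which is itself useful to know
        found.extend(step)
        frontier = {l.target for l in step}
    return found


# Hypothetical clique definition for focal point type "instance".
INSTANCE_CLIQUE = ["instance-vnic", "vnic-vconnector", "vconnector-vedge", "vedge-pnic"]

LINKS = [
    Link("instance-vnic", "vm200", "tap-a"),
    Link("vnic-vconnector", "tap-a", "qbr-a"),
    Link("vconnector-vedge", "qbr-a", "br-int"),
    Link("vedge-pnic", "br-int", "ens160"),
    Link("instance-vnic", "vm201", "tap-b"),  # belongs to another instance
]

for link in build_clique("vm200", INSTANCE_CLIQUE, LINKS):
    print(link.link_type, link.source, "->", link.target)
```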

Page 27: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

[Diagram: Calipso Monitoring]

• Monitored environment: Environment A, Region X, Zone Y, Host 234

• Sensu client and transport (configured and deployed by Calipso), running Calipso Sensu checks on objects such as OTEP, vNIC, pNIC and vEdge

• Checks are customized and modeled: VPP stats/results, vNIC stats/results, LXB stats/results, OTEPs stats/results, pNICs stats/results etc.

• Sensu Server Manager (configured by Calipso), Sensu Redis DB, Sensu API, Sensu UI

• Calipso Sensu Handlers, one per environment, and a Monitoring Configurator (environment-aware)

• Real-time status and statistics for the Calipso hierarchical, modeled inventory: regions, projects, hosts, aggregates / zones, networks, ports, instances, vNICs, vConnectors, vEdges, vServices, pNICs, OTEPs etc.

• Calipso BUS, Calipso Discovery Logic

• Calipso porting to TSDB, historical reporting

• Possibly contributing to OpenStack health checks
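How a check result might end up as object status in the inventory can be sketched as a small handler. The sketch relies on the standard Sensu exit-status convention (0 OK, 1 warning, 2 critical) and on an event carrying a `check` entry with `name`, `status` and `output`; naming each check after the inventory object it covers, the status labels and the function name are assumptions made for illustration, not Calipso's actual scheme.

```python
from typing import Dict

# Exit status of a check mapped to a label stored in the inventory (labels assumed).
STATUS_LABELS = {0: "ok", 1: "warning", 2: "error"}


def apply_check_result(inventory: Dict[str, dict], event: dict) -> str:
    """Store the status carried by a check event on the matching object."""
    check = event["check"]
    object_id = check["name"]  # assumption: check is named after the object
    status = STATUS_LABELS.get(check["status"], "unknown")
    entry = inventory.setdefault(object_id, {})
    entry["status"] = status
    entry["status_text"] = check.get("output", "").strip()
    return status


# Hypothetical event for a vEdge object on one host.
inventory: Dict[str, dict] = {}
sample_event = {
    "client": {"name": "node-6"},
    "check": {"name": "vedge-node-6", "status": 2, "output": "bridge br-tun missing\n"},
}
print(apply_check_result(inventory, sample_event))
```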

Page 28: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Calipso visualization: modeled for complex virtual topologies

[Diagram: Calipso architecture]

• Sources: OpenStack, Docker, containers: ANY (*Open)Stack, ANY plugin

• Calipso Discovery: model-driven discovery engine, inventory

• Calipso Graph: connecting physical and virtual elements of cloud networking

• Calipso UI: virtual network elements, dependencies, status, stats

• API extensions for discovery/assurance

• Cloud Networking Assurance: historical trends, root cause, impact analysis

• Users: Cloud Network Administrator, Tenant Network Administrator

Page 29: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

[Diagram: Calipso data flows]

Components: OpenStack (API, DB, CLI, RabbitMQ), Discover*, MongoDB*, Monitor* (Sensu clients, Sensu checks), BUS*, UI*, API*, External App, Analysis APP, Agent for ‘Operations API’

Flow labels:

• Environment Config (Init/Setup), UI Config, Monitoring Config (Init/Setup)

• Run a Scan: scan for all data (API, DB, CLI); scan for some data (API, DB, CLI) (scheduled); scan (temp) data

• OS CRUD events, live updates, messages / updates, messages / notifications

• Setup Monitor, monitor clients + checks installation, checks results, state / statistics

• Full inventory data, full topology data, some inventory data, some topology data, inventory and topology data

* All Container-based today

Page 30: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Discovery logic successfully running on:

OVS, VLANs, GREs, VXLANs:

• "Mirantis-6.0", "Mirantis-7.0", "Mirantis-8.0", "Mirantis-9.0"

• "RDO-Mitaka", "RDO-Liberty", "RDO-Juno"

• "Devstack-liberty", "Canonical-icehouse", "Canonical-juno"

• "Canonical-liberty", "Canonical-mitaka"

• "Apex-Mitaka" (3-o), "Devstack-Mitaka"

• "packstack-7.0.0-0.10.dev1682"

• "Stratoscale-v2.1.6"

• "Mirantis-9.1"

VPP, VLANs: "RDO-Mitaka", "Apex-Mitaka"

Pre QA: Midonet, vSphere (vSwitch)

If your variance is not on this list it means we didn’t test/validate

We’d appreciate your help in adapting to more variances !
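A quick way to check an environment against this list is a lookup keyed by mechanism driver and distribution. The sketch below copies the names from the list above; the structure and the function name `is_validated` are hypothetical, and type drivers are left out for brevity.

```python
from typing import Dict, Set

# Variances listed as successfully running, grouped by mechanism driver.
VALIDATED: Dict[str, Set[str]] = {
    "OVS": {
        "Mirantis-6.0", "Mirantis-7.0", "Mirantis-8.0", "Mirantis-9.0", "Mirantis-9.1",
        "RDO-Mitaka", "RDO-Liberty", "RDO-Juno",
        "Devstack-liberty", "Devstack-Mitaka",
        "Canonical-icehouse", "Canonical-juno", "Canonical-liberty", "Canonical-mitaka",
        "Apex-Mitaka", "packstack-7.0.0-0.10.dev1682", "Stratoscale-v2.1.6",
    },
    "VPP": {"RDO-Mitaka", "Apex-Mitaka"},
}


def is_validated(distribution: str, mechanism_driver: str) -> bool:
    """Return True when the combination appears on the tested list."""
    return distribution in VALIDATED.get(mechanism_driver, set())


print(is_validated("Mirantis-8.0", "OVS"))
print(is_validated("Mirantis-8.0", "VPP"))
```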

Page 31: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Adapting to multi-environment cases !!

Page 32: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Objects in Calipso

Calipso objects for OpenStack | Calipso objects for Containers | Calipso objects for Bare Metal | Calipso objects for VMware vSphere
Region (ex: NYC, SJC) | N/A | N/A | DataCenter Cluster
Host (ex: compute node) | Server | N/A | Server
Project (ex: Coke) | Tenant | N/A | Tenant
Port | Container veth | NIC | Port-group
Zone / Aggregate (ex: B16, Floor 2 etc …) | Cluster | N/A | DataCenter
Network | Network | Network | Network

Calipso Adapters: API – OpenStack | API – Contiv, Docker | API – Cisco UCS | API – vSphere

Calipso Discovery: Through API | Through API | Through API | Through API

Calipso Monitoring: Custom Sensu Checks

Page 33: OpenStack Discovery and Networking Assurance - Koren Lev - Meetup

Objects in Calipso

Calipso objects for OpenStack | Calipso objects for Containers | Calipso objects for Bare Metal | Calipso objects for VMware vSphere
Instance / vService (ex: a VM, a DHCP srv) | Container | A Server | VM
pNIC (ex: TengigEth) | pNIC | pNIC | pNIC
vConnector (ex: Bridge) | Bridge, BDomain | N/A | Port-group
vEdge (ex: OVS, fd.io etc) | OVS, fd.io | N/A | vSwitch / NSX switch
OTEP (ex: VXLAN, GRE) | VXLAN | N/A | VXLAN
vNIC / Port | Container veth, CNI | N/A | vNIC
Network / Network Segment | Network / Network Segment | Network / Network Segment | Network

Calipso Adapters:

• OpenStack: API – OpenStack, DB – MySQL, CLI – Linux Bash / SSH

• Containers: API – Contiv, Docker, DB – ETCD, CLI – Linux Bash / SSH / Docker

• Bare Metal: API – Cisco UCS, DB –, CLI – OS specific / SSH

• VMware vSphere: API – vSphere, DB – N/A, CLI – ESXi

Calipso Discovery: Through API

Calipso Monitoring: Custom Sensu Checks