ryota mibu, nec carlos gonçalves, nec dpdk, collectd and

50
DPDK, Collectd and Ceilometer The missing link between my telco cloud and the NFV infrastructure Maryam Tahhan, Intel Emma Foley, Intel Carlos Gonçalves, NEC Ryota Mibu, NEC

Upload: others

Post on 19-Mar-2022

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

DPDK, Collectd and CeilometerThe missing link between my telco cloud and the NFV infrastructure

Maryam Tahhan, IntelEmma Foley, Intel

Carlos Gonçalves, NECRyota Mibu, NEC

Page 2: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Do you know what’s happening in your cloud?

Page 3: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Do you know what’s happening in your cloud?

Page 4: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Do you know what’s happening in your cloud?

Page 5: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

IntroductionMedium-/large-scale cloud environments account for between hundreds and hundreds of thousands of infrastructure systems.

As the size of infrastructure, traffic and virtual resources grow, so does the effort of monitoring back-ends.

Data growth is exploding across the network. There are more users and devices are connected to the network than ever before. The underlying expectation remains that services are available whenever and wherever they are needed; and that those services meet a level of quality that is acceptable to the end user.

It is vital to monitor systems for malfunctions that could lead to users’ application service disruption and promptly react to these fault events to facilitate improving overall system performance.

A key part of this includes expanding the amount of data available about the system (e.g. DPDK statistics), and improving alarming functionality in OpenStack Aodh.

Page 6: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

The Corner Stone

Telemetry is the cornerstone for:

Billing

Benchmarking

Intelligent orchestration

Fault management

Page 7: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

0

ceilometer aodh1 2

3

Page 8: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Compute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

Use case Example

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

0

ceilometer aodh1 2

3

localhost-port.0-link_status != 0

Page 9: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

0

ceilometer aodh1 2

localhost-port.0-link_status != 0

3

Page 10: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

VNF = Active

VNF = Standby

Page 11: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

localhost-port.0-link_status != 0

VNF = Active

VNF = Standby

Page 12: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

localhost-port.0-link_status != 0

X VNF = Active

VNF = Standby

Page 13: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

localhost-port.0-link_status == 0

X VNF = Active

VNF = Standby

Page 14: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1 Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

localhost-port.0-link_status == 0

X VNF = Active

VNF = Standby

Page 15: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Use case ExampleCompute node DPDK interface monitoring for Host link status and switches from active to standby service when the link goes down.

ControllerCompute Node 1(out of service)

Compute Node 2

collectd

OVS With DPDK OVS

ceilometer aodhVNF VNF

X VNF = Active

Page 16: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

FYI

Page 17: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

•••

••

The ability to measure and enforce Telco KPIs in the data-plane will be mandatory for any Telco grade NFVI implementation.

Page 18: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

•••

••

The ability to measure and enforce Telco KPIs in the data-plane will be mandatory for any Telco grade NFVI implementation.

Page 19: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Collecting DPDK Interface Statistics with collectd

SFQM Plug-Ins

Dpdkstat PluginCeilometer plugin

Provided Functionality

Example

Plug-InsConsumer/OpenStack

Platform

collectd

NB to MANO/VNFM

SFQM Plug-Ins

Provided Functionality

Example

DPDK Application

SFQM Features

Extended statistics API

Page 20: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Collecting DPDK Interface Statistics with collectd

SFQM Plug-Ins

collectddpdkstat

Plugincollectd

1. Read

2. Get stats

3. Dispatch Values

OVS With DPDK

Provided Functionality

Example

RX TX

collectd Ceilometer

Plugin

VM

Ceilometer

collectd

RX TX

5. Post Values4. Pass Values

Plug-InsConsumer/OpenStack

Platform

collectd

NB to MANO/VNFM

SFQM Plug-Ins

Provided Functionality

Example

DPDK Application

SFQM Features

Page 21: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Collectd - dpdkstat

Diagram courtesy of Harry Van Haaren

Page 22: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Collectd - dpdkstat

Diagram courtesy of Harry Van Haaren

Page 23: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Ceilometer collectd plugin

Page 24: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Future workTaking advantage of the notification plugin architecture in collectd to post an event (like link status failure or application thread failure) directly to the notification bus for immediate alarming in Aodh.

Integrating DPDK Keep Alive with collectd and enabling it to take advantage of the notification plugin to Aodh.

Performance, scalability and aggregation analysis.

Gnocchi integration

Open vSwitch (with DPDK) flow and port stats collectd plugin.

Page 25: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Project in OPNFV working on building an open-source NFVI fault management and maintenance framework to ensure Telco VNFs availability in fault and maintenance events

● Identify requirements● Gap analysis● Implementation work in upstream● Integration and testing

Consistent Resource State Awareness

Immediate Notification

Fault CorrelationExtensible Monitoring

Doctor

Page 26: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: fault management use case

Page 27: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: mapping to the OpenStack ecosystem

Page 28: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: focus of initial contributions Consistent Resource State Awareness

Immediate Notification

Page 29: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: focus of initial contributions Immediate Notification

Consistent Resource State Awareness

Page 30: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Immediate event alarming

Page 31: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

State correction

Page 32: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

From project creation to Brahmaputra release

Page 33: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor blueprints in OpenStack Liberty

Page 34: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: focus of initial contributions Consistent Resource State Awareness

Immediate Notification

Page 35: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor: extending contribution focus Consistent Resource State Awareness

Immediate Notification

Fault CorrelationExtensible Monitoring

Page 36: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Doctor blueprints in OpenStack

Page 37: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

SFQM + Doctor

Ceilometer

collectddpdkstat

Plugincollectd

1. Read

2. Get stats

3. Dispatch Values

OVS With DPDK

RX TX

collectd Ceilometer

Plugin

VM collectd

RX TX

5. Post Values4. Pass Values

Page 38: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Demo

Page 39: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

What will be demonstrated- Link status check (DPDK)- Fault detection propagation (collectd-ceilometer-plugin)- Resource state correction (Nova)- Alarm the OpenStack admin/user (Aodh)- Active-Standby service switching (User)

Page 40: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and
Page 41: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Summary“Trying to manage a complex cloud solution without a proper telemetry infrastructure in place is like trying to walk across a busy highway with blind eyes and deft ears. You have little to no idea of where the issues can come from, and no chances to take any smart move without getting in trouble”. [1]

Doctor

Painting the pedestrian crossing

Page 42: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

References

[1] https://azure.microsoft.com/en-us/blog/cloud-service-fundamentals-telemetry-basics-and-troubleshooting/

Page 43: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and
Page 44: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Backup

Page 45: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

•••

The ability to measure and enforce Telco KPIs in the data-plane will be mandatory for any Telco grade NFVI implementation.

Page 46: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

SFQM FeaturesDPDK 2.0 DPDK 2.1 DPDK 2.2

• Callback API• RX/TX timestamping

sample app.

• Extended NIC stats for ixgbe.

• proc_info

• Extended NIC stats for igb, i40e and VFs.

• Extended NIC API Alignment across drivers.

• DPDK KeepAlive

DPDK 16.04

• Primary process liveliness check.

• dpdkstat plugin – pull request

• Ceilometer plugin.

Collectd

• Retrieve statistics directly from NIC registers.

• Detect DPDK application thread failure.

• Measure packet latency through DPDK

• Detect DPDK primary process liveliness.

• Relay DPDK interface stats and status back to ceilometer.

Page 47: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Statistics we can relay to Ceilometer

collectd-ceilometer plugin

Any of the existing read plugin stats:

● Cache_size, ceph_latency, ceph_rate, cpu count, cpufreq, fan speed, fork_rate, if_collisions, if_dropped, if_errors, virt_cpu_total ...

● Full list of plugins can be found here● Full data set spec can be found here

collectd-dpdkstat plugin

● Any of the hardware NIC statistics registers for drivers that support the extended NIC stats API.

● Any of the generic SW statistics registers

● DPDK Interface Link Status.● rx_good_packets, rx_crc_errors,

tx_errors, mac_local_errors, mac_remote_errors, rx_priority0_dropped….

● Data set spec for the ixgbe driver can be found here.

Page 48: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Collectd architecture

Page 49: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and

Demo

Page 50: Ryota Mibu, NEC Carlos Gonçalves, NEC DPDK, Collectd and