Doctor: Fault Management, 18 May 2015, Ryota Mibu, NEC


Doctor: Fault Management

18 May 2015

Ryota Mibu, NEC

2

Doctor Overview

• One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)

• Goal

– Build a fault management and maintenance framework for high availability of Network Services on top of a virtualized infrastructure

– Valuable and acceptable framework for other industries

• Status

– Initial requirement study, architecture design, gap analysis: Done (see document [link])

– Collaborative Development: Started (Blueprints are proposed to Nova and Ceilometer)

– Standardization Sync: On-going (by NFV member efforts, joint meeting)

3

Use Case 1: Fault management

4

Use Case 2: Maintenance

5

High Level Architecture

[Figure: layered architecture diagram. Labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources).]

6

Fault Management Sequence

[Figure: the high level architecture diagram annotated with a fault management sequence. Labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Detection; Reaction; Doctor Initial Focus.]

8

Key Requirements as VIM

• Immediate Notification

• Consistent Resource State Awareness

• Extensible Monitoring

• Fault Correlation

9

TO-BE: Functional Blocks

[Figure: target architecture diagram. Labels: Applications (App, App, App); VIM User and Administrator; VIM; Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); functional blocks: Monitor, Inspector, Controller, Notifier.]

10

Fault Management Scenarios (1/2)

[Figure: fault management scenario diagram. Labels: Monitor (several); Inspector; Controller (several); Resource Map; Notifier; Alarm Conf.; Failure Policy; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure.]

Numbered arrows in the diagram:

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all

4. (alt) Notify

5. Notify Error

6-. Action
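The numbered arrows on this slide (Set Alarm, Raw Failure, Find Affected, Update State, Notify, Notify Error) can be read as a small event pipeline. The following is only a minimal in-memory sketch of that reading: the class and method names, the host and VM names, and the choice to let the Inspector call the Notifier directly (the "4. (alt) Notify" path) are all assumptions made for illustration, since the transcript does not preserve which block each arrow connects.

```python
from collections import defaultdict


class Controller:
    """Hypothetical controller: owns the resource map and resource states."""

    def __init__(self, resource_map):
        self.resource_map = resource_map  # host name -> list of VM names
        self.state = {vm: "active" for vms in resource_map.values() for vm in vms}

    def update_state(self, vm, new_state):  # step 3: Update State
        self.state[vm] = new_state


class Notifier:
    """Hypothetical notifier: stores alarm configuration, delivers events."""

    def __init__(self):
        self.alarms = defaultdict(list)  # VM name -> list of callbacks

    def set_alarm(self, vm, callback):  # step 0: Set Alarm
        self.alarms[vm].append(callback)

    def notify(self, vm, event):  # steps 4 (alt) and 5
        for callback in self.alarms[vm]:
            callback(vm, event)


class Inspector:
    """Hypothetical inspector: maps a raw failure to affected resources."""

    def __init__(self, controller, notifier):
        self.controller = controller
        self.notifier = notifier

    def on_raw_failure(self, host):  # step 1: Raw Failure
        affected = self.controller.resource_map.get(host, [])  # step 2
        for vm in affected:
            self.controller.update_state(vm, "error")
            self.notifier.notify(vm, "error")
        return affected


received = []
controller = Controller({"host-a": ["vm-a1", "vm-a2"], "host-b": ["vm-b1"]})
notifier = Notifier()
notifier.set_alarm("vm-a1", lambda vm, event: received.append((vm, event)))
inspector = Inspector(controller, notifier)
inspector.on_raw_failure("host-a")
print(received)  # [('vm-a1', 'error')]
```

Only the VM with an alarm set produces a notification, while every VM on the failed host has its state updated, which is one way to picture the "Consistent Resource State Awareness" requirement next to "Immediate Notification".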

11

Fault Management Scenarios (2/2)

[Figure: fault management scenario diagram. Labels: Monitor (several); Inspector; Controller (several); Resource Map; Notifier; Alarm Conf.; Failure Policy; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure.]

Numbered arrows in the diagram:

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all

4. (alt) Notify

5. Notify Error

6-. Action

12

AS-IS: OpenStack Kilo (1/3)

• How can you find faults as a tenant user?

– Keep-alive check to each VM

– Polling VM state via the Nova API

– Set an alarm on the metering service (e.g. CPU runtime)
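As an illustration of the polling option, here is a minimal sketch of a tenant-side polling loop. It does not call the Nova API; `get_state` is a hypothetical callable standing in for whatever client call returns the server status, and the interval and poll count are made-up values. The status strings `ACTIVE` and `ERROR` follow Nova's server status naming.

```python
import time


def poll_until_fault(get_state, interval_sec, max_polls, sleep=time.sleep):
    """Poll get_state() until it returns something other than 'ACTIVE'.

    Returns (state, polls_used), or (None, max_polls) if no fault was seen.
    """
    for attempt in range(1, max_polls + 1):
        state = get_state()
        if state != "ACTIVE":
            return state, attempt
        sleep(interval_sec)
    return None, max_polls


# Simulated VM that reports a fault on the third poll. A real client would
# query the Nova API here instead of reading from a list.
states = iter(["ACTIVE", "ACTIVE", "ERROR"])
result = poll_until_fault(lambda: next(states), interval_sec=10,
                          max_polls=5, sleep=lambda seconds: None)
print(result)  # ('ERROR', 3)
```

With this approach the user learns about a fault no sooner than the next poll, so the detection delay grows with the polling interval.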

13

AS-IS: OpenStack Kilo (2/3)

• How does the metering service work?

1. A resource controller such as Nova monitors resource usage [periodically]

2. Samples are fetched from the resource controller and registered in the DB [periodically]

3. Alarm definitions are evaluated against the samples [periodically]

4. An alarm is raised depending on the result of the evaluation

[Figure: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat) and Samples, with arrows numbered 1 to 4.]
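Because steps 1 to 3 each run on their own periodic schedule, a fault has to wait for the next run of every stage in turn before an alarm is raised. The sketch below models that chaining under stated assumptions: each stage is a pure periodic task with a fixed period and start offset, and the periods and offsets used are invented for illustration, not Ceilometer defaults.

```python
import math


def next_tick(t, period, offset=0):
    """First time >= t at which a task with this period and offset runs."""
    k = math.ceil((t - offset) / period)
    return offset + max(k, 0) * period


def detection_time(fault_at, stages):
    """Time at which the last periodic stage has processed the fault."""
    t = fault_at
    for period, offset in stages:
        t = next_tick(t, period, offset)
    return t


# Hypothetical (period, offset) pairs in seconds for steps 1, 2 and 3.
stages = [(5, 0), (30, 10), (30, 20)]
fault_at = 11
print(detection_time(fault_at, stages) - fault_at)  # 39
```

In this model the delay is bounded by the sum of the periods (65 seconds here), which is why purely periodic evaluation sits poorly with an immediate-notification requirement.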

14

AS-IS: OpenStack Kilo (3/3)

• Notification

– OpenStack components post events to the messaging queue

– Ceilometer collects, transforms and publishes those events, which can be used for billing

[Figure: NFVI, Nova, Neutron, Cinder, Queue, Ceilometer, Samples, (Billing).]

15

Implementation Plan in OpenStack

[Figure: implementation plan diagram. Labels: Virtualized Infrastructure; Applications; VIM User and Administrator; Zabbix; Error Injection; Plugin ?; Inspector; Nova; Queue; Ceilometer; Event Alarm; Immediate Notification.]

16

Demo (1/3)

• User Scenario

[Figure: user scenario diagram. Labels: HTTP Clients (x3); Public Net; Load Balancer; Private Net; Web Server (x3); Launch New VM.]

17

Demo (2/3)

• Demo 1

1. Collect CPU time samples

2. Alarm Heat if CPU runtime = 0

3. Create New Web Server

[Figure: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat), Samples.]

• Demo 2

1. Hook

2. Notify as Event

3. Alarm Heat

[Figure: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat), Agent, Alarm.]

18

Demo (3/3) Results

• Demo 1: 90 sec

• Demo 2: 26 sec

19

Doctor Southbound API

[Figure: southbound API diagram. Labels: User; NFVI; Admin; Conf.; Policy; Controller; Inspector; Notifier; Monitor (x3); Configuration; Fault Messaging; Unified Event API; Threshold; Enable (x2).]

20

Case 1: Obvious Fault

[Figure: message sequence diagram. Roles: User; NFVI; Admin; Conf.; Policy; Controller; Inspector; Notifier; Monitor. Components: BMC; Zabbix; (Inspector); Nova; Ceilometer; User. Lanes: Configuration; Fault Messaging.]

Messages shown in the diagram:

– SNMP Trap (Power-off)

– HTTP POST (Host A down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Alarm)

– Enable (x2)
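The message labels on this slide suggest a fan-out: one host-level report is expanded into the affected VMs, then into per-VM events, and finally into alerts for users who created an alarm. The sketch below only builds the payloads for that chain and sends nothing over HTTP. The field names, the resource map, and the alarm URL are hypothetical and are not taken from any Doctor, Nova or Ceilometer API.

```python
import json

# Hypothetical data: which VMs run on which host, and which VMs have an
# alarm registered by a user (the "Create Alarm" request on the slide).
RESOURCE_MAP = {"host-a": ["vm-a1", "vm-a2", "vm-a3"]}
ALARMS = {"vm-a1": "http://user.example/alarm"}


def expand_host_down(host):
    """Host A down -> Host A down plus the VMs affected by it."""
    return {"host": host, "state": "down", "vms": RESOURCE_MAP.get(host, [])}


def per_vm_events(report):
    """Split the expanded report into one event per affected VM."""
    return [{"vm": vm, "state": "down"} for vm in report["vms"]]


def alerts(events):
    """Build (url, body) pairs for VMs that have an alarm configured."""
    return [(ALARMS[e["vm"]], json.dumps({"alert": f'{e["vm"]} down'}))
            for e in events if e["vm"] in ALARMS]


report = expand_host_down("host-a")
outgoing = alerts(per_vm_events(report))
print(outgoing)  # [('http://user.example/alarm', '{"alert": "vm-a1 down"}')]
```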

21

Case 2: Threshold Exceeded Fault (Admin Config)

[Figure: message sequence diagram. Roles: User; NFVI; Admin; Conf.; Policy; Controller; Inspector; Notifier; Monitor. Components: vSwitch; collectd; Zabbix; Monitor Agent; (Inspector); Nova; Ceilometer; User. Lanes: Configuration; Fault Messaging.]

Messages shown in the diagram:

– HTTP POST (Switch down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Alarm)

– Admin Threshold

– Threshold

– Enable (x2)
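In this case the fault is not a single obvious event but a metric crossing a threshold set by the administrator. A minimal sketch of such a check follows. The metric, the threshold value, and the rule of requiring several consecutive violations before reporting are all assumptions for illustration; the slide does not say how collectd or Zabbix are configured.

```python
def check_threshold(samples, threshold, consecutive=3):
    """Return the index of the sample at which `consecutive` samples in a
    row have exceeded `threshold`, or None if that never happens."""
    run = 0
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return i
    return None


# Hypothetical per-interval error counts reported for a vSwitch.
error_counts = [1, 12, 15, 3, 11, 14, 19]
print(check_threshold(error_counts, threshold=10))  # 6
```

Requiring a run of violations trades detection speed for fewer false alarms; setting `consecutive=1` reports on the first sample above the threshold.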

22

Backup

23

Fault Management Sequence (Optional)

[Figure: the high level architecture diagram annotated with a fault management sequence. Labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Detection; Reaction; Auto Reaction.]

24

Fault Management Scenarios (Optional)

[Figure: fault management scenario diagram with an added Auto Reaction block. Labels: Monitor (several); Inspector; Controller (several); Resource Map; Notifier; Alarm Conf.; Failure Policy; Auto Reaction; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure.]

Numbered arrows in the diagram:

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all

4. (alt) Notify

5. Notify Error

6-. Action

25

Configuration / Policy Enforcement

• Option 1: Policy Service Integration

• Option 2: Using Metadata in Controller

[Figure: configuration and policy enforcement diagram. Labels: User; NFVI; Admin; Conf.; Policy; Policy Service; Controller; Inspector; Notifier; Monitor; Metadata (x2); Threshold (x2); Enable (x2); Configuration; Fault Messaging.]

26

Case 3: Threshold Exceeded Fault (User Config)

[Figure: message sequence diagram. Roles: User; NFVI; Admin; Conf.; Policy; Controller; Inspector; Notifier; Monitor; Policy Service. Components: vSwitch; collectd; Zabbix; Monitor Agent; (Inspector); Nova; Ceilometer; Congress; User. Other labels: Metadata; Policy. Lanes: Configuration; Fault Messaging.]

Messages shown in the diagram:

– HTTP POST (Switch down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Resource with Policy Label)

– HTTP POST (Set Policy)

– HTTP POST (Data)

– Threshold (x2)

– Enable (x2)
