1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC

Upload: jayson-hart

Post on 22-Dec-2015


Page 1

Doctor Fault Management

18 May 2015

Ryota Mibu, NEC

Page 2

Doctor Overview

• One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)

• Goal

– Build fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure

– Make the framework valuable and acceptable for other industries

• Status

– Initial requirement study, architecture design, gap analysis: Done (see document [link])

– Collaborative development: Started (blueprints proposed to Nova and Ceilometer)

– Standardization sync: Ongoing (through NFV member efforts and joint meetings)

Page 3

Use Case 1: Fault management

Page 4

Use Case 2: Maintenance

Page 5

High Level Architecture

[Diagram labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources]

Page 6

Fault Management Sequence

[Diagram labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources; phases: Detection, Reaction; Doctor Initial Focus]

Page 7

Key Requirements for the VIM

• Immediate Notification

• Consistent Resource State Awareness

• Extensible Monitoring

• Fault Correlation

Page 8

TO-BE: Functional Blocks

[Diagram labels: Applications (App, App, App); VIM User and Administrator; VIM; Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources; functional blocks: Notifier, Monitor, Controller, Inspector]

Page 9

Fault Management Scenarios (1/2)

[Diagram labels: Monitor (x3), Inspector, Controller (x3), Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure; stores: Alarm Conf., Resource Map, Failure Policy]

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all / 4. (alt) Notify

5. Notify Error

6-. Action
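The numbered steps on this slide (0. Set Alarm through 6-. Action) can be traced in a few lines of Python. This is a minimal sketch, not Doctor code: the class and method names, the failure type strings and the "error" state are hypothetical, and only steps 0 to 5 are modeled; step 6 (Action) is left to whoever receives the notification. In this sketch the Inspector holds the failure policy and the resource map, the Controller updates state, and the Notifier forwards errors only to consumers that set an alarm.

```python
from dataclasses import dataclass, field


@dataclass
class Notifier:
    # "Alarm Conf.": consumer -> IDs of resources it set an alarm on (step 0)
    alarm_conf: dict = field(default_factory=dict)
    # Log of (consumer, resource_id, state) notifications sent in step 5
    sent: list = field(default_factory=list)

    def set_alarm(self, consumer: str, resource_id: str) -> None:
        self.alarm_conf.setdefault(consumer, set()).add(resource_id)

    def notify_all(self, resource_id: str, state: str) -> None:
        # Step 5 "Notify Error": tell every consumer whose alarm matches
        for consumer, watched in self.alarm_conf.items():
            if resource_id in watched:
                self.sent.append((consumer, resource_id, state))


@dataclass
class Controller:
    notifier: Notifier
    states: dict = field(default_factory=dict)

    def update_state(self, resource_id: str, state: str) -> None:
        # Step 3 "Update State", then step 4 "Notify all"
        self.states[resource_id] = state
        self.notifier.notify_all(resource_id, state)


@dataclass
class Inspector:
    controller: Controller
    # "Resource Map": physical resource -> virtual resources running on it
    resource_map: dict
    # "Failure Policy": raw failure types treated as faults (hypothetical)
    failure_policy: set

    def on_raw_failure(self, physical_id: str, failure_type: str) -> None:
        # Step 1 "Raw Failure" arrives from a Monitor
        if failure_type not in self.failure_policy:
            return
        # Step 2 "Find Affected" virtual resources via the resource map
        for vm_id in self.resource_map.get(physical_id, []):
            self.controller.update_state(vm_id, "error")


if __name__ == "__main__":
    notifier = Notifier()
    notifier.set_alarm("user-side-manager", "vm-a1")  # step 0
    controller = Controller(notifier)
    inspector = Inspector(controller, {"host-a": ["vm-a1", "vm-a2"]}, {"power-off"})
    inspector.on_raw_failure("host-a", "power-off")   # step 1
    print(controller.states)  # {'vm-a1': 'error', 'vm-a2': 'error'}
    print(notifier.sent)      # [('user-side-manager', 'vm-a1', 'error')]
```

Both VMs on `host-a` change state, but only `vm-a1` produces a notification, because only that resource had an alarm set in step 0.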

Page 10

Fault Management Scenarios (2/2)

[Diagram labels: Monitor (x3), Inspector, Controller (x3), Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure; stores: Alarm Conf., Resource Map, Failure Policy]

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all / 4. (alt) Notify

5. Notify Error

6-. Action

Page 11

AS-IS: OpenStack Kilo (1/3)

• How can you find faults as a tenant user?

– Keep-alive checks to each VM

– Polling VM state via the Nova API

– Setting an alarm on the metering service (e.g. on CPU runtime)

Page 12

AS-IS: OpenStack Kilo (2/3)

• How does the metering service work?

1. A resource controller such as Nova monitors resource usage [periodically]

2. Samples are fetched from the resource controller and stored in the DB [periodically]

3. Alarm definitions are evaluated against the samples [periodically]

4. An alarm is raised depending on the result of the evaluation

[Diagram labels: Machine, Hypervisor, VM, Nova, Ceilometer (Heat), Samples; arrows numbered 1. to 4.]
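The four periodic steps explain why polling-based detection is slow: a fault is only seen after a new sample arrives and the next evaluation runs. The sketch below simulates that loop under stated assumptions: one tick per second, cumulative CPU time growing by 1 per second until the fault, "CPU runtime = 0" interpreted as no increase between the two latest samples, and arbitrary interval values that are not Ceilometer defaults. `detection_time` is a hypothetical helper.

```python
from typing import Optional


def detection_time(fault_at: int, poll_interval: int, eval_interval: int,
                   horizon: int = 10_000) -> Optional[int]:
    """Return the second at which a "CPU runtime = 0" alarm first fires, or None.

    Assumption: the VM's cumulative CPU time grows by 1 per second until
    `fault_at`, then stops growing.
    """
    samples = []  # cumulative CPU time, one entry per poll (steps 1-2)
    for t in range(horizon + 1):
        if t % poll_interval == 0:
            samples.append(min(t, fault_at))
        # Step 3: periodic evaluation. Step 4: raise the alarm if no CPU time
        # was used between the two most recent samples.
        if t % eval_interval == 0 and len(samples) >= 2 and samples[-1] == samples[-2]:
            return t
    return None


if __name__ == "__main__":
    for poll, ev in [(60, 60), (10, 10), (10, 30)]:
        t = detection_time(fault_at=100, poll_interval=poll, eval_interval=ev)
        print(f"poll={poll}s eval={ev}s -> alarm at t={t}s (delay {t - 100}s)")
```

With a fault at t=100 s, these illustrative intervals make the alarm fire 80 s, 10 s and 20 s after the fault respectively, so the detection delay depends on both the polling and the evaluation period.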

Page 13

AS-IS: OpenStack Kilo (3/3)

• Notification

– OpenStack components post events to a messaging queue

– Ceilometer collects, transforms, and publishes those events, which can be used for billing

[Diagram labels: NFVI, Nova, Neutron, Cinder, Queue, Ceilometer (Billing), Samples]
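The notification path can be sketched with Python's standard `queue.Queue` standing in for the message bus. The event type strings are modeled on OpenStack notification names, but the payload is reduced to a hypothetical `resource_id` field, and `drain_and_transform` is an illustrative helper, not a Ceilometer API. The point is that events already flow through a queue; here they are only turned into samples that could be used for billing, not into fault alerts.

```python
import queue


def drain_and_transform(events: queue.Queue) -> list:
    """Consume every queued notification and turn it into a billing-style sample."""
    samples = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # queue drained
        service = event["event_type"].split(".", 1)[0]
        samples.append({
            "meter": event["event_type"],
            "source": service,  # e.g. "compute" for Nova, "volume" for Cinder
            "resource_id": event["payload"]["resource_id"],  # simplified payload
            "timestamp": event["timestamp"],
        })
    return samples


if __name__ == "__main__":
    q = queue.Queue()
    # Components "post events to a messaging queue" (illustrative data)
    q.put({"event_type": "compute.instance.create.end",
           "payload": {"resource_id": "vm-1"}, "timestamp": "2015-05-18T10:00:00Z"})
    q.put({"event_type": "volume.create.end",
           "payload": {"resource_id": "vol-1"}, "timestamp": "2015-05-18T10:00:05Z"})
    for sample in drain_and_transform(q):
        print(sample)
```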

Page 14

Implementation Plan in OpenStack

[Diagram labels: Ceilometer, Virtualized Infrastructure, Applications, Zabbix, VIM User and Administrator, Error Injection, Plugin ?, Event Alarm, Immediate Notification, Queue, Inspector, Nova]

Page 15

Demo (1/3)

• User Scenario

[Diagram labels: HTTP Clients (x3), Public Net, Load Balancer, Private Net, Web Server (x3), Launch New VM]

Page 16

Demo (2/3)

• Demo 1

[Diagram labels: Machine, Hypervisor, VM, Nova, Ceilometer (Heat), Samples]

1. Collect CPU time samples

2. Alarm Heat if CPU runtime = 0

3. Create New Web Server

• Demo 2

[Diagram labels: Machine, Hypervisor, VM, Agent, Nova, Ceilometer (Heat), Alarm]

1. Hook

2. Notify as Event

3. Alarm Heat
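Demo 2 replaces periodic sampling with an event: a hook agent reports the failure and an alarm on that event triggers Heat. A minimal sketch of such an event alarm follows; `EventAlarm`, `dispatch`, the `vm.down` event name and the callback that stands in for Heat are all hypothetical. Unlike the threshold alarm in Demo 1, nothing here waits for a sampling or evaluation period, so the reaction starts as soon as the event is delivered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EventAlarm:
    event_type: str                 # hypothetical event name, e.g. "vm.down"
    resource_id: str                # resource the alarm watches
    action: Callable[[dict], None]  # stands in for "Alarm Heat"


def dispatch(event: dict, alarms: List[EventAlarm]) -> int:
    """Fire every alarm matching the event immediately; return how many fired."""
    fired = 0
    for alarm in alarms:
        if event["event_type"] == alarm.event_type and event["resource_id"] == alarm.resource_id:
            alarm.action(event)
            fired += 1
    return fired


if __name__ == "__main__":
    launched = []
    alarms = [EventAlarm("vm.down", "web-1",
                         lambda e: launched.append(f"replacement-for-{e['resource_id']}"))]
    # Step 2 "Notify as Event": the hook agent reports the failure as it happens
    dispatch({"event_type": "vm.down", "resource_id": "web-1"}, alarms)
    print(launched)  # ['replacement-for-web-1']
```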

Page 17

Demo (3/3) Results

• Demo 1: 90 sec

• Demo 2: 26 sec

Page 18

Doctor Southbound API

[Diagram labels: User, NFVI, Conf. Policy, Controller, Inspector, Notifier, Admin Conf., Monitor (x3), Configuration, Fault Messaging, Unified Event API, Threshold, Enable (x2)]
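The Unified Event API label suggests that every Monitor, whatever its native protocol, hands the Inspector events in one common shape. The sketch below illustrates that idea with two adapters, one for a decoded SNMP trap from a BMC and one for a threshold check from a collectd-style agent. The `FaultEvent` fields, the severity values and both adapter functions are assumptions for illustration; the slide does not define an event schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical unified event record sent from any Monitor to the Inspector."""
    source: str       # reporting monitor, e.g. "bmc" or "collectd"
    resource_id: str  # physical resource the event is about
    fault_type: str   # normalized fault name
    severity: str     # assumed two-level scale: "critical" or "warning"


def from_snmp_trap(host: str, trap: str) -> FaultEvent:
    # Assumes the trap has already been decoded to a short name such as "power-off"
    return FaultEvent(source="bmc", resource_id=host, fault_type=trap, severity="critical")


def from_threshold(resource_id: str, metric: str, value: float,
                   limit: float) -> Optional[FaultEvent]:
    # Emit an event only when the configured threshold is exceeded
    if value <= limit:
        return None
    return FaultEvent(source="collectd", resource_id=resource_id,
                      fault_type=f"{metric}-threshold-exceeded", severity="warning")


if __name__ == "__main__":
    print(from_snmp_trap("host-a", "power-off"))
    print(from_threshold("vswitch-1", "packet-drop", 12.0, 10.0))
```

Because both adapters return the same frozen record type, the Inspector side only needs to understand one format regardless of how many kinds of Monitor are plugged in.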

Page 19

Case 1: Obvious Fault

[Diagram labels: User, NFVI, Conf. Policy, Controller, Inspector, Notifier, Admin Conf., Monitor, Configuration, Fault Messaging, Enable (x2); components: BMC, Zabbix (Inspector), Nova, Ceilometer, User]

– SNMP Trap (Power-off)

– HTTP POST (Host A down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Alarm)

Page 20

Case 2: Threshold Exceeded Fault (Admin Config)

[Diagram labels: User, NFVI, Conf. Policy, Controller, Inspector, Notifier, Admin Conf., Monitor, Configuration, Fault Messaging, Threshold, Admin Threshold, Enable (x2); components: vSwitch, collectd, Monitor Agent, Zabbix (Inspector), Nova, Ceilometer, User]

– HTTP POST (Switch down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Alarm)
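In Case 2 a single "Switch down" report becomes "Host A down, VM A1-A3 down", which is fault correlation over a dependency map. A breadth-first traversal is one simple way to compute this; the sketch assumes a hypothetical topology dictionary mapping each resource to the resources that depend on it, and `affected_resources` is an illustrative name rather than part of any Doctor component.

```python
from collections import deque


def affected_resources(failed: str, depends_on_me: dict) -> list:
    """Return every resource that transitively depends on `failed`, in BFS order."""
    seen = {failed}
    order = []
    todo = deque([failed])
    while todo:
        current = todo.popleft()
        for child in depends_on_me.get(current, []):
            if child not in seen:  # skip resources already reached
                seen.add(child)
                order.append(child)
                todo.append(child)
    return order


if __name__ == "__main__":
    # Hypothetical topology: Host A hangs off switch-1, VMs A1-A3 run on Host A
    topology = {"switch-1": ["host-a"], "host-a": ["vm-a1", "vm-a2", "vm-a3"]}
    print(affected_resources("switch-1", topology))
```

The `seen` set keeps the traversal from looping if the map ever contains a cycle, and BFS order reports the nearest dependents (the host) before the VMs behind it.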

Page 21

Backup

Page 22

Fault Management Sequence (Optional)

[Diagram labels: Applications (App, App, App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources; phases: Detection, Reaction, Auto Reaction]

Page 23

Fault Management Scenarios (Optional)

[Diagram labels: Monitor (x3), Inspector, Controller (x3), Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure, Auto Reaction; stores: Alarm Conf., Resource Map, Failure Policy]

0. Set Alarm

1. Raw Failure

2. Find Affected

3. Update State

4. Notify all / 4. (alt) Notify

5. Notify Error

6-. Action

Page 24

Configuration / Policy Enforcement

[Diagram labels: User, NFVI, Conf. Policy, Inspector, Notifier, Admin, Policy Service, Conf., Monitor, Configuration, Fault Messaging, Controller, Policy, Metadata (x2), Threshold (x2), Enable (x2)]

• Option 1: Policy Service Integration

• Option 2: Using Metadata in Controller

Page 25

Case 3: Threshold Exceeded Fault (User Config)

[Diagram labels: User, NFVI, Conf. Policy, Controller, Inspector, Notifier, Admin Conf., Monitor, Configuration, Fault Messaging, Admin, Policy Service, Policy, Metadata, Threshold (x2), Enable (x2); components: vSwitch, collectd, Monitor Agent, Zabbix (Inspector), Nova, Ceilometer, Congress, User]

– HTTP POST (Switch down)

– HTTP POST (Host A down, VM A1-A3 down)

– HTTP POST (VM A1 down)

– HTTP POST (Alert: VM A1 down)

– HTTP POST (Create Resource with Policy Label)

– HTTP POST (Set Policy)

– HTTP POST (Data)
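Case 3 lets the user choose a threshold indirectly: a policy is set in a policy service (Congress on the slide), and the user creates a resource carrying a policy label. The sketch below resolves a threshold from such a label and falls back to an admin default. A plain dictionary stands in for the policy service, and the `policy` metadata key, the label names and both functions are hypothetical; no Congress API is modeled.

```python
from typing import Dict, Optional


def resolve_threshold(metadata: Dict[str, str], policies: Dict[str, float],
                      admin_default: float) -> float:
    """Pick the threshold for a resource from its policy label, else the admin default."""
    label: Optional[str] = metadata.get("policy")  # hypothetical metadata key
    if label is not None and label in policies:
        return policies[label]
    return admin_default


def is_fault(value: float, metadata: Dict[str, str], policies: Dict[str, float],
             admin_default: float) -> bool:
    # A monitored value above the resolved threshold is reported as a fault
    return value > resolve_threshold(metadata, policies, admin_default)


if __name__ == "__main__":
    # "Set Policy": label -> threshold (a dict stands in for the policy service)
    policies = {"strict": 1.0, "relaxed": 20.0}
    # "Create Resource with Policy Label": the user tags the resource
    vm_metadata = {"policy": "strict"}
    print(is_fault(5.0, vm_metadata, policies, admin_default=10.0))  # True
    print(is_fault(5.0, {}, policies, admin_default=10.0))           # False
```

Unknown or missing labels fall back to the admin threshold, so the Case 2 behaviour (admin config) remains the default and user policy only overrides it where a label is present.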