opensaf symposium - intro to opensaf_9.13.11

Introduction to OpenSAF, David Fick, Senior Software Architect, GoAhead Software

Upload: opensaf-foundation

Post on 13-Jul-2015


Page 1: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Introduction to OpenSAF

David Fick, Senior Software Architect

GoAhead Software

Page 2: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Introduction to OpenSAF

• Service availability and high availability systems and concepts have been around for decades

• However, HA terminology tends to vary from industry to industry and company to company

• Goals of this session:

– High-level technical overview of the Service Availability™ Forum standards

– Overview of the support of those standards within OpenSAF

– Allow you to:

• Familiarize yourself with general HA concepts and terminology OR

• Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions

– Resources for getting started with OpenSAF

Page 3: OpenSAF Symposium - Intro to OpenSAF_9.13.11

SA Forum Interfaces: AIS & HPI

[Figure: layered SA Forum architecture diagram. Labels: Applications; Service Availability Middleware; Application Interface Specifications (AIS); Hardware Platform Interface (HPI); Operating System; Virtualization; Hardware Platforms A, B, C, and D; System Management; SAF Standards Implemented by OpenSAF.]

AIS services shown in the figure: Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM), Information Model Mgmt (IMM), Log (LOG), Notification (NTF), Software Mgmt Framework (SMF), Checkpoint (CKPT), Event (EVT), Message (MSG), Lock (LCK)

Page 4: OpenSAF Symposium - Intro to OpenSAF_9.13.11

But how to make sense of the

SA Forum “acronym soup”?

Page 5: OpenSAF Symposium - Intro to OpenSAF_9.13.11

AIS Service Groupings

• First, understand that the AIS services fall into three logical groupings*:

– System Management Services
• Services that manage central system capabilities commonly used by both AIS services and applications
• Information Model Mgmt (IMM), Log (LOG), Notification (NTF), Software Mgmt Framework (SMF)

– Resource Availability Management Services
• Services that manage and monitor the state of key system resources that affect availability: hardware / operating system, cluster nodes, applications
• Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM)

– Application Services
• Optional services to support application operations such as inter-process communication, state replication, and shared resource access control
• Checkpoint (CKPT), Event (EVT), Message (MSG), Lock (LCK)

* - Not official SA Forum AIS service groupings
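As a quick reference, the three groupings on this slide can be written down as a small lookup table. This is only a sketch: the dictionary and the `grouping_of` helper are hypothetical names introduced here, and the groupings themselves are the presenter's, not official SA Forum terms.

```python
# Hypothetical lookup table built from the groupings on this slide.
# The grouping names are the presenter's, not official SA Forum terms.
AIS_GROUPINGS = {
    "System Management Services": ["IMM", "LOG", "NTF", "SMF"],
    "Resource Availability Management Services": ["AMF", "CLM", "PLM"],
    "Application Services": ["CKPT", "EVT", "MSG", "LCK"],
}

def grouping_of(service):
    """Return the grouping that contains the given AIS service acronym."""
    for grouping, services in AIS_GROUPINGS.items():
        if service.upper() in services:
            return grouping
    raise KeyError(f"unknown AIS service: {service}")

print(grouping_of("ckpt"))
```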

Page 6: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Fault Management Cycle

• Second, AIS services that manage availability are designed around a standard fault management cycle

– Detection
• E.g. component healthchecks

– Isolation
• E.g. blade power off

– Recovery
• E.g. failover of workload assignments to associated standby resources

– Repair
• E.g. automatic restart of failed resource

– Notification
• E.g. state change notifications sent by service managing the resource

[Figure: cycle of Detection, Isolation, Recovery, Repair, Notification]
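The cycle above can be sketched as an ordered enumeration. This is an illustration only: the `Phase` enum and `run_cycle` function are hypothetical, the example strings paraphrase the slide, and no real AIS service is invoked.

```python
from enum import Enum

class Phase(Enum):
    # Members are declared in the order the slide lists the phases.
    DETECTION = "component healthcheck fails"
    ISOLATION = "blade powered off"
    RECOVERY = "workload failed over to standby"
    REPAIR = "failed resource restarted"
    NOTIFICATION = "state change notification sent"

def run_cycle(resource):
    """Walk the phases in declaration order and return a trace of each step."""
    return [f"{resource}: {phase.name.lower()} ({phase.value})" for phase in Phase]

for line in run_cycle("blade-3"):
    print(line)
```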

Page 7: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Resource Dependencies

• Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types

– Managed Applications
• Simple to complex dependencies and relationships can be modeled between the various software elements
• Dependency on a particular node also modeled

– AMF Node
• Represents a node where AMF services are provided
• Depends on a CLM node

– CLM Node
• Represents a cluster node where AIS services are provided
• Depends on an Execution Environment (optional)

– Platform Resource
• Containment and logical dependencies represented between platform resources
• Execution Environment (EE)
– Represents an operating system instance (standalone or virtual)
• Hardware Element (HE)
– Represents a physical hardware resource in the system

[Figure: dependency diagram linking Managed Applications, AMF Node, CLM Node, and Platform Resource (Execution Environment, Hardware Element)]
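To make the dependency chain concrete, here is a minimal sketch that answers "which resources are affected if this one fails?". The resource names are hypothetical, and the link from the execution environment to a hardware element is an assumption about one possible platform model rather than something the slide states.

```python
# Hypothetical resource names; each entry maps a resource to what it depends on.
DEPENDS_ON = {
    "app-1": "amf-node-1",       # managed application -> AMF node
    "amf-node-1": "clm-node-1",  # AMF node -> CLM node
    "clm-node-1": "ee-1",        # CLM node -> execution environment (optional)
    "ee-1": "he-1",              # assumed: execution environment -> hardware element
    "he-1": None,
}

def affected_by(failed):
    """Return every resource whose dependency chain passes through `failed`."""
    affected = []
    for resource in DEPENDS_ON:
        current = DEPENDS_ON[resource]
        while current is not None:
            if current == failed:
                affected.append(resource)
                break
            current = DEPENDS_ON[current]
    return affected

print(affected_by("ee-1"))
```

Walking the chain upward like this is why a single hardware or operating system fault can translate into workload reassignment at the application level.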

Page 8: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Common Design Patterns

• Fourth, the AIS services follow common design patterns:

– API

• Common library lifecycle

• Naming conventions

– Resource managed by service → Managed object

• Typically with associated state model

• Managed objects stored in common information model

– Administrative operations

• X.731 style administrative operations for resources which affect availability

– Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
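The "common library lifecycle" item can be pictured as an open, dispatch, close pattern shared by every service library. The sketch below is a Python analogy under that assumption; `ServiceSession` and its methods are hypothetical and are not the AIS C API.

```python
class ServiceSession:
    """Hypothetical stand-in for a per-service library handle."""

    def __init__(self, service):
        self.service = service
        self.callbacks = []   # pending callbacks queued for the application
        self.open = False

    def __enter__(self):
        self.open = True      # plays the role of an initialize call
        return self

    def dispatch(self):
        """Run all pending callbacks and return how many ran."""
        ran = 0
        while self.callbacks:
            self.callbacks.pop(0)()
            ran += 1
        return ran

    def __exit__(self, *exc):
        self.open = False     # plays the role of a finalize call
        return False

with ServiceSession("AMF") as session:
    session.callbacks.append(lambda: print("healthcheck callback"))
    session.dispatch()
```

Because every service follows the same shape, an application that has integrated one service can reuse the same event loop structure for the others.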

Page 9: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Resource Availability Management Services

• Availability Management Framework (AMF)
– Manages the lifecycle and monitors the state of the managed applications within the system
– More detail in upcoming slides

• Cluster Membership (CLM)
– Provides cluster membership change notifications to AIS services and interested applications
– OpenSAF CLM implements cluster management protocol dealing with:
• Cluster formation
• Active controller selection & failover
• Node failure detection

• Platform Management (PLM)
– Manages the state of modeled hardware elements and execution environments (operating system instances)
– Hardware element states and events accessed through Hardware Platform Interface (HPI)
– Manages graceful blade extraction / de-activation cases
– Supports hardware element controls (power on/off and reset)
– Optional service within OpenSAF

[Figure: AMF, CLM, PLM]

Page 10: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Availability Management Framework (AMF): AMF Logical Entities

• Structural Entities

– AMF Application
• Represents the highest-level service(s) provided by the system

– Service Group (SG)
• Represents a group of like logical resources that provide the same service(s)
• Associated redundancy model (e.g. 1+1)

– Service Unit (SU)
• Aggregates a set of resources which when combined provide a higher-level service

– Component
• Represents one or more resources that perform a function within the system

[Figure: containment hierarchy: AMF Application, Service Group (1..*), Service Unit (1..*), Component (1..*)]

Page 11: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Availability Management Framework (AMF): AMF Logical Entities

• Workload Entities

– Service Instance (SI)
• Represents a workload to be supported by the system
• Has associated redundancy requirements (1+1, N+M, etc.)
• Protected by an identified SG
• Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced

– Component Service Instance (CSI)
• Represents a more granular workload that needs to be supported by the system
• Assigned to one or more components

[Figure: an AMF Application with Service Groups (1..*) and Service Instances (1..*); each Service Instance is protected by a Service Group and assigned to Service Units; each Component Service Instance (1..*) is assigned to Components]
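The structural and workload entities can be sketched as plain data classes. Everything below is a hypothetical model for illustration: the class and field names are invented here and are not the AMF information model classes.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    csis: list = field(default_factory=list)   # names of assigned CSIs

@dataclass
class ServiceUnit:
    name: str
    components: list = field(default_factory=list)

@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str
    service_units: list = field(default_factory=list)

@dataclass
class ServiceInstance:
    name: str
    protected_by: ServiceGroup
    csis: list = field(default_factory=list)
    assignments: dict = field(default_factory=dict)  # SU name -> HA state

HA_STATES = {"active", "standby", "quiescing", "quiesced"}

def assign(si, su, ha_state):
    """Record an SI assignment, allowing only SUs of the protecting SG."""
    if ha_state not in HA_STATES:
        raise ValueError(f"unknown HA state: {ha_state}")
    if su not in si.protected_by.service_units:
        raise ValueError(f"{su.name} is not in {si.protected_by.name}")
    si.assignments[su.name] = ha_state

su1 = ServiceUnit("SU1", [Component("db")])
su2 = ServiceUnit("SU2", [Component("db")])
sg = ServiceGroup("SG1", "2N", [su1, su2])
si = ServiceInstance("SI1", sg, csis=["CSI1"])
assign(si, su1, "active")
assign(si, su2, "standby")
print(si.assignments)
```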

Page 12: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Availability Management Framework (AMF): AMF Logical Entities

• Common Characteristics

– Well-defined state model for each logical entity type
• Operational
• Administrative
• Etc.

– X.731 style administrative operations
• Lock
• Unlock
• Shutdown
• Etc.

• Common AMF Component Types
– SA-aware
– Non-proxied, non-SA-aware
– Proxied, non-SA-aware

[Figure: SA-aware Component Example. Labels: AMF, AMF comp process, AMF Library, CLC-CLI Scripts, Lifecycle mgmt, HA state assignment]
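The lock, unlock, and shutdown operations named above act on an entity's administrative state. The sketch below uses a simplified, assumed transition table with three states; it is not the complete AMF administrative state model, and `admin_op` is a hypothetical helper.

```python
# Simplified, assumed transition table: (current state, operation) -> next state.
TRANSITIONS = {
    ("unlocked", "lock"): "locked",
    ("unlocked", "shutdown"): "shutting-down",
    ("shutting-down", "lock"): "locked",
    ("locked", "unlock"): "unlocked",
}

def admin_op(state, operation):
    """Apply an administrative operation or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} while {state}") from None

state = "unlocked"
for op in ("shutdown", "lock", "unlock"):
    state = admin_op(state, op)
    print(op, "->", state)
```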

Page 13: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Availability Management Framework (AMF): Service Group Redundancy Models

• Key redundancy model characteristics
– Preferred SI assignment model
• # of active resource(s)
• # of standby resource(s)
– Allowed concurrent HA state assignments for SUs
– # of assignable SUs

• Redundancy model options
– 2N
• Most common redundancy model
• 1 active resource and 1 standby resource per SI
• SUs can have either all active or all standby SI assignments
– N+M
– No Redundancy
– N-way
– N-way active

[Figure: 2N Service Group Example. SU1 on Node1 and SU2 on Node2; SI1 and SI2 each have one active (A) and one standby (S) assignment]
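A minimal sketch of the 2N rule and of a failover in that model follows. The function names and the dictionary layout are hypothetical, and the failover logic is deliberately naive: it promotes whichever SU survives.

```python
def is_valid_2n(assignments):
    """Check that each SU holds only active or only standby assignments."""
    states_by_su = {}
    for by_su in assignments.values():
        for su, state in by_su.items():
            states_by_su.setdefault(su, set()).add(state)
    return all(len(states) == 1 for states in states_by_su.values())

def fail_over(assignments, failed_su):
    """Promote the surviving SU to active for every SI and drop the failed SU."""
    result = {}
    for si, by_su in assignments.items():
        survivors = [su for su in by_su if su != failed_su]
        result[si] = {su: "active" for su in survivors}
    return result

# Starting point assumed for illustration: SU1 active, SU2 standby for both SIs.
before = {
    "SI1": {"SU1": "active", "SU2": "standby"},
    "SI2": {"SU1": "active", "SU2": "standby"},
}
after = fail_over(before, "SU1")
print(is_valid_2n(before), after)
```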

Page 14: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Availability Management Framework (AMF): Error Recovery Policies

• Pre-defined AMF component error recovery policies
– Configurable
– Can be overridden at runtime

• Up to 3 actions per policy
– Isolation
– Recovery
– Repair

• Recovery policy scopes
– Component
– Service Unit
– Node

• Recovery policy types
– Restart
– Failover
– Failfast

• Recovery escalation policies
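Escalation can be pictured as a ladder: repeated failures within some window move recovery from a narrow scope to a wider one. The ladder contents and the threshold below are hypothetical values chosen for illustration, not AMF defaults.

```python
# Hypothetical escalation ladder: (scope, action) pairs from narrow to wide.
LADDER = [
    ("component", "restart"),
    ("service unit", "restart"),
    ("service unit", "failover"),
    ("node", "failover"),
]

def next_recovery(failures_in_window, max_attempts_per_level=2):
    """Pick a recovery step; repeated failures climb the ladder."""
    if failures_in_window < 1:
        raise ValueError("at least one failure is required")
    level = (failures_in_window - 1) // max_attempts_per_level
    return LADDER[min(level, len(LADDER) - 1)]

for count in (1, 3, 5, 7):
    print(count, next_recovery(count))
```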

Page 15: OpenSAF Symposium - Intro to OpenSAF_9.13.11

System Management Services: Information Model Management (IMM)

• Information Model Highlights
– Based on pre-defined object classes (including AIS classes)
– Holds both configuration and runtime objects
– Used by AIS services to store current configuration and runtime state info
– Can be used by applications as well

• Object Management API
– Object class management
– Access object attribute values
– Search information model
– Configuration change requests
– Administrative operation invocation

• Object Implementer API
– Runtime object management
– CCB validation and application
– Administrative operation handling

• OpenSAF Implementation
– Persistence of information model managed through Persistence BackEnd (PBE) feature
– Replicated to multiple cluster nodes
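The "CCB validation and application" item describes a validate-then-apply flow: a bundle of configuration changes is checked by the responsible implementers and either takes effect as a whole or not at all. The sketch below models that idea with a toy in-memory class; `InfoModel` and `apply_ccb` are hypothetical names, not the IMM API.

```python
class InfoModel:
    """Toy information model: object name -> attribute dict."""

    def __init__(self):
        self.objects = {}
        self.validators = []  # callables playing the object implementer role

    def apply_ccb(self, changes):
        """Apply all changes atomically if every validator accepts them."""
        candidate = {name: dict(attrs) for name, attrs in self.objects.items()}
        for name, attrs in changes.items():
            candidate.setdefault(name, {}).update(attrs)
        if not all(validate(candidate) for validate in self.validators):
            return False          # rejected: the live model is untouched
        self.objects = candidate  # accepted: swap in the new model
        return True

model = InfoModel()
model.validators.append(
    lambda objs: all(o.get("priority", 0) >= 0 for o in objs.values())
)
print(model.apply_ccb({"su1": {"priority": 1}}))
print(model.apply_ccb({"su2": {"priority": -5}}))
```

Working on a copy and swapping it in only after validation is one simple way to get all-or-nothing behavior; a real implementation would also have to coordinate replicas and persistence.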

Page 16: OpenSAF Symposium - Intro to OpenSAF_9.13.11

System Management Services: Software Management Framework (SMF)

• SMF controls migration from one deployment configuration to another

• Upgrade methods
– Rolling upgrade
– Single step upgrade

• [De-]Activation Unit Scope
– AMF Node
– Service Unit

• During the migration SMF
– Maintains the campaign state change model
– Takes measures to enable error recovery
– Monitors for potential errors caused by the migration
– Deploys error recovery procedures

[Figure: the Software Management Framework with its inputs and outputs. Labels: Upgrade Campaign Definition (“Upgrade Instructions”); Adaptation commands (SMF config object); Software Repository (install / remove software bundles on target nodes); Information Model (admin operations; Read/Create/Delete/Update objects)]
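A rolling upgrade with error recovery can be sketched as a loop that upgrades one node at a time and restores the previous software if a node fails its check. The function, the callback signatures, and the rollback strategy are assumptions made for illustration and do not describe how SMF campaigns are actually executed.

```python
def rolling_upgrade(nodes, install, health_ok):
    """Upgrade one node at a time; roll back everything on the first failure."""
    upgraded = []
    for node in nodes:
        install(node, "new")
        upgraded.append(node)
        if not health_ok(node):
            for done in reversed(upgraded):
                install(done, "old")   # error recovery: restore the old bundle
            return False
    return True

versions = {"node1": "old", "node2": "old", "node3": "old"}

def install(node, version):
    versions[node] = version

ok = rolling_upgrade(list(versions), install, health_ok=lambda n: n != "node3")
print(ok, versions)
```

The point of the rolling method is that only one node is out of service at a time, so the redundancy model can keep workloads assigned while the upgrade proceeds.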

Page 17: OpenSAF Symposium - Intro to OpenSAF_9.13.11

System Management Services

• Notification (NTF)
– Publish-and-subscribe semantics for system-level notifications
– Syntax and semantics for ITU X.73x notifications:
• Alarm / security alarm / state change / object create / delete / attribute change
– Alarm and security alarm notifications automatically logged through LOG service

• Log (LOG)
– Flexible, centralized, system-wide logging mechanism
– Pre-defined log streams: alarm, notification, system
– Multiple, custom application log streams allowed
– Configurable log stream characteristics including:
• log file full action: halt, wrap, and rotate
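The three log-file-full actions can be contrasted with a small sketch. The semantics here are an assumed, simplified reading (halt stops accepting records, wrap overwrites the oldest record, rotate starts a new file), and `append_record` is a hypothetical helper rather than the LOG service API.

```python
def append_record(files, record, capacity, action):
    """Append to the newest file in `files` (a list of lists of records)."""
    current = files[-1]
    if len(current) < capacity:
        current.append(record)
        return True
    if action == "halt":
        return False                 # stop logging, drop the record
    if action == "wrap":
        current.pop(0)               # overwrite the oldest record
        current.append(record)
        return True
    if action == "rotate":
        files.append([record])       # start a fresh file
        return True
    raise ValueError(f"unknown action: {action}")

for action in ("halt", "wrap", "rotate"):
    files = [[]]
    for i in range(3):
        append_record(files, i, capacity=2, action=action)
    print(action, files)
```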

Page 18: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Application Services

• Checkpoint (CKPT)
– Intended as a state replication mechanism for distributed applications
– Can be used for all standby “temperature levels”
• Cold
• Warm
• Hot
– Through OpenSAF CKPT service API extension
– Semantics of a checkpoint
• Arbitrary set of sections containing opaque data
• Stored in one or more replicas distributed across cluster
• Reads and writes occur against the active replica
– Both synchronous and asynchronous replication options available
– Collocated checkpoint option provided for highest performance
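The checkpoint semantics listed above (sections of opaque data, several replicas, one active replica, synchronous or asynchronous propagation) can be modeled in a few lines. This is a toy in-memory model with hypothetical names; it is not the CKPT API and ignores networking and failure handling.

```python
class Checkpoint:
    """Toy checkpoint: named sections of opaque bytes held in several replicas."""

    def __init__(self, replica_nodes, synchronous=True):
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]
        self.synchronous = synchronous
        self.pending = []   # updates not yet pushed to the other replicas

    def write(self, section, data):
        """Write to the active replica; propagate now if synchronous."""
        self.replicas[self.active][section] = data
        self.pending.append((section, data))
        if self.synchronous:
            self.flush()

    def flush(self):
        """Copy pending updates from the active replica to the others."""
        for section, data in self.pending:
            for node, sections in self.replicas.items():
                if node != self.active:
                    sections[section] = data
        self.pending.clear()

    def read(self, section):
        return self.replicas[self.active][section]

ckpt = Checkpoint(["node1", "node2"], synchronous=False)
ckpt.write("session-42", b"\x01\x02")
print(ckpt.replicas["node2"])   # standby replica before propagation
ckpt.flush()
print(ckpt.replicas["node2"])   # standby replica after propagation
```

The asynchronous case shows the trade-off: writes return sooner, but a standby that takes over before propagation would see stale state.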

Page 19: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Application Services

• Event (EVT)
– Publish-and-subscribe communication paradigm
– Flexible event channel, pattern, and filtering definition
– Subscriber event queue maintained within app process

• Message (MSG)
– Messages sent to and read from message queues
– Single message queue owner at a time
– Message queue maintained outside app process
– Message queues can be logically grouped
• Messages can be sent to a message queue group
• Associated distribution policy (round-robin, broadcast, etc.)

• Lock (LCK)
– Cluster-wide, distributed lock service
– Can be used to control access to cluster-level shared resources
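The message queue group idea, where a sender addresses a group and a distribution policy picks the receiving queue or queues, can be sketched as follows. `QueueGroup` is a hypothetical in-process model covering only the two policies the slide names; it is not the MSG API.

```python
from collections import deque
from itertools import cycle

class QueueGroup:
    """Toy message queue group with a selectable distribution policy."""

    def __init__(self, queue_names, policy="round-robin"):
        self.queues = {name: deque() for name in queue_names}
        self.policy = policy
        self._next = cycle(list(queue_names))

    def send(self, message):
        """Deliver to one queue (round-robin) or to all queues (broadcast)."""
        if self.policy == "broadcast":
            for queue in self.queues.values():
                queue.append(message)
        elif self.policy == "round-robin":
            self.queues[next(self._next)].append(message)
        else:
            raise ValueError(f"unknown policy: {self.policy}")

group = QueueGroup(["q1", "q2"])
for msg in ("a", "b", "c"):
    group.send(msg)
print({name: list(q) for name, q in group.queues.items()})
```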

Page 20: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Getting Started with OpenSAF

• OpenSAF Technical Educational Resources
– Developer Wiki [http://devel.opensaf.org/wiki]
– OpenSAF Developers blog [http://devel.opensaf.org/blog]
– OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
• Users [Archive: http://list.opensaf.org/pipermail/users/]
• Development [Archive: http://list.opensaf.org/pipermail/devel/]
• Announce [Archive: http://list.opensaf.org/pipermail/announce/]
– Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
– FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]
– README files in source code repository

• SA Forum Application Interface Specifications [http://www.saforum.org/Service-Availability-Forum:-Application-Interface-Specification-~217404~16627.htm]

Page 21: OpenSAF Symposium - Intro to OpenSAF_9.13.11

Questions