An Introduction to OpenSAF 5.17.2011

Introduction to OpenSAF: David Fick, Senior Software Architect, GoAhead Software

Upload: opensaf-foundation

Post on 13-Jul-2015


Page 1: An Introduction to OpenSAF 5.17.2011

Introduction to OpenSAF

David Fick, Senior Software Architect

GoAhead Software

Page 2: An Introduction to OpenSAF 5.17.2011

Introduction to OpenSAF

• Service availability and high availability systems and concepts have been around for decades

• However, HA terminology tends to vary from industry to industry and company to company

• Goals of this session:
  – High-level technical overview of the Service Availability™ Forum standards
  – Overview of the support for those standards within OpenSAF
  – Allow you to:
    • Familiarize yourself with SA Forum and OpenSAF concepts and terminology, OR
    • Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
  – Resources for getting started with OpenSAF

Page 3: An Introduction to OpenSAF 5.17.2011

SA Forum Interfaces: AIS & HPI

[Figure: SA Forum standards implemented by OpenSAF. Elements shown: Hardware Platforms A to D, Hardware Platform Interface (HPI), Virtualization, Operating System, Service Availability Middleware, Application Interface Specification (AIS), Applications, and System Management. AIS services shown: Log (LOG), Information Model Mgmt (IMM), Notification (NTF), Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM), Message (MSG), Checkpoint (CKPT), Event (EVT), Lock (LCK), Software Mgmt Framework (SMF).]

Page 4: An Introduction to OpenSAF 5.17.2011

But how to make sense of the SA Forum “acronym soup”?

Page 5: An Introduction to OpenSAF 5.17.2011

AIS Service Groupings

• First, understand that the AIS services fall into three logical groupings*:
  – System Management Services
    • Services that manage central system capabilities commonly used by both AIS services and applications
    • Log (LOG), Software Mgmt Framework (SMF), Notification (NTF), Information Model Mgmt (IMM)
  – Resource Availability Management Services
    • Services that manage and monitor the state of key system resources that affect availability: hardware / operating system, cluster nodes, applications
    • Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM)
  – Application Services
    • Optional services to support application operations such as inter-process communication, state replication, and shared resource access control
    • Lock (LCK), Event (EVT), Message (MSG), Checkpoint (CKPT)

* Not official SA Forum AIS service groupings

Page 6: An Introduction to OpenSAF 5.17.2011

Fault Management Cycle

• Second, AIS services that manage availability are designed around a standard fault management cycle:
  – Detection
    • E.g. component healthchecks
  – Isolation
    • E.g. blade power off
  – Recovery
    • E.g. failover of workload assignments to associated standby resources
  – Repair
    • E.g. automatic restart of failed resource
  – Notification
    • E.g. state change notifications sent by the service managing the resource

[Figure: cycle diagram with the stages Detection, Isolation, Recovery, Repair, and Notification]
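The fault management cycle above can be sketched as an ordered sequence of stages. This is only an illustrative model, not OpenSAF code: the `Stage` enum, the `EXAMPLE_ACTIONS` table, and `run_fault_cycle` are hypothetical names, and the sketch assumes notification is a single final step, although a real system may emit notifications at several stages.

```python
from enum import Enum


class Stage(Enum):
    DETECTION = 1
    ISOLATION = 2
    RECOVERY = 3
    REPAIR = 4
    NOTIFICATION = 5


# Hypothetical example actions, paraphrasing the slide's "E.g." bullets.
EXAMPLE_ACTIONS = {
    Stage.DETECTION: "component healthcheck fails",
    Stage.ISOLATION: "power off blade",
    Stage.RECOVERY: "fail over workload to standby",
    Stage.REPAIR: "restart failed resource",
    Stage.NOTIFICATION: "send state change notification",
}


def run_fault_cycle(resource):
    """Walk one fault through every stage, in order, and return a trace."""
    trace = []
    for stage in sorted(Stage, key=lambda s: s.value):
        trace.append(f"{stage.name.lower()}: {EXAMPLE_ACTIONS[stage]} ({resource})")
    return trace
```

Calling `run_fault_cycle("blade-3")` returns five trace lines, one per stage, starting with detection.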

Page 7: An Introduction to OpenSAF 5.17.2011

Resource Dependencies

• Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types
  – Managed Applications
    • Simple to complex dependencies and relationships can be modeled between the various software elements
    • Dependency on a particular node also modeled
  – AMF Node
    • Represents a node where AMF services are provided
    • Depends on a CLM node
  – CLM Node
    • Represents a cluster node where AIS services are provided
    • Depends on an Execution Environment (optional)
  – Platform Resource
    • Containment and logical dependencies represented between platform resources
    • Execution Environment (EE)
      – Represents an operating system instance (standalone or virtual)
    • Hardware Element (HE)
      – Represents a physical hardware resource in the system

[Figure: dependency diagram relating Managed Applications, AMF Node, CLM Node, and Platform Resource (Execution Environment, Hardware Element)]
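One way to picture these dependencies is as a chain that can be walked to find everything affected by a failure. The sketch below is a toy model under stated assumptions: the resource names are made up, each resource has at most one dependency, and the link from the execution environment to a hardware element is assumed for illustration rather than taken from the slide.

```python
# Each resource maps to the resource it depends on (None = no dependency).
# Names are hypothetical; the EE -> HE link is an assumption of this sketch.
DEPENDS_ON = {
    "app-1": "amf-node-1",
    "amf-node-1": "clm-node-1",
    "clm-node-1": "ee-1",        # optional dependency per the slide
    "ee-1": "he-blade-1",
    "he-blade-1": None,
}


def dependency_chain(resource):
    """Return the resource followed by everything it depends on."""
    chain = []
    while resource is not None:
        chain.append(resource)
        resource = DEPENDS_ON[resource]
    return chain


def affected_by(failed):
    """Return every resource whose chain passes through the failed one."""
    return sorted(r for r in DEPENDS_ON if failed in dependency_chain(r))
```

With this table, a failure of `clm-node-1` affects the CLM node itself, the AMF node on top of it, and the application on that AMF node, but not the platform resources underneath.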

Page 8: An Introduction to OpenSAF 5.17.2011

Common Design Patterns

• Fourth, the AIS services follow common design patterns:
  – API
    • Common library lifecycle
    • Naming conventions
  – Each resource managed by a service is represented as a managed object
    • Typically with an associated state model
    • Managed objects stored in a common information model
  – Administrative operations
    • X.731 style administrative operations for resources which affect availability
  – Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)

Page 9: An Introduction to OpenSAF 5.17.2011

Resource Availability Management Services

• Availability Management Framework (AMF)
  – Manages the lifecycle and monitors the state of the managed applications within the system
  – More detail in upcoming slides
• Cluster Membership (CLM)
  – Provides cluster membership change notifications to AIS services and interested applications
  – OpenSAF CLM implements a cluster management protocol dealing with:
    • Cluster formation
    • Active controller selection & failover
    • Node failure detection
• Platform Management (PLM)
  – Manages state of modeled hardware elements and execution environments (operating system instances)
  – Hardware element states and events accessed through Hardware Platform Interface (HPI)
  – Manages graceful blade extraction / de-activation cases
  – Supports hardware element controls (power on/off and reset)
  – Optional service within OpenSAF

[Figure: blocks labeled PLM, CLM, and AMF]

Page 10: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): AMF Logical Entities

• Structural Entities
  – AMF Application
    • Represents the highest-level service(s) provided by the system
  – Service Group (SG)
    • Represents a group of like logical resources that provide the same service(s)
    • Associated redundancy model (e.g. 1+1)
  – Service Unit (SU)
    • Aggregates a set of resources which when combined provide a higher-level service
  – Component
    • Represents one or more resources that perform a function within the system

[Figure: containment hierarchy of AMF Application, Service Group, Service Unit, and Component, with 1..* multiplicities]

Page 11: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): AMF Logical Entities

• Workload Entities
  – Component Service Instance (CSI)
    • Represents a more granular workload that needs to be supported by the system
    • Assigned to one or more components
  – Service Instance (SI)
    • Represents a workload to be supported by the system
    • Has associated redundancy requirements (1+1, N+M, etc.)
    • Protected by an identified SG
    • Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced

[Figure: AMF Application containing Service Groups, Service Units, and Components (1..* multiplicities); Service Instances are protected by a Service Group and assigned to Service Units; Component Service Instances are assigned to Components]

Page 12: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): AMF Logical Entities

• Common Characteristics
  – Well-defined state model for each logical entity type
  – X.731 style administrative operations
• Common AMF Component Types
  – SA-aware
    • Applications modified to interact with AMF through the AMF API
  – Non-proxied, non-SA-aware
    • Legacy or 3rd party applications that typically cannot be modified
    • Interact with AMF through command line scripts to manage application lifecycle
    • Always assigned active HA state if running
  – Proxied, non-SA-aware
    • Applications that have knowledge of HA concepts but do not directly communicate with AMF
    • Proxy application receives HA “commands” from AMF and forwards them to the proxied application through a custom interface

[Figure: interaction diagrams for the three component types. Labels: AMF, AMF Library, CLC-CLI scripts, AMF component process, proxy component, proxied AMF component process. Arrows: lifecycle management, HA state assignment, and, for the proxy, proxy HA state assignment plus proxied component lifecycle management and HA state assignment requests.]

Page 13: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): Service Group Redundancy Models

• 2N
  – Most common redundancy model
  – Preferred assignment model per SI:
    • 1 active resource
    • 1 standby resource
  – SUs can have either all active or all standby SI assignments
  – A.k.a. 1+1, active-standby, active-backup
• N+M
  – Preferred assignment model per SI:
    • 1 active resource
    • 1 standby resource
  – SUs can have either all active or all standby SI assignments
  – Both N and M are configurable
  – Common variation: N+1

[Figure: example assignments. 2N: SI1 assigned active (A) and standby (S) across SU1 on Node1 and SU2 on Node2. N+M: SI1 and SI2 with active (A) and standby (S) assignments spread over SU1 to SU3 on Node1 to Node3.]

Page 14: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): Service Group Redundancy Models

• No redundancy
  – Preferred assignment model per SI:
    • 1 active resource
  – Similar to an N+0 redundancy scheme, where N is the number of protected SIs
• N-way
  – Preferred assignment model per SI:
    • 1 active resource
    • Y standby resources (where Y is configurable)
  – SUs can concurrently have both active and standby assignments
• N-way Active
  – Preferred assignment model per SI:
    • X active resources (where X is configurable)
    • No standby resource

[Figure: example assignments for each model, showing SI1 and SI2 assigned active (A) or standby (S) to SU1 to SU3 across Node1 to Node3]
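The preferred per-SI assignment counts for the five redundancy models on these two slides can be summarized in a small lookup. This is a sketch for comparison only, not part of any AMF API: the function names and the model strings are hypothetical, and `x` and `y` stand in for the configurable X and Y from the slides.

```python
def preferred_assignments(model, x=1, y=1):
    """Return the preferred (active, standby) counts for one SI.

    x and y stand for the configurable X and Y on the slides.
    """
    table = {
        "2N": (1, 1),
        "N+M": (1, 1),
        "no-redundancy": (1, 0),
        "N-way": (1, y),
        "N-way-active": (x, 0),
    }
    return table[model]


def su_can_mix_active_and_standby(model):
    """Per these slides, only N-way lets one SU hold both kinds at once."""
    return model == "N-way"
```

For example, `preferred_assignments("N-way", y=2)` gives one active and two standby assignments, while `preferred_assignments("N-way-active", x=3)` gives three active assignments and no standby.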

Page 15: An Introduction to OpenSAF 5.17.2011

Availability Management Framework (AMF): Error Recovery Policies

• Pre-defined AMF component error recovery policies
  – Configurable
  – Can be overridden at runtime
• Recovery policy scopes
  – Component
  – Service Unit
  – Node
• Recovery policy types
  – Restart
  – Failover
  – Failfast
• Up to 3 actions per policy
  – Isolation
  – Recovery
  – Repair
• Error escalation policies
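Error escalation can be thought of as climbing a ladder of recovery actions when milder ones keep failing. The sketch below is a hypothetical illustration built from the scopes and types listed above: the ladder contents, the `Escalator` class, and the per-level attempt threshold are assumptions of this example, not the policy that AMF or OpenSAF actually applies.

```python
# Hypothetical escalation ladder: (scope, recovery type), mildest first.
LADDER = [
    ("component", "restart"),
    ("service unit", "restart"),
    ("service unit", "failover"),
    ("node", "failover"),
]


class Escalator:
    """Pick a recovery action, escalating after too many repeated errors."""

    def __init__(self, max_attempts_per_level=2):
        self.max_attempts = max_attempts_per_level
        self.level = 0
        self.attempts = 0

    def next_action(self):
        # Move up one rung once the current rung has been tried enough.
        if self.attempts >= self.max_attempts and self.level < len(LADDER) - 1:
            self.level += 1
            self.attempts = 0
        self.attempts += 1
        return LADDER[self.level]
```

With the default threshold of two attempts, the first two errors trigger a component restart, the next two a service unit restart, and so on up to the widest scope.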

Page 16: An Introduction to OpenSAF 5.17.2011

System Management Services: Information Model Management (IMM)

• Information Model Highlights
  – Based on pre-defined object classes (including AIS classes)
  – Holds both configuration and runtime objects
  – Used by AIS services to store current configuration and runtime state info
  – Can be used by applications as well
• Object Management API
  – Object class management
  – Access object attribute values
  – Search information model
  – Configuration change requests
  – Administrative operation invocation
• Object Implementer API
  – Runtime object management
  – CCB validation and application
  – Administrative operation handling
• OpenSAF Implementation
  – Persistence of information model managed through Persistence BackEnd (PBE) feature
  – Replicated to multiple cluster nodes
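The "CCB validation and application" item refers to configuration change bundles: a group of changes that an object implementer validates and that is then applied as a unit. The sketch below illustrates that validate-then-apply idea with a toy in-memory store. `ConfigStore`, `apply_ccb`, and the validation rule are hypothetical names made up for this example and do not reflect the IMM API.

```python
class ConfigStore:
    """Toy information model: object name -> attribute dict."""

    def __init__(self):
        self.objects = {}

    def apply_ccb(self, changes, validate):
        """Apply a bundle of (name, attrs) changes all-or-nothing.

        validate is a hypothetical implementer callback returning True/False.
        """
        if not all(validate(name, attrs) for name, attrs in changes):
            return False  # one rejected change aborts the whole bundle
        for name, attrs in changes:
            self.objects.setdefault(name, {}).update(attrs)
        return True


def positive_ranks_only(name, attrs):
    # Example validation rule, made up for this sketch.
    return attrs.get("rank", 1) > 0
```

A bundle containing even one change that fails validation leaves the store untouched, which is the property that makes bundled configuration changes safe to attempt.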

Page 17: An Introduction to OpenSAF 5.17.2011

System Management Services: Software Management Framework (SMF)

• SMF controls migration from one deployment configuration to another
• Upgrade methods
  – Rolling upgrade
  – Single step upgrade
• [De-]Activation Unit Scope
  – AMF Node
  – Service Unit
• During the migration SMF
  – Maintains the campaign state change model
  – Takes measures to enable error recovery
  – Monitors for potential errors caused by the migration
  – Deploys error recovery procedures

[Figure: the Software Management Framework takes an Upgrade Campaign Definition (“upgrade instructions”) and adaptation commands (SMF config object), installs / removes software bundles from the Software Repository on target nodes, and performs admin operations and read/create/delete/update of objects in the Information Model]
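A rolling upgrade with error recovery can be sketched as a loop that upgrades one node at a time and undoes its work if a node fails verification. This is a simplified illustration, not how SMF campaigns are defined or executed: `rolling_upgrade`, its callbacks, the "new"/"old" version labels, and the returned state strings are all hypothetical.

```python
def rolling_upgrade(nodes, upgrade, verify):
    """Upgrade one node at a time; roll back completed nodes on failure.

    upgrade(node, version) and verify(node) are hypothetical callbacks.
    Returns the final campaign state as a string.
    """
    done = []
    for node in nodes:
        upgrade(node, "new")
        if not verify(node):
            # Error recovery: restore this node and every earlier one.
            for n in [node] + done[::-1]:
                upgrade(n, "old")
            return "rolled-back"
        done.append(node)
    return "completed"
```

Because only one node is touched per step, the remaining nodes keep providing service during the campaign, and a failure on any node stops the campaign before later nodes are modified.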

Page 18: An Introduction to OpenSAF 5.17.2011

System Management Services

• Notification (NTF)
  – Publish-and-subscribe semantics for system-level notifications
    • Reader interface for reading historical alarm info as well
  – Formal syntax and semantics for ITU X.73x notifications:
    • Alarm / security alarm / state change / object create / delete / attribute change
  – Used by AIS services to publish service-specific notifications
  – Alarm and security alarm notifications automatically logged through LOG service
• Log (LOG)
  – Flexible, centralized, system-wide logging mechanism
  – Pre-defined log streams: alarm, notification, system
  – Supports multiple, custom application log streams
  – Log streams are configurable on a per log stream basis
    • Including log file full action: halt, wrap, and rotate
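The three log file full actions differ in what happens to a record that arrives when the file is at capacity. The sketch below models them with an in-memory list in place of a file. It reflects a common reading of the three terms (halt stops logging, wrap overwrites the oldest record, rotate starts a new file) and uses a hypothetical `LogStream` class, not the LOG service API.

```python
class LogStream:
    """Toy log stream with a fixed-size 'file' and a full action."""

    def __init__(self, max_records, full_action):
        self.max_records = max_records
        self.full_action = full_action  # "halt", "wrap", or "rotate"
        self.current = []               # records in the current file
        self.closed_files = []          # files set aside by "rotate"

    def write(self, record):
        """Store a record; return False if it was dropped."""
        if len(self.current) >= self.max_records:
            if self.full_action == "halt":
                return False             # stop logging when full
            if self.full_action == "wrap":
                self.current.pop(0)      # overwrite the oldest record
            elif self.full_action == "rotate":
                self.closed_files.append(self.current)
                self.current = []        # start a fresh file
        self.current.append(record)
        return True
```

Writing three records to a two-record stream drops the third under "halt", discards the first under "wrap", and moves the first two into a closed file under "rotate".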

Page 19: An Introduction to OpenSAF 5.17.2011

Application Services

• Checkpoint (CKPT)
  – Intended as a state replication mechanism for distributed applications
  – Can be used for all standby “temperature levels”
    • Cold
    • Warm
    • Hot
      – Through OpenSAF CKPT service API extension
  – Semantics of a checkpoint
    • Arbitrary set of sections containing opaque data
    • Stored in one or more replicas distributed across cluster
    • Reads and writes occur against the active replica
  – Both synchronous and asynchronous replication options available
  – Collocated checkpoint option provided for highest performance
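The checkpoint semantics above (sections of opaque data, an active replica, synchronous or asynchronous replication) can be illustrated with a small in-memory model. This is a conceptual sketch only: the `Checkpoint` class and its methods are hypothetical and unrelated to the CKPT API, and replica 0 is arbitrarily treated as the active replica.

```python
class Checkpoint:
    """Toy checkpoint: named sections of opaque bytes, held in replicas."""

    def __init__(self, replica_count):
        # Replica 0 plays the role of the active replica.
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet copied to the other replicas

    def write(self, section_id, data, synchronous=True):
        self.replicas[0][section_id] = bytes(data)
        self.pending.append((section_id, bytes(data)))
        if synchronous:
            self.flush()  # copy to every replica before returning

    def flush(self):
        for section_id, data in self.pending:
            for replica in self.replicas[1:]:
                replica[section_id] = data
        self.pending.clear()

    def read(self, section_id):
        return self.replicas[0][section_id]  # reads hit the active replica
```

The trade-off shows up directly: a synchronous write leaves every replica current when it returns, while an asynchronous write returns sooner but leaves the other replicas stale until the pending updates are flushed.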

Page 20: An Introduction to OpenSAF 5.17.2011

Application Services

• Event (EVT)
  – Publish-and-subscribe communication paradigm
  – Flexible event channel, pattern, and filtering definition
  – Subscriber event queue maintained within app process
• Message (MSG)
  – Messages sent to and read from message queues
  – Single message queue owner at a time
  – Message queue maintained outside app process
  – Message queues can be logically grouped
    • Messages can be sent to a message queue group
    • Associated distribution policy (round-robin, broadcast, etc.)
• Lock (LCK)
  – Cluster-wide, distributed lock service
  – Can be used to control access to cluster-level shared resources
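The message queue group idea from the MSG bullets above, where a sender addresses a group and a distribution policy picks the destination queue or queues, can be sketched as follows. `QueueGroup` and its policy strings are hypothetical names for this illustration and are not the MSG service API; only the round-robin and broadcast policies named on the slide are modeled.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """Toy message queue group with a selectable distribution policy."""

    def __init__(self, queue_names, policy="round-robin"):
        self.queues = {name: deque() for name in queue_names}
        self.policy = policy
        self._next = cycle(queue_names)  # round-robin cursor

    def send(self, message):
        if self.policy == "broadcast":
            for queue in self.queues.values():
                queue.append(message)
        else:  # round-robin: one queue per message, in turn
            self.queues[next(self._next)].append(message)

    def receive(self, queue_name):
        """Pop the oldest message from one queue, or None if it is empty."""
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None
```

Round-robin spreads work across the queue owners, whereas broadcast delivers every message to every queue in the group.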

Page 21: An Introduction to OpenSAF 5.17.2011

Getting Started with OpenSAF

• OpenSAF Technical Educational Resources
  – Developer Wiki [http://devel.opensaf.org/wiki]
  – OpenSAF Developers blog [http://devel.opensaf.org/blog]
  – OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
    • Users [Archive: http://list.opensaf.org/pipermail/users/]
    • Announce [Archive: http://list.opensaf.org/pipermail/announce/]
    • Development [Archive: http://list.opensaf.org/pipermail/devel/]
  – Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
  – FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]
  – README files in source code repository

Page 22: An Introduction to OpenSAF 5.17.2011

Questions