4. Clusterware Components



Page 1: 4.Clusterware Components

Oracle Clusterware :

Oracle Clusterware software is designed to run Oracle in a cluster mode

The Clusterware software allows nodes to communicate with each other and forms the cluster that makes the nodes work as a single logical server.

With version 10g Release 1, Oracle introduced its own portable cluster software, Cluster Ready Services.

In version 10g Release 2 this product was renamed Oracle Clusterware.

Since 11g Release 2 it has been part of the Oracle Grid Infrastructure software.

It provides basic clustering functionality (node membership, resource management, monitoring, etc.) to Oracle software components such as Automatic Storage Management (ASM), Real Application Clusters (RAC) and single-instance databases, as well as to any kind of failover application.

Runs by using Oracle Clusterware software.

OCR: Records and maintains the cluster and node membership information.

Voting Disk: Acts as a tiebreaker during communication failures. Consistent heartbeat information travels across the interconnect to the voting disk when the cluster is running.
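The voting disk's tiebreaker role rests on simple majority arithmetic: a node must be able to access more than half of the configured voting disks to remain in the cluster, which is why an odd number of voting disks is normally configured. A minimal sketch of that rule (the function names are illustrative, not an Oracle API):

```python
def required_votes(total_voting_disks: int) -> int:
    """Strict majority: more than half of the configured voting disks."""
    return total_voting_disks // 2 + 1

def node_survives(accessible_disks: int, total_voting_disks: int) -> bool:
    """A node that cannot access a majority of voting disks must leave the cluster."""
    return accessible_disks >= required_votes(total_voting_disks)

# With 3 voting disks, losing one disk is tolerated; losing two is not.
print(node_survives(2, 3))  # True
print(node_survives(1, 3))  # False
```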

Page 2: 4.Clusterware Components

Node Fencing:

One of the most important jobs of Oracle Clusterware is to ensure that only healthy nodes are part of the cluster; unhealthy nodes must be forcefully removed. Oracle Clusterware will fence a node (cut off its access to shared resources) in case of:

1. Not being able to ping cluster peers via the network heartbeat.

2. Not being able to ping the cluster voting files.

3. OS scheduler problems, an OS locked up in a driver or hardware (hung processes), CPU starvation, etc., which generally make the above-mentioned ping operations impossible to complete in a timely manner.

4. Crash of some important Oracle Clusterware processes.
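The four conditions above form a health checklist that is evaluated continuously on every node, and failing any single check is enough to trigger fencing. A hypothetical sketch of that decision logic (none of these names correspond to actual Oracle code):

```python
def should_fence(network_heartbeat_ok: bool,
                 voting_disk_ping_ok: bool,
                 os_responsive: bool,
                 core_daemons_alive: bool) -> bool:
    """A node is fenced as soon as any single health check fails."""
    return not (network_heartbeat_ok and voting_disk_ping_ok
                and os_responsive and core_daemons_alive)

# A node that can ping its peers and the voting files, whose OS scheduler
# is responsive and whose clusterware daemons are alive, stays in the cluster.
print(should_fence(True, True, True, True))   # False
# Losing the network heartbeat alone is enough to be fenced.
print(should_fence(False, True, True, True))  # True
```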

Page 3: 4.Clusterware Components

Split-Brain Syndrome

What is Split-Brain?

The term "Split-Brain" is often used to describe the scenario when two or more co-operating processes in a distributed system, typically a high availability cluster, lose connectivity with one another but then continue to operate independently of each other, including acquiring logical or physical resources,under the incorrect assumption that the other process(es) areno longer operational or using the said resources.

Page 4: 4.Clusterware Components

The cluster interconnect transmits the heartbeat messages. It is also used for other purposes, such as RAC cache fusion, cluster control, etc.

Since 11g Release 1 the timeout has had the same value on all operating systems; in 10g on Linux it was set to 60 sec.

oracle@rac1:~/ [+ASM1] crsctl get css misscount
30

Cluster interconnect is used as a communication medium to ping the nodes with small heartbeat messages, making sure all of them are up and running.

Network pings are performed by one of the core cluster processes, ocssd.bin (Cluster Synchronization Services).

The network heartbeats are associated with a timeout called misscount, set since 11g Release 1 to 30 sec.

Failure of the cluster interconnect can be simulated by deactivating (e.g. ifdown ethx) all involved network interfaces.

To prevent a split brain in case of a failed cluster interconnect, and to guarantee the integrity of the cluster, Oracle Clusterware will fence at least one node. After connectivity and network heartbeats have been lost for 50 % of the misscount interval (i.e. 15 sec.), you will observe the first error messages in the cluster alert log file indicating problems in this area. Losing heartbeats for 100 % of the misscount interval triggers node fencing. From the cluster alert log file:

oracle@rac2:~/ [+ASM1] oifcfg getif
bond0  192.168.122.0  global  public
bond1  10.10.0.0      global  cluster_interconnect

[cssd(2864)]CRS-1612:Network communication with node rac1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.920 seconds
…
[cssd(2864)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.900 seconds
[cssd(2864)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity
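The CRS-1612/CRS-1610 messages above follow directly from the misscount thresholds: warnings begin at 50 % of the interval and eviction is triggered at 100 %. A sketch of that mapping, assuming the 30-second misscount in effect since 11g Release 1 (the helper name is illustrative):

```python
MISSCOUNT = 30  # seconds; network heartbeat timeout since 11g Release 1

def heartbeat_state(seconds_without_heartbeat: float) -> str:
    """Map elapsed time without a network heartbeat to the clusterware reaction."""
    if seconds_without_heartbeat >= MISSCOUNT:
        return "evict"   # 100% of misscount: node fencing is triggered
    if seconds_without_heartbeat >= MISSCOUNT * 0.5:
        return "warn"    # 50% of misscount: first CRS-1612 messages appear
    return "ok"

print(heartbeat_state(10))  # ok
print(heartbeat_state(16))  # warn
print(heartbeat_state(30))  # evict
```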

Page 5: 4.Clusterware Components

More debugging information is written to the ocssd.bin process log file:

[CSSD][1119164736](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, rac2, is smaller than cohort of 1 nodes led by node 1, rac1, based on map type 2
[CSSD][1119164736]###################################
[CSSD][1119164736]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
[CSSD][1119164736]###################################
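The log above illustrates the survival rule CSS applies when the cluster splits: the larger cohort survives, and between equally sized cohorts the one containing the lowest node number wins (here rac1, node 1, survives while rac2 aborts). A sketch of that rule (illustrative, not Oracle code):

```python
def surviving_cohort(cohorts: list[list[int]]) -> list[int]:
    """Pick the cohort that survives a split: largest size wins,
    and on a size tie the cohort containing the lowest node number wins."""
    return max(cohorts, key=lambda c: (len(c), -min(c)))

# The situation from the log: two cohorts of one node each.
print(surviving_cohort([[2], [1]]))      # [1] -> rac2's cohort aborts
# A larger cohort always beats a smaller one.
print(surviving_cohort([[1], [2, 3]]))   # [2, 3]
```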

Page 6: 4.Clusterware Components

CRS Processes: Functionality, Behaviour on Failure and Run-As User

OPROCd - Process Monitor

1. It provides basic cluster integrity services by means of I/O fencing for the Oracle cluster.

2. It uses the hangcheck timer or watchdog timer for the cluster integrity.

3. Fencing is used to protect the data: if a node has problems, fencing presumes the worst and protects the data by restarting the node in question; it is better to be safe than sorry.

On failure: node restart. Runs as: root

EVMd - Event Management

Spawns a child process event logger and generates callouts:

1. Spawns a process called evmlogger, which generates events when things happen.

2. The evmlogger spawns new child processes on demand and scans the callout directory to invoke callouts.

On failure: daemon automatically restarted, no node restart. Runs as: oracle

OCSSd - Cluster Synchronization Services

Basic node membership, group services, basic locking:

1. It provides synchronization services among nodes.

2. It provides access to the node membership and enables basic cluster services, including cluster group services and locking

On failure: node restart (to avoid split-brain situations). Runs as: oracle

CRSd - Cluster Ready Services

Resource monitoring, failover and node recovery:

1. CRS manages the OCR and stores the current known state of the cluster; it requires a public, a private and a VIP interface in order to run.

2. Manages resources such as starting and stopping the services and failover of the application resources.

3. It also spawns separate processes to manage application resources.

On failure: daemon restarted automatically, no node restart. Runs as: root

Page 7: 4.Clusterware Components

Clusterware Components, Processes and Agents

Overview

Oracle Clusterware 11g Release 2 introduces the concept of the agent. Agents are multi-threaded daemon programs that provide start, stop, clean and check actions for different resource types. For example, the oraagent for crsd starts ASM, the Oracle listener and the SCAN listener. Agents can also receive, process and forward events to clients. The standard agents in Oracle Clusterware 11g Release 2 are oraagent, orarootagent and cssdagent. Additionally, there can be application and script agents.

Agents create their own log files. These log files are located in ORA_CRS_HOME under a directory associated with the name of the agent. See the section titled “Troubleshooting” for more information on the log files associated with Oracle Clusterware.

There are a number of different processes associated with Oracle Clusterware. These processes are rolled up into several different Clusterware components. The following table lists the components and their associated processes and describes the function of each component/process:

Component Process Description

Oracle High Availability Services (OHAS)

Ohasd This process is responsible for starting the rest of the Oracle Clusterware stack on a given node. Ohasd is a brand new cluster startup framework in Oracle Clusterware 11g Release 2 that replaces the old init scripts.

Cluster Ready Service (CRS) crsd See the section titled CRS below for more information on this component and the crsd process.

Cluster Synchronization Service (CSS)

ocssd, cssdmonitor, cssdagent

See the section titled CSS below for more information on this component and the ocssd process.

Event Manager (EVM) evmd, evmlogger Responsible for publishing Clusterware events.

Cluster Time Synchronization Service (CTSS)

octssd Provides time synchronization services in an Oracle 11g Release 2 cluster.

Oracle Notification Service (ONS)

ons, eons A publish-and-subscribe service responsible for communicating Fast Application Notification (FAN) events.

Oracle Agent oraagent The Oracle Agent works in conjunction with FAN to run scripts when specific FAN events occur.

Oracle Root Agent orarootagent This agent helps CRSD manage resources that are owned by root .

Grid Naming Service (GNS) gnsd Provides gateway services between the multicast domain name service (mDNS) and external DNS services. GNS provides name resolution within a cluster.

Grid Plug and Play (GPnP) gpnpd Supports Grid Plug and Play services, new in Oracle Clusterware 11g Release 2. GPnP provides services that allow you to easily add or remove nodes from a given cluster.

Multicast domain name service (mDNS)

mdnsd This service answers multicast DNS requests within the cluster.

Page 8: 4.Clusterware Components

Cluster Ready Services (CRS)

CRS is responsible for managing HA operations within the cluster. The crsd process manages CRS operations. CRS manages two kinds of resources:

Cluster resources

Local resources

A cluster resource is a resource that is cluster aware and is managed over the entire cluster via the crsctl command. Cluster resources are subject to cross-node switchover and failover. This means that a resource can be assigned to one or more nodes, but may be re-assigned (or failed over) to a different node on demand. Cluster resources are managed by the CRS daemon (crsd), and the OCR is used by CRS to manage them. A local resource runs on each node of the cluster. Examples of CRS-managed resources include RAC instances and listeners. CRS can control these services, starting them, stopping them and restarting them in the event of a failure.

CSS

CSS is the service responsible for determining which nodes of the cluster are available. CSS also supports other cluster processes by providing node membership information and locking services. CSS uses the private interconnect for communications, as well as the Clusterware voting disks. Through a combination of heartbeat messages over the interconnect and the voting disks, CSS determines the status of each node of the cluster. CSS is also responsible for interfacing with any third-party Clusterware vendors; in these configurations CSS interfaces with the vendor Clusterware and maintains the node membership information.

The CSS service is critical to Clusterware operations, as it fences the nodes of the cluster. For example, if the interconnect fails on a given node, the failed node will no longer be able to communicate with the rest of the cluster. Without CSS controlling the situation, the isolated node could cause severe issues on the cluster, including corruption of database data. This is what is known as a split-brain condition.

To avoid split-brain conditions, CSS sends heartbeat messages across the cluster interconnect. If a node fails (say the interconnect fails or the node freezes), that node will no longer send heartbeat messages. The surviving nodes will detect that the heartbeat messages from the node are no longer being sent. CSS then uses the voting disks to determine which node has gone offline, and works with Oracle Clusterware to evict the missing node from the cluster. CSS uses several different processes; failure of any of them results in a node restart. The CSS processes are:

CSS daemon (ocssd) – Manages cluster node membership information. It’s also used in non-RAC installs to provide Group Services (GS). ASM uses GS to register itself and its disk groups.

CSS Agent (cssdagent) – Monitors the cluster and provides fencing services (this was the oprocd daemon in previous versions). The CSS Agent is also responsible for monitoring vendor Clusterware.

CSS Monitor (cssdmonitor) – This process monitors for node hangs, monitors the OCSSD process for hangs, and is also responsible for monitoring vendor Clusterware.
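The eviction step described above (CSS consulting the voting disks to determine which node has gone offline) can be pictured as a staleness check on per-node disk heartbeats. The names and the "stale after misscount seconds" threshold below are illustrative assumptions, not Oracle internals:

```python
MISSCOUNT = 30  # seconds; treat a disk heartbeat older than this as stale

def offline_nodes(disk_heartbeats: dict[str, float], now: float) -> set[str]:
    """Nodes whose last voting-disk heartbeat is older than misscount
    are considered offline and become eviction candidates."""
    return {node for node, last_seen in disk_heartbeats.items()
            if now - last_seen > MISSCOUNT}

# rac1 wrote to the voting disk 5 s ago, rac2 45 s ago: rac2 is evicted.
heartbeats = {"rac1": 100.0, "rac2": 60.0}
print(offline_nodes(heartbeats, now=105.0))  # {'rac2'}
```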

Clusterware Process Startup

Oracle Clusterware 11g Release 2 changes the way Clusterware is started. On a Linux install, Clusterware is now started with one init script, init.ohasd, which replaces a number of scripts that were previously used. The ohasd daemon sets off a cascade of processes as outlined in the following graphic:

Page 9: 4.Clusterware Components

Note: This graphic only summarizes the processes started by Oracle Clusterware. You can control the startup or shutdown of the cluster via the crsctl command. For example, use crsctl start cluster to start the cluster and crsctl stop cluster to stop the cluster. You can also use the crsctl check cluster command to check on the status of the cluster. See the section titled “Managing Oracle Clusterware” for more information on crsctl and managing Oracle Clusterware.