High Availability Cluster with openSUSE Leap M. Edwin Zakaria [email protected]

Upload: medwinz

Post on 12-Feb-2017

TRANSCRIPT

Page 1: Ha cluster with openSUSE Leap

High Availability Cluster with openSUSE Leap

M. Edwin Zakaria [email protected]

Page 2: Ha cluster with openSUSE Leap

• Mohammad Edwin Zakaria

• Linux user since 1998

• openSUSE since 6.2 around 1999

‒ https://en.opensuse.org/User:Medwin

• openSUSE member

‒ https://connect.opensuse.org/show/Medwin

• openSUSE Indonesia

Page 3: Ha cluster with openSUSE Leap

Page 4: Ha cluster with openSUSE Leap

Page 5: Ha cluster with openSUSE Leap

Page 6: Ha cluster with openSUSE Leap

Page 7: Ha cluster with openSUSE Leap

Page 8: Ha cluster with openSUSE Leap

What is

- Cluster?

- High Availability?

Page 9: Ha cluster with openSUSE Leap

Curious?

• A computer cluster consists of a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

Page 10: Ha cluster with openSUSE Leap

Curious?

• A high availability (HA) system is designed to avoid loss of service by reducing or managing failures and by minimizing planned downtime. We expect a service to be highly available when life, health, and well-being, including the economic well-being of a company, depend on it.

Page 11: Ha cluster with openSUSE Leap

Curious?

• Harvard Research Group divides HA into several Availability Environment Classification (AEC) levels: AE4, AE3, AE2, AE1, AE0

• http://www.hrgresearch.com/pdf/AEC%20Defintions.pdf

• Other categories: continuous availability, fault tolerance, disaster tolerance

Page 12: Ha cluster with openSUSE Leap

Once again, what is a cluster?

• High performance computing

• Load balancer (high capacity)

• High availability

‒ 99.99%

‒ MTBF (mean time between failures = total operating time / total number of failures)

‒ Single point of failure
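
The 99.99% target and the MTBF formula above can be made concrete with a little arithmetic. The following is a minimal Python sketch and is not part of the original slides; the function names and the sample figures (8,760 operating hours, 4 failures) are assumptions chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def mtbf(total_operating_time, number_of_failures):
    """Mean time between failures = total operating time / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operating_time / number_of_failures


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Hypothetical system: one year of operation (8,760 hours) with 4 failures.
    print(mtbf(8760, 4))                              # 2190.0 hours
    # A 99.99% target leaves roughly 52.56 minutes of downtime per year.
    print(round(allowed_downtime_minutes(99.99), 2))
```

Seen this way, "four nines" leaves less than an hour of total downtime per year, which is why single points of failure matter so much.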

Page 13: Ha cluster with openSUSE Leap

Once again, what is a cluster?

Page 14: Ha cluster with openSUSE Leap

Challenges in HA

• Murphy’s Law: “If anything can go wrong, it will”

‒ Loss of data

‒ Service outage

• Flood, fire, earthquake, natural disaster, hardware damage

‒ Can you afford downtime?

‒ Can you afford a low-availability system?

‒ What is the cost of downtime?

Page 15: Ha cluster with openSUSE Leap

Differences between HA terms

• The term HA is widely used

• VMware vSphere HA

‒ Closed source

‒ Hypervisor level and host hardware level

• openSUSE/SUSE HA

‒ Open source

‒ OS level

‒ Protects critical resources running in a VM

‒ HA within the Linux OS

Page 16: Ha cluster with openSUSE Leap

HA in Linux

• Started with the Heartbeat project in the late 1990s

• Now managed by ClusterLabs http://clusterlabs.org/

• The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an open source, high availability cluster offering suitable for both small and large deployments.

• Pacemaker has been around since 2004 and is primarily a collaborative effort between Red Hat and SUSE, with considerable help and support from the folks at LINBIT and the community in general.

Page 17: Ha cluster with openSUSE Leap

Hardware Considerations

• External network: high traffic, use fiber optic (FO) or Ethernet bonding

• Communication network between cluster nodes: used for messaging, membership, and STONITH

• Storage network: use FO or Ethernet bonding

• Managed switch

• STONITH/fencing device

• Shared storage: NAS (NFS/CIFS), SAN (FC/iSCSI)

Page 18: Ha cluster with openSUSE Leap

Hardware Considerations

Page 19: Ha cluster with openSUSE Leap

Hardware Considerations

Page 20: Ha cluster with openSUSE Leap

Software Components

• Corosync

‒ Messaging and membership

• Pacemaker

‒ Cluster resource management

• Resource Agents

‒ Manage and monitor availability of services

• Fencing device

‒ STONITH to ensure data integrity

• User interface

‒ crmsh and Hawk
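
The "monitor availability" role of a resource agent can be illustrated with a small health check. The following Python sketch is an analogy only: real resource agents are typically shell scripts following the OCF standard, and the function name, host, and port here are hypothetical. The return values 0 and 7 mirror the OCF exit codes for "running" and "not running".

```python
import socket

OCF_SUCCESS = 0       # OCF exit code: resource is running
OCF_NOT_RUNNING = 7   # OCF exit code: resource is cleanly stopped


def monitor_tcp_service(host, port, timeout=1.0):
    """Report whether something accepts TCP connections on host:port."""
    try:
        # A successful connect is taken as "the service is up".
        with socket.create_connection((host, port), timeout=timeout):
            return OCF_SUCCESS
    except OSError:
        # Refused, unreachable, or timed out: treat as not running.
        return OCF_NOT_RUNNING


if __name__ == "__main__":
    # Hypothetical check against a web server on the local machine.
    print(monitor_tcp_service("127.0.0.1", 80))
```

A cluster resource manager would call a check like this at a regular interval and react when the result changes.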

Page 21: Ha cluster with openSUSE Leap

Other Components

• LVS (Linux Virtual Server)

• HAProxy

• Shared file system: OCFS2, GFS2

• Block device replication: DRBD

• Shared storage: SAN

• Geo cluster

Page 22: Ha cluster with openSUSE Leap

More details

• Pacemaker:

Pacemaker is a cluster resource manager. It achieves maximum availability for your cluster resources by detecting and recovering from node and resource-level failures by making use of the messaging and membership capabilities provided by your preferred cluster infrastructure (either Corosync or Heartbeat).

Page 23: Ha cluster with openSUSE Leap

More details

• Corosync:

‒ Provides cluster infrastructure functionality

‒ Provides messaging and membership functionality

‒ Maintains the quorum information

‒ Pacemaker uses these features to provide a high availability solution

Page 24: Ha cluster with openSUSE Leap

In short ...

• Corosync: a quorum system that notifies applications when quorum is achieved or lost

• Pacemaker:

‒ Starts/stops resources on a node according to the score

‒ Monitors resources at the configured interval

‒ Restarts resources if the monitor fails

‒ Fences (STONITH) a node if the stop operation fails
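
The four Pacemaker behaviours listed above can be sketched as a toy decision model. This is not how Pacemaker is implemented, and the function names, node names, and scores are hypothetical; it only mirrors the rules on this slide (place by score, restart on monitor failure, fence when a stop fails).

```python
def place_resource(scores, online_nodes):
    """Pick the online node with the highest score, or None if none qualifies."""
    # Sort first so that ties are broken by node name, deterministically.
    candidates = sorted(n for n in online_nodes if n in scores)
    if not candidates:
        return None
    return max(candidates, key=lambda n: scores[n])


def recovery_action(monitor_ok, stop_ok):
    """Decide what to do after a monitor run and, if needed, a stop attempt."""
    if monitor_ok:
        return "none"      # resource is healthy, nothing to do
    if not stop_ok:
        return "fence"     # stop failed: the node cannot be trusted, fence it
    return "restart"       # monitor failed but stop worked: start it again


if __name__ == "__main__":
    scores = {"node1": 100, "node2": 50}   # hypothetical placement scores
    print(place_resource(scores, {"node1", "node2"}))  # node1
    print(place_resource(scores, {"node2"}))           # node2
    print(recovery_action(monitor_ok=False, stop_ok=True))   # restart
    print(recovery_action(monitor_ok=False, stop_ok=False))  # fence
```

The key point is the last case: a resource that cannot be stopped cleanly leaves the cluster unable to prove it is no longer running, so fencing the node is the only safe option.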

Page 25: Ha cluster with openSUSE Leap

Pacemaker Corosync Conceptual Overview

Page 26: Ha cluster with openSUSE Leap

Pacemaker Components

• Non-cluster-aware components (illustrated in green): these pieces include the resources themselves and the scripts that start, stop, and monitor them

• Cluster resource manager: provides the brain that processes and reacts to events regarding the cluster

• Low-level infrastructure: Corosync provides reliable messaging, membership, and quorum information about the cluster

Page 27: Ha cluster with openSUSE Leap

Pacemaker Stack

• A Pacemaker and Corosync cluster is called the Pacemaker stack

• The Linux kernel comes with DLM (Distributed Lock Manager) by default. It provides the locking feature used by cluster-aware filesystems

• GFS2 (Global File System 2) and OCFS2 (Oracle Cluster File System 2) are cluster-aware filesystems

• To access a single filesystem from multiple hosts, you need either GFS2 or OCFS2

• Or you can create a filesystem on top of cLVM (Clustered Logical Volume Manager)

Page 28: Ha cluster with openSUSE Leap

Page 29: Ha cluster with openSUSE Leap

Cluster Filesystem

• If you have a shared disk and want several nodes to access it, you need a cluster-aware filesystem

• The open source solutions are GFS2 (Global File System 2) and OCFS2 (Oracle Cluster File System 2)

Page 30: Ha cluster with openSUSE Leap

Cluster Block Device

• DRBD (Distributed Replicated Block Device) allows you to create a mirror of two block devices that are located at two different sites across an IP network. When used with Corosync, DRBD supports distributed high-availability Linux clusters. It works as a network-based RAID 1 and provides high-performance data replication over the network

• CLVM2, see https://www.sourceware.org/lvm2/

• Cluster md raid1, see https://www.kernel.org/doc/Documentation/md-cluster.txt

Page 31: Ha cluster with openSUSE Leap

Cluster Block Device

Page 32: Ha cluster with openSUSE Leap

STONITH

• STONITH is an acronym for “Shoot-The-Other-Node-In-The-Head”.

• It protects your data from being corrupted by rogue nodes or concurrent access.

• Just because a node is unresponsive doesn’t mean it isn’t accessing your data. The only way to be 100% sure that your data is safe is to use STONITH, so we can be certain that the node is truly offline before allowing the data to be accessed from another node.

• STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.

Page 33: Ha cluster with openSUSE Leap

Split brain – the HA problem

• Two nodes run the same service and break data integrity

• Solution:

‒ Quorum

If the cluster doesn’t have quorum, no action is taken: fencing and resource management are disabled without quorum

‒ STONITH

Shoot the other node in the head

• More on STONITH: http://ourobengr.com/ha/
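
The quorum idea can be sketched in a few lines. The slides do not define how quorum is calculated, so this Python sketch assumes the common majority rule (strictly more than half of all votes) with one vote per node; the function names are hypothetical.

```python
def has_quorum(votes_present, total_votes):
    """Majority rule: a partition needs strictly more than half of all votes."""
    return 2 * votes_present > total_votes


def partitions_with_quorum(partition_sizes):
    """For a cluster split into partitions, report which ones keep quorum."""
    total = sum(partition_sizes)
    return [has_quorum(size, total) for size in partition_sizes]


if __name__ == "__main__":
    # Three nodes split 2 + 1: only the larger side may keep running services.
    print(partitions_with_quorum([2, 1]))  # [True, False]
    # Two nodes split 1 + 1: neither side has a majority.
    print(partitions_with_quorum([1, 1]))  # [False, False]
```

Under this rule at most one partition can ever hold quorum, which is what prevents two sides from running the same service. The even split shows why quorum alone is not enough in small clusters and why STONITH is listed alongside it.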

Page 34: Ha cluster with openSUSE Leap

Reference

• SUSE HA Extension documentation (can also be used for openSUSE): https://www.suse.com/documentation/sle-ha-12/

• ClusterLabs: http://clusterlabs.org

• Corosync doc: http://landley.net/kdocs/ols/2008/ols2008v1-pages-85-100.pdf

• DRBD: http://drbd.linbit.org/en/

• OCFS2: https://ocfs2.wiki.kernel.org/

• CLVM2: https://sourceware.org/lvm2/

• Linux iSCSI: http://linux-iscsi.org/

Page 35: Ha cluster with openSUSE Leap

Case Study / Hands-on

Page 36: Ha cluster with openSUSE Leap

Setting up HA on Leap

• Scenario:

‒ Set up openSUSE Leap 42.1 as the host

‒ Create 2 VMs with QEMU/KVM, install openSUSE Leap 42.1 on them, configure the network, and install all the required packages

‒ Configure Pacemaker, Corosync, and DRBD

‒ Set up an HA web server

Page 37: Ha cluster with openSUSE Leap

Preparation

• Install openSUSE Leap 42.1

• Configure all repositories

• Install all the required software

• Create at least 2 virtual machines with QEMU/KVM

• Configure the cluster

• Create DRBD

• Activate the web server (nginx or Apache)

• Test the status

Page 38: Ha cluster with openSUSE Leap

Thank you.

Join the conversation, contribute & have a lot of fun!

www.opensuse.org

Page 39: Ha cluster with openSUSE Leap

Have a Lot of Fun, and Join Us At: www.opensuse.org