Availability Manager and Disaster Recovery for VMware vCenter, vHost and VMs

Post on 19-Jul-2015


Apoorva Gouni ID: 009540349

Email: [email protected]

CMPE 283

Virtualization Technologies

Project 1:

Availability Manager and Disaster

Recovery

Submitter:

Apoorva Gouni (ID: 009540349) [email protected]

Fall 2014

Submitted To:

Professor Shimon Shim

Software Engineering Department

San Jose State University


Introduction:

Goals:

The goals of this project are to:

Work on several hypervisors and their corresponding management services and gain experience with them.

Explore and understand the VMware VI (vSphere) Java API and apply it to a real-world problem.

Objectives:

The objective of this project is to build an Availability Manager using the VMware VI Java API, which will:

Monitor the liveness of the virtual machines running on any of the hosts using a ping mechanism.

Monitor the liveness of the vhosts using a ping mechanism.

Take snapshots of the virtual machines and vhosts at regular intervals.

Restore the virtual machines and vhosts from their snapshots if they are not alive.

Background:

The host servers and their virtual machines are managed using the VMware vCenter Server application.

vCenter Server manages multiple ESX servers, and the virtual machines running on them, through a single console application, giving a unified view of the network.

vCenter Server carries out several operations such as:

o Create data-centers.

o Add hosts to data-center.

o Create resource pools for hosts.

o Create virtual machines on hosts.

o Assign resource pools to hosts.

Although vCenter Server allows managing entities such as hosts and VMs, a separate module is needed to monitor the status and health of the hosts and VMs at regular intervals and to restore them to a working state in the event of an unexpected failure.

This project designs a disaster recovery system that monitors the health of virtual machines and takes appropriate steps to recover from failures.


Requirements:

Functional Requirements:

Each virtual machine should be pinged at regular intervals.

Snapshots of the virtual machines and vhosts should be taken at regular intervals. At any point in time, only the most recently taken snapshot should be kept.

A virtual machine is considered alive if it responds to ping.

If a virtual machine fails to respond to ping a couple of times, the application should identify why it has failed.

If the VM ping failure is due to a network failure or an accidental shutdown, the application should try to restore the VM from its snapshot and power it on.

In some scenarios, ping to the VM fails even though the VM is powered on. In that case, the application should ping the vhost to check whether it is alive.

If ping to the vhost fails, the application should try to restore the vhost from its snapshot and power it on.

The application should set up an alarm on VM power-off and should not trigger failover if the VM was powered off by the user.

The application should gather the CPU, I/O, and network statistics for a virtual machine and display them.
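The requirement that only the most recent snapshot is kept can be sketched as follows. This is a minimal illustration, not the project's code: SnapshotTarget, SingleSnapshotPolicy and InMemoryTarget are hypothetical names introduced here. With the VI Java API the two operations would presumably map onto VirtualMachine.removeAllSnapshots_Task() and createSnapshot_Task(...), but that mapping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over a VM's snapshot operations (not a VMware type). */
interface SnapshotTarget {
    void removeAllSnapshots();          // drop every existing snapshot
    void createSnapshot(String name);   // take a new snapshot
}

/** Keeps exactly one snapshot per target: the most recent one. */
class SingleSnapshotPolicy {
    static void refresh(SnapshotTarget target, String name) {
        target.removeAllSnapshots();    // discard the previous snapshot first
        target.createSnapshot(name);    // then record the current state
    }
}

/** In-memory stand-in used only to demonstrate the policy. */
class InMemoryTarget implements SnapshotTarget {
    final List<String> snapshots = new ArrayList<>();
    public void removeAllSnapshots() { snapshots.clear(); }
    public void createSnapshot(String name) { snapshots.add(name); }
}

class SnapshotPolicyDemo {
    public static void main(String[] args) {
        InMemoryTarget vm = new InMemoryTarget();
        SingleSnapshotPolicy.refresh(vm, "snap-1");
        SingleSnapshotPolicy.refresh(vm, "snap-2");
        System.out.println(vm.snapshots);   // prints [snap-2]
    }
}
```

Removing before creating leaves a short window in which no snapshot exists. Creating the new snapshot first and then deleting the older one would close that window, at the cost of briefly holding two snapshots.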

Non-Functional Requirements:

The system design should be robust enough to cover all failure cases, allowing smooth recovery of the hosts and virtual machines.

The system should ping the VMs at regular intervals to ensure that they are alive and to avoid any delay in recovering from failures.

The system should allow sufficient time between VM recovery activities for the operations to complete successfully.

System performance should be reliable.

Design:

Components:

Hosts running on the datacenter (vCenter).

Virtual Machines on hosts.

Ubuntu (32-bit) operating system installed in each virtual machine.

VMware Tools installed in the guest operating system of each virtual machine to obtain its network information.

VMware ESXi.


Key workflows:

The Availability Manager is designed around the following key workflows:

Two threads run concurrently to meet the functional requirements.

One thread creates snapshots of the hosts and virtual machines every five minutes to save the most recent working snapshot of the system. This thread runs continuously. Before taking a snapshot, it checks whether the host or VM is alive using the ping mechanism, and it takes snapshots only of the active systems.

The other thread checks whether the virtual machines are alive by pinging them every ten seconds. If a virtual machine has failed and was not powered down by the user, the thread restores it from the snapshot taken. The host may also fail while the virtual machine is powered on; in this scenario, the host is restored from its snapshot.

Together, these threads provide recovery and disaster management for the virtual machines.
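The two periodic workflows above can be sketched with a ScheduledExecutorService. The report describes two dedicated thread classes, so the executor is a substituted technique used here for brevity, and WorkflowScheduler is a hypothetical name. The five-minute and ten-second periods come from the text; the task bodies are placeholders.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs the snapshot loop and the ping loop on two pool threads. */
class WorkflowScheduler {
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);

    /** Periods are given in milliseconds so that they can be shortened for testing. */
    void start(Runnable snapshotTask, long snapshotPeriodMs,
               Runnable pingTask, long pingPeriodMs) {
        // Fixed delay: the next pass starts only after the previous one has finished.
        // Note: if a task throws, its later runs are suppressed, so real tasks
        // should catch and log their own exceptions.
        pool.scheduleWithFixedDelay(snapshotTask, 0, snapshotPeriodMs, TimeUnit.MILLISECONDS);
        pool.scheduleWithFixedDelay(pingTask, 0, pingPeriodMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        pool.shutdownNow();
    }
}

class WorkflowDemo {
    public static void main(String[] args) throws InterruptedException {
        WorkflowScheduler scheduler = new WorkflowScheduler();
        scheduler.start(
            () -> System.out.println("snapshot pass"), TimeUnit.MINUTES.toMillis(5),
            () -> System.out.println("ping pass"), TimeUnit.SECONDS.toMillis(10));
        Thread.sleep(200);   // let the first pass of each task run
        scheduler.stop();
    }
}
```

A fixed delay rather than a fixed rate is chosen so that a slow recovery pass cannot overlap with the next one, which matches the requirement to leave sufficient time between recovery activities.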

Architecture:

The system architecture comprises the following classes, which implement the key workflows described above:

VMMonitor.java: This class sets up the system flow by enumerating the virtual machines and mapping them to their hosts, creating alarm triggers for each VM to detect a user power-off, and creating two threads: AvailabilityManager, which monitors VM status and performs recovery, and HostSnapShotManager, which creates the last functional snapshot of each host and VM.

AvailabilityManager.java: This class is a thread that iterates through the VM list and checks whether each VM is alive using the ping mechanism. If a VM is dead, the thread checks whether it has been powered off accidentally. If so, the recovery mechanism first checks whether the host is alive. If the host is alive, only the VM is restored and powered on; otherwise, the host is recovered and the host and its VMs are powered on. This process is repeated for every VM every ten seconds.

HostSnapShotManager.java: This class is a thread that enumerates all the VMs and their hosts and takes their last working snapshots at regular intervals. These snapshots are used by AvailabilityManager to recover the VMs and hosts to their last working state.
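The recovery decision described for AvailabilityManager can be written as a small pure function. This is a sketch under assumptions: RecoveryAction and RecoveryPlanner are hypothetical names, and the three boolean inputs stand in for the ping results and the power-off alarm state that the real class would obtain from vCenter.

```java
/** Possible outcomes of one availability check for a single VM. */
enum RecoveryAction { NONE, SKIP_USER_POWER_OFF, RESTORE_VM, RESTORE_HOST_AND_VMS }

class RecoveryPlanner {
    /**
     * @param vmResponds       the VM answered the ping
     * @param poweredOffByUser the power-off alarm reported a deliberate shutdown
     * @param hostResponds     the VM's host answered the ping
     */
    static RecoveryAction decide(boolean vmResponds, boolean poweredOffByUser, boolean hostResponds) {
        if (vmResponds) {
            return RecoveryAction.NONE;                  // nothing to do
        }
        if (poweredOffByUser) {
            return RecoveryAction.SKIP_USER_POWER_OFF;   // do not fail over
        }
        if (hostResponds) {
            return RecoveryAction.RESTORE_VM;            // host is fine, restore only the VM
        }
        return RecoveryAction.RESTORE_HOST_AND_VMS;      // host is down as well
    }
}

class RecoveryPlannerDemo {
    public static void main(String[] args) {
        // VM silent, not a user power-off, host alive
        System.out.println(RecoveryPlanner.decide(false, false, true));   // prints RESTORE_VM
    }
}
```

Keeping the decision separate from the API calls makes each branch easy to test without a running vCenter.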

Implementation:

Environment:

This system was designed and developed on the Windows operating system.

It uses Java JRE 1.8.0.


Tools:

The following tools were used for development, debugging, and testing:

Eclipse.

vSphere client and server.

Screen shots:

Pinging the virtual machine by IP address and displaying CPU, I/O, and network statistics.

Creating the virtual machine snapshot.

Creating snapshots for vhosts.

Restoring the VM using the snapshot.

Restoring the vhost using the snapshot and powering it on.

Triggering the alarms.

Answers to the Questions:

1. Briefly explain the design of your Availability Manager with the help of a class diagram. Also explain the number of threads you've used for the Availability Manager.

Class Diagram of Availability Manager Design:

As the class diagram shows, the Availability Manager is designed using three classes. The main class (VMMonitor) initializes the system and creates two threads. One thread (AvailabilityManager) checks VM status and performs recovery management; the other (HostSnapShotManager) creates snapshots of hosts and VMs at regular intervals.

VMMonitor class:

o This class sets up the system flow by enumerating the virtual machines and mapping them to their hosts, using the HostVMMap() and checkHosts() methods.

o It creates alarm triggers for each VM to detect a user power-off, using the alarmManager() method.

o It creates two threads:

AvailabilityManager, which monitors VM status and performs recovery accordingly, using the manageVirtualMachines() method.

HostSnapShotManager, which creates the last functional snapshot of each host and its VMs, using the manageHostSystems() method.

o In addition to the above functionality, the class provides the following methods:

pingHost(): pings the host.

pingVirtualMachine(): pings the virtual machine.

createVMSnapshot(): creates the snapshots of VMs and vHosts.

restoreVMSnapshot(): restores the VMs and vHosts from their snapshots.

AvailabilityManager.java:

o This class is a thread that iterates through the VM list and checks whether each VM is alive, using the pingVirtualMachine() method.

o If the VM is dead, the thread checks whether it has been powered off accidentally.

o If so, the recovery mechanism first checks whether the host is alive, using the pingHost() method.

o If the host is alive, only the VM is restored and powered on, using the restoreVMSnapshot() method.

o Otherwise, the host is recovered, and the host and its VMs are powered on, using the restoreVMSnapshot() and powerOnVM() methods.

o This process is repeated for every VM every ten seconds.

HostSnapShotManager.java:

o This class is a thread that enumerates all the VMs and their hosts and takes their last working snapshots at regular intervals, using the createVMSnapshot() method.

o These snapshots are used by AvailabilityManager to recover the VMs and hosts to their last working state.
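The host-to-VM mapping that VMMonitor builds can be illustrated as follows. The report names a HostVMMap() method but does not show its body, so this is only a guess at the idea: HostVmIndex, groupByHost and the sample VM and host names are hypothetical, and in the real system the pairs would come from the vCenter inventory rather than a literal array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HostVmIndex {
    /**
     * Groups VM names by host name.
     * Each input row is a pair {vmName, hostName}.
     */
    static Map<String, List<String>> groupByHost(String[][] vmHostPairs) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String[] pair : vmHostPairs) {
            String vm = pair[0];
            String host = pair[1];
            // Create the host's list on first sight, then append the VM.
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(vm);
        }
        return byHost;
    }
}

class HostVmIndexDemo {
    public static void main(String[] args) {
        String[][] pairs = {
            {"vm-1", "host-a"},
            {"vm-2", "host-b"},
            {"vm-3", "host-a"},
        };
        System.out.println(HostVmIndex.groupByHost(pairs));
        // prints {host-a=[vm-1, vm-3], host-b=[vm-2]}
    }
}
```

With such a map, the recovery thread can look up every VM that must be powered on again after a host has been restored.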

2. How does your availability manager handle the scenario wherein the vHost itself is found not to be alive?

If the vHost itself is found not to be alive, the availability manager will:

Restore the vHost using the latest snapshot.

Power on the vHost.

Power on the virtual machines corresponding to that vHost.

Check that the vHost and VMs are working using the ping mechanism.
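The four recovery steps above can be sketched as an ordered sequence. RecoveryOps and HostRecovery are hypothetical names, and the interface methods only stand in for the report's restoreVMSnapshot(), powerOnVM() and ping methods, whose actual signatures are not given.

```java
import java.util.List;

/** Hypothetical operations needed for host recovery (not a VMware type). */
interface RecoveryOps {
    void restoreHostSnapshot(String host);
    void powerOnHost(String host);
    void powerOnVm(String vm);
    boolean ping(String name);
}

class HostRecovery {
    /** Runs the steps in order; returns true only if the host and all its VMs respond afterwards. */
    static boolean recover(RecoveryOps ops, String host, List<String> vms) {
        ops.restoreHostSnapshot(host);      // 1. restore the vHost from its latest snapshot
        ops.powerOnHost(host);              // 2. power on the vHost
        for (String vm : vms) {
            ops.powerOnVm(vm);              // 3. power on each VM on that vHost
        }
        boolean allAlive = ops.ping(host);  // 4. verify with ping
        for (String vm : vms) {
            allAlive &= ops.ping(vm);       // non-short-circuit: every VM is pinged
        }
        return allAlive;
    }
}
```

In practice each step would have to wait for the previous task to complete, and for the guest to boot, before the final ping check is meaningful; those waits are omitted here.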

3. In case of failure, what is a good approach during Disaster Management of Virtual Machines:

o Check the Host first, then the Virtual Machine

o Check the Virtual Machine, then the Host

Justify your answer with sufficient reasons

In my opinion, checking the virtual machine first and then the vHost is a good approach during the disaster management of virtual machines because:

The virtual machine is an atomic entity that has no dependency on other virtual machines or hosts. Checking the state of the virtual machine first helps detect the problem using a hierarchical approach.

Restoring a failed virtual machine is easy: it is simply restored from its last working snapshot.

A failed host can also be restored using its last working snapshot, but the virtual machines in its resource pool must then also be checked and restored if needed.

Discussions:

The approach used to configure the failure detection for each VM is:

First, check whether the VM is alive using the ping mechanism.

If the VM stops responding to ping, it is considered not alive.

If the VM is not alive, it is restored using the latest snapshot.

How host failures were detected:

Check whether the host is alive using the ping mechanism.

If the vHost stops responding to ping, it is considered not alive.
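The ping-based detection described above can be sketched with a counter of consecutive failures, following the requirement that a machine is only treated as failed after it misses ping a couple of times. LivenessChecker is a hypothetical name and the threshold is an assumed parameter; the report does not say how its ping methods are implemented. The default probe uses the standard InetAddress.isReachable call, which sends an ICMP echo when the process is permitted to and otherwise tries a TCP connection to port 7.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.function.Predicate;

/** Tracks consecutive ping failures for one machine (VM or vHost). */
class LivenessChecker {
    private final Predicate<String> probe;
    private final int maxFailures;
    private int consecutiveFailures = 0;

    /** The probe is injectable so that the logic can be tested without a network. */
    LivenessChecker(Predicate<String> probe, int maxFailures) {
        this.probe = probe;
        this.maxFailures = maxFailures;
    }

    /** Default probe: true if the address answers within two seconds. */
    static boolean reachable(String address) {
        try {
            return InetAddress.getByName(address).isReachable(2000);
        } catch (IOException e) {
            return false;   // unknown host or network error counts as a failed ping
        }
    }

    /** Probes once; returns false once maxFailures probes in a row have failed. */
    boolean checkAlive(String address) {
        if (probe.test(address)) {
            consecutiveFailures = 0;    // any success resets the counter
            return true;
        }
        consecutiveFailures++;
        return consecutiveFailures < maxFailures;
    }
}
```

A checker for a real machine could be created with new LivenessChecker(LivenessChecker::reachable, 2) and called once per ten-second pass, with one checker instance per VM or vHost.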

Conclusion:

Learned to use the VMware Java API for disaster management of virtual machines.

Became familiar with the vSphere client and ESXi server tools.

Learned how Java threads work in detail.

Faced a challenge in creating a snapshot of a vHost.
