
Horst Mundt, Sr. Technical Account Manager © VMware, 2010 1

More Fun with vSphere Alarms

Disclaimer

This document is provided “as is”. It is not part of official VMware product documentation.

Contents

Disclaimer
1. Alarm trigger types
   What’s new in 4.1
2. Using default alarms
   Why you need to define actions for default alarms
   Moving alarms around
3. Event trigger details
4. Alarm actions
5. Putting it all together
6. How do I copy alarm definitions between vCenter servers?
7. How do I create an alarm that’s based on a certain vCenter event?
8. How can vSphere alarms help with managing security and compliance?
9. Monitoring HA
10. What if I’m in Germany?
11. Conclusion
About me


In terms of alarms, vCenter 4 has much more to offer than vCenter 2.5. A whole range of default alarms is available when you install vCenter 4, and they give you a very good first shot at monitoring your vSphere environment. If you’ve never wondered what exactly the default alarms mean, or how to tune them, that’s fine. If you’re interested in a bit more detail, read on.

This doc assumes that you are familiar with vSphere alarms in general, so I won’t explain every detail. There is also a great introduction to vSphere alarms at http://www.vmworld.com/docs/DOC-3766.

1. Alarm trigger types

vCenter 4 has three different types of alarm triggers: event triggers, condition triggers, and state triggers.

Confused by ‘condition’ vs. ‘state’? I was, since both can translate to the same word in my native language. So here’s what they mean in vSphere:

- A condition trigger always refers to a numeric value exceeding a certain threshold.
  Example: CPU Usage in MHz > 500
- A state trigger always corresponds to one element out of a discrete set of (non-numeric) possible states that a managed entity can have with regard to a given property.
  For instance, the possible states of the Host connection state property are “connected”, “not connected”, or “not responding”.

You can actually combine condition and state triggers within a vSphere alarm definition:

The third trigger type, event triggers, cannot be combined with either of the other two types in a vSphere alarm definition. As the name implies, event triggers relate to specific events that happened in the vSphere environment, for example a VM was powered off or an ESX host lost access to its storage.
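The combination rule above can be written down as a small validity check. This is a minimal Python sketch for illustration only, not vCenter code; the Trigger class and the validate_alarm function are hypothetical names, and only the rule itself (condition and state triggers may be mixed, event triggers must stand alone) comes from the text.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"condition", "state", "event"}


@dataclass(frozen=True)
class Trigger:
    kind: str         # "condition", "state" or "event"
    description: str  # free text, e.g. "CPU Usage in MHz > 500"


def validate_alarm(triggers: list[Trigger]) -> None:
    """Raise ValueError if these triggers may not share one alarm definition."""
    kinds = {t.kind for t in triggers}
    unknown = kinds - ALLOWED_KINDS
    if unknown:
        raise ValueError(f"unknown trigger type(s): {sorted(unknown)}")
    # Condition and state triggers can be mixed; event triggers must stay alone.
    if "event" in kinds and len(kinds) > 1:
        raise ValueError("event triggers cannot be combined with condition or state triggers")


# Allowed: a condition trigger plus a state trigger in the same alarm.
validate_alarm([
    Trigger("condition", "CPU Usage in MHz > 500"),
    Trigger("state", "Host connection state is not responding"),
])
```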

Sometimes this can be a little confusing. Should you be looking for a state trigger or an event trigger if you want to create a new alarm for a certain situation? The first place to look for this kind of information is the vSphere Basic System Administration guide (“BSA”) (http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_admin_guide.pdf). For vSphere 4.1, this content has moved to the Datacenter Administration Guide (http://www.vmware.com/pdf/vsphere4/r41/vsp_41_dc_admin_guide.pdf). It has a good section on Alarm Triggers. However, it does not give you the triggers “at a glance”, so I’ve created an Excel sheet that lists all the condition, state, and event triggers from the BSA and can be used as a planning sheet for setting up your vSphere alarms. That’s the sheet called “Alarm Triggers from BSA” in the attached Excel workbook.

What’s new in 4.1

There were not many changes in vSphere 4.1. Here are the triggers that are listed in the 4.1 manual but not in the 4.0 manual:

Entity           Trigger Type  Trigger Name / Event Category  Description / Available Events
Virtual Machine  Event         Deployment                     VM created
Virtual Machine  Event         Deployment                     VM auto renamed
Virtual Machine  Event         Deployment                     VM being cloned
Virtual Machine  Event         Deployment                     VM being created
Virtual Machine  Event         Deployment                     VM deploying
Virtual Machine  Event         Deployment                     VM emigrating
Virtual Machine  Event         Deployment                     VM hot migrating
Virtual Machine  Event         Deployment                     VM migrating
Virtual Machine  Event         Deployment                     VM reconfigured
Virtual Machine  Event         Deployment                     VM registered
Virtual Machine  Event         Deployment                     VM removed
Virtual Machine  Event         Deployment                     VM renamed
Virtual Machine  Event         Deployment                     VM relocating
Virtual Machine  Event         Deployment                     VM upgrading
Virtual Machine  Event         Deployment                     Cannot complete clone
Virtual Machine  Event         Deployment                     Cannot migrate
Virtual Machine  Event         Deployment                     Cannot relocate
Virtual Machine  Event         Deployment                     Cannot upgrade
Virtual Machine  Event         HA                             Insufficient failover resources
Cluster          Event         HA                             Cluster overcommitted
Cluster          Event         HA                             Virtual Machine heart beat failed

The HA “Cluster overcommitted” trigger sounds quite useful.


Personally, I find that one of the most useful changes in 4.1 is that you can now get an alarm if an uplink on a distributed switch fails. This was not possible in 4.0.

Note that the Basic System Administration Guide does not list every possible trigger. In fact, it does not even list all the triggers that are available in the vSphere client.

For a complete listing of the event triggers that are new in 4.1, please refer to the tab called “All_API_Events_41” in the attached Excel sheet. It has a column that allows you to filter for events that are (not) available in vSphere 4.0. In total, 4.1 has 117 new event triggers.
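If you want to reproduce that comparison yourself, one way is to export the event descriptions from a 4.0 and a 4.1 vCenter (for example with the PowerCLI scriptlet in section 3) and compute the difference. Below is a minimal Python sketch, assuming CSV exports with Key and FullFormat columns. The handling of extended events (a shared key such as EventEx, with the dotted event ID at the start of FullFormat, followed by a ‘|’) is an assumption to check against your own export.

```python
import csv
import io

# Assumption: extended events share one of these keys in the export, and their
# FullFormat starts with the dotted event ID, e.g. "esx.problem.xyz|message".
EXTENDED_KEYS = {"EventEx", "ExtendedEvent"}


def event_id(row: dict[str, str]) -> str:
    """Stable identifier for one exported event description row."""
    if row["Key"] in EXTENDED_KEYS:
        return row["FullFormat"].split("|", 1)[0]
    return row["Key"]


def event_ids(csv_text: str) -> set[str]:
    """All event identifiers found in one CSV export."""
    return {event_id(row) for row in csv.DictReader(io.StringIO(csv_text))}


def new_in_newer(old_csv: str, new_csv: str) -> list[str]:
    """Identifiers present in the newer export but not in the older one, sorted."""
    return sorted(event_ids(new_csv) - event_ids(old_csv))
```

For example, new_in_newer(Path("events_40.csv").read_text(), Path("events_41.csv").read_text()) would list the candidates; the file names are placeholders.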

2. Using default alarms

The vSphere Basic System Administration Guide also lists most of the default alarms defined in vSphere. Again, I’ve copied them to an Excel sheet for easier use in planning. That’s the sheet called “vSphere default alarms” in the attached Excel workbook.

Why you need to define actions for default alarms

The Datacenter Administration Guide says “VMware provides preconfigured alarms for the vCenter Server system that trigger automatically when problems are detected. You only need to set up actions for these alarms.” And indeed you should set up alarm actions for those default alarms. This is especially important for the so-called stateless alarms. Let’s have a look at an example:

The “cannot connect to storage” alarm is probably quite a useful alarm to have. If we look at its definition, we see that the triggers have a status of “unset”.


The alarm has an action defined: it sends an SNMP trap.

So what’s going to happen if I remove the storage from one of my hosts? Let’s look at the vCenter events for that host:

Looking at the events from the bottom up, we see:

1. A “Lost connectivity to storage device” event is generated.
2. The alarm state changes from “gray” to “gray”.
3. This triggers an action.
4. An SNMP trap is sent.

We also see that the affected host does not show any errors:


So what would have happened if I had not had any alarm action defined on that alarm? Well, nothing. I’d never notice that there was a problem with the host unless I took a close look at the vCenter events.

Now let’s try something different. We take a look at the “Network connectivity lost” alarm:

Unlike the storage alarm, it has a status setting of “Alert”.

I’ve removed the default “Send trap” action:

So let’s see what happens if I remove a network connection from one of my ESX hosts.


As we see, the host turns “red”. I’d probably notice this in my vSphere client. I don’t get a notification, though, since I don’t have an alarm action defined.

Key takeaway: Always define suitable actions for the default alarms. Otherwise they might be less useful than you’d expect.
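The two experiments boil down to a simple rule: the trigger status decides whether the object changes color, and only actions produce notifications. Here is a toy Python model of that observed behavior. It is a simplification for illustration, not how vCenter is implemented; the status names follow the client’s Unset/Normal/Warning/Alert choices, and the action label is made up.

```python
from typing import Optional

# Trigger status chosen in the alarm definition -> color shown in the client.
# None stands for "unset".
STATUS_COLOR = {None: "gray", "Normal": "green", "Warning": "yellow", "Alert": "red"}


def outcome(trigger_status: Optional[str], actions: list[str]) -> dict:
    """Simplified result of an event trigger firing once.

    Simplification: every configured action runs whenever the trigger fires,
    as the storage example showed even though the state stayed gray.
    """
    color = STATUS_COLOR[trigger_status]
    return {
        "color": color,
        "visible_in_client": color in ("yellow", "red"),
        "notified": bool(actions),
    }
```

The storage example corresponds to outcome(None, ["send SNMP trap"]); removing the action gives outcome(None, []), where nobody notices anything, and the network example is outcome("Alert", []), which is visible in the client but sends no notification.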

Moving alarms around

If you have worked with the default alarms in vSphere, you will have noticed that they are defined at the top level, i.e. they apply to all objects that are managed by your vCenter server.

One question I hear frequently is “Can I override alarm settings at a lower level?”

Why would you want to do that? Well, let’s say your alarm action is set to “send an email” if an ESX host gets disconnected. All these emails eventually generate an SMS to your mobile phone. You have to react 24x7. You don’t want to spend your weekends chasing alarms generated by some not-so-important hosts in your test environment.

Fortunately, that’s an easy one. You can disable alarm actions on any managed entity (datacenters, clusters, hosts, VMs, datastores, …) by right-clicking the entity in the vSphere client and choosing “disable alarm actions”. No more SMS on Sundays.

However, this turns off alarm actions completely (alarms will still be shown in the vSphere client, but there will be no more emails, SNMP traps, script executions, or other actions). If you want to keep some of the default alarms on a given entity but disable others and/or change the alarm actions on some, then there’s unfortunately no built-in way to do this.

But there’s a great PowerCLI script by LucD that can help you achieve this. It basically copies (or moves, as you like) the alarm definitions to lower levels in the object hierarchy, where you can modify them. Check http://lucd.info/?p=1799 for details.
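To see why copying or moving definitions helps, here is a toy model of how alarm definitions apply down the inventory tree: an alarm defined on an object applies to that object and everything below it. It is a Python sketch of the idea, not LucD’s PowerCLI script; the inventory names, the “email”/“none” actions and the helper functions are all hypothetical.

```python
# Hypothetical inventory tree: entity -> parent (None for the top level).
PARENT = {
    "Datacenters": None,
    "DC1": "Datacenters",
    "Prod": "DC1",
    "Test": "DC1",
    "esx01": "Prod",
    "esx99": "Test",
}


def ancestors(entity):
    """Yield the entity itself, then each parent up to the top level."""
    while entity is not None:
        yield entity
        entity = PARENT[entity]


def effective_alarms(entity, defined):
    """(defined_on, alarm, action) for every alarm that applies to entity.

    `defined` maps an entity to {alarm name: action} for alarms defined there.
    """
    return [
        (level, alarm, action)
        for level in ancestors(entity)
        for alarm, action in defined.get(level, {}).items()
    ]


def move_alarm(defined, alarm, source, targets):
    """Remove an alarm from `source` and define a copy of it on each target."""
    action = defined[source].pop(alarm)
    for target in targets:
        defined.setdefault(target, {})[alarm] = action
```

Starting from {"Datacenters": {"Host connection lost": "email"}}, both hosts inherit the email action. After move_alarm(defined, "Host connection lost", "Datacenters", ["Prod", "Test"]), the copy on Test can be changed without touching Prod.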

3. Event trigger details

We will now focus on event triggers. And of course the first question that comes to mind is “Where’s the list of event triggers that are available in vSphere?”


And the short answer to that is “It’s in the attached Excel sheet”. That’s the sheet called “All_API_events” in the attached Excel workbook.

The long answer is, well, slightly longer: you can define event triggers on any event that’s available in the vSphere API. The vSphere API reference is available at http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/index.html.

Have fun.

Not the answer you were looking for? OK, here’s some more detail. The API provides a lot of information that can be gathered by querying your vCenter server. If you have a close look at the API reference, you’ll notice that it does not list the event triggers, so I guess they can change (at least slightly) between vCenter releases. Here’s a PowerCLI scriptlet I used to query the event triggers from vCenter 4.0 U1:

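# Connect to vCenter first (pass -Server <name>, or answer the prompt)
Connect-VIServer
$eventMan = Get-View EventManager
# Export all event descriptions (Key, Category, FullFormat, ...) to CSV; the file name is arbitrary
$eventMan.Description.EventInfo | Export-Csv -Path .\All_API_Events.csv -NoTypeInformation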

Easy, huh? Of course, I manually formatted the resulting Excel sheet and also added some grouping. That’s the first column in the sheet, and it’s entirely mine, including any potential mistakes.

The 422 different event triggers may be a bit overwhelming, so let’s get back to the default alarms based on event triggers for the moment. The alarm names in the vSphere client are pretty self-explanatory, but maybe you want to know exactly how the alarms work.

Let’s have a closer look at the “anatomy” of an event trigger. If you’ve worked with condition and state alarms, you may have noticed that the trigger conditions can be combined in an “OR-fashion” or in an “AND-fashion”. In the vSphere client this is called “Trigger if any conditions are satisfied” (“OR-fashion”) or “Trigger if all of the conditions are satisfied” (“AND-fashion”):

Now with event triggers you don’t have that choice:


With event triggers, different trigger expressions are always combined in an “OR-fashion”, i.e. the alarm will trigger if any of the events happens. But you can see that there are “advanced settings associated with this trigger”. We’ll get back to that in a moment. (By the way, you should always read this as “advanced settings may be associated with this trigger”.)
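The difference in combination logic can be sketched in a few lines of Python. The sample metric names (cpu_mhz, connection_state) and the helper functions are hypothetical. Only the behavior comes from the text: “any” versus “all” for condition and state triggers, and always OR for event triggers.

```python
from typing import Callable

Sample = dict[str, object]
Expression = Callable[[Sample], bool]


def cpu_high(sample: Sample) -> bool:
    """Condition trigger: numeric value above a threshold."""
    return sample["cpu_mhz"] > 500


def not_responding(sample: Sample) -> bool:
    """State trigger: one value out of a discrete set of states."""
    return sample["connection_state"] == "notResponding"


def metric_alarm_fires(expressions: list[Expression], mode: str, sample: Sample) -> bool:
    """Condition/state alarms: 'any' combines with OR, 'all' combines with AND."""
    results = [expression(sample) for expression in expressions]
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError("mode must be 'any' or 'all'")


def event_alarm_fires(event_types: set[str], event_type: str) -> bool:
    """Event alarms: always OR, so any one listed event is enough."""
    return event_type in event_types
```

With a busy but connected host, the “any” alarm fires and the “all” alarm does not. Event alarms have no such switch.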

So where are the exact definitions of the default event triggers in vSphere? Again, the short answer is “in the attached Excel”. That’s the sheet called “Default_event_triggers” in the attached Excel workbook. And, again, the long answer is that they can be retrieved using the API. The PowerCLI code to get them is in the attached Get_Alarms2.ps1. It’s a tad more complicated than the previous script.

If you look at the Excel sheet, you’ll notice that some triggers have additional fields called “Comparisons”. That’s what they are called in the vSphere API. In the vSphere client they are called “Advanced Settings”.
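Here is a rough Python model of how an event trigger with comparisons might be evaluated. The operator names follow the vSphere API’s EventAlarmExpressionComparisonOperator values (equals, notEqualTo, startsWith, doesNotStartWith, endsWith, doesNotEndWith), which are worth double-checking in the API reference for your version. The attribute name vmName is hypothetical, and treating several comparisons as a logical AND is an assumption.

```python
# Operator names mirror EventAlarmExpressionComparisonOperator in the vSphere API.
OPERATORS = {
    "equals": lambda actual, value: actual == value,
    "notEqualTo": lambda actual, value: actual != value,
    "startsWith": lambda actual, value: actual.startswith(value),
    "doesNotStartWith": lambda actual, value: not actual.startswith(value),
    "endsWith": lambda actual, value: actual.endswith(value),
    "doesNotEndWith": lambda actual, value: not actual.endswith(value),
}


def event_matches(event: dict[str, str], event_type: str,
                  comparisons: list[tuple[str, str, str]]) -> bool:
    """True if the event has the expected type and passes every comparison.

    Each comparison is (attribute_name, operator, value).
    Assumption: all comparisons must hold (AND).
    """
    if event.get("type") != event_type:
        return False
    for attribute, operator, value in comparisons:
        actual = event.get(attribute)
        if actual is None or not OPERATORS[operator](actual, value):
            return False
    return True
```

For example, a “VM powered off” trigger with the comparison ("vmName", "startsWith", "test-") would only match events for VMs whose names begin with “test-”.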

Example:


Corresponds to:


4. Alarm actions

Now that we’ve covered the alarm triggers, let’s have a short look at the actions that can be taken when an alarm is triggered. Here’s what the vSphere client offers:

As you can see, the available options differ slightly depending on whether you are setting an alarm on a host or on a VM. You should be used to the notification options from vCenter 2.5, but the other options are new in vSphere, and they are pretty powerful.

My favorite for testing is “Run a command”. This runs a script on the vCenter server that can process the alarm information and pass it on to any other monitoring tool of your choice. vCenter passes certain information about the alarm to the script in environment variables. We’ll see how that works in a moment.

5. Putting it all together

Here’s an example. We define a custom alarm that will trigger when a VM is powered down and execute a script (in a real environment you’d probably rather send an SNMP trap or an email, but let’s use the script for educational purposes). Here’s the script:

It’s pretty basic: it just writes the message “alarm triggered” into a file and appends the environment variables.
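If you prefer Python over a .cmd file, here is a hypothetical equivalent of such an action script. It assumes Python is installed on the vCenter server and that the alarm variables use the VMWARE_ALARM_ prefix (for example VMWARE_ALARM_NAME or VMWARE_ALARM_TARGET_NAME). It writes only variables with that prefix rather than the whole environment.

```python
import os


def write_alarm_log(path, environ=None):
    """Append 'alarm triggered' and all VMWARE_ALARM_* variables to a text file."""
    if environ is None:
        environ = os.environ
    lines = ["alarm triggered"]
    lines += [
        f"{name}={value}"
        for name, value in sorted(environ.items())
        if name.startswith("VMWARE_ALARM_")
    ]
    with open(path, "a", encoding="utf-8") as log:
        log.write("\n".join(lines) + "\n")
```

The command configured in the alarm action would end up calling write_alarm_log(r"C:\alarm.txt"). Passing environ explicitly makes the function easy to try out away from vCenter.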

Here’s our alarm definition:


So our expectation is that this will run the script called “C:\alarm.cmd” whenever a VM gets powered off. And indeed it does:


Let’s have a look at the file C:\alarm.txt (that’s where our alarm action script wrote its output). We see that the script has indeed generated the message we expected, and the environment variables contain useful information about the alarm that can be consumed by other tools:

If you want to do some more advanced stuff, make sure to read http://blogs.vmware.com/vipowershell/2009/09/how-to-run-powercli-scripts-from-vcenter-alarms.html

Now the fun part starts.

Take the same alarm definition that triggers when a VM is powered down, but change the alarm action from “Run a command” to “Power on VM”.


What will this give us? An alarm that is triggered whenever a VM is powered down, and whose action immediately powers that VM back on. Go ahead and try it.

The vSphere API for managing alarms is powerful, but it does not give away its treasures easily. For instance, if you try the reverse of the above example (an alarm that triggers when a VM is powered on, with the action “power off VM”), you may find that it does not work. If you’re running in a DRS cluster, try the “DRS VM powered on” trigger instead.

Word of warning: Don’t try to apply this to your whole cluster if your vCenter server is running in a VM.

6. How do I copy alarm definitions between vCenter servers?

Imagine you’ve put a lot of time and effort into fine-tuning your alarm definitions in one of your vCenter servers. Now you want the exact same alarm definitions in another vCenter server.

This is probably a bit more complicated than you might imagine. Copying event-based alarm definitions is rather straightforward using the vSphere API, but alarms that are based on performance counters require some extra work.

Let’s have a look at an example. We define an alarm that triggers if the host disk utilization exceeds a certain threshold:

If we look at the alarm definition through the API, we see that the alarm refers to a certain performance metric that is identified by a counter ID:

In this case the counter ID is 101. So the alarm definition has no direct information about disk usage; it just refers to a performance counter. We can get the details on that performance counter by using the QueryPerfCounter method of the vSphere API’s performance manager. And indeed we’ll find that this is the performance counter for average disk usage:


Now if we just created an alarm in the destination vCenter using the same counter ID, we could run into a situation where counter ID 101 means something completely different in the destination vCenter, especially if the destination vCenter is a different version. So we need to carry over the semantics of the performance counters, not just the IDs.

Attached is a sample script that shows the whole process of copying alarm definitions between vCenter servers.
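The core of that process is a lookup by meaning instead of by number. Below is a minimal Python sketch. It assumes you have already read the counter definitions from both servers and reduced each one to a “group.name.rollup” key such as disk.usage.average, for example via QueryPerfCounter on the source and the performance manager’s perfCounter list on the destination. The helper names are hypothetical.

```python
def counter_key(group: str, name: str, rollup: str) -> str:
    """Version-independent identity of a counter, e.g. 'disk.usage.average'."""
    return f"{group}.{name}.{rollup}"


def remap_counter_id(source_id: int,
                     source_counters: dict[int, str],
                     dest_counters: dict[int, str]) -> int:
    """Translate a counter ID used in a source alarm to the destination's ID.

    Both dicts map counter ID -> counter_key(...) for one vCenter server.
    """
    key = source_counters[source_id]
    dest_by_key = {k: counter_id for counter_id, k in dest_counters.items()}
    try:
        return dest_by_key[key]
    except KeyError:
        raise LookupError(f"counter {key!r} does not exist on the destination") from None
```

Matching on group, name and rollup type is one reasonable choice. If your alarms depend on units, you may want to include the unit in the key as well.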

7. How do I create an alarm that’s based on a certain vCenter event?

Sometimes you see an event in vCenter and would like to create an alarm that triggers whenever that event happens.

Let’s say you want to raise an alarm every time someone changes a custom field on a virtual machine. In the vCenter events this shows up like this:

Unfortunately there’s no such entry in the drop-down list for event triggers in vCenter. So the first step is to find out the internal (API) name of that specific event. The Excel sheet that comes with this document has two tabs called “All_API_Events” (one for 4.0, one for 4.1). We search one of these tabs for the string “Changed custom field”:


This gives us the API name for the event: CustomFieldValueChangedEvent. Now we have two options.

Either we google for a script that creates event-based alarms (here’s a good one: http://www.lucd.info/2009/11/27/alarm-expressions-part-2-event-alarms/#more-1058) and modify it to suit our needs.

Or we use a simple trick. Have you ever noticed that the “drop-down list” for event triggers in the vSphere client isn’t really what it seems to be? You can actually type something instead of selecting one of the predefined options. So we type our CustomFieldValueChangedEvent, prefixed by ‘vim.event.’ (i.e. vim.event.CustomFieldValueChangedEvent).

Now we change the content of the custom fields on one of our VMs, et voilà …

… we get a nice alarm on that VM.

It’s not a very useful alarm, but you get the idea.

Note that this trick is, strictly speaking, probably not supported, and there’s no guarantee it will work in future vCenter releases. However, I couldn’t find any difference between an alarm that was generated using a script and an alarm generated using this “GUI shortcut”.

If you have a closer look at the “FullFormat” column on one of the All_API_Events sheets, you’ll notice that some of the event descriptions start with things like “esx.clear”, “com.vmware”, “esx.problem”, or “vprob.net”:


If you want to use these triggers, you don’t prefix them with “vim.event”.
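The search and the prefix rule can be combined into one helper. This is a hedged Python sketch that assumes a CSV export like the one from section 3, with Key and FullFormat columns. It also assumes that for extended events (a shared key such as EventEx) the trigger ID is the dotted text before the first ‘|’ in FullFormat; verify both assumptions against your own export. The function names are hypothetical.

```python
import csv
import io

# Assumption: keys under which extended (dotted-ID) events appear in the export.
EXTENDED_KEYS = {"EventEx", "ExtendedEvent"}


def trigger_id(row: dict[str, str]) -> str:
    """Text to type into the event trigger field for one exported event row."""
    if row["Key"] in EXTENDED_KEYS:
        # Dotted IDs such as "esx.problem..." are used without the vim.event prefix.
        return row["FullFormat"].split("|", 1)[0]
    return "vim.event." + row["Key"]


def find_triggers(csv_text: str, text: str) -> list[str]:
    """Trigger IDs of all events whose FullFormat contains text (case-insensitive)."""
    needle = text.lower()
    return [
        trigger_id(row)
        for row in csv.DictReader(io.StringIO(csv_text))
        if needle in row["FullFormat"].lower()
    ]
```

Searching for “Changed custom field” in an export would then return vim.event.CustomFieldValueChangedEvent, ready to paste into the trigger field.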

8. How can vSphere alarms help with managing security and compliance?

Here’s an example that may be more useful in a real-life environment. If your environment has specific security or compliance requirements, you’ll probably want to get notified if someone changes roles or permissions in vCenter.

Like this:
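Such an alarm’s trigger list might contain the entries sketched below. The vSphere API has event classes for role and permission changes, and the six listed here are likely candidates; check them against the All_API_Events sheet for your version. The alarm_hits helper is hypothetical and simply applies the always-OR rule for event triggers.

```python
# Likely event triggers for role and permission changes (verify for your version).
SECURITY_TRIGGERS = [
    "vim.event.RoleAddedEvent",
    "vim.event.RoleRemovedEvent",
    "vim.event.RoleUpdatedEvent",
    "vim.event.PermissionAddedEvent",
    "vim.event.PermissionRemovedEvent",
    "vim.event.PermissionUpdatedEvent",
]


def alarm_hits(observed: list[str]) -> list[str]:
    """Observed event type IDs that would fire the alarm (event triggers are ORed)."""
    wanted = set(SECURITY_TRIGGERS)
    return [event_type for event_type in observed if event_type in wanted]
```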


9. Monitoring HA

You might want to get notified if HA restarts one of your VMs on another host because the original host failed. This is quite easy to do by defining an appropriate VM alarm:

However, you might be even more interested in knowing whether HA failed to restart a VM. So let’s look at the vCenter events for some failed restarts.

We might encounter a situation where a surviving host in the HA cluster has insufficient resources to start the VM:

You’ll also get an event like this if the VM is connected to a port group that does not exist on the host that’s trying to restart it (which is a very broad interpretation of “not enough resources”, if you ask me).

We might also encounter a situation where a surviving host tries to restart a VM but fails. This is usually the case if the original host is isolated but still holds the locks on the VM’s files on the storage.

The interesting thing about this is that the events pop up in vCenter while the VM is disconnected:


After some time HA gives up, and we see an event telling us that HA has reached the maximum retry count for this VM [1]:

So let’s try to create an alarm that goes to yellow (warning) state on a “Failover unsuccessful” event and goes red on the “Not enough resources” or “Reached maximum restart count” event.

First we look up the “Not enough resources for failover” event:

Obviously the event is called “NotEnoughResourcesToStartVmEvent”. Nice.

Now we look up the “Reached maximum restart count” event:

This one’s called “VmMaxRestartCountReached”.

[1] This is five by default, determined by the parameter das.maxvmrestartcount. See VMware KB 1009625 for details.


It turns out there is a predefined item in the vSphere client for the unsuccessful failover event, so we just use that one.

Here’s the alarm:

We test the alarm by disconnecting all network interfaces from a host while keeping its connection to the storage. This provokes an unsuccessful failover event: I have the isolation response in my cluster set to “leave powered on”, so the original host keeps the locks on the VM files.

By the way, if you want to try this yourself and don’t have a bunch of ESX hosts that you can use for testing, VMware Workstation 7 is a great tool for doing things like this. Just make sure you have enough RAM in your host, and include an iSCSI appliance for shared storage in your setup.
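The intended escalation (yellow on an unsuccessful failover, red once HA runs out of resources or retries) can be modeled in a few lines of Python. This is a toy model. It assumes that vim.event.VmFailoverFailed is the API event behind the client’s “Failover unsuccessful” item. The rule that the state only ever escalates is a simplification, not documented vCenter behavior. Note the “Vm” capitalization used by the API’s class names.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Event trigger -> status for the HA alarm described above.
HA_ALARM = {
    "vim.event.VmFailoverFailed": "yellow",  # assumed ID of "Failover unsuccessful"
    "vim.event.NotEnoughResourcesToStartVmEvent": "red",
    "vim.event.VmMaxRestartCountReached": "red",
}


def alarm_state(events, alarm=HA_ALARM, start="green"):
    """Replay a sequence of event type IDs and return the resulting alarm state.

    Simplification: the state only escalates; a lower-severity event never
    lowers it, and events the alarm does not list are ignored.
    """
    state = start
    for event_type in events:
        status = alarm.get(event_type)
        if status and SEVERITY[status] > SEVERITY[state]:
            state = status
    return state
```

Replaying the sequence from the test below (several unsuccessful failovers, then the maximum restart count) yields yellow first and red at the end.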

Let’s see what happens. First, after roughly a minute, we see a warning pop up on the VM:

We also see that a trap is sent (because I configured the alarm to do so):


If we are patient enough, we’ll again see HA give up after some time:

…and the VM has an Alert, as we would expect:

In a production environment, you’d probably want to send this trap or an email to a management system so that someone can react to it.

10. What if I’m in Germany?

vCenter Server is available in localized versions in German, French, and Japanese. However, many vSphere admins prefer to run it in English, especially in large international companies. One thing that is particularly annoying about vCenter is that it switches its messages to German as soon as it runs on a machine that has German regional settings. I guess it’s similar for French and Japanese.

The only way to prevent this is to replace the message files in the ‘de’ (‘fr’, ‘jp’) locale folder with the ones from the ‘en’ folder. See VMware KB 1015646 for details.


11. Conclusion

I hope this has given you some ideas of what can be done with alarms in vSphere. I strongly recommend that you do any tests in an environment that is separate from your production systems.

Have fun.

About me

I am a Senior Technical Account Manager for VMware in Germany. Since 2008 I have been working with customers who have fairly large VMware deployments. Monitoring the environment is a topic that is always good for discussions. Most of the content in this document has been inspired by discussions with customers and colleagues.