environment monitoring

Posted on 23-Jun-2015


DESCRIPTION

Brief description and best practices

TRANSCRIPT

ENVIRONMENT MONITORING

CONCEPTS & GOOD PRACTICES

R.Radünz

rradunz@gmail.com

1

Agenda

2

Concepts

What

Why

Who

How

When – ALWAYS!

Good Practices

3

What you should know

Monitoring is only one small part of a larger scenario that also includes a Service Desk function and the Incident, Problem, Change and Release Management processes, among others. Used alone, it will not transform your datacenter into a state-of-the-art showcase.

Nonetheless, it is indispensable.


4

Concepts


Technology

Process

People

5

What is monitoring?


Watch

Alert

Document

Act*

* Automated Response Systems

6

But Wait!

Why are we doing this???


Shorten the return to normal operation

REACTIVE

Prevent interruptions to a process

PROACTIVE

Optimize the use of a resource

ADVANCED

7

What…


Server

O.S.

Services

Applications

Services

DBMS

Database

Database

Storage

Network

Switch

Router

Firewall

External resources

Plug & Pray

Watch!

8

What…


Alert destination and content depend on:

SEVERITY of the event

TYPE of probe

TIME of Day or Shift

Alert!

9

What…


SEVERITY

Error

Warning

Informational

TYPE of probe

Ping, TCP

URL, Exists, Available

Current Value, % In Use…

TIME of Day

Shift 1

Shift 2

Night & Weekends

Alert!
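The probe types above differ mainly in what they return: reachability probes (Ping, TCP, URL) answer yes or no, while utilization probes report a value that has to be compared against thresholds before a severity can be chosen. A minimal Python sketch under stated assumptions: `tcp_probe` and `usage_probe` are hypothetical names, and the 80% / 95% thresholds are illustrative placeholders, not values from this deck.

```python
import socket
from enum import Enum


class Severity(Enum):
    INFORMATIONAL = 1
    WARNING = 2
    ERROR = 3


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Severity:
    """Reachability probe: ERROR if a TCP connection cannot be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return Severity.INFORMATIONAL  # port answered, nothing to escalate
    except OSError:  # refused, unreachable or timed out
        return Severity.ERROR


def usage_probe(percent_in_use: float,
                warn_at: float = 80.0, error_at: float = 95.0) -> Severity:
    """Current-value probe: map a '% in use' reading to a severity.

    The 80/95 defaults are assumed for illustration only.
    """
    if percent_in_use >= error_at:
        return Severity.ERROR
    if percent_in_use >= warn_at:
        return Severity.WARNING
    return Severity.INFORMATIONAL
```

In practice the thresholds would be tuned per resource: 85% in use may be normal for a database data file but alarming for a root filesystem.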

10

What…


Informational

• Performance Analysis

• Capacity Planning teams

Warning

• Level 2 teams

• Performance

Error

• 24x7 teams

• Operations Center

• Service Desk*

Alert!

Destination depends on SEVERITY

* Remember ESCALATION RULES!
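Routing by severity can be expressed as a simple lookup table. The sketch below is a hypothetical illustration: the team labels come from the slide, but the escalation chain (`Duty Manager`, `IT Manager`) and the 15-minute acknowledgement interval are assumptions standing in for whatever your own escalation rules define.

```python
# Destinations per severity, as listed on the slide.
ROUTES = {
    "Informational": ["Performance Analysis", "Capacity Planning"],
    "Warning": ["Level 2", "Performance"],
    "Error": ["24x7", "Operations Center", "Service Desk"],
}

# Hypothetical escalation rule: every 15 minutes without acknowledgement,
# add the next contact from this chain to the recipient list.
ESCALATION_CHAIN = ["Duty Manager", "IT Manager"]
ESCALATE_AFTER_MIN = 15


def destinations(severity: str, minutes_unacknowledged: int = 0) -> list[str]:
    """Return everyone who should receive an alert of this severity right now."""
    targets = list(ROUTES[severity])
    if severity == "Error":
        # One escalation step per elapsed interval; slicing caps it at the chain length.
        steps = minutes_unacknowledged // ESCALATE_AFTER_MIN
        targets += ESCALATION_CHAIN[:steps]
    return targets
```

Keeping the table separate from the logic means a change in team structure is a data edit, not a code change.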

11

What…


% In Use, # of jobs, # of tasks

• Performance Analysis

• Capacity Planning teams

Databases, Services, % availability

• Level 2 teams

• Performance

Ping, URL, Services

• 24x7 teams

• Operations Center

• Service Desk*

Alert!

Destination depends on TYPE of Probe

* Remember ESCALATION RULES!

12

What…


Shift 1

• Shift 1 teams

Shift 2

• Shift 2 teams

Night & weekends

• 24x7 teams

• Manager

Alert!

Destination depends on TIME of Day
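Time-based routing needs only the event timestamp and a shift roster. In this sketch the shift boundaries (07:00 to 15:00 and 15:00 to 23:00 on weekdays) are assumed values, and `on_call_teams` is a hypothetical name; anything outside the two shifts falls into the "night & weekends" bucket from the slide.

```python
from datetime import datetime, time

# Assumed weekday shift boundaries (start inclusive, end exclusive).
SHIFT_1 = (time(7, 0), time(15, 0))
SHIFT_2 = (time(15, 0), time(23, 0))


def on_call_teams(when: datetime) -> list[str]:
    """Pick alert recipients from the wall-clock time of the event."""
    is_weekend = when.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
    now = when.time()
    if not is_weekend and SHIFT_1[0] <= now < SHIFT_1[1]:
        return ["Shift 1 teams"]
    if not is_weekend and SHIFT_2[0] <= now < SHIFT_2[1]:
        return ["Shift 2 teams"]
    # Everything else counts as night & weekends.
    return ["24x7 teams", "Manager"]
```

A real roster would also need holidays and time zones; both are left out here to keep the routing rule visible.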

13

What…


Accountability

SLM (Service Level Management)

Process Improvement

Trends

Audit Trail

Complex Event Review

Document!
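Documentation is easiest to reuse (for SLM reports, trend analysis or an audit trail) when every event is written in a consistent, machine-readable form. One possible approach, assumed here rather than taken from the deck, is an append-only JSON Lines file; `record_event`, `load_events` and the field names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_event(log_path: Path, probe: str, severity: str,
                 message: str, notified: list[str]) -> dict:
    """Append one monitoring event to a JSON Lines audit log."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "probe": probe,
        "severity": severity,
        "message": message,
        "notified": notified,
    }
    # Append-only: earlier entries are never rewritten, which keeps the trail auditable.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


def load_events(log_path: Path) -> list[dict]:
    """Read the trail back, e.g. for trend analysis or a complex event review."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Recording who was notified alongside the event is what makes the log useful for accountability, not just for troubleshooting.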

14

What…


Automated Response Scripts

Reboot

Restart

VMs: load balancing / vMotion

Act!
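Automated responses should be bounded: try a fixed number of restarts, then stop and hand the problem to a person. The sketch below takes hypothetical hooks `is_healthy` and `restart` as plain functions, so the same logic could wrap a URL probe and a service restart (or a VM migration) in a real environment; the default of two attempts is an assumption.

```python
import time
from typing import Callable


def auto_remediate(is_healthy: Callable[[], bool],
                   restart: Callable[[], None],
                   max_attempts: int = 2,
                   settle_seconds: float = 0.0) -> str:
    """Try an automated restart a bounded number of times, then hand off."""
    if is_healthy():
        return "healthy"
    for attempt in range(1, max_attempts + 1):
        restart()
        time.sleep(settle_seconds)  # give the service time to come back
        if is_healthy():
            return f"recovered after {attempt} restart(s)"
    # Never loop forever: repeated failures need a person, not another reboot.
    return "escalate"
```

The returned string is meant to be documented and, for "escalate", routed to people like any other Error alert.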

15

Good Practices


Please DO NOT

• Flood the incident team with false positives!

• Generate more than one alert for the same event

• Send an alert to the wrong person

• Forget that people need rest

• Forget that no one is reading email at 3 AM
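Two of the "do not" items, false positives and repeated alerts for one event, are commonly handled in the monitoring layer itself. The sketch below (a hypothetical `AlertGate` class, not part of any specific tool) combines a consecutive-failure threshold with state tracking, so a single lost ping is ignored and a persistent outage produces one alert plus one recovery notice.

```python
from typing import Optional


class AlertGate:
    """Suppress flapping and duplicate alerts for a single probe."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # consecutive failures needed before alerting
        self.failures = 0
        self.alerting = False       # True while an open alert exists

    def observe(self, ok: bool) -> Optional[str]:
        """Feed one probe result; return a notification only on state changes."""
        if ok:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "RECOVERED"
            return None
        self.failures += 1
        if not self.alerting and self.failures >= self.threshold:
            self.alerting = True
            return "ALERT"
        return None  # either not yet confirmed or already alerted
```

For example, with `threshold=3` the sequence fail, ok, fail, fail, fail, fail, fail, ok yields exactly one `ALERT` (on the third consecutive failure) and one `RECOVERED`.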

16

Good Practices


Please DO

• Establish a single point of contact (DC status)

• Warn the teams about planned maintenance

• Define clear responsibility (who does what)

• Define and document escalation procedures

• Enjoy your success!
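Warning teams about planned maintenance works best when the monitoring system also knows about the window, so expected outages do not page anyone. A minimal sketch, assuming windows are recorded per host with start and end times; `MaintenanceWindow` and `should_notify` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    host: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def should_notify(host: str, when: datetime,
                  windows: list[MaintenanceWindow]) -> bool:
    """False while the host is inside an announced maintenance window."""
    return not any(w.host == host and w.start <= when < w.end for w in windows)
```

Suppressed events can still be documented; only the notification is skipped, so the audit trail stays complete.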

17

Questions?
