August 28, 1998 New features in PATROL 3.0 1

New features in PATROL version 3

Michael Jung (TU-Berlin), Waltraut Niepraschk (DESY)

System overview

Patrol actions and resources control

Configuration

WWW Interface

Patrol usage


Patrol 3.0

based on SLAC patrol by C. Boheim

modified and extended by M. Jung

WWW interface in Javascript available

supported architectures

AIX, IRIX, SunOS, Linux, Solaris, DEC-Unix, HP-UX

easy adaptation to new architectures by specifying patterns for the output of certain system commands
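The slides do not show the pattern syntax Patrol uses, so the following is only a minimal sketch of the idea in Python: one regular expression per architecture turns a line of `df` output into the same two fields. The table name `DF_PATTERNS`, the function `parse_df_line`, and the two column layouts are assumptions made for illustration.

```python
import re

# Hypothetical per-architecture patterns for one line of `df` output.
# The real Patrol pattern syntax is not shown in the slides.
DF_PATTERNS = {
    # Assumed layout: Filesystem 1K-blocks Used Available Use% Mounted-on
    "linux": re.compile(
        r"^(?P<dev>\S+)\s+\d+\s+\d+\s+\d+\s+(?P<pct>\d+)%\s+(?P<mount>\S+)$"
    ),
    # Assumed layout: Filesystem 512-blocks Free %Used Iused %Iused Mounted-on
    "aix": re.compile(
        r"^(?P<dev>\S+)\s+\d+\s+\d+\s+(?P<pct>\d+)%\s+\d+\s+\d+%\s+(?P<mount>\S+)$"
    ),
}


def parse_df_line(ostype, line):
    """Return (mount point, percent used), or None if the line does not match."""
    match = DF_PATTERNS[ostype].match(line.strip())
    if match is None:
        return None
    return match.group("mount"), int(match.group("pct"))
```

Under this scheme, supporting another architecture would only mean adding one more entry to the table; the code that evaluates the file system limit stays unchanged.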


Patrol resources control

Obtaining information on

processes and daemons (ps)

file systems (df)

file sizes (ls)

services and ports (netstat)

hosts (uptime)

return codes or timing (timeout) of arbitrary commands

Resource checks are based on the value and on the change of value (as compared to the last run of patrol)

Tests on limits (value, value+delta, (val1, val2, val3+delta))

with relational operators (>, <, =, !=, =~/regexp/)
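A test on a value combined with a test on its change since the last run could look roughly like the Python sketch below. The function name `check`, its parameters, and in particular the reading of "value+delta" (the limit must be crossed and the value must have grown by at least `delta`) are assumptions; the slides do not define the exact semantics.

```python
import operator
import re

# Relational operators listed on the slide; the mapping to Python is an assumption.
RELOPS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "!=": operator.ne,
}


def check(value, relop, limit, previous=None, delta=None):
    """Return True when the test is fulfilled.

    The value is compared with `limit` using `relop`. If `delta` is given,
    the change since the previous run must also be at least `delta`
    (assumed reading of "value+delta", not documented behaviour).
    """
    if relop == "=~":
        hit = re.search(limit, str(value)) is not None
    else:
        hit = RELOPS[relop](value, limit)
    if hit and delta is not None:
        # Without a previous run no change can be measured yet.
        hit = previous is not None and (value - previous) >= delta
    return hit
```

For example, `check(96, ">", 95, previous=90, delta=5)` is fulfilled, while the same call with `previous=95` is not, because the value changed by only one point since the last run.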


Patrol actions

If tests fulfil specified criteria, perform actions:

mail (to users, to admins)

kill (processes)

nice (processes)

restart (daemons)

write (to syslog)

execute system commands

execute snippets of perl code (access to patrol internal variables)

Mail texts, system commands and perl snippets are defined in blocks for easier reference


Patrol configuration

patrol actions are defined as rules in a configuration file

rules act on targets (identified by hostname, ostype, netgroup, ...)

rule format: rule_type target resource condition action

rule types:

F   file system
D   daemon
SP  system port
PC  process, cpu limit
PM  process, memory limit
PT  process, time limit
PN  process, number limit
HL  host, load limit
HT  host, uptime limit
HU  host, user limit
CC  command, exit code
CT  command, time out
W   file size
SP  service port


Configuration examples

restart sshd and notify admins by email

D * sshd restart("sshd"), mail(admin, MD)

renice some jobs on IRIX systems (not Codine batch jobs)

PC [irix] !{cod_} >50% nice(8), mail($user, admin, MPC)

watch the /usr1 file system on host hydra

F hydra /usr1 >95% mail(admin, MF1)

notify admins if load on aisa machines is above 2

HL aisa[0-9] >2 mail(admin, MHL)

notify admins if /etc/check has nonzero return code (netgroup hps)

CC (hps) /etc/check >0 mail(admin, MCC)
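The examples use four kinds of target: `*`, a name in square brackets, a name in parentheses, and a host name or host name pattern. How Patrol itself resolves them is not shown; the Python sketch below is one plausible reading, inferred only from the comments that accompany the example rules. The function `target_matches` and the choice to match host name patterns against the whole name are assumptions.

```python
import re


def target_matches(target, hostname, ostype, netgroups):
    """Decide whether a rule target applies to this host.

    Assumed reading of the example rules:
      *          every host
      [irix]     hosts whose OS type is "irix"
      (hps)      hosts that belong to netgroup "hps"
      aisa[0-9]  anything else is a regular expression on the host name
    """
    if target == "*":
        return True
    if target.startswith("[") and target.endswith("]"):
        return ostype == target[1:-1]
    if target.startswith("(") and target.endswith(")"):
        return target[1:-1] in netgroups
    # Assumption: the pattern must match the complete host name.
    return re.fullmatch(target, hostname) is not None
```

With this reading, `target_matches("aisa[0-9]", "aisa4", "aix", set())` holds, whereas a host named `aisa12` would not be covered by that rule.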


WWW Interface

patrol runs periodically (cron) on all hosts to be checked, no communication between hosts, no central information retrieval

WWW interface

runs periodically on a single host (WWW server)

gathers information on all hosts over a (configurable) period of time

consists of a perl script (part of patrol) and Javascript HTML files (generated by patrol)

provides both a global view of the system and information on (configurable) subgroups and on individual hosts

can also process and display data from other (monitoring) tools
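The aggregation step of the WWW interface can be pictured with the following Python sketch. It is not the perl script shipped with patrol; the record layout `(hostname, timestamp, number_of_problems)`, the function `summarize`, and the group table are all hypothetical. How the per-host results reach the WWW server is not described in the slides, so the sketch starts from records that have already been collected.

```python
from collections import defaultdict


def summarize(reports, now, period, groups):
    """Build global, per-group and per-host views from host reports.

    reports: list of (hostname, timestamp, number_of_problems) tuples
    period:  only reports no older than `period` (relative to `now`) are used
    groups:  dict mapping a group name to the set of its host names
    """
    per_host = defaultdict(int)
    for host, stamp, problems in reports:
        if now - stamp <= period:
            per_host[host] += problems
    per_group = {
        name: sum(per_host.get(host, 0) for host in members)
        for name, members in groups.items()
    }
    return {
        "global": sum(per_host.values()),
        "groups": per_group,
        "hosts": dict(per_host),
    }
```

The three keys of the result correspond to the three views named above: the whole system, configurable subgroups, and individual hosts.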

see screen dump of our system in routine use


Patrol usage at DESY Zeuthen

presently approximately 100 hosts controlled by patrol

patrol started by cron every 15 minutes

Controlling tasks on all hosts

  Load monitoring
  Execution time monitoring of user processes (except batch)
  Presence of important daemons (cron, xntp, syslog, afs, batch, …)

Tasks on selected hosts (usually servers)

  File system usage (/, /usr, /tmp, /home, …)
  Presence of daemons (named, sendmail, …)

Depending on the problem, appropriate actions are taken (mail, restart, log, …)

Observed increased stability of services for users