Nagios Conference 2012 - Andreas Ericsson - Merlin

Upload: nagios

Post on 16-Apr-2017


Merlin

Redundancy for Nagios

op5

Agenda

About op5 and op5 Monitor

Need for distributed monitoring

Concept and implementation

Examples and scenarios

Customer cases

More Info

If changes are made to the top headline, the Agenda has to be changed manually.

About op5

7 Consecutive Years of Growth

10+ New International Business Partners per year

700+ Customers in 6 years and growing

98% Renewal of subscription

100% Reference Accounts

op5 Monitor

Reports & Trend analysis

Service levels

Availability

Easy configuration

Data status visualization

SLA reports, maps, etc

Monitor virtual, cloud and outsourced services

Distributed and load-balanced setups

Automated intelligent alarms and action on events

op5 Monitor

Infrastructure

Delivers

Storage

Application

Network

Status

VM's

Need For Distributed Monitoring

Need For Distributed Monitoring

24x7 SLA for Mission-Critical Services

Availability and SLA Reporting

Reliability

Need For Distributed Monitoring

Performance

High Number of Active System Checks

Limitations of the Operating System

Growth

Need For Distributed Monitoring

Distributed Monitoring

Locations

Combinations of Network

IP Address Conflicts

Security

Why Merlin?

Other solutions for redundancy are clunky at best

Redundancy is important

Load-balancing and automatic fail-over

Network functionality (as its users see it) is hardly ever measurable from a single place

On this page there is room for custom changes to fit the presentation to your specific needs.

Explanation for each point:

Why Scale - Refers to the common problems and issues with scaling monitoring.

Keeping up with changes in business and IT - With constant change in the IT environment and in business, the ability to respond rapidly to growth (scale) is a success factor. Keeping pace with ever-changing environments creates new possibilities for companies.

Scale out, not up - When scaling a system it is common to start by stuffing as much as possible into an existing server, at the cost of a slower system. Share the workload between servers instead, so that none is under- or over-utilized. This helps prevent failures.

Load-balancing - See above.

Distributed monitoring over several locations - When monitoring several different locations, a distributed setup is the best option for handling the load. A distributed setup is ideal when there is a need for wide geographical coverage and a resilient design.

How we monitor cloud applications - Room for personal changes.

Several NOCs -

Monitoring from the user's perspective - GUI, easy to understand, easy to use, etc.

Key Features

Distributed monitoring: geographical coverage

Our solution for scalable monitoring is a system that is:

Easy to use

Capable of constantly changing to fit the needs of your business

Stable and performant

Load balancing: share the workload

Redundancy: ensure availability

Performance: handle larger networks

op5: 3 key features for a scalable monitoring solution.

1. Redundancy: ensure availability of the IT network

2. Load balancing: share the workload

3. Distributed monitoring: geographical coverage

Context: op5 Monitor introduces a flexible, affordable system that can scale to the needs of your business and adapt to the ever-changing challenges of the IT environment, whether that is a small set of business-critical IT systems or a large enterprise with tens of thousands of services. By our standards, a scalable monitoring solution is an easy-to-use system that can constantly change to fit the needs of your business without sacrificing stability or performance.

Concept and Setups

op5 Merlin Open Source Project

Merlin - Module for Effortless Redundancy and Load balancing in Nagios

For setting up distributed Nagios installations

Brief project info

Started 2006 as a prototype for a huge installation

First used as redundancy engine 2009

Used in production at 800+ installations

Largest production installation has 3 masters and 14 pollers

v2.0.0 (with Nagios 4 support) to be released officially next week

Current bleeding edge is v2.0.0-beta2-p10


Key design concepts

Peer loadbalancing is 100% transparent

Pollers take care of one or more hostgroups

Pollers can be (and often are) peered

Binary protocol for extreme performance

32-bit and 64-bit machines can't play together :-/

Object config of two peers must be identical

Pollers must never know about objects they're not responsible for
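The slides do not say how the binary protocol is laid out. One plausible reason for the 32-bit/64-bit restriction, assuming the protocol ships native in-memory structs over the wire, is that the same struct has a different size on different platforms. The sketch below uses Python's `struct` module to show the effect; the two-field header (a timestamp and a code) is hypothetical and not taken from Merlin.

```python
import struct

# Hypothetical event header: a timestamp followed by a numeric code.
NATIVE = "@li"    # native C `long` + `int`, sized and aligned by the platform
PORTABLE = "<qi"  # fixed layout: 8-byte integer + 4-byte integer, little-endian


def header_size(fmt: str) -> int:
    """Return the number of bytes one packed header occupies."""
    return struct.calcsize(fmt)


native_size = header_size(NATIVE)      # varies with the platform's C `long`
portable_size = header_size(PORTABLE)  # identical on every platform

# A fixed layout round-trips regardless of which machine packed it.
packet = struct.pack(PORTABLE, 1348790400, 2)
timestamp, code = struct.unpack(PORTABLE, packet)
```

A sender and receiver that disagree on `native_size` would misread every field after the first, which is the kind of failure a fixed layout avoids at the cost of an explicit encoding step.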


Merlin System Design

[Diagram. Labelled components: Config File, Merlin Module, Socket, Merlin Daemon, dbi, Database. The label Backlog appears three times.]

Merlin System Design

[Diagram. Two sets of the same components: Config file, Merlin Module, Socket, Merlin Daemon, dbi, Database. The label Backlog appears four times.]
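The diagrams label several links with "Backlog" but do not explain it. A common reading is a queue that holds events while the other end is unreachable and replays them in order once it is back. The following is a minimal sketch of that idea under that assumption; `BackloggedLink`, `fake_send` and the size limit are illustrative names and values, not Merlin's implementation.

```python
from collections import deque
from typing import Callable, Deque, List


class BackloggedLink:
    """Forward events through `send`; queue them while the link is down."""

    def __init__(self, send: Callable[[bytes], bool], max_events: int = 1000) -> None:
        self._send = send
        # Bounded queue: when it is full, the oldest event is dropped first.
        self._backlog: Deque[bytes] = deque(maxlen=max_events)

    def push(self, event: bytes) -> None:
        """Queue the event, then try to deliver everything in order."""
        self._backlog.append(event)
        self.flush()

    def flush(self) -> int:
        """Deliver queued events oldest first; stop at the first failure."""
        delivered = 0
        while self._backlog:
            if not self._send(self._backlog[0]):
                break
            self._backlog.popleft()
            delivered += 1
        return delivered

    def pending(self) -> int:
        """Return how many events are still waiting for delivery."""
        return len(self._backlog)


# Simulated link: `state["up"]` stands in for a socket or database connection.
state = {"up": False}
received: List[bytes] = []


def fake_send(event: bytes) -> bool:
    if state["up"]:
        received.append(event)
    return state["up"]


link = BackloggedLink(fake_send)
link.push(b"check-result-1")  # link down: the event stays queued
link.push(b"check-result-2")
queued_while_down = link.pending()
state["up"] = True
link.flush()                  # link back: events arrive in original order
```

The bounded queue is a design choice of this sketch: it trades losing the oldest events for a fixed memory ceiling during long outages.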

Peer + Peer System Design

Peered Setup

Scalability / High Availability


The backend allows a variety of high availability setups and allows almost infinite scalability by adding more "peers".

[Diagram: three peers and the monitored objects. Labels: Config, Check results, Poll/check.]

Master/Poller Setup

Remote Modules

Remote modules allow the monitoring of individual services and devices using a dedicated but centrally managed monitoring system.

[Diagram: one master, several pollers including a cloud poller, and their monitored objects. Labels: Config, Check results, Poll/check.]

Combined Setup

[Diagram: a master/peer and further peers, each with pollers, and the monitored objects.]

Configuration And Management

Configuration And Management

Merlin automatically distributes object config

Split and push config in master/poller configurations

Straight-up sync for peers when needed
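The split step can be pictured as filtering the master's objects by hostgroup, so that each poller receives only what it is responsible for. The sketch below assumes a simplified model in which hosts and pollers are plain dictionaries; the function name, host names and hostgroup names are made up for illustration, and this is not Merlin's actual config format or code.

```python
from typing import Dict, List, Set


def split_config(
    hosts: Dict[str, Set[str]],
    pollers: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per poller, the sorted hosts in at least one of its hostgroups.

    hosts:   host name -> hostgroups the host belongs to
    pollers: poller name -> hostgroups the poller is responsible for
    """
    return {
        poller: sorted(name for name, groups in hosts.items() if groups & owned)
        for poller, owned in pollers.items()
    }


# Hypothetical object config held by the master.
hosts = {
    "web1": {"stockholm"},
    "db1": {"stockholm", "databases"},
    "ship1": {"vessels"},
    "core-sw": {"core"},
}
pollers = {"poller-sthlm": {"stockholm"}, "poller-sea": {"vessels"}}

per_poller = split_config(hosts, pollers)
```

In this model `core-sw` belongs to no poller's hostgroups, so it stays on the master only. Peers, by contrast, would receive the whole config unchanged.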

Early Adopters

Mogul Services AB

Hosts critical services for operators, call centers, banking, online media and emergency broadcast channels

Very early implementation (beta-stage in POC-deal)

Quite complex setup (peered masters, multiple pollers)

Very high availability demands

Examples And Scenarios

Performance

Add more peers as needed: scale out the monitoring rather than scaling up on hardware

The monitoring system grows with the requirements of the company

[Diagram: six peers.]

Reliability

[Diagram: six peers.]

The individual nodes are peered, and service checks are distributed dynamically among them. This setup also provides redundancy.
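One way to picture dynamic distribution with built-in redundancy: every peer sorts the same list of checks and the same list of live peers, then takes every n-th check. When a peer disappears, the survivors recompute and absorb its share. The sketch below is an illustration of that idea only; the slides do not describe Merlin's actual algorithm, and the function and check names are hypothetical.

```python
from typing import Dict, List


def assign_checks(checks: List[str], peers: List[str]) -> Dict[str, List[str]]:
    """Spread checks round-robin over the peers that are currently alive.

    Both lists are sorted first, so every peer that holds the same object
    config and sees the same set of live peers computes the same plan.
    """
    alive = sorted(peers)
    if not alive:
        raise ValueError("at least one peer must be alive")
    plan: Dict[str, List[str]] = {peer: [] for peer in alive}
    for index, check in enumerate(sorted(checks)):
        plan[alive[index % len(alive)]].append(check)
    return plan


checks = ["db1;mysql", "web1;http", "web1;ping", "web2;http"]
normal = assign_checks(checks, ["peer-a", "peer-b"])
failover = assign_checks(checks, ["peer-a"])  # peer-b has gone away
```

The scheme only works if all peers agree on the list of checks, which matches the requirement that the object config of two peers be identical.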

Security

Security zones in the network

Monitoring as a Service

Branch offices with one-way availability

[Diagram: a master in the secure network, pollers in the DMZ and in the customer network, and their monitored objects.]

Cloud Monitoring

Monitoring of publicly available services

Monitor services outside your own network

[Diagram: a master, pollers, a cloud poller, a DMZ and the monitored objects.]

Large Organisation Setup

Customer case study: Merlin Ahoy!

Company: Since late 1959, Viking Line ships have sailed daily between Finland and Sweden. The shipping company Viking Line Abp is based in Mariehamn, the capital of the autonomous Åland Islands in Finland.

Challenge: Unreliable network uplinks to the core system, which switch between cable, wireless and satellite connections depending on the location of the ship, make it difficult to monitor all on-board IP services such as IPTV, VoIP, WiFi hotspots and infotainment.

Solution: An op5 Monitor instance was installed on each ship. This gives Viking Line distributed monitoring of all services on all vessels, managed centrally.

"There have been improvements in functionality when we communicate via satellite links. It is easier than before to have the server on board communicate with our main servers on shore."

Jonas Lindroos, IT department at Viking Line

Questions?

http://git.op5.org

http://www.op5.org


28/09/12
