Nagios Conference 2012 - Andreas Ericsson - Merlin
Merlin: Redundancy for Nagios
op5
Agenda
About op5 and op5 Monitor
Need for distributed monitoring
Concept and implementation
Examples and scenarios
Customer cases
More Info
About op5
7 Consecutive Years of Growth
10+ New International Business Partners per year
700+ Customers in 6 years and growing
98% Renewal of subscription
100% Reference Accounts
op5 Monitor
Reports & Trend analysis
Service levels
Availability
Easy configuration
Data status visualization
SLA reports, maps, etc
Monitor virtual, cloud and outsourced services
Distributed and load-balanced setups
Automated intelligent alarms and action on events
op5 Monitor
Infrastructure
Delivers
Storage
Application
Network
Status
VMs
Need For Distributed Monitoring
Need For Distributed Monitoring
24x7 SLA for Mission-Critical Services
Availability and SLA Reporting
Reliability
Need For Distributed Monitoring
Performance
High Number of Active System checks
Limitations of the Operating System
Growth
Need For Distributed Monitoring
Distributed Monitoring
Locations
Combinations of Network
IP Address Conflicts
Security
Why Merlin?
Other solutions for redundancy are clunky at best
Redundancy is important
Load-balancing and automatic fail-over
Network functionality (as its users see it) is hardly ever measurable from a single place
This page has room for custom changes to fit the presentation to your specific needs.
Explanation for each point:
Why Scale - Refers to the common problems and issues in scaling monitoring.
Keeping up with changes in business and IT - IT environments and businesses change constantly, and the ability to respond rapidly to growth (scale) is a success factor. Keeping pace with ever-changing environments creates new possibilities for companies.
Scale out, not up - When scaling a system it is common to start by stuffing as much as possible into the existing server, at the cost of a slower system. Instead, share the workload between servers so that none is under- or over-utilized, which helps prevent failures.
Load-balancing - See above.
Distributed monitoring over several locations - When monitoring several different locations, a distributed setup is the best option for handling the load. A distributed setup is ideal when there is a need for wide geographical coverage and a resilient design.
How we monitor cloud applications - Room for personal changes.
Several NOCs -
Monitoring from the user's perspective - GUI, easy to understand, easy to use, etc.
Key Features
Distributed monitoring: geographical coverage
Load balancing: share the workload
Redundancy: ensure availability
Performance: handle larger networks
Our solution for scalable monitoring refers to a system that is:
Easy to use
Capable of constantly changing to fit the needs of your business
Stable and performant
op5's 3 key features for a scalable monitoring solution:
1. Redundancy: ensure availability of the IT network
2. Load balancing: share the workload
3. Distributed monitoring: geographical coverage
Context: op5 Monitor introduces a flexible, affordable system that can scale to the needs of your business and adapt to the ever-changing challenges of the IT environment, whether it consists of small business-critical IT or large enterprise monitoring needs with tens of thousands of services. By our standards, a scalable monitoring solution is an easy-to-use system that can constantly change to fit the needs of your business without sacrificing stability or performance.
Concept and Setups
op5 Merlin Open Source Project
Merlin - Module for Effortless Redundancy and Load balancing in Nagios
For setting up distributed Nagios installations
Brief project info
Started in 2006 as a prototype for a huge installation
First used as a redundancy engine in 2009
Used in production at 800+ installations
Largest production installation has 3 masters and 14 pollers
v2.0.0 (with Nagios 4 support) to be released officially next week
Current bleeding edge is v2.0.0-beta2-p10
Key design concepts
Peer loadbalancing is 100% transparent
Pollers take care of one or more hostgroups
Pollers can be (and often are) peered
Binary protocol for extreme performance
32-bit and 64-bit machines can't play together :-/
Object config of two peers must be identical
Pollers must never know about objects they're not responsible for
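The 32-bit/64-bit limitation is typical of binary protocols that send native in-memory structures over the wire. Below is a minimal Python sketch of the effect. It is not Merlin's actual wire format: the record layout (a C `long` timestamp followed by an `int` return code) and the names `LAYOUT_32`, `LAYOUT_64`, `send_from_64bit` and `receive_on_32bit` are assumptions made purely for illustration.

```python
import struct

# Hypothetical event record: a C `long` timestamp followed by an `int` code.
# On typical 32-bit Linux a `long` is 4 bytes; on 64-bit Linux it is 8 bytes.
LAYOUT_32 = "<ii"  # 4-byte long + 4-byte int
LAYOUT_64 = "<qi"  # 8-byte long + 4-byte int


def send_from_64bit(timestamp, code):
    """Serialize the record the way a 64-bit sender would lay it out."""
    return struct.pack(LAYOUT_64, timestamp, code)


def receive_on_32bit(payload):
    """Decode the leading bytes of the payload using the 32-bit layout."""
    size = struct.calcsize(LAYOUT_32)
    return struct.unpack(LAYOUT_32, payload[:size])


payload = send_from_64bit(1348790400, 2)
decoded = receive_on_32bit(payload)

# The two layouts disagree on record size, so the fields no longer line up.
print(struct.calcsize(LAYOUT_32), struct.calcsize(LAYOUT_64))
print(decoded)
```

In this sketch the receiver gets the timestamp right but reads the upper half of the sender's 8-byte `long` as the return code, so a code of 2 arrives as 0. A protocol that wants mixed architectures has to fix field widths explicitly, which costs conversion work on every message.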
Merlin System Design
[Diagram labels: Config file, Merlin Module, Socket, Merlin Daemon, dbi, Database, Backlog (x3)]
Merlin System Design
[Diagram labels, shown once for each of two nodes: Config file, Merlin Module, Socket, Merlin Daemon, dbi, Database, Backlog (x2)]
Peer + Peer System Design
Peered Setup
Scalability / High Availability
The backend allows a variety of high availability setups and allows almost infinite scalability by adding more "peers"
[Diagram: three Peers and the Monitored objects; labels: Config, Check results, Poll/check]
Master/Poller Setup
Remote Modules
Remote modules allow the monitoring of individual services and devices using a dedicated, but centrally managed, monitoring system.
[Diagram: a Master, several Pollers, a Cloud Poller, and Monitored objects; labels: Config, Check results, Poll/check]
Combined Setup
[Diagram: a Master/Peer and further Peers, each with Pollers, and Monitored objects]
Configuration And Management
Configuration And Management
Merlin automatically distributes object config
Split and push config in master / poller configurations
Straight-up sync for peers when needed
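As a rough illustration of the "split" step, the sketch below gives each poller only the hosts in the hostgroups it is responsible for, so a poller never learns about objects outside its hostgroups. This is a simplified Python model under stated assumptions, not Merlin's implementation: the function `split_config`, the dictionary shapes, and all host, hostgroup and poller names are hypothetical.

```python
def split_config(host_groups, poller_groups):
    """Return, per poller, only the hosts in the hostgroups assigned to it.

    host_groups:   mapping of hostgroup name -> list of host names
    poller_groups: mapping of poller name -> list of hostgroup names
    """
    per_poller = {}
    for poller, groups in poller_groups.items():
        hosts = []
        for group in groups:
            for host in host_groups.get(group, []):
                # A host may belong to several hostgroups; list it only once.
                if host not in hosts:
                    hosts.append(host)
        per_poller[poller] = hosts
    return per_poller


# Hypothetical object config on the master.
host_groups = {
    "stockholm": ["web1", "db1"],
    "gothenburg": ["web2"],
    "dmz": ["proxy1"],
}
# Hypothetical poller definitions: each poller handles one or more hostgroups.
poller_groups = {
    "poller-a": ["stockholm"],
    "poller-b": ["gothenburg", "dmz"],
}

config = split_config(host_groups, poller_groups)
for poller, hosts in config.items():
    print(poller, hosts)
```

Peers need no such split: their object config must be identical, so a straight copy is enough.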
Early Adopters
Mogul Services AB
Hosts critical services for operators, call centers, banking, online media and emergency broadcast channels
Very early implementation (beta-stage in POC-deal)
Quite complex setup (peered masters, multiple pollers)
Very high availability demands
Examples And Scenarios
Performance
Add more peers as needed: scale out monitoring performance rather than scaling up hardware
The monitoring system grows with the requirements of the company
[Diagram: six Peers]
Reliability
[Diagram: six Peers]
Peered nodes dynamically distribute the service checks among themselves. This setup also provides redundancy.
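One way to picture dynamic distribution with redundancy is a round-robin split of checks over whichever peers are currently alive, recomputed when a peer disappears. The Python sketch below is a simplified model under that assumption, not necessarily the algorithm Merlin uses; `assign_checks` and the peer and check names are hypothetical.

```python
def assign_checks(checks, peers, alive):
    """Spread checks evenly over the peers that are currently alive."""
    active = sorted(p for p in peers if p in alive)
    if not active:
        raise RuntimeError("no active peers left to run checks")
    assignment = {p: [] for p in active}
    # Sorting gives every peer the same view, so all peers can compute
    # the same assignment independently from identical object config.
    for index, check in enumerate(sorted(checks)):
        assignment[active[index % len(active)]].append(check)
    return assignment


peers = ["peer1", "peer2", "peer3"]
checks = ["check1", "check2", "check3", "check4", "check5", "check6"]

# All peers up: each peer runs a third of the checks.
all_up = assign_checks(checks, peers, alive={"peer1", "peer2", "peer3"})

# peer2 goes down: the remaining peers take over its checks.
one_down = assign_checks(checks, peers, alive={"peer1", "peer3"})

print(all_up)
print(one_down)
```

Because every check is still assigned after a peer is lost, the same mechanism covers both load balancing and fail-over in this model.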
Security
Security zones in the network
Monitoring as a Service
Branch offices with one-way availability
[Diagram: a Master, Pollers, and Monitored objects across zones labeled Secure Network, DMZ and Customer Network]
Cloud Monitoring
Monitoring of publicly available services
Monitor services from outside your own network
[Diagram: a Master, Pollers, a Cloud Poller, a DMZ, and Monitored objects]
Large Organisation Setup
Customer case study: Merlin Ahoy!
Company: Viking Line ships have sailed daily from Finland to Sweden since late 1959. The shipping company, Viking Line Abp, is based in Mariehamn, the capital of the autonomous Åland Islands in Finland.
Challenge: Unreliable network uplinks to the core system, which switch between cable connections, wireless and satellite networks depending on the location of the ship, make it difficult to monitor all on-board IP services such as IPTV, VoIP, WiFi hotspots and infotainment.
Solution: An op5 Monitor instance was installed on each ship. This gives Viking Line distributed monitoring of all services on all vessels, managed centrally.
"There have been improvements in functionality when we communicate via satellite links. It is easier than before to have the server on board communicate with our main servers on shore."
Jonas Lindroos, IT department at Viking Line
Questions?
http://git.op5.org
http://www.op5.org
28/09/12