11.09.2008 Nagios for service monitoring in GSM-based networks at T-Mobile
NETWAYS Nagios Conference 2008
Using Nagios for service monitoring in GSM-based T-Mobile networks
Using Nagios for service monitoring in GSM-based networks at T-Mobile
Introducing Network, Service and Host Management for T-Mobile European Service Operation Centre International Roaming
Frank März
frank.maerz@t-mobile.de
Christian Hirsch
christian.hirsch@t-mobile.de
T-Mobile ESOC IR
European Service Operation Centre for International Roaming
• Started in 1993, when international roaming was introduced together with Italy
• Today managing roaming services for
  - T-Mobile Deutschland
  - T-Mobile Austria
  - T-Mobile UK
  - T-Mobile Netherlands
  - T-Mobile Czech Rep.
  and supporting T-Mobile national companies in Poland, Slovakia, Croatia, USA, Hungary
• Core team (17) based in Nuremberg, Germany
T-Mobile ESOC IR
Tasks
• IREG (GSM Association IR Expert Group)
  - Testing new roaming partners for any type of service
  - Voice roaming, prepaid roaming, data roaming, WLAN, MMS interworking
  - Network troubleshooting
• Roaming Engineering
  - Introducing new roaming and interworking services
  - Active network testing
  - Network monitoring
• Service Interface Desk
  - Interface desk for roaming partners and carriers
  - Technical support for customer care
  - SIM card management
  - Reporting
The most international Nagios implementation
T-Mobile uses 3 Nagios installations to monitor
205 countries in the world
530 foreign networks
every 5 minutes!
T-Mobile ESOC IR service monitoring philosophy
• Layer 1: Connectivity (Nagios)
  - Between the IP core networks for all packet service roaming partners
  - Between all CS (voice) roaming partners
  - Towards all equipment in use
• Layer 2: Performance (partly Nagios)
  - Service confirmation
  - Performance data capturing
• Layer 3: Verification
  - Performance data analysis
Layer 1 - Connectivity
Ensuring connectivity for service availability:
“Connectivity is the basis for every IT service”
These active tests are executed at short intervals, so an outage can be recognized immediately.
• In a controlled environment (10%):
  - by standard network management tools (e.g. PING)
• In an uncontrolled environment (90%):
  - by simulated “user” traffic (e.g. SMTP MAIL FROM)
  - by simulated “control” traffic (e.g. GTP Echo)
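A simulated "user" traffic check such as the SMTP MAIL FROM test could look roughly like the Nagios plugin below. This is a minimal sketch under stated assumptions, not the plugin used at T-Mobile: the function names and the warning/critical thresholds are hypothetical, and only Python's standard smtplib and the usual Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL) are relied on.

```python
import smtplib
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(reply_code, rtt, warn=2.0, crit=5.0):
    """Map an SMTP reply code and round-trip time (seconds) to a Nagios state.

    The warn/crit thresholds are illustrative assumptions.
    """
    if reply_code != 250:
        return CRITICAL, f"SMTP CRITICAL - MAIL FROM rejected with code {reply_code}"
    if rtt >= crit:
        return CRITICAL, f"SMTP CRITICAL - MAIL FROM took {rtt:.2f}s"
    if rtt >= warn:
        return WARNING, f"SMTP WARNING - MAIL FROM took {rtt:.2f}s"
    return OK, f"SMTP OK - MAIL FROM accepted in {rtt:.2f}s"


def check_mail_from(host, sender, port=25, timeout=10):
    """Open an SMTP dialog, send MAIL FROM, then abort without sending mail."""
    start = time.monotonic()
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.ehlo()
            code, _ = smtp.mail(sender)
            smtp.rset()  # cancel the transaction; no message is delivered
    except (OSError, smtplib.SMTPException) as exc:
        return CRITICAL, f"SMTP CRITICAL - {exc}"
    return evaluate(code, time.monotonic() - start)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: <script> <mail host> <sender address>
    state, message = check_mail_from(sys.argv[1], sys.argv[2])
    print(message)
    sys.exit(state)
```

Separating `evaluate` from the network dialog keeps the state mapping testable without a mail server; the measured time can also be reported as performance data if needed.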
Layer 2 - Performance monitoring
• Checking system function
  - Does the system provide the service it offers? (e.g. a DNS server responds to a DNS request)
• Requesting status information
  - Uses network management protocols to gather status information (load, temperature, disk usage)
• Using real user data traffic
  - Capture user traffic and check whether it is correct (protocol analyzer)
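The "checking system function" idea (does a DNS server actually answer a DNS request?) can be illustrated with a small hand-built query. This is only a sketch: the function names, the fixed transaction ID and the timeout are assumptions for illustration; the packet layout follows the standard DNS wire format (RFC 1035).

```python
import socket
import struct
import time


def build_query(name, txid):
    """Build a DNS query for the A record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def parse_reply(data, txid):
    """Return (is_valid_response, rcode, answer_count) from a DNS reply header."""
    if len(data) < 12:
        return False, None, 0
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    is_response = bool(flags & 0x8000) and rid == txid
    return is_response, flags & 0x000F, ancount


def check_dns(server, name, timeout=5.0, txid=0x1234):
    """Send one query over UDP and report (service_ok, rtt_seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name, txid), (server, 53))
        data, _ = sock.recvfrom(512)  # raises socket.timeout if no reply arrives
        rtt = time.monotonic() - start
    valid, rcode, answers = parse_reply(data, txid)
    return valid and rcode == 0 and answers > 0, rtt
```

The check passes only if the server returns a well-formed response with at least one answer, which is a stronger statement than "the host is reachable".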
Layer 3 - Traffic Verification
• Compare performance results over a period of time
  - Different values may indicate a load or bottleneck issue (e.g. compare round-trip time values)
• Look at complete call details for a single user
  - Filter for a single user connection in order to find problems at the bit level
• Run statistical analysis on captured network traffic
  - Use captured user data for statistical analysis in order to measure success rates and performance (e.g. Create PDP Context reply rate)
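The two kinds of analysis named above (a reply success rate and a comparison of round-trip times over time) reduce to simple arithmetic once the captured traffic has been decoded. The sketch below assumes a hypothetical data shape (request sequence numbers plus a mapping from sequence number to response cause); the only protocol fact used is that GTP cause value 128 means "Request accepted".

```python
from statistics import mean

# GTP cause value 128 means "Request accepted" (3GPP TS 29.060).
REQUEST_ACCEPTED = 128


def create_pdp_success_rate(requests, responses):
    """Share of Create PDP Context Requests answered with cause 'accepted'.

    `requests` is an iterable of sequence numbers seen in requests;
    `responses` maps sequence number -> cause value of the response.
    Both shapes are assumptions about how captured traffic was decoded.
    """
    requests = list(requests)
    if not requests:
        return None
    accepted = sum(1 for seq in requests if responses.get(seq) == REQUEST_ACCEPTED)
    return accepted / len(requests)


def rtt_shift(baseline_rtts, current_rtts):
    """Relative change of the mean round-trip time against a baseline period."""
    base = mean(baseline_rtts)
    return (mean(current_rtts) - base) / base
```

Unanswered requests count as failures here, since a missing response is itself a symptom worth measuring.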
History of Nagios at T-Mobile ESOC IR
2003
• T-Mobile ESOC IR started testing Nagios with DNS and SNMP checks
2004
• GTP (GPRS Tunnelling Protocol) plugin for Nagios allowed us to simulate a GSM core node (SGSN)
2005
• Support contract with Netways
• Introduced Nagios Grapher
• Including server monitoring
• NRPE design / start of rollout to other T-Mobile networks
2006
• Integrated gateway into SS7 network together with Telesoft Technologies (UK)
• KPI performance monitoring reporting
2007
• International rollout for SS7 gateways
2008
• Nagios 3 on virtual XEN environment
Nagios – the perfect match for connectivity checks
• Active checks:
  - Connectivity check
  - Retrieving network data
• This requires a solution which is capable of:
  - Checking connectivity
  - Retrieving network data
  - Scheduling these tasks
  - Presenting the results and forwarding performance data to other systems
  - Sending alarms to external systems
• Very powerful
• Extremely flexible
• It may be complex to manage and likely very expensive.
Not with [logo]
Nagios T-Mobile for IP (Nagios Master)
[Diagram labels: Nagios TMO IP (nagios-master); NRPEs in local T-Mobile backbone networks]
• IP connectivity monitoring for GPRS / 3G
• Checking MMS interworking (SMTP dialogs towards MMS centers)
• WLAN roaming (RADIUS authentication)
• Central Nagios server with access to NRPEs in the IP core networks in Germany, UK, Netherlands, Austria
Nagios TMD
System Monitoring with Nagios
• Standard system health checks such as:
  - hardware health
  - ping
  - load
  - ssh
  - disk space
  - services
  - …
• Performance / capacity monitoring:
  - router traffic
  - RTTs
  - route availability
  - …
Nagios T-Mobile SS7
Connectivity checks for voice roaming
• Central Nagios server triggers MAP dialogs on the Telesoft Technologies application server, which runs NRPE
• The application server opens the MAP dialog in the local T-Mobile network
[Diagram labels: NAGIOS; SS7]
Summary of all used Nagios service checks for GSM networks
• Nagios checks “everything” every 5 minutes, over 250,000,000 checks a year
• Connectivity checks for GSM networks
  - Packet roaming: “GTP Echo”
  - MMS interworking: “SMTP dialog”
  - CS roaming: “MAP dialogs”
  - WLAN roaming: RADIUS authentication
• Performance
  - BGP routes to roaming partners
  - BGP peer status to neighbors
  - Interface status for physical links
  - Link usage
  - ftp/sftp connections
  - Server load, users, temperature, disk usage, RAID status, power supply, fans, zombie processes
  - Running processes
  - Log-in (ssh, telnet)
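As a rough plausibility check of the yearly figure: a single service checked every 5 minutes runs 105,120 times a year, so 250,000,000 checks a year imply somewhere above 2,378 individual service checks. The short calculation below only restates the slide's numbers; the 365-day year and the variable names are assumptions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes (non-leap year assumed)
INTERVAL_MINUTES = 5               # check interval stated on the slide
CHECKS_PER_YEAR = 250_000_000      # "over 250,000,000", used as a lower bound

# How often one service is checked per year at a 5-minute interval.
runs_per_service = MINUTES_PER_YEAR // INTERVAL_MINUTES

# Number of distinct service checks implied by the yearly total.
implied_services = CHECKS_PER_YEAR / runs_per_service

# Average check rate across all installations.
checks_per_second = CHECKS_PER_YEAR / (MINUTES_PER_YEAR * 60)

print(runs_per_service)             # 105120
print(round(implied_services))      # 2378
print(round(checks_per_second, 1))  # 7.9
```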
Technical Realization
Christian Hirsch
PART 2
Technical Realization
Special Plugin Design
GPRS / 3G Roaming network environment
[Diagram labels: MNO B, Peering Exchange, GRX1, GRX2, GGSN, CPE, Local tail, T-Mobile, IP, Border Gateway, BG, Nagios, NRPE, DNS, IP network; message flow: DNS.req, DNS.res, GTP-Echo.req, GTP-Echo.res]
How check_ggsn works:
• Nagios acts like a GSM network node (SGSN)
• It uses the DNS protocol to resolve the APN (access point name) for IR partners
• The DNS responds with the IP address of the roaming partner's home GGSN
• The NRPE sends a GTP Echo towards the GGSN IP address
• If the GGSN responds, connectivity is OK
• The RTT is displayed in Nagios Grapher; RTT values indicate backbone bottlenecks
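The resolve-then-echo sequence of check_ggsn can be sketched as follows. This is not the original plugin: it assumes GTP version 1 (3GPP TS 29.060, control-plane UDP port 2123), uses the system resolver in place of a dedicated DNS query, and all function and parameter names are hypothetical.

```python
import socket
import struct
import time

GTP_C_PORT = 2123                    # GTPv1 control-plane UDP port
ECHO_REQUEST, ECHO_RESPONSE = 1, 2   # GTP message types


def build_echo_request(seq):
    """GTPv1 Echo Request: version 1, PT=1, S=1, TEID 0, no information elements."""
    flags = 0x32   # bits 001 1 0 0 1 0: version=1, PT=GTP, sequence number present
    length = 4     # bytes after the 8-byte mandatory header (seq, N-PDU, next-ext)
    return struct.pack("!BBHIHBB", flags, ECHO_REQUEST, length, 0, seq, 0, 0)


def is_echo_response(data, seq):
    """True if `data` is a GTPv1 Echo Response carrying our sequence number."""
    if len(data) < 12:
        return False
    flags, msg_type, _length, _teid, rseq = struct.unpack("!BBHIH", data[:10])
    version_ok = flags >> 5 == 1
    has_seq = bool(flags & 0x02)
    return version_ok and has_seq and msg_type == ECHO_RESPONSE and rseq == seq


def check_ggsn(apn_fqdn, seq=1, timeout=5.0):
    """Resolve the APN name, send one Echo Request, return (ok, rtt_seconds)."""
    ggsn_ip = socket.gethostbyname(apn_fqdn)   # DNS step (system resolver assumed)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_echo_request(seq), (ggsn_ip, GTP_C_PORT))
        try:
            data, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False, None
        return is_echo_response(data, seq), time.monotonic() - start
```

The returned round-trip time is the value that would be handed on as performance data for graphing.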
Voice roaming network environment
[Diagram labels: MNO A, MNO B, SS7 Carrier, SS7 Carrier, SS7, SS7, NAGIOS, SIGNALLING GATEWAY]
• Nagios interacts with an SS7 gateway which “speaks” GSM MAP (3GPP 29.002)
• The gateway was designed by T-Mobile and Telesoft Technologies
• This allows Nagios to simulate GSM functions such as registering to a network or initiating calls or SMS
Technical Realization
Nagios 3 on virtual XEN environment
Nagios 3 on virtual XEN environment
• Reduced hardware costs
• High availability
• Minimized downtime during scheduled maintenance
• Easy backups
• Reduced power consumption and need for cooling (GREEN IT)
[Diagram labels: nagios-tmd, nagios-master, nagios-ss7]
Physical Node Design
XEN Dom0.0 and XEN Dom0.1 (identical configuration):
Server   HP ProLiant DL380 G5
CPUs     2x Intel Xeon 5160 Dual Core, 3.0 GHz
RAM      8 GB SDRAM
HD       6 * 146 GB
NICs     eth0, eth1, eth2, eth3, eth4, eth5
Partition table:
/dev/cciss/c0d0p1   100 MB      /boot
/dev/cciss/c0d0p2   48 GB       /
/dev/cciss/c0d0p3   16 GB       swap
/dev/cciss/c0d0p4   extended
/dev/cciss/c0d0p5   618.76 GB   LVM
Virtual Disk Design
[Diagram, shown for both nodes (m4nxhpsrm121 and m4nxhpsrm122):
  Physical Volumes: /dev/cciss/c0d0p5 (618.76 GB)
  Volume Groups: Volume Group /dev/VirtualDomains
  Logical Volumes: devel, nagios-tmd, nagios-master, nagios-ss7
  Mirrored Logical Volumes (DRBD resources): drbd1, drbd5, drbd6, drbd7
  Physical Devices: eth0, eth1, eth2, Bond 0, Crosslink, VLAN]
Virtual Network Design
[Diagram, shown for both nodes (m4nxhpsrm121 and m4nxhpsrm122):
  Physical Layer: eth0, eth1, eth2, Bond 0, VLAN, Crosslink (DRBD sync / XEN live migration)
  Xen Bridge: xenbr0, virbr0
  Virtual Layer: devel, nagios-tmd, nagios-master, nagios-ss7; guest NICs eth0, eth1]
Nagios 3 on virtual XEN environment
Virtual Node Design
Node            DRBD resource   NICs         HD      RAM       CPUs
nagios-tmd      drbd5           eth0, eth1   15 GB   1024 MB   2
nagios-master   drbd6           eth0, eth1   15 GB   1024 MB   2
nagios-ss7      drbd7           eth0, eth1   15 GB   1024 MB   2
Partition table (identical on all three nodes):
/dev/xvda1   100 MB   /boot
/dev/xvda2   2 GB     swap
/dev/xvda3   13 GB    /