Multicast troubleshooting with ssmping and asmping
Stig Venaas
Introduction
There are very few tools available for end users to verify that multicast works.
There are also few tools to help network administrators debug multicast connectivity problems.
We will present two tools, ssmping and asmping, that can be of use to both end users and network administrators.
We will explain how these tools can be used to test multicast connectivity and to provide more detailed information that might aid in discovering, debugging and fixing multicast problems.
We will first look at ssmping for testing SSM (Source Specific Multicast).
Next we will look at asmping for testing ASM (Any Source Multicast, the traditional multicast model).
ssmping
A tool for testing multicast connectivity and more.
Behaviour is a bit like the common ping tool.
Implemented at the application layer using UDP.
No additional requirements on the operating system, beyond SSM support in the operating system and network.
A server must run ssmpingd.
A client pings a server by sending a unicast ssmping query.
The server replies with both a unicast and a multicast ssmping reply.
In this way a client can check that it receives SSM from the server.
It can also check parameters such as delay and number of router hops.
How it works
[Diagram: message sequence between client and server, with time running downwards]
Client: the user runs ssmping <S>.
Client: joins (S,G).
Client: sends a unicast query to S.
Server: receives the unicast ssmping query.
Server: responds with an ssmping unicast reply and a multicast reply to G.
Client: receives the replies and prints RTT and hops from the server.
Client: sends a new query every second.
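The "client joins (S,G)" step can be sketched in Python as follows. This is a minimal illustration and not the ssmping source. It assumes Linux, where the IPv4 socket option IP_ADD_SOURCE_MEMBERSHIP has the value 39 and takes a 12-byte structure holding the group, interface and source addresses in that order. The function names and the port argument are hypothetical.

```python
import socket

# Linux value of IP_ADD_SOURCE_MEMBERSHIP, used only when this Python build
# does not export the constant. Other operating systems use other values.
IP_ADD_SOURCE_MEMBERSHIP = getattr(socket, "IP_ADD_SOURCE_MEMBERSHIP", 39)


def pack_mreq_source(group: str, source: str, interface: str = "0.0.0.0") -> bytes:
    """Build a Linux struct ip_mreq_source: group, interface, source."""
    return (
        socket.inet_aton(group)
        + socket.inet_aton(interface)
        + socket.inet_aton(source)
    )


def join_ssm(group: str, source: str, port: int) -> socket.socket:
    """Open a UDP socket and join the IPv4 channel (S,G) on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP, pack_mreq_source(group, source)
    )
    return sock
```

Only join_ssm touches the network. It needs a host and network with SSM support, as noted on the previous slide.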
Example output
$ ssmping -c 5 -4 flo.nrc.ca
ssmping joined (S,G) = (132.246.2.20,232.43.211.234)
pinging S from 158.38.63.20
  unicast from 132.246.2.20, seq=1 dist=13 time=122.098 ms
  unicast from 132.246.2.20, seq=2 dist=13 time=122.314 ms
multicast from 132.246.2.20, seq=2 dist=13 time=125.061 ms
  unicast from 132.246.2.20, seq=3 dist=13 time=122.327 ms
multicast from 132.246.2.20, seq=3 dist=13 time=122.345 ms
  unicast from 132.246.2.20, seq=4 dist=13 time=122.334 ms
multicast from 132.246.2.20, seq=4 dist=13 time=122.371 ms
  unicast from 132.246.2.20, seq=5 dist=13 time=122.360 ms
multicast from 132.246.2.20, seq=5 dist=13 time=122.384 ms
--- 132.246.2.20 ssmping statistics ---
5 packets transmitted, time 5003 ms
unicast: 5 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 122.098/122.286/122.360/0.394 ms
multicast: 4 packets received, 0% packet loss since first mc packet (seq 2) recvd
   rtt min/avg/max/std-dev = 122.345/123.040/125.061/1.192 ms
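For scripted checks, the reply lines in the output can be parsed into fields. The sketch below assumes the line layout shown in the example output. The Reply type and parse_reply function are hypothetical names and not part of ssmping.

```python
import re
from typing import NamedTuple, Optional

# Matches reply lines such as:
#   unicast from 132.246.2.20, seq=1 dist=13 time=122.098 ms
REPLY_RE = re.compile(
    r"(unicast|multicast) from (\S+), seq=(\d+) dist=(\d+) time=([\d.]+) ms"
)


class Reply(NamedTuple):
    kind: str      # "unicast" or "multicast"
    source: str    # address the reply came from
    seq: int       # sequence number of the query being answered
    hops: int      # the "dist" field
    rtt_ms: float  # round-trip time in milliseconds


def parse_reply(line: str) -> Optional[Reply]:
    """Return the parsed reply, or None for lines that are not replies."""
    match = REPLY_RE.search(line)
    if match is None:
        return None
    kind, source, seq, dist, rtt = match.groups()
    return Reply(kind, source, int(seq), int(dist), float(rtt))
```

Lines that are not replies, such as the statistics header, yield None and can be skipped.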
What does the output tell us?
13 unicast hops from the source, and also 13 for multicast.
Multicast is likely to follow the same path as unicast.
Note that with SSM we immediately join the shortest path tree.
Multicast RTTs are slightly larger and vary more.
The difference between unicast and multicast RTT shows the one-way difference between the unicast and multicast replies, since they are replies to the same request packet.
The multicast tree is not ready for the first multicast reply, but is ready for the second.
No unicast loss, and no multicast loss after the tree is established.
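As a worked example of the one-way difference, the RTTs from the example output can be subtracted per sequence number. The values below are copied from that output. The function name is hypothetical.

```python
# RTTs in milliseconds, keyed by sequence number, copied from the example output.
unicast_rtt = {1: 122.098, 2: 122.314, 3: 122.327, 4: 122.334, 5: 122.360}
multicast_rtt = {2: 125.061, 3: 122.345, 4: 122.371, 5: 122.384}


def one_way_difference(unicast, multicast):
    """Return multicast RTT minus unicast RTT for each seq present in both."""
    return {
        seq: round(multicast[seq] - unicast[seq], 3)
        for seq in sorted(multicast)
        if seq in unicast
    }


diffs = one_way_difference(unicast_rtt, multicast_rtt)
for seq, diff in diffs.items():
    print(f"seq={seq}: multicast RTT minus unicast RTT = {diff:.3f} ms")
```

Sequence number 1 is absent from the result because no multicast reply arrived for it.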
Is it useful?
We believe it complements multicast beacons.
Useful for “end users” or others who want to perform a “one-shot” test rather than continuously running a beacon.
Beacons don’t show how long it takes to establish the multicast tree; they show only the “steady state”.
We’ve seen cases where it takes much longer than expected.
Neither do they compare unicast and multicast.
Are there data other than RTT and hops that should be measured?
Hops are measured by always using a TTL/hop count of 64 when sending replies.
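Because replies always leave the server with a TTL/hop count of 64, the receiver can derive the hop count by subtraction. A minimal sketch follows, with a hypothetical function name. How the received TTL is read from the socket is operating-system specific and not shown here.

```python
INITIAL_TTL = 64  # replies are always sent with this TTL / hop count


def hops_from_ttl(received_ttl: int, initial_ttl: int = INITIAL_TTL) -> int:
    """Number of routers traversed, given the TTL seen by the receiver."""
    if not 0 <= received_ttl <= initial_ttl:
        raise ValueError(f"unexpected TTL {received_ttl}")
    return initial_ttl - received_ttl
```

For the earlier example, a reply arriving with TTL 51 corresponds to dist=13.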
History
Based on an idea by Pavan Namburi and Kamil Sarac (University of Texas) and Kevin C. Almeroth (UCSB).
http://www.utdallas.edu/~ksarac/research/publications/draft-sarac-mping-00.txt
http://www.utdallas.edu/~ksarac/research/publications/CIIT04-1.pdf
Their idea is to extend IGMP/MLD.
Not much interest in the IETF so far.
ssmping does some of the same, but doesn’t require network support.
It only uses UDP.
But it requires the server to run ssmpingd.
Summary
Tool and further documentation available from http://www.venaas.no/multicast/ssmping/
You can deploy your own server, or check that you can receive from the public servers listed at the above URL.
Supports both IPv4 and IPv6.
Currently it works on Linux, Solaris and some BSD systems.
Note that the ssmping client requires SSM support.
BSD systems need to be patched to support SSM.
No SSM support yet on Mac OS X.
A version with reduced functionality is available for Windows XP.
IPv4 only; no IPv6 SSM is available on XP.
Does not show number of hops.
There is also asmping
asmping is very similar to ssmping.
asmping is the ASM version of ssmping.
A tool for testing multicast connectivity.
Behaviour is a bit like normal ping.
A server must run ssmpingd (the latest version supports asmping).
A client pings a server by sending a unicast asmping query.
The server replies with both a unicast and a multicast asmping reply.
In this way a client can check that it can receive ASM from the server.
It can also check parameters such as delay and number of router hops.
How it works
[Diagram: message sequence between client and server, with time running downwards]
Client: the user runs asmping <G> <S>.
Client: joins G.
Client: sends a unicast query to S.
Server: receives the unicast asmping query.
Server: responds with an asmping unicast reply and a multicast reply to G.
Client: receives the replies and prints RTT and hops from the server.
Client: sends a new query every second.
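The "client joins G" step is an ordinary any-source group join. A minimal Python sketch, not taken from the asmping source, uses the standard IP_ADD_MEMBERSHIP socket option for IPv4. The function names and the port argument are hypothetical.

```python
import socket


def pack_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build a struct ip_mreq: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_asm(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join IPv4 group G on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, pack_mreq(group))
    return sock
```

No source address appears in the join, so the network has to discover the source, which is where the RP, PIM registers and MSDP come in.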
Example output
$ asmping -c 5 224.1.2.234 ssmping.net.switch.ch
ssmping joined (S,G) = (130.59.35.130,224.1.2.234)
pinging S from 158.38.63.22
  unicast from 130.59.35.130, seq=1 dist=11 time=66.203 ms
  unicast from 130.59.35.130, seq=2 dist=11 time=66.042 ms
multicast from 130.59.35.130, seq=2 dist=11 time=66.492 ms
  unicast from 130.59.35.130, seq=3 dist=11 time=66.515 ms
multicast from 130.59.35.130, seq=3 dist=11 time=66.520 ms
  unicast from 130.59.35.130, seq=4 dist=11 time=66.316 ms
multicast from 130.59.35.130, seq=4 dist=11 time=66.321 ms
  unicast from 130.59.35.130, seq=5 dist=11 time=66.407 ms
multicast from 130.59.35.130, seq=5 dist=11 time=66.956 ms
--- 130.59.35.130 ssmping statistics ---
5 packets transmitted, time 5000 ms
unicast: 5 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 66.042/66.296/66.515/0.326 ms
multicast: 4 packets received, 0% packet loss since first mc packet (seq 2) recvd
   rtt min/avg/max/std-dev = 66.321/66.572/66.956/0.296 ms
What may the output tell us?
The output on the previous slide tells us much the same as the ssmping example.
It appears we got all the multicast packets on the shortest path tree.
At least, all multicast packets have travelled the same number of hops as the unicast packets.
However, there are several cases where packets may not be forwarded on the optimal path, and asmping might reveal this.
We will now look at the possible ASM forwarding paths when using PIM-SM, which is the most common multicast routing protocol.
Forwarding via PIM registers
[Figure: network with source S, receiver R, and routers A, B, C, D, E and F; B is the RP]
Assume we have the above network with receiver R and source S.
R might be running the asmping client, and S a server.
Initially the multicast packets are encapsulated into PIM register messages and sent to the RP (B).
The PIM register is a unicast packet sent from the source’s DR to the RP.
From the RP the packets will be forwarded on the so-called shared tree to R.
The hop count seen by R will be 5: routers D, B, C, E and F.
Note that when doing inter-domain IPv4 multicast we also use MSDP.
We can think of MSDP as a way of doing PIM registers from the source to all RPs on the Internet.
MSDP may do encapsulation just like PIM registers.
Native forwarding on shared tree
[Figure: network with source S, receiver R, and routers A, B, C, D, E and F; B is the RP]
When the RP receives packets via PIM registers and knows that there are members joined to the group, the RP will join towards the source and we will get native forwarding.
Here the hop count seen by R will be 6: routers D, A, B, C, E and F.
Shortest Path Tree
[Figure: network with source S, receiver R, and routers A, B, C, D, E and F; B is the RP]
When F receives the first multicast packet, it will usually join the Shortest Path Tree to get optimal forwarding.
Note that behaviour might depend on configuration and vendor.
We then get the above optimal path.
Here the hop count seen by R will be 3: routers D, E and F.
So with this topology, the hop count displayed by asmping will tell us which path we are following and when we switch between them: 5, 6 and 3 hops for the respective paths.
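For this example topology, the mapping from hop count to forwarding path can be written down directly. The sketch below uses the hop counts from the three slides above. The function name and the hop sequence in the usage note are hypothetical, and the mapping is only valid for this particular network.

```python
# Hop counts seen by receiver R for each forwarding path in the example topology.
PATH_BY_HOPS = {
    5: "PIM register via the RP (routers D, B, C, E, F)",
    6: "native forwarding on the shared tree (routers D, A, B, C, E, F)",
    3: "shortest path tree (routers D, E, F)",
}


def path_transitions(hop_counts):
    """Return (reply_index, path) for the first reply and for every path change."""
    transitions = []
    previous = None
    for index, hops in enumerate(hop_counts, start=1):
        path = PATH_BY_HOPS.get(hops, f"unknown path ({hops} hops)")
        if path != previous:
            transitions.append((index, path))
            previous = path
    return transitions
```

For instance, a hypothetical sequence of multicast hop counts 5, 5, 6, 3, 3 would report the register path at reply 1, the shared tree at reply 3 and the shortest path tree at reply 4.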
More interesting example
sv@xiang /tmp $ asmping 224.3.4.234 ssmping.uninett.no
ssmping joined (S,G) = (158.38.63.22,224.3.4.234)
pinging S from 152.78.64.13
  unicast from 158.38.63.22, seq=1 dist=23 time=57.261 ms
  unicast from 158.38.63.22, seq=2 dist=23 time=56.032 ms
multicast from 158.38.63.22, seq=2 dist=7 time=207.876 ms
multicast from 158.38.63.22, seq=2 dist=7 time=208.567 ms (DUP!)
  unicast from 158.38.63.22, seq=3 dist=23 time=56.852 ms
multicast from 158.38.63.22, seq=3 dist=21 time=70.352 ms
multicast from 158.38.63.22, seq=4 dist=21 time=57.208 ms
  unicast from 158.38.63.22, seq=4 dist=23 time=57.910 ms
  unicast from 158.38.63.22, seq=5 dist=23 time=56.206 ms
multicast from 158.38.63.22, seq=5 dist=21 time=57.375 ms
What does output tell us?
23 unicast hops from the source.
For multicast we first have 7 hops; later it stays at 21.
We also get some info about loss and RTTs.
The number of hops is perhaps the most interesting.
In the stable situation with 21 hops there is probably native forwarding all the way.
Forwarding is probably on the shortest path tree from source to receiver.
In theory we might also detect the switch from RPT to SPT if the numbers of hops are different.
The number of hops on the RPT is then probably larger than the number of unicast hops.
But why only 7 hops initially?
Why did we get the packet twice, and why do both copies have a very large RTT?
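The two oddities discussed here, the duplicate and the change in hop count, can be picked out of the output mechanically. The sketch below works on the multicast lines copied from the example output. The function name is hypothetical.

```python
import re

# Multicast reply lines copied from the example output.
OUTPUT = """\
multicast from 158.38.63.22, seq=2 dist=7 time=207.876 ms
multicast from 158.38.63.22, seq=2 dist=7 time=208.567 ms (DUP!)
multicast from 158.38.63.22, seq=3 dist=21 time=70.352 ms
multicast from 158.38.63.22, seq=4 dist=21 time=57.208 ms
multicast from 158.38.63.22, seq=5 dist=21 time=57.375 ms
"""

LINE_RE = re.compile(r"multicast from \S+, seq=(\d+) dist=(\d+) time=([\d.]+) ms")


def summarize(output):
    """Return (duplicate seqs, [(seq, old_hops, new_hops), ...]) for multicast replies."""
    seen = set()
    duplicates = []
    hop_changes = []
    last_hops = None
    for match in LINE_RE.finditer(output):
        seq, hops = int(match.group(1)), int(match.group(2))
        if seq in seen:
            duplicates.append(seq)
        seen.add(seq)
        if last_hops is not None and hops != last_hops:
            hop_changes.append((seq, last_hops, hops))
        last_hops = hops
    return duplicates, hop_changes


duplicates, hop_changes = summarize(OUTPUT)
```

Here the duplicate is detected by a repeated sequence number rather than by the (DUP!) marker, so the same check works on any list of parsed replies.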
The mystery of the 7 hops
So why only 7 hops, and why the large RTT and the duplicate?
The reason is MSDP and PIM registers.
In this case we know for sure that packets must have been encapsulated in MSDP.
We are about 5 hops from the local RP.
There should not be any other tunnels, so there is no other way we can have only 7 hops (compared with 23 unicast hops).
Assuming there are no other listeners, the packet was first encapsulated in a PIM register going to the source’s local RP. Then it was encapsulated in MSDP all the way to our local RP.
That explains the number of hops, but why the large RTT and the duplicate?
Large RTT and duplicate
So why the large RTT and the duplicate?
The reason is again MSDP.
At least that’s the theory.
The register packets are in this case passed with MSDP to the RP, and probably cached there.
When our (*,G) joins reach our RP, we receive the packet from the cache, which by now is 200 ms old.
Next, our last-hop router switches to the SPT, and what we think happens is that the (S,G) join also reaches the RP (the RP is on the SPT path), and the RP forwards its copy again when this happens.
This also gives us a measure of how long it took before the switch from RPT to SPT took place.
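The timings involved can be read off the seq=2 lines of the example output. The arithmetic below is only a sketch under the theory described above: if that theory holds, the gap between the two multicast copies approximates the time between the cached packet being forwarded and the (S,G) join reaching the RP.

```python
# RTTs in milliseconds for seq=2, copied from the example output.
unicast_rtt = 56.032
first_multicast_rtt = 207.876
duplicate_rtt = 208.567

# Extra time the first multicast copy took compared with the unicast reply.
extra_delay = round(first_multicast_rtt - unicast_rtt, 3)

# Time between the first multicast copy and its duplicate.
duplicate_gap = round(duplicate_rtt - first_multicast_rtt, 3)

print(f"extra multicast delay: {extra_delay} ms")
print(f"gap before duplicate:  {duplicate_gap} ms")
```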
Troubleshooting 1/2
If it takes a long time from when we join until we receive data, or we stop receiving after a while, then something is wrong.
Initial delay might be due to join packets getting lost, or to join and MSDP propagation delays at routers.
We will now give further examples of how the tools have been used for troubleshooting.
Case 1: multicast working initially, then stopping after about 180 s.
This was an IGMP issue.
The host sends an IGMP report when it starts pinging.
If the host doesn’t receive the periodic IGMP queries from the router, it sends no further reports and the router state will time out.
Troubleshooting 2/2
Case 2: multicast working initially, then stopping after about 210 s.
We observed this for several different groups and sources.
We then found from the PIM-SM specification that this is when the initial join state times out if the periodic PIM joins don’t refresh it.
Knowing this, we tracked down the problem to an MTU mismatch.
When PIM routers send periodic joins they may aggregate many joins, resulting in large PIM packets.
For the initial join there is usually no aggregation, so the packet is much smaller than the MTUs.
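The two cases suggest a simple first check: compare the time at which multicast stops with known state timeouts. The sketch below encodes only the two observations from these slides. The 210 s figure matches the PIM-SM default join holdtime of 3.5 times the 60 s periodic join interval. The 10 s tolerance and the function name are assumptions, and a match is a hint about where to look, not a diagnosis.

```python
# Stall times observed in the two cases above, in seconds after the join.
KNOWN_STALLS = {
    180: "IGMP: host is not receiving the router's periodic queries",
    int(3.5 * 60): "PIM-SM: periodic joins are not refreshing the initial join state",
}


def likely_cause(stall_after_s: float, tolerance_s: float = 10.0):
    """Return a hint for a stall time close to a known timeout, else None."""
    for expected, cause in KNOWN_STALLS.items():
        if abs(stall_after_s - expected) <= tolerance_s:
            return cause
    return None
```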
Summary
The tools can be used simply to check connectivity, but also to get info like RTT, hops and join latency.
With asmping we might also be able to derive some more interesting information due to the complexities of PIM registers, MSDP and switching of forwarding paths.
asmping works on most operating systems.
Both tools support both IPv4 and IPv6.
See http://www.venaas.no/multicast/ssmping/ for more info.