NASA EOS Active Network Performance Testing
Using Web100
Andy Germain
Swales Aerospace
1 August 2002
301-902-4352
24 June 2002 Andy Germain
EOS Active Testing Overview
End-to-end user-level test
– Active testing, no visibility into network internals
Communities
– EOS Internal Network: 9 sites, 8 sources, 13 sinks
  "Production" flows, dedicated bandwidth
– EOS Science Users: about 50 sites, tested from EOS DAACs
  "QA" and science flows, often via Abilene
– CEOS: about 20 international sites
  Earth observation data sharing
Purposes
– Verify that networks as implemented meet SLA and/or requirements
– Assess whether networks can support intended applications
– Resolve user complaints: network problems, or elsewhere?
– Determine bottlenecks, seek routing alternatives
– Provide a basis for allocation of additional resources
Results at http://corn.eos.nasa.gov/networks
Test Process
Test script runs hourly to each site:
Traceroute (1 way)
– Number of hops, route stability (Hops Chart)
Pings
– 100 pings prior to thruput test and/or 100/300 during
– Round Trip Time (RTT Chart)
– Packet Loss (Packet Loss Chart)
TCP Throughput
– Iperf (Thruput Chart)
– Keeps send buffer full for 30 seconds
– Netstat packets retransmitted (if pings blocked)
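The ping step above reduces to two numbers per site: packet loss and round trip time. A minimal sketch of pulling them out of a ping summary follows, assuming the summary format printed by Linux iputils ping. The function name and the sample text are hypothetical, and the sample numbers are made up for illustration, not EOS measurements.

```python
import re

# Illustrative summary in the style of Linux iputils ping.
# These numbers are invented for the example, not measured results.
SAMPLE_PING_SUMMARY = """\
100 packets transmitted, 98 received, 2% packet loss, time 99123ms
rtt min/avg/max/mdev = 59.812/60.104/61.950/0.310 ms
"""

def parse_ping_summary(text):
    """Return (sent, received, loss_percent, avg_rtt_ms) from a ping summary."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", text)
    if counts is None:
        raise ValueError("no packet counts found")
    sent, received = int(counts.group(1)), int(counts.group(2))
    # Recompute loss from the counts rather than trusting the rounded percentage.
    loss_percent = 100.0 * (sent - received) / sent if sent else 0.0
    # The rtt line is absent when every ping is lost or blocked.
    rtt = re.search(r"= [\d.]+/([\d.]+)/", text)
    avg_rtt_ms = float(rtt.group(1)) if rtt else None
    return sent, received, loss_percent, avg_rtt_ms

sent, received, loss, avg_rtt = parse_ping_summary(SAMPLE_PING_SUMMARY)
print(f"sent={sent} received={received} loss={loss:.1f}% avg_rtt={avg_rtt} ms")
```

When pings are blocked the rtt line is missing and the sketch returns None for the average, which is the case where the slide falls back to netstat retransmit counts.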
EOS Performance Test Sites
(Map of test sites; labels as extracted)
Key: EOS DAAC; NASA Nodes; SCFs; QA; Other; Other Nodes
Sites: ORST; UCSB; Ariz; LANL; Wisc; Miami; SUNY-SB; BU; GSFC; LaRC; EDC; MSFC, NSSTC; NCAR; Mont; JPL; Toronto; Colo St.; Niagara; ASF; Chicago; SLAC; NSIDC; NMEX; CCRS; UVA; UMD; GPN; NGDC, NOAA; USF; RSS; Texas; UCSD; Wash; Mich; NOAA; Ohio; Penn State; NCDC; MIT
EOS International Test Sites
(Map of test sites; labels as extracted)
Key: EOSDIS; Mission Partner; CEOS; PI: QA/IST
Sites: GSFC; CCRS; JPL; NASDA (ADEOS, TRMM, Aura, Aqua); CSIRO; ESRIN; INPE (Aqua), IDN; CONAE; IRE-RAS; Israel; ASF; NSIDC; EDC; LaRC; MITI (Terra); CAO (SAGE III); RAL, OXFORD (Aura); Toronto (Terra); UCL (Terra); JRC; AIT, RFD, GISTDA; KNMI (Aura)
Uses of Web100
One of our sources at GSFC runs Web100
– King = "GSFC MAX"
– Connected to MAX by GigE
Typical use is in problem solving
– DTB, Triage
  Window size (easier to use than tcpdump)
  Vs. circuit limitations vs. packet loss
– Also ANLiperf
  Window size again
Plan: extract packet drops from web100, not pings or netstats
A recent case
Sending data from LaRC to JPL via a project-dedicated 20 mbps ATM VC
– Problem surfaced after firewall was installed
  Portus "proxy" firewall
RTT of 60 ms requires 150 KB windows
– To fill pipe with a single TCP stream
Iperf worked well
– A single stream typically got over 15 mbps
But ftp got < 8 mbps
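The 150 KB figure is the bandwidth-delay product of the circuit. A short check of that arithmetic, assuming KB means 1000 bytes and ignoring ATM and TCP/IP header overhead (the function name is illustrative):

```python
def required_window_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_seconds / 8

# 20 mbps circuit, 60 ms round trip time, as stated on the slide.
window = required_window_bytes(20e6, 0.060)
print(f"{window / 1000:.0f} KB")
```

Any single TCP stream whose window is smaller than this cannot fill the 20 mbps VC, however fast the end hosts are.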
A recent case (2)
The problem, of course, was window size
– Looked like it was the ftp application, since iperf performance showed that O/S was OK
– But which end?
Ran ftps from both nodes to web100 node
– Used DTB to capture window size
– Problem: small disk quota
  FTPs were quick
  FTP data session not established until ftp started
  So had to be quick to capture data with DTB
– DTB showed one site had 64 KB windows
But problem was in O/S (IRIX), not ftp
– tcp_recvspace and tcp_sendspace
– Iperf can exceed O/S defaults!
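An application can exceed the O/S defaults because socket buffer sizes can be requested per socket, while an application that requests nothing inherits the system-wide values. A minimal sketch of such a request using the standard sockets API follows; the function name and the 150 KB request are illustrative, and the kernel is free to round, double, or cap what it grants.

```python
import socket

def request_buffers(size_bytes):
    """Request send/receive buffers of size_bytes on a fresh TCP socket.

    Returns (default_snd, default_rcv, granted_snd, granted_rcv) so the
    O/S defaults can be compared with what the kernel actually granted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Values inherited from the O/S before the application asks for anything.
        default_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        default_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # Per-socket request; must happen before connect() to affect the
        # window scale negotiated in the SYN.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
        granted_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return default_snd, default_rcv, granted_snd, granted_rcv

defaults_and_granted = request_buffers(150_000)
print("default snd/rcv: %d/%d  granted snd/rcv: %d/%d" % defaults_and_granted)
```

This is why a good iperf result only shows that the O/S can support large windows when asked, not that its defaults are adequate for ftp.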
Case #2
Another case of limited thruput
– This time iperf was limited, from one source to several destinations
– Limit inversely proportional to RTT, which points to window size
– But source and dest clearly used large windows
Testing to Web100 box showed source was not using extended windows
TCPdump on source showed it was!
Problem turned out to be PIX firewall
– Nop'd out the WSCALE field!
Case #3
Iperf from GSFC to Tokyo XP
– Via MAX, Abilene, Seattle, TransPac
Thruput appears to ramp up linearly for about 5 minutes (when no loss)
– Then becomes window limited:
  1 MB window @ 188 ms RTT gives 42.5 mbps
– Repeatable (more or less)
– Low or no packet loss
Web100 Triage usually reports 100% path limited
– But can't show early part of session (?)
What causes this ramp-up ???
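The window-limited ceiling is one window per round trip. A quick check of the figure above, assuming 1 MB means 10^6 bytes (that reading gives about 42.5 mbps; a 1000 x 1024 byte window would give slightly more). The function name is illustrative.

```python
def window_limited_mbps(window_bytes, rtt_seconds):
    """Best-case throughput when at most one window is delivered per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6

# Case #3: 1 MB window at 188 ms RTT.
tokyo = window_limited_mbps(1_000_000, 0.188)
# Earlier LaRC to JPL case: 64 KB window at 60 ms RTT.
larc_jpl = window_limited_mbps(64 * 1024, 0.060)

print(f"1 MB  @ 188 ms: {tokyo:.2f} mbps")
print(f"64 KB @  60 ms: {larc_jpl:.2f} mbps")
```

The second figure is a ceiling of under 9 mbps, which is in line with the ftp transfers in the earlier case getting less than 8 mbps.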
Traceroute
traceroute to perf.jp.apan.net (203.181.248.44), 30 hops max, 38 byte packets
 1 enpl-rtr1-ge (198.10.49.57) 0.427 ms 0.325 ms 0.396 ms
 2 169.154.192.49 (169.154.192.49) 0.397 ms 0.375 ms 0.275 ms
 3 169.154.192.2 (169.154.192.2) 0.740 ms 1.266 ms 1.225 ms
 4 gsfc-wash.maxgigapop.net (206.196.177.13) 1.093 ms 1.169 ms 0.907 ms
 5 dcne-so3-1-0.maxgigapop.net (206.196.178.45) 1.434 ms 1.621 ms 1.410 ms
 6 abilene-wash-oc48.maxgigapop.net (206.196.177.2) 1.073 ms 1.439 ms 1.352 ms
 7 nycm-wash.abilene.ucaid.edu (198.32.8.46) 5.436 ms 5.570 ms 5.680 ms
 8 clev-nycm.abilene.ucaid.edu (198.32.8.29) 17.747 ms 17.954 ms 17.764 ms
 9 ipls-clev.abilene.ucaid.edu (198.32.8.25) 24.006 ms 24.380 ms 24.072 ms
10 kscy-ipls.abilene.ucaid.edu (198.32.8.5) 33.335 ms 33.263 ms 33.321 ms
11 dnvr-kscy.abilene.ucaid.edu (198.32.8.13) 43.781 ms 43.977 ms 43.756 ms
12 sttl-dnvr.abilene.ucaid.edu (198.32.8.49) 72.129 ms 72.286 ms 72.004 ms
13 TRANSPAC-PWAVE.pnw-gigapop.net (198.32.170.46) 72.204 ms 72.404 ms 72.220 ms
14 192.203.116.34 (192.203.116.34) 188.150 ms 188.216 ms 187.811 ms
15 perf.jp.apan.net (203.181.248.44) 187.786 ms 188.103 ms 188.040 ms
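A sketch of how a test script might reduce traceroute output to per-hop RTTs and locate the largest delay step. The four sample lines are hops 12 to 15 from the trace above; the function names, and the choice of the minimum of the three probes as each hop's RTT, are assumptions made for illustration.

```python
import re

# Hops 12-15 copied from the traceroute in this talk.
HOP_LINES = """\
12 sttl-dnvr.abilene.ucaid.edu (198.32.8.49) 72.129 ms 72.286 ms 72.004 ms
13 TRANSPAC-PWAVE.pnw-gigapop.net (198.32.170.46) 72.204 ms 72.404 ms 72.220 ms
14 192.203.116.34 (192.203.116.34) 188.150 ms 188.216 ms 187.811 ms
15 perf.jp.apan.net (203.181.248.44) 187.786 ms 188.103 ms 188.040 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)((?:\s+[\d.]+ ms)+)", re.M)

def parse_hops(text):
    """Return a list of (hop_number, host, min_rtt_ms), one entry per hop line."""
    hops = []
    for m in HOP_RE.finditer(text):
        rtts = [float(x) for x in re.findall(r"([\d.]+) ms", m.group(4))]
        hops.append((int(m.group(1)), m.group(2), min(rtts)))
    return hops

def largest_rtt_jump(hops):
    """Return (jump_ms, hop_number, host) for the biggest RTT increase between hops."""
    jumps = [(b[2] - a[2], b[0], b[1]) for a, b in zip(hops, hops[1:])]
    return max(jumps)

hops = parse_hops(HOP_LINES)
jump_ms, hop_number, host = largest_rtt_jump(hops)
print(f"{len(hops)} hops parsed; largest step {jump_ms:.1f} ms arriving at hop {hop_number} ({host})")
```

On this path the step lands between the TRANSPAC-PWAVE hop and hop 14, presumably the trans-Pacific link, and accounts for most of the 188 ms RTT.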
Typical ramp up
Client connecting to perf.jp.apan.net, TCP port 5002
TCP window size: 1000 KByte (WARNING: requested 500 KByte)
------------------------------------------------------------
[ 3] local 198.10.49.62 port 3623 connected with 203.181.248.44 port 5002
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.5 sec 808 KBytes 4.4 Mbits/sec
[ 3] 1.5- 2.1 sec 856 KBytes 12.1 Mbits/sec
[ 3] 2.1- 3.0 sec 1.4 MBytes 12.3 Mbits/sec
[ 3] 3.0- 4.2 sec 1.7 MBytes 12.6 Mbits/sec

[ 3] 13.0-14.2 sec 2.0 MBytes 14.8 Mbits/sec
[ 3] 14.2-15.1 sec 1.7 MBytes 15.2 Mbits/sec
[ 3] 15.1-16.0 sec 1.7 MBytes 15.2 Mbits/sec

[ 3] 104.0-105.1 sec 2.8 MBytes 21.1 Mbits/sec
[ 3] 105.1-106.0 sec 2.5 MBytes 22.4 Mbits/sec
[ 3] 106.0-107.0 sec 2.8 MBytes 23.8 Mbits/sec
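As a rough consistency check, the intervals shown can be fitted with a straight line and extrapolated to the 42.5 mbps window limit from Case #3. This is only a sketch: it uses the ten intervals printed on this slide, skips the first (start-up) interval by assumption, and says nothing about what causes the ramp. Function names are illustrative.

```python
import re

# The interval lines shown on this slide.
IPERF_LINES = """\
[ 3] 0.0- 1.5 sec 808 KBytes 4.4 Mbits/sec
[ 3] 1.5- 2.1 sec 856 KBytes 12.1 Mbits/sec
[ 3] 2.1- 3.0 sec 1.4 MBytes 12.3 Mbits/sec
[ 3] 3.0- 4.2 sec 1.7 MBytes 12.6 Mbits/sec
[ 3] 13.0-14.2 sec 2.0 MBytes 14.8 Mbits/sec
[ 3] 14.2-15.1 sec 1.7 MBytes 15.2 Mbits/sec
[ 3] 15.1-16.0 sec 1.7 MBytes 15.2 Mbits/sec
[ 3] 104.0-105.1 sec 2.8 MBytes 21.1 Mbits/sec
[ 3] 105.1-106.0 sec 2.5 MBytes 22.4 Mbits/sec
[ 3] 106.0-107.0 sec 2.8 MBytes 23.8 Mbits/sec
"""

LINE_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+) sec\s+[\d.]+ [KM]Bytes\s+([\d.]+) Mbits/sec"
)

def parse_intervals(text):
    """Return a list of (interval_midpoint_seconds, mbps) samples."""
    samples = []
    for m in LINE_RE.finditer(text):
        start, end, mbps = (float(g) for g in m.groups())
        samples.append(((start + end) / 2, mbps))
    return samples

def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

samples = parse_intervals(IPERF_LINES)
slope, intercept = fit_line(samples[1:])      # skip the first (start-up) interval
seconds_to_limit = (42.5 - intercept) / slope  # 42.5 mbps window limit from Case #3

print(f"ramp: {slope:.3f} Mbits/sec per second")
print(f"extrapolated time to 42.5 mbps: {seconds_to_limit / 60:.1f} minutes")
```

The fitted slope is on the order of 0.1 Mbits/sec per second, and the extrapolation reaches the window limit after roughly five minutes, in line with the ramp duration reported in Case #3.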