
Page 1: NASA EOS Active Network Performance Testing Using Web100 Andy Germain Swales Aerospace 1 August 2002 Andy.Germain@gsfc.nasa.gov 301-902-4352

NASA EOS Active Network Performance Testing Using Web100

Andy Germain

Swales Aerospace

1 August 2002

Andy.Germain@gsfc.nasa.gov

301-902-4352


24 June 2002 Andy Germain

EOS Active Testing Overview

End-to-end user level test
  – Active testing, no visibility into network internals

Communities
  – EOS Internal Network: 9 Sites, 8 Sources, 13 Sinks
      "Production" Flows, dedicated bandwidth
  – EOS Science Users: About 50 sites, tested from EOS DAACs
      "QA" and Science flows, often via Abilene
  – CEOS: About 20 International sites
      Earth Observation data sharing

Purposes
  – Verify that networks as implemented meet SLA and/or requirements
  – Assess whether networks can support intended applications
  – Resolve user complaints: Network problems -- or elsewhere??
  – Determine bottlenecks -- seek routing alternatives
  – Provide a basis for allocation of additional resources

Results at http://corn.eos.nasa.gov/networks


Test Process

Test script runs hourly to each site:

Traceroute (1 way)
  – Number of hops -- route stability (Hops Chart)

Pings
  – 100 pings prior to thruput test and/or 100/300 during
  – Round Trip Time (RTT Chart)
  – Packet Loss (Packet Loss Chart)

TCP Throughput
  – Iperf (Thruput Chart)
  – keeps send buffer full for 30 Seconds
  – Netstat packets retransmitted (if pings blocked)
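As a rough illustration of the ping step, the sketch below pulls packet loss and average RTT out of a ping summary. It is not the actual EOS test script: the function name and the sample text are hypothetical, and the summary format assumed here is the common "N% packet loss" line followed by a "min/avg/max" line.

```python
import re

def parse_ping_summary(output: str) -> dict:
    """Extract packet loss (%) and average RTT (ms) from ping summary text."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    # Matches both "rtt min/avg/max/mdev = a/b/c/d ms" and
    # "round-trip min/avg/max/stddev = a/b/c/d ms" style lines.
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)", output)
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "rtt_avg_ms": float(rtt.group(2)) if rtt else None,
    }

# Hypothetical summary for a 100-ping run (values invented for illustration).
SAMPLE = """\
100 packets transmitted, 98 received, 2% packet loss, time 99123ms
rtt min/avg/max/mdev = 187.786/188.040/188.216/0.150 ms
"""

print(parse_ping_summary(SAMPLE))  # {'loss_pct': 2.0, 'rtt_avg_ms': 188.04}
```

When pings are blocked, both fields come back as None, which is the case where the slide falls back to netstat retransmit counts.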


EOS Performance Test Sites

[Map of test sites. Key: EOS DAAC; NASA Nodes; SCFs; QA; Other; Other Nodes]

Sites labeled on the map: ORST; UCSB; Ariz; LANL; Wisc; Miami; SUNY-SB; BU; GSFC; LaRC; EDC; MSFC, NSSTC; NCAR; Mont; JPL; Toronto; Colo St.; Niagara; ASF; Chicago; SLAC; NSIDC; NMEX; CCRS; UVA; UMD; GPN; NGDC,NOAA; USF; RSS; Texas; UCSD; Wash; Mich; NOAA; Ohio; Penn State; NCDC; MIT


EOS International Test Sites

[Map of international test sites. Key: EOSDIS; Mission Partner; CEOS; PI: QA/IST]

Sites labeled on the map: GSFC; CCRS; JPL; NASDA (ADEOS, TRMM, Aura, Aqua); CSIRO; ESRIN; INPE (Aqua), IDN; CONAE; IRE-RAS; Israel; ASF; NSIDC; EDC; LaRC; MITI (Terra); CAO (SAGE III); RAL, OXFORD (Aura); Toronto (Terra); UCL (Terra); JRC; AIT, RFD, GISTDA; KNMI (Aura)


Uses of Web100

One of our sources at GSFC runs Web100
  – King = "GSFC MAX"
  – Connected to MAX by GigE

Typical use is in problem solving
  – DTB, Triage
      Window size (easier to use than tcpdump)
      Vs. circuit limitations vs. packet loss
  – Also ANLiperf
      Window size again

Plan: extract packet drops from web100, not pings or netstats


A recent case

Sending data from LaRC to JPL via a project dedicated 20 Mbps ATM VC.
  – Problem surfaced after firewall was installed
      Portus "proxy" firewall

RTT of 60 ms requires 150 KB windows
  – To fill pipe with a single TCP stream

Iperf worked well
  – a single stream typically got over 15 Mbps

But ftp got < 8 Mbps
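The 150 KB figure is the bandwidth-delay product of the circuit. A minimal sketch of that arithmetic, using the 20 Mbps rate and 60 ms RTT from this slide (the function name is illustrative, and ATM cell overhead is ignored):

```python
def required_window_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the pipe full."""
    return rate_bps * rtt_s / 8  # bits in flight, converted to bytes

window = required_window_bytes(20e6, 0.060)
print(f"{window / 1000:.0f} KB")  # 150 KB
```

Any single TCP stream whose window is smaller than this cannot fill the circuit, no matter how fast the end hosts are.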


A recent case (2)

The problem, of course, was window size
  – Looked like it was the ftp application, since iperf performance showed that O/S was OK
  – But which end?

Ran ftps from both nodes to web100 node
  – Used DTB to capture window size
  – Problem: small disk quota
      FTPs were quick
      FTP data session not established until ftp started
      So had to be quick to capture data with DTB
  – DTB showed one site had 64 KB windows

But problem was in O/S (IRIX), not ftp
  – tcp_recvspace and tcp_sendspace
  – Iperf can exceed O/S defaults!
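A 64 KB window goes a long way toward explaining the ftp result. The sketch below computes the ceiling a fixed window imposes on a single TCP stream, assuming 64 KB means 65,536 bytes and reusing the 60 ms RTT quoted for this path; both are assumptions for illustration, not measured values.

```python
def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6

ceiling = max_throughput_mbps(64 * 1024, 0.060)
print(round(ceiling, 2))  # 8.74
```

A ceiling of about 8.7 Mbps, before protocol overhead, is consistent with ftp getting under 8 Mbps while iperf, which requests its own larger buffers, exceeded 15 Mbps.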


Case #2

Another case of limited thruput
  – This time iperf was limited, from one source to several destinations
  – Limit was inversely proportional to RTT, which points to window size
  – But source and dest clearly used large windows

Testing to Web100 box showed source was not using extended windows

TCPdump on source showed it was!

Problem turned out to be PIX firewall
  – NOP'd out the WSCALE field!
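To make the failure mode concrete, here is a small sketch that walks the TCP options of a SYN and reports the window-scale shift, if any. The option kinds (0 = end of list, 1 = NOP, 3 = window scale with length 3) come from the TCP specifications; the two byte strings are hypothetical examples, not captures from this case.

```python
from typing import Optional

TCP_OPT_EOL = 0     # end of option list
TCP_OPT_NOP = 1     # single-byte no-operation padding
TCP_OPT_WSCALE = 3  # window scale option, total length 3

def window_scale_shift(options: bytes) -> Optional[int]:
    """Return the window-scale shift count found in TCP options, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option header
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if kind == TCP_OPT_WSCALE and length == 3:
            return options[i + 2]
        i += length
    return None

# Hypothetical SYN options as sent: MSS 1460, NOP, WSCALE shift 4.
sent = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 4])
# The same options after a middlebox overwrites the WSCALE option with NOPs.
received = bytes([2, 4, 0x05, 0xB4, 1, 1, 1, 1])

print(window_scale_shift(sent))      # 4
print(window_scale_shift(received))  # None
```

This matches the symptom on the slide: tcpdump at the source sees the option leave, the far end never sees it, and without window scaling neither side can use a window above 65,535 bytes.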


Case #3

Iperf from GSFC to Tokyo XP
  – Via MAX, Abilene, Seattle, TransPac

Thruput appears to ramp up linearly for about 5 minutes (when no loss)
  – Then becomes window limited:
      1 MB window @ 188 ms RTT = 42.5 Mbps
  – Repeatable (more or less)
  – Low or no packet loss

Web100 Triage usually reports 100% path limited
  – But can't show early part of session (?)

What causes this ramp-up ???


Traceroute

traceroute to perf.jp.apan.net (203.181.248.44), 30 hops max, 38 byte packets
 1  enpl-rtr1-ge (198.10.49.57)  0.427 ms  0.325 ms  0.396 ms
 2  169.154.192.49 (169.154.192.49)  0.397 ms  0.375 ms  0.275 ms
 3  169.154.192.2 (169.154.192.2)  0.740 ms  1.266 ms  1.225 ms
 4  gsfc-wash.maxgigapop.net (206.196.177.13)  1.093 ms  1.169 ms  0.907 ms
 5  dcne-so3-1-0.maxgigapop.net (206.196.178.45)  1.434 ms  1.621 ms  1.410 ms
 6  abilene-wash-oc48.maxgigapop.net (206.196.177.2)  1.073 ms  1.439 ms  1.352 ms
 7  nycm-wash.abilene.ucaid.edu (198.32.8.46)  5.436 ms  5.570 ms  5.680 ms
 8  clev-nycm.abilene.ucaid.edu (198.32.8.29)  17.747 ms  17.954 ms  17.764 ms
 9  ipls-clev.abilene.ucaid.edu (198.32.8.25)  24.006 ms  24.380 ms  24.072 ms
10  kscy-ipls.abilene.ucaid.edu (198.32.8.5)  33.335 ms  33.263 ms  33.321 ms
11  dnvr-kscy.abilene.ucaid.edu (198.32.8.13)  43.781 ms  43.977 ms  43.756 ms
12  sttl-dnvr.abilene.ucaid.edu (198.32.8.49)  72.129 ms  72.286 ms  72.004 ms
13  TRANSPAC-PWAVE.pnw-gigapop.net (198.32.170.46)  72.204 ms  72.404 ms  72.220 ms
14  192.203.116.34 (192.203.116.34)  188.150 ms  188.216 ms  187.811 ms
15  perf.jp.apan.net (203.181.248.44)  187.786 ms  188.103 ms  188.040 ms
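One way to read a traceroute like this is to look for the largest RTT increase between consecutive hops. The sketch below does that for the last four hops of the listing, taking the minimum of the three probes per hop; the function names are illustrative and the parsing assumes the usual "hop host (ip) t ms t ms t ms" layout.

```python
import re

# Last four hops copied from the traceroute on this slide.
HOPS = """\
12 sttl-dnvr.abilene.ucaid.edu (198.32.8.49) 72.129 ms 72.286 ms 72.004 ms
13 TRANSPAC-PWAVE.pnw-gigapop.net (198.32.170.46) 72.204 ms 72.404 ms 72.220 ms
14 192.203.116.34 (192.203.116.34) 188.150 ms 188.216 ms 187.811 ms
15 perf.jp.apan.net (203.181.248.44) 187.786 ms 188.103 ms 188.040 ms
"""

def min_rtts(text):
    """Return [(hop_number, host, min_rtt_ms)] for each traceroute hop line."""
    hops = []
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)\s+\(", line)
        times = [float(t) for t in re.findall(r"([\d.]+) ms", line)]
        if m and times:
            hops.append((int(m.group(1)), m.group(2), min(times)))
    return hops

def largest_jump(hops):
    """Return (from_hop, to_hop, delta_ms) for the biggest RTT increase."""
    prev, cur = max(zip(hops, hops[1:]), key=lambda p: p[1][2] - p[0][2])
    return prev[0], cur[0], round(cur[2] - prev[2], 3)

print(largest_jump(min_rtts(HOPS)))  # (13, 14, 115.607)
```

Almost all of the 188 ms RTT is added between hops 13 and 14, the trans-Pacific segment.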


Typical ramp up

Client connecting to perf.jp.apan.net, TCP port 5002
TCP window size: 1000 KByte (WARNING: requested 500 KByte)
------------------------------------------------------------
[  3] local 198.10.49.62 port 3623 connected with 203.181.248.44 port 5002
[ ID] Interval          Transfer     Bandwidth
[  3]   0.0-  1.5 sec    808 KBytes   4.4 Mbits/sec
[  3]   1.5-  2.1 sec    856 KBytes  12.1 Mbits/sec
[  3]   2.1-  3.0 sec    1.4 MBytes  12.3 Mbits/sec
[  3]   3.0-  4.2 sec    1.7 MBytes  12.6 Mbits/sec

[  3]  13.0- 14.2 sec    2.0 MBytes  14.8 Mbits/sec
[  3]  14.2- 15.1 sec    1.7 MBytes  15.2 Mbits/sec
[  3]  15.1- 16.0 sec    1.7 MBytes  15.2 Mbits/sec

[  3] 104.0-105.1 sec    2.8 MBytes  21.1 Mbits/sec
[  3] 105.1-106.0 sec    2.5 MBytes  22.4 Mbits/sec
[  3] 106.0-107.0 sec    2.8 MBytes  23.8 Mbits/sec
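As a rough check on the "about 5 minutes" ramp, the sketch below fits a straight line through two of the iperf samples above and extrapolates to the window limit of a 1 MB window at 188 ms RTT. It assumes the growth stays linear and that 1 MB means 1,000,000 bytes; it says nothing about what causes the ramp.

```python
# (interval end in seconds, Mbit/s) taken from the iperf output on this slide.
early = (4.2, 12.6)    # after the initial start-up intervals
late = (107.0, 23.8)   # last interval shown

# Observed growth rate in Mbit/s per second, assuming a linear ramp.
slope = (late[1] - early[1]) / (late[0] - early[0])

# Window-limited ceiling: one window per round trip (assumes 1 MB = 1e6 bytes).
limit_mbps = 1_000_000 * 8 / 0.188 / 1e6

# Time at which the extrapolated line reaches the ceiling.
eta_s = early[0] + (limit_mbps - early[1]) / slope

print(f"slope {slope:.3f} Mbit/s per second")  # slope 0.109 Mbit/s per second
print(f"limit {limit_mbps:.1f} Mbit/s")        # limit 42.6 Mbit/s
print(f"reached after about {eta_s:.0f} s")    # reached after about 279 s
```

Roughly 279 seconds is a little under 5 minutes, which agrees with the ramp duration reported for this case.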