National Center for Atmospheric Research
Pittsburgh Supercomputing Center
National Center for Supercomputing Applications
Web100
Wendy Huntoon - PSC
Jim Ferguson - NCSA
I2 Members Meeting
May 2002
Outline
• Project Overview
  – Motivation: What is the problem?
  – Web100 Collaboration
• Progress to Date
  – Standardization Process
  – Code Release
• Code Capabilities
• Overview of Users
• Web100 Resources
Motivations: What’s the Problem?
• High performance flows slower than line rate
  – Delays continue/increase even with higher bandwidth
• TCP tuning issues are non-trivial
• Poorly conceived stacks
• Router/switch buffer queues inadequate
• Slow start and AIMD algorithm
• Eliminate/dramatically reduce the “wizard gap”
• Need for kernel instrumentation set for TCP variables
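One way to see why AIMD hurts fast long-haul flows is to estimate how long congestion avoidance takes to regrow the window after a single loss. The sketch below assumes the textbook behaviour (halve the window on loss, then add about one segment per round trip). The path parameters (1 Gb/s, 70 ms RTT, 1460-byte segments) are illustrative assumptions, not figures from the slides.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to fill the path (bandwidth-delay product / MSS)."""
    return rate_bps * rtt_s / 8 / mss_bytes


def rtts_to_recover(full_window):
    """Count round trips for additive increase to climb from W/2 back to W."""
    cwnd = full_window / 2  # multiplicative decrease after one loss
    rtts = 0
    while cwnd < full_window:
        cwnd += 1  # additive increase: roughly one segment per RTT
        rtts += 1
    return rtts


def recovery_seconds(rate_bps, rtt_s, mss_bytes):
    """Closed-form estimate: (W / 2) round trips, each lasting one RTT."""
    return window_segments(rate_bps, rtt_s, mss_bytes) / 2 * rtt_s


if __name__ == "__main__":
    # Assumed example path: 1 Gb/s, 70 ms round trip, 1460-byte segments.
    print(round(window_segments(1e9, 0.07, 1460)), "segments in flight")
    print(round(recovery_seconds(1e9, 0.07, 1460)), "seconds to recover from one loss")
```

Under these assumptions a single loss costs minutes of reduced throughput, which is why loss and reordering instruments matter for diagnosing slow flows.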
The Wizard Gap
TCP over a long haul path
Year   Wizards    Non-wizards   Ratio
1988   1 Mb/s     300 kb/s      3:1
1991   10 Mb/s
1995   100 Mb/s
1999   1 Gb/s     3 Mb/s        300:1
Scientists/researchers not happy with this
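The ratios in the table are rounded. A minimal sketch of the arithmetic follows, using a small hypothetical helper (`parse_rate`, not part of Web100) that converts the slide's rate strings into bits per second. Decimal SI prefixes are assumed.

```python
# Illustrative helper only; the unit table is an assumption about how the
# slide's rate strings should be read (decimal SI prefixes).
UNITS = {"kb/s": 1e3, "Mb/s": 1e6, "Gb/s": 1e9}


def parse_rate(text):
    """Convert a string such as '300kb/s' into bits per second."""
    for unit, scale in UNITS.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * scale
    raise ValueError(f"unrecognized rate: {text!r}")


def gap_ratio(wizard, non_wizard):
    """Return how many times faster the wizard's flow is."""
    return parse_rate(wizard) / parse_rate(non_wizard)


if __name__ == "__main__":
    print(gap_ratio("1Mb/s", "300kb/s"))  # 1988 row
    print(gap_ratio("1Gb/s", "3Mb/s"))    # 1999 row
```

The exact values are about 3.3:1 for 1988 and about 333:1 for 1999, which the slide rounds to 3:1 and 300:1.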
TCP tuning is painful debugging
• All problems limit performance
  – IP routing, long round trip times
  – Improper MSS negotiations or path MTU discovery
  – IP packet reordering
  – Packet losses, congestion, lame hardware
  – TCP sender or receiver buffer space
  – Inefficient applications
• Any one problem can mask all the others and confound all but the best (and few) tuning gurus
• Need for better diagnostics and visibility into problems
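Buffer space is one of the easier limits to quantify. A TCP connection cannot have more data in flight than the smaller of its send and receive buffers, so throughput is capped at roughly buffer size divided by round-trip time. The sketch below works through that arithmetic. The 64 KiB buffer and 70 ms RTT are assumed example values, not measurements from the project.

```python
def max_throughput_bps(buffer_bytes, rtt_s):
    """Upper bound on throughput when one buffer's worth of data is sent per RTT."""
    return buffer_bytes * 8 / rtt_s


def required_buffer_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: buffer needed to keep the path full."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    # Assumed example: a 64 KiB socket buffer on a 70 ms path.
    print(max_throughput_bps(64 * 1024, 0.07) / 1e6, "Mb/s ceiling")
    # Buffer needed to fill an assumed 1 Gb/s path at the same RTT.
    print(required_buffer_bytes(1e9, 0.07) / 1e6, "MB required")
```

With these assumed numbers the ceiling is about 7.5 Mb/s no matter how fast the link is, while filling a 1 Gb/s path would need about 8.75 MB of buffer.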
Goal and Method
• Make it “easy” (transparent) for non-experts to achieve higher throughput performance
• Enhance TCP capabilities with better (finer grain) kernel instrumentation and automatic controls
• Real time triage capability determines sender, receiver, and/or network bottlenecks
Why Focus on TCP
• TCP has an ideal vantage point into throughput problem space
• TCP can identify bottleneck subsystem(s)
• TCP already measures the network (some)
• TCP can measure the application
• TCP can adjust itself (auto-tuning feedback)
Web100 Collaboration
• Funded by the NSF
  – Currently Year 2 of a 3 Year grant.
  – Cisco URP for initial seed funding.
• Collaborators
  – PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner)
  – NCAR (Peter O’Neil, Marla Meehl)
  – NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)
What is in the code
• Web100 software consists of:
  – TCP Kernel Instrument Set (TCP-KIS)
    • Instruments coded directly into the operating system kernel.
  – Derived Instrument Set (DIS)
    • Information that is collected based on KIS parameters.
  – Application Code
    • Tools, applications, etc. that use the information provided by the KIS and DIS.
Kernel Instrument Set
• Definition
  – Set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection.
• How it is implemented
  – Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure.
  – The Linux /proc interface is used to expose these instruments outside the kernel.
What is the TCP-KIS?
• TCP-KIS instruments group naturally into categories.
  – Currently roughly 19 categories.
• Already more than 125 instruments have been developed.
• For each instrument:
  – Precise (standards ready) definition.
  – Instrument code in the kernel
  – Implementation verification tests
    • Does the kernel implementation meet the definition?
• Prototype diagnostic tool(s) to demonstrate functionality and effectiveness.
TCP-KIS
• Basic instrumentation examples
• Connection ID: 5-tuple that uniquely identifies a connection.
• State: determines what protocol features or algorithms are enabled.
• Traffic out: statistics aggregate packets and traffic sent out on a connection.
Local Sender Triage
• Group of instruments associated with the local sender.
  – Determine what subsystems are throttling TCP data transmission.
  – Three parallel sets of instruments that measure:
    • Receiver Window
    • Network Congestion
    • Senders Availability
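A hedged sketch of how such triage could work in a tool: given the time a connection spent limited by each of the three causes, report the dominant one and each cause's share. The field names and the millisecond units are illustrative assumptions and are not taken from the Web100 instrument definitions.

```python
from dataclasses import dataclass


@dataclass
class SenderLimitTimes:
    """Hypothetical per-connection timers, one per parallel instrument set."""
    receiver_window_ms: int     # time throttled by the receiver's advertised window
    network_congestion_ms: int  # time throttled by the congestion window
    sender_ms: int              # time throttled by the sender itself (no data ready)


def triage(times):
    """Return (dominant cause, share of time per cause) for one connection."""
    raw = {
        "receiver window": times.receiver_window_ms,
        "network congestion": times.network_congestion_ms,
        "sender": times.sender_ms,
    }
    total = sum(raw.values())
    if total == 0:
        return None, {}  # nothing measured yet
    shares = {cause: ms / total for cause, ms in raw.items()}
    dominant = max(shares, key=shares.get)
    return dominant, shares


if __name__ == "__main__":
    # Made-up sample values for illustration.
    print(triage(SenderLimitTimes(8000, 1500, 500)))
```

In this made-up sample the receiver window dominates, which would point toward receive buffer tuning rather than the network path.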
Local Sender Groups
• Other groups of instruments associated with the Local Sender:
• Local Sender Congestion Model
• Local Sender Loss Model
• Local Sender Re-order Model
• Local Sender RTT
• Local Sender Segment Size
• Local Sender Bottlenecks
• Local Sender Tuning
Other Instruments
• Similar instruments for the Local Receiver.
• Observed Receiver instruments
  – Often inferred from the data stream.
  – E.g., Observed Receiver: the receiver's state is inferred from the ACK stream.
• Application Interface
  – Future instruments to collect statistics on how the application is using the network.
Userland Distribution
• Released asynchronously with kernel distribution
• Currently at Alpha 1.1
  – Version 1.2 release imminent
• Consists of
  – The web100 library
  – Command line utilities
  – GUI utilities
Web100 Library
• Web100 kernel exposes critical TCP variables/instruments through /proc
• Web100 library provides the necessary access functions to access these variables/instruments
• Functions
  – Read the value of a variable/instrument
  – Snapshot of a group (facilitates atomic reading of a group of variables)
  – Modify tunable variables (e.g., send buffer size)
  – Etc.
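The value of a group snapshot is that all counters in it refer to the same instant, so differences between two snapshots give consistent rates. The sketch below illustrates the idea only. It does not use the web100 library or read /proc, and the function names and the counter names (`DataBytesOut`, `PktsRetrans`) are placeholders chosen for illustration.

```python
import time


def take_snapshot(read_group, clock=time.monotonic):
    """Capture a timestamp plus a copy of every variable in a group at once.

    `read_group` is any callable returning a dict of counter values; in a
    real tool it would wrap the library's snapshot call (assumed, not shown).
    """
    return clock(), dict(read_group())


def counter_rates(before, after):
    """Per-second rate of change for every counter present in both snapshots."""
    t0, v0 = before
    t1, v1 = after
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    return {name: (v1[name] - v0[name]) / elapsed for name in v0 if name in v1}


if __name__ == "__main__":
    # Two hand-written snapshots, 2 seconds apart, standing in for real reads.
    first = (0.0, {"DataBytesOut": 1_000, "PktsRetrans": 2})
    second = (2.0, {"DataBytesOut": 2_501_000, "PktsRetrans": 12})
    print(counter_rates(first, second))
```

Reading the variables one at a time instead could mix values from different moments, which is the problem atomic group reads avoid.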
Utilities
• Command line utilities
  – Useful in batch scripts
  – Serve as demo codes for the usage of web100 library
• GUI utilities
  – Based on GTK+
  – Useful for troubleshooting network applications
  – Serve as examples for application developers
GUI Sample Screens – DTB
Connection Selector
Looking at a Variable
Timeline - Year 1
• Alpha code development
• Establish User Support
  – www.web100.org
• Initial User Community
  – Very limited to begin with.
  – Knowledgeable users, expected to provide technical input on the code.
  – Understand and develop applications.
Timeline - Year 2
• Began standardization process.
  – Develop MIB
  – Submit to IETF
• Develop public code
  – Fix bugs in alpha versions
  – Add instrumentation
  – Code release
• Continue code development
  – Identify and add new instruments
Code Releases - To date
• Initial Release
  – Alpha0.2, released May 23, 2001
  – Alpha0.3, released Sept. 19, 2001
• Alpha 1.0 - Separation of Kernel and Userland code
  – Kernel Patch:
    • Alpha 1.1 for Linux 2.4.16, released March 18, 2002
    • Alpha 1.0, released March 1, 2002
    • Alpha 1.0, released February 26, 2002
  – Userland:
    • Alpha 1.1, released February 28, 2002
    • Alpha 1.0, released February 26, 2002
Timeline - Year 3
• New pathprobe diagnostic tool (wip, unreleased).
• Add another 10-12 instruments.
• Review instruments and code with other wizards.
• Gain vendor support for ideas and code.
• Finalize IETF draft by December IETF meeting.
Milestones
• Over a year of ~30 alpha testers
  – Including: SLAC, ORNL, LBNL, and universities
  – www.net100.org
• Modified Linux kernel supports 2.4.16
• Separation between KIS and library functions
• draft-ietf-tsvwg-tcp-mib-extension-00.txt
• draft-ietf-ipngwg-rfc2012-update-01.txt
Web100 Collaborator Activity
• Rich Carlson, ANL
• Tom Dunnigan, ORNL
• Tom Hacker, U. of Michigan
• Doug Chang, SLAC
• Andreas Burkhardt & Matt Grob, Qualcomm
• Larry Dunn & Scott Dier, Cisco/U. of Minnesota
• Jason Lee, LBL
Collaborator Assistance
• Bugs!
  – Kernel
  – Utilities
  – Release
• Request new features
• Review and criticize documentation
  – Way too easy on us
Collaborator Activity
• Carlson/ANL working on a troubleshooting guide for LANs.
• Set up network of 13 identically equipped PIII connected via Cisco 5500 network switch, running Web100-enabled Linux.
• Introduces typical network faults (duplex mismatches, other config errors) and analyzes data for “signatures” of these faults.
• Modified Iperf 1.2 to collect variables and reverse flow.
Collaborator Activity
• Dunnigan/ORNL has found web100 helpful in seeing losses/retransmission and congestion avoidance parameters of individual TCP flows, and for tuning flows
• Has developed a Web100-enabled ttcp
• Has developed a daemon that logs web100 variables for designated paths when a flow closes
• Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification", so the daemon knows when a new flow/socket is opened
Collaborator Activity
• Hacker/U.Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network as well as across Abilene for the Visible Human and Atlas projects at U-M.
• Chang/SLAC is looking to fix a performance problem between Linux and Solaris machines.
Collaborator Activity
• Qualcomm is using Web100 to measure TCP performance over certain types of high speed wireless links under development. Web100 is partially integrated into some other tools - in the sense that output reports are published automatically in a format similar to other tools Qualcomm uses.
• Dunn/Cisco currently using Web100 for a class at U.Minnesota. Includes accounts on test machine at NCSA.
Collaborator Activity
• Lee/LBL has obtained accounts at SLAC and ANL for WAN testing, and has co-located one of their machines in Washington D.C. to do testing over SuperNet. Still in the process of testing all this out.
• Keith Jackson at LBL has written Python wrappers to the Web100 calls using SWIG.
Web100 Summary
• Main WWW site: www.web100.org
• Freely available software distribution
  – www.web100.org/download
  – hundreds of downloads
• Please be cognizant of impacts on others
• Please use, test, provide feedback, contribute code
• IETF standards process to benefit all
• Attention turning to working with OS vendors to incorporate standards enhancements into their stacks