March 12-14, 2007 • La Jolla, CA
cenic07.cenic.org
CENIC ’07: MAKING WAVES
Improving End2End Performance for the Columbia Supercomputer
Mark Foster
Computer Sciences Corp.
NASA Ames Research Center
March 2007
This work is supported by the NASA Advanced Supercomputing Division under Task Order A61812D (ITOP Contract DTTS59-99-D-00437/TO #A61812D) with Advanced Management Technology Incorporated (AMTI).
end2end for Columbia
• overview
• Columbia system
• LAN
• WAN
• e2e efforts
  – what we observed
  – constraints, and tools used
  – impact of efforts
• sample applications
  – earth, astro, science, aero, spaceflight
overview
• scientists use large-scale supercomputing resources to investigate problems; the work is time critical
  – limited computational cycles allocated
  – results needed to feed into other projects
• data sets from 100s of GB to multiple TB are now common and increasing
  – data transfer performance becomes a crucial bottleneck
• many scientists from many locations/hosts: no simple solution
• by bringing network engineers to the edge, we have been able to improve transfer rates from a few Mbps to a few Gbps for some applications
• system utilization is now often well above 90%
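To put these numbers in perspective, the sketch below estimates how long a 1 TB data set takes to move at a few sustained rates. It assumes decimal units (1 TB = 10^12 bytes), a constant rate and no protocol overhead; the rates are illustrative picks spanning "a few Mbps" to "a few Gbps". Under those assumptions 1 TB takes about 18.5 days at 5 Mbps and a little over 2 hours at 1 Gbps.

```python
# Rough transfer-time arithmetic for a bulk data set at a sustained rate.
# Assumptions: decimal units (1 TB = 10**12 bytes), constant rate, no protocol overhead.

TB = 10**12  # bytes


def transfer_time_seconds(size_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move size_bytes at a sustained rate of rate_bps bits per second."""
    return size_bytes * 8 / rate_bps


if __name__ == "__main__":
    # Illustrative rates spanning "a few Mbps" to "a few Gbps".
    for label, rate in [("5 Mbps", 5e6), ("50 Mbps", 50e6), ("1 Gbps", 1e9), ("3 Gbps", 3e9)]:
        hours = transfer_time_seconds(1 * TB, rate) / 3600
        print(f"1 TB at {label:>7}: {hours:8.1f} hours")
```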
shared challenges
• Chris Thomas @ UCLA:
  – 10 Mbps end hosts, OC3 campus/group access
  – asymmetric (campus) path
  – firewall performance considerations
  – end users: not network engineers
• Russ Hobby on Cyber Infrastructure:
  – it is a system (complex, but not as complex as the earth/ocean system John Delaney described)
  – a composition of components that must work together (efficiently)
  – not all problems are purely technical
the Columbia supercomputer
Systems: SGI Altix 3700, 3700-BX2 and 4700
Processors: 10,240 Intel Itanium 2 (single and dual core)
Global Shared Memory: 20 Terabytes
Front-End: SGI Altix 3700 (64 proc.)
Online Storage: 1.1 Petabytes RAID
Offline Storage: 6 Petabytes STK Silo
Internode Comm: InfiniBand
Hi-Speed Data Transfer: 10 Gigabit Ethernet
2048p subcluster: NUMAlink4 interconnect
• 8th fastest supercomputer in the world: 62 Tflops peak
• supporting a wide variety of projects
  – >160 projects; >900 accounts; ~150 simultaneous logins
  – users from across and outside NASA
  – 24x7 support
• effective architecture: easier application scaling for high fidelity, shorter time-to-solution, higher throughput
  – 20 x 512p/1TB shared memory nodes
  – some applications scaling to 2048p and above
• fast build: order to full ops in 120 days; dedicated Oct. 2004
  – unique partnership with industry (SGI, Intel, Voltaire)
Columbia configuration
[Figure: Columbia configuration diagram showing the Capability System (13 TF) and Capacity System (50 TF), twenty 512p compute nodes (A512p, T512p, M512p), front ends CFE1, CFE2, CFE3 and HWvis, InfiniBand and 10GigE fabrics, FC switches, and SATA (35 TB, 75 TB) and Fibre Channel (20 TB) RAID arrays]
Front Ends (3): 28p Altix 3700
Hyperwall Access (HWvis): 16p Altix 3700
Networking:
- 10GigE Switches
- 10GigE Cards (1 per 512p)
- InfiniBand Switch (288-port)
- InfiniBand Cards (6 per 512p)
- Altix 3700 BX2 2048 NUMAlink
Compute Node (Single System Image):
- Altix 3700 (A)
- Altix 3700 BX2 (T)
- Altix 4700 (M)
Storage Area Network:
- Brocade Switch 2x128-port
Online Storage (1,040 TB), 24 racks:
- SATA RAID
- FC RAID
Columbia access LAN
[Figure: Columbia access LAN diagram. Layers: Columbia nodes (C1 ... Cn); interconnect and aggregation; access and border peering; external peers. Components shown: 6500 switches, a PE 6500, NISN and NREN.]
wide area network - NREN
10G waves at the core, dark fiber to end sites
[Figure: NREN map. Legend: external peering points, distributed exchanges, NLR/regional net, 10 GigE, 1 GigE. NASA sites: ARC, JPL, GSFC, LRC, MSFC. Locations: Sunnyvale, CA; Los Angeles, CA; McLean, VA; Norfolk, VA; Huntsville, AL; Atlanta, GA. Networks and exchanges: ESnet, PacWave, NGIX-E, NGIX-W, AIX, CENIC, NLR, MATP/ELITE, SLR, MAX/DRAGON. A second GSFC entry is marked "(in progress)".]
• National and regional optical networks provide links over which 10 Gbps and 1 Gbps waves can be established.
• Distributed exchange points provide interconnect in metro and regional areas to other networks and research facilities.
end2end efforts
what we observed
  – long-running transfers with low data rates (Kbps, Mbps)
  – very slow bulk file moves reported
  – bad mix: untuned systems, small windows, small MTU, long RTT
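The "small windows, long RTT" combination can be made concrete: a single TCP stream cannot move more than one window of data per round trip, so its throughput is bounded by window × 8 / RTT. The sketch below uses the 85 ms RTT from the tuning example in this talk; the window sizes are illustrative assumptions (65,535 bytes is the largest window possible without the TCP window-scale option).

```python
# Upper bound on one TCP stream's throughput when the window (not the link) is the limit:
#   throughput <= window_bytes * 8 / rtt_seconds
# The RTT matches the 85 ms tuning example; the window sizes are illustrative assumptions.

RTT_S = 0.085


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Maximum bits per second a single stream can sustain with a fixed window."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    windows = {
        "16 KiB (example small buffer)": 16 * 1024,
        "65,535 B (no window scaling)": 65_535,
        "4 MiB": 4 * 1024**2,
        "100 MiB (tuned maximum)": 100 * 1024**2,
    }
    for label, size in windows.items():
        print(f"{label:30s} -> at most {window_limited_bps(size, RTT_S) / 1e6:9.1f} Mbps")
```

At 85 ms even the largest unscaled window caps a stream near 6 Mbps, which is consistent with the Kbps-to-Mbps rates observed on untuned hosts.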
end2end efforts
constraints, and tools used
  – facilities leveraging Web100 could be really helpful, but…
  – local policies/procedures sometimes preclude helpful changes
    • system admin practices: “standardization” for lowest common denominator, “fear” of impact (MTU, buffer size increase)
    • IT security policies, firewalls: “just say no”
    • WAN performance issues: “we don’t observe a problem on our LAN”
  – path characterization: ndt, npad, nuttcp, iperf, ping, traceroute
    • solve obvious issues early (duplex mismatch, MTU limitation, poor route)
  – flow monitoring: NetFlow, flow-tools (Fullmer), FlowViewer (Loiacono)
  – bulk transfer: bbftp (IN2P3/Gilles Farrache), bbscp (NASA), HPN-SSH (PSC/Rapier); starting to look at others: VFER & UDT
initial investigations
• scp at 2-5 Mbps (or worse): CPU limits and TCP limits
  – much better results are possible with HPN-SSH (enables TCP window scaling) and with RC4 encryption (much more efficient on some processors; use “openssl speed” to assess a CPU’s performance)
  – even with these improvements, 8-12 concurrent streams are still needed to get maximum performance with small MTUs
• nuttcp shows UDP performance near line rate in many cases, but TCP performance is still lacking
  – examine TCP behavior (ndt, npad, tcptrace)
  – TCP buffer sizes are the main culprit in large-RTT environments; a small amount of loss can be hard to detect and resolve
  – mid-span (or nearby) test platforms are helpful
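One standard way to see why small MTUs and a small amount of loss hurt on long paths, and why parallel streams help, is the Mathis et al. approximation for loss-limited TCP throughput: rate ≈ (MSS / RTT) × C / √p, with C ≈ 1.22. This model is not from the talk; it is swapped in here as an illustration. The sketch assumes an 85 ms RTT, a loss rate of 10^-5 and a 1 Gbps bottleneck, and it optimistically assumes each added stream sees the same loss rate.

```python
import math

# Mathis et al. approximation for loss-limited (Reno-style) TCP throughput:
#   rate ~= (MSS * 8 / RTT) * C / sqrt(p),  C = sqrt(3/2) ~= 1.22
# Idealized model; the path values below are assumptions chosen for illustration.

C = math.sqrt(1.5)


def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate steady-state throughput of one TCP stream, in bits per second."""
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss)


def aggregate_bps(streams: int, mss_bytes: int, rtt_s: float, loss: float, link_bps: float) -> float:
    """N parallel loss-limited streams, capped at the bottleneck rate.
    Assumes the loss rate each stream sees does not rise as streams are added."""
    return min(link_bps, streams * mathis_bps(mss_bytes, rtt_s, loss))


if __name__ == "__main__":
    RTT_S, LOSS, LINK_BPS = 0.085, 1e-5, 1e9  # assumed: 85 ms, 0.001% loss, 1 Gbps bottleneck
    for mss in (1460, 8960):  # MSS for 1500-byte and 9000-byte MTUs (IPv4, no TCP options)
        print(f"MSS {mss}:")
        for n in (1, 4, 8, 12):
            rate = aggregate_bps(n, mss, RTT_S, LOSS, LINK_BPS)
            print(f"  {n:2d} stream(s) -> ~{rate / 1e6:6.0f} Mbps")
```

With these assumed values one 1460-byte-MSS stream gets roughly 53 Mbps, so even 12 streams reach only about 640 Mbps, whereas with a 9000-byte MTU four streams already hit the 1 Gbps cap.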
recommended TCP adjustments
typical Linux example for an 85 ms RTT (sysctl settings, e.g. in /etc/sysctl.conf):

    # Set maximum TCP window sizes to 100 MB (104857600 bytes = 100 MiB)
    net.core.rmem_max = 104857600
    net.core.wmem_max = 104857600
    # Set minimum, default, and maximum TCP buffer limits
    net.ipv4.tcp_rmem = 4096 524288 104857600
    net.ipv4.tcp_wmem = 4096 524288 104857600
    # Set maximum network input buffer queue length
    net.core.netdev_max_backlog = 30000
    # Disable caching of TCP congestion state (2.6 only)
    # (works around a bug in some Linux stacks)
    net.ipv4.tcp_no_metrics_save = 1
    # Ignore ARP requests for a local IP received on the wrong interface
    net.ipv4.conf.all.arp_ignore = 1

ref: “Enabling High Performance Data Transfers”, www.psc.edu/networking/projects/tcptune
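The 100 MB limits in the sysctl example follow from the bandwidth-delay product (BDP): to keep a path full, a sender needs rate × RTT worth of data in flight. The sketch below computes the BDP for an assumed 10 Gbps target at 85 ms and prints buffer lines in the same format as the example, reusing its 4096 and 524288 minimum and default values. The target rate is an assumption; adjust it and the RTT per path.

```python
# Size TCP buffer limits from the bandwidth-delay product (BDP) and emit sysctl lines
# in the same form as the tuning example. Target rate and RTT are per-path assumptions.


def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep rate_bps busy for one round trip."""
    return int(rate_bps * rtt_s / 8)


def sysctl_lines(max_buf: int, min_buf: int = 4096, default_buf: int = 524288) -> list[str]:
    """Buffer settings laid out like the tuning example (min/default copied from it)."""
    return [
        f"net.core.rmem_max = {max_buf}",
        f"net.core.wmem_max = {max_buf}",
        f"net.ipv4.tcp_rmem = {min_buf} {default_buf} {max_buf}",
        f"net.ipv4.tcp_wmem = {min_buf} {default_buf} {max_buf}",
    ]


if __name__ == "__main__":
    need = bdp_bytes(10e9, 0.085)  # assumed target: 10 Gbps over an 85 ms RTT
    print(f"# BDP for 10 Gbps at 85 ms: {need} bytes")
    print("\n".join(sysctl_lines(need)))
```

For comparison, the example's 104857600-byte (100 MiB) limit corresponds to about 9.87 Gbps at 85 ms, just under a full 10 Gbps.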
recommended ssh changes
• at least OpenSSH 4.3p2, using OpenSSL 0.9.8b (May 2006)
• use faster ciphers than the default (RC4 benefits from processor-specific code)
• OpenSSH should be patched (HPN-SSH) to support large buffers and congestion window
  www.psc.edu/networking/projects/hpn-ssh
firewall impacts
prior to firewall upgrade: 199-644 Mbps
after firewall upgrade: 792-980 Mbps
end host aggregate improvement
host performance using multiple streams, with some tuning: 8 streams, 257 Mbps
after more tuning and the firewall upgrade: 4 streams, 4.7 Gbps
example application increase
[Chart: throughput by file transfer method, axis 0 to 350 (units not shown)]
Standard File Transfer Application (SCP): 5.7
Improved File Transfer Application (Multi-Stream bbFTP): 169
Improved File Transfer Application & Jumbo Frames (Multi-Stream bbFTP + Jumbo Frames): 308
• NASA Goddard’s 3-D Cloud-Resolving Model: 54x throughput performance gains
• collaboration between the NREN, GSFC Scientific & Engineering Network (SEN), and High-End Computing Network (HECN) teams
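The 54x figure can be checked against the chart values, reading them as 5.7 for SCP, 169 for multi-stream bbFTP and 308 for multi-stream bbFTP with jumbo frames. The transcript does not keep the chart's units, so the sketch computes only ratios.

```python
# Speedup of each transfer method relative to the SCP baseline, using the chart values.

results = {
    "SCP": 5.7,
    "multi-stream bbFTP": 169.0,
    "multi-stream bbFTP + jumbo frames": 308.0,
}


def speedups(values: dict[str, float], baseline_key: str) -> dict[str, float]:
    """Ratio of every entry to the baseline entry."""
    base = values[baseline_key]
    return {name: value / base for name, value in values.items()}


if __name__ == "__main__":
    for name, ratio in speedups(results, "SCP").items():
        print(f"{name:35s} x{ratio:5.1f}")
```

The last ratio, about 54, matches the slide's 54x; multi-stream bbFTP alone accounts for roughly 30x, and jumbo frames nearly double that again.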
factors driving traffic growth
• increased Columbia usage
• storage and file system upgrades on Columbia
• aggressive campaign to work with users to improve performance to Columbia through the use of better file transfer tools, end system tuning and user education
• network bandwidth increases across the NREN wide area network, local area networks and firewalls
impact of e2e efforts
• trends show aggregate traffic of 5 TB/mo increased to more than 20 TB/mo
• three of the previous five months exceeded 1 TB/day
efforts do not just result in increased bandwidth; improved network performance results in improved capability, increased fidelity, more efficient computing, and better productivity
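Monthly volumes are easier to compare with link speeds when expressed as average rates. The sketch converts the figures on this slide into the sustained rate they imply, assuming decimal units and a 30-day month; averages say nothing about peak rates during individual transfers.

```python
# Convert a data volume per period into the average sustained rate it implies.
# Assumptions: 1 TB = 10**12 bytes, a 30-day month, traffic spread evenly.

SECONDS_PER_DAY = 86_400


def average_mbps(terabytes: float, days: float) -> float:
    """Average rate in Mbps needed to move `terabytes` evenly over `days` days."""
    return terabytes * 1e12 * 8 / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    for label, tb, days in [("5 TB/month", 5, 30), ("20 TB/month", 20, 30), ("1 TB/day", 1, 1)]:
        print(f"{label:12s} ~ {average_mbps(tb, days):6.1f} Mbps average")
```

Under these assumptions 1 TB/day works out to roughly 93 Mbps sustained, or about 30 TB over a 30-day month.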
ECCO at JPL/MIT - visualization
High Temporal Resolution Visualization Provides New Insights for Ocean Researchers
• NAS Visualization group completed a 110-hour computational run of the Massachusetts Institute of Technology’s general circulation model, MITgcm, simulating an entire year of ocean dynamics
• These visualizations are allowing researchers at MIT and NASA’s Jet Propulsion Laboratory to investigate model dynamics with unprecedented temporal resolution
salt concentration (parts per thousand) at 15 m depth
dark matter at UCSC
• Madau, Diemand, and Kuhlen at UC Santa Cruz simulate the evolution of a dark matter halo
• Projected dark matter density-squared maps of the simulated Milky Way-size halo 13.3 billion years ago, 460 million years after the Big Bang.
combustion science at LBL
• Marc Day and collaborators at LBL perform high-fidelity numerical simulations on Columbia
• results aid in the development of clean, fuel-efficient combustion systems for transportation and stationary power generation
partial period of combustion simulation, colored by the local fuel consumption rate
National Combustion Code
• Combustor hardware is complex, and the turbulent reacting flow process is complicated (and still not well understood).
• Massively parallel processing via the Message Passing Interface speeds up the calculations to acceptable levels: approximately a wall-clock week.
Partially resolved Navier-Stokes simulations of the GE LM6000 combustor
Combustion instabilities in a LOX-methane rocket engine
Ares I stage separation
• The Ares I rocket launch system concept is similar to the Saturn rocket of the Apollo Program.
• A simulation of a high-altitude stage separation computes the flow of air around the vehicle and the resultant aerodynamic forces.
early stages of separation
flowfield around the vehicle post-separation
Orion - crew vehicle
re-entry wake turbulence
Orion - crew vehicle
re-entry wake turbulence - IBWAN at SC06 (Henze)
Obsidian Longbow IB over NREN via 10 GbE NLR FrameNet between NASA Ames and Tampa
NLCS awards - March 2007
4.75 million hours of supercomputing time under NASA’s National Leadership Computing System (NLCS) initiative: computationally intensive research projects of national interest
• Transition in High-Speed Boundary Layers: Numerical Investigations Using DNS and LES (led by Hermann Fasel, University of Arizona, Tucson): high-fidelity simulations to understand how turbulence starts in high-speed airflow over air vehicles
• Large Scale URANS/DES Ship Hydrodynamics Computations with CFDShip-Iowa (led by Fredrick Stern, University of Iowa): accelerate code development for viscous ship hydrodynamics simulation
• Flame Dynamics and Emission Chemistry in High-Pressure Industrial Burners (led by Marcus Day, Lawrence Berkeley National Laboratory): simulate natural gas combustion in power-generation turbines to quantify the mechanisms that control the formation of pollutants
• Multi-Scale Modeling and Computation of Convective Geophysical Turbulence (led by Keith Julien, University of Colorado, Boulder): new algorithms in large-scale simulations to study the role of global ocean thermohaline circulation (THC) in modulating the world’s climate
some tuning references
• NREN TCP Performance Tuning Guide: www.nren.nasa.gov/tcp_tuning.html (also has links for bbftp, bbscp)
• Other useful guides:
  – WAN Tuning and Troubleshooting: www.internet2.edu/~shalunov/writing/tcp-perf.html
  – Enabling High Performance Data Transfers: www.psc.edu/networking/projects/tcptune
  – TCP Tuning Guide: www-didc.lbl.gov/TCP-tuning
thank you
Mark Foster
Computer Sciences Corp.
[email protected]