
Page 1: Nagi Rao (Nageswara S.V. Rao) Computer Science and Mathematics Division

http://www.csm.ornl.gov/ghpn/wk2003.html

Report of the “DOE Workshop on Ultra High-Speed Transport Protocols and Dynamic Provisioning for Large-Scale Science Applications”, April 10-11, 2003, Argonne, IL

Panel on Future Directions in Networking, International Conference on Network Protocols, November 5-7, 2003, Atlanta, GA

Nagi Rao

(Nageswara S.V. Rao)

Computer Science and Mathematics Division

Oak Ridge National Laboratory

[email protected]

Transport Protocols and Dynamic Provisioning for Large-Scale Science Applications

Page 2

We engineered the Internet, and it works fine for e-mail and web; but to do “world-class” scientific research needed in DOE scientific applications, we need to develop a science of networking that delivers usable performance to the applications

Allyn Romanow, Cisco Systems

Page 3: Outline

• Introduction
• Organization Details
• Provisioning Group Notes
• Transport Group Notes
• Dynamics and Stabilization of Network Transport
• Conclusions

Opinions expressed in this presentation belong to the author and are not necessarily the official positions of the US Department of Energy, Oak Ridge National Laboratory or UT-Battelle LLC.

Page 4: Networking for DOE large-science applications

The next generation of DOE scientific breakthroughs critically depends on large, multi-disciplinary, geographically dispersed research teams:
– high energy physics, climate simulation, fusion energy, genomics, astrophysics, spallation neutron source, and others

These applications are inherently distributed in:
– Data: archival or on-line
– Computations: supercomputers or clusters
– Research teams: experts in different domains
– Experimental facilities: one-of-a-kind user facilities

They all need to be seamlessly networked.

Page 5: DOE Large-Scale Science Applications and Numerous Other Science Applications - need extreme and acute networking

A detailed account of the needs was identified and discussed at the DOE High-Performance Network Planning Workshop, August 13-15, 2002,
http://DOECollaboratory.pnl.gov/meetings/hpnpw

| Science Areas | Current End2End Throughput | 5 years End2End Throughput | 5-10 Years End2End Throughput | General Remarks |
|---|---|---|---|---|
| High Energy Physics | 0.5 Gbps E2E | 100 Gbps E2E | 1.0 Tbps | high throughput |
| Climate Data & Computations | 0.5 Gbps E2E | 160-200 Gbps | n Tbps | high throughput |
| SNS NanoScience | does not exist | 1.0 Gbps steady state | Tbps & control channels | remote control & high throughput |
| Fusion Energy | 500MB/min (burst) | 500MB/20sec (burst) | n Tbps | time critical transport |
| Astrophysics | 1TB/week | N*N multicast | 1TB+ & stable streams | computational steering & collaborations |
| Genomics Data & Computations | 1TB/day | 100s users | Tbps & control channels | high throughput & steering |

Need more than bulk bandwidth
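The throughput figures above mix bit rates (Gbps) with volumes per interval (MB/min, TB/week, TB/day). As a rough aid to comparison, the sketch below converts the volume-based figures into average sustained rates. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the slide does not state, and the dictionary labels are only shorthand for the entries quoted above.

```python
# Convert the volume-per-interval figures quoted above into average bit rates.
# Assumption (not stated on the slide): decimal units, 1 MB = 10**6 bytes.

MB = 10**6
TB = 10**12

def sustained_gbps(num_bytes, seconds):
    """Average rate in Gbps needed to move num_bytes within the given time."""
    return num_bytes * 8 / seconds / 1e9

figures = {
    "Fusion, current (500MB/min)": (500 * MB, 60),
    "Fusion, 5 years (500MB/20sec)": (500 * MB, 20),
    "Astrophysics, current (1TB/week)": (1 * TB, 7 * 24 * 3600),
    "Genomics, current (1TB/day)": (1 * TB, 24 * 3600),
}

rates = {name: sustained_gbps(b, s) for name, (b, s) in figures.items()}
for name, gbps in rates.items():
    print(f"{name}: {gbps:.4f} Gbps")
```

Under these assumptions every averaged figure comes out below 1 Gbps, which is one way to read the slide's remark that more than bulk bandwidth is needed: the burst deadlines and control channels, rather than the averages, appear to drive the requirements.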

Page 6: Astrophysics Computations

• Science Objective: Understand supernova evolutions
  – Teams of field experts across the country collaborate on computations
    • Experts in hydrodynamics, fusion energy, high energy
    • Universities and national labs
  – Massive computational code
    • Terabytes are currently generated within days
    • Archived at nearby HPSS
    • Visualized locally on clusters – only archival data
• Desired capability
  – Archive and supply massive amounts of data
  – Collaboratively visualize archival or on-line data
  – Monitor, visualize and steer computations into regions of interest

[Figure labels: visualization channel, control channel]

Page 7: Genomics Networking Needs

• Data Movement Operations
  – Experimental and computational data
    • Stored across the country
    • Terabytes of data per day
  – Between users, archives and computers
• Molecular Dynamics Computations
  – Supercomputers or clusters
  – Monitor, visualize, and steer computations

[Figure labels: data channel, visualization channel, steering channel]

Page 8: Neutron Facilities – SNS, HFIR

• Experimental Setups and Monitoring of Expensive Facilities (SNS – billion$)
  – Set up parameters and start experiments
  – Adjust parameters as needed; stop when necessary
• Data Movements
  – Archive and access massive amounts of experimental data

Page 9: Current Network Capabilities: Transport and Provisioning

DOE faces unique or acute challenges: a small user base with extreme needs
– large data transfers at the application level – rates much higher than current backbones
– highly controlled end-to-end data streams – unprecedented agility and stability
– capabilities must be available to science users – not just to network experts with special networks

Commercial and other networks will not adequately meet these acute requirements
– Not large enough user base
– Very limited business case

New advances in Transport and Provisioning hold enormous promise, if suitably fostered and integrated
– Flexible and powerful routers/switches, ultra high bandwidth links, and new transport protocols can get us partway there
– But several critical technologies and areas of expertise are still needed:
  • end-to-end dynamic provisioning of paths with guaranteed performance
  • transport methods that optimally deliver that performance to user applications

Page 10: Workshop Goal

• Address the research, design, development, testing and deployment aspects of transport protocols and network provisioning as well as the application-level capability needed to build operational ultra-speed networks to support emerging DOE distributed large-scale science applications over the next 10 years.

Page 11: Workshop Focus

• Ultra High-Speed Networks to support DOE Large-Science Applications
  – not a general network research workshop addressing Internet problems
• Formulate DOE roadmap in the specific areas:
  – Transport and Provisioning
    • two very critical subareas of network research needed to meet DOE large-science requirements
    • Not in other areas such as security, wireless networks
• “Working” workshop
  – Discussions on very specific problems, methods, potential solutions in transport and provisioning areas
  – Very short introductory presentations
  – Not just primarily informational or educational

Page 12: Participants

Balanced participation from universities, industry and national laboratories to represent the needs, technologies, research and business aspects

Total: 32
National Laboratories: 10
  ORNL: 3; ANL: 2; LANL: 2; PNNL: 1; SLAC: 1; ESnet: 1
Universities: 11
  UMass; GaTech (2), Uva, UIC, Indiana U, U Va, U Tennessee, UC Davis, PSC, CalTech
Industry: 8
  Celion, Ciena, Cisco, Juniper, Level3, Lightsand, MCNC, Qwest
DOE Headquarters: 3
Working Groups:
  Provisioning: 14
  Transport: 15

Page 13: Summary: Provisioning for DOE Large-Science Networks

Need focused efforts to:
– Develop a scalable architecture for a fast-provisioning circuit-switched network
– Build an application-centric, circuit-switched, cross-country test-bed
– Coordinate and gracefully integrate with
  • Applications and Middleware
  • Transport and OS Developers
  • Legacy and evolutionary networks

Page 14: Provisioning Recommendations

• Recommendation 1: Agile Optical Network infrastructure:
  – A scalable architecture for fast provisioning of circuit switched dedicated channels specified on-demand by the applications.
• Recommendation 2: Hybrid Switched Networks:
  – High capacity (Tbps) switchable channels for Petabyte data transport, a combination of requirements to accommodate burst, real-time streams as well as lower priority traffic, multi-point or shared use, for large file and data transfers, and for low latency and low jitter.
• Recommendation 3: Dynamically Reconfigured Channels:
  – Provisioning of dynamically specified end-to-end quality paths for computational steering and time-constrained experimental data analysis.
• Recommendation 4: Multi-Resolution Quality of Service:
  – Channels with various types of Quality of Service (QoS) parameters must be supported at various resolutions using GMPLS, service provisioning and channel sharing technologies.
• Recommendation 5: Experimental Test-Bed
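To make the idea of dedicated channels "specified on-demand by the applications" concrete, here is a minimal sketch of admission control for advance reservations on a single link. It is an illustration only, not a scheme proposed by the workshop: the `LinkScheduler` class, its methods, and the capacity and time values are all hypothetical, and a real provisioning system would also need path computation and signaling across many links.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    start: float   # seconds
    end: float     # seconds, exclusive
    gbps: float

class LinkScheduler:
    """Hypothetical single-link scheduler: accept a request only if it fits."""

    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps
        self.accepted = []

    def peak_load(self, start, end):
        """Maximum bandwidth already reserved at any instant in [start, end)."""
        # Reserved load can only rise at a reservation's start time, so it is
        # enough to check the window start plus every start inside the window.
        points = [start] + [r.start for r in self.accepted if start < r.start < end]
        return max(
            sum(r.gbps for r in self.accepted if r.start <= t < r.end)
            for t in points
        )

    def request(self, start, end, gbps):
        """Return a Reservation if the channel fits, otherwise None."""
        if end <= start or gbps <= 0:
            raise ValueError("need end > start and gbps > 0")
        if self.peak_load(start, end) + gbps > self.capacity_gbps:
            return None
        reservation = Reservation(start, end, gbps)
        self.accepted.append(reservation)
        return reservation

link = LinkScheduler(capacity_gbps=10)
first = link.request(0, 100, 6)      # fits on an empty link
clash = link.request(50, 150, 6)     # would need 12 Gbps during [50, 100)
later = link.request(100, 200, 6)    # starts as the first one ends
small = link.request(50, 150, 4)     # fills the remaining 4 Gbps exactly
```

The sketch deliberately rejects rather than reshapes a request that does not fit; handling contention more gracefully is one of the open problems the workshop lists among the provisioning barriers.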

Page 15: Provisioning: Barriers

• Limited deployment of ultra-long haul DWDM links;
• Lack of support for striped/parallel transport both at the core and application levels;
• Lack of high-speed circuit-switched infrastructure with network control-plane design and synchronous NICs with high-speed and on-demand reconfigurability; and
• Lack of well-developed methods and application interfaces for scheduling/reserving, allocation, and initiation.
• DOE applications do not follow the commercial scaling model of a large number of users each with smaller bandwidth requirements;
• Lack of security paradigms for dedicated paths and the infrastructure to manage them;
• Lack of a robust multi-cast solution efficiently supported on dedicated channels;
• High cost of equipment, including the costs of links, routers/switches and other equipment as well as deployment and maintenance;
• Lack of field-hardening of optical components such as memory/buffer, high-speed switches, Reamplification, Reshaping and Retiming (RRR) equipment, and lambda conversion gear;
• Lack of effective contention resolution methods for the allocation of channel pools; and
• Limited interoperability with other data networks, particularly legacy networks.

Page 16: Summary: Transport for DOE Large-Science Networks

Current transport methods are massively inadequate
– Unattained Throughput: wizards can achieve several Gbps for certain durations
  • But throughputs are needed at application user level
– Cannot provide sustained and stable streams for control operations
– TCP has complicated dynamics – hard to use in finer control operations

Need focused efforts in:
– Optimal transport methods to exploit provisioning to meet requirements
  • Transport: Tbps throughputs
  • Support stable and agile control channels
– Comprehensive theory of transport: synergy and extensions of traditional disciplines
  • Stochastic control, non-linear control, statistics, optimization, protocol engineering
– Strict algorithmic design
  • Modular, autonomic, adaptive, composable

Integration and interactions:
– DOE deployment, wider adoption, legacy integration
– Experiment and test-bed
– Instrumentation and Diagnostics tools
  • web100/Net100
  • Statistical Inference and optimized data collection
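The remark that TCP cannot provide sustained and stable streams can be illustrated with the textbook additive-increase, multiplicative-decrease (AIMD) rule that underlies standard TCP congestion avoidance. The sketch below is a deliberately idealized model, not a measurement and not any specific TCP implementation: it assumes a single flow, a fixed path capacity expressed in packets per round trip, and a loss whenever the window exceeds that capacity. The function name and all parameter values are illustrative.

```python
# Idealized AIMD window model. Assumptions: one flow, a fixed capacity in
# packets per RTT, and a loss whenever the window exceeds that capacity.

def aimd_trace(capacity_pkts, rtts, increase=1.0, decrease=0.5, start=1.0):
    """Return the packets delivered in each RTT under the idealized AIMD rule."""
    window = start
    delivered = []
    for _ in range(rtts):
        delivered.append(min(window, capacity_pkts))
        if window > capacity_pkts:
            window *= decrease      # loss detected: multiplicative decrease
        else:
            window += increase      # no loss: additive increase
    return delivered

capacity = 100.0
trace = aimd_trace(capacity, rtts=2000)
steady = trace[200:]                # skip the initial ramp-up
utilization = sum(steady) / (len(steady) * capacity)
print(f"steady-state utilization: {utilization:.2f}")
print(f"per-RTT delivery ranges from {min(steady):.1f} to {max(steady):.1f} packets")
```

Even in this noise-free model the delivered rate keeps sweeping between roughly half and all of the capacity, which suggests why a control channel that needs a steady rate is hard to build directly on such dynamics.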

Page 17: Transport Group Notes

Scientists (must) view the network as they view a computer: as a resource
But they are becoming (not always willingly) network experts – “wizard gap” at all levels
  • but “gray matter” tax must be low

Advance the state of network protocols to make them plug-and-play for the application users
  - need significant effort

Time-to-solution in networking area is currently too high – TCP tuning for Gbps throughputs took years

Peak is not enough – need sustained throughputs at application level
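One reason Gbps-level TCP throughput needs expert tuning is the bandwidth-delay product: to keep a path full, the sender must have rate × RTT bytes in flight, and a window smaller than that caps throughput at window / RTT. The sketch below works through this arithmetic. The 100 ms round-trip time is an assumed, illustrative value that does not come from the slides; 65535 bytes is the largest TCP window available without the window-scale option.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8

def window_limited_bps(window_bytes, rtt_s):
    """Throughput ceiling when at most window_bytes can be unacknowledged."""
    return window_bytes * 8 / rtt_s

RTT_S = 0.1                 # assumed long-haul round-trip time (illustrative)
UNSCALED_WINDOW = 65535     # largest TCP window without the window-scale option

for gbps in (1, 10, 100):
    need_mb = bdp_bytes(gbps * 1e9, RTT_S) / 1e6
    print(f"{gbps:>3} Gbps at {RTT_S * 1000:.0f} ms RTT needs {need_mb:.1f} MB in flight")

ceiling_mbps = window_limited_bps(UNSCALED_WINDOW, RTT_S) / 1e6
print(f"a 65535-byte window caps throughput near {ceiling_mbps:.1f} Mbps at this RTT")
```

Buffer sizing is only one of several parameters involved, but it gives a sense of the gap between default settings and what a scientist would need without a "wizard" on hand.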

Page 18: Transport Group Recommendations: 1-5 years

• Recommendation 1: Transport Protocols and Implementations
  – Transport methods for dedicated channels and IP networks for achieving high throughput, steering and control. The transport methods include TCP-, UDP- and SAN-based methods together with newer approaches.
• Recommendation 2: Transport Customization and Interfacing
  – Transport methods optimized to single and multiple hosts as well as channels of different modes.
  – Transport methods suitably interfaced with storage methods to avoid impedance mismatches that could degrade the end-to-end transport performance.
• Recommendation 3: Stochastic Control Methods
  – Stochastic control theoretic methods to design protocols with well-understood and/or provable stability properties.
• Recommendation 4: Monitoring and Estimation Methods
  – Monitoring and statistical estimation techniques to monitor the critical transport variables and dynamically adjust them to ensure transport stability and efficiency.
• Recommendation 5: Experimental Test-Bed
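As one possible reading of Recommendations 3 and 4, the sketch below uses a Robbins-Monro stochastic approximation step to steer a sending rate until the measured goodput settles at a target level despite noisy measurements. This is an assumption chosen for illustration, not a method the slides prescribe. The channel model (`simulated_goodput`, with 10% loss and Gaussian measurement noise), the gain, and the target are all hypothetical.

```python
import random

def simulated_goodput(rate, rng, capacity=100.0, loss=0.1, noise_sd=2.0):
    """Hypothetical channel: goodput grows with rate up to capacity, plus noise."""
    return min(rate, capacity) * (1.0 - loss) + rng.gauss(0.0, noise_sd)

def stabilize(target, steps, rate0=10.0, gain=1.0, seed=0):
    """Robbins-Monro iteration: rate <- rate - (gain / n) * (measured - target)."""
    rng = random.Random(seed)
    rate = rate0
    for n in range(1, steps + 1):
        measured = simulated_goodput(rate, rng)
        # The step size shrinks as 1/n, so later noisy samples move the rate less.
        rate = max(0.0, rate - (gain / n) * (measured - target))
    return rate

final_rate = stabilize(target=45.0, steps=2000)
print(f"sending rate after 2000 steps: {final_rate:.2f}")
```

In this toy channel the noise-free goodput is 0.9 × rate, so a target of 45 corresponds to a rate of 50. The appeal of this family of methods is that convergence can be analyzed under stated conditions on the step sizes and the noise, which is the kind of provable stability property the recommendation asks for.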

Page 19: Transport Group Recommendations: 5-10 years

• Recommendation 1: Modular Adaptive Composable and Optimized Transport Modules:
  – Highly dynamic and adaptive methods to dynamically compose transport methods to match the application requirements and the underlying provisioning.
• Recommendation 2: Stochastic and Control Theoretic Design and Analysis:
  – Stochastic control theoretic methods for composable transport methods to analyze them as well as to guide their design to ensure stability and effectiveness
• Recommendation 3: Graceful Integration with Middleware and Applications:
  – Application data and application semantics must be mapped into transport methods to optimally meet application requirements
  – The boundary between middleware and transport must be made transparent to applications.
• Recommendation 4: Vertical Integration of Applications, Transport and Provisioning:
  – Vertical integration of resource allocation policies (cost and utility) with transport methods to present a unified view and interface to the applications.

Page 20: Science of High-Performance Networking

There is a need for systematic scientific approaches to the design, analysis and implementation of the transport methods and to network provisioning.

• On-Demand Bandwidth and Circuit Optimization:
  – Dynamic optimization and scheduling methods to allocate the bandwidth pipes to applications.
  – A comprehensive approach for on-line estimation and allocation of the “bandwidths”
  – Signaling to provide the required timeliness and reliability of the allocated channels.
  – Scientific, systematic understanding to integrate the components for bandwidth allocation, channel scheduling, channel setup and teardown, and performance monitoring.
• Comprehensive Theory of Transport:
  – Rigorous transport design methods tailored to the underlying provisioning modes.
  – A synergy and extensions of a number of traditional disciplines.
  – New stochastic control methods may be required to design suitable transport control methods.
  – Non-linear control theoretic methods to analyze delayed feedback.
  – Statistical theory for designing rigorous measurements and tests.
  – Optimization theory to obtain suitable parameters for tuning protocols.
• Strict Algorithmic Design and Implementation:
  – Strict algorithmic design methods to efficiently implement the designed protocols.
  – Implementations must be modular, autonomic, adaptive, and composable.
• Statistical Inference and Optimized Data Collection:
  – Due to the sheer data volumes, it is inefficient to collect measurements from all nodes all the time for the purposes of diagnosis, optimization and performance tuning.
  – Systematic inferencing methods to identify the critical and canonical sets of measurements needed.
  – Statistical design of experiments to ensure that the measurements are strategic and optimal.

Page 21: High-Performance Network Test-beds: Recommended by both groups

State-of-the-art Components: software and hardware networking components, including routers/switches, high bandwidth long-haul links, protocols and application interface modules.

Integrated Development Environments: mechanisms to integrate a wide spectrum of network technologies including high throughput protocols, dynamic provisioning, interactive visualization and steering, and high performance cyber security measures

Smooth Technology Transition: transition of network technologies from research stages to production stages by allowing them to mature in such an environment.

Characteristics of ultra high-speed network test-bed:
1. Interconnection of at least three science facilities with large-scale science applications;
2. Geographical coverage adequate to capture optical characteristics, transport protocol dynamics, and application behaviors comparable to those of real-world applications;
3. Integration with appropriate middleware;
4. Scalable network measurement tools; and
5. Well-defined technology transfer plan.

Page 22: Integration, Interaction, and Interfacing

Applications are empowered to “tune” the network

[Figure labels: Application 1, Application 2, Application 3; IP provisioning; Dynamic lambda switching; Net100 modules; Non-TCP protocols; Stabilize module; Control modules; UltraNet; Middleware; Protocols; Network-Aware Applications; Supernova: large data stream, control stream; Molecular dynamics visualization; HEP data transfers]

Page 23: Network Security Issues

While not on the original agenda, security issues have significant impact on application performance – DOE sites have very strict firewalls

• Securing Operational and Development Environments:
  – authentication, validation and access controls
  – data speeds of multiple tens of Gbps or higher
  – new security methods for on-demand dedicated channels.
• Effects of Security Measures on Performance:
  – impact of security measures on application performance.
  – graceful interoperation of science applications under secured network environments.
• Proactive Countermeasures:
  – protect bandwidth allocation, and signaling to set up and tear down the paths
  – vulnerability of new transport protocols to certain attacks