ESnet Update. Joint Techs Meeting, July 19, 2004. William E. Johnston, ESnet Dept. Head and Senior Scientist; R. P. Singh, Federal Project Manager; Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads; and the ESnet Team. Lawrence Berkeley National Laboratory.


Page 1

ESnet Update
Joint Techs Meeting, July 19, 2004

William E. Johnston, ESnet Dept. Head and Senior Scientist

R. P. Singh, Federal Project Manager

Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads

and the ESnet Team

Lawrence Berkeley National Laboratory

Page 2

ESnet Connects DOE Facilities and Collaborators

[Network map: 42 end user sites on the ESnet IP core (Qwest ATM), a Packet-over-SONET optical ring with hubs (SEA, SNV, ALB, ELP, CHI, NYC, DC, ATL); IPv6 runs on the backbone and to numerous peers. Sites are Office of Science sponsored (22), NNSA sponsored (12), jointly sponsored (3), other sponsored (NSF LIGO, NOAA), and laboratory sponsored (6), among them LBNL, SLAC, JGI, LLNL, PNNL, INEEL, LANL, SNLA, Pantex, KCP, NOAA, OSTI/ORAU, SRS, ORNL, JLab, PPPL, MIT, ANL, BNL, FNAL, Ames, NERSC, GA, and SDSC. Link speeds range from international high-speed links and OC192 (10 Gb/s optical) down through OC48 (2.5 Gb/s optical), Gigabit Ethernet (1 Gb/s), OC12 ATM (622 Mb/s), OC12/OC3 (155 Mb/s), T3 (45 Mb/s), and T1 (1.5 Mb/s). Peering points include MAE-E, MAE-W, FIX-W, PAIX-W, PAIX-E, NY-NAP, Chi NAP, Starlight, Equinix, and PNW-GPOP, plus Abilene and international networks: GEANT (Germany, France, Italy, UK, etc.), SInet (Japan), Japan–Russia (BINP), CA*net4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC, TANet2), KDDI (Japan), France, Switzerland, Australia, and Singaren.]

Page 3

ESnet Logical Infrastructure Connects the DOE Community With its Collaborators

[Diagram: ESnet peering (connections to other networks). Peering locations — the SEA, SNV, NYC, CHI, and ATL hubs; MAE-E, MAE-W, FIX-W, PAIX-W, PAIX-E, NY-NAP, Starlight, EQX-ASH, EQX-SJ, the CHI NAP distributed 6TAP (19 peers), MAX GPOP, PNW-GPOP, CalREN2, CENIC/SDSC, and LANL TECHnet — carry per-location peer counts (39, 26, 22, 20, 6, 5, 3, 2, and 1 peers), connecting ESnet to commercial networks, universities (Abilene plus 7 directly peered universities), and international networks (GEANT: Germany, France, Italy, UK, etc.; SInet (Japan); KEK; Japan–Russia (BINP); CA*net4; CERN; MREN; Netherlands; Russia; StarTap; Taiwan (ASCC, TANet2); KDDI (Japan); France; Australia; Singaren).]

ESnet provides complete access to the Internet by managing the full complement of Global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international networks.

Page 4

ESnet New Architecture Goal

• MAN rings provide dual site and hub connectivity

• A second backbone ring will multiply-connect the MAN rings to protect against hub failure

[Diagram: the ESnet core/backbone ring with hubs at New York (AOA), Chicago (CHI), Sunnyvale (SNV), Atlanta (ATL), Washington, DC (DC), and El Paso (ELP); DOE sites attach via MAN rings, with links to Europe and Asia-Pacific.]

Page 5

First Step: SF Bay Area ESnet MAN Ring

• Increased reliability and site connection bandwidth

• Phase 1
o Connects the primary Office of Science Labs (NERSC, LBNL, Joint Genome Institute, SLAC) in a MAN ring

• Phase 2
o LLNL, SNL, and UC Merced

• The ring should not connect directly into the ESnet SNV hub (we are still working on physical routing for this)

• Both legs of the mini ring have not yet been identified

[Diagram: SF Bay Area MAN ring topology, phase 1 — sites on a ring through the Qwest/ESnet hub and a Level 3 hub, joining the existing ESnet core ring toward Chicago and El Paso; NLR / UltraScienceNet waves toward Seattle and Chicago, and LA and San Diego.]

Page 6

Traffic Growth Continues

• Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually.

[Chart: ESnet monthly accepted traffic, TBytes/month. ESnet is currently transporting about 250 terabytes/mo.]
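As an aside not on the slide, the stated growth factors compound quickly; a minimal sketch in Python (the numbers are illustrative, apart from the ~250 TB/month figure taken from the slide):

```python
# Sketch: compounding an annual traffic growth factor over several years.
# The 250 TB/month starting point is from the slide; the horizon is illustrative.

def project_traffic(current_tb_per_month: float, annual_factor: float, years: int) -> float:
    """Project monthly traffic volume assuming a constant annual growth factor."""
    return current_tb_per_month * annual_factor ** years

# At 2.0x/year, ~250 TB/month doubles every year:
print(project_traffic(250, 2.0, 3))  # 2000.0 TB/month after 3 years
```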

Page 7

Who Generates Traffic, and Where Does it Go?

ESnet Inter-Sector Traffic Summary, Jan 2003 / Feb 2004 (1.7x overall traffic increase, 1.9x OSC increase). The international traffic is increasing due to BaBar at SLAC and the LHC Tier 1 centers at FNAL and BNL.

Note that more than 90% of the ESnet traffic is OSC traffic.

ESnet Appropriate Use Policy (AUP): all ESnet traffic must originate and/or terminate on an ESnet site (no transit traffic is allowed).

DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers.

[Diagram: traffic into ESnet in green, traffic out of ESnet in blue, plus traffic between sites; percentages are of total ingress or egress traffic. Flows run between DOE sites and the commercial, R&E (mostly universities), international, and peering-point sectors, with per-sector ingress/egress shares of 72/68%, 53/49%, ~25/18% (DOE collaborator traffic, incl. data), 21/14%, 17/10%, 14/12%, 10/13%, 9/26%, and 4/6%.]
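The AUP rule above amounts to a simple predicate on flow endpoints. A toy sketch (the site list is a hypothetical subset, not ESnet's actual site table):

```python
# Illustrative sketch of the AUP: every flow must originate and/or terminate
# at an ESnet site, so pure transit traffic is rejected.
# Hypothetical subset of site domains for illustration only.
ESNET_SITES = {"lbnl.gov", "slac.stanford.edu", "fnal.gov", "bnl.gov"}

def aup_allows(src_domain: str, dst_domain: str) -> bool:
    """True if at least one endpoint is an ESnet site (no transit traffic)."""
    return src_domain in ESNET_SITES or dst_domain in ESNET_SITES

print(aup_allows("fnal.gov", "cern.ch"))     # True: originates at an ESnet site
print(aup_allows("cern.ch", "example.com"))  # False: transit, neither end is ESnet
```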

Page 8

ESnet Top 20 Data Flows, 24 hrs., 2004-04-20

[Chart: the top flows (about 1 terabyte/day at the high end), including Fermilab (US) – CERN; SLAC (US) – IN2P3 (FR); SLAC (US) – INFN Padova (IT); Fermilab (US) – U. Chicago (US); CEBAF (US) – IN2P3 (FR); INFN Padova (IT) – SLAC (US); U. Toronto (CA) – Fermilab (US); DFN-WiN (DE) – SLAC (US); DOE Lab – DOE Lab (two flows); SLAC (US) – JANET (UK); Fermilab (US) – JANET (UK); Argonne (US) – Level3 (US); Argonne – SURFnet (NL); IN2P3 (FR) – SLAC (US); Fermilab (US) – INFN Padova (IT).]

A small number of science users account for a significant fraction of all ESnet traffic.

Page 9

Top 50 Traffic Flows Monitoring – 24 hr
2 Int'l and 2 Commercial Peering Points

• 10 flows > 100 GBy/day

• More than 50 flows > 10 GBy/day
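For scale, a back-of-the-envelope conversion (not from the slide) from these daily volumes to sustained rates:

```python
# Converting a daily transfer volume into the average sustained rate it implies.
# Decimal units assumed (1 GB = 1e9 bytes); 86,400 seconds per day.

def avg_rate_mbps(gbytes_per_day: float) -> float:
    """Average rate in Mb/s for a given daily volume in decimal gigabytes."""
    return gbytes_per_day * 1e9 * 8 / 86_400 / 1e6

print(round(avg_rate_mbps(100), 1))   # 9.3  -> a 100 GBy/day flow averages ~9.3 Mb/s
print(round(avg_rate_mbps(1000), 1))  # 92.6 -> a 1 TB/day flow averages ~92.6 Mb/s
```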

Page 10

Disaster Recovery and Stability

• The network must be kept available even if, e.g., the West Coast is disabled by a massive earthquake.

[Diagram: remote engineers with partial duplicate infrastructure (including DNS) at TWC, LBNL, PPPL, BNL, and AMES; hubs at SEA, SNV, ALB, ELP, CHI, ATL, DC, and NYC. ESnet is currently deploying full replication of the NOC databases and servers and the Science Services databases in the NYC Qwest carrier hub.]

Engineers, a 24x7 Network Operations Center, and generator-backed power support:
• Spectrum (net mgmt system)
• DNS (name – IP address translation)
• Eng database
• Load database
• Config database
• Public and private Web
• E-mail (server and archive)
• PKI cert. repository and revocation lists
• collaboratory authorization service

Reliable operation of the network involves
• remote Network Operation Centers (3)
• replicated support infrastructure
• generator-backed UPS power at all critical network and infrastructure locations
• high physical security for all equipment
• a non-interruptible core – the ESnet core operated without interruption through
o the N. Calif. power blackout of 2000,
o the 9/11/2001 attacks, and
o the Sept. 2003 NE States power blackout

Page 11

Disaster Recovery and Stability

• Duplicate NOC infrastructure to the AoA hub in two phases, complete by end of the year
o 9 servers – dns, www, www-eng and noc5 (eng. databases), radius, aprisma (net monitoring), tts (trouble tickets), pki-ldp (certificates), mail

Page 12

Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack

• A Phased Response to Cyberattack is being implemented to protect the network and the ESnet sites

• The phased response ranges from blocking certain site traffic to complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks

o It separates ESnet core routing functionality from external Internet connections by means of a "peering" router that can have a policy different from the core routers

o It provides a rate-limited path to the external Internet that will ensure site-to-site communication during an external denial of service attack

o It provides "lifeline" connectivity with the external Internet for downloading patches, exchanging e-mail, and viewing web pages (i.e., e-mail, dns, http, https, ssh, etc.) prior to full isolation of the network
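The lifeline idea can be sketched as a port filter. This is an assumed illustration, not ESnet's actual router configuration; the port set simply mirrors the services named above:

```python
# Assumed sketch of "lifeline" filtering: during full isolation, permit only
# the services needed to download patches and communicate.
LIFELINE_PORTS = {25, 53, 80, 443, 22}  # smtp, dns, http, https, ssh

def lifeline_permits(dst_port: int, lockdown: bool) -> bool:
    """During lockdown, only lifeline services pass; otherwise all traffic does."""
    return (not lockdown) or dst_port in LIFELINE_PORTS

print(lifeline_permits(443, lockdown=True))   # True: https is a lifeline service
print(lifeline_permits(6667, lockdown=True))  # False: blocked during lockdown
```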

Page 13

Phased Response to Cyberattack

[Diagram: a Lab (e.g. LBNL) connects through its gateway router and border router to an ESnet router; ESnet reaches the external Internet through a peering router, where attack traffic can be cut off.]

• Lab first response – filter incoming traffic at their ESnet gateway router

• ESnet first response – filters to assist a site

• ESnet second response – filter traffic from outside of ESnet

• ESnet third response – shut down the main peering paths and provide only limited-bandwidth paths for specific "lifeline" services

The Sapphire/Slammer worm infection created a Gb/s of traffic on the ESnet core until filters were put in place (both into and out of sites) to damp it out.

Page 14

Phased Response to Cyberattack

Architecture to allow
• phased response to cybersecurity attacks
• lifeline communications during lockdown conditions

Milestones:
• Design the architecture (software; site, core, and peering router topology; and hardware configuration) – 1Q04
• Design and test lifeline filters (configuration of filters specified) – 4Q04
• Configure and test fail-over and filters (fail-over configuration is successful) – 4Q04
• In production (the backbone and peering routers have a cyberattack defensive configuration) – 1Q05

Page 15

Grid Middleware Services

• ESnet is the natural provider for some "science services" – services that support the practice of science
o ESnet is trusted, persistent, and has a large (almost comprehensive within DOE) user base
o ESnet has the facilities to provide reliable access and high availability through assured network access to replicated services at geographically diverse locations

• However, a service must be scalable in the sense that, as its user base grows, ESnet's interaction with the users does not grow (otherwise it is not practical for a small organization like ESnet to operate)

Page 16

Grid Middleware Requirements (DOE Workshop)

• A DOE workshop examined science-driven requirements for network and middleware and identified twelve high-priority middleware services (see www.es.net/#research)

• Some of these services have a central management component and some do not

• Most of the services that have central management fit the criteria for ESnet support. These include, for example:
o Production, federated RADIUS authentication service
o PKI federation services
o Virtual Organization Management services to manage organization membership, member attributes, and privileges
o Long-term PKI key and proxy credential management
o End-to-end monitoring for Grid / distributed application debugging and tuning
o Some form of authorization service (e.g. based on RADIUS)
o Knowledge management services that have the characteristics of an ESnet service are also likely to be important (future)

Page 17

Science Services: PKI Support for Grids

• Public Key Infrastructure supports the cross-site, cross-organization, and international trust relationships that permit sharing computing and data resources and other Grid services

• The DOEGrids Certification Authority service, which provides X.509 identity certificates to support Grid authentication, is an example of this model
o The service requires a highly trusted provider and a high degree of availability
o Federation: ESnet, as service provider, is a centralized agent for negotiating trust relationships, e.g. with European CAs
o The service scales by adding site-based or Virtual Organization-based Registration Agents that interact directly with the users
o See DOEGrids CA (www.doegrids.org)
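The trust model behind this can be illustrated with a toy issuer graph. This is not real PKI code; apart from the DOEGrids CA name, the subjects and CAs here are hypothetical:

```python
# Toy illustration of PKI trust: a relying party accepts an identity if it can
# walk the issuer chain back to a CA it trusts (e.g. the DOEGrids CA).
# All names other than "DOEGrids CA" are invented for illustration.
ISSUED_BY = {  # subject -> issuer
    "alice@fnal": "DOEGrids CA",
    "bob@desy": "GermanGrid CA",
    "GermanGrid CA": "GermanGrid CA",  # self-signed root
    "DOEGrids CA": "DOEGrids CA",      # self-signed root
}

def trusted(subject: str, trust_anchors: set) -> bool:
    """Walk issuers until a self-signed root; accept if it is a trust anchor."""
    seen = set()
    while subject not in seen:
        seen.add(subject)
        issuer = ISSUED_BY.get(subject)
        if issuer is None:
            return False
        if issuer == subject:  # reached a root CA
            return issuer in trust_anchors
        subject = issuer
    return False

print(trusted("alice@fnal", {"DOEGrids CA"}))  # True
print(trusted("bob@desy", {"DOEGrids CA"}))    # False until the CAs federate
```

Federation, in these terms, is the act of adding another community's root CA to the relying party's trust-anchor set.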

Page 18

ESnet PKI Project

• DOEGrids Project Milestones
o DOEGrids CA in production, June 2003
o Retirement of the initial DOE Science Grid CA (Jan 2004)
o "Black rack" installation completed for the DOEGrids CA (Mar 2004)

• New Registration Authorities
o FNAL (Mar 2004)
o LCG (LHC Computing Grid) catch-all: near completion
o NCC-EPA: in progress

• Deployment of the NERSC "myProxy" CA

• Grid Integrated RADIUS Authentication Fabric pilot

Page 19

PKI Systems – DOEGrids Security

[Diagram: layered protection for the PKI systems – LBNL site security, building security, a secure data center, and secure racks; a vaulted root CA with an HSM; a firewall and Bro intrusion detection between the systems and the Internet; RAs and certificate applicants connect from outside.]

Page 20

Science Services: Public Key Infrastructure

• The rapidly expanding customer base of this service will soon make it ESnet's largest collaboration service by customer count

• Registration Authorities: ANL, LBNL, ORNL, DOESG (DOE Science Grid), ESG (Climate), FNAL, PPDG (HEP), Fusion Grid, iVDGL (NSF-DOE HEP collab.), NERSC, PNNL

Page 21

ESnet PKI Project (2)

• New CA initiatives:
o FusionGrid CA
o ESnet SSL Server Certificate CA
o Mozilla browser CA cert distribution

• Script-based enrollment

• Global Grid Forum documents:
o Policy Management Authority Charter
o OCSP (Online Certificate Status Protocol) Requirements for Grids
o CA Policy Profiles

Page 22

Grid Integrated RADIUS Authentication Fabric

• RADIUS routing of authentication requests

• Support for One-Time Password (OTP) initiatives
o Gateway Grid and collaborative uses: standard UI and API
o Provide a secure federation point with O(n) agreements
o Support multiple vendor / site OTP implementations
o One token per user (SSO-like solution) for OTP

• A collaboration between ESnet, NERSC, and a RADIUS appliance vendor; PNNL and ANL are also involved, and others are welcome

• White paper/report ~01 Sep 2004 to support early implementers, then proceed to pilot

• Project pre-proposal: http://www.doegrids.org/CA/Research/GIRAF.pdf
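The core of such a fabric, realm-based routing of authentication requests, can be sketched as follows. The server names and realm table are invented for illustration:

```python
# Hypothetical sketch of RADIUS realm routing: forward an authentication
# request for "user@realm" to that realm's home server.
# All server names below are invented for illustration.
HOME_SERVERS = {
    "nersc.gov": "radius.nersc.example",
    "pnl.gov": "radius.pnnl.example",
    "anl.gov": "radius.anl.example",
}

def route_request(user_id: str) -> str:
    """Pick the home RADIUS server for a user@realm identifier."""
    realm = user_id.rsplit("@", 1)[-1]
    return HOME_SERVERS.get(realm, "default-proxy.example")

print(route_request("alice@nersc.gov"))  # radius.nersc.example
```

Because each site only needs one agreement with the fabric rather than one with every other site, the number of agreements stays O(n) in the number of participants.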

Page 23

Collaboration Service

• H.323 videoconferencing is showing a dramatic increase in usage

Page 24

Grid Network Services Requirements (GGF, GHPN)

• The Grid High Performance Networking Research Group's "Networking Issues of Grid Infrastructures" (draft-ggf-ghpn-netissues-3) describes what networks should provide to Grids:
o High performance transport for bulk data transfer (over 1 Gb/s per flow)
o Performance controllability to provide ad hoc quality of service and traffic isolation, with dynamic network resource allocation and reservation
o High availability when expensive computing or visualization resources have been reserved
o Security controllability to provide a trusted and efficient communication environment when required
o Multicast to efficiently distribute data to groups of resources
o Integrated wireless networks and sensor networks in Grid environments

Page 25

Priority Service

• So, practically, what can be done?

• With available tools we can provide a small number of provisioned, bandwidth-guaranteed circuits
o secure and end-to-end (system to system)
o various qualities of service possible, including minimum latency
o a certain amount of route reliability (if redundant paths exist in the network)
o end systems can manage these circuits as single high-bandwidth paths or as multiple lower-bandwidth paths (with application-level shapers)
o non-interfering with production traffic, so aggressive protocols may be used

Page 26

Guaranteed Bandwidth as an ESnet Service

• A DOE Network R&D funded project

[Diagram: user system 1 at site A and user system 2 at site B, connected through resource managers with an authorization step and a policer at the site edge; a bandwidth broker mediates allocation in Phase 2. Allocation will probably be relatively static and ad hoc.]

• There will probably be service level agreements among transit networks allowing for a fixed amount of priority traffic – so the resource manager does minimal checking and no authorization

• The service will do policing, but only at the full bandwidth of the service agreement (for self-protection)
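Policing at the full bandwidth of a service agreement is conventionally done with a token bucket, which admits traffic up to the agreed rate and drops the excess. A minimal sketch (the parameters are illustrative, not ESnet's):

```python
# Token-bucket policer sketch: tokens refill at the agreed rate; a packet is
# admitted only if enough tokens remain, otherwise it is policed (dropped).

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate = rate_bps        # agreed service rate, bits/second
        self.capacity = burst_bits  # allowed burst size, bits
        self.tokens = burst_bits    # bucket starts full

    def tick(self, dt_s: float) -> None:
        """Refill tokens for dt_s seconds of elapsed time, capped at capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate * dt_s)

    def admit(self, packet_bits: int) -> bool:
        """Admit the packet if enough tokens remain, else drop it."""
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False

tb = TokenBucket(rate_bps=1e6, burst_bits=10_000)
print(tb.admit(8_000))  # True: within the burst allowance
print(tb.admit(8_000))  # False: bucket exhausted until it refills
tb.tick(0.01)           # 10 ms at 1 Mb/s adds 10,000 bits (capped at capacity)
print(tb.admit(8_000))  # True again after the refill
```

Policing only at the full agreement rate, as the slide says, keeps the mechanism simple: no per-flow authorization, just self-protection at the edge.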

Page 27

Network Monitoring System

• Alarms & Data Reduction
o From June 2003 through April 2004, the total number of NMS up/down alarms was 16,342, or 48.8 per day.
o Path-based outage reporting automatically isolated 1,448 customer-relevant events during this period, an average of 4.3 per day – more than a 10-fold reduction.
o Based on total outage duration in 2004, approximately 63% of all customer-relevant events have been categorized as either "Planned" or "Unplanned" and as one of "ESnet", "Site", "Carrier", or "Peer"

• This gives us a better handle on the availability metric
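The per-day figures above check out against simple arithmetic, assuming a ~335-day window for June 2003 through April 2004:

```python
# Verifying the slide's alarm-reduction arithmetic over an assumed
# ~335-day window (June 2003 - April 2004).
DAYS = 335
alarms, events = 16_342, 1_448

print(round(alarms / DAYS, 1))    # 48.8 raw alarms per day
print(round(events / DAYS, 1))    # 4.3 customer-relevant events per day
print(round(alarms / events, 1))  # 11.3x reduction ("more than 10-fold")
```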

Page 28

2004 Availability by Month

[Chart: unavailable minutes per site, Jan. – June 2004, corrected for planned outages; most sites were >99.9% available, a few were <99.9% available.]

(More from Mike O'Connor)
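For context, the downtime budget implied by an availability target (a rough sketch assuming a 30-day month, not a figure from the slide):

```python
# Converting an availability target into a monthly downtime budget,
# assuming a 30-day month for illustration.

def max_downtime_minutes(availability: float, days: int = 30) -> float:
    """Downtime budget in minutes for a given availability target."""
    return (1 - availability) * days * 24 * 60

print(round(max_downtime_minutes(0.999), 1))   # 43.2 min/month at 99.9%
print(round(max_downtime_minutes(0.9999), 1))  # 4.3 min/month at 99.99%
```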

Page 29

ESnet Abilene Measurements

• Measurement sites in place:
o 3 ESnet participants: LBL, FERMI, BNL
o 3 Abilene participants: SDSC, NCSU, OSU

• We want to ensure that the ESnet/Abilene cross-connects are serving the needs of users in the science community who are accessing DOE facilities and resources from universities or accessing university facilities from DOE labs.

• More from Joe Metzger

Page 30

OWAMP One-Way Delay Tests Are Highly Sensitive

• The NCSU metro DWDM reroute adds about 350 microseconds

[Chart: one-way delay in ms (roughly 41.5–42.0), showing a clear step at the fiber re-route.]
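A sanity check on why the test is so sensitive: assuming light propagates through fiber at roughly 2e8 m/s (about two-thirds of c), the observed 350-microsecond step corresponds to about 70 km of extra fiber path:

```python
# Converting a one-way delay increase into the extra fiber path it implies,
# assuming ~2e8 m/s propagation speed in fiber (about 2/3 of c).
FIBER_SPEED_M_PER_S = 2.0e8

def extra_path_km(delay_increase_s: float) -> float:
    """Additional fiber path length (km) implied by a one-way delay increase."""
    return delay_increase_s * FIBER_SPEED_M_PER_S / 1000

print(round(extra_path_km(350e-6), 1))  # 70.0 km of additional fiber path
```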

Page 31

ESnet Trouble Ticket System

• The TTS is used to track problem reports for the network, ECS, DOEGrids, asset management, NERSC, and other services.

• It runs a Remedy ARSystem server and an Oracle database on a Sun Ultra workstation.

• Total external tickets = 11,750 (1995–2004), approx. 1,300/year

• Total internal tickets = 1,300 (1999–2004), approx. 250/year

Page 32

Conclusions

• ESnet is an infrastructure that is critical to DOE's science mission and that serves all of DOE, with a focus on the Office of Science Labs

• ESnet is working to meet the networking requirements of DOE mission science with several new initiatives and a new architecture

• QoS service is hard – but we believe that we have enough experience to do pilot studies

• Middleware services for large numbers of users are hard – but they can be provided if careful attention is paid to scaling