US CMS Tier1 Facility Network. Andrey Bobyshev (FNAL), Phil DeMar (FNAL). CHEP 2010, Academia Sinica, Taipei, Taiwan.

Source: cd-docdb.fnal.gov/0041/004157/001/USCMS-T1-CHEP10-v1.2.pdf

US CMS Tier1 Facility Network

Andrey Bobyshev (FNAL)
Phil DeMar (FNAL)

CHEP 2010
Academia Sinica
Taipei, Taiwan

Outline of the talk:

USCMS Tier1 Facility Resources

Data Model/requirements

Current status

Circuits/CHIMAN/USLHC Net

Network tools graphs and snapshots

Summary of CMS resources

Country      Site           Tape (TB)  Disk (TB)  CPU (HEP-SPEC06)
Switzerland  CH-CERN            21600       4500            106100
France       FR-CCIN2P3          3876       1342             11146
Germany      DE-KIT              5700       1950             15000
Italy        IT-INFN-CNAF        5200       2200             15000
Spain        ES-PIC              2424        902              6971
Taiwan       TW-ASGC             2000       1800             16000
UK           UK-T1-RAL           4192       1560             12058
USA          US-FNAL-CMS        21000       6500             56000

http://gstat-wlcg.cern.ch/apps/pledges/

CMS resources (2011 pledges)

[Bar charts of the 2011 pledges per site. Panels: CPU (HEP-SPEC06), Disk (TB), Tape (TB).]

http://gstat-wlcg.cern.ch/apps/pledges/

Model of USCMS-T1 Network traffic

[Diagram of traffic flows. Labels as extracted:]

Elements: T0; Tier2s/Tier1s; cmsstor/dCache nodes; Federated File System; EnStore Tape Robots; CMS-LPC/SLB Clusters; BlueArc NAS; Interactive users; Data processing / ~1600 Worker nodes

Link labels: 2.2Gbps; 3.2Gbps; 3-10Gbps; 30-80Gbps; 10-20Gbps; 1Gbps; QoS (on two links)

Upgrade of Core Fabric Recently Completed

Cisco 6509-based core

Cisco Nexus 7000-based core

USCMS-T1 Network in 2010-11

[Network diagram, dated 2010-10-01. Labels as extracted:]

Routers: r-s-core-fcc; r-s-bdr; r-s-starlight-fnal; r-s-core-gcc

External networks: DCN; CHIMAN; USLHCNET; LHCOPN; USCMS-T2s; CMS-T1s

Devices: Nexus 7000 (two); C4948E (two); Tape Robot

Computing rooms: FCC2; FCC3; GCC-CRA; GCC-CRB; GCC-CRC

Link labels: 80-160GbE L2; 20GbE L3; 20GbE vPC-peer; 80G; 40G; 20G; 10GbE; 2x1GbE

USCMS Tier1 Network in 2013-14

[Network diagram. Labels as extracted:]

Elements: r-s-bdr; End-To-End Circuits; Site Network; site-core1; site-core2; ESnet / 100GE

Link labels: 100GbE; N x 100GbE; 10GbE; vPC; "?" (on two links)

QoS Classes of Traffic in USCMS-T1

Class                             Share  Bandwidth
Network use                          2%  1.6 Gbps
Interactive                          2%  1.6 Gbps
Real-Time (Database, monitoring)     2%  1.6 Gbps
Critical                            34%  27.2 Gbps
NAS                                 10%  8 Gbps
Best effort                         50%  40 Gbps
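
The per-class bandwidths follow from applying each percentage to the aggregate capacity. A minimal Python sketch of that arithmetic is given here. It assumes an 80 Gbps aggregate, which is inferred from "Best effort (50%)" corresponding to 40 Gbps and is not stated on the slide. The names `QOS_SHARES`, `LINK_CAPACITY_GBPS` and `class_bandwidth` are hypothetical.

```python
# Shares per traffic class as listed on the slide (percent of capacity).
QOS_SHARES = {
    "Network use": 2,
    "Interactive": 2,
    "Real-Time (Database, monitoring)": 2,
    "Critical": 34,
    "NAS": 10,
    "Best effort": 50,
}

# Assumption: 80 Gbps aggregate, inferred from 50% corresponding to 40 Gbps.
LINK_CAPACITY_GBPS = 80


def class_bandwidth(shares, capacity_gbps):
    """Return the bandwidth in Gbps for each class, given percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("class shares must add up to 100%")
    return {name: capacity_gbps * pct / 100 for name, pct in shares.items()}


if __name__ == "__main__":
    for name, gbps in class_bandwidth(QOS_SHARES, LINK_CAPACITY_GBPS).items():
        print(f"{name}: {gbps:g} Gbps")
```

Changing `LINK_CAPACITY_GBPS` shows how the same shares would scale on a different aggregate, for example the 160 Gbps upper end of the planned interconnect.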

Redundancy Within the Tier-1

Today: FCC2 Nexus is core switch fabric

• GCC-CRB Nexus = redundant core

Switches connected at 4 (or 8) x 10GE

• Virtual Port Channel (vPC)

• Gateway Load Balancing Protocol (GLBP) for failover to redundant core

Near Future: Interconnected Nexus's @ 80-160Gb/s

• Function as distributed fabric

Switches still configured w/ vPC & GLBP

• Most connections to nearest core switch

Off-Site Traffic: End-to-End Circuits

USCMS-T1 has a long history of using ESNet/SDN and Internet2 DCN circuits

Circuit           Country      Affiliation  BW
LHCOPN            Switzerland  T0           8.5G
LHCOPN Secondary  Switzerland  T0           8.5G
LHCOPN Backup     Switzerland  T0           3.5G
DE-KIT            Germany      T1           1G
IN2P3             France       T1           2x1G
ASNet/ASGC        Taiwan       T1           2.5G
CALTECH           USA          T2           10G
Purdue            USA          T2           10G
UWISC             USA          T2           10G
UFL               USA          T2           10G
UNL               USA          T2           10G
MIT               USA          T2           10G
UCSD              USA          T2           10G
TIFR              India        T2           1G
UTK               USA          T3           1G
McGill            Canada       CDF/D0       1G
Cesnet, Prague    Czech        D0           1G

SLA monitor, IOS Track objects to automatically fail over traffic if circuit is down
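
The failover behaviour can be illustrated in Python. This is a sketch of the decision logic only, not the IOS configuration used at the facility. A boolean per circuit stands in for the state of a track object fed by an SLA monitor, and traffic falls back to the general routed path when every tracked circuit is down. The strict preference order among the circuits, the "routed" fallback label, and the names `Circuit` and `select_path` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    """An end-to-end circuit whose reachability is tracked by a probe."""
    name: str
    probe_ok: bool  # stands in for the state of an SLA-monitor track object


def select_path(circuits, fallback="routed"):
    """Return the first circuit (in preference order) whose probe is up.

    If every tracked circuit is down, fall back to the general routed path.
    """
    for circuit in circuits:
        if circuit.probe_ok:
            return circuit.name
    return fallback


if __name__ == "__main__":
    paths = [
        Circuit("LHCOPN", probe_ok=False),
        Circuit("LHCOPN Secondary", probe_ok=True),
        Circuit("LHCOPN Backup", probe_ok=True),
    ]
    print(select_path(paths))
```

With the primary probe failing, the example selects the next circuit in the list; once the primary probe recovers, the same call returns the primary again without any other change.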

PerfSonar Monitoring

Monitoring status of circuits
  Alert on a change of link status
  Utilization

PingER RTT measurements

PerfSonar-BUOY: active measurements, BWCTL & OWAMP

  Two NPToolkit boxes

Two LHCOPN/MDM monitoring boxes

Circuits SLA monitoring (Nagios)

• Each circuit has an SLA monitor running the icmp-echo application.

• The status of each SLA monitor is tracked by SNMP.
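
A check along these lines could map the polled SLA state to a Nagios result. The Python sketch below assumes the SNMP poll has already happened and that the state arrives as a simple string. The state strings ("ok", "timeout") and the function name `nagios_result` are hypothetical; only the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_result(circuit, sla_state):
    """Map a polled SLA-monitor state to (exit_code, status_line).

    `sla_state` is a hypothetical string produced by whatever polls the
    router over SNMP: "ok" when the icmp-echo probe succeeds, any other
    string (for example "timeout") when it fails, and None when the poll
    itself returned nothing.
    """
    if sla_state is None:
        return UNKNOWN, f"UNKNOWN - no SNMP response for {circuit}"
    if sla_state == "ok":
        return OK, f"OK - {circuit} icmp-echo probe succeeding"
    return CRITICAL, f"CRITICAL - {circuit} SLA state is {sla_state}"
```

A plugin wrapper would print the status line and exit with the returned code, which is how Nagios picks up the state of each circuit.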

Weather Map Tools (I)

SNMP-Based

Weather Map Tools (II)

Flow Data-Based

Drill-down capabilities

Work in progress:

New monitoring approaches
  Any-to-Any/cloud-to-cloud performance
  Any production host can become an element of monitoring infrastructure
  Combination of passive and active measurements

10GBase-T/IEEE 802.3an for end system
  Intel E10G41AT2 NIC PCI-Express
  Arista Networks DCS-7120T-4S (as ToR solution)
  Directly to Nexus7K (via fiber at the moment)
  Cat6E regular physical infrastructure deployed ~100m

Summary

Two buildings, four computing rooms (fifth one is coming)

Two Nexus 7000 switches for 10G aggregation, interconnected at 80-160Gbps

2 x 10Gbps to the Site Network (read/write data to tapes)

10Gbps to the Border Router (non-US Tier2s, other LHC-related traffic)

20Gbps toward ESNET CHIMAN and USLHCNET, SDN/DCN/E2E circuits

Summary (continued)

~200 dCache nodes with 2x1GE

~1600 worker nodes with 1GE

~150 various servers

2 x 20G for BlueArc NAS storage

C6509 access switches connected by 40-80GE

Redundancy/loadsharing at L2 (vPC) and L3 (GLBP)

IOS-based Server Load Balancing for interactive clusters

19 SDN/DCN End-To-End Circuits

Virtual port channelling (vPC)

QoS, 5 major classes of traffic

Additional slides

USCMS Tier1: Data Analysis Traffic

[Weather map snapshot. Labels as extracted:]

Routers: R-CMS-FCC2; R-CMS-FCC2-2; R-CMS-FCC2-3; R-CMS-N7K-GCC-B

Switches: S-CMS-GCC-1; S-CMS-GCC-3; S-CMS-GCC-4; S-CMS-GCC-5; S-CMS-GCC-6

Link loads: 18 of 20G; 29.8 of 30G; 30 of 40G; 65 of 80G; 18 of 30G; 20 of 20G; 20 of 30G; 25 of 30G

https://fngrey.fnal.gov/wm/uscms
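
Link labels on this weather map have the form "used of capacity", for example "18 of 20G". A small Python helper can turn such labels into utilization fractions. It is a hypothetical illustration and not part of the weather-map tool; the function name `utilization` and the label pattern are assumptions based only on the labels shown here.

```python
import re

# Matches labels such as "18 of 20G" or "29.8 of 30G".
_LABEL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+of\s+(\d+(?:\.\d+)?)G\s*$")


def utilization(label):
    """Return used/capacity as a fraction for a label like "18 of 20G"."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized link label: {label!r}")
    used, capacity = float(match.group(1)), float(match.group(2))
    return used / capacity


if __name__ == "__main__":
    for label in ["18 of 20G", "29.8 of 30G", "65 of 80G", "20 of 20G"]:
        print(f"{label}: {utilization(label):.1%}")
```

Labels that carry only a capacity, such as a bare "30G", do not match the pattern and raise `ValueError` rather than being treated as idle links.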