New network architecture of the CC
Guillaume Cessieux
Network team, CC-IN2P3
CC seminar
2010-12-03
Latest major network upgrade: Mid 2009
Backbone 20G → 40G
– No topology change, only additional bandwidth
Linking capacity tripled
– Distribution layer added
2010-12-03 GCX 2
Previous network architecture: Mid 2009
[Diagram: Computing, Storage FC+TAPE and Storage SATA areas]
Computing
– 36 computing racks, 34 to 42 servers per rack
– 1 switch per rack (36 access switches), 48x1G per switch, 1G per server
– 1x10G uplink per access switch
– 3 distribution switches, linked to backbone with 4x10G
Storage
– Data FC: 27 servers, 10G per server
– Data SATA: 816 servers in 34 racks, 2x1G per server, 24 servers per switch
– Tape: 10 servers, 10G per server
– 34 access switches with trunked 2x10G uplinks
– 2 distribution switches, linked to backbone with 4x10G
Backbone 40G, connected to WAN
Reaching limits (1/2)
Same 40G path, one year later
20G uplink of a distribution switch:
[Traffic graphs, 20G and 40G scales]
Reaching limits (2/2)
Clear traffic increase
– More hosts exchanging more
– More remote exchanges
Limits appearing
– Disturbing bottlenecks
– Long paths before traffic gets routed
Upcoming challenges
– New computing room, massive data transfers, virtualization, heavy Grid computation
Usage of one 40G backbone etherchannel
10G direct link with CERN
Sample host average: 620M on 1G
Complete network analysis performed
Inventory
– 20 network devices found
• Thanks to discovery protocols…
– Software and features not harmonised
Topology
– A map is worth more than anything
Usage
– Traffic patterns, bottlenecks
switch> show cdp neighbors
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
                  D - Remote, C - CVTA, M - Two-port Mac Relay

Device ID        Local Intrfce   Holdtme   Capability   Platform    Port ID
s1.in2p3.fr.     Ten 2/1         146       R S I        WS-XXXXX-   Ten 1/50
s2.in2p3.fr.     Ten 3/3         130       R S I        WS-XXXXX-   Ten 7/6
s3.in2p3.fr.     Ten 3/4         150       R S I        WS-XXXXX-   Ten 6/6
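Discovery-protocol output like the above can be harvested programmatically to build the inventory and topology. Below is a minimal, hypothetical Python sketch (not the CC-IN2P3 tooling; the column layout is assumed to match the sample above) that turns `show cdp neighbors` output into a list of links:

```python
def parse_cdp_neighbors(text):
    """Turn 'show cdp neighbors' output into (device, local_port, remote_port) links."""
    links = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("Device ID"):
            in_table = True          # entries follow the column-header line
            continue
        if not in_table or not line.strip():
            continue
        parts = line.split()
        # Columns: Device, Local Intrfce (2 tokens), Holdtme, Capability…, Platform, Port ID (2 tokens)
        device = parts[0].rstrip(".")
        local_port = " ".join(parts[1:3])
        remote_port = " ".join(parts[-2:])
        links.append((device, local_port, remote_port))
    return links

sample = """Device ID        Local Intrfce   Holdtme   Capability   Platform    Port ID
s1.in2p3.fr.     Ten 2/1         146       R S I        WS-XXXXX-   Ten 1/50
s2.in2p3.fr.     Ten 3/3         130       R S I        WS-XXXXX-   Ten 7/6
"""
print(parse_cdp_neighbors(sample))
# [('s1.in2p3.fr', 'Ten 2/1', 'Ten 1/50'), ('s2.in2p3.fr', 'Ten 3/3', 'Ten 7/6')]
```

Running this against each device's output and merging the link lists yields the adjacency data behind a network map.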
Requirements for new network architecture
More bandwidth!
Able to scale for the coming years
– Allowing non-disruptive network upgrades
– Particularly with the new computing room
Ease exchanges between major functional areas
As usual: a good balance between risks, costs and requirements
Main directions
Target non-blocking mode
No physical redundancy, meshes, etc.
– “A single slot failure in 10 years on big Cisco devices”
– Too expensive and not worth it for us
– High availability handled at service level (DNS…)
– Big devices preferred to a meshed bunch of small ones
Keep it simple
– Ease configuration and troubleshooting
– Avoid closed, complex vendor solutions
• e.g. things branded “virtual”, “abstracted”, “dynamic”
New network architecture
[Diagram: 40G backbone (x4) linking functional areas — Services (db, Grid, monitoring…), AFS, GPFS, TSM, dcache/Xrootd/SRB/HPSS, Workers (60G, x3), remote workers; WAN 10G (generic, not LHCOPN); area uplinks of 20G and 60G]
Area                        | Old bandwidth | New bandwidth
AFS                         | 30G shared    | 30G
GPFS, TSM                   | 40G shared    | 60G
dcache, Xrootd, SRB, HPSS   | 40G shared    | 120G
Workers                     | 40G shared    | 170G
WAN                         | 10G           | 20G

From 160G often shared to 400G wire speed
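The "from 160G to 400G" summary is just the column sums of the table above, which a quick check confirms:

```python
# Per-area aggregate bandwidth (Gbit/s) before and after the upgrade
old = {"AFS": 30, "GPFS+TSM": 40, "dcache+Xrootd+SRB+HPSS": 40, "Workers": 40, "WAN": 10}
new = {"AFS": 30, "GPFS+TSM": 60, "dcache+Xrootd+SRB+HPSS": 120, "Workers": 170, "WAN": 20}

print(sum(old.values()))  # 160 (often shared)
print(sum(new.values()))  # 400 (wire speed)
```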
Current status - http://netstat.in2p3.fr/
[Diagram: core linking Edge, AFS (Thumpers), Services, Workers, CINES and storage (Thumpers, Thors, Thors & Dell); links of 40G, 9x20G, 4x40G, 4x10G, 20G, 4x60G and 5x20G]
WAN upgrade
[Diagram: RENATER with new 10G links towards GÉANT2, Internet, NRENs, Chicago and the LHCOPN; circuits to IN2P3 laboratories (10G+100M+10M+4M+2M); MCU VPN 2x1G; LAN behind ccpn-inter (x3) and ccpn-opn]
Dual homing of hosts doing massive data transfers (dcache for WLCG): 2x1G per host on ccpn-inter and ccpn-opn
LHCOPN: LHC Optical Private Network
[Traffic graphs: CC – CERN and CC – DE-KIT links]

[Diagram: CC-IN2P3 (Lyon) vs outside — LHCOPN 10G to the storage and LHCOPN services (dcache, fts, vobox); Fermilab 2x1G; RENATER to the IN2P3 laboratories; internal areas: workers, CINES, GPFS storage, services storage (dcache, Xrootd, SRB, HPSS…), offices & telecom services; internal links of 20G, 20G and 10G]
Current status
[Chart: per-area capacity and usage — Workers 160G (~40%), INTER 20G (~25%), Services 80G (~18%), AFS 30G (~2%), GPFS 30G (~1%), dcache 120G (~16%)]
Main network devices and configurations used
New core device: Nexus 7018
– High-density device, really scalable
– Very modular: slots & switching engine cards
– 80G backplane per slot (8x10G non blocking)
• Initial configuration: 6 slots, 3 switching engines (3x48G)
– This device is vital
• 4 power supplies on 2 UPS, 2 management slots
Compatibility check is done:
Mod boot Impact Install-type
------ ------ ------------------ -------------
1 yes non-disruptive rolling
2 yes non-disruptive rolling
3 yes non-disruptive rolling
4 yes non-disruptive rolling
5 yes non-disruptive rolling
6 yes non-disruptive rolling
9 yes non-disruptive reset
10 yes non-disruptive reset
10 remaining slots!
2 extra switching engines possible
32x10G line card (8x10G non blocking)
Switching engine: 48G
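The "8x10G non blocking" figure follows directly from the 80G-per-slot backplane. A small illustrative computation (assuming the 32x10G card quoted above):

```python
# Wire-speed capacity of a 32x10G card on the Nexus 7018's 80G-per-slot backplane
backplane = 80      # Gbit/s available per slot
port_speed = 10     # Gbit/s per port
ports = 32

wire_speed_ports = backplane // port_speed         # ports that can run at full rate
oversubscription = ports * port_speed / backplane  # ratio when all ports are loaded

print(wire_speed_ports)   # 8  -> the "8x10G non blocking" of the slide
print(oversubscription)   # 4.0 -> 4:1 when all 32 ports push traffic
```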
Main network devices and configurations used
Core & edge: Catalyst 6509 / 6513
• 24x10G (12 blocking) + 96x1G + 336x1G blocking (1G per 8 ports)
• 48x10G (24 blocking) + 96x1G
• 64x10G (32 blocking)
Distribution: 4900 (16x10G)
Access: 4948 (48x1G + 2x10G)
4900: 16x10G
4948: 48x1G + 2x10G (uplink 1x10G)
Timeline
[Gantt: June to October 2010]
A 3-month preparation:
– Offline preparation and final design
– Nexus received
– Testbed with the new core configuration
– Testing new software and the upgrade process
– Wiring, final preparation and organisation
– Testing, preconfiguring, scripting, checklists

Migration night, Sept. 21st (starting 19h), 4 people during 5h:
– Border (~5 devices): reload of border routers, capacity upgrade and reconfiguration
– Core (~15 devices): reload of core routers, reconfiguration of the core network
– Satellite (~150 devices): software upgrade and reconfiguration of satellite devices
– Fixing remaining problems!
Optical wiring: fully done and tested beforehand
– ~60 new fibres
– > 1 km of fibre deployed!
Feedback
No major surprise!
– Heavy testing phase was fruitful
– Main issue: some routes not correctly announced
• Not detected nor understood, but a workaround was found
• Routing resilience was hiding the problem
• Snapshot routing tables with a traceroute to each route!
Keep monitoring, but deactivate alarms
– Spared 800 SMS and 6k e-mails to each team member
Do not parallelise actions too much
– Hard to isolate faults or validate actions
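The "traceroute to each route" snapshot could be scripted roughly as below. This is a hedged sketch, not the team's actual procedure: the routing-table format, the choice of probing the network address, and the traceroute options are all illustrative assumptions.

```python
import re
import subprocess

def extract_prefixes(route_table):
    """Pull IPv4 destination prefixes out of a routing-table dump (one route per line)."""
    return re.findall(r"(\d+\.\d+\.\d+\.\d+/\d+)", route_table)

def snapshot(route_table, run=subprocess.run):
    """Traceroute one representative target per announced prefix and keep the output."""
    results = {}
    for prefix in extract_prefixes(route_table):
        target = prefix.split("/")[0]   # crude: probe the network address itself
        proc = run(["traceroute", "-m", "10", target],
                   capture_output=True, text=True)
        results[prefix] = proc.stdout
    return results
```

Comparing two such snapshots taken before and after a change makes missing or wrongly announced routes visible even when routing resilience hides them from users.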
Key benefits
Increased core capacity by a factor of 2.5
Isolated areas, delivering wire speed, removed bottlenecks, shortened paths
Seamless capacity upgrades now possible
Harmonised software and features on 170 devices
– From 37 different versions to 13
100G test (1/3)
This slide was for eyes only.
100G test (2/3)
[Diagram: Lyon (CC-IN2P3) – Geneva (CERN), 150 km, 1λ at 100G carrying 10x10G; ports 1-10 and boards b1-b5 on the optical equipment; test hosts ccteng01, ccteng02, ccteng02-2, ccperfsonar and cccata-test100g each attached at 10G]
100G test (3/3)
This slide was for eyes only.
AOB
New machine room
– 2nd Nexus, switching, 160G
– Not autonomous: an extension of the current network
SSL VPN server
– Under validation...
https://cctelecom.in2p3.fr/netacl/
[Diagram: LAN room A – LAN room B, target 160G between them; WAN 20G]
Conclusion
From a flat to a star-shaped network architecture
– Closely matching our needs
Average network usage down from 60% to ~15%
Ready to face traffic increase for some more time
– How long?