© 2006, Cisco Systems, Inc. All rights reserved.Presentation_ID.scr
© 2008 Cisco Systems, Inc. All rights reserved. Cisco PublicBRKDCT-282514651_05_2008_c1 2
Nexus 5000 Architecture
BRKDCT-2825
Session Objectives
At the End of the Session, the Participants Should Be Able to:
Understand the rationale behind I/O consolidation
Understand the Nexus 5000 architecture
Describe the data path inside a Nexus 5000
Agenda
I/O Consolidation, the Reasons
I/O Consolidation, the Solution
Nexus 5000, System Hardware Overview
Nexus 5000, Internal Architecture
Nexus 5000, Fabric Data Path
Nexus 5000, Forwarding and Policy Enforcement
Before I/O Consolidation
Parallel LAN/SAN infrastructure
Inefficient use of network infrastructure
5+ connections per server: higher adapter and cabling costs
Adds downstream port costs; cap-ex and op-ex
Each connection adds additional points of failure in the fabric
Longer lead time for server provisioning
Multiple fault domains: complex diagnostics
Management complexity
[Diagram: servers attached to the LAN over Ethernet and to SAN A and SAN B over FC]
I/O Consolidation
Reduction of server adapters
Simplification of access layer and cabling
Gateway-free implementation: fits in installed base of existing LAN and SAN
L2 multipathing, access to distribution
Lower total cost of ownership
Fewer cables
Investment protection (LANs and SANs)
Consistent operational model
[Diagram: servers connect over Enhanced Ethernet and FCoE to a pair of Nexus 5000 switches, which connect to the LAN over Ethernet and to SAN A and SAN B over FC]
Recipe for a Consolidated Access Layer
Converged Network Adapter (CNA): an enhanced 10GE adapter capable of Fibre Channel over Ethernet encapsulation
Converged Network Adapter (CNA): an enhanced 10GE adapter and a software layer for FCoE encapsulation
Unified Fabric: a 10 Gigabit Ethernet switch with native Fibre Channel over Ethernet support
Evolution of Ethernet Physical Media
Technology                     Cable     Distance  Power (each side)  Transceiver latency
SFP+ CU (copper)               Twinax    10m       ~0.1W              ~0.25μs
SFP+ SR (short reach)          MM OM1    33m       1W                 ~0.1μs
                               MM OM3    300m
SFP+ USR (ultra short reach)   MM OM2    10m       1W                 ~0.1μs
                               MM OM3    100m
10GBASE-T                      Cat6      55m       ~8W                2.5μs
                               Cat6a/7   100m      ~8W                2.5μs
                               Cat6a/7   30m       ~4W                1.5μs
Media by era:
Mid 1980's: 10Mb, UTP Cat 3
Mid 1990's: 100Mb, UTP Cat 5
Early 2000's: 1Gb, UTP Cat 5 and SFP fiber
Late 2000's: 10Gb, X2, SFP+ Cu (BER better than $10^{-18}$), SFP+ fiber, Cat 6/7
SFP+ Ethernet Interconnect
Smallest 10GE form factor
Hot swappable
Optical SFP+ interoperates with other 10GE modules
Nexus 5000 supports the following:
SFP+ copper "direct connect": 1m, 3m, 5m (10m future); thin TwinAx cable; cables are pre-terminated (lower cost)
Optical fiber: SR optics; LR (future)
[Photos: XFP, XENPAK, and X2 modules; SFP+ optical module; SFP+ copper]
"Enhanced" Ethernet?
Feature/Standard and benefit:
Lossless Ethernet (IEEE 802.1Qbb)
  Benefit: enable multiple traffic types to share a common Ethernet link without interfering with each other
Class of Service based bandwidth management (IEEE 802.1Qaz)
  Benefit: enable consistent management of Quality of Service at the network level by providing consistent scheduling
Data Center Bridging Exchange (DCBX)
  Benefit: management protocol for Enhanced Ethernet capabilities
Class-Based Fabric Services
Priority Based Bandwidth Management
802.1Qaz Enhanced Transmission Selection: enables intelligent sharing of bandwidth between traffic classes, with control of bandwidth
[Figure: offered load vs. realized load at times t1, t2, and t3 for HPC, storage, and LAN traffic classes sharing one link]
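To illustrate the sharing behavior described above, the sketch below splits a 10 Gbps link among three classes. The 30/30/40 percent guarantees, the class names, and the rule that bandwidth left unused by one class is redistributed in proportion to the remaining classes' shares are assumptions for illustration; 802.1Qaz defines per-class guarantees, and this is not necessarily the redistribution any particular switch uses.

```python
def ets_allocate(link_gbps, shares, offered):
    """Divide link bandwidth among traffic classes (ETS-style sketch).

    shares:  class name -> guaranteed fraction of the link (positive, sum to 1.0)
    offered: class name -> offered load in Gbps
    Returns class name -> realized load in Gbps.
    """
    realized = {c: 0.0 for c in shares}
    remaining = float(link_gbps)
    # Classes that still want more bandwidth than they have been given.
    hungry = [c for c in shares if offered[c] > 0]
    while remaining > 1e-9 and hungry:
        total_share = sum(shares[c] for c in hungry)
        leftover = 0.0
        for c in hungry:
            # Each hungry class is offered a slice proportional to its share.
            slice_gbps = remaining * shares[c] / total_share
            take = min(slice_gbps, offered[c] - realized[c])
            realized[c] += take
            leftover += slice_gbps - take      # unused slice goes back in the pool
        remaining = leftover
        hungry = [c for c in hungry if offered[c] - realized[c] > 1e-9]
    return realized


shares = {"HPC": 0.3, "storage": 0.3, "LAN": 0.4}   # hypothetical guarantees
print(ets_allocate(10, shares, {"HPC": 2, "storage": 3, "LAN": 6}))
```

With these inputs the HPC class only offers 2 Gbps, so the LAN class can grow past its 4 Gbps guarantee to about 5 Gbps while storage keeps its 3 Gbps.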
Priority Based Flow Control
Enables lossless behavior for each class of service
PAUSE sent per priority when buffer limits are exceeded
[Figure: transmit queues for priorities zero through seven send over one Ethernet link into matching receive buffers; the receiver sends STOP (PAUSE) for priority three only, while the other priorities keep flowing]
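The per-priority PAUSE decision can be sketched as follows. In an 802.1Qbb PFC frame, a priority-enable vector marks which of the eight priorities are being paused (bit n for priority n). The buffer limit, byte counts, and function names below are hypothetical, and real receivers also use resume thresholds and per-priority pause timers that this sketch omits.

```python
NUM_PRIORITIES = 8


def pfc_enable_vector(buffer_bytes, limit_bytes):
    """Return the 8-bit priority-enable vector a receiver would advertise.

    buffer_bytes: per-priority receive buffer occupancy (list of 8 ints)
    Bit n is set when priority n has exceeded the (hypothetical) limit.
    """
    vector = 0
    for prio, used in enumerate(buffer_bytes[:NUM_PRIORITIES]):
        if used > limit_bytes:
            vector |= 1 << prio
    return vector


def eligible_queues(tx_queue_depths, pause_vector):
    """Transmitter side: priorities that have frames and are not paused."""
    return [prio for prio, depth in enumerate(tx_queue_depths)
            if depth > 0 and not (pause_vector >> prio) & 1]


# Priority three has overrun its receive buffer; the others have not.
rx = [1000, 0, 500, 9000, 0, 200, 0, 0]
vec = pfc_enable_vector(rx, limit_bytes=8000)
print(f"{vec:08b}")                                    # 00001000
print(eligible_queues([5, 0, 3, 7, 0, 1, 0, 2], vec))  # [0, 2, 5, 7]
```

Only queue three stops; unlike 802.3x PAUSE, the other seven priorities keep transmitting on the same link.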
Nexus 5000: System Hardware Overview
Nexus 5000, Product Portfolio
Industry’s First I/O Consolidation Virtualization Fabric for Enterprise Data Center
Nexus 5020 Switch: 56-port L2 switch
40 ports 10GE/FCoE, fixed
2 expansion modules
Expansion modules:
Ethernet: 6 ports 10GE/FCoE
Fibre Channel: 8 ports 1/2/4G FC
FC + Ethernet: 4 ports 10GbE/FCoE and 4 ports 1/2/4G FC
Front Panel
Replaceable components on the front for easy access
N+1 redundant fans
Dual redundant power supplies
[Photo: NX5020 front panel]
Rear Panels
Cables connect in the rear for ease of server wiring
All 10GE ports are FCoE capable!
[Photo: Nx5020 rear panel, labeled expansion modules, power entry, base 10GE ports, 10/100/1000 out-of-band management, and console]
Internal Architecture
Hardware Architecture
[Block diagram. Control plane: Intel LV Xeon CPU (1.66 GHz), Intel 3100 PCI controller, memory, flash, NVRAM, serial RS-232 console, 10/100/1000 management port, and a dual NIC, linked over PCIe, 1GE, and XAUI. Data plane: Unified Port Controllers connect SFP+ transceivers for the 10 GE interfaces (over XFI) and SFP transceivers for 1/2/4 Gbps Fibre Channel to the storage network, and all Unified Port Controllers attach to the Unified Crossbar Fabric.]
Unified Crossbar Fabric
58-port crossbar and scheduler
Three unicast crosspoints and one multicast crosspoint
Central, tightly coupled scheduler
Request, propose, accept, grant, acknowledge semantics
Packet-enhanced iSLIP scheduler
Distinct unicast and multicast schedulers
Eight classes of service
Egress buffer credits
DWRR class of service
DWRR ingress interface
ASIC: 7 metal layers; 24.6 Mbits total SRAM; 232 SerDes at 3.75 Gbps; 1286 signal pins; ~200 million transistors; 12.4 million gates
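For readers unfamiliar with iSLIP, the sketch below shows one request-grant-accept round of the textbook single-iteration algorithm: each output grants the first requesting input at or after its round-robin pointer, each input accepts the first granting output at or after its own pointer, and pointers advance only for accepted grants. The slide describes a packet-enhanced variant with additional propose/mandate steps and multiple passes; those are not modeled, and the function name and 3x3 example are hypothetical.

```python
def islip_iteration(requests, grant_ptr, accept_ptr, n):
    """One request-grant-accept round of single-iteration iSLIP.

    requests:   set of (input, output) pairs whose VOQ is non-empty
    grant_ptr:  per-output index of the input it prefers next
    accept_ptr: per-input index of the output it prefers next
    Returns a dict input -> output of matched pairs. Pointers are updated in
    place only for accepted grants, which keeps them desynchronized.
    """
    # Grant phase: each output picks the first requester at or after its pointer.
    grants = {}  # input -> list of outputs that granted it
    for out in range(n):
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if (inp, out) in requests:
                grants.setdefault(inp, []).append(out)
                break
    # Accept phase: each input picks the first granter at or after its pointer.
    match = {}
    for inp, outs in grants.items():
        for k in range(n):
            out = (accept_ptr[inp] + k) % n
            if out in outs:
                match[inp] = out
                grant_ptr[out] = (inp + 1) % n
                accept_ptr[inp] = (out + 1) % n
                break
    return match


# 3x3 example: inputs 0 and 1 both want output 0; input 1 also wants output 2.
reqs = {(0, 0), (1, 0), (1, 2), (2, 1)}
gp, ap = [0, 0, 0], [0, 0, 0]
print(islip_iteration(reqs, gp, ap, 3))
```

Here all three inputs are matched in one iteration (input 0 to output 0, input 1 to output 2, input 2 to output 1), because output 0's grant goes to input 0 and input 1 falls back to its other requested output.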
Unified Port Controller
Media access controllers: 1/10G Ethernet and 1/2/4G Fibre Channel
Packet buffering and queuing: total of 1.875 MBytes, used in four slices
Forwarding controller: Ethernet and Fibre Channel; layered policy engine
Four data path slices: each serves one 1/10G Ethernet port or two 1/2/4G Fibre Channel ports, connects to one UCF port, and has 480 KBytes of buffering
All switching done in the UCF crossbar
ASIC: 1 Mbit total TCAM; 7 metal layers; 35 Mbits total SRAM; 32 SerDes at 3.75 Gbps; 900 total pins; ~300 million transistors; 18 million logic gates
Switch ASIC Architecture
[Diagram: Unified Port Controller slices 1 to 4, each with a transceiver, a MAC (1/10G Ethernet or Fibre Channel), parsing and editing, forwarding, a packet buffer, virtual queues, and egress queues, connect to the Unified Crossbar Fabric (fabric buffers plus unicast and multicast schedulers; 58 source buses in total). Each slice-to-fabric link is 4 lanes at 3.75G (12 Gbps). Attachments shown: a 1/10GE attached server and a 10GE LAN uplink over XAUI (10 Gbps, 4 lanes at 3.125G), and Fibre Channel SAN uplinks to SAN B at 1/2/4G (1 lane at 1.0625/2.125/4.25G).]
1. Decode, align, and synchronize bytes; decrypt, verify, and authenticate frames
2. Extract frame fields; add/remove headers and edit frame contents
3. Evaluate frame fields for forwarding, filtering, and editing
4. Store frame content while waiting
5. Queue frames and manage crossbar service requests
6. Match requests, available outputs, and fairness criteria
7. Landing place for frames in flight
8. Extract frame fields; add/remove headers and edit frame contents
9. Evaluate frame fields for filtering and editing
10. Encrypt frames and encode bytes
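The lane figures in the Switch ASIC Architecture diagram are consistent with 8b/10b line coding, where every 8 data bits are sent as 10 line bits. XAUI and 1/2/4G Fibre Channel use 8b/10b; that the 3.75G fabric lanes use the same coding is an assumption, but it reproduces the 12 Gbps fabric link and the 20% speedup quoted for the crossbar. The helper name is hypothetical.

```python
def payload_gbps(lanes, baud_gbd, coding=8 / 10):
    """Data rate of a serial link after 8b/10b-style line coding (hypothetical helper)."""
    return lanes * baud_gbd * coding


xaui = payload_gbps(4, 3.125)    # front-panel 10GE over XAUI
fabric = payload_gbps(4, 3.75)   # slice-to-crossbar link (8b/10b assumed)
fc4 = payload_gbps(1, 4.25)      # 4G Fibre Channel

print(f"XAUI {xaui:.1f} Gbps, fabric {fabric:.1f} Gbps, 4G FC {fc4:.1f} Gbps")
# XAUI 10.0 Gbps, fabric 12.0 Gbps, 4G FC 3.4 Gbps
print(f"fabric speedup over 10GE: {fabric / xaui - 1:.0%}")
# fabric speedup over 10GE: 20%
```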
Switch Fabric Data Path
Media Access Controllers
Each Unified Port Controller (UPC) slice has:
One 1 Gigabit Ethernet MAC (Cisco MDS and Catalyst 4000 lineage)
One 10 Gigabit Ethernet MAC (purchased from "More-than-IP"; validated by University of New Hampshire testing)
Two 1/2/4 Gigabit Fibre Channel MACs (Cisco MDS lineage)
Two of the slices in each UPC have an 802.1AE LinkSec encryption engine
Integrated flow control handling:
Ethernet: 802.3X "PAUSE" and Cisco Priority Flow Control
Fibre Channel: BB_credits
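Fibre Channel's buffer-to-buffer credit scheme is the lossless counterpart to Ethernet PAUSE: the sender may transmit only while it holds credits, spends one per frame, and regains one for each R_RDY primitive the receiver returns as it frees a buffer. The sketch models only that counting rule; the class and method names and the credit value are hypothetical.

```python
class BBCreditSender:
    """Sender side of Fibre Channel buffer-to-buffer credit flow control (sketch)."""

    def __init__(self, bb_credit):
        self.bb_credit = bb_credit      # receive buffers advertised at login
        self.available = bb_credit

    def can_send(self):
        return self.available > 0

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("no BB_credit: frame must wait")
        self.available -= 1             # one receive buffer is now in use

    def receive_r_rdy(self):
        if self.available == self.bb_credit:
            raise RuntimeError("R_RDY received with no outstanding frames")
        self.available += 1             # receiver freed a buffer


tx = BBCreditSender(bb_credit=2)
tx.send_frame()
tx.send_frame()
print(tx.can_send())   # False: both receive buffers are occupied
tx.receive_r_rdy()
print(tx.can_send())   # True
```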
Crossbar Overview
Tightly coupled scheduler and crosspoint
20% link speedup (12 Gbps)
Unicast scheduler
Virtual output queuing
3x fabric speedup: 3 crosspoints
Multiple frames transferred per scheduling event ("superframing")
Multicast scheduler
System class queuing
Separate crosspoint
Fanout splitting, grant coalescing, and retry
Unicast Virtual Output Queuing
Eliminates head-of-line blocking: frames for idle outputs bypass congested outputs
Effective use of crossbar resources: scheduler "maximally matches" desired connectivity
Ingress stores frame in packet buffer and keeps a list of packets for each egress port and system class
448 queues for each ingress port
Scheduler notified about desired connections
Scheduler maximizes throughput
Egress scheduling: fairness among ingress ports; crossbar usage
[Diagram: ports 1 to 4, each with a packet buffer, attached to the switch fabric]
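A minimal sketch of the idea, assuming one queue per (egress port, system class) pair: 56 egress ports times eight classes gives 448 queues, matching the per-ingress-port figure above. Deriving the 448 from the Nexus 5020's 56 ports is our assumption, and the class and method names are hypothetical. Each queue holds frame pointers, so a frame for an idle egress never waits behind frames for a congested one.

```python
from collections import deque

NUM_EGRESS_PORTS = 56   # assumption: one VOQ set per switch port
NUM_CLASSES = 8         # eight system classes / classes of service


class IngressVOQs:
    """Per-ingress virtual output queues holding frame pointers (sketch)."""

    def __init__(self):
        self.queues = {(port, cls): deque()
                       for port in range(NUM_EGRESS_PORTS)
                       for cls in range(NUM_CLASSES)}

    def enqueue(self, egress_port, cls, frame_ptr):
        self.queues[(egress_port, cls)].append(frame_ptr)

    def requests(self):
        """(egress, class) pairs that would post a request to the scheduler."""
        return [key for key, q in self.queues.items() if q]

    def dequeue(self, egress_port, cls):
        return self.queues[(egress_port, cls)].popleft()


voq = IngressVOQs()
print(len(voq.queues))        # 448
voq.enqueue(3, 0, "f1")       # egress 3 is congested...
voq.enqueue(3, 0, "f2")
voq.enqueue(7, 0, "f3")       # ...but this frame sits in its own queue
print(voq.requests())         # [(3, 0), (7, 0)]
print(voq.dequeue(7, 0))      # f3, not blocked behind f1 and f2
```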
Day in the Life of a Unicast Frame
1. Frame pointer posted to virtual output queue; frame data held in packet buffer
2. VOQ posts request to scheduler
3. Scheduler arbitrates and grants access; allocates crossbar
4. Frame sent to fabric buffer
5. Fabric buffer sends to egress; notifies scheduler
6. Egress sends frame on wire
7. Egress indicates freed buffer resources
[Diagram: ports 1 to 4 with packet buffers, the switch fabric, and the scheduler, annotated with steps 1 to 7]
Scheduler Overview
[Diagram: per-interface VOQs (IF1, IF2, IF3) feed ingress schedulers, which exchange request, proposal/mandate, grant, and accept messages with egress schedulers]
Unicast Scheduler Algorithm
Egress scheduler:
A priority is selected (fixed priority, or DWRR)
An ingress is selected within that priority: the highest-priority "current preferred" ingress is given a "mandate", and iSLIP maximally matches the remaining requesters
Ingress scheduler:
Egress schedulers make a proposal; the ingress scheduler selects an egress by fixed round-robin selection
The selected egress scheduler updates its own "current preferred"
In multi-pass scheduling, this step happens only for first-pass selections
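Deficit weighted round robin, named above as one way the egress scheduler picks a priority, can be sketched as follows: every round each backlogged class earns a quantum proportional to its weight and may send frames while its deficit counter covers the next frame's length. The weights, frame sizes, and function name are hypothetical; the slide does not give the switch's quanta.

```python
from collections import deque


def dwrr(queues, quanta, rounds):
    """Deficit weighted round robin over per-class frame queues (sketch).

    queues: list of deques of frame lengths in bytes, one per class
    quanta: bytes of credit each class earns per round (its weight)
    Returns the order in which (class, frame_length) pairs are sent.
    """
    deficit = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for cls, q in enumerate(queues):
            if not q:
                deficit[cls] = 0          # idle classes do not bank credit
                continue
            deficit[cls] += quanta[cls]
            while q and q[0] <= deficit[cls]:
                length = q.popleft()
                deficit[cls] -= length
                sent.append((cls, length))
            if not q:
                deficit[cls] = 0
    return sent


# Class 0 weighted 2:1 over class 1; every frame is 1000 bytes.
qs = [deque([1000] * 4), deque([1000] * 4)]
print(dwrr(qs, quanta=[2000, 1000], rounds=2))
```

Over two rounds class 0 sends four frames for every two from class 1, the 2:1 ratio its quantum sets; the deficit counter is what keeps the ratio correct when frame sizes vary.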
Multicast, Fabric Replication
Use case: Ethernet multicast
[Diagram: ingress, fabric, and egress stages with unicast VOQs (U-VOQ) and a multicast VOQ (M-VOQ); a multicast frame is replicated in the fabric toward egress ports A, B, and C alongside unicast traffic]
Multicast MAC Lookups
MAC table: 32K entries total (unicast, multicast, Fibre Channel); 1K entries (software setting) for multicast
Populating the multicast MAC table: IGMP snooping; static
Multicast MAC lookup miss:
Source-only multicast (for L3 multicast): forward the frame to interfaces linked to multicast routers, learned via PIM snooping
Flooding (for L2 multicast)
Multicast Class Queuing
Separates class contention and flow control
Ingress stores frame in packet buffer and keeps a list of packets for each class
Eight queues per ingress port
Scheduler notified about the connection set at the head of the queue
Scheduler maximizes throughput
Fairness among ingress ports
Crossbar usage
Allows multiple transfers for non-overlapping connection sets
[Diagram: ports 1 to 4, each with multicast ingress priority queues and egress priority queues]
Multicast Scheduling Algorithm
Ingress scheduler selects a class for service (DWRR priority selection)
Sends a request to the required egress schedulers; the exact egresses are selected by the packet's required fan-out
Egress scheduler evaluates all ingress requests; a global multicast round-robin pointer sets priority
All egresses select the same ingress with the same weight; the multicast round-robin pointer moves on each grant
Egress scheduler checks path availability: output buffer credit in the UPC, and multicast fabric buffer empty
Egress scheduler generates a proposal to the selected ingress scheduler; the ingress scheduler collects proposals
If the proposals match everything requested, generate a full grant
If fan-out splitting is enabled, generate a grant for the partially matching subset and start a timer to collect the rest of the required proposals
[Diagram: ingress 0 class queues and priority scheduler send a request vector to egress schedulers 0 through 57; each egress compares against the multicast round-robin pointer and returns a proposal vector]
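The grant decision at the end of that sequence can be sketched as set arithmetic on the request and proposal vectors. The function and argument names are hypothetical, and the timer that collects late proposals is reduced to returning the egresses still owed a copy.

```python
def multicast_grant(requested, proposals, fanout_split=True):
    """Decide what an ingress scheduler grants for one multicast frame (sketch).

    requested: set of egress ports in the frame's fan-out
    proposals: set of egress ports that proposed to this ingress
    Returns (granted, pending): egresses served now, egresses still owed.
    """
    matched = requested & proposals
    if matched == requested:
        return set(requested), set()              # full grant
    if fanout_split and matched:
        return matched, requested - matched       # partial grant; wait for the rest
    return set(), set(requested)                  # no grant this cycle


print(multicast_grant({1, 4, 9}, {1, 4, 9, 12}))
print(multicast_grant({1, 4, 9}, {4, 9}))
print(multicast_grant({1, 4, 9}, {4, 9}, fanout_split=False))
```

Fan-out splitting trades extra fabric transfers for lower latency: ports 4 and 9 get their copy now instead of waiting until port 1 is also free.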
Forwarding
Forwarding Pipeline
Wire-rate, "fixed" latency
Parsed frame fields, configuration, and control plane state are evaluated to determine destination(s)
Policy engine filters based on configuration, bindings, and layered ACLs
Layered equal-cost multipath expansion: Fibre Channel; EtherChannel/PortChannel
Pipeline stages (table sizes in parentheses):
Parsed packet
Collect interface configuration and state: Virtual Interface Table (512), VLAN Translation Table (4K), VLAN State Table (1K)
Determine destination (ingress only): Fibre Channel Switch Table (4K), Station Table (16K), Multicast Vector Table (4K); Ethernet learning
Policy enforcement: ACL Search Engine (2K), Zoning Table (2K), RBACL Label Table (2K), Binding Table (2K)
Multipath expansion (ingress only): Fibre Channel Multipath Table (1K), PortChannel Table (16)
Output: editing instructions and virtual output queue list
Parsing Ethernet IP Packets
[Diagram: the forwarding pipeline, fed by the parsed packet]
Fields extracted from an 802.1Q-tagged IPv4/TCP frame:
Ethernet: destination address, source address, Ethertype = .1Q, VLAN, CoS, inner Ethertype
IPv4: Ver, IHL, TOS, total length, identification, flags, fragment offset, TTL, protocol, header checksum, source address, destination address, IP options
TCP: source port, destination port, sequence number, acknowledgment number, header length, flags, window size, checksum, urgent pointer, TCP options and data
FCS
Checks: checksum check, FCS check
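To make the field layout concrete, here is a minimal software parser for an 802.1Q-tagged IPv4 frame that pulls out the VLAN, CoS, and IP addresses and verifies the IPv4 header checksum (the RFC 791 ones'-complement sum). It uses only the Python standard library; the hardware parser's actual behavior, and its TCP and FCS checks, are not modeled. The function names and the sample frame are hypothetical.

```python
import ipaddress
import struct

ETH_P_8021Q = 0x8100
ETH_P_IPV4 = 0x0800


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum used by the IPv4 header checksum."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def parse_tagged_ipv4(frame: bytes) -> dict:
    dst, src, tpid, tci, ethertype = struct.unpack("!6s6sHHH", frame[:18])
    if tpid != ETH_P_8021Q or ethertype != ETH_P_IPV4:
        raise ValueError("not an 802.1Q-tagged IPv4 frame")
    ip = frame[18:]
    ihl = (ip[0] & 0x0F) * 4                    # header length in bytes
    header = ip[:ihl]
    return {
        "dst_mac": dst.hex(":"),
        "src_mac": src.hex(":"),
        "cos": tci >> 13,                       # 3-bit priority (PCP)
        "vlan": tci & 0x0FFF,
        "ttl": header[8],
        "proto": header[9],
        "src_ip": str(ipaddress.IPv4Address(header[12:16])),
        "dst_ip": str(ipaddress.IPv4Address(header[16:20])),
        # A valid header sums (checksum field included) to 0xFFFF.
        "checksum_ok": ones_complement_sum(header) == 0xFFFF,
    }


def build_sample_frame() -> bytes:
    """Hypothetical test frame: VLAN 100, CoS 3, TCP from 10.0.0.1 to 10.0.0.2."""
    ip = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                               bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    ip[10:12] = struct.pack("!H", 0xFFFF - ones_complement_sum(bytes(ip)))
    eth = struct.pack("!6s6sHHH", bytes.fromhex("001122334455"),
                      bytes.fromhex("66778899aabb"), ETH_P_8021Q,
                      (3 << 13) | 100, ETH_P_IPV4)
    return eth + bytes(ip)


print(parse_tagged_ipv4(build_sample_frame()))
```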
Parsing FCoE Packets
Fibre Channel frames are FCoE-encapsulated prior to forwarding
[Diagram: the forwarding pipeline, fed by the parsed packet]
Fields extracted from an 802.1Q-tagged FCoE frame:
Ethernet: destination address, source address, Ethertype = .1Q, VLAN, CoS
FCoE header: Ethertype = FCoE, Ver, reserved, SOF
Fibre Channel header: r_ctl, d_id, cs_ctl, s_id, type, f_ctl, seq_id, df_ctl, seq_cnt, ox_id, rx_id, parameters
Payload, CRC
FCoE trailer: EOF, reserved
FCS
Checks: CRC check, FCS check
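A minimal parser for the layout above, assuming an 802.1Q-tagged frame: FCoE uses Ethertype 0x8906, a 14-byte FCoE header whose last byte is the SOF code, then the 24-byte Fibre Channel header. The sketch extracts the addressing fields, including the domain ID (top byte of D_ID) that Fibre Channel forwarding uses together with the VSAN. CRC and FCS checks are omitted, and the function name and sample byte values are hypothetical.

```python
import struct

ETH_P_8021Q = 0x8100
ETH_P_FCOE = 0x8906


def parse_fcoe(frame: bytes) -> dict:
    """Extract VLAN and Fibre Channel header fields from a tagged FCoE frame."""
    _dst, _src, tpid, tci, ethertype = struct.unpack("!6s6sHHH", frame[:18])
    if tpid != ETH_P_8021Q or ethertype != ETH_P_FCOE:
        raise ValueError("not an 802.1Q-tagged FCoE frame")
    fcoe_hdr = frame[18:32]              # version (4 bits), reserved, SOF
    fc = frame[32:56]                    # 24-byte Fibre Channel frame header
    d_id = int.from_bytes(fc[1:4], "big")
    ox_id, rx_id = struct.unpack("!HH", fc[16:20])
    return {
        "vlan": tci & 0x0FFF,
        "version": fcoe_hdr[0] >> 4,
        "sof": fcoe_hdr[13],
        "r_ctl": fc[0],
        "d_id": d_id,
        "s_id": int.from_bytes(fc[5:8], "big"),
        "type": fc[8],
        "ox_id": ox_id,
        "rx_id": rx_id,
        "domain_id": d_id >> 16,         # looked up with the VSAN to pick a path
    }


# Hypothetical sample: VLAN 200, D_ID 0x0A0B0C, S_ID 0x010203, OX_ID 0x1234.
eth = struct.pack("!6s6sHHH", bytes(6), bytes(6), ETH_P_8021Q, 200, ETH_P_FCOE)
fcoe = bytes(13) + b"\x2e"                       # example SOF code byte
fc_hdr = (bytes([0x06]) + (0x0A0B0C).to_bytes(3, "big")    # R_CTL, D_ID
          + b"\x00" + (0x010203).to_bytes(3, "big")        # CS_CTL, S_ID
          + bytes([0x08]) + bytes(3)                       # TYPE, F_CTL
          + bytes(4)                                       # SEQ_ID, DF_CTL, SEQ_CNT
          + struct.pack("!HH", 0x1234, 0xFFFF)             # OX_ID, RX_ID
          + bytes(4))                                      # Parameter
frame = eth + fcoe + fc_hdr
print(parse_fcoe(frame))
```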
Acquiring Interface State
Physical Interface Table: default priority; expected encapsulations (802.1Q, FCoE); FCoE encapsulations for Fibre Channel physical ports
Virtual Interface Table: default VLAN; interface security and QoS ACL labels; binding check configuration; Ethernet learning; Secure Group Tag assignment
VLAN State Table: VLAN membership list (virtual interface granularity); VLAN flood vectors (unknown unicast, multicast, and broadcast); VLAN security and QoS ACL labels
Ethernet Forwarding
16K-entry dLeft hash table (Station Table), searched by {VLAN, destination address}
Selects a local port or a multicast index
Unknown addresses forwarded by VLAN multicast vectors: unknown unicast, unregistered multicast, broadcast
IP multicast forwarded by MAC address: IP multicast groups registered by IGMP v1, v2, v3 snooping
Multicast vectors allocated dynamically based on destination membership
Same mechanism forwards Fibre Channel in the local domain and N_port Virtualizer
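Forwarding IP multicast by MAC address relies on the standard IPv4 group-to-MAC mapping (RFC 1112): the group's low 23 bits are placed after the 01:00:5e prefix. Because 5 bits of the 28-bit group ID are discarded, 32 groups share each MAC address, so a MAC-keyed entry can deliver traffic for overlapping groups to the same ports. The function name is hypothetical.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC address (RFC 1112)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return mac.to_bytes(6, "big").hex(":")


print(ipv4_multicast_mac("239.1.1.1"))    # 01:00:5e:01:01:01
print(ipv4_multicast_mac("224.129.1.1"))  # 01:00:5e:01:01:01 (same MAC)
```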
Modern Hardware Hash Searches
Many switches use a single-function "bucket hash", with little or no overflow support
"dLeft" hash search uses two hash functions to increase occupancy
Data correlated with one hash is uncorrelated in the other
Random occupancy ~90%
Nexus 5000 adds a TCAM for bucket overflow: 1–3% of capacity
Hash is a "right-sized" CRC division: four polynomials for each search, two are selected
[Diagram: a "traditional" hash search uses one hash function and no overflow; a "dLeft" hash search feeds the search key to hash A and hash B, each indexing its own table of items and associated data, with a TCAM for overflow and a priority selection among the results]
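The lookup structure can be sketched in software as two bucket tables indexed by independent hash functions plus a small overflow store standing in for the TCAM. Inserts go to the less-loaded of the two candidate buckets (the usual d-left choice, with ties going to the left table); lookups probe both buckets and the overflow. The bucket count, bucket depth, use of two differently personalized BLAKE2b hashes in place of the ASIC's CRC polynomials, and all names are assumptions.

```python
import hashlib


class DLeftTable:
    """Two-way d-left hash table with a small overflow area (TCAM stand-in)."""

    def __init__(self, buckets=1024, depth=4, overflow_size=64):
        self.buckets, self.depth = buckets, depth
        self.tables = [[{} for _ in range(buckets)] for _ in range(2)]
        self.overflow = {}
        self.overflow_size = overflow_size

    def _index(self, way, key):
        # Two independently personalized hashes stand in for the
        # selectable CRC polynomials used in hardware.
        h = hashlib.blake2b(key, digest_size=8, person=b"way%d" % way)
        return int.from_bytes(h.digest(), "big") % self.buckets

    def insert(self, key, value):
        candidates = [self.tables[w][self._index(w, key)] for w in (0, 1)]
        for bucket in candidates:
            if key in bucket:                     # update an existing entry
                bucket[key] = value
                return
        if key in self.overflow:
            self.overflow[key] = value
            return
        bucket = min(candidates, key=len)         # less-loaded bucket, left on ties
        if len(bucket) < self.depth:
            bucket[key] = value
        elif len(self.overflow) < self.overflow_size:
            self.overflow[key] = value            # both buckets full: use overflow
        else:
            raise MemoryError("table full")

    def lookup(self, key):
        for w in (0, 1):
            bucket = self.tables[w][self._index(w, key)]
            if key in bucket:
                return bucket[key]
        return self.overflow.get(key)


table = DLeftTable()
key = (100).to_bytes(2, "big") + bytes.fromhex("001122334455")   # {VLAN, MAC}
table.insert(key, "Ethernet1/7")                                  # hypothetical port
print(table.lookup(key))
```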
Ethernet Address Learning
Ingress and egress learning searches at line rate for all frames
Facilitates distributed table population
Ingress notifies the supervisor to develop its database
Supervisor pushes new addresses to all Unified Port Controllers: adds entries if missed; reinforces existing entries
Supervisor queries tables to check for consistency and maintains aging state
CPU removes entries that are obsolete
Fibre Channel Forwarding
4K-entry dual-index search table, searched by {VSAN, domain_id}
Misses are Fibre Channel exceptions
Selects a local port or PortChannel, or a remote Fibre Channel switch
Locally attached hosts and N_port Virtualizer are forwarded the same as Ethernet
Policy Engine
Frames evaluated by a multi-stage engine
Combination of arrays, hash tables, and ternary CAMs
Searches occur in parallel; results evaluated in a pipeline
Diagnostics and control plane "tap" the pipeline at any point
[Diagram: checks evaluated in sequence, each with a pass/fail or permit/deny outcome: VLAN membership check; interface, VLAN, and MAC binding; MAC and L3 binding (IP and Fibre Channel); Fibre Channel zone membership check; port ACLs; VLAN ACLs (ingress); QoS ACLs (ingress); role-based ACLs (egress); control plane redirect/snooping; Switch Port Analyzer (SPAN) and diagnostic sampling. Outcomes shown include permit, deny, policer drop, redirect to the supervisor, and copy to a SPAN session.]
Access Control List Search Engine
2048 ternary-match Access Control Entries (ACEs); each entry available to all functions
Labels allow sharing of Access Control Entries: access control lists have a label, and policy definition points (interfaces, VLANs, roles) select a label; labels and frame fields form search keys
Flexible region assignment: tune access control list resource allocation to network policies
Access control list scope: VLAN and control plane ACLs are global scope (same on all Unified Port Controllers); port, QoS, role-based, and SPAN ACLs are local scope (specific to each Unified Port Controller)
TCAM (2K x 432) regions: port ACLs (768), QoS ACLs (64, ingress), role-based ACLs (egress), VLAN ACLs (1024), SPAN and diagnostic ACLs (64), control plane ACLs (128); entries are searched in priority order
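A ternary match compares only the bits selected by each entry's mask, and the first (highest-priority) matching entry wins. The sketch below prepends a hypothetical 8-bit ACL label to a 32-bit IPv4 source address to form the search key, which shows how several interfaces can share one set of entries by selecting a common label. The key layout, field widths, and names are assumptions; the real 432-bit key format is not described here.

```python
import ipaddress
from typing import NamedTuple, Optional


class ACE(NamedTuple):
    value: int
    mask: int      # 1 bits must match, 0 bits are "don't care"
    action: str


def make_key(label: int, src_ip: str) -> int:
    """Hypothetical search key: 8-bit ACL label followed by an IPv4 source."""
    return (label << 32) | int(ipaddress.IPv4Address(src_ip))


def make_ace(label: int, prefix: str, action: str) -> ACE:
    net = ipaddress.IPv4Network(prefix)
    value = (label << 32) | int(net.network_address)
    mask = (0xFF << 32) | int(net.netmask)        # label bits always compared
    return ACE(value, mask, action)


def tcam_search(entries, key: int) -> Optional[str]:
    """Entries are in priority order; the first ternary match wins."""
    for ace in entries:
        if (key & ace.mask) == (ace.value & ace.mask):
            return ace.action
    return None


acl = [
    make_ace(7, "10.1.1.0/24", "deny"),
    make_ace(7, "10.0.0.0/8", "permit"),
    make_ace(9, "0.0.0.0/0", "permit"),
]
print(tcam_search(acl, make_key(7, "10.1.1.5")))   # deny
print(tcam_search(acl, make_key(7, "10.2.0.1")))   # permit
print(tcam_search(acl, make_key(3, "10.1.1.5")))   # None (no entries for label 3)
```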
Multipath Expansion
Two-stage expansion process; each stage can lead to the next
Same mechanism for all expansions; configuration of expansion unique to each expansion
Fibre Channel switching: selects a path to a target Fibre Channel switch
Fibre Channel Shortest Path First (FSPF)
1K entries, each selecting up to sixteen Fibre Channel ports, Ethernet ports, or Ethernet/SAN PortChannels
EtherChannel and SAN PortChannel: selects a path to a physically adjacent device
Sixteen multipath entries, each selecting up to sixteen Ethernet ports or sixteen FC ports
[Flow diagram: Fibre Channel switching? leads to FC multipath expansion, then EtherChannel/PortChannel expansion, ending in a list of virtual output queues]
Expansion Algorithm
Relevant frame fields:
Ethernet source address and destination address are always available
IP frames allow inclusion of IPv4/v6 source and destination address
TCP/UDP frames can include source and destination ports
Fibre Channel frames can include D_ID and S_ID; OX_ID can also be included per VSAN
Each field is divided (over Galois Field 2) by one of two CRC-8 polynomials:
CRC-8 A: $x^8 + x^5 + x^4 + 1$
CRC-8 B: $x^8 + x^5 + x^3 + x^2 + x + 1$
Results of the field CRC divisions are combined via bitwise XOR
Result selected using modulo division by the number of equal-cost paths; the 256 possibilities are reduced to avoid bias
[Diagram: fields Ethernet DA, Ethernet SA, IP DA or FC D_ID, IP SA or FC S_ID, TCP DP, TCP SP or FC OX_ID pass through field select and polynomial select into the CRC-8 division, XOR, and modulo by the number of equal paths to yield the selected path]
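A software model of the described hash, assuming MSB-first CRC-8 with a zero initial value and no final XOR (the slide gives the polynomials but not these CRC parameters): 0x31 encodes $x^8 + x^5 + x^4 + 1$ and 0x2F encodes $x^8 + x^5 + x^3 + x^2 + x + 1$, with the $x^8$ term implicit. The slide does not detail the bias-reduction step for the 256 possible values, so plain modulo is used. Field choices, polynomial assignments, and names are hypothetical.

```python
CRC8_A = 0x31   # x^8 + x^5 + x^4 + 1 (x^8 term implicit)
CRC8_B = 0x2F   # x^8 + x^5 + x^3 + x^2 + x + 1


def crc8(data: bytes, poly: int) -> int:
    """Remainder of GF(2) polynomial division, MSB first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def select_path(fields, num_paths: int) -> int:
    """fields: (field_bytes, polynomial) pairs chosen by configuration."""
    combined = 0
    for value, poly in fields:
        combined ^= crc8(value, poly)      # XOR the per-field remainders
    return combined % num_paths            # 0..255 reduced to a path index


flow = [
    (bytes.fromhex("001122334455"), CRC8_A),   # Ethernet DA
    (bytes.fromhex("66778899aabb"), CRC8_B),   # Ethernet SA
    (bytes([10, 0, 0, 2]), CRC8_A),            # IP DA
    (bytes([10, 0, 0, 1]), CRC8_B),            # IP SA
]
print(select_path(flow, num_paths=4))
```

Every frame of a flow carries the same field values, so it always hashes to the same member link and stays in order, while different flows spread across the equal-cost paths.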
Editing FC Packets for VOQ
[Diagram: the forwarding pipeline, whose output is the virtual output queue list, and two frame layouts. The FCoE frame as received: Ethernet destination and source address, 802.1Q tag (VLAN, CoS), Ethertype = FCoE, Ver, reserved, SOF, Fibre Channel header (r_ctl, d_id, cs_ctl, s_id, type, f_ctl, seq_id, df_ctl, seq_cnt, ox_id, rx_id, parameters), payload, CRC, EOF, reserved, FCS. The edited frame adds an internal destination address, an internal source address, and an Ethertype = DTAG header with TTL and FTAG fields.]
Session Objectives Summary
In This Session, You:
Understood the rationale behind I/O consolidation
Understood the Nexus 5000 architecture
Saw the data path inside a Nexus 5000
Key Takeaways
The Key Takeaways of This Presentation Are:
Nexus 5000 hardware overview
Nexus 5000 internal architecture
Nexus 5000 data path
Q and A
Recommended Reading
Continue your Cisco Live learning experience with further reading from Cisco Press
Check the Recommended Reading flyer for suggested books
Available Onsite at the Cisco Company Store
Complete Your Online Session Evaluation
Give us your feedback and you could win fabulous prizes. Winners announced daily.
Receive 20 Passport points for each session evaluation you complete.
Complete your session evaluation online now (open a browser through our wireless network to access our portal) or visit one of the Internet stations throughout the Convention Center.
Don’t forget to activate your Cisco Live virtual account for access to all session material on-demand and return for our live virtual event in October 2008.
Go to the Collaboration Zone in World of Solutions or visit www.cisco-live.com.