Network Processors Evolution and Current Trends
May 1, 2008
Nazar Zaidi RMI Corporation, USA
INCC - May 1, 2008 2
Network Processors: Evolution & Trends
• Overview of Network Processing
• Drivers & Demands for Network Processing
• What is a “Network Processor”?
• Characteristics of Network Processors
• Few Examples & Trends
• Acknowledgements: Primary source of the material is www….
    – Pretty pictures are courtesy of RMI’s marketing team
Enterprise Network
[Figure: an individual information customer reaches the enterprise network over the Internet and/or an intranet through a router/firewall (Internet access). Stages shown: Internet Access, Firewall/Routing, Caching & Load-Balancing, Access, Application, Database, Storage.]
Mobile Communication and Information Convergence
3G Interfaces
• Uu – Between Node B and User Equipment
• Iub – Between Node B and RNC
• Iur – Between RNCs
• Iu-CS – Between RNC and Circuit Switched Core Network
• Iu-PS – Between RNC and Packet Switched Core Network
• Gn – Between SGSN and GGSN
• Iur-g – Between BSC and RNC
Abbreviations
• GERAN – GSM EDGE Radio Access Network (2G/2.5G)
• UTRAN – Universal Terrestrial Radio Access Network (3G)
• BTS – Base Transceiver Station (Base Station in 2G)
• Node B – Base Station in 3G
• RNC – Radio Network Controller (3G)
• BSC – Base Station Controller (2G/2.5G)
• SGSN – Serving GPRS Support Node
• GGSN – Gateway GPRS Support Node
• MGW – Circuit Switched Media Gateway
• MSC – Mobile Switching Center
• PS – Packet Switch
• CS – Circuit Switch
[Figure: 2G/3G network diagram. GERAN (BTS, BSC) and UTRAN (Node B, RNC) attach to the core network circuit switching domain (MSC, MGW, PSTN) and the core network packet switching domain (SGSN, GGSN, IP). Links are labeled Um, Abis, A, Gb, Uu, Iub, Iur, Iur-g, Iu-CS, Iu-PS, and Gn.]
Note – MSC and MGW have a one-to-many relationship
Service Delivery Networks
“Bit Shuffling”  “Information Shuffling”
[Figure: wired and wireless service delivery networks]
Traffic Growth
[Figure: “Trends” chart of normalized growth since 1990 (log scale, 1 to 100,000) versus year (1990 to 2010). Curves: Network Traffic (2x / 12 months), User Traffic (2x / 12 months), Compute Capacity (2x / 18 months).]
Source: Prorating Data from Nick McKeown (Stanford University)
Wireless Traffic Growth
[Figures: charts from Qualcomm presentations (Oppenheimer, May 2007; Jefferies Conference, Oct ‘07)]
• Consumer and service provider demand, together with the increase in available bandwidth, is driving data traffic growth in the infrastructure
• Processors need to evolve to meet the needs of content traffic growth in the emerging infrastructure
Today’s Networks
• Continuously growing traffic
• Higher rates
• Rich set of features
    • Security
        – VPN/IPSec, SSL, Firewalls
    • Application Awareness
    • Deep Packet Inspection
    • Traffic Engineering – QoS/SLA etc.
More processing/inspection/decision making needed at much higher rates
[Figure: Moore’s Law versus network bandwidth, in GHz and Gbps (log scale), 1990 to 2010.]
[Figure: processor clock frequency (MHz) versus mobile data rate (Mbps), log scale from 1.E-03 to 1.E+04, 1996 to 2006.]
Early “Network Processors”
• Interface Message Processor (IMP)
    – Connected computers to ARPANET
        • Honeywell DDP-516 minicomputer
[Photo: Leonard Kleinrock and the first IMP. Source: Wikipedia, taken from http://www.lk.cs.ucla.edu/personal_history.html]
Processing Requirements
Data Plane Requirements
• Low latency/low packet loss/high throughput
• High performance packet processing
• High I/O bandwidth
• Variety of high-speed I/O interfaces
• Low latency/high bandwidth memory access
Control Plane Requirements
• High performance
• High I/O bandwidth
• Large memory capacity
• Low-latency memory access
• Power performance
[Figure: a control path above a data path that runs from ingress to egress]
Processing Elements used in Networking
• General Purpose Processors
    – Control Plane Applications
    – Dataplane (albeit “lower” performance) applications
    – Examples
        • X86 – Intel, AMD
        • PowerPC – FreeScale, AMCC
        • MIPS, ARM – Various Vendors
• General Purpose Processor + Co-Processor
    – Security Co-Processors, Search Engines, Traffic Managers, TOE, ...
• ASICs
    – Forwarding, Policing, Shaping, QoS/Traffic Management, ...
• Single/Multi-Core System-on-a-chip (SOC)
    – Traditionally referred to as Network Processor
Motivation for Multiple Cores
• A new 64-byte packet arrives roughly every 64 ns on a 10G port
    – Including IPG and Preamble
• With zero overhead for packet reception and transmission functions, a 1 GHz processor provides a 64-cycle budget per packet
Processor (GHz)   Cycles   Inst/Cycle (IPC)   Instructions
3                 192      0.75               144
3                 192      1.25               240
5                 320      0.75               240
5                 320      1.25               400
10                640      0.75               480
10                640      1.25               800
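The arithmetic behind the budget table can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the 8-byte preamble and 12-byte inter-packet gap are the standard Ethernet values rather than numbers stated on the slide. With those values the exact arrival interval comes out slightly above the rounded 64 ns used in the table.

```python
# Per-packet cycle and instruction budget for minimum-size Ethernet frames.
# Assumption: standard Ethernet overheads (not given on the slide).
FRAME_BYTES = 64      # minimum Ethernet frame
PREAMBLE_BYTES = 8    # preamble plus start-of-frame delimiter
IPG_BYTES = 12        # inter-packet gap


def packet_interval_ns(link_gbps, frame_bytes=FRAME_BYTES):
    """Time between back-to-back frames on the wire, in nanoseconds."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + IPG_BYTES) * 8
    return bits_on_wire / link_gbps  # bits / (Gbit/s) = ns


def instruction_budget(clock_ghz, ipc, interval_ns=64.0):
    """Return (cycles, instructions) available per packet.

    interval_ns defaults to the rounded 64 ns figure used in the table.
    """
    cycles = clock_ghz * interval_ns
    return cycles, cycles * ipc


if __name__ == "__main__":
    print(f"exact interval at 10G: {packet_interval_ns(10):.1f} ns")
    for ghz in (3, 5, 10):
        for ipc in (0.75, 1.25):
            cycles, instr = instruction_budget(ghz, ipc)
            print(f"{ghz:>2} GHz  IPC {ipc:.2f}  "
                  f"{cycles:.0f} cycles  {instr:.0f} instructions")
```

Running the loop reproduces the six rows of the table when the 64 ns interval is used.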
Motivation for Multiple Cores
• IPC (Instructions/Cycle) is affected by memory subsystem performance
    – DRAM latency does not scale with processor frequency
    – Cycles are wasted waiting for memory
        • % loss is higher at higher clock frequencies
[Figure: “Trends” chart of normalized growth since 1990 (log scale, 1 to 100,000) versus year (1990 to 2010). Curves: Network Traffic (2x / 12 months), User Traffic (2x / 12 months), Compute Capacity (2x / 18 months), DRAM Access Time (1.1x / 18 months).]
Source: Prorating Data from Nick McKeown (Stanford University)
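To see how quickly the rates quoted on the Trends chart diverge, the short Python sketch below compounds each one over a number of years. It assumes each rate holds constant for the whole period, which is a simplification of the prorated data behind the chart; the function and dictionary names are illustrative only.

```python
def growth_factor(years, multiplier, period_months):
    """Compound growth: `multiplier` every `period_months`, over `years`."""
    return multiplier ** (years * 12 / period_months)


# (multiplier, period in months) as quoted on the Trends chart
RATES = {
    "Network traffic": (2.0, 12),
    "Compute capacity": (2.0, 18),
    "DRAM access time": (1.1, 18),
}

if __name__ == "__main__":
    for years in (6, 12, 18):
        row = ", ".join(
            f"{name}: {growth_factor(years, m, p):,.1f}x"
            for name, (m, p) in RATES.items()
        )
        print(f"after {years:>2} years -> {row}")
```

Under this constant-rate assumption, traffic outgrows compute capacity by a factor of 64 over 18 years, while DRAM access time improves by only a small multiple in the same period.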
Frequency is not the answer
• A theoretical 10 GHz processor with a “reasonable” IPC provides an instruction budget in the hundreds of instructions
• Not sufficient for today’s packet processing functions
• A higher CPU operating frequency results in “hot” CPUs
• System Power constraints alone prevent pushing the operating frequency
Multiple Cores/Processing Elements
• Packet Processing lends itself very well to parallel processing
    – Work on multiple packets/flows simultaneously
• N processing elements increase the cycle budget by a factor of N
    – Referring to the earlier 10G, 64-byte packet example:
        • 1 CPU = 64ns budget
        • 2 CPU = 128ns budget
        • 8 CPU = 512ns budget
    – Not accounting for other overheads
• Pipelining of functions achieves similar results
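The factor-of-N scaling can be turned around to estimate how many processing elements a given workload needs. The Python sketch below does that under the same idealized assumptions as the slide (perfect load distribution, no other overheads). The function names and the 1000-instructions-per-packet workload are hypothetical values chosen for illustration.

```python
import math


def per_packet_budget_ns(num_elements, arrival_interval_ns=64.0):
    """Time each element may spend on one packet when N elements share the load."""
    return num_elements * arrival_interval_ns


def elements_needed(instructions_per_packet, clock_ghz, ipc,
                    arrival_interval_ns=64.0):
    """Smallest number of elements that keeps up with line rate."""
    # Instructions one element can retire during a single arrival interval.
    per_element = clock_ghz * arrival_interval_ns * ipc
    return math.ceil(instructions_per_packet / per_element)


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(f"{n} CPU = {per_packet_budget_ns(n):.0f} ns budget")
    # Hypothetical workload: 1000 instructions per packet at 1 GHz, IPC 0.75
    print("elements needed:", elements_needed(1000, 1.0, 0.75))
```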
Multi-threading Improves Throughput
[Figure: execution timelines. Single CPU - Single Thread: processing alternates with stalls on memory latency. Single CPU - Four Threads: Threads 1 to 4 overlap their processing with one another's memory latency. The difference between the two is marked “Performance Improvement”.]
• Networking Applications benefit significantly from multi-threaded architectures
    – Memory latencies are more effectively hidden
    – Cache-miss penalties are reduced or eliminated
    – Branch mis-predict penalties avoided
• Better computation density and power performance ratio
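A simple way to reason about latency hiding is the idealized model sketched below in Python. It assumes every thread alternates a fixed burst of computation with a fixed memory stall, and that switching threads costs nothing; real hardware deviates from both assumptions. The cycle counts in the example are hypothetical.

```python
def core_utilization(compute_cycles, stall_cycles, threads):
    """Fraction of cycles in which the core does useful work.

    Each thread computes for `compute_cycles`, then waits `stall_cycles`
    on memory. While one thread waits, another can run, so utilization
    grows linearly with the thread count until the core saturates at 1.0.
    """
    single_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    # Hypothetical numbers: 20 cycles of work per 60-cycle memory access
    for threads in (1, 2, 4, 8):
        u = core_utilization(20, 60, threads)
        print(f"{threads} thread(s): {u:.0%} of cycles doing useful work")
```

In this example four threads are enough to cover the stall completely, so adding more threads beyond that point yields no further gain in the model.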
Network Processor Attributes
• Number of Cores and/or Processing Elements
    – Homogeneous vs. Heterogeneous Resources
• Hardware Accelerators and “off-load” capabilities
• Types of Interfaces
    – Memory, 1G, 10G, SPI, PCI, HT etc.
• Programmability
    – General Purpose Programming
    – Changing Standards
    – New Features
    – Bug Fixes
• Ease of programming
    – Very critical for adoption
    – Code Maintenance
Few Examples
Intel – IXP2800
Source: Microprocessor Report
IXP - MicroEngine
Source: Microprocessor Report
IBM - PowerNP
Source: IBM Journal of Research & Development, Vol47, No.2/3
AMCC – nP3700
Source: AMCC Website
Agere PayloadPlus
• FPP = Fast Pattern Processor
• RSP = Routing Switch Processor
Source: Hot Chips 2001
Motorola - C5
Source: Motorola Website
EZchip – NP1/2
Function                        Type of TOP
Packet parse                    TOPparse
Lookup and classify             TOPsearch
Forwarding and QoS decisions    TOPresolve
Packet modify                   TOPmodify
Source: EZchip Website
TOP = Task Optimized Processor
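The four functions in the table form a pipeline, which can be illustrated with the toy Python sketch below. It only mirrors the order of the stages (parse, search, resolve, modify). It is not EZchip's programming model, and the packet layout, table format, and every name in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    raw: bytes
    meta: dict = field(default_factory=dict)


def parse(pkt):
    # Hypothetical layout: byte 0 is a destination id, byte 1 a TTL-like count.
    pkt.meta["dst"] = pkt.raw[0]
    return pkt


def search(pkt, table):
    # Lookup and classify: find the forwarding entry for the destination.
    pkt.meta["entry"] = table.get(pkt.meta["dst"])
    return pkt


def resolve(pkt):
    # Forwarding decision: pick an output port, or None to drop.
    entry = pkt.meta["entry"]
    pkt.meta["port"] = entry["port"] if entry else None
    return pkt


def modify(pkt):
    # Rewrite forwarded packets: decrement the TTL-like byte.
    if pkt.meta["port"] is not None:
        ttl = max(pkt.raw[1] - 1, 0)
        pkt.raw = pkt.raw[:1] + bytes([ttl]) + pkt.raw[2:]
    return pkt


def process(raw, table):
    """Run one packet through the four stages in order."""
    pkt = Packet(raw)
    for stage in (parse, lambda p: search(p, table), resolve, modify):
        pkt = stage(pkt)
    return pkt


if __name__ == "__main__":
    table = {7: {"port": 2}}
    out = process(bytes([7, 64, 0xAA]), table)
    print(out.meta["port"], out.raw.hex())
```

In hardware each stage would work on a different packet at the same time; the sequential loop here shows only the data flow between stages.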
Cisco - Toaster 3
Source: Microprocessor Report
Toaster TMC
Source: Microprocessor Report
Xelerated – Xelerator X10q
Source: Microprocessor Report
Bay Microsystems - Chesapeake
Source: CommDesign
Broadcom - 1250
Source: Microprocessor Report
FreeScale - 8641D
Source: FreeScale Website
PASemi - PWRficient PA6T-1682M
Source: Microprocessor Report
Broadcom - 1480
Source: Broadcom Website
Multiple BRCM 1400 Connectivity
Source: Microprocessor Report
Cavium – Octeon Plus
Source: Cavium Networks website
RMI - XLR
• Up to 8 MIPS64™ XLR™ Cores
    – Up to 1.2GHz Operation
    – 32 Fine Grain Threads
    – 32KB/32KB L1 DedicatedPlus™ cache
    – Unaligned memory access acceleration
    – 8-banked 2MB L2 Cache, 8 trans/clk
    – Fully cache-coherent design
• Full speed Interconnects
    – Fast Messaging Network
    – Memory Distributed Interconnect
    – I/O Distributed Interconnect
• Quad 800MHz DDR2 Controllers
• Autonomous Security Engines
    – Quad Parallel CryptoCores
    – 10Gbps Bulk Encryption + RSA
    – Single pass multi-operations
    – DES/3DES/AES/GCM/ARC4, Kasumi f8
    – SHA-1/256/384/512, MD5, Kasumi f9
• Dual 10Gbps Ports
    – SPI-4.2 and Native 10GE interfaces
    – SPI-4.2 Pass-Through
• Quad 1GE Ports
• Native HyperTransport Port
• Native QDR SRAM/LA-1 Port
[Figure: XLR block diagram. Eight cores (Core 1 to Core 8) with four threads each (threads 1 to 32), each core with a 32KB I-cache and a 32KB D-cache. The cores connect through the Fast Messaging Network, the Memory Distributed Interconnect, and the I/O Distributed Interconnect to an eight-banked 2MB Level-2 cache, a memory and I/O bridge, a memory bridge, two dual-channel DRAM interfaces, a quad-channel QDR SRAM/LA1 interface, general purpose I/O, packet distribution engines, a security engine (with RSA), a DMA engine, and RGMII (x4), XGMII/SPI-4.2 (x2), HT, and PCI-X interfaces.]
SUN - Niagara 2
• 8 SPARC Cores, 4MB shared L2 cache
• 64-bit SPARC V9 instruction set
• Each SPC has the following features:
    • Supports concurrent execution of 8 threads
    • 1 load/store, 2 Integer execution units
    • 1 Floating point and Graphics unit
    • 8-way, 16 KB I$; 32 Byte line size
    • 4-way, 8 KB D$; 16 Byte line size
    • 64-entry fully associative ITLB
    • 128-entry fully associative DTLB
    • MMU supports 8K, 64K, 4M, 256M page sizes; Hardware Table walk
    • Advanced Cryptographic unit
• Two 10G Ethernet (XAUI) ports on chip
• Cryptographic coprocessor – 1 per Core
• On-chip PCI-Express, Ethernet, and FBDIMM memory interfaces are SerDes based
Tilera – TILE64
Source: Microprocessor Report
FreeScale – MultiCore SOC Roadmap
Source: Microprocessor Report
Differentiating NP Attributes
• Processing Elements
    – Special/Fixed function
    – General Purpose CPUs
    – Multi-threading
• Instruction Set
    – Custom
    – General Purpose
        • MIPS, PowerPC
        • RISC, VLIW
• Memory Subsystem
    – Bus, Ring, Mesh
    – Coherent
    – Buffer forwarding
• Acceleration Functions
Network Processors
[Figure: positioning chart with Programmability on one axis and Throughput on the other. Items shown: General Purpose Processors, Control Plane Processors, IXP4xx, IXPx28xx, NPUs + Accelerators, FPGAs, Custom IP, ASICs, and the RMI XLR/XLS Processor.]
The company and product names shown above may be trademarks or servicemarks of their respective owners.
Summary
• General Purpose Processors were the early network processors
• Today’s networks are “intelligent” and have very high bandwidth, requiring significantly more packet processing
• Network Processors are multi-engine/core, multi-threaded SOCs with varying degrees of programmability, ease of programming, and hardware acceleration
• Recent NPs are general-purpose, coherent, multi-processor, multi-threaded SOCs employing general purpose programming techniques