ECE 526 – Network Processing Systems Design
Network Processor Tradeoffs and Examples
Chapter: D. E. Comer
Ning Weng ECE 526 2
Outline
• Network Processor design tradeoffs
• Sample Network Processor
NP Architecture
• Numerous different design goals
  ─ Performance
  ─ Cost
  ─ Functionality
  ─ Programmability
• Numerous different system choices
  ─ Use of parallelism
  ─ Types of memories
  ─ Types of interfaces
  ─ Etc.
• We consider
  ─ High-level design tradeoffs (qualitative tradeoffs)
  ─ Commercial Network Processors
Processor Topologies
• How can processors be arranged on an NP?
  ─ Consider heterogeneity of processing resources and workload
• Multiprocessor
  ─ Parallel processors with shared interconnect
  ─ Problems?
• Pipeline
  ─ Multiple processors per data path
  ─ Problems?
• Data Flow Architecture
  ─ Extreme form of pipelining
  ─ Problems?
• Heterogeneous Architectures
Design Tradeoffs (1)
• Low development cost vs. performance
  ─ ASICs give higher performance, but take time to develop
  ─ NPs allow faster development, but might give lower performance
• Programmability vs. processing speed
  ─ Similar to tradeoff between ASIC and NP
  ─ Co-processors pose the same tradeoffs
  ─ Complexity of instruction set
• Performance: packet rate, data rate, and bursts
  ─ Difficult to assess the performance of a system
  ─ Even more difficult to compare different systems
• Per-interface rate vs. aggregate data rate
  ─ NP usually limited to one port
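To make the packet-rate versus data-rate distinction concrete, the sketch below converts a link rate into a packet rate for a fixed frame size. The function name and the 20 bytes of Ethernet preamble plus inter-frame gap are illustrative assumptions, not figures from the slides.

```python
def packets_per_second(link_bps: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Packet rate a link can carry when every frame has the same size."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


# Assumed example: 1 Gbps Ethernet, 20 bytes of preamble plus inter-frame gap.
for size in (64, 512, 1518):
    rate = packets_per_second(1e9, size, overhead_bytes=20)
    print(f"{size:5d}-byte frames: {rate:12.0f} packets/s")
```

Under these assumptions the same 1 Gbps link carries roughly 1.49 million minimum-size frames per second but only about 81 thousand maximum-size frames, which is one reason a single "data rate" number says little about the processing load.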
Design Tradeoffs (2)
• NP speed vs. bandwidth
  ─ How much processing power per bandwidth is necessary?
  ─ Depends on application complexity
• Coprocessor design: look-aside vs. flow-through
  ─ Look-aside: “called” from main processor, needs state transfer
  ─ Flow-through: all traffic streams through coprocessor
• Pipelining: uniform vs. synchronized
  ─ Pipeline stages can take different times
  ─ Tradeoff between slowing all stages down and synchronizing between stages
• Explicit parallelism vs. cost and programmability
  ─ Hidden parallelism is easier to program
  ─ Explicit parallelism is cheaper to implement
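The uniform vs. synchronized pipelining tradeoff can be illustrated with a toy timing model. Everything here is an assumption made for illustration: the stage times, the fixed per-transfer handoff cost, and the two function names. It is a sketch of the idea, not a model of any particular NP.

```python
def lockstep(stage_times):
    """Uniform pipeline: every stage is clocked at the pace of the slowest stage."""
    period = max(stage_times)                # time between packets leaving the pipeline
    latency = period * len(stage_times)      # each stage occupies one full period
    return period, latency


def synchronized(stage_times, handoff_cost):
    """Stages run at their own pace but pay an assumed handoff cost per transfer."""
    period = max(stage_times) + handoff_cost
    latency = sum(stage_times) + handoff_cost * (len(stage_times) - 1)
    return period, latency


stages = [2.0, 5.0, 3.0]  # hypothetical per-stage processing times
print("lock-step   :", lockstep(stages))
print("synchronized:", synchronized(stages, handoff_cost=0.5))
```

In this model lock-step operation wastes time in the fast stages (higher latency), while explicit synchronization adds a handoff cost to every transfer (slightly lower throughput). Which one wins depends on how uneven the stages are.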
Design Tradeoffs (3)
• Parallelism: scale vs. packet ordering
  ─ Why is packet order important?
  ─ Giving up the packet order constraint gives better throughput
• Parallelism: speed vs. stateful classification
  ─ Shared state requires synchronization
  ─ Limits parallelism
• Memory: speed vs. programmability
  ─ Using different types of memories improves performance
  ─ Increases difficulty in programming
• I/O performance vs. pin count
  ─ Packaging can be a major cost factor
  ─ More pins give higher performance
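One common reason packet order matters is that TCP receivers answer out-of-order segments with duplicate acknowledgments, which a sender can mistake for loss. When parallel engines finish packets out of order, a reorder stage can restore the sequence at the cost of extra buffering and delay. The class below is a minimal sketch of such a stage; the name `ReorderBuffer` and the use of per-packet sequence numbers starting at 0 are assumptions for illustration.

```python
import heapq


class ReorderBuffer:
    """Releases packets in sequence-number order, holding back early arrivals."""

    def __init__(self):
        self._next_seq = 0   # sequence number that must leave next
        self._heap = []      # min-heap of (seq, packet) waiting for their turn

    def push(self, seq, packet):
        """Accept one processed packet; return the packets that may now leave."""
        heapq.heappush(self._heap, (seq, packet))
        released = []
        while self._heap and self._heap[0][0] == self._next_seq:
            released.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return released


buf = ReorderBuffer()
print(buf.push(1, "pkt-1"))  # held back: packet 0 has not finished yet
print(buf.push(0, "pkt-0"))  # packet 0 arrives, so both can leave in order
```

A slow packet at the head of the sequence stalls everything behind it, which is the throughput penalty the slide refers to.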
Design Tradeoffs (4)
• Programming languages
  ─ Ease of programming vs. functionality vs. speed
• Multithreading: throughput vs. programmability
  ─ Threads improve performance
  ─ Threads require more complex programs and synchronization
• Traffic management vs. blind forwarding at low cost
  ─ Traffic management is desirable but requires processing
• Generality vs. specific architecture role
  ─ NPs can be specialized for access, edge, core
  ─ NPs can be specialized towards certain protocols
• Memory type: special-purpose vs. general-purpose
  ─ SRAM and DRAM vs. CAM
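The claim that threads improve performance can be sketched with a simple latency-hiding model: while one thread waits on memory, another can use the processor. The function name, the cycle counts, and the assumption of free context switches are all illustrative, not taken from the slides.

```python
def engine_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of time the engine does useful work, assuming free context switches."""
    busy_fraction = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction)


# Hypothetical workload: 10 cycles of work, then a 90-cycle memory stall.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} threads -> utilization {engine_utilization(n, 10, 90):.2f}")
```

In this model the benefit flattens once enough threads exist to cover the stall, while the programming burden (synchronization between threads) keeps growing with the thread count.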
Design Tradeoffs (5)
• Backward compatibility vs. architectural advances
  ─ On component level: e.g., memories (DDR DRAM)
  ─ On system level: NP needs to fit into overall router system
• Parallelism vs. pipelining
  ─ Depends on usage of NP
• Summary:
  ─ Lots of choices
  ─ Most decisions require some insight into expected NP usage
  ─ Tradeoffs are all qualitative
• Let's look at commercial designs
Novel Areas of NP Use
• TCP/IP offloading on high-performance servers
• Security processing: SSL offloading
• Storage area networks
• Many others: intrusion detection systems (IDSs), etc.
Performance Bottlenecks
• Memory
  ─ Bandwidth available, but access time too slow
  ─ Increasing delay for off-chip memory
• I/O
  ─ High-speed interfaces available
  ─ Cost problem with optical interfaces
  ─ Otherwise no problem
• Processing power
  ─ Individual cores are getting more complex
  ─ Problems with access to shared resources
  ─ Control processor can become bottleneck
Limitations on Scalability
• What are the limitations on how fast NPs need to get?
  ─ Link rates (optical bandwidth limits)
  ─ Application complexity (core vs. edge)
• What are the limitations on how fast NPs can get?
  ─ Parallelism in networks
  ─ Power consumption
  ─ Chip area
Commercial Network Processors
• Commercial NPs
  ─ Large variety of architectures
  ─ Different applications and performance spaces
  ─ Lots of implementation details and practical issues
• General Themes
  ─ Type and number of processors
    • Homogeneous vs. heterogeneous
  ─ Type and size of memories
  ─ Internal and external communication channels
  ─ Mechanisms of scalability: parallelism and pipelining
  ─ Generality vs. specialization
Intel IXP1200: external connection
Intel IXP1200: internal architecture
Cisco PXF
Motorola C-Port: conceptual design
Motorola C-Port: internal architecture
Motorola C-Port: channel processor
IXP2400
• XScale (ARM-compliant) embedded control processor
  ─ Instruction and data caches
• 8 microengines
  ─ 400 or 600 MHz
• 8 threads per microengine
• Multiple instruction stores with 4k instructions
• 256 general-purpose registers
• 512 transfer registers
• 2 GB addressable DDR-DRAM memory (19.2 Gbps)
• 32 MB addressable QDR-SRAM memory (12 Gbps read + write)
• 16 words of Next Neighbor Registers
• 16 kB scratchpad
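The microengine count and clock rate above translate into a rough per-packet cycle budget. The sketch below assumes perfect load balancing across engines, no memory stalls, and a 1 Gbps link carrying minimum-size Ethernet frames (84 bytes on the wire including preamble and inter-frame gap). The link rate and frame size are assumptions chosen for illustration, and the function name is hypothetical.

```python
def cycles_per_packet(engines: int, clock_hz: float, link_bps: float, wire_bytes: int) -> float:
    """Upper bound on cycles available per packet across all engines."""
    packet_rate = link_bps / (wire_bytes * 8)   # packets per second on the link
    return engines * clock_hz / packet_rate


# 8 microengines at 600 MHz (from the slide); link rate and frame size are assumed.
budget = cycles_per_packet(engines=8, clock_hz=600e6, link_bps=1e9, wire_bytes=84)
print(f"about {budget:.0f} cycles per packet")
```

This is an optimistic bound: memory access latency and contention for shared resources reduce the cycles that actually do useful work.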
IXP2400
• Interconnects
  ─ Coprocessor bus added (incl. access to T-CAM)
  ─ Flow control bus for two-chip configurations (e.g., ingress and egress)
• Switch Fabrics
  ─ No IX bus
  ─ Utopia 1, 2, 3
  ─ CSIX-L1
  ─ SPI-3 (POS-PHY 2/3)
Two-Chip Configurations
• Flow control needed between ingress and egress
  ─ 1 Gbps over flow control bus (not shown)
IXP2400 Internal Architecture
IXP2400 Microengine
• Enhancements over IXP1200 microengines:
  ─ Multiplier unit
  ─ Pseudo-random number generator
  ─ CRC calculator
  ─ Four 32-bit timers and timer signaling
  ─ 16-entry CAM for inter-thread communication
  ─ Time stamping unit
  ─ Generalized thread signaling
  ─ 640 words of local memory
  ─ Simultaneous access to packet queues without mutual exclusion
  ─ Functional units for ATM segmentation and reassembly
  ─ Automated byte alignment
  ─ Microengines (uE) divided into two clusters with independent command and SRAM buses
Software
• Support for software pipelining
  ─ “Reflector Mode Pathways” for communication
  ─ Next Neighbor Registers as programming abstraction
• SDK 4.0
  ─ Simulator, debugger, profiler, traffic generator
  ─ Portable modules
  ─ Provides better infrastructure support
  ─ C compiler
Summary
• Network Processor design space is big due to
  ─ Varying design goals
  ─ Varying implementation choices
• Qualitative tradeoffs
• Survey of commercial NPs
• Network processors are getting more features
• Main architectural characteristic is still parallelism
• Software support is becoming more important