cse 58x: networking practicum
DESCRIPTION
Instructor: Wu-chang Feng TA: Francis Chang. CSE 58x: Networking Practicum. About the course. Prerequisite: CSE 524 or the equivalent Implementation-focused course Intel's IXA network processor platform Contents Brief lecture material on network processors and the IXP - PowerPoint PPT PresentationTRANSCRIPT
CSE 58x: Networking Practicum
Instructor: Wu-chang FengTA: Francis Chang
About the course● Prerequisite: CSE 524 or the equivalent● Implementation-focused course
– Intel's IXA network processor platform
● Contents– Brief lecture material on network processors and the IXP– 5 weeks of designed laboratories– 3 weeks of final projects
Modern router architectures● Split into a fast path and a slow path● Control plane
– High-complexity functions– Route table management– Network control and configuration– Exception handling
● Data plane– Low complexity functions– Fast-path forwarding
Router functions● RFC 1812 plus...
– Error detection and correction– Traffic measurement and policing– Frame and protocol demultiplexing– Address lookup and packet forwarding– Segmentation, fragmentation, reassembly– Packet classification– Traffic shaping– Timing and scheduling– Queuing– Security
Design choices for network products● General purpose processors● Embedded RISC processors● Network processors● Field-programmable gate arrays (FPGAs)● Application-specific integrated circuits (ASICs)
General purpose processors (GPP)● Programmable● Mature development environment● Typically used to implement control plane● Too slow to run data plane effectively
– Sequential execution– CPU/Network 50x increase over last decade– Memory latencies 2x decrease over last decade
● Gigabit ethernet: 333 nanosecond per packet budget● Cache miss: ~150-200 nanoseconds
Embedded RISC processors (ERP)● Same as GPP, but
– Slower – Cheaper– Smaller (require less board space)– Designed specifically for network applications
● Typically used for control plane functions
Application-specific integrated circuits (ASIC)
● Custom hardware● Long time to market● Expensive● Difficult to develop and simulate● Not programmable● Not reusable● But, the fastest of the bunch● Suitable for data plane
Field Programmable Gate Arrays (FPGA)● Flexible re-programmable hardware● Less dense and slower than ASICs● Cheaper than ASICs● Good for providing fast custom functionality● Suitable for data plane
Network processors● The speed of ASICs/FPGAs● The programmability and cost of GPPs/ERPs● Flexible● Re-usable components● Lower cost● Suitable for data plane
Network processors● Common features
– Small, fast, on-chip instruction stores (no caching)– Custom network-specific instruction set programmed at
assembler level● What instructions are needed for NPs? Open question.● Minimality, Generality
– Multiple processing elements– Multiple thread contexts per element– Multiple memory interfaces to mask latency– Fast on-chip memory (headers) and slow off-chip memory
(payloads)– No OS, hardware-based scheduling and thread switching
Why network processors?● The propaganda● Take the current vertical network device market● Commoditize horizontal slices of it● PC market
– Initially, an IBM custom vertical– Now, a commodity market with Intel providing the chip-set
● Network device market– Draw your own conclusions
Network processing approaches
Programming/Development Ease
Spe
ed
ASIC
Network processor
FPGA
GPP
Embedded RISC Processor
Network processor architectures● Packet path
– Store and forward● Packet payload completely stored in and forwarded from off-chip
memory● Allows for large packet buffers● Re-ordering problems with multiple processing elements● Intel IXP, Motorola C5
– Cut-through● Packet held in an on-chip FIFO and forwarded through directly● Small packet buffers● Built-in packet ordering● AMCC
Network processor architectures● Processing architecture
– Parallel● Each element independently performs entire processing function● Packet re-ordering problems● Larger instruction store needed per element
– Pipelined● Each element performs one part of larger processing function● Communicates result to next processing element in pipeline● Smaller code space● Packet ordering retained● Deterministic behavior (no memory thrashing)
– Hybrid
Network processor architectures● Processing hierarchy
– ASICs– Embedded RISC processors– Specialized co-processors– See figure 13.7 in book
Network processor architectures● Memory hierarchy
– Small on-chip memory● Control/Instruction store● Registers● Cache● RAM
– Large off-chip memory● Cache● Static RAM● Dynamic RAM
Network processor architectures● Internal interconnect
– Bus– Cross-bar– FIFO– Transfer registers
Network processor architectures● Concurrency
– Hardware support for multiple thread contexts– Operating system support for multiple thread contexts– Pre-emptiveness– Migration support
Increasing network processor performance● Processing hierarchy
– Increase clock speed– Increase elements
● Memory hierarchy– Increase size– Decrease latency– Pipelining– Add hierachies– Add memory bandwidth (parallel stores)– Add functional memory (CAMs)
Focus of this class...● Network processors
– Intel IXA
IXP 1200 features● One embedded RISC processor (StrongARM)
– Runs control plane (Linux)
● 6 programmable packet processors (-engines)– Runs data plane (-engine assembler or -engine C)
● Central hash unit● Multiple, bus interconnects
– IXBus (4.4Gbps) to overcome PCI's 2.2Gbps limit
● Small on-board memory● Serial interface for control● External interfaces for memory
IXP12xx -engine
IXP2xxx -engine
-engine functions● Packet ingress from physical layer interface● Checksum verification● Header processing and classification● Packet buffering in memory● Table lookup and forwarding● Header modification● Checksum computation● Packet egress to physical layer interface
-engine characteristics● Programmable microcontroller
– Custom RISC instruction set– Private 2048 instruction store per -engine (loaded by
StrongARM)– 5-stage execution pipeline
● Hardware support for 4 threads and context switching– Each -engine has 4 hardware contexts (mask memory latency)
-engine characteristics● 128 general purpose registers
– Can be partitioned or shared– Absolute or context-relative
● 128 transfer registers– Staging registers for memory transfers– 4 blocks of 32 registers
● SDRAM or SRAM● Read or Write
● Local Control and Status Registers (CSRs)– USTORE instructions, CTX, etc. (p. 315)
-engine characteristics● FBI unit
– Scratchpad memory– Hash unit– FBI CSRs– IXBus control– IXBus FIFOs
● Transmit and Receive FIFOs to external line cards
-engine opcodes● ALU instructions
– ALU, ALU_SHF, DBL_SHIFT
● Branch/Jump instructions– BR, BR=0, BR!=0, BR_BSET, BR=BYTE, BR=CTX,
BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc.
● Reference instructions– CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA,
SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc.
● Local register instructions– FIND_BST, IMMED, LD_FIELD, LOAD_ADDR,
LOAD_BSET_RESULT1, etc.
-engine functions● Miscellaneous
– CTX_ARB– NOP– HASH1_48, HASH1_64, etc.
1. Packet received on physical interface (MAC)2. Ready-bus sequencer polls MAC for mpacket Updates receive-ready upon a full mpacket3. -engine polls for receive-ready4. -engine instructs FBI to move mpacket from MAC to RFIFO5. -engine moves mpacket directly from RFIFO to SDRAM6. Repeat 1-5 until full packet received
7. -engine or StrongARM processing8. Packet header read from SDRAM or RFIFO into m-engine and classified (via SRAM tables)9. Packet headers modified10. mpackets sent to interface11. Poll for space on MAC Update transmit-ready if room for mpacket12. mpackets transferred to MAC
8 9
8
8 9
Programming the IXP● Focus of this course on steps 7, 8, and 9● 2 programming frameworks
– Command-line, IXA Active Computing Engine (ACE) framework
– Graphical microengine C development environment
Programming the IXP● Command-line, IXA Active Computing Engine (ACE)
framework– Re-usable function blocks chained together to build an
application (Chapters 22-24)– New functions implemented as new blocks in chain
● Core ACEs (StrongARM)– Written in C
● Microblock ACEs (microengines)– Written in assembler
Programming the IXP● Graphical microengine C development environment
– Monolithic microengine C code (can not be used on IXP1200 hardware)
– Demos forthcoming