ECE 526 – Network Processing Systems Design
![Page 1: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/1.jpg)
ECE 526 – Network Processing Systems Design
Programming Model
Chapter 21: D. E. Comer
![Page 2: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/2.jpg)
Ning Weng ECE 526 2
Overview
• Recall
  ─ Network processors have complicated, heterogeneous architectures
  ─ They are hard to program
    • The programmer needs to understand fine details of the architecture
    • Current approach: assembly or a subset of the C language
• Programming model
  ─ Fills the gap between application and architecture
  ─ Natural interface for the programmer (e.g., a domain-specific language)
  ─ Abstraction of the underlying hardware
    • Enough architectural detail to write efficient code
    • Not too complicated for the programmer
• Two models
  ─ Hardware-specific model: IXP programming model
  ─ General models: NP-Click and ADAG
![Page 3: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/3.jpg)
IXP Programming Model
• What kinds of software abstractions are used on the IXP?
• Active Computing Element (ACE):
  ─ Fundamental software building block
  ─ Used to construct packet processing systems
  ─ Runs on the XScale, a microengine (uE), or the host
  ─ Handles control plane and fast- or slow-path packet processing
  ─ Coordinates and synchronizes with other ACEs
  ─ Can have multiple outputs
  ─ Can serve as part of a pipeline
• Protocol processing is implemented by combining multiple ACEs
![Page 4: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/4.jpg)
ACE Terminology
• Library ACE:
  ─ ACE provided by Intel for basic functions
• Conventional ACE or Standard ACE:
  ─ ACE built by the customer
  ─ Might make use of Intel’s Action Service Libraries
• Micro ACE:
  ─ ACE with two components:
    • Core component (runs on XScale)
    • Microblock component (runs on uE)
• Terminology for microblocks:
  ─ Source microblock: initial point that receives packets
  ─ Transform microblock: intermediate point that accepts and forwards packets
  ─ Sink microblock: last point that sends packets
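The three microblock roles can be pictured as a short chain. The following is a minimal sketch in Python, not Intel SDK code: the function names `source`, `decrement_ttl`, and `sink`, and the dictionary packet format, are assumptions made only for illustration.

```python
# Hypothetical sketch of the three microblock roles; not Intel SDK code.

def source(raw_frames):
    """Source microblock: the initial point that receives packets."""
    for frame in raw_frames:
        yield dict(frame)  # copy so later stages do not mutate the input


def decrement_ttl(packets):
    """Transform microblock: accepts packets and forwards them onward."""
    for pkt in packets:
        pkt["ttl"] -= 1
        yield pkt


def sink(packets, tx_queue):
    """Sink microblock: the last point, which sends packets out."""
    for pkt in packets:
        tx_queue.append(pkt)


frames = [{"dst": "10.0.0.1", "ttl": 64}, {"dst": "10.0.0.2", "ttl": 2}]
tx_queue = []
sink(decrement_ttl(source(frames)), tx_queue)
```

Each packet enters at the source, passes through one transform, and leaves through the sink; a longer chain would simply add more transform stages in between.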
![Page 5: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/5.jpg)
ACE Parts
• An ACE contains four conceptual parts:
• Initialization:
  ─ Initializes data structures and variables before code execution
• Classification:
  ─ The ACE classifies each packet on arrival
  ─ The classification can be chosen, or a default can be used
• Actions:
  ─ An action is invoked based on the classification
• Message and event management:
  ─ The ACE can generate or handle messages
  ─ Used for communication with another ACE or with hardware
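The four conceptual parts can be lined up against a small class. This is a sketch under stated assumptions, not the Intel ACE API: the class name `SketchACE`, its method names, and the dictionary packet format are hypothetical. The EtherType values 0x0800 (IPv4) and 0x0806 (ARP) are the standard ones.

```python
# Hypothetical sketch of the four conceptual parts of an ACE.

class SketchACE:
    def __init__(self):
        # 1. Initialization: set up data structures before any packet arrives.
        self.counters = {"ip": 0, "arp": 0, "other": 0}
        self.outbox = []

    def classify(self, packet):
        # 2. Classification: label each packet on arrival.
        ethertype = packet.get("ethertype")
        if ethertype == 0x0800:
            return "ip"
        if ethertype == 0x0806:
            return "arp"
        return "other"

    def act(self, packet):
        # 3. Actions: the classification result selects the action.
        label = self.classify(packet)
        self.counters[label] += 1
        if label == "other":
            # 4. Message and event management: notify another component.
            self.send_message({"event": "unknown_ethertype", "packet": packet})
        return label

    def send_message(self, message):
        # Messages are only queued here; a real ACE would deliver them.
        self.outbox.append(message)


ace = SketchACE()
labels = [ace.act({"ethertype": 0x0800}),
          ace.act({"ethertype": 0x0806}),
          ace.act({"ethertype": 0x1234})]
```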
![Page 6: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/6.jpg)
ACE Binding
• ACEs can be bound together to implement protocol processing
• Binding happens when an ACE is loaded into the NP
• Binding can be changed dynamically
• Packets sent to an unbound target are silently discarded
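The binding rules above can be illustrated with a small sketch. The class `BoundACE` and its methods `bind`, `unbind`, `emit`, and `receive` are hypothetical names chosen for this example; they are not taken from the Intel SDK.

```python
# Hypothetical sketch of ACE binding; class and method names are illustrative.

class BoundACE:
    def __init__(self, name):
        self.name = name
        self.targets = {}   # target name -> downstream ACE (absent = unbound)
        self.received = []

    def bind(self, target, ace):
        """Bind an output target; may be called again later to rebind."""
        self.targets[target] = ace

    def unbind(self, target):
        self.targets.pop(target, None)

    def emit(self, target, packet):
        """Send a packet to a target; unbound targets silently discard."""
        downstream = self.targets.get(target)
        if downstream is None:
            return False  # silent discard: no error is raised
        downstream.receive(packet)
        return True

    def receive(self, packet):
        self.received.append(packet)


ingress, ip, arp = BoundACE("ingress"), BoundACE("ip"), BoundACE("arp")
ingress.bind("ip_out", ip)
delivered_ip = ingress.emit("ip_out", "pkt-1")    # delivered to ip
delivered_arp = ingress.emit("arp_out", "pkt-2")  # unbound, so dropped
ingress.bind("arp_out", arp)                      # binding changed at run time
delivered_late = ingress.emit("arp_out", "pkt-3")
```

Returning `False` instead of raising an exception mirrors the silent-discard behaviour: the sender keeps running whether or not a target is bound.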
![Page 7: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/7.jpg)
ACE Division
![Page 8: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/8.jpg)
Microengine Assignment
• Packet processing involves several microblocks
• How should microblocks be allocated to microengines?
  ─ One microblock per microengine
  ─ Multiple microblocks per microengine (in a pipeline)
  ─ Multiple pipelines on multiple microengines
• What are the pros and cons?
  ─ Passing packets between microengines incurs overhead
  ─ Pipelining causes inefficiencies if blocks are not equal in size
  ─ Multiple blocks per microengine cause contention and require more instruction storage
• Intel terminology: “microblock group”
  ─ Set of microblocks running on one microengine
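A rough cost model makes the trade-off concrete. The sketch below is an assumption-laden illustration: the cycle counts and the hand-off cost are made up, and the model deliberately leaves out contention and instruction storage, which the slide lists as costs of placing several blocks on one microengine.

```python
# Hypothetical cost model; cycle counts and the hand-off cost are made up.

def pipeline_rate(block_cycles, handoff_cycles):
    """One microblock per microengine, chained in a pipeline.

    The slowest stage sets the pace, and every stage except the last
    pays a hand-off cost to pass the packet on.
    """
    stage_costs = [c + handoff_cycles for c in block_cycles[:-1]]
    stage_costs.append(block_cycles[-1])
    return 1.0 / max(stage_costs)  # packets per cycle


def run_to_completion_rate(block_cycles, num_engines):
    """All microblocks on every microengine, engines working in parallel."""
    return num_engines / sum(block_cycles)  # packets per cycle


blocks = [100, 300, 100]     # unequal block sizes, in cycles per packet
balanced = [160, 170, 170]   # same total work, nearly equal blocks

unequal_pipeline = pipeline_rate(blocks, handoff_cycles=20)
balanced_pipeline = pipeline_rate(balanced, handoff_cycles=20)
parallel = run_to_completion_rate(blocks, num_engines=3)
```

With these numbers the unequal pipeline is held back by its 300-cycle block, the balanced pipeline does better, and the parallel arrangement avoids hand-off cost altogether, at the price of the costs this model ignores.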
![Page 9: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/9.jpg)
Microblock Groups
• Microblock groups can be replicated to increase parallelism
![Page 10: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/10.jpg)
Microblock Group Replication
• Performance-critical groups can be replicated
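One way to picture replication of performance-critical groups is a greedy allocation of spare microengines to whichever group is currently the bottleneck. This is a sketch, not a method taken from the slides; the function names, the group names, and the cycle counts are all assumptions.

```python
# Hypothetical sketch: hand spare microengines to the slowest microblock group.

def replicate(group_cycles, total_engines):
    """Return replicas per group, adding copies to the current bottleneck.

    group_cycles maps a group name to its cycles per packet; a group with
    k replicas is assumed to sustain k / cycles packets per cycle.
    """
    replicas = {name: 1 for name in group_cycles}
    for _ in range(total_engines - len(group_cycles)):
        bottleneck = min(group_cycles,
                         key=lambda g: replicas[g] / group_cycles[g])
        replicas[bottleneck] += 1
    return replicas


def system_rate(group_cycles, replicas):
    """The slowest group limits the rate of the whole chain."""
    return min(replicas[g] / group_cycles[g] for g in group_cycles)


groups = {"ingress": 100, "ip": 300, "egress": 100}
plan = replicate(groups, total_engines=6)
rate_before = system_rate(groups, {g: 1 for g in groups})
rate_after = system_rate(groups, plan)
```

In this example the `ip` group receives the first two spare microengines because it is three times slower than its neighbours.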
![Page 11: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/11.jpg)
Control of Packet Flow
• Packets require different processing blocks
  ─ IP requires different microblocks than ARP
  ─ Special packets get handed off to the core
• A “dispatch loop” controls packet flow among microblocks
  ─ Each thread runs its own dispatch loop
  ─ Infinite loop that grabs packets and hands them to microblocks
  ─ Return value from a microblock determines the next step
• Invoking a microblock is similar to a function call
![Page 12: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/12.jpg)
Dispatch Loop
• Example:
  ─ Three microblocks
  ─ Ingress, IP, egress
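The three-microblock example can be sketched as follows. This is an illustration in Python rather than microengine code; the return labels (`"core"`, `"drop"`, `"done"`), the packet fields, and the rule that ARP goes to the core are assumptions made for the example.

```python
# Hypothetical dispatch loop; the microblock names follow the slide's example.

def ingress(pkt):
    # Hand special packets (here, ARP) to the core; send IP on to the IP block.
    return "ip" if pkt["type"] == "ip" else "core"


def ip(pkt):
    pkt["ttl"] -= 1
    return "egress" if pkt["ttl"] > 0 else "drop"


def egress(pkt):
    return "done"


MICROBLOCKS = {"ingress": ingress, "ip": ip, "egress": egress}


def dispatch_loop(rx_queue):
    """Take each packet and follow microblock return values to the next step.

    A real dispatch loop runs forever on each thread; this one stops when
    the queue is empty so that the sketch terminates.
    """
    outcomes = []
    while rx_queue:
        pkt = rx_queue.pop(0)
        step = "ingress"
        while step in MICROBLOCKS:
            step = MICROBLOCKS[step](pkt)  # invoked like a function call
        outcomes.append(step)
    return outcomes


packets = [{"type": "ip", "ttl": 64}, {"type": "arp", "ttl": 0},
           {"type": "ip", "ttl": 1}]
outcomes = dispatch_loop(packets)
```

The value each microblock returns names the next microblock, so the control flow lives in the loop rather than inside the blocks.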
![Page 13: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/13.jpg)
Click Model of IPv4
NP-Click: A Programming Model for the Intel IXP1200, by Niraj Shah et al., UC Berkeley
![Page 14: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/14.jpg)
My Approach: ADAG
• Architecture-independent workload representation
• ADAG (Annotated Directed Acyclic Graph)
  ─ Node: processing task
    • 3-tuple: the number of instructions, the number of memory reads, and the number of memory writes
  ─ Edge: dependency
    • Edge weight: the amount of data communicated between nodes
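The ADAG definition maps naturally onto a small data structure. The sketch below is one possible encoding, not the author's implementation: the class names `AdagNode` and `Adag`, the example task names, and the numbers are hypothetical, and the unit of the edge weight is left unspecified as on the slide.

```python
# Hypothetical encoding of an ADAG: annotated nodes plus weighted edges.
from dataclasses import dataclass, field


@dataclass
class AdagNode:
    name: str
    instructions: int  # first element of the 3-tuple
    mem_reads: int     # second element
    mem_writes: int    # third element


@dataclass
class Adag:
    nodes: dict = field(default_factory=dict)  # name -> AdagNode
    edges: dict = field(default_factory=dict)  # (src, dst) -> data amount

    def add_node(self, name, instructions, mem_reads, mem_writes):
        self.nodes[name] = AdagNode(name, instructions, mem_reads, mem_writes)

    def add_edge(self, src, dst, data):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, dst)] = data

    def topological_order(self):
        """Kahn's algorithm; fails if the graph is not acyclic."""
        indegree = {n: 0 for n in self.nodes}
        for (_, dst) in self.edges:
            indegree[dst] += 1
        ready = [n for n in self.nodes if indegree[n] == 0]
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for (src, dst) in self.edges:
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph has a cycle, so it is not an ADAG")
        return order


adag = Adag()
adag.add_node("parse", instructions=50, mem_reads=2, mem_writes=0)
adag.add_node("lookup", instructions=120, mem_reads=6, mem_writes=0)
adag.add_node("rewrite", instructions=40, mem_reads=1, mem_writes=3)
adag.add_edge("parse", "lookup", data=4)
adag.add_edge("lookup", "rewrite", data=8)
adag.add_edge("parse", "rewrite", data=20)
order = adag.topological_order()
```

Nothing in this representation refers to a particular processor, which is the sense in which the workload description is architecture-independent.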
![Page 15: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/15.jpg)
Profiling: Trace Generation
• PacketBench [Ramaswamy 2003]
• Data dependencies between registers and memories
• Control dependencies for conditional branches
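Data dependencies can be recovered from an instruction trace by remembering the last writer of each register or memory location. The sketch below assumes a simplified trace format invented for this example (it is not PacketBench's output format) and covers only read-after-write data dependencies, not control dependencies.

```python
# Hypothetical trace format: one (destinations, sources) pair per instruction.

def data_dependencies(trace):
    """Return (producer_index, consumer_index) pairs for read-after-write.

    Locations may be registers or memory addresses; both are treated as
    opaque names.
    """
    last_writer = {}
    edges = set()
    for index, (dests, sources) in enumerate(trace):
        for loc in sources:
            if loc in last_writer:
                edges.add((last_writer[loc], index))
        for loc in dests:
            last_writer[loc] = index
    return sorted(edges)


trace = [
    (["r1"], ["mem[0x10]"]),         # 0: load  r1 <- mem[0x10]
    (["r2"], ["r1"]),                # 1: add   r2 <- r1 + 1
    (["mem[0x20]"], ["r2"]),         # 2: store mem[0x20] <- r2
    (["r3"], ["mem[0x20]", "r1"]),   # 3: r3 <- mem[0x20] + r1
]
deps = data_dependencies(trace)
```

Each resulting pair is a candidate edge in a dependency graph over the traced instructions.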
![Page 16: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/16.jpg)
Clustering Algorithm
• Ratio Cut [Wei 1991]
  ─ Identifies the natural clusters without a priori knowledge of the final number of clusters
  ─ Clusters nodes together such that $r_{ij}$ is minimized
  ─ Top-down approach
  ─ NP-complete
• MLRC (Maximum Local Ratio Cut)
  ─ Bottom-up
  ─ Merges the nodes that should be least separated and recursively applies the process
  ─ Computational complexity $O(n^3)$
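The bottom-up idea can be sketched as repeated merging of the pair of clusters with the largest local ratio. The slide does not define $r_{ij}$, so the sketch assumes a common ratio-cut form, the traffic between two clusters divided by the product of their weights; this definition, the stopping rule (a requested number of clusters), and all names and numbers are assumptions rather than the published algorithm.

```python
# Hypothetical bottom-up clustering in the spirit of MLRC.
from itertools import combinations


def local_ratio(a, b, weights, traffic):
    """Assumed ratio: traffic between a and b over the product of weights."""
    cut = sum(amount for (u, v), amount in traffic.items()
              if (u in a and v in b) or (u in b and v in a))
    size_a = sum(weights[n] for n in a)
    size_b = sum(weights[n] for n in b)
    return cut / (size_a * size_b)  # weights must be positive


def cluster(weights, traffic, num_clusters):
    """Merge the pair that should be least separated until few enough remain."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [frozenset([n]) for n in weights]
    while len(clusters) > num_clusters:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: local_ratio(pair[0], pair[1],
                                                weights, traffic))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


weights = {"a": 10, "b": 10, "c": 10, "d": 10}
traffic = {("a", "b"): 100, ("b", "c"): 1, ("c", "d"): 80}
result = cluster(weights, traffic, num_clusters=2)
```

Nodes that exchange a lot of data end up together, so the heavy `a`-`b` and `c`-`d` links stay inside clusters and only the light `b`-`c` link crosses between them.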
![Page 17: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/17.jpg)
ADAG Mapping onto NPs
• Goal: generate a high-performance schedule
• Mapping is an NP-complete problem
• Randomized mapping is used to tackle this NP-complete problem
• Each randomized mapping is evaluated by an analytical performance model
B. A. Malloy, E. L. Lloyd, and M. L. Soffa. Scheduling DAG’s for asynchronous multiprocessor execution. IEEE Transactions on Parallel and Distributed Systems, 5(5):498–508, May 1994.
[Figure: mapping diagram with labels A0–A7, B0–B7, C0–C7, D0–D7, E0–E7; legend: PE, ADAG node]
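Randomized mapping scored by an analytical model can be sketched in a few lines. The cost model here is a stand-in chosen for illustration, not the performance model from the slides: it charges each processing element (PE) for the work of its nodes plus the data it sends to other PEs, and it ignores precedence constraints and pipeline shape. All names and numbers are hypothetical.

```python
# Hypothetical randomized mapping of ADAG nodes onto processing elements.
import random


def bottleneck_cost(mapping, work, edges, comm_cost_per_unit=1):
    """Assumed analytical model: the most heavily loaded PE sets the cost."""
    load = {}
    for node, pe in mapping.items():
        load[pe] = load.get(pe, 0) + work[node]
    for (src, dst), data in edges.items():
        if mapping[src] != mapping[dst]:
            # Crossing PEs costs the sender extra cycles.
            load[mapping[src]] += data * comm_cost_per_unit
    return max(load.values())


def randomized_mapping(work, edges, num_pes, trials, seed=0):
    """Draw random mappings and keep the one the model scores best."""
    rng = random.Random(seed)
    best_mapping, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {node: rng.randrange(num_pes) for node in work}
        cost = bottleneck_cost(candidate, work, edges)
        if cost < best_cost:
            best_mapping, best_cost = candidate, cost
    return best_mapping, best_cost


work = {"parse": 50, "lookup": 120, "rewrite": 40, "queue": 30}
edges = {("parse", "lookup"): 4, ("lookup", "rewrite"): 8,
         ("rewrite", "queue"): 2}
best_mapping, best_cost = randomized_mapping(work, edges, num_pes=2,
                                             trials=200)
```

Because every candidate is scored by the model rather than by simulation, many mappings can be tried cheaply; the quality of the result depends on how well the model reflects the hardware.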
![Page 18: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/18.jpg)
Mapping Quality I
• Simulation setup: pipeline depth 1, width 8
• Performance model of ideal mapping:
![Page 19: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/19.jpg)
Mapping Quality II
• Exhaustive search: enumerates all possible mappings
• Randomized search: randomly chooses a mapping
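The difference between the two searches shows up even on a tiny instance. The sketch below uses a deliberately simple cost (the load of the busiest PE, with no communication cost), which is an assumption for illustration and not the model used in the slides; the work values are made up.

```python
# Hypothetical comparison of exhaustive and randomized mapping search.
import itertools
import random


def cost(assignment, work, num_pes):
    """Load of the busiest PE; assignment[i] is the PE of node i."""
    load = [0] * num_pes
    for node_work, pe in zip(work, assignment):
        load[pe] += node_work
    return max(load)


def exhaustive(work, num_pes):
    """Enumerate all num_pes ** len(work) mappings and return the best cost."""
    return min(cost(a, work, num_pes)
               for a in itertools.product(range(num_pes), repeat=len(work)))


def randomized(work, num_pes, trials, seed=0):
    """Score a fixed number of random mappings and return the best cost."""
    rng = random.Random(seed)
    return min(cost([rng.randrange(num_pes) for _ in work], work, num_pes)
               for _ in range(trials))


work = [7, 5, 4, 3, 1]
optimum = exhaustive(work, num_pes=2)             # checks 2**5 = 32 mappings
sampled = randomized(work, num_pes=2, trials=10)  # checks only 10 mappings
```

Exhaustive search always finds the optimum but its search space grows exponentially with the number of nodes, while randomized search looks at a fixed number of mappings and can never score better than the optimum.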
![Page 20: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/20.jpg)
Summary
• NP programming for high performance is a hard problem
• A programming model is the solution
  ─ Intel ACE
  ─ NP-Click
  ─ ADAGs
![Page 21: ECE 526 – Network Processing Systems Design](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a83550346895db793a7/html5/thumbnails/21.jpg)
For Next Class and Reminder
• Read Chapter 23
• Lab 3
• Project