Hardware-Accelerated Signaling: Design, Implementation, and Implications
Haobo Wang
November 15, 2004

Outline
  Background and problem statement
  OCSP: a performance-oriented signaling protocol and its hardware implementation
  A subset of the RSVP-TE signaling protocol and its hardware implementation
  Comparison of signaling transport options
  Implications of hardware-accelerated signaling
  Conclusions and future work

Background
  Signaling protocol
    Sets up and tears down connections in connection-oriented networks
    Control-plane protocol
  Signaling protocols are primarily implemented in software
    Two reasons: complexity and the requirement for flexibility
    Price paid: poor performance
  RSVP-TE for GMPLS
    Supports a wide range of connection-oriented networks
    Being implemented by switch vendors

Network and node views
  [Figure: network view showing the control plane and the user plane; node view showing input and output line cards and interfaces around the switch fabric, the routing, signaling, and LMP processes, and the hardware signaling accelerator with NICs on the input and output signaling interfaces.]

Two questions
  Question 1: Why connection-oriented (CO)?
    Inherent support for QoS
    Can connectionless networks provide QoS? Yes, but only through over-provisioning, which leads to low utilization
  Question 2: What are the drawbacks of CO?
    Call setup overhead: signaling message propagation, processing, and transmission delays
    Call handling capacities of today's switches are limited

Problem statement
  How to overcome the drawbacks of CO: hardware-accelerated signaling?
    Determine whether signaling protocols can be implemented in hardware, and demonstrate it with an actual implementation
    Study how to reduce signaling message transmission delays
    Explore the impact of hardware-accelerated signaling protocol implementations

Related work
  How to achieve fast signaling?
    New, simplified signaling protocols: YESSIR, PCC
    Hardware implementation: FRP (ASIC, not flexible)
    A simplified version of RSVP-TE intended for hardware implementation: "Keep It Simple" Signaling, still at the blueprint stage
  Other comparable protocols implemented in hardware
    TOE: TCP/IP Offload Engine
    TCP switching

Optical Circuit-switching Signaling Protocol (OCSP)
  Performance-oriented, optimized for hardware implementation
  Specifically designed for SONET switches
  Implemented on a WILDFORCE FPGA board

Hardware platform of the implementation
  [Figure: WILDFORCE board block diagram. The host connects over the PCI bus and a PCI chip to the local bus; processing elements CPE0 and PE1 to PE4, each with a mezzanine card and DPMC, are joined by a crossbar; three 512 x 36 FIFOs (FIFO 0, FIFO 1, FIFO 4) and a 32K x 32 SRAM are on board.]
  [Figure: mapping of the design onto the board. CPE0 hosts the signaling engine, the FIFO 0 and FIFO 1 controllers (from host, to host), and a memory controller for the mezzanine memory; PE1 is also used. The data tables are the routing, CAC, and connectivity tables and the state and switch mapping tables.]

Simulation and implementation results for OCSP
  Assuming a 25 MHz clock
    Total setup and teardown time: 5.9 to 6.8 us
    Call handling capacity of 150,000 calls/sec

  Message        Setup    Setup Success   Release   Release Confirm
  Clock cycles   77-101   9               51        10

  Device             Resource   Eq. gates
  CPE0 (XC4036XLA)   62%        22,000
  PE1 (XC4013XLA)    8%         1,000
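
The per-message cycle counts can be cross-checked against the quoted setup and teardown time. The sketch below sums the four message types and converts cycles to time at 25 MHz. Treating one call as exactly these four messages, processed back to back, is an assumption made here for the check and is not stated on the slide.

```python
# Clock cycles per OCSP message, as listed on the slide.
# Setup takes between 77 and 101 cycles; the other messages are fixed.
CYCLES = {
    "Setup": (77, 101),
    "Setup Success": (9, 9),
    "Release": (51, 51),
    "Release Confirm": (10, 10),
}
CLOCK_HZ = 25_000_000  # 25 MHz clock assumed on the slide


def total_time_us(cycles=CYCLES, clock_hz=CLOCK_HZ):
    """Return (best, worst) time in microseconds to process all four messages."""
    best = sum(lo for lo, _ in cycles.values())
    worst = sum(hi for _, hi in cycles.values())
    return best / clock_hz * 1e6, worst / clock_hz * 1e6


best_us, worst_us = total_time_us()
# Assumption: one call costs exactly these four messages, handled back to back.
calls_per_sec = (1e6 / worst_us, 1e6 / best_us)

print(f"{best_us:.2f} to {worst_us:.2f} us per call")
print(f"{calls_per_sec[0]:,.0f} to {calls_per_sec[1]:,.0f} calls/sec")
```

The totals are 147 to 171 cycles, which at 40 ns per cycle brackets the quoted 5.9 to 6.8 us, and the implied call rate range contains the quoted 150,000 calls/sec.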

Challenges for hardware implementation of RSVP-TE
  A large number of messages and objects
  Maintaining state information
  Many data tables
  Support for timers
  Global connection reference
  Flexible TLV (Type-Length-Value) style objects
  ...
  These can be overcome by defining a subset of RSVP-TE

  SESSION object defined in RFC 3209 (RSVP-TE):
    Length (16) | Class-Num (1) | C-Type (7)
    IPv4 tunnel end point address
    Must be zero | Tunnel ID
    Extended Tunnel ID

  SESSION object defined in RFC 2205 (RSVP):
    Length (12) | Class-Num (1) | C-Type (1)
    IPv4 Dest Address
    Protocol ID | Flags | Dst Port
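
The TLV challenge can be made concrete with a small software decoder for the two SESSION layouts shown on this slide. This is only an illustrative sketch and not the hardware object dispatcher: the function names and the example field values are hypothetical, while the header layout (16-bit Length, 8-bit Class-Num, 8-bit C-Type) and the field widths follow RFC 2205 and RFC 3209.

```python
import socket
import struct

SESSION_CLASS_NUM = 1  # Class-Num of the SESSION object


def parse_object(buf):
    """Split one RSVP object into (length, class_num, c_type, body)."""
    length, class_num, c_type = struct.unpack("!HBB", buf[:4])
    if length < 4 or length > len(buf):
        raise ValueError("bad object length")
    return length, class_num, c_type, buf[4:length]


def parse_session(buf):
    """Decode a SESSION object; the C-Type selects the body layout."""
    length, class_num, c_type, body = parse_object(buf)
    if class_num != SESSION_CLASS_NUM:
        raise ValueError("not a SESSION object")
    if c_type == 1 and length == 12:  # RFC 2205 IPv4 layout
        dest, proto, flags, port = struct.unpack("!4sBBH", body)
        return {
            "dest_address": socket.inet_ntoa(dest),
            "protocol_id": proto,
            "flags": flags,
            "dst_port": port,
        }
    if c_type == 7 and length == 16:  # RFC 3209 tunnel layout
        end_point, _zero, tunnel_id, ext_id = struct.unpack("!4sHHI", body)
        return {
            "tunnel_end_point": socket.inet_ntoa(end_point),
            "tunnel_id": tunnel_id,
            "extended_tunnel_id": ext_id,
        }
    raise ValueError("unsupported SESSION C-Type")


# Hypothetical example objects, built only to exercise the decoder.
raw_te = struct.pack("!HBB4sHHI", 16, 1, 7, socket.inet_aton("7.4.1.4"), 0, 5, 1)
raw_v4 = struct.pack("!HBB4sBBH", 12, 1, 1, socket.inet_aton("7.4.1.4"), 17, 0, 4000)
session_te = parse_session(raw_te)
session_v4 = parse_session(raw_v4)
```

The same Class-Num maps to two different body layouts, so a decoder has to branch on C-Type before it can locate any field. Restricting the supported objects to a fixed subset removes most of that branching.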

Processing of Path message
  [Figure: example network with nodes 5.7.1.1, 5.7.1.3, 7.4.1.2, 7.4.1.4, and 4.8.1.1 and interfaces Int.#1, Int.#3, Int.#5, and Int.#10, annotated with the data tables below. Each table maps an index to a return value.]

  Routing table
    Dest_IP_Addr   Next_Hop_Addr_User
    7.4.1.4        7.4.1.2

  Incoming Connectivity table
    Prev_IP_Addr_User   Prev. I/F ID   In. I/F ID
    5.7.1.3             1              5

  Outgoing Connectivity table
    Next_IP_Addr_User   Seq.#   Out. I/F ID
    7.4.1.2             1       3

  Outgoing CAC table
    Out. I/F ID   Avail. BW
    3             0101 0011 1111

  User/Control Mapping table
    Next_IP_Addr_User   Next_IP_Addr_Ctrl
    7.4.1.2             7.4.2.2

  State table
    Global Conn Ref   Ctrl plane info   User plane info   Traffic   State
    ...               ...               ...               ...       ...
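
The chain of table lookups for a Path message can be sketched in software with plain dictionaries loaded with the sample rows from this slide. Several details are assumptions made for illustration: the lookup order, the choice of which columns form each index, the reading of the Avail. BW value as a per-time-slot bitmap in which 1 means free, and the function and key names.

```python
# Sample rows taken from the slide; the key/value split is assumed.
routing = {"7.4.1.4": "7.4.1.2"}          # Dest_IP_Addr -> Next_Hop_Addr_User
incoming_conn = {("5.7.1.3", 1): 5}       # (Prev_IP_Addr_User, Prev I/F) -> In I/F
outgoing_conn = {("7.4.1.2", 1): 3}       # (Next_IP_Addr_User, Seq.#) -> Out I/F
cac = {3: "010100111111"}                 # Out I/F -> Avail. BW bitmap (assumed)
user_ctrl_map = {"7.4.1.2": "7.4.2.2"}    # Next_IP_Addr_User -> Next_IP_Addr_Ctrl
state = {}                                # Global Conn Ref -> connection state


def process_path(conn_ref, dest, prev_hop, prev_if, seq=1):
    """Run the lookups for one Path message and reserve one time slot."""
    next_user = routing[dest]
    in_if = incoming_conn[(prev_hop, prev_if)]
    out_if = outgoing_conn[(next_user, seq)]

    # Connection admission control: take the first free slot, if any.
    slots = cac[out_if]
    if "1" not in slots:
        raise RuntimeError("no bandwidth available on outgoing interface")
    slot = slots.index("1")
    cac[out_if] = slots[:slot] + "0" + slots[slot + 1:]

    next_ctrl = user_ctrl_map[next_user]
    state[conn_ref] = {
        "in_if": in_if,
        "out_if": out_if,
        "slot": slot,
        "next_ctrl": next_ctrl,
        "state": "path_received",
    }
    return next_ctrl, in_if, out_if, slot


result = process_path(conn_ref=1, dest="7.4.1.4", prev_hop="5.7.1.3", prev_if=1)
```

With the sample rows, the message for destination 7.4.1.4 arriving from 5.7.1.3 is forwarded to control address 7.4.2.2 and uses incoming interface 5 and outgoing interface 3. Each dictionary access stands in for one indexed table access in hardware.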

Architecture of the hardware signaling accelerator (FPGA)
  [Figure: block diagram with three stages (message parsing, message processing, message assembling). Blocks: Incoming Msg Buf, Object Dispatcher, Data Table Management, Resource Management with CAC Table, Retransmission Management, Message Assembler, and Register Bank. Interfaces: GbE interface, FIFO interface, cross-connect interface, TCAM and SRAM interfaces, SRAM I/F, and PCI local bus interface.]

Functional modules of the hardware signaling accelerator
  Incoming message buffer
    Two-level message buffering and FIFO interface
  Object dispatcher
    Two-level dispatching and distributed decoding (the TLV challenge)
  Data table management
    Table access arbiter and TCAM/SRAM interfaces
  Resource management
    Hierarchical resource allocator and CAC table
  Retransmission management
    Retransmission buffers, timers, exponential back-off

Architecture of the prototype board
  [Figure: board-level schematic centered on the FPGA (XC2V3000), which connects to:
    TCAM (IDT75P52100) over a command bus, a 72-bit request data bus, and a 21-bit index bus
    SRAM (IDT71V2556) and a message buffer SRAM
    FIFO (IDT72V36110) over a 36-bit data bus
    GbE MAC (L8104), SerDes (HDMP-1636A), optical transceiver, and duplex SC connector
    Switch fabric (VSC9182)
    PCI connector
    Clock oscillator (CO43S)]

Main on-board modules
  Hardware signaling accelerator: FPGA
    957-pin BGA, 6 separate clock signals, 3 I/O levels
    High-speed (100 MHz)
  1 Gbps signaling channel: GbE MAC + SerDes + optical transceiver
    Demonstrates a call handling rate of 250,000 calls/sec
    High-speed interface: 125 MHz
  Incoming message buffer: FIFO
    Hardware/software interface
  Data tables: TCAM and SRAM
  User plane device: switch fabric
    High-speed LVDS signals

Organization of the data tables
  [Figure: memory map of the data tables. The TCAM holds the 72-bit index entries and the SRAM holds the 32-bit return values. Tables shown: CAC table, Routing table (index: Dest_IP_Addr), Incoming Conn table (index: Pre_IP_Addr + Interface_ID), Outgoing Conn table (index: Next_IP_Addr + Interface_ID), User/Ctrl Mapping table (Next_IP_Addr_User and Next_IP_Addr_Ctrl), and State table. The State table index is 128 bits: Src_IP_Addr (32-bit) + LSP ID (16-bit) + Dest_IP_Addr (32-bit) + Tunnel ID (16-bit) + Extended Tunnel ID (32-bit); its return value is the state information. Example addresses shown: 128.238.34.117 and 128.143.74.193.]
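
As a rough illustration of how the five fields add up to a 128-bit State table index, the sketch below concatenates them into one integer. The field order follows the listing on the slide, but the bit layout (first field in the most significant bits) and the function name are assumptions, and the example values are arbitrary apart from the two addresses that appear on the slide.

```python
import ipaddress


def state_table_key(src_ip, lsp_id, dest_ip, tunnel_id, ext_tunnel_id):
    """Pack the five fields into a single 128-bit integer key."""
    fields = [
        (int(ipaddress.IPv4Address(src_ip)), 32),
        (lsp_id, 16),
        (int(ipaddress.IPv4Address(dest_ip)), 32),
        (tunnel_id, 16),
        (ext_tunnel_id, 32),
    ]
    key = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        key = (key << width) | value  # earlier fields end up in the high bits
    return key


# The roles assigned to the two addresses here are illustrative only.
key = state_table_key("128.238.34.117", 1, "128.143.74.193", 2, 3)
print(f"{key:032x}")
```

A 128-bit key spans two 72-bit TCAM entries' worth of index width, which is one reason the table organization matters for lookup latency.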

Clock and power distribution schemes
  [Figure: clock distribution scheme. The XC2V3000 FPGA uses DCMs and delay lines to derive and distribute clocks to the TCAM (IDT75P52100), two SRAMs (IDT71V2556A), the FIFO (WCLK/RCLK), the switch fabric (VSC9182, SYSCLKP/N), the GbE MAC (L8104, SCLK), and the SerDes (REFCLK, 125 MHz); PCLK (33 MHz) comes from the PCI bus. Clock frequencies shown: 33, 50, 78, 100, and 125 MHz.]
  [Figure: power distribution scheme, with two extra power supplies.]

Processing of signaling message: simulation results
  [Figure: simulation waveforms for three message types, showing activity on the TCAM interface, SRAM interface, CAC table, and switch fabric interface between "previous stage ready" and "end of processing".]
    Path message: Routing table, Incoming Conn table, Outgoing Conn table, U2C Mapping table, State table
    Resv message: read State table, program switch fabric
    PathTear message: read State table, release allocated time slot

Implementation and simulation results

  Implementation results
    Device     PCI core   Resource   Eq. gates   Max freq.
    XC2V3000   w/o PCI    12%        360,000     90 MHz
    XC2V3000   w/ PCI     21%        630,000     50 MHz

  Simulation results (@ 50 MHz)
    Message        Path   Resv   PathTear/ResvTear
    Clock cycles   40     32     19

In-band signaling and out-of-band signaling
  [Figure: (a) In-band signaling: switches SW1 to SW6 carry signaling and user traffic over the same links. (b) Out-of-band signaling: switches SW1 to SW6 carry user traffic, while signaling travels through a separate IP network of routers R1 to R6.]
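
Models for in-band signaling and out-of-band signaling
  [Figure: queueing models. (a) In-band signaling: a signaling processor with service rate \mu_{proc} feeds n transmitters with service rates \mu_{tx}^{IB,1}, ..., \mu_{tx}^{IB,n}; transmitter i is selected with probability q_i, where \sum_{i=1}^{n} q_i = 1. (b) Out-of-band signaling: a signaling processor with service rate \mu_{proc} feeds a single transmitter with service rate \mu_{tx}^{OOB}.]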
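
Set-up delay analysis
  Total delay = processing delay + network delay + transmission delay (retransmissions)

  \[
  \begin{aligned}
  E[T] &= \left(\frac{1}{\mu_{proc}} + T_n + E[T_{tx}]\right)\left((1-p) + 2p(1-p) + 3p^2(1-p) + \cdots\right) \\
       &\quad + T_0\left(p(1-p) + 3p^2(1-p) + 7p^3(1-p) + \cdots\right) \\
       &= \frac{1}{1-p}\left(\frac{1}{\mu_{proc}} + T_n + E[T_{tx}]\right) + \frac{p}{1-2p}\,T_0
  \end{aligned}
  \]

  Assuming T_0 = 3T_n and an M/D/1 queue for E[T_{tx}], we have

  \[
  E[T] = \frac{1}{1-p}\left(\frac{1}{\mu_{proc}} + \frac{1}{\mu_{tx}}\cdot\frac{1-\rho/2}{1-\rho}\right) + \left(\frac{1}{1-p} + \frac{3p}{1-2p}\right)T_n, \qquad \rho = \frac{\lambda}{\mu_{tx}}
  \]

  \lambda: aggregate signaling message arrival rate
  \mu_{proc}: service rate of the signaling processor
  \mu_{tx}: service rate of the signaling transmitter
  p: packet loss rate
  T_{tx}: transmission time
  T_n: one-way network delay
  T_0: initial time-out value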
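
The set-up delay model on this slide can be evaluated numerically. The sketch below is a minimal reading of it under stated assumptions: each attempt costs 1/mu_proc + T_n + E[T_tx], the time-out doubles after every loss (so k losses cost (2^k - 1) T_0 of waiting), E[T_tx] is the mean M/D/1 sojourn time, and T_0 defaults to 3 T_n. The function names and all numeric inputs are hypothetical and not taken from the slides.

```python
import math


def md1_sojourn(lam, mu):
    """Mean time in an M/D/1 system: service time plus mean queueing delay."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("queue is unstable (rho >= 1)")
    return 1 / mu + rho / (2 * mu * (1 - rho))


def setup_delay(lam, mu_proc, mu_tx, p, t_n, t0=None):
    """Closed-form expected delay; requires p < 0.5 for the back-off sum."""
    if not 0 <= p < 0.5:
        raise ValueError("loss rate must satisfy 0 <= p < 0.5")
    t0 = 3 * t_n if t0 is None else t0
    per_attempt = 1 / mu_proc + t_n + md1_sojourn(lam, mu_tx)
    return per_attempt / (1 - p) + t0 * p / (1 - 2 * p)


def setup_delay_series(lam, mu_proc, mu_tx, p, t_n, t0=None, terms=200):
    """Same quantity, summed term by term over the number of attempts."""
    t0 = 3 * t_n if t0 is None else t0
    per_attempt = 1 / mu_proc + t_n + md1_sojourn(lam, mu_tx)
    total = 0.0
    for k in range(1, terms + 1):
        prob = p ** (k - 1) * (1 - p)         # success on the k-th attempt
        waited = (2 ** (k - 1) - 1) * t0      # time-outs: T0 + 2*T0 + 4*T0 + ...
        total += prob * (k * per_attempt + waited)
    return total


# Hypothetical inputs: rates in messages per second, delays in seconds.
params = dict(lam=1000.0, mu_proc=250000.0, mu_tx=12500.0, p=0.01, t_n=0.0001)
closed = setup_delay(**params)
summed = setup_delay_series(**params)
print(f"closed form: {closed * 1e6:.2f} us, series: {summed * 1e6:.2f} us")
```

Agreement between the closed form and the term-by-term sum is a quick way to confirm that the factors 1/(1-p) and p/(1-2p) match the retransmission series.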

Numerical results
  [Figures: plots comparing in-band and out-of-band signaling in six scenarios.]
    Hardware signaling, metro area, μtx = μproc
    Hardware signaling, wide area, μtx = μproc
    Hardware signaling, metro area, μtx << μproc
    Hardware signaling, wide area, μtx << μproc
    Software signaling, metro area
    Software signaling, wide area

A sum-up of the comparison
  With hardware signaling accelerators
    In-band signaling is the way to go
    Network delays dominate the total delay
  With software signaling processors
    In wide area: in-band signaling
    In metro area: out-of-band signaling is a good choice

End-to-end circuits for large file transfers
  End-to-end circuits are only justifiable for large files
  Define a crossover file size \chi, so that files of size \chi or larger use circuits. The per-circuit utilization is given by

  \[
  u = \frac{E[X \mid X \ge \chi] / r_c}{E[T_{setup}] + E[X \mid X \ge \chi] / r_c}
  \]

  E[X]: average file size
  E[T_{setup}]: average call setup delay
  \chi: crossover file size
  r_c: circuit rate

  Assuming a 10 Mbps signaling link, a 100 Mbps data link, and 20 switches, the crossover file size needed to achieve 90% per-circuit utilization is:

  Crossover file size     HW sig. (4 us)   SW sig. (200 ms)
  Metro area (0.1 ms)     40 KB            80 MB
  Wide area (50 ms)       330 KB           80 MB
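
The per-circuit utilization formula is easy to evaluate and to invert. The sketch below computes the utilization for a given mean transfer size and, going the other way, the mean transfer size needed to reach a target utilization. The function names and the example setup delay are hypothetical; the slides do not spell out how the end-to-end setup delay is composed from the per-switch and propagation delays, so the table values are not reproduced here.

```python
def per_circuit_utilization(mean_size_bits, circuit_rate_bps, setup_delay_s):
    """u = transfer time / (setup delay + transfer time)."""
    transfer_s = mean_size_bits / circuit_rate_bps
    return transfer_s / (setup_delay_s + transfer_s)


def required_mean_size_bits(target_u, circuit_rate_bps, setup_delay_s):
    """Mean size E[X | X >= chi] needed to reach the target utilization."""
    if not 0 < target_u < 1:
        raise ValueError("target utilization must be strictly between 0 and 1")
    return target_u / (1 - target_u) * setup_delay_s * circuit_rate_bps


# Hypothetical example: 100 Mbps circuit, 0.5 s end-to-end setup delay.
rate_bps = 100e6
setup_s = 0.5
needed_bits = required_mean_size_bits(0.9, rate_bps, setup_s)
achieved_u = per_circuit_utilization(needed_bits, rate_bps, setup_s)
print(f"mean size needed: {needed_bits / 8 / 1e6:.1f} MB, utilization: {achieved_u:.2f}")
```

At 90% utilization the transfer time must be nine times the setup delay, so every reduction in setup delay lowers the required file size by the same factor.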

Fractional offered load
  Assuming the file size X follows a Pareto distribution with scale k and shape \alpha, and writing \chi for the crossover file size, define the fractional offered load as

  \[
  \rho' = \frac{P(X \ge \chi)\, E[X \mid X \ge \chi]}{E[X]} = \left(\frac{k}{\chi}\right)^{\alpha - 1}
  \]

  \chi      \rho' (% of offered load)
  40 KB     81%
  330 KB    71%
  80 MB     51%

  With hardware-accelerated signaling
    In metro area, 81% of the offered load can be transferred through end-to-end circuits
    In wide area, 71% of the offered load can be transferred through end-to-end circuits
  With software signaling, this number is 51%
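
The fractional offered load for a Pareto file-size distribution can be computed either from its closed form or from its three ingredients, which makes a convenient consistency check. The sketch below assumes a Pareto distribution with scale k and shape alpha > 1. The parameter values are hypothetical, since the slides do not state the ones behind the 81%, 71%, and 51% figures.

```python
def fractional_load(chi, k, alpha):
    """Closed form: share of offered load carried by files of size >= chi."""
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    if chi <= k:
        return 1.0  # every file is at least k, so all load qualifies
    return (k / chi) ** (alpha - 1)


def fractional_load_from_parts(chi, k, alpha):
    """Same quantity as P(X >= chi) * E[X | X >= chi] / E[X]."""
    tail_prob = (k / chi) ** alpha             # Pareto tail probability
    cond_mean = alpha * chi / (alpha - 1)      # mean of the tail beyond chi
    mean = alpha * k / (alpha - 1)             # overall mean
    return tail_prob * cond_mean / mean


# Hypothetical parameters: sizes in bytes, heavy-tailed shape close to 1.
k_bytes, alpha = 1000.0, 1.1
for chi_bytes in (40e3, 330e3, 80e6):
    share = fractional_load(chi_bytes, k_bytes, alpha)
    print(f"chi = {chi_bytes:>12,.0f} B -> {share:.1%} of offered load")
```

Because the exponent alpha - 1 is small for heavy-tailed distributions, the share of load carried by large files falls slowly as the crossover size grows, which is the effect the slide's percentages illustrate.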

Hardware-accelerated signaling and network survivability
  Two approaches for network survivability
    Protection requires pre-allocated resources and reacts to failure rapidly
      SONET APS requires 100% resource redundancy (1+1) and can recover in 50 ms
    Restoration dynamically sets up a secondary path after a failure: less resource redundancy but longer recovery delay
  Hardware-accelerated signaling
    Less resource redundancy and acceptable recovery delay (<200 ms)
    A sample network with 13 nodes and 22 links: recovers from one link failure in 200 ms with 9% extra resources

Conclusions and future work
  Hardware-accelerated signaling is feasible, and our implementation demonstrates a 100x-1000x speedup over software implementations
  Applications such as large file transfer and network restoration can benefit from hardware signaling and a well-devised signaling transport scheme
  Future work
    Finish the design and testing of the prototype board
    New architectures and applications that can fully utilize the benefits of hardware-accelerated signaling
Thank you!