Federico Alessio, CERN
Zbigniew Guzik, IPJ, Swierk, Poland
Richard Jacobsson, CERN
A 40 MHz Trigger-free Readout Architecture
for the LHCb experiment
16th IEEE-NPSS Real Time Conference, 10-15 May 2009, Beijing, China
Future of LHCb

LHCb @ LHC
Instantaneous luminosity at the IP: tunable from 2x10^32 cm^-2 s^-1 to 5x10^32 cm^-2 s^-1 (a factor 50 below the nominal LHC luminosity)
Expected ∫L = 10 fb^-1 collected after 5 years of operation
Probe/measure New Physics at the 10% level of sensitivity
Measurements limited by statistics and by the detector itself, NOT BY THE LHC
See also: “The LHCb Readout System and Real-Time Event Management”, TDA2-1, Thursday, 8h30

LHCb @ S-LHC
Collect ∫L = 100 fb^-1: a factor 10 increase in the data sample
In a reasonable time, probe New Physics down to the percent level
Increase the luminosity by a factor 10 at LHCb, up to 2x10^33 cm^-2 s^-1
Assuming the same bunch structure: 30 MHz effective interaction rate at S-LHCb vs. 10 MHz at LHCb
1 MHz bb-pair rate at S-LHCb vs. 100 kHz at LHCb
How to survive?

Pile-up problem: the current LHCb was not designed for multiple interactions per crossing
<N> = 0.5 @ 2x10^32 cm^-2 s^-1 and <N> = 4 @ 20x10^32 cm^-2 s^-1
Higher radiation damage over time
Spill-over not completely minimized
First-level trigger limited for hadronic modes at > 2x10^32 cm^-2 s^-1: 25% efficiency vs. 75% for muonic modes

How?
Original LHCb performance as a baseline; sub-detectors to be replaced with new technologies:
More radiation hard
Reduced spill-over
Improved granularity
Continuous 40 MHz trigger-free readout architecture:
all detector data passed through the readout network
all detector data available to the High-Level Trigger (HLT)
In practice: increase the hadron trigger efficiency by at least a factor 2
At S-LHCb: 1 MHz bb-pair rate, trigger-free
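To give a feeling for what the quoted <N> values mean for pile-up, the following sketch computes the probability of multiple interactions in one crossing. It assumes, as an illustration only, that the number of interactions per crossing is Poisson-distributed with mean <N>; the function names are hypothetical and not part of any LHCb software.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of exactly n interactions for a Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def prob_at_least(k: int, mean: float) -> float:
    """Probability of k or more interactions in one crossing."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(k))


# <N> values quoted on the slide (assumed here to be Poisson means).
for mean in (0.5, 4.0):
    p_any = prob_at_least(1, mean)
    p_multi = prob_at_least(2, mean)
    print(
        f"<N> = {mean}: P(>=1) = {p_any:.3f}, P(>=2) = {p_multi:.3f}, "
        f"fraction of non-empty crossings with pile-up = {p_multi / p_any:.3f}"
    )
```

Under this assumption, moving from <N> = 0.5 to <N> = 4 turns pile-up from a minority of the non-empty crossings into the large majority of them.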
LHCb Readout System, Upgraded: Rethink/Redraw/Adapt/Upgrade/Replace

[Diagram of the LHCb readout system. Recoverable labels: Detector (VELO, ST, OT, RICH, ECal, HCal, Muon), FE Electronics, Front-End, Readout Boards, READOUT NETWORK (switches), Event building, HLT farm and MON farm (CPUs), Timing & Fast Control System, LHC clock, L0 trigger, MEP Request]
Architectures, Old vs New

No L0 trigger
Point-to-point bidirectional high-speed optical links
Same technology and protocol type for readout, TFC and throttle
Number of links to the FE reduced by relaying ECS and TFC information via the ROB

[Diagram, old architecture. Recoverable labels: FE, ROB, FARM, TFC, ECS, L0 TRIGGER, LHC CLOCK; links DATA, TFC, THROTTLE, TFC DATA BANK, EVENT REQUESTS]
[Diagram, new architecture. Recoverable labels: S-FE, S-ROB, S-FARM, S-TFC, S-ECS, S-LHC CLOCK; links DATA, TFC DATA BANK, EVENT REQUESTS, "TFC, DATA, ECS", "TFC, THROTTLE"]
Overall Requirements

Need to define protocols. Very likely the FE-ROB readout link and its protocol will be based on the CERN GigaBit Transceiver (GBT).
Need to define buffer sizes and a truncation scheme compliant with the worst possible scenario (large consecutive events that could overflow the memories).
Need to fully control the phase of the recovered clock at the FE: the clock phase must be reproducible each time the system is switched off and on.
The jitter of the reconstructed clock must be very small (< 10 ps RMS).
Need to control the rate in order to allow a “staged” installation.
Partitioning is crucial for parallel stand-alone tests and sub-detector development (test-bench support).
Implications for Front-End

The S-FE records and transmits data at 40 MHz, via an optical link at 4.8 Gb/s (3.2 Gb/s of data)
Zero suppression must be performed in the rad-hard FE
Asynchronous data transfer: data has to be tagged with identifiers in the header and realigned in the Readout Boards
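The realignment step mentioned above can be sketched as follows. This is a minimal illustration, not the S-ROB firmware logic: it assumes each fragment carries a link number and an event identifier in its header, and that a Readout Board collects one fragment per input link before releasing an event. The `Fragment` and `Realigner` names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    link_id: int    # input link the fragment arrived on (0 .. n_links - 1)
    event_id: int   # identifier carried in the fragment header
    payload: bytes  # zero-suppressed data


class Realigner:
    """Collects asynchronous fragments and releases complete events."""

    def __init__(self, n_links: int) -> None:
        self.n_links = n_links
        self._pending: dict = defaultdict(dict)  # event_id -> {link_id: payload}

    def push(self, frag: Fragment) -> Optional[tuple]:
        """Store one fragment; return (event_id, payloads) once all links reported."""
        slot = self._pending[frag.event_id]
        slot[frag.link_id] = frag.payload
        if len(slot) < self.n_links:
            return None
        del self._pending[frag.event_id]
        # Payloads are returned in link order, whatever the arrival order was.
        return frag.event_id, [slot[i] for i in range(self.n_links)]


realigner = Realigner(n_links=2)
print(realigner.push(Fragment(0, 7, b"a")))  # event 7 still incomplete
print(realigner.push(Fragment(1, 8, b"x")))  # event 8 arrives out of order
print(realigner.push(Fragment(1, 7, b"b")))  # event 7 is now complete
```

A real implementation would also need a timeout or truncation policy for events whose fragments never arrive, which this sketch leaves out.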
Courtesy Ken Wyllie, LHCb
With non-zero-suppressed (NZS) data, an event size of 400 kB @ 40 MHz gives ~16 TB/s!
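The figures quoted for the front-end link and the NZS data volume can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the slide; the link count at the end is an illustrative lower bound that ignores headers and inefficiencies, not a design figure.

```python
BUNCH_CLOCK_HZ = 40_000_000        # 40 MHz readout
LINK_RATE_BPS = 4_800_000_000      # 4.8 Gb/s line rate
PAYLOAD_RATE_BPS = 3_200_000_000   # 3.2 Gb/s of data
NZS_EVENT_BYTES = 400_000          # 400 kB non-zero-suppressed event

# Bits available on one link per bunch crossing.
frame_bits = LINK_RATE_BPS // BUNCH_CLOCK_HZ
payload_bits = PAYLOAD_RATE_BPS // BUNCH_CLOCK_HZ

# Total NZS throughput if every crossing were read out unsuppressed.
nzs_bytes_per_s = NZS_EVENT_BYTES * BUNCH_CLOCK_HZ

# Ceiling division: links needed to carry that volume as pure payload.
min_links = -(-nzs_bytes_per_s * 8 // PAYLOAD_RATE_BPS)

print(f"bits per crossing on one link: {frame_bits} total, {payload_bits} payload")
print(f"NZS throughput: {nzs_bytes_per_s / 1e12:.0f} TB/s")
print(f"links needed for NZS readout (lower bound): {min_links}")
```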
[Diagram: S-FE logical scheme, with ADC, ZERO SUPPRESS and DERANDOMIZING BUFFER blocks]
Timing and Fast Control (1)

The readout system requires timing, synchronization and various synchronous and asynchronous commands:
Receive, distribute and align the LHC clock and revolution frequency to the readout electronics
Transmit synchronous reset commands and calibration sequences, and control the latency of commands
Back-pressure mechanism from the S-ROBs to handle network congestion:
1. Effectively, throttle the readout rate
2. Possibly implement an “intelligent” throttle mechanism, capable of distinguishing interesting physics events locally in each S-ROB
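A plain (non-"intelligent") back-pressure scheme of the kind listed under item 1 can be sketched as below. The watermark values, the hysteresis and the OR-combination of the throttle bits are assumptions made for the illustration; the slides do not specify how the throttle is generated or combined.

```python
class RobThrottle:
    """Throttle flag of one S-ROB, driven by its buffer occupancy.

    The flag is raised at the high-water mark and only released at the
    low-water mark, so it does not toggle on every small fluctuation.
    """

    def __init__(self, high_water: int, low_water: int) -> None:
        if not 0 <= low_water < high_water:
            raise ValueError("need 0 <= low_water < high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.asserted = False

    def update(self, occupancy: int) -> bool:
        if occupancy >= self.high_water:
            self.asserted = True
        elif occupancy <= self.low_water:
            self.asserted = False
        return self.asserted


def readout_enabled(throttle_bits) -> bool:
    """Central decision: stop the readout while any S-ROB throttles."""
    return not any(throttle_bits)


rob = RobThrottle(high_water=8, low_water=4)
for occupancy in (5, 8, 6, 4):
    print(occupancy, rob.update(occupancy))
```

An "intelligent" variant as in item 2 would replace the occupancy test with a local decision on the event content, which is outside the scope of this sketch.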
[Diagram: new architecture with S-FE, S-ROB, S-FARM, S-TFC and S-ECS blocks and S-LHC CLOCK input; links DATA, TFC DATA BANK, EVENT REQUESTS and "TFC, DATA, ECS"]
Timing and Fast Control (2)

The farm has to grow in size, speed and bandwidth
Destination control for the event packets, so that the S-ROBs know where (to which IP address) to send each event
Request mechanism (EVENT REQUESTS) to let the destination controller in the TFC system know whether a node is available. Such a readout scheme is called a “push protocol with a passive pull”
A data bank has to contain information about the identity of an event and its trigger source. This information is added to each event (TFC DATA BANK)
The new TFC system (prototype of the S-TFC) has to be ready well before the rest of the electronics, to allow development and testing and to validate conformity with the overall specifications
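The "push protocol with a passive pull" can be illustrated with a small credit-based destination controller. This is a sketch under stated assumptions: farm nodes announce availability through event requests, each request counts as one credit, and credits are consumed in arrival order. The class name and the IP addresses are hypothetical.

```python
from collections import deque
from typing import Optional


class DestinationController:
    """Assigns each event to a farm node that has declared itself available."""

    def __init__(self) -> None:
        self._credits: deque = deque()  # one entry per outstanding event request

    def event_request(self, node_address: str, n_events: int = 1) -> None:
        """Passive pull: a farm node announces it can take n_events more events."""
        for _ in range(n_events):
            self._credits.append(node_address)

    def assign(self, event_id: int) -> Optional[tuple]:
        """Push: choose the destination that the S-ROBs will send this event to.

        Returns None when no node has credit, in which case the readout
        would have to be throttled.
        """
        if not self._credits:
            return None
        return event_id, self._credits.popleft()


controller = DestinationController()
controller.event_request("10.0.0.1", n_events=2)
controller.event_request("10.0.0.2")
for event_id in range(1, 5):
    print(event_id, controller.assign(event_id))
```

The returned pair corresponds to the kind of information the slide places in the TFC data bank: the event identity plus the destination the S-ROBs should use.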
[Diagram: new architecture with S-FE, S-ROB, S-FARM, S-TFC and S-ECS blocks and S-LHC CLOCK input; links DATA, TFC DATA BANK, EVENT REQUESTS and "TFC, DATA, ECS"]
S-TFC Architecture (i.e. the new s-heartbeat of the LHCb experiment)

[Diagram: the readout system diagram (Detector, FE Electronics, Readout Boards, READOUT NETWORK, HLT farm, MON farm, TFC System) together with the S-TFC architecture. Recoverable labels:
S-TFC Master: Master FPGA (STRATIX IV GX) and Slave FPGA (NIOS II on CYCLONE III); S-LHC Timing & Info, Clock Fanout (CLK), TFC SERVER, S-ECS and S-FARM via PHYs; TFC-Master instantiations (x6) with TFC-Master logic; programmable switch layer (partitioning); built-in GX transceivers layer
Links S-TFC Master to S-TFC Interface: TFC and throttle, 2.4-3.0 Gb/s optical, ~20 m distance, #links = #LHCb sub-systems
S-TFC Interface: Master FPGA (ARRIA II GX) with FAN-OUT/FAN-IN logic, plus optional TFC Master logic and switch logic; built-in SERDES layer; throttle from S-ROBs @ 1.6 Gb/s via SERDES; TFC to S-ROBs @ 2.4-3.0 Gb/s via GX transceiver and electrical FAN-OUT; #S-ROB/crate links
S-ROB (crate of e.g. 20): readout logic, TFC ENCODER/DECODER, SERDES; TFC + CONF and THROTTLE links
S-FE (single slice): DATA + MON over CERN-GBT]
S-TFC Protocols

S-TFC Master <-> S-TFC Interface link
TFC control fully synchronous: 60 bits @ 40 MHz = 2.4 Gb/s (max 75 bits @ 40 MHz = 3.0 Gb/s)
1. Reed-Solomon encoding used on the TFC links for maximum reliability (header ~16 bits) (ref. CERN-GBT)
2. For asynchronous data, the TFC info must carry the Event ID
Throttle (“trigger”) protocol
1. Must be synchronous (currently asynchronous); the protocol will require alignment
TFC control protocol incorporated on the link between S-FE and S-ROB (i.e. CERN GBT)

S-TFC Interface <-> S-ROB
Copper or backplane technology (in practice 20 HI-CAT bidirectional links)
TFC synchronous control protocol: same as S-TFC Master <-> S-TFC Interface
One GX transmitter with external transmitter 20x fan-out (PHYs, electrical)
Throttle (“trigger”) protocol using 20x SERDES interfaces at < 1.6 Gb/s

TFC word: EVENT ID (4-12 bits) | TFC information (40-32 bits) | Reed-Solomon FEC (16 bits)
Throttle word: THROTTLE information (20 bits) | OTHERS | EVENT ID (4-12 bits) | Reed-Solomon FEC (16 bits)
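The field widths of the TFC word add up to the 60 bits per bunch crossing quoted for the synchronous link: a 4-bit Event ID leaves 40 bits of TFC information, a 12-bit Event ID leaves 32. The sketch below packs and unpacks such a word. The field order (Event ID in the most significant bits) is an assumption, and the FEC value is passed in as an opaque number; no Reed-Solomon code is computed here.

```python
WORD_BITS = 60          # TFC word per 40 MHz clock cycle (from the slide)
FEC_BITS = 16           # Reed-Solomon FEC field width (from the slide)
CLOCK_HZ = 40_000_000


def _info_bits(event_id_bits: int) -> int:
    """Width left for TFC information once Event ID and FEC are placed."""
    if not 4 <= event_id_bits <= 12:
        raise ValueError("event ID width must be between 4 and 12 bits")
    return WORD_BITS - FEC_BITS - event_id_bits


def pack_tfc_word(event_id: int, tfc_info: int, fec: int, event_id_bits: int = 12) -> int:
    """Pack the three fields into one integer (assumed order: ID | info | FEC)."""
    info_bits = _info_bits(event_id_bits)
    fields = (("event_id", event_id, event_id_bits),
              ("tfc_info", tfc_info, info_bits),
              ("fec", fec, FEC_BITS))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (event_id << (info_bits + FEC_BITS)) | (tfc_info << FEC_BITS) | fec


def unpack_tfc_word(word: int, event_id_bits: int = 12) -> tuple:
    """Inverse of pack_tfc_word: returns (event_id, tfc_info, fec)."""
    info_bits = _info_bits(event_id_bits)
    fec = word & ((1 << FEC_BITS) - 1)
    tfc_info = (word >> FEC_BITS) & ((1 << info_bits) - 1)
    event_id = word >> (info_bits + FEC_BITS)
    return event_id, tfc_info, fec


def line_rate_bps(word_bits: int) -> int:
    """Payload rate of a link carrying one word per bunch clock cycle."""
    return word_bits * CLOCK_HZ


word = pack_tfc_word(event_id=0xABC, tfc_info=0x12345678, fec=0xFFFF)
print(hex(word), unpack_tfc_word(word))
print(line_rate_bps(60), line_rate_bps(75))
```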
Reaching the requirements: phase control

Use of commercial electronics:
Clock fully recovered from the data transmission (lock-to-data mode)
Phase adjusted via a register on the PLL
Jitter mostly due to transmission over fibres; could be minimized at the sending side

[Diagram: FPGA S-TFC Master (TX[0], TX[1]), FPGA S-TFC Interface (RX[0], RX[1]), FPGA S-ROB; Word Aligner (WA) with bit-slip output and number of bit-slips, PLL, external clock, clock output; timing sketch of bits X-2 to X+1 with the clock phase shifted left or right]

1. Use a commercial or custom-made Word-Aligner output
2. Scan the phase of the clock within the “eye diagram”
Still investigating feasibility and fine precision
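Step 2, scanning the clock phase within the eye diagram, can be illustrated with a simple selection rule: measure an error count at each PLL phase setting and pick the centre of the widest error-free window. The sketch assumes such per-step error counts are available (how they are measured is not described in the slides) and, for simplicity, treats the scan as non-cyclic.

```python
def best_phase(error_counts) -> int:
    """Return the index at the centre of the widest run of zero-error steps."""
    best_start, best_len = None, 0
    start = None
    # A non-zero sentinel at the end closes a window that reaches the last step.
    for i, errors in enumerate(list(error_counts) + [1]):
        if errors == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        raise ValueError("no error-free phase setting found")
    return best_start + (best_len - 1) // 2


# Hypothetical scan: errors at both edges of the bit, clean in the middle.
scan = [5, 2, 0, 0, 0, 0, 0, 3, 9]
print(best_phase(scan))
```

Storing the chosen setting and re-applying it at power-up is one way to approach the reproducibility requirement, although the slides state that feasibility and precision are still under investigation.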
Simulation

Full simulation framework to study buffer occupancies, memory sizes, latency, configuration and logical blocks
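As a toy illustration of the kind of question such a framework answers, the sketch below tracks the occupancy of a single derandomizing buffer. It is not the simulation framework of the talk: the exponential event-size distribution, the fixed drain rate and the truncate-on-overflow policy are all assumptions chosen to keep the example short.

```python
import random


def simulate_buffer(n_crossings: int, depth_words: int,
                    drain_per_crossing: int, mean_words: float,
                    seed: int = 1) -> tuple:
    """Return (maximum occupancy, number of truncated events)."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    occupancy = 0
    max_occupancy = 0
    truncated = 0
    for _ in range(n_crossings):
        # Assumed event size: exponential with the given mean, in words.
        size = int(rng.expovariate(1.0 / mean_words))
        free = depth_words - occupancy
        if size > free:
            truncated += 1   # event does not fit: keep only what fits
            size = free
        occupancy += size
        max_occupancy = max(max_occupancy, occupancy)
        # The output link drains a fixed number of words per crossing.
        occupancy = max(0, occupancy - drain_per_crossing)
    return max_occupancy, truncated


for depth in (32, 64, 128):
    print(depth, simulate_buffer(100_000, depth, drain_per_crossing=10, mean_words=8.0))
```

Scanning the depth in this way shows how the truncation count falls as the buffer grows, which is the trade-off behind the requirement on buffer sizes and truncation scheme.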
Summary

New approach towards a 40 MHz trigger-free architecture
Evaluated the old system and carried over the experience
Use of point-to-point optical link technology for the entire readout system
Maximum level of flexibility reached by using FPGA-based boards: no need for complex routing, no need for a large number of “different” boards; GX transceivers as IP cores from Altera
No First-Level Trigger and no direct link to the FE from the TFC
Essential to fully control the phase and latency of the clock and TFC info; validation with a prototype is underway
TFC system prototypes must be ready before any other
Developing a full simulation framework
Thanks for your attention
Backup
Intro: Giving out Numbers

From today (2009) to the near future (2013):
Clock rate of 40 MHz, effective event rate of 10 MHz
Expected to collect 10 fb^-1, which allows a wide range of analyses with high sensitivity to new physics; spectacular progress foreseen in heavy-flavour physics
Readout Supervisor based on 4 FPGAs, fully programmable (total of ~25k logic elements) and customizable (40/80 MHz clock speed and output based on 1 Gbit/s cards)
Readout network based on optical links (200 MB/s, ~400 links)
~16000 CPU cores foreseen for the LHCb Online Farm; ~1000 3 GHz Intel Harpertown quad-cores (~4500 individual cores) at present
Storage system of > 50 TB at 400/500 MB/s
Uninterrupted readout of data at 1 MHz, an effective reduction by a factor 10 in selecting events
Event size of 3.5x10^4 bytes “dumped” to the Grid
Full (and 100% reliable) Readout Control System in place (ETM PVSS II), able to start, configure and control some ~20000 elements across the FARM, FEE and ROBs
S-TFC Master, specs

Board with one big central FPGA (Altera Stratix IV GX, or alternatively Stratix II GX for R&D)
Instantiate a set of TFC Master cores to guarantee partitioning control for sub-detectors
The TFC switch is a programmable patch fabric, a layer in the FPGA: no need for complex routing, no need for “discrete” electronics
Shared functionalities between instantiations (fewer logic elements)
More I/O interfaces based on bidirectional transceivers; their number depends on the number of S-ROB crates
No direct links to the FE
Common server that talks directly to each instantiation: TCP/IP server in NIOS II
Flexibility to implement (and modify) any protocol: GX transceivers as IP cores from Altera
Bunch structure (predicted/measured) rate control
State machines for sequencing resets and calibrations
Information exchange interface with the LHC
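The partitioning idea, a programmable patch fabric that connects each sub-detector to one of the TFC Master instantiations, can be modelled in software as a routing table. This is only a conceptual sketch: in the design described here the switch is a layer inside the FPGA, and the `PartitionSwitch` class and its methods are hypothetical names.

```python
class PartitionSwitch:
    """Routes each sub-detector to one of several TFC Master instances."""

    def __init__(self, n_masters: int = 6) -> None:
        self.n_masters = n_masters
        self._route: dict = {}  # sub-detector name -> master index

    def assign(self, sub_detector: str, master: int) -> None:
        """Patch a sub-detector to a master; re-assigning moves it."""
        if not 0 <= master < self.n_masters:
            raise ValueError(f"master index must be in 0..{self.n_masters - 1}")
        self._route[sub_detector] = master

    def partition(self, master: int) -> list:
        """Sub-detectors currently driven by the given master instance."""
        return sorted(sd for sd, m in self._route.items() if m == master)

    def fan_out(self, master: int, command: str) -> dict:
        """Deliver a command only to the sub-detectors of one partition."""
        return {sd: command for sd in self.partition(master)}


switch = PartitionSwitch()
switch.assign("VELO", 0)
switch.assign("RICH", 0)
switch.assign("Muon", 1)       # Muon runs stand-alone under master 1
print(switch.fan_out(0, "RESET"))
switch.assign("VELO", 1)       # move VELO into the other partition
print(switch.partition(0), switch.partition(1))
```

Because the mapping is just a table, changing the partitioning needs no recabling, which matches the "no need for complex routing" point in the specification list.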
S-TFC Interface, specs

Board with an FPGA entirely devoted to fanning out TFC information and fanning in throttle information
Controlled clock recovery
Shared network for throttling (intelligent) and TFC distribution
All links bidirectional:
1 link to the S-TFC Master, 2.4-3.0 Gb/s, optical
1 link per S-ROB, 20 max per board (full crate)
Technology for the S-ROB links could be backplane (e.g. xTCA) or copper HI-CAT
Flexible protocol: compatible with the flexibility of the S-TFC Master
We will provide the TFC transceiver block for the S-ROBs' FPGA, to bridge data to the FE through the S-FE <-> S-ROB readout link
For stand-alone test benches, the Super-TFC Interface would do the work of a single TFC Master instantiation