Data Acquisition at CBM (FAIR)
Walter F.J. Müller, GSI, Darmstadt, for the CBM Collaboration
IRTG Lecture Week 2007 Bergen, Norway, 11-15 April 2007
11-15 April 2007 IRTG Lecture Week 2007, Bergen Norway -- Walter F.J. Müller, GSI
Outline
• FAIR (very briefly)
• CBM (briefly)
  • observables
  • setup
• FEE/DAQ/Trigger requirements/challenges
• 3 case studies
  • n (cbm) -XYTER
  • InfiniBand
  • Cell and SIMDization
Facility for Antiproton and Ion Research
storage and cooler rings
• beams of rare isotopes
• e-A collider
• 10^11 stored and cooled antiprotons, 0.8 - 14.5 GeV
primary beams
• 5∙10^11/s; 1.5-2 GeV/u; 238U28+
• factor 100-1000 increased intensity
• 4x10^13/s 90 GeV protons
• 2x10^9/s 238U 35 GeV/u (Ni 45 GeV/u)
secondary beams
• rare isotopes 1.5 - 2 GeV/u; factor 10 000 increased intensity
• antiprotons 3(0) - 30 GeV
accelerator technical challenges
• rapidly cycling superconducting magnets
• high energy electron cooling
• dynamical vacuum, beam losses
Research Programs
Rare isotope beams: nuclear structure and nuclear astrophysics
• nuclear structure far off stability
• nucleosynthesis in stars and supernovae
Beams of antiprotons: hadron physics
• quark-confinement potential
• search for gluonic matter and hybrids
• hypernuclei
Nucleus-nucleus collisions: compressed baryonic matter
• baryonic matter at highest densities (neutron stars)
• phase transitions and critical endpoint
• in-medium properties of hadrons
Short-pulse heavy ion beams: plasma physics
• matter at high pressure, densities, and temperature
• fundamentals of nuclear fusion
Atomic physics, FLAIR, and applied research
• highly charged atoms
• low energy antiprotons
• radiobiology
Baseline Technical Report
Volume 1: Executive Summary
Volume 2: Accelerator and Scientific Infrastructure
Volume 3A: Experiment Proposals on QCD Physics
  3.1 CBM
Volume 3B: Experiment Proposals on QCD Physics
  3.2 PANDA, 3.3 PAX, 3.4 ASSIA
Volume 4: Experiment Proposals on Nuclear Structure & Astro Physics (NUSTAR)
  4.1 LEB-SuperFRS, 4.2 HISPEC/DESPEC, 4.3 MATS, 4.4 LASPEC, 4.5 R3B, 4.6 ILIMA, 4.7 AIC, 4.8 ELISe, 4.9 EXL
Volume 5: Experiment Proposals on Atomic, Plasma & Applied Physics (APPA)
  5.1 SPARC, 5.2 HEDgeHOB, 5.3 WDM, 5.4 FLAIR, 5.5 BIOMAT
Volume 6: Civil Construction and Safety
Official project description: 6 volumes with more than 3500 pages and more than 2600 authors
Status
• Construction cost of the FAIR project: ~ 1 billion €
• International project: 25% from foreign partners
• German Federal Government has approved construction budget
• FAIR is in the budget plan for the next 10 years
• MoU to build and operate FAIR signed by 14 states: AU, CN, DE, ES, FI, FR, GB, GR, IN, IT, PL, RO, RU, SE
• FAIR Joint Core Team established
• Next steps (within 12 months):
  • Joint declarations with partner states
  • Conclude negotiations with partner states
  • Signature of FAIR Convention and Final Act
  • Formation of FAIR GmbH
Construction
Time lines: start 2008, finish 2015
CBM
CBM Physics Case
Compressed Baryonic Matter @ FAIR: high μB, moderate T
Searching for the landmarks of the QCD phase diagram:
• first order deconfinement phase transition
• chiral phase transition
• QCD critical point
Investigate:
• A+A 10-45 AGeV
• p+A 10-90 GeV
Physics program complementary to ALICE
CBM Physics Topics and Observables
Equation of state at high ρB
• collective flow of hadrons
• particle production at threshold energies
  measure: D0, D±
Deconfinement phase transition at high ρB
• excitation function and flow of strangeness
  measure: K, Λ, Σ, Ξ, Ω
• excitation function and flow of charm
  measure: J/ψ, ψ', D0, D±, Λc
• sequential melting of J/ψ and ψ', charmonium suppression
  measure: J/ψ, ψ'
QCD critical point
• excitation function of event-by-event fluctuations
  measure: π, K
Onset of chiral symmetry restoration at high ρB
• in-medium modifications of hadrons
  measure: ρ, ω, φ → e+e- or μ+μ-
CBM Physics Book in preparation
CBM Detector Requirements
measure: π, K
measure: K, Λ, Σ, Ξ, Ω
measure: D0, D±, Ds, Λc
measure: J/ψ, ψ' → e+e- or μ+μ-
measure: ρ, ω, φ → e+e- or μ+μ-
measure: γ
Hadron identification
Vertex detector
Good e/π separation
Good μ/π separation
Low cross sections
→ High rates
→ Selective Triggers
Hadrons
Leptons
Photons
Simulations indicate: J/ψ, ψ' better in μ+μ-; ρ, ω, φ better in e+e-
→ try to do both
CBM Layout (e-mode)
[Figure: detector layout along the beam, acceptance 50 mrad to 500 mrad, scale marks 1 m and 10 m; detectors shown: MVD, STS, RICH, TRD, TOF, ECAL, PSD]
STS: Silicon tracker
MVD: Micro vertex detector
RICH: Ring Imaging Cherenkov detector
TRD: Transition Radiation Detector
TOF: Time-of-flight (RPC)
ECAL: Photon arms
PSD: Projectile spectator detector
CBM Layout (μ-mode)
[Figure: detector layout along the beam, acceptance 50 mrad to 500 mrad, scale marks 1 m and 10 m; MUCH shown in place of RICH; detectors shown: MVD, STS, MUCH, TRD, TOF, ECAL, PSD]
STS: Silicon tracker
MVD: Micro vertex detector
MUCH: Muon Chambers
TRD: Transition Radiation Detector
TOF: Time-of-flight (RPC)
ECAL: Photon arms
PSD: Projectile spectator detector
CBM Layout {ongoing tuning}
[Figure: detector layout along the beam, acceptance 50 mrad to 500 mrad, scale marks 1 m and 10 m; detectors shown: MVD, STS, RICH, TRD, ITS, TOF, ECAL, PSD]
STS: Silicon tracker
MVD: Micro vertex detector
RICH: Ring Imaging Cherenkov detector
TRD: Transition Radiation Detector
ITS: Intermediate Tracker
TOF: Time-of-flight (RPC)
ECAL: Photon arms
PSD: Projectile spectator detector
The CBM Experiment
Aim: optimize the setup to include both electron and muon ID (not necessarily simultaneously)
• tracking, momentum determination, vertex reconstruction: radiation hard silicon pixel/strip detectors (MVD + STS) in a magnetic dipole field
• electron ID: RICH & TRD, π suppression 10^4
• muon ID: absorber + detector sandwich; move out absorbers for hadron runs
• hadron ID: TOF (& RICH)
• photons, π0, μ: ECAL
• PSD for event characterization
• high speed DAQ and trigger → rare probes!
CBM and HADES 2005 → 2007
All you want to know about CBM:
• Technical Status Report (400 p): http://www.gsi.de/documents/DOC-2005-Feb-447-1.pdf
• CBM Progress Report 2006 (60 p): http://www.gsi.de/documents/DOC-2007-Mar-137-1.pdf
Meson Production in central Au+Au
W. Cassing, E. Bratkovskaya, A. Sibirtsev, Nucl. Phys. A 691 (2001) 745
10 MHz interaction rate needed for 10-15 A GeV
SIS300
CBM Trigger Requirements
measure: π, K
measure: K, Λ, Σ, Ξ, Ω
measure: D0, D±, Ds, Λc
measure: J/ψ, ψ' → e+e- or μ+μ-
measure: ρ, ω, φ → e+e- or μ+μ-
measure: γ
Hadrons
Leptons
Photons
trigger <10 AGeV
trigger
trigger e+e-
offline
offline >10 AGeV
offline ?
offline for e+e-
trigger for μ+μ- ?
assume archive rate: few GB/sec, 20 kevents/sec
trigger on high pt e+e- pair
trigger on displaced vertex
drives FEE/DAQ architecture
trigger μ+μ-
μ identification
Open Charm Detection
Example: D0 → K-π+ (3.9%; cτ = 124.4 μm)
• reconstruct tracks
• find primary vertex
• find displaced tracks
• find secondary vertex
[Figure: target and first two planes of the vertex detector; decay length of a few 100 μm, first plane about 5 cm from the target]
High selectivity because combinatorics is reduced
A Typical Au+Au Collision
Central Au+Au collision at 25 AGeV (URQMD + GEANT):
160 p, 170 n, 360 π-, 330 π+, 360 π0, 41 K+, 13 K-, 42 K0
• up to 10^7 Au+Au interactions/sec
• 10^9 tracks/sec to reconstruct for first level event selection
CBM DAQ Requirements Profile
• D and J/ψ signal drives the rate capability requirements
• D signal drives FEE and DAQ/Trigger requirements
• Problem similar to B detection, like in LHCb or BTeV (rip)
• Adopted approach: displaced vertex 'trigger' in first level, like in BTeV (rip)
• Additional problem: DC beam → interactions at random times
  → time stamps with ns precision needed
  → explicit event association needed
• Current design for FEE and DAQ/Trigger:
  • Self-triggered FEE
  • Data-push architecture
Conventional FEE-DAQ-Trigger Layout
[Figure: block diagram with Detector, FEE, Buffer, L0 Trigger, L1 Trigger, L2 Trigger, DAQ, and Archive, split between Cave and Shack; signals: fbunch, Trigger Primitives, L1 Accept]
Annotations in the diagram:
• Especially instrumented detectors
• Dedicated connections
• Specialized trigger hardware
• Limited capacity
• Limited L1 trigger latency
• Modest bandwidth
Limits of Conventional Architecture
• Decision time for first level trigger limited (typ. max. latency 4 μs for LHC)
• Only especially instrumented detectors can contribute to first level trigger
• Large variety of very specific trigger hardware
• Not suitable for complex global triggers like secondary vertex search
• Limits future trigger development
• High development cost
The way out: use Data Push Architecture
[Figure, built up over three slides starting from the conventional layout: Detector and FEE in the Cave; DAQ with Buffer, L1 Select, L2 Select, and Archive in the Shack; clock fclock provided by time distribution; FIFO buffer; high-bandwidth connection between FEE and DAQ]
• Self-triggered front-end
• Autonomous hit detection
• No dedicated trigger connectivity
• All detectors can contribute to L1
• Large buffer depth available
• System is throughput-limited and not latency-limited
• Use term: Event Selection
Front-End for Data Push Architecture
• Each channel autonomously detects all hits
• An absolute time stamp, precise to a fraction of the sampling period, is associated with each hit
• All hits are shipped to the next layer (usually concentrators)
• Association of hits with events is done later using time correlation
Typical parameters, with a few % occupancy and 10^7/s interaction rate:
• some 100 kHz channel hit rate
• few MByte/sec per channel
• whole CBM detector: 1 TByte/sec
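The typical parameters above follow from simple rate arithmetic. The sketch below is only a back-of-envelope check: the 10^7/s interaction rate is from the slide, while the 3% occupancy, the 8 bytes per hit, and the channel count are illustrative assumptions chosen to land in the quoted orders of magnitude.

```python
# Back-of-envelope data rates for a self-triggered front-end.
INTERACTION_RATE_HZ = 1e7   # from the slide: 10^7 interactions per second
OCCUPANCY = 0.03            # assumption: "a few %" taken as 3 %
BYTES_PER_HIT = 8           # assumption: time stamp + address + amplitude
N_CHANNELS = 400_000        # assumption: picked only to illustrate the TByte/s scale


def channel_hit_rate(interaction_rate_hz, occupancy):
    """Average hit rate of one channel in Hz."""
    return interaction_rate_hz * occupancy


def channel_data_rate(hit_rate_hz, bytes_per_hit):
    """Average data rate of one channel in bytes per second."""
    return hit_rate_hz * bytes_per_hit


hit_rate = channel_hit_rate(INTERACTION_RATE_HZ, OCCUPANCY)
chan_bw = channel_data_rate(hit_rate, BYTES_PER_HIT)
total_bw = chan_bw * N_CHANNELS

print(f"channel hit rate : {hit_rate / 1e3:.0f} kHz")
print(f"channel data rate: {chan_bw / 1e6:.1f} MByte/s")
print(f"detector total   : {total_bw / 1e12:.2f} TByte/s")
```

With these assumed inputs the per-channel numbers come out at a few 100 kHz and a few MByte/s, and the total is of order 1 TByte/s.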
Typical Self-Triggered Front-End
• Average 10 MHz interaction rate
• Not periodic like in a collider
• On average 100 ns event spacing
• Use a sampling ADC on each detector channel, running with an appropriate clock
• Time is determined to a fraction of the sampling period
[Figure (2005 slide): sampled amplitude versus time for one channel (time axis 0 to 30, amplitude ticks at 50 and 100) with a threshold line; two pulses are marked as hits with a: 126, t: 5.6 and a: 114, t: 22.2]
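The hit detection described above can be sketched as a small hit finder working on ADC samples. This is a minimal illustration, not the algorithm used in any CBM front-end: the function name `find_hits`, the leading-edge linear interpolation for the time estimate, and the sample values are all assumptions made for the example.

```python
def find_hits(samples, threshold):
    """Find pulses above threshold in a list of ADC samples.

    Returns a list of (amplitude, time) tuples. The time is the
    interpolated threshold crossing, in units of the sampling period,
    so it is resolved to a fraction of that period.
    """
    hits = []
    i = 1
    n = len(samples)
    while i < n:
        if samples[i - 1] < threshold <= samples[i]:
            # Linear interpolation between the two samples around the crossing.
            frac = (threshold - samples[i - 1]) / (samples[i] - samples[i - 1])
            t_cross = (i - 1) + frac
            # Follow the pulse while it stays above threshold; keep the peak.
            peak = samples[i]
            while i < n and samples[i] >= threshold:
                peak = max(peak, samples[i])
                i += 1
            hits.append((peak, t_cross))
        else:
            i += 1
    return hits


# Hypothetical waveform with two pulses.
waveform = [0, 0, 10, 60, 126, 80, 20, 0, 0, 0, 30, 114, 50, 0]
hits = find_hits(waveform, threshold=50)
for amplitude, time in hits:
    print(f"a: {amplitude} t: {time:.2f}")
```

A real front-end would more likely estimate the time from a filtered pulse shape; the interpolation here only shows how a sub-sample time stamp can come out of a clocked ADC.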
Toward Multi-Purpose FEE Chain
[Figure (2005 slide): signal chain Detector (Pad, GEMs, PMT, APDs) → PreAmp → preFilter → ADC → digital Filter → Hit Finder → Backend & Driver]
• preFilter: analog anti-aliasing filter
• ADC: sample rate 10-100 MHz; dynamic range 8...12 bit
• digital Filter: digital 'shaping', 1/t tail cancellation, baseline restorer
• Hit Finder: hit parameter estimators for amplitude and time
• Backend & Driver: clustering, buffering, link protocol
All potentially in one mixed-signal chip
Self-triggered DAQ: View beyond HEP
• Using a 'trigger' is considered natural in the HEP world
  • large and complex events
  • one wants to select events of interest
• In other areas, using the same detector technologies, the reverse is true
  • very simple events (single hits, 2-fold coincidence)
  • no trigger possible (e.g. singles), or needed (e.g. all events relevant)
  • raw statistics is the key factor
• Examples:
  • thermal neutron scattering
  • PET scanners
Detection of thermal neutrons
• key observable is the θ-φ distribution of scattered neutrons
• to detect a neutron, use a converter
  • 157Gd: 157Gd+n → 158Gd + γ's + ce's [σ = 255000 b; λn = 1.3 μm]
  • 10B: 10B+n → 7Li + α + ~2 MeV [σ = 3838 b; λn = 20 μm]
• to determine θ-φ, combine the converter with a position sensitive detector
Example:
[Figure: 157Gd converter (2 μm) with 300 μm Si-strip detector; strips in x direction and strips in y direction; incoming neutron, outgoing conversion electrons]
neutron hit ↔ coincidence of X and Y strip
The DETNI Project
DETNI: Detectors for Neutron Instrumentation
Partners:
• Hahn Meitner Institut, Berlin
• GSI, Darmstadt
• Phys. Inst., Univ. Heidelberg
• Forschungszentrum Jülich
• AGH Univ. of Science and Tech., Krakow
• IFN-Polish Academy of Sciences, Krakow
• INFN Milano
• INFN Perugia
Mission: Develop and prototype three different advanced area sensitive detector systems including read-out ASIC within EU FP-6
• Gd / Si-Microstrip
• Gd – CsI / MSGC
• B / GEM (CASCADE)
Goals:
• very high rates (100 MHz)
• mm and sub-mm resolution
• highest possible detection efficiency
Read-out ASIC common for silicon and gas detectors
DETNI-A 157Gd/Si Microstrip Detector
• Eth (5σ): 10 keV
• ENC: 550 e- (20 - 30 pF)
• cps (global): 2.5 x 10^7
• cps/strip: 7.5 x 10^4
• dT (x/y): 4 ns
• Size: 51 mm x 51 mm
• No. of strips: 640
• Pitch: 80 µm
• dx: 50 - 100 µm
• ES: 29 - 250 keV
Caterina Petrillo et al., Perugia and Milano
slide courtesy C.J. Schmidt
DETNI-A 157Gd/Si Detector Module
[Figure: detector module, scale 100 mm; slide courtesy C.J. Schmidt]
Goals
• 10^8 n/sec in 100 cm²
• with 2 views, 2 hit/strip: 400 MHz strip hit rate
• with 5 Byte/hit: 2 GByte/sec data
Consequences
• 128 channel ASIC
• 20 chip/module
• 20 MHz/chip
• 100 MByte/sec per chip
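The numbers on this slide can be checked with a few lines of arithmetic. The sketch below only multiplies out the slide's own inputs; the variable names are made up for the example.

```python
# Rate budget of one DETNI-A module, using the numbers from the slide.
NEUTRON_RATE = 10**8      # neutrons per second on the 100 cm^2 module
VIEWS = 2                 # x and y view
HITS_PER_VIEW = 2         # the slide's "2 hit/strip" factor
BYTES_PER_HIT = 5
CHIPS_PER_MODULE = 20

strip_hit_rate = NEUTRON_RATE * VIEWS * HITS_PER_VIEW      # hits per second
module_data_rate = strip_hit_rate * BYTES_PER_HIT          # bytes per second
chip_hit_rate = strip_hit_rate // CHIPS_PER_MODULE         # hits per second per chip
chip_data_rate = module_data_rate // CHIPS_PER_MODULE      # bytes per second per chip

print(f"strip hit rate  : {strip_hit_rate // 10**6} MHz")
print(f"module data rate: {module_data_rate // 10**9} GByte/s")
print(f"per chip        : {chip_hit_rate // 10**6} MHz, {chip_data_rate // 10**6} MByte/s")
```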
DETNI-C CASCADE: Boron on GEM Multilayer
[Figure residue: detector cross-section drawings; legible labels include 157Gd (n,e-) converter (2 × 0.5-1.5 µm), columnar CsI (2 × ~1 µm), CsI extraction grid, support foil, i-butane at 20 mbar, anode, cathode, 635 µm, 570 mm, FWHM (diffusion)]
• GEMs can be operated to be transparent for charges!
• they can be cascaded!
• Each one can carry two Boron layers
• Last one operated as amplifier
• Cumulate 5% single layer detection efficiency to give 50% for thermal neutrons (1.8 Å): need 10 cascaded GEM foils
Christian Schmidt et al., Heidelberg, Darmstadt
DETNI-B is Gd/CsI+MSGC (B. Gebauer et al., HMI)
slide courtesy C.J. Schmidt
n-XYTER: The DETNI Readout ASIC
Name
• n-XYTER: Neutron - X, Y, Time and Energy ... R
Front-end
• 128 (32) channels, charge sensitive pre-amplifier, both polarities
• 30 pF detector capacitance, ENC 1000 e
• self-triggered, autonomous hit detection
• time stamping with 1 ns resolution (needed to correlate x-y views)
Readout
• energy (peak height) and time information for each hit
• data driven, de-randomizing, sparsifying readout
• 32 MHz average hit rate
  • 128 channel version (Si, GEM): ~ 250 kHz hit / channel
  • 32 channel version (MSGC): ~ 1 MHz hit / channel
Goal: Common readout ASIC for silicon strip and gas detectors
n-XYTER Architecture
[Figure: per-channel block diagram with PreAmp, fast shaper (peaking time 20 ns), slow shaper (peaking time 140 ns), comparator, time walk compensation, peak detect & hold, time stamp counter (1 ns step), digital FIFO, analog FIFO, token cell, token manager, and output drivers (32 MHz readout rate)]
self-trigger: latch amplitude, latch time, store in FIFOs
Token Ring Readout
[Figure: chain of channels, each with token cell, digital FIFO, and analog FIFO, connected to the token manager and the output drivers]
Token cell processes: on token, check for data, then either initiate readout in this clock cycle or pass the token forward.
• The token passes asynchronously from channel to channel in search of data
• Within one clock cycle the token could pass through all channels
  • use a 2-stage logic design to keep the logic path short and allow a scan of 128 channels in one clock cycle
• If the token encounters an occupied channel, data readout is initiated (1 clock cycle)
• After readout of one hit the token passes to the next occupied channel
• The token manager ensures that one and only one token is circling
• Readout clock: 32 MHz
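The readout rules above can be illustrated with a small behavioural simulation. This is a simplified software model, not a description of the chip logic: the function `token_ring_readout`, the unlimited FIFO depth, and the example hit list are assumptions. It reads at most one hit per clock cycle and resumes the search at the channel after the one just read.

```python
from collections import deque


def token_ring_readout(hits, n_channels, n_cycles):
    """Simulate token ring readout.

    hits: list of (arrival_cycle, channel) tuples.
    Returns the list of (time_stamp, channel) in readout order.
    """
    fifos = [deque() for _ in range(n_channels)]  # one FIFO per channel
    pending = sorted(hits)
    k = 0
    token = 0                                     # channel where the search starts
    output = []
    for cycle in range(n_cycles):
        # Hits arriving up to this cycle are latched into their channel FIFO.
        while k < len(pending) and pending[k][0] <= cycle:
            t, ch = pending[k]
            fifos[ch].append(t)
            k += 1
        # The token scans forward (within one cycle) to the next occupied channel.
        for step in range(n_channels):
            ch = (token + step) % n_channels
            if fifos[ch]:
                output.append((fifos[ch].popleft(), ch))  # read exactly one hit
                token = (ch + 1) % n_channels
                break
    return output


# Hypothetical example with 4 channels.
example_hits = [(0, 1), (0, 2), (1, 0), (2, 3)]
out = token_ring_readout(example_hits, n_channels=4, n_cycles=6)
print(out)
```

In this example the hit with time stamp 2 on channel 3 is read before the hit with time stamp 1 on channel 0, because the token reaches channel 3 first. The output is therefore complete but not time ordered.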
Token Ring: Architectural Pros & Cons
Pros
• handles sparsification
• provides, together with the FIFO, de-randomization
  • 100% of output bandwidth can be used for data
• automatic bandwidth focusing
  • one (or few) channels can use all the bandwidth
• fair distribution of bandwidth
  • the busiest channel will lose data first in case of overload
  • a hot channel will not block readout of the rest (it only fills the bandwidth)
Cons
• data is not time sorted at output
  • usually resorting is needed somewhere in the readout and processing chain
Token Ring at Work
[Animation over 12 slides: a ring of 12 channels (0 to 11); hits (h) arrive at times T=0 to T=3 while the active token moves from channel to channel and reads one hit per clock cycle, T=0 to T=10]
Output in readout order (t = time stamp, c = channel):
t=0,c=5; t=0,c=6; t=2,c=7; t=1,c=8; t=1,c=9; t=0,c=11; t=3,c=0; t=1,c=1; t=3,c=2; t=2,c=8; t=3,c=2
Output not time ordered, cluster broken apart
Putting it together: n-XYTER ASIC
[Figure: chip block diagram with 128 channels, time stamp counter, token manager, output drivers, I2C interface, DACs, slow control]
Outputs:
• 1 analog differential
• 8 digital LVDS (4*32 MHz)
Note: there is also a 32 channel version for MSGC readout called MSGCROC
n-XYTER Submission 1
• CBM started active collaboration with the DETNI project
• Submission of full chip in July 2006
• Used technology: AMS 0.35 μm CMOS with thick metal four
• 250 dice now available, shared between CBM and DETNI
• verification ongoing, no problems found
• n-XYTER will be
  • basis for detector R&D in the next 2-3 years
  • basis for CBM-XYTER, a next generation chip
n-XYTER Status
The n-XYTER is quickly becoming the designated read-out solution for many CBM detector system prototypes:
• obvious cases:
  • STS: Silicon strip detectors
  • MUCH/TRD: GEM chambers
• plausible cases:
  • MUCH: Silicon Pad chambers
  • RICH: PMT
• potential cases:
  • MUCH/TRD: MWPC chambers
Setting up an n-XYTER read-out chain to support
• simple lab desk-top setups
• medium-size test beam configurations with multiple sub-systems
Basic n-XYTER Readout Chain
[Figure: Detector → FEB → ROC → ABB]
• FEB (Front-End Board): n-XYTER chips and ADC; delivers tag data and ADC data
• ROC (Read-Out Controller): FPGA with SFP/MGT; provides clock and control
• ABB (Active Buffer Board): FPGA with SFP/MGT
Connections, from the detector side onward:
• bond or cable connection
• up to 8 n-XYTER, 1024 channels
• LVDS signal cable
• 2.5 Gbps optical link
• 1-4 lane PCIe interface
Scalable n-XYTER Readout Chain
[Figure: Detector → FEB (n-XYTER chips, ADC; tag data and ADC data) → ROC (FPGA, SFP/MGT; clock and control) → DCB (Data Combiner Board: FPGA with several SFP/MGT ports); the DCB has links to other ROCs and to the ABB]
Some Configurations
Minimal configuration: Detector → FEB → ROC → ABB → PC
Expandable configuration: four Detector → FEB → ROC chains → DCB → ABB → PC
Real Hardware for ROC, ABB...
ROC
• Built at KIP
• many concepts from ALICE DCS
• Virtex-4 based (FX, PPC)
• Idea: FPGA + Linux core design to be used in many apps → SysCore
ABB
• Built in Mannheim
• PCI Express based
• Virtex-4 FX based
[Photo labels: SFP, PCIe, mezz. board connectors]
short commercial follows
Basic Components and Interfaces
Xilinx Virtex4 FPGA
320 up to 576 user I/Os
LAN interfaces
SD-Card connector
LAN, USB, JTAG programming capability via CPLD
RS232 interface
High Speed Serial Ports (MGTs)
DDR SDRAM
user definable I/O
Watchdog
slide courtesy D. Gottschalk
SysCore Features
(Remote) Configuration:
• via standard JTAG or SelectMAP configuration
• via USB to JTAG bridge
• via LAN
• Watchdog triggered
Radiation tolerant by fast configuration/reboot
Linux on FPGA
Fast Boot
All features together in one design
slide courtesy D. Gottschalk
Enhanced FPGA refresh technology for radiation tolerance
(as was developed for the ALICE DCS board)
• Mode 1: Initial configuration
• Mode 2: Refresh of the configuration memory (SRAM)
  • either: continuously overwriting with the correct configuration
  • or: overwriting on demand (after error detection)
• Mode 3: Error detection
  • Read back of the configuration memory
  • Checking (compare or checksum)
  • Virtex 4: internal Hamming functionality
• Mode 4: Watchdog triggers start of the failsafe configuration if the design fails
slide courtesy D. Gottschalk
Towards CBM-XYTER
• n-XYTER meets, surprisingly well, most of the CBM requirements
• However, it's not radiation hard. Why?
  • it was intended for a thermal neutron environment
  • thermal neutrons do very little damage (the only process is capture ...)
  • we'll test what the chip can withstand, but expectations are not high (0.35 μm!)
• A new ASIC is therefore needed
• Goals:
  • Radiation-hard (> 1 MRad, depends on MUCH detector layout). Consequence: new technology
  • Lower power; n-XYTER is 1.5 W/chip
  • Integrated ADC, pure digital interface
  • Meet requirements of silicon and gas detectors, look also beyond CBM
• Might well be again a family of chips with the same architecture
Towards CBM-XYTER
Requirements by detector (alternative column labels on the slide: CBM SiStrip, CBM GEM, PANDA SiStrip, PANDA GEM)
                  | CBM STS    | CBM MuCh     | PANDA STS       | PANDA TPC
charge polarity   | +/-        | + or -       | +/-             | -
no. of channels   | 128        | 128          | 128             | 128
sparsification    | yes        | yes          | yes             | yes
self trigger      | yes        | yes          | yes             | yes
differential i/o  | yes        | yes          | yes             | yes
rate per channel  | 250 kHz    | 200 kHz      | 75 kHz          | 200 kHz
time stamp        | yes, few ns | yes, few ns | yes, 2 to 20 ns | yes, 5 ns
energy r/o        | yes, 8 bit | yes          | yes, 10 bit     | 8 bit
rad. level        | 1 MRad     | 1 MRad       | 1 MRad          | 0.1 CMS STS
ch pitch          | 50 µm      | 100 – 200 µm | 50 µm           | 100 – 200 µm
DC-bias, leakage  | no         | no           | yes ?           | no
power             | high concern | no concern | 3 mW            | less concern
no. of chips      | t.b.d.     | t.b.d.       | 5000            | 1000
Rows with fewer than four entries on the slide (values in slide order):
double hit res.   : 100 ns, 100 ns, 200 ns
ch fifo depth     : 16, 16
Towards CBM-XYTER
• CBM-XYTER development has started
  • Bergen, Darmstadt, Heidelberg, Krakow, Mannheim, Moscow involved
  • Will be based on experience with n-XYTER
• Currently many pre-studies:
  • Work out specifications ...
  • Technology assessment
    • Lots of experience with 0.18 μm UMC in the developers group
    • Questions: Do we really need enclosed geometry transistors? Future availability of 0.18 μm processes? UMC or AMS? What are the advantages of 0.13 μm processes? Can we afford them?
  • Pre-amp design
    • best balance between detector thickness (affects sensor signal and capacitance, but also radiation tolerance) and pre-amp power
  • De-randomizer (FIFO) design
    • How many stages? Likely more than 4 needed.
• Work on building blocks, quite a few mini@SIC's and MMPW submitted
• Also some conceptual and system-design questions still to be resolved
  • Examples: 1. time sorting; 2. throttling
Self-Triggered FEE: Output Format I
Output of a single FEE chip (columns: time stamp, channel address, other values such as amplitudes or pulse shape):
17 15 ...
68 34 ...
134 18 ...
135 19 ...
1234 33 ...
The time stamp counter has a finite number of bits! It will wrap! How to express time?
Note: CBM is a fixed target experiment. Long spills (~ 10 s).
Handle the infinite Time Axis
1. Subdivide time in epochs
2. Express a time relative to an epoch
3. Introduce epoch markers
[Figure: time axis divided into Epoch 1 to Epoch 4 by epoch markers, with two hits at (2, 137 ns) and (3, 314 ns)]
Practical epoch length: about 10 μs
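The three steps above amount to splitting an absolute time into an epoch number and an offset inside that epoch. A minimal sketch, assuming an epoch length of exactly 10 μs, integer nanosecond times, and epoch numbering starting at 1 as in the figure; the function names are made up for this example.

```python
EPOCH_NS = 10_000  # assumption: epoch length of exactly 10 us, in ns


def split_time(t_ns):
    """Absolute time in ns -> (epoch number, offset in ns); epochs start at 1."""
    epoch, offset = divmod(t_ns, EPOCH_NS)
    return epoch + 1, offset


def absolute_time(epoch, offset_ns):
    """(epoch number, offset in ns) -> absolute time in ns."""
    return (epoch - 1) * EPOCH_NS + offset_ns


print(split_time(10_137))      # a hit 137 ns into the second epoch
print(absolute_time(3, 314))   # the hit labelled (3, 314 ns)
```

Only the offset has to be stored with every hit. The epoch number is carried by the epoch markers in the data stream.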
Self-Triggered FEE: Output Format II
The output of a FEE chip is a list of hits and epoch markers. Each hit has a time stamp plus other information.
Record type: H = hit, M = epoch marker
M 1
H 17 15 ...
H 68 34 ...
H 134 18 ...
H 135 19 ...
H 1234 33 ...
M 2
M 3
H 258 19 ...   (hit with effective time stamp (3, 258))
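A reader of such a stream only has to remember the last epoch marker to attach an effective time stamp to each hit. The sketch below is an illustration under assumptions: records are modelled as Python tuples `("M", epoch)` and `("H", time, channel)`, which is not a format defined in the talk.

```python
def decode(stream):
    """Turn a list of M/H records into (epoch, time, channel) tuples."""
    epoch = None
    decoded = []
    for record in stream:
        if record[0] == "M":
            epoch = record[1]            # epoch marker: update current epoch
        elif record[0] == "H":
            if epoch is None:
                raise ValueError("hit record before first epoch marker")
            _, time, channel = record
            decoded.append((epoch, time, channel))
        else:
            raise ValueError(f"unknown record type {record[0]!r}")
    return decoded


# The example stream from the slide.
stream = [
    ("M", 1),
    ("H", 17, 15), ("H", 68, 34), ("H", 134, 18), ("H", 135, 19), ("H", 1234, 33),
    ("M", 2),
    ("M", 3),
    ("H", 258, 19),
]
decoded = decode(stream)
print(decoded[-1])   # last hit with its effective time stamp
```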
Self-Triggered FEE: Concentrators
A concentrator merges the data streams and eliminates redundant epoch markers. (Record columns: type, time, address.)
Input from first FEE:
M 1
H 17 15 ...
H 68 34 ...
H 134 18 ...
H 135 19 ...
H 1234 33 ...
M 2
M 3
H 258 19 ...
Input from second FEE:
M 1
H 18 2007 ...
M 2
H 589 2134 ...
M 3
H 258 2714 ...
Concentrator output:
M 1
H 17 15 ...
H 18 2007 ...
H 68 34 ...
H 134 18 ...
H 135 19 ...
H 1234 33 ...
M 2
H 589 2134 ...
M 3
H 258 19 ...
H 258 2714 ...
Seems prudent to keep data always sorted in time.
!! 2005 slide !! Where is the problem here?
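The merge performed by the concentrator can be sketched with a standard k-way merge. This is an idealized illustration: it assumes every input stream is already sorted in time and starts with an epoch marker, and it models records as tuples `("M", epoch)` and `("H", time, address)`. Neither the representation nor the function names come from the talk.

```python
import heapq


def keyed(stream):
    """Attach a sort key (epoch, kind, time, address) to every record."""
    epoch = None
    for record in stream:
        if record[0] == "M":
            epoch = record[1]
            yield (epoch, 0, 0, 0)        # kind 0: marker sorts before hits of its epoch
        else:
            _, time, address = record
            yield (epoch, 1, time, address)


def concentrate(*streams):
    """Merge time-sorted streams and drop redundant epoch markers."""
    merged = []
    last_marker = None
    for epoch, kind, time, address in heapq.merge(*(keyed(s) for s in streams)):
        if kind == 0:
            if epoch != last_marker:      # emit each epoch marker only once
                merged.append(("M", epoch))
                last_marker = epoch
        else:
            merged.append(("H", time, address))
    return merged


fee_a = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18), ("H", 135, 19),
         ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
fee_b = [("M", 1), ("H", 18, 2007), ("M", 2), ("H", 589, 2134),
         ("M", 3), ("H", 258, 2714)]
merged = concentrate(fee_a, fee_b)
for record in merged:
    print(*record)
```

The sketch relies on sorted inputs. With a token ring front-end the streams are only approximately sorted, which is the problem the 2005 slide hints at.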
Where to re-sort data?
• The token ring scheme produces locally unsorted data
  • A big advantage of the token ring scheme is the fair distribution of bandwidth in case of local overload. The system is robust against hot channels etc.
• n-XYTER doesn't even produce epoch markers
  • the reading stage needs a clock-cycle-precise replica of the time stamp counter to interpret the data correctly. That clearly only works if there are no additional elasticity buffers.
  • some form of 'time stamp expansion' and epoch marking is needed
  • re-sort data early? Or use a form of fuzzy epoch boundaries?
• How to build concentrators?
  • conceptually easy if output bandwidth > sum of input bandwidth
  • at least not feasible in early stages
  • the read-out ASIC is in fact the first concentrator stage
  • total bandwidth will always be smaller than the sum of channel bandwidth
• In other words: when and where to drop data in case of overload?
Think Big or Throttling ?
Conventional triggered systems handle overload gracefully
• there is some form of 'common' dead time
• in case of overload, whole events are discarded
• loosely speaking: one gets 100% of the data for 90% of the events
With self-triggered front-ends the converse might happen
• data is dropped in an uncorrelated fashion where FIFOs overfill
• loosely speaking: one gets 90% of the data of 100% of the events
• quite an unpleasant perspective
• tracking systems might tolerate a few % data loss without major performance drop
• in other detectors, like an ECAL, this leads immediately to a loss of efficiency
What is the proper solution ?
• Build and operate the system with 'enough' bandwidth headroom ? Note: extracted beams from synchrotrons are notoriously non-Poissonian ! Can that be handled with large-enough channel FIFOs alone ?
• Or introduce some form of 'global throttling', to drop data in a correlated fashion. The time distribution system can easily distribute 'XOFF' and 'XON' messages. The problem is to find an easy-to-evaluate throttle criterion.
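Why uncorrelated loss is unpleasant can be made concrete with a toy calculation. The sketch below assumes that every hit is dropped independently with the same probability and that an event needs all of its hits; both are simplifying assumptions, and the 10 hits per event is a made-up number, not a CBM figure.

```python
def complete_event_fraction(hits_per_event, hit_loss):
    """Fraction of events with no missing hit when every hit is dropped
    independently with probability hit_loss (a simplifying assumption)."""
    return (1.0 - hit_loss) ** hits_per_event

# Correlated loss (common dead time): 90% of the events survive, all complete.
triggered = 0.90

# Uncorrelated loss: 10% of the hits vanish at random; 10 hits per event
# is a hypothetical number chosen for illustration.
self_triggered = complete_event_fraction(10, 0.10)
```

Under these assumptions only about 35% of the events stay complete, compared with 90% when the same amount of data is dropped in a correlated fashion.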
CBM DAQ and Online Event Selection
More than 50% of the total data volume might be relevant for first level event selection
• MVD, STS, and TRD data are used in first level event selection
• [Figure labels: 'needed for D', 'needed for J/ψ']
Aim for simplicity
Ansatz:
• do (almost) all processing after the build stage
Simple two layer approach:
1. event building
2. event processing
Other scenarios are possible, putting more emphasis on:
• do all processing as early as possible
• transfer data only when necessary
Logical Data Flow
[Figure: logical data flow, with the following elements]
• Concentrators: multiplex channels to high-speed links
• Time distribution
• Buffers
• Build Network
• Processing resources for first level event selection, structured in small farms
• Connection to 'high level' selection processing
Different View on Structure
[Figure: structure of CMS and CBM side by side; elements shown: detector, FEE buffer, readout buffer, switch, processor farm, storage, L1 trigger, HLT]
Bandwidth Requirements
Data flow: ~ 1 TB/sec
1st level selection: ~ 10^14 to 10^15 operations/sec
Data flow: few 10 GB/sec
to archive: a few GB/sec
Moore helps
Gilder helps
~ 100 Sub-Farms
Focus on BNet
Event Building
Fast Event Building Networks
Very tempting to look into InfiniBand
• used in many HPC clusters as interconnect
• offers large bandwidth at low CPU overhead
Available for some time
• SDR systems: 4 x 2.5 Gbps per link
• 1 GByte/sec bandwidth per port and direction
• 288 port switches
  • based on 24 port switch chips (288 = 24*12)
  • non-blocking switch
  • 288 GByte/sec switching bandwidth
• modest cost: ~ 400 EUR/port
Perspectives
• DDR just became available
• QDR likely to come
• One 288 port QDR switch does 1 TByte/sec
• A few could do CBM
• network adapter (HCA) small and low power compared to 10 Gbit Ethernet
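The bandwidth figures on this slide follow from simple arithmetic, sketched below. Two inputs are not on the slide and are added here as assumptions: the 8b/10b line coding (8 payload bits per 10 transmitted bits) and a QDR lane rate of 10 Gbps.

```python
LANES = 4            # 4x link, from the slide
DATA_BITS = 8        # 8b/10b line coding: 8 payload bits ...
CODED_BITS = 10      # ... are sent as 10 bits on the wire (assumption, not on the slide)
PORTS = 288          # switch size, from the slide

def port_gbyte_per_s(lane_gbps):
    """Payload bandwidth of one port and direction in GByte/sec."""
    payload_gbps = LANES * lane_gbps * DATA_BITS / CODED_BITS
    return payload_gbps / 8          # bits to bytes

sdr_port = port_gbyte_per_s(2.5)     # SDR lane rate from the slide
sdr_switch = PORTS * sdr_port
qdr_port = port_gbyte_per_s(10.0)    # QDR lane rate, assumed to be 4 x SDR
qdr_switch = PORTS * qdr_port
```

This gives 1 GByte/sec per SDR port, 288 GByte/sec for the SDR switch, and 1152 GByte/sec for a QDR switch of the same size, in line with the roughly 1 TByte/sec quoted above.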
Why is InfiniBand fast & low overhead ?
In conventional network interconnects the data is copied at least once between user and kernel buffers
• This loads the CPU and costs memory bandwidth
The way out
• avoid the copies
• do a DMA transfer from local user buffer to remote user buffer
• buzz-word: zero-copy remote direct memory access (zero-copy RDMA)
Conventional Networking
[Figure: two hosts connected through a network switch. Each host is layered into User (application, library), Kernel (driver), and Hardware (network adapter). Arrows mark data flow and control flow.]
Use Zero-Copy RDMA
[Figure: two hosts connected through a network switch. Each host is layered into User (application, library), Kernel (driver), and Hardware (network adapter). Arrows mark data flow and control flow.]
How does Zero-copy RDMA work ?
User side requirements
• all buffers used for I/O must be
  • locked in memory
  • made known to the network adapter, which stores the virtual to physical mapping
• this setup involves the OS and a driver (expensive)
Network adapter requirements
• export two types of interfaces
  • for kernel interactions
  • for user interactions
• the interface for user interactions is
  • memory mapped
  • replicated for each connected process
  • mapped into the user process address space
Chain of events for zero-copy RDMA
• user process writes request descriptor directly into the network adapter (mapped interface)
• adapter validates, builds scatter-gather list, and transfers to/from user address space
[Figure: layers User (application, library), Kernel (driver), Hardware (network adapter)]
and Real World problems ...
Usually some application framework is used
• ROOT, XDAQ, ....
• It usually has its own 'buffer management'
Remember:
• making a user buffer eligible for RDMA is quite expensive (locking, driver calls)
• thus create/delete of a buffer is expensive
A framework design with a very 'dynamic' handling of buffers, which often creates / deletes buffers, will not work well with RDMA.
Adapting the underlying buffer management in an existing framework can be quite cumbersome:
• ... basic execution logic problems
• methods may not be virtual ...
• successfully done to adapt XDAQ, the CMS DAQ framework, to uDAPL. (J. Adamczewski, GSI)
[Figure: layers User (application, framework, library), Kernel (driver), Hardware (network adapter)]
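One common way around the expensive create/delete cycle is to register a fixed set of buffers once and recycle them. The sketch below only illustrates that idea: the class `RegisteredBufferPool` and its `register` callback are hypothetical, the callback stands in for the locking and driver calls, and nothing here is taken from XDAQ or uDAPL.

```python
class RegisteredBufferPool:
    """Reuse a fixed set of buffers so the costly registration happens once."""

    def __init__(self, count, size, register):
        self._free = []
        for _ in range(count):
            buf = bytearray(size)
            register(buf)        # hypothetical stand-in for locking + driver call
            self._free.append(buf)

    def acquire(self):
        """Hand out a buffer that is already eligible for RDMA."""
        if not self._free:
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, buf):
        """Return a buffer to the pool instead of deleting it."""
        self._free.append(buf)

registrations = []               # records every registration call
pool = RegisteredBufferPool(count=4, size=32 * 1024,
                            register=registrations.append)
for _ in range(1000):            # 1000 transfers, no new registrations
    buf = pool.acquire()
    pool.release(buf)
```

After 1000 acquire/release cycles the registration callback has still been called only 4 times, once per buffer in the pool.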
... and Real World throughput
Small InfiniBand test cluster at GSI
• 4 dual-dual Opteron servers
• Mellanox MHES18-XT HCA (PCIe)
• Mellanox MTS2400 24X 24 port switch
Test case
• XDAQ peer transport via uDAPL (an RDMA access library for IB and iWARP)
Results
• for large (100 kB) buffers, throughput approaches the IB limit of 1 GB/sec
• ~30 kB buffers needed to get 500 MB/sec
data by J. Adamczewski, GSI
Event Building
[Figure: a network switch connects a row of 8 data collector nodes to a row of 8 event selector nodes]
• This is the classical all-to-all pattern
• susceptible to head-of-line blocking
Fragment labels, upper row: 1.a 1.b 1.c 1.d 1.e 1.f 1.g 1.h
Fragment labels, lower row: 1.a 1.b 1.c 1.d 1.e 1.f
Event Building – Scheduling
[Figure: a network switch connects a row of 8 data collector nodes to a row of 8 event selector nodes]
• This is the classical all-to-all pattern, susceptible to head-of-line blocking
• some form of data flow orchestration is needed (aka scheduled transfers)
• one of the simplest schemes is the barrel shifter
Fragment labels, upper row: 8.a 7.b 6.c 5.d 4.e 3.f 2.g 1.h
Fragment labels, lower row: 1.h 2.g 3.f 4.e 5.d 6.c 7.b 8.a
Event Building – Scheduling (cont.)
Barrel shifter, one step later:
Fragment labels, upper row: 9.a 8.b 7.c 6.d 5.e 4.f 3.g 2.h
Fragment labels, lower row: 9.a 2.h 3.g 4.f 5.e 6.d 7.c 8.b
Event Building – Scheduling (cont.)
Barrel shifter, another step later:
Fragment labels, upper row: 10.a 9.b 8.c 7.d 6.e 5.f 4.g 3.h
Fragment labels, lower row: 9.b 10.a 3.h 4.g 5.f 6.e 7.d 8.c
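The label sequences on these slides are consistent with a simple rule, sketched below: in time slot t, the data collector with index i (a = 0, b = 1, ...) sends its fragment of event t - i, and event n goes to event selector ((n - 1) mod 8) + 1. The function name and the numbering convention are assumptions read off the slide labels, not a description of the actual CBM scheduler.

```python
import string

def barrel_shift_slot(slot, n_nodes=8):
    """Map each event selector (1..n_nodes) to the fragment 'event.source'
    it receives in the given time slot."""
    received = {}
    for index in range(n_nodes):
        source = string.ascii_lowercase[index]   # data collector a, b, c, ...
        event = slot - index                     # collector a sends the newest event
        if event < 1:
            continue                             # pipeline not yet filled
        selector = (event - 1) % n_nodes + 1     # event n goes to a fixed selector
        received[selector] = f"{event}.{source}"
    return received

slot8 = [barrel_shift_slot(8)[k] for k in range(1, 9)]
slot9 = [barrel_shift_slot(9)[k] for k in range(1, 9)]
slot10 = [barrel_shift_slot(10)[k] for k in range(1, 9)]
```

The three lists reproduce the lower rows of the three slides. Because the 8 collectors work on 8 consecutive event numbers in every slot, each selector receives from exactly one collector at a time, so there is no contention at the switch outputs.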
Event Building
Barrel-shift is in practice too rigid a scheme
• e.g. works only when processing always takes the same time
Questions are:
• How much scheduling is needed ? → Does chaotic transfer with many buffers work ?
• What is an 'optimal' scheme ? → Precise timing and sizing of each transfer
• What is a simple and robust scheme ? → Get close to optimal with simple means
Our 4 node mini-cluster is nice for developing software
Go to a larger cluster for real tests
• Done: 24 nodes at FZ Karlsruhe
• Later: >100 nodes at Paderborn cluster
First results from FZK tests in March 2007
• 23 nodes, Opterons with DDR InfiniBand HCAs and switches
• surprise: peer-to-peer bandwidth: 1160 MB/sec unidirectional and 730 MB/sec bidirectional
• memory or PCIe is apparently the limiting factor here
Event Building – Chaotic Transfers
First results from FZK tests in March 2007 (cont.)
• 23 nodes, 'chaotic round-robin'
• surprise: big buffers and many buffers isn't the best case
• one sees link level flow control and HCA request scheduling at work
Throughput in MB/sec per node:
buffer size   queue length 2   queue length 8   queue length 32
2k            260              256              256
8k            720              627              580
32k           625              587              568
128k          582              572              567
Best throughput: 720 MB/sec/node, total 16.5 GB/sec
data by S. Linev, GSI
Event Building – Scheduled Transfers
First results from FZK tests in March 2007 (cont.)
• 23 nodes, strictly timed transfers
• buffer size and number are now uncritical (big doesn't hurt, at least...)
• peak throughput same as before
• tests on a realistic size cluster needed
Throughput in MB/sec per node:
buffer size   queue length 2   queue length 8   queue length 32
2k            255              271              272
8k            718              695              704
32k           699              698              696
128k          732              727              -
Best throughput: 718 MB/sec/node, total 16.5 GB/sec
data by S. Linev, GSI
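The quoted total and the sensitivity to queue length can be checked directly from the two tables. The sketch below uses only the numbers shown on the slides; the helper `spread` and the dictionary layout are introduced here for illustration.

```python
NODES = 23

# Throughput in MB/sec per node for queue lengths 2, 8, 32 (from the slides).
chaotic = {"2k": [260, 256, 256], "8k": [720, 627, 580],
           "32k": [625, 587, 568], "128k": [582, 572, 567]}
scheduled = {"2k": [255, 271, 272], "8k": [718, 695, 704],
             "32k": [699, 698, 696], "128k": [732, 727, None]}

def spread(table, row):
    """Difference between best and worst throughput in one table row."""
    values = [v for v in table[row] if v is not None]   # skip the '-' entry
    return max(values) - min(values)

total_gb_per_s = NODES * 720 / 1000        # aggregate at 720 MB/sec per node
chaotic_spread_8k = spread(chaotic, "8k")
scheduled_spread_8k = spread(scheduled, "8k")
```

For 23 nodes at 720 MB/sec the aggregate is 16.56 GB/sec, matching the quoted 16.5 GB/sec. For 8k buffers the throughput varies by 140 MB/sec across queue lengths in the chaotic case but only by 23 MB/sec with scheduled transfers.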
Focus on PNet
Event Selection
Event Selection Processing
In CBM we'll have a tracking trigger
• certainly for open charm
  • requires reconstruction of tracks in STS for all events
  • search for displaced vertices
  • identification of open charm candidates
• possibly also for muon identification
  • again reconstruction of tracks in STS for all events
  • forward tracking through muon absorbers
So we need high throughput STS tracking
Two routes followed
• Cellular automaton / Kalman filter tracker
  • lots of floating point arithmetic
  • better performance (simply because cuts can be narrower)
  • Is it feasible to do CA/KF in L1 event selection ?
• Hough tracker
  • algorithm 'bit-oriented' and parallelizable
  • can be implemented in programmable logic
  • Does Hough have the required performance ?
Game Processors as Supercomputers ?
Slide from CHEP'04 Dave McQueeney
IBM CTO US Federal
2005
The Cell Processor
8 SPE: Synergistic Processing Elements, each with
• 256 kB local memory
• 128 x 128 bit registers
• 4 SP floating ops/cycle (SIMD)
PPE: 'normal' PowerPC CPU
• running Linux
• used to orchestrate the SPUs
Peak performance
• 32 single precision multiply/add per clock cycle
• runs at ~3 GHz
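The peak numbers combine as follows. Counting one multiply/add as two floating point operations is a convention assumed here, and 3.0 GHz is used for the "~3 GHz" on the slide.

```python
SPES = 8                 # number of SPEs, from the slide
SP_LANES = 4             # single precision operations per cycle and SPE
CLOCK_HZ = 3.0e9         # "~3 GHz" taken as exactly 3 GHz for this estimate
FLOPS_PER_MADD = 2       # a multiply/add counted as two operations (assumption)

madd_per_cycle = SPES * SP_LANES
peak_gflops = madd_per_cycle * FLOPS_PER_MADD * CLOCK_HZ / 1e9
```

This reproduces the 32 multiply/add per clock cycle and gives a single precision peak of roughly 190 GFLOPS under the stated convention.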
Cell is Motivation, not Target Architecture
What are the general aspects ?
SIMD (single instruction – multiple data)
• available in essentially every mainstream system
  • SSE in IA-32 compatible systems
  • Altivec in PPC systems
• implemented with 128 bit registers
  • 4 x single precision floating point
  • 2 x double precision floating point
  • also: 16 x 8 bit, 8 x 16 bit, 4 x 32 bit int
• is for example heavily used in video codecs
Question: Can the CA/KF be 'SIMDized' ?
Answer: Yes: propagate 4 tracks in parallel. Implement such that scalar and SSE, Altivec, and SPU targets are supported.
don't show this slide when IBM folks are in the audience !!
Cell is Motivation, not Target Architecture
What are the general aspects ?
Use float, not double
• somewhat Cell special
  • Cell is tuned for single precision, double is a factor 10 slower
  • the next generation will fix this (HPC people only look at DP flops....)
• However: when using SIMD, float gives twice the operation count:
  • 4 x float per cycle
  • 2 x double per cycle
Question: Can the CA/KF run with single precision ?
Answer: Yes: work on the algorithm to make it numerically stable
Cell is Motivation, not Target Architecture
What are the general aspects ?
Code and data in local store, 256 kByte
• sounds special, is in fact general
• on normal PCs, the performance of an algorithm critically depends on memory locality. L1 or L2 cache misses are very expensive.
• A cache miss can easily cost 100 cycles
Consequence:
• random access in big tables is very expensive
• lookup tables are counter-productive
• several dozen floating point operations might be cheaper than a single cache miss
Question: Can the CA/KF fit in 256 kByte ?
Answer: Yes: Replace the field map by a parameterization
Kalman Filter for Track Fit
[Figure: an e⁻ track passing through detectors with measurements; the fit gives track parameters and errors (r, C)]
slide courtesy I. Kisel
The Kalman Filter for Track Fit
[Figure: Kalman filter steps, with the following callouts]
• arbitrary large errors
• non-homogeneous magnetic field as large map
• multiple scattering in material
• small errors
• weight for update
• >>> 256 KB of Local Store
• not enough accuracy in single precision
slide courtesy I. Kisel
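The single precision problem can be illustrated with a small experiment: adding a small error to a very large one loses the small term entirely in 32 bit floats. The numbers below are made up for illustration and are not taken from the CBM fit; `to_float32` is a helper defined here that rounds through the standard `struct` module.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single
    precision value and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

big_error = 1.0e8      # hypothetical 'arbitrarily large' initial variance
small_error = 1.0      # hypothetical measurement variance

sum_double = big_error + small_error
sum_single = to_float32(to_float32(big_error) + to_float32(small_error))

lost_in_double = (sum_double == big_error)
lost_in_single = (sum_single == to_float32(big_error))
```

Here `lost_in_single` is true while `lost_in_double` is false: near 10^8 neighbouring single precision values are 8 apart, so the small term disappears. This is the kind of effect that makes a fit started with arbitrarily large errors delicate in single precision.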
Modifications of the Fitting Algorithm
• The initial track parameters are directly estimated from the input data.
• The propagation step is performed directly from measurement to measurement without intermediate steps.
• Matrix multiplications have been replaced by direct operations on only non-trivial matrix elements.
• Most loops have been unrolled in order to provide additional instructions for interleaving.
• All branches have been eliminated from the algorithm to avoid branch misprediction penalty.
• Calculations have been reordered for better use of the processor's pipeline.
slide courtesy I. Kisel
SPE Statistics
slide courtesy I. Kisel
Modifications of the Fitting Algorithm
[Figure: panels labelled Intel P4 and Cell]
slide courtesy I. Kisel
The CBM Collaboration
47 institutions, > 350 members
China: USTC Hefei; CCNU Wuhan
Croatia: University of Split; RBI, Zagreb
Cyprus: Nikosia Univ.
Czech Republic: Techn. Univ. Prague; CAS, Rez
France: IPHC Strasbourg
Germany: GSI, Darmstadt; FZ Dresden-Rossendorf; Univ. Heidelberg, Phys. Inst.; Univ. HD, Kirchhoff Inst.; Univ. Frankfurt; Univ. Kaiserslautern; Univ. Mannheim; Univ. Münster
Hungary: KFKI Budapest; Eötvös Univ. Budapest
India: IOP Bhubaneswar; Univ. Chandigarh; IIT Kharagpur; VECC Kolkata; SAHA Kolkata; Univ. Varanasi
Korea: Korea Univ. Seoul; Pusan National Univ.
Norway: Univ. Bergen
Poland: Silesia Univ. Katowice; Krakow Univ.; Nucl. Phys. Inst. Krakow; Warsaw Univ.
Portugal: LIP Coimbra
Romania: NIPNE Bucharest
Russia: LHE, JINR Dubna; LIT, JINR Dubna; LPP, JINR Dubna; PNPI Gatchina; ITEP Moscow; MEPHI Moscow; Kurchatov Inst. Moscow; SINP, Moscow State Univ.; Obninsk State Univ.; IHEP Protvino; KRI, St. Petersburg; St. Petersburg Polytec. U.; INR Troitzk
Ukraine: Shevchenko Univ., Kiev
The End
Thanks for your attention
We acknowledge the support of the European Community-Research Infrastructure Activity under the FP6 "Structuring the European Research Area" programme (HadronPhysics, contract number RII3-CT-2004-506078).