Speed-up of the ring recognition algorithm
Semeon Lebedev GSI, Darmstadt, Germany and LIT JINR, Dubna, Russia
Gennady Ososkov, LIT JINR, Dubna, Russia
Speed up of the ring finder Dubna, 21.05.2009 2
Motivation
• A fast algorithm lowers computing requirements
• Possibility to use it in on-line reconstruction
• Many-core CPUs -> algorithms can be parallelized
I. Kisel, March 2009, CBM Coll. Meeting
Ring recognition algorithm
Standalone ring finder. Two steps:
• Local search of ring-candidates, based on local selection of hits and Hough Transform. Time consumption: 99%
• Global search. Filter: the algorithm compares all ring-candidates and keeps only good rings, rejecting clone and fake rings. Time consumption: 1%
Ring recognition algorithm, local search
• Preliminary selection of hits
• Histogram of ring centers
• Hough Transform
• Ellipse fitter
• Ring quality calculation
• Remove hits of found ring (only best matched hits)
• Ring array
Time consumption
Define local area and hits: 30% of time
• Hits search
• Arrays initialization
Optimization: optimize hits search and array sizes and dimensions, remove dynamic memory allocation

Hough Transform: 69% of time
• Triple loop of ring parameters calculation
Optimization: optimize calculations inside loops, reduce combinatorics

Peak finder: 1% of time
• Peak finding in 2D and 1D arrays
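The triple loop of ring parameters calculation can be sketched as follows. This is a minimal illustration, not the authors' code: the Hit struct, the function names CircleFromThreeHits and FillCenterHistogram, and the square histogram range are all assumptions made for the example. Every triplet of hits defines one circle, and its center is accumulated in a 2D histogram.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hit type: a point in the photodetector plane.
struct Hit { float x; float y; };

// Circle through three points (circumcircle).
// Returns false when the points are (nearly) collinear.
bool CircleFromThreeHits(const Hit& a, const Hit& b, const Hit& c,
                         float& xc, float& yc, float& r) {
    const float d = 2.0f * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-6f) return false;
    const float a2 = a.x * a.x + a.y * a.y;
    const float b2 = b.x * b.x + b.y * b.y;
    const float c2 = c.x * c.x + c.y * c.y;
    xc = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    yc = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    r = std::sqrt((a.x - xc) * (a.x - xc) + (a.y - yc) * (a.y - yc));
    return true;
}

// Fills an nBins x nBins histogram of ring centers over [lo, hi) in x and y.
// The triple loop visits every unordered triplet of hits exactly once.
std::vector<int> FillCenterHistogram(const std::vector<Hit>& hits,
                                     int nBins, float lo, float hi) {
    std::vector<int> hist(static_cast<std::size_t>(nBins) * nBins, 0);
    const float invBinWidth = nBins / (hi - lo);
    const std::size_t n = hits.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            for (std::size_t k = j + 1; k < n; ++k) {
                float xc, yc, r;
                if (!CircleFromThreeHits(hits[i], hits[j], hits[k], xc, yc, r)) continue;
                if (xc < lo || xc >= hi || yc < lo || yc >= hi) continue;
                int ix = static_cast<int>((xc - lo) * invBinWidth);
                int iy = static_cast<int>((yc - lo) * invBinWidth);
                if (ix >= nBins) ix = nBins - 1;  // guard against float rounding at the edge
                if (iy >= nBins) iy = nBins - 1;
                ++hist[static_cast<std::size_t>(iy) * nBins + ix];
            }
        }
    }
    return hist;
}
```

The bin with the most entries is then a ring-center candidate. A real ring finder would also histogram the radius and restrict the loop to hits in the local area; both are left out here to keep the sketch short.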
Optimization of Hough Transform
• Divide hits into several parts
• Make Hough Transform of each part independently
First part of hits -> Hough Transform
Second part of hits -> Hough Transform
Sum up the histograms
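A small sketch of why splitting can pay off, assuming the Hough Transform loops over all hit triplets: the number of triplets grows roughly with the cube of the number of hits, so two halves together need far fewer iterations than the full set. The function names CountTriplets and SumHistograms are made up for this example, and how the hits are actually divided is not stated on the slide.

```cpp
#include <cstddef>
#include <vector>

// Number of unordered hit triplets for n hits: n(n-1)(n-2)/6.
unsigned long long CountTriplets(unsigned long long n) {
    return n < 3 ? 0ULL : n * (n - 1) * (n - 2) / 6;
}

// Element-wise sum of two histograms with the same binning.
// Entries beyond the shorter histogram are left at zero.
std::vector<int> SumHistograms(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<int> out(a.size() > b.size() ? a.size() : b.size(), 0);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

For 100 hits the full loop visits 161700 triplets, while two parts of 50 hits visit 2 x 19600 = 39200 in total, about four times fewer. The price is that triplets mixing hits from different parts are never formed, so each part still has to contain enough hits of every ring.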
Optimization of Hough Transform: SIMD and SSE2
SSE 128-bit registers can represent:
• sixteen 8-bit signed or unsigned chars,
• eight 16-bit signed or unsigned shorts,
• four 32-bit integers, or
• four 32-bit floating point variables.
128-bit register: four concurrent add operations
Algorithm must work with single precision type (float)
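The four concurrent add operations can be illustrated with the standard SSE intrinsics. This is a sketch for illustration only; the function name AddFour is an assumption, and a scalar fallback is included so the example also builds on targets without SSE2.

```cpp
#include <array>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 header, also provides the SSE float intrinsics
#define RING_SKETCH_HAS_SSE2 1
#endif

// Adds four pairs of single-precision floats.
// With SSE2 available this is a single packed add on a 128-bit register.
std::array<float, 4> AddFour(const std::array<float, 4>& a,
                             const std::array<float, 4>& b) {
    std::array<float, 4> out{};
#ifdef RING_SKETCH_HAS_SSE2
    const __m128 va = _mm_loadu_ps(a.data());       // load 4 floats (unaligned)
    const __m128 vb = _mm_loadu_ps(b.data());
    _mm_storeu_ps(out.data(), _mm_add_ps(va, vb));  // 4 additions at once
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];  // scalar fallback
#endif
    return out;
}
```

With double precision only two values fit into the same register, which is why the float requirement above matters for the achievable speed-up.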
Ring finder and SIMD
SIMD version of CalculateRingParameters(x[3], y[3], &xc, &yc, &r) was implemented.
Scalar: CalculateRingParameters(x[3], y[3], &xc, &yc, &r), where x, y, xc, yc, r are floats
SIMD: CalculateRingParameters(xv[3], yv[3], &xcv, &ycv, &rv), where xv, yv, xcv, ycv, rv are F32vec4
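One way to get both signatures from a single source is a function template, sketched below. This is an assumption about a possible design, not the authors' implementation: the body uses the textbook circle-through-three-points formula, and simd_sketch::Vec4 is a hypothetical plain-C++ stand-in for F32vec4 so that the example needs no special headers.

```cpp
#include <cmath>

namespace simd_sketch {
// Hypothetical 4-lane float type; a real F32vec4 would wrap an SSE register.
struct Vec4 { float v[4]; };

inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i]; return r;
}
inline Vec4 operator-(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i]; return r;
}
inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i]; return r;
}
inline Vec4 operator/(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] / b.v[i]; return r;
}
inline Vec4 sqrt(const Vec4& a) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = std::sqrt(a.v[i]); return r;
}
}  // namespace simd_sketch

// Circle through the three points (x[i], y[i]).
// T = float handles one triplet; T = Vec4 handles four triplets, one per lane.
// There is no collinearity check: a degenerate triplet yields inf or NaN.
template <typename T>
void CalculateRingParameters(const T x[3], const T y[3], T* xc, T* yc, T* r) {
    using std::sqrt;  // float uses std::sqrt, Vec4 is found by argument-dependent lookup
    const T s = x[0] * (y[1] - y[2]) + x[1] * (y[2] - y[0]) + x[2] * (y[0] - y[1]);
    const T d = s + s;
    const T a2 = x[0] * x[0] + y[0] * y[0];
    const T b2 = x[1] * x[1] + y[1] * y[1];
    const T c2 = x[2] * x[2] + y[2] * y[2];
    *xc = (a2 * (y[1] - y[2]) + b2 * (y[2] - y[0]) + c2 * (y[0] - y[1])) / d;
    *yc = (a2 * (x[2] - x[1]) + b2 * (x[0] - x[2]) + c2 * (x[1] - x[0])) / d;
    const T dx = x[0] - (*xc);
    const T dy = y[0] - (*yc);
    *r = sqrt(dx * dx + dy * dy);
}
```

The function body contains no branches, so the same arithmetic runs unchanged on every lane. Rejecting bad triplets has to happen before or after the call, for example by checking the resulting radius.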
Optimization and performance
Time per event, ms    Comments
750.4                 Initial version
673.4                 Double -> float
632.0                 Refactoring + base class for HT
613.0                 Remove modf()
507.4                 SIMD
533.6                 Hits presearch
413.0                 Remove dynamic allocation
299.4                 SIMD
167.6                 Divide hits into several parts
146.1                 SIMD
133.4                 Calc. ring params inside loop
115.6                 Change division to multiplication
97.0                  RF parameters optimization

Speed up factor: 7.7
Processor: Intel Pentium Core2 6400, 2.13 GHz
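Two of the steps listed above, removing modf() and changing division to multiplication, are common micro-optimizations in histogram filling. The sketch below shows the general idea on a bin-index calculation. Where exactly these changes were made in the ring finder is not stated on the slide, so the function names and the binning use case are assumptions.

```cpp
#include <cmath>

// Reference version: one division and one modf() call per entry.
int BinIndexSlow(float value, float lo, float binWidth) {
    float intPart = 0.0f;
    std::modf((value - lo) / binWidth, &intPart);  // split off the integer part
    return static_cast<int>(intPart);
}

// Faster version: multiply by a reciprocal computed once outside the loop
// and truncate with a cast, which also rounds toward zero like modf().
int BinIndexFast(float value, float lo, float invBinWidth) {
    return static_cast<int>((value - lo) * invBinWidth);
}
```

The caller computes invBinWidth = 1.0f / binWidth once per histogram. For bin widths that are not powers of two, the two versions can differ by one bin for values lying almost exactly on a bin boundary, because the reciprocal is rounded.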
Electron ring finding efficiency
Au-Au central collisions at 25 AGeV plus 5 e+ and 5 e-
Compact RICH geometry
Summary
• The ring finder was significantly optimized in terms of calculation speed without losing efficiency
• Next steps:
– HT parameters optimization
– Parallelization on multi-core CPUs
– Continue investigation of the SIMD version