Source: lss.fnal.gov/conf/c871111/p387.pdf
TRANSCRIPT
THE PATTERN MATCHING MACHINE
Mauro Dell'Orso
Università di Pisa
Pisa - Italy
Luciano Ristori
Istituto Nazionale di Fisica Nucleare
Pisa - Italy
Abstract
We describe a very fast algorithm for track finding based on
pattern matching with a successive approximation strategy. We
discuss how this algorithm can be efficiently implemented on a
massively parallel architecture. Maximum speed can be achieved by
implementing a large array of custom VLSI chips developed
specifically for this purpose.
Introduction
The analysis of data collected by modern High Energy Physics
experiments often requires a lot of computing power. One of the
most demanding tasks is usually track reconstruction.
The quality of the results from present and future experiments
depends to some extent on the implementation of fast and efficient
track finding algorithms. The detection of heavy flavour
production, for example, depends heavily on the reconstruction of
secondary vertices generated by the decay of long lived particles,
which in turn implies the reconstruction of the majority of the
tracks in every event.
The detector
In this discussion we will assume that our detector consists of a
number of parallel layers, each layer being segmented into a number
of bins. When charged particles cross the detector they hit one
bin per layer. This is, of course, an abstraction but it is
adequate to discuss the basic features of the track finding
algorithm and it is meant to represent a whole class of real
detectors (drift chambers, silicon microstrip detectors, etc.). For
each event we know which bins have been hit and from this
information we want to reconstruct the trajectories of all the
particles. We call this process track finding.
The pattern bank
The problem of track finding can be solved, at least conceptually,
by a "brute force" approach. We consider all the possible tracks
that go through our detector. Each track generates a hit pattern.
Since the detector has a finite spatial resolution (bin size), many
different tracks generate the same hit pattern. The number of
different hit patterns generated by all the tracks is finite and we
can imagine storing all of them in a sufficiently large memory. The
collection of all these patterns defines both the space of the
tracks we are looking for and how they appear in the detector: we
will refer to this collection as the pattern bank.
For each event, a number of tracks go through the detector and a
particular configuration of hits is thus generated: we will refer
to this configuration as the event. A conceptually simple way to
perform the track finding algorithm is to scan the pattern bank
and compare each pattern to the event. A track candidate is found
whenever all the hits in the pattern are present in the event.
Going through the totality of the patterns in the bank yields a
number of track candidates.
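The brute-force scan just described can be sketched in a few lines of Python. This is a toy model: the encoding of patterns as one bin per layer and of the event as a set of (layer, bin) pairs is illustrative, not taken from the original text.

```python
# Sketch of the "brute force" pattern matching described above.
# A pattern is a tuple with one hit bin per detector layer; the
# event is the set of (layer, bin) pairs actually hit.

def matches(pattern, event_hits):
    """A pattern is a track candidate when every one of its hits
    is present in the event."""
    return all((layer, bin_) in event_hits
               for layer, bin_ in enumerate(pattern))

def find_candidates(pattern_bank, event_hits):
    """Scan the whole bank and return every matching pattern."""
    return [p for p in pattern_bank if matches(p, event_hits)]

# A toy 4-layer detector: two stored patterns, one of which is
# fully contained in the event (extra hits in the event are allowed).
bank = [(2, 2, 3, 3), (0, 1, 2, 3)]
event = {(0, 2), (1, 2), (2, 3), (3, 3), (1, 7)}
print(find_candidates(bank, event))  # [(2, 2, 3, 3)]
```

Note that a pattern matches even when the event contains hits the pattern does not mention; only the converse (a pattern hit missing from the event) vetoes the match.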
The number of different patterns to be stored in the bank depends
on the detector granularity and geometry, and on the
characteristics of the tracks we want to detect. As an example we
will consider the situation shown in fig.1: the detector consists
of four parallel planes and each plane is segmented into n bins. We
consider all straight tracks crossing all four planes. We want to
estimate the number of different patterns (Np) that can be
generated by a single track.
A fairly good approximation is

Np = 3*n^2 (1)

The reason for this is explained in fig.2. By selecting one bin in
plane 1 and one bin in plane 4 we define a road: there are n^2
different roads. From fig.2 it should be obvious that all the
tracks belonging to a road generate three different patterns
corresponding to the three subroads delimited by dotted lines.
Expression (1) can be generalized as follows:

Np = (m-1)*n^2 (2)

where
Np = number of patterns
m = number of detector planes
n = number of bins/plane
The main problem with this approach is that the number of patterns
to store in the bank for a practical situation may be very large.
For example, if we consider 4 planes with 256 bins/plane we obtain:

Np = 3*256^2 = 196608, i.e. about 2*10^5 patterns.
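As a quick check, expression (2) can be evaluated for this example. The function name is illustrative, and the code assumes the generalization Np = (m-1)*n^2, which reproduces the three patterns per road found above for four planes.

```python
# Expression (2), Np = (m-1) * n^2, evaluated for the example in
# the text: m = 4 planes, n = 256 bins per plane.

def n_patterns(m, n):
    """Approximate number of distinct hit patterns for straight
    tracks crossing m planes of n bins each."""
    return (m - 1) * n ** 2

print(n_patterns(4, 256))  # 196608, i.e. about 2 * 10^5
```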
To deal with such a large number of patterns we need a lot of
memory and we expect the process of matching all the patterns
sequentially to be very time consuming. But we can make this
process much faster by structuring the pattern bank as explained in
the following sections.
Successive approximations
Fig.3 shows an event with four tracks crossing four parallel
layers. From top to bottom the spatial resolution of the detector
is improved by a factor two every step. The image is confused at
the beginning and becomes clearer as the resolution improves.
The basic idea is to follow a successive approximation strategy and
apply our pattern matching algorithm to the same event seen with
increasing spatial resolution. Lower spatial resolution is
simulated by logically ORing adjacent bins.
Fig.4 shows how a single track is seen when each detector plane is
considered as being only two bins. In this case the total number of
patterns compatible with a straight line is eight. Pattern number 3
is the one that matches. Since we have one track candidate at this
level of spatial resolution, we now double the number of
distinguishable bins in each plane and proceed to match the four
patterns shown in fig. 5. Pattern number 3 in fig. 4 is said to
generate the four sub-patterns in fig.5. Since we still have one
track candidate we go on halving the bin size. This process is
iterated until we either reach the actual resolution of the
detector (success) or we are left with no track candidate
(failure).
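The ORing of adjacent bins that simulates a lower resolution can be sketched as follows. The function name and the encoding of a layer's hits as a set of bin indices are illustrative.

```python
# Lowering the spatial resolution of one layer by logically ORing
# adjacent bins: bins are grouped by `factor`, and a group counts
# as hit if any of its members is hit.

def coarsen(hit_bins, factor):
    """View the hit bins of one layer at a resolution reduced by
    `factor` (integer division maps fine bins to coarse bins)."""
    return {b // factor for b in hit_bins}

# One layer of an 8-bin detector with hits in bins 2 and 5.
layer_hits = {2, 5}
print(coarsen(layer_hits, 4))  # {0, 1}: seen as a 2-bin layer
print(coarsen(layer_hits, 2))  # {1, 2}: seen as a 4-bin layer
print(coarsen(layer_hits, 1))  # {2, 5}: full resolution
```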
Tree search
The pattern bank can be arranged in a tree structure as shown in
fig.6. Increasing depth corresponds to increasing spatial
resolution. Each node represents one pattern and it is linked to
all the subpatterns it generates when the spatial resolution is
improved by a factor two.
The pattern matching process can be implemented as a tree search.
We scan all the patterns hanging from one node; every pattern
that matches the current event is considered a track
candidate and enables the search at the next deeper level in the
tree. A track is found whenever this search reaches the bottom of
the tree.
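A minimal recursive sketch of this tree search follows. The node layout (a pattern plus its sub-patterns), the coarsening of the event by dropping low bits of the bin address, and all names are illustrative, not from the original text.

```python
# Tree search over the pattern bank: descend only through patterns
# that match the event; a match at the deepest level (full detector
# resolution) is a found track.

def node_matches(pattern, depth, max_depth, event_hits):
    """Compare a pattern at a given depth to the event viewed at
    the same resolution (bin size halves at every extra level)."""
    shift = max_depth - depth
    coarse_event = {(layer, b >> shift) for layer, b in event_hits}
    return all(h in coarse_event for h in pattern)

def tree_search(node, depth, max_depth, event_hits, found):
    pattern, children = node
    if not node_matches(pattern, depth, max_depth, event_hits):
        return                      # prune this whole subtree
    if depth == max_depth:
        found.append(pattern)       # reached the bottom: a track
        return
    for child in children:
        tree_search(child, depth + 1, max_depth, event_hits, found)

# Tiny 2-layer bank, two depth levels (2 coarse bins -> 4 fine bins).
leaf_a = ([(0, 2), (1, 3)], [])
leaf_b = ([(0, 3), (1, 2)], [])
root = ([(0, 1), (1, 1)], [leaf_a, leaf_b])
found = []
tree_search(root, 0, 1, {(0, 2), (1, 3)}, found)
print(found)  # [[(0, 2), (1, 3)]]
```

The pruning is what makes the tree search fast: a coarse pattern that fails to match cuts off all of its sub-patterns at once.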
This tree-search is obviously much faster than a purely sequential
search. The average number of patterns one has to examine to find a
track is given by:
Nm = k*Nl (3)

where Nl is the total number of depth levels in the tree and k is
the average number of patterns hanging from a single node. k ranges
typically from 4 to 8 depending on the particular geometrical
arrangement.
We also have:

Nl = log2(n) (4)

and therefore:

Nm = k*log2(n) (5)

where k does not depend significantly on n.
Expression (5) is to be compared to

Nm = Np = (m-1)*n^2 (6)

which holds for a purely sequential search.
Since in most applications n is rather large (100 to 1000), the
advantage of the successive approximation approach is enormous.
Missing hits
In real experimental situations each plane detects particles with
an efficiency which is less than one. This means that there is a
finite probability that some of the hits, in a given track, will be
missing. Usually the probability of having all the hits (no miss)
is actually rather small. Therefore we must be prepared to accept
cases where we have only a partial pattern match. For example, if
we have a detector with eight layers, we might accept also tracks
that match only seven hits or maybe six.
The tree-search algorithm may be easily modified to accept partial
pattern matches: we need to modify only the way each pattern is
compared to the event, leaving the data base structure and the
visiting strategy unchanged.
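The modified comparison can be sketched as follows; only the match test changes, exactly as stated above. Names and encodings are illustrative.

```python
# Accepting partial matches: a pattern is accepted if at least
# `min_hits` of its hits are present in the event, tolerating
# missing hits due to plane inefficiency.

def count_matched(pattern, event_hits):
    """Number of hits of the pattern present in the event."""
    return sum(1 for h in pattern if h in event_hits)

def matches(pattern, event_hits, min_hits):
    return count_matched(pattern, event_hits) >= min_hits

# An 8-layer pattern with one missing hit: accepted at threshold 7,
# rejected at threshold 8.
pattern = [(layer, layer) for layer in range(8)]
event = {(layer, layer) for layer in range(8)} - {(3, 3)}
print(matches(pattern, event, 7))  # True
print(matches(pattern, event, 8))  # False
```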
Sequential Implementation
The tree search algorithm may be implemented on a sequential
machine. In this case we believe that a depth-first method of
visiting the tree is preferable.
Every time we find a match we go down one level and start scanning
all the sub-patterns hanging from that node. If no match is found
at a given level, we go up and resume the scan of the patterns at
the next higher level. Every time we hit the bottom of the tree we
have a track candidate; when we go back to the root the search is
complete.
Parallel Implementation
-392-
The tree search algorithm lends itself to implementation with a
high degree of parallelism.
We may imagine having one process (parent) scanning all the
patterns hanging from one node and matching them to the current
event. Every time a match is found a new process (son) is started
in parallel to carry on the search at a lower level. This, in turn,
will start other processes and so on. The parent process does not
have to wait for all its sons to complete: it may terminate as
soon as the pattern list at the relevant level is exhausted.
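This spawning scheme can be sketched with threads. The sketch assumes the same illustrative node layout as before and, for brevity, stores full-resolution hits at every node; the function names and the thread-pool choice are not from the original text.

```python
# Parent/son tree search: each task scans the sub-patterns of its
# node and submits a new task for every match, terminating without
# waiting for the sons. The driver loop drains the pool until no
# running task can spawn further work.
from concurrent.futures import ThreadPoolExecutor, wait
import threading

def parallel_search(root, event_hits):
    found, pending = [], []
    lock = threading.Lock()

    def visit(node):
        pattern, children = node
        if not all(h in event_hits for h in pattern):
            return
        if not children:                 # bottom of the tree: a track
            with lock:
                found.append(pattern)
            return
        for child in children:           # spawn one son per sub-pattern
            with lock:
                pending.append(pool.submit(visit, child))

    with ThreadPoolExecutor(max_workers=4) as pool:
        with lock:
            pending.append(pool.submit(visit, root))
        while True:                      # wait until no task spawns more
            with lock:
                todo = list(pending)
            wait(todo)
            with lock:
                if len(todo) == len(pending):
                    break
    return found

leaf_a = ([(0, 2), (1, 3)], [])
leaf_b = ([(0, 3), (1, 2)], [])
root = ([], [leaf_a, leaf_b])            # trivial root pattern
print(parallel_search(root, {(0, 2), (1, 3)}))  # [[(0, 2), (1, 3)]]
```

The driver loop is needed because tasks submit further tasks: the search is complete only when every submitted future is done and no new ones have appeared in the meantime.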
Content Addressable Memory
The pattern matching algorithm described in the previous sections
can be easily implemented on a parallel architecture because
different patterns can be compared to the event independently and
in any order; in particular, any number of comparisons can be
performed in parallel provided that this is allowed by the
hardware.
If our main goal is speed, we can push the degree of parallelism to
the limit and try to compare all the patterns to the event at once.
To do this we need a special type of content addressable memory
(CAM) to store the pattern bank: each cell of the memory is big
enough to hold one pattern and has enough intelligence built in to
compare its contents to the event. A possible architecture for this
device is shown in fig. 7. Each row represents one cell and is
designed to hold one pattern. Each cell is structured into a number
of words, one word per detector layer (four of them are shown).
Each word holds the address of one hit on the corresponding layer.
All the words in a cell define a pattern by specifying one hit per
layer. Each word must be big enough to identify one bin on that
layer. The Data Bus connects all the words in the same layer; this
bus is used to load the pattern data into the memory cells during
the initialization phase: this is done once and for all. During normal
operation, for every event, the coordinates of all the hits in each
layer are transmitted one after the other on the corresponding Data
Bus; all the words continuously compare their contents to what is on
the bus and if a match is found the corresponding flip-flop (FF) is
set. After all the hits have been transmitted, any cell that has
all the flip-flops set is a track candidate because all the hits
that define that pattern are present in the event. The addresses of
all the track candidates are transmitted sequentially on the Output
Bus. If we want to account for inefficiencies, we will set a
threshold on the number of flip-flops we require to be set in a
cell before we call it a track candidate.
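A software model of a single cell makes the mechanism concrete: one stored word and one flip-flop per layer, and a threshold for declaring a candidate. The class and method names are illustrative, not from the original text.

```python
# Model of one CAM cell from fig.7: each word compares the hit
# address broadcast on its layer's Data Bus and sets its flip-flop
# on a match; a threshold below the layer count tolerates
# detector inefficiencies.

class CamCell:
    def __init__(self, pattern):
        self.words = list(pattern)           # one stored bin per layer
        self.flags = [False] * len(pattern)  # the flip-flops

    def broadcast(self, layer, hit_bin):
        """A hit address appears on the Data Bus of `layer`."""
        if self.words[layer] == hit_bin:
            self.flags[layer] = True

    def is_candidate(self, threshold):
        """Candidate if at least `threshold` flip-flops are set."""
        return sum(self.flags) >= threshold

cell = CamCell([5, 7, 2, 9])                 # 4-layer pattern
for layer, hit in [(0, 5), (1, 7), (2, 2), (3, 4)]:
    cell.broadcast(layer, hit)
print(cell.is_candidate(4))  # False: the layer-3 hit is missing
print(cell.is_candidate(3))  # True: allow one inefficiency
```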
Typical applications require a number of cells of the order of 100k
or more. The amount of logic involved rules out the possibility of
using standard components and requires the development of
appropriate ASICs (Application Specific Integrated Circuits).
Using the present VLSI technology (Very Large Scale Integration) we
believe we can design a CMOS chip with 256 cells. We could then put
together a system with 100k cells using only 400 chips. Each chip
will include the content addressable memory and the readout logic.
The input/output architecture of the chip can be designed so that
many of them can be easily put together to implement an arbitrarily
large pattern bank.
Combining two approaches
The CAM approach is very fast but the amount of memory needed grows
very rapidly with the number of channels in the detector. Finer
resolution means more hardware. The tree-search approach instead is
slower, but the same machine is capable of handling any
granularity. In this case finer resolution means more time.
We will try to combine these two approaches and build a two stage
machine. The first stage is implemented with a bank of content
addressable memory and finds tracks with limited spatial
resolution; track candidates from the first stage are passed to the
second stage. The second stage is implemented with an array of
processors which run the tree-search algorithm to any degree of
resolution desired. Note that the tree-search does not have to
start from the root but from the appropriate level corresponding to
the degree of resolution of the track candidates output from the
first stage. The search time is thus proportionally reduced.
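The two-stage idea can be sketched end to end: a coarse first-stage match narrows the bank, and only the survivors are checked at full resolution. The coarsening factor, names, and flat second stage (standing in for the tree search restarted below the root) are illustrative assumptions.

```python
# Two-stage track finding: stage 1 matches every pattern at reduced
# resolution (the CAM role); stage 2 refines only the candidates
# that survive, at full resolution.

COARSE_SHIFT = 2  # first stage works with bins 4x wider (assumed)

def coarse_view(hits):
    """Drop the low bits of every bin address."""
    return {(layer, b >> COARSE_SHIFT) for layer, b in hits}

def two_stage_find(bank, event_hits):
    coarse_event = coarse_view(event_hits)
    # Stage 1: cheap coarse match over the whole bank.
    candidates = [p for p in bank
                  if all(h in coarse_event for h in coarse_view(p))]
    # Stage 2: full-resolution check on the few survivors only.
    return [p for p in candidates
            if all(h in event_hits for h in p)]

bank = [((0, 8), (1, 9)), ((0, 9), (1, 8)), ((0, 0), (1, 1))]
event = {(0, 8), (1, 9)}
print(two_stage_find(bank, event))  # [((0, 8), (1, 9))]
```

In this toy run the first stage keeps two of the three patterns (both fall in the matching coarse road) and the second stage rejects one of them, mirroring the division of labour described above.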
We believe that for any given experimental situation the right
compromise can be found in the sharing of the track finding task
between the first and the second stage. If time is critical we can
build a big content addressable memory and try to perform most or
all of the track finding in the first stage. If time is not so
critical we can save a lot in hardware (and money) by letting the
second stage do most of the work.
[Fig. 1: The pattern bank]
[Fig. 2: Roads and Subroads]
[Fig. 3: Successive Approximations (the same event with resolution doubling at each step)]
[Fig. 4: First Level Patterns (eight patterns, numbered 1 to 8)]
[Fig. 5: Subpatterns]
[Fig. 6: Tree Structure (depth 0 to 3)]
[Fig. 7: Content Addressable Memory (cells as rows; one word per layer; Data Buses 1 to 4; Output Bus)]