Source: lss.fnal.gov/conf/c871111/p387.pdf
TRANSCRIPT
THE PATTERN MATCHING MACHINE
Mauro Dell'Orso
Università di Pisa
Pisa - Italy
Luciano Ristori
Istituto Nazionale di Fisica Nucleare
Pisa - Italy
Abstract
We describe a very fast algorithm for track finding based on
pattern matching with a successive approximation strategy. We
discuss how this algorithm can be efficiently implemented on a
massively parallel architecture. Maximum speed can be achieved by
implementing a large array of custom VLSI chips developed
specifically for this purpose.
Introduction
The analysis of data collected by modern High Energy Physics
experiments often requires a lot of computing power. One of the
most demanding tasks is usually track reconstruction.
The quality of the results from present and future experiments
depends to some extent on the implementation of fast and efficient
track finding algorithms. The detection of heavy flavour
production, for example, depends heavily on the reconstruction of
secondary vertices generated by the decay of long lived particles,
which in turn implies the reconstruction of the majority of the
tracks in every event.
The detector
In this discussion we will assume that our detector consists of a
number of parallel layers, each layer being segmented into a number
of bins. When charged particles cross the detector they hit one
bin per layer. This is, of course, an abstraction but it is
adequate to discuss the basic features of the track finding
algorithm and it is meant to represent a whole class of real
detectors (drift chambers, silicon microstrip detectors, etc.). For
each event we know which bins have been hit and from this
information we want to reconstruct the trajectories of all the
particles. We call this process track finding.
The pattern bank
The problem of track finding can be solved, at least conceptually,
by a "brute force" approach. We consider all the possible tracks
that go through our detector. Each track generates a hit pattern.
Since the detector has a finite spatial resolution (bin size), many
different tracks generate the same hit pattern. The number of
different hit patterns generated by all the tracks is finite and we
can imagine storing all of them in a sufficiently large memory. The
collection of all these patterns defines both the space of the
tracks we are looking for and how they appear in the detector: we
will refer to this collection as the pattern bank.
For each event, a number of tracks go through the detector and a
particular configuration of hits is thus generated: we will refer
to this configuration as the event. A conceptually simple way to
perform the track finding algorithm is to scan the pattern bank
and compare each pattern to the event. A track candidate is found
whenever all the hits in the pattern are present in the event.
Going through the totality of the patterns in the bank yields a
number of track candidates.
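The brute-force scan just described can be sketched in a few lines of Python. This is a toy model: the encoding of patterns as one bin per layer and of the event as a set of (layer, bin) pairs is illustrative, not taken from the original text.

```python
# Sketch of the "brute force" pattern matching described above.
# A pattern is a tuple with one hit bin per detector layer; the
# event is the set of (layer, bin) pairs actually hit.

def matches(pattern, event_hits):
    """A pattern is a track candidate when every one of its hits
    is present in the event."""
    return all((layer, bin_) in event_hits
               for layer, bin_ in enumerate(pattern))

def find_candidates(pattern_bank, event_hits):
    """Scan the whole bank and return every matching pattern."""
    return [p for p in pattern_bank if matches(p, event_hits)]

# A toy 4-layer detector: two stored patterns, one of which is
# fully contained in the event (extra hits in the event are allowed).
bank = [(2, 2, 3, 3), (0, 1, 2, 3)]
event = {(0, 2), (1, 2), (2, 3), (3, 3), (1, 7)}
print(find_candidates(bank, event))  # [(2, 2, 3, 3)]
```

Note that a pattern matches even when the event contains hits the pattern does not mention; only the converse (a pattern hit missing from the event) vetoes the match.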
The number of different patterns to be stored in the bank depends
on the detector granularity and geometry, and on the
characteristics of the tracks we want to detect. As an example we
will consider the situation shown in fig.1: the detector consists
of four parallel planes and each plane is segmented into n bins. We
consider all straight tracks crossing all four planes. We want to
estimate the number of different patterns (Np) that can be
generated by a single track.
A fairly good approximation is

Np = 3*n^2 (1)

The reason for this is explained in fig.2. By selecting one bin in
plane 1 and one bin in plane 4 we define a road: there are n^2
different roads. From fig.2 it should be obvious that all the
tracks belonging to a road generate three different patterns
corresponding to the three subroads delimited by dotted lines.
Expression (1) can be generalized as follows:

Np = (m-1)*n^2 (2)

where
Np = number of patterns
m = number of detector planes
n = number of bins/plane
The main problem with this approach is that the number of patterns
to store in the bank for a practical situation may be very large.
For example, if we consider 4 planes with 256 bins/plane we obtain:

Np = 3*256^2 = 196608, i.e. about 2*10^5 patterns.
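As a quick check, expression (2) can be evaluated for this example. The function name is illustrative, and the code assumes the generalization Np = (m-1)*n^2, which reproduces the three patterns per road found above for four planes.

```python
# Expression (2), Np = (m-1) * n^2, evaluated for the example in
# the text: m = 4 planes, n = 256 bins per plane.

def n_patterns(m, n):
    """Approximate number of distinct hit patterns for straight
    tracks crossing m planes of n bins each."""
    return (m - 1) * n ** 2

print(n_patterns(4, 256))  # 196608, i.e. about 2 * 10^5
```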
To deal with such a large number of patterns we need a lot of
memory and we expect the process of matching all the patterns
sequentially to be very time consuming. But we can make this
process much faster by structuring the pattern bank as explained in
the following sections.
Successive approximations
Fig.3 shows an event with four tracks crossing four parallel
layers. From top to bottom the spatial resolution of the detector
is improved by a factor two every step. The image is confused at
the beginning and becomes clearer as the resolution improves.
The basic idea is to follow a successive approximation strategy and
apply our pattern matching algorithm to the same event seen with
increasing spatial resolution. Lower spatial resolution is
simulated by logically ORing adjacent bins.
Fig.4 shows how a single track is seen when each detector plane is
considered as being only two bins. In this case the total number of
patterns compatible with a straight line is eight. Pattern number 3
is the one that matches. Since we have one track candidate at this
level of spatial resolution, we now double the number of
distinguishable bins in each plane and proceed to match the four
patterns shown in fig. 5. Pattern number 3 in fig. 4 is said to
generate the four sub-patterns in fig.5. Since we still have one
track candidate we go on halving the bin size. This process is
iterated until we either reach the actual resolution of the
detector (success) or we are left with no track candidate
(failure).
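The ORing of adjacent bins that simulates a lower resolution can be sketched as follows. The function name and the encoding of a layer's hits as a set of bin indices are illustrative.

```python
# Lowering the spatial resolution of one layer by logically ORing
# adjacent bins: bins are grouped by `factor`, and a group counts
# as hit if any of its members is hit.

def coarsen(hit_bins, factor):
    """View the hit bins of one layer at a resolution reduced by
    `factor` (integer division maps fine bins to coarse bins)."""
    return {b // factor for b in hit_bins}

# One layer of an 8-bin detector with hits in bins 2 and 5.
layer_hits = {2, 5}
print(coarsen(layer_hits, 4))  # {0, 1}: seen as a 2-bin layer
print(coarsen(layer_hits, 2))  # {1, 2}: seen as a 4-bin layer
print(coarsen(layer_hits, 1))  # {2, 5}: full resolution
```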
Tree search
The pattern bank can be arranged in a tree structure as shown in
fig.6. Increasing depth corresponds to increasing spatial
resolution. Each node represents one pattern and it is linked to
all the subpatterns it generates when the spatial resolution is
improved by a factor two.
The pattern matching process can be implemented as a tree search.
We scan all the patterns hanging from one node; every pattern
that matches the current event is considered a track
candidate and enables the search at the next deeper level in the
tree. A track is found whenever this search reaches the bottom of
the tree.
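A minimal recursive sketch of this tree search follows. The node layout (a pattern plus its sub-patterns), the coarsening of the event by dropping low bits of the bin address, and all names are illustrative, not from the original text.

```python
# Tree search over the pattern bank: descend only through patterns
# that match the event; a match at the deepest level (full detector
# resolution) is a found track.

def node_matches(pattern, depth, max_depth, event_hits):
    """Compare a pattern at a given depth to the event viewed at
    the same resolution (bin size halves at every extra level)."""
    shift = max_depth - depth
    coarse_event = {(layer, b >> shift) for layer, b in event_hits}
    return all(h in coarse_event for h in pattern)

def tree_search(node, depth, max_depth, event_hits, found):
    pattern, children = node
    if not node_matches(pattern, depth, max_depth, event_hits):
        return                      # prune this whole subtree
    if depth == max_depth:
        found.append(pattern)       # reached the bottom: a track
        return
    for child in children:
        tree_search(child, depth + 1, max_depth, event_hits, found)

# Tiny 2-layer bank, two depth levels (2 coarse bins -> 4 fine bins).
leaf_a = ([(0, 2), (1, 3)], [])
leaf_b = ([(0, 3), (1, 2)], [])
root = ([(0, 1), (1, 1)], [leaf_a, leaf_b])
found = []
tree_search(root, 0, 1, {(0, 2), (1, 3)}, found)
print(found)  # [[(0, 2), (1, 3)]]
```

The pruning is what makes the tree search fast: a coarse pattern that fails to match cuts off all of its sub-patterns at once.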
This tree-search is obviously much faster than a purely sequential
search. The average number of patterns one has to examine to find a
track is given by:
Nm = k*Nl (3)

where Nl is the total number of depth levels in the tree and k is
the average number of patterns hanging from a single node. k ranges
typically from 4 to 8 depending on the particular geometrical
arrangement.
We also have:

Nl = log2(n) (4)

and therefore:

Nm = k*log2(n) (5)

where k does not depend significantly on n.
Expression (5) is to be compared to

Nm = Np = (m-1)*n^2 (6)

which holds for a purely sequential search.
Since in most applications n is rather large (100 to 1000), the
advantage of the successive approximation approach is enormous.
Missing hits
In real experimental situations each plane detects particles with
an efficiency which is less than one. This means that there is a
finite probability that some of the hits, in a given track, will be
missing. Usually the probability of having all the hits (no miss)
is actually rather small. Therefore we must be prepared to accept
cases where we have only a partial pattern match. For example, if
we have a detector with eight layers, we might accept also tracks
that match only seven hits or maybe six.
The tree-search algorithm may be easily modified to accept partial
pattern matches: we need to modify only the way each pattern is
compared to the event, leaving the data base structure and the
visiting strategy unchanged.
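The modified comparison can be sketched as follows; only the match test changes, exactly as stated above. Names and encodings are illustrative.

```python
# Accepting partial matches: a pattern is accepted if at least
# `min_hits` of its hits are present in the event, tolerating
# missing hits due to plane inefficiency.

def count_matched(pattern, event_hits):
    """Number of hits of the pattern present in the event."""
    return sum(1 for h in pattern if h in event_hits)

def matches(pattern, event_hits, min_hits):
    return count_matched(pattern, event_hits) >= min_hits

# An 8-layer pattern with one missing hit: accepted at threshold 7,
# rejected at threshold 8.
pattern = [(layer, layer) for layer in range(8)]
event = {(layer, layer) for layer in range(8)} - {(3, 3)}
print(matches(pattern, event, 7))  # True
print(matches(pattern, event, 8))  # False
```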
Sequential Implementation
The tree search algorithm may be implemented on a sequential
machine. In this case we believe that a depth-first method of
visiting the tree is preferable.
Every time we find a match we go down one level and start scanning
all the sub-patterns hanging from that node. If no match is found
at a given level, we go up and resume the scan of the patterns at
the next higher level. Every time we hit the bottom of the tree we
have a track candidate; when we go back to the root the search is
complete.
Parallel Implementation
-392-
The tree search algorithm lends itself to implementation with a
high degree of parallelism.
We may imagine having one process (parent) scanning all the
patterns hanging from one node and matching them to the current
event. Every time a match is found a new process (son) is started
in parallel to carry on the search at a lower level. This, in turn,
will start other processes and so on. The parent process does not
have to wait for all its sons to complete: it may terminate as
soon as the pattern list at the relevant level is exhausted.
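This spawning scheme can be sketched with threads. The sketch assumes the same illustrative node layout as before and, for brevity, stores full-resolution hits at every node; the function names and the thread-pool choice are not from the original text.

```python
# Parent/son tree search: each task scans the sub-patterns of its
# node and submits a new task for every match, terminating without
# waiting for the sons. The driver loop drains the pool until no
# running task can spawn further work.
from concurrent.futures import ThreadPoolExecutor, wait
import threading

def parallel_search(root, event_hits):
    found, pending = [], []
    lock = threading.Lock()

    def visit(node):
        pattern, children = node
        if not all(h in event_hits for h in pattern):
            return
        if not children:                 # bottom of the tree: a track
            with lock:
                found.append(pattern)
            return
        for child in children:           # spawn one son per sub-pattern
            with lock:
                pending.append(pool.submit(visit, child))

    with ThreadPoolExecutor(max_workers=4) as pool:
        with lock:
            pending.append(pool.submit(visit, root))
        while True:                      # wait until no task spawns more
            with lock:
                todo = list(pending)
            wait(todo)
            with lock:
                if len(todo) == len(pending):
                    break
    return found

leaf_a = ([(0, 2), (1, 3)], [])
leaf_b = ([(0, 3), (1, 2)], [])
root = ([], [leaf_a, leaf_b])            # trivial root pattern
print(parallel_search(root, {(0, 2), (1, 3)}))  # [[(0, 2), (1, 3)]]
```

The driver loop is needed because tasks submit further tasks: the search is complete only when every submitted future is done and no new ones have appeared in the meantime.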
Content Addressable Memory
The pattern matching algorithm described in the previous sections
can be easily implemented on a parallel architecture because
different patterns can be compared to the event independently and
in any order; in particular, any number of comparisons can be
performed in parallel provided that this is allowed by the
hardware.
If our main goal is speed, we can push the degree of parallelism to
the limit and try to compare all the patterns to the event at once.
To do this we need a special type of content addressable memory
(CAM) to store the pattern bank: each cell of the memory is big
enough to hold one pattern and has enough intelligence built in to
compare its contents to the event. A possible architecture for this
device is shown in fig. 7. Each row represents one cell and is
designed to hold one pattern. Each cell is structured into a number
of words, one word per detector layer (four of them are shown).
Each word holds the address of one hit on the corresponding layer.
All the words in a cell define a pattern by specifying one hit per
layer. Each word must be big enough to identify one bin on that
layer. The Data Bus connects all the words in the same layer; this
bus is used to load the pattern data into the memory cells during
the initialization phase: this is done once and for all. During normal
operation, for every event, the coordinates of all the hits in each
layer are transmitted one after the other on the corresponding Data
Bus; all the words continuously compare their contents to what is on
the bus and if a match is found the corresponding flip-flop (FF) is
set. After all the hits have been transmitted, any cell that has
all the flip-flops set is a track candidate because all the hits
that define that pattern are present in the event. The addresses of
all the track candidates are transmitted sequentially on the Output
Bus. If we want to account for inefficiencies, we will set a
threshold on the number of flip-flops we require to be set in a
cell before we call it a track candidate.
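A software model of a single cell makes the mechanism concrete: one stored word and one flip-flop per layer, and a threshold for declaring a candidate. The class and method names are illustrative, not from the original text.

```python
# Model of one CAM cell from fig.7: each word compares the hit
# address broadcast on its layer's Data Bus and sets its flip-flop
# on a match; a threshold below the layer count tolerates
# detector inefficiencies.

class CamCell:
    def __init__(self, pattern):
        self.words = list(pattern)           # one stored bin per layer
        self.flags = [False] * len(pattern)  # the flip-flops

    def broadcast(self, layer, hit_bin):
        """A hit address appears on the Data Bus of `layer`."""
        if self.words[layer] == hit_bin:
            self.flags[layer] = True

    def is_candidate(self, threshold):
        """Candidate if at least `threshold` flip-flops are set."""
        return sum(self.flags) >= threshold

cell = CamCell([5, 7, 2, 9])                 # 4-layer pattern
for layer, hit in [(0, 5), (1, 7), (2, 2), (3, 4)]:
    cell.broadcast(layer, hit)
print(cell.is_candidate(4))  # False: the layer-3 hit is missing
print(cell.is_candidate(3))  # True: allow one inefficiency
```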
Typical applications require a number of cells of the order of 100k
or more. The amount of logic involved rules out the possibility of
using standard components and requires the development of
appropriate ASICs (Application Specific Integrated Circuits).
Using the present VLSI technology (Very Large Scale Integration) we
believe we can design a CMOS chip with 256 cells. We could then put
together a system with 100k cells using only 400 chips. Each chip
will include the content addressable memory and the readout logic.
The input/output architecture of the chip can be designed so that
many of them can be easily put together to implement an arbitrarily
large pattern bank.
Combining two approaches
The CAM approach is very fast but the amount of memory needed grows
very rapidly with the number of channels in the detector. Finer
resolution means more hardware. The tree-search approach instead is
slower, but the same machine is capable of handling any
granularity. In this case finer resolution means more time.
We will try to combine these two approaches and build a two stage
machine. The first stage is implemented with a bank of content
addressable memory and finds tracks with limited spatial
resolution; track candidates from the first stage are passed to the
second stage. The second stage is implemented with an array of
processors which run the tree-search algorithm to any degree of
resolution desired. Note that the tree-search does not have to
start from the root but from the appropriate level corresponding to
the degree of resolution of the track candidates output from the
first stage. The search time is thus proportionally reduced.
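The two-stage idea can be sketched end to end: a coarse first-stage match narrows the bank, and only the survivors are checked at full resolution. The coarsening factor, names, and flat second stage (standing in for the tree search restarted below the root) are illustrative assumptions.

```python
# Two-stage track finding: stage 1 matches every pattern at reduced
# resolution (the CAM role); stage 2 refines only the candidates
# that survive, at full resolution.

COARSE_SHIFT = 2  # first stage works with bins 4x wider (assumed)

def coarse_view(hits):
    """Drop the low bits of every bin address."""
    return {(layer, b >> COARSE_SHIFT) for layer, b in hits}

def two_stage_find(bank, event_hits):
    coarse_event = coarse_view(event_hits)
    # Stage 1: cheap coarse match over the whole bank.
    candidates = [p for p in bank
                  if all(h in coarse_event for h in coarse_view(p))]
    # Stage 2: full-resolution check on the few survivors only.
    return [p for p in candidates
            if all(h in event_hits for h in p)]

bank = [((0, 8), (1, 9)), ((0, 9), (1, 8)), ((0, 0), (1, 1))]
event = {(0, 8), (1, 9)}
print(two_stage_find(bank, event))  # [((0, 8), (1, 9))]
```

In this toy run the first stage keeps two of the three patterns (both fall in the matching coarse road) and the second stage rejects one of them, mirroring the division of labour described above.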
We believe that for any given experimental situation the right
compromise can be found in the sharing of the track finding task
between the first and the second stage. If time is critical we can
build a big content addressable memory and try to perform most or
all of the track finding in the first stage. If time is not so
critical we can save a lot in hardware (and money) by letting the
second stage do most of the work.
[Fig. 1: The pattern bank]
[Fig. 2: Roads and Subroads]
[Fig. 3: Successive Approximations (the same event with resolution doubling at each step)]
[Fig. 4: First Level Patterns (eight patterns, numbered 1 to 8)]
[Fig. 5: Subpatterns]
[Fig. 6: Tree Structure (depth 0 to 3)]
[Fig. 7: Content Addressable Memory (cells as rows; one word per layer; Data Buses 1 to 4; Output Bus)]