optical content-addressable parallel processor ...hpcat/papers/ao_06_1992.pdf · optical...

Optical content-addressable parallel processor:architecture, algorithms, and design concepts

Ahmed Louri

Associative processing based on content-addressable memories has been argued to be the natural solutionfor nonnumerical information processing applications. Unfortunately, the implementation requirementsof these architectures when one uses conventional electronic technology have been cost prohibitive;therefore associative processors have not been realized. Instead, software methods that emulate thebehavior of associative processing have been promoted and mapped onto conventional location-addressable systems. However, this does not bring about the natural parallelism of associative processing,namely, the ability to access many data words simultaneously. Optics has the advantage over electronicsof directly supporting associative processing by providing economic and efficient interconnects, massiveparallelism, and high-speed processing. The principles of designing an optical content-addressable parallelprocessor (OCAPP) for the efficient support of parallel symbolic computing are presented. Thearchitecture is designed to exploit optics advantages fully in interconnects and high-speed operations.Several parallel search-and-retrieval algorithms are mapped onto an OCAPP to illustrate its capability ofsupporting parallel symbolic computing. A theoretical performance analysis of these algorithms ispresented. This analysis reveals that the execution times of the parallel algorithms presented areindependent of the problem size, which makes the OCAPP suitable for applications in which the numberof data sets to be operated on is high (e.g., massive parallel processing). A preliminary opticalimplementation of the architecture with currently available optical components is also presented.

I. IntroductionThe information explosion seen in recent years in allfields of human endeavor has stimulated the develop-ment of computer-based information systems to as-sist in the creation, storage, modification, classifica-tion, and retrieval of mainly textual or symbolic data.For example, progress in database management sys-tems, expert systems, and intelligent knowledge-based systems is increasing demand for symbolicinformation processing such as text editing, file pro-cessing, table sorting, searching, and retrieval, whichhave no numerical meaning. In fact, a substantialproportion of the work load of modern informationprocessing systems involves searching and sortingsymbolic data.1 2 Nevertheless, a majority of comput-ers are designed mainly for numerical computations,

The author is with the Department of Electrical and ComputerEngineering, University of Arizona, Tucson, Arizona 85721.

Received 16 November 1990.0003-6935/92/173241-18$05.00/0.o 1992 Optical Society of America.

and they suffer from a fundamental handicap thatstems from the principle of addressing the memory.The notion of locating information by its address isfundamentally a weak design concept for symbolicdata processing.

When a search for a value is made through alocation-addressable memory, the entire memory mayneed to be searched one word at a time (if the data arenot sorted in memory), which consumes a great dealof time. There is no logical reason why the searchmust be done sequentially. The only reason stemsfrom the fundamental handicap of separating process-ing and memory and addressing memory one word ata time. This fundamental flaw has forced systemanalysts and programmers to develop sophisticatedsoftware techniques for symbolic information process-ing such as hashing and indexing.34 However, theimplementation of such software techniques on loca-tion-addressed computers has lead to complex, expen-sive, and inefficient information processing sys-tems.5 3

Searching, retrieving, sorting, and modifying sym-bolic data can be significantly improved by the use of

10 June 1992 / Vol. 31, No. 17 / APPLIED OPTICS 3241

content-addressable memory instead of location ad-dressability.3,6 In such a method, the absolute loca-tion of each data object has no logical significance; allaccess to data objects is by content. As an illustration,consider a professor trying to find out how manystudents in his class are familiar with the opticalcomputing subject. If he considers the students to bea coordinate-addressed memory, he asks the ques-tions: Does the student in row one, column one knowthe subject? Does the student in row one, column twoknow the subject? He does this one at a time until heexhausts all the seats in the classroom. If the profes-sor assumes a content-addressable memory, he standsbefore the class and says: If you have prior knowl-edge of optical computing, please raise your hand. Hethen gets multiple responses (if any) at the sametime. In concept, such associative processing is anaturally parallel form of symbolic representationand manipulation of abstract data structures, and haspotential benefits in simplicity of expression (program-ming), storage capacity, and speed of execution.3 78

There are two hypotheses underlying this paper:

(1) Associative processing provides a sound basisto uncover inherent parallelism in symbolic process-ing and information retrieval systems.

(2) Optics is potentially the ideal medium toexploit such parallelism by providing efficient imple-mentation support for the associative processingmodel.

The rest of the paper is organized as follows.Section II briefly reviews associative processing. Thisincludes a basic architecture, benefits, and difficultiesfaced by electronics for implementing this computingmodel. Section III presents the design concepts andstructural organization of an optical content-address-able parallel processor (OCAPP). Section IV describesa variety of parallel algorithms for searching andinformation retrieval that can be efficiently imple-mented on an OCAPP, and in Section V we considerthe optical implementation of the different compo-nents of an OCAPP. Here, only a general descriptionof the implementation issues will be considered.Section VI presents an estimated performance analy-sis, and Section VII concludes the paper.

11 Associative Processing

A. BackgroundIn an associative memory, information is addressedby its contents.3 An associative processor is a parallelprocessing machine in which the data items arecontent addressable, and it has the added capability ofwriting in parallel into words satisfying certain crite-ria. It may be that the entire contents of stored wordsmay be changed, or just a few bits of the words. Abasic associative architecture that can provide aparallel search and update is depicted in Fig. 1.3 Itconsists of n words of associative memory, word select

K OUTPUT REGISTER

Fig. 1. Basic architecture of an associative memory processor.

logic, bit-slice select logic, a control memory forstoring programs and control, response register, mul-tiple match resolver, an output register, and someauxiliary circuits for control. The data memory con-sists of n words with m bits per word. Each bit cell inthe memory contains storage and comparison logic todetermine whether it matches an interrogating signalbroadcast by the bit-slice select logic, and also forparallel read and white. In the bit-slice select logic,the comparand register C is used to hold the keyoperand that is being searched for. The mask registerM is used to enable or disable the bit slices involved inthe parallel comparison operations across all thewords of the memory array. The response register Ris used to indicate if there are matching words. Themultiple response resolver is used to select one (itmay be the first) of the matching words. Parallelcomparisons are performed on all memory words.

This memory organization leads to a computationalmodel with the following advantages over conven-tional location-addressable models9 10:

* Information is processed within the associativememory, without transfer to an independent process-ing unit. Since there is no addressing of data and nodata movement, this implies the elimination of thefundamental von Neumann bottleneck encounteredin conventional systems.

* The amount of time required for searching,retrieving, and updating information is independentof the memory and the problem sizes (e.g., thenumber of words involved).

* There is no restriction on the length of wordsthat can be stored in the memory. This implies acomputational environment with a variable wordlength, as opposed to conventional systems in whichthe word length is fixed at design time.

3242 APPLIED OPTICS / Vol. 31, No. 17 / 10 June 1992

* Because of the modular and uniform nature ofthe associative memory model, an associative proces-sor can be easily expanded by adding identical cellsand by adjusting the control registers accordinglywithout any effect on the software and the controldesign. This implies great potential for scalability andexpandability.

B. Problems with Electronic Content-AddressableMemoriesDespite the above advantages of associative process-ing, this model of computing is not extensively usedbecause of the difficulty and high cost of implement-ing it in conventional electronic technology.5 3 In-stead, software methods such as hashing and index-ing that emulate a content-addressable capabilityhave been used. The major problems with currentelectronic content-addressable memories (CAM's) arethe following:

(1) Each bit cell in an associative memory is muchmore complex and requires more circuitry than does aconventional cell. Even with the advent of very-large-scale integration technology, the single cell complex-ity still does not allow for the use of large associativememories.

(2) The memory storage provides poor storagedensity compared with conventional memory.

(3) The third major difficulty is the complexity ofthe interconnects. Recall that in order for all cells tocompare their values to that of the comparand regis-ter, the control unit must broadcast the value to allcells involved in the comparison. However, usingconventional technology, we see that the time delaysassociated with the broadcasting function are appre-ciable. Moreover, intercell interconnects become cum-bersome for a large array size.

(4) The fourth difficulty is the lack of an efficientmeans of implementing parallel access to the cells,namely, parallel input and output.

Ill. Optical Content-Addressable Parallel ProcessorOptical systems hold the promise of providing effi-cient support for future parallel processing systems.Optics advantages have been cited on numerousoccasions. 11-'8 These include inherent parallelism,high spatial and temporal bandwidths, and noninter-fering communications. For associative processing,optics may be the ideal solution for the fundamentalproblems faced in electronic implementations; namely,cell complexity, interconnect latency, difficulty ofimplementing information broadcasting, and parallelaccess to the stored data.

A. Roles of Optics in Content-Addressable MemoryDesignOptics can provide direct architectural support forCAM's in the following ways:

(1) Optics can alleviate the cell complexity by

migrating the implementation of wiring and logic intofree space.

(2) The high degree of connectivity available infree-space space-invariant optical systems, and theease with which optical signals can be expanded(which allows for signal broadcasting) and combined(which allows for signal funneling) can also be ex-ploited to solve the interconnect design and alleviatenetwork latency problems.

(3) Optical and electro-optical systems can offerconsiderably more storage capacity (especially second-ary storage) than pure electronic systems.19-21

(4) Finally, the multidimensional nature of opti-cal systems can alleviate the input-output problemby providing parallel access to the stored-processeddata.

Most current research in optical CAM's is related toneural network models.2223 Neural network modelsused as CAM's can produce correct results even whentheir inputs are only partially presented. Thereforethey can be used to select the best matches whenpartial inputs are provided. However, these modelsrequire not only a radical departure from currentarchitectures, but also a major change in the program-mability of computers. This paper concentrates onthe use of optical CAM's for traditional search (exact-match) and information retrieval applications thatare encountered in database-knowledge-base process-ing, expert systems, and list and string processing.Our goal is to develop adequate architectural supportfor these applications by exploiting the unique advan-tages of optics without radical changes to currentprogramming practices. The rationale is that theresulting architecture should be similar to currentcomputer systems but expanded in capability, forthere is a significant base of applications that can takeadvantage of such an expanded system.

_111111110 2-D array of optical data UNIT(n x m data bits) L LY OPTICA

iD t t tical data ~~~~~~~ELECTRICAL-- - 1D vector of opialdt OUTPUT

(n x 1 data bits)

Fig. 2. Schematic organization of the optical content-addressableparallel processor.


B. Optical Content-Addressable Parallel Processor:Overview

In Fig. 2 a preliminary organizational structure foran OCAPP is proposed. The architecture is organizedin a modular fashion and consists of a selection unit, amatch-compare unit, a response unit, an output unit,and a control unit. The architecture is developed tomeet four goals, namely, (1) exploitation of maximumparallelism, (2) amenability to optical implementa-tion with existing devices, (3) modular design that canbe scalable to bigger problems (to be explained later),and (4) capability of efficiently implementing informa-tion retrieval and symbolic computations. Moreover,the programming methodology for an OCAPP iscompatible with that of existing single-instructionmultiple data systems.2 4 In what follows, we describethe role of each unit. Detailed optical implementationof an OCAPP is presented in Section V.

The selection unit is schematically described in Fig.3. It consists of (1) a storage array of n words, each mbits long (in actuality, the storage array capacity isn x 2m, since each bit position consists of a true bitwij and its complement 7jj); and (2) word and bit-sliceenable logic to enable-disable the words and-or thebit-slices that participate in the match operation andreset the rest. It is assumed that the storage array canbe loaded in parallel and (if need be) read in parallel.Optical means for achieving this are discussed below.

The match-compare unit shown in Fig. 4 contains(1) a 1 x m interrogation register I; (2) logic hardwareto perform parallel bitwise comparison between thebits of the interrogation register and the enabled bitsof the storage array; (3) two n x 1 working registers,G and L, which are used for magnitude comparisons;(4) an n x 1 response register R for displaying theresult of the comparison; and (5) a single indicator bitcalled the match detector MD, which indicates whether

Fig. 3. Organization of the selection unit.

2-0 OPTICALDATA FROM RISELECTION 7' COMPARISONUNITLOI

/ . . ~~~~~~~G L:LSS-THAN

R G GREATER-THAN

a t ~R: RESPONSE

MATCHDETECTOR BIT

Fig. 4. Organization of the match-compare unit.

or not there are any matching words. This unit allowsthe comparison of a single operand stored in theinterrogation register with all the words stored in thestorage array. As such, it is considered an SIMD unit.Bit position Ri of R is set to one when word Wi of thestorage array matches the contents of I. The I registeris a combination of the comparand register C and themask register M, as shown in Table 1. As such, itholds the operand (depending on masking informa-tion, if any) that is being searched for or is beingcompared with. It is assumed that register I isavailable in dual-rail logic (both true and complementbits are available).

The response unit is responsible for selecting thefirst matching word when there are several matchingwords. It consists of a combinatorial priority circuitfor selecting the first matching word. Depending onthe program control, the output of the response unitis either routed to the output unit for outputting theresult or fed back to the selection unit for furtherprocessing of the matching words. All units are underthe supervision of a conventional control unit withconventional storage (e.g., a local random-access mem-ory), which stores the program instruction. Its role isto load and unload the storage array, set and resetvarious registers such as the I, R G, and L registers ofthe match-compare unit, enable and disable memorywords, perform conditional instructions, monitor theMD bit, and test program termination. In Section IV

Table I. Formulation of Interrogation Register I

Search Bit, Mask Bit, Interrogation Bits,Cj mi ij,

0 0 011 0 100 1 001 1 ooa

aThe entry Ij j = 00 means that bit positionj for all words doesnot participate in the match operation (e.g., bit slicej is disabled).


we describe the implementation of several parallelalgorithms on an OCAPP in order to show its imple-mentation requirements and processing benefits. Wethen proceed to describe the optical implementationof its functional units in Section V.

IV. Parallel Search Algorithms on an OpticalContent-Addressable Parallel ProcessorWe classify search operations as basic and compoundoperations. A basic search operation is one that can becompleted in one sweep over all the bit slices of thestorage array. It does not involve any feedback process-ing. A compound search operation requires a feed-back from the response unit to the selection unit. As aconsequence, it takes more than one sweep over thestorage array to complete. For basic search opera-tions, we group the following operations:

* The equivalence search comprises the equalitysearch, the not-equal-to search, and the similaritysearch (search for a match within a masked field).

* The threshold search comprises the smaller-than, the not-smaller-than, the greater-than, and thenot-greater-than searches.

* The extremum search comprises the greatest-value search and the smallest-value search.

Compound search operations can be implementedin a series of basic search operations. For the com-pound search, we group the following operations:

* The adjacency search comprises the next-abovesearch and the next-below search.

* The between-limits search is the search forwords z between two limits X and Y (X < Y) for theconditionsX < z < Y X < z < YX < z < Y, andX <

< Y.* The outside-limits search is the search for

words z outside two limits X and Y (X < Y) for theconditionsX 2 z > Y, X > z 2 Y, X 2 z > Y, andX >

> Y.* The ordered retrievals (sorting) are the ascend-

ing order retrieval and the descending order retrieval.

Of course, many more compound search operationscan be formulated by using the basic search opera-tions. The above search operations are the mostfrequently used in information retrieval applications.

A. Parallel Algorithms for Basic Search Operations on anOptical Content-Addressable Parallel Processor

In what follows, we denote a memory word as Wi=(wilwim-i ... Wim), where wij is the jth bit cell of theword Wi, wil is the most significant bit, and wim is theleast significant bit. We denote the j th bit slice byBj = (wlijw2j... w,,j), which is made up of the jth bitof every word in the storage array. The interrogationand response registers are denoted by I = IJ2 ... I m

and R = R 1R 2 ... Rn, respectively. The comparandword (search argument) and the mask register wordare denoted by C = (c 2 ... cm) and A = (A 1A2 ...Am), respectively.

1. Equivalence SearchIn this type of search, the memory is partitionedaccording to the magnitude of the search word C intotwo sets, namely, words that are equal to C and wordsthat are not. The equality and masked search opera-tions can be implemented by a bitwise match.3 For anequality match, all the bits of the search word need tobe matched, whereas for the masked search, only asubset of the bits of the search word is compared withthe respective bits of the memory words. A = 0means that c; is not masked, while Aj = 1 means thatcj is masked. These two search modes can be com-bined as shown in Table 1. The comparison operationcan be accomplished by using either the exclusive-OR(for bit mismatch) or the exclusive-AND (for bit match)logic functions.3 In the proposed system, we considerthe exclusive-OR function since it can be efficientlyand easily implemented in optics.

Given a search word C (including masking informa-tion), we see that the exclusive-OR function for a bitmismatch denoted by bi on the jth cell of the ithmemory word is

bij = (Ij A wj) V (Ij A wij), (1)

where the symbols A, V, and the bar ( ) denote thelogical AND, the logical OR, and the logical NOT,

respectively. Now the inequality of memory word Wiwith interrogation vector I requires at least one bitmismatch between the two words, and therefore

j=mMi = V bij = b V bim-i V ... V bil,

j=1(2)

where the big V denotes a logical OR over all bits andMi indicates a word mismatch. Alternatively, theequality of words Wi and I is also computed by

Ri = Mi =(IAi) V (IA wil) A, ) V(ImA wim).

(3)

Equation (3) indicates that matching words in mem-ory will be flagged by having their corresponding R bitset to one, and all mismatches will have their R bitsset to zero. Indicator bit Ri can be computed by usingNOR logic on all the mismatch bits bij forj = 1, . . . , m.Moreover, Eq. (3) is space invariant and can beimplemented in bit-parallel and word-parallel fashionbecause of the inherent parallelism of optics. Conse-quently, all Ri's for i = 1, . . , n can be computed atthe same time with a single access to the storagearray. Implementation details are described in Sec-tion V.


Equivalence Search Algorithm(1) Initialize by using the following steps:

(a) Load I (this will depend on the search wordand the masking condition).

(b) Clear R (clear all bits in the R register).(2) _Compute the match condition with R =

Vj-l bij for i = 1, . . ., n. (Ri = 1 if and only if Wmatches I.)

2. Threshold SearchThis mode of search partitions the memory accordingto the magnitude of the interrogation vector I intothree sets, namely, words equal to, less than, andgreater than I. We introduce here an algorithm thatprovides this result by simultaneously using threeregisters of the response unit, namely, R, G and L.We should note that there are other parallel algo-rithms that can accomplish the same effect.6 Initially,all memory words are made active by setting registersRGL = 100 (for all bits). The memory is scanned fromthe most significant to the least significant bit posi-tion by enabling a single bit slice at a time. When thebit I is one, we select all active memory words withwi = 0 as "less than" by setting their correspondingbit position RGL = 001. These words are thendisabled from further comparisons. Similarly, whenI = 0, we select all active memory words with wi = 1as "greater than" by setting their corresponding bitposition RGL = 010, and then we disable them fromfurther processing. At the end of the last bit position,words still in the state RGL = 100 are equal to thecomparand, words in the state RGL = 010 are greaterthan the comparand, and words in the state RGL =001 are less than the comparand. It is important tonote that even though we are scanning the memoryfrom the most significant bit to the least significantbit, the search process can be terminated any timethere are no matching words at a given bit position(R = 0 for all i = 1, . . , n). Such a condition is easilydetectable by checking the MD bit.

Threshold Search Algorithm(1) Initialize by using the following steps:

(a) Load I (depending on the search word andmasking condition).

(b) Enable memory words.(c) Set R, clear G clear L, and set j = 1 (the

variable j is used by the control unit to scan thestorage array).

(2) Perform the magnitude search at bit-slice j(compute R, G, and Li). R = V b, G = ij A w,Li = I A WJ for i = 1, . . ., n (note that only theenabled bit slicej determines the values of R, G, andLi; all other bit slices are disabled at this time, andtherefore have no influence on R, G, and Li).

(3) Test whether MD = 1 (are there any wordsthat match the I register at the current bit positionj?), and then use the following procedures:

(a) If MD = 1, proceed to step 4.(b) If MD = 0, proceed to step 6.

(4) Disable memory words whose corresponding

bits in R are zero. Memory words whose correspond-ing bit Ri = 0 have already been decided on. Wordswith Li = 1 are less than I, and words with Gi = 1 aregreater than I. These words can be outputted at thistime if needed.

(5) Incrementj withj '--j + 1, test ifj = m, anduse the following procedures:

(a) Ifj • m, proceed to step 2.(b) Ifj = m, proceed to step 6.

(6) The search is done and the result is reportedin R, G, and L.

Table II illustrates the threshold search algorithmfor a magnitude search of 7 words, each 5 bits long.

3. Extremum SearchThis type of search refers to finding the maximum (orminimum) of a set of (or all) memory words. Weconsider first the search for the maximum.

a. Maximum Search. To find the maximum wecan scan the memory words from the most to the leastsignificant bit positions. As we scan the bit slices, wedetermine if any of the enabled words have a one inthe current bit position. If we find some, we disable allthose words that do not have a one in this position. Ifnone of the words at the current position possesses aone, we keep scanning. At any given time, all remain-ing candidate words are equal as far as we haveexamined them, because for every bit position eitherevery word has a zero in that bit position, or whensome words have ones, we disable the ones with zeros.Therefore at bit positionj, enabled words with wi = 1are larger than enabled words with wi = 0. Since weare seeking the maximum, we disable the ones withwi = 0. This process is repeated until we exhaust allbit positions, at which time the maximum word willbe indicated by R = 1.

Finding the Maximum Algorithm(1) Initialize by using the following steps:

(a) Load I with I 11 ... 11 (I is loaded withall bits set to one).

(b) Set R, clear MD, and setj = 1.(c) Enable memory words.

Table II. Example of Threshold Search Algorithms

State of RGL at theMemory End of Each IterationWord,

1 Wi 1 2 3 4 5 (Last Iteration)

1 10111 100 100 100 100 010 (W > S)2 11000 100 010 010 010 010 (W2 > S)3 10010 100 100 001 001 001 (W < S)4 10110 100 100 100 100 100 (W4 = S)5 10101 100 100 100 001 001 (Ws < S)6 01101 001 001 001 001 001 (W6 < S)7 11101 100 010 010 010 010 (W7 > S)

aSearch word S, 10110; mask word, 00000; I register, 10110(effective word search).


(2) Perform an equivalence search at bit slice j(compute Ri for all i = 1, . . , n).

(3) Test whether MD = 1 (are there any wordswith a one in the current bit positionj?), then use thefollowing procedures:

(a) If MD = 1, proceed to step 4.(b) If MD = 0, proceed to step 6.

(4) Disable all words that do not have a one in thecurrent bit position (these words are indicated byRi = 0).

(5) Clear R and MD.(6) Incrementj withj -j + 1, test ifj = m, and

choose one of the following:(a) Ifj • m, go to step 2.(b) If j = m, the output maximum value is

indicated by Ri = 1.

Table III illustrates the maximum search algo-rithm.

b. Minimum Search. The search for the mini-mum is very similar to the search for the maximumexcept that the I register is initially loaded with zeros,and if any enabled word has a zero in the current bitposition (there exists a memory word Wi such that itscorresponding Ri = 1), we disable the words with aone in the current bit position (Ri = 0). These wordshave to be greater than the minimum sought. Theprocess is repeated until we exhaust all bits of theenabled words. The minimum value will also beindicated by a one in register R.

B. Parallel Algorithms for Compound Search Operationson the Optical Content-Addressable Parallel Processor

Compound search operations such as the ones statedin Subsection IV.A. cannot be economically imple-mented by a single sweep over the memory words. Wetherefore choose to implement such operations as aseries of basic searches. The rationale is to keep thearchitecture as simple as possible, and thereforemake it highly amenable to optical implementation.Of course, speed improvements can be gained byimplementing these search operations as a basicsearch, but the number of logic circuits may beextensive.

Table Ill. Example of Maximum Search Algorithm

State of Register R at theEnd of Each Iteration

Memory Word, i 1 2 3 4 5 (Last Iteration)

11000 1 1 0 0 0

11100 1 1 1 0 0

10001 1 0 0 0 0

11110 1 1 1 1 1 (Maximum)11001 1 1 0 0 0

1. Double-Limit Search (Between and OutsideLimits)Given two numbers called HIGH and LOW, we see thatthe double-limit search consists of finding thosewords that are between these limits and-or wordsthat are outside these limits. This gives rise to eightdifferent searches that can be accomplished in a verysimilar manner. Let us consider the between-limitsearch. Given the two numbers HIGH and LOW, wewish to find those words that are greater than LOW

but less than HIGH, namely, all Wi such that LOW <

Wi < HIGH. We can accomplish this search by usingthe threshold search described previously, as follows.First, we determine the words that are less than thecomparand HIGH. These words will be indicated by aone in the L register. We then disable all other wordsexcept the ones that are less than HIGH, and performanother threshold search by using the comparandLOW. After the second search, words that are less thanHIGH and greater than LOW will be marked with a onein the G register, which could be routed to the outputunit for outputting the search result. The algorithmfollows.

Between-Limits Search Algorithm.(1) Initialize by using the following steps:

(a) Load I with comparand HIGH.

(b) Enable all memory words (initially all mem-ory words participate in the search).

(c) Set R, clear G, clear L, and set j = 1.(2) Perform the threshold search.(3) Disable memory words with Li = 0. Words

with Li = 0 are either greater than or equal tocomparand HIGH and therefore need to be disabled atthis time.

(4) Load I with comparand LOW.

(5) Perform the threshold search (at the end ofthis step, all words that are greater than LOW and lessthan HIGH will be marked with a one in the Gregister).

(6) Route the G register to the output unit (bitGi = 1 of register G indicates that the memory wordWi satisfies LOW < Wi < HIGH).

2. Adjacency SearchTo find the word that is next above the comparand(the smallest word larger than the comparand), wesearch for all words that are larger than the com-parand and then select the minimum. Similarly, tofind the word that is next below the comparand (thelargest word smaller than the comparand), we searchfor all words less than the comparand and select themaximum.

Next-Above Search Algorithm(1) Initialize by using the following steps:

(a) Load I with a search word.(b) Enable memory words.(c) Set R, clear G, clear L, and setj = 1.

(2) Perform the threshold search (get all wordsgreater than the comparand).

(3) Disable those words that are less than the


comparand (these words will be indicated by a zero inthe G register).

(4) Perform a minimum search on all enabledwords. The word next above the comparand is indi-cated by a one in the R register.

The search for the largest word smaller than thecomparand (next-below search) can be carried out byan algorithm similar to the one above. In this case,step 2 of the next-above algorithm is replaced by asearch for words that are less than the comparand,and step 4 is replaced by a maximum search.

3. Ordered Retrieval (Sorting)Sorting a set of data using an OCAPP is straightfor-ward. This can be achieved by performing the extre-mum search repeatedly until all the data are re-trieved. For the ascending order retrieval we enablethe memory words to be sorted and determine theminimum (by using the minimum search operation).We output the obtained minimum value and disable itfrom the storage array. We repeat these steps until weretrieve (in ascending order) all the enabled words.For descending order retrieval we select the maxi-mum value at each step.

Ascending-Descending Order Retrieval Algorithm.(1) Enable the set of words to be ordered.(2) Perform a minimum-maximum search (deter-

mine the smallest or the largest word for descendingorder retrieval).

(3) Route R to the output unit (output the mini-mum-maximum word).

(4) Disable the selected minimum-maximumword.

(5) Repeat steps 2-4 until all enabled words areexhausted.

V. Optical ImplementationIn Section V we identify the fundamental and basicoperations required to implement the optical architec-ture and describe possible optical components forachieving them.

A. Basic Operations and Hardware Components RequiredAn analysis of the conceptual OCAPP and of thealgorithms that have been described in Sections IIIand IV, respectively, reveals that the following isrequired:

(1) Data bits are in optical form and must beavailable in dual-rail format (both the value and itscomplement are required).

(2) Parallel access is required for writing intoand reading from the storage array and the variouscontrol registers such as the interrogation registerand the response register. This also includes selectivewriting into and selective reading from the memory ofa single word or several words.

(3) Disablement of a memory word (or severalmemory words) whose corresponding bit in the re-

sponse register (or the G register or the L register) isnot asserted.

(4) Disablement of a memory word whose corre-sponding bit in the priority register P is asserted.

(5) Test the match detector bit MD.(6) Optical array logic functions.(7) Space-invariant optical interconnects.(8) Optical broadcasting and funneling.(9) Optical feedback connections.

(10) Dynamic routing of information (e.g., rout-ing contents of register R to a selection unit, anoutput unit, or a response unit, depending on thealgorithm).

The optical components required to accomplish theabove operations can be divided into logic elements,storage elements, and information transfer elements(or interconnects).

For optical logic and storage, many approaches arebeing investigated. One approach is the adaptation ofthe spatial light modulator (SLM) technology tooptical logic.25-27 Another approach for realizing opti-cal components capable of performing logic is tooptimize the device from the beginning for digitaloperations. The recent emergence of the quantum-well self-electro-optic effect device (SEED) and itsderivative, the symmetric or S-SEED, are such prod-ucts. 28,29 The SEED devices can be used to realizelogic operations such as NOR, OR, AND, and NAND, andthey can be used for storage such as set-reset (S-R)latches.30 Optical resonators are another family un-der this approach that are intended for opticallogic. 31 32 Recently, a promising new device called thesurface-emitting laser logic (CELL) device has beenintroduced both as an optical logic and as a latchingdevice.33 These devices are claimed to be ideal forapplications that require high-optical gain, cascadabil-ity, and insensitivity to external optical feedback.

All data movements and information transfer in anOCAPP are space invariant, which may render theirimplementation easier. Classical optical componentssuch as lenses, mirrors, beam splitters, holographicdeflectors, and delay elements are most likely to beused for this purpose.34 In addition, half-wave plates,shutters, and masks may be used for dynamic rout-ing.35-38

B. Modular Implementation of an OpticalContent-Addressable Parallel ProcessorIn this paper we present a modest design example ofan OCAPP, and we use existing optical hardware inorder to highlight the potential implementation is-sues of a practicable realization. The implementationof this first version will make use both of the SEEDdevice for optical logic operating as a NOR gate, and ofthe S-SEED device operating as a S-R latch forstorage.30 The NOR gate is preferable to any otherform of thresholding because it requires only discern-ment between the state in which no light comes inand the state in which light comes in. Thus a NOR gate


requires a signal-to-noise ratio better than one only.The family of SEED devices seems to be easy to use,and the devices are capable of high speed, low-energyoperation, and realization in a two-dimensional for-mat. 39 Space-invariant optical interconnects, dy-namic masking components, and beam spreading andcombining components are assumed for data routing.

A schematic diagram of the S-SEED device operat-ing as an S-R latch is shown in Fig. 5(a). The state ofthe device is set by a pair of unequal signal beamslabeled S (used for setting the output Q = 1, Q = 0)and R (used for resetting the output Q = 0, Q = 1).The device is set (Q = 1) when the power incident onthe S input is much higher than the power incident onthe R input. The state of the device is read by applyingtwo equal-power (clock-signal) beams to both inputs.During the setting of the device, the clock beamsmust be low in comparison to the signal beams. Thedevice holds its state when no clock signal is incident.Thus the device can operate as a latch. Moreover,during the application of the clock signal (the readingprocess), the state of the device is unaltered as shownin Fig. 5(b).

As described earlier, the optical processor can beconstructed from several units: the selection unit,the match-compare unit, the response unit, theoutput unit, and the control unit. In SubsectionsV.C-V.G we describe the optical implementation (ar-chitectural rather than experimental setup) of each ofthese units. Moreover, the details in the routing andimaging paths, such as lenses, holographic elements,masks, beam splitters, and polarizers, as well aspower supplies, presetting, and beam generation for

Va

S r MOWMODULATOR

CLOCK

MOW

R MODULATOR

ELECTRICAL POWEROPTICAL SIGNAL (a)

CLOCK

S(SET)

R(RESET)

0

0

the SEED, have been omitted from the diagrams toassist the reader's conceptual understanding of theseconfigurations. The details of the actual use of SEEDdevices as optical logic elements can be found in Refs.39-41. For diagrammatic simplifications, the SEEDand S-SEED devices used in the following figures areassumed to be transparent rather then reflective.

C. Optical Selection UnitThe optical selection unit of Fig. 6 is composed of astorage array, which consists of a two-dimensionaln x (m + 1)-bit array of clocked S-SEED devices(each entry in the array at position i, j has twoincoming bits S, R and two outgoing bits wij, Wij); aclocked n x 1-word register A, which serves to writedata words into the storage array; and a clocked 1 xm-word register B for writing a single bit slice into thestorage array. The first column of the storage array isreserved for an n X 1-bit enable register ER, whosebit ERi = 1 if and only if memory word Wi is enabled(more on enabling and disabling memory words willfollow). The storage array can be directly loaded withinput data in two-dimensional optical form from anexternal optical memory storage such as a page-oriented holographic memory,20 or from a regularelectronic memory if electrically addressed SLM's(E-SLM's) are used at the input side of the processor.

1. Writing a Word-Bit Slice Into the Storage ArrayThe storage array is assumed to be loaded in parallelat the beginning of the program. During programexecution, the contents of the storage array can bealtered by the use of the A and the B registers. Thelatter can be implemented by using E-SLM's such asthe magneto-optic modulators (SIGHTMOD42) or theCCD-based liquid-crystal valves.

To write a word in the storage array, say at wordposition i, we first write the word in E-SLM B. In thenext clock cycle, the contents of the B register arespread out vertically by using vertical spreadingoptics such that each bit By impinges on the set portsof the jth column of the storage array. Next, bit Ai ofE-SLM A (corresponding to word position i) is pulsedhigh and spread out horizontally such that it im-pinges on the set and reset ports of the ith row of thestorage array. A one bit is written in bit position wij ofthe storage array if and only if a high Ai and a high Bjcoincide at the set port of bit wij. Similarly, a zero bitis written in bit position wij if and only if a high Ai anda high B1 coincide at the reset port of wij. This ofcourse assumes that the set-reset thresholds of theS-SEED devices are so designed. Similar operationstake place for writing a bit slice in the jth column ofthe storage array, with the exception of interchangingthe roles of the B and A registers.

* TIME

(b)

Fig. 5. S-SEED device operating as a clocked S-R latch: (a)schematic diagram, (b) timing diagram. MQW, multiple quantumwell.

2. Enabling-Disabling Memory WordsBy enabling a memory word Wi is meant including itin the matching process. Similarly, by disabling it ismeant the excluding of it from further matching


I .: t

__FLL _-iiFLI

: i ! iFLF1__. . . .LFLF_

Input from R,GL registersof match/compare unit 2-D optical Input data

S-SEED device acting E-SLM Br-. as a clocked S-R latch Selective bit-sliceni W'i loading register

Optical Inverter gate

To opticalmatch/compare

-il unit

_ _ To opticaloutputunit

m+1) I Beam splitter

Halfwave plate(electrically controlled)

Fig. 6. Optical implementation of the selection unit.

operations. In order to have this capability, the firstcolumn of the storage array is designated as anenabling-disabling register ER. Bits of the ER regis-ter can be simultaneously set and reset, respectively,by using two external signals S-ER for setting all bitsof ER, and by using R-ER for resetting all bits of ER(bits Bo and Bo can be used for this purpose). Alterna-tively, selected bits of ER can be set and reset underprogram control by using an n x 1 SR register (shownin the Fig. 6), which can be loaded from the R, G, or Lregisters of the match-compare unit. To enable (dis-able) the entire memory words, the S-ER (R-ER) bit isset and spread out vertically to all the set (reset) portsof the ER register. To selectively disable memorywords whose R, G, or L bits are not asserted (R = 0,G = 0, or L = 0) requires the routing of the appropri-ate register (R, G, or L) to SR, which inverts its lightintensity and routes it to fall upon the reset ports ofER. Thus word W of the storage array is disabledfrom further processing if light emanating from bitposition SR = 1. For example, to disable memorywords whose R bits are not asserted (R = 0) fromfurther matching operations, we first route the con-tents of R to SR, which complements these data andimages the complemented data upon the reset portsof SR. With a low bit Ri = 0, the ith output ERi of theER register is also set low, which in turn disablesmemory word W from participating in further com-parisons. The use of the ER register in match opera-tions is explained next.

The output of the selection unit can be routed tothe match-compare unit for further processing, or tothe output unit for outputting the result. Suchrouting can be achieved by using polarization beamsplitters and electro-optical components that are capa-

ble of exchanging the state of polarization underelectrical control, as, for example, the Pockel's cells.34

D. Optical Match-Compare UnitThis is the most critical unit in an OCAPP since itperforms the match and magnitude comparisonsearches between the data stored in interrogationregister I and words of the storage array. Theseoperations should be implemented as parallel and asfast as possible. As shown in Fig. 7, the unit containsSEED arrays for comparison logic and three regis-ters, namely, the response register R, the greater-than register G, and the less-than register L. Parallelcomparison takes place between memory words ema-nating from the storage array and the interrogationregister I, as explained below in Subsection V.D. 1.

1. Optical Implementation of Parallel Match andComparison OperationsA match at bit wij is detected by an exclusive-ORprinciple as indicated by Eq. (1), and a word match iscomputed according to Eq. (3), which is rewrittenhere by using only the NOR function, as

(4)

and the relative magnitude comparison between theinterrogation word and stored data is given by

Gi = j A wij (word Wi > I),

Li = Ij A awj (word Wi < I).

(5)

(6)

Note that the magnitude comparison is performed bitserially; therefore bits Li and G require a single bit


.. I == -Ri = I, v =W- --ii v T, v wil, . V Wi. V I. V wi.,

1-D opticalNOR gates

I

2-D optical Ndata from optical I registerselection unit

Ri= V ER V ,V ER V I V i1 V match 'detector bit

Gi= i I iV Wu

Fig. 7. Optical implementation of match-compare unit.

comparison. In the best case (when there are nomatching words) it takes a single bit comparison (themost-significant bit position) to determine the valuesof Li and Gi. In the worst case, Eqs. (5) and (6) arecomputed m times.

In order to have the capability of selectively dis-abling memory words from further matching, weneed to incorporate the contents of the enablingregister ER in the match condition described above.One way of including ER in Eq. (4) would be

Ri = MV ERj foralli = 1. n. (7)

Whenever ERi is zero, Ri is zero, and therefore wordWi is disabled. The optical implementation of Eq. (7)would require that each bit ERi of register ER be splitinto two light beams and that both be directed to asingle NOR gate of the two-input NOR-gate array of thematch-compare unit. While this is feasible, it would

the storage array are simply imaged upon the two-dimensional NOR-gate array.

In order to keep the interconnections space invari-ant and to be able to selectively disable memory wordsfrom matching the interrogation register, we formu-late the match condition as follows:

Ri = M V (Io A ERi V Io A ERj) for all i = 1, . n. (8)

The optical implementation of the above equationdoes not require any special interconnection patternsbetween the selection unit and the match-compareunit because the ER register is treated as any othercolumn of the storage array. Therefore any imagingsystem would route the data, including ER, to thematch-compare unit. However, the interrogation reg-ister needs to be (m + 1)-bits long. The extra bit Io isset to one during a match-compare operation. Eq. (8)can also be expressed in terms of the NOR functiononly as follows:

Ri =IoVERVIoVERiVIVWiWVIiVwil V I.VWImVim.

require space-variant interconnections between theoptical selection unit and the optical match-compareunit since the rest of the data (e.g., memory words) in

It can easily be verified that when ERi = 0, the aboveequation yields Ri = 0, and when ERi = 1 bit, Ridepends on the magnitudes of the search word I and


(9)

|

the memory word Wi. Optical hardware to implementthe above equation is shown in Fig. 7. In order toimplement parallel comparison, the bits of register Ineed to be spread out vertically so that each bit I ()impinges on one port of the NOR gates of the jthcolumn of the two-dimensional NOR-gate array, whiledata bits wi-j (wv,) for i = 1, . . ., n impinge on thesecond ports of the same NOR gates of the two-dimensional NOR-gate array. The output of the two-dimensional optical NOR-gate array is replicated intothree copies. Cylindrical lens CL1 positioned in thepath of the first copy collects the output of an entirerow i into a single position i of a second one-dimensional NOR-gate array that represents registerR. The second copy is passed through a fixed mask M1whose purpose is to block part of the information thatis contained in the replicated copy, namely, the termIj V wij, and let pass the term Ij V w for all i =1, . . , n andj = 0, . ., m. The unmasled output iscollected by cylindrical lens CL 2 into a single one-dimensional optical NOR-array, which constitutes reg-ister G. Similarly, the third copy is passed through afixed mask M3, which blocks the terms I V w andlets through the terms I V wiJ. The unmasked dataare collected into a single one-dimensional opticalNOR-gate array by cylindrical lens CL 3 to form the Lregister.

A single level of light intensity on the input side ofbit position Ri will indicate that the word Wi of thestorage array and search word I differ in at least onebit. Inversely, if zero light impinges on position i ofthe one-dimensional NOR-gate array implementingthe R register, this will indicate that the two wordsare equal. Similarly, a single level of light intensity onthe input side of registers G and L will set theappropriate bit. Although it appears that bits Li andGi of registers L and G in Fig. 7 depend on all the bitsin row i (after appropriate masking) of the output of

the two-dimensional NOR-gate array, in actuality thatis not the case. Recall that registers L and G are usedonly in bit-serial algorithms in which a single bit sliceof the storage array is enabled at a time. Thereforeonly a single bit in row i of the storage arraydetermines the values of bits Li or Gi (see Section IIIfor details).

Finally, all the bits of register R are logically OR'edinto a single photodetector cell to form the matchdetector MD bit. The MD flip-flop is a quick indicationof whether or not there are any matches between thecontents of register I and memory words. The outputof the match-compare unit (e.g., the contents of theR, G, and L registers) can be routed to the responseunit for selecting the first matching word, or to theoutput unit for outputting the relevant data, or fedback to the selection unit for disabling irrelevantwords in bit-serial processing. Such routing is accom-plished by beam splitters, imaging optics, space-invariant optical interconnects, and control signalsemanating from the control unit.

2. Two-Dimensional Optical MatchingThe optical match-compare unit of Fig. 7 consists of asingle interrogation register I, and therefore allowscomparison of one search argument with the wordsstored in the storage array. However, because of themultidimensionality of optical systems, this unit canbe extended to perform multiple search operations ina single step. That is, several search arguments arecompared simultaneously with the words of the stor-age array. An extended multiple-interrogation matchdetector match-compare unit would have a k x mtwo-dimensional array of k search arguments, atwo-dimensional storage array of n words each m bitslong, and an m x k two-dimensional response arrayas shown in Fig. 8. Each response register R,(1 =1,..., k) would indicate the match between

Ik i AM storage array n x k response arrayk x n Interrogation array

Fig. 8. Optical implementation of a multiple match-compare unit. The interrogation and the response registers of Fig. 7 are replaced bytwo-dimensional arrays of search arguments and response registers, respectively. Register R, indicates the match or mismatch of memorywords with interrogation register I (for i = 1, . . ., ).


interrogation register I ( = 1, . . , k) and the wordsof the storage array. The two-dimensional matchoperation can be thought of as an optical binarymatrix-matrix multiplication that can be imple-mented by using several optical techniques.4 3-45

E. Optical Output UnitThe output unit outputs memory words whose corre-sponding bits in the R, G, L, or P registers areasserted. This unit should have the capability tooutput a single memory word or several words at atime. For a single word output, the output word isobtained by a multiplexing operation,

j = P1 A W11 V P2 A 2j V V Pn A forj=1. m.

(10)

The above equation can be expressed in terms of theNOR function as follows:

N

Data fromregister Pof responseunit

Oj= Pi v Wy v P+i V Wj+ V --- V P V W..

(11)

Similarly,

O = 1 V W1 j V 2 V W2j V .V n V W . (12)

Figure 9(a) depicts the optical implementation of Eqs.(11) and (12). The priority register is inverted with ann x 1 SEED inverter array, denoted by N. Eachoutput bit of register N is expanded in the horizontaldirection and imaged upon one port of a two-inputtwo-dimensional optical NOR-gate array. The contentsof the storage array are imaged upon the second portof the two-dimensional NOR-gate array. The output ofthe two-dimensional NOR-gate array is vertically col-lected by cylindrical lens CL1 to fall upon the one-dimensional optical NOR-gate array, which representsthe desired selected output word. Parallel readout ofseveral words can also be accomplished as shown inFig. 9(b). The optical two-input AND-gate array re-flects the superposition of a data page from thestorage array and a data page formed by the horizon-tally expanded register T, which can be loaded fromregisters R, G, or L. The superposed image is directedby beam splitter BS2 to a two-dimensional detectorarray for the parallel readout of desired words. Itshould be noted that Fig. 9(b) can also be used tooutput a single memory word if row-addressabletwo-dimensional photodetectors are used.

F. Optical Response Unit

The response unit contains a combinational prioritycircuit, and it contains a priority register P forindicating the first matching word in memory. Thepriority circuit allows only the first responder (thefirst memory word Wi whose Ri is one) to pass to thepriority register P.

The priority circuit can be implemented by usingseveral stages of one-dimensional NOR-gate arrays inthe form of a binary tree with space-invariant inter-

2-D optical data fromr/ l Ioptical selection unit array

2-ND-gate array\

Beam aplille/

CL2

Fig. 9. Optical implementation of the output uword output, (b) parallel readout of multiple words.

RorGorLregister

nit: (a) single

connections between them.4 6 The size of the NOR-gatearrays is equal to the number of words n stored in thestorage array, while the number of stages is propor-tional to log 2(n). A good technique for implementingsuch a unit would be the logic-interconnect architec-ture introduced by Murdocca et al.3 6 and Murdocca.47

The contents of the resulting P register are routed tothe output unit for outputting a single word and alsoare fed back to the selection unit for enabling anddisabling purposes.

G. Control UnitThe OCAPP is under the control of a memory controlunit, which comprises a local memory for storing


-VP5j=Ti-V---ijVr!Z`V--'j'� - " V W j""

0, =P1V�1jVF2V01jV ... V n V ZV71 -

programs and a program sequencer for executinginstructions that control the optical hardware such asthe S-R latches, the NOR-gate arrays, the routingshutters, and the splitters. The instruction set iscomposed of conventional assignment and condi-tional statements, and it is composed of additionalinstructions required to implement associative paral-lel processing. This includes data movement betweenunits, comparison operations, memory loading andunloading, and monitoring of the MD bit. Theseadditional instructions are few and are derived fromthe required fundamental operations described above.It should be noted that application programs forOCAPP can be written in conventional high-levellanguages such as Pascal or C, with few calls toexternal procedures that support parallel associativeprocessing.

VI. Performance AnalysisAn exact performance analysis of the proposedOCAPP, including speed, cost, and power-budgetbreakdown, is currently not feasible. We therefore tryto estimate theoretically the execution time of thevarious algorithms presented. For the current analy-sis, we will not estimate the power required for thesystem. The following are definitions of terms andassumptions used in the time complexity analysis:

(1) The storage array consists of n(m + 1)-bit-long words that require an S-SEED array of n x2(m + 1) pixels (recall that the first column is re-served for ER and that dual-rail coding is used).

(2) The response time of each algorithm is thesum of the following three terms: setup time Tsetup,execution time Tex, and transfer time Ttramsf. Thesetup time includes the time taken to load the datainto the S-SEED array and the interrogation wordinto the E-SLM, the time taken to clear controlregisters, and the time taken to initialize the system'svarious components, such as shutters and electro-optical components for routing purposes. The execu-tion time is the time taken to compute the desiredresult, and the transfer time is the time taken totransfer the computed result to the front-end or hostcomputer that drives the OCAPP. In this paper weonly estimate execution time.

(3) We assume that the response times of theoptical S-R latches (S-SEED's) and that of the variouslogic arrays such as the NOR-gate arrays, the AND-gatearrays, and the optical inverters are comparable andare all equal to Tresp.

(4) Tp is light propagation time through a 4fimaging system.

(5) T denotes the time required to load theinterrogation register I. Register I can be loaded inparallel in a single step since it is assumed to be aone-dimensional electrically addressed modulator.

(6) It is also assumed that reading memory wordsfrom the storage array and reading the I register isdone at the same time and takes Tresp. The enabling

and disabling of memory words are achieved in Tresptime, and testing the MD bit takes Treap.

A. Optical Content-Addressable Parallel Processor CycleTime and Estimated Execution Time for the Algorithms

1. OCAPP Cycle TimeThe cycle time of a computing machine is defined tobe the time required for the shortest well-definedprocessor micro-operation. 9 For the OCAPP, we de-fine the cycle time Tproc to be the time required for theequivalence search operation (optical comparison)since the latter is at the core of all the algorithmsintended for the OCAPP. This time includes (1)reading out stored data and propagating them to thematch-compare unit, (2) using the NOR function onthe stored data with the contents of the interrogationregister, and (3) propagating the output of the NOR-gate array and producing the result in the R, G, and Lregisters, as shown by

1 2 3

Tproc =Tresp + Tp + Tresp +T + Trep 3Tresp+ 2Tp.

(13)

The numbers over the braces indicate the timesneeded to accomplish each subtask as enumeratedabove. The dominant factor in Eq. (13) is the responsetime of the SEED arrays used for storage and logic.Although these devices have been demonstrated toswitch with speeds in the picosecond range (for asingle device), the required optical switching powerprohibits the use of larger arrays at high speeds.Currently, a single S-SEED gate requires approxi-mately 2.5 pJ per switching event. Therefore if wewere to use a 256 x 256 array of these gates (thisarray size has recently been reported), running at 100MHz would require laser power in the kilowatt range.This is excessively more power than is available frommost visible lasers. Moreover, these devices are sensi-tive to small temperature changes, which make themunreliable at these speeds. However, intensive re-search efforts are being pursued to lower the energypower requirements for larger array sizes of thesedevices and for solving the problems of focusability,operating wavelength, and stability at higher switch-ing speeds.

A promising alternative to the SEED family ofdevices is the cascadable optical logic devices calledsurfaCe-Emitting Laser Logic (CELL) devices.33 Thesedevices are expected to have a much more tolerantoperating range than SEED's. The CELL's are ex-pected to have a broadband input and will operate inthe gigahertz range. Of course, the proposed systemwill benefit from advances in other optically ad-dressed SLM's as well.

2. Execution Time for Threshold Search AlgorithmThe time it takes for the completion of one iterationof the threshold algorithm includes (1) reading out


stored data and propagating it to the match-compareunit, (2) using the NOR function on the stored datawith the contents of the interrogation register, (3)propagating the output of the NOR-gate array andproducing the result in the R, G, and L registers, (4)testing the MD bit, and (5) propagating the contentsof R to the selection unit and appropriately settingthe ER register. This process is either repeated asmany times as the word length m or terminated anytime the MD bit yields a zero light detection. Thebest-case execution time is then

1 2 3 4

Ttresh = Tresp + Tp + Tresp + Tp + Trop + Tresp, (14)

and the worst-case execution time is

1 2 3 4 5

Ttresh 2 m(Tresp + Tp + Tresp + Tp + Tresp + Tresp + Tp + 2

Treap)-

(15)

The numbers over the braces indicate the time neededto accomplish each subtask as enumerated above.Note that the loading time for register I after eachiteration is omitted from the maximum executiontime expression since this can be overlapped withsome other activities.

3. Execution Time for Extremum Search AlgorithmThe algorithm for the extremum search is quitesimilar to that of the threshold search algorithm.Therefore its execution time is equal to that of theworst-case execution time for the threshold search,

Textrem = m(6Tresp + 3Tp). (16)

4. Execution Time for Double-Limit SearchAlgorithmAs described earlier, the double-limit search algo-rithm is implemented by using the threshold searchtwice. The major processing steps are (1) searchingfor words less than the upper limit, (2) disabling thewords that are greater than the upper limit (wordswith corresponding Li = 0), (3) loading the lowerlimit in the I register, and (4) searching for words thatare greater than the lower limit. Thus the best-caseexecution time is

1 2 3 4

5. Execution Time for Adjacency Search AlgorithmThe adjacency search comprises three major consecu-tive steps; (1) a threshold search, (2) a disablingoperation, and (3) an extremum search (either asearch for a minimum value in the case of thenext-above search algorithm, or a search for a maxi-mum value for the next-below search). Therefore itsbest-case execution time is

1 2 3

Tadjc = 4

Treap + 2Tp + Tp + 2

Trep + m(6Tresp + Up),

(19)

and its worst-case execution time is

Tadjac = m(6Tresp + 3Tp) + Tp + 2 Treop + m(6Tresp + 3Tp). (20)

6. Execution Time for Ordered Retrieval AlgorithmThe execution time for ordered retrieval depends onthe number of memory words to be retrieved. Themajor processing step is a repeated search for anextremum (a search for a minimum for ascendingorder and a search for a maximum for descendingorder).

The best-case execution time would be obtainedwhen there are no multiple matches (e.g., only asingle bit set to one in the R register at the end of eachextremum search operation). In such a situation, thecontents of the R register are directly routed to theoutput unit where they are used to select a singleword from the storage array. In case of multiplematches and depending on the application, we mayneed to select only a single word at a time. In thiscase, the output of R is routed to the response unit forselecting the first match, which will be indicated inregister P. Then register P will be routed to theoutput unit for word retrieval. Thus the best-caseexecution time is

TsOrt = n[m(6Tresp + 3Tp) + 2Tp + 2 Tresp], (21)

and the worst-case time is

1 2

T.Ort = n [m(6Tresp + 3Tp) + Tp

3 4 5

+ log2 n(Tp + Treop) + T + 3 (Tresp + T)]. (22)

,-. I .

Tdouble = 4Tresp + 2Tp + T + 2

Tresp + Ti + 4Tre.p + 2Tp, (17)

and the worst-case execution time is

1 2 3 4

Tdouble = m(6Tresp + 3Tp) + Tp + 2

Tresp + Ti + m(6Tresp + 3Tp).

(18)

The number over the braces in the worst-case execu-tion-time expression correspond to the following sub-tasks: (1) searching for an extremum, (2) routingregister R to the response unit, (3) selecting the firstresponder in the priority register P, (4) routing P tothe output unit, and (5) outputting the selected word.Note that the time it takes to route R to the selectionunit and to disable the selected word in the ER


Table IV. Estimated Execution Time of the Parallel Algorithms on an Optical Content-Addressable Parallel Processor

Search Algorithm Minimum Execution Timea Maximum Execution Timea

Equivalence search 3 Trep + 2Tp 3Tresp + 2TpThreshold search 4 Tresp + 2Tp m(6Trep + 3Tp)Extremum search m(6Tresp + 3Tp) m(6Treap + 3Tp)Double-Limit search lOTresp + 5Tp + Ti 2

m(6Trep + 3T,) + T + T, + 2

TrepAdjacency search (m + 1) (STreap + 3Tp) 2 m(Tresp + 3Tp) + Tp + TresOrdered retrieval n[m(6Tresp + 3Tp) + 2Tp + 2Tresp] n[(6m + 3)Tresp + (3m + 5)Tp + log 2 n(Tp + Tresp)]

aThe parameters m and n represent the word length and the number of operands, respectively.

register is overlapped with that of routing R to theoutput unit and outputting the selected word.

Table 4 summarizes the estimated execution timefor the algorithms presented. It is important to notethat the execution time for the equivalence search,the threshold search (best-case), and the double-limitsearch (best-case) is a constant factor and is indepen-dent of the number of words in memory. The time forthe threshold search (worst-case), the double-limitsearch (worst-case), the adjacency search, and theextremum search is proportional to the word lengthand is independent of the number of words involvedin the operation. The delay time in ordered retrieval(sorting) is proportional to the product of the numberof words to be sorted with the word length (best case).This advantage translates into a speedup factor of m(number of bits per word) more than that for elec-tronic CAM's for the fundamental search algorithms.It is expected that such a speedup factor combinedwith the high speed at which these algorithms can beoptically executed will result in a system throughputfar better than any electronic associative machine canachieve.

Vll. DiscussionsAssociative processing based on content-addressablememories has been argued to be the natural solutionfor nonnumerical information processing applica-tions. Unfortunately, the implementation require-ments of these architectures when one uses conven-tional electronic technology have been cost prohibitive;therefore associative processors have not been real-ized. Instead, software methods that emulate thebehavior of associative processing have been pro-moted and mapped onto conventional location-addressable systems. However, this does not effectthe natural parallelism of associative processing,namely, the ability to access many data words simul-taneously. Optics has the advantage over electronicsof directly supporting associative processing by itsproviding economic and efficient interconnects, mas-sive parallelism, and high-speed processing.

This paper has presented the principles and initialdesign concepts of an associative architecture thatmatches well with optics advantages and is thereforehighly amenable to optical implementation. The archi-tecture relies heavily on the use of space-invariant

interconnections, optical signal broadcasting and fun-neling (combining), and the simultaneous applicationof the same operation to many data points (single-instruction multiple data mode of computing). Themotivation behind this architecture is to take advan-tage of the ease with which these operations can berealized with optics. A representative set of searchalgorithms have been presented to show the use andmerits of the architecture. These algorithms are keycomponents that occur in large computing tasks. It isimportant to note that these fundamental searchalgorithms are implemented on the optical architec-ture with an execution time independent of theproblem size (the number of words to be processed).This indicates that the architecture would be bestsuited to applications in which the number of datasets to be operated on is high. We are currentlyconducting a comprehensive study to determine thetype and range of applications for which the opticalarchitecture is most suitable. Some of the applica-tions being investigated are (1) real-time informationretrieval, (2) database management, (3) knowledge-base and expert system implementation, (4) list andstring processing, (5) bulk (numerical) processing(e.g., image processing), (6) pattern and speech recog-nition, and (7) implementation of data-driven architec-tures. At the architecture level, we are currentlyextending the one-dimensional matching concept (asingle search word is compared with the two-dimensional array stored words) to a two-dimen-sional scheme by which several search words aresimultaneously compared with the two-dimensionalarray. Such an extension will have a major impact ondatabase and knowledge-base processing.

We have derived a list of requirements in order tooptically implement the architecture, and we havepresented a preliminary and simple version of animplementation that meets these requirements. Thisinitial implementation version is meant to show onlythe feasibility of the architecture with existing opticalnonlinear devices and conventional components. Nooptimization attempts were made. Nevertheless, thispreliminary version reveals several key design issuesthat will determine the physical realization of such anoptical architecture. Even if we assume the availabil-ity of optical nonlinear devices (latches and NOR gates)


in large sizes, the effective memory size will becritically determined by the beam spreading-combinling optics, the contrast ratio, and the fan-inand fan-out factors of the logic elements to be used.

This research was supported by National ScienceFoundation grant MIP-8909216. The author thanksthe anonymous referees for their valuable sugges-tions.

References1. K. Hwang and D. Degroot, Parallel Processing for Supercom-

puters and Artificial Intelligence (McGraw-Hill, New York,1988).

2. R. Halstead, "Parallel symbolic computing," Computer 19(8),35-43 (1986).

3. T. Kohonen, Content-Addressable Memories (Springer-Verlag,New York, 1980).

4. C. J. Date, An Introduction to Database Systems (Addison-Wesley, Reading, Mass., 1986).

5. R. M. Lea, "VLSI and WSI associative string processors forcost-effective parallel processing," Computer J. 29, 486-494(1986).

6. C. C. Foster, Content Addressable Parallel Processors (Rein-hold, New York, 1976).

7. R. M. Lea, "Information processing with an associative paral-lel processor," Computer 8(11), 25-32 (1975).

8. C. Y. Lee and M. C. Paul, "A content addressable distributedlogic memory with applications to information retrieval,"Proc. IEEE 51, 924-932 (1964).

9. K. Hwang and F. Briggs, Computer Architectures and ParallelProcessing (McGraw-Hill, New York, 1984).

10. G. S. Almasi and A. Gottlieb, Highly Parallel Computing(Addison-Wesley, Reading, Mass., 1989).

11. A. A. Sawchuk and T. C. Stand, "Digital optical computing,"Proc. IEEE 72, 758-779 (1984).

12. A. Huang, "Architectural considerations involved in the de-sign of an optical digital computer," Proc. IEEE 72, 780-787(1984).

13. W. T. Cathey, K. Wagner, and W. J. Miceli, "Digital computingwith optics," Proc. IEEE 77, 1558-1572 (1989).

14. A. Louri, "3-D optical architecture and data-parallel algo-rithms for massively parallel computing," IEEE Micro 11(2),24-68 (1991).

15. K. Hwang and A. Louri, "Optical multiplication and divisionusing modified signed-digit symbolic substitution," Opt. Eng.28, 364-373 (1989).

16. B. K. Jenkins, P. Chavel, R. Forchheimer, A. A. Sawchuk, andT. C. Strand, "Architectural implications of a digital opticalprocessor," Appl. Opt. 23, 3465-3474 (1984).

17. Y. Li, D. H. Kim, A. Kostrzewski, and G. Eichmann, "Content-addressable memory-based optical modified signed-digit arith-metic," Opt. lett. 14, 1254-1256 (1989).

18. F. Kiamilev, S. C. Esner, R. Paturi, Y. Fainman, P. Mercier,C. C. Guest, and S. H. Lee, "Programmable optoelectronicmultiprocessors and their comparison with symbolic substitu-tion for digital optical computing," Opt. Eng. 28, 396-409(1989).

19. P. B. Berra, A. Ghafoor, M. Guizani, S. J. Marcinkowski, andP. A. Mitkas, "Optics and supercomputing," Proc. IEEE 77,1797-1815 (1989).

20. P. B. Berra, K. H. Brenner, W. T. Cathey, H. J. Caulfield, S. H.Lee, and H. Szu, "Optical database/knowledgebase machines,"Appl. Opt. 29, 195-205 (1990).

21. A. D. McAulay, Optical Computer Architectures: The Applica-

tion of Optical Concepts to Next Generation Computers (Wiley,New York, 1991).

22. K. Wagner and D. Psaltis, "Multilayer optical learningnetworks," Appl. Opt. 26, 5067-5076 (1987).

23. H. J. Caulfield, J. Kinser, and S. K. Rogers, "Optical neuralnetworks," Proc. IEEE 77, 1573-1583 (1989).

24. A. Louri and K. Hwang, "A bit-plane architecture for opticalcomputing with 2-d symbolic substitution algorithms, in Pro-ceedings of the 15th International Symposium on ComputerArchitecture (Institute of Electrical and Electronics Engineers,New York, 1988).

25. C. Warde and A. Fisher, "Spatial light modulators: applica-tions and functional capabilities," in Optical Signal Process-ing, J. Horner, ed. (Academic, New York, 1987), pp. 478-524.

26. J. A. Neff, R. A. Athale, and S. H. Lee, "Two-dimensionalspatial light modulators: a tutorial," Proc. IEEE 78, 836-855 (1990).

27. N. Streibl, K. H. Brenner, A. Huang, J. Jahns, J. Jewell, A. W.Lohmann, D. Miller, M. Murdocca, M. E. Prise, and T. Sizer,"Digital optics," Proc. IEEE 77, 1954-1970 (1989).

28. D. A. B. Miller, D. S. Chemla, D. J. Eilenberger, P. W. Smith,A. C. Gossard, and W. T. Tsang, "Large room-temperatureoptical nonlinearity in GaAs/GalixAlAs multiple quantumwell structures," Appl. Phys. Lett. 44, 821-823 (1982).

29. D. A. B. Miller, D. S. Chemla, T. C. Damen, A. C. Gossard, W.Wiegmann, T. H. Wood, and C. A. Burrus, "The quantum wellself-electro-optic effect device: optoelectronic bistability andoscillation, and self-linearized modulation," IEEE J. QuantumElectron. QE-21, 1462-1476 (1985).

30. A. L. Lentine, H. S. Hinton, D. A. B. Miller, J. E. Henry, J. E.Cunningham, and L. M. F. Chirovsky, "Symmetric self-electrooptic effect device: optical set-reset latch, differentiallogic gate, and differential modulator/detector," IEEE J.Quantum Electron. 25, 1928-1936 (1989).

31. S. D. Smith, J. G. H. Mathew, M. R. Taghizadeth, A. C. Walker,B. S. Wherret, and A. Hendry, "Room temperature, visiblewavelength optical bistability in ZnSe interference filters,"Opt. Commun. 51, 357-362 (1984).

32. J. L. Jewell, M. C. Rushford, and H. M. Gibbs, "Use of a singlenonlinear Fabry-Perot 6talon as optical logic gate," Appl.Phys. Lett. 44, 172-174 (1984).

33. G. R. Olbright, R. P. Bryan, K. Lear, T. M. Brennan, G. Poirier,Y. H. Lee, and J. L. Jewell, "Cascadable laser logic devices:Discrete integration of photoresistors will surface-emittinglaser diodes," Electron. Lett. 27, 216-218 (1991).

34. A. W. Lohmann, "What classical optics can do for the digitaloptical computer," Appl. Opt. 25, 1543-1549 (1986).

35. A. A. Sawchuk and B. K. Jenkins, "Dynamic optical intercon-nections for optical processors," in Optical Computing, J. Neff,ed., Proc. Soc. Photo-Opt. Instrum. Eng. 625, 145-153 (1986).

36. M. J. Murdocca, A. Huang, J. Jahns, and N. Streibl, "Opticaldesign of programmable logic arrays," Appl. Opt. 27, 1651-1660 (1988).

37. A. Hartmann and S. Redfield, "Design sketches for opticalcrossbar switches intended for large-scale parallel processingapplications," Opt. Eng. 28, 315-328 (1989).

38. J. Taboury, J. M. Wang, P. Chavel, F. Devos, and P. Garda,"Optical cellular logic architecture 1: Principles," Appl.Opt. 27, 1643-1650 (1988).

39. M. Prise, N. C. Craft, R. E. LaMarche, M. M. Downs, S. J.Walker, L. Asaro, and L. M. F. Chirovsky, "Module for opticallogic circuits using symmetric self-electrooptic effect devices,"Appl. Opt. 29, 2164-2170 (1990).

40. M. Prose, N. C. Craft, M. M. Downs, R. E. LaMarche, L. A.


Asaro, L. Chirovsky, and M. Murdocca, "Optical digital proces-sor using arrays of symmetric self-electrooptic effect devices,"Appl. Opt. 30, 2287-2296 (1991).

41. A. L. Lentine, D. A. Miller, J. E. Henry, J. E. Cunningham,L. M. Chirovsky, and L. A. Asaro, "Optical logic using electri-cally connected quantum well PIN diode modulators anddetectors," Appl. Opt. 29, 2153-2163 (1990).

42. B. Hill, "The current status of two-dimensional spatial lightmodulators," in Optical Computing Digital and Symbolic,R. Arrathoon, ed. (Dekker, New York, 1989), pp. 1-40.

43. R. A. Athale, "Optical matrix processors," in Optical and

Hybrid Computing, H. H. Szu, ed., Proc. Soc. Photo-Opt.Instrum. Eng. 634, 96-111 (1986).

44. A. A. Sawchuk, C. S. Raghavandra, B. K. Jenkins, and A.Varma, "Optical cross-bar networks," IEEE Computer 20(6),50-62 (1987).

45. G. Gheen, "Optical matrix-matrix multiplier," Appl. Opt. 29,886-887 (1990).

46. C. C. Foster, "Determination of priority in associativememories," IEEE Trans. Comput. C-17, 788-789 (1968).

47. M. J. Murdocca, A Digital Design Methodology for OpticalComputing (MIT, Cambridge, Mass., 1990).


optical content-addressable parallel processor ...hpcat/papers/ao_06_1992.pdf · optical...

Documents