
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 7, 64-95 (1989)

Static and Run-Time Characteristics of OPS5 Production Systems

ANOOP GUPTA *

Department of Computer Science, Stanford University, Stanford, California 94305

AND

CHARLES L. FORGY

Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213

Received June 23, 1987

This paper presents measurements made on several large OPS5 production systems (rule-based systems). The complete set of measurements is divided into three parts. The first part consists of measurements on the textual structure of the production-system programs. The second part consists of measurements on the compiled form of the productions, and the third part consists of run-time measurements on the production-system programs. The measurements are essential to the design of high-performance interpreters for production systems. The measurements are also designed to help explore the role of parallelism in the execution of production-system programs. © 1989 Academic Press, Inc.

1. INTRODUCTION

Production systems (or rule-based systems) are widely used in Artificial Intelligence for modeling intelligent behavior and building expert systems [3, 14, 15, 18]. Production-system programs, however, are computation intensive and run quite slowly. As a result, a number of researchers are currently exploring new algorithms and parallel architectures for high-speed execution of production systems [9, 12, 21, 22, 26, 28].

* This research was done while the author was at Carnegie-Mellon University. This research was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4864, monitored by the Air Force Avionics Laboratory under Contract N00039-85-C-0134. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.

0743-7315/89 $3.00 Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

Before proceeding with the design of a complex algorithm or an architecture, it is necessary to identify the characteristics of the programs or applications for which the algorithm or the architecture is to be optimized. For example, computer architects exploit data about usage of instructions, depth of procedure call invocations, and frequency of successful branch instructions from real programs to optimize the design of new machine architectures [11, 23, 25]. We believe the same is necessary for the design of architectures for production-system programs. Although several large production-system programs are now in existence, statistics characterizing these programs are not widely available to researchers. One reason is that most of the large production-system programs have been developed at a few major industrial laboratories and universities (CMU, Stanford, DEC, AT&T) and thus are not widely available. Another reason is that, until recently, research on efficient implementations of production systems was going on only at the same few sites, and thus a strong need was not felt for publishing such data. With the widening interest in production systems and their high-speed implementation, it is important that such data be widely available, and this paper intends to serve that purpose.

This paper presents data on the static and dynamic characteristics of production systems implemented using the OPS5 language [2]. By static characteristics we refer to those features of production systems that can be measured without executing the production system. The static measurements include measurements on the text form of the production-system programs and measurements on the static data structures constructed by the OPS5 interpreter to execute the programs. By dynamic measurements we refer to the run-time statistics gathered on the data structures and the operations performed by the OPS5 interpreter.

The six production-system programs for which we present data in this paper are given below. They represent some of the large production-system programs that have been developed at Carnegie-Mellon University (often in collaboration with their industrial sponsors). Many of these systems are now in actual use at the industrial sites that sponsored them. The systems are listed below in order of decreasing number of productions, and this order is maintained in all the graphs shown later.¹

1. VT [17] (Vertical Transport) is an expert system that selects components for a traction elevator system. It is written in OPS5 and consists of 1322 rules.

¹ Since many of the systems listed below are under active development, the number of rules listed may be smaller than the numbers in the more recent versions.


2. ILOG² [20] is an expert system that maintains inventories and production schedules for factories. It is written in OPS5 and consists of 1181 rules.

3. MUD [13] is an expert system that is used to analyze fluids used in oil drilling operations. It is written in OPS5 and consists of 872 rules.

4. DAA [14] (Design Automation Assistant) is an expert system that designs computers from high-level specifications of the systems. It is written in OPS5 and consists of 445 rules.

5. R1-SOAR [27] is an expert system that configures the UNIBUS for Digital Equipment Corporation's VAX-11 computer systems. It is written in Soar and consists of 319 rules.

6. EP-SOAR is an expert system that solves the Eight Puzzle. It is written in Soar and consists of 62 rules.

The above production-system programs represent a variety of applications and programming styles. For example, VT is a knowledge-intensive expert system which has been especially designed with knowledge acquisition in mind. It consists of only a small number of rule types and is significantly different from the earlier systems [18, 19] developed at CMU.³ ILOG is a run-of-the-mill knowledge-intensive expert system. The MUD system, in contrast to the other five systems, is a backward-chaining production system [1] and is primarily goal driven. The DAA program represents a computation-intensive task compared to the knowledge-intensive tasks performed by the VT, ILOG, and MUD systems. Both R1-SOAR and EP-SOAR represent programming styles in Soar [15]. R1-SOAR also represents an attempt at doing knowledge-intensive programming in a general weak-method problem-solving architecture. It can make use of the available knowledge to achieve high performance, but whenever knowledge is lacking, it has mechanisms so that the program can resort to more basic and knowledge-lean problem-solving methods.

2. BACKGROUND

To understand the results of the measurements, it is necessary to comprehend the computational model underlying production systems and the algorithms that are used to implement the required computation. In the following subsections we first describe the OPS5 and Soar production-system languages, and then describe the Rete algorithm that is used to implement them.

² Referred to as PTRANS in the cited paper.

³ Personal communication from John McDermott.

2.1. OPS5 and Soar

An OPS5 [2, 4] production system is composed of a set of if-then rules called productions that make up the production memory, and a database of assertions called the working memory. The assertions in the working memory are called working-memory elements. Each production consists of a conjunction of condition elements corresponding to the if part of the rule (also called the left-hand side of the production), and a set of actions corresponding to the then part of the rule (also called the right-hand side of the production). The actions associated with a production can add, remove, or modify working-memory elements or perform input-output. Figure 2.1 shows an OPS5 production named p1, which has three condition elements in its left-hand side and one action in its right-hand side.

The production-system interpreter is the underlying mechanism that determines the set of satisfied productions and controls the execution of the production-system program. The interpreter executes a production-system program by performing the following recognize-act cycle:

• Match. In this first phase, the left-hand sides of all productions are matched against the contents of working memory. As a result a conflict set is obtained, which consists of instantiations of all satisfied productions. An instantiation of a production is an ordered list of working-memory elements that satisfies the left-hand side of the production. At any given time, the conflict set may contain zero, one, or more instantiations of a given production.

• Conflict Resolution. In this second phase, one of the production instantiations in the conflict set is chosen for execution. If no productions are satisfied, the interpreter halts.

• Act. In this third phase, the actions of the production selected in the conflict-resolution phase are executed. These actions may change the contents of working memory. At the end of this phase, the first phase is executed again.

    (p p1 (C1 ^attr1 <x> ^attr2 12)
          (C2 ^attr1 15 ^attr2 <x>)
        - (C3 ^attr1 <x>)
      -->
          (remove 2))

FIG. 2.1. A sample production.

The recognize-act cycle forms the basic control structure in production-system programs. During the match phase the knowledge of the program (represented by the production rules) is tested for relevance against the existing problem state (represented by the working memory). During the conflict-resolution phase the most relevant piece of knowledge is selected from all knowledge that is applicable (the conflict set) to the existing problem state. During the act phase, the relevant piece of knowledge is applied to the existing problem state, resulting in a new problem state.
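The three phases of the recognize-act cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OPS5 interpreter: the names `Production`, `recognize_act`, and `first_instantiation` are hypothetical, each production supplies its own naive `match` function that rescans all of working memory, and conflict resolution simply takes the first instantiation (the actual OPS5 strategy is not described in this section).

```python
# Hypothetical sketch of the recognize-act cycle; none of these names come
# from an actual OPS5 implementation.

class Production:
    def __init__(self, name, match, act):
        self.name = name
        self.match = match  # working memory -> list of instantiations
        self.act = act      # (working memory, instantiation) -> None


def first_instantiation(conflict_set):
    # Placeholder conflict-resolution strategy (an assumption): take the first.
    return conflict_set[0]


def recognize_act(productions, wm, resolve=first_instantiation, max_cycles=100):
    """Repeat match, conflict resolution, and act until nothing is satisfied."""
    fired = []
    for _ in range(max_cycles):
        # Match: collect instantiations of all satisfied productions.
        conflict_set = [(p, inst) for p in productions for inst in p.match(wm)]
        if not conflict_set:
            break  # no production is satisfied, so the interpreter halts
        # Conflict resolution: choose one instantiation.
        production, inst = resolve(conflict_set)
        # Act: execute the chosen production's actions on working memory.
        production.act(wm, inst)
        fired.append(production.name)
    return fired


# Toy program: count a ("counter", n) element down to zero.
def match_positive_counter(wm):
    return [(w,) for w in wm if w[0] == "counter" and w[1] > 0]


def decrement(wm, inst):
    (w,) = inst
    wm.remove(w)                      # a modify, done as remove then make
    wm.append(("counter", w[1] - 1))


wm = [("counter", 3)]
fired = recognize_act(
    [Production("count-down", match_positive_counter, decrement)], wm)
print(fired, wm)
```

The sketch rematches every production against all of working memory on each cycle, which is exactly the cost that a state-saving match algorithm is designed to avoid.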

A working-memory element is a parenthesized list consisting of a constant symbol called the class or type of the element and zero or more attribute-value pairs. The attributes are symbols that are preceded by the operator ^. The values are symbolic or numeric constants. For example, the following working-memory element has class C1, the value 12 for attribute attr1 and the value 15 for attribute attr2.

    (C1 ^attr1 12 ^attr2 15).

The condition elements in the left-hand side of a production are parenthesized lists similar to the working-memory elements. They may optionally be preceded by the symbol -. Such condition elements are called negated condition elements. For example, the production in Fig. 2.1 contains three condition elements, with the third one being negated. Condition elements are interpreted as partial descriptions of working-memory elements. When a condition element describes a working-memory element, the working-memory element is said to match the condition element. A production is said to be satisfied when:

• For every nonnegated condition element in the left-hand side of the production, there exists a working-memory element that matches it.

• For every negated condition element in the left-hand side of the production, there does not exist a working-memory element that matches it.

Like a working-memory element, a condition element contains a class name and a sequence of attribute-value pairs. However, the condition element is less restricted than the working-memory element; while the working-memory element can contain only constant symbols and numbers, the condition element can contain variables, predicate symbols, and a variety of other operators as well as constants. A variable is an identifier that begins with the character "<" and ends with ">". For example, the following condition element contains one constant value (the value of attr1), one variable value (the value of attr2), and one constant value that is modified by the predicate symbol <> (the value of attr3).

    (C1 ^attr1 nil ^attr2 <x> ^attr3 <> nil).

A working-memory element matches a condition element if the class fields of the two match and if the value of every attribute in the condition element matches the value of the corresponding attribute in the working-memory element. The rules for determining whether a working-memory element value matches a condition element value are:

• If the condition element value is a constant, it matches only an identical constant.

• If the condition element value is a variable, it will match any value. However, if a variable occurs more than once in a left-hand side, all occurrences of the variable must match identical values.

• If the condition element value is preceded by a predicate symbol, the working-memory element value must be related to the condition element value in the indicated way.

Thus the working-memory element

    (C1 ^attr1 12 ^attr2 15)

will match the two condition elements

    (C1 ^attr1 12 ^attr2 <x>)

    (C1 ^attr2 > 0)

but it will not match the condition element

    (C1 ^attr1 <x> ^attr2 <x>).
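The three matching rules and the example above can be checked with a small Python sketch. The representation is an assumption made for illustration: a working-memory element is a `(class, {attribute: value})` pair, and the hypothetical helper types `Var` and `Pred` stand for variables and predicate-modified constants. Only the predicates `<>`, `<`, and `>` are modeled, and an absent attribute is read as Python's `None`.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:          # hypothetical stand-in for an OPS5 variable such as <x>
    name: str


@dataclass(frozen=True)
class Pred:         # hypothetical stand-in for "predicate constant", e.g. > 0
    op: str
    operand: object


# Illustrative subset of predicates; not a complete OPS5 list.
_OPS = {"<>": operator.ne, "<": operator.lt, ">": operator.gt}


def match_ce(ce, wme, bindings=None):
    """Return the extended variable bindings if wme matches ce, else None."""
    ce_class, ce_attrs = ce
    wme_class, wme_attrs = wme
    if ce_class != wme_class:
        return None
    env = dict(bindings or {})
    for attr, test in ce_attrs.items():
        value = wme_attrs.get(attr)
        if isinstance(test, Var):
            # A variable matches anything, but all occurrences must agree.
            if test.name in env and env[test.name] != value:
                return None
            env[test.name] = value
        elif isinstance(test, Pred):
            try:
                related = _OPS[test.op](value, test.operand)
            except TypeError:       # e.g. ordering a missing value
                related = False
            if not related:
                return None
        elif test != value:         # a constant matches only itself
            return None
    return env


wme = ("C1", {"attr1": 12, "attr2": 15})
first = match_ce(("C1", {"attr1": 12, "attr2": Var("x")}), wme)
second = match_ce(("C1", {"attr2": Pred(">", 0)}), wme)
third = match_ce(("C1", {"attr1": Var("x"), "attr2": Var("x")}), wme)
print(first, second, third)
```

Passing the bindings returned for one condition element into the call for the next is one simple way to enforce consistency of a variable across an entire left-hand side.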

The right-hand side of a production consists of an unconditional sequence of actions which can cause input-output and which are responsible for changes to the working memory. Three kinds of actions are provided to effect working-memory changes. Make creates a new working-memory element and adds it to working memory. Modify changes one or more values of an existing working-memory element. Remove deletes an element from the working memory.

Soar [15] is a new production-system formalism developed at Carnegie-Mellon University to perform research in problem solving, expert systems, and learning. Currently, Soar is built on top of OPS5: the operators, the domain knowledge, and the goal-state recognition mechanism are all built as OPS5 productions. As a result most of the implementation issues, including the exploitation of parallelism, are similar in OPS5 and Soar. The main difference, however, is that Soar does not follow the Match-Conflict-Resolution-Act cycle of OPS5 exactly. The computation cycle in Soar is divided into two phases: a monotonic elaboration phase and a decision phase. During the elaboration phase, all directly available knowledge relevant to the current problem state is brought to bear. On each cycle of the elaboration phase, all instantiations of satisfied productions fire concurrently. This phase continues until quiescence, that is, until there are no more satisfied productions. During the decision phase a fixed procedure is run that translates the information obtained during the elaboration phase into a specific decision, for example, the operator to be applied next. With respect to parallelism, the relevant differences from OPS5 are: (1) there is no conflict-resolution phase; and (2) multiple productions can fire in parallel.
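The elaboration phase can be pictured with the following Python sketch. It is a simplification under stated assumptions rather than Soar's implementation: each production is a hypothetical function from a snapshot of working memory to the set of elements it would add, "monotonic" is modeled as add-only, and the decision phase is not modeled at all.

```python
def elaborate(productions, wm):
    """Fire all satisfied productions together, cycle by cycle, to quiescence.

    Returns the final working memory and the number of elaboration cycles in
    which at least one new element was added.
    """
    wm = set(wm)
    cycles = 0
    while True:
        snapshot = frozenset(wm)        # every production sees the same state
        additions = set()
        for fire in productions:        # conceptually concurrent firings
            additions |= fire(snapshot)
        new_elements = additions - wm
        if not new_elements:            # quiescence: nothing new to add
            return wm, cycles
        wm |= new_elements              # monotonic: elements are only added
        cycles += 1


# Three toy productions over symbolic facts (hypothetical).
def a_implies_b(wm):
    return {"b"} if "a" in wm else set()


def a_implies_d(wm):
    return {"d"} if "a" in wm else set()


def b_implies_c(wm):
    return {"c"} if "b" in wm else set()


final_wm, cycles = elaborate([a_implies_b, a_implies_d, b_implies_c], {"a"})
print(sorted(final_wm), cycles)
```

In the first cycle two productions fire together without any conflict resolution between them, which is the property the text identifies as relevant to parallelism.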

2.2. The Rete Match Algorithm

The most time-consuming step in the execution of production systems is the match step. It constitutes around 90% of the interpretation time. The most common algorithm used by uniprocessor implementations of OPS5 and Soar is called Rete [5]. This section describes the Rete algorithm in some detail, as many of the static and run-time measurements presented later in the paper are based on Rete.

Rete is a highly efficient algorithm for match that exploits (1) the fact that only a small fraction of working memory changes each cycle, by storing results of match from previous cycles and using them in subsequent cycles; and (2) the similarity between condition elements of productions, by performing common tests only once. The Rete algorithm uses a special kind of data-flow network compiled from the left-hand sides of productions to perform match. The network is generated at compile time, before the production system is actually run. To generate the network for a production, the compiler begins with the individual condition elements in the left-hand side. For each condition element it chains together test nodes that check:

• If the attributes in the condition element that have a constant as their value are satisfied.

• If the attributes in the condition element that are related to a constant by a predicate are satisfied.

• If two occurrences of the same variable within the condition element are consistently bound.

Each node in the chain performs one such test. The three kinds of tests above are called intracondition tests, because they correspond to individual condition elements. Once the algorithm has finished with the individual condition elements, it adds nodes that check for consistency of variable bindings across the multiple condition elements in the left-hand side. These tests are called intercondition tests, because they refer to multiple condition elements. Finally the algorithm adds a special terminal node to represent the production corresponding to this part of the network.
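As a rough illustration of how the intracondition test chain for one condition element might be derived, consider the Python sketch below. The tuple encoding of tests and the function name `intracondition_tests` are hypothetical, and treating the class as a constant test on an implicit attribute is an assumption of the sketch.

```python
def intracondition_tests(ce_class, attr_tests):
    """Build the chain of intracondition tests for one condition element.

    attr_tests is a list of (attribute, test) pairs where test is a constant,
    ("pred", op, constant), or ("var", name). The encoding is hypothetical.
    """
    chain = [("const", "class", ce_class)]  # class treated as a constant test
    first_seen = {}                         # variable name -> first attribute
    for attr, test in attr_tests:
        if isinstance(test, tuple) and test[0] == "var":
            name = test[1]
            if name in first_seen:
                # Second occurrence in the same element: consistency test.
                chain.append(("same-value", first_seen[name], attr))
            else:
                first_seen[name] = attr     # single occurrence: no test here
        elif isinstance(test, tuple) and test[0] == "pred":
            chain.append(("pred", attr, test[1], test[2]))
        else:
            chain.append(("const", attr, test))
    return chain


# (C1 ^attr1 <x> ^attr2 12), the first condition element of p1
ce1 = intracondition_tests("C1", [("attr1", ("var", "x")), ("attr2", 12)])
# (C1 ^attr1 <x> ^attr2 <x>)
ce2 = intracondition_tests(
    "C1", [("attr1", ("var", "x")), ("attr2", ("var", "x"))])
# (C1 ^attr2 > 0)
ce3 = intracondition_tests("C1", [("attr2", ("pred", ">", 0))])
print(ce1, ce2, ce3, sep="\n")
```

A variable that occurs only once within a condition element produces no intracondition test; its consistency with other condition elements is left to the intercondition tests.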

Figure 2.2 shows such a network for productions p1 and p2, which appear in the top part of the figure. In this figure, lines have been drawn between nodes to indicate the paths along which information flows. Information flows from the top node down along these paths. The nodes with a single predecessor (near the top of the figure) are the ones that are concerned with individual condition elements. The nodes with two predecessors are the ones that check for consistency of variable bindings between condition elements. The terminal nodes are at the bottom of the figure. Note that when two left-hand sides require identical nodes, the algorithm shares part of the network rather than building duplicate nodes.

    (p p1 (C1 ^attr1 <x> ^attr2 12)
          (C2 ^attr1 15 ^attr2 <x>)
        - (C3 ^attr1 <x>)
      -->
          (remove 2))

    (p p2 (C2 ^attr1 15 ^attr2 <y>)
          (C4 ^attr1 <y>)
      -->
          (modify 1 ^attr1 12))

FIG. 2.2. The Rete network.

To avoid performing the same tests repeatedly, the Rete algorithm stores the result of the match with working memory as state within the nodes. This way, only changes made to the working memory by the most recent production firing have to be processed every cycle. Thus, the input to the Rete network consists of the changes to the working memory. These changes filter through the network, updating the state stored within the network. The output of the network consists of a specification of changes to the conflict set.

The objects that are passed between nodes are called tokens, which consist of a tag and an ordered list of working-memory elements. The tag can be either a +, indicating that something has been added to the working memory, or a -, indicating that something has been removed from it. (No special tag for working-memory element modification is needed because a modify is treated as a delete followed by an add.) The list of working-memory elements associated with a token corresponds to a sequence of those elements that the system is trying to match or has already matched against a subsequence of condition elements in the left-hand side.
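The token structure and the treatment of modify can be illustrated as follows. The `Token` type and the function `tokens_for_change` are hypothetical names chosen for this sketch; only the tag convention and the delete-then-add rule come from the description above.

```python
from typing import NamedTuple, Tuple


class Token(NamedTuple):
    tag: str      # "+" for an addition, "-" for a removal
    wmes: Tuple   # ordered list of working-memory elements


def tokens_for_change(kind, old=None, new=None):
    """Translate one working-memory action into the tokens fed to the network."""
    if kind == "make":
        return [Token("+", (new,))]
    if kind == "remove":
        return [Token("-", (old,))]
    if kind == "modify":
        # No special tag: a modify is a delete followed by an add.
        return [Token("-", (old,)), Token("+", (new,))]
    raise ValueError(f"unknown action: {kind}")


before = ("C1", ("attr1", 12), ("attr2", 15))
after = ("C1", ("attr1", 12), ("attr2", 16))
modify_tokens = tokens_for_change("modify", old=before, new=after)
print(modify_tokens)
```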

The data-flow network produced by the Rete algorithm consists of four different types of nodes.⁴ These are:

1. Constant-test nodes. These nodes are used to test if the attributes in the condition element which have a constant value are satisfied. These nodes always appear in the top part of the network. They have only one input, and as a result, they are sometimes called one-input nodes.

2. Memory nodes. These nodes store the results of the match phase from previous cycles as state within them. The state stored in a memory node consists of a list of the tokens that match a part of the left-hand side of the associated production. For example, the right-most memory node in Fig. 2.2 stores all tokens matching the second condition element of production p2.

At a more detailed level, there are two types of memory nodes: the α-mem nodes and the β-mem nodes (labeled as amem-nodes and bmem-nodes in Fig. 2.2). The α-mem nodes store tokens that match individual condition elements. Thus all memory nodes immediately below constant-test nodes are α-mem nodes. The β-mem nodes store tokens that match a sequence of condition elements in the left-hand side of a production. Thus all memory nodes immediately below two-input nodes are β-mem nodes.

3. Two-input nodes. These nodes test for joint satisfaction of condition elements in the left-hand side of a production. Both inputs of a two-input node come from memory nodes. When a token arrives on the left input of a two-input node, it is compared to each token stored in the memory node connected to the right input. All token pairs that have consistent variable bindings are sent to the successors of the two-input node. Similar action is taken when a token arrives on the right input of a two-input node.

There are also two types of two-input nodes: the and-nodes and the not-nodes. While the and-nodes are responsible for the positive condition elements and behave in the way described above, the not-nodes are responsible for the negated condition elements and behave in an opposite manner. The not-nodes generate a successor token only if there are no matching tokens in the memory node corresponding to the negated condition element.

⁴ Current implementations of the Rete algorithm contain some other node types that are not mentioned here. Nodes of these types do not perform any of the conceptually necessary operations and are present primarily to simplify implementations. For this reason, they have been omitted from discussion here.

4. Terminal nodes. There is one such node associated with each production in the program, as can be seen at the bottom of Fig. 2.2. Whenever a token flows into a terminal node, the corresponding production is either inserted into or deleted from the conflict set.
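The behavior of the two kinds of two-input nodes can be sketched in Python as follows. This is an illustration under several assumptions, not the data structures of any Rete implementation: the memory nodes are folded into the two-input node for brevity, working-memory elements are plain dictionaries, a consistency test is a hypothetical triple `(position in left token, left attribute, right attribute)`, and the right-input behavior of the not-node is inferred from the requirement that a token is passed on only while no matching element exists.

```python
class AndNode:
    """Two-input node for a positive condition element (memories folded in)."""

    def __init__(self, tests):
        self.tests = tests     # (index in left token, left attr, right attr)
        self.left_mem = []     # tokens (tuples of WMEs) matched so far
        self.right_mem = []    # WMEs matching the right condition element
        self.out = []          # (tag, token) pairs sent to successors

    def consistent(self, token, wme):
        return all(token[i][la] == wme[ra] for i, la, ra in self.tests)

    @staticmethod
    def update(memory, tag, item):
        if tag == "+":
            memory.append(item)
        else:
            memory.remove(item)

    def left_activate(self, tag, token):
        self.update(self.left_mem, tag, token)
        for wme in self.right_mem:
            if self.consistent(token, wme):
                self.out.append((tag, token + (wme,)))

    def right_activate(self, tag, wme):
        self.update(self.right_mem, tag, wme)
        for token in self.left_mem:
            if self.consistent(token, wme):
                self.out.append((tag, token + (wme,)))


class NotNode(AndNode):
    """Two-input node for a negated condition element."""

    def blocked(self, token):
        return any(self.consistent(token, wme) for wme in self.right_mem)

    def left_activate(self, tag, token):
        self.update(self.left_mem, tag, token)
        if not self.blocked(token):
            self.out.append((tag, token))

    def right_activate(self, tag, wme):
        if tag == "+":
            # Tokens that were unblocked until now must be retracted.
            for token in self.left_mem:
                if self.consistent(token, wme) and not self.blocked(token):
                    self.out.append(("-", token))
            self.right_mem.append(wme)
        else:
            self.right_mem.remove(wme)
            # Tokens that have just become unblocked are passed on again.
            for token in self.left_mem:
                if self.consistent(token, wme) and not self.blocked(token):
                    self.out.append(("+", token))


# Production p1: join C1 and C2 on <x>, then negate C3 on <x>.
c1 = {"attr1": 7, "attr2": 12}
c2 = {"attr1": 15, "attr2": 7}
c2_other = {"attr1": 15, "attr2": 8}
c3 = {"attr1": 7}

join = AndNode([(0, "attr1", "attr2")])
join.left_activate("+", (c1,))
join.right_activate("+", c2)
join.right_activate("+", c2_other)

neg = NotNode([(0, "attr1", "attr1")])
neg.left_activate("+", (c1, c2))
neg.right_activate("+", c3)
neg.right_activate("-", c3)
print(join.out, neg.out, sep="\n")
```

In the example, adding the C3 element retracts the instantiation of p1 and removing it restores the instantiation, which is the "opposite manner" of behavior attributed to not-nodes.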

The performance of Rete-based interpreters has steadily improved over the years. The Franz Lisp implementation of the Rete interpreter for OPS5 runs at around 8 wme-changes/sec (working-memory-element changes per second; about three rule firings per second) on a VAX-11/780, while a Bliss-based implementation runs at around 40 wme-changes/sec. In the above two interpreters a significant loss in speed is due to the interpretation overhead of nodes. In the OPS83 interpreter [6], this overhead has been eliminated by compiling the network directly into machine code. While it is possible to escape to the interpreter for complex operations during match or for setting up the initial conditions for the match, the majority of the match is done without an intervening interpretation level. This has led to a large speedup, and the OPS83 interpreter runs at around 200 wme-changes/sec on the VAX-11/780. On the basis of the data that we present in this paper, it should be possible to improve the performance significantly, using better algorithms and using parallelism.
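The quoted rates imply the following ratios. This is simple arithmetic on the approximate figures above; the per-firing value in particular depends on the rough "about three rule firings per second" and should be read as an order-of-magnitude indication only.

```latex
\[
\frac{40}{8} = 5, \qquad
\frac{200}{40} = 5, \qquad
\frac{200}{8} = 25,
\]
\[
\frac{8\ \text{wme-changes/sec}}{3\ \text{firings/sec}}
\approx 2.7\ \text{wme-changes per firing}.
\]
```

That is, each of the two implementation steps (Franz Lisp to Bliss, and Bliss to OPS83) corresponds to roughly a fivefold increase in speed, for about a 25-fold increase overall.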

3. MEASUREMENTS ON PRODUCTION SYSTEMS

This section describes the characteristics of the six OPS5 and Soar production systems that we mentioned in Section 1. The data about the six production systems are divided into three parts. The first part consists of measurements on the textual structure of these production systems. The second part consists of information on the compiled form of the productions, and the third part consists of run-time measurements on the production-system programs.

3.1. Surface Characteristics of Production Systems

Surface measurements refer to the textual features of production-system programs. Examples of such features are the number of condition elements in the left-hand sides of productions, the number of attributes per condition element, and the number of variables per condition element. Such features are useful in that they give information about the code and static data structures that are generated for the programs, and they also help explain some aspects of the run-time behavior of the programs. We note that some of the features for which data are presented in the following subsections may not seem very relevant to the casual reader. This is to be expected, but they are essential for the reader who actually wants to use the data to evaluate alternative algorithms and architectures for production systems. Examples of such use to evaluate both sequential and parallel architectures can be found in [8, 12, 16, 21, 24].

[Plot: percentage of productions versus Number of Conditions, one curve per program.]

FIG. 3.1. Condition elements per production.

The following subsections present the data for the measured features, including a brief description of how the measurements were made. Data about the same features of different production systems are presented together and have been normalized to permit comparison.⁵ Along with each data graph the average, the standard deviation, and the coefficient of variation⁶ for the data points are given.
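The summary statistics that accompany each graph can be computed from a histogram of the measured feature. The Python sketch below is illustrative: the function names and the sample counts are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the formula used for the published numbers is not stated.

```python
import math


def summarize(histogram):
    """Return (average, standard deviation, coefficient of variation).

    histogram maps a feature value (e.g., number of condition elements) to
    the number of productions having that value. The average must be nonzero.
    """
    n = sum(histogram.values())
    avg = sum(value * count for value, count in histogram.items()) / n
    variance = sum(count * (value - avg) ** 2
                   for value, count in histogram.items()) / n
    sd = math.sqrt(variance)        # population SD (an assumption)
    return avg, sd, sd / avg        # CV = standard deviation / average


def normalize(histogram):
    """Convert counts to percentages so that programs can be compared."""
    n = sum(histogram.values())
    return {value: 100.0 * count / n for value, count in histogram.items()}


# Hypothetical counts, not measured data.
counts = {1: 2, 2: 4, 3: 2}
avg, sd, cv = summarize(counts)
percentages = normalize(counts)
print(avg, sd, cv, percentages)
```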

3.1.1. Condition Elements per Production. Figure 3.1 shows the number of condition elements per production for the six programs. The number of condition elements per production includes both positive and negative elements. The curves for the programs are normalized by plotting percentage of productions, instead of number of productions, along the y-axis. The number of condition elements in a production reflects the specificity of the production, that is, the set of situations in which the production is applicable. The number of condition elements in a production also impacts the complexity of performing match for that production and the size of the code generated by the compiler for that production. We note that most OPS5 productions have a fairly small number (1-3) of condition elements in the left-hand side. This is in contrast to most Soar productions, which have between 4 and 10 condition elements. In fact, some Soar productions (that are automatically learned by the system) may have as many as 50-100 condition elements.

⁵ The limits of the axes of the graphs have been adjusted to show the main portion of the graph clearly. As a result, in some cases a few extreme points could not be put on the graph. For this reason, the reader should not draw conclusions about the maximum values of the parameters from the graph.

⁶ Coefficient of variation = standard deviation/average.

[Plot; x-axis: Number of Actions.]
Avg. 4.60, SD 13.66, CV 2.69 for VT
Avg. 3.16, SD 4.46, CV 1.42 for ILOG
Avg. 3.42, SD 5.77, CV 1.69 for MUD
Avg. 2.42, SD 2.19, CV 0.91 for DAA
Avg. 9.62, SD 16.66, CV 1.73 for R1-SOAR
Avg. 4.29, SD 17.17, CV 4.00 for EP-SOAR

FIG. 3.2. Actions per production.

3.1.2. Actions per Production. Figure 3.2 shows the number of actions per production. The number of actions reflects the processing required to execute the right-hand side of a production. A large number of actions per production also implies a greater potential for parallelism, because then a large number of changes to the working memory can be processed in parallel, before the next conflict-resolution phase is executed. Again, as in the case of condition elements, on average Soar productions have a larger number of actions than OPS5 productions.

[Plot; x-axis: Number of Negated Conditions.]
Avg. 0.27, SD 0.59, CV 1.85 for VT
Avg. 0.35, SD 0.69, CV 1.96 for ILOG
Avg. 0.33, SD 0.61, CV 1.86 for MUD
Avg. 0.52, SD 0.63, CV 1.60 for DAA
Avg. 0.24, SD 0.59, CV 2.49 for R1-SOAR
Avg. 0.21, SD 0.44, CV 2.12 for EP-SOAR

FIG. 3.3. Negative condition elements per production.

3.1.3. Negative Condition Elements per Production. The graph in Fig. 3.3 shows the number of negated condition elements in the left-hand side of a production versus the percentage of productions having them. It shows that approximately 27% of productions have one or more negated condition elements. Since negated condition elements denote universal quantification over the working memory and often require special processing in the match algorithm [21], the percentage of productions having them is an important characteristic of production-system programs. The measurements are also useful in calculating the number of not-nodes in the Rete network.

3.1.4. Attributes per Condition Element. Figure 3.4 shows the distribution for the number of attributes per condition element. The class of a condition element, which is an implicit attribute, is counted explicitly in the measurements. The number of attributes in a condition element reflects the number of tests that are required to detect a matching working-memory element. The striking peak at three for the R1-Soar and EP-Soar programs reflects the uniform encoding of data as triplets in Soar and indicates an opportunity for fine-tuning the data structures and algorithms.

[Plot; x-axis: Attributes per Condition Element.]
Avg. 2.56, SD 1.64, CV 0.64 for VT
Avg. 2.67, SD 1.86, CV 0.65 for ILOG
Avg. 2.34, SD 1.39, CV 0.60 for MUD
Avg. 2.65, SD 1.46, CV 0.56 for DAA
Avg. 3.10, SD 0.61, CV 0.20 for R1-SOAR
Avg. 3.12, SD 0.56, CV 0.21 for EP-SOAR

FIG. 3.4. Attributes per condition element.

3.1.5. Tests per Two-Input Node. This feature is specific to the Rete match algorithm and refers to the number of variable bindings that are checked for consistency at each two-input node (and-node or not-node). A value of zero indicates that no variables are checked for consistent binding, while a large value indicates that a large number of variables are checked. For example, if the number of tests is zero, for every token that arrives at the input of an and-node, as many tokens as there are in the opposite memory are sent to its successors. This usually implies a large amount of work. Alternatively, if the number of tests is large, then the number of tokens sent to the successors is small, but doing the pairwise comparison for consistent binding now takes more time. The graph for the number of tests per two-input node is shown in Fig. 3.5.

[Plot; x-axis: Tests per Two-Input Node.]

FIG. 3.5. Tests per two-input node.

[Plot; x-axis: Variables Bound and Referenced.]
Avg. 0.72, SD 1.15, CV 1.59 for VT
Avg. 2.21, SD 2.02, CV 0.91 for ILOG
Avg. 0.68, SD 0.96, CV 1.41 for MUD
Avg. 2.29, SD 3.13, CV 1.27 for DAA
Avg. 4.77, SD 2.75, CV 0.58 for R1-SOAR
Avg. 5.84, SD 5.41, CV 0.93 for EP-SOAR

FIG. 3.6. Variables bound and referenced.
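The trade-off described in Section 3.1.5 between the number of tests at a two-input node and the number of tokens it emits can be made concrete with a small Python sketch. The function `join_cost`, the dictionary encoding of tokens, and the sample data are hypothetical; the sketch also assumes that comparison of a token pair stops at the first failing test.

```python
def join_cost(left_token, right_memory, tests):
    """Return (comparisons performed, tokens emitted) for one left activation.

    tests is a list of (left attribute, right attribute) pairs that must hold
    equal values; an empty list means no consistency checks at this node.
    """
    comparisons = 0
    emitted = 0
    for right in right_memory:
        consistent = True
        for left_attr, right_attr in tests:
            comparisons += 1
            if left_token[left_attr] != right[right_attr]:
                consistent = False
                break               # assumed short-circuit on first failure
        if consistent:
            emitted += 1
    return comparisons, emitted


left = {"x": 1, "y": 2}
right_memory = [
    {"x": 1, "y": 2},
    {"x": 1, "y": 3},
    {"x": 2, "y": 2},
    {"x": 3, "y": 3},
]
no_tests = join_cost(left, right_memory, [])
one_test = join_cost(left, right_memory, [("x", "x")])
two_tests = join_cost(left, right_memory, [("x", "x"), ("y", "y")])
print(no_tests, one_test, two_tests)
```

With no tests, every element of the opposite memory yields a successor token at no comparison cost; with more tests, fewer tokens are emitted but more comparisons are made per arriving token.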

3.1.6. Variables Bound and Referenced. Figure 3.6 shows the number of distinct variables which are both bound and referenced in the left-hand side of a production. Consistency tests are necessary only for these variables. Beyond the α-mem nodes, all processing done by the two-input nodes requires access to the values of only these variables; values of other variables or attributes are not required. This implies that the tokens in the network may store only the values of these variables instead of storing complete copies of working-memory elements. For parallel architectures that do not have shared memory, this can lead to significant improvements in the storage requirements and in the communication costs associated with tokens.

3.1.7. Variables Bound but Not Referenced. Figure 3.7 shows the number of distinct variables which are bound but not referenced in the left-hand side of a production. (These bindings are usually used in the right-hand side of the production.) This indicates the number of variables for which no consistency checks have to be performed.
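One way to separate the two groups of variables, and to build the reduced tokens suggested in Section 3.1.6, is sketched below in Python. The sketch rests on a simplifying interpretation that is an assumption, not a definition from the measurements: a variable counts as bound and referenced if it occurs more than once in the left-hand side, and as bound but not referenced if it occurs exactly once. The function names are hypothetical.

```python
from collections import Counter


def classify_variables(lhs):
    """Split the variables of a left-hand side into two sets.

    lhs holds one list of variable names per condition element. Returns
    (bound and referenced, bound but not referenced) under the assumption
    that "referenced" means "occurs more than once in the left-hand side".
    """
    occurrences = Counter(name for ce in lhs for name in ce)
    referenced = {name for name, count in occurrences.items() if count > 1}
    unreferenced = {name for name, count in occurrences.items() if count == 1}
    return referenced, unreferenced


def project(wme, needed_attrs):
    """Keep only the attribute values that later consistency tests need."""
    return {attr: wme[attr] for attr in needed_attrs}


# Production p1 of Fig. 2.1: <x> occurs in all three condition elements.
p1_vars = classify_variables([["x"], ["x"], ["x"]])
# A hypothetical production in which <y> and <z> each occur only once.
other_vars = classify_variables([["x", "y"], ["x"], ["z"]])
# The first condition element of p1 binds <x> through attr1.
slim = project({"class": "C1", "attr1": 7, "attr2": 12}, ["attr1"])
print(p1_vars, other_vars, slim)
```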

[Plot; x-axis: Variables Bound but not Referenced.]
Avg. 1.93, SD 2.65, CV 1.37 for VT
Avg. [illegible], SD 5.30, CV 2.70 for ILOG
Avg. 0.94, SD 1.54, CV [illegible] for MUD
Avg. 2.62, SD 3.50, CV 1.24 for DAA
Avg. 1.61, SD 2.00, CV 1.24 for R1-SOAR
Avg. 1.65, SD 1.69, CV 0.96 for EP-SOAR

FIG. 3.7. Variables bound but not referenced.

3.1.8. Variable Occurrences in Left-Hand Side. Figure 3.8 shows the number of times each variable occurs in the left-hand side of a production. Both positive and negative condition elements are considered in counting the variables. Since measurements show that variables almost never occur multiple times within the same condition element (average of 1.5% over all systems), the number of occurrences of a variable represents the number of condition elements within a production in which the variable occurs. When correlated with the average number of condition elements per production, this number is indicative of the variable linkage structure for productions.

[Plot; x-axis: Number of Occurrences.]
Avg. 1.32, SD 0.59, CV 0.44 for VT
Avg. 1.85, SD 1.11, CV 0.60 for ILOG
Avg. 1.54, SD 0.74, CV 0.49 for MUD
Avg. 1.72, SD 1.10, CV 0.64 for DAA
Avg. 2.36, SD 1.25, CV 0.52 for R1-SOAR
Avg. 2.49, SD 1.25, CV 0.50 for EP-SOAR

FIG. 3.8. Occurrences of each variable.

3.1.9. Variables per Condition Element. Figure 3.9 shows the number of variable occurrences within a condition element (not necessarily distinct, though as per Section 3.1.8 they mostly are). If this number is significant compared to the number of attributes for some class of condition elements, then it usually implies that the selectivity of those condition elements is small, or in other words, a large number of working-memory elements will match those condition elements.

[Plot not reproduced. Horizontal axis: number of variables. Legend statistics as printed: VT: Avg. 1.07, SD 1.44, CV 1.34; ILOG: Avg. 1.97, SD 2.13, CV 1.09; MUD: Avg. 1.00, SD 1.24, CV 1.24; DAA: Avg. 2.26, SD 2.49, CV 1.10; R1-SOAR: Avg. 1.77, SD 0.56, CV 0.37; EP-SOAR: Avg. 1.86, SD 0.64, CV 0.34.]

FIG. 3.9. Variables per condition element.

3.1.10. Condition Element Classes. Tables 3.1 through 3.6 list the seven condition element classes occurring most frequently for each of the production-system programs. The tables also list the total number of attributes, the average number of attributes and its standard deviation, and the average number of variable occurrences in condition elements of each class. The total number of attributes for a condition element class gives an estimate of the size of the working-memory element. This information can be used to determine the communication overhead in transporting working-memory elements among multiple memories in a parallel architecture. It also has implications for space requirements for storing the working-memory elements. If we subtract the average number of variables from the average number of attributes for a condition element class, we obtain the average number of attributes which have a constant value for that class. This number, in turn, indicates the selectivity of condition elements of that class.

3.1.11. Action Types. Table 3.7 gives the distribution of actions in the right-hand side into classes make, remove, modify, and other for the production-system programs. The only actions that affect the working memory are of type make, remove, or modify. While each make and remove action causes only one change to the working memory, a modify action causes two changes to the working memory. These data then give an estimate of the percentage of right-hand side actions that change the working memory. These data can also be combined with data about the number of actions in the right-hand side of productions (given in Section 3.1.2) to determine the average number of changes made to working memory per production firing.
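The combination described above amounts to a one-line calculation. The sketch below is an illustration rather than the authors' procedure; the function name is hypothetical, and the VT inputs are the values as read from Table 3.7 (action percentages) and Table 3.8 (actions per production).

```python
def wm_changes_per_firing(actions_per_prod, make_pct, remove_pct, modify_pct):
    """Static estimate of working-memory changes per production firing.

    A make or remove action causes one change; a modify action causes two.
    """
    changes_per_action = (make_pct + remove_pct + 2 * modify_pct) / 100.0
    return actions_per_prod * changes_per_action


# VT: 4.80 actions per production; 52% make, 5% remove, 13% modify.
vt_estimate = wm_changes_per_firing(4.80, make_pct=52, remove_pct=5, modify_pct=13)
```

For VT this gives about 3.98 changes per firing. It is a static estimate that weights every production equally, so it need not agree with the run-time figure reported later in Table 3.19.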

TABLE 3.1
VT: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Context          1366 (31)        4          1.59       0.75      0.22
2. Item             756 (17)         47         2.97       0.86      1.14
3. Input            448 (10)         19         3.06       1.24      1.56
4. Needdata         239 (5)          27         2.62       1.53      1.56
5. Distance         228 (5)          12         5.18       1.40      1.67
6. Sys-measure      175 (4)          11         4.87       1.47      1.71
7. lo-stack         110 (2)          4          1.05       0.35      1.02

Note. Total number of condition element classes is 48. SD = standard deviation.

TABLE 3.2
ILOG: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Alg              1270 (27)        4          2.99       0.14      1.91
2. Task             1004 (21)        2          1.76       0.44      0.77
3. Datum            431 (9)          58         4.16       2.15      2.89
4. Period           143 (3)          13         3.81       1.18      3.41
5. Packed-with      106 (2)          32         4.73       2.23      3.84
6. Order            101 (2)          37         3.16       2.29      3.08
7. Capacity         91 (1)           41         5.66       4.57      3.90

Note. Total number of condition element classes is 86.

TABLE 3.3
MUD: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Task             678 (31)         4          2.35       0.85      0.58
2. Data             547 (25)         24         2.35       1.15      1.11
3. HYP              160 (7)          9          1.99       0.72      0.60
4. Datafor          111 (5)          20         4.14       1.93      2.55
5. Reason           74 (3)           13         3.12       1.58      1.24
6. Change           65 (3)           6          1.40       0.87      0.88
7. Do               65 (3)           21         5.25       1.60      2.83


TABLE 3.4
DAA: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Context          474 (24)         3          2.40       0.52      2.05
2. Port             241 (13)         6          2.35       0.72      2.08
3. Db-operator      197 (11)         6          1.70       0.58      0.54
4. Link             173 (9)          6          5.28       1.53      5.55
5. Module           170 (9)          6          2.68       1.12      1.66
6. Lists            134 (7)          3          1.75       0.44      2.06
7. Outnode          112 (6)          11         2.37       0.87      2.14



TABLE 3.5
R1-SOAR: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Goal-ctx-info    988 (36)         3          2.99       0.11      1.80
2. Op-info          383 (13)         3          2.95       0.23      1.54
3. State-info       375 (13)         3          2.88       0.32      1.77
4. Space-info       217 (7)          3          3.00       0.07      1.04
5. Order-info       183 (6)          3          2.99       0.10      1.67
6. Preference       157 (5)          8          5.32       0.78      3.44
7. Module-info      87 (3)           3          2.92       0.27      1.90

Note. Total number of condition element classes is 21.

TABLE 3.6
EP-SOAR: CONDITION ELEMENT CLASSES

Class name          No. of CEs (%)   Tot-attr   Avg-attr   SD-attr   Avg-vars
1. Goal-ctx-info    278 (44)         3          2.99       0.10      1.83
2. Binding-info     85 (13)          3          3.00       0.00      1.71
3. State-info       59 (9)           3          2.90       0.30      1.92
4. Eval-info        54 (8)           3          2.96       0.19      1.83
5. Op-info          41 (6)           3          2.93       0.26      1.54
6. Preference       [illegible] (5)  8          5.47       1.12      3.22
7. Space-info       30 (4)           3          3.00       0.00      1.13


TABLE 3.7
ACTION TYPE DISTRIBUTION

Action type             VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Make (%)             52        20        48        34        86        78
2. Modify (%)           13        15        17        18        0         0
3. Remove (%)           5         7         4         18        0         0
4. Others (%)           27        56        28        27        12        21

3.1.12. Summary of Surface Measurements. Table 3.8 gives a summary of the surface measurements for the production-system programs. It brings together the average values of the various features for all six programs. The features listed in the table are condition elements per production, actions per production, negated condition elements per production, attributes per condition element, variables per condition element, and tests per two-input node.


TABLE 3.8
SUMMARY OF SURFACE MEASUREMENTS

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Productions          1322      1181      872       445       319       62
2. CEs/prod             3.28      3.92      2.47      3.89      8.60      9.97
3. Actns/prod           4.80      3.16      3.42      2.42      9.62      4.29
4. nCEs/prod            0.27      0.35      0.33      0.52      0.24      0.21
5. Attr/CE              2.58      2.87      2.34      2.65      3.10      3.12
6. Vars/CE              1.07      1.97      1.00      2.26      1.77      1.86
7. Tests/2inp           0.37      1.21      0.59      1.27      1.16      1.23

3.2. Measurements on the Rete Network

This section presents results of measurements made on the Rete network constructed by the OPS5 compiler. The measured features include the number of nodes of each type in the network and the amount of sharing that is present in the network.

3.2.1. Number of Nodes in the Rete Network. Table 3.9 presents data on the number of nodes of each type in the network for the various production-system programs. These numbers reflect the complexity of the network that is constructed for the programs. Table 3.10 gives the normalized number of nodes, that is, the number of nodes per production. The normalized numbers are useful for comparing the average complexity of the productions for the various production-system programs.⁷

Table 3.11 presents the number of nodes per condition element for the production-system programs. The average number of nodes per condition element over all the systems is 1.86. This number is quite small because many nodes are shared between condition elements. If no sharing is allowed, this number jumps up two- to threefold, as is shown in Table 3.12.

⁷ All the numbers listed in Tables 3.9 and 3.10 are for the case where the network compiler is allowed to share nodes.

TABLE 3.9
NUMBER OF NODES

Node type       VT (%)         ILOG (%)       MUD (%)        DAA (%)        R1-SOAR (%)    EP-SOAR (%)
1. Const-test   2849 (29.8)    1884 (21.8)    1743 (34.9)    397 (14.6)     436 (10.7)     118 (10.9)
2. α-mem        1748 (18.3)    1481 (17.2)    878 (17.6)     339 (12.5)     398 (9.8)      96 (8.9)
3. β-mem        1116 (11.7)    1363 (15.8)    358 (7.2)      549 (20.2)     1252 (30.7)    369 (34.1)
4. And          2205 (23.0)    2320 (26.9)    872 (17.4)     847 (31.1)     1542 (37.8)    425 (39.2)
5. Not          332 (3.4)      400 (4.6)      267 (5.4)      144 (5.3)      60 (1.5)       13 (1.2)
6. Terminal     1322 (13.8)    1181 (13.7)    872 (17.5)     445 (16.4)     391 (9.5)      62 (5.7)
7. Total        9572 (100)     8629 (100)     4990 (100)     2721 (100)     4079 (100)     1083 (100)

TABLE 3.10
NODES PER PRODUCTION

Node type               VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Const-test           2.15      1.59      1.99      0.89      1.11      1.90
2. α-mem                1.32      1.25      1.00      0.76      1.01      1.54
3. β-mem                0.84      1.15      0.41      1.23      3.20      5.95
4. And                  1.66      1.96      1.00      1.89      3.94      6.85
5. Not                  0.25      0.33      0.30      0.32      0.15      0.20
6. Terminal             1.00      1.00      1.00      1.00      1.00      1.00
7. Total                7.22      7.28      5.70      6.09      10.41     17.44

3.2.2. Network Sharing. The OPS5 network compiler exploits similarity in the condition elements of productions to share nodes in the Rete network. Such sharing is not possible in parallel implementations of production systems where each production is placed on a separate processor, although some sharing is possible in parallel implementations that use a shared-memory multiprocessor. To help estimate the extra computation required due to loss of sharing, Table 3.13 gives the ratios of the number of nodes in the unshared Rete network to the number of nodes in the shared Rete network. Note that the ratios do not give the extra computational requirements exactly because they are only a static measure; the exact numbers will depend on the dynamic flow of information (tokens) through the network. Table 3.13 also shows that the sharing is large only for constant-test and α-mem nodes, and small for all other node types.⁸

⁸ Note that the reported ratios correspond to the amount of sharing or similarity exploited by the OPS5 network compiler, which may not be the same as the maximum exploitable similarity available in the production-system program.

TABLE 3.11
NODES PER CONDITION ELEMENT (WITH SHARING)

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Total CEs            4336      4629      2153      1731      2743      618
2. Tot. nodes           9572      8629      4990      2721      4079      1083
3. Nodes/CE             2.20      1.86      2.31      1.57      1.48      1.75


TABLE 3.12
NODES PER CONDITION ELEMENT (WITHOUT SHARING)

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Total CEs            4336      4629      2153      1731      2743      618
2. Tot. nodes           20950     19717     9953      7006      12024     2532
3. Nodes/CE             4.83      4.25      4.62      4.04      4.38      4.10
4. Sharing              2.19      2.28      2.00      2.57      2.95      2.34
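The ratios in Tables 3.11 and 3.12 can be recomputed from the raw counts. The following sketch does so; the variable and function names are hypothetical, and the counts are copied from the two tables.

```python
SYSTEMS = ["VT", "ILOG", "MUD", "DAA", "R1-SOAR", "EP-SOAR"]
total_ces = [4336, 4629, 2153, 1731, 2743, 618]
nodes_shared = [9572, 8629, 4990, 2721, 4079, 1083]        # Table 3.11
nodes_unshared = [20950, 19717, 9953, 7006, 12024, 2532]   # Table 3.12


def per_system(numerators, denominators):
    """Divide two per-system lists element by element, keyed by system name."""
    return {name: n / d for name, n, d in zip(SYSTEMS, numerators, denominators)}


nodes_per_ce_shared = per_system(nodes_shared, total_ces)
nodes_per_ce_unshared = per_system(nodes_unshared, total_ces)
sharing_factor = per_system(nodes_unshared, nodes_shared)

# Mean over the six systems of the unrounded nodes-per-CE ratios.
mean_shared = sum(nodes_per_ce_shared.values()) / len(SYSTEMS)
```

The recomputed values agree with the printed ones to within rounding in the last digit. The mean of the unrounded ratios comes to about 1.87, whereas averaging the rounded table entries gives the 1.86 quoted in the text.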

3.3. Run-Time Characteristics of Production Systems

This section presents data on the run-time behavior of production systems. The measurements are useful to identify operations frequently performed by the interpreter and to provide some rough bounds on the speedup that may be achieved by parallel implementations. Although most of the reported measurements are in terms of the Rete network, a number of general conclusions can be drawn from the measurements.

3.3.1. Constant-Test Nodes. Table 3.14 presents run-time statistics for constant-test nodes. The first line of the table, labeled "visits/change," refers to the average number of constant-test node visits (activations) per change to working memory. The second line of the table reports the number of constant-test activations as a fraction of the total number of node activations. The third line of the table, labeled "success," reports the percentage of constant-test node activations that have their associated test satisfied.

Although constant-test node activations constitute a large fraction (63% on average) of the total node activations, a relatively small fraction of the total match time is spent in processing them. This is because the processing associated with constant-test nodes is very simple compared with that associated with other nodes like α-mem nodes or and-nodes. For example, in the implementation discussed in [10], the evaluation of a constant-test node takes only 3 machine instructions. The evaluation of two-input nodes in comparison takes 100 or more instructions.

TABLE 3.13
NETWORK SHARING (NODES WITHOUT SHARING / NODES WITH SHARING)

Node type               VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Const-test           3.86      4.57      3.21      7.38      10.34     6.90
2. α-mem                2.35      3.04      2.05      4.57      6.85      6.40
3. β-mem                1.35      1.44      1.17      1.44      1.63      1.31
4. And                  1.19      1.30      1.12      1.24      1.52      1.27
5. Not                  1.08      1.04      1.00      1.61      1.26      1.00


TABLE 3.14
CONSTANT-TEST NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/change        107.00    231.20    117.79    57.02     48.79     18.93
2. Percentage of total  76.9      84.6      70.2      52.0      60.0      35.0
3. Success (%)          15.3      3.3       24.5      8.0       6.3       14.1
4. Hash-visits/ch       22.92     24.48     41.96     7.14      5.05      3.97

The numbers on the third line show that only a small fraction (11.9% on average) of the constant-test node activations are successful. This suggests that by using indexing techniques (for example, hashing), many constant-test node activations that do not result in satisfaction of the associated tests may be avoided. The fourth line of the table, labeled "hash-visits/ch," gives the approximate number of constant-test node activations per working-memory change when hashing is used to avoid evaluation of nodes whose tests are bound to fail. Calculations show that approximately 82% of the total constant-test node activations can be avoided by using hashing. The hashing technique is especially helpful for the constant-test nodes immediately below the root node. These nodes check for the class of the working-memory element (see Fig. 2.2), and since a working-memory element has only one class, all but one of these constant-test nodes fail their test. Calculations show that by using hashing at the top level, the total number of constant-test node activations can be reduced by about 43%.
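The effect of hashing at the top level can be sketched as follows. This is an illustration only: the node identifiers, the class list, and the function names are hypothetical, and a Python dictionary stands in for whatever index a real interpreter would use. The last lines show one way to arrive at a figure close to the 82% quoted above, by summing lines 1 and 4 of Table 3.14 over the six programs; whether the authors computed it this way is an assumption.

```python
def linear_dispatch(class_tests, wme_class):
    """Evaluate every top-level class test; return (matches, tests evaluated)."""
    matches = [node for cls, node in class_tests if cls == wme_class]
    return matches, len(class_tests)


def build_index(class_tests):
    """Group the top-level nodes by the class constant they test for."""
    index = {}
    for cls, node in class_tests:
        index.setdefault(cls, []).append(node)
    return index


def hashed_dispatch(index, wme_class):
    """One dictionary lookup replaces all the class tests that would fail."""
    return index.get(wme_class, []), 1


# Hypothetical top-level constant-test nodes: (class constant, node id).
class_tests = [("context", "n1"), ("item", "n2"), ("input", "n3"),
               ("needdata", "n4"), ("distance", "n5")]
index = build_index(class_tests)

slow_matches, slow_cost = linear_dispatch(class_tests, "item")
fast_matches, fast_cost = hashed_dispatch(index, "item")

# Fraction of constant-test activations avoided, from Table 3.14.
visits = [107.00, 231.20, 117.79, 57.02, 48.79, 18.93]    # line 1
hash_visits = [22.92, 24.48, 41.96, 7.14, 5.05, 3.97]     # line 4
avoided = 1 - sum(hash_visits) / sum(visits)
```

Both dispatch routines find the same successor node, but the linear version evaluates all five class tests while the indexed version performs a single lookup.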

3.3.2. Alpha-Memory Nodes. An α-mem node associated with a condition element stores tokens corresponding to working-memory elements that partially match the condition element, that is, tokens that satisfy all intracondition tests for the condition element. These nodes are the first significant nodes, in terms of the processing required, that are affected when a change is made to the working memory. It is only later that changes filter through α-mem nodes down to and-nodes, not-nodes, β-mem nodes, and terminal-nodes.

The first line of Table 3.15 gives the number of α-mem node activations per change to working memory. The average number of activations for the six programs is only 5.00. This is quite small because of the large amount of sharing between α-mem nodes. The second line of the table gives the number of α-mem node activations when sharing is eliminated (something that is necessary in many parallel implementations). In this case the average number of α-mem node activations goes up to 26.48, an increase by a factor of 5.30. The third line of the table gives the dynamic sharing factor (line 2 / line 1), which may be contrasted to the static sharing factor given in Table 3.13. As can be seen from the data, the dynamic sharing factor is consistently larger than the observed static sharing factor.

TABLE 3.15
ALPHA-MEMORY NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/ch(sh)        5.29      6.60      10.73     3.28      2.57      1.55
2. Visits/ch(nsh)       29.67     30.06     27.59     37.94     19.17     14.50
3. Dyn. shar. factor    5.60      4.55      2.57      11.56     7.45      9.35
4. Avg. tokens          302.76    180.44    64.91     14.91     48.50     7.15
5. Max. tokens          1467      572       369       88        197       38

The fourth line of Table 3.15 reports the average number of tokens present in an α-mem node when it is activated. This number indicates the complexity of the processing performed by an α-mem node. When an α-mem node is activated by an incoming token with a - tag, the node must find a corresponding token in its stored set of tokens and then delete that token. If a linear search is done to find the corresponding token, on average, half of the stored tokens will be looked up. Thus the complexity of deleting a token from an α-mem node is proportional to the average number of tokens. On arrival of a token with a + tag, the α-mem node simply stores the token. This involves allocating memory and linking the token and takes a constant amount of time. If hashing is used to locate the token to be deleted, the delete operation can also be done in constant time. However, then we have to pay the overhead associated with maintaining a hash table. Hash tables become more economical as the number of tokens stored in the α-mem increases. The numbers presented in the fourth line are useful for deciding when hash tables (or other indexing techniques) are appropriate.
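The trade-off between the two deletion strategies can be sketched in a few lines. The classes below are a minimal illustration, not the memory-node implementation of any particular interpreter; the class names, the tuple representation of tokens, and the cost measure (stored tokens examined per delete) are all assumptions made for the example.

```python
class ListMemory:
    """Memory node that keeps its tokens in a plain list."""

    def __init__(self):
        self.tokens = []

    def add(self, token):
        """Handle a token with a + tag: store it."""
        self.tokens.append(token)

    def delete(self, token):
        """Handle a token with a - tag; return how many stored tokens were examined."""
        for i, stored in enumerate(self.tokens):
            if stored == token:
                del self.tokens[i]
                return i + 1
        raise KeyError(token)


class HashedMemory:
    """Memory node that keeps a hash table from token to occurrence count."""

    def __init__(self):
        self.tokens = {}

    def add(self, token):
        self.tokens[token] = self.tokens.get(token, 0) + 1

    def delete(self, token):
        """One hash lookup locates the token, however many are stored."""
        count = self.tokens[token]
        if count == 1:
            del self.tokens[token]
        else:
            self.tokens[token] = count - 1
        return 1


def average_delete_cost(n):
    """Average tokens examined when each of n stored tokens is the one deleted."""
    total = 0
    for k in range(n):
        mem = ListMemory()
        for t in range(n):
            mem.add(("wme", t))
        total += mem.delete(("wme", k))
    return total / n


avg_examined = average_delete_cost(300)   # roughly the VT average memory size
```

With 300 stored tokens the linear search examines about half of them on average, while the hashed memory examines one at the price of maintaining the table on every insertion.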

The fifth line of Table 3.15 reports the maximum number of tokens found in an α-mem node for the various programs.⁹ These numbers are useful for estimating the maximum storage requirements for individual memory nodes. The maximum storage requirements, in turn, are useful in the design of hardware associative memories to hold the tokens.

3.3.3. Beta-Memory Nodes. A β-mem node stores tokens that match a subset of condition elements in the left-hand side of a production. The data for β-mem nodes, presented in Table 3.16, can be interpreted in the same way as those for α-mem nodes. There is, however, one difference that is of relevance. The sharing between β-mem nodes is much less than that between α-mem nodes, so that in parallel implementations the cost of processing β-mem nodes does not increase so much. When no sharing is present, the average number of β-mem node activations goes up from 3.53 to 5.17, an increase by a factor of only 1.46 as compared to a factor of 5.30 for the α-mem nodes.

⁹ It is interesting to note that the value for maximum number of tokens is the same as the value for maximum size of working memory (see Table 3.22) for VT, ILOG, and MUD systems. This implies that there is at least one condition element in each of these three systems that is satisfied by all working-memory elements.

TABLE 3.16
BETA-MEMORY NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/ch(sh)        0.53      1.57      2.62      4.12      3.89      8.47
2. Visits/ch(nsh)       1.29      2.36      4.14      5.44      8.03      9.81
3. Dyn. shar. factor    2.43      1.50      1.58      1.32      2.06      1.15
4. Avg. tokens          3.30      3.97      73.10     28.26     7.43      4.95
5. Max. tokens          48        50        168       360       85        18

3.3.4. And-Nodes. The run-time data for and-nodes are given in Table 3.17. The first line gives the number of and-node activations per change to working memory. The average number of node activations for the six programs is 27.66. The second line gives the average number of and-node activations for which no tokens are found in the opposite memory nodes. For example, for the VT program, the first line in the table shows that there are 25.96 and-node activations. Of these 25.96 activations, 24.48 have an empty opposite memory. Since an and-node activation for which there are no tokens in the opposite memory requires very little processing, evaluating the majority of the and-node activations is very cheap. Most of the processing effort goes into evaluating the small fraction of activations which have nonempty opposite memories. This means that if all and-node activations are evaluated on different processors, then the majority of the processors will finish very early compared to the remaining few. This large variation in the processing requirements of and-nodes reduces the effective speedup that can be obtained by evaluating each and-node activation on a different processor.
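The effect of this variation on speedup can be seen with a toy calculation. The cost figures below are invented for illustration (an activation with an empty opposite memory costs 1 unit, one with a nonempty memory costs 20 units); only the 24-to-2 split is chosen to loosely mirror the VT column of Table 3.17. The function name is hypothetical.

```python
def one_per_activation_speedup(costs):
    """Speedup when every activation runs on its own processor.

    Serial time is the sum of the costs; parallel time is the largest cost,
    because all processors must wait for the slowest activation.
    """
    return sum(costs) / max(costs)


# Hypothetical costs in arbitrary units: 24 cheap activations, 2 expensive ones.
costs = [1] * 24 + [20] * 2
speedup = one_per_activation_speedup(costs)
```

Under these assumed costs, 26 processors yield a speedup of only 3.2, since 24 of them sit idle while the two expensive activations complete.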

TABLE 3.17
AND NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/change        25.96     26.59     25.95     39.41     24.48     23.56
2. Null-mem             24.48     23.42     20.26     33.53     16.86     10.81
3. Null-tests (%)       13.2      7.8       12.8      8.2       0.3       0.0
4. Tokens               17.00     4.39      24.33     27.18     4.87      7.96
5. Tests                17.35     5.18      25.94     27.51     5.29      8.45
6. Pairs                1.41      0.90      1.06      0.83      0.60      0.71


When a token arrives on the left input of an and-node, it must be compared to all tokens stored in the memory node associated with the right input of that and-node. The comparisons may involve a test to check if the values of the variables bound in the two tokens are equal; a test to check if one is greater than the other; or other, similar tests. The third line of the table gives the percentage of two-input node activations where no equality tests are performed.¹⁰ These numbers indicate the fraction of node activations where hash-table-based memory nodes do not help in cutting down the tokens examined in the opposite memory.

The fourth line shows the average number of tokens found in the opposite memory for an and-node activation, when the opposite memory is not empty. If tokens in memory nodes are stored as linked lists, this number represents the average number of tokens against which the incoming token must be matched to determine consistent pairs of tokens. The magnitude of this number can be used to determine if hashing ought to be used to limit this search. The hashing function discussed in [10] uses the values of variables being tested for equality at the and-node to significantly reduce the number of tokens examined in the opposite memory.
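The idea of indexing the opposite memory on the equality-tested variable can be sketched as below. This is not the hashing function of [10], whose details are not given here; it is a minimal illustration in which a Python dictionary plays the role of the hash table, tokens are plain dictionaries, and the class, function, and variable names are hypothetical.

```python
from collections import defaultdict


class HashedOppositeMemory:
    """Memory node indexed by the value of the equality-tested variable."""

    def __init__(self, var):
        self.var = var
        self.buckets = defaultdict(list)

    def add(self, token):
        self.buckets[token[self.var]].append(token)

    def candidates(self, value):
        # .get avoids creating an empty bucket for a value never stored
        return self.buckets.get(value, [])


def and_node_activation(left_token, right_memory, extra_test=lambda l, r: True):
    """Return (consistent pairs, number of stored tokens examined)."""
    examined = right_memory.candidates(left_token[right_memory.var])
    pairs = [(left_token, r) for r in examined if extra_test(left_token, r)]
    return pairs, len(examined)


# 27 stored tokens, spread over 9 distinct values of the join variable x.
memory = HashedOppositeMemory("x")
for i in range(27):
    memory.add({"x": i % 9, "id": i})

pairs, examined = and_node_activation({"x": 4, "id": "left"}, memory)
```

In this made-up case only the 3 tokens whose `x` equals 4 are examined instead of all 27. Activations with no equality test (line 3 of Table 3.17) gain nothing from such an index, since every stored token remains a candidate.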

The numbers in the fifth line of the table indicate the average number of tests performed by an and-node when a token arrives on its left or right input and its opposite memory is not empty. The number of tests performed is equal to the product of the average number of tokens found in the opposite memory (given in the fourth line) and the number of consistency tests that have to be made to check if the left and right tokens of the and-node are consistent. Thus if the number of tokens that are looked up from the opposite memory is reduced by use of indexing techniques, then this number will also go down.

The numbers in the sixth line of the table show the average number of consistent token pairs found after matching the incoming token to all tokens in the opposite memory. For example, for the DAA program, on the activation of an and-node, an average of 27.18 tokens are found in the opposite memory node. On average, however, only 0.83 tokens are found to be consistent with the incoming token. This indicates that the opposite memory contains a lot of information, of which only a very small portion is relevant to the current context. The numbers in the sixth line also give a measure of token regeneration taking place within the network. These data may be used to construct probabilistic models of information flow within the Rete network.

¹⁰ For reasons too complex to explain here, separate numbers for and-node and not-node activations were not available. That is, the numbers presented in line 3 are for the combined activations of and-nodes and not-nodes.


TABLE 3.18
NOT NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/change        5.01      5.84      5.79      3.97      2.63      0.75
2. Null-mem             3.90      4.28      3.89      2.33      1.42      0.27
3. Tokens               31.39     5.99      13.94     12.51     9.87      6.43
4. Tests                34.95     7.94      14.06     12.53     11.91     1.38
5. Pairs                0.25      0.45      0.31      0.43      1.41      0.75

3.3.5. Not-Nodes. Not-nodes are very similar to and-nodes, and the data for them should be interpreted in exactly the same way as those for and-nodes. The data are presented in Table 3.18.

3.3.6. Terminal Nodes. Activations of terminal nodes correspond to insertion of production instantiations into the conflict set and deletion of instantiations from the conflict set. The first line of Table 3.19 gives the number of changes to the conflict set for each working-memory change. The second line gives the average number of changes made to the working memory per production firing, and the third line, the product of the first two lines, gives the average number of changes made to the conflict set per production firing. The data in the third line give the number of changes that will be transmitted to a central conflict-resolution processor, in an architecture using centralized conflict resolution. The fourth line gives the size of the conflict set when averaged over the complete run.

3.3.7. Summary of Run-Time Characteristics. Table 3.20 summarizes data for the number of node activations, when a working-memory element is inserted into or deleted from the working memory. The data show that a large percentage (63% on average) of the activations are of constant-test nodes. Constant-test node activations, however, require very little processing compared to those of other node types, and furthermore, a large number of constant-test activations can be eliminated by suitable indexing techniques (see Section 3.3.1). To eliminate the effect of this large number of relatively cheap constant-test node activations, we subtracted the number of constant-test node activations from the activations of all nodes. These numbers are shown on line 8 of Table 3.20.

TABLE 3.19
TERMINAL NODES

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Visits/change        1.79      2.06      3.69      1.65      0.55      0.74
2. Changes/cycle        3.27      1.70      2.13      2.22      4.55      4.69
3. Mods/cycle           5.85      3.50      7.86      3.66      2.50      3.47
4. Avg confl-set        35        10        36        22        12        18

TABLE 3.20
SUMMARY OF NODE ACTIVATIONS PER CHANGE

Node type               VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Const-test           107.00    231.20    117.79    57.02     48.79     18.93
2. α-mem                5.29      6.60      10.73     3.28      2.57      1.55
3. β-mem                0.53      1.57      2.62      4.12      3.89      8.47
4. And                  25.96     26.59     25.95     39.41     24.48     23.56
5. Not                  5.01      5.84      5.79      3.97      2.63      0.75
6. Terminal             1.79      2.06      3.69      1.65      0.55      0.74
7. Total                145.58    273.92    166.57    109.45    82.91     54.00
8. Line 7 - line 1      38.58     42.72     48.78     52.43     34.12     35.07

The first observation that can be made from the data on line 8 of Table 3.20 is that, the way production-system programs are currently written, changes to working memory do not have global effects but affect only a very small fraction of the nodes present in the Rete network (see Table 3.9). This also means that the number of productions that are affected¹¹ is very small, as can be seen from line 1 in Table 3.21. Both the small number of affected nodes and the small number of affected productions limit the amount of speedup that can be obtained from using parallelism.

The second observation that can be made is that the total number of node activations (excluding constant-test node activations) per change is quite independent of the number of productions in the production-system program. This, in turn, implies that the number of productions that are affected is quite independent of the total number of productions present in the system, as can be seen from Table 3.21. There are several implications of the above observations. First, we should not expect smaller production systems (in terms of number of productions) to run faster than larger ones. Second, it appears that allocating one processor to each node in the Rete network or allocating one processor to each production is not a good idea. Finally, there is no reason to expect that larger production systems will necessarily exhibit more speedup from parallelism.

¹¹ A production is said to be affected by a change to working memory if the working-memory element satisfies at least one of its condition elements.

TABLE 3.21
NUMBER OF AFFECTED PRODUCTIONS

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. p-aff/change         31.22     34.19     27.01     28.54     34.57     12.07
2. SD for line 1        19.55     38.53     25.39     27.77     60.16     14.69
3. Changes/cycle        3.27      1.70      2.13      2.22      4.55      4.69
4. p-aff/firing         40.14     36.49     32.05     40.04     63.04     20.45
5. SD for line 4        31.59     52.70     28.69     32.55     93.67     20.12

Table 3.22 gives general information about the runs of the production-system programs from which data are presented in this section. The first two lines of the table give the average and maximum sizes of the working memory. The third and the fourth lines give the average and maximum values for the sizes of the conflict set. The fifth and the sixth lines give the average and maximum sizes of the token memory when memory nodes may be shared. (The size of the token memory at any instant is the total number of tokens stored in all memory nodes at that instant.) The seventh and eighth lines give the average and maximum sizes of the token memory when memory nodes may not be shared. The last line in the table gives the total number of changes made to the working memory in the production-system run from which the statistics are gathered.

4. CONCLUSIONS

In this paper, we have presented measurements on the static structure and the run-time behavior of six OPS5 and Soar production systems. Along with the measurements, we have given interpretations for some of the data. For example, we point out that actions of productions do not have global effects but affect only a small number of productions. Furthermore, the number of productions affected is independent of and does not increase with the total number of productions present in the system. This puts certain limitations on the maximum speedup that we can expect from parallelism for production systems. Other examples of use of similar measurements can be found in [8, 12, 16, 24]. We would also like to add that the measurements presented here for the six programs are very similar to those obtained for another set of programs reported in [7]. Consequently, we believe that the measurements presented in this paper are quite widely applicable, although certainly there may exist OPS5 programs with very different characteristics.

TABLE 3.22
GENERAL RUN-TIME DATA

Feature                 VT        ILOG      MUD       DAA       R1-SOAR   EP-SOAR
1. Avg work-mem         1134      486       241       250       543       199
2. Max work-mem         1467      572       369       308       786       258
3. Avg confl-set        35        10        36        22        12        18
4. Max confl-set        131       38        648       88        36        31
5. Avg tokm(sh)         5485      3506      3176      1182      1515      555
6. Max tokm(sh)         7416      4204      4576      2624      2716      856
7. Avg tokm(nsh)        13366     5363      4717      18343     3892      2546
8. Max tokm(nsh)        22640     8346      7583      23213     7402      3480
9. WM changes           1767      2191      2074      3200      2220      924

The reported measurements form only a subset of all useful measurements that can be made on production-system programs. We think, however, that the reported measurements are comprehensive enough to form a good starting point for the study of specialized algorithms and architectures for production systems. Finally, the measurements are based on (as they must be) existing production systems. Some people have argued that, if we are to design high-speed parallel architectures for production systems, then maybe OPS5 and Soar are the wrong languages to do rule-based programming in. We hope that the measurements presented in this paper will initiate a discussion which will help in identifying some of the limitations of current languages and indicate how they may be extended to exploit the capabilities of upcoming massively parallel architectures.

ACKNOWLEDGMENT

We thank Allen Newell for careful reading of earlier drafts of this paper.

REFERENCES

1. Barr, A., and Feigenbaum, E. A. The Handbook of Artificial Intelligence, Vol. 1. Kaufmann, 1981.

2. Brownston, L., Farrell, R., Kant, E., and Martin, N. Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Addison-Wesley, Reading, MA, 1985.

3. Buchanan, B. G., and Feigenbaum, E. A. DENDRAL and Meta-DENDRAL: Their applications dimensions. Artificial Intelligence 11, 1, 2 (1978).

4. Forgy, C. L. OPS5 User's Manual. Tech. Rep. CMU-CS-81-135, Carnegie-Mellon University, Pittsburgh, 1981.

5. Forgy, C. L. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19 (Sept. 1982).

6. Forgy, C. L. The OPS83 Report. Tech. Rep. CMU-CS-84-133, Carnegie-Mellon University, Pittsburgh, May 1984.


7. Gupta, A., and Forgy, C. L. Measurements on production systems. Tech. Rep. CMU-CS-83-167, Carnegie-Mellon University, Pittsburgh, 1983.

8. Gupta, A. Implementing OPS5 production systems on DADO. Proc. IEEE International Conference on Parallel Processing, 1984.

9. Gupta, A., Forgy, C., Newell, A., and Wedig, R. Parallel algorithms and architectures for production systems. Proc. 13th International Symposium on Computer Architecture, June 1986.

10. Gupta, A., Tambe, M., Kalp, D., Forgy, C., and Newell, A. Parallel Implementation of OPS5 on the Encore Multiprocessor. Internat. J. Parallel Programming 17, No. 2 (1988).

11. Hennessy, J. L., Jouppi, N., Przybylski, S., Rowen, C., and Gross, T. The MIPS machine. Proc. Computer Conference, Feb. 1982.

12. Hillyer, B. K., and Shaw, D. E. Execution of OPS5 production systems on a massively parallel machine. J. Parallel Distrib. Comput. 3 (1986), 236-268.

13. Kahn, G., and McDermott, J. The MUD system. Proc. First Conference on Artificial Intelligence Applications, IEEE Computer Society and AAAI, Dec. 1984.

14. Kowalski, T., and Thomas, D. The VLSI design automation assistant: Prototype system. Proc. 20th Design Automation Conference, ACM and IEEE, June 1983.

15. Laird, J. E., Newell, A., and Rosenbloom, P. S. Soar: An architecture for general intelligence. Artificial Intelligence 33 (1987), 1-64.

16. Lehr, T. F. The implementation of a production system machine. Master’s thesis, Department of Electrical and Computer Engineering, Carnegie-Mellon University, Pittsburgh, 1985.

17. Marcus, S., McDermott, J., Roche, R., Thompson, T., Wang, T., and Wood, G. Design document for VT. Carnegie-Mellon University, Pittsburgh, 1984.

18. McDermott, J. R1: A rule-based configurer of computer systems. Tech. Rep. CMU-CS-80-119, Carnegie-Mellon University, Pittsburgh, April 1980.

19. McDermott, J. XSEL: A computer salesperson’s assistant. In Hayes, J. E., Michie, D., and Pao, Y. H. (Eds.). Machine Intelligence. Horwood, 1982.

20. McDermott, J. Extracting knowledge from expert systems. Proc. International Joint Conference on Artificial Intelligence, 1983.

21. Miranker, D. P. Performance estimates for the DADO machine: A comparison of Treat and Rete. In Fifth Generation Computer Systems. ICOT, Tokyo, 1984.

22. Oflazer, K. Partitioning in parallel processing of production systems. Ph.D. thesis, Carnegie-Mellon University, Pittsburgh, 1987.

23. Patterson, D. A., and Sequin, C. H. A VLSI RISC. Computer 9 (1982).

24. Quinlan, J. A comparative analysis of computer architectures for production system machines. Proc. Hawaii International Conference on System Sciences, Jan. 1986.

25. Radin, G. The 801 Minicomputer. IBM J. Res. Develop. 27 (May 1983).

26. Ramnarayan, R., Zimmerman, G., and Krolikoski, S. PESA-1: A parallel architecture for OPS5 production systems. Proc. Hawaii International Conference on System Sciences, Jan. 1986.

27. Rosenbloom, P. S., Laird, J. E., McDermott, J., Newell, A., and Orciuch, E. R1-Soar: An experiment in knowledge-intensive programming in a problem-solving architecture. Proc. IEEE Workshop on Principles of Knowledge Based Systems, 1984.

28. Tenorio, M. F. M., and Moldovan, D. I. Mapping production systems into multiprocessors. Proc. IEEE International Conference on Parallel Processing, 1985.


ANOOP GUPTA is Assistant Professor of Computer Science at Stanford University and a member of the Computer Systems Laboratory. Prior to joining Stanford he was on the research faculty of Carnegie-Mellon University, where he received his Ph.D. in 1986. Professor Gupta has published extensively in the area of parallel algorithms and architectures for artificial intelligence. His current research focuses on the design of hardware and software for scalable shared-memory multiprocessors.

CHARLES FORGY received a B.S. in mathematics from the University of Texas at Arlington in 1972 and a Ph.D. in computer science from Carnegie-Mellon University in 1979. He was a research associate and a research computer scientist at Carnegie-Mellon University from 1979 to 1987. He is currently president of Production Systems Technologies, Inc. His research interests lie in the area of hardware and software support for artificial intelligence.
