MIMICKING THOUGHT

Earl Hunt

This research was supported by the National Science Foundation, Grant No. NSF 87-1438R, to the University of Washington, Earl Hunt, Principal Investigator.

Department of Psychology
University of Washington -- Seattle

Technical Report No. 68-1-03
February 20, 1968


Mimicking Thought

Earl Hunt

The University of Washington

We call ourselves homo sapiens, the wise man. While I cannot prove

that you think, or even that I think, we both do. The problem is to bring

this obvious fact into the arena of scientific study. What does it mean

to think? Is there a scientific method we can use to study this ephemeral

activity?

There are many approaches, each with its unique advantages and disadvantages. Introspectionists tried to observe and record their own mental processes. Later the behaviorist tried to find laboratory paradigms which were supposed to reveal in observable fashion the basic processes of thought. Today the factor analyst identifies the components of thought by studying correlations between performance on different tasks which seem to involve thinking. The psychoanalyst tries to understand normal thought by examination of pathological thought, inferring function from the study of malfunction. And finally, some people try to imitate man by building a thinking machine. This is what will occupy our discussion.

The rationale for simulation is captured by the catchy "black box"

problem. Suppose you were confronted with a firmly constructed box, which had on it a set of dials labeled "Input" and a set of meters labeled "Output." When the dials are moved, the meter readings change in some complex way. How does the box work? Or, to be a bit more general, how would you go about finding out how the box works? This is the psychologist's problem. The human is his black box. Just relabel the dials and meters "stimulus" and "response." One approach to the black box problem is to build another


"transparent" box, with its own input and output dials. You will know

how this box works—after all, you built it. If it shows the same input

and output features as the black box--i.e., if the output is the same

function of the input in each case, then you have some basis for the claim

that you understand the black box.

There is an important thing that has not been said. There is no claim that the black box and the transparent box are physically identical.

Victor Frankenstein is not the father of psychology. We are not trying

to build a human being. We are trying to design a device whose behavior

can be related directly to the human behavior. As we shall see, such

'devices" have been constructed, but by physical mechanisms which are

totally unlike those which must exist in man. Thus we eschew a reduc-

tionist explanation. Simulation has not been used to show how the nerves

and muscles combine to produce a sentient being. It has been, and is, used to

provide an analysis of the functions involved in thinking.

Having decided to imitate man by building a thinking machine, how

do we go about the task? At this point, for the first time, the digital

computer and the computer program appear. Computer programming is one

way to build a simulation.

There are many excellent introductions to computing (26,51,54), so here I will confine myself to a minimum of detail. We think of a computer as a device for doing arithmetic. Actually it is more general than that. It is a device for performing operations on symbols. To appreciate the force of this remark--aren't reading and writing reducible to this? Obviously. Yet they include word selection, sentence generation and interpretation, a myriad of very complex functions.


We do not have to be concerned with how a computer manipulates symbols; it is sufficient to know that it will always manipulate symbols in exactly the way we tell it to. The set of instructions we feed into a computer to control its actions is known, collectively, as a program. By programming a computer, then, we construct a device which manipulates symbols in a rapid but precise manner. It is particularly important to realize that the

resulting machine behaves exactly in the way we have specified, and exer-

cises no judgment of its own. In spite of loose talk, the computer does

not "think" anything. To illustrate, if a computer is programmed to

replace the symbol "rat" in a sentence with the symbol "mouse" it will change

the sentence

"I put the rat into the apparatus

into

"I put the mouse into the Appamouseus . "This is not idiocy on the part of the machine, it is carelessness on the

part of the programmer. He should have instructed the computer to replace

the characters

(blank) r.at (blank)"

with "(blank) mouse (blank)

Once he did this, he could rewrite a hundred sentences in a few seconds.
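In a modern notation the contrast can be run directly. The following is a minimal Python sketch (the language is our illustration, not the report's); the naive rewrite mangles "apparatus" exactly as described above, while the blank-delimited pattern does not.

    # Naive symbol replacement vs. blank-delimited replacement.
    sentence = "I put the rat into the apparatus"
    print(sentence.replace("rat", "mouse"))      # I put the mouse into the appamouseus
    print(sentence.replace(" rat ", " mouse "))  # I put the mouse into the apparatus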

Now what has such an unimaginative machine got to do with psychology?

I, and many other psychologists, believe that certain aspects of thought

are best expressed as operations on symbols. When we want to describe

thought, then, we try to state rules for symbol manipulations. These rules

are going to be very complex, reflecting the complexity of our topic. At

the same time, if the descriptions are to have any scientific status, they


must be precise. This rules out an explanation in natural language terms

because of its inherent ambiguity. The language of mathematics, to take

another language in which theories may be expressed, is unambiguous but

does not lend itself to stating processes as detailed as those we wish to

describe. Even if you could write down the appropriate equations, could

you understand them? Can you describe a camel properly in English? Or a

sports car in Arabic?

The simulation proposal is that languages for programming computers

can and do provide a suitable vehicle both for stating and evaluating a

theory of thought (56,61,74). We can write a program to describe the

processes we think are involved in a specific problem-solving situation.

By observing the behavior of a computer, we get an explicit evaluation of

what our theory says. We can see if the program will

instruct a computer to solve problems in the same manner as humans do.

If we are satisfied that it does, then we can say that we have built our

transparent box, and therefore understand some aspect of human thought.

This straightforward argument has its critics. While this sort of

simulation may be possible, it has not been done. Specifically, it has been

charged that the psychologists who say that they are writing computer pro-

grams to simulate human thought have actually been quite lax in checking to

see if their programs do, indeed, solve problems like humans do (58). This is a very difficult charge to answer, since the argument really

revolves around the meaning of the word "like." (How closely do two

things have to match before we can say that they are "like each other?") Here it is best to proceed by studying examples, which we shall do shortly.

A second objection is that computer simulation of human thought is

impossible in principle, because computers manipulate symbols in ways


that are basically different from the human mode. Neisser (59) has pre-

sented this view very well. He points out that the languages used to

program computers reflect two things: the requirements of the tasks the

computers are to attack and the mechanical operations of the computers

themselves. In mathematics, which is the basis of most computer uses, this poses no problem. The operations which we wish the computer to do

are well known, and there is a well understood correspondence between

these basic operations and the circuitry of the computer. In other areas

of thought the relationships are not so clear-cut. It may be that in

some areas of thought the basic mechanisms involved are so difficult to ex-

press in basic computer instructions that they are, for all practical pur-

poses, ineffable. I can imagine, and indeed have written, computer pro-

grams capable of conducting a medical diagnostic interview. But a program

that would compose music to match the tempo of the dancers in Catulli Carmina? Or even Swan Lake? It may very well be that Orff and Tchaikovsky

created their music by symbol manipulating processes which could be

duplicated on the computer, but we will never know this unless we have a

programming language in which we can express the concepts which they used.

At present we do not.

That we do not know how to write a psychological programming language

does not mean it cannot be done. Neisser implied that the goal itself is

impossible because of the limitations of computer symbol manipulation.

To the extent that human thought can be represented by processes of symbol

rearrangement (e.g., writing down notes on a piece of paper, including musical notes), then a computer can match it. There may be other aspects of thought that cannot be matched. One is the emotional, "non-rational" component. How can this be represented by a program? There is an interesting


problem here, one which is a sort of combination of physiology and

philosophy. We have excellent grounds for saying that all circuits

of nerve elements in the central nervous system are really computing

logical functions, and therefore their activity can, in principle, be mimicked by electronic circuits (7). Animals, however, also react

to humoral factors. The release of adrenalin, to name only one

compound, will alter the balance between different electrical circuits

in the brain. Humoral controls such as this are truly parallel, analog "computing systems," and there is no guarantee that they can, in principle, be simulated by a computer. There is even less guarantee that it will be practical to do so.

We cannot take a firm stand on the ultimate possibility or

impossibility of computer simulation. We do know that programming

is sometimes a useful tool for studying thought. At the least, computer programming provides a language for talking about cool, routine thought. In some situations this is the primary aspect of behavior, while in others it is secondary to less easily simulated tasks. The proof of the pudding is in the eating. Let us see what sort of

simulations have been constructed, and how successful they have been.


Theorem Proving

Logical deduction is the basis of formal thought. Yet few people

really appreciate what a formal logical argument is. To a logician a

deduction has these components.

(a) There is an agreed-upon rule for forming sentences, or

'well formed expressions" (wfe's). Thus in algebra 2 + 7 =35 is a well

formed, although erroneous, expression, while + = 32 7 is nonsense.

(b) A set of wfe's is designated as the premises, or expressions

which are assumed to be true. In conventional algebra, which will serve

as our usual example, a + b = b + a and a + (b + c) = (a + b) + c are

premises.

(c) One or more rules of inference are established. These are rules by which new true sentences may be produced from one or more true sentences. In algebra the rule is that if any expression fits the form of one side of a true expression, then it can be rewritten in the form given by the other side. By this rule, for example, (x + y) + z may be rewritten as z + (x + y), using the commutativity of "+", which was given in (b) as

a true expression.

In a theorem proving problem one must show that, given a particular set

of premises and rules of inference, another specific statement, the hypothe-

sis, can be derived. To offer a very difficult example, given the rules

of Euclidean geometry, prove the Pythagorean theorem. Obviously, this is

the sort of problem mathematicians face all the time. In fact, some people

feel that theorem proving is one of the most taxing of human performances. Can we design a machine to imitate, or perhaps improve upon, the human theorem prover?


A very simple "machine" springs to mind. It could be represented by

a computer program which executed the following steps.

(1) Apply every rule of inference, in turn, to every one of the premises. This will expand the set of true statements.

(2) Examine the new set of true statements to see if the hypothesis has been produced. If it has, the theorem is proven. If not, return to step (1), except that now the rules of inference are applied to every well formed expression in the expanded set of true statements.

(3) Continue until either the theorem or its negation appears in the set of true statements. At that point the theorem will be either proven or disproven. (A sketch of this procedure in program form follows.)
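A minimal Python sketch of the exhaustive procedure might run as follows. The rule set, the premises, and the round limit are illustrative assumptions; a toy one-rule algebra stands in for a real logic.

    # Steps (1)-(3) as literal breadth-first generation of true statements.
    # Rules are functions returning a rewritten statement, or None if they
    # do not apply.
    def british_museum(premises, rules, hypothesis, max_rounds=10):
        true_statements = set(premises)
        for _ in range(max_rounds):              # in principle, loop forever
            if hypothesis in true_statements:
                return True                      # theorem proven
            new = {rule(s) for s in true_statements for rule in rules}
            new.discard(None)
            if new <= true_statements:
                return False                     # nothing new: search exhausted
            true_statements |= new               # step (1): expand the set
        return False                             # gave up: combinatorial explosion

    # toy rule of inference: commutativity of "+" on "a+b" expressions
    def commute(s):
        parts = s.split("+")
        return f"{parts[1]}+{parts[0]}" if len(parts) == 2 else None

    print(british_museum({"x+y"}, [commute], "y+x"))   # True, after one round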

Newell, Shaw, and Simon (60) referred to this as the "British Museum

Algorithm," since it seemed to them as sensible as placing monkeys in front

of a typewriter and waiting until they reproduced all the books in the

British Museum. Even the speeds of modern computing machinery do not

approach the speeds required to make the British Museum Algorithm prac-

tical. We must make our question more sophisticated. Can a machine be

built which will incorporate the rules of thumb people use in finding their

way to a formal proof? Several attempts have been made, virtually all

either initiated or heavily influenced by the work of Allen Newell,

Herbert Simon, and their collaborators at the RAND Corporation and at

Carnegie-Mellon University.

The initial program, the LOGIC THEORIST (LT) (60), was designed to

prove theorems in elementary formal logic . The program succeeded in

proving thirty-eight of the fifty-two theorems in Chapter Two of Whitehead

and Russell's Principia Mathematica, a book which is often considered a


basis for the logical foundation of mathematics. All but one of the failures

were due to the slow speed of the machine available at the time. A sub-

sequent modification of the LT solved all fifty-two theorems (89).

Buoyed by this success, a more ambitious program, the GENERAL PROBLEM

SOLVER (GPS), was attempted (62,64). GPS was to solve deductive problems

in general, instead of being specialized to a particular area of mathe-

matics. LT had had built into its program special routines which were only applicable to the operations permitted in symbolic logic. Other similar programs had been written for other areas of mathematics, notably plane geometry (34) and symbolic integration (87). Like the LT, each

of these programs had area specific operations written into them. The

GPS contained within itself only those techniques of deduction which were

applicable to formal arguments in general. The program accepted a

definition of a particular area of mathematics (the "problem environment"),

and from this data found a way to solve specific problems. The GPS has

attacked problems in diverse areas, including symbolic logic, trigonometry,

symbolic integration, and logical reasoning in word problems. Examination of the output leaves one with the impression that the program is "clever, but not deep." It can solve the "missionaries and cannibals" puzzle, and find symbolic integrals. While most people do have trouble with such problems, solving them certainly is not awe-inspiring. The same comment

is an accurate statement of the performance of the other programs cited.

How the GPS goes about its task is really more to the point than the

level of results which it has achieved. The basic idea of the GPS, and

of the programs related to it, is that a hard problem should be solved by

breaking it down into easier subproblems which, when solved, will combine

to provide a solution to the hard problem. The first step in successful


problem solving, then, is to identify the subproblems. This can be

illustrated by taking an example from the geometry problem solving program

(35). The problem is to prove that the diagonals of a parallelogram bisect

each other. This is shown diagrammatically in Figure 2-1. Referring to this picture, we see that this problem has two sub-problems: prove that AE = CE and that BE = DE.

Figure 2-1: Prove that the diagonals of a parallelogram bisect each other.

Taking the first problem, AE is a side of

triangles AED and AEB. Similarly, CE is a side of triangles CED and CEB. Since corresponding sides of congruent triangles are equal, we could prove equality of sides by proving any one of the congruences

AED ≅ CED,  AED ≅ CEB,  AEB ≅ CED,  AEB ≅ CEB,

and then proving that AE and CE were corresponding sides of the members of

the congruent pair. We can summarize the development to this point by the

graph of Figure 2-2. This shows a goal tree, an important concept in

"GPS-like" problem solving programs. Each node in the goal tree corresponds

to a problem (at the highest node) or sub-problem. The nodes below a

problem node show the sub-problems which, if solved, will constitute a

solution of the higher order problem. Note that Figure 2-2 actually has two types of nodes. Some nodes are labeled by the statement of a problem. Other nodes are labeled by the symbol "&". Such a node means that the


subproblem requires, for its solution, that all the subproblems below

the "&" must be solved before it is, itself, solved. Thus, the problem

of Figure 2-1 is indicated by an "&" node, since it requires that both AE = CE and DE = BE be shown.

Figure 2-2: A partial goal tree for the problem of Figure 2-1.

In using a goal tree, subproblems are generated until one is found which can be solved. When it is solved the tree is "pruned." That is, all problems immediately above the solved subproblem are marked solved until an "&" node is encountered. A problem at an "&" node is not marked solved

until all problems below it are solved. Eventually the top node, which

represents the original problem, will be solved.
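The pruning rule just stated is easy to make concrete. The Python sketch below is an illustrative assumption about the data structure, not the actual program's: "or" nodes are solved when any child is solved, "&" nodes only when all children are.

    from dataclasses import dataclass, field

    @dataclass
    class Goal:
        name: str
        kind: str = "or"                 # "or", "and" (an "&" node), or "leaf"
        children: list = field(default_factory=list)
        solved: bool = False

    def propagate(node):
        """Prune upward: recompute solved status from the leaves."""
        if node.kind == "leaf":
            return node.solved
        results = [propagate(child) for child in node.children]
        node.solved = all(results) if node.kind == "and" else any(results)
        return node.solved

    # the parallelogram problem: an "&" node over two subproblems, each an
    # "or" node over candidate congruences (only provable ones are listed)
    ae_ce = Goal("AE = CE", "or", [Goal("AED ≅ CEB", "leaf"), Goal("AEB ≅ CED", "leaf")])
    be_de = Goal("BE = DE", "or", [Goal("AED ≅ CEB", "leaf")])
    root = Goal("diagonals bisect each other", "and", [ae_ce, be_de])

    ae_ce.children[0].solved = True      # one congruence proven
    print(propagate(root))               # False: the "&" node still waits on BE = DE
    be_de.children[0].solved = True
    print(propagate(root))               # True: all subproblems below the "&" solved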

How is the goal tree to be created in the first place? Then, having

generated it, where do you begin when you have a set of subproblems, any

one of which will, if solved, solve the original problem? This last point

is particularly important, since if you could somehow always first generate

a solvable, and easily solvable subproblem, you would not even have to


identify other possible routes to solution of the main problem. (The

reader may have recognized that in the example we have been using, two

of the four pairs of triangles are not, in fact, congruent, and therefore

only two of the subproblems have solutions at all.) How subproblems should

be located and ordered for attack have been major research questions in

artificial intelligence. The method used by GPS and its "relatives" has been dubbed means-end analysis (65). It is important both as a problem

solving technique for machines and because it may well be relevant to

human problem solving.

Formally, means-end analysis is a technique for going from a starting

state, specified by the premises, to a goal state specified by the

hypothesis to be proven. Successive transformations must be found which

change the starting state, s0, into a first transitional state, s1, then s2, etc., until some state sn identical to the goal is reached. Each movement

from state to state will be achieved by applying a rule of inference to

prove a new true statement. Means-end analysis is a technique for guiding the search for appropriate new true expressions. Given any pair of states, there will be at least one difference between them or they will be identical, in which case no problem exists. Suppose there is a difference between two states. Either there is a rule of inference which reduces this

difference (an "operator" in Newell et al. 's terminology, which we shall

use in this section of the discussion) or the problem is unsolvable.

If there exists an operator, then ask if the operator can be applied to the

current state. If it can, then apply it. If it cannot, set up the sub-

goal (i.e., subproblem in the goal tree) of changing the current state to

a state such that the operator can be applied.


Since this is a very important point, let us consider an informal example.

A person is in his office in Seattle, and must go to an associate's office in Los Angeles. Let his geographic position be his state, so that "Office in Seattle" is the starting state, and "Office in Los Angeles" is the goal state. The analysis proceeds:

1. What is the difference between the current (starting) state and the current goal state? Large distance.

a. What reduces large distances? Airplanes.

b. Can an airplane be taken? No, you must be at the Seattle-Tacoma airport. Make this the current subgoal.

2. What is the difference between the current state and the current goal state? Intermediate distance.

a. What reduces intermediate distance? Automobiles.

b. Can an automobile be taken? No, you must be at the parking lot. Make this the current goal.

3. What is the difference between the current state and the current goal state? Small distance.

a. What reduces small distance? Walking.

b. Can you start walking? Yes.

4. Some "goal tree pruning" can now be done. Apply "walk,changing the current state to "parking lot." Similarly, by reasking

questions 2 and 1, and applying "car" and "airplane," we return to

question 1 again, but with the changed state "Los Angeles airport." The

analysis will now lead us to a taxi, and then to a state which, the rush

hour traffic willing, will be identical to the goal.
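The question-and-answer loop above can be run mechanically. The following Python sketch is a toy rendering under assumed names; the operator table, the distance classes, and the list of places are our illustration, not GPS's actual data structures. Each operator reduces one kind of difference, and when its precondition does not hold, reaching the precondition becomes the current subgoal, exactly as in steps 1 through 3.

    # operator-difference table: difference -> (operator, precondition, result)
    OPERATORS = {
        "large distance":        ("take airplane", "Sea-Tac airport", "Los Angeles airport"),
        "intermediate distance": ("drive car",     "parking lot",     "Sea-Tac airport"),
        "small distance":        ("walk",          "Seattle office",  "parking lot"),
    }

    PLACES = ["Seattle office", "parking lot", "Sea-Tac airport", "Los Angeles airport"]

    def classify(state, goal):
        """Difference detection: how far apart are the two states?"""
        gap = abs(PLACES.index(goal) - PLACES.index(state))
        return ["small distance", "intermediate distance", "large distance"][min(gap, 3) - 1]

    def solve(state, goal, plan):
        if state == goal:
            return state                     # no difference: no problem exists
        operator, precondition, result = OPERATORS[classify(state, goal)]
        if state != precondition:
            state = solve(state, precondition, plan)   # subgoal: reach the precondition
        plan.append(operator)                # operator now applicable: apply it
        return solve(result, goal, plan)

    plan = []
    solve("Seattle office", "Los Angeles airport", plan)
    print(plan)   # ['walk', 'drive car', 'take airplane']; the taxi leg repeats the pattern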


With this background on means-end analysis, we are in a position to

see how the GPS proves theorems, and does some other tasks. The program

itself is an algorithm for selecting and applying operators to reduce differences. The person using the program specifies how states are to be

described, how differences between them are to be detected, and how

operators change states when they are applied. (To aid him in doing this

a special language for describing operators and states has been developed

(25).) In addition, the program must have given to it an operator-difference table. This is a table which states what operators reduce which dif-

ferences. In solving a specific problem, the program applies the difference

detection routines to find out what needs to be done to move the starting

state toward the goal state, then consults the operator-difference table

to find out what operators can be used to do this. When an operator is

selected, it may, itself, require that the state to which it is applied

have certain characteristics. (Recall the "take an airplane" example.)

This can be related to the goal tree. To move from the starting state

to the goal state all differences between the two must be reduced. Thus

we have an "&" node, "remove each of these differences." For any one

difference, several "or" nodes may be generated, each one corresponding

to an operator which, if applied, will remove the difference. The GPS

must order these differences so that the easy (or solvable) ones are

tried first . An added complication is that when an operator is applied

to reduce one difference, it may introduce or affect other differences.

Therefore, the program cannot generate a goal tree, then blindly apply

operators as specified by the goal tree plan. Instead, it must apply an operator, then re-evaluate the difference between the current and goal


state, to decide if it has advanced towards or retreated from a solution.

This destroys the neat formulation of GPS as a simple goal tree generator,

and imposes some technical problems in computer programming with which we

need not concern ourselves. With this background, let us follow through a

formal theorem proving example.

A greatly reduced subset of algebra has the rewriting rules

R.1  A + B = B + A
R.2  A + (B + C) = (A + B) + C
R.3  (A + B) - B = A
R.4  A - A = 0
R.5  A + (B - C) = (A + B) - C

Remember that A, B, and C are free variables. Any well formed expression may be substituted for them. From these rules the following differences and an operator-difference table (Table 2-1) can be defined.

Table 2-1: Operator-Difference Table for GPS example

            Differences
Operator    + or - symbols   No. of variables   Order of variables   Parens.   No. of 0's
R.1                                             x
R.2                                                                  x
R.3         x                x                                       x
R.4         x                x                                                  x
R.5         x                                                        x


We will develop a proof of the theorem

x = x + 0

in the manner of GPS. The starting state will be x, and the goal state x + 0.

Step 1. Differences: the goal state has a + as its main connective, has more variables than the starting state, and contains a zero. Rule 3, applied right to left, affects two of these differences, while Rule 4 affects all three. Rule 3, however, can be applied directly, while Rule 4 cannot. The goal tree looks like this:

x = x + 0
|-- Apply R.3
|-- Apply R.4

Applying R.3, we have a new subproblem,

(x + A) - A = x + 0.

If this can be proven, the main problem will be proven.

Step 2. In the subproblem the differences are that the main connective of the left-hand side is -, and of the right-hand side +. Also, the arrangement of parentheses is different. Rule 5 affects both these variables. Applying it, we have

x + (A - A) = x + 0.

At this point the goal tree is

x = x + 0
|-- Apply R.3 --> (x + A) - A = x + 0
|   |-- Apply R.5 --> x + (A - A) = x + 0
|-- Apply R.4


Step 3. The difference between the current and goal state is now that the expression to the left of the main connective of the current state, (A - A), should be 0. Only one operator, Rule 4, affects the presence of 0. Further, it can be applied, with the result

x + 0 = x + 0,

completing the proof.
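The three steps can be replayed mechanically. This Python sketch is an assumption-laden toy: GPS manipulated structured expressions with a full difference detector, while here the rewrites are simply recorded and checked in order.

    # Replay the proof of x = x + 0: R.3 right to left, then R.5 right
    # to left, then R.4, each recorded as (state before, state after, rule).
    steps = [
        ("x",        "(x+A)-A",  "R.3, right to left: A -> (A+B)-B"),
        ("(x+A)-A",  "x+(A-A)",  "R.5, right to left: (A+B)-C -> A+(B-C)"),
        ("x+(A-A)",  "x+0",      "R.4: A-A -> 0"),
    ]
    state = "x"
    for before, after, rule in steps:
        assert state == before, f"rule does not apply to {state}"
        state = after
        print(f"{rule:40s} => {state}")
    assert state == "x+0"   # goal state reached; the theorem is proven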

Hopefully, the examples have shown how GPS produces proofs of theorems and

how it can be applied, more generally, to a variety of tasks which are

not theorem proving in the classical sense. The efficiency of means-end

analysis is crucial. If the person using GPS defines a "good" set of

differences and operator-difference table, then the program will be an effective problem solver; otherwise it will not. An obvious question

occurs, could one write a program which would accept the definition of

rewriting rules (GPS operators), and from them develop the definition of

differences and the operator-difference table? The answer is "yes,"

such a program has been written (72). However, it does this in a machine

oriented, rather than human oriented, manner so its implications for

psychology are doubtful. The resulting program has been shown to out-

perform most university undergraduates on mathematics theorem proving

problems. As yet, it has not been compared to a combined computer-human

problem solving team, such as GPS provides.

An important point that has been learned from the study of computer

theorem proving is the importance of having an adequate representation

for one's problem. Loosely, a good representation is a way of picturing

a problem so that the solution is clear. Perhaps the best example of this is the use of drawings to suggest the solution of geometry problems.



Strictly speaking, geometry is a process of manipulating symbols in

accordance with certain formal rules, as is any other theorem proving problem.

In plane geometry the rules can be interpreted as statements about rela-

tions between lines and angles in a plane. Therefore, virtually everyone

solves geometry problems by drawing a picture, and using the information in it to suggest the steps in the formal proof.

Turn back for a moment to Figures 2-1 and 2-2. The goal tree indicates four possible congruences, any one of which, if proven, would prove the main problem. A glance at the diagram shows that two of these congruences,

in fact, do not exist, therefore there is no sense proving them. The diagram

can be used to select sensible subproblems, a fact which is used by the

Artificial Geometer (35) and, we are certain, by the high school student.

The drawing is a convenient representation for plane geometry because

it is a tool people can work with, and because there is a precise corre-

spondence between operations in the formal system and operations in the

representation. The first part of this statement tells us something about

people, not geometry. It says that, for some unknown reason, people can

manipulate drawings . It happens that computers also have a considerable

drawing-manipulation capability (although not nearly so great a one as

people have), so here they can use the same representation. Now suppose

that we were dealing with n-dimensional geometry. The nature of the way

in which computers handle drawings ensures that the same techniques

for manipulating a plane figure will generalize to the n-dimensional case.

But people do not visualize in n dimensions. This illustrates a point

which, it would seem, could be used as a take off point for psychological

research. What sort of representation can people use? The answer to this

question should tell us something about their thinking machinery.


It is clearly not the case that a good representation for humans is

always a good representation for a computer program. In fact, one of the

best methods of proving theorems on a computer (76) makes use of a

representation which may require the program to deal with literally

hundreds of thousands of subgoals at a time, trying proofs initiated from one subgoal, then proofs initiated from another. This is clearly not

human. In fact, it is difficult for people to follow the proofs

established by this sort of computer program, let alone generate them.

This is probably because humans work well with systems which do not place

heavy demands on immediate memory, since people can keep track of only a

few things at any one time. Note that the goal tree method of organiza-

tion makes this the case, so the GPS type program is a reasonable candi-

date for the simulation of human behavior. On the other hand, keeping

track of many things at the same time is exactly the sort of thing at

which computers excel.

Although, in some sense, computers have larger working memories than

do humans, this does not imply that computer programs are about to replace

human thought. People have a compensatory ability to recall relevant

facts or even to change their representation of a problem until the

solution is obvious. This latter ability is something for which present

computer programs have no analogue. The ease with which humans manipulate

their choice of representation may, on some problems, make people markedly

superior to computer programs. This was illustrated in a study com-

paring human to machine solution of algebra word problems, such as might

appear in a junior high school text (69). One of the problems was


"A board was sawed into two pieces . One piece was two thirds as

long as the whole board and was exceeded in length by the second piece

by four feet. How long was the board before it was cut?"

The mechanical way to solve this problem is to identify the variables

and the relations between them, by scanning the text for certain key

words such as "as long," "was," and "twice," then set up a system of

linear equations (or inequalities) which can be solved by standard

algebraic manipulation. With some reservations about the first step,

since handling natural languages poses some difficulty for a computer,

a program can be written to solve such word problems in this straight-

forward way (11). But if you do this, what do you get? Let x be the

length of the first board and y the length of the second. The resulting

equations are

x = (2/3)(x + y)
y = x + 4.

This has the solution

x = -8 feet, y = -4 feet,

so the original board must have been -12 feet long. Such a solution is

perfectly acceptable in this machine oriented representation. Fortunately

for the prestige of humanity, several people used a different representa-

tion. Instead of reducing the word statement to a set of equations, they

reduced it to a mental picture of a board being cut. This immediately

accentuated the paradox, and they cried "foul," as they should have.

They had used a representation which drew on knowledge from long-term

memory, not stated in the problem, to detect an anomaly. Computers cannot do this, although they can be programmed to detect inconsistencies in the

problem statement. And finally, in this study there was the discouraging observation that not all people spotted the anomaly.

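Both the machine-oriented solution and the human sanity check can be sketched in a few lines of Python. The setup follows the problem statement above; the final test stands in for the "boards cannot be negative" knowledge that the pure equation-solving representation lacks.

    from fractions import Fraction as F

    # x = (2/3)(x + y) and y = x + 4; substitution gives x = 2y, so y = 2y + 4.
    y = F(-4)
    x = 2 * y
    assert x == F(2, 3) * (x + y) and y == x + 4   # both equations satisfied

    print(f"x = {x} ft, y = {y} ft, whole board = {x + y} ft")
    if x < 0 or y < 0:
        print("foul: a piece of board cannot have negative length")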


The idea of using a representation is very close to the idea of

proof by analogy. This is an old idea. Nowhere, and certainly not in

mathematics, is anything proven by analogy. Analogies are used to suggest

proofs which can then be verified formally. Polya (71), himself a famous

mathematician, cites many examples of how good analogies can be crucial

in mathematical problem solving. Polya advised the embryo mathematician

to "think of a good analogy." When we advise the writer of a theorem

proving program to "Use a clever representation" we are doing very much

the same thing. When good representations are known, of course we use

them. The use of the drawing in geometry is the classic example.

Someone must develop the representation. Could this job be handled by

a program? We cannot say that it cannot, but insofar as I know, no one

has yet shown that it can be done in any practical way. Neither do we

know how people develop their analogies. The crucial role of representa-

tions thus points a finger at a gap in our knowledge of the psychology

of problem solving. If we knew more about how analogies were developed,

we might be able to say more about how to write theorem proving programs.

We will encounter such points again. One of the chief results of

writing a simulation program is to show us how little we know about human

problem solving. A second thing we learn is that humans must have certain

capabilities because the careful analysis required in writing the program

has shown us that the capacity is required to do the task at all.

Game Playing and Decision Making

There have been many attempts to program computers to play board

games, with chess as a favorite. Just why is not clear. Unlike theorem

proving, game playing is not important in itself. Board games do present


complex decision problems in a completely competitive situation. Many real

life problems are at least somewhat like this, although the case of pure

competition is probably rare (73). Be that as it may, one could accept

the argument for studying board games, then ask "Why chess?" The answer

seems to be that chess is a difficult task, which some people do well, and

others do poorly. The proposal to build a chess playing machine sounds

reasonable, but it has proven a difficult task. Only very recently has

a program been produced which plays passable amateur chess (36), and master

level play is far beyond us. This is a marked contrast to the optimistic

prediction that by 1967 the world's chess champion would be a computer

(86). In trying, though, we have learned a good deal about how board games

must be played, and by inference may have learned something about human

play. In this section I will try to summarize our current knowledge about

game playing by machine, without going into detail about particular pro-

grams or the genesis of specific ideas about game playing. Newell, Shaw,

and Simon (63) have written an excellent specialized review of this field.

The key to game playing is looking ahead to evaluate the next move.

A perfect player would consider all legal moves, all legal replies to them,

all counters to the replies, etc. until he had played, in advance, every

possible game. His calculation of possible games could be represented as

a tree, as shown in Figure 3.1, which gives a stylized tree for a hypothetical game. At every move each player would be presented with a set of

alternatives. Accepting one of these would either end the game in a win, loss, or draw, or would present the opponent with a set of alternatives,

from which he could select his next move. Suppose that we have two players,

A and B, and that a win for A is scored 1, a win for B -1, and a draw 0.


Clearly, if either player has among his alternatives the choice of a win for him, he will take it. Also, we assume that each player prefers a draw to a loss. Therefore A will always choose the maximum valued alternative on his move, and B the minimum one.

We can generalize this idea to choices which do not contain only end points. Suppose that the choice for A consists of a loss, a draw, or

presenting a set of choices to B such that, no matter what move B makes,

A's next choice will be between alternatives resulting in a win or a loss

for A. On A's first choice, how should he evaluate the alternative of

presenting a choice to B? The answer is 1, by the following reasoning.

On A's second choice, he will maximize the value of the choices (1, -1). Clearly, this is 1. On B's choice, he will have to choose between a set of alternatives whose ultimate value A will dictate. Therefore, B must choose the minimum of the set (1, 1, ..., 1), i.e., 1. The value to A of presenting this choice to B is the minimum value B can extract from it, which is 1, and this is the maximum of the set (-1, 0, 1). A should therefore present B with the choice at A's first move.

With this example in mind, examine the tree of Figure 3.1. The rule for choosing at each point can be stated as follows:

(1) If the node represents a move by A, its value is the maximum value of the nodes below it.

(2) If the node represents a move by B, its value is the minimum of the values of the nodes below it.

(3) If the node is an endpoint, its value is -1, 0, or 1 depending on whether it represents a win by B, a draw, or a win by A.

This strategy, known as the "minimax rule," is really the only safe

way to play a game. Virtually every game playing program uses it, and


Figure 3.1: Game tree showing values for A's move, B's reply, and A's counter. A should begin with the "middle" move, since otherwise B is sure to win.


there is good evidence that experienced players do something similar. This is not surprising, since the minimax strategy is essentially a

warning not to count on your opponent's stupidity.
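Rules (1) through (3) translate almost line for line into a recursive procedure. Here is a minimal Python sketch over a nested-list game tree; the tree values below are an assumed stand-in for Figure 3.1, not its actual numbers.

    def minimax(node, a_to_move):
        if isinstance(node, int):        # rule (3): endpoint scored -1, 0, or 1
            return node
        values = [minimax(child, not a_to_move) for child in node]
        return max(values) if a_to_move else min(values)   # rules (1) and (2)

    # A's three opening moves; each sublist is the set of replies open to B
    tree = [[-1, 0], [0, 1], [1, -1]]
    print([minimax(move, a_to_move=False) for move in tree])   # [-1, 0, -1]
    # A should take the middle move: its guaranteed value, 0, is the maximum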

There is an important qualification to the statement that game playing

programs use the minimax rule. They do, but not in the way in which it

has been described. The minimax strategy is a prescription for playing a

perfect board game, just as the British Museum algorithm is a prescrip-

tion for perfect theorem proving. Except in the simplest cases, neither

is practical for man or computer. Some simple statistics demonstrate the

problem. There are about thirty legal moves in the average chess position

(21). To explore just five moves ahead, the 30^5 (24,300,000) positions

must be evaluated. If the end is further away, the figures are even

more astronomical. Clearly, the literal minimax procedure is not feasible.

Two techniques are used to apply minimaxing within reasonable bounds. One is restricted look-ahead: not all possible moves are explored. The

other is heuristic evaluation. In analyzing the game tree, one evaluates

a position by determining, for certain, its relationship to end-point

positions, whose values will be assigned. In heuristic evaluation this

relationship is guessed at by noting certain features of the board position.

Thus in chess positions are assigned high value if they exhibit a piece

advantage for A over B, since such positions seem to be heading toward a

victory for A, even though this is not certain.

The simplest restricted look-ahead scheme is to evaluate all possible

legal moves for the next n positions, then select the best one. In chess

this produces very poor play. A program which looked ahead to all pos-

sible positions three moves hence (an average of 27,000 positions per

play) played very badly. By contrast, it appears that expert chess players


consider less than one hundred different positions before making a move.

They analyze some moves in great depth, while others are abandoned very

quickly. How can this decision be made?

Variable depth search can be produced by using the concept of static

position (90). Loosely, a static position is one in which it is not obvious what to do next. The middle of a queen exchange is a classic

example of what a static position is not. Most successful game playing

programs look ahead from one static position to the next, instead of

arbitrarily looking ahead a fixed depth. A checkers playing program

which made sophisticated use of this concept beat a checkers champion (81). The program always looked ahead at least k (usually 3) moves. It would

then evaluate its board positions and apply the minimax rule unless one

of the points reached was not a static position. A non-static position was

defined as one in which either (1) the next move was a jump, (2) the last

move was a jump, or (3) an exchange of men could be offered by one more

move. For non-static positions the program looked ahead one more step

further, then applied a reduced test (criteria (1) and (3) only) to

see if it had reached a static position. If further look ahead was re-

quired, the stringency of the test for static position was relaxed the

further the look ahead was carried. The result was a highly flexible

program, which exhibited considerable variation in the depth of its

searches, depending on the types of positions it encountered. This is

reasonable and quite in line with our ideas about human play.
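The static-position test lends itself to a compact sketch. The Python below is an assumed rendering, with the board analysis reduced to precomputed feature flags and an arbitrary depth cap; the real program derived criteria (1) through (3) from the board itself.

    def is_static(position, extra_depth):
        """May look-ahead stop here? `position` maps feature name -> flag."""
        criteria = [position["next_move_is_jump"],           # criterion (1)
                    position["exchange_can_be_offered"]]     # criterion (3)
        if extra_depth == 0:                                 # full test at minimum depth
            criteria.append(position["last_move_was_jump"])  # criterion (2)
        # assumed cap: past some depth, treat everything as static
        return extra_depth > 4 or not any(criteria)

    pos = {"next_move_is_jump": False, "exchange_can_be_offered": True,
           "last_move_was_jump": False}
    print(is_static(pos, 0))   # False: look ahead one more step
    print(is_static(pos, 5))   # True: stringency fully relaxed by now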

Static evaluation alone is not enough, since it still leaves too many

positions to be evaluated. It is likely that most of the legal moves in

a board game are not even considered by human players . The question is

how does one decide to disregard a possibility?


There is a fairly direct way to cut down on the examination process, based upon the evaluation of positions already examined. The idea (known occasionally as "Alpha-Beta" cutoff) is that if you know that you can reach a new position with some fixed value, you should not consider in detail a move which immediately takes you to a position of lower value. Suppose, for instance, that in chess A finds that of two legal moves, the first will take him to a non-static position in which he places his opponent's king in check, while the second, also a non-static position, exposes his queen to attack. Formally, alternative one leads to an immediate value of, say, an estimated +.7, while alternative two leads to an estimated value of -.5. Alternative one should be evaluated in depth, while alternative two might as well be dropped. Of course, there is no proof that alternative two is not the correct choice; it might lead to a brilliant win four moves hence. The point is that even at computer speeds there is only limited time to evaluate each position, and that analyzing ways to lose one's queen is not usually a rewarding way to play chess.
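In its now-standard recursive form (more formal than the informal description above, and an assumption about details the text leaves open), the cutoff looks like this in Python; the leaf values replay the check versus lost-queen example.

    def alphabeta(node, a_to_move, alpha=float("-inf"), beta=float("inf")):
        if isinstance(node, (int, float)):     # endpoint or static estimate
            return node
        if a_to_move:
            for child in node:
                alpha = max(alpha, alphabeta(child, False, alpha, beta))
                if alpha >= beta:
                    break                      # B would never permit this line
            return alpha
        for child in node:
            beta = min(beta, alphabeta(child, True, alpha, beta))
            if beta <= alpha:
                break                          # A already has something better elsewhere
        return beta

    tree = [[0.7, 0.9], [-0.5, 0.8]]           # check line vs. queen-losing line
    print(alphabeta(tree, a_to_move=True))     # 0.7; the 0.8 after -0.5 is never examined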

The Alpha-Beta selection procedure is a way of saying "Don't waste time analyzing bad things." A more positive approach is plausible move generation. Game playing programs should contain subroutines which generate moves intended to accomplish particular subgoals. For example, in chess there would be a subroutine for generating moves which increase the protection of one's own king, and another subroutine for finding moves promoting piece advantage. A master routine could then apply look-ahead and evaluation procedures (including, if appropriate, Alpha-Beta selection) to the moves generated by the specialized routines. Many legal moves would not be generated by any routine.


This technique of plausible move generation is very powerful, and is

relied on heavily in the better chess playing programs (10,36). Careful

analysis of the comments made by players indicates that it is also a

characteristic of human play (66). On the other hand, the identity of

plausible move generators is evidently a shifting thing, depending upon

the stage of the game. In chess, for instance, subgoals which are

appropriate in the open and middle stages of the game may be different

from those appropriate in end-game play.

So far, we have more or less assumed that the board can be evaluated.

How does a computer decide that a position is either good or bad, without

playing the game out to the end? Two techniques have been tried. The most powerful game players use a linear weighting scheme, in which various attributes of the board are given a score, and the average score is attributed to the position. An example is the score, S, defined by

S = B + R + P + K + C,

where B refers to piece balance, R to the relative number of pieces for

each player, P a term for pawn structure, K a term for king safety, and C a term for center control. It seems unlikely that people compute such

aggregate position evaluations explicitly, although they may approximate

this. More generally, it is known that if people are repeatedly exposed

to situations in which the total value of a stimulus is determined by a

linear combination of the values of its components, they will come to

respond to the different cues roughly in accordance with their relative

validities (70), and expert chess players would have precisely this sort

of experience.
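A linear evaluator of this kind is only a few lines of Python; the feature values and the equal weights in this sketch are illustrative assumptions.

    WEIGHTS = {"B": 1.0, "R": 1.0, "P": 1.0, "K": 1.0, "C": 1.0}

    def score(position):
        """S = B + R + P + K + C: sum the weighted board attributes."""
        return sum(WEIGHTS[term] * position[term] for term in WEIGHTS)

    # balance, relative pieces, pawn structure, king safety, center control
    position = {"B": 0.3, "R": 0.1, "P": -0.2, "K": 0.5, "C": 0.0}
    print(score(position))   # 0.7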

An alternate evaluation technique, which was intended more explicitly

as a simulation of human behavior, is based on the idea that evaluation



is done by ordered comparisons rather than by averaging evaluations (10,63). In this method the board measures are ordered in terms of their relative

value. King safety would be the first measure, since nothing outweighs

having one's king in checkmate. In comparing two moves, one first looks

at their relative value in terms of the most important measure. Only if

they are equal is the comparison continued to the next measure. Com-

parisons are repeated, measure by measure, until an advantage for one of

the two moves is found. This move is retained for comparison against

other moves, while the less favored move is dropped. The ordered com-

parison method can be used if the categories for evaluating positions

are quite broad. A position might be simply rated as good, bad, or12indifferent in terms of king safety, instead of being scored.
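The ordered-comparison policy is, in modern terms, a lexicographic choice rule. A minimal Python sketch, with the measure order and the broad ratings (2 good, 1 indifferent, 0 bad) as assumptions:

    MEASURE_ORDER = ["king_safety", "piece_advantage", "center_control"]

    def better_move(a, b):
        """Compare measure by measure, most important first; a later measure
        is consulted only to break a tie on all earlier ones."""
        for measure in MEASURE_ORDER:
            if a[measure] != b[measure]:
                return a if a[measure] > b[measure] else b
        return a                       # equal on every measure: keep the incumbent

    move_1 = {"king_safety": 2, "piece_advantage": 0, "center_control": 2}
    move_2 = {"king_safety": 2, "piece_advantage": 1, "center_control": 0}
    print(better_move(move_1, move_2) is move_2)   # True: tie broken on pieces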

The linear weighting method and the ordered comparison method really

imply different "psychological" theories about how decisions should be

made. What do humans do? The evidence is not at all clear. We noted

that in some situations people behave roughly as if they were computing

multiple correlation coefficients (70). Certainly they don't do exactly

this. A very powerful argument can be made that the paired comparison

method is to be preferred in a situation in which the cost of making the

computations required for decision making is considerable. People

often operate in such a situation, when there is no time to find an optimal solution, but a satisfactory one will do (82). A study of

decision making outside of game playing (16), showed that the investment

decisions of a bank trust officer could be simulated by a program which

executed a sequence of tests of the "this alternative is satisfactory on

criterion A, now try criterion B" nature. The question of what sort of


decision making policy people apply, or more sophisticatedly, of what pro-

cedures they apply in what situations, is an open one for psychologists.

Hopefully, studies of computerized game playing and decision making will

suggest the effect of using certain types of policies in different situa-

tions. They cannot show us what humans do.

People not only play games, they learn to play them. Can machines?

Learning, in this sense, could mean two things. A program could develop

a store of specific information about games which it had played, then use

this record to guide future play. Also, it could develop a general style

of play based on its experience. Samuel's (81) checkers program exhibited

both types of learning, to advantage. When the program played a particular

position, it would record its analysis of the position. If it encountered

the position again in a later game, it did not need to recompute the

analysis. Instead, it would treat the move chosen as a single move, and

look ahead beyond that. Experienced human players use specific memory of

this form, as witnessed by the standard openings and stereotyped style

of master level chess play (22). The best chess program contains within it

a table of standard opening moves (36).

Learning how to play the game "in general" is a trickier question. In

the checkers program different board evaluation schemes were tried out

during the course of a series of games. Good schemes were kept, while bad

ones were thrown away. This sort of learning appeared particularly ad-

vantageous in the less stereotyped play in the middle and end of the game.

In a still more ambitious project, Newman and Uhr (67) wrote a program which

recorded the sorts of positions which had appeared in winning board games

"in general." An example would be the pattern "pieces in a line," which


is useful in both chess and checkers. Their program was not a powerful player

of any specific game, but it did exhibit learning which could be transferred

across games.

Undoubtedly both rote learning and generalization play their part

in human game playing. The relative contribution of each probably varies

from game to game, and perhaps with level of play within a game. Chess

is a good case in point. Intelligent amateurs can play a psychologically

interesting game (i.e., they solve difficult problems) using generalized

game playing skills. In master play, on the other hand, the ability to

recall that a position is "like" one in the chess literature, and hence

that there is a suggested line of attack, is evidently quite important

(22). Current chess programs make little use of this sort of knowledge. The difficulty is the meaning of "like." It is not hard to program a

mechanized chess player to recognize that it has seen exactly the same

position before, but it is difficult to specify how one recognizes that the

new position is "identical to the old except..." Master chess players

evidently have some coding of chess positions which makes such information

retrieval easy.

Another major difference between present game playing programs and

human play is in how the board is searched. In general, the programs

consider a move, the opponent's replies, counters to his replies, etc., in

that order. Careful observation indicates that humans proceed differently

(21,66). After an initial analysis of a position, the apparently best move

is selected, the opponent's best reply, the best counter, etc., up to a

static position. The master player then backs up and "proves the point,"

checking to make sure he has not overlooked a possibility. The two stages

of rough analysis and proof are done as one in a game playing program.
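For concreteness, the programs' single-pass search is essentially a depth-limited minimax over the tree of continuations. The sketch below is illustrative only, not any particular chess or checkers program's actual search; moves() and value() are assumed stand-ins for a real game's move generator and its static board evaluation.

    # Depth-limited minimax: consider a move, the opponent's replies,
    # counters to the replies, and so on, backing up static evaluations.
    def minimax(position, depth, moves, value, maximizing=True):
        successors = moves(position)
        if depth == 0 or not successors:
            return value(position)        # a "static" position: just evaluate
        scores = (minimax(p, depth - 1, moves, value, not maximizing)
                  for p in successors)
        return max(scores) if maximizing else min(scores)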


In spite of all the public discussion, there is remarkably little

evidence about how well game playing programs perform. What there is

suggests that simple games are handled well, and complex games honorably.

Computer programs to play tic-tac-toe on a 4 x 4 x 4 board give most people a

fight. Samuel's checkers player beat a state champion, and Greenblatt's

chess program won a class D trophy in a state tournament. Inevitably,

computers have played computers. A match between a Stanford University

"team" of programs and a similar team from the Soviet Union resulted in

two wins for the U.S.S.R. and two draws. The details of the programs

involved have not been released. In the process of deciding how games

should be played, a myriad of suggestions for the psychology of human

game playing and decision making have been uncovered. By and large,

however, the suggestions have not been developed or exploited by experi-

mental psychologists.

The Structure of Beliefs

Chess and mathematics require cold, steely-eyed thought. What about "hot cognition," that peculiar bit of human thinking which combines

emotions with intellect? Surely there can be no computer counterpart for

this?

There can. Human thinking about emotionally involving issues can be

simulated by a computer program. The techniques for doing so are surprisingly

close to the techniques used for theorem proving. Two major research

efforts have been mounted, one in social and one in clinical psychology.

In each case the attempt has been not so much to simulate the actions of

specific individuals as it has been to show that slight, but specifically

defined, distortions of rigorous deduction can produce a program which will


display the sort of reasoning we associate with partially emotional argu-

ments.

A social psychologist, R. P. Abelson, wrote a program to simulate the

process by which a person reconciles an assertion with his previous

beliefs (1,2,3,4). In the simplest case, one believes statements which are directly implied by known facts. If it is known that Bill is

taller than Tom, and Tom taller than John, it is certainly acceptable that

Bill is taller than John. In other cases the deductions will be more

involved, and may even run into contradictions. Can "the peace loving peoples

of the world" be at war? People do reconcile such contradictions. Abelson

wrote his program to illustrate some mechanisms which could accomplish the

resolution. The resulting program, in spite of its content, looks sur-

prisingly like a theorem prover.

The program is given an initial set of true sentences. Subsequently

it will accept those sentences which are identical to an accepted belief and

reject those sentences flatly contradicted by one. But what should be

done with indeterminate statements? Consider the sentence "The West

Watchahootchee Republican Women's Club opposes high federal taxes." Most

readers even passably familiar with American politics accept this statement.

But why? In all probability they have never heard of the West Watchahootchee

Club. The sentence is credible by deduction. It is known that (a) The

West Watchahootchee Club is a subset of "The Republican Party," and (b) The

Republican Party opposes high federal taxes. The original sentence can

be regarded as a specific instance of the set of sentences generated by

replacing "The Republican Party" in sentence (b) with one of its subsets .Since the general statement has been accepted as true, all specific


instances of it are believable by implication. Note that I did not say

"true," nor does Abelson's program. In the strict sense of logic, specific

instances of a true generalization are true. Not in hot cognition.

To see this, we use another example, the sentence "Students support

football scholarships." A person would be more willing to accept this if

he knew that "The President of the Interfraternity Council supports

football scholarships" and "The homecoming queen supports football

scholarships." The generalization is supported by induction, as it

summarizes previously accepted specific statements. To make the general-

ization one may have to overlook certain facts, such as "The chess

club opposes football scholarships." How many exceptions can a rule

stand? In programming a computer to simulate human behavior there must

be some explicitly stated way of resolving the conflict. Our inability

to state the required rule has led to research directed at this social

psychological question (2).

Sometimes statements should be accepted in spite of an apparent

incongruity. But when do we decide things are incongruous? The assertion

"Republicans advocate increased government spending" may well be true, but

most of us will want amplification. The social psychology theory of

logical consistency is relevant. Roughly, this theory says that good

things ought to support good things, and oppose bad things, and vice

versa. "God loves dogs" is acceptable to the devout veterinarian and the

animal-hating atheist. In a paper which did not arise from computer simulation, Abelson and Rosenberg (5) called a sentence like this balanced.

By contrast, the previous statement about Republicans would be unbalanced.

To decide whether a sentence is balanced or not, one must have an affective


connotation (i.e., is the thing good or bad) for every substantive term

(God, Republican, dogs, government spending) in a belief. In addition, verbs must be categorized as either "support" or "oppose" verbs. Let all positive affect terms and all support verbs be given a "+" sign, while

all negative affect terms and oppose verbs are given a "-" sign. To detect

whether a sentence is balanced or not, multiply the signs of its terms.

The resulting product will be positive for balanced sentences, negative

for unbalanced ones. Thus, + + + = + tells us that good things support

good things, etc. Once the affective signs are established, it is trivial

to build this sort of evaluation into a computer program. How the signs are

determined is, itself, an interesting and complex psychological question,

but it is different from the question of how affect determines believability.
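Once every term and verb carries a sign, the balance test is a one-line computation. The following is a minimal sketch; the particular sign assignments are invented for illustration, and how real signs would be obtained is the separate question noted above.

    # Balance by sign multiplication: +1 for positively valued terms and
    # "support" verbs, -1 for negatively valued terms and "oppose" verbs.
    def is_balanced(*signs):
        product = 1
        for s in signs:
            product *= s
        return product > 0

    # "God (+) loves (+) dogs (+)": balanced, hence easily accepted.
    print(is_balanced(+1, +1, +1))        # True
    # "Republicans (+) advocate (+) increased government spending (-)":
    # unbalanced, so the reader wants amplification.
    print(is_balanced(+1, +1, -1))        # False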

A belief simulator must have a way to deal with an unbalanced sentence.

If "My wife removed my dessert" is an indisputable fact, a program to

simulate my behavior must be able to find an explanation. Abelson found

a way to program a limited sort of rationalization. Simple assertions

can be divided into a subject, S, and a predicate (verb + noun), P. The

predicate is itself a substantive term, since it may be the subject of

another sentence. ("Removing dessert leads to good health.") The example

suggests the program's action. If an unbalanced sentence was asserted and

found believable, the program tried to rationalize it by considering the

special verbs "Controls" and "Leads to." Any accepted unbalanced sentence

of the form A supports B could be rationalized if the program. could

locate previously accepted sentences of the following forms.

(i) A sentence of the form "C controls A," where "C supports (opposes) B" is credible. ("My mother-in-law controls my wife. My wife removed my dessert.")


(ii) A sentence of the form "B leads to C," where "A supports (opposes) C" is credible. ("Removing my dessert improves my health." "My wife removed my dessert.")

Formally, these two rules are rules of inference, exactly like the

rules of inference of theorem proving. When Abelson's program accepted a

belief, it had, in fact, proven the new assertion by deriving it from

previously accepted ones.
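Treating accepted beliefs as (subject, verb, object) triples, the two rationalization rules become simple searches of the belief store. This is a sketch under that assumed representation, not Abelson's actual program; the triples below are invented.

    # Rules (i) and (ii) as searches over accepted (subject, verb, object)
    # triples. To rationalize the unbalanced "A supports B," rule (i)
    # looks for some C that controls A and already bears on B; rule (ii)
    # looks for some C that B leads to and that A already bears on.
    def rationalize(a, b, beliefs):
        for (s, v, o) in beliefs:
            if v == "controls" and o == a:          # rule (i): C controls A
                if any(s2 == s and o2 == b for (s2, _, o2) in beliefs):
                    return ("rule i", s)
            if s == b and v == "leads to":          # rule (ii): B leads to C
                if any(s2 == a and o2 == o for (s2, _, o2) in beliefs):
                    return ("rule ii", o)
        return None                                 # no rationalization found

    beliefs = [("mother-in-law", "controls", "wife"),
               ("mother-in-law", "supports", "dessert removal")]
    print(rationalize("wife", "dessert removal", beliefs))  # ('rule i', 'mother-in-law')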

Abelson has reported that the most interesting result from his study

was not that so much behavior could be simulated, but that certain

features of human belief structures could not be captured by the program.

The reason why is interesting.

People evidently react to credibility and balance in sentences or pairs of sentences only if the affective components are seen as related. I can

accept "My children enjoy macaroni" without a qualm, while loving the

children and hating the macaroni. Extending this, pairs of sentences are

seen as related or unrelated depending upon the context in which they

occur. Under what circumstances does a politically conservative nature

lover relate "The Forest Service restricts farming" to "The federal

government stifles free enterprise" or to "Natural resources ought to be

protected"? A belief simulation program must have an explicit criterion of relevance.

A second problem which Abelson uncovered must be considered by anyone

who wishes to simulate "real, live thought." Abelson tried to simulate the

belief structure of a prominent American politician. Sometimes he was

able to mimic the man's reasoning, sometimes not, because of the afore-

mentioned relevance problem. A still greater stumbling block was the

problem of getting enough information into the computer to start the


simulation going. In talking with people, one can assume a vast amount of

knowledge about the general state of the world. In this section, for

instance, I have assumed a general knowledge of American politics in

the 1960s. In a computer simulation every single belief must be made

explicit. A computer simulation program could be absolutely correct in

its mechanisms of belief manipulation, yet be unable to mimic behavior because its user had left out some apparently innocuous statement which

formed a step in his subject's reasoning.

Assuming, for the moment, that the problem of providing knowledge to the program is solved, we still must face the problem of defining relevance between statements. Kenneth Colby, a psychoanalyst as well as a computer simulation advocate, has made a start in this direction in a series of studies which are, logically, very close to Abelson's work (17,18,19).

Colby tried to simulate the emission of statements by a subject in a

psychotherapeutic interview. The basic idea behind the simulation was

that belief statements are grouped into complexes, or sets of related

statements. The assumption was made that during the interview every belief which the patient held had to be expressed in some form. The form

of the expressed belief, however, could not be in substantive conflict

with any other belief within the same complex. If a conflict was detected, Colby's program attempted various distortions of the belief it was trying to express (e.g., replacing an intense verb, such as "love" or "hate," with a less intense one, such as "enjoy" or "dislike") in order to produce

an acceptable sentence. Colby's concept of balance was somewhat more

general than Abelson's, as different degrees of conflict were recognized

and only the higher levels of conflict were taken as an indication that a

belief had to be distorted.


Another, rather elegant, feature of Colby's simulation was its

ability to change the topic of the conversation. In altering a sentence

to reduce a conflict, it might be that one would produce a distorted

sentence which engendered more conflict than the original version.

Suppose that the original conflict sentence was "I hate Joe." Since

Joe is a man, a permissible distortion is to replace Joe with the name

of another man. This could produce "I hate father," thus intensifying

the conflict. To avoid this, Colby's program monitored the conflict

produced by the distortions. If the conflict exceeded a pre-set level, the

program ceased processing its current complex and switched to another one.

In effect, it changed the topic of the conversation. This is at least

loosely similar to the observed behavior of some psychiatric patients

when the conversation touches on a particularly sensitive area.
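The distort-then-monitor loop can be sketched as follows. The conflict scores and the verb-weakening table are invented for illustration; Colby's programs measured conflict within a complex in their own, more elaborate ways.

    # Try to express a belief; if it conflicts too strongly, weaken the
    # verb; if even the distortion exceeds the threshold, abandon the
    # complex (change the topic of conversation).
    WEAKER = {"hate": "dislike", "love": "enjoy"}

    def express(belief, conflict, threshold):
        if conflict(belief) <= threshold:
            return belief                          # utter it as it stands
        subject, verb, obj = belief
        softened = (subject, WEAKER.get(verb, verb), obj)
        if conflict(softened) <= threshold:
            return softened                        # utter the distorted form
        return None                                # switch to another complex

    scores = {("I", "hate", "Joe"): 9, ("I", "dislike", "Joe"): 3}
    print(express(("I", "hate", "Joe"),
                  lambda b: scores.get(b, 0), threshold=5))  # ('I', 'dislike', 'Joe')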

What do we learn from Abelson and Colby's work? The concepts on

which their programs are based have all been previously discussed in

social psychology and psychiatry. Program writing forced them to make

a more precise statement of these ideas than is usually found in verbal

theories. Observing the program's results showed that a reasonably large

part of human hot cognition can be explained on the assumption that men

are "locally logical." That is, one seems to act as if he held in his

head a large number of axioms, some of which are in conflict with each

other. In many cases, however, the conflict is simply ignored, because

axioms are seen as applying only within selected contexts. What remains

to be done is to define rules for establishing the context.


Memory

We are so frequently vexed with our memory that we forget how well

it works. One of the best arguments against computer analogies to

human thought is that people are capable of far greater feats of

information retrieval than are computing systems (96). Further, the

sort of retrieval people do is basically different from the sort of

retrieval one finds easy to do with computers. This was illustrated in the discussion of games. Computers can be programmed to recognize

that the current position is identical to one seen before, but it is

much more difficult to program them to recognize that a similar position

has been seen before.

Nevertheless, attempts have been made to simulate limited feats of

memory. I say limited advisedly. Evidence from physiological studies

indicates that memory in animals involves more than one physical sub-

system, and that different subsystems are active at different times

(33,49,55). Simulation studies have all been concerned with memory for

information presented minutes before recall. In addition, human memory is

obviously affected by the meaning of the information to be remembered

and the context in which it is stored and recalled. These variables are

extremely difficult to manipulate, so there is relatively little reliable

experimental evidence on their effects.

In fact, most studies are of the memorization of nonsense syllables,

such as the list of meaningless trigrams DAX-GIR-TIB-XIF-GYB. This

technique, in use since the classic studies of Ebbinghaus, is a conscious

attempt to isolate meaning from the experiment. Since it cannot be

entirely successful, an elaborate literature and technology of nonsense


syllable learning has developed. It has been shown that, given proper

technique, one can obtain highly reliable and orderly data from nonsense syllable learning studies. (Whether these results generalize to normal

adult learning is, at best, a moot point.) Computer simulations of

nonsense syllable learning have been conducted, and an impressive number

of the results reported in the experimental literature have been reproduced. By far the most important model is the EPAM (Elementary Perceiver and

Memorizer) program introduced by Feigenbaum (27,28) and subsequently

developed by him and others (41,84).

The basic assumption of EPAM is that stimulus recognition and

response choice involve a sequential discrimination process, in which the

stimulus is recognized as being "old," some associated response informa-

tion is retrieved, and the response information used to reconstruct the

image of the required response. This process can be represented

graphically by a tree, which Feigenbaum calls a discrimination net.

A very simple net for recognizing the trigrams DAX and GIR is shown in Figure 5-1. Any syllable whose first letter is D will be recognized as DAX, and any syllable whose first letter is G will be recognized as GIR.

FIGURE 5-1
Initial Discrimination Net for the DAX-GIR Example
(See Text)

Now suppose that GYB is entered into the net. Initially this will be


misrecognized as GIR. The program is informed that it has made a mistake,

so the net is corrected. The correction extends the net so that GIR and

GYB can be discriminated. The elaborated tree is shown in Figure 5-2.

FIGURE 5-2
Elaborated Net after GYB Has Been Presented
(See Text)
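The growth of such a net is easy to sketch in code. The following is a drastic simplification of EPAM (one test per letter position, whole letters as test outcomes, trigrams only), meant to show how misrecognition drives elaboration; it is not Feigenbaum's program.

    # A toy discrimination net: each level tests one more letter position.
    class Node:
        def __init__(self, image=None):
            self.image = image          # syllable stored at this leaf, if any
            self.branches = {}          # letter -> child Node

    def recognize(node, syllable, depth=0):
        # Descend as far as the tests discriminate, then report the image
        # found there -- possibly a misrecognition, as with GYB below.
        child = node.branches.get(syllable[depth]) if depth < len(syllable) else None
        return node.image if child is None else recognize(child, syllable, depth + 1)

    def learn(node, syllable, depth=0):
        # On a collision, push the old image one level deeper until the
        # two syllables are discriminated (the "elaboration" of the net).
        child = node.branches.get(syllable[depth])
        if child is None:
            node.branches[syllable[depth]] = Node(syllable)
        elif child.image not in (None, syllable):
            old, child.image = child.image, None
            child.branches[old[depth + 1]] = Node(old)
            learn(child, syllable, depth + 1)
        elif child.image is None:
            learn(child, syllable, depth + 1)

    root = Node()
    learn(root, "DAX")
    learn(root, "GIR")
    print(recognize(root, "GYB"))   # GIR -- the initial misrecognition
    learn(root, "GYB")              # the net is corrected and extended
    print(recognize(root, "GYB"))   # GYB, as in Figure 5-2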

This sort of tree development handles stimulus discrimination.

Response discrimination is handled in a related way. In the early

versions of EPAM (27,28) the endpoints of the tree, each of which

represented a nonsense syllable, would have associated with them infor-

mation about where in the net the response term had been stored at the time

the stimulus term was entered. Thus, in the example of Figure 5-1 the

location "immediately below and to the left of the first node" would

be indicated with DAX as the location of the associated response term.


As Figure 5-2 shows, this information might not be correct after the net

had been elaborated. This made EPAM susceptible to retroactive inter-

ference, the disruption of old learning by new, which, of course, is observed in nonsense syllable learning studies. In a later version of

EPAM (84) the response mechanism was changed considerably. Instead of

having a single net for nonsense syllables, separate nets were developed

for letters, syllables, and syllable pairs. By partial examination of the

characteristics of a letter, EPAM would "guess" the letter's identity.

(To make this seem reasonable, think of the problem if the letters were

pronounced, or written in Gothic script.) In turn, by partial examination

of the assumed letters, a syllable's identity would be guessed. The assumed

stimulus syllable would then be used to trace through the "stimulus-response pairs" net to locate a pair of nonsense syllables containing the stimulus. Having

identified the pair, the program would have information identifying the

response term.

EPAM, then, learns a task when its discrimination net has been elabo-

rated to the point at which no more errors are made. The EPAM program

does provide a good simulation of many experiments in the literature.

This is particularly true if the studies are of the effects of stimulus

discrimination, rather than response construction. In EPAM this is equiva-

lent to using a discrimination tree in which the responses are so familiar

that one can assume that they are perfectly identified from the outset.

(What this means is easy to understand if one considers the appropriate

experiment. Instead of learning pairs such as DAF-GUX, one learns DAF-9,

GUX-4. Since the response terms will be highly overlearned, the theory

need consider only stimulus discrimination (41).)


Now let us turn to the possibility of stages in memory. There appear

to be at least three stages for human memory, a short term, sensory phase

lasting fractions of a second (88), a more central temporary memory buffer,

in which, perhaps, items are rehearsed (9,97), and a long-term store.

Feigenbaum (29), in an unpublished address to the American Psychological

Association, made the point that this view of human memory is quite com-

patible with the general view that man should be analyzed as an information

processing system. The functional organization of modern computing systems

also depends upon a multi-stage organization of memory. The typical large

computing system consists of a central processor, which is the only element

capable of combining units of information to create new information, a set

of input-output devices (e.g., card readers, teletypes), which connect the

computing system to the outside world, and several data storage areas

(memories), each with its own unique capabilities. The input-output

devices, which, as Feigenbaum pointed out, correspond to sensory devices

in man, must be able to feed information into a buffer area, rather than

directly into the central processor. Why? Because the central processor

might be busy at the exact instant that the input device received informa-

tion. The buffer, then, serves to decouple the computing system from the

environment, letting it have some control over the order in which it will

do things. On the other hand, the buffer areas need not be large, since

they must be scanned frequently, so that the central processor will not be

too long "out of touch" with its environment.

Within the computing system there also must be hierarchies of memory.

The central processor will require a small scratchpad memory, which con-

tains the few pieces of information on which it is working at the time. A

larger, but not quite so fast memory is needed to store information which


is associated with the general problem on which the central processor is

working, but is not required for the immediate computation. Finally, there

must be a very large, but possibly slower access memory in which is stored

all information the system "knows," i.e., the necessary programs and data

for executing any problem which the system may receive in the future. As

the system receives information from its environment (in computer termin-

ology, as new jobs enter), the information in each of these memories will

be altered.

The argument, which has been made by Feigenbaum and others (14,15,

29,46), is that this is an appropriate analogy to human memory, since the

characteristics of a computer system are not so much forced on it by the

physical machinery as by the nature of information processing in a changing

environment. Offices are organized in the same way. The executive has

high priority information on his (literal) scratchpad, more information

of lesser priority in his desk and in his files. The secretary serves as

a buffer, and her notes must be scanned periodically to see what the next

job is, or if the current job should be interrupted for a new one of higher

priority.

Now carry the analogy further, to a mathematician working at his

office. He must very briefly remember his intermediate results, while

storing only final products. To get any results at all, he must have

brought into a reasonably rapid access memory a variety of once-learned

techniques, such as the expression for common integrals. Suddenly the

phone rings. If the mathematician is in the midst of a computation, he

may not interrupt this immediately, but this is all right so long as he

gets back to his sensory buffer, to read the signal "phone has rung," in


time to answer it. Suppose that it is his wife, reminding him of a dinner engagement. Techniques for integration must now be returned to his large,

slow access memory, to be replaced by a mental street map and the state motor vehicle code. As he drives home, the man must keep continual track

of the "just noted" location of cars about him, but there is no need to

transfer this information to long-term store. The hierarchical memory

organization is imposed by the task, not by the characteristics of the

system components. The same demands apply to the structure of computing systems, the organization of an office, and the functioning of human memory.

While the analogy is compelling, there have been relatively few

attempts to go from it to a precise model of hierarchical memory. The

general spirit of the analogy is apparent, however, in analyses of the use

of short-term memory in problem solving.

A program based on the idea of hierarchical memory was written to

simulate how a person might keep track of the current state of several

independently changing variables (45). In the experimental situation the

subject received messages about the current state of variables, and was

aperiodically asked to give the state of one of them. A typical message-

question sequence would be

THE DIRECTION IS WEST

THE COLOR IS RED

THE DIRECTION IS EAST

THE SIZE IS SMALL

WHAT IS THE COLOR?

Experimental studies have shown that performance in this task is stable

and is controlled by well defined experimental variables, such as the

number of states a given attribute (color, size, etc.) can have, and


whether or not the different attributes have the same or different states.

(99,100). In the simulation it was assumed that the subject had an immediate

memory consisting of a limited number of "slots," each of which could

store part of the information content of a message. As this short-term memory became overloaded, some of the information content of a message might drop out. When a question was asked, the program first tried to

find a complete and relevant message in short-term memory. If this

failed, the program examined short-term memory to see if some incompletely

stored message could provide a clue for an educated guess. Failing in this as well, it guessed randomly from its long-term memory of possible states. The simulation produced a reasonable fit to experimental observations, though it failed to mimic behavior in a few experimental conditions. Subsequently, and independently, a mathematical analysis of how people might use short-term memory in related keeping-track tasks has demonstrated the efficacy of the general approach, although neither the models nor the experi-

mental conditions are identical in detail (9).
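A minimal sketch of the slot idea follows: whole messages occupy a fixed number of slots, oldest lost first. The actual simulation stored partial message content and used a subtler guessing scheme; the slot count and data below are invented.

    # Short-term memory as a fixed number of slots for keeping track of
    # independently changing variables.
    import random

    class KeepingTrack:
        def __init__(self, n_slots, possible_states):
            self.n_slots = n_slots
            self.slots = []                      # stored (attribute, state) pairs
            self.possible = possible_states      # attribute -> its possible states

        def message(self, attribute, state):
            self.slots.append((attribute, state))
            if len(self.slots) > self.n_slots:
                self.slots.pop(0)                # overload: oldest message lost

        def question(self, attribute):
            for attr, state in reversed(self.slots):
                if attr == attribute:
                    return state                 # a relevant message survives
            return random.choice(self.possible[attribute])  # otherwise, guess

    m = KeepingTrack(3, {"COLOR": ["RED", "GREEN"],
                         "DIRECTION": ["EAST", "WEST"],
                         "SIZE": ["SMALL", "LARGE"]})
    for attr, state in [("DIRECTION", "WEST"), ("COLOR", "RED"),
                        ("DIRECTION", "EAST"), ("SIZE", "SMALL")]:
        m.message(attr, state)
    print(m.question("COLOR"))   # RED: it survived the three-slot buffer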

Studies such as this are studies of pure recall of information.

Normally, memory is used to aid in problem solving, not recall. The inter-

play between memory and problem solving was nicely illustrated in a study

by Simon and Kotovsky (85) of the letter series completion task, a common

intelligence test item. Given a sequence of letters, the subject must

produce the next one in the sequence. A trivial example is AABBC, where the next letter is C. Simon

and Kotovsky wrote a program to solve such tasks. It had a "working memory"

capable of detecting short cycles, or units of the letter sequence. In

the example given, there is a cycle length of two: AA, BB, etc. The pro-

gram also had a back-up store, analogous to long-term memory, which con-

tained rules for detecting relations between two letters. These relations


could be complicated and lengthy; for example, the successors to each letter in the forward and backward alphabets. The program selected a

trial cycle length, then attempted to apply one of its function rules to

generate the form of the first trial unit, given the first letter, and at

least the first member of a succeeding trial unit. For example, in the

sequence

ABCZYXDEFWVU

the rule is

1. Cycle length 3.

2. Odd cycles: the next three letters, in order, from the forward alphabet.

3. Even cycles: the next three letters, in order, from the backward alphabet.

To apply this rule one must hold in working memory the type of the

current cycle (odd or even), the letter number in the current cycle (1,2,

or 3), and the name of the last letter in the cycle just before this one.

The Simon and Kotovsky program, which does provide a reasonable fit to

observed data on the difficulty of letter sequence problems, suggests that

one of the determiners of a problem's difficulty is the amount of working

memory required. This is true even if the problem is not formally a memory

problem (as letter sequence problems are not), in the sense that nowhere

is information presented to the subject, then withheld for later recall.

Even with all the information ostensibly in front of him, a man is limited by the amount he can "hold in his head" at one time.
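A narrow sketch of the mechanism follows, handling only rules of the AABBC type (cycle of two: repeat a letter, then step forward in the alphabet). The Simon and Kotovsky program carried a much richer store of relational rules; everything here is illustrative.

    # Working memory holds the position within the current cycle; the
    # "same" and "successor" relations come from the rule store.
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def next_in_series(series, cycle=2):
        for i in range(0, len(series) - cycle, cycle):
            same = series[i] == series[i + 1]                # doubled letter
            step = (ALPHABET.index(series[i + cycle])
                    == ALPHABET.index(series[i]) + 1)        # next cycle steps up
            if not (same and step):
                return None                 # this trial rule fails
        if len(series) % cycle == 0:
            return ALPHABET[ALPHABET.index(series[-1]) + 1]  # open a new cycle
        return series[-1]                   # complete the current cycle

    print(next_in_series("AABBC"))   # C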


Other programs have used similar mechanisms to imitate behavior in

problem solving situations in which information does have to be held in

memory. Feldman (30,31,32) considered the behavior of a subject who must

predict which of two events will occur. If this task is carried out over

a long period of time the gross statistics make it appear as if the subject

predicts randomly, but alters his probability of predicting a given event to

approximate its observed frequency. If you ask the subjects what they are

doing, they do not say they are behaving randomly at all. Instead they will

report that they have found (non-existent) rules which enable them to make

a deterministic prediction. Since the deterministic rules are bound to

fail if the event occurrence is, in fact, determined randomly, the sub-

jects have to keep changing them, and this is what makes the subject look

like a random number generator. The various models which Feldman con-

structed closely resemble the logic of the Simon-Kotovsky program, in that

they picture a subject trying to detect an orderly sequence in a series of

remembered events. The sort of sequence which can be detected will, of

course, be determined by the length of the series which can be remembered.

Gregg (37) analyzed a switch setting task in a similar manner. Sub-

jects were required to find a sequence of switch settings which would keep

a light on. The identity of the next correct setting was a function of

the current setting. The subjects' behavior could be analyzed using the

same concepts Simon and Kotovsky used for the letter sequence task.

In summary, analogies to computing systems provide a convenient way

to think about memory. In many simple situations, however, one can use

conventional mathematical analysis instead of computer simulation to

examine the implications of the analogy. In studying more complex situa-

tions, such as the use of memory in problem solving, computer simulation


has been of considerable help. The chief point learned from such studies is that man must find a problem solving method which does not place great

strain on short-term memory. Memory is often a bottleneck in human

information processing.

Inductive Inference

We began by discussing deduction. We will close by considering induction, the problem of finding an underlying rule which can account for

many observations.

Suppose you are shown objects, and told that they are divided into one or more classes. (Think of cats, dogs, and rabbits.) You are shown

examples of each class, then asked to state the underlying classification rule. How do you find it? This task has been studied intensively both by psychologists and the builders of artificial intelligence systems.

Two variants of the classification task have been studied, under the

terms pattern recognition and concept learning. In pattern recognition

studies the classifications of interest are more or less what the

psychologist would call immediate, sensory, or perceptual classifications, such as the distinction between pictures of men as opposed to those of women. Concept learning studies are usually concerned with a more abstract, conceptual task. The "real world" analog is medical diagnosis, instead of the classification of visual stimuli. As always when we deal

with distinctions between perception and cognition, the exact difference is

hard to state. In fact, the two tasks can be described in identical, although abstract, terms using the language of symbolic logic. Neverthe-

less, it is reasonable to believe that people do these formally identical


tasks in different ways. To see why, let us consider the construction of

machines that would classify visual patterns or diagnose medical cases.

Beginning with the visual problem, we see immediately that we must

have some formal representation to map the information presented from eye

to brain into some machine representable form. When a visual stimulus is

presented to the eye, a certain amount of light falls upon a photochemical

receptor, initiating a complex sequence of chemical and neural events.

Eventually the information in the visual stimulus will be transferred to

the brain as a pattern of firing in nerve cells. Now imagine a homunculoid

message receiver standing somewhere along the optic tract. He would not

have access to information about the image at the eye; instead he would

know only that at a particular moment in time certain nerve cells were or

were not firing. Now let us suppose that our homunculus is a perfect

processor of information. By definition, he will do at least as well as

the brain. How is his performance limited?

Any information which can be extracted from the nerve cell firing must

be extracted from a sequence of zeroes and ones, since such a sequence is

sufficient to tell us which nerves are firing at a given moment. Sup-

pose that we are told that the sequences observed at certain times were

produced by objects in class A ("Cats"), while sequences observed at other

times were produced by examples of other classes (Dogs, rabbits, etc.).

The classification problem becomes one of finding a rule for classifying

strings of binary numbers (zeroes and ones). This is a manageable, although difficult, computing problem, for which several techniques are known (68,78).


Now let us return to the medical diagnosis problem. A patient can be

described by stating the result of all the tests a physician could make (including the pseudo-result, "Test not made"). Any one test could be

described by a sequence of zeroes and ones which uniquely specified its

outcome, so obviously any patient could be described by a still longer

sequence of zeroes and ones. Some of the sequences would correspond to patients with one type of disease, some to patients suffering from another. We have returned to the problem of classifying sequences of binary numbers. Yet, somehow, this problem is different. While it is reasonable to think of the nervous system as imposing little coding on its input patterns, somehow we think that the medical problem, and problems like it, will be done in a different way. There is no formal difference, but there is a psychological difference... perhaps. Studies of pattern recognition

and concept learning have reflected this assumption.

Psychological studies of "concept learning" practically all fit the following paradigm. Stimuli are defined by their values on clearly identified attributes, e.g., color = red, green, or blue; shape = square, triangle, or circle. The attributes and values are specified to the subject

in advance. In the experiment proper the subject is shown a sample of the

stimuli, and told the class membership of each item in the sample. In

most studies this is a trial-by-trial procedure, in which the subject

guesses the class membership of an object, and then is told the correct

answer. The experiment is continued until the subject demonstrates, either

by performance or verbal statement, that he understands the classification

rule. A great many such experiments have been carried out, and a fairly

clear picture of human conceptual performance on this limited task has


emerged (13, 44). If the classification rule itself is simple (e.g., all

red objects belong to class 1, all others to class 2), it is reasonably

accurate to say that the subject chooses hypotheses at random until he

hits upon the correct one (13). This conclusion must be qualified some-

what, since a computer simulation of this extremely simple task indicates

that when a person is shown that a particular hypothesis is wrong, he

probably will not make that guess again immediately, and in addition

is unlikely to choose an hypothesis which will lead him to repeat the

classification error which he has just made (38, 43).

If the correct answer is more complex, it takes a better organized computer program to simulate the subject's behavior. Suppose that the correct answer is a disjunction, "Class 1 objects are either red or have triangle shapes." This is a much more difficult problem. Computer simulations of problem solving at this level of complexity have been developed (48,50) and shown to imitate the superficial aspects of a human solution. What sort of picture of a problem solver do they give?

Consider a person attempting a complex concept learning task, such

as the one just given. He might note that most, but not all, of the

Class 1 objects were red. A quick check would show that no Class 2 objects

were red. This suggests the first approximation, and the one taken by the

simulation programs, that "All red objects are in Class 1." Attention is

now turned to the remaining objects. It will be found that in this set,

all the remaining Class 1 objects are triangle shaped. Combining the two

rules, the program will obtain the general rule "If it is red, it is

Class 1. Otherwise, if it is triangular, then it is Class 1; otherwise

it is Class 2." This rule can be thought of as a sequential decision pro-

cedure, and represented by the decision tree shown in Figure 6-1.


FIGURE 6-1
A Decision Tree in Concept Learning
(Is color = red? If so, the object is in Class 1. If not, is shape = triangle? If so, the object is in Class 1; otherwise it is in Class 2.)

Very briefly, the concept learning programs do two things. First, they

check for a simpler rule which will correctly classify all objects observed

so far. If such a rule exists, the problem is solved. If not, the

original problem is split into two subproblems, and the subproblems solved

by the same method. This could generate more subproblems. The resulting

problem and subproblem organization is presented graphically in the tree

structure of Figure 6-1.
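A greedy sketch of the splitting idea follows, with objects as attribute-value dictionaries. Real concept learning programs differ in exactly how the subproblems are chosen, a choice which, as noted below, matters surprisingly little; the sample data are invented.

    # Find an attribute value unique to Class 1, peel those objects off,
    # and recur on the remainder; the result is a sequential decision rule.
    def learn_rule(objects):
        # objects: list of (attribute-value dict, class label 1 or 2)
        if all(label == 2 for _, label in objects):
            return []                            # everything left is Class 2
        for attr in objects[0][0]:
            for value in {d[attr] for d, _ in objects}:
                labels = [lab for d, lab in objects if d[attr] == value]
                if labels and all(lab == 1 for lab in labels):
                    rest = [(d, lab) for d, lab in objects if d[attr] != value]
                    return [(attr, value)] + learn_rule(rest)
        return []                                # no single test separates Class 1

    sample = [({"color": "red", "shape": "square"}, 1),
              ({"color": "blue", "shape": "triangle"}, 1),
              ({"color": "blue", "shape": "square"}, 2)]
    print(learn_rule(sample))   # [('color', 'red'), ('shape', 'triangle')]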

If people were doing something like this, you would expect the com-

plexity of the tree diagram corresponding to the correct rule to be a good

predictor of task difficulty. In fact, it is (34). Another point which we

might think would be crucial would be the way in which subproblems are


defined. Somewhat surprisingly, extensive studies of the effects of

different ways of defining subproblems have shown that this is not the case. If you compare the performance of programs which define subproblems in different, though sensible, ways, you find that there is little difference in their behavior. Since the programs do not vary among themselves, it is

hard to point out one of them as being the correct simulation of human

behavior (47,48).

An interesting, unanswered question about human concept learning has

to do with the role of memory. It can be shown that if concept learning

proceeds on a trial-at-a-time basis, memory for specific trials plays

an important part in determining what the subjects' hypotheses will be

(43,75). Similarly, the performance of different concept learning programs

will be affected by the way in which they store information about previous

trials (23,47). As yet, no detailed attempts have been made to develop

this factor in simulations of human memory.

Now let us look at the other side of the coin, visual pattern

recognition. Two distinct lines of research have emerged.

Computers have been used to explore the implications of physiological

theories of learning. Historically much of this work can be traced to

Hebb's (40) proposal for a neurophysiological theory. One of Hebb's central

notions was the idea that if neuron A fired immediately after receiving

input from neuron B, then on subsequent occasions the probability of

neuron A's firing after receiving input from B would be increased. By

elaborating this idea to sets of neurons, Hebb made the plausible verbal

argument that stimuli originally incapable of causing a nerve net to

respond would eventually acquire this capacity, by being paired with stimuli

which had the capability of causing a response from the start. The

parallel to classical conditioning is obvious. But would this work? We know now that it would not. In an early computer simulation study (77) a program to simulate Hebb-type nerve net reorganization was constructed.

Experiments with it showed that the system was incapable of acquiring

differential responses to stimuli unless inhibitory neurons were included in the network. This study illustrates an important but sometimes overlooked use of computer simulations: they can show that apparently plausible

theories will not work.

A more distant descendant of Hebb's theorizing is the Perceptron model of brain behavior proposed by Rosenblatt (79,80). The perceptron is a

design for a machine, built from abstract, nerve-like elements, which is

capable of learning to discriminate stimuli. There are a variety of

perceptrons, each of which has somewhat different powers. The earlier writing on this topic was somewhat confused; more recently perceptrons have been related to a precisely defined class of mathematically describable pattern recognizers (68). We shall describe simple perceptrons briefly, and

make a few remarks concerning their implications for psychology and arti-

ficial intelligence.

The perceptron is, in the abstract, a scheme for learning classifica-

tions of vectors of binary digits... like our "ultimate neural representa-

tion" of a stimulus. The binary digits representing the stimulus are

referred to as S (stimulus) units, as each digit is thought of as indicating

the firing state of a neuron. Each S unit is connected to one or more A

units in a random, fixed arrangement. The A units are thought of as associa-

tion neurons, by analogy they represent those neurons which are involved in

learning. The A units, in turn, are connected in a random, but variable,

manner to a set of R, or response, units. The over-all arrangement is

schematized in Figure 6-2.


FIGURE 6-2
Schema of a Simple Perceptron
(S units connect to A units through random, fixed connections; A units connect to R units through random, variable connections.)

The connections between S and A units, and between A and R units,

each have weights associated with them. These may be either positive or

negative. Biologically, the weights can be interpreted as the strength

of synaptic connections between two neurons. As shown in the diagram,

the S-A connections are chosen at random, but once chosen, are fixed, while

the A-R connections are initially random but may be modified by experience.

Thus the input to an individual unit in the A or R areas will be the

algebraic sum of the connections between it and the active units in the

S or A areas. (This allows for the case of active units which are not

connected to the receiving unit, since they can be thought of as having

a connection with weight zero.) If the input to a unit in the A or R region exceeds some threshold, θ, then it will fire. Otherwise it does not.


This can be summarized in a few equations. Let S = (s_1, s_2, ..., s_i, ..., s_nS) be the vector of stimulus units, A = (a_1, ..., a_j, ..., a_nA) the association units, and R = (r_1, ..., r_k, ..., r_nR) the response units. Each of these will have the value 1 if the unit fires, and the value 0 if it does not. We also need two matrices, or tables, of connection values: C = {c_ij} will be the set of connection weights between the ith stimulus unit, s_i, and the jth association unit, a_j. Similarly, let D = {d_jk} be the set of connection weights between a_j and r_k. At a single stimulus presentation the following sequence of steps takes place:

(a) The stimulus is presented, establishing values for S.

(b) The set of numbers X = {x_j}, j = 1 ... nA, is computed by the rule

    x_j = Σ_i c_ij s_i.

If x_j ≥ θ, then a_j = 1; otherwise a_j = 0.

(c) The set of numbers Y = {y_k}, k = 1 ... nR, is then computed by the equation

    y_k = Σ_j d_jk a_j.

If y_k > 0, then r_k = 1 (this can be thought of as the response r_k having occurred); otherwise r_k = 0.

To make a perceptron a pattern recognition machine we need one more thing, a training rule. Suppose that we have decided, in advance, that for certain arrangements of the stimulus vector (i.e., certain patterns) we want r_k to be 1, and for other arrangements we want it to be 0. In mathematical terms, we want to compute a function on the vector S which will have as its value a specific arrangement of the vector R, depending on the value of S. For any such function (i.e., for any classification of the possible S vectors) there exists a perceptron which will compute it (20). Finding it is another matter, since there is no assurance that a particular perceptron, chosen at random from


all the perceptrons which could be created by the random S-A and A-R connec-

tions, will compute the function. If we are free to rearrange the A-R connections within a specific perceptron, however, we may be able to "train" it to

the desired computation. Extensive experiments have been conducted with

different training rules, or schemes for changing A-R weights as the

perceptron is shown different S vectors and required to classify them. It can be shown that no matter what the initial A-R connections are, it

is possible to adjust them by a simple rule which will always eventually

discover a correct set of weights (i.e., one that gives the correct

classification for all stimuli used), provided that the stimuli are such that if each of them is represented as a point in an nA-dimensional space (a space with as many dimensions as there are association units), a line (hyperplane) can be drawn between all points representing one class of stimuli and the points representing another class. More generally, this statement is true of the class of linear threshold machines, an abstract characterization of pattern recognition devices which includes, but is not limited to, perceptrons (68).
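The simple rule just mentioned is the classic perceptron error-correction procedure on the A-R weights. A minimal sketch follows, with the fixed S-A layer folded into precomputed association-unit activities; the data and parameters are invented for illustration.

    # Error-correction training of the A-R weights: when the response
    # fails to fire on a positive case, raise the weights of the active
    # A units; when it fires on a negative case, lower them.
    def train(samples, n_a, epochs=25):
        d = [0.0] * n_a                          # initial A-R weights
        for _ in range(epochs):
            for a, target in samples:            # a: 0/1 activities of A units
                r = 1 if sum(w * x for w, x in zip(d, a)) > 0 else 0
                if r != target:
                    for j in range(n_a):
                        d[j] += (target - r) * a[j]
        return d

    # A linearly separable toy problem: respond exactly when the first
    # association unit is active.
    data = [([1, 0], 1), ([1, 1], 1), ([0, 1], 0), ([0, 0], 0)]
    d = train(data, n_a=2)
    print([1 if sum(w * x for w, x in zip(d, a)) > 0 else 0 for a, _ in data])
    # [1, 1, 0, 0] -- a correct set of weights was found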

There is an interesting, and sometimes illuminating, analogy describing perceptrons and similar pattern recognizers. You can think of each A unit as a device for computing a fixed test on the stimulus, then recommending

that the stimulus be classified 1 or 0 depending on the outcome of the

test. The response unit computes a weighted sum of these recommendations,

then makes the final decision. Learning is equivalent to the search for

a good rule for assigning weights to the individual recommendations.

Since the S-A connections are fixed, each A unit can be thought of as

a feature detector. Its associated feature is simply the set of combinations

of S units which are sufficient to fire the A unit. For example, suppose


that the S units were arranged in a circle, and a particular A unit had a positive connection to every S unit on the vertical diameter of the circle. The A unit would then be a feature detector for "vertical line in the center," so that if a particular pattern were superimposed on the circle

the A unit would receive input to the extent that the pattern did contain

a solid vertical line in its center. Viewed this way, the classification at the A-R level, where learning takes place, is a classification of an

ensemble of features, rather than of an unprocessed "visual" image. In

the perceptron, as originally presented, the features would be defined by

whatever random S-A connections occurred, but if something were known about

the environment to be classified one could easily construct a special

purpose perceptron which contained useful feature detectors. This argu-

ment increases the attractiveness of the perceptron as a model of biological

pattern recognition, since there is every reason to believe that verte-

brates do have fixed feature detectors in their sensory systems (42,52). Presumably these have been produced by evolutionary selection.

One need not take the position that learning always involves obtain-

ing better judgment schemes, while features remain fixed. It would be

possible to develop an abstract pattern recognizer which learned by adjust-

ing its features, the S-A connections, instead of the A-R connections.

As for the biological significance of such a demonstration, the fact that

there are some fixed feature detectors in the nervous system does not mean

that all feature detection is inflexible. In fact, as has often been

pointed out, much human learning requires that we first learn to detect

classes, then classes of classes, etc. In concept learning, the "value of

an attribute" is itself a classification which must be learned.


This line of reasoning has been followed in an extensive series of pattern recognition experiments by Uhr and his associates (91,94,95). They wrote a program which selects trial features by randomly copying parts of the patterns it is given to classify. The program then uses the presence

or absence of these features as keys in solving the classification problem.

The program keeps track of the utility of each feature it copies; if a particular feature does not appear to help in classification it is dropped and a new one copied from the input patterns. This scheme is easy to

illustrate. Suppose the program was given the problem of discriminating between hand-printed A's and B's. Typical examples of one class of input would be assorted hand-printed A's, and of the other class assorted hand-printed B's. The program would copy out small subparts of the letters as trial "features." Some of these features are almost completely unique to A's, others to B's. By keeping track of some simple statistics, the program can detect this, and thus select features on which it bases its

classifying rule.
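The scheme can be sketched as follows, with patterns as small grids of characters. The patch size, the number of trials, and the keep/drop test are all invented parameters, not those of the Uhr-Vossler program.

    # Copy random patches out of the training patterns as trial features;
    # keep a patch only if its presence sharply separates the two classes.
    import random

    def random_patch(pattern, h=2, w=2):
        r = random.randrange(len(pattern) - h + 1)
        c = random.randrange(len(pattern[0]) - w + 1)
        return tuple(row[c:c + w] for row in pattern[r:r + h])

    def contains(pattern, patch):
        h, w = len(patch), len(patch[0])
        return any(tuple(row[c:c + w] for row in pattern[r:r + h]) == patch
                   for r in range(len(pattern) - h + 1)
                   for c in range(len(pattern[0]) - w + 1))

    def useful_features(class_a, class_b, tries=100):
        kept = []
        for _ in range(tries):
            patch = random_patch(random.choice(class_a + class_b))
            hits_a = sum(contains(p, patch) for p in class_a)
            hits_b = sum(contains(p, patch) for p in class_b)
            if hits_a == len(class_a) and hits_b == 0:    # unique to class A
                kept.append(patch)
            elif hits_b == len(class_b) and hits_a == 0:  # unique to class B
                kept.append(patch)
        return kept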

This is a powerful method of pattern recognition. The Uhr-Vossler program exceeds human performance in classifying two-dimensional patterns if the stimuli are unfamiliar to people. As might be expected, in classifying stimuli for which people have already learned many discriminating features (e.g., cartoon faces) the program takes second place (91,95; Uhr, Vossler, and Uleman, 1962). Perhaps more interesting to the psychologist is the fact that the relative difficulty of pattern recognition

tasks is the same for the program as for human subjects. This, of course,

does not prove that the two are using the same pattern recognition tech-

nique, but it is a suggestive observation.


This section has only skimmed the surface of a very large literature on machine pattern recognition. A great deal of selectivity has been

exercised, partly because of the volume of the literature, and partly

because much of the work on machine pattern recognition seems to me

to have limited significance for psychology. Some very general points

about machine pattern recognition, "biological electronics," and the

general implications of computer control techniques for biology have been

made in the survey books (7,98). More specialized discussions of pattern

recognition in the American (68,78) and Soviet (8) literature are also available. Several of the better original reports have been combined in

a book of readings (93).

Conclusion

Artificial intelligence is now an established, and almost a respectable, part of the computer science curriculum in many major universities. What has it told us about human thought?

Directly, very little. It is impossible to prove that man operates

in a certain way by mimicking his actions with a digital computer program.

If we could make this mimicry very accurate, and extend it over a wide range

of behavior, we would begin to suspect that there was more than an

accidental coincidence. But this has not been done. Indeed, the question "What is a good simulation?" has turned out to be quite a difficult

one to answer. Early optimistic predictions that computer simulation would

become the vehicle for psychological theory were indisputably incorrect.

What computer simulation can do is provide a very careful analysis of

what is required to solve a given problem. Writing and studying the computer program lets us define and exercise an idealized man. We learn that


certain problem solving techniques, such as feature detection or hierarchically organized memory, will influence the behavior of this idealized person, and we learn how these influences will manifest themselves. Many examples of such observations have been given; it would be pointless to repeat them here. The nature of computer programming as a tool in psychological theory construction is the issue. At one time this was thought to be the wind tunnel of psychology. By programming, the theoretician was to be able to test his models in a realistically complex situation. The tests and the models were to be so detailed that there would be a direct relation to behavior outside of the psychologist's laboratory. This hope has not been realized. Human thought is too complex. What the computer has done is provide an arena for the study of pure thought. Instead of the wind tunnel, the proper analogy is to a vacuum chamber.


REFERENCES

1. Abelson, R. P. (1963) Computer simulation of hot cognition, in Tomkins, S. and Messick, D. (eds.) Computer Simulation of Personality. New York: Wiley.

2. Abelson, R. P. (1966) Heuristic processes in the human application of verbal structure in new situations. Proc. XVIII International Congr. Psychol., Sympos. 25, 5-14.

3. Abelson, R. P. (1967) Simulation of social behavior, in Lindzey, G. and Aronson, E. (eds.) Handbook of Social Psychology (in press).

4. Abelson, R. P. and Carroll, J. D. (1965) Computer simulation of individual belief systems. Amer. Behav. Sci. 8, 24-30.

5. Abelson, R. P. and Rosenberg, M. (1958) Symbolic psycho-logic: A model of attitudinal cognition. Behav. Sci. 3, 5-13.

6. Amarel, S. (1966) On machine representations of problems of reasoning about actions: The missionaries and cannibals problem. RCA Laboratories technical report.

7. Arbib, M. (1964) Brains, machines, and mathematics. New York: McGraw-Hill.

8. Arkadev, A. and Braverman, E. (1967) Computers and Pattern Recognition. Washington: Thompson.

9. Atkinson, R. and Shiffrin, R. (1967) Human memory: A proposed system and its control processes, in Spence, K. (ed.) The Psychology of Learning and Motivation. New York: Academic Press.

10. Baylor, G. and Simon, H. (1965) A chess mating combination program. Proc. Spring Joint Comp. Conf. 28, 431-447.


11. Bobrow, D. (1963) A question answering system for high school algebra word problems. Proc. Fall Joint Comp. Conf. 24, 365-387.

12. Bourne, L. (1966) Human conceptual behavior. Boston: Allyn and Bacon.

13. Bower, G. and Trabasso, T. (1964) Concept identification, in Atkinson, R. (ed.) Studies in Mathematical Psychology. Stanford: Stanford U. Press.

14. Broadbent, D. (1958) Perception and Communication. London: Pergamon Press.

15. Broadbent, D. (1963) Flow of information within the organism. J. Verbal Learning and Verbal Behavior.

16. Clarkson, G. (1963) A model of the trust investment process, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

17. Colby, K. (1963) Computer simulation of a neurotic process, in Tomkins, S. and Messick, D. (eds.) Computer Simulation of Personality. New York: Wiley.

18. Colby, K. (1965) Computer simulation of neurotic processes, in Stacy, R. and Waxman, B. (eds.) Computers in Biomedical Research. New York: Academic Press.

19. Colby, K. and Gilbert, J. (1964) Programming a computer model of neurosis. J. Math. Psychol. 1, 405-417.

20. Daly, J., Joseph, R. and Ramsey, D. (1965) Perceptrons as models of neural processes, in Stacy, R. and Waxman, B. (eds.) Computers in Biomedical Research. Vol. I. New York: Academic Press.

21. De Groot, A. (1966) Perception, memory, and thought: Some old ideas and some recent findings, in Kleinmuntz, B. (ed.) Problem Solving: Research, Method and Theory. New York: Wiley.


22. De Groot, A. and Jongman, W. (1966) Heuristics in perceptual processes: An investigation of chess perception. Proc. XVIII International Congr. Psychol., Sympos. 25, 15-24.

23. Diehr, G. and Hunt, E. (1968) A comparison of memory allocation algorithms in a logical pattern recognizer. Dept. of Psychology, U. of Washington Tech. Report.

24. Ernst, G. and Newell, A. (1967a) Some issues of representation in a general problem solver. Proc. Spring Joint Comp. Conf. 31, 19.

25. Ernst, G. and Newell, A. (1967b) Generality and GPS. Carnegie-Mellon University, Dept. of Computer Sciences, technical report.

26. Favret, A. (1965) Introduction to digital computer applications. New York: Reinhold.

27. Feigenbaum, E. (1961) The simulation of verbal learning behavior. Proc. Western Joint Comp. Conf. 19, 121-132.

28. Feigenbaum, E. (1963) The simulation of verbal learning behavior, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

29. Feigenbaum, E. (1967) Information processing and memory. Fifth Berkeley Symposium on Math. Statistics and Probability, Vol. IV, 37-51.

30. Feldman, J. (1961) Simulation of behavior in the binary choice experiment. Proc. Western Joint Computer Conf. 19, 133-144.

31. Feldman, J. (1963) Simulation of behavior in the binary choice experiment, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

32. Feldman, J. and Hanna, J. (1966) The structure of responses to a sequence of binary events. J. Math. Psychol. 2, 371-387.


33. Flexner, L., Flexner, J. and Roberts, R. (1967) Memory in mice analyzed with antibiotics. Science 155, 1377-1382.

34. Gelernter, H. (1963) Realization of a geometry theorem proving machine, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

35. Gelernter, H., Hansen, J. and Loveland, D. (1963) Empirical explorations of a geometry theorem proving machine, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

36. Greenblatt, R., Eastlake, D. and Crocker, S. (1967) The Greenblatt chess program. Proc. Fall Joint Comp. Conf. AFIPS 31, 801-810.

37. Gregg, L. (1967) Internal representation of sequential concepts, in Kleinmuntz, B. (ed.) Concepts and the Structure of Memory. New York: Wiley.

38. Gregg, L. and Simon, H. (1967) Process models and stochastic theories of simple concept formation. J. Math. Psychol. 4, 246-276.

39. Haygood, R. and Bourne, L. (1965) Attribute and rule learning aspects of conceptual behavior. Psychol. Rev. 72, 175-195.

40. Hebb, D. (1949) The organization of behavior. New York: Wiley.

41. Hintzman, D. (1967) Explorations with a discrimination net model of paired associates learning. J. Math. Psychol. (in press).

42. Hubel, D. and Wiesel, T. (1959) Receptive fields of single neurons in the cat's visual cortex. J. Physiol. 148, 574-591.

43. Hunt, E. (1961) Memory effects in concept learning. J. Exp. Psychol. 62, 598-604.

44. Hunt, E. (1962) Concept learning: An information processing problem. New York: Wiley.


45. Hunt, E. B. (1963) Simulation and analytic models of memory. J. Verbal Learning and Verbal Behavior 2, 49-59.

46. Hunt, E. B. (1966) A model of information acquisition and use. Proc. XVIII International Congr. Psychol., Sympos. 18, Supplement.

47. Hunt, E. (1967) Utilization of memory in concept learning systems, in Kleinmuntz, B. (ed.) Concepts and the Structure of Memory. New York: Wiley.

48. Hunt, E., Marin, J. and Stone, P. (1966) Experiments in induction. New York: Academic Press.

49. John, E. R. (1967) Mechanisms of memory. New York: Academic Press.

50. Johnson, E. (1964) An information processing model of one kind of problem solving. Psychol. Monogr. whole no. 581.

51. Ledley, R. (1962) Programming and utilization of digital computers. New York: McGraw-Hill.

52. Lettvin, J., Maturana, H., McCulloch, W. and Pitts, W. (1959) What the frog's eye tells the frog's brain. Proc. I.R.E. 47, 1940-1951.

53. Luce, R. D. and Raiffa, H. (1956) Games and decisions. New York: Wiley.

54. McCarthy, J., et al. (1966) Information. Scientific American, whole issue, Sept. 1966.

55. McGaugh, J. (1966) Time dependent processes in memory storage. Science 153, 1351-1358.

56. Miller, G., Galanter, E. and Pribram, K. (1960) Plans and the structure of behavior. New York: Holt.

57. Milner, P. (1957) The cell assembly: Mark II. Psychol. Rev. 64, 245-252.

58. Murdock, B. (1967) Discussion of papers by Lee W. Gregg and Earl B. Hunt, in Kleinmuntz, B. (ed.) Concepts and the Structure of Memory. New York: Wiley.


59. Neisser, U. (1963) The imitation of man by machine. Science 139, 193-197.

60. Newell, A., Shaw, J. C. and Simon, H. (1957) Empirical explorations with the logic theory machine. Proc. Western Joint Computer Conference, 218-230 (reprinted in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill, 1963).

61. Newell, A., Shaw, J. C. and Simon, H. (1958) Elements of a theory of human problem solving. Psychol. Rev. 65, 151-166.

62. Newell, A., Shaw, J. C. and Simon, H. (1959) Report on a general problem solving program for a computer. Proc. International Conf. on Information Processing. Paris: UNESCO House.

63. Newell, A., Shaw, J. C. and Simon, H. (1963) Chess playing and the problem of complexity, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

64. Newell, A. and Simon, H. (1963) GPS, a program that simulates human thought, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

65. Newell, A. and Simon, H. (1965a) Programs as theories of higher mental processes, in Stacy, R. and Waxman, B. (eds.) Computers in Biomedical Research. Vol. II. New York: Academic Press.

66. Newell, A. and Simon, H. (1965b) An example of human chess play in the light of chess playing programs, in Wiener, N. and Schade, P. (eds.) Progress in Biocybernetics. Amsterdam: Elsevier.

67. Newman, C. and Uhr, L. (1965) BOGART: A discovery and induction program for games. Proc. 20th National Conf. ACM, 176-186.

68. Nilsson, N. (1965) Learning machines. New York: McGraw-Hill.


69. Paige, G. and Simon, H. (1966) Cognitive processes in solving algebra word problems, in Kleinmuntz, B. (ed.) Problem Solving: Research, Method and Theory. New York: Wiley.

70. Peterson, C. and Beach, L. (1967) Man as an intuitive statistician. Psychol. Bull. 68, 29-46.

71. Polya, G. (1957) Induction and analogy in mathematics. Princeton: Princeton U. Press.

72. Quinlan, J. R. and Hunt, E. B. (1968) A formal deductive theorem prover. U. Wash. Computer Science Gp. Technical Report.

73. Rapoport, A. (1960) Fights, games, and debates. Ann Arbor: U. Mich. Press.

74. Reitman, W. (1965) Cognition and thought. New York: Wiley.

75. Restle, F. and Emmerich, D. (1966) Memory in concept attainment: Effect of giving several problems concurrently. J. Exp. Psychol. 71, 794-799.

76. Robinson, J. (1965) A machine oriented logic based on the resolution principle. J. ACM 12, 23-41.

77. Rochester, N., Holland, J., Haibt, L. and Duda, W. (1956) Test of a cell assembly theory of the action of the brain using a large digital computer. I.R.E. Trans. Information Theory PGIT-2, 80-93.

78. Rosen, C. (1967) Pattern classification by adaptive machines. Science 156, 38-44.

79. Rosenblatt, F. (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386-408.

80. Rosenblatt, F. (1962) Principles of Neurodynamics. Washington: Spartan Books.


81. Samuel, A. (1963) Some studies of machine learning using the game of checkers, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

82. Simon, H. (1957) Models of Man. New York: Wiley.

83. Simon, H. (1967) Motivational and emotional controls of cognition. Psychol. Rev. 74, 29-39.

84. Simon, H. and Feigenbaum, E. (1964) An information processing theory of some effects of similarity, familiarization, and meaningfulness in verbal learning. J. Verbal Learning and Verbal Behavior 3, 385-396.

85. Simon, H. and Kotovsky, K. (1963) Human acquisition of concepts for serial patterns. Psychol. Rev. 70, 534-546.

86. Simon, H. and Newell, A. (1958) Heuristic programming: The next advance in operations research. Operations Research 6, 1-10.

87. Slagle, J. (1963) A heuristic program that solves symbolic integration problems in freshman calculus. J. ACM 10, 507-520.

88. Sperling, G. (1960) The information available in brief visual presentations. Psychol. Monogr. 74, whole no. 498.

89. Stefferud, E. (1963) The logic theory machine: A model heuristic program. RAND Corporation Technical Report RM-3731-CC. Santa Monica, California.

90. Turing, A. M. (1958) Chap. 25 in Bowden, B. (ed.) Faster Than Thought. London: Pitman.

91. Uhr, L. (1964) Recognition of letters, pictures, and speech by a discovery and learning program. Proc. Western Electronics Show and Convention, 1-5.


92. Uhr, L. (1965a) Complex dynamic models of living organisms, in Stacy, R. and Waxman, B. (eds.) Computers in Biomedical Research. Vol. II. New York: Academic Press.

93. Uhr, L. (1965b) Pattern Recognition. New York: Wiley.

94. Uhr, L. and Vossler, C. (1963) A pattern recognition program that generates, evaluates, and adjusts its own operators, in Feigenbaum, E. and Feldman, J. (eds.) Computers and Thought. New York: McGraw-Hill.

95. Uhr, L., Vossler, C. and Uleman, J. (1962) Pattern recognition over distortion by human subjects and by a computer simulation model for visual pattern recognition. J. Exp. Psychol. 63, 227-234.

96. Von Neumann, J. (1958) The computer and the brain. New Haven: Yale U. Press.

97. Waugh, N. and Norman, D. (1965) Primary memory. Psychol. Rev. 72, 89-104.

98. Woolridge, D. (1963) The machinery of the brain. New York: McGraw-Hill.

99. Yntema, D. and Meuser, G. (1962) Keeping track of variables that have few or many states. J. Exp. Psychol. 63, 391-395.

100. Yntema, D. and Meuser, G. (1960) Remembering the present state of a number of variables. J. Exp. Psychol. 60, 18-22.


FOOTNOTES

1. The preparation of this paper has been supported in part by the National Science Foundation, Grant No. NSF 87-1438R, to the University of Washington. I would like to thank Dr. H. H. Wells for his helpful comments on a preliminary draft.

2. We ignore the possibility of physiological intervention, which is

seldom possible in human psychology.

3. Victor Frankenstein built the monster. The monster was not named

Frankenstein.

4. For the purist, my remarks are strictly applicable to general

purpose digital computers.

5. This touches on another point of Neisser's. In computers information processing is done by a sequence of very rapid, reliable steps. In biological systems it may be that many things are done in parallel, using redundant but error prone "computing devices." This seems to me to be a false issue, since a parallel process can be simulated by a serial one. Simon (83) has an interesting rebuttal to Neisser's comments concerning parallel vs. serial computing in simulation.

6. The original work was done on the RAND Corporation's JOHNNIAC computer, a very early machine. Present day machines operate at perhaps one hundred times the speed of the JOHNNIAC.

7. Three missionaries and three cannibals must cross a river. There is a boat, which carries just two people. For culinary reasons, the number of missionaries in the boat or on either side of the river must always be equal to or greater than the number of cannibals. In what order do the travelers cross the river? In the more general form, there are M missionaries and M cannibals, and the boat carries k < M people. The problem has several interesting generalizations (6).
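The problem yields to a mechanical state-space search. The Python sketch below is an illustrative formulation, not Amarel's (6) representation; the state encoding and all names are assumptions of the example.

    # Breadth-first search for the three-missionary, three-cannibal
    # problem.  A state is (missionaries, cannibals, boat) counted on
    # the starting bank; boat = 1 means the boat is on that bank.
    from collections import deque

    M = 3      # missionaries (and cannibals) starting on the near bank
    BOAT = 2   # boat capacity; with capacity 2 the boat load itself
               # can never leave missionaries outnumbered

    def safe(m, c):
        # A bank is safe if it holds no missionaries, or at least as
        # many missionaries as cannibals; check both banks.
        return (m == 0 or m >= c) and (M - m == 0 or M - m >= M - c)

    def solve():
        start, goal = (M, M, 1), (0, 0, 0)
        frontier, seen = deque([(start, [start])]), {start}
        while frontier:
            (m, c, b), path = frontier.popleft()
            if (m, c, b) == goal:
                return path
            d = -1 if b == 1 else 1    # crossing direction
            for dm in range(BOAT + 1):
                for dc in range(BOAT + 1 - dm):
                    if dm + dc == 0:
                        continue       # the boat never crosses empty
                    nm, nc = m + d * dm, c + d * dc
                    state = (nm, nc, 1 - b)
                    if (0 <= nm <= M and 0 <= nc <= M and safe(nm, nc)
                            and state not in seen):
                        seen.add(state)
                        frontier.append((state, path + [state]))

    for state in solve():
        print(state)   # the classical eleven-crossing schedule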

Page 76: D*pt. con tw* -Z^^ Rnv Seriespv828kw6551/pv828... · 2015-10-21 · 2 "transparent" box, with its own input and output dials. You willknow how thisboxworks—after all, youbuilt it

73

i

8. The computer can hold an n-dimensional picture as an n-dimensional array of points, and operate numerically on this array.
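For instance (the particular pattern and the crossbar test are assumptions of the example):

    # A two-dimensional picture held as an array of points (0 = white,
    # 1 = black), operated on numerically.
    picture = [[0, 1, 0],
               [1, 0, 1],
               [1, 1, 1],
               [1, 0, 1]]   # a crude hand printed "A"

    # One numerical operation on the array: is the third row a solid
    # crossbar?
    print(all(point == 1 for point in picture[2]))   # prints True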

9. The program is based on the principle that any formal axiomatic system can be represented as a set of statements in the first order predicate calculus, i.e., as a set of statements joined together by the connectives "And," "Or," and "Not." One begins by asserting jointly all premises and the negation of the desired conclusion. The single inference rule is that, if A, B, and C are statements, then the compound statements (A or B) and (C or Not(B)) jointly imply the compound statement (A or C). In stating a problem, the user asserts the negation of the hypothesis, and shows that in conjunction with the premises this will lead to an assertion of the form (A) and (Not(A)), which is inconsistent.
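A toy propositional version of this refutation procedure can be written in a few lines. The sketch below illustrates the inference rule only, not Robinson's (76) program; the clause representation and names are assumptions of the example.

    # A toy propositional resolution refutation.  Clauses are frozensets
    # of literals; "~p" is the negation of "p".
    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    def resolve(c1, c2):
        # From (A or B) and (C or Not(B)), yield the resolvent (A or C).
        for lit in c1:
            if negate(lit) in c2:
                yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

    def inconsistent(clauses):
        # Saturate under resolution; deriving the empty clause amounts
        # to asserting (A) and (Not(A)), so the set is inconsistent.
        clauses = set(clauses)
        while True:
            new = set()
            for a in clauses:
                for b in clauses:
                    if a != b:
                        for r in resolve(a, b):
                            if not r:
                                return True
                            new.add(r)
            if new <= clauses:
                return False   # saturated with no contradiction
            clauses |= new

    # Premises: p, and (Not(p) or q), i.e. "p implies q".
    # Negated conclusion: Not(q).  The prover finds the contradiction.
    print(inconsistent([frozenset({"p"}),
                        frozenset({"~p", "q"}),
                        frozenset({"~q"})]))   # prints True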

10. Fraudulent chess "machines," in which people were hidden, appeared in the 1800's, if not earlier.

11. Strictly, the strategy is rational for a zero sum game, where one player's wins are another's losses. This is all we shall consider, in general. See (53) for a discussion of games.

12. Given any linear rating scheme, it is possible to duplicate it with an ordered comparison scheme if the individual attributes are scored sufficiently accurately. Similarly, with suitably chosen coefficients a linear weighting scheme can imitate ordered comparisons.
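The second claim is easy to verify mechanically. In the sketch below the attribute range and the base B are assumptions of the example: with integer scores below B and weights B^i, any difference on a more important attribute outweighs all less important attributes combined, so the weighted sum reproduces the ordered comparison exactly.

    # With suitably chosen coefficients a linear weighting scheme
    # imitates an ordered (lexicographic) comparison.
    from itertools import product

    B = 5   # strictly larger than any attribute score (scores are 0..4)

    def linear(scores):
        # Most important attribute first; weights fall by a factor of B.
        n = len(scores)
        return sum(s * B ** (n - 1 - i) for i, s in enumerate(scores))

    # Exhaustive check on three attributes: the weighted sum orders the
    # positions exactly as tuple (i.e., ordered) comparison does.
    for a, b in product(product(range(B), repeat=3), repeat=2):
        assert (linear(a) > linear(b)) == (a > b)
    print("linear weighting reproduced the ordered comparison")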

13. A descriptive term coined by Prof. R. P. Abelson of Yale University.

14. These seem easy because we know them so well. Either series requires the memorization of twenty-five arbitrary connections.



15. In case anyone doubts this, let us try another example. Can I present a sentence such that you cannot understand it, even though it is in front of you, because your immediate memory is overloaded? Not only can this be done, I can build up to it. The following sentences are all grammatical, and make good semantic sense, yet they rapidly become incomprehensible.

The rat ate the malt.

The rat the cat ate ate the malt.

The rat the cat the dog chased ate ate the malt.

The rat the cat the dog the man owned chased ate ate the malt.

The rat the cat the dog the man the woman knew owned chased ate ate the malt.

Etc.

By considering how the Simon-Kotovsky program works, we get an idea of why the incomprehensibility occurs.

16. This oversimplifies the picture. Information about the intensity of external stimulation could be obtained by considering the frequency and nature of changes in the firing pattern over time, to cite just one complication. By and large, computer simulations have ignored the more sophisticated uses of changes in firing patterns, which may very well be relevant information to the brain.

17. About the same time a similar conclusion was published, independently of the computer research (56).


18. This is biologically unrealistic, since it implies that a single neuron facilitates the firing of some of its efferent connectors, and inhibits the firing of others. Insofar as is known, individual neurons in mammals are either facilitatory or inhibitory. The units of a perceptron could be reinterpreted as centers of neural activity, removing the anomaly.

19. Of course, the resulting device is no longer a perceptron. This, however, is an argument over words. The idea is what is important.