
PART II

ARTIFICIAL INTELLIGENCE AS REPRESENTATION AND SEARCH

A PROPOSAL FOR THE DARTMOUTH SUMMER RESEARCH PROJECT ON ARTIFICIAL INTELLIGENCE (url IIa)

We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

J. MCCARTHY, Dartmouth College
M. L. MINSKY, Harvard University
N. ROCHESTER, I.B.M. Corporation
C. E. SHANNON, Bell Telephone Laboratories

August 31, 1955

Introduction to Representation and Search

From an engineering perspective, the description of artificial intelligence presented in Section 1.3 may be summarized as the study of representation and search through which intelligent activity can be enacted on a mechanical device. This perspective has dominated the origins and growth of AI.

The first modern workshop/conference for AI practitioners was held at Dartmouth College in the summer of 1956. The proposal for this workshop is presented as the introductory quotation for Part II. This workshop, where the name artificial intelligence itself was chosen, brought together many of the then-current researchers focused on the integration of computation and intelligence. A few computer programs reflecting these early ideas had also been written by that time. The main topics for discussion at this conference, abridged here from the original workshop proposal (url IIa), were:

1. Automatic Computers

If a machine can do a job, then an automatic calculator can be programmed to simulate the machine.

2. How Can a Computer be Programmed to Use a Language

It may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture.

3. Neuron Nets

How can a set of (hypothetical) neurons be arranged so as to form concepts?


4. Theory of the Size of a Calculation

If we are given a well-defined problem (one for which it is possible to test mechanically whether or not a proposed answer is a valid answer) one way of solving it is to try all possible answers in order. This method is inefficient, and to exclude it one must have some criterion for efficiency of calculation.

5. Self-improvement (Machine Learning)

Probably a truly intelligent machine will carry out activities which may best be described as self-improvement.

6. Abstractions

A number of types of “abstraction” can be distinctly defined and several others less distinctly. A direct attempt to classify these and to describe machine methods of forming abstractions from sensory and other data would seem worthwhile.

7. Randomness and Creativity

A fairly attractive and yet clearly incomplete conjecture is that the difference between creative thinking and unimaginative competent thinking lies in the injection of some randomness.

It is interesting to note that the topics proposed for this first conference on artificial intelligence capture many of the issues that make up the focus of modern computer science, such as complexity theory, methodologies for abstraction, language design, and machine learning. In fact, many of the defining characteristics of computer science as we know it today have their roots in AI. AI has also had its own historical and political struggles: several of the early research topics, such as “neuron nets” and “randomness and creativity,” were put into background mode for decades.

A powerful new computational tool, the Lisp language, emerged at about this time, built under the direction of John McCarthy, one of the original proposers of the Dartmouth Workshop. Lisp addressed several of the topics of the Workshop, supporting the ability to create relationships that could themselves be manipulated by other structures of the language. Lisp gave artificial intelligence both a highly expressive language, rich in abstraction, and a medium for interpretation of these expressions.

The availability of the Lisp programming language shaped much of the early development of AI, in particular the use of the predicate calculus as a representational medium and of search to explore the efficacy of different logical alternatives, what we now call graph search. Prolog, created in the early 1970s, would offer AI a similarly powerful computational tool.

The five chapters of Part II introduce the fundamental representation and search techniques supporting work in artificial intelligence: the predicate calculus, graph search, heuristic and stochastic methods, and architectures (control systems) for intelligent problem solving. These technologies reflect the dominant techniques explored by the AI community during its first two decades.

Representational Systems

The function of any representation scheme is to capture (often called abstracting out) the critical features of a problem domain and make that information accessible to a problem-solving procedure. Abstraction is an essential tool for managing complexity as well as an important factor in ensuring that the resulting programs are computationally efficient. Expressiveness (the result of the features abstracted) and efficiency (the computational complexity of the algorithms used on the abstracted features) are major dimensions for evaluating knowledge representation languages. Sometimes, expressiveness must be sacrificed to improve an algorithm’s efficiency. This must be done without limiting the representation’s ability to capture essential problem-solving knowledge. Optimizing the trade-off between efficiency and expressiveness is a major task for designers of intelligent programs.

Knowledge representation languages can also be tools for helping humans solve problems. As such, a representation should provide a natural framework for expressing problem-solving knowledge; it should make that knowledge available to the computer and assist the programmer in its organization.

The computer representation of floating-point numbers illustrates these trade-offs (see Figure 1). To be precise, most real numbers (π, for example) require an infinite string of digits to be fully described; this cannot be accomplished on a finite device such as a computer. One answer to this dilemma is to represent the number in two pieces: its significant digits and the location within those digits of the decimal point. Although it is not possible to store such a number exactly in a computer, it is possible to create a representation that functions adequately in most practical applications.



Figure 1: Different representations of the real number π. (The real number: π. The decimal equivalent: 3.1415927 . . . The floating point representation: mantissa 31416, exponent 1. The representation in computer memory: 11100010.)

Floating-point representation thus sacrifices full expressive power to make the representation efficient, in this case to make it possible. This representation also supports algorithms for multiple-precision arithmetic, giving effectively infinite precision by limiting round-off error to any pre-specified tolerance. It also guarantees well-behaved round-off errors. Like all representations, it is only an abstraction, a symbol pattern that designates a desired entity and not the entity itself.
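The two-piece idea can be seen directly in a modern language. The Python sketch below is illustrative only, since the text names no language. It assumes IEEE 754 double precision, which Python floats use on common platforms, so the stored mantissa and exponent are binary rather than the decimal digits shown in Figure 1. The standard `decimal` module stands in as one example of multiple-precision arithmetic whose precision, and therefore round-off tolerance, is chosen in advance.

```python
import math
from decimal import Decimal, localcontext
from fractions import Fraction

# Split pi into a binary mantissa m and exponent e: math.pi == m * 2**e, 0.5 <= m < 1.
m, e = math.frexp(math.pi)
print(m, e)
print(math.ldexp(m, e) == math.pi)  # reassembling the two pieces loses nothing

# What is actually stored is an exact binary fraction that only approximates pi.
print(Fraction(math.pi))

# Even a short decimal such as 0.1 has no finite binary expansion.
print(Fraction(0.1) == Fraction(1, 10))  # False

# Multiple-precision arithmetic: fix the number of significant digits
# (here 30) before computing, which bounds the round-off error.
with localcontext() as ctx:
    ctx.prec = 30
    third = Decimal(1) / Decimal(3)
print(third)  # 0.333333333333333333333333333333
```

Raising `ctx.prec` tightens the tolerance at the cost of more computation, which is the same expressiveness-for-efficiency trade the paragraph describes.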

The array is another representation common in computer science. For many problems, it is more natural and efficient than the memory architecture implemented in computer hardware. This gain in naturalness and efficiency involves compromises in expressiveness, as illustrated by the following example from image processing. Figure 2 is a digitized image of human chromosomes in a stage called metaphase. The image is processed to determine the number and structure of the chromosomes, looking for breaks, missing pieces, and other abnormalities.

The visual scene is made up of a number of picture points. Each picture point, or pixel, has both a location and a number value representing its intensity or gray level. It is natural, then, to collect the entire scene into a two-dimensional array where the row and column address gives the location of a pixel (X and Y coordinates) and the content of the array element is the gray level at that point. Algorithms are designed to perform operations like looking for isolated points to remove noise from the image, finding threshold levels for discerning objects and edges, summing contiguous elements to determine size or density, and in various other ways transforming the picture point data. Implementing these algorithms is straightforward, given the array representation and the FORTRAN language, for example. This task would be quite cumbersome using other representations such as the predicate calculus, records, or assembly code, because these do not have a natural fit with the material being represented.

When we represent the picture as an array of pixel points, we sacrifice fineness of resolution (compare a photo in a newspaper to the original print of the same picture). In addition, pixel arrays cannot express the deeper semantic organization of the image. For example, a pixel array cannot represent the organization of chromosomes in a single cell nucleus, their genetic function, or the role of metaphase in cell division. This knowledge is more easily captured using a representation such as predicate calculus (Chapter 2) or semantic networks (Chapter 7). In summary, a representational scheme should be adequate to express all of the necessary information, support efficient execution of the resulting code, and provide a natural scheme for expressing the required knowledge.
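The array operations named above are short loops over row and column indices. The sketch below is in Python rather than FORTRAN. Its 6 × 6 image and the threshold of 128 are hypothetical, and the rule "isolated means no bright pixel among the 8 neighbors" is one simple choice among many. It shows thresholding, removal of isolated noise points, and summing contiguous pixels to measure the size of each object.

```python
from typing import List

Grid = List[List[int]]

# A hypothetical gray-level image (0 = dark, 255 = bright); the row and
# column address of each element gives the pixel's location.
IMG: Grid = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10, 10],
    [10, 205, 220, 10, 10, 250],   # the 250 is an isolated noise point
    [10, 10, 10, 10, 10, 10],
    [10, 10, 190, 195, 10, 10],
    [10, 10, 10, 10, 10, 10],
]


def threshold(img: Grid, level: int) -> Grid:
    """1 where the gray level is at or above `level` (object), else 0."""
    return [[1 if g >= level else 0 for g in row] for row in img]


def remove_isolated(mask: Grid) -> Grid:
    """Clear object pixels that have no object pixel among their 8 neighbors."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(h):
        for c in range(w):
            neighbors = (mask[rr][cc]
                         for rr in range(max(0, r - 1), min(h, r + 2))
                         for cc in range(max(0, c - 1), min(w, c + 2))
                         if (rr, cc) != (r, c))
            if mask[r][c] and not any(neighbors):
                out[r][c] = 0
    return out


def region_sizes(mask: Grid) -> List[int]:
    """Sizes of 4-connected object regions, found by summing contiguous pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                stack, size = [(r, c)], 0
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sorted(sizes)


mask = threshold(IMG, 128)
print(region_sizes(mask))                   # [1, 2, 4]
print(region_sizes(remove_isolated(mask)))  # [2, 4]
```

None of these functions knows anything about chromosomes. The array captures only locations and gray levels, which is exactly the limit on expressiveness that the paragraph points out.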



Figure 2: Digitized image of chromosomes in metaphase.

In general, the problems AI attempts to solve do not lend themselves to the representations offered by more traditional formalisms such as arrays. Artificial intelligence is concerned with qualitative rather than quantitative problem solving, with reasoning rather than numeric calculation, and with organizing large and varied amounts of knowledge rather than implementing a single, well-defined algorithm.

For example, consider Figure 3, the arrangement of blocks on a table. Suppose we wish to capture the properties and relations required to control a robot arm. We must determine which blocks are stacked on other blocks and which blocks have clear tops so that they can be picked up. The predicate calculus offers a medium to capture this descriptive information. The first word of each expression (on, ontable, etc.) is a predicate denoting some property or relationship among its arguments (appearing in the parentheses). The arguments are symbols denoting objects (blocks) in the domain. The collection of logical clauses describes the important properties and relationships of this blocks world:

clear(c)

clear(a)

ontable(a)

ontable(b)


on(c, b)

cube(b)

cube(a)

pyramid(c)



Figure 3: A blocks world (blocks labeled a, b, and c).

Predicate calculus provides artificial intelligence programmers with a well-defined language for describing and reasoning about qualitative aspects of a system. Suppose, in the blocks world example, we want to define a test to determine whether a block is clear, that is, has nothing stacked on top of it. This is important if the robot hand is to pick it up or stack another block on top of it. We can define a general rule:

∀X (¬∃Y on(Y, X) → clear(X))

This is read “for all X, X is clear if there does not exist a Y such that Y is on X.” This general rule can be applied to a variety of situations by substituting different block names, a, b, c, etc., for X and Y. By supporting general inference rules, predicate calculus allows economy of representation, as well as the possibility of designing systems that are flexible and general enough to respond intelligently to a range of situations.
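The rule can be applied mechanically to the blocks-world clauses. The Python sketch below stores each clause as a tuple and evaluates the rule for every block. It rests on one assumption that the logic alone does not supply: a closed world, in which any `on(Y, X)` fact not listed is taken to be false. The block set is taken from the `cube` and `pyramid` facts, and the function names are illustrative.

```python
from typing import Set, Tuple

Fact = Tuple[str, ...]

# The blocks-world clauses, each written as (predicate, argument, ...).
FACTS: Set[Fact] = {
    ("clear", "c"), ("clear", "a"),
    ("ontable", "a"), ("ontable", "b"),
    ("on", "c", "b"),
    ("cube", "b"), ("cube", "a"),
    ("pyramid", "c"),
}


def blocks(facts: Set[Fact]) -> Set[str]:
    """Every object named by a cube/1 or pyramid/1 fact."""
    return {f[1] for f in facts if f[0] in ("cube", "pyramid")}


def derive_clear(facts: Set[Fact]) -> Set[str]:
    """Apply  forall X (not exists Y on(Y, X) -> clear(X))  to every block.
    Closed-world assumption: an on(Y, X) fact that is not listed is false."""
    covered = {f[2] for f in facts if f[0] == "on"}  # the X of each on(Y, X)
    return {x for x in blocks(facts) if x not in covered}


print(sorted(derive_clear(FACTS)))  # ['a', 'c'], agreeing with clear(a) and clear(c)

# The same rule handles a new situation without new clauses: move c onto a.
moved = (FACTS - {("on", "c", "b")}) | {("on", "c", "a")}
print(sorted(derive_clear(moved)))  # ['b', 'c']
```

The stored `clear` facts become redundant once the rule is available. This is the economy of representation the paragraph mentions.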

The predicate calculus may also be used to represent the properties of individuals and groups. It is often not sufficient, for example, to describe a car by simply listing its component parts; we may want to describe the ways in which those parts are combined and the interactions between them. This view of structure is essential to a range of situations including taxonomic information, such as the classification of plants by genus and species, or a description of complex objects such as a diesel engine or a human body in terms of their component parts. For example, a simple description of a bluebird might be “a bluebird is a small blue-colored bird and a bird is a feathered flying vertebrate”, which may be represented as the set of logical predicates:

hassize(bluebird,small)

hascovering(bird,feathers)

hascolor(bluebird,blue)

hasproperty(bird,flies)

isa(bluebird,bird)

isa(bird,vertebrate)

This predicate description can be represented graphically by using the arcs, or links, in a graph instead of predicates to indicate relationships (Figure 4). Such a semantic network is a technique for representing semantic meaning. Because relationships are explicitly denoted in the graph, an algorithm for reasoning about a problem domain could make relevant associations by following the links. In the bluebird illustration, for example, the program need only follow one link to see that a bluebird flies and two links to determine that a bluebird is a vertebrate.
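Following links can be sketched as a walk up the `isa` arcs. In the Python below, the six bluebird predicates become labeled arcs. A breadth-first climb reports the chain of nodes it passes through, and the link counts of one and two quoted above are the `isa` links in those chains. The dictionary layout and the function names are illustrative choices, not a standard semantic-network API.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# The bluebird predicates as labeled arcs: (node, relation) -> target nodes.
TRIPLES = [
    ("bluebird", "hassize", "small"),
    ("bird", "hascovering", "feathers"),
    ("bluebird", "hascolor", "blue"),
    ("bird", "hasproperty", "flies"),
    ("bluebird", "isa", "bird"),
    ("bird", "isa", "vertebrate"),
]
NET: Dict[Tuple[str, str], List[str]] = {}
for node, rel, target in TRIPLES:
    NET.setdefault((node, rel), []).append(target)


def links(node: str, rel: str) -> List[str]:
    """Targets of all `rel` arcs leaving `node` (empty if none)."""
    return NET.get((node, rel), [])


def follow_isa(start: str, found: Callable[[str], bool]) -> Optional[List[str]]:
    """Breadth-first climb along isa links from `start`; return the chain of
    nodes ending at the first one that satisfies `found`, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        chain = queue.popleft()
        if found(chain[-1]):
            return chain
        for parent in links(chain[-1], "isa"):
            if parent not in seen:
                seen.add(parent)
                queue.append(chain + [parent])
    return None


# Does a bluebird fly? Climb until some node carries the property.
flies_chain = follow_isa("bluebird", lambda n: "flies" in links(n, "hasproperty"))
# Is a bluebird a vertebrate? Climb until that node is reached.
vertebrate_chain = follow_isa("bluebird", lambda n: n == "vertebrate")
print(len(flies_chain) - 1, len(vertebrate_chain) - 1)  # isa links followed: 1 2
```

Properties stored once on `bird` are inherited by `bluebird` without being repeated. That is the practical payoff of making the relationships explicit links.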




Figure 4: Semantic network description of a bluebird. (Nodes: bluebird, bird, vertebrate, small, blue, feathers, flies. Links: isa from bluebird to bird and from bird to vertebrate; hassize, hascolor, hascovering, and hasproperty to the attribute nodes.)

Perhaps the most important application for semantic networks is to represent meanings for language understanding programs. When it is necessary to understand a child’s story, the details of a journal article, or the contents of a web page, semantic networks may be used to encode the information and relationships that reflect the knowledge in that application. Semantic networks are discussed in Chapter 7, and their application to language understanding in Chapter 15.

Search

Given a representation, the second component of intelligent problem solving is search. Humans generally consider a number of alternative strategies on their way to solving a problem. A chess player typically reviews alternative moves, selecting the “best” according to criteria such as the opponent’s possible responses or the degree to which various moves support some global game strategy. A player also considers short-term gain (such as taking the opponent’s queen), opportunities to sacrifice a piece for positional advantage, or conjectures concerning the opponent’s psychological makeup and level of skill. This aspect of intelligent behavior underlies the problem-solving technique of state space search.

Consider, for example, the game of tic-tac-toe. Given any board situation, there is only a finite number of moves that a player can make. Starting with an empty board, the first player may place an X in any one of nine places. Each of these moves yields a different board that will allow the opponent eight possible responses, and so on. We can represent this collection of possible moves and responses by regarding each board configuration as a node or state in a graph. The links of the graph represent legal moves from one board configuration to another. The resulting structure is a state space graph.

The state space representation thus enables us to treat all possible games of tic-tac-toe as different paths through the state space graph. Given this representation, an effective game strategy will search through the graph for the paths that lead to the most wins and fewest losses and play in a way that always tries to force the game along one of these optimal paths, as in Figure 5.
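The tic-tac-toe state space can be written down directly. In this Python sketch, a board is a node, and `successors` generates the arcs (legal moves), giving 9 moves from the empty board and 8 replies to each. To pick out "optimal paths," it uses minimax scoring from X's point of view. Minimax is our choice of technique here, since the text names none at this point. Memoization keeps the full search fast.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

Board = Tuple[str, ...]  # nine cells, row by row; each "X", "O", or " "
EMPTY: Board = (" ",) * 9
LINES = ((0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6))              # diagonals


def winner(b: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None


def to_move(b: Board) -> str:
    """X moves first, so X is to move whenever the counts are equal."""
    return "X" if b.count("X") == b.count("O") else "O"


def successors(b: Board) -> List[Board]:
    """Arcs of the state space graph: one new board per legal move."""
    if winner(b) is not None:
        return []  # the game is over, so there are no legal moves
    p = to_move(b)
    return [b[:i] + (p,) + b[i + 1:] for i in range(9) if b[i] == " "]


@lru_cache(maxsize=None)
def value(b: Board) -> int:
    """Minimax value for X: +1 forced win, 0 draw, -1 forced loss."""
    w = winner(b)
    if w is not None:
        return 1 if w == "X" else -1
    kids = successors(b)
    if not kids:
        return 0  # full board with no winner: a draw
    scores = [value(k) for k in kids]
    return max(scores) if to_move(b) == "X" else min(scores)


def best_move(b: Board) -> Board:
    """Stay on an optimal path: X takes the highest-valued child, O the lowest."""
    pick = max if to_move(b) == "X" else min
    return pick(successors(b), key=value)


print(len(successors(EMPTY)))  # 9 first moves
print(value(EMPTY))            # 0: with best play on both sides, a draw
```

Any policy could be plugged into `best_move`. Minimax is simply the most direct way to turn "most wins and fewest losses" into a number attached to each node.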

Figure 5: Portion of the state space for tic-tac-toe.

As an example of how search is used to solve a more complicated problem, consider the task of diagnosing a mechanical fault in an automobile. Although this problem does not initially seem to lend itself to state space search as easily as tic-tac-toe or chess, it actually fits this strategy quite well. Instead of letting each node of the graph represent a “board state,” we let it represent a state of partial knowledge about the automobile’s mechanical problems. The process of examining the symptoms of the fault and inducing its cause may be thought of as searching through states of increasing knowledge. The starting node of the graph is empty, indicating that nothing is known about the cause of the problem. The first thing a mechanic might do is ask the customer which major system (engine, transmission, steering, brakes, etc.) seems to be causing the trouble. This is represented by a collection of arcs from the start state to states that indicate a focus on a single subsystem of the automobile, as in Figure 6.

Each of the states in the graph has arcs (corresponding to basic diagnostic checks) that lead to states representing further accumulation of knowledge in the diagnostic process.


Figure 6: State space description of the first step in diagnosing an automotive problem. (The start node asks “where is the problem?” and has arcs to engine trouble, transmission, brakes, . . .)

Figure 7: State space description of the automotive diagnosis problem. (The engine trouble node asks “does the car start?”, with yes and no arcs to engine starts and engine won’t start; the engine won’t start node asks “will engine turn over?”, leading to turns over and won’t turn over; the won’t turn over node asks “do lights come on?”, leading to battery dead and battery ok.)

For example, the engine trouble node has arcs to nodes labeled engine starts and engine won’t start. From the won’t start node we may move to nodes labeled turns over and won’t turn over. The won’t turn over node has arcs to nodes labeled battery dead and battery ok (see Figure 7). A problem solver can diagnose car trouble by searching for a path through this graph that is consistent with the symptoms of a particular defective car. Although this problem is very different from that of finding an optimal way to play tic-tac-toe or chess, it is equally amenable to solution by state space search.
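Searching for "a path consistent with the symptoms" can be sketched as a depth-first search that prunes every arc contradicting an observed answer. The `GRAPH` table below is a hypothetical encoding of Figures 6 and 7. The exact question strings and the pairing of "do lights come on?" yes/no with battery ok/battery dead are assumptions, since the figure labels did not survive intact. Nodes without an entry are treated as leaves.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the diagnosis graph: each question node maps an
# answer to the node that arc leads to. Nodes absent from the table are leaves.
GRAPH: Dict[str, Tuple[str, Dict[str, str]]] = {
    "start": ("where is the problem?",
              {"engine": "engine trouble", "transmission": "transmission",
               "brakes": "brakes"}),
    "engine trouble": ("does the car start?",
                       {"yes": "engine starts", "no": "engine won't start"}),
    "engine won't start": ("will engine turn over?",
                           {"yes": "turns over", "no": "won't turn over"}),
    "won't turn over": ("do lights come on?",  # assumed answer-to-node pairing
                        {"yes": "battery ok", "no": "battery dead"}),
}


def consistent_paths(symptoms: Dict[str, str], node: str = "start") -> List[List[str]]:
    """Depth-first search for every start-to-leaf path that agrees with the
    answers in `symptoms`; questions not yet answered keep all their arcs."""
    if node not in GRAPH:
        return [[node]]
    question, arcs = GRAPH[node]
    observed = symptoms.get(question)
    paths = []
    for answer, child in arcs.items():
        if observed is not None and answer != observed:
            continue  # this arc contradicts a known symptom, so prune it
        for rest in consistent_paths(symptoms, child):
            paths.append([node] + rest)
    return paths


car = {"where is the problem?": "engine", "does the car start?": "no",
       "will engine turn over?": "no", "do lights come on?": "no"}
print(consistent_paths(car))
# Fewer symptoms leave several candidate diagnoses open:
print([p[-1] for p in consistent_paths({"where is the problem?": "engine",
                                        "does the car start?": "no"})])
```

Each symptom that is added removes branches. This mirrors the text's picture of moving through states of increasing knowledge.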

In spite of this apparent universality, state space search is not, by itself, sufficient for automating intelligent problem-solving behavior; rather, it is an important tool for the design of intelligent programs. If state space search were sufficient, it would be fairly simple to write a program that plays chess by searching through the entire space for the sequence of moves that brought a victory, a method known as exhaustive search. Though exhaustive search can be applied to any state space, the overwhelming size of the space for interesting problems makes this approach a practical impossibility. Chess, for example, has approximately 10^120 possible games (distinct paths through its state space). This is a number larger than the number of molecules in the universe or the number of nanoseconds that have passed since the big bang. Search of this space is beyond the capabilities of any computing device whose dimensions must be confined to the known universe and whose execution must be completed before the universe succumbs to the ravages of entropy.
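The contrast in scale can be made concrete. The Python sketch below exhaustively counts every complete game of tic-tac-toe, where a game ends when a line is completed or the board fills. It then applies the text's figure of 10^120 for chess to a hypothetical machine examining 10^9 positions per second. Memoization avoids re-expanding boards that recur, but every path is still counted. The machine speed is an assumption chosen only to give a sense of scale.

```python
import math
from functools import lru_cache
from typing import Optional

LINES = ((0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6))


def winner(b: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None


@lru_cache(maxsize=None)
def count_games(b: str = " " * 9) -> int:
    """Count complete games (start-to-leaf paths) below board `b`.
    A game ends when someone completes a line or the board is full."""
    if winner(b) is not None or " " not in b:
        return 1
    player = "X" if b.count("X") == b.count("O") else "O"
    return sum(count_games(b[:i] + player + b[i + 1:])
               for i in range(9) if b[i] == " ")


games = count_games()
print(games)  # 255168, below the 9! = 362880 orderings of the nine cells

# The text's estimate for chess, examined by a hypothetical machine that
# checks one billion positions per second.
CHESS_GAMES = 10 ** 120
POSITIONS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
years = CHESS_GAMES // (POSITIONS_PER_SECOND * SECONDS_PER_YEAR)
print(f"about {years:.1e} years")
```

Exhaustive search of tic-tac-toe finishes almost instantly. The same procedure on chess would run for more than 10^100 years.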

Humans use intelligent search: a chess player considers a number of possible moves, a doctor examines several possible diagnoses, a computer scientist entertains different designs before beginning to write code. Humans do not use exhaustive search: the chess player examines only moves that experience has shown to be effective, and the doctor does not require tests that are not somehow indicated by the symptoms at hand. Human problem solving seems to be based on judgmental rules that guide search to those portions of the state space that seem most “promising”.

These rules are known as heuristics, and they constitute one of the central topics of AI research. A heuristic (the name is taken from the Greek word “to discover”) is a strategy for selectively searching a problem space. It guides search along lines that have a high probability of success while avoiding wasted or apparently stupid efforts. Human beings use a large number of heuristics in problem solving. If you ask a mechanic why your car is overheating, she may say something like, “Usually that means the thermostat is bad.” If you ask a doctor what could cause nausea and stomach pains, he might say it is “probably either stomach flu or food poisoning.”
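One simple way a heuristic can steer search is greedy best-first search: always expand the state the heuristic rates as most promising. The Python below is a minimal sketch of that idea, not a method taken from the text; the toy space (integers, moves "add 1" and "double", capped at 100) and the heuristic h(n) = |goal − n| are assumptions for illustration.

```python
import heapq
from typing import Callable, Iterable, Optional


def greedy_best_first(start: int,
                      goal: int,
                      successors: Callable[[int], Iterable[int]],
                      h: Callable[[int], float]) -> tuple[Optional[list], int]:
    """Always expand the frontier state with the lowest heuristic score h.

    Returns (path, number_of_expansions). The path is not guaranteed to be
    the shortest: the heuristic trades optimality for focus.
    """
    parent = {start: None}
    frontier = [(h(start), 0, start)]   # (score, tie-breaker, state)
    counter = 1
    expansions = 0
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if state == goal:
            path = []
            while state is not None:    # walk parent links back to the start
                path.append(state)
                state = parent[state]
            return path[::-1], expansions
        expansions += 1
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(frontier, (h(nxt), counter, nxt))
                counter += 1
    return None, expansions


def moves(n: int) -> list[int]:
    """Hypothetical moves: add 1 or double, staying at or below 100."""
    return [m for m in (n + 1, n * 2) if m <= 100]


path, expanded = greedy_best_first(1, 10, moves, lambda n: abs(10 - n))
print(path, expanded)  # prints [1, 2, 4, 8, 9, 10] 5
```

Only five states are expanded, but the path found has five moves while [1, 2, 4, 5, 10] reaches the goal in four: the heuristic focused the search without guaranteeing the best answer.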

State space search gives us a means of formalizing the problem-solving process, and heuristics allow us to infuse that formalism with intelligence. These techniques are discussed in detail in the early chapters of this book and remain at the heart of most modern work in AI. In summary, state space search is a formalism, independent of any particular search strategy, that serves as a launch point for many problem-solving approaches.

Throughout the text we continue to explore the theoretical aspects of knowledge representation and search and the use of this theory in building effective programs. The treatment of knowledge representation begins with Chapter 2 and the predicate calculus. Chapter 3 introduces search in the context of game graphs and other applications. In Chapter 4, heuristics are introduced and applied to graph search, including games. In Chapter 5 we present stochastic (probabilistic) techniques for building and organizing search spaces; these will be used later in areas including machine learning and natural language processing. Finally, Chapter 6 introduces the production system, blackboards, and other software architectures that integrate representation and search, thus supporting the building of intelligent problem solvers.


CHAPTER 2

FORMAL LOGIC

We come to the full possession of our power of drawing inferences, the last of our faculties; for it is not so much a natural gift as a long and difficult art.

—C. S. PEIRCE

The essential quality of a proof is to compel belief.

—FERMAT

2.0. Introduction

In this chapter we introduce the predicate calculus as a representation language for artificial intelligence. The importance of the predicate calculus was discussed in the introduction to Part II; its advantages include a well-defined formal semantics and sound and complete inference rules. This chapter begins with a brief overview of the concept of a formal logical system (Section 2.1) and then proceeds to a review of the propositional calculus (Section 2.2), which serves as a simple example of a formal logical system. Section 2.3 defines the syntax and semantics of the predicate calculus. In Section 2.4 we discuss predicate calculus inference rules and their use in problem solving. Finally, the chapter demonstrates the use of the predicate calculus to implement a knowledge base for financial investment advice. This illustrates the logic underlying expert systems, which are discussed later in the course.

2.1. Formal Logical Systems

A formal logical system consists of:

1. a language, the elements of which are variously called expressions, propositions, statements, formulas, etc.,

2. a subset of the language, the elements of which are taken to serve as axioms,

3. some inference rules which say how, given certain expressions of the language (or ones having certain “forms”), one can derive another expression of the language (another one having some particular “form”). In such a derivation, the given expressions are the premises and the derived expression is the conclusion.

Given these components, the theorems consist of all expressions that can be derived from the axioms by repeated applications of the inference rules. The basic structure of a formal logical system is shown in the top half of Figure 8.
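The definition of theorems as "everything reachable from the axioms by repeated rule application" can be carried out mechanically. The Python below is a minimal sketch under assumptions of our own: an expression is an atomic symbol (a string) or an implication encoded as the tuple ('->', P, Q), and the only inference rule is "from P and P → Q derive Q" (Modus Ponens). The loop compares only the shapes of expressions and never consults any meaning.

```python
from typing import Union

# An expression is an atomic symbol (str) or an implication ('->', P, Q).
Expr = Union[str, tuple]


def theorems(axioms: set[Expr]) -> set[Expr]:
    """Close a set of axioms under Modus Ponens: from P and ('->', P, Q) derive Q.

    The procedure is purely syntactic. It terminates because Modus Ponens only
    produces subexpressions of expressions already present, and a finite set
    of axioms has only finitely many subexpressions.
    """
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for e in list(derived):          # iterate over a snapshot while adding
            if (isinstance(e, tuple) and e[0] == "->"
                    and e[1] in derived and e[2] not in derived):
                derived.add(e[2])
                changed = True
    return derived


axioms = {"p", ("->", "p", "q"), ("->", "q", "r")}
print("q" in theorems(axioms), "r" in theorems(axioms))  # prints True True
```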

The term “formal” refers to the fact that all derivations are carried out as purely mechanical operations performed on symbols and without reference to any meanings for the expressions; i.e., we are concerned only with the expressions’ syntactic forms. Once a formal logical system has been defined, however, it is customary to associate it with a semantics, which prescribes meanings for the expressions of the language together with a means for determining whether those meanings are true or false (valid or invalid, etc.). This is illustrated in the bottom half of Figure 8. Once the semantics has been defined, it is of interest to establish the following two results:

• Soundness (or Consistency): If an expression is a theorem, then it is true (or valid, etc.).


• Adequacy (or Completeness): If an expression is true (or valid, etc.), then it is a theorem.

Together these properties ensure that the formal system correctly captures the logic inherent in the semantics. In fact, the formal system and semantics are normally designed with the express purpose of being able to establish that the two properties hold. Soundness/consistency means that one cannot prove things that are not true, and adequacy/completeness means that the axioms and rules are sufficient to prove everything that is true.

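Figure 8: Structure of a formal logical system. (Top half: axioms and inference rules generate theorems among the expressions. Bottom half: an interpretation assigns meanings to the expressions, and those meanings are judged for truth/validity.)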

An inference rule is said to be valid in a given semantics if it only allows one to derive true (valid) conclusions from true (valid) premises, i.e., true (valid) assumptions cannot lead to false (invalid) conclusions. Note that this is a purely semantic notion and has nothing to do with formal derivability. An inference rule may be called formally valid if it only allows one to derive theorems from other theorems. A well-known inference rule is Modus Ponens, which states

From P and P → Q infer Q

When this is taken as one of the inference rules of a formal logical system, it is formally valid by default: applying a rule of the system to theorems yields, by definition, another theorem. However, some inference rules may not be given as part of the definition of the system but can be derived from the given axioms and inference rules. A typical example is Hypothetical Syllogism

From P → Q and Q → R infer P → R

Verification that this rule is formally valid is based on the fact that it can be so derived (if it can be), where the derivation amounts to showing that, if P → Q and Q → R are theorems, then so is P → R. Further examples will be seen in the following.
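Semantic validity, in contrast, can be checked without any derivations when the semantics is the truth-table semantics of the propositional calculus (Section 2.2): a rule is valid if no assignment of truth values makes all premises true and the conclusion false. The Python below is a minimal sketch of that check; representing formulas as Python Boolean functions and the helper name rule_is_valid are our own assumptions. "Affirming the consequent" is included as a familiar rule that fails the test.

```python
from itertools import product
from typing import Callable, Sequence

Formula = Callable[..., bool]   # a formula as a Boolean function of its symbols


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def rule_is_valid(premises: Sequence[Formula], conclusion: Formula, n_vars: int) -> bool:
    """Semantically valid: every assignment making all premises true also makes
    the conclusion true. All 2**n_vars truth value assignments are checked."""
    for row in product([True, False], repeat=n_vars):
        if all(p(*row) for p in premises) and not conclusion(*row):
            return False             # found a counterexample row
    return True


# Modus Ponens: from P and P -> Q infer Q
modus_ponens = rule_is_valid([lambda p, q: p, lambda p, q: implies(p, q)],
                             lambda p, q: q, 2)
# Hypothetical Syllogism: from P -> Q and Q -> R infer P -> R
hyp_syllogism = rule_is_valid([lambda p, q, r: implies(p, q),
                               lambda p, q, r: implies(q, r)],
                              lambda p, q, r: implies(p, r), 3)
# Affirming the consequent (invalid): from Q and P -> Q infer P
affirming = rule_is_valid([lambda p, q: q, lambda p, q: implies(p, q)],
                          lambda p, q: p, 2)
print(modus_ponens, hyp_syllogism, affirming)  # prints True True False
```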

If a formal system is sound and complete with respect to its semantics, then an inference rule will be formally valid if and only if it is semantically valid. This is an important result that makes formal logical systems useful as a model of natural reasoning processes. It also effectively serves as the basis for the Prolog programming language, which will be covered later in this course.

The result may easily be established as follows. Consider an inference rule that says, from some expressions of the forms F1, . . . , Fn one may derive an expression of the form F. Suppose first that the rule is formally valid. To see that this implies it is semantically valid, suppose that E1, . . . , En and E are expressions having the indicated forms and that all of the premises E1, . . . , En are true (valid). It is desired to show that this implies that E is true (valid). To see this, observe that, by Completeness, since the premises are true (valid) they must be theorems. Then, because the rule is formally valid, E must be a theorem. Then, by Soundness, E must be true (valid).

Now suppose that the rule is semantically valid. To see that this implies it is formally valid, suppose that E1, . . . , En and E are expressions having the indicated forms and that all of the premises E1, . . . , En are theorems. It is desired to show that this implies that E is a theorem. To see this, observe that, by Soundness, since the premises are theorems they must be true (valid). Then, because the rule is semantically valid, E must be true (valid). Then, by Completeness, E must be a theorem.

2.2. The Propositional Calculus

We now turn to our first example of a formal logical system. The language of the propositional calculus may be defined as follows. The symbols will be:

1. propositional symbols: the notations p1, p2, etc.

2. truth symbols: true, false

3. logical connectives: ¬, ∨, ∧, →, ≡

4. parentheses: ( and )

Then the propositions are defined by:

1. propositional symbols and truth symbols are propositions.

2. if P and Q are propositions, then so are (¬P ), (P ∨Q), (P ∧Q), (P → Q), and (P ≡ Q).

3. Nothing is a proposition except as required by items 1 and 2.

Parentheses may be dropped when the intended grouping is clear. It is normally assumed that ¬ has priority over ∨ and ∧, and that these have priority over → and ≡.

In an application, propositional symbols are used to represent specific propositions expressed in a natural language, e.g., “the car is red” or “water is wet”. It is assumed that all such propositions are either true or false statements about the real world.

In propositions of the form P ∨ Q, P and Q are called disjuncts. In P ∧ Q, P and Q are called conjuncts. In an implication, P → Q, P is the premise or antecedent and Q is the conclusion or consequent. Propositions are also called well-formed formulas or WFFs.

An expression is a proposition, or well-formed formula, of the propositional calculus if and only if it can be formed of legal symbols through some sequence of the foregoing rules. For example, if P, Q, and R are propositions,

(((P ∧ Q) → R) ≡ (((¬P) ∨ (¬Q)) ∨ R))

is a proposition (WFF) because:


(P ∧ Q), the conjunction of two propositions, is a proposition.

((P ∧ Q) → R), the implication of a proposition for another, is a proposition.

(¬P) and (¬Q), the negations of propositions, are propositions.

((¬P) ∨ (¬Q)), the disjunction of two propositions, is a proposition.

(((¬P) ∨ (¬Q)) ∨ R), the disjunction of two propositions, is a proposition.

(((P ∧ Q) → R) ≡ (((¬P) ∨ (¬Q)) ∨ R)), the equivalence of two propositions, is a proposition.

The latter is our original proposition, which has been constructed through a series of applications of legal rules and is therefore “well formed”. In accordance with the foregoing policy for dropping parentheses, this same proposition may be written in the more readable form

(P ∧ Q → R) ≡ (¬P ∨ ¬Q) ∨ R

An interpretation (sometimes called a truth value assignment) for the propositional calculus is a mapping v : propositions → {T, F} defined by:

1. v(pi) ∈ {T, F}, for all i

2. v(true) = T and v(false) = F

3. v(¬P) = T iff v(P) = F

4. v(P ∨ Q) = T iff v(P) = T or v(Q) = T

5. v(P ∧ Q) = T iff v(P) = T and v(Q) = T

6. v(P → Q) = T iff v(P) = F or v(Q) = T

7. v(P ≡ Q) = T iff v(P) = v(Q)

The notations T and F are truth values. This semantics defines the meanings of the truth symbols and the logical connectives. The symbols ¬, ∨, ∧, →, and ≡ correspond respectively to the English language NOT, OR, AND, IMPLIES, and IF AND ONLY IF. It is easy to see that these formulations capture the usual truth tables for these connectives. For example, item 5 encodes the truth table for the logical ∧ as shown in Figure 9.

P   Q   P ∧ Q
T   T   T
T   F   F
F   T   F
F   F   F

Figure 9: Truth table for the connective ∧.
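Rules 1 through 7 of the interpretation v translate almost line for line into a recursive evaluator. The Python below is a minimal sketch under an encoding we assume for illustration: a propositional symbol is a string, the truth symbols true and false are Python's True and False, and compound propositions are tuples such as ('not', P) or (op, P, Q) with op one of 'or', 'and', 'implies', 'iff'. The truth value assignment for the symbols is passed as a dictionary.

```python
from typing import Mapping, Union

# A proposition: a symbol (str), a truth symbol (bool), or a tagged tuple.
Prop = Union[str, bool, tuple]


def v(p: Prop, assignment: Mapping[str, bool]) -> bool:
    """Truth value of p under the given assignment, following rules 1-7."""
    if isinstance(p, bool):              # rule 2: true / false
        return p
    if isinstance(p, str):               # rule 1: propositional symbols
        return assignment[p]
    op = p[0]
    if op == "not":                      # rule 3
        return not v(p[1], assignment)
    a, b = v(p[1], assignment), v(p[2], assignment)
    if op == "or":                       # rule 4
        return a or b
    if op == "and":                      # rule 5
        return a and b
    if op == "implies":                  # rule 6: premise F or conclusion T
        return (not a) or b
    if op == "iff":                      # rule 7: equal truth values
        return a == b
    raise ValueError(f"unknown connective: {op!r}")


# The example proposition (P ∧ Q → R) ≡ (¬P ∨ ¬Q) ∨ R from the text
example = ("iff",
           ("implies", ("and", "P", "Q"), "R"),
           ("or", ("or", ("not", "P"), ("not", "Q")), "R"))
print(v(example, {"P": True, "Q": True, "R": False}))  # prints True
```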

A proposition P is a tautology if v(P) = T for all truth value assignments v. In other words, no matter how one assigns truth values to the propositional symbols appearing in P, the truth value of P will be T. Two propositions P and Q are equivalent if their logical equivalence P ≡ Q is a tautology. This may be demonstrated by showing that P and Q have the same truth tables. For example, a proof of the equivalence of P → Q and ¬P ∨ Q is given by the truth table in Figure 10. One can similarly demonstrate the following equivalences for any propositions P, Q, and R:



P   Q   ¬P   ¬P ∨ Q   P → Q   (¬P ∨ Q) ≡ (P → Q)
T   T   F    T        T       T
T   F   F    F        F       T
F   T   T    T        T       T
F   F   T    T        T       T

Figure 10: Truth table demonstrating the equivalence of P → Q and ¬P ∨ Q.

P ∧ ¬P ≡ false

P ∨ false ≡ P

¬(¬P) ≡ P

(P → Q) ≡ (¬P ∨ Q)

the contrapositive law: (P → Q) ≡ (¬Q → ¬P)

de Morgan’s laws: ¬(P ∨ Q) ≡ ¬P ∧ ¬Q and ¬(P ∧ Q) ≡ ¬P ∨ ¬Q

the commutative laws: P ∧ Q ≡ Q ∧ P and P ∨ Q ≡ Q ∨ P

the associative law: (P ∧ Q) ∧ R ≡ P ∧ (Q ∧ R)

the associative law: (P ∨ Q) ∨ R ≡ P ∨ (Q ∨ R)

the distributive law: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)

the distributive law: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)
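Each of these identities can be confirmed by exhaustive truth tables, exactly as in Figure 10. The Python below is a minimal sketch of that check for a few of them; writing each side as a Python Boolean function of P, Q, and R is an assumption made for brevity. Two propositions count as equivalent when they agree on all 2^n assignments, which is the same as saying their ≡ is a tautology.

```python
from itertools import product
from typing import Callable


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def equivalent(f: Callable[..., bool], g: Callable[..., bool], n_vars: int) -> bool:
    """True iff f and g agree on every one of the 2**n_vars truth value assignments."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=n_vars))


identities = {
    "P ∧ ¬P ≡ false": (lambda P, Q, R: P and not P, lambda P, Q, R: False),
    "contrapositive": (lambda P, Q, R: implies(P, Q),
                       lambda P, Q, R: implies(not Q, not P)),
    "de Morgan (∨)": (lambda P, Q, R: not (P or Q),
                      lambda P, Q, R: (not P) and (not Q)),
    "distributive (∨ over ∧)": (lambda P, Q, R: P or (Q and R),
                                lambda P, Q, R: (P or Q) and (P or R)),
}
for name, (lhs, rhs) in identities.items():
    print(name, equivalent(lhs, rhs, 3))   # each line ends in True
```

A non-identity is caught the same way: an implication and its converse, P → Q and Q → P, disagree on the rows where P and Q differ, so equivalent reports False for them.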

Identities such as these can be used to change propositional calculus expressions into syntactically different but logically equivalent forms. These identities may be used instead of truth tables to prove that two expressions are equivalent: find a series of identities that transform one expression into the other. An early AI program, the Logic Theorist (Newell and Simon 1956), designed by Newell, Simon, and Shaw, used transformations between equivalent forms of expressions to prove many of the theorems in Whitehead and Russell’s Principia Mathematica (1950). The ability to change a logical expression into a different form with equivalent truth values is also important when using inference rules (modus ponens, Section 2.3, and resolution, Chapter 14) that require expressions to be in a specific form.

We will forgo presenting axioms and inference rules for the propositional calculus, as we do not need them for the current discussion. Numerous versions exist, however, for which the desired soundness and adequacy properties have been established. One well-known reference is (Mendelson 1987); a similar treatment is (Hamilton 1988); another, more advanced, treatment is (Shoenfield 1969). Given a suitable axiomatization of the propositional calculus, one can establish the following versions of the aforementioned results:

• Soundness/consistency: All theorems are tautologies.

• Adequacy/completeness: All tautologies are theorems.

2.3. The Predicate Calculus

In propositional calculus, each propositional symbol (p1, p2, etc.) denotes a single assertion. There is no way to access the components of an individual assertion. Predicate calculus provides this ability. For example, instead of letting a single propositional symbol, pi, denote the entire sentence “it rained on Tuesday”, we can create a predicate weather that describes a relationship between a date and the weather: weather(tuesday, rain). Through inference rules we can manipulate predicate calculus expressions, accessing their individual components and inferring new sentences.


Predicate calculus also allows expressions to contain variables. Variables let us create general assertions about classes of entities. For example, we could state that for all values of X, where X is a day of the week, the statement weather(X, rain) is true; i.e., it rains every day. As we did with the propositional calculus, we will first define the syntax of the language and then discuss its semantics. There are various ways to do this (e.g., see the above-mentioned references to Mendelson, Hamilton, and Shoenfield). Here, as preparation for the discussion of the Prolog programming language that will appear later in this course, we adopt the syntax of that language.

2.3.1 First-Order Languages

Before defining the syntax of correct expressions in the predicate calculus, we define an alphabet and grammar for creating the symbols of the language. This corresponds to the lexical aspect of a programming language definition. Predicate calculus symbols, like the tokens in a programming language, are irreducible syntactic elements: they cannot be broken into their component parts by the operations of the language.

We begin with some alphabet characters consisting of:

1. the English alphabet letters, both uppercase and lowercase,

2. the numerals 0,1,. . .,9, and

3. the underscore character, _.

A symbol expression is any string of alphabet characters that begins with an English letter. As examples, some legitimate characters in the alphabet of predicate calculus symbols are

a 1 R 6 9 p z

Some examples of characters not in the alphabet are

# % @ / &

The symbols are:

1. logical connectives: ¬, ∨, ∧, →, ≡

2. punctuation: left and right parentheses and comma

3. quantifiers: ∀ and ∃

4. constant symbols: symbol expressions beginning with a lowercase letter

5. variable symbols: symbol expressions beginning with an uppercase letter

6. function symbols: symbol expressions beginning with a lowercase letter (distinguished from constant symbols by context), each having an arity indicating the number of arguments

7. predicate symbols: same as function symbols, also distinguished by context

8. truth symbols: true and false.


Some examples of legitimate predicate calculus symbols are:

George fire3 tom_and_jerry bill XXXX friends_of

Some examples of strings that are not legal symbols are:

3jack “no blanks allowed” ab%cd ***71 duck!!!

Constant symbols, function symbols, and predicate symbols are used to denote specific objects, functions, and predicates (relations) in some world of discourse. As with most programming languages, the use of “words” that suggest the symbol’s intended meaning assists us in understanding program code. Thus, even though l(g,k) and likes(george,kate) are formally equivalent (i.e., they have the same structure), the second can be of great help (for human readers) in indicating what relationship the expression represents. It must be stressed that these descriptive names are intended solely to improve the readability of expressions. The only “meaning” that predicate calculus expressions have is given through their formal semantics.

Variable symbols are symbols beginning with an uppercase letter. Thus George, BILL, and KAte are legal variables, whereas geORGE and bill are not. Constant symbols, function symbols, and predicate symbols begin with a lowercase letter. Some sample constant symbols are:

george the_house theHouse two theNumber2

Some sample function symbols are:

f f_2 plus father descendant_of price average

Some sample predicate symbols are:

likes (mentioned above) less_than near sibling on partOf
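The lexical rules above are simple enough to check with a regular expression. The Python below is a minimal sketch, not part of the original text: classify_symbol is a hypothetical helper, and treating true and false as reserved truth symbols reflects our reading of item 8 of the symbol list. Whether a lowercase symbol is a constant, a function, or a predicate cannot be decided from the string alone, so the sketch reports the three together.

```python
import re
from typing import Optional

# A symbol expression: an English letter followed by letters, digits, or underscores.
SYMBOL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def classify_symbol(s: str) -> Optional[str]:
    """Classify s by the conventions above; None means s is not a legal symbol."""
    if s in ("true", "false"):
        return "truth symbol"
    if SYMBOL.fullmatch(s) is None:
        return None
    # Constants, functions and predicates share the lowercase form; which one a
    # symbol is gets decided by the context in which it appears.
    return "variable" if s[0].isupper() else "constant/function/predicate"


for s in ["George", "tom_and_jerry", "3jack", "ab%cd", "duck!!!"]:
    print(s, classify_symbol(s))
```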

Note that our definition of predicate calculus symbols does not include numbers or arithmetic operators. The number system is not included in the predicate calculus primitives; instead it is defined axiomatically using “pure” predicate calculus as a basis (Manna and Waldinger 1985). While the particulars of this derivation are of theoretical interest, they are less important to the use of predicate calculus as an AI representation language. For convenience, we assume this derivation and include arithmetic in the language. To accommodate this, we permit constant symbols that are intended to be interpreted as numbers to be numerals, like 25, even though these officially are not legal symbols in the sense defined above.

The terms are defined by:

1. Constant symbols and variable symbols are terms.

2. If f is an n-ary function symbol and t1, . . . , tn are terms, then f(t1, . . . , tn) is a term.

3. Nothing is a term except as required by items 1 and 2.

Some examples of terms are:

cat kate blue X mother(sarah) times(2,X) times(X,plus(Y,Z))

A term is closed if it does not contain variable symbols. A term of the kind that involves a function symbol may be called a function expression, and the number n is referred to as the symbol’s arity. For example,

f(X,Y) father(david) price(bananas) plus(2,3)

are all well-formed function expressions, and all of them except f(X,Y), which contains the variables X and Y, are closed. Here father and price have arity 1, and f and plus have arity 2. It follows that a term is either a constant, a variable, or a function expression.
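The recursive definition of terms, and the notion of a closed term, can be sketched in a few lines of Python. The encoding is our own assumption: a constant, variable, or numeral is a string, and a function expression f(t1, ..., tn) is the tuple (f, t1, ..., tn). The arity table is a hypothetical signature built from the examples in the text.

```python
from typing import Union

# A term: a constant/variable/numeral (str) or a function expression
# encoded as (name, arg1, ..., argn), e.g. ('father', 'david').
Term = Union[str, tuple]


def is_term(t: Term, arity: dict[str, int]) -> bool:
    """Items 1 and 2 of the definition of terms."""
    if isinstance(t, str):
        # constant or variable symbol; numerals such as 2 are admitted by convention
        return t[:1].isalpha() or t.isdigit()
    if not t:
        return False
    name, *args = t
    return arity.get(name) == len(args) and all(is_term(a, arity) for a in args)


def is_closed(t: Term) -> bool:
    """A term is closed if it contains no variable symbols (uppercase initial)."""
    if isinstance(t, str):
        return not t[:1].isupper()
    return all(is_closed(a) for a in t[1:])


arity = {"f": 2, "father": 1, "price": 1, "plus": 2, "times": 2, "mother": 1}
for t in [("f", "X", "Y"), ("father", "david"), ("plus", "2", "3")]:
    print(t, is_term(t, arity), is_closed(t))
```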

The sentences (also called (first-order) formulas) are defined by:


1. If p is an n-ary predicate symbol and t1, . . . , tn are terms, then p(t1, . . . , tn) is a sentence, known as an atomic sentence. The truth symbols true and false also are atomic sentences.

2. If s1 and s2 are sentences, then so are (¬s1), (s1 ∧ s2), (s1 ∨ s2), (s1 → s2), and (s1 ≡ s2).

3. If s is a sentence and X is a variable symbol, then ∀X s and ∃X s are sentences.

4. Nothing is a sentence except as required by the above.

A sentence is closed if either it does not contain variable symbols or all its variable symbols are within the scopes of quantifiers.

In the above definition for atomic sentence, the number n is the arity of the given predicate symbol p. An atomic sentence is the most primitive unit of the predicate calculus language. Examples of atomic sentences are:

likes(george,kate)

likes(X,george)

likes(george,susie)

likes(X,X)

likes(george,sarah,tuesday)

friends(bill,richard)

friends(bill,george)

friends(father_of(david),father_of(andrew))

helps(bill,george)

helps(richard,bill)

The predicate symbols in these expressions are likes, friends, and helps. A predicate symbol may be used with different numbers of arguments. In this example there are two different likes, one with two and the other with three arguments. By the foregoing definitions, this is not allowed to occur in the same language, but the same symbol may be used with different arities in different languages. As with some programming languages, however, a symbol such as likes may be “overloaded”, in which case the symbol is identified by the combination of its name and arity. Prolog adopts this policy, and it may be used occasionally in the following.

The symbol ∀ is known as the universal quantifier, and ∃ is known as the existential quantifier. Some examples of sentences employing these symbols are

∃Y friends(Y,peter)

∀X likes(X,ice_cream)

A discussion of some possible meanings for these sentences appears in the following. For some further examples of well-formed expressions, let times and plus be function symbols of arity 2 and let equal and foo be predicate symbols with arity 2 and 3, respectively. Then

plus(two,three) is a function expression (and thus not an atomic sentence).

equal(plus(two,three),five) is an atomic sentence.

equal(plus(2,3),seven) is an atomic sentence. Note that this sentence, given the standard interpretation of plus and equal, is false. Well-formedness and truth are independent issues.

∀X foo(X,two,plus(two,three)) ∧ equal(plus(two,three),five) is a sentence since both conjuncts are sentences.


(foo(two,two,plus(two,three))) → (equal(plus(three,two),five) ≡ true) is a sentence because all its components are sentences, appropriately connected by logical operators.

The definition of predicate calculus sentences and the examples just presented suggest a method for verifying that an expression is a sentence. This is written as a recursive algorithm, verify_sentence. The algorithm takes as argument a candidate expression and returns SUCCESS if the expression is a sentence.

function verify_sentence(expression);
begin
    case
        expression is an atomic sentence: return SUCCESS;
        expression is of the form Q X s, where Q is either ∀ or ∃ and X is a variable:
            if verify_sentence(s) returns SUCCESS
            then return SUCCESS
            else return FAIL;
        expression is of the form ¬ s:
            if verify_sentence(s) returns SUCCESS
            then return SUCCESS
            else return FAIL;
        expression is of the form s1 op s2, where op is a binary logical operator:
            if verify_sentence(s1) returns SUCCESS and
               verify_sentence(s2) returns SUCCESS
            then return SUCCESS
            else return FAIL;
        otherwise: return FAIL
    end
end.
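A direct Python rendering of verify_sentence might look like the sketch below. The encoding is an assumption made for illustration: an atomic sentence is a tuple (predicate, term1, ..., termn); compound sentences are tuples tagged 'not', 'and', 'or', 'implies', 'iff', 'forall', or 'exists' (so these names are reserved); and the hypothetical dictionaries predicates and functions give each symbol's arity. True plays the role of SUCCESS and False the role of FAIL.

```python
from typing import Union

Expr = Union[str, tuple]
BINARY = {"and", "or", "implies", "iff"}     # binary logical operators
QUANTIFIERS = {"forall", "exists"}           # ∀ and ∃


def is_variable(x: Expr) -> bool:
    return isinstance(x, str) and x[:1].isupper()


def is_term(t: Expr, functions: dict[str, int]) -> bool:
    if isinstance(t, str):                   # constant, variable, or numeral
        return t[:1].isalnum()
    return (isinstance(t, tuple) and len(t) >= 1
            and functions.get(t[0]) == len(t) - 1
            and all(is_term(a, functions) for a in t[1:]))


def is_atomic(e: Expr, predicates: dict[str, int], functions: dict[str, int]) -> bool:
    if e in ("true", "false"):               # truth symbols are atomic sentences
        return True
    return (isinstance(e, tuple) and len(e) >= 1
            and predicates.get(e[0]) == len(e) - 1
            and all(is_term(a, functions) for a in e[1:]))


def verify_sentence(e: Expr, predicates: dict[str, int], functions: dict[str, int]) -> bool:
    """Recursive check mirroring the pseudocode, one branch per case."""
    if is_atomic(e, predicates, functions):
        return True
    if not isinstance(e, tuple) or not e:
        return False
    if e[0] in QUANTIFIERS and len(e) == 3 and is_variable(e[1]):
        return verify_sentence(e[2], predicates, functions)
    if e[0] == "not" and len(e) == 2:
        return verify_sentence(e[1], predicates, functions)
    if e[0] in BINARY and len(e) == 3:
        return (verify_sentence(e[1], predicates, functions)
                and verify_sentence(e[2], predicates, functions))
    return False                             # otherwise: FAIL


predicates = {"likes": 2, "friends": 2, "equal": 2, "foo": 3}   # hypothetical signature
functions = {"plus": 2, "times": 2}
print(verify_sentence(("exists", "Y", ("friends", "Y", "peter")), predicates, functions))
print(verify_sentence(("plus", "two", "three"), predicates, functions))  # a term, not a sentence
```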

Different first-order languages are defined by taking different selections of constant symbols, function symbols, and predicate symbols. The other symbols are common to all first-order languages.

2.3.2 Semantic Interpretations—Part 1

An interpretation I for a first-order language L consists of:

1. A nonempty set D_I serving as the domain, the elements of which are called individuals.

2. For each constant symbol a in L, assignment of a unique individual a_I in D_I. In this case, let I(a) denote a_I.

3. For each variable symbol V in L, assignment of a subset V_I of D_I (to serve as the range for possible values of V).

4. For each n-ary function symbol f in L, assignment of a function f_I : (D_I)^n → D_I.

5. For each n-ary predicate symbol p in L, assignment of a predicate (relation) p_I on (D_I)^n.


Accordingly, in this semantics, constant symbols are used to represent individual objects in some domain of discourse (a given set of objects); variable symbols are taken to range over certain specified subsets of the domain of discourse (in effect, the subset specifies a certain type of object); function symbols are used to represent specific functions defined on the domain of discourse; and predicate symbols are used to represent specific predicates (relations) defined on the domain of discourse. A consequence of this definition is that a first-order language can have many different interpretations (typically infinitely many) depending on the choice of domain and the choice of meanings for the constant symbols, function symbols, and predicate symbols in that domain.

Given an interpretation I for a first-order language L, a term valuation is a mapping I : closed terms → D_I having the properties:

1. If t is a constant symbol a, I(t) = a_I.

2. If t is a function expression f(t1, . . . , tn), then I(t) = f_I(I(t1), . . . , I(tn)).

In effect, the valuation of a closed term is obtained by applying its indicated functions to the values (valuations) of its indicated arguments. To illustrate this, consider the closed terms

father_of(david) father_of(andrew)

If the function symbol father_of is interpreted as the real-world “father” function, if the constant symbols david and andrew are evaluated as some real-world persons named David and Andrew, and it happens that the fathers of David and Andrew are some persons named George and Bill, then the first term evaluates to George, and the second evaluates to Bill. As a further example, consider the closed term

plus(2,3)

If the function symbol plus is interpreted as ordinary addition of integers, and 2 and 3 are interpreted as the integers 2 and 3, then the term evaluates to the integer 5.
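The two properties of a term valuation give a direct recursive procedure. The Python below is a minimal sketch, assuming an encoding in which a constant is a string and a function expression f(t1, ..., tn) is the tuple (f, t1, ..., tn); an interpretation is represented by two dictionaries, constants (a ↦ a_I) and functions (f ↦ f_I). The family data for David and Andrew is hypothetical, matching the example above.

```python
from typing import Any, Union

Term = Union[str, tuple]


def evaluate_term(t: Term, constants: dict, functions: dict) -> Any:
    """Term valuation I(t) for a closed term t."""
    if isinstance(t, str):
        return constants[t]          # property 1: I(a) = a_I (a variable raises KeyError)
    name, *args = t
    # property 2: I(f(t1,...,tn)) = f_I(I(t1), ..., I(tn))
    return functions[name](*(evaluate_term(a, constants, functions) for a in args))


constants = {"2": 2, "3": 3, "david": "David", "andrew": "Andrew"}
functions = {
    "plus": lambda x, y: x + y,                                  # ordinary addition
    "father_of": {"David": "George", "Andrew": "Bill"}.get,      # hypothetical data
}
print(evaluate_term(("plus", "2", "3"), constants, functions))       # prints 5
print(evaluate_term(("father_of", "david"), constants, functions))   # prints George
```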

Let I be an interpretation with domain D_I for a first-order language L, and let L(D_I) be the language obtained from L by adding a new constant symbol d̂ for each element d ∈ D_I. The constant d̂ will be the name of the element d, and we set I(d̂) = d. If P is a sentence of L having variable symbols that are not bound by any quantifiers (i.e., P is not closed), an instance of P in I will be a closed sentence that is obtained by substituting names of elements in D_I for the unbound variables in P. The truth valuation determined by I is a mapping v_I : closed sentences of L(D_I) → {T, F} defined by:

1. v_I(true) = T and v_I(false) = F.

2. For P atomic, having the form p(t1, . . . , tn), v_I(P) = T iff p_I holds for (I(t1), . . . , I(tn)).

3. For P of the form ¬Q, v_I(P) = T iff v_I(Q) = F.

4. For P of the form Q ∨ R, v_I(P) = T iff v_I(Q) = T or v_I(R) = T.

5. For P of the form Q ∧ R, v_I(P) = T iff v_I(Q) = T and v_I(R) = T.

6. For P of the form Q → R, v_I(P) = T iff v_I(Q) = F or v_I(R) = T.

7. For P of the form Q ≡ R, v_I(P) = T iff v_I(Q) = v_I(R).

8. For P of the form ∀X Q, v_I(P) = T iff v_I(Q′) = T for all closed instances Q′ of Q in I.

9. For P of the form ∃X Q, v_I(P) = T iff v_I(Q′) = T for some closed instance Q′ of Q in I.
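When the domain is finite, the nine rules can be executed directly. The Python below is a minimal sketch under several assumptions of our own: sentences use a tuple encoding ('not', s), (op, s1, s2), ('forall', X, s), ('exists', X, s), or (predicate, term1, ..., termn); instead of substituting names d̂, it binds each quantified variable to a domain element in an environment, which has the same effect; and each variable ranges over its subset V_I if one is given, else over the whole domain. The friends and likes data are hypothetical.

```python
from typing import Any, Optional

BINARY = {
    "and": lambda a, b: a and b,           # rule 5
    "or": lambda a, b: a or b,             # rule 4
    "implies": lambda a, b: (not a) or b,  # rule 6
    "iff": lambda a, b: a == b,            # rule 7
}


def term_value(t, interp: dict, env: dict) -> Any:
    """Variables take their value from env; constants and functions from interp."""
    if isinstance(t, str):
        return env[t] if t[:1].isupper() else interp["constants"][t]
    name, *args = t
    return interp["functions"][name](*(term_value(a, interp, env) for a in args))


def holds(s, interp: dict, env: Optional[dict] = None) -> bool:
    """Truth valuation v_I over a FINITE domain (rules 1-9)."""
    env = {} if env is None else env
    if s in ("true", "false"):                                      # rule 1
        return s == "true"
    op = s[0]
    if op == "not":                                                 # rule 3
        return not holds(s[1], interp, env)
    if op in BINARY:
        return BINARY[op](holds(s[1], interp, env), holds(s[2], interp, env))
    if op in ("forall", "exists"):                                  # rules 8 and 9
        var, body = s[1], s[2]
        values = interp.get("ranges", {}).get(var, interp["domain"])  # V_I or D_I
        results = (holds(body, interp, {**env, var: d}) for d in values)
        return all(results) if op == "forall" else any(results)
    args = tuple(term_value(t, interp, env) for t in s[1:])         # rule 2
    return args in interp["predicates"][op]


people = {"peter", "paul", "mary"}
interp = {                                        # hypothetical interpretation
    "domain": people | {"ice_cream"},
    "ranges": {"X": people, "Y": people},
    "constants": {"peter": "peter", "ice_cream": "ice_cream"},
    "functions": {},
    "predicates": {"friends": {("paul", "peter")},
                   "likes": {("peter", "ice_cream"), ("paul", "ice_cream")}},
}
print(holds(("exists", "Y", ("friends", "Y", "peter")), interp))     # prints True
print(holds(("forall", "X", ("likes", "X", "ice_cream")), interp))   # prints False
```

The all() and any() calls are what make this work only for finite domains; over an infinite domain the same loops would never finish, which is the difficulty discussed below.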

Note that the given definitions for the logical connectives are essentially identical to those provided for the propositional calculus. When a variable appears as an argument in a sentence, it refers to unspecified objects in the domain. The quantifiers constrain the meaning of a sentence containing a variable. As examples, consider:

∃Y friends(Y,peter)

∀X likes(X,ice_cream)

Under the normal interpretation of friends, the first example expresses the statement that there is at least one individual in the domain that is a friend of peter. Under the usual interpretation of likes, the second example expresses the statement that everyone in the domain likes ice cream.

In the foregoing sentences, bill, george, kate, etc., are constant symbols and represent objects in the problem domain. The arguments to a predicate are terms and may also include variables or function expressions. For example, if friends is a predicate symbol that is being interpreted as the usual real-world friends relationship, and father_of(david) and father_of(andrew) evaluate to George and Bill in the manner described above, then the sentence

friends(father_of(david),father_of(andrew))

describes a relationship between George and Bill, namely the relationship of being friends. As another example, if less_than and plus are interpreted as the usual relation < and function + on integers, then

less_than(plus(2,3),7)

expresses the assertion (2 + 3) < 7.

Quantification of variables is an important part of predicate calculus semantics. When a variable appears in a sentence, such as X in likes(george,X), the variable functions as a placeholder. Any constant allowed under the interpretation can be substituted for it in the expression. Substituting kate or susie for X in likes(george,X) forms the statements likes(george,kate) and likes(george,susie) as we saw earlier.

The variable X stands for all constants that might appear as the second parameter of the sentence. This variable name might be replaced by any other variable name, such as Y or PEOPLE, without changing the meaning of the expression. Thus the variable is said to be a dummy. In the predicate calculus, variables must be quantified in either of two ways: universally or existentially. A variable is considered free if it is not within the scope of either the universal or existential quantifiers. An expression is closed if all of its variables are quantified. A ground expression has no variables at all.

Parentheses are often used to indicate the scope of quantification, that is, the instances of a variable name over which a quantification holds. Thus, for the symbol indicating universal quantification, ∀:

∀X(p(X)∨q(Y)→r(X))

indicates that X is universally quantified in both p(X) and r(X).

Universal quantification introduces problems in computing the truth value of a sentence, because all the possible values of a variable symbol must be tested to see whether the expression remains true. For example, to test the truth value of ∀X likes(george,X), where X ranges over the set of all humans, all possible values for X must be tested. If the domain of an interpretation is infinite, exhaustive testing of all substitutions to a universally quantified variable is computationally impossible: the algorithm may never halt. This problem gives an intuitive sense of why the predicate calculus is said to be undecidable. Because the propositional calculus does not support variables, a sentence has only finitely many truth assignments, and we can exhaustively test all of them. This is done with the truth table (Section 2.1).
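To make the contrast concrete, the truth-table method can be sketched in a few lines of Python. This is a minimal illustration, assuming formulas are encoded as Python functions of a dictionary of truth values; `is_tautology` and `implies` are hypothetical helper names, not part of any library.

```python
from itertools import product

def implies(a, b):
    """Material implication: A -> B is false only when A is true and B is false."""
    return (not a) or b

def is_tautology(formula, variables):
    """Return True if `formula` is true in every row of its truth table.

    `formula` takes a dict mapping each propositional symbol to True/False.
    With n symbols there are exactly 2**n rows, so the loop always halts.
    """
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        if not formula(row):
            return False
    return True

# (P -> Q) v (Q -> P) holds in all four rows; P -> Q fails when P=T, Q=F.
print(is_tautology(lambda r: implies(r["P"], r["Q"]) or implies(r["Q"], r["P"]), ["P", "Q"]))  # True
print(is_tautology(lambda r: implies(r["P"], r["Q"]), ["P", "Q"]))  # False
```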

As seen in the foregoing, variables may also be quantified existentially, indicated by the symbol ∃. In the existential case the expression containing the variable is said to be true for at least one substitution from its domain of definition. The scope of an existentially quantified variable is also indicated by enclosing the quantified occurrences of the variable in parentheses.

Evaluating the truth of an expression containing an existentially quantified variable may be no easier than evaluating the truth of expressions containing universally quantified variables. Suppose we attempt to determine the truth of the expression by trying substitutions until one is found that makes the expression true. If the domain of the variable is infinite and the expression is false under all substitutions, the algorithm will never halt.
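The following sketch, assuming the natural numbers as an example of an infinite domain, shows why trying substitutions one at a time is only a partial procedure: `search_witness` (a hypothetical name) halts when a witness exists, but when given a step limit it can only give up, not answer "false".

```python
from itertools import count

def search_witness(pred, max_steps=None):
    """Try X = 0, 1, 2, ... and return the first X for which pred(X) is true.

    With max_steps=None the search is unbounded: it halts if a witness
    exists but runs forever otherwise. With a bound, None only means
    "no witness among the first max_steps values", not "the sentence is false".
    """
    for n in count():
        if max_steps is not None and n >= max_steps:
            return None
        if pred(n):
            return n

print(search_witness(lambda n: n * n == 49))  # 7
print(search_witness(lambda n: n * n == 50, max_steps=1000))  # None
```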

Several relationships between negation and the universal and existential quantifiers are given below. Some of these relationships are used in resolution refutation systems described in Chapter 14. For predicates p and q and variables X and Y:

∀X p(X) ≡ ¬∃X ¬p(X)

∃X p(X) ≡ ¬∀X ¬p(X)

¬∀X p(X) ≡ ∃X ¬p(X)

¬∃X p(X) ≡ ∀X ¬p(X)

∃X p(X) ≡ ∃Y p(Y)

∀X p(X) ≡ ∀Y p(Y)

∀X (p(X) ∧ q(X)) ≡ ∀X p(X) ∧ ∀X q(X)

∃X (p(X) ∨ q(X)) ≡ ∃X p(X) ∨ ∃X q(X)

By straightforward appeal to the foregoing definition of interpretation, it can be verified that, for any interpretation I, the value of v_I for all of these sentences will be T. The import of the fifth and sixth of the above is that the choice of the variable that is quantified has no effect on the semantic interpretation (meaning) of the sentence. For this reason such variables are sometimes called dummy variables. They are basically just placeholders, and their symbolic names are irrelevant.
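For a finite domain, the claim can be spot-checked by brute force. The sketch below, assuming a hypothetical three-element domain D, enumerates every unary predicate on D and tests the four negation laws and the two distribution laws; the variable-renaming equivalences are omitted, since the name of a bound variable has no counterpart in this encoding. A passing run over one finite domain illustrates the equivalences; it is not a proof for arbitrary (possibly infinite) domains.

```python
from itertools import product

D = [0, 1, 2]  # a small, hypothetical finite domain

def all_predicates(domain):
    """Yield every unary predicate on `domain` as a Python function."""
    for bits in product([False, True], repeat=len(domain)):
        table = dict(zip(domain, bits))
        yield lambda x, t=table: t[x]

def forall(pred):
    return all(pred(x) for x in D)

def exists(pred):
    return any(pred(x) for x in D)

def check_equivalences():
    for p in all_predicates(D):
        assert forall(p) == (not exists(lambda x: not p(x)))
        assert exists(p) == (not forall(lambda x: not p(x)))
        assert (not forall(p)) == exists(lambda x: not p(x))
        assert (not exists(p)) == forall(lambda x: not p(x))
        for q in all_predicates(D):
            assert forall(lambda x: p(x) and q(x)) == (forall(p) and forall(q))
            assert exists(lambda x: p(x) or q(x)) == (exists(p) or exists(q))
    return True

print(check_equivalences())  # True
```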

Here we provide an example of the use of predicate calculus to describe a simple world. The domain of discourse is a set of people upon which is defined a collection of family relationships in a biblical genealogy:

mother(eve,abel)

mother(eve,cain)

father(adam,abel)

father(adam,cain)

∀X∀Y(father(X,Y)∨mother(X,Y)→parent(X,Y))

∀X∀Y∀Z(parent(X,Y)∧parent(X,Z)→sibling(Y,Z))

∀Y∀Z(∃X(parent(X,Y)∧parent(X,Z))→sibling(Y,Z))

In this example we use the predicates mother and father to define a set of parent-child relationships. The implications give general definitions of other relationships, such as parent and sibling, in terms of these predicates. Intuitively, it is clear that these implications can be used to infer facts such as sibling(cain,abel). To formalize this process so that it can be performed on a computer, care must be taken to define inference algorithms and to ensure that such algorithms indeed draw correct conclusions from a set of predicate calculus assertions. It turns out that the last two of the above sentences are “semantically equivalent” in the sense that each can be inferred from the other by means of established inference rules. We address the issue of inference rules and semantic equivalence in Section 2.3.
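As an informal illustration of how the implications can be used to infer sibling(cain,abel), here is a minimal forward-chaining sketch in Python. It assumes facts are encoded as (predicate, arg1, arg2) tuples and hard-codes the two rules rather than using a general inference engine, so it is not a formal inference algorithm of the kind the text calls for.

```python
# Facts from the genealogy, written as (predicate, arg1, arg2) tuples.
facts = {
    ("mother", "eve", "abel"), ("mother", "eve", "cain"),
    ("father", "adam", "abel"), ("father", "adam", "cain"),
}

def apply_rules(kb):
    """One round of the parent and sibling rules; returns only the new facts."""
    new = set()
    for (pred, x, y) in kb:
        # forall X,Y (father(X,Y) or mother(X,Y) -> parent(X,Y))
        if pred in ("father", "mother"):
            new.add(("parent", x, y))
    parents = [(x, y) for (pred, x, y) in kb if pred == "parent"]
    for (x1, y) in parents:
        for (x2, z) in parents:
            # forall X,Y,Z (parent(X,Y) and parent(X,Z) -> sibling(Y,Z))
            if x1 == x2:
                new.add(("sibling", y, z))
    return new - kb

def forward_chain(kb):
    """Apply the rules until no new facts appear (a fixed point)."""
    kb = set(kb)
    while True:
        new = apply_rules(kb)
        if not new:
            return kb
        kb |= new

closure = forward_chain(facts)
print(("sibling", "cain", "abel") in closure)  # True
```

Read literally, the sibling rule also yields sibling(cain,cain), since nothing in it requires Y and Z to differ.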

Many grammatically correct English sentences can be represented in the first-order predicate calculus using the symbols, connectives, and variable symbols defined in this section. It is important to note that there is no unique mapping of sentences into predicate calculus expressions; in fact, an English sentence may have any number of different predicate calculus representations. A major challenge for AI programmers is to find a scheme for using these predicates that optimizes the expressiveness and efficiency of the resulting representation. Examples of English sentences represented in predicate calculus are:

If it doesn’t rain on Monday, Tom will go to the mountains.
¬weather(rain,monday) → go(tom,mountains)

Emma is a Doberman pinscher and a good dog.
gooddog(emma) ∧ isa(emma,doberman), or maybe isa(emma,gooddog) ∧ isa(emma,doberman)

All basketball players are tall.
∀X(basketball_player(X) → tall(X))

Some people like anchovies.
∃X(person(X) ∧ likes(X,anchovies))

If wishes were horses, beggars would ride.
equal(wishes,horses) → ride(beggars)

Nobody likes taxes.
¬∃X likes(X,taxes)

The term “first-order” refers to the fact that individual variables are interpreted as ranging over the objects in an underlying domain of discourse. This distinguishes it from second-order systems, which allow variables to also range over predicates (and functions). For example, the expression

∀p p(object1,object2)

asserts that object1 and object2 are related by all binary predicates defined over the given domain. An interesting and important example of this is its use in formulating the theory of arithmetic. In first-order arithmetic, the principle of mathematical induction is formulated by adopting as axioms all formulas having the form:

p(0) ∧ ∀N(p(N) → p(N+1)) → ∀N p(N)

where p is any unary predicate symbol, 0 is interpreted as the natural number zero, N is a variable symbol that ranges over the domain of natural numbers, and + is interpreted as addition of numbers. Accordingly, since arithmetic has infinitely many unary predicates, there are infinitely many such axioms. In second-order arithmetic, this same principle is formulated by means of the single axiom

∀p[p(0) ∧ ∀N(p(N) → p(N+1)) → ∀N p(N)]

Thus the ability to quantify over predicates lends considerable expressive power. Some researchers (McCarthy 1968, Appelt 1985) have used higher-order languages to represent knowledge in natural language understanding programs. We will not need this for the present treatment, however.
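The single second-order axiom corresponds to one statement that quantifies over the predicate p itself. As a sketch in Lean 4 (our choice of notation, not the text's), p : Nat → Prop below is a variable ranging over all predicates on the natural numbers, so this one theorem covers every instance of the first-order schema; the name nat_induction is hypothetical.

```lean
-- p ranges over all predicates on Nat, so a single statement covers
-- every instance of the first-order induction schema.
theorem nat_induction (p : Nat → Prop)
    (h0 : p 0) (hs : ∀ n, p n → p (n + 1)) : ∀ n, p n := by
  intro n
  induction n with
  | zero => exact h0
  | succ k ih => exact hs k ih
```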

2.3.3 A “Blocks World” Example of Semantic Meaning


Here we digress to give an extended example of a truth value assignment to a set of predicate calculus expressions. Suppose we want to model the blocks world of Figure 11 to design, for example, a control algorithm for a robot arm. We can use predicate calculus sentences to represent the qualitative relationships in the world: does a given block have a clear top surface? can we pick up block a? etc. Assume that the computer has knowledge of the location of each block and the arm and is able to keep track of these locations (using three-dimensional coordinates) as the hand moves blocks about the table.

We must be very precise about what we are proposing with this “blocks world” example. First, we are creating a set of predicate calculus expressions that is to represent a static snapshot of the blocks world problem domain. As we will see in Section 2.3.4, this set of blocks offers an interpretation and a possible model for the set of predicate calculus expressions.

Second, the predicate calculus is declarative, that is, there is no assumed timing or order for considering each expression. Nonetheless, in the planning section of this book, Section 8.4, we will add a “procedural semantics”, or a clearly specified methodology for evaluating these expressions over time. A concrete example of a procedural semantics for predicate calculus expressions is Prolog. This situation calculus we are creating will introduce a number of issues, including the frame problem and the issue of non-monotonicity of logic interpretations, that will be addressed later in this book. For this example, however, it is sufficient to say that our predicate calculus expressions will be evaluated in a top-down and left-to-right fashion.



[Diagram: block c stacked on block a and block b stacked on block d, with a and d resting on the table]

on(c,a)

on(b,d)

ontable(a)

ontable(d)

clear(b)

clear(c)

hand_empty

Figure 11: A blocks world with its predicate calculus description.

To pick up a block and stack it on another block, both blocks must be clear. In Figure 11, block a is not clear. Because the arm can move blocks, it can change the state of the world and clear a block. Suppose it removes block c from block a and updates the knowledge base to reflect this by deleting the assertion on(c,a). The program needs to be able to infer that block a has become clear. The following rule describes when a block is clear:

∀X(¬∃Y on(Y,X) → clear(X))

That is, for all X, X is clear if there does not exist a Y such that Y is on X.


This rule not only defines what it means for a block to be clear but also provides a basis for determining how to clear blocks that are not. For example, block d is not clear, because if variable X is given value d, substituting b for Y makes on(b,d) true, so the premise ¬∃Y on(Y,d) is false. Therefore, to make the premise true, and so conclude clear(d), block b must be removed from block d. This is easily done because the computer has a record of all the blocks and their locations.
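The clear rule can be evaluated directly against the facts of Figure 11. The Python sketch below assumes that the knowledge base lists every on fact, so that the absence of on(Y,X) can be checked by search; the helper names `clear` and `blockers` are hypothetical.

```python
# Figure 11 as data: each pair (upper, lower) is one on(upper, lower) fact.
on = {("c", "a"), ("b", "d")}
blocks = {"a", "b", "c", "d"}

def clear(x, on_facts):
    """clear(X) holds when no Y satisfies on(Y, X)."""
    return not any(lower == x for (_upper, lower) in on_facts)

def blockers(x, on_facts):
    """The blocks Y with on(Y, X); these must be removed before clear(X) holds."""
    return [upper for (upper, lower) in on_facts if lower == x]

print(sorted(b for b in blocks if clear(b, on)))  # ['b', 'c']
print(blockers("d", on))  # ['b']

# The arm lifts c off a: delete on(c,a), and the rule now yields clear(a).
on.discard(("c", "a"))
print(clear("a", on))  # True
```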

Besides using implications to define when a block is clear, other rules may be added that describe operations such as stacking one block on top of another. For example: to stack X on Y, first empty the hand, then clear X, then clear Y, and then pick_up X and put_down X on Y, to wit:

∀X∀Y((hand_empty ∧ clear(X) ∧ clear(Y) ∧ pick_up(X) ∧ put_down(X,Y)) → stack(X,Y))

Note that in implementing the above description it is necessary to “attach” an action of the robot arm to each predicate such as pick_up. As noted previously, for such an implementation it was necessary to augment the semantics of predicate calculus by requiring that the actions be performed in the order in which they appear in a rule premise. However, much is gained by separating these issues from the use of predicate calculus to define the relationships and operations in the domain.
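A minimal sketch of such procedural attachment follows, with hypothetical Python functions standing in for the robot-arm actions and a dictionary standing in for the arm's record of block locations. Left-to-right evaluation comes from Python's short-circuiting `and`. For simplicity the conditions hand_empty, clear(X), and clear(Y) are only tested here; achieving them as subgoals (for example, by first moving a block off X) would require a planner.

```python
# Hypothetical world state for Figure 11: each block maps to what it rests on.
state = {"on": {"c": "a", "b": "d", "a": "table", "d": "table"}, "holding": None}

def hand_empty(s):
    return s["holding"] is None

def clear(s, x):
    """True when no block rests on x."""
    return x not in s["on"].values()

def pick_up(s, x):
    """Action attached to pick_up(X): move block x from its support into the hand."""
    del s["on"][x]
    s["holding"] = x
    return True

def put_down(s, x, y):
    """Action attached to put_down(X,Y): place the held block x on y."""
    s["on"][x] = y
    s["holding"] = None
    return True

def stack(s, x, y):
    """Evaluate the premise of the stack(X,Y) rule strictly left to right.

    Python's `and` short-circuits, so pick_up and put_down run only
    after hand_empty, clear(X) and clear(Y) have all succeeded.
    """
    return (hand_empty(s) and clear(s, x) and clear(s, y)
            and pick_up(s, x) and put_down(s, x, y))

print(stack(state, "a", "b"))  # False: c is on a, so clear(a) fails and nothing moves
print(stack(state, "c", "b"))  # True: c is picked up and put down on b
```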

Figure 11 gives a semantic interpretation of these predicate calculus expressions. This interpretation maps the constants and predicates in the set of expressions into a domain D, here the blocks and relations between them. The interpretation gives truth value T to each expression in the description. Another interpretation could be offered by a different set of blocks in another location, or perhaps by a team of four acrobats. The important question is not the uniqueness of interpretations, but whether the interpretation provides a truth value for all expressions in the set and whether the expressions describe the world in sufficient detail that all necessary inferences may be carried out by manipulating the symbolic expressions. The next section uses these ideas to provide a formal basis for predicate calculus inference rules.

2.3.4 Semantic Interpretations—Part 2

In Section 2.2.2 the concept of interpretation was defined for the closed sentences of a first-order language. Note that it makes intuitive sense that only closed formulas can be determined to be true or false. For example, in a first-order theory of arithmetic, a sentence such as

∀X∀Y(X=Y→Y=X)

would be true in the domain of natural numbers because the statement is clearly true for all possible choices of numbers for X and Y. However, the sentence

X=Y→Y=X

cannot have a truth value since the variables X and Y are indeterminate; without a specific designation for the variables, the sentence is neither true nor false. For this reason we extend the foregoing notion of interpretation to one that includes sentences that are not closed, as follows. Let I be an interpretation with domain D for a first-order language L, and let L(D) be the language, as defined in Section 2.3.2, obtained from L by adding a new constant symbol d̂ for each element d ∈ D. A sentence P of L is valid in I if:

1. P is closed and v_I(P) = T, or

2. P is not closed and v_I(P′) = T for all closed instances P′ of P in L(D), that is, for all sentences P′ obtained by substituting constants d̂ for the free variables of P.
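For a finite domain, condition 2 can be checked mechanically by trying every closed instance. This Python sketch assumes the first five natural numbers as a finite stand-in for the domain and encodes a sentence as a Python function of its free variables; `valid_open` is a hypothetical name.

```python
from itertools import product

def valid_open(sentence, variables, domain):
    """Condition 2: every closed instance of `sentence` is true.

    Substituting each element of `domain` for each free variable yields
    the closed instances; this is only a usable test when `domain` is finite.
    """
    return all(sentence(*values) for values in product(domain, repeat=len(variables)))

D = range(5)  # a finite stand-in for the natural numbers
# X = Y -> Y = X, written as (not X = Y) or (Y = X)
print(valid_open(lambda x, y: (x != y) or (y == x), ["X", "Y"], D))  # True
# X = Y -> X = 0 fails for the instance X = Y = 1
print(valid_open(lambda x, y: (x != y) or (x == 0), ["X", "Y"], D))  # False
```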

As with the propositional calculus, there are numerous axiomatizations for the predicate calculus. Examples may be found in the same references as given in Section 2.2. Here we need only know that, for any such axiomatization, there is a set of theorems as described in Section 2.1. A first-order theory T will be a formal logical system having a first-order language L(T) and some axioms and inference rules, thereby having a set of theorems expressed as sentences of L(T).

By a model of T is meant any interpretation of L(T) in which the theorems of T are valid. A simple example is provided by the discussion of the Blocks World in Section 2.3.3. If the propositions shown in the lower part of Figure 11 are taken as axioms for some first-order theory T (along with other axioms essential to all first-order theories), then the diagram shown in the upper part of the figure would be an interpretation in which all the given axioms are valid (in fact, true). Then, assuming the inference rules of T preserve truth (a hopefully safe assumption), all theorems derivable from these axioms will be valid in this same interpretation, in which case the interpretation is a model of T.

Just as L(T) can have many (typically infinitely many) different interpretations (cf. Section 2.3.2), a theory can have many (typically infinitely many) models. A sentence of L(T) is said to be logically valid if it is valid in every interpretation for L(T). A sentence of L(T) is said to be valid in T if it is valid in every model of T. Thus, by definition, logically valid sentences will be valid in any first-order theory T. Section 2.3.2 listed some sentences expressing relationships between the universal and existential quantifiers. It is not difficult to see that these are all logically valid. For example, consider the first one,

∀X p(X) ≡ ¬∃X ¬p(X)

Let T be any theory and I be any interpretation for the language L of T, where L contains the predicate symbol p. By definition of “sentence” for first-order languages, the above sentence will be a sentence in L. Let D be the domain associated with I. It is desired to verify that both sides of the expressed equivalence will always have the same truth value in I. Suppose first that v_I(∀X p(X)) = T. Then, by definition of “interpretation” (for the universal quantifier ∀), v_I(p(d̂)) = T for all d ∈ D. Then, by definition of “interpretation” (for the connective ¬), there is no d ∈ D for which v_I(¬p(d̂)) = T. Then, by definition of “interpretation” (for the existential quantifier ∃), v_I(∃X ¬p(X)) = F. Then, again by definition of “interpretation” (for the connective ¬), v_I(¬∃X ¬p(X)) = T. One can similarly argue in the reverse direction that, if v_I(¬∃X ¬p(X)) = T, then v_I(∀X p(X)) = T. This verifies the equivalence. The other equivalences in Section 2.3.2 can be verified by similar arguments.

Given these concepts, we have:

• Soundness/consistency: All theorems of T are valid in T.

• Adequacy/completeness: All sentences valid in T are theorems of T.

A first-order theory is inconsistent if it does not have a model. Typically, this will be because the theory allows for the derivation of a theorem having the form P ∧ ¬P. A sentence having this form is called a contradiction. Clearly, there is no interpretation that can make such a sentence valid (or true), so there can be no model. Note, however, that inconsistency is a purely semantic notion, whereas contradiction is a syntactic notion. A theory is consistent if it is not inconsistent, i.e., if it has a model.

Section 2.1 introduced the general notion of a semantically valid inference rule. For the purpose of employing formal logic in automated reasoning systems, it is important to be assured that all the inference rules being applied are valid. We here make this precise for the case of predicate logic and first-order theories. Let T be a first-order theory, let P be a sentence in the language of T, let S be a set of sentences of this same language, and let I be an interpretation for this language. Then I is said to satisfy P if P is valid in I, and I is said to satisfy S if it satisfies all the sentences in S. We say that a sentence P logically follows from a set of sentences S if every interpretation that satisfies S also satisfies P. Then an inference rule is sound if the conclusion logically follows from the premises. We shall say that two sentences P and Q are semantically equivalent if each logically follows from the other.

For the purposes of the ensuing discussion of expert reasoning systems and the Prolog programming language, we shall here elaborate a collection of valid sentences, sound inference rules, and logically valid reasoning patterns. First let us observe that, for any sentence P that contains some variables X1, …, Xn and which does not contain any quantifiers,

P is logically equivalent with ∀X1 … ∀Xn P.

This is easily verified as a consequence of the definitions of validity for the two types of sentences. It should be noted, however, that this is a fact about the semantics, and the two sentences typically will not be formally derivable from one another in the relevant first-order theory. This semantic equivalence happens to play an important role in the Prolog programming language, which does not employ quantifiers, by allowing sentences that contain variables to be interpreted as if all the variables are universally quantified.

A variable is free in a sentence P if it has an occurrence in P that does not lie within the scope of a quantifier on that variable. Another fact that will be useful is that, if X is not free in Q, then

∀X(P → Q) is logically equivalent with (∃X P) → Q.

This can be verified as follows:

∀X(P → Q)

∀X(¬P ∨ Q)        [A → B ≡ ¬A ∨ B]

(∀X ¬P) ∨ Q       [follows because X is not free in Q]

(¬∃X P) ∨ Q       [∀X ¬ is equivalent with ¬∃X]

(∃X P) → Q        [A → B ≡ ¬A ∨ B]

Here we have been able to employ formal equivalences determined earlier for all steps except the third, which requires a semantic argument. Suppose we have an interpretation I for which v_I(∀X(¬P ∨ Q)) = T. Then, by definition of interpretation, v_I((¬P ∨ Q)(d̂/X)) = T for all d ∈ D_I. But since X does not occur free in Q, this is equivalent with saying that v_I(¬P(d̂/X) ∨ Q) = T for all d ∈ D_I. If v_I(Q) = T, then v_I((∀X ¬P) ∨ Q) = T immediately. Otherwise v_I(¬P(d̂/X)) = T for every d ∈ D_I, so v_I(∀X ¬P) = T. Either way, v_I((∀X ¬P) ∨ Q) = T. The converse direction is argued similarly.
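The equivalence can also be spot-checked by brute force on a small domain. The sketch below assumes a hypothetical three-element domain, lets P range over every unary predicate on it, and treats Q, which does not contain X, as a fixed truth value; agreement here illustrates the argument but does not replace it.

```python
from itertools import product

D = [0, 1, 2]  # hypothetical finite domain

def implies(a, b):
    return (not a) or b

def check():
    # P ranges over every unary predicate on D; Q contains no X, so under
    # an interpretation it is simply True or False.
    for bits in product([False, True], repeat=len(D)):
        P = dict(zip(D, bits))
        for Q in (False, True):
            lhs = all(implies(P[d], Q) for d in D)   # forall X (P -> Q)
            rhs = implies(any(P[d] for d in D), Q)   # (exists X P) -> Q
            if lhs != rhs:
                return False
    return True

print(check())  # True
```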

Some inference rules that are sound are the following:

Modus Ponens: From P and P → Q infer Q.

Modus Tollens: From P → Q and ¬Q infer ¬P.

And Elimination: From P ∧ Q infer P; from P ∧ Q infer Q.

And Introduction: From P and Q infer P ∧ Q.

Universal Instantiation: From ∀X P infer P(t/X) for any term t.

Aristotelean Syllogism: From ∀X(P → Q) and P(t/X) infer Q(t/X) for any term t.

where the notations P(t/X) and Q(t/X) represent the result of substituting the term t for the variable X in P and Q.

The first four of the above rules are sound for both the propositional calculus and the predicate calculus. This can be easily established by inspecting the definitions of their respective interpretations for the relevant logical connectives. For example, the soundness of Modus Ponens can be seen from the interpretation of the connective →; in order to have both v_I(P) = T and v_I(P → Q) = T, it is necessary that v_I(Q) = T. This can also be seen by inspecting the truth table for → and observing that, in the only row where both P and P → Q are true, Q is also true.
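The truth-table argument can be mechanized for all four propositional rules. In this Python sketch (the helper `sound` is hypothetical), a rule is sound if no assignment to P and Q makes every premise true and the conclusion false. The last line checks affirming the consequent, a well-known fallacy that is not among the rules above, to show what an unsound rule looks like.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def sound(premises, conclusion):
    """A rule is sound if every row making all premises true makes the conclusion true."""
    for P, Q in product([True, False], repeat=2):
        if all(f(P, Q) for f in premises) and not conclusion(P, Q):
            return False
    return True

print(sound([lambda P, Q: P, lambda P, Q: implies(P, Q)], lambda P, Q: Q))          # Modus Ponens: True
print(sound([lambda P, Q: implies(P, Q), lambda P, Q: not Q], lambda P, Q: not P))  # Modus Tollens: True
print(sound([lambda P, Q: P and Q], lambda P, Q: P))                                 # And Elimination: True
print(sound([lambda P, Q: P, lambda P, Q: Q], lambda P, Q: P and Q))                 # And Introduction: True
print(sound([lambda P, Q: Q, lambda P, Q: implies(P, Q)], lambda P, Q: P))           # affirming the consequent: False
```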

The latter two of the above involve quantifiers and so only apply to the predicate calculus. The first of these follows by the foregoing discussion about the equivalence of ∀X1 … ∀Xn P and P. If ∀X P is valid in some interpretation, then by that discussion P (with free variable X) is valid, which means that, no matter what the term t might evaluate to, P(t/X) will be valid. The latter follows by a simple two-step argument which can be illustrated by the example from which the rule derives its name. The rule takes its name from the syllogisms of the philosopher Aristotle and is classically illustrated by the famous statement:

All men are mortal; Socrates is a man; therefore Socrates is mortal.

This can be formulated as

If ∀X(man(X)→ mortal(X)) and man(socrates), then mortal(socrates).

This can be proven to be sound by showing that the rule can be derived from two previous rules (which we assume have already been proven sound). First, from

∀X(man(X)→ mortal(X))

by Universal Instantiation we can infer

man(socrates)→ mortal(socrates)

i.e., replacing X with socrates. Then, from this and

man(socrates)

we can infer

mortal(socrates)

by Modus Ponens.
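The two-step derivation can be sketched in Python, assuming expressions are nested tuples and that the rule's variable X is implicitly universally quantified; `substitute` is a hypothetical helper implementing P(t/X).

```python
def substitute(expr, var, term):
    """Return expr with every occurrence of variable `var` replaced by `term` (E(t/X))."""
    if expr == var:
        return term
    if isinstance(expr, tuple):
        return tuple(substitute(e, var, term) for e in expr)
    return expr

# forall X (man(X) -> mortal(X)), stored as (premise, conclusion) with X implicit.
rule = (("man", "X"), ("mortal", "X"))
facts = {("man", "socrates")}

# Universal Instantiation with t = socrates ...
premise, conclusion = (substitute(part, "X", "socrates") for part in rule)
# ... then Modus Ponens: the instantiated premise is a known fact.
if premise in facts:
    facts.add(conclusion)

print(("mortal", "socrates") in facts)  # True
```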

The procedure in this proof, where we replaced X in man(X) → mortal(X) with socrates so that the premise of this inference will match the sentence man(socrates), has come to be called unification, and it plays an important role in our applications of formal logic to automated expert reasoning systems. We study this in depth in Section 2.5.

2.4 Knowledge Representation, Decidability, and Horn Logic

Due to the work of Whitehead and Russell, discussed in Chapter 1, it is known that first-order predicate logic is adequate for expressing all of standard set theory, and since set theory forms the basis of virtually all of mathematics (arithmetic, relational algebra, integral calculus, etc.), this means that the logic is adequate for formalizing the entirety of mathematical knowledge. This makes it especially compelling for purposes of knowledge representation. Anything that can be clearly articulated can be expressed in such a system.

Unfortunately, it happens that predicate logic is undecidable. This means that there is no algorithm that can determine, for an arbitrary sentence of the predicate calculus, whether that sentence is a theorem. The proof of this fact is quite complex, but is now textbook knowledge; for example, see Corollary 7.47 in (Hamilton 1988). This may be contrasted with the propositional calculus, which is decidable. By the soundness and adequacy theorems, Section 2.2, a proposition of the calculus is a theorem if and only if it is a tautology, and one can determine if a proposition is a tautology by constructing its truth table. Thus truth table construction serves as a decision procedure. For the predicate calculus, however, this is not possible. Intuitively, this may be understood as a consequence of the fact that a first-order theory can have infinitely many models, and it is impossible to determine algorithmically if a given sentence is valid in all such models.

In turn, this undecidability result means that the full first-order logic is not amenable to direct implementation in automated reasoning systems. However, if one restricts the language to a certain sublanguage of the full predicate calculus, then one does have decidability. This is due to a result by Alfred Horn (1951).

A literal is a positive (not negated) or negative (negated) atomic formula. That is, if P is atomic, both P and ¬P are literals. A Horn clause is a disjunction of literals, at most one of which is positive. Thus a Horn clause is a sentence of the form

¬P1 ∨ · · · ∨ ¬Pn ∨ P

or

¬P1 ∨ · · · ∨ ¬Pn

It is easy to see that the former is equivalent with

P1 ∧ · · · ∧ Pn → P

and the latter is equivalent with

P1 ∧ · · · ∧ Pn → false

Simply write out the truth tables for the two pairs of sentences and observe that, in each pair, the truth tables are the same.
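The truth-table comparison can be carried out mechanically for small n. This Python sketch (the helper names are hypothetical) checks both pairs for bodies of one to three atoms.

```python
from itertools import product

def same_truth_table(f, g, n):
    """True if f and g agree on all 2**n assignments to n atoms."""
    return all(f(v) == g(v) for v in product([True, False], repeat=n))

def check_horn_forms(max_body=3):
    for n in range(1, max_body + 1):
        # v[0..n-1] are the body atoms P1..Pn; v[n] is the head P.
        clause = lambda v: any(not a for a in v[:n]) or v[n]   # ¬P1 ∨ ... ∨ ¬Pn ∨ P
        rule = lambda v: (not all(v[:n])) or v[n]              # P1 ∧ ... ∧ Pn → P
        if not same_truth_table(clause, rule, n + 1):
            return False
        goal = lambda v: any(not a for a in v[:n])             # ¬P1 ∨ ... ∨ ¬Pn
        denial = lambda v: (not all(v[:n])) or False           # P1 ∧ ... ∧ Pn → false
        if not same_truth_table(goal, denial, n):
            return False
    return True

print(check_horn_forms())  # True
```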

Because of the fact discussed in Section 2.3.4 that

P is logically equivalent with ∀X1 … ∀Xn P.

one can treat a Horn clause as one that is universally quantified (i.e., has ∀ quantifiers for all its variables). In effect, this means that Horn clauses cannot involve existential quantifiers, and so their expressive capability is considerably weaker than that of the full predicate logic. Nonetheless, in practice it has been found that Horn clauses are sufficiently expressive to be useful in a broad range of applications. They in fact comprise the full syntax of the Prolog programming language, and they will be used in the ensuing discussion of expert reasoning systems.
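As a rough picture of how a Prolog-style system uses Horn clauses, the sketch below performs backward chaining over ground (variable-free) clauses, trying each clause body left to right. This is a simplification: real Prolog also unifies goals that contain variables. The clause set, which reuses the genealogy with its rules already instantiated, and the depth limit are assumptions for illustration.

```python
# Ground Horn clauses: head -> list of alternative bodies; a fact has an empty body.
program = {
    "parent(eve,abel)": [["mother(eve,abel)"]],
    "parent(eve,cain)": [["mother(eve,cain)"]],
    "mother(eve,abel)": [[]],
    "mother(eve,cain)": [[]],
    "sibling(cain,abel)": [["parent(eve,cain)", "parent(eve,abel)"]],
}

def prove(goal, depth=0, limit=50):
    """Backward chaining: a goal holds if some clause body for it can be proved
    left to right. The depth limit guards against looping clauses."""
    if depth > limit:
        return False
    for body in program.get(goal, []):
        if all(prove(sub, depth + 1, limit) for sub in body):
            return True
    return False

print(prove("sibling(cain,abel)"))  # True
print(prove("sibling(abel,cain)"))  # False: no clause has this head
```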

Being restricted to a language that does not feature the existential quantifier is a limitation, but it turns out that much can still be expressed, and many useful applications can be created, using only universally quantified (or free-variable) expressions. In addition, it is sometimes possible to replace an expression involving existential quantifiers with an equivalent one that does not. For example, it was noted in Section 2.3.4 that, if X is not free in Q, then

∀X(P → Q) is logically equivalent with (∃X P) → Q.

Thus the former can be taken in place of the latter.

In addition, one can sometimes remove existential quantifiers through a process known as Skolemization. This is named after the Norwegian mathematician Thoralf Skolem, who introduced this procedure in his study of arithmetic. The basic idea is that an existential quantifier that asserts that there is an object having some property can be removed by replacing the quantified variable with a function that produces the object whose existence is being asserted. For example, with mother read as in the genealogy example above, the expression ∀X∃Y mother(Y,X) asserts that for every X there is a Y such that Y is the mother of X (every X has a mother). Here the value of the existentially quantified variable Y depends on the value of X. In this case, Skolemization replaces Y with a term such as motherOf(X), where motherOf is a function symbol representing a function that returns the mother of whatever object is associated with the variable X. This yields the sentence ∀X mother(motherOf(X),X). In order to implement this process in an automated system, it is necessary that the function associated with the new function symbol be computable. In the present example, if the interpretation domain is small, not having many people, this could be accomplished by providing a database that associates people with their mothers. If the domain were the entire world, however, this is clearly not feasible. Thus, in general, it is not always possible to apply Skolemization effectively in automated systems.
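For a small domain, the Skolem function can indeed be a lookup table. This Python sketch assumes a two-person domain drawn from the genealogy example, with mother_facts holding pairs (M, X) read as in that example (M is the mother of X); the table `mother_of` plays the role of motherOf, and the check confirms that every X in the domain has the mother the table returns.

```python
# Hypothetical small domain where the Skolem function motherOf is a finite table.
mother_of = {"abel": "eve", "cain": "eve"}          # motherOf(X)
mother_facts = {("eve", "abel"), ("eve", "cain")}  # (M, X): M is the mother of X

domain = ["abel", "cain"]

# Every X in the domain has a mother, namely motherOf(X).
print(all((mother_of[x], x) in mother_facts for x in domain))  # True
```

Extending the domain to include adam or eve would require table entries the genealogy does not provide, which is a small-scale version of the feasibility problem noted above.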

While a restriction to Horn logic limits the expressive power of the language, it turns out that this logic is nonetheless adequate for many practical applications. As will be seen, it serves as the basis for the Prolog programming language.

2.5 Unification

To apply inference rules such as Modus Ponens, an inference system must be able to determine when two expressions are the same or match. In propositional calculus, this is trivial: two expressions match if and only if they are syntactically identical. In predicate calculus, the process of matching two sentences is complicated by the existence of variables in the expressions. Universal instantiation allows universally quantified variables to be replaced by terms from the domain. This requires a decision process for determining the variable substitutions under which two or more expressions can be made identical (usually for the purpose of applying inference rules, like Modus Ponens).

Unification is an algorithm for determining the substitutions needed to make two predicate calculus expressions match. We have already seen this done in the previous subsection, where socrates in man(socrates) was substituted for X in ∀X(man(X) → mortal(X)). This allowed the application of Modus Ponens and the conclusion mortal(socrates). Another example of unification was seen previously when dummy variables were discussed. Because p(X) and p(Y) are equivalent, Y may be substituted for X to make the sentences match.

Unification and inference rules such as Modus Ponens allow us to make inferences on a set of logical assertions, where the assertions are expressed as first-order sentences. We shall refer to such a set of sentences as a logical database. In order to perform the inferencing, the logical database must be expressed in an appropriate form. An essential aspect of this form is the requirement that all sentences be logically equivalent to Horn clauses. Thus all variable occurrences are free. This allows full freedom in computing substitutions.

Recall from the foregoing that a sentence with free variables is semantically equivalent with the sentence obtained by applying universal quantifiers to those variables. Also recall that, under some circumstances, it is possible to eliminate an existential quantifier via Skolemization or by replacing the sentence with an equivalent one that does not contain the quantifier. Once the existentially quantified variables have been removed from a logical database, unification may be used to match sentences in order to apply inference rules such as Modus Ponens.

Unification is complicated by the fact that a variable may be replaced by any term, including other variables and function expressions of arbitrary complexity. These expressions may themselves contain variables. For example, father(jack) may be substituted for X in man(X) to infer that jack’s father is mortal.

Some instances of the expression

foo(X,a,goo(Y))

generated by legal substitutions are given below:

1. foo(fred,a,goo(Z))

2. foo(W,a,goo(jack))

3. foo(Z,a,goo(moo(Z)))

In this example, the substitution instances or unifications that would make the initial expression identical to each of the other three are written as the sets:

1. {fred/X,Z/Y}

2. {W/X,jack/Y}

3. {Z/X,moo(Z)/Y}

As in the foregoing, the notation E/X indicates that the expression E is substituted for the variable X in the original expression.
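The substitution sets above can be applied mechanically. This Python sketch assumes terms are nested tuples whose first element is the functor, with uppercase strings as variables (a Prolog-like convention); `apply_subst` is a hypothetical name.

```python
# Terms as nested tuples, e.g. ("foo", "X", "a", ("goo", "Y")) for foo(X,a,goo(Y)).
def apply_subst(term, subst):
    """Apply a substitution given as {variable: expression}, i.e. {E/X, ...}.

    All variables are replaced simultaneously; the inserted expressions
    are not themselves re-substituted.
    """
    if isinstance(term, tuple):
        return tuple(apply_subst(t, subst) for t in term)
    return subst.get(term, term)

expr = ("foo", "X", "a", ("goo", "Y"))
print(apply_subst(expr, {"X": "fred", "Y": "Z"}))        # ('foo', 'fred', 'a', ('goo', 'Z'))
print(apply_subst(expr, {"X": "W", "Y": "jack"}))        # ('foo', 'W', 'a', ('goo', 'jack'))
print(apply_subst(expr, {"X": "Z", "Y": ("moo", "Z")}))  # ('foo', 'Z', 'a', ('goo', ('moo', 'Z')))
```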

In defining the unification algorithm that computes the substitutions required to match two expressions, a number of issues must be taken into account. First, although a constant may be systematically substituted for a variable, any constant is considered a “ground instance” and may not be replaced. Neither can two different ground instances be substituted for one variable; one must have uniform substitution of the same ground instance for the same variable throughout the expression. Second, a variable cannot be unified with a term containing that variable. For example, X cannot be replaced by p(X), as this leads to an infinite expression: p(p(p(p(...X)...))) (i.e., the automated routine that performs the substitution would fall into an infinite recursion). The test for this situation is called the occurs check. Strictly speaking, the occurs check should be performed before each substitution is attempted. As this adds considerable computational overhead, however, it is customary for the occurs check to be dropped. This is justified on the grounds that an experienced programmer would know how to write logical databases in such a manner that this particular problem would not arise.
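A compact unification sketch in Python, under an assumed encoding (nested tuples with the functor first, uppercase strings as variables, lowercase strings as constants). It returns a substitution as a dict, or None when the expressions cannot be matched, and performs the occurs check unless `occurs_check=False` is passed. The function names are hypothetical; this illustrates the technique rather than the algorithm of any particular system.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s until reaching an unbound term."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Occurs check: does variable v appear inside term t (under s)?"""
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, x, s) for x in t)
    return False

def unify(a, b, s=None, occurs_check=True):
    """Return a substitution (dict) making a and b identical, or None."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        if occurs_check and occurs(a, b, s):
            return None
        return {**s, a: b}
    if is_var(b):
        return unify(b, a, s, occurs_check)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s, occurs_check)
            if s is None:
                return None
        return s
    return None  # distinct constants, or mismatched functors/arity

print(unify(("foo", "X", "a", ("goo", "Y")), ("foo", "fred", "a", ("goo", "Z"))))
# {'X': 'fred', 'Y': 'Z'}
print(unify("X", ("p", "X")))  # None: rejected by the occurs check
```

With occurs_check=False, unify("X", ("p", "X")) instead returns {'X': ('p', 'X')}, a circular binding: any routine that tries to apply it fully would recurse forever, which is the problem the occurs check prevents.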

Furthermore, a problem-solving process often requires multiple inferences and, consequently, multiple successive unifications. Logic problem solvers must maintain consistency of variable substitutions. As with ground instances, discussed above, it is important that any unifying substitution be made consistently across all occurrences within the scope of the variable in both expressions being matched. This was seen before when socrates was substituted not only for the variable X in man(X) but also for the variable X in mortal(X).

Once a variable has been replaced, future unifications and inferences must take this same replacement into account. If a variable is replaced by a constant, that variable may not be replaced by any other expression in a future unification. If a variable Y is substituted for another variable X and at a later time Y is replaced by a constant, then this is regarded as having replaced X with that constant. The set of substitutions used in a sequence of inferences is important, because it may contain the answer to a query (Section 14.2.5). For example, if p(a,X) unifies with the premise of p(Y,Z)→q(Y,Z) with substitution set {a/Y,X/Z}, giving p(a,X)→q(a,X), Modus Ponens lets us infer q(a,X) under the same substitution. If we unify this result with the premise of q(W,b)→r(W,b) by means of the substitution set {a/W,b/X}, we infer r(a,b).

Another important concept is the composition of unification substitutions. If S and S′ are two substitution sets, then the composition of S and S′ (written SS′) is obtained by applying S′ to the elements of S and adding the results to S′. This procedure goes as follows.1 For each substitution E1/E2 in S′ for which there exist one or more substitutions E2/E3,1, . . . , E2/E3,n in S, add the substitutions E1/E3,1, . . . , E1/E3,n to SS′. Next, add to SS′ all substitutions in S that were not among the substitutions E2/E3,1, . . . , E2/E3,n just described. Finally, add all the substitutions in S′ to SS′. As an example, consider the task of composing the following two substitution sets:

{X/Y,W/Z}{V/X}

with the former being S and the latter being S′. Following the procedure, we see that we have V/X in S′ and X/Y in S, so we put V/Y in SS′. Then we add W/Z to SS′ because it was not a substitution of the kind we just used. Finally, we add the contents of S′ to SS′. This gives

{V/Y, W/Z, V/X}

To continue with this example, suppose now that we wish to compose this substitution set with the set

{a/V, f(b)/W}.

Taking S = {V/Y, W/Z, V/X} and S′ = {a/V, f(b)/W}, we see that we have a/V in S′ and V/Y in S, so we add a/Y to SS′. In addition, observe that we have a/V in S′ with V/X in S, so we add a/X to SS′. Furthermore, we have f(b)/W in S′ and W/Z in S, so we add f(b)/Z to SS′. There are no further substitutions to consider in S. So last we add the contents of S′ to SS′. This gives

{a/Y, a/X, f(b)/Z, a/V, f(b)/W}

It can be seen that, if we follow the same procedure to first compose {V/X} with {a/V,f(b)/W}, we get

{a/X, a/V, f(b)/W}.

Then if we again apply the procedure to compose {X/Y,W/Z} with the above, we get

{a/Y, f(b)/Z, a/X, a/V, f(b)/W}

which is equal to the SS′ obtained above. This shows that whether we compose the substitution sets

{X/Y,W/Z}{V/X}{a/V,f(b)/W}

in the order of left to right or in the order of right to left, we get the same result. In general, composition is associative but not commutative. The exercises present these issues in more detail. Composition is the method by which unification substitutions are combined and returned in the recursive function unify, presented next.
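Composition can be sketched in Python as below, again under the hypothetical dict representation ({E/X} becomes {X: E}). The compose function follows the usual textbook definition: it applies S′ to every expression in S, adds only those bindings of S′ whose variable S does not already bind, and drops trivial bindings such as X/X. On every example in this section it yields the same sets as the procedure above. It checks the worked example in both orders.

```python
def is_variable(term):
    return isinstance(term, str) and term[:1].isupper()


def apply_subst(subst, term):
    if is_variable(term):
        return subst.get(term, term)
    if isinstance(term, tuple):
        return tuple(apply_subst(subst, t) for t in term)
    return term


def compose(s, s_prime):
    """Return the composition SS' of two substitutions given as {var: expr}."""
    # Apply S' to the expression part of every substitution in S.
    result = {var: apply_subst(s_prime, e) for var, e in s.items()}
    # Add the substitutions of S' for variables that S does not bind.
    for var, e in s_prime.items():
        if var not in s:
            result[var] = e
    # Drop trivial bindings such as X/X.
    return {var: e for var, e in result.items() if var != e}


s1 = {"Y": "X", "Z": "W"}                 # {X/Y, W/Z}
s2 = {"X": "V"}                           # {V/X}
s3 = {"V": "a", "W": ("f", "b")}          # {a/V, f(b)/W}

left_to_right = compose(compose(s1, s2), s3)
right_to_left = compose(s1, compose(s2, s3))
print(compose(s1, s2))                    # dict form of {V/Y, W/Z, V/X}
print(left_to_right == right_to_left)     # True: same result either way
```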

A further requirement of the unification algorithm is that the unifier be as general as possible: that the most general unifier be found. This is important, as will be seen in the next example, because, if generality is lost in the solution process, it may lessen the scope of the eventual solution or even eliminate the possibility of a solution entirely.

For example, in unifying p(X) and p(Y), any constant substitution such as {fred/X,fred/Y} will work. However, {fred/X,fred/Y} is not the most general unifier; any variable would produce a more general expression, e.g., {Z/X,Z/Y}. The solutions obtained from the first substitution instance would always be restricted by having the constant fred limit the resulting inferences; i.e., {fred/X,fred/Y} would be a unifier, but it would lessen the generality of the result.

1 It may be convenient to imagine that the substitutions are being applied to some expression E, in which case ESS′ suggests that one first applies S to E, getting some expression E′, and then applies S′ to E′.


A most general unifier (mgu) for a set of expressions E is any substitution set G such that, if the substitution set S is a unifier for E, then there exists a substitution set S′ such that S = GS′. Continuing with the foregoing example, take E = {p(X), p(Y)}, S = {fred/X, fred/Y}, and G = {Z/X, Z/Y}. Then taking S′ = {fred/Z} gives GS′ = {fred/X, fred/Y, fred/Z}, which agrees with S on the variables X and Y of E (the extra substitution fred/Z concerns only the new variable Z).
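The mgu condition for this example can be checked mechanically: applying GS′ to each expression in E gives the same result as applying S. The sketch repeats the hypothetical helpers from the composition sketch so that it runs on its own.

```python
def is_variable(term):
    return isinstance(term, str) and term[:1].isupper()


def apply_subst(subst, term):
    if is_variable(term):
        return subst.get(term, term)
    if isinstance(term, tuple):
        return tuple(apply_subst(subst, t) for t in term)
    return term


def compose(s, s_prime):
    result = {var: apply_subst(s_prime, e) for var, e in s.items()}
    result.update({var: e for var, e in s_prime.items() if var not in s})
    return {var: e for var, e in result.items() if var != e}


E = [("p", "X"), ("p", "Y")]
S = {"X": "fred", "Y": "fred"}            # {fred/X, fred/Y}
G = {"X": "Z", "Y": "Z"}                  # {Z/X, Z/Y}
S_prime = {"Z": "fred"}                   # {fred/Z}

GS_prime = compose(G, S_prime)
print([apply_subst(G, e) for e in E])     # G unifies E: both become p(Z)
print([apply_subst(GS_prime, e) for e in E] == [apply_subst(S, e) for e in E])  # True
```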

The most general unifier for a set of expressions is unique except for alphabetic variations; i.e., whether a variable is eventually called X or Y does not make any difference to the generality of the resulting unifications.

Unification is important for any artificial intelligence problem solver that uses the predicate calculus for representation. Unification specifies conditions under which two (or more) predicate calculus expressions may be said to be equivalent. This allows use of inference rules, such as Resolution, with logic representations, a process that often requires backtracking to find all possible interpretations.

We next present pseudo-code for a function, unify, that can compute the unifying substitutions (when this is possible) between two predicate calculus expressions. Unify takes as arguments two expressions in the predicate calculus and returns either the most general set of unifying substitutions or the constant FAIL if no unification is possible. It is defined as a recursive function: first, it recursively attempts to unify the initial components of the expressions. If this succeeds, any substitutions returned by this unification are applied to the remainder of both expressions. These are then passed in a second recursive call to unify, which attempts to complete the unification. The recursion stops when either argument is a symbol (a predicate, function name, constant, or variable) or the elements of the expression have all been matched.

To simplify the manipulation of expressions, the algorithm assumes a slightly modified syntax. Because unify simply performs syntactic pattern matching, it can effectively ignore the predicate calculus distinction between predicates, functions, and arguments. By representing an expression as a list (an ordered sequence of elements) with the predicate or function name as the first element followed by its arguments, we simplify the manipulation of expressions. Expressions in which an argument is itself a predicate or function expression are represented as lists within the list, thus preserving the structure of the expression. Lists are delimited by parentheses, and list elements are separated by spaces. This syntax is borrowed from the LISP programming language. Examples of expressions in both predicate calculus (PC) and list syntax include:

PC SYNTAX                      LIST SYNTAX
p(a,b)                         (p a b)
p(f(a),g(X,Y))                 (p (f a) (g X Y))
equal(eve,mother(cain))        (equal eve (mother cain))
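The translation from predicate calculus syntax to list syntax is mechanical. This minimal Python sketch rests on assumptions of our own (names consist only of letters, digits, and underscores; there are no zero-argument function calls; the input is well formed). It parses a PC expression into nested Python lists and prints the result back in list syntax. parse_pc and to_list_syntax are hypothetical names.

```python
import re


def parse_pc(text):
    """Parse e.g. 'p(f(a),g(X,Y))' into ['p', ['f', 'a'], ['g', 'X', 'Y']]."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[(),]", text)
    pos = 0

    def parse():
        nonlocal pos
        name = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                      # skip "("
            items = [name]
            while True:
                items.append(parse())     # parse one argument
                separator = tokens[pos]   # either "," or ")"
                pos += 1
                if separator == ")":
                    return items
        return name                       # a plain symbol

    return parse()


def to_list_syntax(expr):
    """Render a nested list in LISP-style list syntax."""
    if isinstance(expr, list):
        return "(" + " ".join(to_list_syntax(e) for e in expr) + ")"
    return expr


for pc in ["p(a,b)", "p(f(a),g(X,Y))", "equal(eve,mother(cain))"]:
    print(pc, "->", to_list_syntax(parse_pc(pc)))
```

Each printed line pairs a row of the table above, e.g. p(f(a),g(X,Y)) -> (p (f a) (g X Y)).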

We next present the function unify:

function unify(E1, E2);
  begin
    case
      both E1 and E2 are constants or the empty list:      %recursion stops
        if E1 = E2 then return {}
        else return FAIL;
      E1 is a variable:
        if E1 = E2 then return {};                          %a variable unifies with itself
        if E1 occurs in E2 then return FAIL                 %occurs check
        else return {E2/E1};
      E2 is a variable:
        if E2 occurs in E1 then return FAIL                 %occurs check
        else return {E1/E2};
      either E1 or E2 is a constant or the empty list:     %lists of different sizes,
        return FAIL;                                        %or a constant against a list
      otherwise:                                            %both E1 and E2 are lists
        begin
          HE1 := first element of E1;
          HE2 := first element of E2;
          SUBS1 := unify(HE1, HE2);
          if SUBS1 = FAIL then return FAIL;
          TE1 := apply(SUBS1, rest of E1);
          TE2 := apply(SUBS1, rest of E2);
          SUBS2 := unify(TE1, TE2);
          if SUBS2 = FAIL then return FAIL
          else return composition(SUBS1, SUBS2)
        end
    end                                                     %end case
  end                                                       %end unify
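For readers who want to experiment, here is one possible Python rendering of unify. It is a sketch under assumptions of our own: list-syntax expressions are Python tuples (so (p a b) is ("p", "a", "b") and the empty list is ()), strings beginning with an uppercase letter are variables, other strings are constants, substitutions are dicts mapping a variable to its replacement, and None stands for FAIL. The helper names are hypothetical, and compose uses the usual definition (bindings of the second set are kept only for variables that the first set does not bind).

```python
FAIL = None  # plays the role of the constant FAIL


def is_variable(e):
    return isinstance(e, str) and e[:1].isupper()


def is_constant_or_empty(e):
    return e == () or (isinstance(e, str) and not is_variable(e))


def occurs_in(var, e):
    """Occurs check: does var appear anywhere inside e?"""
    return e == var or (isinstance(e, tuple) and any(occurs_in(var, x) for x in e))


def apply_subst(subst, e):
    if is_variable(e):
        return subst.get(e, e)
    if isinstance(e, tuple):
        return tuple(apply_subst(subst, x) for x in e)
    return e


def compose(s1, s2):
    result = {v: apply_subst(s2, t) for v, t in s1.items()}
    result.update({v: t for v, t in s2.items() if v not in s1})
    return {v: t for v, t in result.items() if v != t}


def unify(e1, e2):
    """Return the most general unifier of e1 and e2 as a dict, or FAIL."""
    if is_constant_or_empty(e1) and is_constant_or_empty(e2):  # recursion stops
        return {} if e1 == e2 else FAIL
    if is_variable(e1):
        if e1 == e2:
            return {}
        return FAIL if occurs_in(e1, e2) else {e1: e2}          # occurs check
    if is_variable(e2):
        return FAIL if occurs_in(e2, e1) else {e2: e1}          # occurs check
    if is_constant_or_empty(e1) or is_constant_or_empty(e2):
        return FAIL        # a constant against a list, or lists of different sizes
    subs1 = unify(e1[0], e2[0])                 # unify the first elements
    if subs1 is FAIL:
        return FAIL
    tail1 = apply_subst(subs1, e1[1:])          # apply SUBS1 to the rest
    tail2 = apply_subst(subs1, e2[1:])
    subs2 = unify(tail1, tail2)
    if subs2 is FAIL:
        return FAIL
    return compose(subs1, subs2)


print(unify(("parents", "X", ("father", "X"), ("mother", "bill")),
            ("parents", "bill", ("father", "bill"), "Y")))
```

On the call traced in the example that follows, it prints {'X': 'bill', 'Y': ('mother', 'bill')}, the dict form of {bill/X, (mother bill)/Y}.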

A Unification Example

The behavior of the preceding algorithm may be clarified by tracing the call

unify((parents X (father X) (mother bill)), (parents bill (father bill) Y)).

When unify is first called, because neither argument is an atomic sentence, the function will attempt to recursively unify the first elements of each expression, calling

unify(parents, parents).

This unification succeeds, returning the empty substitution, {}. Applying this to the remainder of the expressions creates no change; the algorithm then calls

unify((X (father X) (mother bill)), (bill (father bill) Y)).

A tree depiction of the execution at this stage appears in Figure 12.

In the second call to unify, neither expression is atomic, so the algorithm separates each expression into its first component and the remainder of the expression. This leads to the call

unify(X, bill).

This call succeeds, because both expressions are atomic and one of them is a variable. The call returns the substitution {bill/X}. This substitution is applied to the remainder of each expression and unify is called on the results, as in Figure 13:

unify(((father bill) (mother bill)), ((father bill) Y)).

The result of this call is to unify (father bill) with (father bill). This leads to the calls

unify(father, father)

unify(bill, bill)

unify(( ), ( ))


All of these succeed, returning the empty set of substitutions, as seen in Figure 14. Unify is then called on the remainder of the expressions:

unify(((mother bill)), (Y)).

This, in turn, leads to calls

unify((mother bill), Y)

unify(( ),( )).

In the first of these, (mother bill) unifies with Y. Notice that unification substitutes the whole structure (mother bill) for the variable Y. Thus, unification succeeds and returns the substitution {(mother bill)/Y}. The call

unify(( ),( ))

returns {}. All the substitutions are composed as each recursive call terminates, to return the answer {bill/X, (mother bill)/Y}. A trace of the entire execution appears in Figure 14. Each call is numbered to indicate the order in which it was made; the substitutions returned by each call are noted on the arcs of the tree.
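The numbering in Figure 14 is simply the order in which the recursive calls are made. The sketch below is a self-contained variant of a Python unify under the same assumed representation (tuples for lists, uppercase strings for variables, None for FAIL). It records each call in a list before processing it and then prints the calls in list syntax. All names are hypothetical.

```python
def is_variable(e):
    return isinstance(e, str) and e[:1].isupper()


def is_constant_or_empty(e):
    return e == () or (isinstance(e, str) and not is_variable(e))


def occurs_in(var, e):
    return e == var or (isinstance(e, tuple) and any(occurs_in(var, x) for x in e))


def apply_subst(subst, e):
    if is_variable(e):
        return subst.get(e, e)
    if isinstance(e, tuple):
        return tuple(apply_subst(subst, x) for x in e)
    return e


def compose(s1, s2):
    result = {v: apply_subst(s2, t) for v, t in s1.items()}
    result.update({v: t for v, t in s2.items() if v not in s1})
    return {v: t for v, t in result.items() if v != t}


def unify(e1, e2, calls):
    """Like unify, but appends (e1, e2) to calls, so call n is calls[n - 1]."""
    calls.append((e1, e2))
    if is_constant_or_empty(e1) and is_constant_or_empty(e2):
        return {} if e1 == e2 else None
    if is_variable(e1):
        if e1 == e2:
            return {}
        return None if occurs_in(e1, e2) else {e1: e2}
    if is_variable(e2):
        return None if occurs_in(e2, e1) else {e2: e1}
    if is_constant_or_empty(e1) or is_constant_or_empty(e2):
        return None
    subs1 = unify(e1[0], e2[0], calls)
    if subs1 is None:
        return None
    subs2 = unify(apply_subst(subs1, e1[1:]), apply_subst(subs1, e2[1:]), calls)
    return None if subs2 is None else compose(subs1, subs2)


def show(e):
    """Render an expression in list syntax, e.g. (father bill)."""
    return "(" + " ".join(show(x) for x in e) + ")" if isinstance(e, tuple) else e


calls = []
unify(("parents", "X", ("father", "X"), ("mother", "bill")),
      ("parents", "bill", ("father", "bill"), "Y"), calls)
for n, (a, b) in enumerate(calls, start=1):
    print(f"{n}. unify({show(a)}, {show(b)})")
```

It prints thirteen numbered calls, from 1. unify((parents X (father X) (mother bill)), (parents bill (father bill) Y)) down to 13. unify((), ()), in the same order as the numbering in Figure 14.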



Figure 12: Initial steps in the unification of (parents X (father X) (mother bill)) and (parents bill (father bill) Y).






Figure 13: Further steps in the unification of (parents X (father X) (mother bill)) and (parents bill (father bill) Y).





Figure 14: Final trace of the unification of (parents X (father X) (mother bill)) and (parents bill (father bill) Y).

2.6 Application: A Logic-Based Financial Advisor

As a final example of the use of predicate calculus to represent and reason about problem domains, we design a financial advisor using predicate calculus. Although this is a simple example, it illustrates many of the issues involved in realistic applications.

The function of the advisor is to help a user decide whether to invest in a savings account or the stock market. Some investors may want to split their money between the two. The investment that will be recommended for individual investors depends on their income and the current amount they have saved according to the following criteria:

1. Individuals with an inadequate savings account should always make increasing the amount saved their first priority, regardless of their income.

2. Individuals with an adequate savings account and an adequate income should consider a riskier but potentially more profitable investment in the stock market.

3. Individuals with a lower income who already have an adequate savings account may want to consider splitting their surplus income between savings and stocks, to increase the cushion in savings while attempting to increase their income through stocks.

The adequacy of both savings and income is determined by the number of dependents an individual must support. Our rule is to have at least $5,000 in savings for each dependent. An adequate income must be a steady income and supply at least $15,000 per year plus an additional $4,000 for each dependent.
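As a plain arithmetic check of these criteria, before any logic is involved, the guidelines can be read procedurally. This Python sketch is our own illustration, not the advisor developed below. The function names are hypothetical, and it reads "at least" as ≥, whereas the predicate calculus rules below use the strict greater relation (the two readings differ only at the exact thresholds).

```python
def min_savings(dependents):
    return 5000 * dependents            # $5,000 per dependent


def min_income(dependents):
    return 15000 + 4000 * dependents    # $15,000 plus $4,000 per dependent


def advise(saved, income, steady, dependents):
    """Apply the three criteria directly; returns savings, stocks, or combination."""
    savings_adequate = saved >= min_savings(dependents)
    income_adequate = steady and income >= min_income(dependents)
    if not savings_adequate:
        return "savings"          # criterion 1: build up savings first
    if income_adequate:
        return "stocks"           # criterion 2
    return "combination"          # criterion 3


print(advise(saved=22000, income=25000, steady=True, dependents=3))
```

For the investor used later in this section (three dependents, $22,000 saved, a steady income of $25,000), it prints combination.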

To automate this advice, we translate these guidelines into sentences in the predicate calculus. The first task is to determine the major features that must be considered. Here, they are the adequacy of the savings and the income. These are represented by the predicates savings_account and income, respectively. Both of these are unary predicates, and their argument could be either of the constants adequate or inadequate. Thus,

savings_account(adequate).

savings_account(inadequate).

income(adequate).

income(inadequate).

are their possible values.

Conclusions are represented by the unary predicate investment, with possible values of its argument being stocks, savings, or combination (implying that the investment should be split).

In Section 2.3.2 it was noted that there can be many different ways to translate English language expressions into the predicate calculus. Here we may note that another quite natural approach to representing the above facts would be to employ variables Savings_account, Income, and Investment, introduce an equality predicate equals, and, for example, write

equals(Savings_account, adequate)

to express the first of the above four statements. This, however, would require building in axioms and rules for managing the equality predicate, which, although certainly feasible, would add considerable complexity to the overall system. The current approach of taking the truth of savings_account(adequate) as expressing the same fact greatly simplifies the reasoning. Similar considerations apply to the other predicates employed in the following.


Using these predicates, the different investment strategies are represented by implications. The first rule, that individuals with inadequate savings should make increased savings their main priority, is represented by

savings_account(inadequate) → investment(savings).

Similarly, the remaining two possible investment alternatives are represented by

savings_account(adequate) ∧ income(adequate) → investment(stocks).

savings_account(adequate) ∧ income(inadequate) → investment(combination).

It is assumed that the reasoning pertains to some unnamed individual and makes use of certain facts about that person. In particular, the individual will have a certain number of dependents, a certain amount of money in savings, and a certain amount of earnings, where these are all given as numeric values, except that income has an additional qualifier as being steady or unsteady.

The advisor must determine when savings and income of the individual in question are adequate or inadequate. This will also be done using an implication. The need to do arithmetic calculations requires the use of functions. We are here assuming that we have at least enough arithmetic to perform the necessary extralogical computations. To determine the minimum adequate savings, the function minsavings is defined as

minsavings(X) = 5000 × X.

where X is the number of the individual’s dependents. We also assume that we have a predicate dependents where dependents(X) becomes true when X is replaced with the correct number of the individual’s dependents. This enables us to have the adequacy of savings be determined by the rules

∀X(amount_saved(X) ∧ ∃Y(dependents(Y) ∧ greater(X, minsavings(Y)))
→ savings_account(adequate)).

∀X(amount_saved(X) ∧ ∃Y(dependents(Y) ∧ ¬greater(X, minsavings(Y)))
→ savings_account(inadequate)).

In words, the first rule says that, for all possible amounts saved X for the individual in question, if there is a Y such that Y is the number of the individual’s dependents, and X is greater than the minimum savings required for that number of dependents, then the savings are adequate. (greater is interpreted as the arithmetic > relation.) The second rule may be interpreted similarly.

A function minincome is defined as

minincome(X) = 15000 + (4000 × X).

where X is the number of the individual’s dependents. Analogously to savings_account, the adequacy or inadequacy of the individual’s income may be determined by the rules:

∀X(earnings(X, steady) ∧ ∃Y(dependents(Y) ∧ greater(X, minincome(Y)))
→ income(adequate)).

∀X(earnings(X, steady) ∧ ∃Y(dependents(Y) ∧ ¬greater(X, minincome(Y)))
→ income(inadequate)).

∀X(earnings(X, unsteady) → income(inadequate)).

In order to perform a consultation, a description of a particular investor is added to this set of predicate calculus sentences using the predicates amount_saved, earnings, and dependents. Thus, an individual with three dependents, $22,000 in savings, and a steady income of $25,000 would be described by


amount_saved(22000).

earnings(25000, steady).

dependents(3).

This yields a logical system consisting of the following sentences:

1. savings_account(inadequate) → investment(savings).

2. savings_account(adequate) ∧ income(adequate) → investment(stocks).

3. savings_account(adequate) ∧ income(inadequate) → investment(combination).

4. ∀X(amount_saved(X) ∧ ∃Y(dependents(Y) ∧ greater(X, minsavings(Y)))
→ savings_account(adequate)).

5. ∀X(amount_saved(X) ∧ ∃Y(dependents(Y) ∧ ¬greater(X, minsavings(Y)))
→ savings_account(inadequate)).

6. ∀X(earnings(X, steady) ∧ ∃Y(dependents(Y) ∧ greater(X, minincome(Y)))
→ income(adequate)).

7. ∀X(earnings(X, steady) ∧ ∃Y(dependents(Y) ∧ ¬greater(X, minincome(Y)))
→ income(inadequate)).

8. ∀X(earnings(X, unsteady) → income(inadequate)).

9. amount_saved(22000).

10. earnings(25000, steady).

11. dependents(3).

where minsavings(X) = 5000 × X and minincome(X) = 15000 + (4000 × X).

Now note that, by the fact established in Section 2.3.4 that, if X is not free in Q, then ∀X(P → Q) is logically equivalent to ∃XP → Q, one can remove the existential quantifier ∃Y in each of items 4 through 7 by prefixing the sentence with the universal quantifier ∀Y. (Because Y does not occur in amount_saved(X) or earnings(X, steady), the ∃Y may first be taken to govern the whole premise.) For example, item 4 becomes

∀X∀Y(amount_saved(X) ∧ (dependents(Y) ∧ greater(X, minsavings(Y)))
→ savings_account(adequate)).
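The equivalence used here can be spot-checked by brute force. The Python sketch below is an illustration only, not a proof: it takes a three-element domain (an arbitrary choice), tries every interpretation of a unary predicate P and both truth values of a sentence Q in which Y is not free, and confirms that ∀Y(P(Y) → Q) and (∃Y P(Y)) → Q always agree.

```python
from itertools import product

DOMAIN = range(3)  # an arbitrary small domain {0, 1, 2}


def forall_form(P, Q):
    """Truth value of ∀Y (P(Y) → Q)."""
    return all((not P[y]) or Q for y in DOMAIN)


def exists_form(P, Q):
    """Truth value of (∃Y P(Y)) → Q."""
    return (not any(P[y] for y in DOMAIN)) or Q


checked = 0
for P in product([False, True], repeat=len(DOMAIN)):  # every extension of P
    for Q in (False, True):
        assert forall_form(P, Q) == exists_form(P, Q)
        checked += 1
print(checked, "interpretations checked")
```

With three domain elements there are 8 interpretations of P and 2 of Q, so it reports 16 interpretations checked.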


Next note that, by the fact established in Section 2.3.4 that P is logically equivalent to ∀X1 . . . ∀Xn P, where X1, . . . , Xn are the free variables of P, one can drop all the universal quantifiers and obtain a semantically equivalent system, namely

1. savings_account(inadequate) → investment(savings).

2. savings_account(adequate) ∧ income(adequate) → investment(stocks).

3. savings_account(adequate) ∧ income(inadequate) → investment(combination).

4. amount_saved(X) ∧ (dependents(Y) ∧ greater(X, minsavings(Y)))
→ savings_account(adequate).

5. amount_saved(X) ∧ (dependents(Y) ∧ ¬greater(X, minsavings(Y)))
→ savings_account(inadequate).

6. earnings(X, steady) ∧ (dependents(Y) ∧ greater(X, minincome(Y)))
→ income(adequate).

7. earnings(X, steady) ∧ (dependents(Y) ∧ ¬greater(X, minincome(Y)))
→ income(inadequate).

8. earnings(X, unsteady) → income(inadequate).

9. amount_saved(22000).

10. earnings(25000, steady).

11. dependents(3).

This set of logical sentences describes the problem domain. It is an example of what was previously referred to as a logical database. It may also be referred to as a knowledge base. The procedure for employing this set to make a recommendation for the individual whose savings, earnings, and number of dependents have been provided goes as follows. Starting with item 1 and proceeding downward through the list, we try to validate the premises of a rule so that we can apply Modus Ponens to derive the rule’s conclusion, and whenever this happens, we add the derived conclusion to the bottom of the list.

Thus, to begin, we note that none of the premises in items 1, 2, or 3 can presently be validated, since we do not currently have any explicit information in our knowledge base that states whether these premises are true or false. Thus consideration goes to item 4, where an attempt is made to validate the expression amount_saved(X). It turns out that this can be done by unification with item 9, substituting the number 22000 in place of the variable X. Having done this, attention next turns to the expression dependents(Y). This can be unified with item 11, by substituting 3 in place of Y. Then minsavings(Y) can be computed with 3 in place of Y, giving the value 15000. Now the greater predicate can be evaluated with 22000 in place of X and 15000 as the value of minsavings(3); since 22000 > 15000, the last expression in the rule’s condition is satisfied. Thus Modus Ponens can be applied, yielding the conclusion, which can be added to our knowledge base as

12. savings_account(adequate).

Continuing downward to item 5, we find that not all the expressions in the premise can be satisfied, so this rule is skipped and attention turns to item 6. earnings(X, steady) can be unified with item 10 by substituting 25000 for X. However, minincome for three dependents evaluates to 15000 + (4000 × 3) = 27000, and the amount put in place of X is not greater than this, so the premises cannot be satisfied, and attention turns next to item 7. Here the premises can be satisfied through unification with items 10 and 11, yielding the conclusion

13. income(inadequate).

Processing continues through the remainder of the knowledge base, with no actions to perform, and then cycles back to the top. Here, the premise of item 1 is not satisfied, so attention turns to item 2. Here, savings_account(adequate) unifies with item 12, but income(adequate) is not satisfied. So finally, attention turns to item 3, where the two premises unify with items 12 and 13, and the conclusion


14. investment(combination).

is added to the knowledge base. This is the answer to the original query.

This example illustrates how predicate calculus may be used to reason about a realistic problem, drawing correct conclusions by applying inference rules to the initial problem description. We have not discussed exactly how an algorithm can determine the correct inferences to make to solve a given problem or the way in which this can be implemented on a computer. These topics are presented in Chapters 3, 4, and 6.

2.7 Epilogue and References

In this chapter we introduced predicate calculus as a representation language for AI problem solving. The symbols, terms, expressions, and semantics of the language were described and defined. Based on the semantics of predicate calculus, we defined inference rules that allow us to derive sentences that logically follow from a given set of expressions. We defined a unification algorithm that determines the variable substitutions that make two expressions match, which is essential for the application of inference rules. We concluded the chapter with the example of a financial advisor that represents financial knowledge with the predicate calculus and demonstrates logical inference as a problem-solving technique.

Predicate calculus is discussed in detail in a number of computer science books, including: The Logical Basis for Computer Programming by Zohar Manna and Richard Waldinger (1985), Logic for Computer Science by Jean H. Gallier (1986), Symbolic Logic and Mechanical Theorem Proving by Chin-liang Chang and Richard Char-tung Lee (1973), and An Introduction to Mathematical Logic and Type Theory by Peter B. Andrews (1986). We present more modern proof techniques in Chapter 14, Automated Reasoning.

Books that describe the use of predicate calculus as an artificial intelligence representation language include: Logical Foundations of Artificial Intelligence by Michael Genesereth and Nils Nilsson (1987), Artificial Intelligence by Nils Nilsson (1998), The Field of Automated Reasoning by Larry Wos (1995), Computer Modelling of Mathematical Reasoning by Alan Bundy (1983, 1988), and Readings in Knowledge Representation by Ronald Brachman and Hector Levesque (1985). See Automated Reasoning by Bob Veroff (1997) for interesting applications of automated inference. The Journal of Automated Reasoning (JAR) and the Conference on Automated Deduction (CADE) cover current topics.

Additional References

Mendelson, Elliot, Introduction to Mathematical Logic, Third Edition, Wadsworth & Brooks/Cole,1987.

Hamilton, A. G., Logic for Mathematicians, Revised Edition, Cambridge University Press, 1988.

Shoenfield, Joseph R., Mathematical Logic, Association for Symbolic Logic, 1967.

Horn, Alfred, On sentences which are true of direct unions of algebras, Journal of Symbolic Logic, 16 (1) (1951) 14–21.

2.8 Exercises

1. Using truth tables, prove the Propositional Calculus identities of Section 2.2.

2. A new operator, ⊕, or exclusive-or, may be defined by the following truth table (Figure 15). Create a propositional calculus expression using only ¬, ∨, and ∧ that is equivalent to P ⊕ Q. Prove their equivalence using truth tables.




P   Q   P ⊕ Q
T   T   F
T   F   T
F   T   T
F   F   F

Figure 15: The exclusive-or operator.

3. The logical operator “↔” is read “if and only if” (similarly to ≡). P ↔ Q is defined as being equivalent to (P → Q) ∧ (Q → P). Based on this definition, show that P ↔ Q is logically equivalent to (P ∨ Q) → (P ∧ Q):

(a) By using truth tables.

(b) By a series of substitutions using the Propositional Calculus identities of Section 2.2.

4. Prove that implication is transitive in the propositional calculus, that is, that ((P → Q) ∧ (Q → R)) → (P → R).

5. (a) Prove that Modus Ponens is sound for propositional calculus. Hint: use truth tables to enumerate all possible interpretations.

(b) Abduction is an inference rule that infers P from P → Q and Q. Show that abduction is not sound (give an example where it fails).

(c) Show that Modus Tollens, from P → Q and ¬Q infer ¬P, is sound (here also use truth tables).

6. Attempt to unify the following pairs of expressions. Either show their most general unifiers or explain why they will not unify.

(a) p(X,Y) and p(a,Z)

(b) p(X,X) and p(a,b)

(c) ancestor(X,Y) and ancestor(bill,father(bill))

(d) ancestor(X,father(X)) and ancestor(david,george)

(e) q(X) and ¬q(a)

7. (a) Compose the substitution sets {a/X, Y/Z} and {X/W, b/Y}.

(b) Prove that composition of substitution sets is associative (just illustrate this with an example; the general proof is quite complex).

(c) Construct an example to show that composition is not commutative.

8. Implement the unify algorithm of Section 2.5 in the computer language of your choice.

9. Give two alternative interpretations for the blocks world description of Figure 11.


10. Jane Doe has four dependents, a steady income of $30,000, and $15,000 in her savings account. Add the appropriate predicates describing her situation to the general investment advisor of the example in Section 2.6 and perform the unifications and inferences needed to determine her suggested investment.

11. Write a set of logical predicates that will perform simple automobile diagnostics (e.g., if the engine won’t turn over and the lights won’t come on, then the battery is bad). Don’t try to be too elaborate, but cover the cases of bad battery, out of gas, bad spark plugs, and bad starter motor.

12. The following story is from N. Wirth’s (1976) Algorithms + Data Structures = Programs.

I married a widow (let’s call her W) who has a grown-up daughter (call her D). My father (F), who visited us quite often, fell in love with my step-daughter and married her. Hence my father became my son-in-law and my step-daughter became my mother. Some months later, my wife gave birth to a son (S1), who became the brother-in-law of my father, as well as my uncle. The wife of my father, that is, my step-daughter, also had a son (S2).

Using predicate calculus, create a set of expressions that represent the situation in the above story. Add expressions defining basic family relationships such as the definition of father-in-law and use modus ponens on this system to prove the conclusion that “I am my own grandfather.”

