iulg.sitehost.iu.edu · contents chapter 1. introduction 1 1.1. logic for natural language, logic...

Logics for Natural Language Inference

(expanded version of lecture notes from a course

at ESSLLI 2010)

Lawrence S. Moss

November 2010

Contents

Chapter 1. Introduction 11.1. Logic for Natural Language, Logic in Natural Language 1

Chapter 2. Basic Syllogistic Proof Systems 52.1. The Simplest Fragment “of All” 52.2. All and Some 102.3. All and No 142.4. A Logic for S 162.5. Additional Exercises 17

Chapter 3. Syllogistic Logic with Complements: S† 213.1. Background: Completeness of Basic Sentential Logics 213.2. Adding Sentential Boolean Operations to L(all, some) 273.3. Syntax and Semantics of S† 303.4. Completeness via Representation of Orthoposets 333.5. Adding Boolean Connectives on Sentences: S

†bc 38

Chapter 4. Cardinality Comparisons 414.1. Adding ∃≥ to the Boolean Syllogistic Fragment 434.2. Digression: Most 454.3. On the Numerical Syllogistic 47

Chapter 5. Verbs: R 495.1. A Logic for R 525.2. There is no Finite Syllogistic Logic for R 595.3. An Alternative for the Positive Fragment 615.4. Incorporating Background Facts, I 64

Chapter 6. Relative Clauses: RC 676.1. RC: an Indirect System 676.2. Complexity Results for RC, R†, and RC† 706.3. Complete Subsystems 746.4. Incorporating Background Facts, II 75

Chapter 7. Comparative Adjectives 777.1. Requiring Transitivity in R 787.2. Requiring Transitivity and Irreflexivity in RC 80

Chapter 8. Logic Beyond the Aristotle Border 898.1. Fitch-style Proof System for RC† 898.2. Adding Transitivity: L(adj) 100

iii

iv CONTENTS

Chapter 9. Monotonicity and Polarity 1079.1. Introduction 1079.2. Background: Categorial Grammar 1099.3. Background: Preorders and their Opposites 1119.4. Higher-order Terms over Preorders 1149.5. Examples and Discussion 117

Bibliography 123

Index 127

CHAPTER 1

Introduction

1.1. Logic for Natural Language, Logic in Natural Language

Logic is one of the oldest subjects in the western intellectual tradition. It wasinitiated by Aristotle and it served alongside grammar and rhetoric as a compo-nent of the medieval trivium. It has an illustrious contemporary connections totopics as diverse as rhetoric, theoretical computer science, and infinity. It is todaypursued by philosophers, mathematicians, and computer scientists. In more detail,philosophers have always been interested in the central themes of logic: reasoning,representation, meaning, and proof. Around 1900, the field of logic became heavilymathematical, and indeed today there are whole fields of mathematics which areoffshoots of logic. Part of the mathematical interest of logic came about by presen-tations of logic as a foundation of mathematics, and the same is happening todaywith computer science. Indeed, almost every area of computer science makes useof logic, just as most areas of physics end up using branches of mathematics suchas the calculus. And so computer science is a vital source of work in logic.

The specific focus of these notes is the development of areas of logic with an eyetowards natural language. Although logic often is presented as a study of reasoningin language, its main contemporary thrusts are to applications in computer science,mathematics, or philosophy. For example, students of logic might learn that Allcats are animals might be symbolized (∀x)(cat(x) → animal(x)) and that All wholove all animals love all cats might be symbolized as

(∀y)(((∀x)(animal(x)→ love(y, x)))→ ((∀x)(cat(x)→ love(y, x)))) .

In a good first course, the same students might even learn that the second sentencefollows logically from the first. But this notion of “follows logically” is definedentirely on the logical representations, not on the English sentences with which westarted. The interaction of logic and language is a neglected area, and so our topicarea is outside the mainstream of work in logic. But our view is that the topic isof vital interest in several fields. In a sense, the project goes back to Aristotle’ssyllogisms. However, we have in mind a much broader and deeper set of researchquestions: whereas syllogistic reasoning only applies to arguments in very restrictedforms, we would like to study reasoning as close to “surface forms” (real sentences)as possible. Instead of limiting attention to categorical (yes/no) reasoning, weeventually hope to develop connections to default reasoning and reasoning underuncertainty. However, these are not treated in these notes.

These notes present logical systems that attempt to model aspects of naturallanguage inference. Most of these are listed on the map on the next page, but tokeep the chart readable I have left off several of them. Indeed, to keep the notesthemselves readable, I have tried to keep the chapters fairly short by concentrating

1

2 1. INTRODUCTION

Aristotle

Church-Turin

g

Peano-Frege

S

S†

S≥ S≥ adds |p| ≥ |q|R

RC

RC(tr)RC(tr, opp)

R†

RC†

RC†(tr)

RC†(tr, opp)

FOL

FO2 + trans

FO2

first-order logic

FO2 + “R is trans”

2 variable FO logic

† adds full N -negation

RC = R + relative clauses

R = relational syllogistic

RC + (transitive)comparative adjs

RC(tr) + opposites

S + full N -negation

S: all/some/no p are q

on the main ideas in the logical systems instead of presenting all the details, andalso by omitting some proofs entirely.

The chart mostly consists of logical languages, each with an intended semantics.For example, FOL at the top is first-order logic. (Very short explanations of thesystems appear to the right of the vertical line.) In the chart, the lines going downmean “subsystem,” and sometimes the intention is that we allow a translationpreserving the intended semantics. So all of the systems in the chart may beregarded as subsystems of FOL with the exception of S≥.

The smallest system in the chart is S, a system even smaller than the classicalsyllogistic. The syntax of S is quite impoverished and contains only sentences ofthe kinds All p are q and Some p are q, where p and q are variables. It is our intentthat these variables be interpreted as plural common nouns of English, and that amodel M be an arbitrary set M together with interpretations of the variables bysubsets of M .

1.1. LOGIC FOR NATURAL LANGUAGE, LOGIC IN NATURAL LANGUAGE 3

Moving up, S≥ adds additional sentences of the from there are at least as manyp as q. The additions are not expressible in FOL, and we indicate this by settingS≥ on the “outside” of the “Peano-Frege Boundary.”

The language S† adds full negation on nouns to S. For example, one can sayall p are q with the intended reading “no p are q.” One can also say All p are q,and this goes beyond what is usually done in the syllogistic logic. The use of the† notation will be maintained throughout these notes as a mark of systems whichcontain complete noun-level negation.

Moving up the chart, let us discuss the systems R, RC, R†, and RC†. Thesystem R extends S by adding verbs, interpreted as arbitrary relations. (The ‘R’stands for ‘relation.’) This system and the others in this paragraph originate inPratt-Hartmann and Moss [29]. So R would contain Some dogs chase no cats, andthe yet larger system RC would contain relative clauses as exemplified in All wholove all animals love all cats. The languages with the dagger such as S† and R†

are further enrichments which allow subject nouns to be negated. This is ratherun-natural in standard speech, but it would be exemplified in sentences like Everynon-dog runs. The point: the dagger fragments are beyond the Aristotle boundaryin the sense that they cannot be treated by the relatively simpler syllogistic logics.The only known logical systems for them use variables in a key way. The linemarked “Aristotle” separates the logical systems above the line, systems whichcan be profitably studied on their own terms without devices like variables overindividuals, from those which cannot.

Continuing, the chart continues with the additions of comparative adjectivephrases with the systems RC(tr) and RC†(tr).

In addition to the logical systems on the charts, the notes go further by con-sidering logical systems that directly attempt to

Previous work. There are two main predecessor projects to the proposed project.The first is a line of work begun by Johan van Benthem on a calculus of monotonic-ity. It would take us too far afield to present the basic ideas here, but perhaps anexample would suffice. From All cats are animals, we might infer by monotonicitythat Someone who owns a cat owns an animal. To inference of All who love allcats love all animals is by monotonicity in the downward direction. The originalidea is to propose a systematic account of monotonicity inferences in language, anaccount which works on something closer to real sentences than to logic represen-tations. This is a goal which we share in these notes, but the bulk of our work isnot concentrated on the monotonicity calculus.

The second predecessor project studies logics for fragments of language. Manyof the results The chart above is a summary of results in several papers in thearea, written by the proposer and also in a key paper jointly with Pratt-Hartmann.Listed on the chart are a number of tiny pieces of English with names like S, R,etc.

Origin of these notes. These notes are an expanded and modified version ofthe notes for my ESSLLI 2010 course. My intention was, and still is, to presentmuch of what I know about logical systems for linguistic reasoning and to outlinewhat I think are important research topics and directions. The notes are based onpapers on logical systems for natural language inference that were written in thepast years, put together with a uniform notation, point of view, and tone.

For slides of my lectures, please see

4 1. INTRODUCTION

http://logicforlanguage.blogspot.com/

My lectures generally did not cover everything in the slides, and the slides alsocover only a part of what is in these notes. I hope the notes will be useful for twodifferent kinds of readers: First, people who want it should be able to get the bigpicture without being lost in a sea of details. Second, those who really want to diveinto that sea should be able to do so without drowning. For this, I have augmentedthe proofs with more motivation and examples than one would find in a journalpublication, and at the same time I have relegated parts of long proofs to exercises.

Much of these notes is concerned with logical completeness results. We’ll havemore to say about what these are and why we are interested in them in Chapter 2.However, we are also interested in issues of computational complexity pertainingto the logical systems themselves. And here is a version of our map that presentsthe results.

Aristotle

Church-Turin

g

S

S†

BML(tr)EXPTIMELutz & Sattler 2001

in co-NEXPTIME

R

RC

RC(tr)RC(tr, opp)

R†

RC†

RC†(tr)

RC†(tr, opp)

FOL

FO2 + trans

FO2

undecidableChurch 1936Gradel, Otto, Rosen 1999

Co-NEXPTIMEGradel, Kolaitis, Vardi ’97EXPTIMEPratt-Hartmann 2004

Co-NPMcAllester & Givan 1992

lower bounds also open

NLOGSPACE

As you can see, this complexity chart mentions several logical languages that arenot present on the first chart.

CHAPTER 2

Basic Syllogistic Proof Systems

2.1. The Simplest Fragment “of All”

We begin our notes with the simplest logical fragment whatsoever. The sen-tences are all of the form All p are q. So we have a very impoverished language,with only one kind of sentence. But we’ll have a precise semantics, a proof system,and a soundness/completeness theorem which relates the two.

2.1.1. Syntax and semantics. For the syntax, we start with a collection Pof unary atoms (for nouns). We write these as p, q, . . ., Then the sentences of thisfirst fragment are the expressions

All p are q

where p and q are any atoms in P. We call this language L(all)1.The semantics is based on models.

Definition 2.1 A model M for this fragment L(all) is a structure

M = (M, [[ ]])

consisting of a set M , together with an interpretation [[p]] ⊆M for each noun p ∈ P.The main semantic definition is truth in a model :

M |= All p are q iff [[p]] ⊆ [[q]]

We read this in various ways, such as M satisfies All p are q, or All p are q is truein M.

From this definition, we get two further notions: If Γ is a set of sentences, wesay that M |= Γ iff M |= ϕ for every ϕ ∈ Γ. Finally, we say that Γ |= ϕ iff wheneverM |= Γ, also M |= ϕ. We read this as Γ logically implies ϕ, or Γ semanticallyimplies ϕ, or that ϕ is a semantic consequence of Γ.

Example 2.2 Here is an example of the semantics. Let M = {1, 2, 3, 4, 5}.Let [[n]] = ∅, [[p]] = {1, 3, 4}, and [[q ]] = {1, 3}. This is all we need to specify a modelM. Then the following sentences are true in M: All n are n, All n are p, All n areq, All q are p, and All q are q. (In the first two of these example sentences, we usethe fact that the empty set is a subset of every set.) The other four sentences usingn, p, and q are false in M.

1Note that L(all) is really a family of languages, one for each set P of nouns at the outset.This dependence on primitives is true for practically all logical systems. For most of the work inthese notes, we suppress mention of the primitive syntactic items because it makes the notation

lighter and because we rarely need to call attention to the primitives in the first place.

5

6 2. BASIC SYLLOGISTIC PROOF SYSTEMS

All p are pAll p are n All n are q

All p are q

Figure 2.1. The logic of All p are q.

Example 2.3 Here is an example of a semantic consequence which can beexpressed in L(all): We claim that

{All p are q,All n are p} |= All n are q.

To see this, we give a straightforward mathematical proof in natural language. LetM be any model for L(all), assuming that the underlying set P contains n, p, andq. Assume that M satisfies All p are q and All n are p. We must prove that M

also satisfies All n are q. From our first assumption, [[p]] ⊆ [[q ]]. From our second,[[n]] ⊆ [[p]]. It is a general fact about sets that the inclusion relation (written as ⊆here) is transitive, and so we conclude that [[n]] ⊆ [[q ]]. This verifies that indeed M

satisfies All n are q. And since M was arbitrary, we are done.

Example 2.4 Finally, we have an example of a failure of semantic consequence:

{All p are q} 6|= All q are p.

To show that a given set Γ does not logically entail another sentence ϕ, we need tobuild a model M of Γ which is not a model of ϕ. In this example, Γ is {All p are q},and ϕ is All q are p. We can get a model M that does the trick by settingM = {1, 2},[[p]] = {1}, and [[q ]] = {1, 2}. For that matter, we could also use a different model,say N, defined by N = {61}, [[p]] = ∅, and [[q ]] = {61}.

2.1.2. Proof system. We construct a proof system for this language basedon the rules shown in Figure 2.1 and the definition of a proof tree.

Definition 2.5 A proof tree over Γ is a finite tree T whose nodes are labeledwith sentences, and each node is either an element of Γ, or comes from its parent(s)by an application of one of the rules.

Γ ` ϕ means that there is a proof tree T for over Γ whose root is labeled ϕ.We read this as Γ proves ϕ, or Γ derives ϕ, or that ϕ follows in our proof systemfrom Γ.

Example 2.6 Here is an example, chosen to make some several points: Let Γbe

{All l are m,All q are l,All m are p,All n are p,All l are q}Let ϕ be All q are p. Here is a proof tree showing that Γ ` ϕ:

All q are l

All l are m All m are mAll l are m All m are p

All l are pAll q are p

Note that all of the leaves belong to Γ except for one: that is All m are m. Note alsothat some elements of Γ are not used as leaves. This is permitted according to ourdefinition. The proof tree above shows that Γ ` ϕ. Also, there is a smaller proof

2.1. THE SIMPLEST FRAGMENT “OF ALL” 7

tree that does this, since the use of All m are m is not really needed. (The reasonwhy we allow leaves to be labeled like this is so that that we can have one-elementtrees labeled with sentences of the form All l are l.)

The main theoretical question for this chapter is: what is the relation thesemantic notion Γ |= S with the proof-theoretic notion Γ ` S? This kind ofquestion will present itself for all of the logical systems in this course. Probably thefirst piece of work for you is to be sure you understand the question.

Lemma 2.7 (Soundness) If Γ ` ϕ, then Γ |= ϕ.

Proof By strong induction on the number of nodes of proof trees T over Γ.If T is a tree with one node, let ϕ be the label. Either ϕ belongs to Γ, or else ϕis of the form All p are p. In the first case, every model satisfying every sentencein Γ clearly satisfies ϕ, as ϕ belongs to Γ. And in the second case, every modelwhatsoever satisfies ϕ.

Let’s suppose that we know our result for all proof trees over Γ with less thann nodes, and let T be a proof tree over Γ with n nodes. The argument breaks intocases depending on which rule is used at the root. Suppose the root and its parentsare labeled

All p are n All n are qAll p are q

Let T1 and T2 be the subtrees ending at All p are q and All q are n. Then T1 andT2 are proof trees over Γ themselves. For some unary atom q, the root of T1 islabeled All p are q, and the root of T2 is labeled All q are n. Now T1 and T2 bothhave fewer nodes than T. By our induction hypothesis, Γ |= All p are q, and alsoΓ |= All q are n. We claim that Γ |= All p are n. To see this, take any model M

in which all sentences in Γ are true. Then [[p]] ⊆ [[q ]] by our first point above. And[[q ]] ⊆ [[n]] by second. So [[p]] ⊆ [[n]] by transitivity of the inclusion relation on sets.Since the model M here is arbitrary, we conclude that Γ |= All p are n. �

So at this point, we know that our logic is sound: If we have a tree showingthat Γ ` ϕ, then ϕ follows semantically from Γ. This means that the formal logicalsystem is not going to give us any bad results. Now this is a fairly weak point. If wedropped some of the rules, it would still hold. Even if we decided to be conservativeand say that Γ ` ϕ never holds, the soundness fact would still be true. So the moreinteresting question to ask is whether the logical system is strong enough to proveeverything it conceivably could prove. We want to know if Γ |= ϕ implies thatΓ ` ϕ. If this implication does hold for all Γ and ϕ, then we say that our system iscomplete!logical system. Here is a summary of our definitions.

Definition 2.8 A proof system is sound if Γ ` ϕ implies that Γ |= ϕ.The same proof system is complete if the converse holds: if Γ |= ϕ implies that

Γ ` ϕ.

Before we turn to the completeness proof itself, we make a general definitionthat will reappear throughout these notes.

Definition 2.9 Let Γ be a set of sentences in any fragment containing All.Define2 u ≤ v to mean that

(2.1) Γ ` All u are v.

2It is important to note that this relation ≤ is defined in terms of Γ, but that the notation

≤ hides this dependence. We could write it as ≤Γ, and this might make it easier on readers the


Proposition 2.10 ≤ is a preorder: it is reflexive and transitive relation onthe set P of unary atoms.

Proof The reflexivity comes from the fact that we permit single-node trees tobe proofs of statements All u are u. For the transitivity, assume that u ≤ v ≤ w.Then we have proof trees for All u are v and for All v are w. all of whose leavesare in Γ. Putting the trees together and using (Barbara) at the root gives a prooftree showing that Γ ` All u are w. This means that u ≤ w, as desired. �

We are going to use Proposition 2.10 frequently in these notes, and often with-out explicitly mentioning it.

Example 2.11 Let

Γ =

All j are k, All j are l, All k are l,All l are k, All l are m, All k are n,All m are q, All p are q, All q are p

Then the associated preorder ≤ is shown below:

p, q

m n

k, l

��

???????

j

For example, since j ≤ k, we draw j below k. Note also that k ≤ l and l ≤ k. Weindicate this in the picture by situating k and l together.

Theorem 2.12 The logic of Figure 2.1 is complete for L(all).

Proof Suppose that Γ |= All p are q. Build a model M, taking M to be theset of unary atoms. The rest of the structure is given by down-sets: [[u]] =↓u. Thatis,

[[u]] = {v : u ≤ v}.

Claim M |= Γ.

Proof Suppose that All u are v belongs to Γ; that is u ≤ v. We need to showthat [[u]] ⊆ [[v ]]. But this is just a general fact about down-sets in preorders: ifx ∈ [[u]], then x ≤ u. So by transitivity, x ≤ v. �

first time around. But we prefer the more uncluttered notation, and so we drop the symbol Γ.

This point will recur throughout these notes, because practically all of our logical systems haveAll, and we therefore use a notation like u ≤ v frequently, always with a fixed set Γ of sentences

in the background.

2.1. THE SIMPLEST FRAGMENT “OF ALL” 9

Recall that we are assuming that Γ |= All p are q, and that we are proving thatΓ ` All p are q. Hence for the p and q in our statement, [[p]] ⊆ [[q ]]. But we have aone-point tree showing that Γ ` All p are p. This goes to show that p ∈ [[p]]. And sop ∈ [[q ]]; this means that p ≤ q. But this is exactly what we want: Γ ` All p are q.

This completes the proof. �

Example 2.13 We continue a discussion begun in Example 2.11. The modelM obtained following the proof of Theorem 2.12 has M = {j, k, l,m, n, p, q} andalso

[[j ]] = {j}[[k ]] = {j, k, l}[[l ]] = {j, k, l}[[m]] = {j, k, l,m}

[[n]] = {j, k, l, n}[[p]] = {j, k, l,m, p, q}[[q ]] = {j, k, l,m, p, q}

One can check that M |= Γ. But in contrast, consider a sentence which is not asemantic consequence of Γ, such as All n are k. We can check that [[n]] 6⊆ [[k ]] in ourmodel M. The point is that our proof gives something stronger than completeness:it gives a single model M |= Γ such that for all ϕ such that Γ 6` ϕ, M 6|= ϕ. Thatis, the same model falsifies all sentences which do not follow from Γ.

2.1.3. Comments. There are a few comments to be made at this point. Theseare based on questions which I have received in teaching this material, or similarmaterial to bright students.

(1) Doesn’t this proof confuse syntax and semantics? This is a good question,since we are building a model out of the “material from the syntax” (in this case, theunary atoms). But when one thinks about it, there is nothing wrong with buildinga model out of chairs, numbers, abstract objects, or even the same objects that weused in the syntax. It is more interesting that the interpretation function [[ ]] in themodel was defined in terms of a syntactic notion, the proof system. Again, we arefree to define the semantics of a model any way we like. It is interesting that weprove completeness in this way. But in a sense it should not be such a surprise. Forcompleteness is about a relation between syntax and semantics, and so it makessense that it involves a single structure that has aspects of both.

(2) I thought that the semantic assertion Γ |= ϕ meant that all models of Γ areagain models of ϕ. How is it that we only argue completeness on one particularmodel rather than many models? It’s true that our semantic assertion Γ |= ϕ is astatement about all models. But this does not mean that in proving something weneed to use all the models. In a sense, the question can be turned around to makean observation: the model M we built from a set Γ is as “bad” as any model couldpossibly be! It makes true any sentence which does not follow from Γ. So once wehave built a single model that covers for all the models, we can use that one modelto prove completeness.

2.1.4. A Stronger Result. Theorem 2.12 proves the completeness of thelogical system. But it doesn’t give us all the information we would need to have anefficient procedure to decide whether or not Γ ` ϕ in this fragment.3

3The reason is that we still have to examine all possible models on a one element set in order

to check whether Γ |= ϕ or not. It might seem at first glance that there are very few such models.But if the sequent Γ |= ϕ contains k atoms, then there are 2k models to consider. The questionof efficient decidability is for us the question of whether a polynomial-time algorithm exist. For

that, we need to do further work.


Some p are qSome q are p

Some p are qSome p are p

All q are n Some p are qSome p are n

Figure 2.2. The logic of Some and All, in addition to the logic of All.

Theorem 2.14 Let Γ be any set of sentences in L(all), let �∗ be defined fromΓ as above. Let p and q be any atoms. Then the following are equivalent:

(1) Γ ` All p are q.(2) Γ |= All p are q.(3) p �∗ q.

Proof (1)=⇒(2) is by soundness, and (3)=⇒(1) is by induction on �∗. Themost significant part (2)=⇒(3). Consider the model M whose universe is a singleton{∗}, and with [[z ]] = M iff p �∗ z. We claim that all sentences in Γ are true in M.Consider All v are w. We may assume that [[v ]] = M, or else our claim is trivial.Then p �∗ v. But v � w, so we have p �∗ w, as desired. This verifies that M |= Γ.And since [[p]] = M, we have [[q ]] = M as well. Thus p �∗ q, as needed for (3). �

The original definition of the entailment relation Γ |= ϕ involves looking atall models of the language. Theorem 2.14 is important because part (3) gives acriterion the entailment relation that is algorithmically sensible. To see whetherΓ |= All p are q or not, we only need to construct �∗. This is the reflexive-transitiveclosure of a syntactically defined relation, so it is computationally very manageable.

2.2. All and Some

We move from the language L(all) to a bigger language L(all, some) by addingsentences of the form Some p are q. We call these new sentences existentials, sinceformalizing them in first-order logic would use the existential quantifier ∃.

The semantics for the existential sentences is the obvious one:

M |= Some p are q iff [[p]] ∩ [[q ]] 6= ∅.We already have a proof system for All, given in the last section. Since the

logic with All and Some is a bigger system, we keep the rules for All. We merelyextend the logical system by adding the rules in Figure 2.2.

Example 2.15 The first important derivation in the logic:

All n are q

All n are p Some n are nSome n are pSome p are n

Some p are q

That is, if there is a n, and if all ns are ps and also qs, then some p is a q.

2.2. ALL AND SOME 11

Definition 2.16 If Γ is a set of sentences in L(all, some), we write Γall forthe sentences in Γ of the form All p are q, and Γsome for the sentences of the formSome p are q.

Lemma 2.17 Let Γ ⊆ L(all, some). Then there is a model M with the followingproperties:

(1) M |= Γ.(2) If ϕ is any sentence in Some and M |= ϕ, then Γ ` ϕ.

Proof For the universe M of the model, we take Γsome. For an atom p, set

(2.2) [[u]] = {ϕ : some v ≤ u occurs in ϕ}.(Again, ≤ is defined in (2.1).) In other words,

All x are y belongs to [[u]] iff either x ≤ u or y ≤ u.This defines the model M.

We check that M |= Γ. Consider a sentence All x are y in Γ. Then x ≤ y. Thetransitivity of ≤ implies that [[x ]] ⊆ [[y ]]. Second, consider an existential sentenceϕ ∈ Γ, say Some x are y. Then i itself belongs to [[x]] ∩ [[y]], so this intersection isnot empty. Thus M |= Γ, and this is point (1) of our lemma.

For (2), let ϕ be Some p are q, and assume that [[p]] ∩ [[q ]] 6= ∅. Let Some xare y belong to this set. Since Some x are y belongs to [[p]], either x ≤ p or y ≤ p.Similarly, either x ≤ q or y ≤ q. At this point, our proof splits into four cases. Onecase is when x ≤ p and y ≤ q. Recalling that Some x are y belongs to Γ, we havea proof tree as follows:

(2.3)

....All y are q

....All x are p

Some x are ySome y are x

Some y are pSome p are y

Some p are q

The other cases are similar. You might like to check the details to see where thesecond rule of Some gets used. �

Theorem 2.18 The system in Figures 2.1 and 2.2 is complete for L(all, some).

Proof Suppose that Γ |= ϕ. There are two cases, depending on whether ϕis of the form All p are q or of the form Some p are q. The cases are handleddifferently. We leave the first to you as Exercise 1. The second follows immediatelyfrom Lemma 2.17. �

Example 2.19 Let us check that

{Some p are q,Some q are n,All q are m} 6|= Some p are n

by building a model in which the hypotheses hold and the conclusion fails. Let usdo this following the proof of Lemma 2.17.

Let ϕ be Some p are q, and let ψ be Some q are n. We take for a modelM = {ϕ,ψ} with [[p]] = {ϕ}, [[q ]] = {ϕ,ψ}, [[n]] = {ψ}, [[m]] = {ϕ,ψ}. These comefrom (2.2). It is clear that M has the desired properties. Although one could buildsuch a model by hand, our general definitions show how to do this in a perfectlygeneral way.


Exercise 1 Complete the proof of Theorem 2.18 by showing that if Γ is aset of sentences in All and Some, and if Γ |= All p are q, then also Γ ` All p are q.[It will be easier to modify the proof of Theorem 2.12 to account for existentialsentences than to use Lemma 2.17.]

Exercise 2 Let Γ be a set of sentences in All and Some, and let ϕ be asentence in Some. As we know from Lemma 2.17, if Γ 6` ϕ, there is a M |= Γ whichmakes ϕ false. The proof gets a model M whose size of the M will be the numberof existential sentences in Γ. Can we do better?

(1) Show that there is a model as desired whose size is at most 2.(2) Show that 2 is the smallest we can get by showing that if we only look at

one-element models, then

{Some p are q,Some q are n} |= Some p are n

Exercise 3 Let Γ be the set {All p are q}. Prove that there is no model M

such that for all sentences ϕ in the fragment of this section, M |= ϕ iff Γ ` ϕ. Thepoint of this problem is that there is some M′ with the property that for all ϕ ofthe form Some u are v, M |= ϕ iff Γ ` ϕ. But it is not possible to extend thisresult to sentences with All. So we cannot hope to avoid the split in the proof ofTheorem 2.18 due to the syntax of ϕ.

Exercise 4 Give an algorithm which takes finite sets Γ in the fragment ofthis section and also single sentences ϕ and tells whether Γ |= ϕ or not. [You maybe sketchy, as we were in our discussion of this matter at the end of Section 2.1.]

Exercise 5 Suppose that one wants to say that All p are q is true when[[p]] ⊆ [[p]] and also [[p]] 6= ∅. Then the following rule becomes sound:

All p are qSome p are q

Show that if we add this rule to the proof system for this section, then we get acomplete system for the modified semantics. [Hint: Given Γ, let Γ be Γ togetherwith all sentences Some p are q such that All p are q belongs to Γ. Show that Γ ` ϕin the modified system iff Γ ` ϕ in the old system.]

Exercise 6 Check that

{Some p are q,Some q are n,All q are m} 6` Some p are n

by examining proofs. [We have seen in Exercise 2.19 that the hypotheses do notsemantically imply the conclusion, and since the proof system is sound, we knowthat the hypotheses cannot derive the conclusion in the proof system. So thisexercise is unnecessary. Still, it is a good exercise in working with a proof system.]

Exercise 7 This exercise asks you to come up with definitions and to checktheir properties.

(1) Define the appropriate notions of submodel and homomorphism of models.(2) Which sentences ϕ in our language have the property that if M is a sub-

model of M′ and M′ |= ϕ, then also M |= ϕ?(3) Which sentences ϕ in our language have the property that if M is a sur-

jective homomorphic image of M′ and M |= ϕ, then also M′ |= ϕ?

2.2. ALL AND SOME 13

J is JM is JJ is M

J is M M is FJ is F

J is a p J is a qSome p are q

All p are q J is a pJ is a q

M is a p J is MJ is a p

Figure 2.3. The logic of names, on top of the rules in Figures 2.1 and 2.2.

(4) Would anything change if we changed “if” to “iff”?

Exercise 8 What would you do to the system to add sentences of the formSome p exists?

2.2.1. Alternative notation: a first look. Recall that our syntax beginswith a set P of unary atoms that we think of as simple nouns in English such ascat, animal, etc. A a unary literal is an expression of either of the forms p or p,where p is a unary atom. A unary literal is called positive if it is a unary atom;otherwise, negative. We continue to use letters like p, q, x, y, etc., to range overunary atoms, and l, m, n to range over unary literals. With these conventions, anS-formula is an expression of any of the forms

(2.4) ∃(p, l), ∃(l, p), ∀(p, l), ∀(l, p).We provide English glosses for S-formulas as follows:

(2.5)∀(p, q) Every p is a q ∀(p, q) No p is a q∃(p, q) Some p is a q ∃(p, q) Some p is not a q.

Note that S contains the formulas ∃(p, q) and ∀(p, q), which are not glossed in (2.5);however, according to the semantics below, these formulas are logically equivalent to∃(q, p) and ∀(q, p), respectively. We may regard S as the language of the traditionalsyllogistic.

2.2.2. Adding Names. It is natural to add names for individuals to thelogic; we want to proceed with larger and larger fragments, and names would seemto be one of the first things to add. We are going to add names in this section,but for the most part in these notes, we shall set names aside. The reason for thisis that it is usually routine to add names to our fragments, but it makes our workeasier to avoid them.

To our formal language we add names J , M , . . ., (for John and Mary) J1, . . .,Jn, . . ., etc. The sentences we add to the fragment are J is a p and J is M , whereJ and M are names and p is an atom. The semantics is that names denote elementsof models M. That is, we require that [[J ]] ∈M for all names J .

Now we turn to the proof system. Since we are going to add names to L(all, some),the rules of the logic for that language are still sound. So we use the rules in Fig-ures 2.1 and 2.2, and for the names we add the rules in Figure 2.3.


Fix a set Γ of sentences in this fragment. Let ≡ be the relation on namesdefined by

(2.6) J ≡M iff Γ ` J is M.

Lemma 2.20 ≡ is an equivalence relation.

Lemma 2.21 Let Γ be any set of sentences in Some, All, and names. Thenthere is a model M with the following properties:

(1) M |= Γ.(2) If ϕ is any sentence in Some or names and M |= ϕ, then Γ ` ϕ.

Proof As before, we define ≤ to be from (2.1). We also have the equivalencerelation ≡ from (2.6). Let the set of equivalence classes of ≡ be [J1], . . . , [Jm].

We take M to be Γsome ∪ {[J1], . . . , [Jm]}. We assume these sets are disjoint.We define

(2.7)[[Z ]] = {i : either Vi ≤ Z or Wi ≤ Z}

∪ {[J ] : for some M ∈ [J ], Γ `M is a Z}To finish defining our model, we take [[J ]] = [J ]. That is, the semantics of J is theequivalence class [J ].

It is easy to see that the semantics is monotone in the sense that if V ≤ W ,then [[V ]] ⊆ [[W ]]. This implies that all of the universal assertions of Γ are true inour model M. The existential assertions in Γ are Some Vi is Wi for i ≤ n, and foreach i, the number i belongs to [[Vi]]∩ [[Wi]]. The identity sentences J is M from Γare clearly true in M. Finally, consider a sentence J is a n in Γ. Then Γ ` J is a n.So [[J ]] = [J ] ∈ [[Z ]]. This means that our sentence J is a n is true in M.

Let M |= Some p are q. If there is some number i in [[X ]]∩ [[Y ]], then the proofof Theorem 2.18 shows that Γ ` Some p are q. The only alternative is when forsome J , [J ] ∈ [[X ]]∩ [[Y ]]. By the definition in (2.7), there are M ∈ [J ] and N ∈ [J ]such that Γ `M is a p and Γ ` N is a q. We thus have a proof tree over Γ:

....M is a p

....N is a q

....M is J

....J is N

M is NM is a q

Some p are q

So Γ ` Some p are q, as desired.Continuing, let M |= J is M . Then [J ] = [M ]. So by Lemma 2.20, Γ ` J is M .Finally, suppose M |= J is a p. Then for some M , M ∈ [J ] and Γ ` M is a p.

So we see that Γ ` J is a p using the last rule in Figure 2.3. �

Exercise 9 Prove that the logic of Figures 2.1, 2.2, and 2.3 is complete forAll, Some, and names.

2.3. All and No

In this section, we consider the fragment L(all,no) which contains sentences ofthe form All p are q and No p are q. We use the same kinds of models that we havealready seen, and interpret the new sentences by

M |= No p are q iff [[p]] ∩ [[q]] = ∅.

2.3. ALL AND NO 15

To make a proof system for this language, we use the rules of Figure 2.1 and alsothe rules in Figure 2.4.

Example 2.22 We check that

{All p are v, All q are w, No v are w} ` No p are q.

(2.8)All q are w

All p are v No v are wNo p are wNo w are p

No p are q

Lemma 2.23 Let Γ be any set of sentences in L(all,no). Then there is a modelM with the following properties:

(1) M |= Γ.(2) If ϕ is any sentence in L(all,no), and M |= ϕ, then Γ ` ϕ.

Proof Recall that all our syntactic notions are built on a set P of atoms (fornouns). We take for M the set of sets A ⊆ P satisfying the following conditions:

(2.9) A is up-closed : If v ∈ A and v ≤ w, then w ∈ A.(2.10) If v, w ∈ A, then Γ 6` No v are w.

(Note as a special case of (2.10) that if v ∈ A, then Γ 6` No v are v.) We set

(2.11) [[v ]] = {A ∈M : v ∈ A}.We claim that M |= Γ. Condition (2.9) implies that if All v are w belongs to Γ,

then [[v ]] ⊆ [[w ]]. Suppose that No v are w belongs to Γ. Let A ∈ [[v ]], so that v ∈ A.By condition (b), w /∈ A. So A /∈ [[w ]]. This argument shows that [[v ]] ∩ [[w ]] = ∅.

With (1) proved, we turn to (2). Let M |= ϕ. We first deal with the case thatϕ is the of the form All p are q. Let

A = {z : p ≤ z}.Case I: A /∈ M. Then there are v, w ∈ A such that Γ ` No v are w. In this

case, we use the result of Example 2.22 (with q = p) to see that Γ ` No p are p.As a result, we have Γ ` ϕ.

Case II: A ∈M. Then since A ∈ [[p]], we have A ∈ [[q ]]. The semantics in (2.11)tells us that A ∈ [[q ]]. Thus Γ ` All p are q, as desired.

This concludes our work when ϕ is All p are q. Suppose that ϕ is No p are q.Let

A = {z : p ≤ z or q ≤ z}.Note that p, q ∈ A. We claim that A /∈ M. For if A ∈ M, we would have A ∈[[p]] ∩ [[q ]]. And then [[p]] ∩ [[q ]] 6= ∅. But this contradicts M |= ϕ. So indeed,A /∈M. Hence there are v, w ∈ A such that Γ ` No v are w. There are four cases,depending on whether Γ ` All p are v or Γ ` All q are v, and similarly for w.

Case I: Γ ` All p are v, and Γ ` All q are w. It follows from Example 2.22 thatΓ ` ϕ.

Case II: Γ ` All p are v, and Γ ` All p are w.From Example 2.22 (with q = p), we have from Γ that No p are p. So we also

have No p are q.The remaining two cases are similar. �


All p are n No n are qNo p are q

No p are qNo q are p

No p are pNo p are q

No p are pAll p are q

Figure 2.4. The logic of No p are q on top of All p are q.

Theorem 2.24 The logic of Figures 2.1 and 2.4 is complete for L(all,no).

2.4. A Logic for S

The language S contains the syllogistic sentences in All, Some, No which wehave seen. We could also call the language L(all, some,no), following what we havebeen doing earlier. But since we think of this language (and its relative S†) inconnection with syllogisms, this is why we use the letter S.

We know the semantics of S, and it remains to join the proof systems which wehave already seen in Figures 2.1, 2.2, 2.3, and 2.4. We also must add a principlerelating Some and No. For the first time, we face the problem of potential incon-sistency: to say Some p are q is to deny that No p are q. There are no models ofSome p are q and No p are q. We see that according to our semantics, any sentenceϕ whatsoever follows from these two. We thus add the rule of ex falso quodlibet(also called ex contradictione quodlibet) to our system4. We call this rule (X).

(2.12)Some p are q No p are q

ϕ X

The full set of rules of S is listed in Figure 2.6.

Definition 2.25 A set Γ is inconsistent if Γ ` ϕ for all ϕ. Otherwise, Γ isconsistent.

Theorem 2.26 The logic in Figure 2.6 is complete for S.

Proof Let Γ be a set of sentences. Suppose that Γ |= ϕ. We show that Γ ` ϕ.We may assume that Γ is consistent, or else our result is trivial.

Divide Γ into three parts in the obvious way:

Γ = Γall ∪ Γsome ∪ Γno.

The proof now splits into two cases.Case I ϕ is a Some-sentence. Let M be from Lemma 2.21 for Γall ∪ Γsome.

Subcase 1 M |= Γno. Then by hypothesis, M |= ϕ. Then Lemma 2.17 showsthat Γ ` S, as desired.

Subcase 2 There is some No p are q in Γno such that [[p]]∩ [[q ]] 6= ∅. And again,Lemma 2.17 shows that Γall ∪ Γsome∪ ` Some p are q. Using (X), Γis inconsistent.

Case II ϕ is a sentence in All or No. Let M be from Lemma 2.23 for Γall ∪ Γno.

4Please do not confuse this with reductio ad absurdum. See page 53 for more on this.

2.5. ADDITIONAL EXERCISES 17

Subcase 1 M |= Γsome. Then by hypothesis M |= ϕ. By Lemma 2.23, Γ ` ϕ.Subcase 2 there is some sentence Some p are q in Γsome such that M 6|= Some p are q.

But then M |= No p are q. By Lemma 2.23, Γ ` No p are q. So againusing (X), we see that Γ is inconsistent.

�

2.5. Additional Exercises

In these exercises, we are concerned with the full proof system.

Exercise 10 Prove that if Γ ` ϕ, then there are infinitely many proof treeswhich establish that Γ ` ϕ.

Exercise 11 Let T be a proof tree over Γ with more than one node. Provethat either T is a proof tree over ∅, or else every atom and name which occurs in T

also occurs in some sentence in Γ.

Exercise 12 Show that Γ has a model iff Γ is consistent.

Exercise 13 Let Γ∪{Some p are q} be inconsistent. Show that Γ ` No p are q.Similarly, let Γ∪ {No p are q} be inconsistent. Show that Γ ` Some p are q. [Hereit is important to give a proof-theoretic argument. Prove the first part by inductionon the length of the shortest path in a proof tree from a leaf labeled Some p are qto a node labeled with an instance of the rule in (2.12).]

Exercise 14 Let Γ be a consistent set. Using Exercise 13, prove that thereis a consistent Γ′ ⊇ Γ such that for all p and q, Γ′ contains either Some p are q orNo p are q, but Γ′ does not contain more sentences than Γ in All or names. Wecall such a Γ′ a strong extension of Γ. [You may work with the case of Γ a finite set(even though the result holds in general), since the details in the finite case containall the real work of the general case.]

The rest of the exercises outline a different proof of Theorem 2.26. It is impor-tant not to use completeness in working those exercises.

Exercise 15 Let Γ be a consistent set of sentences, and write Γ as

ΓAll ∪ ΓSome ∪ ΓNo ∪ Γnames.

Let ϕ be a sentence in All or No. Assume that Γ |= ϕ, and prove semantically thatΓAll ∪ ΓNo |= ϕ as well. [That is, take M |= ΓAll ∪ ΓNo. Find a model M+ so thatM is a submodel of M+, and M+ |= Γ. Our assumption on Γ tells us that M+ |= ϕ.And the submodel condition implies that M |= ϕ.]

Exercise 16 Let Γ be consistent. Let M be the model from the proof ofLemma 2.21 for ΓAll ∪ ΓSome ∪ Γnames. Show that M |= Γ.

It follows from this fact that every consistent set Γ has a model.

Exercise 17 Use Exercises 13, 15, and 16 to give a different proof of theCompleteness Theorem 2.26. [You’ll need to show that if Γ is consistent and Γ |= ϕ,then Γ ` ϕ. For this, we again need a split into cases. The cases of All and No useTheorem 2.24.]

Exercise 18 The classical syllogisms also considered sentences Some p is nota q. In our setting, it makes sense also to add other sentences with negative verbphrases: J is not a p, and J is not M . Give some sound proof rules for thesesentences (on top of the system we already have).


(x, y, x) (x, y, y)(x, y, u) (x, y, v) (u, v, z)

(x, y, z)

Figure 2.5. The logic of All x which are y are z, written here (x, y, z).

Exercise 19 Adding your rules to those in Figure 2.6, prove the completenessof your system. [It will probably be easiest to use the method of Exercise 17. Havingthe extra sentences around adds a balance to the system and often makes it easierto prove theoretical properties like completeness, despite the additional cases thatcome from a bigger syntax.]

Exercise 20 Consider the language L with All p are q, Some p are q, No pare q, and Some p are not q, But

ϕ = All p which are q are n

is not part of the language L. The problem here is to show that ϕ cannot beexpressed L, not even by a set of sentences. That is, there is no set Γ of sentencesin L such that for all M, M |= Γ iff M |= R. Here is an outline of the proof.

(1) Consider the model M with universe {x, y, a} with [[p]] = {x, a}, [[q ]] ={y, a}, [[n]] = {a}, and also [[u]] = ∅ for other atoms u, Consider also amodel N with universe {x, y, a, b} with [[p]] = {x, a, b}, [[q ]] = {y, a, b},[[n]] = {a}, and the rest of the structure the same as in M. Show that forall sentences ψ in L, M |= ψ iff N |= ψ.

(2) Suppose towards a contradiction that we could express ϕ, say by the setΓ. Then since M and N agree on all sentences of L, they agree on Γ. ButM |= R and N 6|= R, a contradiction.

Exercise 21 As a continuation of Exercise 20, show that in L we cannotexpress No p which are q are n.

Exercise 22 For any sentence S, let S[J/M ] be the same as S except thatall J ’s are replaced by M ’s. Suppose that the set Γ has the property that if S ∈ Γ,then S[J/M ] ∈ Γ. Show that for all S, if Γ |= S, then Γ |= S[J/M ].

2.5.1. Exercises on a new construct: All x which are y are z. We con-clude this chapter with a few exercises on a logic for sentences of the form

All x which are y are z.

To save space, we abbreviate this by (x, y, z). We take this sentence to be true in agiven model M if [[x ]] ∩ [[y ]] ⊆ [[z ]]. Note that All x are x is semantically equivalentto (x, x, y).

Exercise 23 Prove that the logic of All x which are y are z in Figure 2.5 iscomplete. [Hint: you can do this with a one-point model, just as in the completenessresult for L(all), Theorem 2.12.]

Exercise 24 Let Γ be a set of (x, y, z) sentences.

2.5. ADDITIONAL EXERCISES 19

(1) Show that if Γ |= (x, y, z), then Γ |= (y, x, z).(2) Show that if Γ ` (x, y, z), then Γ ` (y, x, z).(3) Suppose that we remove the axiom (x, y, y), and in its place take the

symmetry rule(y, x, z)(x, y, z)

Show that the new system is complete.

Exercise 25 For each sentence ϕ = All p are q, let ϕ∗ = (x, x, y). If Γ is aset of sentences of the first fragment, let Γ∗ = {ϕ∗ : ϕ ∈ Γ}. It is easy to check thatif Γ ` ϕ, then Γ∗ ` ϕ∗. Prove the converse. We say that the system of this sectionis a conservative extension of the system for All.

Sources for this chapter. The material on syllogistic fragments originates withMoss [18].


All p are pAll p are n All n are q

All p are q

Some p are qSome q are p

Some p are qSome p are p

All q are n Some p are qSome p are n

All p are n No n are qNo p are q

No p are qNo q are p

No p are pNo p are q

No p are pAll p are q

Some p are q No p are qϕ

Figure 2.6. The full set of rules of S.

CHAPTER 3

Syllogistic Logic with Complements: S†

This chapter presents a logic for statements of the form All p are q and Some pare q. As in Chapter 2, p and q are intended as (plural) nouns or other expressionswhose natural denotation is as subsets of an underlying universe. The novelty hereis to add an explicit complement operator to the syntax. So we now can say, forexample, All p′ are q, or Some p′ are q. The point of the chapter is to present asound and complete proof system for the associated entailment relation.

The methods that we have seen in Chapter 2 do not work, and so this chapter need an exercisegoes in a different direction, building models using a representation theorem comingfrom quantum logic. Before we turn to our logic, we take a detour in Section 3.1 toremind you about the completeness of classical sentential logic. That section is avery quick presentation, and so it is not a terribly good way to learn about sententiallogic. But its aim is to expose the connection between representation theorems inalgebra and completeness theorems in logic. So to showcase this connection, weeliminate everything that is not related to it.

3.1. Background: Completeness of Basic Sentential Logics

This section presents a short review of sentential logic1, aiming mainly at pre-senting in outline form some arguments for completeness. We do this partly becausewe need the results later on, and partly because we develop variations of those ar-guments.

There are many formulations of sentential logic, each with advantages anddisadvantages. To settle matters, we fix a presentation. We start with a collectionAtSen of atomic sentences p, q, . . .; sometimes we use subscripts for these2. Thiscollection AtSen may be any set, and it will be important in the future to takevarious sets as AtSen. We form sentences over AtSen using the symbols ¬, ∧, and∨. We use Greek letters such as ϕ, ψ, and χ to denote sentences. So our syntaxgenerates sentences such as p ∧ ¬(p ∨ q), ¬¬p, etc. The set of sentences generatedfrom AtSen will be called Sen.

The semantics of sentential logic is given in terms of valuations. These arefunctions v : AtSen→ {T,F}. Each v extends to a function v : Sen→ {T,F} using

1Sentential logic is also called propositional logic.2Note the difference in font between atomic sentences of sentential logic p, q, . . . and unary

atoms of syllogistic languages p, q, . . .

21

22 3. SYLLOGISTIC LOGIC WITH COMPLEMENTS: S†

the truth tables below.ϕ ¬ϕT FF T

ϕ ψ ϕ ∧ ψT T TT F FF T FF F F

ϕ ψ ϕ ∨ ψT T TT F TF T TF F F

We use ϕ→ ψ as an abbreviation for (¬ϕ)∨ψ, and ϕ↔ ψ for (ϕ→ ψ)∧ (ψ → ϕ).Suppose that v(p) = T and v(q) = F. As an example of the calculations, we

verify in detail that v((p ∧ q)→ q) = T:

v((p ∧ q)→ q) = v(¬(p ∧ q)) ∨ v(q)= ¬(v(p) ∧ v(q))→ F= ¬(T ∧ F) ∨ F= ¬F ∨ F= T

We say that v satisfies ϕ if v(ϕ) = T. We also say that v makes ϕ true. IfΓ ⊆ Sen and ϕ ∈ Sen, we write Γ |= ϕ to mean that every model v which satisfiesall sentences in Γ also satisfies ϕ. If Γ is the empty set, then we drop it from thenotation. And to say that |= ϕ means that every valuation makes ϕ true; in thiscase we call ϕ a tautology or a valid sentence (of sentential logic). For example,p ∨ (¬p) is a tautology.

Lemma 3.1 The following sentences are tautologies of sentential logic.(1) p→ p.(2) q→ (p→ q).(3) (p→ r)→ ((p→ (r→ q))→ (p→ q)).(4) (p→ q)→ ((¬p→ q)→ q).

Moreover, every substitution instance of a tautology is again a tautology.

We should explain the point at the end about substitution instances. It meansthat if we take one of our tautologies, say q → (p → q), and replace p by anysentence (say ¬q, just for an example) and also replace q by any sentence (saypandq), then we get another tautology:

(p ∧ q)→ (¬q→ (p ∧ q))

We are going to mention a proof system shortly, and this will provide a notionΓ ` ϕ that is intended to match the semantic notion Γ |= ϕ which we have justseen. But before we do this, it will be good to make an important distinction.

Definition 3.2 Let AtSen be a set of atomic sentences, Let C be a class ofvaluations, and let ` be a proof system for Sen.

(1) Γ |= ϕ on C means that if v ∈ C and v(ψ) = T for all ψ ∈ C, then alsov(ϕ) = T.

(2) ` is weakly complete for C iff whenever |= ϕ on C, then also ` ϕ. In otherwords, every sentence which is valid on C is provable.

(3) A proof system ` is (strongly) complete for C iff whenever Γ |= ϕ on C,then also Γ ` ϕ.

Unless otherwise mentioned, complete means strongly complete.

3.1. BACKGROUND: COMPLETENESS OF BASIC SENTENTIAL LOGICS 23

ϕ ϕ is a sentential tautologyϕ ϕ→ ψ

ψmodus ponens

Figure 3.1. The smallest sentential logical system SL. It turnsout that this system SL is strongly complete for sentential logic.We shall also be interested in logical systems which extend thissystem by adding additional axioms which are sound for variousclasses C of models.

Definition 3.3 A sentential logical system is any proof system ` for Senextending the system in Figure 3.1. That is, the axioms of ` must include thetautologies of sentential logic, and the only rule of the system is modus ponens.

Theorem 3.4 (Deduction Theorem) Let ` be a sentential logical system. Sup-pose that Γ ∪ {ϕ} ` ψ. Then Γ ` ϕ→ ψ.

Proof By induction on the height of the derivation showing Γ ∪ {ϕ} ` ψ. Ifthe height is 1, then either ψ is an axiom of the system, ψ ∈ Γ, or ψ is ϕ. If ϕ is anaxiom or if ψ ∈ Γ, then we use the tautology ψ → (ϕ → ψ) in Lemma 3.1 part 2.That is, we use the proof tree

ψ ψ → (ϕ→ ψ)ϕ→ ψ

If ψ is ϕ, then note that ϕ→ ϕ is a tautology.Assume that Γ ∪ {ϕ} ` ψ with a derivation of height n + 1, and suppose that

the last step concludes ψ using modus ponens from χ and χ→ ψ. Then Γ∪{ϕ} ` χwith a proof tree of height ≤ n, and also Γ ∪ {ϕ} ` χ → ψ with a proof tree ofheight ≤ n. By induction hypothesis,

Γ ` ϕ→ χ and Γ ` ϕ→ (χ→ ψ).

Then we use Lemma 3.1, part 3. �

Definition 3.5 In our statement of Lemma 3.6, we need some general defi-nitions. For any set Γ in any sentential logical system Sen, let

Γat = Γ ∩ AtSen¬Γat = {6 ϕ : ϕ ∈ Γat}Γ+ at = Γat ∪ ¬Γat

Lemma 3.6 Let AtSen be a set of atomic sentences, Let ` be a sentential logicalsystem for Sen, and let Γ be maximal `-consistent. Let v : AtSen → {T,F} be anyvaluation. If v(Γ+ at) = T, then v(Γ) = T also.

Proof We show that for all ϕ, v(ϕ) = T iff ϕ ∈ Γ. By induction on ϕ. Thebase of the induction is when ϕ is an atomic sentence. Then ϕ ∈ Γ+ at, so v(ϕ) = T.And if v(ϕ) = T, we claim that ϕ ∈ Γ+ at. For if not, then ϕ is a negation, say ¬ψ,and ψ ∈ Γ+ at. But then v(ψ) = T, so v(ϕ) = F. This is a contradiction.


x ∧ x = xx ∧ y = y ∧ xx ∧ (y ∧ z) = (x ∧ y) ∧ zx ∧ (x ∨ y) = xx ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)¬(x ∧ y) = (¬x) ∧ (¬y)1 ∧ x = x0 ∧ x = 0x ∧ ¬x = 0

x ∨ x = xx ∨ y = y ∨ xx ∨ (y ∨ z) = (x ∨ y) ∨ zx ∨ (x ∧ y) = xx ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)¬(x ∨ y) = (¬x) ∧ (¬y)1 ∨ x = 10 ∨ x = xx ∨ ¬x = 1

Figure 3.2. The laws of boolean algebra.

Now assume our statement for ϕ; we prove it for ¬ϕ. The key point is that Γis maximal `-consistent. So we have

¬ϕ ∈ Γ iff ϕ /∈ Γ by maximalityiff v(ϕ) 6= T by induction hypothesisiff v(ϕ) = Fiff v(¬ϕ) = T

The induction steps for ∧ and ∨ are similar, and they again use maximal `-consistency and the induction hypothesis. �

3.1.1. Boolean algebras.

Definition 3.7 A boolean algebra is a tuple

B = (B, 1, 0,¬,∧,∨)

such that B is a set; 1, 0 ∈ B: ¬ : B → B; and ∧,∨ : B ×B → B such the laws inFigure 3.2 hold.

There are a number of abbreviations that are adopted in the study of booleanalgebras. We write x ≤ y for x ∧ y = x. (This is equivalent to writing x ∨ y = y.)We also write x→ y for ¬x ∨ y, and we write x↔ y for (x→ y) ∧ (y → x).

Example 3.8 For us, there are three main examples of boolean algebras. Twowill be presented here, and the last in Example 3.9.

First, {T,F} with the truth table operations is a boolean algebra. We take 1to be T and 0 to be F.

Second, for every set X, the power set of X is a boolean algebra:

P(X) = (P(X), X, ∅,∧,∨,′ ).Here P(X) is the set of all subsets of X. Two examples of subsets of X are X itselfand ∅, and these play the roles of 1 and 0. The ∧ and ∨ of P(X) are intersectionand union, respectively, and the complement operation is given by A′ = X \ A.(This is the complement of A relative to X.)

Example 3.9 We shall be interested in a class of boolean algebras derivedfrom sentential logics and sets of sentences. Let ` be a sentential logic, and let

3.1. BACKGROUND: COMPLETENESS OF BASIC SENTENTIAL LOGICS 25

Γ ⊆ Sen. We define the syntactic boolean algebra BΓ` as follows: let ≡ be therelation on Sen defined by

ϕ ≡ ψ iff Γ ` ϕ↔ ψ.

Then ≡ is an equivalence relation, and indeed it is a congruence. This means thatif ϕ ≡ ϕ′ and ψ ≡ ψ′, then ϕ∧ψ ≡ ϕ′∧ψ′; and similarly for ¬ and ∨. The quotientset Sen/≡ is then a boolean algebra, where we define the operations in the naturalway: for example, ¬[ϕ] = [¬ϕ]. We also take 1 to be [p ∨ ¬p] and 0 to be [p ∧ ¬p].This boolean algebra is denoted BΓ. It is sometimes called the Lindenbaum-Tarskialgebra of Γ.

Definition 3.10 Let B be a boolean algebra. A set F ⊆ B with (1)–(3) iscalled a filter on B.

(1) If x ∈ U and x ≤ y, then y ∈ U .(2) If x, y ∈ U , then x ∧ y ∈ U .(3) 1 ∈ U , and 0 /∈ U .

An ultrafilter on B is a filter with the additional property that(4) For all x ∈ B, either x ∈ U or ¬x ∈ U .

Example 3.11 Let B be P({1, 2, 3}), the power set of a three-element set,considered as a boolean algebra (see Example 3.8). One example of a filter wouldbe

{{1, 2}, {1, 2, 3}}This is not an ultrafilter because, for example, it contains neither {1} nor its com-plement {2, 3}. An example of an ultrafilter, consider {{2}, {1, 2}, {1, 2, 3}}. (Thisis the set of supersets of {2}. Let us write this as 2↑. In general, if X is any set theemphprincipal ultrafilters on the boolean algebra P(X) are the sets of the form x↑

for some x ∈ X.)

Proposition 3.12 Let B be a boolean algebra, and let F ⊆ B be a filter. Thenthere is some ultrafilter U of B such that F ⊆ U .

Proof We use Zorn’s Lemma. Let

PF = {E ⊆ B : F ⊆ E and E is a filter}.PF is a partially ordered set under the inclusion relation, and it is also closed underunions of chains. (That is, the union of any set of filters is a filter.) Thus Zorn’sLemma applies, and we obtain an element of PF which is maximal with respect toinclusion. Let U be such an element. Then U is a filter and F ⊆ U . We claim thatU is indeed an ultrafilter. For suppose not. Then there is some x ∈ B such thatneither x nor ¬x belongs to U . It follows that U ∪ {x} is not a subset of any filter,and similarly for U ∪ {¬x}. This point about U ∪ {x} implies that there is a finiteset y1, . . . , ym ∈ U such that

y1 ∧ · · · ∧ ym ∧ x = 0.

(For if not, then {b ∈ B : for some y1, . . . , ym ∈ U ,∧yi ∧ x ≤ b} would be a fil-

ter strictly larger than U , contradicting maximality.) Similarly, there is a finitez1, . . . , zn ∈ U such that

z1 ∧ · · · ∧ zn ∧ (¬x) = 0.


And now, note that

y1 ∧ · · · ∧ ym ∧ z1 ∧ · · · ∧ zn= (

∧yi ∧

∧zj) ∧ (x ∨ ¬x)

= (∧yi ∧

∧zj ∧ x) ∨ (

∧yi ∧

∧zj ∧ ¬x)

= 0 ∧ 0= 0

But∧yi ∧

∧zj belongs to U , so by the filter property, 0 ∈ U also. And this

contradicts U being a filter. �

3.1.2. On the completeness of sentential logics.

Lemma 3.13 Let ` be a sentential logic, let Γ ⊆ Sen, and let

Γ = {[ϕ] : Γ ` ϕ}.(1) Γ is `-consistent iff Γ is a filter on BΓ`.(2) Γ is maximal `-consistent in the logic iff Γ is an ultrafilter on BΓ`.

Proof Suppose first that Γ is `consistent. If [ϕ] and [ψ] belong to Γ, thenΓ ` ϕ and Γ ` ψ. Using the tautology ϕ→ (ψ → (ϕ ∧ ψ)), we see that Γ ` ϕ ∧ ψ.add to the tautology

list And if Γ ` ϕ and |= ϕ → ψ, then Γ ` ψ. This time, the tautology is ϕ → ((ϕ →ψ)→ ψ). It is immediate that [F] /∈ Γ. Therefore Γ is a filter.

In the other direction, assume that Γ is a filter. Since [F] /∈ Γ, Γ is `consistent.If Γ is maximal `consistent, then for every sentence ϕ, either ϕ ∈ Γ or ¬ϕ ∈ Γ.

If follows that for all ϕ, either [ϕ] ∈ Γ or [¬ϕ] ∈ Γ. So Γ is an ultrafilter. Theconverse is similar. �

Proposition 3.14 Let ` be a sentential logic, and let Γ be a set of sentencesin Sen. If Γ is `-consistent, then there is some maximal `-consistent ∆ ⊇ Γ.

Proof By Lemma 3.13 and Proposition 3.12. �

Lemma 3.15 Let ` be a sentential logic, let Γ be a set of sentences in Sen, andlet C be a class of valuations. The following are equivalent assertions:

(1) ` is complete for C: If Γ |= ϕ on C, then Γ ` ϕ.(2) Every `-consistent set Γ has a model in C.(3) Every maximal `-consistent set Γ has a model in C.

Proof (1)⇒(2): Let Γ be `-consistent. If Γ has no models in C, then Γ |= ϕon C and Γ |= ¬ϕ on C, for any ϕ. By completeness, Γ ` ϕ, and Γ ` ¬ϕ. So Γ is`-inconsistent, a contradiction.

(2)⇒(3) is trivial.The important direction for us is (3)⇒(1). Suppose that every on `-consistent

set has a model in C, and suppose that Γ |= ϕ. If Γ 6` ϕ, then Γ ∪ {¬ϕ} must be`-consistent. (For if Γ∪{¬ϕ} ` ⊥, then by the Deduction Theorem, Γ ` ¬ϕ→ ⊥.)Now we have a tautology (¬ϕ → ⊥) → ϕ. And so Γ ` ϕ after all. Let ∆ ⊇ Γ bemaximal `-consistent; ∆ exists by Proposition 3.14. By hypothesis, we have someM |= ∆ with M ∈ C. A fortiori, M |= Γ. �

Theorem 3.16 Let AtSen be any set of atomic sentences. Let C be the classof all valuations v : AtSen → {T,F}. The proof system in Figure 3.1 is sound andcomplete for C: Γ |= ϕ iff Γ ` ϕ.

3.2. ADDING SENTENTIAL BOOLEAN OPERATIONS TO L(all, some) 27

Proof The soundness is an easy induction. Lemma 3.15 reduces the com-pleteness to the verification that every Γ which is maximal consistent in the logicis satisfied by some valuation v. Let v(ϕ) be defined for atomic ϕ by

v(ϕ) = T iff ϕ ∈ Γ.

Recall the set Γ+ at from Definition 3.5. We claim that for all ϕ ∈ Γ+ at,v(ϕ) = T. For ϕ ∈ AtSen, this is immediate. And if ϕ = ¬ψ for some ψ ∈ Γ+ at,then we know that ψ /∈ Γ (lest Γ be inconsistent). So v(ψ) = F; hence v(ϕ) =¬v(ψ) = ¬F = T.

By this claim and Lemma 3.6, v(Γ) = T. This completes the proof. �Expand on this!

3.1.3. Why bother with representation theorems? After going throughthis exercise, you might wonder why anyone would bother proving representationtheorems if all they are interested in is the completeness theorem that comes as acorollary. The reason is that formulating logical matters in algebraic terms makesit easier to see solution to the mathematical problems underlying completeness.

3.2. Adding Sentential Boolean Operations to L(all, some)

At this point, we combine what we did in Chapter 2 with our work in Section 3.1just above. We let

(3.1) L(all, some)bc = SentLogic(L(all, some))

That is, we study sentential logic where the atomic sentences are the sentences ofL(all, some). In addition to the sentences in L(all, some), we also have sententialconnectives ¬, ∧, and ∨. But everything is built out of sentences in L(all, some).We call this language L(all, some)bc, with “bc” standing for “boolean connectives.”

We must settle on a class C of valuations in order to state and prove a com-pleteness result. The semantics of L(all, some) is based on models M = (M, [[ ]]).Every M for L(all, some) gives a valuation

vM : L(all, some)→ {T,F},given by

vM(ϕ) ={

T if M |= ϕF if M 6|= ϕ

In the rest of this section, let

C = {vM : M is a model for the language L(all, some)}.Our goal is to craft a sentential logical system ` and then to show that Γ |= ϕ

on C iff Γ ` ϕ.

Remark When we defined models and semantics in Chapter 2, we defined whatit meant to say that M |= ϕ for ϕ in L(all, some,no). We can easily extend thisdefinition to cover the case when ϕ is in the larger set L(all, some,no)bc. To saythat Γ |= ϕ on the class C is exactly the same thing as saying that for all modelsM, if M |= ψ for all ψ ∈ Γ, then also M |= ϕ. In other words, one may reformulatethe notion Γ |= ϕ in a way that does not mention valuations. The two notions areequivalent.


(1) All p are p(2) (All p are n) ∧ (All n are q)→ All p are q(3) (All q are n) ∧ (Some p are q)→ Some n are p(4) Some p are q → Some p are p(5) ¬(Some p are p)→ All p are q

Figure 3.3. Axioms for L(all, some)bc, the language of booleancombinations of sentences in L(all, some). We also have all sub-stitution instances of sentential tautologies. The only rule of thissystem is modus ponens.

We present our sentential logical system for L(all, some)bc in Figure 3.3. Aswith any sentential logical system, the system also includes the substitution in-stances of the tautologies as axioms, and the only rule is modus ponens. Thesoundness of this system is routine.

It should be mentioned that although we are going to prove that our logic iscomplete, this doesn’t mean that it is a very convenient system to work in. Tobe sure, constructing formal proofs in Hilbert-style systems with no shortcuts is atedious matter, as the next example shows.

Example 3.17 Let α, . . ., ε be the sentences listed below.

α All p are uβ All p are vγ Some u are v

δ Some p are pε Some p are u

We wish to show that {α, β,¬γ} ` ¬δ. We’ll exhibit a proof tree for this, usingsome abbreviations. First, note that (α∧ δ)→ ε and (ε∧β)→ γ are axioms of ourlogic. Second, define some additional sentences as follows:

ϕ1 ¬γ → ¬δϕ2 β → ϕ1

ϕ3 α→ ϕ2

ϕ4 ((ε ∧ β)→ γ)→ ϕ3

ϕ5 ((α ∧ δ)→ ε)→ ϕ4

One can check that ϕ5 is a substitution instance of a tautology of sentential logic.Now we have a proof tree over Γ:

ϕ5 (α ∧ δ)→ εϕ4 (ε ∧ β)→ γ

ϕ3 αϕ2 β

ϕ1 ¬γ¬δ

This shows that indeed {α, β,¬γ} ` ¬δ.Of course, in constructing this example one takes an informal proof and derives

ϕ1, . . . , ϕ5 from it. A friendlier system would allow one to directly use proof strate-gies such as hypothetical proofs and proofs by contradiction, without the detourthrough long sentential tautologies.

3.2. ADDING SENTENTIAL BOOLEAN OPERATIONS TO L(all, some) 29

Theorem 3.18 Let ` be the sentential logical system for L(all, some)bc definedin Figure 3.3, and let C be the class of valuations derived from models. Then theneed a name herefollowing completeness theorem holds: Γ |= ϕ on C iff Γ ` ϕ.

Proof By Lemma 3.15, we only need to show that that every Γ which ismaximal `-consistent has a model of the form vM. This maximality implies thatfor every ϕ, either ϕ ∈ Γ or else ¬ϕ ∈ Γ. And Γ is closed under deduction inL(all, some)bc.

We have seen the notation Γsome for the set of sentences in Γ of the form Somep are q. Let Γ¬all be the set of ¬(All p are q) sentences in Γ.

Let M be given by:

M = Γsome ∪ Γ¬all[[u]] = {ϕ ∈ Γsome : ϕ is Some p are q, and either p ≤ u or q ≤ u}

= ∪ {ϕ ∈ Γ¬all : ϕ is ¬(All p are q) and p ≤ u}Notice that M is the union of Γsome and Γ¬all. We assume that this union isdisjoint; i.e., no sentence belongs to both sets.

We have also seen the notation Γ+ at in Definition 3.5.

Claim For all ϕ ∈ Γ+ at, M |= ϕ.

Proof First, suppose that Γ contains the sentence All u are v. So u ≤ v, andit is easy to check that [[u]] ⊆ [[v ]]. So All u are v in Γ is true in M.

Second, consider a sentence Some u are v in Γ. Call this sentence ϕ. Thenϕ ∈ Γsome, and indeed ϕ ∈ [[u]] ∩ [[v ]]. Hence [[u]] ∩ [[v ]] 6= ∅, and so ϕ is true in M.

Continuing, suppose that ¬(All u are v) in Γ. Again, call this ϕ. This time,ϕ ∈ Γ¬all. Note that ϕ ∈ [[u]]. We claim that ϕ /∈ [[v ]], and from this it followsthat ϕ is true in M. For if ϕ ∈ [[v ]], it must be the case that u ≤ v. That is,Γ ` All u are v. Hence Γ is `-inconsistent, and this is a contradiction.

Finally, suppose that Γ contains ¬(Some u are v). Suppose towards a contra-diction that this sentence were true in M. Thus we have some ϕ ∈ [[u]] ∩ [[v ]]. Wehave two cases depending on whether ϕ ∈ Γsome or ϕ ∈ Γ¬all. Suppose first thatϕ ∈ Γsome, say ϕ is Some p are q. Note that Γ ` ϕ, since ϕ ∈ Γsome. Then wehave four subcases, and two of these are typical: first, p ≤ u and q ≤ v; second,p ≤ u and p ≤ v. In the first subcase, we easily get Some u are v from Γ; seethe proof displayed in line (2.3) on page 11. In the second subcase, we again getSome u are v from Γ, but with a slightly different derivation. Either way, we have acontradiction in this case. The second case is when ϕ ∈ Γ¬all. In this case, write ϕas ¬(All p are q). Then p ≤ u and p ≤ v. Using what we did in Example 3.17, weget ¬Some p are p from Γ. And so we also have All p are q. So Γ is `-inconsistent,and this is a contradiction. �

At this point, our claim is shown. By Lemma 3.6, M |= Γ.This concludes our proof of Theorem 3.18. �

Exercise 26 Where in the proof of Theorem 3.18 were axioms (2)–(6) inFigure 3.3 used? Our proof did not make this clear.


3.3. Syntax and Semantics of S†

We introduce a language S† which is strictly bigger than S in that it has noun-level negation rather than sentence-level negation.

In the syntax, we again begin with a set P of (unary) atoms. We use p, q, . . .,for atoms. The idea once again is that unary atoms represent plural common nouns.Let Lit be P + P. (The + here is the disjoint union, so we have two copies of P.)We call the elements of this set literals following uses in areas of logic. We writethe elements of Lit as either atoms p, q, etc., or as complemented atoms p′, q′, . . ..Moreover, we extend this idea of complementation to a function complementationoperation ′ : Lit → Lit on the literals such that p′′ = p for all literals p. (Yes, weuse the same letters p, q, etc. to range over literals in this chapter.) This involutiveproperty implies that complementation is a bijection on Lit. Then we considersentences All p are q and Some p are q. Here p and q are any literals, including thecase when they are the same. We call this language S†. We shall use letters like ϕto denote sentences.

Semantics. One starts with a set M and a subset [[p]] ⊆ M for each literalp, subject to the requirement that [[p′]] = M \ [[p]] for all p. This gives a modelM = (M, [[ ]]). We then define the satisfaction relation M |= ϕ just as in Chapter 2,and also derived notions such as Γ |= ϕ.

Example 3.19 We claim that Γ |= All x are z, where

Γ = {All y′ are p,All p are q,All q are y,All y are p,All q are z}.Here is an informal explanation. Since all y and all y′ are p, everything whatsoeveris a p. And since all p are q, and all q are y, we see that everything is a y. Inparticular, all x are y. But the last two premises and the fact that all p are q alsoimply that all y are z. So all x are z.

Exercise 27 Here is an example which we mention mostly as a challenge.Let Γ be the set of the sentences below:

All y are x,All y′ are x,All z′ are y,All z are y′,All z are wSome x are x,Some x′ are x′,Some y are y,Some y′ are y′,Some z are z,Some z′ are z′

.

It is not true thatΓ ` All y are w.

Find a model M |= Γ where [[y ]] 6⊆ [[w ]]. We shall return to this problem later(Exercise 31), after we have the tools to solve this kind of problem in a general way.

No. In previous work, we took No p are q as a basic sentence in the syntax.There is no need to do this here: we may regard No p are q as a variant notationfor All p are q′. So the semantics would be

M |= No p are q iff [[p]] ∩ [[q]] = ∅In other words, if one wants to add No as a basic sentence forming-operation, on apar with Some and All, it would be easy to do so.

3.3. SYNTAX AND SEMANTICS OF S† 31

All p are p AxiomSome p are qSome p are p

Some1Some p are qSome q are p

Some2

All p are n All n are qAll p are q Barbara

All q are n Some p are qSome p are n Darii

All q are q′

All q are p ZeroAll q′ are qAll p are q One

All q are p′

All p are q′Antitone

All p are q Some p are q′

SX

Figure 3.4. S†: syllogistic logic with complement.

Proof trees. We have discussed the meager syntax of S† and its semantics. Wenext turn to the proof theory. Just as in Chapter 2, a proof tree over Γ is a finitetree T whose nodes are labeled with sentences in our fragment, with the additionalproperty that each node is either an element of Γ or comes from its parent(s) by anapplication of one of the rules for the fragment listed in Figure 3.4. Γ ` ϕ meansthat there is a proof tree T for over Γ whose root is labeled ϕ.

We attached names to the rules in Figure 3.4 so that we can refer to themlater. We usually do not display the names of rules in our proof trees exceptwhen to emphasize some point or other. The names “Barbara” and “Darii” aretraditional from Aristotelian syllogisms. But the (Antitone) rule is not part oftraditional syllogistic reasoning. It is possible to drop (Some2) if one changes theconclusion of (Darii) to Some n are p. But at one point it will be convenient tohave (Some2), and so this guides the formulation. The rules (Zero) and (One) areconcerned with what is often called vacuous universal quantification. That is, ifq′ ⊆ p, then q is the whole universe and q′ is empty; so q is a superset of every setand q′ a subset. We have already seen the rule (X) rule: it permits inference of anysentence ϕ whatsoever from a contradiction.

Example 3.20 Returning to Example 3.19, here is a proof tree showing Γ `All x are z:

All y′ are pAll p are q All q are y

All p are yAll y′ are yAll x are y

All y are pAll p are q All q are z

All p are zAll y are z

All x are z

Exercise 28 Show that

{All y are p,All y′ are p} ` All x are p.


Exercise 29 Show that

{All y are p,All y′ are p,All q are z,Some x are z′} ` Some p are q′.

Lemma 3.21 The following are derivable:(1) Some p are p′ ` ϕ (a contradiction fact)(2) All p are n,No n are q ` No q are p (Celarent)(3) No p are q ` No q are p (E-conversion)(4) Some p are q,No q are n ` Some p are n′ (Ferio)(5) All q are n,All q are n′ ` No q are q (complement inconsistency)

Proof For the assertion on contradictions,

All p are p Axiom Some p are p′

SX

(Celarent) in this formulation is just a re-phrasing of (Barbara), using complements:All p are n All n are q′

All q are n′ Barbara

(E-conversion) is similarly related to (Antitone), and (Ferio) to (Darii). For com-plement inconsistency, use (Antitone) and (Barbara). �

The logic is easily seen to be sound: if Γ ` ϕ, then Γ |= ϕ. The main contribu-tion of this section is the completeness of this system.

Some syntactic abbreviations. The language lacks boolean connectives, but itis convenient to use an informal notation for it. It is also worthwhile specifying anoperation of duals.

¬(All p are q) = Some p are q′

¬(Some p are q) = All p are q′(All p are q)d = All q′ are p′

(Some p are q)d = Some q are p

Here are some uses of this notation. We say that Γ is inconsistent if for some ϕ,not much of an ap-plication! Γ ` ϕ and Γ ` ¬ϕ. The first part of Lemma 3.21 tells us that if Γ ` Some p is p′,

then Γ is inconsistent. Also, we have the following result:

Exercise 30 If ϕ ` ψ, then ¬ψ ` ¬ϕ. What is the parallel result for thed-operation? [This fact is not needed in this chapter, but we recommend thinkingabout it as a way of getting familiar with the rules.]

the section on re-ductio proofs wasdropped

3.3.1. Continuing with the alternative notation. We have seen the lan-guage S in our alternative notation in Section 2.2.1. We extend this to the languageS† in several steps. First, in addition to unary atoms, we have literals. Foreshad-owing later notation, we use letters like l and m for literals. The literals come witha complementation operation, and we assume that l′′ = l for all l. Then sentencesof S† are those of the form

(3.2) ∃(l,m), ∀(l,m).

We gloss these as as for S, except that we sometimes require negated subjects:∃(p, q) Some non-p is not a q ∀(p, q) Every non-p is a q.

3.4. COMPLETENESS VIA REPRESENTATION OF ORTHOPOSETS 33

3.4. Completeness via Representation of Orthoposets

An important step in our work is to develop an algebraic semantics for S†.There are several definitions, and then a representation theorem. As with otheruses of algebra in logic, the point is that the representation theorem is also a modelconstruction technique.

An orthoposet is a tuple P = (P,≤, 0, ′) such that(1) (P,≤) is a partial order: ≤ is a reflexive, transitive, and antisymmetric

relation on the set P .(2) 0 is a minimum element: 0 ≤ p for all p ∈ P .(3) x 7→ x′ is an antitone map in both directions: x ≤ y iff y′ ≤ x′.(4) x 7→ x′ is involutive: x′′ = x.(5) complement inconsistency: If x ≤ y and x ≤ y′, then x = 0.

The notion of an orthoposet mainly appears in papers on quantum logic. (Infact, the stronger notion of an orthomodular poset appears to be more central there.However, I do not see any application of this notion to logics of the type consideredin this chapter.)

Example 3.22 The example below is sometimes called the Chinese lantern,and we’ll call it P2.3

1

��

ppppppppppppppp

<<<<<<<<

NNNNNNNNNNNNNN

p p′ q q′

0

>>>>>>>

NNNNNNNNNNNNNNN

��

pppppppppppppp

Here and elsewhere, we understand (x′)′ = x, 0′ = 1, 1′ = 0.

Example 3.23 For example, for all sets p we have an orthoposet (P(X),⊆, ∅, ′),where ⊆ is the inclusion relation, ∅ is the empty set, and a′ = X \ a for all subsetsa of p.

Definition 3.24 Let Γ be any set of sentences in S†. Γ need not be consistent.Definition 2.9 defines the fundamental relation ≤ from Γ and the logic, and Propo-sition 2.10 shows this relation to be a preorder. We have an induced equivalencerelation ≡, and we take LitΓ to be the quotient set Lit/≡. That is, we define u ≡ vto mean u ≤ v and v ≤ u. This quotient set Lit/≡ is a poset under the inducedrelation: if [u] ≤ [v] and [v] ≤ [u], then u ≡ v so that [u] = [v]. If there is somep such that p ≤ p′, then for all q we have [p] ≤ [q] in Lit/≡ In this case, set 0 tobe [p] for any such p. (If such p exists, its equivalence class is unique.) We finallydefine [p]′ = [p′]. If there is no p such that p ≤ p′, we add fresh elements 0 and 1to Lit/≡. We then stipulate that 0′ = 1, and that for all x ∈ PΓ, 0 ≤ x ≤ 1.

It is not hard to check that we have an orthoposet Lit/≡ = (Lit/≡,≤, 0, ′). Theantitone property comes from the axiom with the same name, and the complementinconsistency is verified using the similarly-named part of Lemma 3.21.

3The more standard name for this seems to be M02, with MO standing for modularortholattice.


Example 3.25 This example pertains to Exercise 27 from before. Let Γ bedefined by

Γ = {All y are x,All y′ are x,All z′ are y,All z are y′,All z are w}.Then

[x] = {x} [x′] = {x′}[y] = {y, z′} [y′] = {y′, z}[z] = {y′, z} [z′] = {y, z′}[w] = {w} [w′] = {w′}

Here is a picture of the orthoposet PΓ:

[x]

ttttttt

JJJJJJJ

[z′] = [y] [w]

[w′] [y′] = [x]

[x′]

JJJJJJJtttttt

Definition 3.26 Let P and Q be orthoposets. A morphism of orthoposetsf : P→ Q is a is a map m : P → Q preserving the order (if x ≤ y, then mx ≤ my),the complement m(x′) = (mx)′, and minimum elements (m0 = 0). We say m isstrict if the following extra condition holds: x ≤ y iff mx ≤ my.

Definition 3.27 A point of an orthoposet P = (P,≤, 0, ′) is a subset S ⊆ Pwith the following properties:

(1) If p ∈ S and p ≤ q, then q ∈ S (S is up-closed).(2) For all p, either p ∈ S or p′ ∈ S (S is complete), but not both (S is

consistent).

One should compare this definition with that of an ultrafilter (see Defini-tion 3.10). The idea in both cases is to reflect on the concrete forms of the structuresinvolved. For boolean algebras, this would mean power set algebras, and for ortho-posets this again would be the power set orthoposets. Given one of these concretestructures, call it P(X), and some x in the underlying set of the structure, we canask for an algebraic characterization of the sets sx as x varies over X:

(3.3) {A ∈ P(X) : x ∈ A}By an “algebraic characterization” here, we mean something expressible with thelanguage at hand: for boolean algebras, we would use T,F,¬,∧, and ∨ (and any-thing else defined in terms of them); for orthoposets, we would use ≤, 0 and ′. Theidea is to come up with a definition of what it means for a subset of the structureto be one of the sets sx.

Example 3.28 Let X = {1, 2, 3}, and let P(X) be the power set orthoposetfrom Example 3.23. Then S is a point, where

S = {{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.


(More generally, if p is any finite set, then the collection of subsets of p containingmore than half of the elements of p is a point of P(X).) Also, it is easy to check thatthe points on this P(X) are exactly S as above and the three principal ultrafilters1↑, 2↑, and 3↑ mentioned in Example 3.11. S shows that a point of a booleanalgebras need not be an ultrafilter or even a filter. Also, the lemma just belowshows that for P(X), a collection of elements is included in a point iff every pair ofelements has a non-empty intersection.

Lemma 3.29 For a subset S0 of an orthoposet P = (P,≤, ′), the following areequivalent:

(1) S0 is a subset of a point S of P.(2) For all x, y ∈ S0, x 6≤ y′.

Proof Clearly (1) =⇒ (2). For the more important direction, use Zorn’sLemma to get a ⊆-maximal superset S1 of S0 with the consistency property. LetS = {q : (∃p ∈ S1)q ≥ p}. So S is up-closed. We check that consistency is not lost:suppose that r, r′ ∈ S. Then there are q1, q2 ∈ S1 such that r ≥ q1 and r′ ≥ q2.But then q′2 ≥ r ≥ q1. Since q1 ∈ S1, so too q′2 ∈ S1. Thus we see that S1 is notconsistent, and this is a contradiction. To conclude, we only need to see that forall r ∈ P , either r or r′ belongs to S. If r /∈ S, then r /∈ S1. By maximality, thereis q ∈ S1 such that q1 ≤ r′. (For otherwise, S1 ∪ {r} would be a consistent propersuperset of S1.) And as r′ /∈ S, there is q2 ∈ S1 such that q2 ≤ r. Then as aboveq1 ≤ q′2, leading to the same contradiction. �

We now present a representation theorem that implies the completeness of thelogic. It is due to Calude, Hertling, and Svozil [2]. We also state an additionaltechnical point.

Theorem 3.30 (Representation Theorem for Orthoposets [2, 11, 41]) LetP = (P,≤, ′) be an orthoposet. There is a set points(P ) and a strict morphism oforthoposets m : P→ P(points(P )).

Moreover, if S∪{p} ⊆ P has the following two properties, then m(p)\⋃q∈Sm(q)

is non-empty:(1) For all q ∈ S, p 6≤ q.(2) For all q, r ∈ S, q 6≥ r′.

Proof Let points(P ) be the collection of points of P. The map m is defined bym(p) = {S : p ∈ S}. The preservation of complement comes from the completenessand consistency requirement on points, and the preservation of order from the up-closed-ness. Clearly m0 = ∅. We must check that if q 6≥ p, then there is some pointS such that p ∈ S and q /∈ S. For this, take S = {q} in the “moreover” part. Andfor that, let T = {p}∪{q′ : q ∈ S}. Lemma 3.29 applies, and so there is some pointu ⊇ T . Such u belongs to m(p). But if q ∈ S, then q′ ∈ T ⊆ U ; so u does notbelong to m(q). �

Example 3.31 Here is a picture which illustrates the workings of the Repre-sentation Theorem 3.30. We presented an orthoposet P2 in Example 3.22. Three


subsets of the underlying set are points, and these are shown in colors below4.

1qqqqq 99 LLLL

p p′ q q′

0

MMMMM �� rrrr

1qqqqq 99 LLLL

p p′ q q′

0

MMMMM �� rrrr

1qqqqq 99 LLLL

p p′ q q′

0

MMMMM �� rrrr

1qqqqq 99 LLLL

p p′ q q′

0

MMMMM �� rrrr

We call these points •, •, •, and •. The orthoposet

Q = P({•, •, •, •})has sixteen elements, so we shall not display it. But we can illustrate the functionm : P2 → Q. For example, m(p) = {•, •}, because the points to which p belongsare • and •. Similarly, m(0) = ∅. Here is a picture of P2 and its image under minside Q:

1

��

ppppppppp;;;;

MMMMMMMM

p p′ q q′

0

====

NNNNNNNNN��

qqqqqqqq

{•, •, •, •}

pppppp

iiiiiiiiiiiiNNNNNN

VVVVVVVVVVVV

{•, •} {•, •} {•, •} {•, •}

∅

NNNNNNNN

VVVVVVVVVVVVVVVV

pppppppp

hhhhhhhhhhhhhhhh

3.4.1. Completeness. The completeness theorem is based on algebraic ma-chinery that we have just seen.

Lemma 3.32 (Pratt-Hartmann) Suppose that Γ |= Some p are q. Then thereis some sentence in Γsome, say Some a are b, such that

Γall ∪ {Some a are b} |= Some p are q.

Proof If not, then for every ϕ ∈ Γsome, there is a model Mϕ |= Γall ∪ {ϕ}and [[p]] ∩ [[q ]] = ∅ in the model. Take the disjoint union of the models Mϕ to get amodel of Γall ∪ Γsome = Γ where Some p are q fails. �

The next lemma is the key step in the completeness theorem of this section.

Lemma 3.33 Let Γ ⊆ S†. There is a model M = (M, [[ ]]) such that(1) M |= Γall.(2) If ϕ ∈ L(all) and M |= ϕ, then Γ ` ϕ.(3) If Γ is consistent, then also M |= Γsome.

Proof We write PΓ for the orthoposet Lit/≡. (Recall that this was obtainedby from the set of literals by taking quotient by p ≡ q iff p ≤ q ≤ p. See Defini-tion 3.24.) Recall that P is the set of unary atoms that is the base of the syntax ofthe language at hand. Let n be the natural map of P into PΓ, taking an atom p toits equivalence class [p]. Even though P is not an orthoposet with its order ≤, it isa preorder (see Proposition 2.10). This map n is an order-preserving map from onepreorder P into another. Moreover, n preserves the order in both directions. Wealso apply Theorem 3.30, to obtain a strict morphism of orthoposets m as shownbelow:

Pn // PΓ

m // points(PΓ)

4If you cannot see the colors, the four points are {p, q, 1}, {p′, q, 1}, {p, q′, 1}, and {p′, q′, 1}.


Let M = points(PΓ), and let [[ ]] : PΓ → P(M) be the composition m ◦ n,regarded as an order-preserving function on preorders. We thus have a modelM = (points(PΓ), [[ ]]).

We check that M |= Γ. Note that n and m are strict monotone functions. Sothe semantics has the property that the All sentences holding in M are exactly theconsequences of Γ. We turn to a sentence in Γsome such as Some u are v. Assumingthe consistency of Γ, u 6≤ v′. Thus [[u]] 6⊆ ([[v ]])′. That is, [[u]] ∩ [[v ]] 6= ∅. �

Unfortunately, the last step in this proof is not reversible, in the followingprecise sense. It does not follow from u 6≤ v′ that Γ ` Some u are v. (For example,if Γ is the empty set we have u 6≤ v′, and indeed M(Γ) |= Some u are v. But Γ onlyderives valid sentences.)

Theorem 3.34 Γ ` ϕ iff Γ |= ϕ.

Proof As always, the soundness half is trivial. Suppose that Γ |= ϕ; we showthat Γ ` ϕ. We may assume that Γ is consistent.

If ϕ ∈ L(all), consider M(Γ) from Lemma 3.33. It is a model of Γ, hence of ϕ;and then by the property the second part of the lemma, Γ ` ϕ.

For the rest of this proof, let ϕ be Some p are q. From Γ and ϕ, we find a andb satisfying the conclusion of Lemma 3.32.

We again use Lemma 3.33 and consider the model M = M(Lit/≡Γall) of points

on Lit/≡Γall. M |= Γall.

Consider {[a], [b], [p′]}. If this set were a subset of a point x, then consider {x}as a one-point submodel of M. In the submodel, Γall ∪{Some a are b} would hold,and yet Some p are q would fail since [[p]] = ∅.

We use Lemma 3.29 to divide into cases:(1) a ≤ a′.(2) a ≤ b′.(3) a ≤ p.(4) b ≤ b′.(5) b ≤ p.(6) p′ ≤ p.

(More precisely, the first case would be [a] ≤ [a′]. By strictness of the natural map,this means that a ≤ a′; that is, Γall ` All a are a′.) In cases (1), (2), and (4), weeasily see that Γ is inconsistent, contrary to the assumption at the outset. Case (6)implies that both (3) and (5) hold. Thus we may restrict attention to (3) and (5).

Next, consider {a, b, q′}. The same analysis gives two other cases, indepen-dently: a ≤ q, and b ≤ q. Putting these together with the other two gives fourpairs. The following are representative:

a ≤ p and b ≤ q: Using Some a are b, we see that Γ ` Some p are q.a ≤ p and a ≤ q: We first derive Some a are b, and then again we see Γ `

Some p are q.This completes the proof. �

Exercise 31 Look back at Exercise 27. When we posed this problem, youhad no tools to solve it and were left to merely guess at a solution. At this point,we have a systematic way to solve it using Theorem 3.34 and its key ingredient,Lemma 3.33.


(1) Example 3.25 shows a picture of the syntactic orthoposet PΓ. What arethe points of this orthoposet?

(2) Define the model M from Lemma 3.33 explicitly. That is, say what theuniverse M is, and also [[x ]], [[y ]], and [[z ]].

(3) Verify that all the sentences in Γ are true in your model, and that All yare w is false.

3.5. Adding Boolean Connectives on Sentences: S†bc

We continue our development a little further, mentioning a larger system whosecompleteness can be obtained by using our the results which we have seen. Let

(3.4) S†bc = SentLogic(S†)

(Compare with (3.1) at the beginning of Section 3.2.) This language is just sen-tential logic built over S† as the set of atomic propositions. For the semantics, weagain use valuations derived from models M = (M, [[ ]]) as we have seen them inSection 3.3. We have notions like M |= ϕ and Γ |= ϕ, defined in the usual ways.

A proof system S†bc is listed in Figure 3.5. It contains all the axioms of thesystem S

†bc in Figure 3.5, and in addition has axiom (6):

Some p are q′ ↔ ¬(All p are q)

As before, the only rule is modus ponens.

Lemma 3.35 Let Γ ⊆ S†. If Γ ` ϕ in the logic S† (Figure 3.4), then Γ ` ϕ inthe logic S†bc (Figure 3.5).

Check all of this!Proof By induction on the proof trees in the logic S†, and then by cases on

the rule used at the root of the tree. For a one-point tree T, say with label ϕ, itmust be the case that either ϕ is All p are p, or else ϕ ∈ Γ. Either way, the sametree is a tree in S†bc.

Next, suppose that our lemma is true for a tree T, and let T′ come from T byapplying (Some2) at the root. Suppose that the root of T is Some p are q. Let α,β, γ, and ϕ be defined as follows:

α Some p are qβ Some q are p

γ All q are qϕ (γ ∧ α)→ β

Note that γ and ϕ are axioms, as is ϕ→ (γ → (α→ β)). By induction hypothesis,Γ ` α in S†bc. Now we have a proof tree for S†bc:another proof?

(Some2) comes fromaxioms (1) and (3). ....

α

γ

ϕ ϕ→ (γ → (α→ β))γ → (α→ β)α→ β

β

Thus Γ ` β, as desired. For (Antitone), use the point about (Some2) and also axiom(6) twice. For (One), use axiom (6) to see that All p′ are p ↔ ¬(Some p are p′).This along with axiom (5) gives All p′ are p↔ All p′ are q′. Now we use our pointabout (Antitone) to see that All p′ are p↔ All q are p. For (X), use the DeductionTheorem and a sentential tautology. �

3.5. ADDING BOOLEAN CONNECTIVES ON SENTENCES: S†bc 39

(6) Some p are q′ ↔ ¬(All p are q)

Figure 3.5. The axiom added to the system in Figure 3.3. Theresulting sentential logical system is called S†bc, and it is sound andcomplete for S

†bc.

Theorem 3.36 The logical system S†bc is sound and complete: Γ |= ϕ iff Γ ` ϕ.

Proof The soundness is easy; here is a proof of completeness. By Lemma 3.15,we need only prove that a set Γ which is maximal consistent in S†bc has a model.So fix such a set Γ.

Let Γat and Γ+ at be as in Definition 3.5. Γ+ at must be consistent in S†; for ifnot, then it would be inconsistent in S†bc by Lemma 3.35. By Lemma 3.33, we havea model M |= Γat.

We claim that in addition M |= Γ+ at. To see this, let ϕ ∈ S†. and suppose that¬ϕ ∈ Γ+ at. If ϕ is All p are q, then let ψ be Some p are q′. Note that ` ϕ → ψ

in S†bc using axiom (6). It follows that ψ ∈ Γ, by maximality. Indeed ϕ is an atom,so ϕ ∈ Γat. But then M |= ϕ. If ϕ is Some p are q, then essentially the sameargument works.

At this point, we know that M |= Γ+ at. And then it follows from Lemma 3.6that M |= Γ. �

It is possible to add boolean compounds of the NPs in this fragment. Once onedoes this, the axiomatization and completeness result become quite a bit simpler,since the system becomes a variant of boolean algebra.

Sources for this chapter. The completeness of the system in Section 3.2 appearsin Lukasiewicz [12] (in work with S lupecki; they also showed decidability), and alsoby Westerstahl [38], and axioms 1–6 are essentially the system SYLL. The resulthere is taken from [18].

The source for the logic S in this chapter is Moss [22]. Papers mentioningorthoposets and representation theorems quite similar to Theorem 3.30 includeCalude, Hertling, and Svozil [2], Zierler and Schlessinger [41] and Katrnoska [11].


3.5.1. Summary.

Axioms:(0) All substitution instances of sentential tautologies.(1) All p are p(2) (All p are n) ∧ (All n are q)→ All p are q(3) (All q are n) ∧ (Some p are q)→ Some n are p(4) Some p are q → Some p are p(5) ¬(Some p are p)→ All p are q(6) Some p are q′ ↔ ¬(All p are q)

Rule of Inference:ϕ ϕ→ ψ

ψmodus ponens

All p are p AxiomSome p are qSome p are p

Some1Some p are qSome q are p

Some2

All p are n All n are qAll p are q Barbara

All q are n Some p are qSome p are n Darii

All q are q′

All q are p ZeroAll q′ are qAll p are q One

All q are p′

All p are q′Antitone

All p are q Some p are q′

SX

The logical systems studied in this chapter are listed above. On the top, we havesentential logical systems. The smallest of these contains only the tautologies asaxioms (0), and the only rule is modus ponens. This system is sound and completefor sentential logic. Adding (1)–(5) gives a system which is sound and complete forthe language L(all, some,no)bc, sentential logic interpreted on models. Adding (6)gives a sound and complete system for S

†bc. The bottom of the figure contains the

proof system for S†. This logic is a syllogistic logic rather than a sentential logic.

CHAPTER 4

Cardinality Comparisons

We next show that it is possible to have complete syllogistic systems for logicswhich go are not first-order. We regard this as a proof-of-concept; it would beof interest to get complete systems for richer fragments, such the ones in Pratt-Hartmann [26].

We write ∃≥(x, y) for There are at least as many x as y, and we are interestedin adding these sentences to our fragments. We are usually interested in sentencesin this fragment on finite models. We write |S| for the cardinality of the set S. Thesemantics is that M |= ∃≥(x, y) iff |[[x]]| ≥ |[[y]]| in M.

It might be interesting to note that the compactness theorem fails for thesemantics.

Γ = {∃≥(x1, x2),∃≥(x2, x3), . . . ,∃≥(xn, xn+1), . . .}Suppose towards a contradiction that M were a canonical model for Γ. In particular,M |= Γ. Then |[[x1]]| ≥ |[[x2]]| ≥ . . .. For some n, we have |[[xn]]| = |[[xn+1]]|. ThusM |= ∃≥(xn+1, xn). However, this sentence does not follow from Γ.

Remark In the remainder of this section, Γ denotes a finite set of sentences.

In this section, we consider L(all,∃≥). For proof rules, we take the rules inFigure 4.1 together with the rules for All in Figure 2.1. The system is sound. Thelast rule is perhaps the most interesting, and it uses the assumption that our modelsare finite. That is, if all y are x, and there are at least as many elements in thebigger set y as in x, then the sets have to be the same.

We need a little notation at this point. Let Γ be a (finite) set of sentences. Wewrite x ≤c y for Γ ` ∃≥(y, x). We also write x ≡c y for x ≤c y ≤c x, and x <c y forx ≤c y but x 6≡c y. We continue to write x ≤ y for Γ ` All x are y. And we writex ≡ y for x ≤ y ≤ x.

Proposition 4.1 Let Γ ⊆ L(all,∃≥) be a (finite) set. Let Lit/≡ be the set ofliterals in Γ.

(1) If x ≤ y, then x ≤c y.(2) (Lit/≡,≤c) is a preorder: a reflexive and transitive relation.(3) If x ≤c y ≤ x, then x ≤ y.(4) If x ≤c y, x ≡ x′, and y ≡ y′, then x′ ≤c y′.(5) (Lit/≡,≤c) is pre-wellfounded: a preorder with no descending sequences

in its strict part.

Proof Part (1) uses the first rule in Figure 4.1. In part (2), the reflexivity of≤c comes from that of ≤ and part (1); the transitivity is by the second rule of ∃≥.Part (3) is by the last rule of ∃≥. Part 4 uses part (1) and transitivity. Part 5 isjust a summary of the previous parts. �

41

42 4. CARDINALITY COMPARISONS

All y are x∃≥(x, y)

∃≥(x, y) ∃≥(y, z)

∃≥(x, z)All y are x ∃≥(y, x)

All x are y

Figure 4.1. Rules for ∃≥(x, y) and All.

Theorem 4.2 The logic of Figures 2.1 and 4.1 is complete for L(all,∃≥).

Proof Suppose that Γ |= ∃≥(y, x). Let {∗} be any singleton, and define amodel M by taking M to be a singleton {∗}, and

(4.1) [[z]] ={M if Γ ` ∃≥(z, x)∅ otherwise

We claim that if Γ contains ∃≥(w, v) or All v are w, then [[v]] ⊆ [[w ]]. We only verifythe second assertion. For this, we may assume that [[v]] 6= ∅ (otherwise the result istrivial). So [[v]] = M . Thus Γ ` ∃≥(v, x). So we see that Γ ` ∃≥(w, x). From thiswe conclude that [[w ]] = M . In particular, [[v]] ⊆ [[w ]].

Now our claim implies that M |= Γ. Therefore |[[x]]| ≤ |[[y]]|. And [[x]] = M ,since Γ ` ∃≥(x, x). Hence [[y]] = M as well. But this means that Γ ` ∃≥(y, x), justas desired.

We have shown one case of the general completeness theorem that we are after.In the other case, we have Γ |= All x are y. We construct a model M = MΓ suchthat for all A and B,

(α) [[A]] ⊆ [[B ]] iff A ≤ B.(β) If A ≤c B, then |[[A]]| ≤ |[[B ]]|.

Let Lit/≡/ ≡c be the (finite) set of equivalences classes of atoms in Γ under≡c. This set is then well-founded by the natural relation induced on it by ≤c. It isthen standard that we may list the elements of Lit/≡/ ≡c in some order

[u0], [u2], . . . , [uk]

with the property that if ui <c uj , then i < j. (But if i < j, then it might be thecase that ui ≡c uj .)

We define by recursion on i ≤ k the interpretation [[v]] of all v ∈ [ui]. Supposewe have [[w]] for all j < i and all w ≡c uj . Let

Xi =⋃

j<i,w≡cuj

[[w ]],

and note that this is the set of all points used in the semantics of any atom so far.Let n = 1 + |Xi|. For all v ≡c ui, we shall arrange that [[v]] be a set of size n.

Now [ui] is the equivalence class of ui under ≡c. It splits into equivalenceclasses of the finer relation ≡. For a moment, consider one of those finer classes,say [A]≡. We must interpret each atom in this class by the same set. For this A,let

A =⋃{[[B ]] : (∃j < i)vj ≡c B ≤ A}.

Note that A ⊆ Xi so that |A| < n. We set [[A]] to be A together with n− |A| freshelements. Moving on to the other ≡-classes which partition the ≡c-class of ui, we

4.1. ADDING ∃≥ TO THE BOOLEAN SYLLOGISTIC FRAGMENT 43

do the same thing. We must insure that for A 6≡ A′, the fresh elements added into[[A′]] are disjoint from the fresh elements added into [[A]].

This completes the definition of M. We check that so that conditions (α) and(β) are satisfied. It is easy to first check that for i < j, |[[ui]]| < |[[uj ]]|. It mightalso be worth noting that [[u0]] 6= ∅, so no [[A]] is empty.

For (β), let A ≤c B. Let i and j be such that A ≡c ui and B ≡c uj . IfA ≡c B, then i = j and the construction arranged that [[A]] and [[B ]] be sets of thesame cardinality. If A <c B, then i < j by the way we enumerated the u’s, and so|[[A]]| = |[[ui]]| < |[[uj ]]| = |[[B]]|.

Turning to (α), we argue the two directions separately. Suppose first thatA ≤ B. Then A ≤c B. If A <c B, then [[A]] ⊆ B ⊆ [[B ]]. If A ≡c B, then wealso have A ≡ B. The construction has then arranged that [[A]] = [[B ]]. In theother direction, assume that [[A]] ⊆ [[B ]], and let i and j be such that A ≡c ui andB ≡c uj . On cardinality grounds, i ≤ j. If i < j, then the construction showsthat A ≤ B. (For if not, [[A]] would be a non-empty set disjoint from [[B ]], and thiscontradicts [[A]] ⊆ [[B ]].) Finally (for perhaps the most interesting point), if i = j,then we must have A ≡ B: otherwise, the construction arranged that both A andB have at least one point that is not in the other, due to the “1+” in the definitionof n.

Since (α) and (β), we know that M |= Γ. Recall that we are assuming thatx ≤ y holds semantically from Γ; we need to show that this assertion is derivable inthe logic. But [[x]] ⊆ [[y]] in the model, and so by (α), we indeed have Γ ` x ≤ y. �

Exercise 32 Consider the following two rules:

Some y are y ∃≥(x, y)Some x are x

No y are y∃≥(x, y)

(1) Add the rule on the left to our existing system for L(all, some) and tothe rules in Figure 4.1. Prove that the resulting system is complete forL(all, some,∃≥).

(2) Similarly, add the rule on the right to get a complete logic for the resultinglanguage.

(3) Finally, add both rules to again get a complete logic for the resulting lan-guage.

4.1. Adding ∃≥ to the Boolean Syllogistic Fragment

We now put aside Most and return to the study of ∃≥ from earlier. We closethis chapter with the addition of ∃≥ to the fragment of Section 3.2.

Our logical system extends the axioms of Figure 3.3 by those in Figure 4.2.Note that the last new axiom expresses cardinal comparison. Axiom 4 in Figure 4.2is just a transcription of the rule for No that we saw in Section 32. We do not needto also add the axiom

(Some y are y) ∧ ∃≥(x, y)→ Some x are x

because it is derivable. Here is a sketch, in English. Assume that there are someys, and there are at least as many xs as ys, but (towards a contradiction) that thereare no xs. Then all x’s are ys. From our logic, all ys are xs as well. And since thereare y’s, there are also x’s: a contradiction.


(1) All x are y → ∃≥(y, x)(2) ∃≥(x, y) ∧ ∃≥(y, z)→ ∃≥(x, z)(3) All y are x ∧ ∃≥(y, x)→ All x are y(4) No x are x→ ∃≥(y, x)(5) ∃≥(x, y) ∨ ∃≥(y, x)

Figure 4.2. Additions to the system in Figure 3.3 for ∃≥ sentences.

Notice also that in the current fragment we can express There are more x thany. It would be possible to add this directly to our previous systems.

Theorem 4.3 The logic of Figures 3.3 and 4.2 is complete for assertions∆ |= ϕ in the language of boolean combinations of sentences in L(all, some,no,∃≥).

Proof We need only build a model for a maximal consistent set ∆ in thelanguage of this section. We take the basic sentences to be those of the form All xare y, Some x and y, J is M , J is an x, ∃≥(x, y), or their negations. Let

Γ = {S : ∆ |= S and ϕ is basic}.As in Chapter 3, we need only build a model M |= Γ (see Lemma 3.6). We constructrework thisM such that for all A and B,

(α) [[A]] ⊆ [[B ]] iff A ≤ B.(β) A ≤c B iff |[[A]]| ≤ |[[B ]]|.(γ) For A ≤c B, [[A]] ∩ [[B ]] 6= ∅ iff A ↑ B.

Let P be the set of atoms in Γ. Let≤c and≡c be as in Section 4. Proposition 4.1again holds, and now the quotient P/ ≡c is a linear order due to the last axiom inFigure 4.2. We write it as

[u0] <c [u2] <c · · · <c [uk]

We define by recursion on i ≤ k the interpretation [[v]] of all v ∈ [ui]. The caseof i = 0 is special. If Γ |= No u0 is a u0, then the same holds for all w ≡c u0.In this case, we set [[w ]] = ∅ for all these w. Note that by our fourth axiom inFigure 4.2, all of the other atoms w are such that Γ ` ∃w. In any case, we mustinterpret the atoms in [u0] even when Γ ` (∃u0). In this case, we may take each[[w ]] to be a singleton, with the added condition that v ≡ w iff [[v]] = [[w ]].

Suppose we have [[w]] for all j ≤ i and all w ≡c uj . Let

Xi+1 =⋃

j≤i,w≡cuj

[[w ]],

and note that this is the set of all points used in the semantics of any atom so far.Let m = |Γsome|, and let

(4.2) n = 1 +m+ |Xi+1|For all v ≡c ui+1, we shall arrange that [[v]] be a set of size n.

4.2. DIGRESSION: MOST 45

Now [ui+1] splits into equivalence classes of the finer relation ≡. For a moment,consider one of those finer classes, say [A]≡. We must interpret each atom in thisclass by the same set. For this A, let

A =⋃{[[B ]] : (∃j ≤ i) vj ≡c B ≤ A}.

Note that A ⊆ Xi+1 so that |A| ≤ |Xi+1| for all A ≡c ui+1. We shall set [[A]] to beA plus other points. Let A be the set of pairs {A,B} with B ≡c ui+1 and A ↑Γ B.(This is the same as saying that Some A are B in Γsome.) Notice that if both Aand B are ≡c ui+1 and A ↑Γ B, then {A,B} ∈ A ∩B .) We shall set [[A]] to be A ∪Aplus one last group of points. If C <c ui+1 and A ↑Γ C, then we must pick someelement of [[C ]] and put it into [[A]]. Note that the number of points selected like thisplus |A| is still ≤ |Γsome|. So the number of points so far in [[A]] is ≤ |Γsome|+m.We finally add fresh elements to [[A]] so that the total is n.

We do all of this for all of the other ≡-classes which partition the ≡c-class ofui+1. We must insure that for A 6≡ A′, the fresh elements added into [[A′]] aredisjoint from the fresh elements added into [[A]]. This is needed to arrange thatneither [[A]] nor [[A′]] will be a subset of the other.

This completes the definition of the model. We say a few words about whyrequirements (α)–(γ) are met. First, and easy induction on i shows that if j < i,then |[[uj ]]| < |[[ui]]|. The point is that |[[uj ]]| ≤ |Xi| < |[[ui]]|. The argument for (β)is the same as in the proof of Theorem 4.2. For that matter, the proof of (α) isalso essentially the same. The point is that when A ≡c B and A 6≡ B, then [[A]]and [[B ]] each contain a point not in the other.

For (γ), suppose that A ≤c B. Let i ≤ j be such that A ≡c ui and B ≡ uj .The construction arranged that [[A]] and [[B ]] be disjoint except for the case thatA ↑ B.

So this verifies that (α)–(γ) hold. We would like to conclude that M |= Γ, butthere is one last point: (γ) appears to be a touch too weak. We need to know that[[A]]∩ [[B ]] 6= ∅ iff A ↑ B (without assuming A ≤c B). But either A ≤c B or B ≤c Aby our last axiom. So we see that indeed [[A]] ∩ [[B ]] 6= ∅ iff A ↑ B. �

The next step in this direction would be to consider At least as many x as yare z.

4.2. Digression: Most

The semantics of Most is that Most x are y are that this is true iff |[[x ]]∩ [[y ]]| >12 |[[x ]]|. So if [[x ]] is empty, then Most x are y is false.

As an example of what is going on, consider the following. Assume that All xare z, All y are z, Most z are y, and Most y are x. Does it follow that Most x are y?As it happens, the conclusion does not follow. One can take x = {a, b, c, d, e, f, g},y = {e, f, g, h, i}, and z = {a, b, c, d, e, f, g, h, i}. Then |x| = 7, |y| = 5, |z| = 9,|y∩z| = 5 > 9/5, |x∩y| = 3 > 5/2, but |x∩y| = 3 < 7/2. (Another counter-model:let x = {1, 2, 4, 5}, y = {1, 2, 3}, and z = {1, 2, 3, 4, 5}. Then |y ∩ z| = 3 > 5/2,|y ∩ x| = 2 > 3/2, but |x ∩ y| = 2 6> 4/2.)

On the other hand, the following is a sound rule:All u are x Most x are v All v are y Most y are u

Some u are v


Most x are ySome x are y

Some x are xMost x are x

Most x are y Most x are zSome y are z

Figure 4.3. Rules of Most to be used in conjunction with Some.

Here is the reason for this. Assume our hypotheses and also that towards a contra-diction that u and v were disjoint. We obviously have |v| ≥ |x∩ v|, and the secondhypothesis, together with the disjointness assumption, tells us that |x∩v| > |x∩u|.By the first hypothesis, we have |x ∩ u| = |u|. So at this point we have |v| > |u|.But the last two hypotheses similarly give us the opposite inequality |u| > |v|. Thisis a contradiction.

At the time of this writing, I do not have a completeness result for L(all, some,most).The best that is known is for L(some,most). The rules are are shown in Figure 4.3.We study these on top of the rules in Figure 2.2.

Proposition 4.4 The following two axioms are complete for Most.Most x are yMost x are x

Most x are yMost y are y

Moreover, if Γ ⊆ L(most), x 6= y, and Γ 6|= Most x are y, then there is a modelM of Γ which falsifies Most x are y in which all sets of the form [[u]] ∩ [[v]] arenonempty, and |M | ≤ 5.

Proof Suppose that Γ 6` Most x are y. We construct a model M which sat-isfies all sentences in Γ, but which falsifies Most x are x. There are two cases. Ifx = y, then x does not occur in any sentence in Γ. We let M = {∗}, [[x ]] = ∅, and[[y ]] = {∗} for y 6= x.

The other case is when x 6= y. Let M = {1, 2, 3, 4, 5}, [[x ]] = {1, 2, 4, 5},[[y ]] = {1, 2, 3}, and for z 6= x, y, [[z ]] = {1, 2, 3, 4, 5}. Then the only statement inMost which fails in the model M is Most x are y. But this sentence does not belongto Γ. Thus M |= Γ. �

Theorem 4.5 The rules in Figure 4.3 together with the first two rules inFigure 2.2 are complete for L(some,most). Moreover, if Γ 6|= ϕ, then there is amodel M |= Γ with M 6|= ϕ, and |M | ≤ 6.

Proof Suppose Γ 6` ϕ, where ϕ is Some x are y. If x = y, then Γ contains nosentence involving x. So we may satisfy Γ and falsify ϕ in a one-point model, bysetting [[x ]] = ∅ and [[z ]] = {∗} for z 6= x.

We next consider the case when x 6= y. Then Γ does not contain ϕ, Some y arex, Most x are y, or Most y are x. And for all z, Γ does not contain both Most z arex and Most z are y. Let M = {1, 2, 3, 4, 5, 6}, and consider the subsets a = {1, 2, 3},b = {1, 2, 3, 4, 5}, c = {2, 3, 4, 5, 6}, and d = {4, 5, 6}. Let [[x ]] = a and [[y ]] = d, sothat M 6|= ϕ. For z different from x and y, if Γ does not contain Most z are x, let[[z ]] = c. Otherwise, Γ does not contain Most z are y, and so we let [[z ]] = b. Forall these z, M satisfies whichever of the sentences Most z are x and Most z are y(if either) which belong to Γ. M also satisfies all sentences Most x are z and Most

4.3. ON THE NUMERICAL SYLLOGISTIC 47

y are z, whether or not these belong to Γ. It also satisfies Most u are u for all u.Also, for z, z′ each different from both x and y, M |= Most z are z′. Finally, M

satisfies all sentences Some u are v except for u = x and y = v (or vice-versa). Butthose two sentences do not belong to Γ. The upshot is that M |= Γ but M 6|= ϕ.

Up until now in this proof, we have considered the case when ϕ is Some x arey. We turn our attention to the case when ϕ is Most x are y. Suppose Γ 6` ϕ. Ifx = y, then the second rule of Figure 4.3 shows that Γ 6` Some x are x. So we takeM = {∗} and take [[x ]] = ∅ and for y 6= x, [[y ]] = M . It is easy to check that M |= Γ.

Finally, if x 6= y, we clearly have Γmost 6` ϕ. Proposition 4.4 shows that thereis a model M |= Γmost which falsifies ϕ in which all sets of the form [[u]] ∩ [[v ]] arenonempty. So all Some sentences hold in M. Hence M |= Γ. �

4.3. On the Numerical Syllogistic

I hope to add Ian Pratt-Hartmann’s results in [27] on the complexity of reason-ing with the numerical syllogistic, and also his result in [28] that there is no finitesyllogistic system for it.

Source for this chapter. The material in this chapter comes from Moss [18].

CHAPTER 5

Verbs: R

At this point, we turn to logics with verbs. The basic goal is to have a logicalsystem in which we may represent a valid argument such as

(5.1)Every porter recognizes every porterNo quarterback recognizes any quarterbackNo porter is a quarterback

As in our previous work, we adopt as minimal a syntax as needed. In fact, theEnglish sentences of interest are listed below, using p and q for nouns and r forverbs:

All p are q ∀(p, q)Some p are q ∃(p, q)All p r all q ∀(p,∀(q, r))All p r some q ∀(p,∃(q, r))Some p r all q ∃(p,∀(q, r))Some p r some q ∃(p,∃(q, r))

No p are q ∀(p, q)Some p aren’t q ∃(p, q)All p don’t r all q ≡No p r any q ∀(p,∀(q, r))All p don’t r some q ≡No p r all q ∀(p,∃(q, r))Some p don’t r any q ∃(p,∀(q, r))Some p don’t r some q ∃(p,∃(q, r))

We aim to have a formal language which looks like the sentences you see above.We also have listed the translations of the English sentences into our language R.

Here is how we build the syntax of R: We start with one collection P of unaryatoms (for nouns), just as we did in Chapter 2. But we also start with anothercollection, R of and another of binary atoms. We use these for transitive verbs.

Continuing, we use a complement operation on both unary and binary atoms.As in Chapter 3, we understand this operation to be involutive, so p and p aretaken to be identical, as are r and r. We call the items p, p, r and r literals.

We next introduce set terms by the following grammar:

set terms cpositive p ∀(p, r) ∃(p, r)negative p ∃(p, r) ∀(p, r)

Four of these are new to us, and here is how they are read:

∀(p, r) those who r all p

∃(p, r) those who r some p

∀(p, r) those who fail-to-r all p ≈those who r no p

∃(p, r) those who fail-to-r some p ≈those who don’t r some p

49

50 5. VERBS: R

expression variables syntaxunary atom p, qbinary atom rset term c, d p | ∃(p, r) | ∀(p, r) |

p | ∃(p, r) | ∀(p, r)R sentence ϕ ∀(p, c) | ∃(p, c)R† sentence ϕ ∀(p, c) | ∃(p, c) | ∀(p, c) | ∃(p, c)

Figure 5.1. The syntax of the languages R and R†.

We use letters like c and d for set terms.Finally, we have two kinds of sentences: ∀(p, c), and ∃(p, c), where p is a unary

atom, and c is a set term. This completes the syntax of the language which we callR.

The language R†. This is a larger language which, loosely speaking, allowscomplemented atoms p to be subject noun phrases of sentences. More precisely,sentences such as ∃(p, p) and ∀(p, ∃(c, r)) belong to R† but not to R. We shall notbe so concerned with this larger language R† in this section, and we mention it onlyto alert you that it is coming. We shall see an important negative result on R† inSection 6.1.1, and we shall return to this language in Chapter 8.)

We summarize the syntax of both languages in Figure 5.1.

5.0.1. Semantics. The natural interpretation of verbs is as relations on theunvierse, that is, sets of pairs of individuals. This leads to the following set ofdefinitions.

Definition 5.1 A model M for R is a set M , together with an interpretation[[p]] ⊆M for each noun p ∈ P and an interpretation [[r]] ⊆M2 for each verb r. Weinterpret literals p and r using complements:

[[p]] = M \ [[p]] [[r]] = M2 \ [[r]].

We then interpret set terms by subsets of M in the following way

[[∀(p, s)]] = {m ∈M : for all n ∈ [[p]], (m,n) ∈ [[s]]}[[∃(p, s)]] = {m ∈M : for some n ∈ [[p]], (m,n) ∈ [[s]]}

Finally, we have the definition of truth in a model :

M |= ∀(p, c) iff [[p]] ⊆ [[c]]M |= ∃(p, c) iff [[p]] ∩ [[c]] 6= ∅

Finally, we have definitions of semantic consequence relation Γ |= ϕ, just as we haveseen it for other logical languages.

Example 5.2 We consider a simple case, with one unary atom p, and onebinary atom s. Consider the following model. We set M = {w, x, y, z}, and [[p]] =

5. VERBS: R 51

{w, x, y}. For the relation symbol, s, we take the arrows below:

w //

@@@@@@@@ x

~~~~~~~~~~

��y

OO

oo // zzz

For example, [[p]] = {z}, [[∀(p, s)]] = ∅, [[∃(p, s)]] = M , and [[∃(p, s)]] = M also. Hereare some R-sentences true in M: ∃(p, p) (but note that ∃(p, p) is not in the fragmentR), and also ∀(p,∃(p, s)).

Remark Sentences like “All porters recognize some quarterback” are ambiguousin English. In R, we can represent the subject wide scope reading: every porterhas the property of recognizing some quarterback or other. It is not possible torepresent the object wide scope reading, where there is one particular quarterbackwho every porter recognizes.

5.0.2. Translation of L into Boolean modal logic. We shall write R forthe following version of Boolean modal logic. R has each p ∈ P as an atomicproposition, and it has two modal operators, �s and �s, one for each s ∈ R. Thesyntax of R is given by

ϕ := p ¬ϕ ϕ ∧ ψ �sϕ �sϕ

The language is interpreted on the same kind of structures that we have beenusing for R. Then [[p]] is given for all atoms p, and we also set [[¬ϕ]] = M \ [[ϕ]],[[ϕ ∧ ψ]] = [[ϕ]] ∩ [[ψ]], and

[[�sϕ]] = {x ∈M : for all y such that [[s]](x, y), y ∈ [[ϕ]]}[[�sϕ]] = {x ∈M : for all y such that not [[s]](x, y), y ∈ [[ϕ]]}

We write Γ |= ϕ to mean that for all structures M and all x ∈M , if x ∈ [[ψ]] for allψ ∈ Γ, then again x ∈ [[ϕ]].

Let R0 be the set of sentences of R which do not involve constants. We translateR0 into R. First, for each set term c, we define a sentence c∗ of R. The definitionis: p∗ = p, p∗ = ¬p,

∀(c, s)∗ = �s¬c∗∃(c, s)∗ = ¬�s¬c∗

∀(c, s)∗ = �s¬c∗∃(c, s)∗ = ¬�s¬c∗

An easy induction shows that [[c]] = [[c∗]] for all set terms c. Then we translate∀(c, d) to ∀(c, d)∗ and ∃(c, d) to ∃(c, d)∗:

∀(c, d)∗ = �s(c∗ → d∗) ∧�s(c∗ → d∗)∃(c, d)∗ = ¬∀(c, d)∗

where s is an arbitrary element of R. Then for all sentences ϕ of R0, and all modelsM,

M |= ϕ iff [[ϕ∗]] = M iff [[ϕ∗]] 6= ∅.It follows easily from this that for Γ ∪ {ϕ} a set of R0 sentences, Γ |= ϕ in thesemantics of R iff Γ∗ |= ϕ∗ in the semantics of R.

52 5. VERBS: R

∃(p, q) ∀(q, c)∃(p, c)

(D1)∀(p, q) ∀(q, c)

∀(p, c)(B)

∀(p, q) ∃(p, c)∃(q, c)

(D2)∀(p, p)

(T)∃(p, c)∃(p, p)

(I)

∀(q, c) ∃(p, c)∃(p, q)

(D3)∀(p, p)∀(p, c)

(A)∃(p, ∃(q, t))∃(q, q)

(II)

∀(p,∀(q′, t)) ∃(q, q′)∀(p, ∃(q, t))

(∀∀)∃(p,∃(q, t)) ∀(q, q′)

∃(p, ∃(q′, t))(∃∃)

∀(p,∃(q, t)) ∀(q, q′)∀(p, ∃(q′, t))

(∀∃)

[ϕ]....⊥ϕ RAA

Figure 5.2. The logical system R for R. Notice the rule of (RAA).

5.1. A Logic for R

This section presents our logical system R for R and shows that it is sound andcomplete. Our rules appear in Figure 5.2. We remind the reader that p and q rangeover unary atoms, c over c-terms, and t over binary literals.

Rules (D1), (D2), (D3), (B), (A), (T) and (I) are natural generalizations oftheir namesakes in S. In contrast, (∀∀), (∃∃), (∀∃) and (II) express genuinely rela-tional logical principles. In some settings, these last rules are called monotonicityprinciples.

In addition to syllogistic rules, R contains the rule of reductio ad absurdum(RAA), whereby one derives ϕ from Γ by temporarily adding the “negation” of ϕto Γ, thereby obtaining Γ∪ {ϕ}, and then deriving a contradiction from this largerset Γ∪{ϕ}. The idea is that This rule (RAA) is not the same as ex falso quodlibet,the rule we have called (X) in Chapter 3. To see why, we mention the semanticjustifications for both rules.

But before we can do this, let us clarify what we mean by “negation.” We donot have a negation symbol in R, but we do have a semantic negation. It is defined

5.1. A LOGIC FOR R 53

as follows:expression syntax negationset term c p p

p p∃(p, r) ∀(p, r)∀(p, r) ∃(p, r)∃(p, r) ∀(p, r)∀(p, r) ∃(p, r)

R sentence ϕ ∀(p, c) ∃(p, c)∃(p, c) ∀(p, c)

Note that p = p, c = c and ϕ = ϕ.Here is why we call ϕ and ϕ semantic negations: for all models M, M |= ϕ iff

M 6|= ϕ.We also define ⊥ to be any contradiction. In R, this means a sentence of the

form ∃(p, p).Here are the ideas behind (RAA) and (X):

(5.2) If Γ ∪ {ϕ} |= ⊥, then Γ |= ϕ.(5.3) If Γ |= ⊥, then Γ |= ϕ

The (RAA) justification changes the assumptions in the derivation from Γ∪{ϕ} toΓ. This is the exact difference between reductio ad absurdum and ex falso quodlibet.

We must incorporate this observation into our proof system, and we do so in(RAA). We now can display this rule in natural-deduction-style:

[ϕ]....⊥ϕ RAA

Definition 5.3 A proof tree over Γ is a pair (T,Can), where finite tree T

whose nodes are labeled with sentences, and Can is a set of labeled leaves of T

called the canceled leaves. Each node n in the tree must satisfy one of the followingconditions:

(1) n is a leaf labeled by an element of Γ.(2) n comes from its parent(s) by an application of a rule other than (RAA).(3) n ∈ Can, and there is some node m on the path from n to the root such

that the parent of m is labeled ⊥, and the label of m is the semanticnegation of the label of n.

We write Γ ` ϕ if there is a proof tree T with ϕ at the root whose uncanceledleaves all belong to Γ.

Example 5.4 Here is a derivation showing that

∀(x, x) ` ∀(y,∀(x, r))In words, if there are no xs, then all y’s have any relation whatsoever to all of them.Note that we cannot simply use the rule

∀(p, p)∀(p, c)

54 5. VERBS: R

Instead, we use RAA:[∃(y,∃(x, r))]1

∃(x, x) ∀(x, x)∃(x, x)

∀(y,∀(x, r)) (RAA)1

This example also shows that we indicate canceled leaves using bracketing, and wealso use numerical superscripts to tell which application of (RAA) has canceledwhich leaves.

Example 5.5 Here is a derivation showing that

∀(x,∀(y, r)),∀(p, y),∃(p, q) ` ∀(x,∃(y, r))

∀(x, ∀(y, r))∃(p, q) ∀(p, y)∃(p, y)

(D1)

∀(x, ∃(p, r))(∀∀)

∀(p, y)∀(x, ∃(y, r))

(∀∃)

Example 5.6 Here is a formal proof showing that

∀(x, x) ` ∀(y,∀(x, r))In words, if there are no xs, then all y’s have any relation whatsoever to all of them.Note that this does not follow from the rule (A). But here is a derivation:

[∃(y,∃(x, r))]1

∃(x, x)(II)

∀(x, x)∃(x, x)

(D1)

∀(y,∀(x, r)) (RAA)1

As we shall see below in Theorem 6.4, the special status of (RAA) is essen-tial: indirect syllogistic systems are in general more powerful than direct syllogisticsystems.

Although (RAA) may be used at any point in a derivation, our proof systemin this section has the extra property that if Γ ` ϕ using (RAA), then there is aproof using (RAA) at most once. In these notes we are not going to be concernedwith this stronger property, and for more on it see [29].

5.1.1. Extending consistent sets to consistent and complete sets. Atthis point, we mention a few features of the logical system which play a role in ourcompleteness proof.

Lemma 5.7 Suppose that Γ ∪ {ϕ} ` ψ and also Γ ∪ {ϕ} ` ψ. Then Γ ` ψ.

Proof The hypotheses imply that Γ ∪ {ϕ,ψ} ` ⊥. Also Γ ∪ {ϕ,ψ} ` ⊥,and by (RAA) we also have Γ ∪ {ψ} ` ϕ. Let T1 be a proof tree showing thatΓ ∪ {ϕ,ψ} ` ⊥, let T2 be a proof tree showing that Γ ∪ {ψ} ` ϕ. Take T1 andreplace every leaf labeled ϕ with a copy of T2. This gives a proof tree with root⊥ and whose leaves are labeled in Γ ∪ {ψ} ` ⊥. Using (RAA) again, we see thatΓ ` ψ. �


Definition 5.8 A set Γ of R-sentences is inconsistent if Γ ` ⊥. Otherwise, Γis consistent. Γ is complete if for all ϕ, either ϕ ∈ Γ or ϕ ∈ Γ.1

Lemma 5.9 Every consistent set Γ has a complete and consistent superset Γ∗.

Proof We are going to use Zorn’s Lemma2. Let C be the set of consistent∆ ⊇ Γ, ordered by inclusion. A chain in C is a set X of elements of C with theproperty that if X contains both ∆1 and ∆2, then either ∆1 ⊆ ∆2 or ∆2 ⊆ ∆1. Touse Zorn’s Lemma, we must check that every chain X in C has an upper bound.This is immediate: we take the union

⋃X. The important point is that this is a

consistent set, and this point follows from the fact that proofs are finite.Zorn’s Lemma applies and gives us a maximal element Γ∗ of C. Γ∗ is thus

a consistent superset of Γ. We need only check that it is complete. For if not,suppose that neither ϕ nor ϕ belong to Γ∗. By maximality, both Γ∗ ∪{ϕ} ` ⊥ andΓ∗ ∪ {ϕ} ` ⊥. By Lemma 5.7, Γ∗ ` ⊥. This contradicts the consistency of Γ∗, andso we conclude that Γ∗ is indeed complete. �

Remark Lemmas 5.7 and 5.9 have nothing to do with the particular proof system,and indeed they hold for logical systems which have (RAA). So versions of theseresults hold for all of the logical systems in the rest of these notes.

5.1.2. Model construction.

Theorem 5.10 For Γ ∪ {ϕ} ⊆ R, Γ |= ϕ iff Γ ` ϕ in R.

Proof The soundness is an easy induction on derivations. For the complete-ness, we need only show that a consistent set Γ is satisfiable. (In more detail,suppose we know the general fact that every consistent set is satisfiable. Supposethat Γ |= ϕ. Then Γ ∪ {ϕ} has no models, so by the general fact, Γ ∪ {ϕ} is in-consistent. Thus Γ ∪ {ϕ} ` ⊥, and so by (RAA), Γ ` ϕ.) By Lemma 5.9, we mayassume that Γ is R-complete.

For such a consistent and R-complete set Γ, we shall define a model M = M(Γ)as follows: we let

(5.4) M = {x1, x2 : Γ ` ∃(x, x)} ∪ {{p, q} : p 6= q and Γ ` ∃(p, q)}.We assume the union in (5.4) is disjoint, and we call the elements {p, q} pair-elements. So we have two copies of every variable x such that Γ entails the existenceof x, and also pair elements {p, q} corresponding to sentences of the form ∃(p, q)which are provable from Γ and such that p 6= q. Our semantics will insure that thepair-element {p, q} belongs to [[p]] ∩ [[q]], and so this element will witness the truthof ∃(p, q) in the model which we are constructing.

1Please do not confuse the completeness of a set of sentences with the completeness of the

logical system under discussion.2If you are not comfortable, it is also possible to prove the result in the case that the overall

language is countable by a step-by-step procedure. Here is a sketch. One lists the sentences

in the language in a list as ϕ0, ϕ1, . . ., ϕn, . . .. One also constructs an infinite sequence Γ =Γ0 ⊆ · · ·Γn ⊆ · · · of sets of sentences. Γn+1 is either Γn ∪ {ϕn} or else Γn ∪ {ϕn}, whichever

is consistent. One must be consistent, lest Γn be inconsistent by Lemma 5.7. And thenS

n Γn

would be as desired.

56 5. VERBS: R

x1 //

((PPPPPP y1

x2 //

66nnnnnn y2

(a)

x1 // y1

x2 //

66nnnnnn y2

(b)

x1 y1

x2 //

66nnnnnn y2

(c)

x1 // y1

x2

66nnnnnn y2

(d)

x1 y1

x2

66nnnnnn y2

(e)

x1 y1

x2 y2

(f)

Use (a) when Γ ` ∀(x,∀(y, r));(b) when Γ 6` ∀(x,∀(y, r)), but Γ ` ∀(x,∃(y, r)), and Γ ` ∃(x, ∀(y, r));(c) when Γ 6` ∀(x,∀(y, r)), but Γ ` ∃(x,∀(y, r)) and Γ 6` ∀(x, ∃(y, r));(d) when Γ 6` ∀(x,∀(y, r)), but Γ ` ∀(x, ∃(y, r)) and Γ 6` ∃(x, ∀(y, r));(e) when Γ 6` ∀(x,∃(y, r)) and Γ 6` ∃(x, ∀(y, r)), but Γ ` ∃(x, ∃(y, r));and (f) when none of (a)–(e) hold.

Figure 5.3. The relation RΓx,y.

The unary atoms x are interpreted in our models as follows:

(5.5)wi ∈ [[x]] iff Γ ` ∀(w, x){p, q} ∈ [[x]] iff Γ ` ∀(p, x), or Γ ` ∀(q, x)

For the binary atoms r, we first need a definition. For two unary atoms x and y(possibly identical) we shall need a certain set shown in Figure 5.3.

RΓx,y ⊆ {x1, x2} × {y1, y2}.

The semantics [[r]] in M(Γ) is then given as follows:

(5.6)xi[[r]]yj iff xi → yj according to RΓ

x,y in Figure 5.3{x, y}[[r]]w2 iff Γ ` ∀(x, ∀(w, r)) or Γ ` ∀(x,∀(y, r)){x, y}[[r]]w1 iff {x, y}[[r]]w2, or Γ ` ∀(x, ∃(w, r)), or Γ ` ∀(y,∃(w, r))u1[[r]]{x, y} iff Γ ` ∀(u,∀(x, r)) or Γ ` ∀(u,∀(y, r))u2[[r]]{x, y} iff u1[[r]]{x, y}, or Γ ` ∃(u,∀(x, r)), or Γ ` ∃(u,∀(y, r)){x, y}[[r]]{p, q} iff Γ ` ∀(x, ∀(p, r)), or Γ ` ∀(x, ∀(q, r)),

or Γ ` ∀(y,∀(p, r)), or Γ ` ∀(y,∀(q, r))

Proposition 5.11 Concerning the relation [[r]]:(1) x1[[r]]y2 iff Γ ` ∀(x,∀(y, r)).(2) x1[[r]]y1 iff Γ ` ∀(x,∀(y, r)) or Γ ` ∀(x, ∃(y, r)).(3) x2[[r]]y2 iff Γ ` ∀(x,∀(y, r)) or Γ ` ∃(x, ∀(y, r)).(4) x2[[r]]y1 iff for some i and j, xi[[r]]yj.


Lemma 5.12 Let Γ be an arbitrary set of R-sentences. Let ϕ be a positivesentence. If Γ ` ϕ, then M(Γ) |= ϕ.

Proof We argue by cases on ϕ.

Case 1: ϕ is ∀(x, y). If zi ∈ [[x]], then z ≤ x. By monotonicity, z ≤ y. Sozi ∈ [[y]]. If {p, q} ∈ [[x]], then without loss of generality ∀(p, x). Again, we see that{p, q} ∈ [[y]].

Case 2: ϕ is ∃(x, y). This time {x, y} is an element of our model. Our logiccontains the identity axioms All x are x. By our semantics, {x, y} ∈ [[x]]∩ [[y]]. Thusthe model overall satisfies Some x are y.

Case 3: ϕ is ∀(x, ∀(y, r)). Let zi ∈ [[x]] and wj ∈ [[y]]; so we have z ≤ x andw ≤ y. By monotonicity, Γ ` ∀(z,∀(w, r)). So zi[[r]]wj for 1 ≤ i, j ≤ 2. We alsomust consider pair-elements {p, q} ∈ [[x]]. Without loss of generality, assume ∀(p, x).By monotonicity, Γ ` ∀(p,∀(w, r)). Again, our semantics insures that {p, q}[[r]]wjfor all j. There are two other cases which are argued similarly; these show that ifzi ∈ [[x]] and {u,w} ∈ [[y]], then zi[[r]]{u,w}; also, if {p, q} ∈ [[x]] and {u,w} ∈ [[y]],then {p, q}[[r]]{u,w}.

Case 4: ϕ is ∀(x, ∃(y, r)). In this case, we can assume that [[x]] 6= ∅. That is,Γ ` ∃(x, x). Then Γ ` ∃(y, y) as well. We shall show that every element of [[x]]is related to y1. Let zi ∈ [[x]], so that z ≤ x. By monotonicity, Γ ` ∀(z,∃(y, r)).Then by Proposition 5.11, parts (2) and (4), we indeed have zi[[r]]y1 and zi[[r]]y2.Further, let {p, q} ∈ [[x]]. Without loss of generality, p ≤ x. By monotonicity,Γ ` ∀(p,∃(y, r)). Then by the definition of [[r]], {p, q}[[r]]y1.

Case 5: ϕ is ∃(x, ∀(y, r)). By rule (I) of our logic, Γ ` ∃(x, x). Let wj ∈ [[y]].Then Γ ` ∀(w, y). Hence Γ ` ∃(x, ∀(w, r)). By Proposition 5.11, parts (3) and(4), x2[[r]]w1, and also x2[[r]]w2. We must also consider pair-elements of [[y]]. Let{p, q} ∈ [[y]] so that Γ ` ∃(p, q); and assume Γ ` ∀(p, y). By monotonicity, Γ `∃(x,∀(p, r)). By construction, x2[[r]]{p, q}. We conclude that x2 is the requiredwitness to ∃(x, ∀(y, r)).

Case 6: ϕ is ∃(x,∃(y, r)). Here both Γ ` ∃(x, x) and also ∃(y, y). By Proposi-tion 5.11, part (4), x2[[r]]y1. So M |= ∃(x, ∃(y, r)). �

Lemma 5.13 Let Γ be complete and consistent. Let ϕ be a positive sentence.If M(Γ) |= ϕ, then ϕ ∈ Γ.

Proof We argue by cases on ϕ. In each case, we assume that M(Γ) |= ϕ, andwe then show Γ ` ϕ. Since Γ is complete, we indeed have ϕ ∈ Γ.

One fact which we shall use frequently is that if [[x]] 6= ∅ in M(Γ), then Γ `∃(x, x). For if yj ∈ [[x]], they by the structure of the model, Γ ` ∃(y, y) and also∀(y, x). Similar considerations apply to a pair-element {u,w} ∈ [[x]].

Case 1: ϕ is ∀(x, y). We may assume that Γ ` ∃(x, x); if not, then Γ ` ϕ using(A). And then the structure of the model easily tells us that Γ ` ϕ.

Case 2: ϕ is ∃(x, y). The argument is very close to what we do concerning∃(x, ∃(y, r)) in Case 6 below.

Case 3: ϕ is ∀(x, ∀(y, r)). By completeness, either Γ ` ∀(x, x); or Γ ` ∀(y, y);or else both Γ ` ∃(x, x) and Γ ` ∃(y, y). In the first case, Γ ` ϕ using the rule (A).In the second case, Γ ` ϕ as we have seen in Example 5.6. In the last, consider

58 5. VERBS: R

M = M(Γ). By Lemma 5.12, M(Γ) |= ϕ. In M, x1 ∈ [[x]] and y2 ∈ [[y]]. SinceM |= ϕ, x1[[r]]y2. By Proposition 5.11, part (1), Γ ` ∀(x,∀(y, r)).

Case 4: ϕ is ∀(x, ∃(y, r)). As in the previous case, we may assume that Γ `∃(x, x). Consider M = M(Γ). In the model, [[x]] 6= ∅ by definition of the model, andbecause M |= ϕ, the same is true of y. In particular, x1 is related to some elementof [[y]]. Say x1[[r]]zj where z ≤ y. If j = 1, then by Proposition 5.11, part (2), weΓ ` ∀(x, ∃(z, r)) or Γ ` ∀(x, ∀(z, r)). In the first case, we are done by monotonicity.So we shall assume that Γ ` ∀(x,∀(z, r)). Since zj belongs to the model Γ ` ∃(z, z).Therefore Γ ` ∀(x, ∃(z, r)), and as above we are done.

Now if j = 2, then by Proposition 5.11, part (1), we have Γ ` ∀(x,∀(z, r)).Exactly as above, we reason that Γ ` ∀(x, ∃(y, r)).

The last possibility is that x1[[r]]{p, q}, where Γ ` ∃(p, q). Without loss ofgenerality, suppose that Γ ` ∀(x,∀(q, r)) and also that Γ ` ∀(p, y). The derivationin Example 5.5 shows that Γ ` ∀(x,∃(y, r)).

Case 5: ϕ is ∃(x, ∀(y, r)). In this case, M |= ∃(x, x), and so Γ ` ∃(x, x). Wemay assume that Γ ` ∃(y, y), since otherwise we get Γ ` ϕ as in Case 3.

Some element of [[x]] is related to y2. In particular, Γ ` ∃(x, x). Suppose firstthat x1[[r]]y2. By Proposition 5.11, part (1), Γ ` ∀(x, ∀(y, r)). And as Γ ` ∃(x, x),we have Γ ` ∃(x,∀(y, r)).

Suppose next that it is x2 ∈ [[x]] which is related by [[r]] to y2. By Proposi-tion 5.11, part (3), we have two alternatives: Γ ` ∃(x,∀(y, r)) (and we are done);or else Γ ` ∀(x, ∀(y, r)). In this last case, we have already seen Γ ` ∃(x, x), and wenow have Γ ` ∃(x, ∀(y, r)).

Finally, suppose that {p, q} ∈ [[x]] and also that {p, q}[[r]]y2. Either Γ `∀(p, ∀(y, r)) or Γ ` ∀(q,∀(y, r)). Since this pair-element {p, q} belongs to our model,Γ ` ∃(p, q). So either Γ ` ∃(p,∀(y, r)) or Γ ` ∃(q,∀(y, r)). But also, either p ≤ x orq ≤ x. Without loss of generality, p ≤ x. By monotonicity, Γ ` ∃(x,∀(y, r)).

Case 6: ϕ is ∃(x,∃(y, r)). In our final case, we must have Γ ` ∃(x, x); alsoΓ ` ∃(y, y).

Suppose that z1 ∈ [[x]] and w1 ∈ [[y]] are related by [[r]]. Thus z ≤ x and w ≤ y.By Proposition 5.11, part (2), Γ ` ∀(z,∃(w, r)). Since z1 belongs to our model,Γ ` ∃(z, z). Thus Γ ` ∃(z,∃(w, r)). And by monotonicity again, Γ ` ∃(x, ∃(y, r)).

Next, suppose that z1 ∈ [[x]] and w2 ∈ [[y]] are related by [[r]]. The work here isquite similar, and we omit all the details. The same goes for the case of z2 ∈ [[x]]and w1 ∈ [[y]], and also for the case of z2 ∈ [[x]] and w2 ∈ [[y]]. There are severalmore cases, owing to the possibility that the witnesses to ∃(x, ∃(y, r)) might includepair-elements. These are all routine, and we omit these details.

This concludes the proof. �

At this point we conclude the proof of the main result in this section, Theo-rem 5.10. We began with a set Γ which we assume to be complete and consistent;the goal is to build a model of Γ. Consider M = M(Γ) as we defined it in (5.4),(5.5), and (5.6). By Lemma 5.12, M satisfies the positive sentences in Γ. We claimthat M satisfies the negative sentences in Γ as well. For suppose that ψ is positiveand ψ belongs to Γ. If M 6|= ψ, we would have M |= ψ. By Lemma 5.12, Γ ` ψ.But then Γ is inconsistent, a contradiction. The claim shown, we now see thatM |= Γ. �

5.2. THERE IS NO FINITE SYLLOGISTIC LOGIC FOR R 59

5.2. There is no Finite Syllogistic Logic for R

In the previous section, we presented a logic for R. The reader might wellwonder whether we need (RAA) or some such device, or whether the kinds of logicswhich we have previously considered were sufficient. Now is the time to answerthis. We need to formulate exactly what we mean by a purely syllogistic system,and then prove that there are no purely syllogistic systems for R which are finite,sound, and complete.

5.2.1. General Definitions on Syllogistic Proof Systems. Let F be asyllogistic fragment. A derivation relation |∼ in F is a subset of P(F) × F, whereP(F) is the power set of F. For readability, we write Θ |∼ θ instead of 〈Θ, θ〉 ∈ |∼.We say that |∼ is sound if Θ |∼ θ implies Θ |= θ, and complete (for F) if Θ |= θimplies Θ |∼ θ.

Definition 5.14 Let F be a syllogistic fragment, what we have been calling alogical language in these notes. We employ the following terminology. A syllogisticrule (sometimes, simply: rule) in F is a pair Θ/θ, where Θ is a finite set (possiblyempty) of F-sentences, and θ an F-sentence. We call Θ the antecedents of the rule,and θ its consequent. The rule Θ/θ is sound if Θ |= θ.

A substitution is a function g = g1 ∪ g2, where g1 : P → P and g2 : R → R.If θ is an F-formula, denote by g(θ) the F-formula which results by replacing anyatom (unary or binary) in θ by its image under g, and similarly for sets of formulas.An instance of a syllogistic rule Θ/θ is the syllogistic rule g(Θ)/g(θ), where g is asubstitution.

A syllogistic proof system is a set of syllogistic rules.

We generally display rules in ‘natural-deduction’ style. For example,

(5.7)∀(q, o) ∃(p, q)

∃(p, o)∀(q, o) ∃(p, q)

∃(p, o)where p, q and o are unary atoms, are syllogistic rules in S, corresponding to thetraditional syllogisms Darii and Ferio, respectively.

Syllogistic rules which differ only with respect to re-naming of unary or bi-nary atoms will be informally regarded as identical, because they have the sameinstances. Thus, the letters p, q and o in (5.7) function, in effect, as variablesranging over unary atoms. It is often convenient to display syllogistic rules usingvariables ranging over other types of expressions, understanding that these are justmore compact ways of writing finite collections of syllogistic rules in the officialsense. For example, the two rules (5.7) may be more compactly written

∀(q, l) ∃(p, q)∃(p, l)

(D1)

where p and q range over unary atoms, but l ranges over unary literals.Fix a syllogistic fragment F, and let X be a set of syllogistic rules in F. Define

`X to be the smallest derivation relation in F satisfying:(1) if θ ∈ Θ, then Θ `X θ;(2) if {θ1, . . . , θn}/θ is a rule in X, g a substitution, Θ = Θ1 ∪ · · · ∪ Θn, and

Θi `X g(θi) for all i (1 ≤ i ≤ n), then Θ `X g(θ).

60 5. VERBS: R

It is simple to show that the derivation relation `X is sound if and only if each rulein X is sound.

Informally, we imagine chaining together instances of the rules in X to constructderivations, in the obvious way; and we refer to the resulting proof system as thedirect syllogistic system defined by X. We generally display derivations in natural-deduction style, as we have been doing throughout these notes.

Theorem 5.15 There exists no finite set X of syllogistic rules in R such that`X is both sound and complete.

Proof Let X be any finite set of syllogistic rules for R, and suppose `X issound. We show that it is not complete. Since X is finite, fix n ∈ N greater thanthe number of antecedents in any rule in X.

Let p1, . . . , pn be distinct unary atoms and r a binary atom. Let Γ be thefollowing set of R-formulas:

∀(pi,∃(pi+1, r)) (1 ≤ i < n)(5.8)

∀(p1,∀(pn, r))(5.9)

∀(p, p) (p ∈ P)(5.10)

∀(pi, pj) (1 ≤ i < j ≤ n)(5.11)

and let γ be the R-formula ∀(p1,∃(pn, r)). Observe that Γ |= γ. To see this, letM |= Γ. If pM

1 = ∅, then trivially M |= γ; on the other hand, if pM1 6= ∅, a simple

induction using formulas (5.8) shows that pMi 6= ∅ for all i (1 ≤ i ≤ n), whence

M |= γ by (5.9).For 1 ≤ i < n, let ∆i = Γ \ {∀(pi,∃(pi+1, r))}.

Claim If ϕ ∈ R and ∆i |= ϕ, then ϕ ∈ Γ.

It follows from this claim that Γ 6`X γ. For, since no rule of X has morethan n − 1 antecedents, any instance of those antecedents contained in Γ must becontained in ∆i for some i. Let δ be the corresponding instance of the consequentof that rule. Since `X is sound, ∆i |= δ. By Claim 5.2.1, δ ∈ Γ. By inductionon the number of steps in derivations, we see that no derivation from Γ leads to aformula not in Γ. But γ 6∈ Γ.

Proof of Claim Certainly, ∆i has a model, for instance the model Mi givenby:

(5.12)p1

GF ED��

// p2 // · · · // pi pi+1 // · · · // pn

Here, A = {p1, . . . , pn}, pMij = {pj} for all j (1 ≤ j ≤ n), and rMi is indicated

by the arrows. All other atoms (unary or binary) are assumed to have emptyextensions. Note that there is no arrow from pi to pi+1.

We consider the various possibilities for ϕ in turn and check that either ϕ ∈ Γor there is a model of ∆i in which ϕ is false.

(i) ϕ is of the form ∀(p, p). Then ϕ ∈ Γ by (5.10).

(ii) ϕ is not of the form ∀(p, p), and involves at least one unary or binary atomother than p1, . . . , pn, r. In this case, it is straightforward to modify Mi so as to

5.3. AN ALTERNATIVE FOR THE POSITIVE FRAGMENT 61

obtain a model M′i of ∆i such that M′i 6|= ϕ. Henceforth, then, we may assume thatϕ involves no atoms other than p1, . . . , pn, r.

(iii) ϕ is of the form ∀(pj , pk). If j = k, then ϕ ∈ Γ, by (5.10). If j 6= k, thenMi 6|= ϕ by inspection.

(iv) ϕ is of the form ∀(pj , pk). If j = k, then Mi 6|= ϕ, since pMij 6= ∅. If j 6= k, then

ϕ ∈ Γ, by (5.11) and the identification ∀(pj , pk) = ∀(pk, pj).

(v) ϕ is of the form ∀(pj ,∀(pk, r)). If j = 1 and k = n, then ϕ ∈ Γ, by (5.9). So wemay assume that either j > 1 or k < n, in which case, k 6= j+1 implies Mi 6|= ϕ, byinspection. Hence, we may assume that ϕ = ∀(pj ,∀(pj+1, r)), with j < n. Let Bi,j

be the structure obtained from Mi by adding a second point b to the interpretationof pj+1, and to which pj is not related by r. In pictures:

p1

GF ED��

// p2 // · · · pj // pj+1 // pj+2 // · · · // pi pi+1 // · · · pn

b

::uuuuuu

(This picture shows j + 2 < i. Similar pictures are possible in all other cases.) Byinspection, Bi,j |= ∆i, but Bi,j 6|= ϕ.

(vi) ϕ is of the form ∀(pj ,∃(pk, r)). If k = j + 1, then ϕ ∈ Γ, by (5.8). Moreover,if k 6= j + 1, then, unless j = 1 and k = n, Mi 6|= ϕ, by inspection. Hence we mayassume ϕ = ∀(p1,∃(pn, r)). Let Ci be the structure:

p1 // p2 // · · · // pi,

with pCij = ∅ for all j (i < j ≤ n). Then Ci |= ∆i, but Ci 6|= ϕ.

(vii) ϕ is of either of the forms ∀(pj ,∀(pk, r)), ∀(pj ,∃(pk, r)). Define M′′i to be likeMi except that rM′′i additionally contains the pair of points 〈pj , pk〉. By inspection,M′′i |= ∆i, but M′′i 6|= ϕ.

(viii) ϕ is of the form ∃(p, c). Let M0 be a structure over any domain in whichevery atom has empty extension. Then M0 |= ∆i, but M0 6|= ϕ. �

This also completes the proof of Theorem 5.15. �

5.3. An Alternative for the Positive Fragment

We now present an alternative logic for the positive fragment of R, the set ofsentences with no complementation on either nouns or verbs. The fragment cannothave a finite axiomatization by purely syllogistic rules, and we have seen a logic reason!which (RAA) gives us the capability of handling the full R. The point here is to goa different route, by proposing a logic with no (RAA), indeed not even with ex falsoquodlibet, and instead with infinitely many rules. (However, the rules themselvesare a regular set.)

Our logic is listed in Figure 5.4.We write x ; y as a shorthand for a sequence of sentences such as

(5.13) ∀(x, ∃(c1, r)),∀(c1,∃(c2, r)), . . . ,∀(cn,∃(y, r)).

62 5. VERBS: R

∃(p, q) ∀(q, c)∃(p, c)

(D1)∀(p, q) ∀(q, c)

∀(p, c)(B)

∀(p, q) ∃(p, c)∃(q, c)

(D2)∀(p, p)

(T)∃(p, c)∃(p, p)

(I)

∃(p,∃(q, t))∃(q, q)

(II)

∀(p,∀(q′, t)) ∃(q, q′)∀(p, ∃(q, t))

(∀∀)∃(p,∃(q, t)) ∀(q, q′)

∃(p,∃(q′, t))(∃∃)

∀(p,∃(q, t)) ∀(q, q′)∀(p,∃(q′, t))

(∀∃)

∀(x1,∃(x2, r)) · · · ∀(xn,∃(z, r)) ∀(x,∀(z, r)) ∀(z, y)∀(x1,∃(y, r))

(H)

∀(y,∃(y1, r)) . . . ∀(yn,∃(z, r)) ∀(z,∀(y, r)) ∀(z, x) ∃(x, x)∃(x,∀(y, r))

(HH)

Figure 5.4. A logical system for the positive fragment of R whichincludes neither ex falso quodlibet nor (RAA). The system is notaxiomatizable using only finitely many rules.

So the last two rules in Figure 5.4 may be abbreviated thus:x ; z ∀(x, ∀(z, r)) ∀(z, y)

∀(x, ∃(y, r))(H)

y ; z ∀(z,∀(y, r)) ∀(z, x) ∃(x, x)∃(x, ∀(y, r))

(HH)

Here is the idea: if x ; y and if a model M has x’s, then M must also havey’s. So we could read x ; y as “if x is non-empty, so is y.”

In the definition of ; in (5.13), we allow n = 0. In this case, the sequencewould be empty. So x ; x for all x. Taking n = 0 in (J) corresponds to thecase when x and y are identical. This instance of (HH) would boil down to thecheck this pointinference of ∀(x,∃(x, r)) from ∀(x,∀(x, r)). (HH) would reduce to the inference of∃(x,∀(y, r)) from ∀(y,∀(y, r)), ∀(y, x) and ∃(x, x).

We write Γ ` x ; y to mean that there is a sequence of sentences as above,each of which is derivable from Γ. However, we frequently drop the Γ and just writex ; y.

A family of model constructions. Let Γ be a set of sentences in the fragmentof this section. We write E(Γ) for {u : Γ ` ∃(u, u)}. We also say that a set S ofvariables is Γ-existentially closed (EC) if whenever u ∈ S and Γ ` u ; w, thenw ∈ S as well. For example, E(Γ) is Γ-EC, and the union of two Γ-EC sets is againΓ-EC.

5.3. AN ALTERNATIVE FOR THE POSITIVE FRAGMENT 63

Let Γ be a set of sentences in our fragment, let S be Γ-EC assume that E(Γ) ⊆ S.We define a model M = M(Γ, S) by taking

(5.14) M = {x1, x2 : x ∈ S} ∪ {{p, q} : p 6= q and Γ ` ∃(p, q)}.(This differs from the construction in the previous section in that the elements inthe first part of the union are those from S; so for xi ∈ M , we need not haveΓ ` ∃(x, x).) But the rest of the definition of the structure is exactly the same asin our previous construction.

Proposition 5.16 If [[x]] 6= ∅ in M(Γ, S), then x ∈ S.

Proof First, assume that yi ∈ [[x]], so that y ∈ S and Γ ` ∀(y, x). Then �

Lemma 5.17 Let Γ be a theory in positive-R, and let S be Γ-EC. Then M(Γ, S) |=Γ.

Proof Most of the details are the same as in Lemma 5.12. The one differenceis in the case of a sentence in Γ of the form ∀(x, ∃(y, r)). We may assume [[x]] 6= ∅,or our result is trivial. An element of [[x]] is either of the form zi or else of the form{u, v}. Let zi ∈ [[x]], so that z ∈ S and ∀(z, x). Then Γ ` ∀(z,∃(y, r)). As S isΓ-EC, y ∈ S. Then zi[[r]]y1 by construction. The other case is for {u, v} ∈ [[x]], sothat Γ ` ∃(u, v). Without loss of generality, u ≤ x. Then Γ ` ∀(u,∃(y, r)). Byconstruction, {u, v}[[r]]y1. �

Theorem 5.18 The logical system determined by the rules in Figure 5.4 issound and complete for positive-R.

Proof Suppose Γ |= ϕ. We show that Γ ` ϕ. The argument splits into casesaccording to the syntax of S.

The first case: ϕ ∈ L(all, some). We omit this case and leave it as an exerciseto the reader.

ϕ is ∃(x, ∃(y, r)). Let S = E(Γ), and consider M(Γ, S). By Lemma 5.12, M |= Γ.Hence M |= ϕ. There are several cases here, depending on the data witnessing ϕ.One case is when there are ui ∈ [[x]] and wj ∈ [[y]] related by [[r]]. We argue fromΓ. We have ∀(u, x), ∀(w, y), ∃u, ∃w, and then easily we see that Γ ` ∃(u,∃(w, r)).Then by monotonicity, Γ ` ∃(x, ∃(y, r)), as desired.

We must also consider the case [[r]] contains a pair such as ({a, b}, {c, d}). Therewould be four subcases here, and we go into details on only one of them: suppose{a, b}[[r]]{c, d}. Without loss of generality, Γ ` ∀(a,∀(c, r)). Since Γ also derives∃a, ∀(a, x), ∃b, and ∀(b, y), we easily get the desired ∃(x,∃(y, r)). The reasoning issimilar when [[r]] contains a pair such as ({a, b}, yj) or one such as (xi, {a, b}).

ϕ is ∀(x, ∀(y, r)). Let

S = E(Γ) ∪ {u : x ; u or y ; u}.S is Γ-EC. As a result, M(Γ, S) |= Γ. Hence also M(Γ, S) |= ϕ. Note that x and ybelong to S, and also x ∈ [[x]] and y ∈ [[y]]. Since M(Γ, S) |= ϕ, x1[[r]]y2. Then byProposition 5.11, part 1, Γ ` ϕ.

ϕ is ∀(x, ∃(y, r)). Here we take

(5.15) S = E(Γ) ∪ {u : x ; u}.As in our previous cases, M(Γ, S) |= ϕ. We have x ∈ S, by definition. In particular,x1 ∈ M . Every witness to ϕ in [[y]] for x1 is either of the form zi or a pair {a, b}.

64 5. VERBS: R

∀(d,∀(c, r))∀(d,∀(c, s))

∀(d, ∃(c, r))∀(d,∃(c, s))

∃(d,∀(c, r))∃(d,∀(c, s))

∃(d,∃(c, r))∃(d,∃(c, s))

Figure 5.5. Rules of inference corresponding to the assertion r ⇒ s.

This paragraph only presents the details for a witness of the first form. Let zi besuch that x1[[r]]zi and z ≤ y. Γ must derive one of the following: ∀(x, ∃(z, r)), or∀(x,∀(z, r)). In the first case, we easily get ∀(x,∃(y, r)). We thus concentrate onthe second case. Since z ∈ S, either z ∈ E(Γ), or else x ; z. If z ∈ E(Γ), thenfrom ∀(x,∀(z, r)), we also get ∀(x, ∃(z, r)). By monotonicity, we have ϕ. The moreinteresting case is when x ; z. When x ; z, we use (J) to immediately derive ϕ.

We have another overall case, going back to x2 ∈M . Suppose that {a, b} ∈ [[y]](so that ∃(a, b)) and x2[[r]]{a, b}. There are four cases here, perhaps one represen-tative one is when Γ ` ∀(a, y) and ∀(x, ∀(b, r)). Then from Γ we have ∀(x,∃(a, r)).Finally, by monotonicity we have ∀(x, ∃(y, r)).

ϕ is ∃(x, ∀(y, r)). On our assumption that Γ |= ϕ, we have Γ |= ∃x. Thus bythe first case on ϕ in this overall proof of this theorem, we know that x ∈ E(Γ).In the model construction, we take S = E(Γ) ∪ {u : y ; u}. The resulting modelM(Γ, S) then satisfies ϕ. Suppose that zi ∈ [[x]] is related to all elements of [[y]]. (Inthe next paragraph, we consider an element of the form {a, b}.) By the structureof the model, we either have Γ ` ∃(z,∀(y, r)) (and then we are easily done becausez ≤ x) or Γ ` ∀(z,∀(y, r)). In the latter case, if z ∈ E(Γ), we easily see that Γ ` ϕ.So we are left with the case that y ; z. And here we use (HH).

We also need to consider the situation when a pair element such as {a, b} ∈ [[x]]is related to all elements of [[y]]. One representative case is when a ≤ x. By definitionof S, y ∈ S and so y2 ∈M . In particular, {a, b}[[r]]y2. By the structure of the model,we have ∀(a,∀(y, r)) or ∀(b,∀(y, r)); without loss of generality, it is ∀(b,∀(y, r)).Since Γ ` ∃(a, b), we have ∃(a,∀(y, r)). We would then have Γ ` ∃(x, ∀(y, r)).

This concludes the proof of Theorem 5.18. �

5.4. Incorporating Background Facts, I

Suppose we have a stock of background facts about verbs, such as

(5.16) hitting entails touching

We mean that every act of hitting is also an act of touching. This background factcannot be stated in any of the languages which we have so far studied. Nevertheless,it can be made into a semantic requirement: we would require of a model that theinterpretation of hitting be a sub-relation of the interpretation of touching. Andthen we might like to study the entailment relation on this smaller class of models.

Even though we cannot state (5.16) as an axiom, it does yield a rule of inference.To state it more abstractly, suppose we have a rule like

(5.17) r ⇒ s

and we restrict attention to the models where [[r]] ⊆ [[s]]. Then Figure 5.5 listssound rules of inference.

5.4. INCORPORATING BACKGROUND FACTS, I 65

We add these to the system R as listed in Figure 5.2.

Proposition 5.19 The system R together with the rules in Figure 5.5 give asound and complete logic: Γ ` ϕ iff Γ |= ϕ.

Proof The soundness being easy, we need only show that every consistent setin the logic is satisfiable. Let Γ be consistent in the current system. Then Γ isconsistent in R, and the proof of Theorem 5.10 shows how to build a model M(Γ)of it. It remains to check that whenever r ⇒ s is one of our constraints, then[[r]] ⊆ [[s]]. And for this, we must look back at Figure 5.3 and the semantics of verbsin M(Γ) defined on page 56.

Figure 5.3 defines a relation RΓx,y for atoms x and y. There is an implicit

dependence to a relation, and so in this proof we shall write RΓ,rx,y . First, we have

some remarks on the relation of RΓ,rx,y and RΓ,s

x,y, again under the assumption thatr ⇒ s.

If (a) holds for r, then by our logic, (a) also holds for s. So in this case,RΓ,rx,y = RΓ,s

x,y.If (b) holds for r, then either (a) or (b) holds for s, and the pictures in Figure 5.3

tell us that RΓ,rx,y ⊆ RΓ,s

x,y.If (c) holds for r, then either (a) or (b) or (c) holds for s, and again RΓ,r

x,y ⊆ RΓ,sx,y.

If (d) holds for r, then either (a) or (b) or (d) holds for s. (It is important tosee that (c) cannot hold for s in this case,) Again RΓ,r

x,y ⊆ RΓ,sx,y.

If (e) holds for r, then one of (a) – (e) holds for s; (f) cannot hold. Once againRΓ,rx,y ⊆ RΓ,s

x,y.If (f) holds for r, then RΓ,r

x,y ⊆ RΓ,sx,y because RΓ,r

x,y is empty.So in all cases, we have checked that RΓ,r

x,y ⊆ RΓ,sx,y. At this point, we return to

the definitions of [[r]] and [[s]] to see that [[r]] ⊆ [[s]]. Recall that these relations involvepair-elements, and they are defined in a case-by-case fashion. However, all of thecases are positive, (that is, the atoms occur positively rather than negatively), andso it is easy to check by inspection that indeed [[r]] ⊆ [[s]].

In this way, we have verified that the model M(Γ) indeed respects backgroundinformation. This concludes the proof. �

Exercise 33 In this section we gave a precise definition of the derivabilityrelation Γ ` ϕ. Here is another characterization of this relation. Let alt be the thesmallest relation on P(R)× R such that

(1) If ϕ ∈ Γ, then Γ alt ϕ.(2) If Γ alt ψ1, . . ., Γ alt ψn and one of the rules of R other than (RAA) has

as a substitution instance ψ1, . . . , ψn\ϕ, then Γ alt ϕ.(3) If Γ ∪ {ϕ} alt ⊥, then Γ alt ϕ.

Prove that Γ ` ϕ iff Γ alt ϕ.

Sources. The first syllogistic logic to employ verbs was Nishihara, Morita, andIwata [24]. The negative result on R is due to Pratt-Hartmann and Moss [29]. Thelogical system for R comes from Moss [21] and from Pratt-Hartmann and Moss [29].The proof in the latter paper is somewhat different, and it has the advantage of alsogiving the NLogSpace complexity result. The work on incorporating backgroundfacts is new here.

CHAPTER 6

Relative Clauses: RC

6.1. RC: an Indirect System

This chapter continues the study of verbs with a presentation of logics contain-ing relative clauses.

We begin with a language called RC. Figure 6.1 exhibits a set of syllogisticrules which we call RC used in connection with the language RC. We shall showthat the indirect derivation relation RC is complete. In the following, the variablesb+, c+ range over positive c-terms, and d over c-terms. (As usual, p, q range overunary atoms and r over binary atoms.)

Rules (D1), (D2), (B), (T) and (I) are natural generalizations of their namesakesin R. Rules (J), (K), and (L) embody logical principles that are intuitively clear,yet not familiar when taken as single steps. If all porcupines are brown animals, theneverything which attacks all brown animals attacks all porcupines (J), and everythingwhich photographs some porcupine photographs some brown animal (K). And if someporcupines are brown animals, then everything which caresses all porcupines caressessome brown animals (L). We have already seen (II). Rule (Z) tells that if there areno porcupines (say), then all farmers love all porcupines. Rule (W) tells us thatunder this same assumption, there is something which loves all porcupines (simplybecause we assume the universe is non-empty, and everything in it vacuously lovesall porcupines). If we did not assume that the universe of a model is non-empty,then we would drop (W), and the completeness of the resulting system would beproved the same way.

All the rules of R are derivable in RC. For example, here is a proof of (A):

[∃(p, d)]1

∃(p, p)(I)

∀(p, p)∃(p, p)

(D1)

∀(p, d)(RAA)1

Theorem 6.1 The derivation relation RC is sound and complete for RC.

Proof To prove completeness, we need only show that every theory Γ in thefragment RC which is consistent in the logic RC is satisfiable. Also, by Lemma 5.9,we may assume that Γ is RC-complete. For the remainder of this section, fix someRC-complete set of formulas Γ which is consistent with respect to RC. We simplifyour notation to write ` for RC.

67

68 6. RELATIVE CLAUSES: RC

∀(c+, c+)(T)

∃(c+, d)∃(c+, c+)

(I)∀(b+, c+) ∀(c+, d)

∀(b+, d)(B)

∃(b+, c+) ∀(c+, d)∃(b+, d)

(D1)∀(b+, c+) ∃(b+, d)

∃(c+, d)(D2)

∀(p, q)∀(∀(q, r),∀(p, r))

(J)∀(p, q)

∀(∃(p, r),∃(q, r))(K)

∃(p, q)∀(∀(p, r),∃(q, r))

(L)

∃(q,∃(p, r))∃(p, p)

(II)∀(p, p)

∀(c+,∀(p, r))(Z)

∀(p, p)∃((∀(p, r),∀(p, r))

(W)

Figure 6.1. Proof rules for the system RC.

We shall construct a structure M and prove that it satisfies Γ. First, let C+

be the set of positive c-terms. Then we define M by:

A = {〈c1, c2, Q〉 ∈ C+ ×C+ × {∀,∃} : Γ ` ∃(c1, c2)}[[p]] = {〈c1, c2, Q〉 ∈ A : Γ ` ∀(c1, p) or Γ ` ∀(c2, p)}〈c1, c2, Q1〉[[r]](d1, d2, Q2) iff either (a) for some i, j, and q ∈ P,

Γ ` ∀(ci,∀(q, r)) and Γ ` ∀(dj , q);or else (b) Q2 = ∃, and for some i and q ∈ P,d1 = d2 = q, and Γ ` ∀(ci,∃(q, r)).

Note that the set A is non-empty. For let p ∈ P. If Γ ` ∃(p, p), then 〈p, p,∀〉 ∈ A.Otherwise, Γ ` ∀(p, p), and so for all binary atoms r, Γ ` ∃(∀(p, r),∀(p, r)) by (W).Thus 〈c, c,∀〉 ∈ A, where c is ∀(p, r).

Lemma 6.2 For all c ∈ C+,

[[c]] = {〈d1, d2, Q〉 ∈ A : either Γ ` ∀(d1, c), or Γ ` ∀(d2, c)}.

Proof The result for c a unary atom is immediate. We often shall use theresulting fact that if Γ ` ∃(p, p), then [[p]] 6= ∅; it contains both 〈p, p,∀〉 and 〈p, p,∃〉.The main work concerns c-terms of the form ∀(p, r) and ∃(p, r). We remark thatall c-terms referred to in this proof are positive.

We begin with c = ∀(p, r). Let 〈d1, d2, Q〉 ∈ [[∀(p, r)]]. Now, either Γ ` ∃(p, p)or Γ 6` ∃(p, p). If the former, then 〈p, p,∀〉 ∈ [[p]]. By the semantics of our fragment,〈d1, d2, Q〉[[r]]〈p, p,∀〉. By the structure of M, there are i and q giving the derivationfrom Γ as in the tree on the left below:

....∀(di,∀(q, r))

....∀(p, q)

∀(∀(q, r),∀(p, r))(J)

∀(di,∀(p, r))(B)

....∀(p, p)

∀(dj ,∀(p, r))(Z).

6.1. RC: AN INDIRECT SYSTEM 69

This shows that Γ ` ∀(di,∀(p, r)). On the other hand, if Γ 6` ∃(p, p), we use theassumption that Γ is complete to assert that Γ ` ∀(p, p). And then we have thederivation from Γ on the right above, for both j.

Conversely, fix i and suppose that Γ ` ∀(di,∀(p, r)). We claim that 〈d1, d2, Q〉belongs to [[∀(p, r)]]. For this, take any 〈b1, b2, Q′〉 ∈ [[p]] so that Γ ` ∀(bj , p) forsome j. (We are thus using b1 and b2 to range over positive c-terms, just as thec’s and d’s do.) Then p, i and j show that 〈d1, d2, Q〉[[r]]〈b1, b2, Q′〉. This for allelements of [[p]] shows that 〈d1, d2, Q〉 ∈ [[∀(p, r)]].

We next prove the statement of our lemma for c = ∃(p, r).Let 〈d1, d2, Q〉 ∈ [[∃(p, r)]]. Thus we have 〈d1, d2, Q〉[[r]]〈b1, b2, Q′〉 for some

〈b1, b2, Q′〉 ∈ [[p]]. We first consider case (a) in the definition of our structure M:there are i, j, and q so that Γ ` ∀(di,∀(q, r)) and Γ ` ∀(bj , q). We have Γ ` ∃(b1, b2),since 〈b1, b2, Q′〉 ∈ A. Further, let k be such that Γ ` ∀(bk, p). We show the desiredconclusion using a derivation from Γ:

....∀(di,∀(q, r))

....∃(b1, b2)∃(bk, bj)

....∀(bj , q)

∃(bk, q)(D1)

....∀(bk, p)

∃(q, p)(D1)

∀(∀(q, r),∃(p, r))(L)

∀(di,∃(p, r))(B)

This concludes the work in case (a). In case (b), Q′ = ∃, there is some q ∈ Psuch that b1 = b2 = q, and for some i, Γ ` ∀(di,∃(q, r)). Again we have Γ ` ∀(q, p).So we have a derivation from Γ as follows:

....∀(di,∃(q, r))

....∀(q, p)

∀(∃(q, r),∃(p, r))(K)

∀(di,∃(p, r))(B)

At this point, we know that if 〈d1, d2, Q〉 ∈ [[∃(p, r)]], then Γ ` ∀(di,∃(p, r))for some i. We now verify the converse. Let 〈d1, d2, Q〉 ∈ A, and fix i such thatΓ ` ∀(di,∃(p, r)). Then Γ ` ∃(d1, d2). We thus have a derivation from Γ:

....∃(d1, d2)∃(di, di)

(I)....

∀(di,∃(p, r))∃(di,∃(p, r))

(D1)

∃(p, p)(II)

This goes to show that 〈p, p,∃〉 ∈ A. By the construction of M, 〈d1, d2, Q〉[[r]]〈p, p,∃〉,and 〈p, p,∃〉 ∈ pM. So 〈d1, d2, Q〉 ∈ [[∃(p, r)]]. This completes the proof. �

Lemma 6.3 M |= Γ.

Proof The proof is by cases on the various formula types in RC. Using thefact that formulas ∃(e, f) and ∃(f, e) are identified, and similarly for ∀(e, f) and


∀(f, e), we may take all RC-formulas to have one of the forms:

∀(c+, d+), ∀(c+, d+), ∃(c+, d+), ∃(c+, d+),

where c+ and d+ range over positive c-terms. In the remainder of the proof, weomit the +-superscripts for clarity: i.e. c and d range over positive c-terms.

Let ϕ ∈ Γ be ∀(c, d). Using (B) and Lemma 6.2, we see that [[c]] ⊆ [[d]].Let ϕ ∈ Γ be ∀(c, d). Suppose towards a contradiction that M 6|= ϕ. Let

〈b1, b2, Q〉 ∈ [[c]] ∩ [[d]]. Let i and j be such that Γ ` ∀(bi, c) and Γ ` ∀(bj , d). Thenusing (B), Γ ` ∀(bi, d). And since Γ ` ∃(bi, bj), we use (D1) to see that Γ ` ∃(d, d).So Γ is inconsistent, a contradiction.

If ϕ ∈ Γ is ∃(c, d), then (c, d,∃) ∈ A. Indeed, (c, d,∃) ∈ [[c]] ∩ [[d]], by Rule (T)and Lemma 6.2.

Finally, consider the case when ϕ ∈ Γ is of the form ∃(c, d). Then, using (I),Γ ` ∃(c, c), so 〈c, c,∀〉 ∈ A. Suppose towards a contradiction that M |= ∀(c, d).Then 〈c, c,∀〉 ∈ [[d]]. But then we have Γ ` ∀(c, d), by Lemma 6.2 again. Oneapplication of (D2) now shows that Γ ` ∃(d, d). Thus we have a contradiction tothe consistency of Γ. �

This completes the proof of Theorem 6.1. �

6.1.1. Negative results for larger fragments. Recall that the languagesR† and RC† have full noun-level negation. (This is the point of the † notation.)Recall also from Theorem 5.15 that there is no logical system for R which is simul-taneously finite, sound and complete. In fact, an even stronger result holds for R†

and RC†.

Theorem 6.4 There exists no finite set X of syllogistic rules in R† such that X is both sound and complete.

That is, even allowing for reductio ad absurdum, there are still no logical systemsfor R† and RC† which are simultaneously finite, sound and complete.

The proof of Theorem 6.4 is similar to what we saw earlier in Theorem 5.15for R, but the argument here is more intricate and so we are not going to give it inthese notes.

6.2. Complexity Results for RC, R†, and RC†

In this section, we study the computational complexity of the logical systemswith which we have been concerned. We are mainly interested in the complexity ofthe consequence relation, that is

{(Γ, ϕ) : Γ is a finite set, and Γ ` ϕ}.Recall that we defined syllogistic proof systems in Section 5.2.1. Derivation

relations defined by direct proof-systems are easily seen to have polynomial-timecomplexity.

Lemma 6.5 Let F be a syllogistic fragment, and X a finite set of syllogistic rulesin F. The problem of determining whether Θ `X θ, for a given set of F-formulasΘ and F-formula θ, is in PTime.

Proof Let Σ be the set of all atoms (unary or binary) occurring in Θ ∪ {θ},together with one additional binary atom r. We first observe that, if there is aderivation of θ from Θ using the rules X, then there is such a derivation involving

6.2. COMPLEXITY RESULTS FOR RC, R†, AND RC† 71

only the atoms occurring in Σ. For, given any derivation of θ from Θ, uniformlyreplace any unary atom that does not occur in Θ∪{θ} with one that does. Similarly,uniformly replace any binary atom which does not occur in Θ∪{θ} with one whichdoes (or with r in case Θ ∪ {θ} contains no binary atoms). This process obviouslyleaves us with a derivation of θ from Θ, using the rules X.

To prove the lemma, let the the total number of symbols occurring in Θ∪{θ} ben. Certainly, |Σ| ≤ n. Let X comprise k1 proof-rules, each of which contains at mostk2 atoms (unary or binary). The number of rule instances involving only atomsin Σ is bounded by p(n) = k1n

k2 . Hence, we need never consider derivations with‘depth’ greater than p(n). Let Θi be the set of formulas involving only the atomsin Σ, and derivable from Θ using a derivation of depth i or less (0 ≤ i ≤ p(n)).Evidently, |Θi| ≤ |Θ| + p(n). It is then straightforward to compute the successiveΘi in total time bounded by a polynomial function of n. �

Theorem 6.6 (McAllester and Givan [15]) The satisfiability problem for asequent Γ in RC is NPTime-complete. (Thus the validity problem is co-NPTime-complete.)

Proof It follows from the proof of Theorem 6.1, that if Γ is a satisfiable theoryin RC, then Γ has a model whose size is at most 2n, where n is the number ofpositive c-terms in the language. (This bound is independent of Γ.) It follows thatsatisfiability is in NPTime. The main work is in showing the NPTime-hardness.

Our proof is a small variation on the original argument. We use a reductionfrom the monotone exactly-1 3SAT problem. This problem is defined as follows.We are given a conjunction of 3-CNF clauses without negation, so each clause is ofthe form U ∨ V ∨W . The problem is to find a truth assignment f to the variablesmaking one variable in each clause T and the other two variables F. We call thisa 1-valued assignment on S. This problem was shown to be NPTime-complete inSchaefer [31].

Consider a sentence S which is a conjunction of clauses, each of which is adisjunction of three variables without negation. We define an RC-theory Γ = Γ(S)via two clauses below. It uses unary atoms which correspond to the variables of S:we use u to correspond with U , etc. Γ also uses a number of new unary and binaryatoms. It is defined as follows:

(1) For each clause of S, say c ≡ U ∨ V ∨W , add to Γ the seven sentences

∀(xc,∀(u, r1c ))

∀(∃(u, r1c ), yc)

∀(yc,∀(v, r2c ))

∀(∃(v, r2c ), zc)

∀(zc,∀(w, r3c ))

∀(∃(w, r3c ), ac)

∃(xc, ac)

Here xc, yc, zc and ac are new unary atoms, and r12, r2

c , and r3c are new

binary atoms.(2) Let P and Q be any two distinct variables which occur together in some

clause c. Then add to Γ the sentence

∀(∀(p, rp,q),∃(q, r′p,q))Here rp,q and r′p,q are new binary atoms.

So if S has k clauses, then the first point will add 4k new unary atoms and 3k newbinary atoms. The second clause will add at most 2 ·

(3k2

)< 18k2 new binary atoms.

We assume that all of the atoms listed are distinct.


The main claim is that S has a 1-valued assignment iff Γ is satisfiable. In onedirection, assume that M |= Γ. Define a truth assignment f by f(U) = F iff [[u]] 6= ∅.Consider a clause c ≡ U ∨ V ∨W of S. If f(U) = f(V ) = f(W ) = F, then [[u]], [[v]],and [[w]] are all non-empty. By the first six points in (1), [[xc]] ⊆ [[yc]] ⊆ [[zc]] ⊆ [[ac]].But this contradicts the last point in (1). Thus we know that at least one variablein c is assigned the value T by f . We claim that only one variable can be T.For suppose towards a contradiction that (for example) f(U) = f(V ) = T. Then[[u]] = [[v]] = ∅. So [[∀(u, rp,q)]] = M and [[∃(v, r′p,q)]] = ∅. By the sentence in (2), Mis empty. But this is impossible, since as soon as S has at least one clause, Γ hasan existential assertion via (1). In this way, f is 1-valued on S.

We conclude by checking the converse assertion. Suppose f is 1-valued on S. Wemust find a model M |= Γ. Let M be the set of variables U such that f(U) = F. Fora variable X, define [[x]] = ∅ if f(X) = T, and [[x]] = {x} if f(X) = F. We still needto define the interpretations of all of the binary atoms, and all of the other unaryatoms. Suppose that P and Q are distinct variables which happen to belong to thesame clause. We know that either f(P ) = F or f(Q) = F (or both). In the first case,set [[rp,q]] = ∅ so that [[∀(p, rp,q)]] = ∅. (The interpretation of r′p,q is arbitrary in thiscase; we no longer mention this point.) This makes M |= ∀(∀(p, rp,q),∃(q, r′p,q)).In the second case, [[r′p,q]] = M ×M , so that [[∃(q, r′p,q)]] = M . In this way, thesentence in (2) holds in M.

Finally, we consider the sentences in (1). If f(U) = T, f(V ) = F, and f(W ) =F, then we already have [[u]] = ∅, [[v]] = {v}, and [[w]] = {w}. We set [[xc]] = M ,[[yc]] = ∅, [[zc]] = ∅, [[ac]] = {w}, [[r1

c ]] = M ×M , [[r2c ]] = ∅, and [[r3

c ]] = {(w,w)}.If f(U) = F, f(V ) = T, and f(W ) = F, set [[xc]] = M , [[yc]] = M , [[zc]] = {w},

[[ac]] = {w}, [[r1c ]] = M ×M , [[r2

c ]] = ∅, and [[r3c ]] = {(w,w)}.

If f(U) = F, f(V ) = F, and f(W ) = T, set [[xc]] = M , [[yc]] = M , [[zc]] = M ,[[ac]] = ∅, [[r1

c ]] = M ×M , [[r2c ]] = M ×M , and [[r3

c ]] = ∅.In all cases, the resulting model M satisfies all sentences in (1), hence all sen-

tences in Γ. �

Lemma 6.7 (Pratt-Hartmann and Moss [29]) The problem of determining thevalidity of a sequent in R† is ExpTime-hard.

Proof The logic KU is the basic modal logic K together with an additionalmodality U (for “universal”), whose semantics are given by the standard relational(Kripke) semantics, plus

|=w Uϕ if and only if |=w′ ϕ for all worlds w′.

The satisfiability problem for KU is ExpTime-hard. (The proof is an easy adapta-tion of the corresponding result for propositional dynamic logic; see, e.g. Harel etal. [9]: 216 ff.) It suffices, therefore, to reduce this problem to satisfiability in R†.Let ϕ be a formula of KU .

We first transform ϕ into an equisatisfiable set of formulas Tϕ ∪ Sϕ of first-order logic; then we translate the formulas of Tϕ ∪ Sϕ into an equisatisfiable setof R†-formulas. To simplify the notation, we shall take unary atoms (in R†) to beunary predicates (in first-order logic); similarly, we take binary atoms to do doubleduty as binary predicates. Let r and e be binary atoms. For any KU -formula ψ,let pψ be a unary atom, and define the set of first-order formulas Tψ inductively as

6.2. COMPLEXITY RESULTS FOR RC, R†, AND RC† 73

follows:

Tp = ∅ (where p is a proposition letter)

Tψ∧π =Tψ ∪ Tπ ∪ {∀x(pψ(x) ∧ pπ(x)→ pψ∧π(x)),

∀x(pψ∧π(x)→ pψ(x)),∀x(pψ∧π(x)→ pπ(x))}T¬ψ = Tψ ∪ {∀x(p¬ψ(x)→ ¬pψ(x)),∀x(¬p¬ψ(x)→ pψ(x))}

T2ψ =Tψ ∪ {∀x(p2ψ(x)→ ∀y(¬pψ(y)→ ¬r(x, y))),

∀x(¬p2ψ(x)→ ∃y(¬pψ(y) ∧ r(x, y)))}

TUψ =Tψ ∪ {∀x(pUψ(x)→ ∀y(¬pψ(y)→ ¬e(x, y))),

∀x(¬pUψ(x)→ ∃y(¬pψ(y) ∧ e(x, y)))}.Now let Sϕ be the collection of five first-order formulas

∃x(pϕ(x) ∧ pϕ(x)), ∀x(±pϕ(x)→ ∀y(±pϕ(y)→ e(x, y))).

(Although the first formula looks like it has a redundant conjunct, we state it inthis way only to make our work below a little easier.) We claim that the modalformula ϕ is satisfiable if and only if the set of first-order formulas Tϕ ∪ Sϕ issatisfiable. For let M be any (Kripke) model of ϕ over a frame (W,R). Definethe first-order structure M with domain W , by setting rM = R, eM = A2, andpMψ = {w | M |=w ψ}, for any subformula ψ of ϕ. It is then easy to check that

M |= Tϕ ∪ Sϕ. Conversely, suppose M |= Tϕ ∪ Sϕ. We build a Kripke structureM over the frame (A, rM) by setting, for any proposition letter o mentioned in ϕ,M |=a o if and only if a ∈ pM

o . A straightforward structural induction establishesthat for any subformula ψ of ϕ, M |=a ψ if and only if a ∈ pM

ψ . The formula∃x(pϕ(x) ∧ pϕ(x)) ∈ Sϕ then ensures that ϕ is satisfied in M.

Now, all of the formulas in Tϕ ∪ Sϕ are of one of the forms

∀x(±p(x)→ ±q(x)) ∀x(±p(x)→ ∀y(±q(y)→ ±r(x, y)))(6.18)

∃x(p(x) ∧ p(x)) ∀x(±p(x)→ ∃y(±q(y) ∧ r(x, y)))(6.19)

∀x(p(x) ∧ q(x)→ o(x)).(6.20)

Notice that formulas of the forms (6.18) and (6.19) translate (in the obvious sense)directly into the fragment R†; those of form (6.20), by contrast, do not. The nextstep is to eliminate formulas of this last type.

Let o∗ be a new unary relation symbol. For θ ∈ Tϕ ∪ Sϕ of the form (6.20), letrθ be a new binary atom, and define Rθ to be the set of formulas

∀x(¬o(x)→ ∃z(o∗(z) ∧ rθ(x, z)))(6.21)

∀x(p(x)→ ∀z(¬p(z)→ ¬rθ(x, z)))(6.22)

∀x(q(x)→ ∀z(p(z)→ ¬rθ(x, z))),(6.23)

which are all of the forms in (6.18) or (6.19). It is easy to check that Rθ |= θ. Forsuppose (for contradiction) that M |= Rθ and a satisfies p and q but not o in M.By (6.21), there exists b such that M |= rθ[a, b]. If M 6|= p[b], then (6.22) is falsein M; on the other hand, if M |= p[b], then (6.23) is false in M. Thus, Rθ |= θ asclaimed. Conversely, if M |= θ, expand M to a structure M′ by interpreting o∗ andrθ as follows:

(o∗)M = A

rMθ = {〈a, a〉 |M 6|= o[a]}.


We check that M′ |= Rθ. Formula (6.21) is true, because M′ 6|= o[a] implies M′ |=rθ[a, a]. Formula (6.22) is true, because M′ |= rθ[a, b] implies a = b. To seethat Formula (6.23) is true, suppose M′ |= q[a] and M′ |= p[b]. If a = b, thenM |= o[a] (since M′ |= θ); that is, either a 6= b or M |= o[a]. By construction, then,M′ 6|= rθ[a, b].

Now let T ∗ϕ be the result of replacing all formulas θ in Tϕ of form (6.20) with thecorresponding trio Rθ. (The binary atoms rθ for the various θ are assumed to bedistinct; however, the same unary atom o∗ can be used for all θ.) By the previousparagraph, T ∗ϕ ∪ Sϕ is satisfiable if and only if Tϕ ∪ Sϕ is satisfiable, and hence ifand only if ϕ is satisfiable. But T ∗ϕ ∪ Sϕ is a set of formulas of the forms (6.18)and (6.19), and can evidently be translated into a set of R†-formulas satisfied inexactly the same structures. Moreover, this set can be computed in time boundedby a polynomial function of ‖ϕ‖. This completes the reduction. �

We note the following fact. (We omit a detailed proof, since subsequent devel-opments do not hinge on this result.)

Lemma 6.8 The problem of determining the validity of a sequent in RC† is inExpTime.

Proof Trivial adaptation of Pratt-Hartmann [25], Theorem 3, which considersa fragment obtained by adding relative clauses to the relational syllogistic. �

Theorem 6.9 The validity problem for R† and RC† are ExpTime-complete.

Proof Lemmas 6.7 and 6.8. �

Corollary 6.10 There exists no finite set X of syllogistic rules in either R†

or RC† such that `X is both sound and refutation-complete.

Proof It is a standard result that PTime 6=ExpTime. The result is thenimmediate by Lemmas 6.5 and 6.7. �

Of course, Corollary 6.10 leaves open the possibility that there exist indirectsyllogistic systems that are sound and complete for R† and RC†. To show thatthere do not, stronger methods are required.

6.3. Complete Subsystems

different point below. Moss [21] also analyses a syllogistic logic with negatednouns (not verbs) and only using All. So in our notation, its formulas are ∀(l,m)and ∀(l,∀(m, r)), with l and m unary literals. In addition to rules we have seen, ituses the following additional rules which might be taken to be forms of the law ofthe excluded middle:

∀(n, ∀(q, r)) ∀(n,∀(q, r))∀(p,∀(q, r))

∀(p, ∀(n, r)) ∀(p, ∀(n, r))∀(p,∀(q, r)).

The system is completed by a rule with three premises:∀(p, ∀(n, r)) ∀(o,∀(q, r)) ∀(o,∀(n, r))

∀(p,∀(q, r)).A sub-fragment of the McAllester and Givan fragment was considered in Moss [21].

In our terms, the formulas would be of the forms ∀(c, d) and ∃(c, d), where c and dare class-terms of the forms p (an atom), ∃(c, r), or ∀(d, r). Thus the system lacks

6.4. INCORPORATING BACKGROUND FACTS, II 75

negation. The rules are versions of rules we have seen in Section 6.1: (T), (I), (B),(D), (J), (K), (L), and (II); it also employs a rule allowing for reasoning-by-cases,which is not a syllogistic rule in our sense. An informal example shows that thesystem allows non-trivial inferences, and inferences with more than one verb.

All porcupines are mammals

All who respect all mammals respect all porcupines

All who dislike all who respect all porcupines dislike all who respect all mammals.

6.4. Incorporating Background Facts, II

As in Section 5.4, we can consider background facts among verbs expressedsemantically as constraints of the form [[r]] ⊆ [[s]]. In the case of the system RC weobtain two new rules:

(6.24) ∀(∃(c, r),∃(c, s)) ∀(∀(c, r),∀(c, s))We add these to the system RC as listed in Figure 6.1 in order to accommodatebackground facts.

Sources. Most of the material in this chapter is from Moss [21], Pratt-Hartmannand Moss [29].

To the best of my knowledge, the first presentation of a complete proof-systemfor a fragment close to the relational syllogistic seems to be Nishihara, Morita, andIwata [24]. This logic is in effect a relational version of Lukasiewicz’, in that formulasroughly similar to those of R are treated as atoms of a propositional calculus.The authors provide axiom-schemata which, together with the usual axioms ofpropositional logic, yield a complete proof-system for the language in question.Actually, the propositional atoms in this language are allowed to feature n-arypredicates for all n ≥ 1. However, the rather strange restrictions on quantifier-scope(existentials must always outscope universals), mean that this language is primarilyof interest for atoms featuring only unary and binary predicates; these atoms (andtheir negations) then essentially correspond the formulas of our fragment R.

CHAPTER 7

Comparative Adjectives

This chapter proposes and studies logical systems in which one can expressarguments such as the following:

(7.1)

Every giraffe is taller than every gnuSome gnu is taller than every lionSome lion is taller than some zebraEvery giraffe is taller than some zebra

We encourage the reader to work out an argument showing that the conclusionof (7.1) follows from the premises. A picture would do. Any convincing argumentmust appeal in some way to the transitivity of is taller than; without transitivity, theconclusion does not follow. (The argument also depends on the subject-wide scopereading in the second sentence, and indeed all sentences in this chapter are readwith the subject having wide scope.) Although transitivity is salient, it is also silent:it is not explicitly mentioned, and it even seems unnecessary to add the transitivityas an additional premise. Every comparative adjective phrase is most naturallyinterpreted by a transitive relation, and so this point will be of central importancein this chapter. Further, the interpretations are irreflexive, corresponding to thefact that sentences like Every woman is not taller than some woman (with the subjectwide scope reading) are logical truths. (This is because for every woman w, w isdefinitely not taller than herself.) Returning to the topic of our chapter, we wishto examine reasoning in simple fragments of language, and the transitivity will giveus many sound principles for comparative adjective phrases.

We are especially interested in complete and decidable logical systems in whichone can represent the main features of arguments such as (7.1). Here is a relatedpoint. One might take a formal representation of (7.1) to just be a translation intofirst-order logic, together with a proof of the conclusion from the hypotheses insome formal system or other. For the purposes of this chapter, this will not do: weare interested only in decidable logical systems. A more sophisticated point mightbe to translate (7.1) into the two-variable fragment FO2 of first-order logic. Thisfragment is indeed decidable, by Mortimer’s Theorem [17]. But the requirementof transitivity cannot be expressed in FO2, as shown in Theorem 20 of Purdy [30].And a related point: FO2 enriched by atomic sentences saying “R is a transitiverelation” is also undecidable, by a result in Gradel et al [8].

Furthermore, the same premises in (7.1) also entail Everything which is tallerthan some giraffe is taller than some zebra. This last sentence is important due tothe relative clause in the subject. For another example, consider:

(7.2)Every hyena is taller than some jackalEverything taller than some jackal is not heavier than any warthogEverything which is taller than some hyena is not heavier than any warthog

77

78 7. COMPARATIVE ADJECTIVES

The reader wishing to get an idea of the subject is encouraged to devise a logicalsystem in which one can carry out the reasoning in (7.1) and (7.2) and which hasno individual variables.

Our plan in this chapter is not to propose any new logical languages, but ratherto use the languages R and RC which we have seen in Chapter 5. Recall that theselanguages have “binary atoms” r which in R and RC were taken to correspondto English transitive verbs (verbs taking direct objects). The difference is thatRC allows relative clauses in subject NP’s, a point hinted at above. That is, thesentences in (7.1) are formalizable in R, while those in (7.2) require the largermachinery of RC. Previously, binary atoms were interpreted as arbitrary relationson some domain. We saw logical completeness results for semantic interpretationswhere the verbs were interpreted as arbitrary relations. To pursue the topic of thischapter, we therefore restrict attention to interpretations where binary atoms arerequired to be interpreted by transitive and irreflexive relations. This restrictionmeans that more logical laws will be valid. This holds for both R and RC. Thepoint of the chapter is to characterize these valid inference patterns by syllogisticproof systems.

We find it convenient to separate transitivity from irreflexivity in our work. Ac-cordingly, we study (i) R interpreted on transitive models (Sections 7.1.2 and 7.1.1),(ii) RC on transitive models (Section 7.2.1), and (iii) RC on transitive and irreflexivemodels (Section 7.2.2).

Incidentally, interpreting R on transitive and irrreflexive relations immediatelygives an axiom of infinity : if every child is taller than some child, and there is somechild to begin with, then there are infinitely many children. The model constructionin Section 7.2.2 reflects this. To the best of my knowledge, this is the first timeinfinite models have turned up in the study of syllogistic systems.

Relation to other work. The model construction in Section 7.1.2 is due toZiegler [40]. Specifically, the relations in Figure 5.3 are essentially from [40].

The two logical languages studied in this chapter are the languages R and RC

introduced in [29]. Although we shall use the same formal languages, we glossthem in English in a different way. In previous chapters of these notes (startingwith Chapter 5), we took the binary atoms to represent English transitive verbs,in this chapter we take them to be comparative adjective phrases.

7.1. Requiring Transitivity in R

The first system in this chapter is an extension of R which enforces transitivityof the relation. We say “the” relation, because in languages like R, every sentencehas at most one binary atom. Thus by Craig Interpolation Theorem (for example),if Γ |= ϕ, then ϕ indeed follows semantically from the sentences in Γ with the samebinary atom (or no atom). So we might as well formulate R in terms of a singlebinary atom. (None of these points hold for RC studied below; in that system onemay reason about several relations.)

7.1.1. R(tr). In Figure 7.1 we introduce a set of eight rules R(tr) which areeasily seen to be sound for transitive interpretations. These all make intuitivelyclear points that reflect transitivity. For example, (ρ6) may be exemplified asfollows: if some girl is taller than all boys, and some boy is taller than some teacher,then some girl is taller than some teacher.

7.1. REQUIRING TRANSITIVITY IN R 79

∀(x, ∀(y, r)) ∃(y,∀(z, r))∀(x, ∀(z, r))

(ρ1)∀(x, ∀(y, r)) ∃(y,∃(z, r))

∀(x, ∃(z, r))(ρ2)

∀(x, ∃(y, r)) ∀(y,∃(z, r))∀(x, ∃(z, r))

(ρ3)∀(x, ∃(y, r)) ∀(y,∀(z, r))

∀(x, ∀(z, r))(ρ4)

∃(x, ∀(y, r)) ∃(y,∀(z, r))∃(x, ∀(z, r))

(ρ5)∃(x, ∀(y, r)) ∃(y,∃(z, r))

∃(x, ∃(z, r))(ρ6)

∃(x, ∃(y, r)) ∀(y,∀(z, r))∃(x, ∀(z, r))

(ρ7)∃(x, ∃(y, r)) ∀(y,∃(z, r))

∃(x, ∃(z, r))(ρ8)

Figure 7.1. Additions to R to get the system R(tr).

We should mention that some variations on the rules in Figure 7.1 which appearto be sound really are unsound. For example, consider

∀(x, ∀(y, r)) ∀(y,∀(z, r))∀(x, ∀(z, r))

If there is one x, one y, and no z, then the premises hold vacuously. But theconclusion fails when the x does not have the relation r to the z. Indeed worriesabout the emptiness of interpretations are the main source of complications inproofs in this subject.

We use R(tr) for the set of rules containing those in Figures 5.2 and 7.1. And,as earlier in these lecture notes, we also use the same notation for the derivationrelation determined by this set of rules.

We shall prove the completeness result for R(tr) in Theorem 7.2 below. Beforestarting in on that, let us check that the system can represent valid arguments suchas that found in (7.1) and repeated below:

Every giraffe is taller than every gnuSome gnu is taller than every lionSome lion is taller than some zebraEvery giraffe is taller than some zebra

Here is the relevant derivation:∀(giraffe,∀(gnu, taller)) ∃(gnu,∀(lion, taller))

∀(giraffe,∀(lion, taller))(ρ1)

∃(lion,∃(zebra), taller)∀(giraffe,∃(zebra, taller))

(ρ2)

7.1.2. Model construction. We shall show that every set Γ of sentenceswhich is consistent in R(tr) is satisfiable on a transitive model. The proof is nearlythe same as the one we saw earlier for R, except that we need the additionalinformation that the model which we construct is actually transitive. This is thecontent of the following result.


Lemma 7.1 Let Γ be complete. Then [[r]] is transitive in M(Γ).needed!!

Proof We are first going to check the transitivity of [[r]] when restricted tothe elements xi; after this we shall consider also the pair-elements {p, q}.

The idea is to look at all of the possible ways that instances of our six types ofdiagrams in Figure 5.3 can appear next to each other. We need not worry about(f), but we still must examine 5×5 = 25 cases. These are shown in the table below:

(a) (b) (c) (d) (e)(a) ρ4,∃ ρ3, ρ5,∃ ρ1 ρ3,∃ ρ2

(b) ρ4 ρ3, ρ5 ρ5 ρ3 ρ6

(c) ρ5,∃ ρ5 ρ5 ρ6,∃ ρ6

(d) ρ4 ρ3 − ρ3 −(e) ρ8, ρ7 ρ8 − ρ8 −

The entry in the ith row and jth column indicates which rule(s) are needed inthe verification of transitivity when an instance of the ith letter of the alphabet isplaced on the immediate left of an instance of the jth letter. A dash − in the tablemeans that no instances of transitivity arise. For the other entries, it is best to givesome examples; these also explain the notation. Consider first an instance of (c) tothe left of an instance of (b), shown on the left below:

(7.3) x1 y1 // z1

x2 //

66nnnnnn y2 //

66nnnnnn z2

x1 y1 // z1

x2 //

66nnnnnn y2

66nnnnnn z2

We need to check that x2 → z1 and x2 → z2. But we have Γ ` ∃(x, ∀(y, r)); andalso Γ ` ∃(y,∀(z, r)). We use (ρ5) to see that Γ ` ∃(x,∀(z, r)). This implies thatx2 → z1 and x2 → z2, as desired.

Next, consider an instance of (c) to the left of an instance of (d); this is shownin the right of (7.3) above. This time, Γ ` ∃(x,∀(y, r)) and also Γ ` ∀(y,∃(z, r)). Itdoes not follow from these alone that Γ ` ∃(x,∃(z, r)). However, since the y pointsbelong to the model, Γ ` ∃(y, y). So Γ ` ∃(y,∃(z, r)). Now we may use (ρ6) to seethat Γ ` ∃(x, ∃(z, r)). (In the chart, the steps of the argument where we need toknow that Γ derives the existence of some object are indicated with a ∃ symbol.)

We have only given two of the 25 cases, but the rest are very similar. It remainsto verify transitivity when one or two pair-elements {p, q} are involved. Here is oneof the many similar verifications. Suppose that u2 → {p, q} → w1. There areseveral ways that this could happen, and to be concrete, let us assume that from Γwe have ∃(u,∀(z, r)),∃(p, q), and ∀(q,∃(w, r)). From the first two of these, we have∃(u,∃(q, r)). Then with the last we indeed have ∃(u,∀(w, r)). This means thatu2 → w1, just as desired.

All of the remaining points are similar. �

And as in Theorem 5.10, we have the following result:

Theorem 7.2 For Γ∪{ϕ} ⊆ R, Γ |= ϕ on transitive models iff Γ ` ϕ in R(tr).

7.2. Requiring Transitivity and Irreflexivity in RC

Section 7.1 studied what happens when one requires the interpretations ofbinary atoms in R to be transitive relations. An expansion of R to a language RC

7.2. REQUIRING TRANSITIVITY AND IRREFLEXIVITY IN RC 81

∀(p,∃(q, r))∀(∃(p, r),∃(q, r))

(tr1)∀(p,∀(q, r))

∀(∃(p, r),∀(q, r))(tr2)

∃(p,∀(q, r))∀(∀(p, r),∀(q, r))

(tr3)∃(p,∃(q, r))

∀(∀(p, r),∃(q, r))(tr4)

Figure 7.2. The four transitivity rules. RC(tr) is RC togetherwith these rules.

was introduced in Pratt-Hartmann and Moss [29]. That chapter also axiomatizedRC by the logical system called RC which is shown in Figure 7.2. We thereforecontinue this chapter by studying the effect of transitivity in the models of RC.The effect is to add the additional set of rules shown in Figure 6.1. We prove thecompleteness of the resulting system in Section 7.2.1 below.

7.2.1. RC(tr). We use RC(tr) for the set of rules in Figures 6.1 and 7.2, andwe use the same symbol for the logical system associated to it. The transitivityrules of RC(tr) found in Figure 7.2 are similar to the ones we have seen in R(tr).One should check the soundness of (tr1) – (tr4) for transitive models, and we omitthese routine details.

Our formulation of (RAA) for R must be more general than the one for R,since we now allow contradictions of the form ∃(c, c), where c might be a (complex)c-term.

Here is an example of a derivation in RC(tr), corresponding to the informalexample presented in (7.2):

∀(hyena,∃(jackal, taller))∀(∃(hyena, taller),∃(jackal, taller))

(tr1)∀(∃(jackal, taller),∀(warthog, heavier))

∀(∃(hyena, taller),∀(warthog, heavier)))(B)

The application of (tr1) corresponds to using the premise every hyena is taller thansome jackal to derive the sentence everything which is taller than some hyena is tallerthan some jackal. The second step corresponds to the transitivity of predication(is).

Our completeness result is that the logical system RC(tr) is complete for theclass of transitive models. The proof is a modification of the corresponding resultfor RC in [29]. That result showed that RC is complete for the class of all models.Rather than reproduce all of the details, we review the main idea and quote withoutproofs two lemmas from [29]. In the sequel, the variables b, c range over positivec-terms, and d over the larger class of all c-terms. As earlier, p, q, x, y, and z rangeover unary atoms and r over binary atoms.

The relation between RC(tr) and R(tr). We have already seen that RC derivesall of the rules in R. We mention here one point not in [29], the reasoning behind


a derivation of (D3) in Figure 5.2. A derivation may be found on the left below:

∃(p, c)[∀(p, q)]1 ∀(q, c)

∀(p, c)(B)

∃(p, p)(D1)

∃(p, q) (RAA)1∀(x, ∃(z, r))

∀(y,∃(z, r))∀(∃(y, r),∃(z, r))

(tr1)

∀(x,∃(z, r))(B)

Our application of (D1) uses the identification of ∀(p, c) with ∀(c, p).This observation that RC derives the rules in R extends: the eight rules in

Figure 7.1 are derivable in RC(tr). On the right above, we have a derivation of(ρ3). The derivations of all the other rules in Figure 7.1 are similar.

We should also add that RC(tr) derives versions of (ρ1)–(ρ8) in which thesubject noun phrase might contain a c-term.

Theorem 7.3 For Γ ∪ {ϕ} ⊆ RC, Γ |= ϕ on transitive models iff Γ ` ϕ inRC(tr).

The rest of this section is devoted to the proof of Theorem 7.3. It is based onthe proof of of Theorem 6.1, the same result for RC but without any transitivityassumptions in the semantics. Fortunately, the same proof idea works. (Thiscontrasts with our work earlier in the chapter; the constructions in Section 7.1 werenew, and they made for a much longer argument than what we shall see below.)We need only show that every consistent set Γ in RC(tr) is satisfiable on a transitivemodel. Also, by Lemma 5.9, we may assume that Γ is RC(tr)-complete; that is,every sentence or its negation belongs to Γ.

We construct a model M and prove that it satisfies Γ. Since we need the detailson this, we must review the construction. First, let C+ be the set of positive c-terms. Then we define M by setting:

M = {〈c1, c2, Q〉 ∈ C+ ×C+ × {∀,∃} : Γ ` ∃(c1, c2)}(7.4)[[p]] = {〈c1, c2, Q〉 ∈M : Γ ` ∀(c1, p) or Γ ` ∀(c2, p)}(7.5)

(7.6)

and 〈c1, c2, Q1〉[[r]]〈d1, d2, Q2〉 if and only if either:(a): for some i, j, and q ∈ P, Γ ` ∀(ci,∀(q, r)) and Γ ` ∀(dj , q); or(b): Q2 = ∃, and for some i and q ∈ P, d1 = d2 = q, and Γ ` ∀(ci,∃(q, r)).

One then checks easily that A is non-empty (using (W)).

Lemma 7.4 For all c ∈ C+,

[[c]] = {〈d1, d2, Q〉 ∈M : either Γ ` ∀(d1, c), or Γ ` ∀(d2, c)}.

Lemma 7.5 M |= Γ.

Lemma 7.6 The interpretation of each binary atom r is transitive.

Proof Assume that

〈b1, b2, Q1〉 [[r]] 〈c1, c2, Q2〉 [[r]] 〈d1, d2, Q3〉.We shall use several times the fact that since 〈c1, c2, Q2〉 belongs to the model,Γ ` ∃(c1, c2). We have four cases.


Case 1: for some i, j, k, l, and q1, q2 ∈ P, Γ ` ∀(bi,∀(q1, r)), Γ ` ∀(cj , q1),Γ ` ∀(ck,∀(q2, r)), and Γ ` ∀(dl,∀(q2, r)). We have Γ ` ∃(cj , ck) also. Now wehave derivation from Γ:

....∀(bi,∀(q1, r))

....∀(ck,∀(q2, r))

....∀(cj , q1)

....∃(cj , ck)

∃(q1, ck)∃(q1,∀(q2, r))

∀(∀(q1, r),∀(q2, r))(tr3)

∀(bi,∀(q2, r))(B)

And now we see that 〈b1, b2, Q1〉[[r]]〈d1, d2, Q3〉, using alternative (a) in the definitionof [[r]].

Case 2: for some i, j, and q1 ∈ P, Γ ` ∀(bi,∀(q1, r)) and Γ ` ∀(cj , q1); Q3 = ∃,and for some k and q2 ∈ P, d1 = d2 = q2, and Γ ` ∀(ck,∃(q2, r)). This time,we have Γ ` ∃(q1, ck), and by a derivation similar to what we saw in Case 1, Γ `∀(bi,∃(q2, r)). In this case, alternative (b) shows that 〈b1, b2, Q1〉[[r]]〈d1, d2, Q3〉.

Case 3: Q2 = ∃, and for some i and q1 ∈ P, c1 = c2 = q1, and Γ `∀(bi,∃(q1, r)); and also for some j, k, and q2 ∈ P, Γ ` ∀(cj ,∀(q2, r)) and Γ `∀(dk, q2). Thus cj at the end is the same as q1. The proof system now shows thatΓ ` ∀(bi,∀(q2, r)). We again finish the case with an application of (a).

Case 4: Q2 = ∃, and for some i and q1 ∈ P, c1 = c2 = q1, and Γ `∀(bi,∃(q1, r)); and also Q3 = ∃, and for some j and q2 ∈ P, d1 = d2 = q2,and Γ ` ∀(cl,∃(q2, r)). This time we see that Γ ` ∀(bi,∃(q2, r)), and we use (b) tosee that 〈b1, b2, Q1〉[[r]]〈d1, d2, Q3〉.

This completes the proof. �

And with the work of Lemma 7.6 done, we have also completed the proof ofTheorem 7.3.

7.2.2. Irreflexivity: the system R∗(tr, irr). We now continue the study ofthe syllogistic logic of comparative adjectives by adding a requirement to the se-mantics that the interpretation of each binary atom be irreflexive. Notice that assoon as we add this requirement, the logic no longer has the finite model property.Specifically, ∀(p,∃(p, r)) is satisfiable, but not by any finite (transitive and irreflex-ive) model. (In Section 7.2.3 below, we discuss what happens if we want to restrictto transitive, irreflexive, and finite models.)

The additional requirement also results in the soundness of the following ir-reflexivity rule

∀(p, ∃(p, r))(Irr)

This looks like a very weak consequence of irreflexivity, so perhaps it will be sur-prising that it leads to a complete system. The point, perhaps, is that withoutvariables there is not much that one can actually do with irreflexivity besides (Irr)and its consequences.

Let R∗(tr, irr) be the logical system whose rules are those of Figures 6.1 and 7.2,and also (Irr).

Theorem 7.7 For Γ ∪ {ϕ} ⊆ RC, Γ |= ϕ on transitive, irreflexive models iffΓ ` ϕ in R∗(tr, irr).


Toward this end, let Γ be complete and consistent in R∗(tr, irr); we shall definea model M = M(Γ). We continue to denote the set of positive c-terms by C+. LetN be the set of natural numbers. Let

(7.7)M = {〈c1, c2, n〉 ∈ C+ ×C+ ×N : Γ ` ∃(c1, c2),

and if n ≥ 0, then c1 = c2, and c1 is a unary atom}[[p]] = {〈c1, c2, n〉 ∈M : Γ ` ∀(c1, p) or Γ ` ∀(c2, p)}

Additional notation. Before going further we need some extra relations on C+.We write c w d if Γ ` ∃(c, c) and also Γ ` ∃(d, d), d is a unary atom, and Γ `∀(c,∃(d, r)). The relation w is not reflexive, but it is transitive. We write c ≡ dfor c v d v c. We also write c = d if c w d and not c ≡ d. We write c ⇒ d if ifΓ ` ∃(c, c) and also Γ ` ∃(d, d), d is a unary atom, and Γ ` ∀(c,∀(d, r)). Noticethat if c⇒ d and d is an atom, then c = d. (This uses (ρ7) and (Irr).) And recallthe convention to write c ≤ d for Γ ` ∀(c, d).

The interpretation of binary atoms. We say 〈c1, c2, n〉[[r]]〈d1, d2,m〉 if and onlyif one of the conditions below holds:

(a): for some i and j and some unary atom p, ci ⇒ p and dj ≤ p.(b): d1 = d2 is a unary atom p, and for some i, ci = p and m > 0.(c): d1 = d2 is a unary atom p, and for some i, ci ≡ p and m > n.

Note that the conditions above are not exclusive.

Example 7.8 Suppose Γ has ∃(p, p), ∃(q, q), ∃(p, q), ∀(p,∃(q, r)), ∀(q,∃(p, r)),∃(p,∃(q, r)), ∀(∃(p, r), p), ∃(∃(q, r), p), and ∃(∃(p, r), q). (We have again left offsome formulas which follow from these, such as ∀(p,∃(p, r)).) To save on somenotation, we’ll abbreviate ∃(p, r) by c and ∃(q, r) by d. Then M contains 〈p, p, n〉and 〈q, q, n〉 for n ≥ 0, and to save on notation, we write these as pn and qn. M alsocontains 〈p, c, 0〉, 〈c, c, 0〉, 〈q, d, 0〉, 〈d, d, 0〉, and 〈p, q, 0〉. Then [[p]] contains 〈p, c, 0〉,〈p, q, 0〉, and all pn; [[q]] is similar. Moreover, [[r]] makes M look as follows:

〈c, c, 0〉 ))〈p, c, 0〉 // p0 //

@@@@@@ p1 //

AAAAAAA · · · pn //

""EEEEEEE pn+1 · · ·〈p, q, 0〉

55jjjj

))TTTT〈d, d, 0〉 66〈q, d, 0〉 // q0 //

>>~~~~~~q1 //

>>}}}}}}}· · · qn //

<<yyyyyyyqn+1 · · ·

The relation in the model is actually the transitive closure of the arrows above.

Lemma 7.9 The interpretation [[r]] of each binary atom r is transitive.

Proof Assume that

〈b1, b2, n1〉 [[r]] 〈c1, c2, n2〉 [[r]] 〈d1, d2, n3〉,and write these three triples as b, c, and c, respectively.

Case 1: b[[r]]c via (a), and c[[r]]d via (a). Let i, j, p, and q be such that bi ⇒ p,cj ≤ p, ck ⇒ q, and dl ≤ q. See Case 1 of Lemma 7.6.

Case 2: b[[r]]c via (a), and c[[r]]d via (b). This time, d1 = d2 is an atom, andwe have i, j, k and p so that bi ⇒ p, cj ≤ p, ck = d. Also, m3 > 0. The derivation


below shows that bi w d:

....∀(bi,∀(p, r))

....∀(ck,∃(d, r))

....∃(cj , ck)

....∀(cj , p)

∃(p, ck)∃(p,∃(d, r))

∀(bi,∃(d, r))(ρ2)

(The rule we are quoting as (ρ2)) here is more general than the rule we used earlier inconnection with RC

¯in that we are allowing the subject noun phrase to be complex.

But the form above is again derivable in two steps using (tr 2) and (B). The samecomments apply below in several other cases of this proof.) We claim now thatbi < d. (For suppose that d w bi. Since bi ⇒ p and ∃(p, ck) we see that bi w ck. Bytransitivity, we have d w ck. This contradicts ck = d.) Recalling that m3 > 0, wesee that b[[r]]d via alternative (b) in the definition of [[r]].

Case 3: b[[r]]c via (a), and c[[r]]d via (c). The argument is as in Case 2.

Case 4: b[[r]]c via (b), and c[[r]]d via (a). Write c for c1 = c2; this is an atom.And the same holds for d. We have i and j and p such that bi = c, c⇒ p, p ≤ dj .(Also, n2 > 0, but this is irrelevant.) Using (ρ4), we see that bi w p. It followsfrom this that b[[r]]d via alternative (a) in the definition of [[r]].

Case 5: b[[r]]c via (b), and c[[r]]d via (b). Write c for c1 = c2; this is an atom.And the same holds for d. For some i, we have bi = c = d. Using (ρ3), we see thatbi = d. Also, m3 > 0. It now follows that b[[r]]d via alternative (b) in the definitionof [[r]].

Case 6: b[[r]]c via (b), and c[[r]]d via (c). The argument is as in Case 5. (Theonly difference is that this time we have bi = c = d. But this is sufficient to implybi = d.)

Case 7: b[[r]]c via (c), and c[[r]]d via (a). The argument is as in Case 4.

Case 8: b[[r]]c via (c), and c[[r]]d via (b). The argument is as in Case 5.

Case 9: b[[r]]c via (c), and c[[r]]d via (c). As in Case 5, we write c and d forthe evident atoms. For some i, we have bi ≡ c ≡ d, and we also have n1 > n2 >n3. Since the order on natural numbers is transitive, n1 > n3. Thus b[[r]]d viaalternative (c) in the definition of [[r]].

This concludes the proof. �

Lemma 7.10 The interpretation [[r]] of each binary atom r is irreflexive.

Proof Suppose towards a contradiction that

(7.8) 〈c1, c2, n〉[[r]]〈c1, c2, n〉.We have Γ ` ∃(ci, cj) for all i and j. The reason for (7.8) cannot be (a) in thedefinition of [[r]] because if it were, we would have Γ ` ∀(ci, (∀(cj , r)) for some i andj. With the irreflexivity rule (Irr), this would contradict the consistency of Γ.

Suppose that the reason for (7.8) were (b). Then c1 = c2 is a unary atom, sayc, and c = c. But this is impossible: if c w c, then c ≡ c.

Finally, it is impossible that (c) be the reason for (7.8), since n 6> n. �


Lemmas 7.9 and 7.10 show that M is a transitive and irreflexive model for theinterpretation of our fragment. Recall that our completeness proof for the logicR∗(tr, irr) on the intended class reduces to showing that every complete, consistentset Γ has a transitive and irreflexive model. The next two results show that M

serves as such a model.

Lemma 7.11 For all c ∈ C+, [[c]] = {〈d1, d2, Q〉 ∈M : either Γ ` ∀(d1, c), or Γ `∀(d2, c)}.

Lemma 7.12 M |= Γ.

The proofs of Lemmas 7.11 and 7.12 are nearly the same as that of Lemmas5.3 and 5.4 in [29], so we shall not reproduce them. The main notational changeis that where [29] mentions an element of the model such as 〈c, c,∀〉, we would usethe element 〈c, c, 0〉 in this chapter.

The foregoing work proves Theorem 7.7.

7.2.3. Finite models. The requirement of transitivity alone does not allowus to state an axiom of infinity: the proof of Theorem 7.3 shows that every finiteconsistent set in RC(tr) has a finite model. However, adding to transitivity thefurther requirement of irreflexivity leads to theories with only infinite models, suchas ∃(p, p), ∀(p,∃(p, r)).

Therefore, if we want to axiomatize the finite models, then we must also addsomething to the logic. Here is one way to do that, in the form of an axiom:

∃(p, p)∃(p,∀(p, r))

(Fin)

Let RC(tr, irr, fin) be the logic which adds this new axiom to R∗(tr, irr).

Theorem 7.13 For Γ ∪ {ϕ} ⊆ RC, Γ |= ϕ on transitive, irreflexive and finitemodels iff Γ ` ϕ in RC(tr, irr, fin).

Proof We use the same construction as in Theorem 7.7 but with one change.The definition of the model M began in (7.7), and of course the structure is infinite.To keep straight the old and new models, we’ll now define a model to be called N.The domain of the model is the set N given by

(7.9)N = {〈c1, c2, n〉 ∈ C+ ×C+ × {0, 1} : Γ ` ∃(c1, c2),

and if n ≥ 0, then c1 = c2, and c1 is a unary atom }The rest of the structure is defined in the same way as with M: the unary atomsare interpreted as in (7.7), and the binary ones from the three alternatives (a)–(c)stated before Lemma 7.9. This way, N is a submodel of M, so it is transitive andirreflexive; clearly it is also finite; this is the point of changing N to {0, 1}. Theonly point to check is that the analog of Lemma 7.11 goes through. And for this,the only point to check is that for c a positive c-term of the form ∃(p, r),

[[c]] = {〈d1, d2, n〉 ∈ N : either Γ ` ∀(d1, c), or Γ ` ∀(d2, c)}.And here, only half needs to be checked. Suppose that 〈d1, d2, n〉 ∈ N and Γ `∀(d1,∃(p, r)). We need to see that 〈d1, d2, n〉 ∈ [[∃(p, r)]]. That is, we need somec ∈ [[p]] such that 〈d1, d2, n〉[[r]]c. We claim that c = 〈p, p, 1〉 works using alternative(b) in the definition of [[r]]. We have d w p, and we only need to see that theconverse assertion p w d does not hold. Suppose toward a contradiction that it did.


Then d would be a unary atom, and also we would have ∀(p,∃(p, r)) using (ρ3).But this directly contradicts the finiteness axiom.

The remaining verifications are as above. �

Source for this chapter. The material in this chapter comes from Moss [20].

CHAPTER 8

Logic Beyond the Aristotle Border

There are only two languages in this chapter (actually they are families oflanguages parameterized by sets of basic symbols): the language RC† which wehave seen already, and an extension L(adj) studied in Section 8.2. We reformulateRC† a bit, adding constants and also being allowing for recursive constructs. Toavoid confusion, we do not speak of RC† but instead call the language of this chapterL. L is based on three pairwise disjoint sets called P, R, and K. These are calledunary atoms, binary atoms, and constant symbols.

8.1. Fitch-style Proof System for RC†

We review the syntax of L in Figure 8.1. Sentences are built from constantsymbols, unary and binary atoms using an involutive symbol for negation, a forma-tion of set terms, and also a form of quantification. The second column indicatesthe variables that we shall use in order to refer to the objects of the various syn-tactic categories. Because the syntax is not standard, it will be worthwhile to gothrough it slowly and to provide glosses in English for expressions of various types.

One might think of the constant symbols as proper names such as John andMary. The unary atoms may be glossed as one-place predictates such as boys, girls,etc. And the relation symbols correspond to transitive verbs (that is, verbs whichtake a direct object) such as likes, sees, etc. They also correspond to comparativeadjective phrases such as is bigger than. (However, later on in Section 8.2, weintroduce a new syntactic primitive for the adjectives.)

Unary atoms appear to be one-place relation symbols, especially because weshall form sentences of the form p(j). However, we do not have sentences p(x), sincewe have no variables at this point in the first place. Similar remarks apply to binaryatoms and two-place relation symbols. So we chose to change the terminology fromrelation symbols to atoms.

We form unary and binary literals using the bar notation. We think of thisas expressing classical negation. So we take it to be involutive, so that p = p ands = s.

The set terms in this language are the only recursive construct. If b is read asboys and s as sees, then one should read ∀(b, s) as sees all boys, and ∃(b, s) as seessome boys. Hence these set terms correspond to simple verb phrases. We also allownegation on the atoms, so we have ∀(b, s); this can be read as fails to see all boys, or(better) sees no boys or doesn’t see any boys. We also have ∃(b, s), fails to see someboys. But the recursion allows us to embed set terms, and so we have set terms like

∃(∀(∀(b, s), h), a)

which may be taken to symbolize a verb phrase such as admires someone who hateseveryone who does not see any boy.

89

90 8. LOGIC BEYOND THE ARISTOTLE BORDER

Expression Variables Syntaxunary atom p, qbinary atom sconstant j, kunary literal l p | pbinary literal r s | sset term b, c, d l | ∃(c, r) | ∀(c, r)sentence ϕ, ψ ∀(c, d) | ∃(c, d) | c(j) | r(j, k)

Figure 8.1. Syntax of sentences of L.

We should note that the relative clauses which can be obtained in this way areall “missing the subject”, never “missing the object”. The language is too poor toexpress predicates like λx.all boys see x.

The main sentences in the language are of the form ∀(b, c) and ∃(b, c); they canbe read as statements of the inclusion of one set term extension in another, andof the non-empty intersection. We also have sentences using the constants, suchas ∀(g, s)(m), corresponding to Mary sees all girls. But we are not able to say allgirls see Mary; the syntax again is too weak. (However, in our Conclusion we shallsee how to extend our system to handle this.) This weakness in expressive powercorresponds to a less complex decidability result, as we shall see.

Semantics. A model (for this language L) is a pair M = 〈M, [[ ]]〉, where M is anon-empty set, [[p]] ⊆M for all p ∈ P, [[r]] ⊆M2 for all r ∈ R, and [[j]] ∈M for allj ∈ K.

Given a model M, we extend the interpretation function [[ ]] to the rest of thelanguage by setting

[[p]] = M \ [[p]][[r]] = M2 \ [[r]][[∃(l, t)]] = {x ∈M : for some y such that [[l]](y), [[t]](x, y)}[[∀(l, t)]] = {x ∈M : for all y such that [[l]](y), [[t]](x, y)}

We define the truth relation |= between models and sentences by:

M |= ∀(c, d) iff [[c]] ⊆ [[d]]M |= ∃(c, d) iff [[c]] ∩ [[d]] 6= ∅M |= c(j) iff [[c]]([[j]])M |= r(j, k) iff [[r]]([[j]], [[k]])

If Γ is a set of formulas, we write M |= Γ if for all ϕ ∈ Γ, M |= ϕ.

Example 8.1 For example, look back to Example 5.2. This was the modelwith M = {w, x, y, z}, [[p]] = {w, x, y}, and with [[s]] shown below

w //

@@@@@@@@ x

~~~~~~~~~~

��y

OO

oo // zzz

8.1. FITCH-STYLE PROOF SYSTEM FOR RC† 91

Note at this point that [[∃(∀(p, s), s)]] = ∅. We also set [[j]] = w and [[k]] = x. Weget additional sentences true in M such as s(j, k), s(k, j), and ∃(p, s)(k).

Here is a point that will be important later. For all terms c, M |= c(j) iffM |= c(k). (The easiest way to check this is to show that for all set terms c, [[c]] isone of the following four sets: ∅,M, {w, x, y}, or {z}.) However, M |= s(j, k) andM |= s(k, j).

The satisfiability problem for the language is decidable for a very easy reason:the language L translates to the two-variable fragment FO2 of first-order logic. (Weshall see this shortly.) Thus we have the finite model property (by Mortimer [17])and decidability of satisfiability in non-deterministic exponential time (Gradel etal [7]). It might therefore be interesting to ask whether the smaller fragment L isof a lower complexity. As it happens, it is. Pratt-Hartmann [25] showed that thesatisfiability problem for a certain fragment E2 of FO2 can be decided in ExpTimein the length of the input Γ, and his fragment was essentially the same as the onein this chapter.

The bar notation. We have already seen that our unary and binary atoms comewith negative forms. We extend this notation to all sentences in the following ways:p = p, s = s, ∃(l, r) = ∀(l, r), ∀(l, r) = ∃(l, r), ∀(c, d) = ∃(c, d), ∃(c, d) = ∀(c, d),c(j) = c(j), and r(j, k) = r(j, k).

Translation of the syllogistic into L. We indicate briefly a few translations toorient the reader. First, the classical syllogistic translates into L:

All p are q 7→ ∀(p, q)Some p are q 7→ ∃(p, q)

No p are q 7→ ∀(p, q)Some p aren’t q 7→ ∃(p, q)

We can also translate L to FO2, the fragment of first order logic using only thevariables x and w. We do this by mapping the set terms two ways, called c 7→ ϕc,xand c 7→ ϕc,y. Here are the recursion equations for c 7→ ϕc,x:

p 7→ P (x)p 7→ ¬P (x)

∀(c, r) 7→ (∀y)(ϕc,y(y)→ r(x, y))∃(c, r) 7→ (∃y)(ϕc,y(y) ∧ r(x, y))

The equations for c 7→ ϕc,y are similar. Then the translation of the sentences intoFO2 follows easily.

8.1.1. Proof System. We present our system in natural-deduction style inFigure 8.3. It makes use of introduction and elimination rules, and more criticallyof variables. For a textbook account of a proof system for first-order logic presentedin this way, see van Dalen [36].

General sentences. in this fragment are what usually are called formulas. Weprefer to change the standard terminology to make the point that here, sentencesare not built from formulas by quantification. In fact, sentences in our sense do nothave variable occurrences. But general sentences do include variables. They areonly used in our proof theory.

The syntax of general sentences is given in Figure 8.2. What we are callingindividual terms are just variables and constant symbols. (There are no functionsymbols here.) Using terms allows us to shorten the statements of our rules, butthis is the only reason to have terms.

An additional note: we don’t need general sentences of the form r(j, x) orr(x, j). In larger fragments, we would expect to see general sentences of theseforms, but our proof theory will not need these.


Expression Variables Syntaxindividual variable x, yindividual term t, u x | jgeneral sentence α ϕ | c(x) | r(x, y) | ⊥

Figure 8.2. Syntax of general sentences of L, with ϕ ranging over sentences.

The bar notation, again. We have already seen the bar notation c for set termsc, and ϕ for sentences ϕ. We extend this to formulas b(x) = b(x), r(x, y) = r(x, y).We technically have a general sentence ⊥, but this plays no role in the proof theory.

We write Γ ` ϕ if there is a proof tree conforming to the rules of the systemwith root labeled ϕ and whose axioms are labeled by elements of Γ. (Frequentlywe shall be sloppy about the labeling and just speak, e.g, of the root as if it were asentence instead of being labeled by one.) Instead of giving a precise definition here,we shall content ourselves with a series of examples in Section 8.1.2 just below.

The system has two rules called (∀E), one for deriving general sentences of theform c(x) or c(j), and one for deriving general sentences r(x, y) or r(j, k). (Otherrules are doubled as well, of course.) It surely looks like these should be unified,and the system would of course be more elegant if they were. But given the waywe are presenting the syntax, there is no way to do this. That is, we do not havea concept of substitution, and so rules like (∀E) cannot be formulated in the usualway. Returning to the two rules with the same name, we could have chosen touse different names, say (∀E1) and (∀E2). But the result would have been a morecluttered notation, and it is always clear from context which rule is being used.

Although we are speaking of trees, we don’t distinguish left from right. This isespecially the case with the (∃E) rules, where the canceled hypotheses may occurin either order.

Side Conditions. As with every natural deduction system using variables, thereare some side conditions which are needed in order to have a sound system.

In (∀I), x must not occur free in any uncanceled hypothesis. For example, inthe version whose root is ∀(c, d), one must cancel all occurrences of c(x) in theleaves, and x must not appear free in any other leaf.

In (∃E), the variable x must not occur free in the conclusion α or in anyuncanceled hypothesis in the subderivation of α.

In contrast to usual first-order natural deduction systems, there are no sideconditions on the rules (∀E) and (∃I). The usual side conditions are phrased interms of concepts such as free substitution, and the syntax here has no substitutionto begin with. To be sure on this point, one should check the soundness result ofLemma 8.6.

Formal proofs in the Fitch style. The proof system in this chapter is presentedin a standard Gentzen-style format. But it may easily be re-formatted to lookmore like a Fitch system, as we shall see in Example 8.3 and Figure 8.5. Theseexamples might give the impression that we have merely re-presented Fitch-stylenatural deduction proofs. The difference is that our syntax is not a special caseof the syntax of first-order logic. Corresponding to this, our proof rules our rather


c(t) ∀(c, d)d(t)

∀Ec(u) ∀(c, r)(t)

r(t, u)∀E

c(t) d(t)∃(c, d)

∃Ir(t, u) c(u)∃(c, r)(t) ∃I

[c(x)]....d(x)∀(c, d)

∀I

[c(x)]....r(t, x)∀(c, r)(t) ∀I

∃(c, d)

[c(x)] [d(x)]....α

α ∃E∃(c, r)(t)

[c(x)] [r(t, x)]....α

α ∃E

α α⊥ ⊥I

[ϕ]....⊥ϕ RAA

Figure 8.3. Proof rules. See the text for the side conditions inthe (∀I) and (∃E) rules.

restrictive, and the system cannot be used for much of anything beyond the languageL. However, the fact that our Fitch-style proofs look like familiar formal proofs isa virtue: for example, it means that one could teach logic using this material.

8.1.2. Examples. We present a few examples of the proof system at work,along with comments pertaining to the side conditions. Many of these are takenfrom the proof system R∗ for the language RC of [29]. That system R∗ is amongthe strongest of the known syllogistic systems, and so it is of interest to check thecurrent proof system is at least as strong.

Example 8.2 Here is a proof of the classical syllogism Darii : ∀(b, d),∃(c, b) `∃(c, d). First, in Fitch-style:

1 ∀(b, c) hyp

2 ∃(b, d) hyp

3 b(x) ∃E, 2

4 d(x) ∃E, 2

5 c(x) ∀E, 1, 3

6 ∃(c, d) ∃I, 4, 5


1 ∀(c, d) hyp

2 x ∃(c, r)(x) hyp

3 c(y) ∃E, 2

4 r(x, y) ∃E, 2

5 d(y) ∀E, 1, 3

6 ∃(d, r)(x) ∃I, 4, 5

7 ∀(∃(c, r),∃(d, r)) ∀I, 1–6

[∃(c, r)(x)]2[r(x, y)]1

[c(y)]1 ∀(c, d)d(y)

∀E

∃(d, r)(x)∃I

∃(d, r)(x) ∃E1

∀(∃(c, r),∃(d, r)) ∀I2

Figure 8.4. Derivations in Example 8.3

and then in natural deduction:

∃(c, b)

[b(x)]1 ∀(b, d)d(x)

∀E[c(x)]1

∃(c, d)∃I

∃(c, d) ∃E1

Example 8.3 Next we study a principle called (K) in [29]. Intuitively, ifall watches are expensive items, then everyone who owns a watch owns an expensiveitem. The formal statement in our language is ∀(c, d) ` ∀(∃(c, r),∃(d, r)). SeeFigure 8.4. We present a Fitch-style proof on the left and the corresponding one inour formalism on the right. One aspect of the Fitch-style system is that (∃E) givestwo lines; see lines 3 and 4 on the left in Figure 8.4.

Example 8.4 Here is an example of a derivation using (RAA). It shows∀(c, c) ` ∀(d,∀(c, r)).

[d(x)]2

[c(y)]1 ∀(c, c)c(y)

∀E[c(y)]1

⊥ ⊥I

r(x, y)RAA

∀(c, r)(x) ∀I1

∀(d,∀(c, r)) ∀I2

Example 8.5 Here is a statement of the rule of proof by cases: If Γ+ϕ ` ψ andΓ +ϕ ` ψ, then Γ ` ψ. (Here and below, Γ +ϕ denotes Γ∪{ϕ}.) Instead of givinga derivation, we only indicate the ideas. Since Γ + ϕ ` ψ, we have Γ + ϕ + ψ ` ⊥using (⊥I). From this and (RAA), Γ, ψ ` ϕ. Take a derivation showing Γ +ϕ ` ψ,and replace the labeled ϕ with derivations from Γ +ψ. We thus see that Γ +ψ ` ψ.


Using (⊥I), Γ + ψ ` ⊥. And then using (RAA) again, Γ ` ψ. (This point isfrom [29].)

8.1.3. Soundness. Before presenting a soundness result, it might be good tosee an improper derivation. Here is one, purporting to infer some men see somemen from some men see some women:

∃(m,∃(w, s))[∃(w, s)(x)]2

[s(x, x)]1 [m(x)]2

∃(s,m)(x)∃I

[m(x)]2

∃(m,∃(m, s)) ∃I

∃(m,∃(m, s)) ∃E1

∃(m,∃(m, s)) ∃E2

The specific problem here is that when [s(x, x)] is withdrawn in the application of∃I1, the variable x is free in the as-yet-uncanceled leaves labeled m(x).

To state a result pertaining to the soundness of our system, we need to definethe truth value of a general sentence under a variable assignment. First, a variableassignment in a model M is a function v : V → M , where V is the set of variablesymbols and M is the universe of M. We need to define M |= α[v] for generalsentences α. If α is a sentence, then M |= α[v] iff M |= α in our earlier sense. If α isb(x), then M |= α[v] iff [[b]](v(x)). If α is r(x, y), then M |= α[v] iff [[r]](v(x), v(y)).If α is ⊥, then M 6|= ⊥ for all models M.

Lemma 8.6 Let Π be any proof tree for this fragment all of whose nodes arelabeled with L-formulas, let ϕ be the root of Π, let M be a structure, let v : X →M bea variable assignment, and assume that for all uncanceled leaves ψ of Π, M |= ψ[v].Then also M |= ϕ[v].

Proof By induction on Π. We shall only go into details concerning two cases.First, consider the case when the root of Π is

∃(c, r)(t)

[c(x)] [r(t, x)]....α

α ∃E

To simplify matters further, let us assume that t is a variable. Let v be a variableassignment making true all of the leaves of the tree, except possibly c(x) and r(t, x).By induction hypothesis, M |= ∃(c, r)(t)[v]. Let a ∈ A witness this assertion. In theobvious notation, [[c]](a) and [[r]](tM,v, a). Let w be the same variable assignment asv, except that w(x) = a. Then since x is not free in any leaves except those labeledc(x) and r(t, x), we have M |= ψ[w] for all those ψ. And so M |= α[w], using theinduction hypothesis applied to the subtree on the right. And since x is not free inthe conclusion α, we also have M |= α[v], as desired.

Second, let us consider the case when the root isc(y) ∀(c, r)(x)

r(x, y)∀E

(That is, we are considering an instance of (∀E) when the terms t and u are vari-ables.) The variables x and y might well be the same. Let M be a structure, and vbe a variable assignment making true the leaves of the tree. By induction hypoth-esis, [[c]](v(y)) and also [[r]](v(x),m)) for all m ∈ [[c]]. In particular, [[r]](v(x), v(y)).

The remaining cases are similar. � �


8.1.4. The Henkin Property. The completeness of the logic parallels theHenkin-style completeness result for first-order logic. Given a consistent theory Γ,we get a model of Γ in the following way: (1) take the underlying language L,add constant symbols to the language to witness existential sentences; (2) extendΓ to a maximal consistent set in the larger language; and then (3) use the set ofconstant symbols as the carrier of a model in a canonical way. In the setting of thischapter, the work is in some ways easier than in the standard setting, and in someways harder. There are more details to check, since the language has more basicconstructs. But one doesn’t need to take a quotient by equivalence classes, and inother ways the work here is easier.

Given two languages L and L′, we say that L′ ⊇ L if every symbol (of anytype) in L is also a symbol (of the same type) in L′. In this chapter, the main caseis when P(L) = P(L′), R(L) = R(L′), and K(L) ⊆ K(L′); that is, L′ arises byadding constants to L.

A theory in a language is just a set of sentences in it. Given a theory Γ in alanguage L, and a theory Γ∗ in an extension L′ ⊇ L, we say that Γ∗ is a conservativeextension of Γ if for every ϕ ∈ L, if Γ∗ ` ϕ, then Γ ` ϕ.

Lemma 8.7 Let Γ be a consistent L-theory, and let j /∈ K(L).(1) If ∃(c, d) ∈ Γ, then Γ + c(j) + d(j) is a conservative extension of Γ.(2) If ∃(c, r)(j) ∈ Γ, then Γ + r(j, k) + c(k) is a conservative extension of Γ.

Proof For (1), suppose that Γ contains ∃(c, d) and that Γ + c(j) + d(j) ` ϕ.Let Π be a derivation tree. Replace the constant j by an individual variable xwhich does not occur in Π. The result is still a derivation tree, except that theleaves are not labeled by sentences. (The reason is that our proof system has norules specifically for constants, only for terms which might be constants and alsomight be individual variables.) Call the resulting tree Π′. Now the following prooftree shows that Γ ` ϕ:

∃(c, d)

[c(x)] [d(x)]....ϕ

ϕ ∃E

The subtree on the right is Π′. The point is that the occurrences of c(x) and d(x)have been canceled by the use of ∃E at the root.

This completes the proof of the first assertion, and the proof of the second issimilar. � �

Definition 8.8 An L-theory Γ has the Henkin property if the following hold:(1) If ∃(c, d) ∈ Γ, then for some constant j, c(j) and d(j) belong to Γ.(2) If r is a literal of L and ∃(c, r)(j) ∈ Γ, then for some constant k, r(j, k)

and c(k) belong to Γ.

Lemma 8.9 Let Γ be a consistent L-theory. Then there is some L∗ ⊃ L andsome L∗-theory Γ∗ such that Γ∗ is a maximal consistent theory with the Henkinproperty. Moreover, if s ∈ R(L), j ∈ K(L∗) and k ∈ K(L), and if s(j, k) ∈ Γ∗,then j ∈ K(L).

Proof This is a routine argument, using Lemma 8.7. One dovetails the addi-tion of constants which is needed for the Henkin property together with the additionof sentences needed to insure maximal consistency. The formal details would use


Lemma 8.7 for steps of the first kind, and for the second kind we need to know thatif Γ is consistent, then for all ϕ, either Γ + ϕ or Γ + ϕ is consistent. This followsfrom the derivable rule of proof by cases; see Example 8.5 in Section 8.1.2. � �

The last point in Lemma 8.9 states a technical property that will be useful inSection 8.2.1.

It might be worthwhile noting that the extensions produced by Lemma 8.9 addinfinitely many constants to the language.

8.1.5. Completeness Via Canonical Models. In this section, fix a lan-guage L and a maximal consistent Henkin L-theory Γ. We construct a canonicalmodel M = M(Γ) as follows: M = K(L); [[p]](j) iff p(j) ∈ Γ; [[s]](j, k) iff s(j, k) ∈ Γ;and [[j]] = j. That is, we take the constant symbols of the language to be the pointsof the model, and the interpretations of the atoms are the natural ones. Eachconstant symbol is interpreted by itself.

Lemma 8.10 For all set terms c, [[c]] = {j : c(j) ∈ Γ}.

Proof By induction on c. The base case of unary atoms p is by definition ofM.

Before we turn to the induction proper, here is a preliminary point. Assumingthat [[c]] = {j : c(j) ∈ Γ}, we check that [[c]] = {j : c(j) ∈ Γ}:

j ∈ [[c]] iff j /∈ [[c]] iff c(j) /∈ Γ iff c(j) ∈ Γ.

The last point uses the maximal consistency of Γ.Turning to the inductive steps, assume our result for c; we establish it for ∀(c, s)

and ∃(c, s); it then follows from the preliminary point that we have the same factfor ∀(c, s) and ∃(c, s).

Let j ∈ [[∀(c, s)]]. We claim that ∀(c, s)(j) ∈ Γ. For if not, then ∃(c, s)(j) ∈ Γ.By the Henkin property, let k be such that Γ contains c(k) and s(j, k). By theinduction hypothesis, k ∈ [[c]], and by the definition of M, [[s]](j, k) is false. Thusj /∈ [[∀(c, s)]]. This is a contradiction.

In the other direction, assume that ∀(c, s)(j) ∈ Γ; this time we claim thatj ∈ [[∀(c, s)]]. Let k ∈ [[c]]. By induction hypothesis, Γ contains c(k). By (∀E),we see that Γ ` s(j, k). Hence Γ contains s(j, k). So in M, [[s]](j, k). Since k wasarbitrary, we see that indeed j ∈ [[∀(c, s)]].

The other induction step is for ∃(c, s). Let j ∈ [[∃(c, s)]]. We thus have somek ∈ [[c]] such that [[s]](j, k). That is, s(j, k) ∈ Γ. Using (∃I), we have Γ ` ∃(c, s)(j);from this we see that ∃(c, s)(j) ∈ Γ, as desired.

Finally, assume that ∃(c, s)(j) ∈ Γ. By the Henkin condition, let k be suchthat Γ contains c(k) and s(j, k). Using the derivation above, we have the desiredconclusion that j ∈ [[∃(c, s)]].

This concludes the proof. � �

Lemma 8.11 M |= Γ.

Proof We check the sentence types in turn. Throughout the proof, we shalluse Lemma 8.10 without mention.

First, let Γ contain the sentence ∀(c, d). Let j ∈ [[c]], so that c(j) ∈ Γ. We haved(j) ∈ Γ using (∀E). This for all j shows that M |= ∀(c, d).

Second, let ∃(c, d) ∈ Γ. By the Henkin condition, let j be such that both c(j)and d(j) belong to Γ. This element j shows that [[c]]∩ [[d]] 6= ∅. That is, M |= ∃(c, d).


Continuing, consider a sentence c(j) ∈ Γ. Then j ∈ [[c]], so that M |= c(j).Finally, the case of sentences r(j, k) ∈ Γ is immediate from the structure of the

model. � �

Theorem 8.12 If Γ |= ϕ, then Γ ` ϕ.

Proof We rehearse the standard argument. Due to the classical negation, weneed only show that consistent sets Γ are satisfiable. Let L be the language of Γ,Let L′ ⊇ L be an extension of L, and let Γ∗ ⊇ Γ be a maximal consistent theory inL′ with the Henkin property (see Lemma 8.9). Consider the canonical model M(Γ∗)as defined in this section. By Lemma 8.11, M(Γ∗) |= Γ∗. Thus Γ∗ is satisfiable,and hence so is Γ. � �

8.1.6. The Finite Model Property. Let Γ be a consistent finite theory insome language L. As we now know, Γ has a model. Specifically, we have seenthat there is some Γ∗ ⊇ Γ which is a maximal consistent theory with the Henkinproperty in an extended language L∗ ⊇ L. Then we may take the set of constantsymbols of L∗ to be the carrier of a model of Γ∗, hence of Γ. The model obtainedin this way is infinite. It is of interest to build a finite model, so in this section Γmust be finite. The easiest way to see that Γ has a finite model is to recall that ouroverall language is a sub-language of the two variable fragment FO2 of first-orderlogic. And FO2 has the finite model property by Mortimer’s Theorem [17].

However, it is possible to give a direct argument for the finite model property,along the lines of filtration in modal logic (but with some differences). We sketchthe result here because we shall use the same method in Section 8.2.1 below toprove a finite model property for our second logical system L(adj) with respect toits natural semantics; that result does not follow from others in the literature.

Let M = M(Γ∗) be the canonical model as defined in Section 8.1.5. Let Sub(Γ)be the collection of set terms occurring in any sentence in the original finite theory Γ.So Sub(Γ) is finite, and if ∀(c, r) ∈ Sub(Γ) or ∃(c, r) ∈ Sub(Γ), then also c ∈ Sub(Γ).For constant symbols j and k of L∗, write j ≡ k iff the following conditions hold:

(1) If either j or k is a constant of L, then k = j.(2) For all c ∈ Sub(Γ), c(j) ∈ Γ iff c(k) ∈ Γ.

Remark The equivalence relation ≡ may be defined on any structure. It is notcheck the referencenecessarily a congruence, as Example 5.2 shows. Specifically, we had constantsymbols j and k such that j ≡ k, and yet in our structure s(j, k) and s(k, j). Inthe case of M(Γ∗), we have no reason to think that ≡ is a congruence. That is, theconstruction in Section 8.1.5 did not arrange for this.

Let N = {[k] : k ∈ K(L)} × {∀,∃}. (We use ∀ and ∃ as tags to give two copiesof the quotient K/≡.) We endow N with an L-structure as follows:(8.1) [[p]] = {([j], Q) : p(j) ∈ Γ∗ and Q ∈ {∀,∃}}.(8.2) [[s]](([j], Q), ([k], Q′)) iff one of the following two conditions holds:

(1) There is a set term c such that Γ∗ contains c(k) and ∀(c, s)(j).(2) Q′ = ∃, and for some j∗ ≡ j and k∗ ≡ k, Γ∗ contains s(j∗, k∗).

(8.3) For a constant j of L, [[j]] = ([j],∃). (Of course, [j] is the singleton set{j}.)


Before going on, we note that the first of the two alternatives in the definition of[[s]](([j], Q), ([k], Q′)) is independent of the choice of representatives of equivalenceclasses. And clearly so is the second alternative.

We shall write N for the resulting L-structure, hiding the dependence on Γ andΓ∗.

Lemma 8.13 For all c ∈ Sub(Γ), [[c]] = {([j], Q) : c(j) ∈ Γ∗ and Q ∈ {∀,∃}}.

Proof By induction on set terms c. We are not going to present any of thedetails here because in Lemma 8.21 below, we shall see all the details on a moreinvolved result. � �

Lemma 8.14 N |= Γ.

Proof Again we only highlight a few details, since the full account is similarto what we saw in Lemma 8.11, and to what we shall see in Lemma 8.22. Onewould check the sentence types in turn, using Lemma 8.13 frequently. We wantto go into details concerning sentences in Γ of the form s(j, k) or s(j, k). Recallthat we are dealing in this result with sentences of L, and so j and k are constantsymbols of that language. Also recall that [[j]] = ([j],∃), and similarly for k.

First, consider sentences in Γ of the form s(j, k). By the definition of [[s]], wehave

[[s]](([j],∃), ([k],∃)).By the way binary atoms and constants are interpreted in N, we have N |= s(j, k),as desired.

We conclude with the consideration of a sentence in Γ of the form s(j, k). Wewish to show that N |= s(j, k). Suppose towards a contradiction that N |= s(j, k).Then we have [[s]](([j],∃), ([k],∃)). There are two possibilities, corresponding to thealternatives in the semantics of s. The first is when there is a set term c such thatΓ∗ contains c(k) and ∀(c, s)(j). By (∀E), Γ∗ then contains s(j, k). But recall thatΓ contains s(j, k). So in this alternative, Γ∗ ⊇ Γ is inconsistent. In the secondalternative, there are j∗ ≡ j and k∗ ≡ k such that s(j∗, k∗) ∈ Γ∗. But recall thatthe equivalence classes of constant symbols from the base language L are singletons.Thus in this alternative, j∗ = j and k∗ = k; hence s(j, k) ∈ Γ∗. But then again Γ∗

is inconsistent, a contradiction. � �

Theorem 8.15 (Finite Model Property) If Γ is consistent, then Γ has a modelof size at most 22n, where n is the number of set terms in Γ.

Complexity notes. Theorem 8.15 implies that the satisfiability problem for ourlanguage is in NExpTime. We can improve this to an ExpTime-completenessresult by quoting the work of others. Pratt-Hartmann [25] define a certain logic E2

and showed that the complexity of its satisfiability problem is ExpTime-complete.E2 corresponds to a fragment of first-order logic, and it is somewhat bigger thanthe language L. (It would correspond to adding converses to the binary atoms inL, as we mention at the very end of this chapter.) Since satisfiability for E2 isExpTime-complete, the same problem for L is in ExpTime.

A different way to obtain this upper bound is via the embedding into Booleanmodal logic which we saw in Section 5.0.2. For this, see Theorem 7 of Lutz andSattler [13]. We shall use an extension of that result below in connection with anextension L(adj) of L.


The ExpTime-hardness for L follows from Lemma 6.1 in [29]. That resultdealt with a language called R†, and R† is a sub-language of L.

8.2. Adding Transitivity: L(adj)

Before going further, let us briefly recapitulate the overall problem of thischapter and point out where we are and what remains to be done. We aim toformalize a fragment of first-order logic in which one may represent argumentsas complex as that in (8.4) in the Introduction. We are especially interested indecidable systems, and so the systems must be weaker than first-order logic. Wepresented in Section 8 a language L and a proof system for it. Validity in thelogic cannot be captured by a purely syllogistic proof system, and so our proofsystem uses variables. But the use is very special and restricted. The proof systemis complete and decidable in exponential time. To our knowledge, it is the firstsystem with these properties.

There are a number of ways in which one can go further. In this section, wewant to explore one such way, connected to the example in (8.4) below:

(8.4)

Every sweet fruit is bigger than every ripe fruitEvery pineapple is bigger than every kumquatEvery non-pineapple is bigger than every unripe fruitEvery fruit bigger than some sweet fruit is bigger than every kumquat

One key feature of this example is that comparative adjectives such as bigger thanare transitive. This is true for all comparative adjectives.

We extend our language L to a language L(adj) by taking a basic set A ofcomparative adjective phrases in the base. The proof system simply extends the onewe have already seen with a rule corresponding to the transitivity of comparatives.Our completeness result, Theorem 8.12, extends to the new setting. The nextsection does this. The decidability of the language is a more delicate matter thanbefore, since it does not follow from Mortimer’s Theorem [17] on the finite modelproperty for FO2. Indeed, adding transitivity statements to FO2 renders the logicundecidable, as shown in Gradel, Otto, and Rosen [8]. Instead, one could useTheorem 12 of Lutz and Sattler [13] on the decidability of a variant on Booleanmodal logic in which some of the relations are taken to be transitive. This wouldindeed give the ExpTime-completeness of L(adj) with our semantics. However, wehave decided to present a direct proof for several reasons. First, Lutz and Sattler’sresult does not give a finite model property, and our result does do this. Second, ourargument is shorter. Finally, our treatment connects to modal filtration argumentsand is therefore different; [13] uses automata on infinite trees and is based on Vardiand Wolper [37].

I do not wish to treat the transitivity of comparison with adjectives as an en-thymeme (missing premise) because the transitivity seems more fundamental, more‘logical’ somehow. Hence it should be treated on a deeper level. The decidabilityconsiderations give a supporting argument: if we took the transitivity to be a mean-ing postulate, then it would seem that the underlying language would have to berich enough to state transitivity. This requires three universal quantifiers. For otherreasons, we want our languages to be closed under negation. It thus seems verylikely that any logical system with these properties is going to be undecidable. The

8.2. ADDING TRANSITIVITY: L(adj) 101

upshot is a system in which the transitivity turns out to be a proof postulate ratherthan a meaning postulate. We turn to the system itself.

Syntax and semantics. We start with four pairwise disjoint sets A (for compar-ative adjective phrases) and the three that we saw before: P, R, and K. We use aas a variable to range over A in our statement of the syntax and the rules.

For the syntax, we take elements a ∈ A to be binary atoms, just as the elementss ∈ R are. Thus, the binary literals are the expressions of the form s, s, a, or a.

The syntax is the same as before, except that we allow the binary atoms to beelements of A in addition to elements of R. So in a sense, we have the same syntaxis before, except that some of the binary atoms are taken to render transitive verbs,and some are taken to render comparative adjective phrases. The only difference isin the semantics. Here, we require that (in every model M) for an adjective a ∈ A,[[a]] must be a transitive relation.

Proof system. We adopt the same proof system as in Figure 8.3, but with oneaddition. This addition is the rule for transitivity:

a(t1, t2) a(t2, t3)a(t1, t3)

trans

This rule is added for all a ∈ A.

Example 8.16 We have seen an informal example in (8.4) at the beginningof this chapter. At this point, we can check that our system does indeed have aderivation corresponding to this. We need to check that Γ ` ϕ, where Γ contains

∀(sw,∀(ripe, bigger)),∀(pineapple,∀(kq, bigger)),∀(pineapple,∀(ripe, bigger)),

and ϕ is∀(∃(sw, bigger),∀(kq, bigger)).

(We are going to use kq as an abbreviation of kumquat for typographical conve-nience, and similarly for sw and sweet.)

Example 8.17 The example at the beginning of this chapter cannot be for-malized in this fragment because the correct reasoning uses the transitivity of isbigger than. However, we can prove a result which may itself be used in a formalproof of (8.4):

(8.5)

Every sweet fruit is bigger than every ripe fruitEvery pineapple is bigger than every kumquatEvery non-pineapple is bigger than every unripe fruitEvery sweet fruit is bigger than every kumquat

To discuss this, we take the set P of unary atoms to be

P = {sweet, ripe, pineapple, kumquat}.We also take R = {bigger} and K = ∅. Figure 8.5 contains a derivation showing(8.5), done in the manner of Fitch [5]. The main way in which we have bentthe English in the direction of our formalism is to use the bar notation on thenouns. The main reason for presenting the derivation as a Fitch diagram is thatthe derivation given as a tree (as demanded by our definitions) would not fit on apage. This is because the cases rule is not a first-class rule in the system, it is aderived rule (see Example 8.5 above). Our Fitch diagram pretends that the systemhas a rule of cases. Another reason to present the derivation as in Figure 8.5 is to


1 Every sweet fruit is bigger than every ripe fruit hyp

2 Every pineapple is bigger than every kumquat hyp

3 Every pineapple is bigger than every ripe fruit hyp

4 x x is a sweet fruit hyp

5 x is bigger than every ripe fruit ∀E, 1, 4

6 x is a pineapple hyp

7 x is bigger than every kumquat ∀E, 2, 6

8 x is a pineapple hyp

9 x is bigger than every ripe fruit ∀E, 3, 8

10 y y is a kumquat hyp

11 y is a ripe fruit hyp

12 x is bigger than y ∀E, 5, 11

13 y is a ripe fruit hyp

14 x is bigger than y ∀E, 9, 13

15 x is bigger than y cases, 13–14, 11–12

16 x is bigger than every kumquat ∀I, 10–15

17 x is bigger than every kumquat cases, 6–7, 8–16

18 Every sweet fruit is bigger than every kumquat ∀I, 4–17

Figure 8.5. A derivation corresponding to the argument in (8.5).

make the point that the treatment in this chapter is a beginning of a formalizationof the work that Fitch was doing.

In Figure 8.5, we see that Γ ` ∀(sweet,∀(kq, bigger)). (The a derivation wascheck all the refer-ences on this presented using a format which could be converted to our official format of natural

deduction trees.) That work used R = {bigger}, but here we want R = ∅ andA = {bigger}. The same derivation works, of course. Transitivity enables us toobtain a derivation for (8.4):

[∃(sw, bigger)(x)]3

[bigger(x, y)]2[kq(z)]1

[sw(y)]2

....∀(sw,∀(kq, bigger))

∀(kq, bigger)(y)∀E

bigger(y, z)∀E

bigger(x, z)trans

∀(kq, bigger)(x)∀I1

∀(kq, bigger)(x)∃E2

∀(∃(sw, bigger), ∀(kq, bigger))∀I3

Adding the transitivity rule gives a sound and complete proof system for thesemantic consequence relation Γ |= ϕ. The soundness is easy, and so we only sketch


the completeness. We must show that a set Γ which is consistent in the new logichas a transitive model. The canonical model M(Γ) as defined in Section 8.1.5 isautomatically transitive; this is immediate from the transitivity rule. And as weknow, it satisfies Γ.

8.2.1. L(adj) has the Finite Model Property. Our final result is thatL(adj) has the finite model property. We extend the work in Section 8.1.6. Theinspiration for our definitions comes from the technique of filtration in modal logic,but we shall not refer explicitly to this area.

We again assume that Γ is consistent, and Γ∗ has the properties of Lemma 8.9.

Definition 8.18 For a ∈ A, we say that j reaches k (by a chain of ≡ and astatements) if there is a sequence

(8.6) j = j0 ≡ k0, j1 ≡ k1, . . . , jn ≡ kn = k

such that n ≥ 1, and Γ∗ contains a(k0, j1), . . ., a(kn−1, jn).

Lemma 8.19 Assume that j reaches k by a chain of ≡ and a statements.(1) If c(k) ∈ Γ∗, then Γ∗ contains ∃(c, a)(j).(2) If j, k ∈ K(L), then Γ∗ contains a(j, k).

Proof By induction on n ≥ 1 in (8.6). For n = 1, we have essentially seenthe argument as a step in Lemma 8.13. Here it is again. Since c(k1) and j1 ≡ k1,we see that c(j1). Together with s(k0, j1), we have ∃(c, a)(k0). And as j0 ≡ k0, wesee that ∃(c, a)(j0).

Assume our result for n, and now consider a chain as in (8.6) of length n+ 1.The induction hypothesis applies to

j = j1 ≡ k1, j2 ≡ k2, . . . , jn+1 ≡ kn+1 = k

and so we have ∃(c, a)(j1). Since a(k0, j1), we easily have ∃(c, a)(k0) by transitivity.And as j0 ≡ k0, we have ∃(c, a)(j0).

The second assertion is also proved by induction on n ≥ 1. For n = 1, wehave j = j0 ≡ k0, Γ∗ contains a(k0, j1); and j1 ≡ k1 = k. Then since the ≡ isthe identity on K(L), j = j0 = k0, and j1 = k1 = k. Hence Γ∗ contains s(j, k).Assuming our result for n, we again consider a chain as in (8.6) of length n + 1.Just as before, j = j0 = k0, and so Γ∗ contains a(j, j1). By induction hypothesis,Γ∗ contains a(j1, k). By transitivity, Γ∗ contains a(j, k). � �

We endow N with an L-structure as follows: Is ’itemize’ right forthis?[[p]] = {([j], Q) : p(j) ∈ Γ∗ and Q ∈ {∀,∃}}.

[[s]](([j], Q), ([k], Q′)) iff one of the following two conditions holds:(1) There is a set term c such that Γ∗ contains c(k) and ∀(c, s)(j).(2) Q′ = ∃, and for some j∗ ≡ j and k∗ ≡ k, Γ∗ contains s(j∗, k∗).

[[a]](([j], Q), ([k], Q′)) iff(1) If ∀(c, a)(k) ∈ Γ∗, then also ∀(c, a)(j) ∈ Γ∗.(2) In addition, either (a) or (b) below holds:

(a) There is a set term c such that Γ∗ contains c(k) and ∀(c, a)(j).(b) Q′ = ∃, and j reaches k by a chain of ≡ and a statements.

(Notice that this definition is independent of the representatives in [j] and[k].)For a constant j of L, [[j]] = ([j],∃).


Once again, we suppress Γ and Γ∗ and simply write N for the resulting L-structure.

Lemma 8.20 For a ∈ A, each relation [[a]] is transitive in N.

Proof In this proof and the next , we are going to use l to stand for a constant???symbol, even though earlier in the chapter we used it for a literal. Assume that

(8.7) ([j], Q) [[a]] ([k], Q′) [[a]] ([l], Q′′).

Clearly we have the first requirement concerning [[a]]: if ∀(c, a)(l) ∈ Γ∗, then also∀(c, a)(j) ∈ Γ∗.

We have four cases, depending on the reasons for the two assertions in (8.7).Case 1 There is a set term b such that Γ∗ contains b(k) and ∀(b, a)(j), and there

is also a set term c such that Γ∗ contains c(l) and ∀(c, a)(k). By (1), Γ∗ containsc(l) and ∀(c, a)(j). And so we have requirement (2a) concerning [[a]] for ([j], Q) and([l], Q′′).

Case 2 There is a set term b such that Γ∗ contains b(k) and ∀(b, a)(j), and kreaches l. Note that a(j, k). So j reaches l.

Case 3 j reaches k by a chain of ≡ and a statements, and there is a set term csuch that Γ∗ contains c(l) and ∀(c, a)(k). Then a(k, l). And so j reaches l.

Case 4 j reaches k, and k reaches l. Then concatenating the chains shows thatj reaches l. � �

Lemma 8.21 For all c ∈ Sub(Γ), [[c]] = {([j], Q) : c(j) ∈ Γ∗ and Q ∈ {∀,∃}}.

Proof We argue by induction on c. Much of the proof is as in Lemma 8.13,For c a unary atom, the result is obvious. Also, assuming that [[c]] = {([j], Q) :c(j) ∈ Γ∗} we easily have the same result for c using the maximal consistency ofΓ∗:

([j], Q) ∈ [[c]] iff ([j], Q) /∈ [[c]] iff c(j) /∈ Γ∗ iff c(j) ∈ Γ∗.Assume about c that if c ∈ Sub(Γ), then [[c]] = {([j], Q) : c(j) ∈ Γ∗}. In view of

what we just saw, we only need to check the same result for ∀(c, s), ∃(c, s), ∀(c, a),and ∃(c, a).∀(c, s). Suppose that ∀(c, s) ∈ Sub(Γ), so that c ∈ Sub(Γ) as well. We prove

that[[∀(c, s)]] = {([j], Q) : ∀(c, s)(j) ∈ Γ∗}.

Let ([j], Q) ∈ [[∀(c, s)]]. We shall show that ∀(c, s)(j) ∈ Γ∗. If not, then by maximalconsistency, ∃(c, s)(j) ∈ Γ∗. By the Henkin property, let k be such that Γ∗ containsc(k) and s(j, k). By induction hypothesis, ([k],∀) ∈ [[c]]. And so ([j],∀)[[s]]([k],∀).Thus there is a set term b such that Γ∗ contains b(k) and ∀(b, s)(j). From these, Γ∗

contains s(j, k). And thus Γ∗ is inconsistent. This contradiction shows that indeed∀(c, s)(j) ∈ Γ∗.

In the other direction, suppose that ([j], Q) is such that ∀(c, s)(j) ∈ Γ∗. Let([k], Q′) ∈ [[c]], so by induction hypothesis, c(k) ∈ Γ∗. By the way we interpretbinary relations in N, [[s]](([j], Q), ([k], Q′)). This for all ([k], Q′) ∈ [[c]] shows that([j], Q) ∈ [[∀(c, s)]].∃(c, s). Suppose that ∃(c, s) ∈ Sub(Γ), so that c ∈ Sub(Γ) as well. Let ([j], Q) ∈

[[∃(c, s)]]. Let k and Q′ be such that [[c]]([k], Q′) and [[s]](([j], Q), ([k], Q′)). Byinduction hypothesis, c(k) ∈ Γ∗. First, let us consider the case when Q′ = ∀.Let b be such that Γ∗ contains b(k) and ∀(b, s)(j). Using (∀E), we have Γ∗ `


∃(c, s)(j). And as Γ∗ is closed under deduction, ∃(c, s)(j) ∈ Γ∗ as desired. Themore interesting case is when Q′ = ∃, so that for some j∗ ≡ j and k∗ ≡ k, Γ∗

contains s(j∗, k∗). Since c(k) and k ≡ k∗, we have c(k∗) ∈ Γ∗. Then using (∃I), wesee that ∃(c, s)(j∗) ∈ Γ∗. Since j ≡ j∗, once again we have ∃(c, s)(j) ∈ Γ∗.

Conversely, suppose that ∃(c, s)(j) ∈ Γ∗. By the Henkin property, let k besuch that c(k) and s(j, k) belong to Γ∗. Then [[s]](([j], Q), ([k],∃)), and by inductionhypothesis, [[c]](k). Hence ([j], Q) ∈ [[∃(c, s)]].∀(c, a). Suppose that ∀(c, a) ∈ Sub(Γ), so that c ∈ Sub(Γ) as well. We prove

that[[∀(c, a)]] = {([j], Q) : ∀(c, a)(j) ∈ Γ∗}.

The first part argument is the left-to-right inclusion. It is exactly the same as whatwe saw above for the sentences of the form ∀(c, s).

In the other direction, suppose that ∀(c, a)(j) ∈ Γ∗; we show that ([j], Q) ∈[[∀(c, a)]]. For this, let ([k], Q′) ∈ [[c]]. By induction hypothesis, c(k) ∈ Γ∗. Wemust verify that if ∀(b, a)(k) ∈ Γ∗, then also ∀(b, a)(j) ∈ Γ∗. This is shown in thederivation below:

c(k) ∀(c, a)(j)a(j, k)

∀E[b(x)]1 ∀(b, a)(k)

a(k, x)∀E

a(j, x)trans

∀(b, a)(j) ∀I1

Since Γ∗ is closed under deduction, we see that indeed ∀(b, a)(j) ∈ Γ∗. Going on,we see from the structure of N that [[s]](([j], Q), ([k], Q′)). This for all ([k], Q′) ∈ [[c]]shows that ([j], Q) ∈ [[∀(c, a)]].∃(c, a). Suppose that ∃(c, a) ∈ Sub(Γ), so that c ∈ Sub(Γ) as well.Let ([j], Q) ∈ [[∃(c, a)]]. Let k and Q′ be such that the following two assertions

hold: [[a]](([k], Q′) and [[a]](([j], Q), ([k], Q′)). By induction hypothesis, c(k) ∈ Γ∗.There are two cases depending on whether Q′ = ∀ or Q′ = ∃. The argument forQ′ = ∀ is the same as the one we saw in our work on sentences ∃(c, s) above. Themore interesting case is when Q′ = ∃. This time, j reaches k. By Lemma 8.19,∃(c, a)(j) ∈ Γ∗.

Conversely, suppose that ∃(c, a)(j) ∈ Γ∗. By the Henkin property, let k be suchthat c(k) and a(j, k) belong to Γ∗. The derivation below shows that if ∀(d, a)(k) ∈Γ∗, then ∀(d, a)(j) ∈ Γ∗ as well:

a(j, k)[d(x)]1 ∀(d, a)(k)

a(k, x)∀E

a(j, x)trans

∀(d, a)(j) ∀I1

So [[a]](([j], Q), ([k],∃)), and by induction hypothesis, [[c]](k). Hence ([j], Q) ∈[[∃(c, a)]].

This completes the induction. � �

Lemma 8.22 N |= Γ.

Proof We check the sentence types in turn, using Lemma 8.21 without men-tion.


First, let Γ contain the sentence ∀(b, c). Then b and c belong to Sub(Γ). Let([j], Q) ∈ [[b]], so that b(j) ∈ Γ∗. We have d(j) ∈ Γ∗ using (∀E). This for all ([j], Q)shows that N |= ∀(b, c).

Second, let ∃(c, d) ∈ Γ. By the Henkin property, let j be such that both c(j)and d(j) belong to Γ∗. The element ([j],∀) shows that [[c]] ∩ [[d]] 6= ∅. That is,N |= ∃(c, d).

Continuing, consider a sentence b(j) ∈ Γ. As b ∈ Sub(Γ), we have ([j],∃) ∈ [[b]],so that N |= b(j).

The work for sentences of the forms s(j, k) and s(j, k) was done in Lemma 8.14.The most intricate part of this proof concerns sentences a(j, k), a(j, k) ∈ Γ.

Recall that we are dealing in this result with sentences of L, and so j and k areconstant symbols of that language. Also recall that [[j]] = ([j],∃), and similarly fork.

Consider sentences in Γ of the form a(j, k). It is easy to see that if ∃(c, a)(k)belongs to Γ, then so does ∃(c, a)(j). (See the ∃(c, a) case in Lemma 8.21.) Fromthis it follows easily that [[a]]([[j]], [[k]]). And so N |= a(j, k) in this case.

We conclude with the consideration of a sentence in Γ of the form a(j, k). Wewish to show that N |= a(j, k). Suppose towards a contradiction that N |= a(j, k).Then we have [[a]](([j],∃), ([k],∃)). There are two possibilities, corresponding tothe alternatives in the semantics of a. The first is when there is a set term c suchthat Γ∗ contains c(k) and ∀(c, a)(j). Using (∀E), Γ∗ then contains a(j, k). Butrecall that Γ contains a(j, k). So in this alternative, Γ∗ ⊇ Γ is inconsistent. In thesecond alternative, j reaches k by a chain of ≡ and a statements. By Lemma 8.19,a(j, k) ∈ Γ∗. So Γ∗ is inconsistent, and we have our contradiction. � �

Once again, this gives us the finite model property for L(adj). The result is notinteresting from a complexity-theoretic point of view, since we already could seefrom Lutz and Sattler [13] that the logic had an ExpTime satisfiability problem.

Conclusion.

CHAPTER 9

Monotonicity and Polarity

9.1. Introduction

To introduce the topic of monotonicity and polarity, let’s consider a very simplesentence:(9.1) Every dog barks.

9.1.1. History. One enterprise at the border of logic and linguistics is thecrafting of logical systems which capture aspects of inference in natural language.The ultimate goal is a logical system which is strong enough to represent the mostcommon inferential phenomena in language and yet is weak enough to be decidable.Another goal would be to use representations which are as close as possible to thesurface forms. This second goal is likely to be unobtainable, and when one backsoff to consider structured representations, it is natural to turn to those suggestedby linguistic formalisms of one form or another.

The particular strand of this enterprise that concerns us in this chapter is thenatural logic program initiated by Johan van Benthem [33, 34], and elaborated inVıctor Sanchez Valencia [32]. As mentioned in van Benthem [35], the “proposedingredients” of a logical system to satisfy the goals would consist of several “mod-ules”:

(1) Monotonicity Reasoning, i.e., Predicate Replacement,(2) Conservativity, i.e., Predicate Restriction, and also(3) Algebraic Laws for inferential features of specific lexical items.

We are only concerned with (1) in this chapter, and we are mainly concerned withan alternative to van Benthem’s seminal proposal.

There are several reasons why monotonicity is worthy of study on its own: it isarguably tied up with linguistic phenomena, it is a pervasive phenomenon in actualreasoning, and finds its way into computational systems for linguistic processing(cf. Nairn, Condoravdi and Karttunen [23] and MacCartney and Manning [14]).The leading idea in van Benthem’s work is to develop a formal approach to mono-tonicity on top of categorial grammar (CG). This makes sense because the semanticsof CG is given in terms of functions, and monotonicity in the semantic sense is allabout functions. The success story of the natural logic program is given by themonotonicity calculus, a way of determining the polarity of words in a given sen-tence. Before turning to it, we should explain the difference between monotonicityand polarity. We follow the usage of Bernardi [1], Section 4.1:

The differences between monotonicity and polarity could be sum-marized in a few words by saying that monotonicity is a propertyof functions . . .. On the other hand, polarity is a static syntac-tic notion which can be computed for all positions in a given

107

108 9. MONOTONICITY AND POLARITY

formula. This connection between the semantic notion of mono-tonicity and the syntactic one of polarity is what one needs toreach a proof theoretical account of natural reasoning and builda natural logic.

This point might be clarified with an example of the monotonicity calculus.Consider the sentence No dog chased every cat. The sentence is to be derived insome categorial grammar, and to be brief we suppress all the details on this exceptto say that a derivation would look like the trees below, except without the + and −marks. Given the derivation, one would first add some notations to indicate that theinterpretation of every, as a function from noun meanings, is monotone decreasingbut results in a function that is monotone increasing. Also, the interpretation ofno, as a function from noun meanings, is monotone decreasing and again results ina function that is monotone decreasing.

no+ dog−

no(dog)+

chase+

every+ cat−

every(cat)+

chase(every(cat))−

no(dog)(chase(every(cat)))+

no+ dog−

no(dog)+

chase+

every− cat+

every(cat)−

chase(every(cat))−

no(dog)(chase(every(cat)))+

Monotonicity Marking Polarity Determination

On the left, we have monotonicity markings in the derivation tree itself. Thesemarkings were propagated from the information about every and no according toa rule which need not concern us. One then turns the Monotonicity Marking treeinto the Polarity Determination tree. The rule is: for each node, count the numberof − signs from it to the root of the tree (at the bottom). This changes the signsfor every, cat, and every cat. The resulting signs are the polarities. For example,the + marking on cat indicates that this occurrence of cat might be replaced by aword with a “larger” value. For example, replacing cat with animal gives a validinference: No dog chased every cat implies No dog chased every animal1. On theother hand, dog is marked −, corresponding to the fact that replacement inferenceswith it go in the opposite direction. So replacing dog with animal would not givea validity; instead, replace with the “smaller” phrase old dog.

As Dowty [4] mentions, “The goal [in his paper] is to ‘collapse’ the indepen-dent steps of Monotonicity Marking and Polarity Determination into the syntacticderivation itself, so that words and constituents are generated with the markingsalready in place that they would receive in Sanchez’ polarity summaries.” His mo-tivation is mainly linguistic rather than logical: “the format of Sanchez’ system. . . is no doubt highly appropriate for the logical studies he developed it for, it isunsuitable for the linguistics applications I am interested in.” For this reason, hegave an “alternative formulation.” This system is “almost certainly equivalent toSanchez’,” but he lacked a proof. Bernardi [1] also notes that while her system andSanchez’ “are proven to be sound, no analogous result is known for” the systemproposed by Dowty. However her presentation makes it clear that there are advan-tages to the internalization of polarity markers. Others who use a system inspiredby [4] include Christodoulopoulos [3].

1Incidentally, we are taking this sentence and others like it in the subject wide-scope reading:there is no dog with the property that it chases every cat.

9.2. BACKGROUND: CATEGORIAL GRAMMAR 109

The purpose of this chapter is to offer a soundness proof for internalized mono-tonicity markings, and to discuss related issues. The first of these issues has todo with a choice in the overall categorial architecture. We wish to take seriouslythe idea that syntactic categories might be interpreted in ordered settings. Giventwo types σ and τ , we want to consider two resulting types: the upward monotonefunctions and the downward monotone functions. The first alternative is to gener-ate types as follows: begin with a set T0 of basic types, and then build the smallestset T1 ⊇ T0 closed in the following way:(9.2) If σ, τ ∈ T1, then also (σ, τ)+ ∈ T1 and (σ, τ)− ∈ T1.This would seem to be the most straightforward way to proceed, and it actuallyseems to be close to the way that is presented in the literature2. Indeed, we workedout the theory this way before deciding to rewrite this chapter based on a secondalternative. This second way is based on the observation that a downward mono-tone function from a preorder P to a preorder Q is the same thing as an upwardmonotone function from P to the opposite order −Q. This order −Q has the samepoints as Q, but the order is “upside down.” The suggestion is that we generatetypes a little differently. We take smallest set T1 ⊇ T0 closed in the following way:(9.3) If σ, τ ∈ T1, then also (σ, τ) ∈ T1.(9.4) If σ ∈ T1, then also −σ ∈ T1.We shall call this the fully internalized scheme. This includes copies of the typesdone the first way, by viewing (σ, τ)+ as (σ, τ) and (σ, τ)− as (σ,−τ). It is a strictlybigger set, since −σ for the basic types enters into the fully internalized hierarchybut not the lesser one.

9.2. Background: Categorial Grammar

This section offers a review of the essential facts concerning categorial grammar.For further background, and much more discussion of the subject, see Moortgat [16].

Fix a set C0 of basic categories. Let C be the smallest superset of C0 closed inthe following way:(9.5) If C,D ∈ C, then also C\D and C/D belong to C.

Let W be a set of words. A lexicon is a subset L ⊆ W × C; that is, a lexiconis a set of pairs consisting of a word and a category. There is no requirement thatthis lexicon is function, and indeed it usually is not a function.

Continuing, we extend L to L∗ ⊆W ∗ × C by the following two rules:(9.6) If t : C\D and s : C, then st : D.(9.7) If t : C/D and s : D, then ts : D.Here the notation s : C means that (s, C) ∈ L∗.

An AB-grammar is a finite lexicon L together with a distinguished categoryS. The language of L is the set of all t ∈ W ∗ such that t : S. These are calledAB -grammars after Kazimierz Ajdukiewicz, the originator of categorial grammar,and Yehoshua Bar-Hillel, the one who added directionality to Ajdukiewicz’ system.

It is useful to recast (9.6) and (9.7) in the style of proof rules, as follows:

2For example, Dowty [4] states his category formation rules in this way, adding directionalversions. But shortly thereafter he mentions that lexical items will “be entered in two categories.”Adding downward-marked versions of the base categories is tantamount to what we call the fully

internalized scheme.


(9.8)s : C t : C\D

st : Dt : C/D s : D

ts : D

That is, the lexicon generates proof trees. We declare the category of a proof treeT as the category of its root node, and we write T : C for this.

Example 9.1 Let C0 = {S, T, U}, let W = {a, b, c}, and let L be the followinglexicon:

a : S/Tb : Tb : T/S

b : Ub : U/Sb : T/U

b : U/Uc : Tc : T/S

An example of a word in the language of this lexicon L is abbac, via the followingderivation:

a : S/Tb : T/U

b : U/Sa : S/T c : T

ac : Sbac : U

bbac : Tabbac : S

Another example of a word in this language is abbbacabbbbb. (Indeed, for thosefamiliar with the notation of regular languages, the language of this grammar isexactly (a(b+c))+. This fact is not intended to be obvious, and indeed it takes somework to establish it.)

Example 9.2 This time we take C0 = {X,S} and W = {a, b}. The lexicon isa : S/X, b : S\X, and b : X. This time the language of the lexicon happens to be

{ab, aabb, aaabbb, . . . , anbn, . . .}.This language is famous in formal language theory because it is context-free butnot regular. For an example of a derivation, here is a reason why aabb is in thelanguage:

a : S/X

a : S/X b : Xab : S b : S\X

abb : Xaabb : S

9.2.1. The syntax-semantics interface in AB-grammars. Fix a set T0

of basic types. Let T be the smallest superset of T0 closed in the following way:(9.9) If σ, τ ∈ T, then also (σ, τ) ∈ T.We associate (semantic) types to (syntactic) categories, starting with a map a :C0 → T, and then extending via(9.10) a(C\D) = a(C/D) = (a(C), a(D)).A model is an assignment Xσ for σ ∈ T0, and it extends to the full set T by(9.11) X(σ,τ) = [Xσ, Xτ ], the set of all functions f : Xσ → Xτ .Each entry in w : C in the lexicon then is given a semantics [[w]] ∈ Xa(C). To eachderivation tree T : C then has an associated [[T ]] ∈ Xa(C). The definition is byinduction on T . For T a one-point tree, T is essentially a typed word w : C, andso [[T ]] is given by the semantics of the lexicon. If T comes from (9.8), say from thefirst rule and via subtrees T0 : C and T1 : C\D, then [[T ]] = [[T1]]([[T0]]).

9.3. BACKGROUND: PREORDERS AND THEIR OPPOSITES 111

9.3. Background: Preorders and their Opposites

For the work in the rest of this chapter we need to use preorders. Our goalin this section is to state the facts which are needed in as compact a manner aspossible, and also to mention a few examples that will be used in the sequel.

A preorder is a pair P = (P,≤) consisting of a set P together with a relation≤ which is reflexive and transitive. This means that the following hold:

(1) p ≤ p for all p ∈ P .(2) If p ≤ q and q ≤ r, then p ≤ r.

Frequently one assumes that the order has additional properties: it may be alsoanti-symmetric, or a lattice order, etc. For most of what we are doing, no extraassumptions are needed. Indeed, the specific results do not use the preorder prop-erties very much, and some of the work in this section goes through for absolutelyarbitrary relations. However, since the intended application is to a setting in whichthe relation represents some kind of inference, it makes sense to state the resultsfor the weakest class of mathematical objects that has something that looks like“inference,” and this seems to be preorders.

Example 9.3 For any set X, we have a preorder X = (X,≤), where x ≤ y iffx = y. This is called the flat preorder on X. More interestingly, for any set X wehave a preorder P(X) whose set part is the set of subsets of X, and where p ≤ q iffp is a subset of q. Another preorder is 2 = {F,T} with F ≤ T.

The natural class of maps between preorders P and Q is the set of monotonefunctions: the functions f : P → Q with the property that if p ≤ q in P, thenf(p) ≤ f(q) in Q. When we write f : P → Q in this chapter, we mean that f ismonotone. We write [P,Q] for the set of monotone functions from P to Q. [P,Q]is itself a preorder, with the pointwise order :

f ≤ g in [P,Q] iff for all p ∈ P , f(p) ≤ g(p) in Q

Let P and Q be preorders. The product preorder P×Q is the preorder whoseuniverse is the cartesian product P × Q, and with (p, q) ≤ (p′, q′) iff p ≤ p′ in Pand q ≤ q′ in Q.

Preorders P and Q are isomorphic if there is a function f : P → Q which isbijective (one-to-one and maps P onto Q) and also has the property that p ≤ p′ ifff(p) ≤ f(p′). We write P ∼= Q. We also identify isomorphic preorders, and so weeven write P = Q to make this point.

Proposition 9.4 For all preorders P, Q, and R, and all sets X:(1) For each p ∈ P , the function appp : [P,Q] → Q given by appp(f) = f(p)

is an element of [[P,Q],Q].(2) [P×Q,R] ∼= [P, [Q,R]].(3) [X,2] ∼= P(X).

Proof For part (1), let f ≤ g in [P,Q]. Then appp(f) = f(p) ≤ g(p) =appp(g).

In the second part, let f : [P×Q,R]→ [P, [Q,R]] be f(a)(p)(q) = a(p, q). Itis routine to check that f is an isomorphism.

In part (3), the isomorphism is ϕ : [X,2] → P(X) given by ϕ(f) = {p ∈ P :f(p) = T}. �


T

F

2

F

T

2op

(T,F) (F,T)

(F,F)

(T,T)

2× 2

Figure 9.1. Pictures of some useful preorders

Antitone functions and opposites of preorders. We are also going to be inter-ested in antitone functions from a preorder P to a preorder Q. These are thefunctions f : P → Q with the property that if p ≤ q in P, then f(q) ≤ f(p) inQ. We can express things in a more elegant way using the concept of the oppositepreorder −P of a given preorder P.3 This is the preorder with the same set part,but with the opposite order. Formally,

−P = Pp ≤ q in −P iff q ≤ p in P

Exercise 34 This exercise and the next ones ask you to classify some functionsas monotone or antitone. To start, look at or draw pictures of 2, 2op, 2×2, (2×2)op,2op × 2, and 2× 2op.

(1) Consider the negation function on {T,F}. Which of the following is true,and which false: ¬ : 2 → −2, or ¬ : 2 → 2, So would we say that ¬ is amonotone function on 2, or an antitone function?

(2) Is the conjunction operation ∧ a monotone function on from 2 × 2 to 2or an antitone function, or neither?

(3) Is the implication operation → a monotone function on from 2 × 2 to 2or an antitone function, or neither?

Exercise 35 LetP be any preorder, and letQ be P(P ), the power set preorderon the underlying set P of P. Let ↑: P → −Q be defined by

↑ (p) = {p′ ∈ P : p ≤ p′ in P}.

3I have wavered in my presentation between writing −P and Pop. The latter notation would

be more familiar to people who had seen duality in category theory, and might also make sensebecause the superscript notation matches the way that polarity marking works in the literature.

On the other hand, the −P notation makes the laws in Proposition 9.5 look more natural. In any

case, I welcome advice from readers.

9.3. BACKGROUND: PREORDERS AND THEIR OPPOSITES 113

Show that ↑: P→ −Q.

Exercise 36 Can a function f : P → Q be both monotone and antitone atthe same time? Show that f

We collect the most important facts on this operation below.

Proposition 9.5 For all preorders P, Q, and R, and all sets X:(1) −(−P) = P.(2) [P,−Q] = −[−P,Q].(3) [−P,−Q] = −[P,Q].(4) If f : P→ Q and g : Q→ R, then g · f : P→ R.(5) If f : P→ −Q and g : Q→ −R, then g · f : P→ R.(6) −(P×Q) ∼= −P×−Q.(7) X ∼= −X.(8) −[X,P] ∼= [X,−P].

Proof For part (1), recall first that P and −P have the same underlying set.It follows that P and −(−P) also have this the same underlying set. Concerningthe order: x ≤ y in P iff y ≤ x in −P iff x ≤ y in −(−P).

For (2), suppose that f : P→ −Q. Then f is a set function from P to Q. Tosee that f is a monotone function from −P to Q, let x ≤ y in −P. Then y ≤ x inP. So f(y) ≤ f(x) in −Q. And thus f(x) ≤ f(y) in Q, as desired. This verifiesthat as sets, [P,−Q] = [−P,Q]. To show that as preorders [P,−Q] = −[−P,Q],we must check that if f ≤ g in the order on [P,−Q], then g ≤ f in the order on[−P,Q] as well. For this, let p ∈ −P ; so p ∈ P . Then f(p) ≤ g(p) in −Q. Thusg(p) ≤ f(p) in Q. This for all p ∈ P shows that g ≤ f in [−P,Q].

Part (3) follows from parts (1) and (2).For part (4), assume x ≤ y. Then f(x) ≤ f(y), and so g(f(x)) ≤ g(f(y)).For part (5), note that by part (2), g : −Q→ R. So by part (4), g · f : P→ R.Continuing with part (6), note first that −(P × Q) and −P × −Q have the

same elements. To check that the orders are appropriately related, note that thefollowing are equivalent: (p, q) ≤ (p′, q′) in −(P ×Q); (p′, q′) ≤ (p, q) in (P ×Q);p′ ≤ p in P and q′ ≤ q in Q; p ≤ p′ in −P and q ≤ q′ in −Q; (p, q) ≤ (p′, q′) in−P×−Q.

For part (7), we verify that the identity map on P is an isomorphism from P

to −P. If p ≤ q in P, then clearly q ≤ p in −P. And if q ≤ p in −P, then since Pis flat, p = q. Thus p ≤ q in P,.

The last part comes from parts (3) and (8). �

Example 9.6 Part (3) in Proposition 9.5 is illustrated by considering [2,2]and [−2,−2]. Let c be the constant function T, let d be the constant function F,and let i be the identity. Then [2,2] is d < i < c, and [−2,−2] is c < i < d.

Part (2) is illustrated by [−2,2] and [2,−2]. Let us write ¬ for the negationfunction on truth values which we saw in Example ??. Then [−2,2] is d < ¬ < c,and [2,−2] is c < ¬ < d.

Example 9.7 The classical conditional → gives a monotone function from−2× 2 to 2. Since [−2× 2,2] ∼= [−2, [2,2]], this gives

if ∈ [−2, [2,2]].


This same function belongs to the opposite preorder, and so we could just as wellwrite

if ∈ −[−2, [2,2]] ∼= [2,−[2,2]] ∼= [2, [−2,−2]].

Example 9.8 Let X be any set. We use letters like p and q to denote elementsof [X,2]. Please keep in mind that the set of (monotone) functions from X to2 is in one-to-one correspondence with the set of subsets of X (see Part (3) ofProposition 9.4). Define

every ∈ [−[X,2], [[X,2],2]]some ∈ [[X,2], [[X,2],2]]no ∈ [−[X,2], [−[X,2],2]]

in the standard way:

every(p)(q) ={

T if p ≤ qF otherwise

some(p)(q) = ¬every(p)(¬ · q)

no(p)(q) = ¬some(p)(q)

It is routine to verify that these functions really belong to the sets mentioned above.And as in Example 9.7, each of these functions belongs to the opposite preorder aswell, and we therefore have

every ∈ [[X,2], [−[X,2],−2]]some ∈ [−[X,2], [−[X,2],−2]]no ∈ [[X,2], [[X,2],−2]]

Summary. The main point of this section is the fact mentioned in Proposi-tion 9.5, part (3). In words, the monotone functions from −P to −Q are exactlythe same as the monotone functions fromP toQ, but as preordered sets themselves,[−P,−Q] and [P,Q] are opposite orders. A simple illustration of the opposite or-ders was presented in Example 9.6. This fact, and also part (2) of Proposition 9.5,is important in the linguistic examples because it justifies why lexical items aretypically assigned two categories. For example, if we interpret a common noun byan element f of a preorder of the form [X,2], then the same f is an element of−[X,2], so we might as declare our noun to also have some typing which calls foran element of −[X,2]. And in general, if we type a lexical item w with a type (σ, τ)corresponding to an element of some preorder [P,Q], then we might as well endoww with the type (−σ,−τ) with the very same interpretation function.

9.4. Higher-order Terms over Preorders

Fix a set T0 of basic types. Let T1 be the smallest superset of T0 closed in thefollowing way:(9.12) If σ, τ ∈ T1, then also (σ, τ) ∈ T1.(9.13) If σ ∈ T1, then also −σ ∈ T1.Let ≡ be the smallest equivalence relation on T1 such that the following hold:

(1) −(−σ) ≡ σ.

9.4. HIGHER-ORDER TERMS OVER PREORDERS 115

(2) −(σ, τ) ≡ (−σ,−τ).(3) If σ ≡ σ′, then also −σ ≡ −σ′.(4) If σ ≡ σ′ and τ ≡ τ ′, then (σ, τ) ≡ (σ′, τ ′).

Definition 9.9 T = T1/≡. This is the set of types over T0.

The operations σ 7→ −σ and σ, τ 7→ (σ, τ) are well-defined on T. We alwaysuse letters like σ and τ to denote elements of T, as opposed to writing [σ] and [τ ].That is, we simply work with the elements of T1, but identify equivalent types.

Definition 9.10 Let T0 be a set of basic types. A typed language over T0 is acollection of typed variables v : σ and typed constants c : σ, where σ in each of theseis an element of T. We generally assume that the set of typed variables includesinfinitely many of each type. But there might be no constants whatsoever. We useL to denote a typed language in this sense.

Let L be a typed language. We form typed terms t : σ as follows:(1) If v : σ (as a typed variable), then v : σ (as a typed term).(2) If c : σ (as a typed constant), then c : σ (as a typed term).(3) If t : (σ, τ) and u : σ, then t(u) : τ .

Frequently we do not display the types of our terms.

9.4.1. Semantics. For the semantics of our higher-order language L we usemodels M of the following form. M consists of an assignment of preorders σ 7→ Pσon T0, together with some data which we shall mention shortly. Before this, extendthe assignment σ 7→ Pσ to T1 by

P(σ,τ) = [Pσ,Pτ ]P−σ = −Pσ

An easy induction shows that if σ ≡ τ , then Pσ = Pτ . So we have Pσ for σ ∈ T.We use Pσ to denote the set underlying the preorder Pσ.

The rest of the structure of a model M consists of an assignment [[c]] ∈ Pσ foreach constant c : σ, and also a typed map f ; this is just a map which to a typedvariable v : σ gives some f(v) ∈ Pσ.

9.4.2. Ground terms and contexts. A ground term is a term with no freevariables. Each ground term t : σ has a denotation [[t]] ∈ Pσ defined in the obviousway:

[[c]] = is given at the outset for constants c : σ[[t(u)]] = [[t]]([[u]])

A context is a typed term with exactly one variable, x. (This variable may beof any type.) We write t for a context, We’ll be interested in contexts of the formt(u). Note that if t(u) is a context and if x appears in u, then t is a ground term;and vice-versa.

In the definition below, we remind you that subterms are not necessarily proper.That, is a variable x is a subterm of itself.

Definition 9.11 Fix a model M for L. Let x : ρ, and let t : σ be a context.We associate to t a set function

ft : Pρ → Pσ

in the following way:


(1) If t = x, so that σ = ρ, then fx : Pσ → Pσ is the identity.(2) If t is u(v) with u : (τ, σ) and v : τ , and if x is a subterm of u, then ft is

app[[v]] · fu. That is, ft(u) is

a ∈ Pρ 7→ fu(a)([[v]]) .

(3) If t is u(v) with u : (τ, σ) and v : τ , and if x is a subterm of v, then ft is[[u]] · fv. That is, ft is

a ∈ Pρ 7→ [[u]](fv(a)) .

The idea of ft is that as a ranges over its interpretation space Pρ, ft(a) wouldbe the result of substituting various values of this space in for the variable, andthen evaluating the result.

Notice that we defined ft as a set function and wrote ft : Pρ → Pσ instead offt : Pρ → Pσ. The reason why we did this is that it is not immediately clear thatft is monotone. This is the content of the following result.

Lemma 9.12 (Context Lemma) Let t be a context, where t : σ and x : ρ. Thenft is element of [Pρ,Pσ].

Proof By induction on t. In the case that t is a type variable x : σ, then fx isthe identity on Pσ, and indeed the identity is a monotone function on any preorder.

Next, assume that x occurs in u. By induction hypothesis, fu belongs to[Pρ,P(τ,σ)] = [Pρ, [Pτ ,Pσ]]. The function app[[u]] belongs to [[Pτ ,Pσ],Pσ] byProposition 9.4, part (1), and so ft belongs to [Pρ,Pσ] by Proposition 9.5, part(4).

Finally, assume that x occurs in v. By induction hypothesis, fv : Pρ → Pτ .And [[u]] ∈ P(τ,σ), so that [[u]] : Pτ → Pσ. By Proposition 9.5, part (4), [[u]] · fv isan element of [Pρ → Pσ]. �

This Context Lemma is exactly the soundness fact underlying internalized po-larity markers. We’ll see some examples in the next section. Notice also that it isa very general result: the only facts about preorders were the very general resultsin Section 9.3.

The Context Lemma also allows one to generalize our work from the applicative(AB) grammars to the setting of CG using the Lambek Calculus. In detail, onegeneralizes the notion of a context to one which allows more than one free variablebut requires that all variables which occur free have only one free occurrence. Sup-pose that x1 : ρ1, . . . , xn : ρn are the free variables in such a generalized contextt(x1, . . . , xn) : σ. Then t defines a set function a set function ft :

∏i Pρi → Pσ. A

generalization of the Context Lemma then shows that ft is monotone as a functionfrom the product preorder

∏iPρi

. So functions given by lambda abstraction oneach of the variables are also monotone, and this amounts to the soundness of theintroduction rules of the Lambek Calculus in the internalized setting.

This point an advantage of the internalized system in comparison to the mono-tonicity calculus: in the latter approach, the results of polarity determination be-come side conditions on the introduction rules for the lambda operator.

9.4.3. Logic. Figure 9.2 several sound logical principles. These are implicitin Fyodorov, Winter, and Francez [6]; see also Zamansky, Francez, and Winter [39]for a more developed proposal. The rules the reflexive and transitive properties ofthe preorder, the fact that all function types are interpreted by sets of monotone

9.5. EXAMPLES AND DISCUSSION 117

t : σ ≤ t : σt : σ ≤ u : σ u : σ ≤ v : σ

t : σ ≤ v : σ

u : σ ≤ v : σ t : (σ, τ)t(u) : τ ≤ t(v) : τ

u : (σ, τ) ≤ v : (σ, τ) t : σu(t) : τ ≤ v(t) : τ

Figure 9.2. Monotonicity Principles

functions, and the pointwise definition of the order on function sets. The rulesin Figure 9.2 define a logical system whose assertions order statements such asu : σ ≤ v : σ; then the statements such as u : σ become side conditions on therules. (For simplicity, we are only dealing with ground terms u and v, but it isnot hard to generalize the treatment to allow variables.) The logic thus defines arelation Γ ` ϕ on order statements. Given a model M and an order statement ψ ofthe form u : σ ≤ v : σ, we say that M satisfies ψ iff [[u]] ≤ [[v]] in Pσ. The soundnessof the logic is the statement that every model M satisfying all of the sentences inΓ also satisfies ϕ.

The Context Lemma then gives another sound principle:

(9.14)x : σ in the context t u : σ ≤ v : σ

t(u) : τ ≤ t(v) : τ

However, all instances of (9.14) are already provable in the logic as we have definedit. This is an easy inductive argument on the context t.

9.5. Examples and Discussion

We present a small example to illustrate the ideas. It is a version of the languageRC which we saw in Chapter 6. We also take the opportunity to discuss internalizedmarking in a general way. We have a few comments on the challenges of buildinga logic based on internalized markings. And we also have a discussion of the wordany and how it might be treated.

First, we describe a language L corresponding to this vocabulary. Let’s takeour set T0 of basic types to be {t, pr}. (These stand for truth value and property.In more traditional presentations, the type pr might be (e, t), where e is a type ofentities.)

Here are the constants of the language L and their types:(1) We have typed constants

every+ : (−pr, (pr, t))some+ : (pr, (pr, t))no+ : (−pr, (−pr, t))any+ : (−pr, (pr, t))

every− : (pr, (−pr,−t))some− : (−pr, (−pr,−t))no− : (pr, (pr,−t))any− : (−pr, (−pr,−t))

(2) We fix a set of unary atoms corresponding to some plural nouns andlexical verb phrases in English. For definiteness, we take cat, dog, animal,


runs, sleeps. Each unary atom p gives two typed constants: p+ : pr andp− : −pr.

(3) We also fix a set of binary atoms corresponding to some transitive verbsin English. To be definite, we take chase, see. Every binary atom r givesfour type constants:

r+1 : ((pr, t), pr)

r−1 : ((−pr,−t),−pr)r+2 : ((−pr, t), pr)

r−2 : ((pr,−t),−pr)This completes the definition of our typed language L.

We might mention that the superscripts + and − on the constants c : σ of L

are exactly the polarities pol(σ) of the associated types as we defined them in theprevious section. These notations + and − are mnemonic; we could do withoutthem.

As always with categorial grammars, a lexicon is a set of pairs consisting ofwords in a natural language together with terms. We have been writing the wordsin the target language in italics, and then terms for them are written in sans serif. Itis very important that the lexicon allows a given word to appear with many terms.As we have seen, we need every to appear with every+ and every−, for example.We still are only concerned with the syntax at this point, and the semantics willenter once we have seen some examples.

Examples of typed terms and contexts. Here are a few examples of typed termsalong with their derivations. First, here is a derivation of the term correspondingto the English sentence No dog chases every cat :

no+ : (−pr, (−pr, t)) dog− : −prno+(dog−) : (−pr, t)

chase−1 : ((−pr,−t),−pr)every− : (pr, (−pr,−t)) cat+ : pr

every−(cat+) : (−pr,−t)chase−1 (every−(cat+)) : −pr

no+(dog−)(chase−1 (every−(cat+))) : t

One should compare this treatment with what we saw of the same sentence inthe Introduction. The internalized setting does not have steps of MonotonicityMarking and Polarity Determination. It only involves a derivation in the applicativeCG at hand. Please also keep in mind that the superscripts + and − above arefrom the lexicon, not from any additional process, and that the lexicon could havebeen formulated without those superscripts. They are in fact the polarities ofthe attendant categories. (For example, every− is of type (pr, (−pr,−t)), and thepolarity of its occurrence above is −.)

We similarly have the following terms:

some+(dog+)(chase+1 (every+(cat−))) : t

some+(dog+)(chase+2 (no+(cat−))) : t

no+(dog−)(chase−2 (no+(cat+))) : t

The point here is that all four different typings of the transitive verbs are neededto represent sentences of English. I do not believe that this point has been noticedabout the internalized scheme before. It raises an issue that we shall take up inthe next subsection: how would a grammar writer come up with all of the neededcategories?

Here is an example of a context: no+(x : −pr)(chase−1 (every−(cat+))) : t. Sox is a variable of type −pr. In any model, this context gives a function frominterpretations of type −pr to those of type −t. The Context Lemma would tell us


that this function is a monotone function. Turning things around, it would be anantitone function from interpretations of type −pr to those of type −t.

Any. The internalized approach enables a treatment of any that has it anymean the same thing as every when it has positive polarity, and the same thing assome when it has negative polarity. For example, here is a sentence intended tomean everything which sees any cat runs:

every+ : (−pr, (pr, t))see−2 : ((−pr,−t),−pr)

any− : (−pr, (−pr,−t)) cat− : −prany−(cat−) : (−pr,−t)

see−2 (any−(cat−)) : −prevery+(see−2 (any−(cat−))) : (pr, t) runs+ : pr

every+(see−2 (any−(cat−))(runs+) : t

The natural reading is for any to have an existential reading. Another contextfor existential readings of any is in the antecedent of a conditional. In the otherdirection, consider

any+(cat−)(see−1 (any+(dog−))) : t.The natural reading of both occurrences is universal.

9.5.1. Standard models. Up until now, we have only given one example ofa typed language L. Now we describe a family of models for this language. Thefamily is based sets with interpretations of the unary and binary atoms. To makethis precise, let us call a pre-model a structure M0 consisting of a set M togetherwith subsets [[p]] ⊆ M for unary atoms p and relations [[r]] ⊆ M ×M for binaryatoms r.

Every pre-model M0 now gives a bona fide model M of L in the following way.The underlying universe M gives a flat preorderM. We take Ppr = [M,2] ∼= P(M).We also take Pt = 2.

We interpret the typed constants p+ : pr corresponding to unary atoms by

[[p+]](m) = T iff m ∈ [[p]].

(On the right we use the interpretation of p in the model M.) Usually we write p+

instead of [[p]].The constants p− : −pr are interpreted by the same functions.The interpretations of every+, some+, and no+ are given in Example 9.8 (taking

the set X to be the universe M of M). And the interpretations of every−, some−,and no− are the same functions.

We interpret any+ the same way as every+, and any− the same way as some+.Recall that the binary atom r gives four typed constants r+

1 , r−1 , r+2 , and r−2 .

These are all interpreted in model in the same way, by

r∗(q)(m) = q({m′ ∈M : [[r]](m,m′)})It is clear that for all q ∈ P(pr,t) = [[M,2],2], r∗(q) is a function from M to {T,F}.The monotonicity of this function is trivial, since M is flat.

We do need to check that all of the interpretations of the constants are correctlytyped. This is important, and it raises a general issue.


Issue 1: What are the sources of the multiple typings of the lexical items? Asthis chapter draws to a close, we wish to identify some issues with the internalizedscheme of polarity marking. Here is the first: The scheme requires many entries foreach lexical item, usually with the same semantics in every model. These amount tomultiple typings of the same function. What is the source of these multiple typings,and can they be automatically inferred?

The principal source for multiple typings is the fact noted in Proposition 9.5,part (3): [−P,−Q] = −[P,Q]. That is, we have two (opposite) order relations onthe same set, and so it makes sense that we should type an element of that setin two ways. This accounts for the multiple typings of nouns and determiners inour example, but not for transitive verbs. For them, and for all categories in morecomplicated languages, we suspect that there is no simple answer. Perhaps the bestone could hope for would be a formal system in which to derive the various typings.In the same way that one could propose logical systems which process inferences inparallel with syntactic derivations, one could hope to do the same thing for orderstatements. Here is our vision of a what a derivation might look like in such asystem. Let us write ϕ(m, r) for {m′ ∈ M : [[r]](m,m′)}. A justification for thetyping r−2 : [[[M,2],−2], [M,−2]] could be the derivation below:

q ≤ q′ in [[M,2],−2] ϕ(m, r) ∈ [M,2]

r−2 (q) = q(ϕ(m, r)) ≤ q′(ϕ(m, r)) = r−2 (q′) in − 2r−2 (q) ≤ r−2 (q′) in [M,−2]

r−2 : [[[M,2],−2], [M,−2]]

The challenge would be to propose a system which is small enough to be decidable,and yet big enough to allow us to use the notation ϕ(m, r) as we have done it.

Here is a related point. As Dowty [4] notes, the parsing problem for internal-ized grammars is no harder than that of CG in the first place. In fact, the workfor internalized grammars is a special case, since they are CGs with nothing ex-tra. However, this could be a misleading comparison: if one is given a CG G andsome additional information (for example, the intended meanings of the constantscorresponding to determiners), one would need to convert this to an internalizedgrammar G∗. It may be that the size of the lexicon of G∗ is exponential in the sizeof the lexicon of G. The fact that in our small grammar we need four typings forthe verbs makes one worry. And in the worst case, the internalized scheme wouldbe less attractive for computational purposes.

Issue 2: what is the relation of polarity and monotonicity? It is “well-known”in this area that negative polarity items are those which must occur, or usuallyoccur in “downward monotone positions” in a given sentence. But without sayingwhat the orders on the semantic spaces, this point could be confusing. For example,suppose one has a reason to assume that the order on truth values is T < F ratherthan F < T. In this case, the notions of upward monotone positions and downwardmonotone ones would switch. More seriously, given a complicated function type(σ,−τ), it is not so clear whether this should be recast as −(−σ, τ). In this case, itwould not be so clear what the upward and downward monotone positions are in asentence.

This matter might be clarified by looking at no vs. at most one in a sentencesuch as No dog chases every cat. As we have seen, our treatment suggests thatthe occurrence of no is in a positive position, and so we expect it to be monotone


increasing. We would like to say that no ≤ at most one, since this is the verdictfrom generalized quantifier theory. Indeed, the interpretations functions no andat most one are taken to be functions of type ((M, 2), ((M, 2), 2)), and then theorder on this space is taken to be ultimately determined by the last “2”. However,we do not have a single function no but rather two functions, no+ and no−. Theseare not quite restrictions of no and at most one. For that matter, why exactly do wesay that no+ is monotone increasing? It’s type was (−pr, (−pr, t)), after all. Whatseems to be going on here is that in a curried form, we have (−pr×−pr, t). Whencurried like this, no+ and no−. are restrictions of no and at most one. Moreover,the order on on a product space is determined by the codomain order, forgettingthe domain order. Also, the “meaning space” in a model for types pr and −pr arethe same. Taken together, this means that no+ indeed corresponds to a monotonefunction and no− to an antitone function.

Here is a general definition of the polarity of a type. Define pol(σ) : T1 → {+,−}as follows:

(1) If σ ∈ T0, then pol(σ) = +.(2) pol((σ, τ)) = pol(τ).(3) pol(−σ) = −pol(σ).

It is easy to check that if σ ≡ τ , then pol(σ) = pol(τ). (It is enough to show thatpol(−(−σ)) = pol(σ), and also that pol(−(σ, τ)) = pol((−σ,−τ)).) This allows usdefine pol on the quotient set, T1/≡, our set T of types.

The definition of the pol function gives an algorithm to calculate pol(σ). An-other way to do this would be to: (1) put σ in “negation normal form”, by drivingthe - signs through function spaces so that there are no function types inside thescope of any − sign; (2) remove an even number of − signs from basic types, sothat the remaining basic types have either zero or one − sign in front; (3), finally,starting at the top, proceed “to the right” through the term until one reaches eithera basic type or its negation.

Let u be a subterm occurrence in the term t. Then there is a unique contextt′(x) such that t = t′[u ← x]. Let σ be the type of x in t′. We say that u haspositive polarity if pol(x) = +, and that u has negative polarity if pol(x) = −.

We naturally would like to say that if an occurrence u has positive polarity int, then that occurrence is in a monotone increasing position, and if the occurrencehas negative polarity, then it is in a monotone decreasing position.

Issue 3: how can we build a logic to reason about classes of standard models?Our last issue concerns the contribution of the types in an internalized CG to anatural logic. The matter is illustrated by a discussion concerning the example lan-guage L presented at the beginning of Section 9.5. Suppose we are given entailmentrelations concerning the unary and binary atoms of L. For unary atoms, these takethe form p ⇒ q, where p and q are unary atoms. For binary atoms, entailmentrelations look similarly: r ⇒ s. (Incidentally, one is also interested in exclusionrelations: see MacCartney and Manning [14] for applications, and Icard [10] fortheory behind this.)

These entailment relations correspond to semantic restrictions on standardmodels. The first corresponds to the requirement that [[p]] ⊆ [[q]], and the sec-ond to the parallel requirement that [[r]] ⊆ [[s]]. Working with entailment relationsin this way means restricting attention to standard models which conform to the re-strictions. It is natural to then look for additional axioms and/or rules of inference


in the logical system that would give a sound and complete logic for these models.The entailment relations correspond to axioms of the sentential kind, and also onthe orders. The sentential axioms can be stated only for the unary atoms, and theywould be sentences like (every+(p−))q−. For the binary atoms, the statements aremore involved and we leave them as exercises. Turning to axioms on the orders,our entailment relations give us the following:

p+ ≤ q+ in prq− ≤ p− in −prr+1 ≤ s+

1 in ((pr, t), pr)

r−1 ≤ s+1 in ((−pr,−t),−pr)

r+2 ≤ s+

2 in ((−pr, t), pr)r−2 ≤ s−2 in ((pr,−t),−pr)

However, even adopting both kinds of axioms on top of the logical principles wesaw in Section 9.4.3 would not result in a complete logical system, so it would fallshort of what we want in a natural logic. One source of problems was noted inZamansky, Francez, and Winter [39]: the system would not have any way to deriveprinciples corresponding to de Morgan’s laws. Another problem for us would beeven more basic. The resulting logic does not appear to be strong enough to derivethe sound sentences of the form every X is an X. To be sure, there are unsoundsentences of this form in the logic, due to the typing. For example, the way we aredoing things, every dog who chases any cat chases any cat is not a logical validity atall, since the first any is to be interpreted existentially and the second universally.(I first heard an example of this type in a lecture of Barbara Partee.) But withoutany, we would like to derive, for example

every+(see−1 (every−(cat+)))(see+1 (every+(cat−)))

At this point, I can merely speculate and offer a suggestion. My feeling is thatone should look to logics with individual variables, and then the properties of theindividual words become rules of inference in the logic. Specifically, it should bethe case that if x is an individual or NP-level variable, then one should be able toinfer cat+(x) from cat−(x), and vice-versa. For more on logics of this type, see [19].However, that paper does not build on the framework of categorial grammar andhas nothing to say about polarity and monotonicity. Integrating the work therewith the proposal in this chapter is the most important open issue in work on thistopic.

Conclusion. This chapter explores what happens to CG if we adopt the ideathat interpretation needs to happen in ordered domains, especially if we add tothe type formation rules a construction for the opposite order. Doing this gives aContext Lemma in a straightforward way, and the Context Lemma is tantamountto the soundness of the internalized polarity marking scheme. The types in theinternalized scheme are more plentiful than in ordinary CG, and this is a source ofissues as well as possibilities. The issues concern how the new typings are to begenerated, and how they are to interact with each other in proof-theoretic settings.There certainly is more to do on this matter, and this chapter has only providedthe groundwork. With some additional work, one could easily imagine that theinternalized scheme will be an important contribution to the research agenda ofnatural logic.

Bibliography

[1] Bernardi, R. Reasoning with Polarity in Categorial Type Logic. PhD thesis,University of Utrecht, 2002.

[2] Calude, C. S., Hertling, P. H., and Svozil, K. Embedding quantumuniverses into classical ones. Foundations of Physics 29, 3 (1999), 349–379.

[3] Christodoulopoulos, C. Creating a natural logic inference system withcombinatory categorial grammar. Master’s thesis, University of Edinburgh,2008.

[4] Dowty, D. The role of negative polarity and concord marking in naturallanguage reasoning. In Proceedings of SALT IV. Cornell University, Ithaca,NY, 1994, p. 114144.

[5] Fitch, F. B. Natural deduction rules for English. Philosophical Studies 24,2 (1931), 89–104.

[6] Fyodorov, Y., Winter, Y., and Francez, N. Order-based inference innatural logic. Log. J. IGPL 11, 4 (2003), 385–417. Inference in computationalsemantics: the Dagstuhl Workshop 2000.

[7] Gradel, E., Kolaitis, P., and Vardi, M. On the decision problem fortwo-variable first-order logic. Bulletin of Symbolic Logic 3, 1 (1997), 53–69.

[8] Gradel, E., Otto, M., and Rosen, E. Undecidability results on two-variable logics. Archive for Mathematical Logic 38 (1999), 313–354.

[9] Harel, D., Kozen, D., and Tiuryn, J. Dynamic Logic. MIT Press, Cam-bridge, MA., 2000.

[10] Icard, T. Exclusion and containment in natural language. ms., StanfordUniversity, 2010.

[11] Katrnoska, F. On the representation of orthocomplemented posets. Com-ment. Math. Univ. Carolinae 23 (1982), 489–498.

[12] Lukasiewicz, J. Aristotle’s Syllogistic, 2nd ed. Clarendon Press, Oxford,1957.

[13] Lutz, C., and Sattler, U. The complexity of reasoning with boolean modallogics. In Advances in Modal Logics, F. Wolter, H. Wansing, M. D. Rijke, andM. Zakharyaschev, Eds., vol. 3. CSLI Publications, Stanford, 2001.

[14] MacCartney, B., and Manning, C. D. An extended model of naturallogic. In Proceedings of the Eighth International Conference on ComputationalSemantics (IWCS-8) (Tilburg, Netherlands, 2009).

[15] McAllester, D. A., and Givan, R. Natural language syntax and first-orderinference. Artificial Intelligence 56 (1992), 1–20.

[16] Moortgat, M. Categorial type logics. In Handbook of Logic and Language.MIT Press, Cambridge, MA, 1997, pp. 93–178.

[17] Mortimer, M. On languages with two variables. Zeitschrift fur Mathematis-che Logik und Grundlagen der Mathematik 21 (1975), 135–140.

123

124 Bibliography

[18] Moss, L. S. Completeness theorems for syllogistic fragments. In Logics forLinguistic Structures, F. Hamm and S. Kepser, Eds. Mouton de Gruyter, 2008,pp. 143–173.

[19] Moss, L. S. Logics for two fragments beyond the syllogistic boundary. InFields of Logic and Computation: Essays Dedicated to Yuri Gurevich on theOccasion of His 70th Birthday, vol. 6300 of LNCS. Springer-Verlag, 2010.

[20] Moss, L. S. Syllogistic logics with comparative adjectives. special issues ofpapers from MoL 2007, to appear.

[21] Moss, L. S. Syllogistic logics with verbs. Journal of Logic and Computation,to appear.

[22] Moss, L. S. Syllogistic logic with complements. In Games, Norms and Rea-sons: Proceedings of the Second Indian Conference on Logic and its Applica-tions. Springer Synthese Library Series, Mumbai, to appear 2010, p. 19 pp.

[23] Nairn, R., Condoravdi, C., and Karttunen, L. Computing relativepolarity for textual inference. In Proceedings of ICoS-5 (Inference in Compu-tational Semantics) (Buxton, UK, 2006).

[24] Nishihara, N., Morita, K., and Iwata, S. An extended syllogistic sys-tem with verbs and proper nouns, and its completeness proof. Systems andComputers in Japan 21, 1 (1990), 760–771.

[25] Pratt-Hartmann, I. Fragments of language. Journal of Logic, Languageand Information 13 (2004), 207–223.

[26] Pratt-Hartmann, I. On the computational complexity of the numericallydefinite syllogistic and related logics. Bulletin of Symbolic Logic 14, 1 (2008),1–28.

[27] Pratt-Hartmann, I. On the computational complexity of the numericallydefinite syllogistic and related logics. Bulletin of Symbolic Logic 14, 1 (2008),1–28.

[28] Pratt-Hartmann, I. No syllogisms for the numerical syllogistic. In Lan-guages: from Formal to Natural, vol. 5533 of LNCS. Springer, 2009, pp. 192–203.

[29] Pratt-Hartmann, I., and Moss, L. S. Logics for the relational syllogistic.Review of Symbolic Logic 2, 4 (2009), 647–683.

[30] Purdy, W. C. Inexpressiveness of first-order fragments. Australasian Journalof Logic 4 (2006), 1–12.

[31] Schaefer, T. J. The complexity of satisfiability problems. In Proceedingsof the Tenth Annual ACM Symposium on Theory of Computing (San Diego,California, 1978), pp. 216–226.

[32] Valencia, V. M. S. Studies on natural logic and categorical grammar. PhDthesis, University of Amsterdam, 1991.

[33] van Benthem, J. Essays in logical semantics, vol. 29 of Studies in Linguisticsand Philosophy. D. Reidel Publishing Co., Dordrecht, 1986.

[34] van Benthem, J. Language in Action, vol. 130 of Studies in Logic and theFoundations of Mathematics. North Holland, Amsterdam, 1991.

[35] van Benthem, J. A brief history of natural logic. In Logic, Navya-Nyaya andApplications, Homage to Bimal Krishna Matilal, M. N. M. M. Chakraborty,B. Lowe and S. Sarukkai, Eds. College Publications, London, 2008.

[36] van Dalen, D. Logic and Structure, fourth ed. Springer-Verlag, Berlin, 2004.[37] Vardi, M. Y., and Wolper, P. Automata-theoretic techniques for modal

Bibliography 125

logics of programs. Journal of Computer and System Sciences 32 (1986), 183–221.

[38] Westerstahl, D. Aristotelian syllogisms and generalized quantifiers. StudiaLogica XLVIII, 4 (1989), 577–585.

[39] Zamansky, A., Francez, N., and Winter, Y. A ‘natural logic’ inferencesystem using the Lambek calculus. J. Log. Lang. Inf. 15, 3 (2006), 273–295.

[40] Ziegler, S. Some completeness results in syllogistic logic. Student Reports,Research Experience for Undergraduates Program, Indiana University, 2008.

[41] Zierler, N., and Schlessinger, M. Boolean embeddings of orthomodularsets and quantum logic. Duke Mathematical Journal 32 (1965), 251–262.

Index

atom

binary, 49

complemented, 30unary, 5, 49

boolean algebra, 24power set, 24

syntactic, 25

Boolean modal logic, 51bottom (⊥), 53

categorial grammar, 109

completelogical system, 7

set of sentences, 55strongly, 22

weakly, 22

consistent, 16, 55

ex falso quodlibet, 16

filter, 25

inconsistent, 16, 55

literal, 13, 30, 49logic

Boolean modal, 51

sentential, 21–27logical system

sentential, 23

model, 5, 50, 114

negationsemantic, 52

orthoposet, 33morphism of, 34

point of, 34

polaritynegative, 121

positive, 121

poset, 33preorder, 8, 33

flat, 111

function space, 111opposite, 112

proof tree, 6, 31, 53

RAA, 52

set terms, 49sound, 7

syllogistic proof system, 59

syllogistic rule, 59

tautology, 22

ultrafilter, 25

valuation, 21

127

iulg.sitehost.iu.edu · contents chapter 1. introduction 1 1.1. logic for natural language, logic...

Documents