
Page 1: Contents - University of Calgary, people.ucalgary.ca/~rzach/static/open-logic/courses/enderton/open...

[git] • Branch: master @ 6af5838 • Release: (2017-01-14)

OLP: Example a la Enderton

Open Logic Project

Revision: 6af5838 (2017-01-14)

Contents

I First-order Logic 3

1 Syntax and Semantics . . . 5
1.1 Introduction . . . 5
1.2 First-Order Languages . . . 6
1.3 Terms and Wffs . . . 8
1.4 Unique Readability . . . 9
1.5 Main operator of a Formula . . . 12
1.6 Subformulas . . . 12
1.7 Free Variables and Sentences . . . 13
1.8 Substitution . . . 14
1.9 Structures for First-order Languages . . . 15
1.10 Covered Structures for First-order Languages . . . 17
1.11 Satisfaction of a Wff in a Structure . . . 17
1.12 Extensionality . . . 21
1.13 Semantic Notions . . . 22

2 Theories and Their Models . . . 25
2.1 Introduction . . . 25
2.2 Expressing Properties of Structures . . . 26
2.3 Examples of First-Order Theories . . . 27
2.4 Expressing Relations in a Structure . . . 29
2.5 The Theory of Sets . . . 30
2.6 Expressing the Size of Structures . . . 33



3 The Sequent Calculus . . . 35
3.1 Rules and Derivations . . . 35
3.2 Examples of Derivations . . . 37
3.3 Proof-Theoretic Notions . . . 41
3.4 Properties of Derivability . . . 42
3.5 Soundness . . . 44
3.6 Derivations with Equality symbol . . . 47

4 Natural Deduction . . . 49
4.1 Introduction . . . 49
4.2 Rules and Derivations . . . 50
4.3 Examples of Derivations . . . 53
4.4 Proof-Theoretic Notions . . . 59
4.5 Properties of Derivability . . . 60
4.6 Soundness . . . 63
4.7 Derivations with Equality symbol . . . 65
4.8 Soundness of Equality symbol Rules . . . 66

5 The Completeness Theorem . . . 67
5.1 Introduction . . . 67
5.2 Outline of the Proof . . . 67
5.3 Maximally Consistent Sets of Sentences . . . 69
5.4 Henkin Expansion . . . 70
5.5 Lindenbaum's Lemma . . . 71
5.6 Construction of a Model . . . 72
5.7 Identity . . . 73
5.8 The Completeness Theorem . . . 76
5.9 The Compactness Theorem . . . 76
5.10 The Löwenheim-Skolem Theorem . . . 78

6 Beyond First-order Logic . . . 81
6.1 Overview . . . 81
6.2 Many-Sorted Logic . . . 82
6.3 Second-Order logic . . . 83
6.4 Higher-Order logic . . . 86
6.5 Intuitionistic Logic . . . 89
6.6 Modal Logics . . . 92
6.7 Other Logics . . . 93


Part I

First-order Logic



Chapter 1

Syntax and Semantics

1.1 Introduction

In order to develop the theory and metatheory of first-order logic, we must first define the syntax and semantics of its expressions. The expressions of first-order logic are terms and wffs. Terms are formed from variables, constant symbols, and function symbols. Wffs, in turn, are formed from predicate symbols together with terms (these form the smallest, "atomic" wffs), and then from atomic wffs we can form more complex ones using logical connectives and quantifiers. There are many different ways to set down the formation rules; we give just one possible one. Other systems will choose different symbols, will select different sets of connectives as primitive, will use parentheses differently (or even not at all, as in the case of so-called Polish notation). What all approaches have in common, though, is that the formation rules define the set of terms and wffs inductively. If done properly, every expression can result essentially in only one way according to the formation rules. The inductive definition resulting in expressions that are uniquely readable means we can give meanings to these expressions using the same method—inductive definition.

Giving the meaning of expressions is the domain of semantics. The central concept in semantics is that of satisfaction in a structure. A structure gives meaning to the building blocks of the language: a domain is a non-empty set of objects. The quantifiers are interpreted as ranging over this domain, constant symbols are assigned elements in the domain, function symbols are assigned functions from the domain to itself, and predicate symbols are assigned relations on the domain. The domain together with assignments to the basic vocabulary constitutes a structure. Variables may appear in wffs, and in order to give a semantics, we also have to assign elements of the domain to them—this is a variable assignment. The satisfaction relation, finally, brings these together. A wff may be satisfied in a structure A relative to a variable assignment s, written as ⊨_A α[s]. This relation is also defined by induction on the structure of α, using the truth tables for the logical connectives to define, say, satisfaction of α ∧ β in terms of satisfaction (or not) of α and β. It then turns out that



the variable assignment is irrelevant if the wff α is a sentence, i.e., has no free variables, and so we can talk of sentences being simply satisfied (or not) in structures.

On the basis of the satisfaction relation ⊨_A α for sentences we can then define the basic semantic notions of validity, entailment, and satisfiability. A sentence is valid, ⊨ α, if every structure satisfies it. It is entailed by a set of sentences, Γ ⊨ α, if every structure that satisfies all the sentences in Γ also satisfies α. And a set of sentences is satisfiable if some structure satisfies all sentences in it at the same time. Because wffs are inductively defined, and satisfaction is in turn defined by induction on the structure of wffs, we can use induction to prove properties of our semantics and to relate the semantic notions defined.

1.2 First-Order Languages

Expressions of first-order logic are built up from a basic vocabulary containing variables, constant symbols, predicate symbols and sometimes function symbols. From them, together with logical connectives, quantifiers, and punctuation symbols such as parentheses and commas, terms and wffs are formed.

Informally, predicate symbols are names for properties and relations, constant symbols are names for individual objects, and function symbols are names for mappings. These, except for the equality symbol =, are the non-logical symbols and together make up a language. Any first-order language L is determined by its non-logical symbols. In the most general case, L contains infinitely many symbols of each kind.

In the general case, we make use of the following symbols in first-order logic:

1. Logical symbols

a) Logical connectives: ¬ (negation), → (conditional), ∀ (universal quantifier).

b) The two-place equality symbol =.

c) A denumerable set of variables: v0, v1, v2, . . .

2. Non-logical symbols, making up the standard language of first-order logic

a) A denumerable set of n-place predicate symbols for each n > 0: A^n_0, A^n_1, A^n_2, . . .

b) A denumerable set of constant symbols: c0, c1, c2, . . .

c) A denumerable set of n-place function symbols for each n > 0: f^n_0, f^n_1, f^n_2, . . .

3. Punctuation marks: (, ), and the comma.

Most of our definitions and results will be formulated for the full standard language of first-order logic. However, depending on the application, we may


also restrict the language to only a few predicate symbols, constant symbols, and function symbols.

Example 1.2.1. The language L_A of arithmetic contains a single two-place predicate symbol <, a single constant symbol 0, one one-place function symbol ′, and two two-place function symbols + and ×.

Example 1.2.2. The language of set theory L_Z contains only the single two-place predicate symbol ∈.

Example 1.2.3. The language of orders L_≤ contains only the two-place predicate symbol ≤.

Again, these are conventions: officially, these are just aliases, e.g., <, ∈, and ≤ are aliases for A^2_0, 0 for c0, ′ for f^1_0, + for f^2_0, × for f^2_1.

In addition to the primitive connectives and quantifier introduced above, we also use the following defined symbols: ∧ (conjunction), ∨ (disjunction), ↔ (biconditional), ∃ (existential quantifier), falsity ⊥, and truth ⊤.

A defined symbol is not officially part of the language, but is introduced as an informal abbreviation: it allows us to abbreviate formulas which would, if we only used primitive symbols, get quite long. This is obviously an advantage. The bigger advantage, however, is that proofs become shorter. If a symbol is primitive, it has to be treated separately in proofs. The more primitive symbols, therefore, the longer our proofs.

You may be familiar with different terminology and symbols than the ones we use above. Logic texts (and teachers) commonly use either ∼, ¬, or ! for "negation", and ∧, ·, or & for "conjunction". Commonly used symbols for the "conditional" or "implication" are →, ⇒, and ⊃. Symbols for "biconditional," "bi-implication," or "(material) equivalence" are ↔, ⇔, and ≡. The ⊥ symbol is variously called "falsity," "falsum," "absurdity," or "bottom." The ⊤ symbol is variously called "truth," "verum," or "top."

It is conventional to use lower case letters (e.g., a, b, c) from the beginning of the Latin alphabet for constant symbols (sometimes called names), and lower case letters from the end (e.g., x, y, z) for variables. Quantifiers combine with variables, e.g., x; notational variations include ∀x, (∀x), (x), Πx, ⋀x for the universal quantifier and ∃x, (∃x), (Ex), Σx, ⋁x for the existential quantifier.

We might treat all the propositional operators and both quantifiers as primitive symbols of the language. We might instead choose a smaller stock of primitive symbols and treat the other logical operators as defined. "Truth functionally complete" sets of Boolean operators include {¬, ∨}, {¬, ∧}, and {¬, →}—these can be combined with either quantifier for an expressively complete first-order language.

You may be familiar with two other logical operators: the Sheffer stroke | (named after Henry Sheffer), and Peirce's arrow ↓, also known as Quine's dagger. When given their usual readings of "nand" and "nor" (respectively), these operators are truth functionally complete by themselves.


1.3 Terms and Wffs

Once a first-order language L is given, we can define expressions built up from the basic vocabulary of L. These include in particular terms and wffs.

Definition 13A (Terms) The set of terms Trm(L) of L is defined inductively by:

1. Every variable is a term.

2. Every constant symbol of L is a term.

3. If f is an n-place function symbol and t1, . . . , tn are terms, then f t1 . . . tn is a term.

A term containing no variables is a closed term.

The constant symbols appear in our specification of the language and the terms as a separate category of symbols, but they could instead have been included as zero-place function symbols. We could then do without the second clause in the definition of terms. We just have to understand f t1 . . . tn as just f by itself if n = 0.
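The inductive definition of terms can be mirrored directly by structural recursion. The sketch below is illustrative only: the particular signature (constants, function symbols and their arities), the variable names, and the tuple encoding ("f", t1, ..., tn) are assumptions for the example, not part of the official definition.

```python
# Sketch of Definition 13A over a toy signature (all names illustrative).
VARIABLES = {"v0", "v1", "v2"}
CONSTANTS = {"c0", "c1"}
FUNC_ARITY = {"f": 1, "g": 2}  # function symbols with their arities

def is_term(t):
    """Membership in Trm(L), by structural recursion."""
    if isinstance(t, str):
        # Clauses 1 and 2: variables and constant symbols are terms.
        return t in VARIABLES or t in CONSTANTS
    # Clause 3: f t1 ... tn, encoded as ("f", t1, ..., tn).
    return (isinstance(t, tuple) and len(t) > 0 and t[0] in FUNC_ARITY
            and len(t) - 1 == FUNC_ARITY[t[0]]
            and all(is_term(s) for s in t[1:]))

def is_closed(t):
    """A closed term contains no variables."""
    if isinstance(t, str):
        return t in CONSTANTS
    return all(is_closed(s) for s in t[1:])
```

For instance, is_term(("g", "v0", ("f", "c0"))) holds, while ("f", "c0", "c0") fails the arity check; ("f", "c0") is a closed term, but "v0" is not.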

Definition 13B (Formula) The set of wffs Frm(L) of the language L is defined inductively as follows:

1. If R is an n-place predicate symbol of L and t1, . . . , tn are terms of L, then Rt1 . . . tn is an atomic wff.

2. If t1 and t2 are terms of L, then = t1t2 is an atomic wff.

3. If α is a wff, then ¬α is a wff.

4. If α and β are wffs, then (α → β) is a wff.

5. If α is a wff and x is a variable, then ∀xα is a wff.

The definitions of the set of terms and that of wffs are inductive definitions. Essentially, we construct the set of wffs in infinitely many stages. In the initial stage, we pronounce all atomic formulas to be formulas; this corresponds to the first few cases of the definition, i.e., the cases for Rt1 . . . tn and = t1t2. "Atomic wff" thus means any wff of this form.

The other cases of the definition give rules for constructing new wffs out of wffs already constructed. At the second stage, we can use them to construct wffs out of atomic wffs. At the third stage, we construct new formulas from the atomic formulas and those obtained in the second stage, and so on. A wff is anything that is eventually constructed at such a stage, and nothing else.

By convention, we write = between its arguments and leave out the parentheses: t1 = t2 is an abbreviation for = t1t2. Moreover, ¬ = t1t2 is abbreviated as t1 ≠ t2. When writing a formula (β ∗ γ) constructed from β, γ using a two-place connective ∗, we will often leave out the outermost pair of parentheses and write simply β ∗ γ.
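Definition 13B can likewise be implemented as a recursive membership test. In this sketch, wffs are encoded as nested tuples, with ASCII tags "not", "->", "forall" standing in for ¬, →, ∀; the signature is again a made-up example.

```python
# Sketch of Definition 13B over a toy signature (all names illustrative).
VARIABLES = {"v0", "v1"}
CONSTANTS = {"c0"}
FUNC_ARITY = {"f": 1}
PRED_ARITY = {"R": 2}

def is_term(t):
    if isinstance(t, str):
        return t in VARIABLES or t in CONSTANTS
    return (isinstance(t, tuple) and t[0] in FUNC_ARITY
            and len(t) - 1 == FUNC_ARITY[t[0]]
            and all(is_term(s) for s in t[1:]))

def is_wff(a):
    """Each branch matches one clause of the inductive definition."""
    if not isinstance(a, tuple) or not a:
        return False
    op = a[0]
    if op in PRED_ARITY:        # clause 1: R t1 ... tn (atomic)
        return len(a) - 1 == PRED_ARITY[op] and all(is_term(t) for t in a[1:])
    if op == "=":               # clause 2: = t1 t2 (atomic)
        return len(a) == 3 and is_term(a[1]) and is_term(a[2])
    if op == "not":             # clause 3: ¬α
        return len(a) == 2 and is_wff(a[1])
    if op == "->":              # clause 4: (α → β)
        return len(a) == 3 and is_wff(a[1]) and is_wff(a[2])
    if op == "forall":          # clause 5: ∀x α
        return len(a) == 3 and a[1] in VARIABLES and is_wff(a[2])
    return False
```

For example, the encoding of ∀v0 (Rv0v1 → v0 = f c0) passes the test, while ("R", "v0") fails, since R is two-place.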


Some logic texts require that the variable x must occur in α in order for ∀xα to count as a wff. Nothing bad happens if you don't require this, and it makes things easier.

Definition 13C Formulas constructed using the defined operators are to be understood as follows:

1. ⊤ abbreviates (α ∨ ¬α) for some fixed atomic wff α.

2. ⊥ abbreviates (α ∧ ¬α) for some fixed atomic wff α.

3. α ∨ β abbreviates ¬α → β.

4. α ∧ β abbreviates ¬(α → ¬β).

5. α ↔ β abbreviates (α → β) ∧ (β → α).

6. ∃xα abbreviates ¬∀x¬α.
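These abbreviation clauses amount to a rewriting procedure that eliminates the defined operators in favor of the primitives ¬, →, and ∀. A sketch over the tuple encoding (the tags "or", "and", "iff", "exists" are illustrative stand-ins for the defined symbols):

```python
# Sketch of Definition 13C, clauses 3-6: eliminating defined operators.
def expand(a):
    if not isinstance(a, tuple):
        return a  # variable names inside quantifiers pass through
    op = a[0]
    args = [expand(x) for x in a[1:]]
    if op == "or":       # α ∨ β   abbreviates  ¬α → β
        return ("->", ("not", args[0]), args[1])
    if op == "and":      # α ∧ β   abbreviates  ¬(α → ¬β)
        return ("not", ("->", args[0], ("not", args[1])))
    if op == "iff":      # α ↔ β   abbreviates  (α → β) ∧ (β → α)
        return expand(("and", ("->", args[0], args[1]),
                              ("->", args[1], args[0])))
    if op == "exists":   # ∃x α    abbreviates  ¬∀x ¬α
        return ("not", ("forall", args[0], ("not", args[1])))
    return (op, *args)   # primitives and atomic wffs: recurse only
```

For instance, expand(("exists", "v0", ("=", "v0", "v0"))) yields ("not", ("forall", "v0", ("not", ("=", "v0", "v0")))), the official form of ∃v0 v0 = v0.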

If we work in a language for a specific application, we will often write two-place predicate symbols and function symbols between the respective terms, e.g., t1 < t2 and (t1 + t2) in the language of arithmetic and t1 ∈ t2 in the language of set theory. The successor function in the language of arithmetic is even written conventionally after its argument: t′. Officially, however, these are just conventional abbreviations for A^2_0(t1, t2), f^2_0(t1, t2), A^2_0(t1, t2), and f^1_0(t), respectively.

Definition 13D (Syntactic identity) The symbol ≡ expresses syntactic identity between strings of symbols, i.e., α ≡ β iff α and β are strings of symbols of the same length which contain the same symbol in each place.

The ≡ symbol may be flanked by strings obtained by concatenation, e.g., α ≡ (β ∨ γ) means: the string of symbols α is the same string as the one obtained by concatenating an opening parenthesis, the string β, the ∨ symbol, the string γ, and a closing parenthesis, in this order. If this is the case, then we know that the first symbol of α is an opening parenthesis, α contains β as a substring (starting at the second symbol), that substring is followed by ∨, etc.

1.4 Unique Readability

The way we defined wffs guarantees that every wff has a unique reading, i.e., there is essentially only one way of constructing it according to our formation rules for wffs and only one way of "interpreting" it. If this were not so, we would have ambiguous wffs, i.e., wffs that have more than one reading or interpretation—and that is clearly something we want to avoid. But more importantly, without this property, most of the definitions and proofs we are going to give will not go through.

Perhaps the best way to make this clear is to see what would happen if we had given bad rules for forming wffs that would not guarantee unique readability. For instance, we could have forgotten the parentheses in the formation rules for connectives, e.g., we might have allowed this:


If α and β are wffs, then so is α → β.

Starting from an atomic formula δ, this would allow us to form δ → δ. From this, together with δ, we would get δ → δ → δ. But there are two ways to do this:

1. We take δ to be α and δ → δ to be β.

2. We take α to be δ → δ and β to be δ.

Correspondingly, there are two ways to "read" the wff δ → δ → δ. It is of the form β → γ where β is δ and γ is δ → δ, but it is also of the form β → γ with β being δ → δ and γ being δ.

If this happens, our definitions will not always work. For instance, when we define the main operator of a formula, we say: in a formula of the form β → γ, the main operator is the indicated occurrence of →. But if we can match the formula δ → δ → δ with β → γ in the two different ways mentioned above, then in one case we get the first occurrence of → as the main operator, and in the second case the second occurrence. But we intend the main operator to be a function of the wff, i.e., every wff must have exactly one main operator occurrence.
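The two readings can even be counted mechanically. In this sketch, 'd' stands for δ and '>' for → under the bad, parenthesis-free rule; the grammar is a deliberately ambiguous toy, not our official one.

```python
def parses(s):
    """Count parse trees of s under the ambiguous toy grammar
    W ::= 'd' | W '>' W  (the parenthesis-free rule above)."""
    if s == "d":
        return 1
    total = 0
    for i, ch in enumerate(s):
        if ch == ">":  # try every '>' as the main operator
            total += parses(s[:i]) * parses(s[i + 1:])
    return total

assert parses("d>d") == 1    # δ → δ is still unambiguous
assert parses("d>d>d") == 2  # δ → δ → δ has the two readings above
```

With the official parenthesized rule, by contrast, (d>(d>d)) and ((d>d)>d) are distinct strings, and each has exactly one reading.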

Lemma 14A The numbers of left and right parentheses in a wff α are equal.

Proof. We prove this by induction on the way α is constructed. This requires two things: (a) We have to prove first that all atomic formulas have the property in question (the induction basis). (b) Then we have to prove that when we construct new formulas out of given formulas, the new formulas have the property provided the old ones do.

Let l(α) be the number of left parentheses, and r(α) the number of right parentheses in α, and l(t) and r(t) similarly the number of left and right parentheses in a term t. We leave the proof that for any term t, l(t) = r(t) as an exercise.

1. α ≡ Rt1 . . . tn: l(α) = l(t1) + · · · + l(tn) = r(t1) + · · · + r(tn) = r(α). Here we make use of the fact, left as an exercise, that l(t) = r(t) for any term t.

2. α ≡ t1 = t2: l(α) = l(t1) + l(t2) = r(t1) + r(t2) = r(α).

3. α ≡ ¬β: By induction hypothesis, l(β) = r(β). Thus l(α) = l(β) = r(β) = r(α).

4. α ≡ (β ∗ γ): By induction hypothesis, l(β) = r(β) and l(γ) = r(γ). Thus l(α) = 1 + l(β) + l(γ) = 1 + r(β) + r(γ) = r(α).

5. α ≡ ∀xβ: By induction hypothesis, l(β) = r(β). Thus, l(α) = l(β) = r(β) = r(α).
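The lemma can be spot-checked by rendering tuple-encoded wffs as strings under the official conventions (atomic and quantified wffs unparenthesized, exactly one pair of parentheses per binary connective) and counting. The encoding and the ASCII stand-ins ('!' for ¬, '>' for →, 'A' for ∀) are illustrative.

```python
# Spot-check of Lemma 14A (encoding and ASCII symbols illustrative).
def render_t(t):
    """Terms: variables/constants, or f t1 ... tn without parentheses."""
    if isinstance(t, str):
        return t
    return t[0] + "".join(render_t(s) for s in t[1:])

def render(a):
    """Wffs as official strings: parentheses only around α → β."""
    op = a[0]
    if op == "=":
        return "=" + render_t(a[1]) + render_t(a[2])
    if op == "not":
        return "!" + render(a[1])                 # '!' for ¬
    if op == "->":
        return "(" + render(a[1]) + ">" + render(a[2]) + ")"
    if op == "forall":
        return "A" + a[1] + render(a[2])          # 'A' for ∀
    return op + "".join(render_t(t) for t in a[1:])  # R t1 ... tn

a = ("forall", "x", ("->", ("not", ("=", "x", "y")), ("R", "x", "y")))
s = render(a)
assert s.count("(") == s.count(")")  # l(α) = r(α)
```

Here render(a) is the string "Ax(!=xy>Rxy)", with one left and one right parenthesis, contributed by the single → per case 4 of the proof.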


Definition 14B (Proper prefix) A string of symbols β is a proper prefix of a string of symbols α if concatenating β and a non-empty string of symbols yields α.

Lemma 14C If α is a wff, and β is a proper prefix of α, then β is not a wff.

Proof. Exercise.

Proposition 14D If α is an atomic wff, then it satisfies one, and only one, of the following conditions.

1. α ≡ Rt1 . . . tn where R is an n-place predicate symbol, t1, . . . , tn are terms, and each of R, t1, . . . , tn is uniquely determined.

2. α ≡ t1 = t2 where t1 and t2 are uniquely determined terms.

Proof. Exercise.

Proposition 14E (Unique Readability) Every wff α satisfies one, and only one, of the following conditions.

1. α is atomic.

2. α is of the form ¬β.

3. α is of the form (β → γ).

4. α is of the form ∀xβ.

Moreover, in each case β, or β and γ, are uniquely determined. This means that, e.g., there are no different pairs β, γ and β′, γ′ so that α is both of the form (β → γ) and of the form (β′ → γ′).

Proof. The formation rules require that if a wff is not atomic, it must start with an opening parenthesis (, with ¬, or with a quantifier. On the other hand, every wff that starts with one of the following symbols must be atomic: a predicate symbol, a function symbol, a constant symbol.

So we really only have to show that if α is of the form (β ∗ γ) and also of the form (β′ ∗′ γ′), then β ≡ β′, γ ≡ γ′, and ∗ = ∗′.

So suppose both α ≡ (β ∗ γ) and α ≡ (β′ ∗′ γ′). Then either β ≡ β′ or not. If so, clearly ∗ = ∗′ and γ ≡ γ′, since they are then substrings of α that begin in the same place and are of the same length. The other case is β ≢ β′. Since β and β′ are both substrings of α that begin at the same place, one must be a proper prefix of the other. But this is impossible by Lemma 14C.


1.5 Main operator of a Formula

It is often useful to talk about the last operator used in constructing a wff α. This operator is called the main operator of α. Intuitively, it is the "outermost" operator of α. For example, the main operator of ¬α is ¬, the main operator of (α ∨ β) is ∨, etc.

Definition 15A (Main operator) The main operator of a wff α is defined as follows:

1. α is atomic: α has no main operator.

2. α ≡ ¬β: the main operator of α is ¬.

3. α ≡ (β → γ): the main operator of α is →.

4. α ≡ ∀xβ: the main operator of α is ∀.

In each case, we intend the specific indicated occurrence of the main operator in the formula. For instance, since the formula ((δ → φ) → (φ → δ)) is of the form (β → γ) where β is (δ → φ) and γ is (φ → δ), the second occurrence of → is the main operator.

This is a recursive definition of a function which maps all non-atomic wffs to their main operator occurrence. Because of the way wffs are defined inductively, every wff α satisfies one of the cases in Definition 15A. This guarantees that for each non-atomic wff α a main operator exists. Because each wff satisfies only one of these conditions, and because the smaller wffs from which α is constructed are uniquely determined in each case, the main operator occurrence of α is unique, and so we have defined a function.

We call wffs by the following names depending on which symbol their mainoperator is:

Main operator | Type of wff       | Example
none          | atomic (wff)      | Rt1 . . . tn, t1 = t2
¬             | negation          | ¬α
∧             | conjunction       | (α ∧ β)
∨             | disjunction       | (α ∨ β)
→             | conditional       | (α → β)
∀             | universal (wff)   | ∀xα
∃             | existential (wff) | ∃xα
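Since Definition 15A is a definition by cases on the form of the wff, an implementation is a single pattern match on the outermost tag. A sketch over the tuple encoding used earlier (tags illustrative):

```python
def main_operator(a):
    """Sketch of Definition 15A: the last operator used in building a.
    Returns None for atomic wffs, which have no main operator."""
    tags = {"not": "¬", "->": "→", "forall": "∀",
            # the defined operators, if left unexpanded, work the same way:
            "and": "∧", "or": "∨", "iff": "↔", "exists": "∃"}
    return tags.get(a[0])
```

For example, main_operator(("->", ("R", "v0"), ("R", "v1"))) is "→", while main_operator(("=", "v0", "v1")) is None, matching the table's "none" row.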

1.6 Subformulas

It is often useful to talk about the wffs that "make up" a given wff. We call these its subformulas. Any wff counts as a subformula of itself; a subformula of α other than α itself is a proper subformula.

Definition 16A (Immediate Subformula) If α is a wff, the immediate subformulas of α are defined inductively as follows:


1. Atomic wffs have no immediate subformulas.

2. α ≡ ¬β: The only immediate subformula of α is β.

3. α ≡ (β ∗ γ): The immediate subformulas of α are β and γ (∗ is any one of the two-place connectives).

4. α ≡ ∀xβ: The only immediate subformula of α is β.

Definition 16B (Proper Subformula) If α is a wff, the proper subformulas of α are defined recursively as follows:

1. Atomic wffs have no proper subformulas.

2. α ≡ ¬β: The proper subformulas of α are β together with all proper subformulas of β.

3. α ≡ (β ∗ γ): The proper subformulas of α are β, γ, together with all proper subformulas of β and those of γ.

4. α ≡ ∀xβ: The proper subformulas of α are β together with all proper subformulas of β.

Definition 16C (Subformula) The subformulas of α are α itself together with all its proper subformulas.

Note the subtle difference in how we have defined immediate subformulas and proper subformulas. In the first case, we have directly defined the immediate subformulas of a formula α for each possible form of α. It is an explicit definition by cases, and the cases mirror the inductive definition of the set of wffs. In the second case, we have also mirrored the way the set of all wffs is defined, but in each case we have also included the proper subformulas of the smaller wffs β, γ in addition to these wffs themselves. This makes the definition recursive. In general, a definition of a function on an inductively defined set (in our case, wffs) is recursive if the cases in the definition of the function make use of the function itself. To be well defined, we must make sure, however, that we only ever use the values of the function for arguments that come "before" the one we are defining—in our case, when defining "proper subformula" for (β ∗ γ) we only use the proper subformulas of the "earlier" wffs β and γ.
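The recursive clauses of Definition 16B translate directly into a recursive function; note how each branch calls the function only on the "earlier" wffs β and γ, exactly as the well-definedness remark requires. The tuple encoding is illustrative.

```python
def proper_subformulas(a):
    """Sketch of Definition 16B over tuple-encoded wffs."""
    op = a[0]
    if op == "not":                       # α ≡ ¬β
        return [a[1]] + proper_subformulas(a[1])
    if op == "->":                        # α ≡ (β ∗ γ)
        return ([a[1], a[2]]
                + proper_subformulas(a[1]) + proper_subformulas(a[2]))
    if op == "forall":                    # α ≡ ∀x β
        return [a[2]] + proper_subformulas(a[2])
    return []                             # atomic wffs: none

def subformulas(a):
    """Definition 16C: α itself together with its proper subformulas."""
    return [a] + proper_subformulas(a)
```

For α = ¬(β → γ) with atomic β, γ, the proper subformulas are (β → γ), β, and γ, and the subformulas add α itself.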

1.7 Free Variables and Sentences

Definition 17A (Free occurrences of a variable) The free occurrences of a variable in a wff are defined inductively as follows:

1. α is atomic: all variable occurrences in α are free.

2. α ≡ ¬β: the free variable occurrences of α are exactly those of β.

3. α ≡ (β ∗ γ): the free variable occurrences of α are those in β together with those in γ.


4. α ≡ ∀xβ: the free variable occurrences in α are all of those in β except for occurrences of x.

Definition 17B (Bound Variables) An occurrence of a variable in a formula α is bound if it is not free.

Definition 17C (Scope) If ∀xβ is an occurrence of a subformula in a formula α, then the corresponding occurrence of β in α is called the scope of the corresponding occurrence of ∀x.

If β is the scope of a quantifier occurrence ∀x in α, then all occurrences of x which are free in β are said to be bound by the mentioned quantifier occurrence.

Example 1.7.4. Consider the formula ∃v0 A^2_0 v0 v1, and let β be its subformula A^2_0 v0 v1. β is the scope of ∃v0. The quantifier binds the occurrence of v0 in β, but does not bind the occurrence of v1. So v1 is a free variable in this case.

We can now see how this might work in a more complicated wff α:

∀v0 (A^1_0 v0 → A^2_0 v0 v1) → ∃v1 (A^2_1 v0 v1 ∨ ∀v0 ¬A^1_1 v0)

Here β is (A^1_0 v0 → A^2_0 v0 v1), γ is (A^2_1 v0 v1 ∨ ∀v0 ¬A^1_1 v0), and δ is ¬A^1_1 v0. β is the scope of the first ∀v0, γ is the scope of ∃v1, and δ is the scope of the second ∀v0. The first ∀v0 binds the occurrences of v0 in β, ∃v1 binds the occurrence of v1 in γ, and the second ∀v0 binds the occurrence of v0 in δ. The first occurrence of v1 and the fourth occurrence of v0 are free in α. The last occurrence of v0 is free in δ, but bound in γ and α.

Definition 17E (Sentence) A wff α is a sentence iff it contains no free occurrences of variables.
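Definitions 17A and 17E can be sketched as a recursion that computes the set of variables with free occurrences; a wff is a sentence iff that set is empty. The tuple encoding and the convention that variables are exactly the strings v0, v1, . . . are illustrative assumptions.

```python
def term_vars(t):
    """Variables occurring in a term (variables assumed named v0, v1, ...)."""
    if isinstance(t, str):
        return {t} if t.startswith("v") else set()
    return set().union(*(term_vars(s) for s in t[1:]))

def free_vars(a):
    """Sketch of Definition 17A: variables with free occurrences in a."""
    op = a[0]
    if op == "not":
        return free_vars(a[1])
    if op == "->":
        return free_vars(a[1]) | free_vars(a[2])
    if op == "forall":
        return free_vars(a[2]) - {a[1]}   # x's occurrences become bound
    # atomic wff: every variable occurrence in its terms is free
    return set().union(set(), *(term_vars(t) for t in a[1:]))

def is_sentence(a):
    """Definition 17E: a sentence has no free variable occurrences."""
    return not free_vars(a)
```

Mirroring Example 1.7.4 (with ∃v0 unfolded as ¬∀v0¬, and "A" a hypothetical two-place predicate tag): free_vars(("not", ("forall", "v0", ("not", ("A", "v0", "v1"))))) is {"v1"}, so v1 is the only free variable.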

1.8 Substitution

Definition 18A (Substitution in a term) We define s[t/x], the result of substituting t for every occurrence of x in s, recursively:

1. s ≡ c: s[t/x] is just s.

2. s ≡ y: s[t/x] is also just s, provided y is a variable other than x.

3. s ≡ x: s[t/x] is t.

4. s ≡ ft1 . . . tn: s[t/x] is ft1[t/x] . . . tn[t/x].
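A sketch of this definition over tuple-encoded terms (the encoding is illustrative). Clauses 1 to 3 all inspect a single symbol, so they collapse into one branch:

```python
def subst_term(s, t, x):
    """Sketch of Definition 18A: s[t/x], with s and t tuple-encoded
    terms and x a variable name."""
    if isinstance(s, str):
        # Clauses 1-3: constants and other variables stay put;
        # the variable x itself is replaced by t.
        return t if s == x else s
    # Clause 4: push the substitution into the arguments of f t1 ... tn.
    return (s[0],) + tuple(subst_term(u, t, x) for u in s[1:])
```

For instance, substituting the (hypothetical) term g y for x in f x c0 gives subst_term(("f", "x", "c0"), ("g", "y"), "x") == ("f", ("g", "y"), "c0").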

Definition 18B A term t is free for x in α if none of the free occurrences of x in α occur in the scope of a quantifier that binds a variable in t.


Definition 18C (Substitution in a wff) If α is a wff, x is a variable, and t is a term free for x in α, then α[t/x] is the result of substituting t for all free occurrences of x in α.

1. α ≡ Pt1 . . . tn: α[t/x] is Pt1[t/x] . . . tn[t/x].

2. α ≡ t1 = t2: α[t/x] is t1[t/x] = t2[t/x].

3. α ≡ ¬β: α[t/x] is ¬β[t/x].

4. α ≡ (β → γ): α[t/x] is (β[t/x] → γ[t/x]).

5. α ≡ ∀y β: α[t/x] is ∀y β[t/x], provided y is a variable other than x; otherwise α[t/x] is just α.

Note that substitution may be vacuous: if x does not occur in α at all, then α[t/x] is just α.

The restriction that t must be free for x in α is necessary to exclude cases like the following. If α ≡ ∃y x < y and t ≡ y, then α[t/x] would be ∃y y < y. In this case the free variable y is "captured" by the quantifier ∃y upon substitution, and that is undesirable. For instance, we would like it to be the case that whenever ∀x β holds, so does β[t/x]. But consider ∀x ∃y x < y (here β is ∃y x < y). It is a sentence that is true about, e.g., the natural numbers: for every number x there is a number y greater than it. If we allowed y as a possible substitution for x, we would end up with β[y/x] ≡ ∃y y < y, which is false. We prevent this by requiring that none of the free variables in t end up being bound by a quantifier in α.

We often use the following convention to avoid cumbersome notation: If α is a formula with a free variable x, we write α(x) to indicate this. When it is clear which α and x we have in mind, and t is a term (assumed to be free for x in α(x)), then we write α(t) as short for α(x)[t/x].

1.9 Structures for First-order Languages

First-order languages are, by themselves, uninterpreted: the constant symbols, function symbols, and predicate symbols have no specific meaning attached to them. Meanings are given by specifying a structure. It specifies the domain, i.e., the objects which the constant symbols pick out, the function symbols operate on, and the quantifiers range over. In addition, it specifies which constant symbols pick out which objects, how a function symbol maps objects to objects, and which objects the predicate symbols apply to. Structures are the basis for semantic notions in logic, e.g., the notions of consequence, validity, and satisfiability. They are variously called "structures," "interpretations," or "models" in the literature.

Definition 19A (Structures) A structure A for a language L of first-order logic consists of the following elements:

1. Domain: a non-empty set, |A|


2. Interpretation of constant symbols: for each constant symbol c of L, an element cA ∈ |A|

3. Interpretation of predicate symbols: for each n-place predicate symbol R of L (other than =), an n-place relation RA ⊆ |A|n

4. Interpretation of function symbols: for each n-place function symbol f of L, an n-place function fA : |A|n → |A|

Example 1.9.2. A structure A for the language of arithmetic consists of a set |A|, an element 0A of |A| as interpretation of the constant symbol 0, a one-place function ′A : |A| → |A|, two two-place functions +A and ×A, both |A|2 → |A|, and a two-place relation <A ⊆ |A|2.

An obvious example of such a structure is the following:

1. |B| = N

2. 0B = 0

3. ′B(n) = n+ 1 for all n ∈ N

4. +B(n,m) = n+m for all n,m ∈ N

5. ×B(n,m) = n ·m for all n,m ∈ N

6. <B = {〈n,m〉 : n,m ∈ N, n < m}

The structure B for LA so defined is called the standard model of arithmetic, because it interprets the non-logical constants of LA exactly how you would expect.
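As an illustration only (the symbol names are our own labels, and the infinite domain N is left implicit), the interpretations making up B can be written down directly:

```python
# Hedged sketch of the standard model B from the example above.
# The domain N is infinite, so we cannot store it as a set; we only
# record the interpretations of the non-logical symbols.

B = {
    "zero":  0,                        # 0^B
    "succ":  lambda n: n + 1,          # ′^B, the successor function
    "plus":  lambda n, m: n + m,       # +^B
    "times": lambda n, m: n * m,       # ×^B
    "less":  lambda n, m: n < m,       # <^B, as its characteristic function
}

# (2 + 3) × 0′ evaluates to 5 in B:
print(B["times"](B["plus"](2, 3), B["succ"](0)))   # 5
```

Swapping in different functions and relations over a different domain would give one of the many non-standard structures for LA mentioned below.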

However, there are many other possible structures for LA. For instance, we might take as the domain the set Z of integers instead of N, and define the interpretations of 0, ′, +, ×, < accordingly. But we can also define structures for LA which have nothing even remotely to do with numbers.

Example 1.9.3. A structure A for the language LZ of set theory requires just a set and a single two-place relation. So technically, e.g., the set of people plus the relation "x is older than y" could be used as a structure for LZ, as well as N together with n ≥ m for n,m ∈ N.

A particularly interesting structure for LZ, in which the elements of the domain are actually sets and the interpretation of ∈ actually is the relation "x is an element of y," is the structure HF of hereditarily finite sets:

1. |HF| = ∅ ∪ ℘(∅) ∪ ℘(℘(∅)) ∪ ℘(℘(℘(∅))) ∪ . . . ;

2. ∈HF = {〈x, y〉 : x, y ∈ |HF|, x ∈ y}.
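Because the domain of HF is built by iterating the power set operation, its first few levels can be generated mechanically. The following Python sketch (our own illustration, using frozenset so that sets can be elements of other sets) computes ℘(∅) ∪ ℘(℘(∅)) ∪ ℘(℘(℘(∅))) and the restriction of ∈HF to it:

```python
# Hedged sketch: the first few levels of the hereditarily finite sets.
# frozenset is used because ordinary Python sets are unhashable and so
# cannot themselves be elements of sets.

from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, each as a frozenset."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

X = frozenset()              # start from the empty set
domain = set()
for _ in range(3):           # take levels ℘(∅), ℘(℘(∅)), ℘(℘(℘(∅)))
    X = frozenset(powerset(X))
    domain |= X

# the membership relation ∈^HF restricted to this finite approximation:
eps = {(x, y) for x in domain for y in domain if x in y}

print(len(domain))   # 4: the elements ∅, {∅}, {{∅}}, {∅, {∅}}
```

Iterating further produces ever larger finite levels; |HF| is the union of all of them.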

The stipulations we make as to what counts as a structure impact our logic. For example, the choice to prevent empty domains ensures, given the usual account of satisfaction (or truth) for quantified sentences, that ∃x (α(x) ∨ ¬α(x)) is valid, that is, a logical truth. And the stipulation that all constant symbols must refer to an object in the domain ensures that existential generalization is a sound pattern of inference: α(a), therefore ∃x α(x). If we allowed names to refer outside the domain, or to not refer at all, then we would be on our way to a free logic, in which existential generalization requires an additional premise: α(a) and ∃x x = a, therefore ∃x α(x).

1.10 Covered Structures for First-order Languages

Recall that a term is closed if it contains no variables.

Definition 110A (Value of closed terms) If t is a closed term of the language L and A is a structure for L, the value tA is defined as follows:

1. If t is just the constant symbol c, then tA = cA.

2. If t is of the form ft1 . . . tn, then tA = fA(t1A, . . . , tnA).

Definition 110B (Covered structure) A structure is covered if everyelement of the domain is the value of some closed term.

Example 1.10.3. Let L be the language with constant symbols zero, one, two, . . . , the binary predicate symbol <, and the binary function symbols + and ×. Then a structure A for L is the one with domain |A| = {0, 1, 2, . . .} and assignments zeroA = 0, oneA = 1, twoA = 2, and so forth. For the binary relation symbol <, the set <A is the set of all pairs 〈c1, c2〉 ∈ |A|2 such that c1 is less than c2: for example, 〈1, 3〉 ∈ <A but 〈2, 2〉 /∈ <A. For the binary function symbol +, define +A in the usual way, for example +A(2, 3) = 5, and similarly for the binary function symbol ×. Hence, the value of four is just 4, and the value of ×(two,+(three, zero)) (or in infix notation, two × (three + zero)) is

×(two,+(three, zero))A = ×A(twoA,+(three, zero)A)
                       = ×A(twoA,+A(threeA, zeroA))
                       = ×A(2,+A(3, 0))
                       = ×A(2, 3)
                       = 6
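The computation above is exactly a recursive evaluation, which can be sketched in Python (the dictionaries of interpretations and the symbol names are our own illustration, not part of the definition):

```python
# Hedged sketch of Definition 110A: closed terms as nested tuples,
# evaluated recursively in (a fragment of) the structure from
# Example 1.10.3. Only a few constants are listed, for brevity.

CONSTS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4}
FUNCS = {"plus": lambda a, b: a + b, "times": lambda a, b: a * b}

def value(t):
    """t^A: a constant looks up its interpretation; f t1 ... tn recurses."""
    if isinstance(t, str):
        return CONSTS[t]            # clause 1
    f, *args = t                    # clause 2
    return FUNCS[f](*(value(a) for a in args))

# ×(two, +(three, zero)), i.e. two × (three + zero):
print(value(("times", "two", ("plus", "three", "zero"))))   # 6
```

Since every element 0, 1, 2, . . . of the domain is named by some constant symbol, this structure is covered in the sense of Definition 110B.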

1.11 Satisfaction of a Wff in a Structure

The basic notions that relate expressions such as terms and wffs, on the one hand, and structures on the other, are those of the value of a term and the satisfaction of a wff. Informally, the value of a term is an element of a structure: if the term is just a constant, its value is the object assigned to the constant by the structure, and if it is built up using function symbols, the value is computed from the values of constants and the functions assigned to the function symbols in the term. A wff is satisfied in a structure if the interpretation given to the predicates makes the wff true in the domain of the structure. This notion of satisfaction is specified inductively: the specification of the structure directly states when atomic wffs are satisfied, and we define when a complex wff is satisfied depending on the main connective or quantifier and whether or not the immediate subformulas are satisfied. The case of the quantifiers here is a bit tricky, as the immediate subformula of a quantified wff has a free variable, and structures don't specify the values of variables. In order to deal with this difficulty, we also introduce variable assignments and define satisfaction not with respect to a structure alone, but with respect to a structure plus a variable assignment.

Definition 111A (Variable Assignment) A variable assignment s for a structure A is a function which maps each variable to an element of |A|, i.e., s : Var → |A|.

A structure assigns a value to each constant symbol, and a variable assignment to each variable. But we want to use terms built up from them to also name elements of the domain. For this we define the value of terms inductively. For constant symbols and variables the value is just as the structure or the variable assignment specifies it; for more complex terms it is computed recursively using the functions the structure assigns to the function symbols.

Definition 111B (Value of Terms) If t is a term of the language L, A is a structure for L, and s is a variable assignment for A, the value s(t) is defined as follows:

1. t ≡ c: s(t) = cA.

2. t ≡ x: s(t) = s(x).

3. t ≡ ft1 . . . tn: s(t) = fA(s(t1), . . . , s(tn)).

Definition 111C (x-Variant) If s is a variable assignment for a structure A, then any variable assignment s′ for A which differs from s at most in what it assigns to x is called an x-variant of s. If s′ is an x-variant of s we write s ∼x s′.

Note that an x-variant of an assignment s does not have to assign something different to x. In fact, every assignment counts as an x-variant of itself.

Definition 111D (Satisfaction) Satisfaction of a wff α in a structure A relative to a variable assignment s, in symbols: |=A α[s], is defined recursively as follows. (We write ⊭A α[s] to mean "not |=A α[s].")

1. α ≡ Rt1 . . . tn: |=A α[s] iff 〈s(t1), . . . , s(tn)〉 ∈ RA.


2. α ≡ t1 = t2: |=A α[s] iff s(t1) = s(t2).

3. α ≡ ¬β: |=A α[s] iff ⊭A β[s].

4. α ≡ (β → γ): |=A α[s] iff ⊭A β[s] or |=A γ[s] (or both).

5. α ≡ ∀xβ: |=A α[s] iff for every x-variant s′ of s, |=A β[s′].

The variable assignments are important in the last clause. We cannot define satisfaction of ∀x β(x) by "for all a ∈ |A|, |=A β(a)." The reason is that a is not a symbol of the language, and so β(a) is not a wff (that is, β[a/x] is undefined). We also cannot assume that we have constant symbols or terms available that name every element of A, since there is nothing in the definition of structures that requires it. Even in the standard language the set of constant symbols is denumerable, so if |A| is not enumerable there aren't even enough constant symbols to name every object.

A variable assignment s provides a value for every variable in the language. This is of course not necessary: whether or not a wff α is satisfied in a structure with respect to s only depends on the assignments s makes to the free variables that actually occur in α. This is the content of the next proposition. We require variable assignments to assign values to all variables simply because it makes things a lot easier.
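For a finite structure, Definition 111D can be turned into an executable sketch: the clause for ∀ loops over all x-variants of the assignment, which here just means reassigning x to each element of the domain. The representation of wffs as nested tuples is our own, and only a fragment of the language is covered:

```python
# Hedged sketch of satisfaction over a *finite* structure, so that the
# ∀ clause can actually enumerate x-variants. Wffs are nested tuples:
#   ("R", x1, ..., xn), ("=", x, y), ("not", b), ("imp", b, c), ("all", x, b);
# terms are just variable names (strings), for simplicity.

DOMAIN = {0, 1, 2}
RELS = {"lt": {(a, b) for a in DOMAIN for b in DOMAIN if a < b}}

def satisfies(phi, s):
    op = phi[0]
    if op == "=":
        return s[phi[1]] == s[phi[2]]
    if op == "not":
        return not satisfies(phi[1], s)
    if op == "imp":
        return (not satisfies(phi[1], s)) or satisfies(phi[2], s)
    if op == "all":
        x, body = phi[1], phi[2]
        # every x-variant of s: reassign x to each element of the domain
        return all(satisfies(body, {**s, x: a}) for a in DOMAIN)
    # atomic case: R t1 ... tn
    return tuple(s[t] for t in phi[1:]) in RELS[op]

# ∀x ¬ x < x is satisfied (the interpretation of < is irreflexive):
print(satisfies(("all", "x", ("not", ("lt", "x", "x"))), {}))   # True
```

For infinite structures no such enumeration is possible, which is one reason the official definition proceeds via x-variants rather than by substituting names for every element.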

Proposition 111E If the variables in a term t are among x1, . . . , xn, and s1(xi) = s2(xi) for i = 1, . . . , n, then s1(t) = s2(t).

Proof. By induction on the complexity of t. For the base case, t can be a constant symbol or one of the variables x1, . . . , xn. If t = c, then s1(t) = cA = s2(t). If t = xi, then s1(t) = s1(xi) = s2(xi) (by the hypothesis of the proposition) = s2(t). For the inductive step, assume that t = ft1 . . . tk for some terms t1, . . . , tk, and that the claim holds for t1, . . . , tk. For i = 1, . . . , k, the variables of ti are among x1, . . . , xn, so by induction hypothesis s1(ti) = s2(ti). Then

s1(t) = s1(ft1 . . . tk) = fA(s1(t1), . . . , s1(tk))
      = fA(s2(t1), . . . , s2(tk))
      = s2(ft1 . . . tk) = s2(t).

Proposition 111F If the free variables in α are among x1, . . . , xn, and s1(xi) = s2(xi) for i = 1, . . . , n, then |=A α[s1] iff |=A α[s2].

Proof. We use induction on the complexity of α. For the base case, where α is atomic, α can be: Rt1 . . . tk for a k-place predicate R and terms t1, . . . , tk, or t1 = t2 for terms t1 and t2.


1. α ≡ Rt1 . . . tk: let |=A α[s1]. Then

〈s1(t1), . . . , s1(tk)〉 ∈ RA.

For i = 1, . . . , k, s1(ti) = s2(ti) by Proposition 111E. So we also have 〈s2(t1), . . . , s2(tk)〉 ∈ RA, and hence |=A α[s2].

2. α ≡ t1 = t2: if |=A α[s1], then s2(t1) = s1(t1) (by Proposition 111E) = s1(t2) (since |=A t1 = t2[s1]) = s2(t2) (by Proposition 111E), so |=A t1 = t2[s2].

Now assume |=A β[s1] iff |=A β[s2] for all wffs β less complex than α. The induction step proceeds by cases determined by the main operator of α. In each case, we only demonstrate the forward direction of the biconditional; the proof of the reverse direction is symmetrical.

1. α ≡ ¬β: if |=A α[s1], then ⊭A β[s1], so by the induction hypothesis, ⊭A β[s2], hence |=A α[s2].

2. α ≡ β → γ: if |=A α[s1], then ⊭A β[s1] or |=A γ[s1]. By the induction hypothesis, ⊭A β[s2] or |=A γ[s2], so |=A α[s2].

3. α ≡ ∀x β: if |=A α[s1], then for every x-variant s′1 of s1, |=A β[s′1]. If s′2 is an x-variant of s2, let s′1 be the x-variant of s1 which assigns the same thing to x as does s′2. The free variables of β are among x1, . . . , xn, and x. s′1(xi) = s′2(xi), since s′1 and s′2 are x-variants of s1 and s2, respectively, and by hypothesis s1(xi) = s2(xi). s′1(x) = s′2(x) by the way we have defined s′1. Then the induction hypothesis applies to β and s′1, s′2, and we have |=A β[s′2]. Since s′2 was an arbitrary x-variant of s2, every x-variant of s2 satisfies β, so |=A α[s2].

By induction, we get that |=A α[s1] iff |=A α[s2] whenever the free variables in α are among x1, . . . , xn and s1(xi) = s2(xi) for i = 1, . . . , n.

Definition 111G If α is a sentence, we say that a structure A satisfies α, |=A α, iff |=A α[s] for all variable assignments s.

If |=A α, we also say that α is true in A.

Proposition 111H Suppose α(x) only contains x free, and A is a structure. Then:

1. |=A ∃xα(x) iff |=A α(x)[s] for at least one variable assignment s.

2. |=A ∀xα(x) iff |=A α(x)[s] for all variable assignments s.

Proof. Exercise.


1.12 Extensionality

Extensionality, sometimes called relevance, can be expressed informally as follows: the only things that bear upon the satisfaction of a wff α in a structure A relative to a variable assignment s are the assignments made by A and s to the elements of the language that actually appear in α.

One immediate consequence of extensionality is that when two structures A and A′ agree on all the elements of the language appearing in a sentence α and have the same domain, A and A′ must also agree on α itself.

Proposition 112A (Extensionality) Let α be a sentence, and A and A′ be structures. If cA = cA′, RA = RA′, and fA = fA′ for every constant symbol c, relation symbol R, and function symbol f occurring in α, then |=A α iff |=A′ α.

Moreover, the value of a term, and whether or not a structure satisfies a wff, only depends on the values of its subterms.

Proposition 112B Let A be a structure, t and t′ terms, and s a variable assignment. Let s′ ∼x s be the x-variant of s given by s′(x) = s(t′). Then s(t[t′/x]) = s′(t).

Proof. By induction on t.

1. If t is a constant, say, t ≡ c, then t[t′/x] = c, and s(c) = cA = s′(c).

2. If t is a variable other than x, say, t ≡ y, then t[t′/x] = y, and s(y) = s′(y) since s′ ∼x s.

3. If t ≡ x, then t[t′/x] = t′. But s′(x) = s(t′) by definition of s′.

4. If t ≡ ft1 . . . tn then we have:

s(t[t′/x]) = s(ft1[t′/x] . . . tn[t′/x])            by definition of t[t′/x]
           = fA(s(t1[t′/x]), . . . , s(tn[t′/x]))   by definition of s(f . . . )
           = fA(s′(t1), . . . , s′(tn))             by induction hypothesis
           = s′(t)                                  by definition of s′(f . . . )

Proposition 112C Let A be a structure, α a wff, t a term, and s a variable assignment. Let s′ ∼x s be the x-variant of s given by s′(x) = s(t). Then |=A α[t/x][s] iff |=A α[s′].

Proof. Exercise.


1.13 Semantic Notions

Given the definition of structures for first-order languages, we can define some basic semantic properties of and relationships between sentences. The simplest of these is the notion of validity of a sentence. A sentence is valid if it is satisfied in every structure. Valid sentences are those that are satisfied regardless of how the non-logical symbols in them are interpreted. Valid sentences are therefore also called logical truths: they are true, i.e., satisfied, in any structure, and hence their truth depends only on the logical symbols occurring in them and their syntactic structure, but not on the non-logical symbols or their interpretation.

Definition 113A (Validity) A sentence α is valid, |= α, iff |=A α for every structure A.

Definition 113B (Entailment) A set of sentences Γ entails a sentence α, Γ |= α, iff for every structure A with |=A Γ, |=A α.

Definition 113C (Satisfiability) A set of sentences Γ is satisfiable if |=A Γ for some structure A. If Γ is not satisfiable it is called unsatisfiable.

Proposition 113D A sentence α is valid iff Γ |= α for every set of sentences Γ.

Proof. For the forward direction, let α be valid, and let Γ be a set of sentences. Let A be a structure so that |=A Γ. Since α is valid, |=A α, hence Γ |= α.

For the contrapositive of the reverse direction, let α be invalid, so there is a structure A with ⊭A α. When Γ = {⊤}, since ⊤ is valid, |=A Γ. Hence, there is a structure A so that |=A Γ but ⊭A α, hence Γ does not entail α.

Proposition 113E Γ |= α iff Γ ∪ {¬α} is unsatisfiable.

Proof. For the forward direction, suppose Γ |= α and suppose to the contrary that there is a structure A so that |=A Γ ∪ {¬α}. Since |=A Γ and Γ |= α, |=A α. Also, since |=A Γ ∪ {¬α}, |=A ¬α, so we have both |=A α and ⊭A α, a contradiction. Hence, there can be no such structure A, so Γ ∪ {¬α} is unsatisfiable.

For the reverse direction, suppose Γ ∪ {¬α} is unsatisfiable. So for every structure A, either ⊭A Γ or |=A α. Hence, for every structure A with |=A Γ, |=A α, so Γ |= α.

Proposition 113F If Γ ⊆ Γ′ and Γ |= α, then Γ′ |= α.

Proof. Suppose that Γ ⊆ Γ′ and Γ |= α. Let A be such that |=A Γ′; then |=A Γ, and since Γ |= α, we get that |=A α. Hence, whenever |=A Γ′, |=A α, so Γ′ |= α.

Theorem 113G (Semantic Deduction Theorem) Γ ∪ {α} |= β iff Γ |= α → β.

Proof. For the forward direction, let Γ ∪ {α} |= β and let A be a structure so that |=A Γ. If |=A α, then |=A Γ ∪ {α}, so since Γ ∪ {α} entails β, we get |=A β. Therefore, |=A α → β, so Γ |= α → β.


For the reverse direction, let Γ |= α → β and let A be a structure so that |=A Γ ∪ {α}. Then |=A Γ, so |=A α → β, and since |=A α, |=A β. Hence, whenever |=A Γ ∪ {α}, |=A β, so Γ ∪ {α} |= β.


Chapter 2

Theories and Their Models

2.1 Introduction

The development of the axiomatic method is a significant achievement in the history of science, and is of special importance in the history of mathematics. An axiomatic development of a field involves the clarification of many questions: What is the field about? What are the most fundamental concepts? How are they related? Can all the concepts of the field be defined in terms of these fundamental concepts? What laws do, and must, these concepts obey?

The axiomatic method and logic were made for each other. Formal logic provides the tools for formulating axiomatic theories, for proving theorems from the axioms of the theory in a precisely specified way, and for studying the properties of all systems satisfying the axioms in a systematic way.

Definition 21A A set of sentences Γ is closed iff, whenever Γ |= α, then α ∈ Γ. The closure of a set of sentences Γ is {α : Γ |= α}.

We say that Γ is axiomatized by a set of sentences ∆ if Γ is the closure of ∆.

We can think of an axiomatic theory as the set of sentences that is axiomatized by its set of axioms ∆. In other words, when we have a first-order language which contains non-logical symbols for the primitives of the axiomatically developed science we wish to study, together with a set of sentences that express the fundamental laws of the science, we can think of the theory as represented by all the sentences in this language that are entailed by the axioms. This ranges from simple examples with only a single primitive and simple axioms, such as the theory of partial orders, to complex theories such as Newtonian mechanics.

The important logical facts that make this formal approach to the axiomatic method so fruitful are the following. Suppose Γ is an axiom system for a theory, i.e., a set of sentences.

1. We can state precisely when an axiom system captures an intended class of structures. That is, if we are interested in a certain class of structures, we will successfully capture that class by an axiom system Γ iff the structures are exactly those A such that |=A Γ.

2. We may fail in this respect because there are A such that |=A Γ, but A is not one of the structures we intend. This may lead us to add axioms which are not true in A.

3. If we are successful at least in the respect that Γ is true in all the intended structures, then a sentence α is true in all intended structures whenever Γ |= α. Thus we can use logical tools (such as proof methods) to show that sentences are true in all intended structures simply by showing that they are entailed by the axioms.

4. Sometimes we don’t have intended structures in mind, but instead startfrom the axioms themselves: we begin with some primitives that wewant to satisfy certain laws which we codify in an axiom system. Onething that we would like to verify right away is that the axioms do notcontradict each other: if they do, there can be no concepts that obeythese laws, and we have tried to set up an incoherent theory. We canverify that this doesn’t happen by finding a model of Γ . And if thereare models of our theory, we can use logical methods to investigate them,and we can also use logical methods to construct models.

5. The independence of the axioms is likewise an important question. It may happen that one of the axioms is actually a consequence of the others, and so is redundant. We can prove that an axiom α in Γ is redundant by proving Γ \ {α} |= α. We can also prove that an axiom is not redundant by showing that (Γ \ {α}) ∪ {¬α} is satisfiable. For instance, this is how it was shown that the parallel postulate is independent of the other axioms of geometry.

6. Another important question is that of the definability of concepts in a theory: the choice of the language determines what the models of a theory consist of. But not every aspect of a theory must be represented separately in its models. For instance, every ordering ≤ determines a corresponding strict ordering <: given one, we can define the other. So it is not necessary that a model of a theory involving such an order must also contain the corresponding strict ordering. When is it the case, in general, that one relation can be defined in terms of others? And when is it impossible to define a relation in terms of others (so that it must be added to the primitives of the language)?

2.2 Expressing Properties of Structures

It is often useful and important to express conditions on functions and relations, or more generally, that the functions and relations in a structure satisfy these conditions. For instance, we would like to have ways of distinguishing those structures for a language which "capture" what we want the predicate symbols to "mean" from those that do not. Of course we're completely free to specify which structures we "intend," e.g., we can specify that the interpretation of the predicate symbol ≤ must be an ordering, or that we are only interested in interpretations of L in which the domain consists of sets and ∈ is interpreted by the "is an element of" relation. But can we do this with sentences of the language? In other words, which conditions on a structure A can we express by a sentence (or perhaps a set of sentences) in the language of A? There are some conditions that we will not be able to express. For instance, there is no sentence of LA which is only true in a structure A if |A| = N. We cannot express "the domain contains only natural numbers." But there are "structural properties" of structures that we perhaps can express. Which properties of structures can we express by sentences? Or, to put it another way, which collections of structures can we describe as those making a sentence (or set of sentences) true?

Definition 22A (Model of a set) Let Γ be a set of sentences in a language L. We say that a structure A is a model of Γ if |=A α for all α ∈ Γ.

Example 2.2.2. The sentence ∀x x ≤ x is true in A iff ≤A is a reflexive relation. The sentence ∀x ∀y ((x ≤ y ∧ y ≤ x) → x = y) is true in A iff ≤A is anti-symmetric. The sentence ∀x ∀y ∀z ((x ≤ y ∧ y ≤ z) → x ≤ z) is true in A iff ≤A is transitive. Thus, the models of

{ ∀x x ≤ x,
  ∀x ∀y ((x ≤ y ∧ y ≤ x) → x = y),
  ∀x ∀y ∀z ((x ≤ y ∧ y ≤ z) → x ≤ z) }

are exactly those structures in which ≤A is reflexive, anti-symmetric, and transitive, i.e., a partial order. Hence, we can take them as axioms for the first-order theory of partial orders.

2.3 Examples of First-Order Theories

Example 2.3.1. The theory of strict linear orders in the language L< is axiomatized by the set

{ ∀x ¬x < x,
  ∀x ∀y ((x < y ∨ y < x) ∨ x = y),
  ∀x ∀y ∀z ((x < y ∧ y < z) → x < z) }

It completely captures the intended structures: every strict linear order is a model of this axiom system, and vice versa, if R is a strict linear order on a set X, then the structure A with |A| = X and <A = R is a model of this theory.


Example 2.3.2. The theory of groups in the language with constant symbol 1 and two-place function symbol · is axiomatized by

∀x (x · 1) = x
∀x ∀y ∀z (x · (y · z)) = ((x · y) · z)
∀x ∃y (x · y) = 1

Example 2.3.3. The theory of Peano arithmetic is axiomatized by the following sentences in the language of arithmetic LA.

¬∃xx′ = 0

∀x ∀y (x′ = y′ → x = y)

∀x ∀y (x < y ↔ ∃z (x+ z′ = y))

∀x (x+ 0) = x

∀x ∀y (x+ y′) = (x+ y)′

∀x (x× 0) = 0

∀x∀y (x× y′) = ((x× y) + x)

plus all sentences of the form

(α(0) ∧ ∀x (α(x)→ α(x′)))→ ∀xα(x)

Since there are infinitely many sentences of the latter form, this axiom system is infinite. The latter form is called the induction schema. (Actually, the induction schema is a bit more complicated than we let on here.)

The third axiom is an explicit definition of <.

Example 2.3.4. The theory of pure sets plays an important role in the foundations (and in the philosophy) of mathematics. A set is pure if all its elements are also pure sets. The empty set counts therefore as pure, but a set that has something as an element that is not a set would not be pure. So the pure sets are those that are formed just from the empty set and no "urelements," i.e., objects that are not themselves sets.

The following might be considered as an axiom system for a theory of pure sets:

∃x ¬∃y y ∈ x
∀x ∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)
∀x ∀y ∃z ∀u (u ∈ z ↔ (u = x ∨ u = y))
∀x ∃y ∀z (z ∈ y ↔ ∃u (z ∈ u ∧ u ∈ x))

plus all sentences of the form

∃x∀y (y ∈ x↔ α(y))


The first axiom says that there is a set with no elements (i.e., ∅ exists); the second says that sets are extensional; the third that for any sets X and Y, the set {X, Y} exists; the fourth that for any set X, the union of the elements of X exists.

The sentences mentioned last are collectively called the naive comprehension scheme. It essentially says that for every α(x), the set {x : α(x)} exists, so at first glance it is a true, useful, and perhaps even necessary axiom. It is called "naive" because, as it turns out, it makes this theory unsatisfiable: if you take α(y) to be ¬y ∈ y, you get the sentence

∃x ∀y (y ∈ x↔ ¬y ∈ y)

and this sentence is not satisfied in any structure.
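To spell out the standard argument (our reconstruction, not the text's own): any witness for x would have to contain exactly those things that are not elements of themselves, and asking whether the witness contains itself leads to contradiction. In symbols:

```latex
\text{Suppose } \models_A \exists x\, \forall y\, (y \in x \leftrightarrow \neg\, y \in y).
\text{Then some witness } a \in |A| \text{ satisfies } \forall y\, (y \in a \leftrightarrow \neg\, y \in y).
\text{Instantiating } y \text{ at } a \text{ gives } a \in a \leftrightarrow \neg\, a \in a,
\text{which cannot hold. So no structure satisfies the sentence.}
```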

Example 2.3.5. In the area of mereology, the relation of parthood is a fundamental relation. Just like theories of sets, there are theories of parthood that axiomatize various conceptions (sometimes conflicting) of this relation.

The language of mereology contains a single two-place predicate symbol P, and Pxy "means" that x is a part of y. When we have this interpretation in mind, a structure for this language is called a parthood structure. Of course, not every structure for a single two-place predicate will really deserve this name. To have a chance of capturing "parthood," PA must satisfy some conditions, which we can lay down as axioms for a theory of parthood. For instance, parthood is a partial order on objects: every object is a part (albeit an improper part) of itself; no two different objects can be parts of each other; a part of a part of an object is itself part of that object. Note that in this sense "is a part of" resembles "is a subset of," but does not resemble "is an element of," which is neither reflexive nor transitive.

∀x Pxx,
∀x ∀y ((Pxy ∧ Pyx) → x = y),
∀x ∀y ∀z ((Pxy ∧ Pyz) → Pxz).

Moreover, any two objects have a mereological sum (an object that has these two objects as parts, and is minimal in this respect).

∀x∀y ∃z ∀u (Pzu↔ (Pxu ∧ Pyu))

These are only some of the basic principles of parthood considered by metaphysicians. Further principles, however, quickly become hard to formulate or write down without first introducing some defined relations. For instance, most metaphysicians interested in mereology also view the following as a valid principle: whenever an object x has a proper part y, it also has a part z that has no parts in common with y, and such that the fusion of y and z is x.

2.4 Expressing Relations in a Structure

One main use wffs can be put to is to express properties and relations in a structure A in terms of the primitives of the language L of A. By this we mean the following: the domain of A is a set of objects. The constant symbols, function symbols, and predicate symbols are interpreted in A by some objects in |A|, functions on |A|, and relations on |A|. For instance, if A²₀ is in L, then A assigns to it a relation R = (A²₀)^A. Then the wff A²₀v1v2 expresses that very relation, in the following sense: if a variable assignment s maps v1 to a ∈ |A| and v2 to b ∈ |A|, then

Rab   iff   ⊨A A²₀v1v2 [s].

Note that we have to involve variable assignments here: we can’t just say “Rab iff ⊨A A²₀ab,” because a and b are not symbols of our language: they are elements of |A|.

Since we don’t just have atomic wffs, but can combine them using the logical connectives and the quantifiers, more complex wffs can define other relations which aren’t directly built into A. We’re interested in how to do that, and specifically, in which relations we can define in a structure.

Definition 24A Let α(v1, . . . , vn) be a wff of L in which only v1, . . . , vn occur free, and let A be a structure for L. α(v1, . . . , vn) expresses the relation R ⊆ |A|ⁿ iff

Ra1 . . . an   iff   ⊨A α(v1, . . . , vn) [s]

for any variable assignment s with s(vi) = ai (i = 1, . . . , n).

Example 2.4.2. In the standard model of arithmetic B, the wff v1 < v2 ∨ v1 = v2 expresses the ≤ relation on N. The wff v2 = v′1 expresses the successor relation, i.e., the relation R ⊆ N² where Rnm holds iff m is the successor of n. The wff v1 = v′2 expresses the predecessor relation. The wffs ∃v3 (v3 ≠ 0 ∧ v2 = (v1 + v3)) and ∃v3 ((v1 + v3)′ = v2) both express the < relation. This means that the predicate symbol < is actually superfluous in the language of arithmetic; it can be defined.

This idea is not just interesting in specific structures, but generally whenever we use a language to describe an intended model or models, i.e., when we consider theories. These theories often contain only a few predicate symbols as basic symbols, but in the domain they are used to describe, many other relations play an important role. If these other relations can be systematically expressed in terms of the relations that interpret the basic predicate symbols of the language, we say we can define them in the language.
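The claim that the first of these wffs expresses < can be checked by brute force on an initial segment of N. A sketch (the bound and names are our own):

```python
# Check that the wff ∃v3 (v3 ≠ 0 ∧ v2 = (v1 + v3)) expresses the < relation,
# as claimed for the standard model of arithmetic.
BOUND = 25  # finite stand-in for the (infinite) domain N

def less_than_defined(a, b):
    # the existential quantifier, restricted to witnesses below BOUND
    return any(c != 0 and b == a + c for c in range(BOUND))

# For small a, b any witness c = b - a lies below BOUND, so the check is exact.
assert all(less_than_defined(a, b) == (a < b)
           for a in range(10) for b in range(10))
print("the wff agrees with < on 0..9")
```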

2.5 The Theory of Sets

Almost all of mathematics can be developed in the theory of sets. Developing mathematics in this theory involves a number of things. First, it requires a set of axioms for the relation ∈. A number of different axiom systems have been developed, sometimes with conflicting properties of ∈. The axiom system known as ZFC, Zermelo-Fraenkel set theory with the axiom of choice, stands out: it is by far the most widely used and studied, because it turns out that its


axioms suffice to prove almost all the things mathematicians expect to be able to prove. But before that can be established, it is first necessary to make clear how we can even express all the things mathematicians would like to express. For starters, the language contains no constant symbols or function symbols, so it seems at first glance unclear that we can talk about particular sets (such as ∅ or N) or about operations on sets (such as X ∪ Y and ℘(X)), let alone other constructions which involve things other than sets, such as relations and functions.

To begin with, “is an element of” is not the only relation we are interested in: “is a subset of” seems almost as important. But we can define “is a subset of” in terms of “is an element of.” To do this, we have to find a wff α(x, y) in the language of set theory which is satisfied by a pair of sets 〈X, Y〉 iff X ⊆ Y. But X is a subset of Y just in case all elements of X are also elements of Y. So we can define ⊆ by the wff

∀z (z ∈ x→ z ∈ y)

Now, whenever we want to use the relation ⊆ in a formula, we could instead use that wff (with x and y suitably replaced, and the bound variable z renamed if necessary). For instance, extensionality of sets means that if any sets x and y are contained in each other, then x and y must be the same set. This can be expressed by ∀x ∀y ((x ⊆ y ∧ y ⊆ x) → x = y), or, if we replace ⊆ by the above definition, by

∀x ∀y ((∀z (z ∈ x→ z ∈ y) ∧ ∀z (z ∈ y → z ∈ x))→ x = y).

This is in fact one of the axioms of ZFC, the “axiom of extensionality.”

There is no constant symbol for ∅, but we can express “x is empty” by ¬∃y y ∈ x. Then “∅ exists” becomes the sentence ∃x ¬∃y y ∈ x. This is another axiom of ZFC. (Note that the axiom of extensionality implies that there is only one empty set.) Whenever we want to talk about ∅ in the language of set theory, we would write this as “there is a set that’s empty and . . . ” As an example, to express the fact that ∅ is a subset of every set, we could write

∃x (¬∃y y ∈ x ∧ ∀z x ⊆ z)

where, of course, x ⊆ z would in turn have to be replaced by its definition.

To talk about operations on sets, such as X ∪ Y and ℘(X), we have to use a similar trick. There are no function symbols in the language of set theory, but we can express the functional relations X ∪ Y = Z and ℘(X) = Y by

∀u ((u ∈ x ∨ u ∈ y) ↔ u ∈ z)
∀u (u ⊆ x ↔ u ∈ y)

since the elements of X ∪ Y are exactly the sets that are either elements of X or elements of Y, and the elements of ℘(X) are exactly the subsets of X. However, this doesn’t allow us to use x ∪ y or ℘(x) as if they were terms: we can only use


the entire wffs that define the relations X ∪ Y = Z and ℘(X) = Y. In fact, we do not know that these relations are ever satisfied, i.e., we do not know that unions and power sets always exist. This is why the sentence ∀x ∃y ℘(x) = y is another axiom of ZFC (the power set axiom).
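The two defining wffs can be checked on concrete finite sets. A sketch (the sample sets and the finite domain of candidate elements are our own choices):

```python
from itertools import chain, combinations

def powerset(x):
    return {frozenset(s) for s in
            chain.from_iterable(combinations(sorted(x), r)
                                for r in range(len(x) + 1))}

# Sample sets; the quantified u ranges over a small finite domain.
x, y = frozenset({1, 2}), frozenset({2, 3})
elements = range(5)

# ∀u ((u ∈ x ∨ u ∈ y) ↔ u ∈ z) holds of z = x ∪ y ...
z = x | y
assert all(((u in x) or (u in y)) == (u in z) for u in elements)

# ... and ∀u (u ⊆ x ↔ u ∈ w) holds of w = ℘(x), with u now ranging
# over subsets of the domain.
w = powerset(x)
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(elements, r) for r in range(len(elements) + 1))]
assert all((u <= x) == (u in w) for u in subsets)
print("union and power set satisfy their defining wffs")
```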

Now what about talk of ordered pairs or functions? Here we have to explain how we can think of ordered pairs and functions as special kinds of sets. One way to define the ordered pair 〈x, y〉 is as the set {{x}, {x, y}}. But as before, we cannot introduce a function symbol that names this set; we can only define the relation 〈x, y〉 = z, i.e., {{x}, {x, y}} = z:

∀u (u ∈ z ↔ (∀v (v ∈ u ↔ v = x) ∨ ∀v (v ∈ u ↔ (v = x ∨ v = y))))

This says that the elements u of z are exactly those sets which either have x as their only element or have x and y as their only elements (in other words, those sets that are either identical to {x} or identical to {x, y}). Once we have this, we can say further things, e.g., that X × Y = Z:

∀z (z ∈ Z ↔ ∃x∃y (x ∈ X ∧ y ∈ Y ∧ 〈x, y〉 = z))

A function f : X → Y can be thought of as the relation f(x) = y, i.e., as the set of pairs {〈x, y〉 : f(x) = y}. We can then say that a set f is a function from X to Y if (a) it is a relation ⊆ X × Y, (b) it is total, i.e., for all x ∈ X there is some y ∈ Y such that 〈x, y〉 ∈ f, and (c) it is functional, i.e., whenever 〈x, y〉, 〈x, y′〉 ∈ f, then y = y′ (because values of functions must be unique). So “f is a function from X to Y” can be written as:

∀u (u ∈ f → ∃x ∃y (x ∈ X ∧ y ∈ Y ∧ 〈x, y〉 = u)) ∧
∀x (x ∈ X → (∃y (y ∈ Y ∧ maps(f, x, y)) ∧
∀y ∀y′ ((maps(f, x, y) ∧ maps(f, x, y′)) → y = y′)))

where maps(f, x, y) abbreviates ∃v (v ∈ f ∧ 〈x, y〉 = v) (this wff expresses “f(x) = y”).
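The three clauses can be verified on a concrete coded function. A sketch (the sample f and the names pair, maps are ours), modeling 〈x, y〉 as {{x}, {x, y}}:

```python
# Model ⟨x, y⟩ as {{x}, {x, y}} and check the three clauses of
# "f is a function from X to Y" for a sample f.
def pair(x, y):
    return frozenset([frozenset([x]), frozenset([x, y])])

X, Y = {1, 2}, {"a", "b"}
f = {pair(1, "a"), pair(2, "b")}

def maps(f, x, y):                      # the abbreviation maps(f, x, y)
    return pair(x, y) in f

relation = all(u in {pair(x, y) for x in X for y in Y} for u in f)
total = all(any(maps(f, x, y) for y in Y) for x in X)
functional = all(y == y2
                 for x in X for y in Y for y2 in Y
                 if maps(f, x, y) and maps(f, x, y2))
assert relation and total and functional
print("f is a function from X to Y")
```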

It is now also not hard to express that f : X → Y is injective, for instance:

f : X → Y ∧ ∀x∀x′ ((x ∈ X ∧ x′ ∈ X ∧∃y (maps(f, x, y) ∧maps(f, x′, y)))→ x = x′)

A function f : X → Y is injective iff, whenever f maps x, x′ ∈ X to a single y, x = x′. If we abbreviate this formula as inj(f, X, Y), we’re already in a position to state in the language of set theory something as non-trivial as Cantor’s theorem: there is no injective function from ℘(X) to X:

∀X ∀Y (℘(X) = Y → ¬∃f inj(f, Y,X))
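For a finite X the theorem can be confirmed by exhausting all candidate functions. A brute-force sketch (our own illustration, for |X| = 2):

```python
from itertools import chain, combinations, product

# Brute-force instance of Cantor's theorem: no injection from ℘(X) into X.
X = [0, 1]
PX = [frozenset(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]   # the 4 subsets of X

# Every function f : ℘(X) → X is a choice of one value per subset;
# none of the 2^4 = 16 such functions is injective (pigeonhole: 4 > 2).
def injective(values):
    return len(set(values)) == len(values)

assert not any(injective(values) for values in product(X, repeat=len(PX)))
print("no injective f from the power set of X into X when |X| = 2")
```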

One might think that set theory requires another axiom that guarantees the existence of a set for every defining property. If α(x) is a formula of set theory with the variable x free, we can consider the sentence

∃y ∀x (x ∈ y ↔ α(x)).


This sentence states that there is a set y whose elements are all and only those x that satisfy α(x). This schema is called the “comprehension principle.” It looks very useful; unfortunately, it is inconsistent. Take α(x) ≡ ¬x ∈ x; then the comprehension principle states

∃y ∀x (x ∈ y ↔ x /∈ x),

i.e., it states the existence of a set of all sets that are not elements of themselves. No such set can exist: this is Russell’s Paradox. ZFC, in fact, contains a restricted (and consistent) version of this principle, the separation principle:

∀z ∃y ∀x (x ∈ y ↔ (x ∈ z ∧ α(x))).
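The contrast can be illustrated in a small universe of (well-founded) finite sets, a sketch of our own. No candidate satisfies the comprehension instance, because taking y := x would force x ∈ x ↔ x ∉ x; separation, by contrast, always succeeds:

```python
# A small universe of hereditarily finite sets, coded as frozensets.
e = frozenset()
universe = [e, frozenset([e]), frozenset([frozenset([e])]),
            frozenset([e, frozenset([e])])]

def russell_candidate(x, domain):
    # ∀y (y ∈ x ↔ y ∉ y), with y ranging over the domain
    return all((y in x) == (y not in y) for y in domain)

# Comprehension with α(y) ≡ y ∉ y has no witness in the universe.
assert not any(russell_candidate(x, universe) for x in universe)

# Separation is harmless: {x ∈ z : α(x)} always makes sense.
z = universe[3]
separated = frozenset(x for x in z if x not in x)   # α(x) ≡ x ∉ x
assert separated == z   # no well-founded set is an element of itself
print("comprehension fails, separation succeeds")
```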

2.6 Expressing the Size of Structures

There are some properties of structures we can express even without using the non-logical symbols of a language. For instance, there are sentences which are true in a structure iff the domain of the structure has at least, at most, or exactly a certain number n of elements.

Proposition 26A The sentence

α≥n ≡ ∃x1 ∃x2 . . . ∃xn (x1 ≠ x2 ∧ x1 ≠ x3 ∧ x1 ≠ x4 ∧ · · · ∧ x1 ≠ xn ∧
                         x2 ≠ x3 ∧ x2 ≠ x4 ∧ · · · ∧ x2 ≠ xn ∧
                         . . .
                         xn−1 ≠ xn)

is true in a structure A iff |A| contains at least n elements. Consequently, ⊨A ¬α≥n+1 iff |A| contains at most n elements.
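On a concrete finite domain, α≥n can be evaluated directly: it just asserts the existence of n pairwise distinct elements, i.e., an injective n-tuple. A sketch (the function name is ours):

```python
from itertools import permutations

# Evaluate α≥n on a finite domain by searching for n pairwise
# distinct elements (an injective n-tuple).
def alpha_geq_n(domain, n):
    return next(iter(permutations(domain, n)), None) is not None

A = {"a", "b", "c"}
assert alpha_geq_n(A, 3)        # |A| ≥ 3, so α≥3 is true in A
assert not alpha_geq_n(A, 4)    # ¬α≥4: A has at most 3 elements
```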

Proposition 26B The sentence

α=n ≡ ∃x1 ∃x2 . . . ∃xn (x1 ≠ x2 ∧ x1 ≠ x3 ∧ x1 ≠ x4 ∧ · · · ∧ x1 ≠ xn ∧
                         x2 ≠ x3 ∧ x2 ≠ x4 ∧ · · · ∧ x2 ≠ xn ∧
                         . . .
                         xn−1 ≠ xn ∧
                         ∀y (y = x1 ∨ · · · ∨ y = xn))

is true in a structure A iff |A| contains exactly n elements.

Proposition 26C A structure is infinite iff it is a model of

α≥1, α≥2, α≥3, . . .

There is no single purely logical sentence which is true in A iff |A| is infinite. However, one can give sentences with non-logical predicate symbols which only have infinite models (although not every infinite structure is a model of them). The property of being a finite structure, and the property of being a non-enumerable structure, cannot even be expressed with an infinite set of sentences. These facts follow from the compactness and Löwenheim-Skolem theorems.


Chapter 3

The Sequent Calculus

3.1 Rules and Derivations

Let L be a first-order language with the usual constants, variables, logicalsymbols, and auxiliary symbols (parentheses and the comma).

Definition 31A (sequent) A sequent is an expression of the form

Γ ⇒ ∆

where Γ and ∆ are finite (possibly empty) sets of sentences of the language L. The wffs in Γ are the antecedent wffs, while the wffs in ∆ are the succedent wffs.

The intuitive idea behind a sequent is: if all of the antecedent wffs hold, then at least one of the succedent wffs holds. That is, if Γ = {Γ1, . . . , Γm} and ∆ = {∆1, . . . , ∆n}, then Γ ⇒ ∆ holds iff

(Γ1 ∧ · · · ∧ Γm) → (∆1 ∨ · · · ∨ ∆n)

holds. When m = 0, ⇒ ∆ holds iff ∆1 ∨ · · · ∨ ∆n holds. When n = 0, Γ ⇒ holds iff Γ1 ∧ · · · ∧ Γm does not.

The empty sequent ⇒ canonically represents a contradiction.

We write Γ, α (or α, Γ) for Γ ∪ {α}, and Γ, ∆ for Γ ∪ ∆.

Definition 31B (Inference) An inference is an expression of the form

S1            S1    S2
----    or    --------
 S                S

where S, S1, and S2 are sequents. S1 and S2 are called the upper sequents and S the lower sequent of the inference.

In sequent calculus derivations, a correct inference yields a valid sequent,provided the upper sequents are valid.

For the following, let Γ,∆,Π,Λ represent finite sets of sentences.


The rules for LK are divided into two main types: structural rules andlogical rules. The logical rules are further divided into propositional rules(quantifier-free) and quantifier rules.

Structural rules: Weakening:

Γ ⇒ ∆                     Γ ⇒ ∆
---------- WL    and    ---------- WR
α, Γ ⇒ ∆                 Γ ⇒ ∆, α

where α is called the weakening wff. A series of weakening inferences will often be indicated by double inference lines.

Cut:

Γ ⇒ ∆, α      α, Π ⇒ Λ
------------------------ cut
Γ, Π ⇒ ∆, Λ

Logical rules: The rules are named by the main operator of the principal wff of the inference (the wff containing α and/or β in the lower sequent). The designations “left” and “right” indicate whether the logical symbol has been introduced in an antecedent wff or a succedent wff (to the left or to the right of the sequent symbol).

Propositional Rules:

Γ ⇒ ∆, α                α, Γ ⇒ ∆
----------- ¬L          ----------- ¬R
¬α, Γ ⇒ ∆               Γ ⇒ ∆, ¬α

α, Γ ⇒ ∆                β, Γ ⇒ ∆
------------- ∧L        ------------- ∧L
α ∧ β, Γ ⇒ ∆            α ∧ β, Γ ⇒ ∆

Γ ⇒ ∆, α      Γ ⇒ ∆, β
------------------------ ∧R
Γ ⇒ ∆, α ∧ β

α, Γ ⇒ ∆      β, Γ ⇒ ∆
------------------------ ∨L
α ∨ β, Γ ⇒ ∆

Γ ⇒ ∆, α                Γ ⇒ ∆, β
------------- ∨R        ------------- ∨R
Γ ⇒ ∆, α ∨ β            Γ ⇒ ∆, α ∨ β

Γ ⇒ ∆, α      β, Π ⇒ Λ
------------------------ →L
α → β, Γ, Π ⇒ ∆, Λ

α, Γ ⇒ ∆, β
-------------- →R
Γ ⇒ ∆, α → β

Quantifier Rules:

α(t), Γ ⇒ ∆                 Γ ⇒ ∆, α(a)
---------------- ∀L         ---------------- ∀R
∀x α(x), Γ ⇒ ∆              Γ ⇒ ∆, ∀x α(x)

where t is a ground term (i.e., one without variables), and a is a constant which does not occur anywhere in the lower sequent of the ∀R rule. We call a the eigenvariable of the ∀R inference.

α(a), Γ ⇒ ∆                 Γ ⇒ ∆, α(t)
---------------- ∃L         ---------------- ∃R
∃x α(x), Γ ⇒ ∆              Γ ⇒ ∆, ∃x α(x)


where t is a ground term, and a is a constant which does not occur in the lowersequent of the ∃L rule. We call a the eigenvariable of the ∃L inference.

The condition that an eigenvariable not occur in the upper sequent of the∀R or ∃L inference is called the eigenvariable condition.

We use the term “eigenvariable” even though a in the above rules is a constant. This is for historical reasons.

In ∃R and ∀L there are no restrictions, and the term t can be anything, so we do not have to worry about any conditions. However, because the t may appear elsewhere in the sequent, the values of t for which the sequent is satisfied are constrained. On the other hand, in the ∃L and ∀R rules, the eigenvariable condition requires that a does not occur anywhere else in the sequent. Thus, if the upper sequent is valid, the truth values of the wffs other than α(a) are independent of a.

Definition 31C (Initial Sequent) An initial sequent is a sequent of theform α⇒ α for any sentence α in the language.

Definition 31D (LK derivation) An LK-derivation of a sequent S is a tree of sequents satisfying the following conditions:

1. The topmost sequents of the tree are initial sequents.

2. Every sequent in the tree (except S) is an upper sequent of an inferencewhose lower sequent stands directly below that sequent in the tree.

We then say that S is the end-sequent of the derivation and that S is derivablein LK (or LK-derivable).

Definition 31E (LK theorem) A sentence α is a theorem of LK if thesequent ⇒ α is LK-derivable.

3.2 Examples of Derivations

Example 3.2.1. Give an LK-derivation for the sequent α ∧ β ⇒ α.

We begin by writing the desired end-sequent at the bottom of the derivation.

α ∧ β ⇒ α

Next, we need to figure out what kind of inference could have a lower sequentof this form. This could be a structural rule, but it is a good idea to startby looking for a logical rule. The only logical connective occurring in a wff inthe lower sequent is ∧, so we’re looking for an ∧ rule, and since the ∧ symboloccurs in the antecedent wffs, we’re looking at the ∧L rule.

---------- ∧L
α ∧ β ⇒ α

There are two options for what could have been the upper sequent of the ∧Linference: we could have an upper sequent of α ⇒ α, or of β ⇒ α. Clearly,α⇒ α is an initial sequent (which is a good thing), while β ⇒ α is not derivablein general. We fill in the upper sequent:


α ⇒ α
---------- ∧L
α ∧ β ⇒ α

We now have a correct LK-derivation of the sequent α ∧ β ⇒ α.

Example 3.2.2. Give an LK-derivation for the sequent ¬α ∨ β ⇒ α → β.

Begin by writing the desired end-sequent at the bottom of the derivation.

¬α ∨ β ⇒ α→ β

To find a logical rule that could give us this end-sequent, we look at the logical connectives in the end-sequent: ¬, ∨, and →. We only care at the moment about ∨ and →, because they are the main operators of sentences in the end-sequent, while ¬ is inside the scope of another connective, so we will take care of it later. Our options for logical rules for the final inference are therefore the ∨L rule and the →R rule. We could pick either rule, really, but let’s pick the →R rule (if for no reason other than that it allows us to put off splitting into two branches). According to the form of →R inferences which can yield the lower sequent, this must look like:

α, ¬α ∨ β ⇒ β
---------------- →R
¬α ∨ β ⇒ α → β

Now we can apply the ∨L rule. According to the schema, this must split intotwo upper sequents as follows:

α, ¬α ⇒ β      α, β ⇒ β
------------------------- ∨L
α, ¬α ∨ β ⇒ β
---------------- →R
¬α ∨ β ⇒ α → β

Remember that we are trying to wind our way up to initial sequents; we seem to be pretty close! The right branch is just one weakening away from an initial sequent, and then it is done:

               β ⇒ β
               ---------- WL
α, ¬α ⇒ β      α, β ⇒ β
------------------------- ∨L
α, ¬α ∨ β ⇒ β
---------------- →R
¬α ∨ β ⇒ α → β

Now looking at the left branch, the only logical connective in any sentenceis the ¬ symbol in the antecedent sentences, so we’re looking at an instance ofthe ¬L rule.

α ⇒ β, α
---------- ¬L
α, ¬α ⇒ β      β ⇒ β
               ---------- WL
               α, β ⇒ β
------------------------- ∨L
α, ¬α ∨ β ⇒ β
---------------- →R
¬α ∨ β ⇒ α → β

Similarly to how we finished off the right branch, we are just one weakeningaway from finishing off this left branch as well.


α ⇒ α
---------- WR
α ⇒ β, α
---------- ¬L
α, ¬α ⇒ β      β ⇒ β
               ---------- WL
               α, β ⇒ β
------------------------- ∨L
α, ¬α ∨ β ⇒ β
---------------- →R
¬α ∨ β ⇒ α → β

Example 3.2.3. Give an LK-derivation of the sequent ¬α ∨ ¬β ⇒ ¬(α ∧ β).

Using the techniques from above, we start by writing the desired end-sequent at the bottom.

¬α ∨ ¬β ⇒ ¬(α ∧ β)

The available main connectives of sentences in the end-sequent are the ∨ symboland the ¬ symbol. It would work to apply either the ∨L or the ¬R rule here,but we start with the ¬R rule because it avoids splitting up into two branchesfor a moment:

α ∧ β, ¬α ∨ ¬β ⇒
-------------------- ¬R
¬α ∨ ¬β ⇒ ¬(α ∧ β)

Now we have a choice of whether to look at the ∧L or the ∨L rule. Let’s see what happens when we apply the ∧L rule: we have a choice to start with either the sequent α, ¬α ∨ ¬β ⇒ or the sequent β, ¬α ∨ ¬β ⇒ . Since the proof is symmetric with regard to α and β, let’s go with the former:

α, ¬α ∨ ¬β ⇒
-------------------- ∧L
α ∧ β, ¬α ∨ ¬β ⇒
-------------------- ¬R
¬α ∨ ¬β ⇒ ¬(α ∧ β)

Continuing to fill in the derivation, we see that we run into a problem:

α ⇒ α             ?
---------- ¬L     α ⇒ β
α, ¬α ⇒           ---------- ¬L
                  α, ¬β ⇒
----------------------------- ∨L
α, ¬α ∨ ¬β ⇒
-------------------- ∧L
α ∧ β, ¬α ∨ ¬β ⇒
-------------------- ¬R
¬α ∨ ¬β ⇒ ¬(α ∧ β)

The top of the right branch cannot be reduced any further, and it cannot be brought by way of structural inferences to an initial sequent, so this is not the right path to take. Clearly, it was a mistake to apply the ∧L rule above. Going back to what we had before and carrying out the ∨L rule instead, we get

α ∧ β, ¬α ⇒      α ∧ β, ¬β ⇒
------------------------------- ∨L
α ∧ β, ¬α ∨ ¬β ⇒
-------------------- ¬R
¬α ∨ ¬β ⇒ ¬(α ∧ β)


Completing each branch as we’ve done before, we get

α ⇒ α              β ⇒ β
---------- ∧L      ---------- ∧L
α ∧ β ⇒ α          α ∧ β ⇒ β
------------ ¬L    ------------ ¬L
α ∧ β, ¬α ⇒        α ∧ β, ¬β ⇒
------------------------------- ∨L
α ∧ β, ¬α ∨ ¬β ⇒
-------------------- ¬R
¬α ∨ ¬β ⇒ ¬(α ∧ β)

(We could have carried out the ∧ rules lower than the ¬ rules in these stepsand still obtained a correct derivation).

Example 3.2.4. Give an LK-derivation of the sequent ∃x ¬α(x) ⇒ ¬∀x α(x).

When dealing with quantifiers, we have to make sure not to violate the eigenvariable condition, and sometimes this requires us to play around with the order of carrying out certain inferences. In general, it helps to try to take care of rules subject to the eigenvariable condition first (they will be lower down in the finished proof). It is also a good idea to look ahead and try to guess what the initial sequent might look like. In our case, it will have to be something like α(a) ⇒ α(a). That means that when we are “reversing” the quantifier rules, we will have to pick the same term (what we will call a) for both the ∀ and the ∃ rule. If we picked different terms for each rule, we would end up with something like α(a) ⇒ α(b), which, of course, is not derivable.

Starting as usual, we write

∃x¬α(x) ⇒ ¬∀xα(x)

We could either carry out the ∃L rule or the ¬R rule. Since the ∃L rule issubject to the eigenvariable condition, it’s a good idea to take care of it soonerrather than later, so we’ll do that one first.

¬α(a) ⇒ ¬∀x α(x)
--------------------- ∃L
∃x ¬α(x) ⇒ ¬∀x α(x)

Applying the ¬L and ¬R rules and then eliminating the double ¬ signs on both sides (the reader may do this as an exercise), we get

∀x α(x) ⇒ α(a)
=====================
¬α(a) ⇒ ¬∀x α(x)
--------------------- ∃L
∃x ¬α(x) ⇒ ¬∀x α(x)

At this point, our only option is to carry out the ∀L rule. Since this rule is notsubject to the eigenvariable restriction, we’re in the clear. Remember, we wantto try and obtain an initial sequent (of the form α(a) ⇒ α(a)), so we shouldchoose a as our argument for α when we apply the rule.


α(a) ⇒ α(a)
----------------- ∀L
∀x α(x) ⇒ α(a)
=====================
¬α(a) ⇒ ¬∀x α(x)
--------------------- ∃L
∃x ¬α(x) ⇒ ¬∀x α(x)

It is important, especially when dealing with quantifiers, to double check atthis point that the eigenvariable condition has not been violated. Since theonly rule we applied that is subject to the eigenvariable condition was ∃L, andthe eigenvariable a does not occur in its lower sequent (the end-sequent), thisis a correct derivation.

3.3 Proof-Theoretic Notions

Just as we’ve defined a number of important semantic notions (validity, entailment, satisfiability), we now define corresponding proof-theoretic notions. These are not defined by appeal to satisfaction of sentences in structures, but by appeal to the derivability or non-derivability of certain sequents. It was an important discovery, due to Gödel, that these notions coincide. That they do is the content of the completeness theorem.

Definition 33A (Theorems) A sentence α is a theorem if there is a derivation in LK of the sequent ⇒ α. We write ⊢LK α if α is a theorem and ⊬LK α if it is not.

Definition 33B (Derivability) A sentence α is derivable from a set of sentences Γ, Γ ⊢LK α, if there is a finite subset Γ0 ⊆ Γ such that LK derives Γ0 ⇒ α. If α is not derivable from Γ, we write Γ ⊬LK α.

Definition 33C (Consistency) A set of sentences Γ is consistent iff Γ ⊬LK ⊥. If Γ is not consistent, i.e., if Γ ⊢LK ⊥, we say it is inconsistent.

Proposition 33D Γ ⊢LK α iff Γ ∪ {¬α} is inconsistent.

Proof. Exercise.

Proposition 33E Γ is inconsistent iff Γ ⊢LK α for every sentence α.

Proof. Exercise.

Proposition 33F Γ ⊢LK α iff for some finite Γ0 ⊆ Γ, Γ0 ⊢LK α.

Proof. Follows immediately from the definition of ⊢LK.


3.4 Properties of Derivability

We will now establish a number of properties of the derivability relation. Theyare independently interesting, but each will play a role in the proof of thecompleteness theorem.

Proposition 34A (Monotony) If Γ ⊆ ∆ and Γ ⊢LK α, then ∆ ⊢LK α.

Proof. Any finite Γ0 ⊆ Γ is also a finite subset of ∆, so a derivation of Γ0 ⇒ α also shows ∆ ⊢LK α.

Proposition 34B If Γ ⊢LK α and Γ ∪ {α} ⊢LK ⊥, then Γ is inconsistent.

Proof. There are finite Γ0, Γ1 ⊆ Γ such that LK derives Γ0 ⇒ α and Γ1, α ⇒ ⊥. Let the LK-derivation of Γ0 ⇒ α be Π0 and the LK-derivation of Γ1, α ⇒ ⊥ be Π1. We can then derive

   Π0                Π1

Γ0 ⇒ α          Γ1, α ⇒ ⊥
---------------------------- cut
Γ0, Γ1 ⇒ ⊥

Since Γ0 ⊆ Γ and Γ1 ⊆ Γ, Γ0 ∪ Γ1 ⊆ Γ, hence Γ ⊢LK ⊥.

Proposition 34C If Γ ∪ {α} ⊢LK ⊥, then Γ ⊢LK ¬α.

Proof. Suppose that Γ ∪ {α} ⊢LK ⊥. Then there is a finite set Γ0 ⊆ Γ such that LK derives Γ0, α ⇒ ⊥. Let Π0 be an LK-derivation of Γ0, α ⇒ ⊥, and consider

   Π0

Γ0, α ⇒ ⊥
------------- ¬R
Γ0 ⇒ ⊥, ¬α          ⊥ ⇒
---------------------------- cut
Γ0 ⇒ ¬α

Proposition 34D If Γ ∪ {α} ⊢LK ⊥ and Γ ∪ {¬α} ⊢LK ⊥, then Γ ⊢LK ⊥.

Proof. There are finite sets Γ0 ⊆ Γ and Γ1 ⊆ Γ and LK-derivations Π0 and Π1 of Γ0, α ⇒ ⊥ and Γ1, ¬α ⇒ ⊥, respectively. We can then derive

   Π0

Γ0, α ⇒ ⊥               Π1
------------- ¬R
Γ0 ⇒ ⊥, ¬α          Γ1, ¬α ⇒ ⊥
--------------------------------- cut
Γ0, Γ1 ⇒ ⊥


Since Γ0 ⊆ Γ and Γ1 ⊆ Γ, Γ0 ∪ Γ1 ⊆ Γ. Hence Γ ⊢LK ⊥.

Proposition 34E If Γ ⊢LK α and Γ ⊢LK α → β, then Γ ⊢LK β.

Proof. Suppose that Γ ⊢LK α and Γ ⊢LK α → β. There are finite sets Γ0, Γ1 ⊆ Γ such that there are LK-derivations Π0 of Γ0 ⇒ α and Π1 of Γ1 ⇒ α → β. Consider:

                       Π0

   Π1              Γ0 ⇒ α          β ⇒ β
                   ------------------------ →L
Γ1 ⇒ α → β         α → β, Γ0 ⇒ β
------------------------------------ cut
Γ0, Γ1 ⇒ β

Since Γ0 ∪ Γ1 ⊆ Γ, this means that Γ ⊢LK β.

Proposition 34F If Γ ⊢LK ¬α or Γ ⊢LK β, then Γ ⊢LK α → β.

Proof. First suppose Γ ⊢LK ¬α. Then for some finite Γ0 ⊆ Γ there is an LK-derivation Π0 of Γ0 ⇒ ¬α. The following derivation shows that Γ ⊢LK α → β:

                   α ⇒ α
                   ---------- ¬L
                   ¬α, α ⇒
                   ------------ WR
   Π0              α, ¬α ⇒ β
                   --------------- →R
Γ0 ⇒ ¬α           ¬α ⇒ α → β
------------------------------- cut
Γ0 ⇒ α → β

Now suppose Γ ⊢LK β. Then for some finite Γ0 ⊆ Γ there is an LK-derivation Π0 of Γ0 ⇒ β. The following derivation shows that Γ ⊢LK α → β:

                   β ⇒ β
                   ---------- WL
   Π0              α, β ⇒ β
                   -------------- →R
Γ0 ⇒ β            β ⇒ α → β
------------------------------ cut
Γ0 ⇒ α → β

Theorem 34G If c is a constant not occurring in Γ or α(x) and Γ ⊢LK α(c), then Γ ⊢LK ∀x α(x).

Proof. Let Π0 be an LK-derivation of Γ0 ⇒ α(c) for some finite Γ0 ⊆ Γ. By adding a ∀R inference, we obtain a derivation of Γ0 ⇒ ∀x α(x), since c does not occur in Γ or α(x) and thus the eigenvariable condition is satisfied.

Theorem 34H If Γ ⊢LK ∀x α(x), then Γ ⊢LK α(t) for any ground term t.


Proof. Suppose Γ ⊢LK ∀x α(x). Then there is a finite Γ0 ⊆ Γ and an LK-derivation Π of Γ0 ⇒ ∀x α(x). Then

                     α(t) ⇒ α(t)
      Π              ----------------- ∀L
                     ∀x α(x) ⇒ α(t)
Γ0 ⇒ ∀x α(x)
----------------------------------------- cut
Γ0 ⇒ α(t)

shows that Γ ⊢LK α(t).

3.5 Soundness

A derivation system, such as the sequent calculus, is sound if it cannot derive things that do not actually hold. Soundness is thus a kind of guaranteed safety property for derivation systems. Depending on which proof-theoretic property is in question, we would like to know, for instance, that

1. every derivable sentence is valid;

2. if a sentence is derivable from some others, it is also a consequence ofthem;

3. if a set of sentences is inconsistent, it is unsatisfiable.

These are important properties of a derivation system. If any of them do not hold, the derivation system is deficient: it would derive too much. Consequently, establishing the soundness of a derivation system is of the utmost importance.

Because all these proof-theoretic properties are defined via derivability of certain sequents in the sequent calculus, proving (1)–(3) above requires proving something about the semantic properties of derivable sequents. We will first define what it means for a sequent to be valid, and then show that every derivable sequent is valid. (1)–(3) then follow as corollaries from this result.

Definition 35A A structure A satisfies a sequent Γ ⇒ ∆ iff either ⊭A φ for some φ ∈ Γ or ⊨A φ for some φ ∈ ∆.

A sequent is valid iff every structure A satisfies it.
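In the propositional case, validity of a sequent can be checked directly by truth tables. A sketch of our own (wffs are modeled as functions from truth assignments to booleans, an encoding we choose for illustration):

```python
from itertools import product

# Propositional analogue of sequent validity: Γ ⇒ ∆ is valid iff under
# every truth assignment, some wff in Γ is false or some wff in ∆ is true.
def valid(gamma, delta, atoms):
    assignments = (dict(zip(atoms, bits))
                   for bits in product([False, True], repeat=len(atoms)))
    return all(any(not g(v) for g in gamma) or any(d(v) for d in delta)
               for v in assignments)

a = lambda v: v["a"]
conj = lambda v: v["a"] and v["b"]

assert valid([conj], [a], ["a", "b"])       # α ∧ β ⇒ α is valid
assert not valid([a], [conj], ["a", "b"])   # α ⇒ α ∧ β is not
```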

Theorem 35B (Soundness) If LK derives Γ ⇒ ∆, then Γ ⇒ ∆ is valid.

Proof. Let Π be a derivation of Γ ⇒ ∆. We proceed by induction on the number of inferences in Π.

If the number of inferences is 0, then Π consists only of an initial sequent. Every initial sequent α ⇒ α is obviously valid, since for every A, either ⊭A α or ⊨A α.

If the number of inferences is greater than 0, we distinguish cases according to the type of the lowermost inference. By induction hypothesis, we can assume that the premises of that inference are valid, since they are derived by fewer inferences.


First, we consider the possible inferences with only one premise Γ ′ ⇒ ∆′.

1. The last inference is a weakening. Then Γ′ ⊆ Γ and ∆ = ∆′ if it’s a weakening on the left, or Γ = Γ′ and ∆′ ⊆ ∆ if it’s a weakening on the right. In either case, ∆′ ⊆ ∆ and Γ′ ⊆ Γ. If ⊭A φ for some φ ∈ Γ′, then, since Γ′ ⊆ Γ, φ ∈ Γ as well, and so ⊭A φ for the same φ ∈ Γ. Similarly, if ⊨A φ for some φ ∈ ∆′, then, as φ ∈ ∆, ⊨A φ for some φ ∈ ∆. Since Γ′ ⇒ ∆′ is valid, one of these cases obtains for every A. Consequently, Γ ⇒ ∆ is valid.

2. The last inference is ¬L: Then for some α ∈ ∆′, ¬α ∈ Γ. Also, Γ′ ⊆ Γ, and ∆′ \ {α} ⊆ ∆.

If ⊨A α, then ⊭A ¬α, and since ¬α ∈ Γ, A satisfies Γ ⇒ ∆. Since Γ′ ⇒ ∆′ is valid, if ⊭A α, then either ⊭A φ for some φ ∈ Γ′ or ⊨A φ for some φ ∈ ∆′ different from α. Consequently, ⊭A φ for some φ ∈ Γ (since Γ′ ⊆ Γ) or ⊨A φ for some φ ∈ ∆ (since ∆′ \ {α} ⊆ ∆). In either case, A satisfies Γ ⇒ ∆.

3. The last inference is ¬R: Exercise.

4. The last inference is ∧L: There are two variants: α ∧ β may be inferred on the left from α or from β on the left side of the premise. In the first case, α ∈ Γ′. Consider a structure A. Since Γ′ ⇒ ∆′ is valid, (a) ⊭A α, (b) ⊭A φ for some φ ∈ Γ′ \ {α}, or (c) ⊨A φ for some φ ∈ ∆′. In case (a), ⊭A α ∧ β. In case (b), there is a φ ∈ Γ \ {α ∧ β} such that ⊭A φ, since Γ′ \ {α} ⊆ Γ \ {α ∧ β}. In case (c), there is a φ ∈ ∆ such that ⊨A φ, as ∆ = ∆′. So in each case, A satisfies Γ ⇒ ∆. Since A was arbitrary, Γ ⇒ ∆ is valid. The case where α ∧ β is inferred from β is handled the same, changing α to β.

5. The last inference is ∨R: There are two variants: α ∨ β may be inferred on the right from α or from β on the right side of the premise. In the first case, α ∈ ∆′. Consider a structure A. Since Γ′ ⇒ ∆′ is valid, (a) ⊨A α, (b) ⊭A φ for some φ ∈ Γ′, or (c) ⊨A φ for some φ ∈ ∆′ \ {α}. In case (a), ⊨A α ∨ β. In case (b), there is a φ ∈ Γ such that ⊭A φ, as Γ = Γ′. In case (c), there is a φ ∈ ∆ such that ⊨A φ, since ∆′ \ {α} ⊆ ∆. So in each case, A satisfies Γ ⇒ ∆. Since A was arbitrary, Γ ⇒ ∆ is valid. The case where α ∨ β is inferred from β is handled the same, changing α to β.

6. The last inference is → R: Then α ∈ Γ′, β ∈ ∆′, Γ′ \ {α} ⊆ Γ and ∆′ \ {β} ⊆ ∆. Since Γ′ ⇒ ∆′ is valid, for any structure A, (a) ⊭A α, (b) ⊨A β, (c) ⊭A φ for some φ ∈ Γ′ \ {α}, or (d) ⊨A φ for some φ ∈ ∆′ \ {β}. In cases (a) and (b), ⊨A α → β. In case (c), for some φ ∈ Γ, ⊭A φ. In case (d), for some φ ∈ ∆, ⊨A φ. In each case, A satisfies Γ ⇒ ∆. Since A was arbitrary, Γ ⇒ ∆ is valid.

7. The last inference is ∀L: Then there is a wff α(x) and a ground term t such that α(t) ∈ Γ′, ∀x α(x) ∈ Γ, and Γ′ \ {α(t)} ⊆ Γ. Consider a structure A. Since Γ′ ⇒ ∆′ is valid, (a) ⊭A α(t), (b) ⊭A φ for some φ ∈ Γ′ \ {α(t)}, or (c) ⊨A φ for some φ ∈ ∆′. In case (a), ⊭A ∀x α(x). In case (b), there is a φ ∈ Γ such that ⊭A φ, since Γ′ \ {α(t)} ⊆ Γ. In case (c), there is a φ ∈ ∆ such that ⊨A φ, as ∆ = ∆′. So in each case, A satisfies Γ ⇒ ∆. Since A was arbitrary, Γ ⇒ ∆ is valid.

8. The last inference is ∃R: Exercise.

9. The last inference is ∀R: Then there is a wff α(x) and a constant symbol a such that α(a) ∈ ∆′, ∀x α(x) ∈ ∆, and ∆′ \ {α(a)} ⊆ ∆. Furthermore, a does not occur in Γ or ∆. Consider a structure A. Since Γ′ ⇒ ∆′ is valid, (a) ⊨A α(a), (b) ⊭A φ for some φ ∈ Γ′, or (c) ⊨A φ for some φ ∈ ∆′ \ {α(a)}.

First, suppose (a) is the case but neither (b) nor (c), i.e., ⊨A φ for all φ ∈ Γ′ and ⊭A φ for all φ ∈ ∆′ \ {α(a)}. In other words, assume ⊨A α(a) and that A does not satisfy Γ′ ⇒ ∆′ \ {α(a)}. Since a does not occur in Γ or ∆, a also does not occur in Γ′ ∪ (∆′ \ {α(a)}). Thus, if A′ is like A except that a^A′ ≠ a^A, A′ also does not satisfy Γ′ ⇒ ∆′ \ {α(a)}, by extensionality. But since Γ′ ⇒ ∆′ is valid, we must have ⊨A′ α(a).

We now show that ⊨A ∀x α(x). To do this, we have to show that for every variable assignment s, ⊨A ∀x α(x)[s]. This in turn means that for every x-variant s′ of s, we must have ⊨A α(x)[s′]. So consider any variable assignment s and let s′ be an x-variant of s. Since Γ′ and ∆′ consist entirely of sentences, ⊨A φ[s] iff ⊨A φ[s′] iff ⊨A φ for all φ ∈ Γ′ ∪ ∆′. Let A′ be like A except that a^A′ = s′(x). Then ⊨A α(x)[s′] iff ⊨A′ α(a) (as α(x) does not contain a). Since we've already established that ⊨A′ α(a) for all A′ which differ from A at most in what they assign to a, this means that ⊨A α(x)[s′]. Thus we've shown that ⊨A ∀x α(x)[s]. Since s is an arbitrary variable assignment and ∀x α(x) is a sentence, ⊨A ∀x α(x).

If (b) is the case, there is a φ ∈ Γ such that ⊭A φ, as Γ = Γ′. If (c) is the case, there is a φ ∈ ∆ such that ⊨A φ, since ∆′ \ {α(a)} ⊆ ∆. So in each case, A satisfies Γ ⇒ ∆. Since A was arbitrary, Γ ⇒ ∆ is valid.

10. The last inference is ∃L: Exercise.

Now let’s consider the possible inferences with two premises.

1. The last inference is a cut: Suppose the premises are Γ′ ⇒ ∆′ and Π′ ⇒ Λ′ and the cut formula α is in both ∆′ and Π′. Since each is valid, every structure A satisfies both premises. We distinguish two cases: (a) ⊭A α and (b) ⊨A α. In case (a), in order for A to satisfy the left premise, it must satisfy Γ′ ⇒ ∆′ \ {α}. But Γ′ ⊆ Γ and ∆′ \ {α} ⊆ ∆, so A also satisfies Γ ⇒ ∆. In case (b), in order for A to satisfy the right premise, it must satisfy Π′ \ {α} ⇒ Λ′. But Π′ \ {α} ⊆ Γ and Λ′ ⊆ ∆, so A also satisfies Γ ⇒ ∆.


2. The last inference is ∧R. The premises are Γ ⇒ ∆′ and Γ ⇒ ∆′′, where α ∈ ∆′ and β ∈ ∆′′. By induction hypothesis, both are valid. Consider a structure A. We have two cases: (a) ⊭A α ∧ β or (b) ⊨A α ∧ β. In case (a), either ⊭A α or ⊭A β. In the former case, in order for A to satisfy Γ ⇒ ∆′, it must already satisfy Γ ⇒ ∆′ \ {α}. In the latter case, it must satisfy Γ ⇒ ∆′′ \ {β}. But since both ∆′ \ {α} ⊆ ∆ and ∆′′ \ {β} ⊆ ∆, that means A satisfies Γ ⇒ ∆. In case (b), A satisfies Γ ⇒ ∆ since α ∧ β ∈ ∆.

3. The last inference is ∨L: Exercise.

4. The last inference is → L. The premises are Γ ⇒ ∆′ and Γ′ ⇒ ∆, where α ∈ ∆′ and β ∈ Γ′. By induction hypothesis, both are valid. Consider a structure A. We have two cases: (a) ⊨A α → β or (b) ⊭A α → β. In case (a), either ⊭A α or ⊨A β. In the former case, in order for A to satisfy Γ ⇒ ∆′, it must already satisfy Γ ⇒ ∆′ \ {α}. In the latter case, it must satisfy Γ′ \ {β} ⇒ ∆. But since both ∆′ \ {α} ⊆ ∆ and Γ′ \ {β} ⊆ Γ, that means A satisfies Γ ⇒ ∆. In case (b), A satisfies Γ ⇒ ∆ since α → β ∈ Γ.
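The inductive step for a two-premise rule like ∧R can be spot-checked mechanically in the propositional case. The following sketch (formulas as Boolean functions over hypothetical atoms a, b and a side formula g; not part of the text's formal development) verifies that any valuation satisfying both premise sequents of ∧R also satisfies the conclusion:

```python
from itertools import product

# A sequent Γ ⇒ ∆ is satisfied by a valuation v iff some formula in Γ
# is false under v or some formula in ∆ is true under v.
def satisfies(v, gamma, delta):
    return any(not f(v) for f in gamma) or any(f(v) for f in delta)

# Hypothetical atomic formulas and a side formula g.
a = lambda v: v['a']
b = lambda v: v['b']
g = lambda v: v['g']
conj = lambda v: v['a'] and v['b']

atoms = ['a', 'b', 'g']
for values in product([False, True], repeat=len(atoms)):
    v = dict(zip(atoms, values))
    # Premises of ∧R: g ⇒ a and g ⇒ b; conclusion: g ⇒ a ∧ b.
    if satisfies(v, [g], [a]) and satisfies(v, [g], [b]):
        assert satisfies(v, [g], [conj])
```

Exhausting all valuations is only possible propositionally; for the first-order case the semantic argument given in the text is needed.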

Corollary 3.5.3 If ⊢LK α, then α is valid.

Corollary 3.5.4 If Γ ⊢LK α, then Γ ⊨ α.

Proof. If Γ ⊢LK α, then for some finite subset Γ0 ⊆ Γ, there is a derivation of Γ0 ⇒ α. By Theorem 35B, every structure A either makes some β ∈ Γ0 false or makes α true. Hence, if ⊨A Γ then also ⊨A α.

Corollary 3.5.5 If Γ is satisfiable, then it is consistent.

Proof. We prove the contrapositive. Suppose that Γ is not consistent. Then Γ ⊢LK ⊥, i.e., there is a finite Γ0 ⊆ Γ and a derivation of Γ0 ⇒ ⊥. By Theorem 35B, Γ0 ⇒ ⊥ is valid. Since ⊭A ⊥ for every structure A, for A to satisfy Γ0 ⇒ ⊥ there must be a φ ∈ Γ0 such that ⊭A φ, and since Γ0 ⊆ Γ, that φ is also in Γ. In other words, no A satisfies Γ, i.e., Γ is not satisfiable.

3.6 Derivations with the Equality Symbol

Derivations with the equality symbol require additional inference rules.

Initial sequents for =: If t is a closed term, then ⇒ t = t is an initial sequent.


Rules for =:

    Γ, t1 = t2 ⇒ ∆, α(t1)              Γ, t1 = t2 ⇒ ∆, α(t2)
    ───────────────────── =    and     ───────────────────── =
    Γ, t1 = t2 ⇒ ∆, α(t2)              Γ, t1 = t2 ⇒ ∆, α(t1)

where t1 and t2 are closed terms.

Example 3.6.1. If s and t are ground terms, then α(s), s = t ⊢LK α(t):

         α(s) ⇒ α(s)
    ────────────────── WL
    α(s), s = t ⇒ α(s)
    ────────────────── =
    α(s), s = t ⇒ α(t)

This may be familiar as the principle of substitutability of identicals, or Leibniz' Law.
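The same principle can be checked in a proof assistant; a minimal Lean 4 sketch, with a hypothetical predicate p standing in for α:

```lean
-- Substitutability of identicals: from α(s) and s = t, infer α(t).
example (U : Type) (p : U → Prop) (s t : U)
    (h1 : p s) (h2 : s = t) : p t :=
  h2 ▸ h1  -- rewrite s to t in h1 using the equation h2
```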

LK proves that = is symmetric and transitive:

        ⇒ t1 = t1                    t1 = t2 ⇒ t1 = t2
    ───────────────── WL        ────────────────────────── WL
    t1 = t2 ⇒ t1 = t1           t1 = t2, t2 = t3 ⇒ t1 = t2
    ───────────────── =         ────────────────────────── =
    t1 = t2 ⇒ t2 = t1           t1 = t2, t2 = t3 ⇒ t1 = t3

In the proof on the left, the wff x = t1 is our α(x). On the right, we take α(x) to be t1 = x.

Proposition 36B LK with initial sequents and rules for identity is sound.

Proof. Initial sequents of the form ⇒ t = t are valid, since for every structure A, ⊨A t = t. (Note that we assume the term t to be ground, i.e., it contains no variables, so variable assignments are irrelevant.)

Suppose the last inference in a derivation is =. Then the premise Γ′ ⇒ ∆′ contains t1 = t2 on the left and α(t1) on the right, and the conclusion is Γ ⇒ ∆ where Γ = Γ′ and ∆ = (∆′ \ {α(t1)}) ∪ {α(t2)}. Consider a structure A. Since, by induction hypothesis, the premise Γ′ ⇒ ∆′ is valid, either (a) for some φ ∈ Γ′, ⊭A φ, (b) for some φ ∈ ∆′ \ {α(t1)}, ⊨A φ, or (c) ⊨A α(t1). In both cases (a) and (b), since Γ = Γ′ and ∆′ \ {α(t1)} ⊆ ∆, A satisfies Γ ⇒ ∆. So assume cases (a) and (b) do not apply, but case (c) does. If (a) does not apply, ⊨A φ for all φ ∈ Γ′, in particular ⊨A t1 = t2. Therefore, t1^A = t2^A. Let s be any variable assignment, and s′ be the x-variant given by s′(x) = t1^A = t2^A. By Proposition 112C, ⊨A α(t1)[s] iff ⊨A α(x)[s′] iff ⊨A α(t2)[s]. Since ⊨A α(t1), therefore ⊨A α(t2). As α(t2) ∈ ∆, A satisfies Γ ⇒ ∆.


Chapter 4

Natural Deduction

4.1 Introduction

Logical systems commonly have not just a semantics, but also proof systems. The purpose of proof systems is to provide a purely syntactic method of establishing entailment and validity. They are purely syntactic in the sense that a derivation in such a system is a finite syntactic object, usually a sequence (or other finite arrangement) of wffs. Moreover, good proof systems have the property that any given sequence or arrangement of wffs can be verified mechanically to be a "correct" proof. The simplest (and historically first) proof systems for first-order logic were axiomatic. A sequence of wffs counts as a derivation in such a system if each individual wff in it is either among a fixed set of "axioms" or follows from wffs coming before it in the sequence by one of a fixed number of "inference rules"—and it can be mechanically verified whether a wff is an axiom and whether it follows correctly from other wffs by one of the inference rules. Axiomatic proof systems are easy to describe—and also easy to handle meta-theoretically—but derivations in them are hard to read and understand, and are also hard to produce.
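The mechanical checkability of axiomatic derivations can be illustrated with a small sketch. The axiom set below is a hypothetical toy, not a complete calculus; wffs are plain strings and the only inference rule is modus ponens:

```python
# Check a Hilbert-style derivation: every line must be an axiom or
# follow from two earlier lines a and "a -> b" by modus ponens.
def check_derivation(lines, axioms):
    for i, wff in enumerate(lines):
        if wff in axioms:
            continue
        earlier = lines[:i]
        if any(f"{a} -> {wff}" in earlier for a in earlier):
            continue  # wff follows by modus ponens
        return False  # line is neither an axiom nor an inference
    return True

axioms = {"p", "p -> q", "q -> r"}
assert check_derivation(["p", "p -> q", "q", "q -> r", "r"], axioms)
assert not check_derivation(["q"], axioms)  # q is not justified
```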

Other proof systems have been developed with the aim of making it easier to construct derivations or easier to understand derivations once they are complete. Examples are truth trees, also known as tableaux proofs, and the sequent calculus. Some proof systems are designed especially with mechanization in mind, e.g., the resolution method is easy to implement in software (but its derivations are essentially impossible to understand). Most of these other proof systems represent derivations as trees of wffs rather than sequences. This makes it easier to see which parts of a derivation depend on which other parts.

The proof system we will study is Gentzen's natural deduction. Natural deduction is intended to mirror actual reasoning (especially the kind of regimented reasoning employed by mathematicians). Actual reasoning proceeds by a number of "natural" patterns. For instance, proof by cases allows us to establish a conclusion on the basis of a disjunctive premise, by establishing that the conclusion follows from either of the disjuncts. Indirect proof allows us to establish a conclusion by showing that its negation leads to a contradiction. Conditional proof establishes a conditional claim "if . . . then . . . " by showing that the consequent follows from the antecedent. Natural deduction is a formalization of some of these natural inferences. Each of the logical connectives and quantifiers comes with two rules, an introduction and an elimination rule, and they each correspond to one such natural inference pattern. For instance, → Intro corresponds to conditional proof, and ∨Elim to proof by cases.

One feature that distinguishes natural deduction from other proof systems is its use of assumptions. In almost every proof system a single wff is at the root of the tree of wffs—usually the conclusion—and the "leaves" of the tree are wffs from which the conclusion is derived. In natural deduction, some leaf wffs play a role inside the derivation but are "used up" by the time the derivation reaches the conclusion. This corresponds to the practice, in actual reasoning, of introducing hypotheses which only remain in effect for a short while. For instance, in a proof by cases, we assume the truth of each of the disjuncts; in conditional proof, we assume the truth of the antecedent; in indirect proof, we assume the truth of the negation of the conclusion. This way of introducing hypotheticals and then doing away with them in the service of establishing an intermediate step is a hallmark of natural deduction. The formulas at the leaves of a natural deduction derivation are called assumptions, and some of the rules of inference may "discharge" them. An assumption that remains undischarged at the end of the derivation is (usually) essential to the truth of the conclusion, and so a derivation establishes that its undischarged assumptions entail its conclusion.

For any proof system it is crucial to verify that it in fact does what it's supposed to: provide a way to verify that a sentence is entailed by some others. This is called soundness, and we will prove it for the natural deduction system we use. It is also crucial to verify the converse: that the proof system is strong enough to verify that Γ ⊨ α whenever this holds, i.e., that there is a derivation of α from Γ whenever Γ ⊨ α. This is called completeness—but it is much harder to prove.

4.2 Rules and Derivations

Let L be a first-order language with the usual constant symbols, variables, logical symbols, and auxiliary symbols (parentheses and the comma).

Definition 42A (Inference) An inference is an expression of the form

     α              α    β
    ───     or     ───────
     γ                γ

where α, β, and γ are wffs. α and β are called the upper wffs or premises and γ the lower wff or conclusion of the inference.

The rules for natural deduction are divided into two main types: propositional rules (quantifier-free) and quantifier rules. The rules come in pairs, an introduction and an elimination rule for each logical operator. They introduce a logical operator in the conclusion or remove a logical operator from a premise of the rule. Some of the rules allow an assumption of a certain type to be discharged. To indicate which assumption is discharged by which inference, we also assign labels to both the assumption and the inference. This is indicated by writing the assumption formula as "[α]n".

It is customary to consider rules for all logical operators, even for those (if any) that we consider as defined.

Propositional Rules

Rules for ⊥

     α    ¬α                 ⊥
    ───────── ⊥Intro        ─── ⊥Elim
        ⊥                    α

Rules for ∧

    α    β                α ∧ β               α ∧ β
    ────── ∧Intro         ───── ∧Elim         ───── ∧Elim
    α ∧ β                   α                   β

Rules for ∨

      α                 β                         [α]n   [β]n
    ───── ∨Intro      ───── ∨Intro                  ⋮      ⋮
    α ∨ β             α ∨ β              α ∨ β      γ      γ
                                        ───────────────────── n ∨Elim
                                                  γ

Rules for ¬

    [α]n
     ⋮
     ⊥                  ¬¬α
    ──── n ¬Intro       ──── ¬Elim
     ¬α                  α

Rules for →

    [α]n
     ⋮
     β                    α    α → β
    ────── n → Intro     ─────────── → Elim
    α → β                     β

Quantifier Rules

Rules for ∀

      α(a)                  ∀x α(x)
    ──────── ∀Intro         ─────── ∀Elim
    ∀x α(x)                   α(t)


where t is a ground term (a term that does not contain any variables), and a is a constant symbol which does not occur in α, or in any assumption which is undischarged in the derivation ending with the premise α(a). We call a the eigenvariable of the ∀Intro inference.

Rules for ∃

                              [α(a)]n
                                 ⋮
      α(t)            ∃x α(x)    γ
    ──────── ∃Intro   ───────────── n ∃Elim
    ∃x α(x)                 γ

where t is a ground term, and a is a constant which does not occur in the premise ∃x α(x), in γ, or in any assumption which is undischarged in the derivations ending with the two premises (other than the assumptions α(a)). We call a the eigenvariable of the ∃Elim inference.

The condition that the eigenvariable not occur in the upper wffs of the ∀Intro or ∃Elim inference is called the eigenvariable condition.

We use the term "eigenvariable" even though a in the above rules is a constant symbol. The reasons for this are historical.

In ∃Intro and ∀Elim there are no restrictions, and the term t can be anything, so we do not have to worry about any conditions. However, because the term t may appear elsewhere in the derivation, the values of t for which the wff is satisfied are constrained. On the other hand, in the ∃Elim and ∀Intro rules, the eigenvariable condition requires that a does not occur anywhere else. Thus, if the upper wff is valid, the truth values of the formulas other than α(a) are independent of a.

Natural deduction systems are meant to closely parallel the informal reasoning used in mathematical proof (hence it is somewhat "natural"). Natural deduction proofs begin with assumptions. Inference rules are then applied. Assumptions are "discharged" by the ¬Intro, → Intro, ∨Elim and ∃Elim inference rules, and the label of the discharged assumption is placed beside the inference for clarity.

Definition 42B (Initial Wff) An initial wff or assumption is any wff in the topmost position of any branch.

Definition 42C (Derivation) A derivation of a wff α from assumptions Γ is a tree of wffs satisfying the following conditions:

1. The topmost wffs of the tree are either in Γ or are discharged by an inference in the tree.

2. Every wff in the tree other than the end-wff is an upper wff of an inference whose lower wff stands directly below that wff in the tree.

We then say that α is the end-wff of the derivation and that α is derivable from Γ.


4.3 Examples of Derivations

Example 4.3.1. Let's give a derivation of the wff (α ∧ β) → α. We begin by writing the desired end-wff at the bottom of the derivation.

(α ∧ β) → α

Next, we need to figure out what kind of inference could result in a wff of this form. The main operator of the end-wff is →, so we'll try to arrive at the end-wff using the → Intro rule. It is best to write down the assumptions involved and label the inference rules as you progress, so it is easy to see whether all assumptions have been discharged at the end of the proof.

    [α ∧ β]1
        ⋮
        α
    ─────────── 1 → Intro
    (α ∧ β) → α

We now need to fill in the steps from the assumption α ∧ β to α. Since we only have one connective to deal with, ∧, we must use the ∧Elim rule. This gives us the following proof:

    [α ∧ β]1
    ──────── ∧Elim
        α
    ─────────── 1 → Intro
    (α ∧ β) → α

We now have a correct derivation of the formula (α ∧ β) → α.
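The derivation just constructed has a direct counterpart in Lean 4 (a sketch; the proof term plays the role of the tree above):

```lean
-- (α ∧ β) → α: assume α ∧ β (the assumption discharged by → Intro),
-- then apply ∧Elim.
example (α β : Prop) : α ∧ β → α :=
  fun h => h.left
```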

Example 4.3.2. Now let's give a derivation of the wff (¬α ∨ β) → (α → β). We begin by writing the desired end-wff at the bottom of the derivation.

(¬α ∨ β) → (α → β)

To find a logical rule that could give us this end-wff, we look at the logical connectives in the end-wff: ¬, ∨, and →. We only care at the moment about the first occurrence of → because it is the main operator of the end-wff, while ¬, ∨ and the second occurrence of → are inside the scope of another connective, so we will take care of those later. We therefore start with the → Intro rule. A correct application must look as follows:

    [¬α ∨ β]1
        ⋮
      α → β
    ────────────────── 1 → Intro
    (¬α ∨ β) → (α → β)

This leaves us with two possibilities to continue. Either we can keep working from the bottom up and look for another application of the → Intro rule, or we can work from the top down and apply a ∨Elim rule. Let us apply the latter. We will use the assumption ¬α ∨ β as the leftmost premise of ∨Elim. For a valid application of ∨Elim, the other two premises must be identical to the conclusion α → β, but each may be derived in turn from another assumption, namely one of the two disjuncts of ¬α ∨ β. So our derivation will look like this:

                  [¬α]2     [β]2
                    ⋮         ⋮
    [¬α ∨ β]1     α → β     α → β
    ───────────────────────────── 2 ∨Elim
               α → β
    ────────────────────── 1 → Intro
      (¬α ∨ β) → (α → β)

In each of the two branches on the right, we want to derive α → β, which is best done using → Intro.

                 [¬α]2, [α]3        [β]2, [α]4
                      ⋮                  ⋮
                      β                  β
                 ────── 3 → Intro   ────── 4 → Intro
    [¬α ∨ β]1    α → β              α → β
    ───────────────────────────────────── 2 ∨Elim
                  α → β
    ────────────────────── 1 → Intro
      (¬α ∨ β) → (α → β)

For the two missing parts of the derivation, we need derivations of β from ¬α and α in the middle branch, and from β and α in the right branch. Let's take the former first. ¬α and α are the two premises of ⊥Intro:

    [¬α]2    [α]3
    ───────────── ⊥Intro
          ⊥
          ⋮
          β

By using ⊥Elim, we can obtain β as a conclusion and complete the branch.

                 [¬α]2  [α]3
                 ─────────── ⊥Intro
                      ⊥
                     ──── ⊥Elim       [β]2, [α]4
                      β                    ⋮
                 ────── 3 → Intro          β
    [¬α ∨ β]1    α → β                ────── 4 → Intro
                                      α → β
    ─────────────────────────────────────── 2 ∨Elim
                  α → β
    ────────────────────── 1 → Intro
      (¬α ∨ β) → (α → β)

Let’s now look at the rightmost branch. Here it’s important to realizethat the definition of derivation allows assumptions to be discharged but does

Page 55: Contents - University of Calgarypeople.ucalgary.ca/~rzach/static/open-logic/courses/enderton/open... · [git] Branch: master@6af5838 Release: (2017-01-14) OLP: Example a la Enderton

[git] • Branch: master @ 6af5838 • Release: (2017-01-14)

4.3. EXAMPLES OF DERIVATIONS 55

not require them to be. In other words, if we can derive β from one of theassumptions α and β without using the other, that’s ok. And to derive β fromβ is trivial: β by itself is such a derivation, and no inferences are needed. Sowe can simply delete the assumtion α.

                 [¬α]2  [α]3
                 ─────────── ⊥Intro
                      ⊥
                     ──── ⊥Elim
                      β                [β]2
                 ────── 3 → Intro     ────── → Intro
    [¬α ∨ β]1    α → β                α → β
    ─────────────────────────────────────── 2 ∨Elim
                  α → β
    ────────────────────── 1 → Intro
      (¬α ∨ β) → (α → β)

Note that in the finished derivation, the rightmost → Intro inference does notactually discharge any assumptions.
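As a cross-check, the completed derivation corresponds to the following Lean 4 proof term (a sketch):

```lean
example (α β : Prop) : (¬α ∨ β) → (α → β) :=
  fun h a =>                   -- assumptions [¬α ∨ β]¹ and [α]
    h.elim                     -- ∨Elim: cases on ¬α ∨ β
      (fun na => absurd a na)  -- left: α and ¬α give ⊥, then ⊥Elim
      (fun b => b)             -- right: β is already available
```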

Example 4.3.3. When dealing with quantifiers, we have to make sure not to violate the eigenvariable condition, and sometimes this requires us to play around with the order of carrying out certain inferences. In general, it helps to try and take care of rules subject to the eigenvariable condition first (they will be lower down in the finished proof).

Let’s see how we’d give a derivation of the wff ∃x¬α(x)→ ¬∀xα(x). Start-ing as usual, we write

∃x ¬α(x) → ¬∀x α(x)

We start by writing down what it would take to justify that last step using the → Intro rule.

    [∃x ¬α(x)]1
         ⋮
     ¬∀x α(x)
    ─────────────────── → Intro
    ∃x ¬α(x) → ¬∀x α(x)

Since there is no obvious rule to apply to ¬∀x α(x), we will proceed by setting up the derivation so we can use the ∃Elim rule. Here we must pay attention to the eigenvariable condition, and choose a constant that does not appear in ∃x ¬α(x) or any assumptions that it depends on. (Since no constant symbols appear, however, any choice will do fine.)

                   [¬α(a)]2
                       ⋮
    [∃x ¬α(x)]1    ¬∀x α(x)
    ─────────────────────── 2 ∃Elim
          ¬∀x α(x)
    ─────────────────── → Intro
    ∃x ¬α(x) → ¬∀x α(x)

In order to derive ¬∀x α(x), we will attempt to use the ¬Intro rule: this requires that we derive a contradiction, possibly using ∀x α(x) as an additional assumption. Of course, this contradiction may involve the assumption ¬α(a), which will be discharged by the ∃Elim inference. We can set it up as follows:

                   [¬α(a)]2, [∀x α(x)]3
                            ⋮
                            ⊥
                      ───────── 3 ¬Intro
    [∃x ¬α(x)]1       ¬∀x α(x)
    ────────────────────────── 2 ∃Elim
           ¬∀x α(x)
    ─────────────────── → Intro
    ∃x ¬α(x) → ¬∀x α(x)

It looks like we are close to getting a contradiction. The easiest rule to apply is ∀Elim, which has no eigenvariable condition. Since we can use any term we want to replace the universally quantified x, it makes the most sense to continue using a so we can reach a contradiction.

                              [∀x α(x)]3
                              ────────── ∀Elim
                   [¬α(a)]2      α(a)
                   ────────────────── ⊥Intro
                            ⊥
                      ───────── 3 ¬Intro
    [∃x ¬α(x)]1       ¬∀x α(x)
    ────────────────────────── 2 ∃Elim
           ¬∀x α(x)
    ─────────────────── → Intro
    ∃x ¬α(x) → ¬∀x α(x)

It is important, especially when dealing with quantifiers, to double check at this point that the eigenvariable condition has not been violated. Since the only rule we applied that is subject to the eigenvariable condition was ∃Elim, and the eigenvariable a does not occur in any assumptions it depends on, this is a correct derivation.
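The finished derivation can again be mirrored in Lean 4 (a sketch, with a hypothetical domain U):

```lean
example (U : Type) (α : U → Prop) : (∃ x, ¬ α x) → ¬ ∀ x, α x :=
  fun ⟨a, hna⟩ hall =>  -- ∃Elim picks a witness a; assume ∀x α(x)
    hna (hall a)        -- ∀Elim at a, then ⊥Intro, discharged by ¬Intro
```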

Example 4.3.4. Sometimes we may derive a wff from other wffs. In these cases, we may have undischarged assumptions. It is important to keep track of our assumptions as well as the end goal.

Let’s see how we’d give a derivation of the wff ∃x γ(x, b) from the assump-tions ∃x (α(x) ∧ β(x)) and ∀x (β(x)→ γ(x, b). Starting as usual, we write theend-wff at the bottom.

∃x γ(x, b)

We have two premises to work with. To use the first, i.e., to try to find a derivation of ∃x γ(x, b) from ∃x (α(x) ∧ β(x)), we would use the ∃Elim rule. Since it has an eigenvariable condition, we will apply that rule first. We get the following:

                        [α(a) ∧ β(a)]1
                              ⋮
    ∃x (α(x) ∧ β(x))      ∃x γ(x, b)
    ──────────────────────────────── 1 ∃Elim
              ∃x γ(x, b)


The two assumptions we are working with share β. It may be useful at this point to apply ∧Elim to separate out β(a).

                        [α(a) ∧ β(a)]1
                        ────────────── ∧Elim
                             β(a)
                              ⋮
    ∃x (α(x) ∧ β(x))      ∃x γ(x, b)
    ──────────────────────────────── 1 ∃Elim
              ∃x γ(x, b)

The second assumption we have to work with is ∀x (β(x) → γ(x, b)). Since there is no eigenvariable condition, we can instantiate x with the constant symbol a using ∀Elim to get β(a) → γ(a, b). We now have β(a) and β(a) → γ(a, b). Our next move should be a straightforward application of the → Elim rule.

                  [α(a) ∧ β(a)]1          ∀x (β(x) → γ(x, b))
                  ────────────── ∧Elim    ─────────────────── ∀Elim
                       β(a)                  β(a) → γ(a, b)
                  ─────────────────────────────────────── → Elim
                                γ(a, b)
                                   ⋮
    ∃x (α(x) ∧ β(x))          ∃x γ(x, b)
    ──────────────────────────────────── 1 ∃Elim
              ∃x γ(x, b)

We are so close! One application of ∃Intro and we have reached our goal.

                  [α(a) ∧ β(a)]1          ∀x (β(x) → γ(x, b))
                  ────────────── ∧Elim    ─────────────────── ∀Elim
                       β(a)                  β(a) → γ(a, b)
                  ─────────────────────────────────────── → Elim
                                γ(a, b)
                              ────────── ∃Intro
    ∃x (α(x) ∧ β(x))          ∃x γ(x, b)
    ──────────────────────────────────── 1 ∃Elim
              ∃x γ(x, b)

Since we ensured at each step that the eigenvariable conditions were not violated, we can be confident that this is a correct derivation.
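A Lean 4 counterpart of this derivation (a sketch; the domain U and the constant b are hypothetical):

```lean
example (U : Type) (α β : U → Prop) (γ : U → U → Prop) (b : U)
    (h1 : ∃ x, α x ∧ β x) (h2 : ∀ x, β x → γ x b) : ∃ x, γ x b :=
  h1.elim fun a hab =>    -- ∃Elim with eigenvariable a
    ⟨a, h2 a hab.right⟩   -- ∀Elim and → Elim, then ∃Intro
```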

Example 4.3.5. Give a derivation of the wff ¬∀x α(x) from the assumptions ∀x α(x) → ∃y β(y) and ¬∃y β(y). Starting as usual, we write the target wff at the bottom.

¬∀x α(x)

The last line of the derivation is a negation, so let's try using ¬Intro. This will require that we figure out how to derive a contradiction.

    [∀x α(x)]1
        ⋮
        ⊥
    ───────── 1 ¬Intro
    ¬∀x α(x)


So far so good. We can use ∀Elim, but it's not obvious if that will help us get to our goal. Instead, let's use one of our assumptions. ∀x α(x) → ∃y β(y) together with ∀x α(x) will allow us to use the → Elim rule.

    [∀x α(x)]1    ∀x α(x) → ∃y β(y)
    ─────────────────────────────── → Elim
              ∃y β(y)
                 ⋮
                 ⊥
            ───────── 1 ¬Intro
            ¬∀x α(x)

We now have one final assumption to work with, and it looks like this will help us reach a contradiction by using ⊥Intro.

    [∀x α(x)]1    ∀x α(x) → ∃y β(y)
    ─────────────────────────────── → Elim
              ∃y β(y)                 ¬∃y β(y)
              ──────────────────────────────── ⊥Intro
                             ⊥
                        ───────── 1 ¬Intro
                        ¬∀x α(x)
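This derivation, too, can be written as a Lean 4 proof term (a sketch, with hypothetical domains U and V):

```lean
example (U V : Type) (α : U → Prop) (β : V → Prop)
    (h1 : (∀ x, α x) → ∃ y, β y) (h2 : ¬ ∃ y, β y) : ¬ ∀ x, α x :=
  fun hall => h2 (h1 hall)  -- → Elim, then ⊥Intro, discharged by ¬Intro
```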

Example 4.3.6. Give a derivation of the wff α(x) ∨ ¬α(x).

α(x) ∨ ¬α(x)

The main connective of the wff is a disjunction. Since we have no assumptions to work from, we can't use ∨Intro. Since we don't want any undischarged assumptions in our proof, our best bet is to use ¬Intro with the assumption ¬(α(x) ∨ ¬α(x)). This will allow us to discharge the assumption at the end.

    [¬(α(x) ∨ ¬α(x))]1
             ⋮
             ⊥
    ────────────────── 1 ¬Intro
     ¬¬(α(x) ∨ ¬α(x))
    ────────────────── ¬Elim
       α(x) ∨ ¬α(x)

Note that a straightforward application of the ¬Intro rule leaves us with two negations. We can remove them with the ¬Elim rule.

We appear to be stuck again, since the assumption we introduced has a negation as the main operator. So let's try to derive another contradiction! Let's assume α(x) for another ¬Intro. From there we can derive α(x) ∨ ¬α(x) and get our first contradiction.

                             [α(x)]2
                         ───────────── ∨Intro
    [¬(α(x) ∨ ¬α(x))]1   α(x) ∨ ¬α(x)
    ───────────────────────────────── ⊥Intro
                   ⊥
               ─────── 2 ¬Intro
                ¬α(x)
                  ⋮
                  ⊥
    ────────────────── 1 ¬Intro
     ¬¬(α(x) ∨ ¬α(x))
    ────────────────── ¬Elim
       α(x) ∨ ¬α(x)

Our second assumption is now discharged. We only need to derive one more contradiction in order to discharge our first assumption. Now we have something to work with: ¬α(x). We can use the same strategy as last time (∨Intro) to derive a contradiction with our first assumption.

                             [α(x)]2
                         ───────────── ∨Intro
    [¬(α(x) ∨ ¬α(x))]1   α(x) ∨ ¬α(x)
    ───────────────────────────────── ⊥Intro
                   ⊥
               ─────── 2 ¬Intro
                ¬α(x)
           ───────────── ∨Intro
           α(x) ∨ ¬α(x)     [¬(α(x) ∨ ¬α(x))]1
           ─────────────────────────────────── ⊥Intro
                           ⊥
               ────────────────── 1 ¬Intro
                ¬¬(α(x) ∨ ¬α(x))
               ────────────────── ¬Elim
                  α(x) ∨ ¬α(x)

And the proof is done!
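Since the derivation leans on the classical ¬Elim rule, its Lean 4 counterpart needs classical logic (a sketch; U is a hypothetical domain):

```lean
example (U : Type) (α : U → Prop) (x : U) : α x ∨ ¬ α x :=
  Classical.byContradiction fun h =>  -- assume ¬(α(x) ∨ ¬α(x))
    h (Or.inr fun ha =>               -- derive ¬α(x) via ¬Intro ...
      h (Or.inl ha))                  -- ... using ∨Intro twice
```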

4.4 Proof-Theoretic Notions

Just as we’ve defined a number of important semantic notions (validity, en-tailment, satisfiabilty), we now define corresponding proof-theoretic notions.These are not defined by appeal to satisfaction of sentences in structures, butby appeal to the derivability or non-derivability of certain wffs. It was an im-portant discovery, due to Godel, that these notions coincide. That they do isthe content of the completeness theorem.

Definition 44A (Derivability) A wff α is derivable from a set of wffs Γ, written Γ ⊢ α, if there is a derivation with end-wff α in which every assumption is either discharged or is in Γ. If α is not derivable from Γ we write Γ ⊬ α.

Definition 44B (Theorems) A wff α is a theorem if there is a derivation of α from the empty set, i.e., a derivation with end-wff α in which all assumptions are discharged. We write ⊢ α if α is a theorem and ⊬ α if it is not.

Definition 44C (Consistency) A set of sentences Γ is consistent iff Γ ⊬ ⊥. If Γ is not consistent, i.e., if Γ ⊢ ⊥, we say it is inconsistent.

Proposition 44D Γ ⊢ α iff Γ ∪ {¬α} is inconsistent.


Proof. Exercise.

Proposition 44E Γ is inconsistent iff Γ ⊢ α for every sentence α.

Proof. Exercise.

Proposition 44F Γ ⊢ α iff for some finite Γ0 ⊆ Γ, Γ0 ⊢ α.

Proof. Any derivation of α from Γ can only contain finitely many undischarged assumptions. If all these undischarged assumptions are in Γ, then the set of them is a finite subset of Γ. The other direction is trivial, since a derivation from a subset of Γ is also a derivation from Γ.

4.5 Properties of Derivability

We will now establish a number of properties of the derivability relation. They are independently interesting, but each will play a role in the proof of the completeness theorem.

Proposition 45A (Monotony) If Γ ⊆ ∆ and Γ ⊢ α, then ∆ ⊢ α.

Proof. Any derivation of α from Γ is also a derivation of α from ∆.

Proposition 45B If Γ ⊢ α and Γ ∪ {α} ⊢ ⊥, then Γ is inconsistent.

Proof. Let the derivation of α from Γ be δ1 and the derivation of ⊥ from Γ ∪ {α} be δ2. We can then derive:

                [α]1
                 ⋮ δ2
     ⋮ δ1        ⊥
     α         ──── 1 ¬Intro
                ¬α
    ──────────────── ⊥Intro
            ⊥

In the new derivation, the assumption α is discharged, so it is a derivation from Γ.

Proposition 45C If Γ ∪ {α} ⊢ ⊥, then Γ ⊢ ¬α.

Proof. Suppose that Γ ∪ {α} ⊢ ⊥. Then there is a derivation of ⊥ from Γ ∪ {α}. Let δ be that derivation, and consider

    [α]1
     ⋮ δ
     ⊥
    ──── 1 ¬Intro
     ¬α

Every undischarged assumption of this derivation is in Γ, so it shows that Γ ⊢ ¬α.


Proposition 45D If Γ ∪ {α} ⊢ ⊥ and Γ ∪ {¬α} ⊢ ⊥, then Γ ⊢ ⊥.

Proof. There are derivations δ1 and δ2 of ⊥ from Γ ∪ {α} and of ⊥ from Γ ∪ {¬α}, respectively. We can then derive

   [α]1              [¬α]2
    δ1                 δ2
    ⊥                  ⊥
  ─────── ¬Intro (1)  ─────── ¬Intro (2)
   ¬α                 ¬¬α
  ───────────────────────────── ¬Elim
              ⊥

Since the assumptions α and ¬α are discharged, this is a derivation from Γ alone. Hence Γ ` ⊥.

Proposition 45E If Γ ∪ {α} ` ⊥ and Γ ∪ {β} ` ⊥, then Γ ∪ {α ∨ β} ` ⊥.

Proof. Exercise.

Proposition 45F If Γ ` α or Γ ` β, then Γ ` α ∨ β.

Proof. Suppose Γ ` α. There is a derivation δ of α from Γ . We can derive

     δ
     α
  ──────── ∨Intro
   α ∨ β

Therefore Γ ` α ∨ β. The proof for when Γ ` β is similar.

Proposition 45G If Γ ` α ∧ β then Γ ` α and Γ ` β.

Proof. If Γ ` α ∧ β, there is a derivation δ of α ∧ β from Γ . Consider

     δ
   α ∧ β
  ──────── ∧Elim
     α

Hence, Γ ` α. A similar derivation shows that Γ ` β.

Proposition 45H If Γ ` α and Γ ` β, then Γ ` α ∧ β.

Proof. If Γ ` α as well as Γ ` β, there are derivations δ1 of α and δ2 of β from Γ . Consider

   δ1         δ2
   α           β
  ─────────────── ∧Intro
      α ∧ β

The undischarged assumptions of the new derivation are all in Γ , so we have Γ ` α ∧ β.


Proposition 45I If Γ ` α and Γ ` α→ β, then Γ ` β.

Proof. Suppose that Γ ` α and Γ ` α → β. There are derivations δ1 of α from Γ and δ2 of α → β from Γ . Consider:

   δ1         δ2
   α        α → β
  ─────────────── → Elim
        β

This means that Γ ` β.

Proposition 45J If Γ ` ¬α or Γ ` β, then Γ ` α→ β.

Proof. First suppose Γ ` ¬α. Then there is a derivation δ0 of ¬α from Γ . The following derivation shows that Γ ` α → β:

   δ0
   ¬α        [α]1
  ─────────────── ¬Elim
        ⊥
      ─────── ⊥Elim
        β
  ─────────────── → Intro (1)
      α → β

Now suppose Γ ` β. Then there is a derivation δ of β from Γ . The following derivation shows that Γ ` α → β:

    δ
    β        [α]1
  ─────────────── ∧Intro
      α ∧ β
     ─────── ∧Elim
        β
  ─────────────── → Intro (1)
      α → β

Theorem 45K If c is a constant not occurring in Γ or α(x) and Γ ` α(c), then Γ ` ∀xα(x).

Proof. Let δ be a derivation of α(c) from Γ . By adding a ∀Intro inference, we obtain a derivation of ∀xα(x). Since c does not occur in Γ or α(x), the eigenvariable condition is satisfied.

Theorem 45L If Γ ` ∀xα(x) then Γ ` α(t).

Proof. Suppose Γ ` ∀xα(x). Then there is a derivation δ of ∀xα(x) from Γ . The derivation

      δ
   ∀xα(x)
  ───────── ∀Elim
     α(t)


shows that Γ ` α(t).

4.6 Soundness

A derivation system, such as natural deduction, is sound if it cannot derive things that do not actually follow. Soundness is thus a kind of guaranteed safety property for derivation systems. Depending on which proof-theoretic property is in question, we would like to know, for instance, that

1. every derivable sentence is valid;

2. if a sentence is derivable from some others, it is also a consequence of them;

3. if a set of sentences is inconsistent, it is unsatisfiable.

These are important properties of a derivation system. If any of them do not hold, the derivation system is deficient: it would derive too much. Consequently, establishing the soundness of a derivation system is of the utmost importance.

Theorem 46A (Soundness) If α is derivable from the undischarged assumptions Γ , then Γ |= α.

Proof. Inductive Hypothesis: The premises of an inference rule follow from the undischarged assumptions of the subproofs ending in those premises.

Inductive Step: Show that α follows from the undischarged assumptions of the entire proof.

Let δ be a derivation of α. We proceed by induction on the number of inferences in δ.

If the number of inferences is 0, then δ consists only of an initial wff. Every initial wff α is an undischarged assumption, and as such, any structure A that satisfies all of the undischarged assumptions of the proof also satisfies α.

If the number of inferences is greater than 0, we distinguish cases according to the type of the lowermost inference. By induction hypothesis, we can assume that the premises of that inference follow from the undischarged assumptions of the sub-derivations ending in those premises, respectively.

First, we consider the possible inferences with only one premise.

1. Suppose that the last inference is ¬Intro: By inductive hypothesis, ⊥ follows from the undischarged assumptions Γ ∪ {α}. Consider a structure A. We need to show that, if |=A Γ , then |=A ¬α. Suppose for reductio that |=A Γ , but 6|=A ¬α, i.e., |=A α. This would mean that |=A Γ ∪ {α}. This is contrary to our inductive hypothesis. So, |=A ¬α.

2. The last inference is ¬Elim: Exercise.


3. The last inference is ∧Elim: There are two variants: α or β may be inferred from the premise α ∧ β. Consider the first case. By inductive hypothesis, α ∧ β follows from the undischarged assumptions Γ . Consider a structure A. We need to show that, if |=A Γ , then |=A α. By our inductive hypothesis, we know that |=A α ∧ β. So, |=A α. The case where β is inferred from α ∧ β is handled similarly.

4. The last inference is ∨Intro: There are two variants: α ∨ β may be inferred from the premise α or the premise β. Consider the first case. By inductive hypothesis, α follows from the undischarged assumptions Γ . Consider a structure A. We need to show that, if |=A Γ , then |=A α ∨ β. Since |=A Γ , it must be the case that |=A α, by inductive hypothesis. So it must also be the case that |=A α ∨ β. The case where α ∨ β is inferred from β is handled similarly.

5. The last inference is → Intro: α → β is inferred from a subproof with assumption α and conclusion β. By inductive hypothesis, β follows from the undischarged assumptions Γ and α. Consider a structure A. We need to show that, if |=A Γ , then |=A α → β. For reductio, suppose that for some structure A, |=A Γ but 6|=A α → β. So, |=A α and 6|=A β. But by hypothesis, β is a consequence of Γ ∪ {α}, and since |=A Γ ∪ {α}, we would have |=A β, a contradiction. So Γ |= α → β.

6. The last inference is ∀Intro: The premise α(a) is a consequence of the undischarged assumptions Γ by induction hypothesis. Consider some structure A such that |=A Γ .

We now show that |=A ∀xα(x). Since ∀xα(x) is a sentence, this means we have to show that for every variable assignment s, |=A α(x)[s]. Since Γ consists entirely of sentences, |=A β[s] for all β ∈ Γ . Let A′ be exactly like A except that aA′ = s(x) (so possibly aA 6= aA′). Then |=A α(x)[s] iff |=A′ α(a) (as α(x) does not contain a). Since a also does not occur in Γ , |=A′ Γ . Since Γ |= α(a), we must have |=A′ α(a). This means that |=A α(x)[s]. Since s is an arbitrary variable assignment, |=A ∀xα(x).

7. The last inference is ∃Intro: Exercise.

8. The last inference is ∀Elim: Exercise.

Now let’s consider the possible inferences with several premises: ∨Elim,∧Intro, → Elim, and ∃Elim.

1. The last inference is ∧Intro. α ∧ β is inferred from the premises α and β. By induction hypothesis, α follows from the undischarged assumptions Γ and β follows from the undischarged assumptions ∆. We have to show that Γ ∪ ∆ |= α ∧ β. Consider a structure A with |=A Γ ∪ ∆. Since |=A Γ , it must be the case that |=A α, and since |=A ∆, |=A β, by inductive hypothesis. Together, |=A α ∧ β.


2. The last inference is ∨Elim: Exercise.

3. The last inference is → Elim. β is inferred from the premises α → β and α. By induction hypothesis, α → β follows from the undischarged assumptions Γ and α follows from the undischarged assumptions ∆. Consider a structure A. We need to show that, if |=A Γ ∪ ∆, then |=A β. It must be the case that |=A α → β, and |=A α, by inductive hypothesis. Thus it must be the case that |=A β.

4. The last inference is ∃Elim: Exercise.

Corollary 46B If ` α, then α is valid.

Corollary 46C If Γ is satisfiable, then it is consistent.

Proof. We prove the contrapositive. Suppose that Γ is not consistent. Then Γ ` ⊥, i.e., there is a derivation of ⊥ from undischarged assumptions in Γ . By Theorem 46A, any structure A that satisfies Γ must satisfy ⊥. Since 6|=A ⊥ for every structure A, no A can satisfy Γ , i.e., Γ is not satisfiable.

4.7 Derivations with the Equality Symbol

Derivations with the equality symbol require additional inference rules.

Rules for =:

  ───────── = Intro
    t = t

  t1 = t2      α(t1)
  ────────────────── = Elim
        α(t2)

and

  t1 = t2      α(t2)
  ────────────────── = Elim
        α(t1)

where t1 and t2 are closed terms. The = Intro rule allows us to derive any identity statement of the form t = t outright, from no assumptions.

Example 4.7.1. If s and t are closed terms, then α(s), s = t ` α(t):

  α(s)      s = t
  ─────────────── = Elim
       α(t)

This may be familiar as the “principle of substitutability of identicals,” or Leibniz’ Law.

Example 4.7.2. We derive the sentence

∀x∀y ((α(x) ∧ α(y))→ x = y)

from the sentence

∃x∀y (α(y)→ y = x)

We develop the derivation backwards:


  ∃x∀y (α(y) → y = x)       [α(a) ∧ α(b)]1

                a = b
  ─────────────────────────── → Intro (1)
    (α(a) ∧ α(b)) → a = b
  ─────────────────────────── ∀Intro
  ∀y ((α(a) ∧ α(y)) → a = y)
  ─────────────────────────── ∀Intro
  ∀x∀y ((α(x) ∧ α(y)) → x = y)

We’ll now have to use the main assumption: since it is an existential wff, we use ∃Elim to derive the intermediary conclusion a = b.

                          [∀y (α(y) → y = c)]2    [α(a) ∧ α(b)]1

  ∃x ∀y (α(y) → y = x)                a = b
  ───────────────────────────────────────── ∃Elim (2)
                a = b
  ─────────────────────────── → Intro (1)
    (α(a) ∧ α(b)) → a = b
  ─────────────────────────── ∀Intro
  ∀y ((α(a) ∧ α(y)) → a = y)
  ─────────────────────────── ∀Intro
  ∀x ∀y ((α(x) ∧ α(y)) → x = y)

The sub-derivation on the top right is completed by using its assumptions to show that a = c and b = c. This requires two separate derivations. The derivation for a = c is as follows:

  [∀y (α(y) → y = c)]2            [α(a) ∧ α(b)]1
  ──────────────────── ∀Elim      ────────────── ∧Elim
      α(a) → a = c                     α(a)
  ────────────────────────────────────────────── → Elim
                      a = c

From a = c and b = c we derive a = b by = Elim.

4.8 Soundness of the Equality Symbol Rules

Proposition 48A Natural deduction with rules for identity is sound.

Proof. Any wff of the form t = t is valid, since for every structure A, |=A t = t. (Note that we assume the term t to be ground, i.e., it contains no variables, so variable assignments are irrelevant.)

Suppose the last inference in a derivation is = Elim. Then the premises are t1 = t2 and α(t1); they are derived from undischarged assumptions Γ and ∆, respectively. We want to show that α(t2) follows from Γ ∪ ∆. Consider a structure A with |=A Γ ∪ ∆. By induction hypothesis, A satisfies the two premises. So, |=A t1 = t2. Therefore, t1A = t2A. Let s be any variable assignment, and s′ be the x-variant given by s′(x) = t1A = t2A. By Proposition 112C, |=A α(t2)[s] iff |=A α(x)[s′] iff |=A α(t1)[s]. Since |=A α(t1), therefore |=A α(t2).


Chapter 5

The Completeness Theorem

5.1 Introduction

The completeness theorem is one of the most fundamental results about logic. It comes in two formulations, the equivalence of which we’ll prove. In its first formulation it says something fundamental about the relationship between semantic consequence and our proof system: if a sentence α follows from some sentences Γ , then there is also a derivation that establishes Γ ` α. Thus, the proof system is as strong as it can possibly be without proving things that don’t actually follow. In its second formulation, it can be stated as a model existence result: every consistent set of sentences is satisfiable.

These aren’t the only reasons the completeness theorem—or rather, its proof—is important. It has a number of important consequences, some of which we’ll discuss separately. For instance, since any derivation that shows Γ ` α is finite and so can only use finitely many of the sentences in Γ , it follows by the completeness theorem that if α is a consequence of Γ , it is already a consequence of a finite subset of Γ . This is called compactness. Equivalently, if every finite subset of Γ is consistent, then Γ itself must be consistent. It also follows from the proof of the completeness theorem that any satisfiable set of sentences has a finite or denumerable model. This result is called the Löwenheim-Skolem theorem.

5.2 Outline of the Proof

The proof of the completeness theorem is a bit complex, and upon first reading it, it is easy to get lost. So let us outline the proof. The first step is a shift of perspective, that allows us to see a route to a proof. When completeness is thought of as “whenever Γ |= α then Γ ` α,” it may be hard to even come up with an idea: for to show that Γ ` α we have to find a derivation, and it does not look like the hypothesis that Γ |= α helps us for this in any way. For some proof systems it is possible to directly construct a derivation, but we will take a slightly different tack. The shift in perspective required is this: completeness can also be formulated as: “if Γ is consistent, it has a model.” Perhaps we can use the information in Γ together with the hypothesis that it is consistent to construct a model. After all, we know what kind of model we are looking for: one that is as Γ describes it!

If Γ contains only atomic sentences, it is easy to construct a model for it: for atomic sentences are all of the form Pa1 . . . an where the ai are constant symbols. So all we have to do is come up with a domain |A| and an interpretation for P so that |=A Pa1 . . . an. But nothing’s easier than that: put |A| = N, cAi = i, and for every Pa1 . . . an ∈ Γ , put the tuple 〈k1, . . . , kn〉 into PA, where ki is the index of the constant symbol ai (i.e., ai ≡ cki).
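The construction just described is simple enough to sketch concretely. The encoding below is not from the text: an atomic sentence Pa1 . . . an is represented as a tuple ("P", "c1", ...), and the helper names (model_of_atoms, satisfies_atom) are hypothetical illustrations of the idea |A| = N, cAi = i.

```python
# Toy sketch (not the text's notation): atomic sentences as tuples
# ("P", "c1", ..., "cn"); constants are interpreted as indices into N.

def model_of_atoms(gamma):
    """Build an interpretation making every atomic sentence in gamma true."""
    constants = sorted({c for atom in gamma for c in atom[1:]})
    const_interp = {c: i for i, c in enumerate(constants)}  # c_i -> i
    pred_interp = {}
    for atom in gamma:
        pred, args = atom[0], atom[1:]
        # put the tuple of indices <k1, ..., kn> into P's extension
        pred_interp.setdefault(pred, set()).add(
            tuple(const_interp[c] for c in args))
    return const_interp, pred_interp

def satisfies_atom(interp, atom):
    """Check satisfaction of an atomic sentence in the built interpretation."""
    const_interp, pred_interp = interp
    pred, args = atom[0], atom[1:]
    return tuple(const_interp[c] for c in args) in pred_interp.get(pred, set())
```

By construction every atom of Γ comes out true, and any atom not forced by Γ comes out false, which is exactly what the treatment of negated atoms below relies on.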

Now suppose Γ contains some sentence ¬β, with β atomic. We might worry that the construction of A interferes with the possibility of making ¬β true. But here’s where the consistency of Γ comes in: if ¬β ∈ Γ , then β /∈ Γ , or else Γ would be inconsistent. And if β /∈ Γ , then according to our construction of A, 6|=A β, so |=A ¬β. So far so good.

Now what if Γ contains complex, non-atomic formulas? Say, it contains α ∧ β. Then we should proceed as if both α and β were in Γ . And if α ∨ β ∈ Γ , then we will have to make at least one of them true, i.e., proceed as if one of them was in Γ .

This suggests the following idea: we add additional sentences to Γ so as to (a) keep the resulting set consistent, (b) make sure that for every possible atomic sentence α, either α is in the resulting set, or ¬α, and (c) such that, whenever α ∧ β is in the set, so are both α and β, if α ∨ β is in the set, at least one of α or β is also, etc. We keep doing this (potentially forever). Call the set of all sentences so added Γ ∗. Then our construction above would provide us with a structure for which we could prove, by induction, that all sentences in Γ ∗ are true in A, and hence also all sentences in Γ since Γ ⊆ Γ ∗.

There is one wrinkle in this plan: if ∃xα(x) ∈ Γ we would hope to be able to pick some constant symbol c and add α(c) in this process. But how do we know we can always do that? Perhaps we only have a few constant symbols in our language, and for each one of them we have ¬β(c) ∈ Γ . We can’t also add β(c), since this would make the set inconsistent, and we wouldn’t know whether A has to make β(c) or ¬β(c) true. Moreover, it might happen that Γ contains only sentences in a language that has no constant symbols at all (e.g., the language of set theory).

The solution to this problem is to simply add infinitely many constants at the beginning, plus sentences that connect them with the quantifiers in the right way. (Of course, we have to verify that this cannot introduce an inconsistency.)

Our original construction works well if we only have constant symbols in the atomic sentences. But the language might also contain function symbols. In that case, it might be tricky to find the right functions on N to assign to these function symbols to make everything work. So here’s another trick: instead of using i to interpret ci, just take the set of constant symbols itself as the domain. Then A can assign every constant symbol to itself: cAi = ci. But why not go all the way: let |A| be all terms of the language! If we do this, there is an obvious assignment of functions (that take terms as arguments and have terms as values) to function symbols: we assign to the function symbol fni the function which, given n terms t1, . . . , tn as input, produces the term fni (t1, . . . , tn) as value.
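The “terms as their own values” idea is easy to make concrete. In this sketch (an illustration, not the text’s notation), a term is either a constant symbol string or a nested tuple whose head is a function symbol, and each function symbol is interpreted as the map that builds the corresponding term:

```python
# Sketch: in the term model, the function symbol f denotes the map
# that takes terms t1, ..., tn to the term f(t1, ..., tn) itself.
# Terms are encoded as strings (constants) or tuples (f, t1, ..., tn).

def interp_function(f):
    """Interpretation of the function symbol f in the term model."""
    return lambda *terms: (f,) + terms

plus = interp_function("+")
zero = "0"                        # the constant symbol denotes itself
t = plus(zero, plus(zero, zero))  # value of the term (0 + (0 + 0))
```

The value of any closed term is then literally that term, so a domain of terms interprets every function symbol without any search for “the right” function on N.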

The last piece of the puzzle is what to do with =. The predicate symbol = has a fixed interpretation: |=A t = t′ iff tA = t′A. Now if we set things up so that the value of a term t is t itself, then this structure will make no sentence of the form t = t′ true unless t and t′ are one and the same term. And of course this is a problem, since basically every interesting theory in a language with function symbols will have as theorems sentences t = t′ where t and t′ are not the same term (e.g., in theories of arithmetic: (0 + 0) = 0). To solve this problem, we change the domain of A: instead of using terms as the objects in |A|, we use sets of terms, and each set is so that it contains all those terms which the sentences in Γ require to be equal. So, e.g., if Γ is a theory of arithmetic, one of these sets will contain: 0, (0 + 0), (0 × 0), etc. This will be the set we assign to 0, and it will turn out that this set is also the value of all the terms in it, e.g., also of (0 + 0). Therefore, the sentence (0 + 0) = 0 will be true in this revised structure.

5.3 Maximally Consistent Sets of Sentences

Definition 53A (Maximally consistent set) A set Γ of sentences is maximally consistent iff

1. Γ is consistent, and

2. if Γ ⊊ Γ ′, then Γ ′ is inconsistent.

An alternate definition equivalent to the above is: a set Γ of sentences is maximally consistent iff

1. Γ is consistent, and

2. If Γ ∪ {α} is consistent, then α ∈ Γ .

In other words, one cannot add sentences not already in Γ to a maximally consistent set Γ without making the resulting larger set inconsistent.

Maximally consistent sets are important in the completeness proof since we can guarantee that every consistent set of sentences Γ is contained in a maximally consistent set Γ ∗, and a maximally consistent set contains, for each sentence α, either α or its negation ¬α. This is true in particular for atomic sentences, so from a maximally consistent set in a language suitably expanded by constant symbols, we can construct a structure where the interpretation of predicate symbols is defined according to which atomic sentences are in Γ ∗. This structure can then be shown to make all sentences in Γ ∗ (and hence also in Γ ) true. The proof of this latter fact requires that ¬α ∈ Γ ∗ iff α /∈ Γ ∗, (α ∨ β) ∈ Γ ∗ iff α ∈ Γ ∗ or β ∈ Γ ∗, etc.

Proposition 53B Suppose Γ is maximally consistent. Then:


1. If Γ ` α, then α ∈ Γ .

2. For any α, either α ∈ Γ or ¬α ∈ Γ .

3. (α→ β) ∈ Γ iff either α /∈ Γ or β ∈ Γ .

Proof. Let us suppose for all of the following that Γ is maximally consistent.

1. If Γ ` α, then α ∈ Γ .

Suppose that Γ ` α. Suppose to the contrary that α /∈ Γ : then since Γ is maximally consistent, Γ ∪ {α} is inconsistent, hence Γ ∪ {α} ` ⊥. By Propositions 34B and 45B, Γ is inconsistent. This contradicts the assumption that Γ is consistent. Hence, it cannot be the case that α /∈ Γ , so α ∈ Γ .

2. For any α, either α ∈ Γ or ¬α ∈ Γ .

Suppose to the contrary that for some α both α /∈ Γ and ¬α /∈ Γ . Since Γ is maximally consistent, Γ ∪ {α} and Γ ∪ {¬α} are both inconsistent, so Γ ∪ {α} ` ⊥ and Γ ∪ {¬α} ` ⊥. By Propositions 34D and 45D, Γ is inconsistent, a contradiction. Hence there cannot be such a sentence α and, for every α, α ∈ Γ or ¬α ∈ Γ .

3. (α→ β) ∈ Γ iff either α /∈ Γ or β ∈ Γ :

For the forward direction, let (α → β) ∈ Γ , and suppose to the contrary that α ∈ Γ and β /∈ Γ . On these assumptions, Γ ` α → β and Γ ` α. By Propositions 34E and 45I, Γ ` β. But then by (1), β ∈ Γ , contradicting the assumption that β /∈ Γ .

For the reverse direction, first consider the case where α /∈ Γ . By (2), ¬α ∈ Γ and hence Γ ` ¬α. By Propositions 34F and 45J, Γ ` α → β. Again by (1), we get that (α → β) ∈ Γ , as required.

Now consider the case where β ∈ Γ . Then Γ ` β and by Propositions 34F and 45J, Γ ` α → β. By (1), (α → β) ∈ Γ .
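In the propositional case, the closure properties of Proposition 53B are exactly the truth-table conditions: the set of sentences made true by a fixed valuation is maximally consistent, and clause (3) is the truth table for the conditional. A quick illustrative check (hypothetical encoding, not from the text):

```python
# Propositional analogue: for the set of sentences true under a
# valuation v, property (3) of Proposition 53B mirrors the truth
# table for ->. Sentences: atoms as strings, ("~", a), ("->", a, b).

def holds(v, s):
    if isinstance(s, str):
        return v[s]
    if s[0] == "~":
        return not holds(v, s[1])
    if s[0] == "->":
        return (not holds(v, s[1])) or holds(v, s[2])
    raise ValueError("unknown connective")

v = {"p": True, "q": False}
cond = ("->", "p", "q")
# (p -> q) is "in the set" iff p is not in it or q is:
property3 = holds(v, cond) == ((not holds(v, "p")) or holds(v, "q"))
```

The first-order case needs the full argument above because ` and membership in Γ have to be connected by Propositions 34E/45I and 34F/45J rather than by a truth table.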

5.4 Henkin Expansion

Part of the challenge in proving the completeness theorem is that the model we construct from a maximally consistent set Γ must make all the quantified wffs in Γ true. In order to guarantee this, we use a trick due to Leon Henkin. In essence, the trick consists in expanding the language by infinitely many constants and adding, for each wff with one free variable α(x), a formula of the form ¬∀xα → ¬α(c), where c is one of the new constant symbols. When we construct the structure satisfying Γ , this will guarantee that each false universal sentence has a counterexample among the new constants.


Lemma 54A If Γ is consistent in L and L′ is obtained from L by adding a denumerable set of new constant symbols d1, d2, . . . , then Γ is consistent in L′.

Definition 54B (Saturated set) A set Γ of wffs of a language L is saturated if and only if for each wff α ∈ Frm(L) and variable x there is a constant symbol c such that ¬∀xα → ¬α(c) ∈ Γ .

The following definition will be used in the proof of the next theorem.

Definition 54C Let L′ be as in Lemma 54A. Fix an enumeration 〈α1, x1〉, 〈α2, x2〉, . . . of all wff-variable pairs of L′. We define the sentences δn by recursion on n. Assuming that δ1, . . . , δn have already been defined, let cn+1 be the first new constant symbol among the di that does not occur in δ1, . . . , δn, and let δn+1 be the wff ¬∀xn+1 αn+1(xn+1) → ¬αn+1(cn+1). This includes the case where n = 0 and the list of previous δi’s is empty, i.e., δ1 is ¬∀x1 α1 → ¬α1(c1).

Theorem 54D Every consistent set Γ can be extended to a saturated consistent set Γ ′.

Proof. Given a consistent set of sentences Γ in a language L, expand the language by adding a denumerable set of new constant symbols to form L′. By the previous Lemma, Γ is still consistent in the richer language. Further, let δi be as in the previous definition: then Γ ∪ {δ1, δ2, . . .} is saturated by construction. Let

Γ0 = Γ

Γn+1 = Γn ∪ {δn+1}

i.e., Γn = Γ ∪ {δ1, . . . , δn}, and let Γ ′ = ⋃n Γn. To show that Γ ′ is consistent

it suffices to show, by induction on n, that each set Γn is consistent.

The induction basis is simply the claim that Γ0 = Γ is consistent, which is the hypothesis of the theorem. For the induction step, suppose that Γn−1 is consistent but Γn = Γn−1 ∪ {δn} is inconsistent. Recall that δn is ¬∀xn αn(xn) → ¬αn(cn), where αn(xn) is a wff of L′ with only the variable xn free and not containing any constant symbols ci where i ≥ n.

If Γn−1 ∪ {δn} is inconsistent, then Γn−1 ` ¬δn, and hence both of the following hold:

Γn−1 ` ¬∀xn αn(xn) Γn−1 ` αn(cn)

Here cn does not occur in Γn−1 or αn(xn) (remember, it was added only with δn). By Theorems 34G and 45K, from Γn−1 ` αn(cn), we obtain Γn−1 ` ∀xn αn(xn). Thus we have both Γn−1 ` ¬∀xn αn(xn) and Γn−1 ` ∀xn αn(xn), so Γn−1 itself is inconsistent. Contradiction: Γn−1 was supposed to be consistent. Hence Γn = Γn−1 ∪ {δn} is consistent.

5.5 Lindenbaum’s Lemma

We now prove a lemma that shows that any consistent set of sentences is contained in some set of sentences which is not just consistent, but maximally so, and moreover, is saturated. The proof works by first extending the set to a saturated set, and then adding one sentence at a time, guaranteeing at each step that the set remains consistent. The union of all stages in that construction then contains, for each sentence α, either it or its negation ¬α, is saturated, and is also consistent.

Lemma 55A (Lindenbaum’s Lemma) Every consistent set Γ can be extended to a maximally consistent saturated set Γ ∗.

Proof. Let Γ be consistent, and let Γ ′ be as in the proof of Theorem 54D: we proved there that Γ ∪ Γ ′ is a consistent saturated set in the richer language L′ (with the denumerable set of new constants). Let α0, α1, . . . be an enumeration of all the wffs of L′. Define Γ0 = Γ ∪ Γ ′, and

Γn+1 = Γn ∪ {αn}    if Γn ∪ {αn} is consistent;
Γn+1 = Γn ∪ {¬αn}   otherwise.
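For a feel of how the stages work, here is a toy version of the construction restricted to propositional literals, with a trivially decidable consistency check. The names and encoding are illustrative only; in the real proof consistency is not decidable and the union is taken over infinitely many stages.

```python
# Toy Lindenbaum construction: sentences are literals "p" or "~p";
# a set is consistent iff it contains no complementary pair.

def consistent(s):
    return not any(lit.startswith("~") and lit[1:] in s for lit in s)

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def lindenbaum(gamma, enumeration):
    """At stage n, add the n-th sentence if that stays consistent,
    otherwise add its negation."""
    stage = set(gamma)
    for s in enumeration:
        stage |= {s} if consistent(stage | {s}) else {negate(s)}
    return stage
```

Starting from {~q} with the enumeration p, q, r, the sentence q cannot be added consistently, so its negation is taken instead, and the result decides every listed sentence while staying consistent.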

Let Γ ∗ = ⋃n≥0 Γn. Since Γ ′ ⊆ Γ ∗, for each wff α, Γ ∗ contains a wff of the form ¬∀xα → ¬α(c) and thus is saturated.

Each Γn is consistent: Γ0 is consistent by definition. If Γn+1 = Γn ∪ {αn}, this is because the latter is consistent. If it isn’t, Γn+1 = Γn ∪ {¬αn}, which must be consistent. If it weren’t, i.e., if both Γn ∪ {αn} and Γn ∪ {¬αn} are inconsistent, then Γn ` ¬αn and Γn ` αn, so Γn would be inconsistent, contrary to the induction hypothesis.

Every wff of Frm(L′) appears on the list used to define Γ ∗. If αn /∈ Γ ∗, then that is because Γn ∪ {αn} was inconsistent; since Γn ⊆ Γ ∗, the set Γ ∗ ∪ {αn} is inconsistent as well. But that means that Γ ∗ is maximally consistent.

5.6 Construction of a Model

We will begin by showing how to construct a structure which satisfies a maximally consistent, saturated set of sentences in a language L without =.

Definition 56A (Term model) Let Γ ∗ be a maximally consistent, saturated set of sentences in a language L. The term model A(Γ ∗) of Γ ∗ is the structure defined as follows:

1. The domain |A(Γ ∗)| is the set of all closed terms of L.

2. The interpretation of a constant symbol c is c itself: cA(Γ∗) = c.

3. The function symbol f is assigned the function which, given as arguments the closed terms t1, . . . , tn, has as value the closed term f(t1, . . . , tn):

fA(Γ∗)(t1, . . . , tn) = f(t1, . . . , tn)

4. If R is an n-place predicate symbol, then 〈t1, . . . , tn〉 ∈ RA(Γ∗) iff Rt1 . . . tn ∈ Γ ∗.


Lemma 56B (Truth Lemma) Suppose α does not contain =. Then |=A(Γ∗) α iff α ∈ Γ ∗.

Proof. We prove both directions simultaneously, and by induction on α.

1. α ≡ R(t1, . . . , tn): |=A(Γ∗) Rt1 . . . tn iff 〈t1, . . . , tn〉 ∈ RA(Γ∗) (by the definition of satisfaction) iff R(t1, . . . , tn) ∈ Γ ∗ (by the construction of A(Γ ∗)).

2. α ≡ ¬β: |=A(Γ∗) α iff 6|=A(Γ∗) β (by definition of satisfaction). By induction hypothesis, 6|=A(Γ∗) β iff β /∈ Γ ∗. By Proposition 53B(2), ¬β ∈ Γ ∗ if β /∈ Γ ∗; and ¬β /∈ Γ ∗ if β ∈ Γ ∗ since Γ ∗ is consistent.

3. α ≡ β → γ: |=A(Γ∗) α iff 6|=A(Γ∗) β or |=A(Γ∗) γ (by definition of satisfaction) iff β /∈ Γ ∗ or γ ∈ Γ ∗ (by induction hypothesis). This is the case iff (β → γ) ∈ Γ ∗ (by Proposition 53B(3)).

4. α ≡ ∀xβ(x): Suppose that |=A(Γ∗) α. Then for every variable assignment s, |=A(Γ∗) β(x)[s]. Suppose to the contrary that ∀xβ(x) /∈ Γ ∗: Then by Proposition 53B(2), ¬∀xβ(x) ∈ Γ ∗. By saturation, (¬∀xβ(x) → ¬β(c)) ∈ Γ ∗ for some c, so by Proposition 53B(1), ¬β(c) ∈ Γ ∗. Since Γ ∗ is consistent, β(c) /∈ Γ ∗. By induction hypothesis, 6|=A(Γ∗) β(c). Therefore, if s′ is the variable assignment such that s′(x) = c, then 6|=A(Γ∗) β(x)[s′], contradicting the earlier result that |=A(Γ∗) β(x)[s] for all s. Thus, we have α ∈ Γ ∗.

Conversely, suppose that ∀xβ(x) ∈ Γ ∗. By Theorems 34H and 45L together with Proposition 53B(1), β(t) ∈ Γ ∗ for every term t ∈ |A(Γ ∗)|. By inductive hypothesis, |=A(Γ∗) β(t) for every term t ∈ |A(Γ ∗)|. Let s be the variable assignment with s(x) = t. Then |=A(Γ∗) β(x)[s] for any such s, hence |=A(Γ∗) α.

5.7 Identity

The construction of the term model given in the preceding section is enough to establish completeness for first-order logic for sets Γ that do not contain =. The term model satisfies every α ∈ Γ ∗ which does not contain = (and hence all α ∈ Γ ). It does not work, however, if = is present. The reason is that Γ ∗ then may contain a sentence t = t′, but in the term model the value of any term is that term itself. Hence, if t and t′ are different terms, their values in the term model—i.e., t and t′, respectively—are different, and so t = t′ is false. We can fix this, however, using a construction known as “factoring.”

Definition 57A Let Γ ∗ be a maximally consistent set of sentences in L. We define the relation ≈ on the set of closed terms of L by

t ≈ t′ iff t = t′ ∈ Γ ∗

Proposition 57B The relation ≈ has the following properties:


1. ≈ is reflexive.

2. ≈ is symmetric.

3. ≈ is transitive.

4. If t ≈ t′, f is a function symbol, and t1, . . . , ti−1, ti+1, . . . , tn are terms, then

ft1 . . . ti−1tti+1 . . . tn ≈ ft1 . . . ti−1t′ti+1 . . . tn.

5. If t ≈ t′, R is a predicate symbol, and t1, . . . , ti−1, ti+1, . . . , tn are terms, then

Rt1 . . . ti−1tti+1 . . . tn ∈ Γ ∗ iff

Rt1 . . . ti−1t′ti+1 . . . tn ∈ Γ ∗.

Proof. Since Γ ∗ is maximally consistent, t = t′ ∈ Γ ∗ iff Γ ∗ ` t = t′. Thus it is enough to show the following:

1. Γ ∗ ` t = t for all terms t.

2. If Γ ∗ ` t = t′ then Γ ∗ ` t′ = t.

3. If Γ ∗ ` t = t′ and Γ ∗ ` t′ = t′′, then Γ ∗ ` t = t′′.

4. If Γ ∗ ` t = t′, then

Γ ∗ ` ft1 . . . ti−1tti+1 . . . tn = ft1 . . . ti−1t′ti+1 . . . tn

for every n-place function symbol f and terms t1, . . . , ti−1, ti+1, . . . , tn.

5. If Γ ∗ ` t = t′ and Γ ∗ ` Rt1 . . . ti−1tti+1 . . . tn, then Γ ∗ ` Rt1 . . . ti−1t′ti+1 . . . tn for every n-place predicate symbol R and terms t1, . . . , ti−1, ti+1, . . . , tn.

Definition 57C Suppose Γ ∗ is a maximally consistent set in a language L, t is a term, and ≈ as in the previous definition. Then:

[t]≈ = {t′ : t′ ∈ Trm(L), t ≈ t′}

and Trm(L)/≈ = {[t]≈ : t ∈ Trm(L)}.

Definition 57D Let A = A(Γ ∗) be the term model for Γ ∗. Then A/≈ is the following structure:

1. |A/≈| = Trm(L)/≈.

2. cA/≈ = [c]≈

3. fA/≈([t1]≈, . . . , [tn]≈) = [ft1 . . . tn]≈


4. 〈[t1]≈, . . . , [tn]≈〉 ∈ RA/≈ iff |=A Rt1 . . . tn.

Note that we have defined fA/≈ and RA/≈ for elements of Trm(L)/≈ by referring to them as [t]≈, i.e., via representatives t ∈ [t]≈. We have to make sure that these definitions do not depend on the choice of these representatives, i.e., that for other choices t′ which determine the same equivalence classes ([t]≈ = [t′]≈), the definitions yield the same result. For instance, if R is a one-place predicate symbol, the last clause of the definition says that [t]≈ ∈ RA/≈ iff |=A Rt. If for some other term t′ with t ≈ t′ we had ⊭A Rt′, then the definition would require [t′]≈ /∈ RA/≈. But if t ≈ t′, then [t]≈ = [t′]≈, and we can't have both [t]≈ ∈ RA/≈ and [t]≈ /∈ RA/≈. However, Proposition 57B guarantees that this cannot happen.
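To make the well-definedness issue concrete, here is a small Python sketch. The terms, the equivalence relation, and the predicate are toy data invented for illustration, not drawn from any particular Γ ∗: we partition a finite set of "terms" into classes and check that a predicate on terms induces a well-defined predicate on the classes, which is exactly the check that Proposition 57B performs for ≈.

```python
# Sketch: quotienting a finite set of "terms" by an equivalence
# relation and checking that a predicate is well defined on classes.
# The terms and the relation are toy data, not drawn from the text.

terms = ["a", "b", "fa", "fb"]

# t ≈ t' holds for these pairs (already reflexive/symmetric/transitive)
equiv = {("a", "b"), ("b", "a"), ("fa", "fb"), ("fb", "fa")} \
        | {(t, t) for t in terms}

def cls(t):
    """[t]≈ : the equivalence class of t."""
    return frozenset(s for s in terms if (t, s) in equiv)

domain = {cls(t) for t in terms}          # Trm/≈, here with 2 elements

# A predicate on terms, playing the role of "Rt ∈ Γ*".
R_terms = {"fa", "fb"}

# Well-definedness check (cf. Proposition 57B, clause 5): the truth
# value must agree on all members of a class before we may define R
# on classes via representatives.
assert all(
    (s in R_terms) == (t in R_terms)
    for s in terms for t in terms if (s, t) in equiv
)

R_classes = {c for c in domain if next(iter(c)) in R_terms}
print(len(domain), len(R_classes))  # 2 classes, 1 of which is in R
```

If the predicate had distinguished two ≈-equivalent terms, the assertion would fail, which is precisely the situation Proposition 57B rules out.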

Proposition 57E A/≈ is well defined, i.e., if t1, . . . , tn, t′1, . . . , t′n are terms, and ti ≈ t′i, then

1. [ft1 . . . tn]≈ = [ft′1 . . . t′n]≈, i.e.,

ft1 . . . tn ≈ ft′1 . . . t′n

and

2. |=A Rt1 . . . tn iff |=A Rt′1 . . . t′n, i.e.,

Rt1 . . . tn ∈ Γ ∗ iff Rt′1 . . . t′n ∈ Γ ∗.

Proof. Follows from Proposition 57B by induction on n.

Lemma 57F |=A/≈ α iff α ∈ Γ ∗ for all sentences α.

Proof. By induction on α, just as in the proof of Lemma 56B. The only case that needs additional attention is when α ≡ t = t′.

|=A/≈ t = t′ iff [t]≈ = [t′]≈ (by definition of A/≈)

iff t ≈ t′ (by definition of [t]≈)

iff t = t′ ∈ Γ ∗ (by definition of ≈).

Note that while A(Γ ∗) is always enumerable and infinite, A/≈ may be finite, since it may turn out that there are only finitely many classes [t]≈. This is to be expected, since Γ may contain sentences which require any structure in which they are true to be finite. For instance, ∀x∀y x = y is a consistent sentence, but is satisfied only in structures with a domain that contains exactly one element.


5.8 The Completeness Theorem

Let’s combine our results: we arrive at Gödel’s completeness theorem.

Theorem 58A (Completeness Theorem) Let Γ be a set of sentences. If Γ is consistent, it is satisfiable.

Proof. Suppose Γ is consistent. By Lemma 55A, there is a Γ ∗ ⊇ Γ which is maximally consistent and saturated. If Γ does not contain =, then by Lemma 56B, |=A(Γ∗) α iff α ∈ Γ ∗. From this it follows in particular that for all α ∈ Γ , |=A(Γ∗) α, so Γ is satisfiable. If Γ does contain =, then by Lemma 57F, |=A/≈ α iff α ∈ Γ ∗ for all sentences α. In particular, |=A/≈ α for all α ∈ Γ , so Γ is satisfiable.

Corollary 5.8.2 (Completeness Theorem, Second Version) For any set of sentences Γ and sentence α: if Γ |= α, then Γ ` α.

Proof. Note that the Γ ’s in Corollary 5.8.2 and Theorem 58A are universally quantified. To make sure we do not confuse ourselves, let us restate Theorem 58A using a different variable: for any set of sentences ∆, if ∆ is consistent, it is satisfiable. By contraposition, if ∆ is not satisfiable, then ∆ is inconsistent. We will use this to prove the corollary.

Suppose that Γ |= α. Then Γ ∪ {¬α} is unsatisfiable by Proposition 113E. Taking Γ ∪ {¬α} as our ∆, the previous version of Theorem 58A gives us that Γ ∪ {¬α} is inconsistent. By Propositions 33D and 44D, Γ ` α.

5.9 The Compactness Theorem

One important consequence of the completeness theorem is the compactness theorem. The compactness theorem states that if each finite subset of a set of sentences is satisfiable, the entire set is satisfiable—even if the set itself is infinite. This is far from obvious. There is nothing that seems to rule out, at first glance at least, the possibility of there being infinite sets of sentences which are contradictory, but where the contradiction arises, so to speak, only from the infinite number of sentences. The compactness theorem says that such a scenario can be ruled out: there are no unsatisfiable infinite sets of sentences each finite subset of which is satisfiable. Like the completeness theorem, it has a version related to entailment: if an infinite set of sentences entails something, a finite subset already does.

Definition 59A A set Γ of wffs is finitely satisfiable if and only if every finiteΓ0 ⊆ Γ is satisfiable.

Theorem 59B (Compactness Theorem) The following hold for any set of sentences Γ and sentence α:

1. Γ |= α iff there is a finite Γ0 ⊆ Γ such that Γ0 |= α.

2. Γ is satisfiable if and only if it is finitely satisfiable.


Proof. We prove (2). If Γ is satisfiable, then there is a structure A such that |=A α for all α ∈ Γ . Of course, this A also satisfies every finite subset of Γ , so Γ is finitely satisfiable.

Now suppose that Γ is finitely satisfiable. Then every finite subset Γ0 ⊆ Γ is satisfiable. By soundness, every finite subset is consistent. Then Γ itself must be consistent. For assume it is not, i.e., Γ ` ⊥. But derivations are finite, and so already some finite subset Γ0 ⊆ Γ must be inconsistent (cf. Propositions 33F and 44F). But we just showed they are all consistent, a contradiction. Now by completeness, since Γ is consistent, it is satisfiable.

Example 5.9.3. In every model A of a theory Γ , each term t of course picks out an element of |A|. Can we guarantee that it is also true that every element of |A| is picked out by some term or other? In other words, are there theories Γ all models of which are covered? The compactness theorem shows that this is not the case if Γ has infinite models. Here’s how to see this: Let A be an infinite model of Γ , and let c be a constant symbol not in the language of Γ . Let ∆ be the set of all sentences c ≠ t for t a term in the language L of Γ , i.e.,

∆ = {c ≠ t : t ∈ Trm(L)}.

A finite subset of Γ ∪∆ can be written as Γ ′ ∪∆′, with Γ ′ ⊆ Γ and ∆′ ⊆ ∆. Since ∆′ is finite, it can contain only finitely many terms. Let a ∈ |A| be an element of |A| not picked out by any of them, and let A′ be the structure that is just like A, but with cA′ = a. Since a ≠ tA for all t occurring in ∆′, |=A′ ∆′. Since |=A Γ , Γ ′ ⊆ Γ , and c does not occur in Γ , also |=A′ Γ ′. Together, |=A′ Γ ′ ∪∆′ for every finite subset Γ ′ ∪∆′ of Γ ∪∆. So every finite subset of Γ ∪∆ is satisfiable. By compactness, Γ ∪∆ itself is satisfiable. So there are structures A with |=A Γ ∪∆. Every such A is a model of Γ , but is not covered, since cA ≠ tA for all terms t of L.

Example 5.9.4. Consider a language L containing the predicate symbol <, constant symbols 0, 1, and function symbols +, ×, −, ÷. Let Γ be the set of all sentences in this language true in Q with domain Q and the obvious interpretations. Γ is the set of all sentences of L true about the rational numbers. Of course, in Q (and even in R), there are no numbers which are greater than 0 but less than 1/k for all k ∈ Z+. Such a number, if it existed, would be an infinitesimal: non-zero, but infinitely small. The compactness theorem shows that there are models of Γ in which infinitesimals exist: Let ∆ be {0 < c} ∪ {c < (1 ÷ k) : k ∈ Z+} (where k is the numeral (1 + (1 + · · ·+ (1 + 1) . . . )) with k 1’s). For any finite subset ∆0 of ∆ there is a K such that all the sentences c < (1 ÷ k) in ∆0 have k < K. If we expand Q to Q′ with cQ′ = 1/K, we have that |=Q′ Γ ∪∆0, and so Γ ∪∆ is finitely satisfiable (Exercise: prove this in detail). By compactness, Γ ∪∆ is satisfiable. Any model S of Γ ∪∆ contains an infinitesimal, namely cS.

Example 5.9.5. We know that first-order logic with the equality symbol can express that the domain must have some minimal size: the sentence α≥n (which says “there are at least n distinct objects”) is true only in structures where |A| has at least n objects. So if we take

∆ = {α≥n : n ≥ 1}

then any model of ∆ must be infinite. Thus, we can guarantee that a theory only has infinite models by adding ∆ to it: the models of Γ ∪∆ are all and only the infinite models of Γ .
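For concreteness, the sentences α≥n can be generated mechanically. The following Python sketch prints α≥n as a string; the function name and the exact formulation (a block of existential quantifiers followed by pairwise inequations) are choices made here for illustration, not notation fixed by the text.

```python
# Sketch: generate the sentence α≥n ("there are at least n distinct
# objects") as a plain string. One standard formulation is
#   ∃x1 ... ∃xn (x1 ≠ x2 ∧ ... ∧ x(n-1) ≠ xn)  (one conjunct per pair).

def alpha_geq(n):
    vars_ = [f"x{i}" for i in range(1, n + 1)]
    prefix = "".join(f"∃{v} " for v in vars_)
    body = " ∧ ".join(
        f"{vars_[i]} ≠ {vars_[j]}"
        for i in range(n) for j in range(i + 1, n)
    )
    # For n = 1 there are no pairs; any trivially true sentence will do.
    return prefix + "(" + body + ")" if body else "∃x1 x1 = x1"

print(alpha_geq(3))
# ∃x1 ∃x2 ∃x3 (x1 ≠ x2 ∧ x1 ≠ x3 ∧ x2 ≠ x3)
```

The set ∆ above is then just the (infinite) collection alpha_geq(1), alpha_geq(2), alpha_geq(3), and so on.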

So first-order logic can express infinitude. The compactness theorem shows that it cannot express finitude, however. For suppose some set of sentences Λ were satisfied in all and only finite structures. Then ∆ ∪ Λ is finitely satisfiable. Why? Suppose ∆′ ∪ Λ′ ⊆ ∆ ∪ Λ is finite with ∆′ ⊆ ∆ and Λ′ ⊆ Λ. Let n be the largest number such that α≥n ∈ ∆′. Λ, being satisfied in all finite structures, has a model A with finitely many but ≥ n elements. But then |=A ∆′ ∪ Λ′. By compactness, ∆ ∪ Λ has an infinite model, contradicting the assumption that Λ is satisfied only in finite structures.

5.10 The Löwenheim-Skolem Theorem

The Löwenheim-Skolem Theorem says that if a theory has an infinite model, then it also has a model that is at most denumerable. An immediate consequence of this fact is that first-order logic cannot express that the size of a structure is non-enumerable: any sentence or set of sentences satisfied in all non-enumerable structures is also satisfied in some denumerable structure.

Theorem 510A If Γ is consistent, then it has a denumerable model, i.e., it is satisfiable in a structure whose domain is either finite or infinite but enumerable.

Proof. If Γ is consistent, the structure A delivered by the proof of the completeness theorem has a domain |A| whose cardinality is bounded by that of the set of the terms of the language L. So A is at most denumerable.

Theorem 510B If Γ is a consistent set of sentences in the language of first-order logic without identity, then it has a denumerable model, i.e., it is satisfiable in a structure whose domain is infinite and enumerable.

Proof. If Γ is consistent and contains no sentences in which identity appears, then the structure A delivered by the proof of the completeness theorem has a domain |A| whose cardinality is identical to that of the set of the terms of the language L. So A is denumerably infinite.

Example 5.10.3 (Skolem’s Paradox). Zermelo-Fraenkel set theory ZFC is a very powerful framework in which practically all mathematical statements can be expressed, including facts about the sizes of sets. So for instance, ZFC can prove that the set R of real numbers is non-enumerable, it can prove Cantor’s Theorem that the power set of any set is larger than the set itself, etc. If ZFC is consistent, its models are all infinite, and moreover, they all contain elements about which the theory says that they are non-enumerable, such as the element that makes true the theorem of ZFC that the power set of the natural numbers exists. By the Löwenheim-Skolem Theorem, ZFC also has enumerable models—models that contain “non-enumerable” sets but which themselves are enumerable.


Chapter 6

Beyond First-order Logic

6.1 Overview

First-order logic is not the only system of logic of interest: there are many extensions and variations of first-order logic. A logic typically consists of the formal specification of a language, usually, but not always, a deductive system, and usually, but not always, an intended semantics. But the technical use of the term raises an obvious question: what do logics that are not first-order logic have to do with the word “logic,” used in the intuitive or philosophical sense? All of the systems described below are designed to model reasoning of some form or another; can we say what makes them logical?

No easy answers are forthcoming. The word “logic” is used in different ways and in different contexts, and the notion, like that of “truth,” has been analyzed from numerous philosophical stances. For example, one might take the goal of logical reasoning to be the determination of which statements are necessarily true, true a priori, true independent of the interpretation of the nonlogical terms, true by virtue of their form, or true by linguistic convention; and each of these conceptions requires a good deal of clarification. Even if one restricts one’s attention to the kind of logic used in mathematics, there is little agreement as to its scope. For example, in the Principia Mathematica, Russell and Whitehead tried to develop mathematics on the basis of logic, in the logicist tradition begun by Frege. Their system of logic was a form of higher-type logic similar to the one described below. In the end they were forced to introduce axioms which, by most standards, do not seem purely logical (notably, the axiom of infinity, and the axiom of reducibility), but one might nonetheless hold that some forms of higher-order reasoning should be accepted as logical. In contrast, Quine, whose ontology does not admit “propositions” as legitimate objects of discourse, argues that second-order and higher-order logic are really manifestations of set theory in sheep’s clothing; in other words, systems involving quantification over predicates are not purely logical.

For now, it is best to leave such philosophical issues for a rainy day, and simply think of the systems below as formal idealizations of various kinds of reasoning, logical or otherwise.

6.2 Many-Sorted Logic

In first-order logic, variables and quantifiers range over a single domain. But it is often useful to have multiple (disjoint) domains: for example, you might want to have a domain of numbers, a domain of geometric objects, a domain of functions from numbers to numbers, a domain of abelian groups, and so on.

Many-sorted logic provides this kind of framework. One starts with a list of “sorts”—the “sort” of an object indicates the “domain” it is supposed to inhabit. One then has variables and quantifiers for each sort, and (usually) an equality symbol for each sort. Functions and relations are also “typed” by the sorts of objects they can take as arguments. Otherwise, one keeps the usual rules of first-order logic, with versions of the quantifier rules repeated for each sort.

For example, to study international relations we might choose a language with two sorts of objects, French citizens and German citizens. We might have a unary relation, “drinks wine,” for objects of the first sort; another unary relation, “eats wurst,” for objects of the second sort; and a binary relation, “forms a multinational married couple,” which takes two arguments, where the first argument is of the first sort and the second argument is of the second sort. If we use variables a, b, c to range over French citizens and x, y, z to range over German citizens, then

∀a∀x (MarriedTo ax → (DrinksWine a ∨ ¬EatsWurst x))

asserts that if any French person is married to a German, either the French person drinks wine or the German doesn’t eat wurst.

Many-sorted logic can be embedded in first-order logic in a natural way, by lumping all the objects of the many-sorted domains together into one first-order domain, using unary predicate symbols to keep track of the sorts, and relativizing quantifiers. For example, the first-order language corresponding to the example above would have unary predicate symbols “German” and “French,” in addition to the other relations described, with the sort requirements erased. A sorted quantifier ∀x α, where x is a variable of the German sort, translates to

∀x (Germanx→ α).

We need to add axioms that ensure that the sorts are separate—e.g., ∀x ¬(Germanx ∧ Frenchx)—as well as axioms that guarantee that “drinks wine” only holds of objects satisfying the predicate Frenchx, etc. With these conventions and axioms, it is not difficult to show that many-sorted sentences translate to first-order sentences, and many-sorted derivations translate to first-order derivations. Also, many-sorted structures “translate” to corresponding first-order structures and vice-versa, so we also have a completeness theorem for many-sorted logic.
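The relativization of quantifiers described above is a purely syntactic transformation, and can be sketched in a few lines of code. In the following Python fragment the tuple-based representation of wffs and all names are invented for this illustration:

```python
# Sketch: relativize sorted quantifiers. A wff is a nested tuple:
#   ("forall", var, sort, body) | ("exists", var, sort, body)
#   | ("->", a, b) | ("and", a, b) | ("or", a, b) | ("atom", text)
# The sort predicates ("German", "French") play the role of the unary
# symbols introduced in the translation described in the text.

def translate(wff):
    tag = wff[0]
    if tag == "forall":
        _, x, sort, body = wff
        # ∀x:σ α  becomes  ∀x (σx → α)
        return ("forall", x, ("->", ("atom", f"{sort}{x}"), translate(body)))
    if tag == "exists":
        _, x, sort, body = wff
        # ∃x:σ α  becomes  ∃x (σx ∧ α)
        return ("exists", x, ("and", ("atom", f"{sort}{x}"), translate(body)))
    if tag in ("->", "and", "or"):
        return (tag, translate(wff[1]), translate(wff[2]))
    return wff  # atoms are unchanged

sorted_wff = ("forall", "x", "German", ("atom", "EatsWurstx"))
print(translate(sorted_wff))
# ('forall', 'x', ('->', ('atom', 'Germanx'), ('atom', 'EatsWurstx')))
```

A full translation would also handle negation and emit the separation axioms; the sketch only shows the quantifier-guarding step.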


6.3 Second-Order Logic

The language of second-order logic allows one to quantify not just over a domain of individuals, but over relations on that domain as well. Given a first-order language L, for each k one adds variables R which range over k-ary relations, and allows quantification over those variables. If R is a variable for a k-ary relation, and t1, . . . , tk are ordinary (first-order) terms, Rt1 . . . tk is an atomic wff. Otherwise, the set of wffs is defined just as in the case of first-order logic, with additional clauses for second-order quantification. Note that we only have the equality symbol for first-order terms: if R and S are relation variables of the same arity k, we can define R = S to be an abbreviation for

∀x1 . . . ∀xk (Rx1 . . . xk ↔ Sx1 . . . xk).

The rules for second-order logic simply extend the quantifier rules to the new second-order variables. Here, however, one has to be a little bit careful to explain how these variables interact with the predicate symbols of L, and with wffs of L more generally. At the bare minimum, relation variables count as terms, so one has inferences of the form

α(R) ` ∃Rα(R)

But if L is the language of arithmetic with a constant relation symbol <, one would also expect the following inference to be valid:

x < y ` ∃RRxy

or for a given wff α,

α(x1, . . . , xk) ` ∃RRx1 . . . xk

More generally, we might want to allow inferences of the form

α[λ~x. β(~x)/R] ` ∃Rα

where α[λ~x. β(~x)/R] denotes the result of replacing every atomic wff of the form Rt1 . . . tk in α by β(t1, . . . , tk). This last rule is equivalent to having a comprehension schema, i.e., an axiom of the form

∃R ∀x1, . . . , xk (α(x1, . . . , xk)↔ Rx1 . . . xk),

one for each wff α in the second-order language, in which R is not a free variable. (Exercise: show that if R is allowed to occur in α, this schema is inconsistent!)

When logicians refer to the “axioms of second-order logic” they usually mean the minimal extension of first-order logic by second-order quantifier rules together with the comprehension schema. But it is often interesting to study weaker subsystems of these axioms and rules. For example, note that in its full generality the axiom schema of comprehension is impredicative: it allows one to assert the existence of a relation Rx1 . . . xk that is “defined” by a wff with second-order quantifiers; and these quantifiers range over the set of all such relations—a set which includes R itself! Around the turn of the twentieth century, a common reaction to Russell’s paradox was to lay the blame on such definitions, and to avoid them in developing the foundations of mathematics. If one prohibits the use of second-order quantifiers in the wff α, one has a predicative form of comprehension, which is somewhat weaker.

From the semantic point of view, one can think of a second-order structure as consisting of a first-order structure for the language, coupled with a set of relations on the domain over which the second-order quantifiers range (more precisely, for each k there is a set of relations of arity k). Of course, if comprehension is included in the proof system, then we have the added requirement that there are enough relations in the “second-order part” to satisfy the comprehension axioms—otherwise the proof system is not sound! One easy way to ensure that there are enough relations around is to take the second-order part to consist of all the relations on the first-order part. Such a structure is called full, and, in a sense, is really the “intended structure” for the language. If we restrict our attention to full structures we have what is known as the full second-order semantics. In that case, specifying a structure boils down to specifying the first-order part, since the contents of the second-order part follow from that implicitly.
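To get a feel for how large full structures are, the following Python sketch (with a toy two-element domain invented for illustration) enumerates the full second-order part of a small first-order domain: for each k, every subset of domain^k counts as a k-ary relation.

```python
from itertools import combinations, product

# Sketch: the "full" second-order part over a toy 2-element domain is
# the set of ALL relations on it: for each k, every subset of domain^k.

domain = ["a", "b"]

def all_relations(k):
    tuples = list(product(domain, repeat=k))          # domain^k
    # every subset of domain^k is a k-ary relation
    return [set(s) for r in range(len(tuples) + 1)
                   for s in combinations(tuples, r)]

unary = all_relations(1)    # 2^2 = 4 subsets of the domain
binary = all_relations(2)   # 2^4 = 16 binary relations
print(len(unary), len(binary))  # 4 16
```

Already for a domain of size n there are 2^(n^k) k-ary relations, which is one way to see why full structures are determined by, but vastly larger than, their first-order parts.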

To summarize, there is some ambiguity when talking about second-orderlogic. In terms of the proof system, one might have in mind either

1. A “minimal” second-order proof system, together with some comprehension axioms.

2. The “standard” second-order proof system, with full comprehension.

In terms of the semantics, one might be interested in either

1. The “weak” semantics, where a structure consists of a first-order part, together with a second-order part big enough to satisfy the comprehension axioms.

2. The “standard” second-order semantics, in which one considers full structures only.

When logicians do not specify the proof system or the semantics they have in mind, they are usually referring to the second item on each list. The advantage to using this semantics is that, as we will see, it gives us categorical descriptions of many natural mathematical structures; at the same time, the proof system is quite strong, and sound for this semantics. The drawback is that the proof system is not complete for the semantics; in fact, no effectively given proof system is complete for the full second-order semantics. On the other hand, we will see that the proof system is complete for the weakened semantics; this implies that if a sentence is not provable, then there is some structure, not necessarily the full one, in which it is false.


The language of second-order logic is quite rich. One can identify unary relations with subsets of the domain, and so in particular you can quantify over these sets; for example, one can express induction for the natural numbers with a single axiom

∀R ((R0 ∧ ∀x (Rx→ Rx′))→ ∀xRx).

If one takes the language of arithmetic to have symbols 0, ′, +, × and <, one can add the following axioms to describe their behavior:

1. ∀x¬x′ = 0

2. ∀x∀y (x′ = y′ → x = y)

3. ∀x (x+ 0) = x

4. ∀x∀y (x+ y′) = (x+ y)′

5. ∀x (x× 0) = 0

6. ∀x∀y (x× y′) = ((x× y) + x)

7. ∀x∀y (x < y ↔ ∃z y = (x+ z′))

It is not difficult to show that these axioms, together with the axiom of induction above, provide a categorical description of the structure B, the standard model of arithmetic, provided we are using the full second-order semantics. Given any structure A in which these axioms are true, define a function f from N to the domain of A using ordinary recursion on N, so that f(0) = 0A and f(x+ 1) = ′A(f(x)). Using ordinary induction on N and the fact that axioms (1) and (2) hold in A, we see that f is injective. To see that f is surjective, let P be the set of elements of |A| that are in the range of f . Since A is full, P is in the second-order domain. By the construction of f , we know that 0A is in P , and that P is closed under ′A. The fact that the induction axiom holds in A (in particular, for P ) guarantees that P is equal to the entire first-order domain of A. This shows that f is a bijection. Showing that f is a homomorphism is no more difficult, using ordinary induction on N repeatedly.

In set-theoretic terms, a function is just a special kind of relation; for example, a unary function f can be identified with a binary relation R satisfying ∀x∃y R(x, y). As a result, one can quantify over functions too. Using the full semantics, one can then define the class of infinite structures to be the class of structures A for which there is an injective function from the domain of A to a proper subset of itself:

∃f (∀x∀y (f(x) = f(y)→ x = y) ∧ ∃y ∀x f(x) ≠ y).

The negation of this sentence then defines the class of finite structures.

In addition, one can define the class of well-orderings, by adding the following to the definition of a linear ordering:

∀P (∃xPx→ ∃x (Px ∧ ∀y (y < x→ ¬Py))).


This asserts that every non-empty set has a least element, modulo the identification of “set” with “one-place relation”. For another example, one can express the notion of connectedness for graphs, by saying that there is no nontrivial separation of the vertices into disconnected parts:

¬∃A (∃xAx ∧ ∃y ¬Ay ∧ ∀w∀z ((Aw ∧ ¬Az)→ ¬Rwz)).
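The brute-force reading of this sentence under the full semantics (quantify A over all subsets of the vertices) can be evaluated directly on a finite graph. In this Python sketch the example graph and all names are invented for illustration:

```python
from itertools import combinations

# Sketch: evaluate the second-order connectedness sentence on a toy
# graph by ranging A over ALL subsets of the vertices (full semantics).
# R is the (symmetrized) edge relation.

vertices = {1, 2, 3, 4}
edges = {(1, 2), (2, 3), (3, 4)}
R = edges | {(b, a) for (a, b) in edges}

def separation_exists():
    # ∃A (∃x Ax ∧ ∃y ¬Ay ∧ ∀w∀z ((Aw ∧ ¬Az) → ¬Rwz))
    for r in range(len(vertices) + 1):
        for A in map(set, combinations(vertices, r)):
            if A and (vertices - A) and all(
                (w, z) not in R for w in A for z in vertices - A
            ):
                return True
    return False

connected = not separation_exists()
print(connected)  # True: the path 1-2-3-4 has no nontrivial separation
```

Deleting the edge (2, 3) would make separation_exists() return True (with A = {1, 2}), so the sentence would fail, matching the intuitive notion of a disconnected graph.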

For yet another example, you might try as an exercise to define the class of finite structures whose domain has even size. More strikingly, one can provide a categorical description of the real numbers as a complete ordered field containing the rationals.

In short, second-order logic is much more expressive than first-order logic. That’s the good news; now for the bad. We have already mentioned that there is no effective proof system that is complete for the full second-order semantics. For better or for worse, many of the properties of first-order logic are absent, including compactness and the Löwenheim-Skolem theorems.

On the other hand, if one is willing to give up the full second-order semantics in favor of the weaker one, then the minimal second-order proof system is complete for this semantics. In other words, if we read ` as “proves in the minimal system” and |= as “logically implies in the weaker semantics”, we can show that whenever Γ |= α then Γ ` α. If one wants to include specific comprehension axioms in the proof system, one has to restrict the semantics to second-order structures that satisfy these axioms: for example, if ∆ consists of a set of comprehension axioms (possibly all of them), we have that if Γ ∪∆ |= α, then Γ ∪∆ ` α. In particular, if α is not provable using the comprehension axioms we are considering, then there is a model of ¬α in which these comprehension axioms nonetheless hold.

The easiest way to see that the completeness theorem holds for the weaker semantics is to think of second-order logic as a many-sorted logic, as follows. One sort is interpreted as the ordinary “first-order” domain, and then for each k we have a domain of “relations of arity k.” We take the language to have built-in relation symbols “truek Rx1 . . . xk”, which is meant to assert that R holds of x1, . . . , xk, where R is a variable of the sort “k-ary relation” and x1, . . . , xk are objects of the first-order sort.

With this identification, the weak second-order semantics is essentially the usual semantics for many-sorted logic; and we have already observed that many-sorted logic can be embedded in first-order logic. Modulo the translations back and forth, then, the weaker conception of second-order logic is really a form of first-order logic in disguise, where the domain contains both “objects” and “relations” governed by the appropriate axioms.

6.4 Higher-Order Logic

Passing from first-order logic to second-order logic enabled us to talk about sets of objects in the first-order domain, within the formal language. Why stop there? For example, third-order logic should enable us to deal with sets of sets of objects, or perhaps even sets which contain both objects and sets of objects. And fourth-order logic will let us talk about sets of objects of that kind. As you may have guessed, one can iterate this idea arbitrarily.

In practice, higher-order logic is often formulated in terms of functions instead of relations. (Modulo the natural identifications, this difference is inessential.) Given some basic “sorts” A, B, C, . . . (which we will now call “types”), we can create new ones by stipulating

If σ and τ are finite types then so is σ → τ .

Think of types as syntactic “labels,” which classify the objects we want in our domain; σ → τ describes those objects that are functions which take objects of type σ to objects of type τ . For example, we might want to have a type Ω of truth values, “true” and “false,” and a type N of natural numbers. In that case, you can think of objects of type N → Ω as unary relations, or subsets of N; objects of type N → N are functions from natural numbers to natural numbers; and objects of type (N → N) → N are “functionals,” that is, higher-type functions that take functions to numbers.

As in the case of second-order logic, one can think of higher-order logic as a kind of many-sorted logic, where there is a sort for each type of object we want to consider. But it is usually clearer just to define the syntax of higher-type logic from the ground up. For example, we can define a set of finite types inductively, as follows:

1. N is a finite type.

2. If σ and τ are finite types, then so is σ → τ .

3. If σ and τ are finite types, so is σ × τ .

Intuitively, N denotes the type of the natural numbers, σ → τ denotes the type of functions from σ to τ , and σ × τ denotes the type of pairs of objects, one from σ and one from τ . We can then define a set of terms inductively, as follows:

1. For each type σ, there is a stock of variables x, y, z, . . . of type σ

2. 0 is a term of type N

3. S (successor) is a term of type N→ N

4. If s is a term of type σ, and t is a term of type N → (σ → σ), then Rst is a term of type N → σ

5. If s is a term of type τ → σ and t is a term of type τ , then s(t) is a term of type σ

6. If s is a term of type σ and x is a variable of type τ , then λx. s is a term of type τ → σ.


7. If s is a term of type σ and t is a term of type τ , then 〈s, t〉 is a term of type σ × τ .

8. If s is a term of type σ × τ , then p1(s) is a term of type σ and p2(s) is a term of type τ .

Intuitively, Rst denotes the function defined recursively by

Rst(0) = s

Rst(x+ 1) = t(x,Rst(x)),

〈s, t〉 denotes the pair whose first component is s and whose second component is t, and p1(s) and p2(s) denote the first and second elements (“projections”) of s. Finally, λx. s denotes the function f defined by

f(x) = s

for any x of type σ; so item (6) gives us a form of comprehension, enabling us to define functions using terms. Wffs are built up from equality statements s = t between terms of the same type, the usual propositional connectives, and higher-type quantification. One can then take the axioms of the system to be the basic equations governing the terms defined above, together with the usual rules of logic with quantifiers and equality.
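The semantics of the recursor R can be sketched in Python (a semantic illustration of the defining equations Rst(0) = s and Rst(x + 1) = t(x, Rst(x)), not part of any formal system; the names `R`, `add`, and `factorial` are our own):

```python
def R(s, t):
    """Return the function f defined by primitive recursion:
    f(0) = s,  f(x + 1) = t(x, f(x))."""
    def f(x):
        result = s
        for i in range(x):
            result = t(i, result)
        return result
    return f

# Addition y + x, by recursion on x with s = y and t(x, r) = S(r):
def add(y):
    return R(y, lambda x, r: r + 1)

print(add(3)(4))  # 7

# Factorial, by recursion with s = 1 and t(x, r) = (x + 1) * r:
factorial = R(1, lambda x, r: (x + 1) * r)
print(factorial(5))  # 120
```

Note that `add` has the shape required by item (4): its second argument plays the role of a term of type N → (σ → σ).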

If one augments the finite type system with a type Ω of truth values, one has to include axioms which govern its use as well. In fact, if one is clever, one can get rid of complex wffs entirely, replacing them with terms of type Ω! The proof system can then be modified accordingly. The result is essentially the simple theory of types set forth by Alonzo Church in the 1930s.

As in the case of second-order logic, there are different versions of higher-type semantics that one might want to use. In the full version, variables of type σ → τ range over the set of all functions from the objects of type σ to objects of type τ. As you might expect, this semantics is too strong to admit a complete, effective proof system. But one can consider a weaker semantics, in which a structure consists of sets of elements Tτ for each type τ, together with appropriate operations for application, projection, etc. If the details are carried out correctly, one can obtain completeness theorems for the kinds of proof systems described above.

Higher-type logic is attractive because it provides a framework in which we can embed a good deal of mathematics in a natural way: starting with N, one can define real numbers, continuous functions, and so on. It is also particularly attractive in the context of intuitionistic logic, since the types have clear “constructive” interpretations. In fact, one can develop constructive versions of higher-type semantics (based on intuitionistic, rather than classical, logic) that clarify these constructive interpretations quite nicely, and are, in many ways, more interesting than their classical counterparts.


6.5 Intuitionistic Logic

In contrast to second-order and higher-order logic, intuitionistic first-order logic represents a restriction of the classical version, intended to model a more “constructive” kind of reasoning. The following examples may serve to illustrate some of the underlying motivations.

Suppose someone came up to you one day and announced that they had determined a natural number x, with the property that if x is prime, the Riemann hypothesis is true, and if x is composite, the Riemann hypothesis is false. Great news! Whether the Riemann hypothesis is true or not is one of the big open questions of mathematics, and here they seem to have reduced the problem to one of calculation, that is, to the determination of whether a specific number is prime or not.

What is the magic value of x? They describe it as follows: x is the natural number that is equal to 7 if the Riemann hypothesis is true, and 9 otherwise.

Angrily, you demand your money back. From a classical point of view, the description above does in fact determine a unique value of x; but what you really want is a value of x that is given explicitly.

To take another, perhaps less contrived example, consider the following question. We know that it is possible to raise an irrational number to a rational power, and get a rational result. For example, (√2)^2 = 2. What is less clear is whether or not it is possible to raise an irrational number to an irrational power, and get a rational result. The following theorem answers this in the affirmative:

Theorem 65A There are irrational numbers a and b such that a^b is rational.

Proof. Consider √2^√2. If this is rational, we are done: we can let a = b = √2. Otherwise, it is irrational. Then we have

(√2^√2)^√2 = √2^(√2·√2) = √2^2 = 2,

which is certainly rational. So, in this case, let a be √2^√2, and let b be √2.
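As a quick sanity check, the chain of equalities in the proof can be confirmed numerically (in Python, up to floating-point error; the variable names here are our own):

```python
import math

sqrt2 = math.sqrt(2)

# The warm-up fact: (√2)^2 = 2.
print(sqrt2 ** 2)  # ≈ 2.0

# The proof's computation: ((√2)^√2)^√2 = (√2)^(√2·√2) = (√2)^2 = 2.
value = (sqrt2 ** sqrt2) ** sqrt2
print(value)  # ≈ 2.0

assert math.isclose(value, 2.0)
```

Of course, the numerical check does not tell us which of the two cases of the proof actually obtains; that is precisely the point of the example.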

Does this constitute a valid proof? Most mathematicians feel that it does. But again, there is something a little bit unsatisfying here: we have proved the existence of a pair of real numbers with a certain property, without being able to say which pair of numbers it is. It is possible to prove the same result, but in such a way that the pair a, b is given in the proof: take a = √3 and b = log₃ 4. Then

a^b = √3^(log₃ 4) = 3^((1/2)·log₃ 4) = (3^(log₃ 4))^(1/2) = 4^(1/2) = 2,

since 3^(log₃ x) = x.

Intuitionistic logic is designed to model a kind of reasoning where moves like the one in the first proof are disallowed. Proving the existence of an x satisfying α(x) means that you have to give a specific x, and a proof that it


satisfies α, as in the second proof. Proving that α or β holds requires that you can prove one or the other.

Formally speaking, intuitionistic first-order logic is what you get if you restrict a proof system for first-order logic in a certain way. Similarly, there are intuitionistic versions of second-order or higher-order logic. From the mathematical point of view, these are just formal deductive systems, but, as already noted, they are intended to model a kind of mathematical reasoning. One can take this to be the kind of reasoning that is justified on a certain philosophical view of mathematics (such as Brouwer’s intuitionism); one can take it to be a kind of mathematical reasoning which is more “concrete” and satisfying (along the lines of Bishop’s constructivism); and one can argue about whether or not the formal description captures the informal motivation. But whatever philosophical positions we may hold, we can study intuitionistic logic as a formally presented logic; and for whatever reasons, many mathematical logicians find it interesting to do so.

There is an informal constructive interpretation of the intuitionist connectives, usually known as the Brouwer–Heyting–Kolmogorov interpretation. It runs as follows: a proof of α ∧ β consists of a proof of α paired with a proof of β; a proof of α ∨ β consists of either a proof of α, or a proof of β, where we have explicit information as to which is the case; a proof of α → β consists of a procedure, which transforms a proof of α to a proof of β; a proof of ∀x α(x) consists of a procedure which returns a proof of α(x) for any value of x; and a proof of ∃x α(x) consists of a value of x, together with a proof that this value satisfies α. One can describe the interpretation in computational terms known as the “Curry–Howard isomorphism” or the “wffs-as-types paradigm”: think of a wff as specifying a certain kind of data type, and proofs as computational objects of these data types that enable us to see that the corresponding wff is true.
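Under the Curry–Howard reading, this can be made concrete in a small Python sketch (illustrative only; the encoding of proofs as pairs, tagged values, and functions is our own ad hoc choice). A proof of a conjunction is a pair, a proof of an implication is a function, so a proof of (α ∧ β) → (β ∧ α) is just the function that swaps a pair:

```python
# Curry–Howard sketch: proofs as data.
# A proof of A ∧ B is a pair (a, b); a proof of A → B is a function.

def swap(proof_of_a_and_b):
    """A 'proof' of (A ∧ B) → (B ∧ A): a procedure transforming any
    proof of the antecedent into a proof of the consequent."""
    a, b = proof_of_a_and_b
    return (b, a)

# A proof of A ∨ B carries a tag recording which disjunct is proved,
# matching the BHK demand for explicit information:
def left(proof_of_a):
    return ("left", proof_of_a)

print(swap(("proof of A", "proof of B")))  # ('proof of B', 'proof of A')
```

Notice that there is no term of this kind one could write down for α ∨ ¬α in general: the encoding forces you to say which disjunct you can prove.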

Intuitionistic logic is often thought of as being classical logic “minus” the law of the excluded middle. The following theorem makes this more precise.

Theorem 65B Intuitionistically, the following axiom schemata are equivalent:

1. (¬α → ⊥) → α.

2. α ∨ ¬α.

3. ¬¬α → α.

Obtaining instances of one schema from either of the others is a good exercise in intuitionistic logic.

The first deductive systems for intuitionistic propositional logic, put forth as formalizations of Brouwer’s intuitionism, are due, independently, to Kolmogorov, Glivenko, and Heyting. The first formalization of intuitionistic first-order logic (and parts of intuitionist mathematics) is due to Heyting. Though a number of classically valid schemata are not intuitionistically valid, many are.


The double-negation translation describes an important relationship between classical and intuitionist logic. It is defined inductively as follows (think of αN as the “intuitionist” translation of the classical wff α):

αN ≡ ¬¬α for atomic wffs α

(α ∧ β)N ≡ (αN ∧ βN)

(α ∨ β)N ≡ ¬¬(αN ∨ βN)

(α → β)N ≡ (αN → βN)

(∀x α)N ≡ ∀x αN

(∃x α)N ≡ ¬¬∃x αN
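Since the translation is a straightforward recursion over the syntax of wffs, it can be sketched as a short program (the tuple-based formula representation here is an ad hoc choice made just for illustration, with ¬α rendered as α → ⊥):

```python
# Wffs as nested tuples: ("atom", name), ("bot",), ("and", a, b),
# ("or", a, b), ("imp", a, b), ("all", x, a), ("ex", x, a).

def neg(a):
    # ¬a, abbreviated as a → ⊥
    return ("imp", a, ("bot",))

def dn(a):
    """Double-negation (Gödel–Gentzen) translation a ↦ a^N."""
    op = a[0]
    if op == "atom":
        return neg(neg(a))
    if op == "and":
        return ("and", dn(a[1]), dn(a[2]))
    if op == "or":
        return neg(neg(("or", dn(a[1]), dn(a[2]))))
    if op == "imp":
        return ("imp", dn(a[1]), dn(a[2]))
    if op == "all":
        return ("all", a[1], dn(a[2]))
    if op == "ex":
        return neg(neg(("ex", a[1], dn(a[2]))))
    return a  # ⊥ is left unchanged

p = ("atom", "p")
print(dn(("or", p, p)))  # ¬¬(¬¬p ∨ ¬¬p), in tuple form
```

The double negations appear exactly at the atomic, disjunctive, and existential clauses, which is where classical and intuitionistic logic come apart.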

Kolmogorov and Glivenko had versions of this translation for propositional logic; for predicate logic, it is due to Gödel and Gentzen, independently. We have:

Theorem 65C 1. α ↔ αN is provable classically.

2. If α is provable classically, then αN is provable intuitionistically.

We can now envision the following dialogue. Classical mathematician: “I’ve proved α!” Intuitionist mathematician: “Your proof isn’t valid. What you’ve really proved is αN.” Classical mathematician: “Fine by me!” As far as the classical mathematician is concerned, the intuitionist is just splitting hairs, since the two are equivalent. But the intuitionist insists there is a difference.

Note that the above translation concerns pure logic only; it does not address the question as to what the appropriate nonlogical axioms are for classical and intuitionistic mathematics, or what the relationship is between them. But the following slight extension of the theorem above provides some useful information:

Theorem 65D If Γ proves α classically, ΓN proves αN intuitionistically.

In other words, if α is provable from some hypotheses classically, then αN is provable from their double-negation translations.

To show that a sentence or propositional wff is intuitionistically valid, all you have to do is provide a proof. But how can you show that it is not valid? For that purpose, we need a semantics that is sound, and preferably complete. A semantics due to Kripke nicely fits the bill.

We can play the same game we did for classical logic: define the semantics, and prove soundness and completeness. It is worthwhile, however, to note the following distinction. In the case of classical logic, the semantics was the “obvious” one, in a sense implicit in the meaning of the connectives. Though one can provide some intuitive motivation for Kripke semantics, the latter does not offer the same feeling of inevitability. In addition, the notion of a classical structure is a natural mathematical one, so we can either take the notion of a structure to be a tool for studying classical first-order logic, or take classical first-order logic to be a tool for studying mathematical structures. In contrast,


Kripke structures can only be viewed as a logical construct; they don’t seem to have independent mathematical interest.

A Kripke structure for a propositional language consists of a partial order Mod(P) with a least element, and a “monotone” assignment of propositional variables to the elements of Mod(P). The intuition is that the elements of Mod(P) represent “worlds,” or “states of knowledge”; an element p ≥ q represents a “possible future state” of q; and the propositional variables assigned to p are the propositions that are known to be true in state p. The forcing relation P, p ⊩ α then extends this relationship to arbitrary wffs in the language; read P, p ⊩ α as “α is true in state p.” The relationship is defined inductively, as follows:

1. P, p ⊩ pi iff pi is one of the propositional variables assigned to p.

2. P, p ⊮ ⊥.

3. P, p ⊩ (α ∧ β) iff P, p ⊩ α and P, p ⊩ β.

4. P, p ⊩ (α ∨ β) iff P, p ⊩ α or P, p ⊩ β.

5. P, p ⊩ (α → β) iff, whenever q ≥ p and P, q ⊩ α, then P, q ⊩ β.

It is a good exercise to try to show that ¬(p ∧ q) → (¬p ∨ ¬q) is not intuitionistically valid, by cooking up a Kripke structure that provides a counterexample.
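The forcing clauses above can be checked mechanically. The sketch below (an ad hoc encoding of our own, with ¬α treated as α → ⊥) implements the clauses over a three-element order with root r and two incomparable futures, and confirms that the root does not force the wff in question:

```python
# A Kripke structure: worlds, a partial order (as the set of pairs q >= p),
# and a monotone valuation assigning to each world the variables known there.
worlds = {"r", "a", "b"}
geq = {("r", "r"), ("a", "a"), ("b", "b"), ("a", "r"), ("b", "r")}
val = {"r": set(), "a": {"p"}, "b": {"q"}}

def above(w):
    return {q for (q, p) in geq if p == w}

def forces(w, a):
    op = a[0]
    if op == "var":
        return a[1] in val[w]
    if op == "bot":
        return False  # ⊥ is never forced
    if op == "and":
        return forces(w, a[1]) and forces(w, a[2])
    if op == "or":
        return forces(w, a[1]) or forces(w, a[2])
    if op == "imp":  # quantify over all future states, per clause 5
        return all(not forces(v, a[1]) or forces(v, a[2]) for v in above(w))
    raise ValueError(op)

def neg(a):
    return ("imp", a, ("bot",))

p, q = ("var", "p"), ("var", "q")
demorgan = ("imp", neg(("and", p, q)), ("or", neg(p), neg(q)))
print(forces("r", demorgan))  # False: not intuitionistically valid
```

At r, the antecedent ¬(p ∧ q) is forced (no future state forces both p and q), but neither ¬p nor ¬q is forced, since p holds in the future state a and q holds in b.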

6.6 Modal Logics

Consider the following example of a conditional sentence:

If Jeremy is alone in that room, then he is drunk and naked and dancing on the chairs.

This is an example of a conditional assertion that may be materially true but nonetheless misleading, since it seems to suggest that there is a stronger link between the antecedent and conclusion than simply that either the antecedent is false or the consequent true. That is, the wording suggests that the claim is not only true in this particular world (where it may be trivially true, because Jeremy is not alone in the room), but that, moreover, the conclusion would have been true had the antecedent been true. In other words, one can take the assertion to mean that the claim is true not just in this world, but in any “possible” world; or that it is necessarily true, as opposed to just true in this particular world.

Modal logic was designed to make sense of this kind of necessity. One obtains modal propositional logic from ordinary propositional logic by adding a box operator; which is to say, if α is a wff, so is □α. Intuitively, □α asserts that α is necessarily true, or true in any possible world. ♦α is usually taken to be an abbreviation for ¬□¬α, and can be read as asserting that α is possibly true. Of course, modality can be added to predicate logic as well.


Kripke structures can be used to provide a semantics for modal logic; in fact, Kripke first designed this semantics with modal logic in mind. Rather than restricting to partial orders, more generally one has a set of “possible worlds,” P, and a binary “accessibility” relation Rxy between worlds. Intuitively, Rpq asserts that the world q is compatible with p; i.e., if we are “in” world p, we have to entertain the possibility that the world could have been like q.

Modal logic is sometimes called an “intensional” logic, as opposed to an “extensional” one. The intended semantics for an extensional logic, like classical logic, will only refer to a single world, the “actual” one; while the semantics for an “intensional” logic relies on a more elaborate ontology. In addition to modeling necessity, one can use modality to model other linguistic constructions, reinterpreting □ and ♦ according to the application. For example:

1. In provability logic, □α is read “α is provable” and ♦α is read “α is consistent.”

2. In epistemic logic, one might read □α as “I know α” or “I believe α.”

3. In temporal logic, one can read □α as “α is always true” and ♦α as “α is sometimes true.”

One would like to augment logic with rules and axioms dealing with modality. For example, the system S4 consists of the ordinary axioms and rules of propositional logic, together with the following axioms:


□(α → β) → (□α → □β)

□α → α

□α → □□α

as well as a rule, “from α conclude □α.” S5 adds the following axiom:

♦α → □♦α

Variations of these axioms may be suitable for different applications; for example, S5 is usually taken to characterize the notion of logical necessity. And the nice thing is that one can usually find a semantics for which the proof system is sound and complete by restricting the accessibility relation in the Kripke structures in natural ways. For example, S4 corresponds to the class of Kripke structures in which the accessibility relation is reflexive and transitive. S5 corresponds to the class of Kripke structures in which the accessibility relation is universal, which is to say that every world is accessible from every other; so □α holds if and only if α holds in every world.
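The truth conditions for □ and ♦ can be illustrated with a small evaluator (an ad hoc sketch of our own): □α holds at a world iff α holds at every accessible world, and ♦α iff α holds at some accessible world. With a universal accessibility relation, as for S5, □α holds at one world iff α holds at all of them:

```python
# A modal Kripke frame: worlds, an accessibility relation R, and a valuation.
worlds = {1, 2, 3}
R = {(w, v) for w in worlds for v in worlds}  # universal relation, as in S5
val = {1: {"p"}, 2: {"p"}, 3: {"p"}}

def holds(w, a):
    op = a[0]
    if op == "var":
        return a[1] in val[w]
    if op == "not":
        return not holds(w, a[1])
    if op == "imp":
        return (not holds(w, a[1])) or holds(w, a[2])
    if op == "box":  # true in every accessible world
        return all(holds(v, a[1]) for v in worlds if (w, v) in R)
    if op == "dia":  # true in some accessible world
        return any(holds(v, a[1]) for v in worlds if (w, v) in R)
    raise ValueError(op)

p = ("var", "p")
# With R universal and p true everywhere, □p holds at every world:
print(all(holds(w, ("box", p)) for w in worlds))  # True

# The S4 axiom □p → □□p holds here, since R is (trivially) transitive:
print(all(holds(w, ("imp", ("box", p), ("box", ("box", p)))) for w in worlds))  # True
```

Restricting `R` to a merely reflexive-and-transitive relation gives a frame for S4 in the same way; the soundness claims in the text amount to saying these checks succeed on every frame in the corresponding class.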

6.7 Other Logics

As you may have gathered by now, it is not hard to design a new logic. You too can create your own syntax, make up a deductive system, and fashion


a semantics to go with it. You might have to be a bit clever if you want the proof system to be complete for the semantics, and it might take some effort to convince the world at large that your logic is truly interesting. But, in return, you can enjoy hours of good, clean fun, exploring your logic’s mathematical and computational properties.

Recent decades have witnessed a veritable explosion of formal logics. Fuzzy logic is designed to model reasoning about vague properties. Probabilistic logic is designed to model reasoning about uncertainty. Default logics and nonmonotonic logics are designed to model defeasible forms of reasoning, which is to say, “reasonable” inferences that can later be overturned in the face of new information. There are epistemic logics, designed to model reasoning about knowledge; causal logics, designed to model reasoning about causal relationships; and even “deontic” logics, which are designed to model reasoning about moral and ethical obligations. Depending on whether the primary motivation for introducing these systems is philosophical, mathematical, or computational, you may find such creatures studied under the rubric of mathematical logic, philosophical logic, artificial intelligence, cognitive science, or elsewhere.

The list goes on and on, and the possibilities seem endless. We may neverattain Leibniz’ dream of reducing all of human reason to calculation—but thatcan’t stop us from trying.