
CS 350 All slides week 5-8

André Nies

May 4, 2010

André Nies () CS 350 All slides week 5-8 May 4, 2010 1 / 31

Chapter 3: The Church-Turing thesis


The Plan

Section 3.1: Definition and Examples of Turing machines

Section 3.2: Variants of Turing machines

• several tapes
• nondeterminism

Section 3.3: Clarify the concept of an algorithm, and discuss the thesis


Section 3.1: Definition and examples of Turing machines

Turing machines (TMs) were introduced by Alan Turing in his 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem”.


Comparing Turing machines with FA

A TM has the following abilities (compared to a finite automaton):

• It can read and write on the tape.
• The read/write head can move right, and also left.
• The tape is one-way infinite.
• The ACCEPT and REJECT states take effect immediately.

“TM = FA + unlimited external memory.”


TMs are more powerful than FA

Example (page 139)
There is a Turing machine deciding membership in the language

B = {w#w | w ∈ {0,1}∗}.

The language B is not regular. Thus, TMs are more powerful than FA.

M1 = “on input w :

1. Zig-zag across the tape to check whether all corresponding positions on either side of # have the same symbol. Cross off symbols that have been checked.

2. When all the symbols to the left of # have been crossed off, check for remaining symbols to the right of #. If any remain, REJECT.

Otherwise, ACCEPT.”
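The zig-zag procedure can be mirrored in ordinary code. A minimal Python sketch (the function name and the list used as a "tape" are this sketch's choices, not part of the slide):

```python
def accepts_w_hash_w(s):
    """High-level sketch of M1: cross off matching symbols on either side of #."""
    tape = list(s)
    if tape.count('#') != 1:
        return False              # input is not of the form w#w'
    sep = tape.index('#')
    i, j = 0, sep + 1
    while i < sep:                # zig-zag: compare position i with position j
        if j >= len(tape) or tape[i] != tape[j]:
            return False
        tape[i] = tape[j] = 'x'   # cross off the checked pair
        i, j = i + 1, j + 1
    return j == len(tape)         # no symbols may remain to the right of #
```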


Example
There is a Turing machine with input alphabet {0,1} that exchanges 0 and 1 (i.e., flips the input bits). For instance, 0110 is turned into 1001.

M1 = “on input w : ”

1. Read another symbol. If it is the blank symbol ⊔, ACCEPT.
2. If the symbol is 0, replace it by 1; if the symbol is 1, replace it by 0.
3. GOTO 1.

We can represent this TM by a diagram

Edge labels are of the form b → c,R or b → c,L where b, c are tape symbols.


Formal Definition of a Turing machine

A Turing machine (TM) is a 7-tuple (Q, Σ, Γ, δ, q0, qacc, qrej) such that

Q is the set of states;
Σ is the input alphabet;
Γ is the tape alphabet, where Σ ⊆ Γ and ⊔ ∈ Γ − Σ;

δ : Q × Γ → Q × Γ × {L,R}

is the transition function;
q0, qacc, qrej are the start/accepting/rejecting states, respectively, and qacc ≠ qrej.

Exercise: write the TM from Example 2 as such a 7-tuple.
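One way to approach the exercise: write the 7-tuple down as plain data, together with a small single-tape simulator. This is a hedged sketch rather than an official solution; the dictionary encoding of δ and the use of '_' for the blank symbol ⊔ are choices made here.

```python
def run_tm(delta, q0, q_acc, q_rej, w):
    """Simulate a single-tape TM given by its transition table delta."""
    tape, head, q = list(w) + ['_'], 0, q0
    while q not in (q_acc, q_rej):
        sym = tape[head] if head < len(tape) else '_'
        q, write, move = delta[(q, sym)]
        if head == len(tape):
            tape.append('_')          # extend the one-way infinite tape on demand
        tape[head] = write
        head = head + 1 if move == 'R' else max(0, head - 1)  # head stays on leftmost square
    return q == q_acc, ''.join(tape).rstrip('_')

# The bit-flipping TM from Example 2 as a transition table:
delta = {
    ('q0', '0'): ('q0', '1', 'R'),    # flip 0 -> 1, move right
    ('q0', '1'): ('q0', '0', 'R'),    # flip 1 -> 0, move right
    ('q0', '_'): ('qacc', '_', 'R'),  # blank reached: accept
}
```

Running `run_tm(delta, 'q0', 'qacc', 'qrej', '0110')` accepts and leaves 1001 on the tape.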


Definition of Computation

The Turing machine M on input w = w1w2 . . .wn computes in the expected way. Note the following:

• The computation goes on until the state qaccept or qreject is reached (maybe forever, if neither is ever reached).

• If M attempts to move its head to the left when it is already on the leftmost square, the head stays where it is instead.


Configurations

A configuration of Turing machine M is the complete information at a stage of an M-computation. It consists of:

• the current state
• the current tape content
• the current head position.

Example:

This configuration is denoted by the string 1011q701111.

A configuration is called accepting / rejecting if its state is qaccept / qreject.


Computation

A computation of Turing machine M on an input string w is a sequence

C1, . . . ,Ck

of configurations for M such that

• C1 is the start configuration q0w
• for i < k, Ci yields Ci+1 according to M
• Ck is accepting or rejecting

M accepts string w if there is an accepting computation of M that starts with q0w.

The language recognized by M is

L(M) = {w ∈ Σ∗ : M accepts w}.

We say a language L is Turing recognizable if L = L(M) for some M.


Turing decidable languages

• A TM is called a decider if it halts on each input (in one of the two halting states, qacc or qrej).

• We say a language L is Turing decidable if L = L(M) for some decider M.

Clear: Turing decidable ⇒ Turing recognizable.

The converse fails as we will see later.


Section 3.2: Variants of Turing machines

We consider three variants of Turing machines:

• multi-tape TM
• non-deterministic TM
• enumerators

For the first two, we modify the definition of the transition function δ.

Multi-tape TM: Let k be the number of tapes.

δ : Q × Γ^k → Q × Γ^k × {L,R,S}^k.

The new letter S means that the head stays where it is.

Non-deterministic TM:

δ : Q × Γ → P(Q × Γ × {L,R}).

P is the power set operator.


Simulation

• We say that Turing machines M, N are equivalent if L(M) = L(N), i.e., they recognize the same language.

• We will show that the multi-tape and non-deterministic models aren't really more powerful than the basic model. Each of them is equivalent to a TM in the original sense. (However, they can be faster.)

• To show this, we will simulate the extended model by a Turing machine in the original sense.


Simulating a multi-tape TM by a single tape TM

Theorem (3.13)
Every multi-tape Turing machine M has an equivalent single tape Turing machine S.

For instance, suppose that M has three tapes. We give an example of an M-configuration and the corresponding S-configuration.

Going from M to S results in a quadratic slowdown.


Simulating non-deterministic by deterministic TM

Theorem (3.16)
Every non-deterministic Turing machine N has an equivalent deterministic Turing machine D.

We simulate N by a deterministic TM D with three tapes. (By Theorem 3.13 we can later reduce this to one tape.)

• input tape (read-only)
• simulation tape
• address tape.

D uses the address tape to search the computation tree of N on input w in a breadth-first way for an accepting configuration. This gives an exponential slowdown.


Enumerators

• An enumerator is a TM E with an attached “printer” (a write-only output tape).

• While the computation of E goes on, it can send strings to the printer.

• E starts with an empty input tape.

• The language enumerated by E is the collection of strings that are printed out (at some stage).


What enumerators can do

Theorem
A language A is Turing recognizable by a machine M ⇐⇒ some enumerator E enumerates A.

⇐: Define a Turing machine M recognizing A.

M = “on input w :

1. Run E . When E prints a string x , see whether x = w .
2. If so, ACCEPT.”

⇒: Now M is given. Let s1, s2, . . . list Σ∗ in lexicographical order.

E = “Ignore the input.

1. Repeat for i = 1,2, . . .
2. Run i steps of M for each input s1, . . . , si .
3. If a computation on input sk accepts, print sk .”
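The ⇒ direction (dovetailing) can be mirrored in code. This is a sketch under stated assumptions: `all_strings` plays the role of the listing s1, s2, . . ., and `accepts_within(s, i)` stands in for "M accepts s within i steps"; unlike the slide's E, the sketch prints each string only once.

```python
from itertools import count, islice, product

def all_strings(alphabet=('0', '1')):
    """Yield Sigma* in shortlex order: '', '0', '1', '00', '01', ..."""
    yield ''
    for n in count(1):
        for tup in product(alphabet, repeat=n):
            yield ''.join(tup)

def enumerate_language(accepts_within):
    """Dovetail: at stage i, run i steps of M on s1..si; yield new acceptances."""
    printed = set()
    for i in count(1):
        for s in islice(all_strings(), i):
            if s not in printed and accepts_within(s, i):
                printed.add(s)
                yield s
```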


Section 3.3: Algorithms and the Church-Turing thesis

• Intuitive notion of an algorithm: a sequence of simple instructions for carrying out some task. Examples: the division algorithm, the Euclidean algorithm to find the greatest common divisor of two numbers.

• Already the ancient Greeks had algorithms. The word “algorithm” is derived, however, from the name of the Persian mathematician al-Khwarizmi (ca. 780-850).

• A mathematical definition corresponding to this intuition was only given in the 1930s.


Hilbert’s 10th problem

At the 1900 ICM in Paris, David Hilbert presented a list of open problems he said would determine the development of mathematics for the whole coming century. The 10th problem was:

Question
Give an algorithm to determine whether a multi-variable polynomial with integer coefficients has a root (i.e., a solution) in the integers Z.

A term is a product of variables and constants from Z. A polynomial (with integer coefficients) is a sum of terms. To find a root means to assign integers to the variables so that the result is 0.

For instance, a possible input to the algorithm could be the polynomial

6x³yz² + 3xy² + x³ − 10.

Does it have a root? How about 6x³yz² + 3xy² − x³ − 10?
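Checking a candidate root is the easy direction; as a sketch, the two example polynomials in code, with (x, y, z) = (5, 3, 0) verified to be a root of the second one:

```python
def p1(x, y, z):
    # 6x³yz² + 3xy² + x³ − 10
    return 6*x**3*y*z**2 + 3*x*y**2 + x**3 - 10

def p2(x, y, z):
    # 6x³yz² + 3xy² − x³ − 10
    return 6*x**3*y*z**2 + 3*x*y**2 - x**3 - 10

# (5, 3, 0) is a root of p2: 0 + 3*5*9 - 125 - 10 = 135 - 135 = 0
```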


Solution to Hilbert’s 10th problem by Matiasevic, 1970

• In 1970, the 23 year-old Russian mathematician Yuri Matiasevic showed that no algorithm exists for the task given by Hilbert.

• For such a “negative” result, first a precise formal definition of “algorithm” was needed. Hilbert didn’t have it yet, and probably didn’t even imagine that this could be the solution to his problem. The definition was obtained in the 1936 papers of Turing and Church.

• Davis, Putnam and Robinson had shown a bit earlier that no algorithm exists to see whether an exponential polynomial has a root. An example of such a polynomial is 2xy² − 3xz.

• Matiasevic was a number theorist. He used his tricks to get rid of the exponentials.


The 1936 papers of Turing and Church

• Turing introduced his machines. Church invented the so-called λ-calculus.

• They both used their system to propose a formal definition of algorithm.

• The two notions were later shown to be equivalent!

This led to the ...


Church-Turing thesis

Intuitive notion of algorithm

equals

Turing machine algorithm.

This is not a mathematical statement, so it cannot be proved.

Yet, evidence for this thesis comes from the following:

• all the formal definitions of algorithms people have proposed yield the same class of computable functions

• all existing algorithms can indeed be carried out by Turing machines.


Rephrasing Matiasevic’s result in our terminology

Let D = {p | p is a polynomial with integer coefficients that has a root in the integers}.

Matiasevic showed that D is not Turing decidable. Hence, by the Church-Turing thesis, there is no algorithm to test membership in D.

However, D is Turing recognizable by the following machine.

M = “on input a polynomial p with n variables:
Search through all n-tuples x1, . . . , xn of integers (in some systematic way). If you see p(x1, . . . , xn) = 0, ACCEPT.”

Clearly D = L(M). So D is Turing recognizable.
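The machine M is a recognizer, not a decider, and that distinction is visible in code. A sketch, where searching shell by shell over Z^n is one possible "systematic way":

```python
from itertools import count, product

def search_root(p, n_vars):
    """Recognizer for D: search Z^n shell by shell for a root of p.

    Returns a root if one exists; otherwise loops forever, which is
    exactly why this shows D recognizable rather than decidable.
    """
    for bound in count(0):
        for xs in product(range(-bound, bound + 1), repeat=n_vars):
            on_shell = max((abs(x) for x in xs), default=0) == bound
            if on_shell and p(*xs) == 0:   # only test tuples new at this bound
                return xs
```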


Hilbert’s tenth problem for one variable is decidable

Let D1 = {p | p is a polynomial in one variable with integer coefficients that has a root in the integers}.

Then D1 is Turing decidable.

Proof: on input p, let

cmax = the largest absolute value of a coefficient of p
c1 = the coefficient of the highest order term
k = 1 + the highest exponent.

If x is a root of p, then it lies between the values ±k · cmax/c1. So it suffices to search for a root between these values.
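The bounded search can be sketched as an actual decider. The coefficient-list encoding (leading term first) and the +1 rounding margin on the bound are this sketch's choices:

```python
def has_integer_root(coeffs):
    """Decider for D1. coeffs lists p's coefficients from the leading term down."""
    c1 = coeffs[0]
    assert c1 != 0, "leading coefficient must be nonzero"
    cmax = max(abs(c) for c in coeffs)
    k = len(coeffs)                  # 1 + the highest exponent
    bound = k * cmax // abs(c1) + 1  # the slide's bound, rounded up to be safe

    def p(x):
        value = 0
        for c in coeffs:             # Horner evaluation
            value = value * x + c
        return value

    return any(p(x) == 0 for x in range(-bound, bound + 1))
```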


Terminology for Turing machine descriptions

• Now that we believe the Church-Turing thesis, our focus moves away from Turing machines and towards algorithms. We will mostly give high-level descriptions of TMs.

• We need to be confident that these descriptions can, at least in principle, be compiled “downward” to obtain a formal TM description with all the detail.


Three levels of Turing machine descriptions

High level description: use English to describe the algorithm; ignore implementation details.

Implementation level description: use English to describe how the TM moves its heads, stores data, . . .

Formal description (often via a diagram): specify the set of states, the transition function δ, . . .


Conventions for Turing machine descriptions

• The input is always a string.
• If we want to work with some other type of object S (such as a graph, or a matrix), we have to encode it by a string.
• 〈S〉 denotes that string, in some specified encoding of objects S.
• For instance, if the object is an undirected graph G, we could take as 〈G〉 its adjacency matrix, written as a string row-by-row.
• In a different encoding, we could let 〈G〉 be the adjacency list, i.e., the list of vertices and edges: (0,1,2)(0,1)(0,2).


Example of a high-level description of a TM

In the following we take the representation by adjacency lists.
Let A = {〈G〉 | G is a connected undirected graph}.

We describe a Turing machine M for deciding A.

M = “on input 〈G〉 (encoding of a graph G)

1. Select the first node of G and mark it.

2. REPEAT until no further nodes are marked:

3. For each node in G, mark it if it is attached to a node that has already been marked.

4. Scan the nodes of G to see whether they have all been marked. If so, ACCEPT. Otherwise, REJECT.”
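The marking stages correspond to a standard graph search. A sketch with the graph given as an edge list (this encoding, and the use of a queue, are choices made here, not part of the slide):

```python
from collections import deque

def is_connected(n, edges):
    """Marking algorithm from M: mark node 0, then spread marks along edges."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    marked = {0}                    # step 1: mark the first node
    frontier = deque([0])
    while frontier:                 # steps 2-3: repeat until no new node is marked
        u = frontier.popleft()
        for v in adj[u] - marked:
            marked.add(v)
            frontier.append(v)
    return len(marked) == n         # step 4: all nodes marked?
```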


Chapter 4: Decidability


Overview

We investigate the power of algorithms to solve problems.

Some problems can be solved algorithmically, others cannot.

Why is unsolvability of a problem interesting? After all, we want to solve it!

• If we know a problem is unsolvable, we have to simplify the problem so that it can be solved algorithmically;
• Unsolvability makes us appreciate the inherent limits of computers, and gain an extra perspective on the concept of computation.


Section 4.1: Decidable languages

L1 = {〈G〉 | G is a connected undirected graph}

is decidable by Example 3.23.

L2 = {x | x is the decimal representation of a prime number}

is decided by the following Turing machine.

M = “on input x (the decimal representation of a number)
Let y = 2.
WHILE y · y ≤ x :

see whether y divides x . If so, REJECT.

NEXT y .
ACCEPT”
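The same trial division, as a sketch in code; note the guard for x < 2, which is added here because the pseudocode as written would accept 0 and 1:

```python
def is_prime(x):
    """Trial division up to sqrt(x), mirroring the machine M."""
    if x < 2:
        return False          # guard added here: 0 and 1 are not prime
    y = 2
    while y * y <= x:
        if x % y == 0:
            return False      # y divides x: composite, REJECT
        y += 1
    return True               # no divisor found: ACCEPT
```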


Decidability problems about automata

We give some examples of decidable languages where the input contains (the encoding of a) finite automaton.

The acceptance problem for DFA:

ADFA = {〈B,w〉 | B is a DFA that accepts the input string w}.

Theorem (4.1)
ADFA is a decidable language.

Note that B can be an arbitrary DFA. So this says more than the mere fact that L(B) is decidable for a particular fixed DFA B.


High level description of a TM deciding the acceptance problem for DFA

M = “on input 〈B,w〉, where B is a DFA and w a string over the alphabet of B

1. Simulate B on w .
2. If the simulation ends in an accepting state of B, ACCEPT. Otherwise, REJECT.”

To give more detail, we first need to specify how we represent (encode) B by a string. We make this string a list of all the components of B = (Q,Σ, δ,q0,F ).

Example: an automaton over Σ = {0} that detects even input length.

〈B〉 = q0q1#0#q00q1q1010#q0#q0
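The simulation itself is easy to state in code. A sketch, with the DFA given as a Python dict rather than the string encoding 〈B〉 (the dict layout is this sketch's choice):

```python
def dfa_accepts(dfa, w):
    """Decider for A_DFA: simulate the DFA B on input w."""
    q = dfa['q0']
    for c in w:
        q = dfa['delta'][(q, c)]   # follow one transition per input symbol
    return q in dfa['F']

# The even-length automaton over {0} from the slide:
B = {'delta': {('q0', '0'): 'q1', ('q1', '0'): 'q0'},
     'q0': 'q0',
     'F': {'q0'}}
```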

The simulation of B by M

To simulate B, the machine M puts a dot under the entry in 〈B〉 representing the current state, and a dot under the entry representing the symbol currently scanned:

It updates the tape according to B’s transition function δ.


Example of simulating a run of a DFA B

Let w = 000. The tape contents at various stages of the run of M on 〈B,w〉 look like this (each line in the diagram corresponds to the situation after a left-right sweep of M, followed by a right-left sweep).


Alphabet of M must be fixed

• A little problem with this simulation is that the number of simulated states / alphabet symbols can be arbitrarily large, depending on B.

• In contrast, M has to make do with a fixed work alphabet.

• Say the state set of some B is Q = {q1, . . . ,qn}.

• To get around this little problem, M represents qi by the symbol q followed by the decimal representation of i.

• So it only needs the symbols q, 0, . . . , 9 to represent all the states.


Emptiness problem is also decidable

Given finite automaton A, we ask whether A accepts no string at all.

Theorem (4.4)
EDFA = {〈A〉 | A is a DFA and L(A) = ∅} is decidable.

Proof. This is a path problem in the directed graph which is the diagram of A without the edge labels: can we never get from q0 to an accepting state? So we use a marking algorithm similar to Example 3.23 (test for connectedness).

T = “on input 〈A〉 where A is a DFA:

1. Mark the start state of A.
2. Repeat until no new states get marked:
3. Mark any state with an ingoing transition from a state that is already marked.
4. If no accepting state is marked, ACCEPT. Otherwise, REJECT.”
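The marking algorithm T, sketched with the transition function δ as a dict (the encoding is this sketch's choice):

```python
def dfa_language_empty(delta, q0, accepting):
    """Decider T for E_DFA: mark the states reachable from the start state."""
    marked = {q0}
    changed = True
    while changed:                       # repeat until no new state gets marked
        changed = False
        for (q, _sym), r in delta.items():
            if q in marked and r not in marked:
                marked.add(r)            # r has an ingoing transition from a marked state
                changed = True
    return marked.isdisjoint(accepting)  # ACCEPT iff no accepting state is marked
```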


Equivalence problem for DFA

Theorem (4.5)
EqDFA = {〈A,B〉 | A,B are DFAs and L(A) = L(B)} is decidable.

Proof. By Theorem 1.25 etc. a Turing machine can construct a DFA C such that

L(C) = (L(A) ∩ L(B)ᶜ) ∪ (L(A)ᶜ ∩ L(B)).

Thus L(C) is the set of strings that are in exactly one of L(A) and L(B) (the symmetric difference). Then

L(A) = L(B) ⇐⇒ L(C) = ∅.

The following machine decides EqDFA :

“on input 〈A,B〉 where A,B are DFAs:
1. Construct C as above.
2. Run T from the proof of Theorem 4.4 (emptiness problem) on 〈C〉.
3. If T accepts, ACCEPT. Otherwise, REJECT.”


Infinity problem for DFA

Theorem
InfDFA = {〈A〉 | A is a DFA and L(A) is infinite} is decidable.

Proof.
• Suppose the DFA A has k states. Then by the pumping lemma, L(A) is infinite if and only if A accepts some string of length at least k.

• Let Sk be a DFA with the same alphabet as A that accepts a string if and only if its length is at least k. Then

L(A) is infinite ⇐⇒ L(A) ∩ L(Sk ) ≠ ∅.

• The right hand side is decidable: firstly, a TM can compute a DFA C such that L(C) = L(A) ∩ L(Sk ). Secondly, we run the algorithm for the emptiness problem on 〈C〉 and give the opposite answer.


Section 4.2: The halting problem

We give a specific problem that is algorithmically unsolvable.

Background:

• We know already that it is undecidable whether a polynomial over Z has an integral root. (We didn’t give a proof; too hard.)

• Clearly some (bizarre) problem is unsolvable, simply because there are uncountably many problems (say, uncountably many subsets of {0,1}∗).

• The point is that we can describe this unsolvable problem; in fact it is Turing recognizable.

• But we also show that it is undecidable.


Description of the halting problem

The problem we mean is the acceptance problem for TMs, also called the halting problem:

ATM = {〈M,w〉 | M is a Turing machine that accepts w}.

ATM can be seen as an abstract form of the software verification problem.


The halting problem is Turing recognizable

Fact
ATM is Turing recognizable.

Proof. The following Turing machine U recognizes ATM .

U = “on input 〈M,w〉, where M is a TM and w a string:
1. Simulate M on input w .
2. If M enters its accepting state, ACCEPT; if M enters its rejecting state, REJECT.”

(If M loops on input w , then U loops on input 〈M,w〉.)

• U is a “universal Turing machine”.
• To implement U, we have to be careful, because U’s work alphabet must be of fixed size.
• The same trick works that we already applied for the machine deciding ADFA: we code M’s states and alphabet symbols by strings.


The halting problem is undecidable

Theorem (4.11)
The following is undecidable: ATM = {〈M,w〉 | M is a TM that accepts w}.

Proof. By contradiction. Assume that the Turing machine H is a decider for ATM . Consider the following Turing machine D:

D = “on input 〈M〉 where M is a Turing machine:
1. Run H on the input 〈M, 〈M〉〉.
2. If H accepts, REJECT. If H rejects, ACCEPT.”

Since H is a decider, D is also a decider. We get a contradiction when we run D on “itself”, or rather, on the input 〈D〉 that encodes D.

D accepts 〈D〉 ⇐⇒ H rejects 〈D, 〈D〉〉 ⇐⇒ D does not accept 〈D〉.

Contradiction!

Explanation via the Barber Paradox

Think of D as a barber. The rule is that

D shaves M if and only if M doesn’t shave himself.

Should D shave himself??? Contradiction!

This is in exact analogy to the previous proof.

• “D shaves M” becomes: D accepts 〈M〉.
• Then we have, by the rule above,
D accepts 〈M〉 ⇐⇒ M does not accept 〈M〉.
• In particular,
D accepts 〈D〉 ⇐⇒ D does not accept 〈D〉, contradiction.


The diagonalization method (p. 174)

• We will review Cantor’s diagonalization method.

• Then we apply it to give a further view of the proof that the halting problem is undecidable.


When do infinite sets have the same size?

Definition
Let A,B be sets. Let f : A → B.

• f is a correspondence (or bijection) if f is one-to-one and onto.
• A and B have the same size if there is a correspondence f : A → B.

Example: Let N = {1,2,3,4, . . .}. Let E = {2,4,6,8, . . .}.
Then N has the same size as E , via the correspondence

1 → 2, 2 → 4, 3 → 6, 4 → 8, . . .

It doesn’t matter that E is also a proper subset of N !

Countable sets

Example (4.15)
The set Q of rationals has the same size as N .

Definition
A set S is countable if it is finite, or has the same size as N .

Let R be the set of real numbers. The following is due to Georg Cantor, 1873.

Theorem (4.17)
R is not countable.


Prove that R is not countable by diagonalization

Suppose for a contradiction that f : N → R is a correspondence. Define a real number r < 1 such that for each n,

the n-th digit of r does not equal the n-th digit of f (n).

Then r ≠ f (n) for each n, a contradiction. For instance, in the example of the figure, choose r = 0.211 . . .

(Minor detail: we choose all the digits of r to be unequal to 0 and 9. Then we have no problem with ambiguous representations of the same real, such as 0.19999 . . . = 0.2.)


Viewing the proof that ATM is undecidable as diagonalization

We assumed there is a decider H such that

H accepts 〈M,w〉 ⇐⇒ M accepts w .

Using H, we defined a decider D such that

D rejects 〈M〉 ⇐⇒ M accepts 〈M〉.

We got a contradiction because

D rejects 〈D〉 ⇐⇒ D accepts 〈D〉.


Table of results of the computations Mi on input 〈Mk 〉:

The decider H accepts 〈Mi , 〈Mk 〉〉 if and only if Mi accepts 〈Mk 〉; else it rejects.


The contradiction

We can choose i such that D is Mi . It does the opposite of the entry on the diagonal in the last table. At the entry (i, i) we get a contradiction.


Post’s complementation theorem

Theorem (4.22)
Let A ⊆ Σ∗. Then A is decidable ⇐⇒ both A and Σ∗ − A are Turing recognizable.

Proof: ⇒: Both A and Σ∗ − A are decidable, and hence Turing recognizable.

⇐: Suppose that machine M1 recognizes A, and M2 recognizes Σ∗ − A.

Define a machine M deciding A:

M = “on input w
1. Run both M1, M2 on input w in parallel.
2. If M1 accepts, ACCEPT. If M2 accepts, REJECT.”
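The parallel run can be modeled by giving each recognizer a step budget and increasing it round by round. This is a sketch: the two accepts-within-k predicates stand in for M1 and M2.

```python
from itertools import count

def decide(in_A_within, in_coA_within, w):
    """Post's theorem as code: run recognizers for A and its complement in parallel.

    in_A_within(w, k) stands for 'M1 accepts w within k steps';
    in_coA_within(w, k) plays the same role for M2. Since w lies in A
    or in its complement, one of the two eventually fires, so the loop
    terminates on every input.
    """
    for k in count(1):
        if in_A_within(w, k):
            return True       # w is in A
        if in_coA_within(w, k):
            return False      # w is in the complement of A
```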


A language that is not Turing recognizable

Recall that ATM is the acceptance problem for Turing machines.

Corollary (4.23)
Σ∗ − ATM is not Turing recognizable.

Proof.
• We have seen that ATM itself is Turing recognizable.
• If its complement Σ∗ − ATM is also Turing recognizable, then ATM is decidable by the previous Theorem 4.22, contradiction.


Decidability problems for context free grammars


A primer in context-free grammars (from Ch. 2)

Definition (2.2)
A context-free grammar (CFG) is a 4-tuple (V,Σ,R,S).

• V is the set of variables. We use capital letters A,B,C,X ,Y for variables.
• Σ is the set of terminals.
• R is a set of rules. They are of the form X → α, where X ∈ V and α ∈ (V ∪ Σ)∗.
• S ∈ V is the start variable.

Example
A → 0A1
A → B
B → #
Here V = {A,B}, Σ = {0,1,#}, and the start variable is A.


Describing a fragment of the English language

There are 10 variables, namely the capitalized terms (this stretches the formal definition a bit); 27 terminals (the usual English alphabet and the space character); 18 rules.
One can derive the string: a girl with a flower likes the boy


Derivations

Given a CFG G, to derive a string w ∈ Σ∗:
• we begin with the start variable;
• if a variable in the current string is the left side of a rule, we can replace it by the right side of that rule;
• we do this till we obtain w (thus, we got rid of all the variables).

Definition
The language generated by a context-free grammar G is

{w ∈ Σ∗ : G derives w}.

Such a language is called context-free.


Example of a derivation, and its parse tree

ExampleWe derive w = 000#111 in the grammar above.

A⇒ 0A1⇒ 00A11⇒ 000A111⇒ 000B111⇒ 000#111.

This can be shown as a parse tree:


Example of a derivation

Exercise: Derive the following string by writing a parse tree.the boy likes a boy


Matching parentheses

We describe the language L of strings of matching parentheses. This is needed for parsing.
For instance, (()()) ∈ L. But ((() ∉ L, and also (()))( ∉ L.

Example (2.2)
Σ = {(, )}, V = {S}, rules:
S → (S)
S → SS
S → ε
A useful shorthand for such a set of rules is the following:
S → (S) | SS | ε

Derivation of (()()):

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()()).


Chomsky normal form

Definition
A grammar with start variable S is in Chomsky normal form if every one of its rules is in one of the following forms:
A → BC where B,C ≠ S;
A → c where c ∈ Σ is a terminal;
S → ε.

Theorem (2.9)
Every context-free language is generated by a CFG in Chomsky normal form.
In fact, there is a Turing machine converting a CFG into an equivalent one in Chomsky normal form.

For the proof see Sipser (not required for exam :).


Example of a grammar in Chomsky normal form

Σ = {a,b}. The following grammar generates a∗:

S → aS | ε

An equivalent grammar in Chomsky normal form is:

S → ε | a
S → TT
T → TT
T → a

We derive the string aaa in this grammar:

S ⇒ TT ⇒ TTT ⇒ aTT ⇒ aaT ⇒ aaa.

Note that this derivation has 5 = 2 · 3 − 1 arrows (steps). This is necessarily so, see below.

Exercise: do the analogous thing for a∗b∗: first write a naive grammar that generates it, and then a grammar in Chomsky NF.


Solution to the exercise:

The grammar is S → aS | Sb | ε.

An equivalent grammar in Chomsky normal form is:

S → ε | a | b
S → UV | UU | VV
U → UU | a
V → VV | b

(Here U derives a+ and V derives b+; the rules S → UU and S → VV cover the strings consisting only of a's, respectively only of b's, of length at least 2.)


Counting the steps in derivations for a grammar in Chomsky NF

Fact
Given a grammar in Chomsky NF, and a string w of length n ≥ 1, any derivation of w has exactly 2n − 1 steps.

Proof. To derive a string w of length n ≥ 1, we must apply, in some order,

• n − 1 rules of the form A → BC, and
• n rules of the form A → c.


Acceptance problem for CFG

ACFG = {〈G,w〉 : G is a CFG that generates the string w}.

Theorem (4.7)
ACFG is decidable.

Proof. The following TM decides ACFG.

M = “on input 〈G,w〉 where G is a CFG and w ∈ Σ∗ is a string of length n:
1. Convert G into an equivalent grammar in Chomsky Normal Form, using the machine from Theorem 2.9.
2. List all the derivations with 2n − 1 steps in this new grammar. (If n = 0, instead see whether S → ε is a rule.)
3. If any of these derivations yields w , ACCEPT. Otherwise, REJECT.”

This works because for a grammar in Chomsky NF, any derivation of a string w of length n ≥ 1 has 2n − 1 steps. There are only finitely many such derivations, so we can try them all.
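The brute-force decider can be sketched directly: explore all leftmost derivations of at most 2n − 1 steps, pruning sentential forms longer than n (in Chomsky NF, forms never shrink except via S → ε). The dict-of-lists rule encoding is this sketch's own.

```python
def cnf_generates(rules, start, w):
    """Brute-force A_CFG decider for a grammar in Chomsky normal form.

    rules maps a variable to a list of right-hand sides, each a list of
    symbols; [] encodes the rule S -> epsilon. A symbol is a variable
    exactly when it is a key of rules.
    """
    n = len(w)
    if n == 0:
        return [] in rules.get(start, [])
    target = tuple(w)
    frontier = {(start,)}
    for _ in range(2 * n - 1):            # exactly 2n - 1 steps suffice in CNF
        nxt = set()
        for form in frontier:
            for i, sym in enumerate(form):
                if sym in rules:          # expand the leftmost variable
                    for rhs in rules[sym]:
                        new = form[:i] + tuple(rhs) + form[i + 1:]
                        if len(new) <= n: # prune: forms never shrink in CNF
                            nxt.add(new)
                    break
        frontier = nxt
        if target in frontier:
            return True
    return False
```

With the Chomsky NF grammar for a∗ from the earlier slide, `cnf_generates({'S': [[], ['a'], ['T', 'T']], 'T': [['T', 'T'], ['a']]}, 'S', 'aaa')` holds.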


Emptiness problem for CFGs is decidable

Given a context free grammar G, we ask whether it derives no string.

Theorem (4.8)
ECFG = {〈G〉 : G is a CFG that derives the empty language} is decidable.

Proof. The following Turing machine R goes “backwards”, starting from the terminals and trying to get to S.

R = “on input 〈G〉 where G is a CFG:
1. Mark all the terminal symbols of G.
2. Repeat until no new variable gets marked:
3. Mark any variable A such that G contains a rule A → s1 . . . sk , and each symbol s1, . . . , sk is already marked.
4. If the start variable is not marked, ACCEPT. Otherwise, REJECT.”


Examples

Example
Let G1 have the rules S → A, A → aB, B → b.
First a and b are marked; then B (using B → b); then A (using A → aB); then S (using S → A).
S is marked, so REJECT.

Example
Let G2 have the rules S → 0A, A → 0B, B → S, C → 1.
First 0 and 1 are marked; then C (using C → 1); after that, nothing new gets marked, since S, A and B only point to each other in a cycle.
S is not marked, so ACCEPT.
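Both examples are instances of the marking loop in R, which can be sketched as (the dict-of-lists rule encoding is this sketch's choice):

```python
def cfg_empty(rules, start, terminals):
    """Decider R for E_CFG: mark every symbol known to derive a terminal string."""
    marked = set(terminals)          # step 1: mark all terminals
    changed = True
    while changed:                   # steps 2-3: repeat until nothing new is marked
        changed = False
        for var, rhss in rules.items():
            if var not in marked and any(all(s in marked for s in rhs) for rhs in rhss):
                marked.add(var)
                changed = True
    return start not in marked       # step 4: ACCEPT iff the start variable is unmarked
```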


An undecidable problem about CFGs

Theorem (5.13)
AllCFG = {〈G〉 : G is a CFG that derives every string} is undecidable.


Pushdown automata, PDA (see page 110)

Example
There is a pushdown automaton recognizing the language L = {0^i 1^i : i ≥ 1}.

Note that no DFA can recognize this language.


Pushdown automaton recognizing {0^i 1^i : i ≥ 1}

Go through the input:
• As long as we read 0s on the input tape, push them on the stack.
• For each 1 we read, remove the top element from the stack (“pop” the stack).

ACCEPT if the stack is empty when the whole input has been read.
REJECT if the stack is empty before the whole input has been read, or not empty after.
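The stack procedure above, written as straight-line code. The explicit check that no 0 follows a 1 is added here; in the PDA it is implicit in the control states.

```python
def pda_accepts(w):
    """Deterministic stack procedure for {0^i 1^i : i >= 1}."""
    stack = []
    seen_one = False
    for c in w:
        if c == '0':
            if seen_one:
                return False          # a 0 after a 1: not of the form 0^i 1^i
            stack.append('0')         # push each 0
        else:  # c == '1'
            seen_one = True
            if not stack:
                return False          # stack empties before the input ends
            stack.pop()               # pop one 0 per 1
    return seen_one and not stack     # i >= 1, and stack empty at the end
```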

Exercise: palindromes over Σ = {0,1}, such as 01000010 or 010, can be recognized by a PDA. This needs non-determinism. See Sipser Example 2.18.


Pushdown automata are equivalent to CFGs

The formal definition of pushdown automata (2.13) allows for nondeterminism. Deterministic PDAs are less powerful.

Theorem (2.20)
A language is context-free ⇐⇒ some PDA recognizes it.

We omit the proof (see Sipser; not required).

It follows that every regular language is context-free, because each DFA can be seen as a PDA that ignores its stack. You can also prove this fact directly, without using Theorem 2.20 (exercise?).


Chapter 7: Time complexity


Resources matter

• So far, decidability of a problem was taken as decidability in principle.

• We only cared that a decider halts on each input, not how long this takes.

• We now look at decidability in a more practical sense.

• We want to know whether a problem is efficiently decidable.


Big-O notation

We want to get rid of multiplicative constants in our notation for estimates.
For instance we have n^2 + n + 7 ≤ 2n^2 for almost all n. We will write this as n^2 + n + 7 = O(n^2).

Definition (7.2)
We write f(n) = O(g(n)) if there are n0 and a constant c such that

f(n) ≤ c · g(n) for all n ≥ n0.

Here f, g are functions N → R+.
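The definition can be checked numerically for the slide's example. The particular witnesses c = 2 and n0 = 4 are my choice for this sketch (n0 = 4 is the smallest that works for c = 2).

```python
# Checking n^2 + n + 7 = O(n^2) with the witnesses c = 2, n0 = 4
# from Definition 7.2.

f = lambda n: n**2 + n + 7
g = lambda n: n**2

c, n0 = 2, 4
assert all(f(n) <= c * g(n) for n in range(n0, 10_000))
assert f(3) > c * g(3)  # n0 = 3 would not do: f(3) = 19 > 18 = 2*g(3)
print("f(n) <= 2*g(n) holds for all tested n >= 4")
```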


Sample algorithm

The following TM decides the language {0^k 1^k : k ≥ 0}.


Analyzing the running time of this algorithm

Let n be the length of the input.
Phase 1: n steps to scan the tape, n steps to return the head to the leftmost position.
Phases 2, 3: Repeatedly scan the tape, crossing off a 0 and a 1 each time. Each scan uses O(n) steps, and there are at most n/2 scans, so together n/2 · O(n) = O(n^2).
Phase 4: O(n) steps.
Altogether, O(n) + O(n^2) + O(n) = O(n^2) steps.
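The phases just analyzed can be sketched on a mutable list standing in for the tape. Only the logic is shown; the step counting is omitted, and the function name is an assumption of this sketch.

```python
# List-based sketch of the O(n^2) crossing-off machine for {0^k 1^k : k >= 0}.
# Each pass of the while loop crosses off ("x") one 0 and one 1,
# mirroring phases 2-3 of the slide.

def decide_quadratic(w: str) -> bool:
    tape = list(w)
    # phase 1: scan once; reject if a 0 occurs after a 1
    seen_one = False
    for sym in tape:
        if sym == "1":
            seen_one = True
        elif seen_one:            # a 0 to the right of a 1
            return False
    # phases 2-3: repeatedly cross off one 0 and one 1
    while True:
        try:
            i = tape.index("0")
        except ValueError:
            break                 # no 0 left
        try:
            j = tape.index("1")
        except ValueError:
            return False          # a 0 remains but no 1
        tape[i] = tape[j] = "x"
    # phase 4: accept iff no 1 remains either
    return "1" not in tape

print([decide_quadratic(w) for w in ["0011", "", "001", "011", "0101"]])
# -> [True, True, False, False, False]
```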


Running time and time complexity classes

Definition (7.1)
• Let M be a deterministic TM that is a decider (i.e. halts on all inputs).

• The running time of M is the function f : N → N such that f(n) is the maximum number of steps M uses on any input of length n.

• We say that M is an f(n) time Turing machine.

Usually we denote the input length by n. For instance, the machine above is an f(n) time TM for some f such that f(n) = O(n^2).

Definition (7.7)
Let t : N → R+. We define the time complexity class

TIME(t(n))

to be the collection of all languages that are decidable by an O(t(n)) time Turing machine.

The goal for the next few slides

We want to figure out to what extent the complexity of a language depends on the particular machine model.

This will eventually lead us to the definition of the class P (polynomial time).

As an example we take the language A = {0^k 1^k : k ≥ 0}.

We also learn a bit about logarithms, and the small o-notation.


A faster machine deciding A = {0^k 1^k : k ≥ 0}

We know that A is in TIME(n^2). The following TM also decides the language {0^k 1^k : k ≥ 0}.

This machine shows that A is in TIME(n log n), because each pass in Phase 4 cuts the number of remaining 0s and 1s in half.
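The halving idea can be sketched as follows, along the lines of Sipser's machine M2: each pass checks that the number of remaining symbols is even, then crosses off every other 0 and every other 1. The exact phase numbering and the function name are assumptions of this sketch.

```python
# Sketch of the O(n log n) machine: instead of matching 0s and 1s one by
# one, each pass halves the remaining counts, so only O(log n) passes
# of O(n) work each are needed.

def decide_nlogn(w: str) -> bool:
    tape = list(w)
    seen_one = False
    for sym in tape:                       # reject if not of the form 0*1*
        if sym == "1":
            seen_one = True
        elif seen_one:
            return False

    def cross_every_other(symbol):
        hit = 0
        for i, s in enumerate(tape):
            if s == symbol:
                if hit % 2 == 0:           # starting with the first one
                    tape[i] = "x"
                hit += 1

    while "0" in tape and "1" in tape:
        if (tape.count("0") + tape.count("1")) % 2 == 1:
            return False                   # counts have different parity
        cross_every_other("0")
        cross_every_other("1")
    return "0" not in tape and "1" not in tape

print([decide_nlogn(w) for w in ["000111", "0011", "", "00011", "0111"]])
# -> [True, True, True, False, False]
```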


A note on logarithm

• Logarithm is usually taken with base 2 in complexity theory.

• Note that if b is some other base, then we have

logb n = logb 2 · log2 n.

(This is so because n = b^(logb n) = (b^(logb 2))^(log2 n).)

• So, since TIME(t(n)) is the collection of all languages that are decidable by an O(t(n)) time Turing machine, it doesn’t really matter which base we take for the logarithm in time bounds.


Small o-notation

Now we want to express that f grows slower than g.

Definition (7.5)
Let f, g be functions N → R+. We write f(n) = o(g(n)) if

lim_{n→∞} f(n)/g(n) = 0.

This means that for all c > 0 there is a number n0 such that f(n) ≤ c · g(n) for all n ≥ n0.

(Compare: f = O(g) means that for some c there is such an n0.)

Example
log2 n = o(√n).


M2 is the fastest possible one-tape machine deciding A = {0^k 1^k : k ≥ 0}

Theorem (Problem 7.47 in Sipser)
Suppose the 1-tape DTM M decides a language L in time o(n log n). Then L is regular.

This is an example of a “gap theorem”: there is a complexity gap between the time bounds O(n) and O(n log n) for 1-tape machines. No language has a time complexity properly in between the two bounds.


A linear time 2-tape TM deciding A

The following 2-tape TM decides the language {0^k 1^k : k ≥ 0} in time O(n) (also called linear time).

Running time analysis: Each of the four stages uses O(n) steps. So the running time is O(n).
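The standard two-tape algorithm copies the 0s to the second tape and then matches the 1s against that copy, one sweep each. A sketch, with a Python list playing the role of tape 2 (the function name is an assumption):

```python
# Linear-time two-tape idea for {0^k 1^k : k >= 0}: no zig-zagging,
# just one forward pass with an auxiliary tape.

def decide_two_tape(w: str) -> bool:
    tape2 = []
    i = 0
    while i < len(w) and w[i] == "0":   # stage 1: copy the 0s to tape 2
        tape2.append("0")
        i += 1
    while i < len(w):                   # stage 2: match each 1 with a 0
        if w[i] != "1" or not tape2:
            return False                # a stray 0, or more 1s than 0s
        tape2.pop()
        i += 1
    return not tape2                    # stage 3: equal counts iff tape 2 empty

print([decide_two_tape(w) for w in ["000111", "", "0010", "0011"]])
# -> [True, True, False, True]
```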


Tape reduction and running time

Theorem (7.8)
Let t(n) be a function, where t(n) ≥ n. Then every t(n) time multitape TM M has an equivalent O(t^2(n)) time single tape TM S.

Proof: recall how tape reduction works (Theorem 3.3):


Now one does a running time analysis. Suppose M has k tapes. Note that in M’s computation, no tape can have more than t(n) symbols.

• Initially S puts its tape into the proper format. Steps: O(n).
• For each step of M, the machine S does the following:
   • sweep right to the last #, recording in the constant size memory (CPU) the symbols the heads of M are currently reading. Steps: O(t(n)).
   • sweep left to the beginning of the tape, carrying out the actions M would do. Steps: O(t(n)).
   • if M needs a new cell on one of its tapes, then S moves the rest one position to the right. It may have to do this for all the k tapes of M it simulates. Steps: k · O(t(n)) = O(t(n)).

In total we have

O(n) + t(n) · O(t(n))

steps of S. Because t(n) ≥ n, this is O(t^2(n)), as required.
(Homework: read more details in Sipser p. 254.)


The class P

The following important class captures the intuition of a problem beingefficiently solvable.

Definition (7.12)
P is the class of languages that are decidable in polynomial time on a deterministic single-tape Turing machine. That is,

P = ⋃_k TIME(n^k).


The class P is machine-independent

• The class P doesn’t depend on the particular deterministic TM model.

• For instance, we could also define P in terms of multitape machines and still get the same class:

   • if a language L can be decided in time t(n) = O(n^k) on a multitape TM, then by Theorem 7.8, it can be decided in time

   O(t^2(n)) = O(n^(2k))

   on a single tape machine.

• For instance, if k = 3 and the implicit constant in t(n) = O(n^3) is 5, we have (5n^3)^2 = 25n^6. Hence we have O(25n^6) as a time bound for the single tape machine, which is still O(n^6).


Examples of problems in P

We will give two interesting examples of problems in P.

• PATH: in a directed graph G, given two nodes s, t, is there a path from s to t?

• RELPRIME: Given natural numbers x, y, are they relatively prime? (This means that the greatest common divisor is 1.)


How to describe polynomial time algorithms

Given that the class P is machine independent, it is fine to give high-level descriptions of the algorithms.

However, we need to put these descriptions into a specific format so that we can later analyze the algorithm for being polynomial.

• As before, we encode objects G (such as a graph) by strings 〈G〉 in some way that has been previously agreed upon. The encoding must be reasonable. This means there is no other encoding leading to much shorter strings.

• We describe algorithms in numbered stages. In the end we must be able to check that
   • we carry out polynomially many stages (in the input length)
   • each single stage can be implemented by polynomially many steps on a TM.


More details on reasonable encodings of objects by strings

• It is not reasonable to encode numbers in unary, for instance to write 17 as 11111111111111111.

• For, the binary encoding 10001 of 17 is much shorter.

• Both the adjacency list and the adjacency matrix encoding of directed graphs are reasonable:

   • There is no significantly shorter encoding of graphs.

   • Note that we can pass from one encoding of graphs to the other in polynomial time.
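The last remark can be made concrete: both conversion directions below run in time polynomial (in fact O(m^2)) in the number m of nodes. Numbering the nodes 0..m−1 and the function names are assumptions of this sketch.

```python
# Converting between the two reasonable graph encodings in polynomial time.

def list_to_matrix(adj, m):
    """Adjacency lists (dict node -> list of successors) to an m x m 0/1 matrix."""
    matrix = [[0] * m for _ in range(m)]
    for u, neighbours in adj.items():
        for v in neighbours:
            matrix[u][v] = 1
    return matrix

def matrix_to_list(matrix):
    """0/1 adjacency matrix back to adjacency lists."""
    return {u: [v for v, bit in enumerate(row) if bit]
            for u, row in enumerate(matrix)}

adj = {0: [1], 1: [2], 2: []}
print(matrix_to_list(list_to_matrix(adj, 3)) == adj)  # True: round trip
```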


The path problem

PATH = {〈G, s, t〉 : G is a directed graph that has a path from s to t}.

Can we get from s to t by going along directed edges?


PATH is in P

Theorem (7.14)
The problem PATH is in P.

• “Brute force” doesn’t help here. If G has m nodes, then the number of paths without repetition is about m!. We cannot try them all in polynomial time.

• But, as you already know (from papers like CS 220, SE 211), there are efficient algorithms for solving PATH, such as breadth-first search.

• We will give a high-level description of a version of breadth-first search, and use it to practice complexity analysis.


High-level description of a breadth-first algorithm for PATH

Time analysis. Let m be the number of nodes (which is clearly less than the input length in either representation of graphs).
Stages 1 and 4 are executed only once. Stage 3 runs at most m times, because each time another node is marked. So the total number of stages is at most 2 + m.
Each single stage takes a polynomial number of TM steps. This is clear for stages 1 and 4. For stage 3, we need to scan the input to test which nodes are marked. This can be implemented in polynomial time.
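The marking algorithm just analyzed can be sketched as follows. Mark s, then keep scanning the edge list, marking b whenever some edge (a, b) leaves a marked node; accept iff t ends up marked. The function name and edge-list representation are assumptions of this sketch.

```python
# Breadth-first marking algorithm for PATH: at most m passes over the
# edge list (each pass marks at least one new node), each pass polynomial
# in the input size.

def path(edges, s, t):
    marked = {s}                        # stage 1: mark s
    changed = True
    while changed:                      # stage 3: runs at most m times
        changed = False
        for a, b in edges:
            if a in marked and b not in marked:
                marked.add(b)
                changed = True
    return t in marked                  # stage 4: accept iff t is marked

edges = [(0, 1), (1, 2), (3, 0)]
print(path(edges, 0, 2), path(edges, 0, 3))  # True False
```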


RELPRIME is in P

Numbers x, y are called relatively prime if the largest number that divides them both is 1.

RELPRIME = {〈x , y〉 : x , y are relatively prime}.

Theorem (7.15)
RELPRIME is in P.

The encoding of numbers is in binary. We code pairs via a separator symbol #. Thus, a possible input for the TM deciding RELPRIME would be 10101#1101.


Proof that RELPRIME is in P

We do a time analysis of the Euclidean algorithm E .

The actual algorithm deciding RELPRIME uses E as a subroutine:

It is sufficient to check that E describes a polynomial time TM.
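The Euclidean algorithm E, and RELPRIME as a wrapper around it, can be sketched as follows; the comments tie the loop to the stage numbers used in the analysis on the next slide.

```python
# The Euclidean algorithm E and the RELPRIME decider that calls it.

def gcd(x, y):
    while y != 0:          # stage 1: test whether y is 0
        x, y = y, x % y    # stage 2: replace x by x mod y, then exchange
    return x

def relprime(x, y):
    return gcd(x, y) == 1

# the slide's sample input 10101#1101, i.e. x = 21, y = 13
print(relprime(21, 13))  # True: gcd(21, 13) = 1
```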


Euclid is polynomial time

Sequence of values when we are at (1.), on input 10101#1101:

x    y
21   13
13    8
 8    5
 5    3
 3    2
 2    1
 1    0

With a little algebra, one shows that every execution of stage 2 (except maybe the first) cuts the value of x by at least half.


Euclid is polynomial time

• Each time we also exchange x, y.

• This means that each of the original values x, y is at least halved every two rounds.

• So the number of rounds (i.e. times we pass 1.) is at most the minimum of 1 + 2 log2 x and 1 + 2 log2 y.

• log2 x is at most the length of the binary representation of x, so the number of rounds is bounded by twice the length n of 〈x, y〉. This means we get by with at most O(n) rounds.

• Each stage uses polynomially many steps on a TM.


Further problems in P

Theorem (7.16)
Every context-free language is in P.

This is proved with a technique called dynamic programming, where one stores intermediate results in a table so that they don’t have to be computed again.
In fact each context-free language is in TIME(n^3) on a multitape TM.
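The dynamic programming in question is the CYK algorithm. A sketch, under the assumption (standard for CYK, but an assumption of this sketch) that the grammar is in Chomsky normal form, given as unit rules A → a and binary rules A → BC; table[i][j] holds the variables deriving the substring w[i..j].

```python
# CYK membership test for a CFL: fill the table of derivable substrings
# bottom-up by length, O(n^3) table entries/splits overall.

def cyk(w, unit, binary, start):
    n = len(w)
    if n == 0:
        return False  # the empty string needs separate handling in CNF
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):                   # substrings of length 1
        table[i][i] = {A for A, a in unit if a == ch}
    for length in range(2, n + 1):               # longer substrings
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                # split point
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]

# CNF grammar for {0^k 1^k : k >= 1}: S -> AB | AT, T -> SB, A -> 0, B -> 1
unit = [("A", "0"), ("B", "1")]
binary = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
print([cyk(w, unit, binary, "S") for w in ["01", "0011", "001"]])
# -> [True, True, False]
```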


PRIMES is in P

Some real-life programs need to find large prime numbers. For instance, the RSA public-key cryptosystem needs two prime numbers of about 100 decimal digits, which it then multiplies.

Theorem (Agrawal, Kayal, Saxena, 2002)
PRIMES = {〈x〉 : x is a prime number} is in P.

• This is much smarter than the “brute force” attempt to find a proper factor.

• In fact the algorithm decides whether x is prime without trying to find a factor at all!

• Instead it uses number theoretic properties of prime numbers, along the lines of Fermat’s little theorem (if p is prime then a^(p−1) ≡ 1 mod p for each a with 1 ≤ a < p).

• The best result known to me is that PRIMES is in TIME(n^6) on a multitape TM.
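To illustrate how Fermat's little theorem gets used, here is the Fermat primality test. To be clear, this is not the AKS algorithm: a composite can fool the Fermat test (Carmichael numbers), so it is only a probabilistic heuristic, whereas AKS is exact.

```python
# The Fermat test: pick random bases a and check a^(p-1) = 1 (mod p).
# A prime always passes (Fermat's little theorem); most composites are
# caught quickly, but Carmichael numbers such as 561 can slip through.

import random

def fermat_test(p, rounds=20):
    if p < 4:
        return p in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, p - 1)
        if pow(a, p - 1, p) != 1:   # a is a witness that p is composite
            return False
    return True                     # probably prime

print(fermat_test(13), fermat_test(12))  # True False
```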

Efficient version of the Church-Turing thesis

Efficient algorithm

equals

polynomial time deterministic (Turing) machine algorithm.

Evidence (somewhat shaky) for this thesis comes from the following:

• all the detailed polynomial time machine models yield the same class. For instance, 1-tape and multitape machines yield the same class P.

• most existing efficient algorithms can indeed be carried out by polynomial time deterministic Turing machines.


Why the “efficient Church-Turing thesis” is not universally accepted

• some practical algorithms, such as the simplex method, are in fact exponential time in the worst case, but polynomial time on average.

• We allow the constants and exponents in the polynomial time bounds to be hideously large, such as 5000000 · n^5000000. This is not realistic. (However, these large numbers never occur when we implement algorithms used in practice.)

• LOGSPACE (read-only input / work space O(log n)) is a class contained in P which could also be seen as a formalization of efficient computability. However, it is unknown whether that class equals P. See Sipser Ch. 8.
