LR(0) CONJUNCTIVE GRAMMARS AND DETERMINISTIC SYNCHRONIZED
ALTERNATING PUSHDOWN AUTOMATA
Tamar Aizikowitz and Michael KaminskiTechnion – Israel Institute of Technology
June 17 CSR 2011, St. Petersburg, Russia
Context-Free Languages
Combine expressiveness with polynomial parsing, which is appealing for practical applications.
One of the most widely used language class in Computer Science.
At the theoretical basis of Programming Languages, Computational Linguistics, Formal Verification, Computational Biology, and more.
Extended Models
Goal: Models of computation that generate a slightly stronger language class without sacrificing polynomial parsing.
Why? Such models seem to have great potential for practical applications.
In fact, several fields (e.g., Computational Linguistics) have already voiced their need for a stronger language class.
Conjunctive Grammars
Conjunctive Grammars (CG) [Okhotin, 2001] are an extension of context-free grammars.
Add explicit intersection rulesS (A & B) ⋯ (w & w) w
Semantics: L(A & B) = L(A) ⋂ L(B) Since context-free languages are not closed under
intersection, CG is a wider class of languages Retain polynomial parsing and therefore can be
useful for practical applications
Synchronized Alternating PDA Synchronized Alternating Pushdown Automata
(SAPDA) [Aizikowitz & Kaminski, 2008] extend PDA.
Stack modeled as tree in which all braches must accept.
A limited form of synchronization createslocalized parallel computations.
First automaton counterpart shown for Conjunctive Grammars.
Conjunctive Grammars
A CG is a quadruple: G=(V , Σ , P , S )
Rules: A → (α1 & ⋯ & αn), where A∊V, αi∊(V ⋃ Σ)*
Examples: A → (aAB & Bc & aD) ; A → abC Derivation steps:
s1As2 s1(α1 & ⋯ & αn)s2, where A→(α1 & ⋯ &
αn)∊P s1(w & ⋯ & w)s2 s1 w s2, where w ∊ Σ*
non-terminals terminals derivation rules start symbol
The case of all n=1 is an ordinary CFG
Grammar Language
Language: L(G) = {w ∊ Σ* | S *w}, i.e., L(G) consists of all terminal words w derivable from the start symbol S.
Note: ( , ) , and & are not terminal symbols. Therefore, all conjunctions must be collapsed in order to derive a terminal word.
Semantics: (A & B) * w if and only if A *w and B *w. Therefore, L(A & B) = L(A) ⋂ L(B).
Example: Multiple Agreement Example: a CG for the multiple-agreement language
{anbncn | n ∊ ℕ}:
C → Cc | D ; D → aDb | ε : L(C) = {anbncm | n,m ∊ ℕ}
A → aA | E ; E → bEc | ε : L(A) = {ambncn | n,m ∊ ℕ}
S → (C & A) : L(S ) = L(C) ⋂ L(A)
S (C & A) (Cc & A) (Dc & A) (aDbc & A) (abc & A) ⋯ (abc & abc) abc
Synchronized Alternating Pushdown Automata (SAPDA)
An extension of classical PDA.
Transitions are made to conjunctions of ( state , stack-word ) pairs, e.g.,δ( q , σ , X ) = { ( p1 , XX ) ∧ ( p2 ,Y ) , ( p3 ,
Z ) }
Note: if all conjunctions are of one pair only, the automaton is an “ordinary” PDA.
non-deterministic model, i.e., many possible transitions
SAPDA Stack Tree
The stack of an SAPDA is a tree. A transition to n pairs splits the current branch into n branches.
Branches are processed independently. Empty sibling branches can be collapsed if they are
synchronized, i.e., are in the same state and have read the same portion of the input.
BA C
Dq
p
BAq
(q,A)(p,DC)δ(q,σ,A)
B qcollapseBε εq q
SAPDA Definition
An SAPDA is a sextuple A = ( Q , Σ , Γ , q0 , δ , ⊥ )
Transition function:δ(q,σ, X ) ⊆ {(q1,α1) ∧ ⋯ ∧ (qn,αn) | qi ∊ Q, αi ∊
Γ*, n ∊ ℕ}
states terminals initial state
transition function
initial stack symbol
stack symbols
SAPDA Computation and Language Computation:
Each step, a transition is applied to one stack-branch If a stack-branch is empty, it cannot be selected Synchronous empty sibling branches are collapsed
Initial Configuration: Accepting configuration: Language: L(A) = {w∊Σ* | ∃ q∊Q,
(q0,w,⊥)⊢*(q,ε,ε)} Note: all branches must empty, i.e., must “agree”.
⊥ q0ε q
have the same state and remaining input
Equivalence of SAPDA and CG Theorem 1. A language is generated by an CG if
and only if it is accepted by an SAPDA.
The equivalence is analogous to the classical equivalence between CFG and PDA.
The proofs of the equivalence are extended versions of the classical proofs.
Corollary: Single-state SAPDA and multi-state SAPDA are equivalent (like ordinary PDA).
Deterministic SAPDA and LR(0) CGLinear-time Implementation of DSAPDA and LR(0) Parsing
Linear-time LR(0) Parsing
Deterministic Context-free Languages
DCFL are a sub-family of context-free languages, accepted by Deterministic PDA (DPDA).
LR(k) languages were introduced by Knuth, and shown to be equivalent to DPDA, [Knuth, 1965].
Knuth developed a linear-time parsing algorithm for DCFL the basis of Compilation Theory.
Knuth also proved that LR(0) grammars are equivalent to DPDA which accept by empty stack.
Deterministic SAPDA
Deterministic SAPDA are defined according to the classical notion of a deterministic model.
That is, an SAPDA is deterministic if it has only one possible step from any given configuration.
Remark: A deterministic SAPDA has exactly one computation on any given input word w.
LR(0) Conjunctive Grammars To define LR(0) conjunctive grammars, we extend
the notions of viable prefixes, valid items ([A → ]), etc…
A conjunctive grammar is LR(0) if a conflict-free item automaton can be constructed for it.
Main issue: How to build the item automaton so that it supports conjunctive rules...
Solution: Read the paper
Equivalence Results
Theorem 2. A language is generated by an LR(0) CG if and only if it is accepted by a DSAPDA.
The equivalence is analogous to the classical equivalence between LR(0) CFG and DPDA.
The proofs of the equivalence are extended versions of the classical proofs.
The DSAPDA Membership Problem By using a naive implementation of a DSAPDA,
we have an exponential time solution for the membership problem.
Each branch is linear, but there can be an exponential number of branches.
DSAPDA Efficient Implementation There are at most M = |Q| |Γ| different
combinations for branch head configurations. Thus, at any given time, we need at most M
heads.
Linear-time Parsing
Based on the efficient implementation of a DSAPDA, we can build an LR(0) parser.
The parser runs in linear time for the boolean closure of LR(0) Conjunctive Languages
This class strictly Includes the boolean closure of classical LR(0) languages. { ai1 bai2 b2 ⋯ ain bn $ bai1 bai2 ⋯ bain $ | n ≥ 1 ij ≥
1 }
That is, we have linear-time parsing for a wider class of languages.
Summary
Conjunctive Languages are interesting because: They are a strong, rich class of languages. They are polynomially parsable. Their models of computation are intuitive and easy to
understand; highly resemble classical CFG and PDA. SAPDA are the first automaton model presented for
Conjunctive Languages. They are a natural extension of PDA. They lend new intuition on Conjunctive Languages. They are the basis of a linear parsing algorithm for
a wide class of languages.