Inferring Finite Automata from Queries and Counter-examples

Eggert Jón Magnússon

Learning a language

Inferring finite automata is analogous to learning a language. In fact, without examining the state structure, there is no way to distinguish between two automata that recognize the same language.

We therefore focus on finding the minimum equivalent automaton.

Requirements for learning

Gold (1967) showed that no class of languages that contains all finite languages and at least one infinite language can be learned (identified in the limit) from positive data only. In particular, the regular languages cannot be learned from positive examples alone.

The idea is proof by contradiction. Assume that we have a guessing algorithm that can build an automaton recognizing any finite language L from a series of its members w1...wn.

Build an infinite language L′ that consists of the strings w1...wn plus infinitely many strings that are not members of L. Every finite sample of L′ is also a complete sample of some finite language, which the algorithm must then guess. However many strings of L′ it sees, the algorithm keeps guessing a finite language and never settles on L′. The infinite language can therefore always fool any guessing algorithm.
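A tiny illustration of why positive data is not enough. The sample and both languages are hypothetical, chosen only for this example: the finite language L = {a, aa, aaa} and the infinite language L′ = a+ both contain every observed string, so the sample alone cannot separate them, even though they disagree on unseen strings.

```python
# Hypothetical example: one positive sample, two candidate languages.
sample = ["a", "aa", "aaa"]          # the strings w1...wn seen so far

def in_finite_language(w: str) -> bool:
    """L = {w1, ..., wn}: exactly the observed strings."""
    return w in sample

def in_infinite_language(w: str) -> bool:
    """L' = a+: every non-empty string of a's (an infinite language)."""
    return len(w) > 0 and set(w) == {"a"}

# Both languages contain every example, so the sample cannot separate them...
both_consistent = all(in_finite_language(w) and in_infinite_language(w) for w in sample)
# ...yet they disagree on strings that have not been seen yet.
disagree_on = [w for w in ["aaaa", "aaaaa"] if in_finite_language(w) != in_infinite_language(w)]
print(both_consistent, disagree_on)  # True ['aaaa', 'aaaaa']
```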

Teacher

Angluin introduced the concept of a minimally adequate teacher, which can answer the questions:
◦ “Is the string s a member of L?”: yes / no
◦ “Is the given DFA, D, the answer?”: yes, or a string from the symmetric difference of L(D) and L (either a string that is in L and not in L(D), or a string that is in L(D) and not in L).

Given such a teacher, there is an algorithm that learns any regular set in polynomial time (polynomial in the number of states of the minimal DFA for the set and in the length of the longest counterexample).
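A minimal sketch of a minimally adequate teacher, assuming the target is itself available as a DFA. The representation (a start state, a set of accepting states and a transition dictionary) and the names `Teacher`, `member` and `equivalent` are hypothetical. The equivalence query searches the product of the target and the guess breadth-first, so it returns a shortest string from the symmetric difference, or None when the two languages are equal.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

# DFA as (start state, accepting states, transitions): an assumption for this sketch.
DFA = Tuple[int, Set[int], Dict[Tuple[int, str], int]]

def accepts(dfa: DFA, w: str) -> bool:
    start, accepting, delta = dfa
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting

class Teacher:
    """Hypothetical minimally adequate teacher for a known target DFA."""

    def __init__(self, target: DFA, alphabet: str):
        self.target = target
        self.alphabet = alphabet

    def member(self, s: str) -> bool:
        """Membership query: is s in L?"""
        return accepts(self.target, s)

    def equivalent(self, guess: DFA) -> Optional[str]:
        """Equivalence query: None if L(guess) = L, otherwise a shortest
        counterexample, found by BFS over pairs (target state, guess state)."""
        t_start, t_acc, t_delta = self.target
        g_start, g_acc, g_delta = guess
        start = (t_start, g_start)
        seen = {start}
        queue = deque([(start, "")])
        while queue:
            (p, q), w = queue.popleft()
            if (p in t_acc) != (q in g_acc):
                return w  # accepted by exactly one of the two machines
            for c in self.alphabet:
                nxt = (t_delta[(p, c)], g_delta[(q, c)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, w + c))
        return None

# Usage: the target accepts strings of a's of even length; the guess accepts everything.
even_a = (0, {0}, {(0, "a"): 1, (1, "a"): 0})
everything = (0, {0}, {(0, "a"): 0})
teacher = Teacher(even_a, "a")
print(teacher.member("aa"), teacher.equivalent(everything))  # True a
```

Returning a shortest counterexample is a design choice, not a requirement: the teacher may return any string from the symmetric difference.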

Angluin’s Algorithm

Iteratively, the algorithm builds a DFA using membership queries, then presents the DFA to the teacher as a solution.

If the teacher accepts the DFA, the algorithm is finished. Otherwise, the teacher responds with a counterexample: a string that the presented DFA accepts or rejects incorrectly.

The algorithm uses the counterexample to refine the DFA and goes back to the first step.
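One way to use a counterexample, and the one the example run below follows, is to add it and all of its prefixes to S, which keeps S prefix-closed. A minimal sketch; `prefixes` and `add_counterexample` are hypothetical helper names, and S is kept as a list of strings with "" standing for the empty string ε.

```python
def prefixes(w: str) -> list[str]:
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def add_counterexample(S: list[str], counterexample: str) -> list[str]:
    """Refinement step: put the counterexample and all of its prefixes
    into S, skipping strings that are already there."""
    for p in prefixes(counterexample):
        if p not in S:
            S.append(p)
    return S

S = ["", "a"]                        # S before the first counterexample
print(add_counterexample(S, "ba"))   # ['', 'a', 'b', 'ba']
```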

Angluin’s Algorithm, details

The algorithm uses two sets, S for states and E for experiments, and one observation table, T. The elements of S ∪ S·A (where A is the alphabet) form the rows, and the elements of E form the columns; the value of each cell is the outcome of a membership query for the concatenation of the row and column strings.

The set S is prefix-closed, and the set E is suffix-closed. Before making a guess, the observation table is required to be closed and consistent.
◦ Closed means that there are no unique rows in the bottom part of the observation table, i.e. every row for an element of S·A is equal to the row of some element of S.
◦ - if the observation table isn’t closed, we find a unique row in the bottom part of the observation table and move its corresponding element from S·A into S.
◦ Consistent means that whenever the rows for two elements s1, s2 of S are the same, the rows for s1a and s2a are also the same, for all a in A.
◦ - if the table isn’t consistent, we find a symbol a and an experiment e in E with T(s1·a·e) ≠ T(s2·a·e), and add the suffix a·e to E.
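The two checks can be sketched as follows, assuming a membership oracle and representing S and E as lists of strings ("" is ε). All names are hypothetical, and the oracle is a three-state DFA reconstructed from the tables in the example run, not copied from Sipser. `find_unclosed` returns the element of S·A to move into S; `find_inconsistency` returns the suffix a·e to add to E.

```python
from typing import List, Optional, Tuple

# Hypothetical oracle: a three-state DFA that reproduces the example tables.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}

def member(w: str) -> bool:
    q = 1
    for c in w:
        q = DELTA[(q, c)]
    return q in {2, 3}

def row(s: str, E: List[str]) -> Tuple[bool, ...]:
    """The row of s: the membership outcome for s·e, for each e in E."""
    return tuple(member(s + e) for e in E)

def find_unclosed(S: List[str], E: List[str], A: str) -> Optional[str]:
    """Return some s·a whose row is not the row of any element of S (None if closed)."""
    top = {row(s, E) for s in S}
    for s in S:
        for a in A:
            if row(s + a, E) not in top:
                return s + a
    return None

def find_inconsistency(S: List[str], E: List[str], A: str) -> Optional[str]:
    """Return a suffix a·e that separates two equal rows of S (None if consistent)."""
    for i, s1 in enumerate(S):
        for s2 in S[i + 1:]:
            if row(s1, E) != row(s2, E):
                continue
            for a in A:
                for e in E:
                    if member(s1 + a + e) != member(s2 + a + e):
                        return a + e
    return None

print(find_unclosed([""], [""], "ab"))                         # a
print(find_inconsistency(["", "a", "b", "ba"], [""], "ab"))    # a
S, E = ["", "a", "b", "ba"], ["", "b"]
print(find_unclosed(S, E, "ab"), find_inconsistency(S, E, "ab"))  # None None
```

On the example table with S = {ε, a, b, ba} and E = {ε}, this search meets the equal rows of “a” and “b” before those of “b” and “ba”, so it proposes “a” rather than the “b” chosen in the example run; any suffix that separates two equal rows is a valid fix.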

Example Run

Let’s use an example DFA from Sipser (Example 1.68, p. 76 in the International version).

The alphabet is A = {a, b}.
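The DFA figure does not survive in this text. For experimenting with the run, we assume a three-state DFA (start state 1, accepting states 2 and 3) that reproduces every table entry in the walkthrough below. It was reconstructed from the tables, not copied from Sipser, so treat it as an assumption. The block checks it against the cells of the final observation table (columns ε and b).

```python
# Assumed target DFA, reconstructed from the observation tables (not from the figure).
START, ACCEPTING = 1, {2, 3}
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 2, (3, "b"): 1,
}

def member(w: str) -> bool:
    """Membership query against the assumed target DFA."""
    q = START
    for c in w:
        q = DELTA[(q, c)]
    return q in ACCEPTING

# Cells T(s·e) for E = {ε, b}, as listed in the second-guess table.
expected = {
    "": (0, 1), "a": (1, 1), "b": (1, 0), "ba": (1, 1),
    "aa": (0, 1), "bb": (0, 1), "ab": (1, 1), "baa": (0, 1), "bab": (1, 1),
}
for s, cells in expected.items():
    assert tuple(int(member(s + e)) for e in ("", "b")) == cells, s
print("all table entries reproduced")
```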

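Example, continued

Initially S = E = {ε}, where ε is the empty string, and T is initialized with:

T     ε
ε     0
---------
a     1
b     1

T is not closed: row(a) = 1 in the bottom part differs from row(ε) = 0, the only row in the top part.

Add “a” to S and extend T. T is now both closed and consistent:

T     ε
ε     0
a     1
---------
b     1
aa    0
ab    1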

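First guess

The first guess has one state per distinct row: a rejecting start state for row(ε) = 0 and an accepting state for row(a) = 1. The teacher rejects it and gives the counterexample “ba”, which the target DFA accepts but the first guess does not.

We add “ba” and all its prefixes (“b”) to S.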
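A quick check of the counterexample, with the two-state first guess read off the table (states named after the rows' representatives ε and a) and, as the target, a three-state DFA that reproduces every table entry in this run (an assumption, reconstructed from the tables rather than copied from Sipser). All names are hypothetical. It confirms that the guess rejects “ba” while the target accepts it, and shows that “bb” would have been an equally short counterexample.

```python
from itertools import product

# First guess, read off the table with S = {ε, a}, E = {ε}: state "ε" (row 0,
# start, rejecting) and state "a" (row 1, accepting). Each transition goes to the
# state whose row equals row(s·x) in the table.
GUESS_DELTA = {
    ("ε", "a"): "a",   # row(a)  = 1
    ("ε", "b"): "a",   # row(b)  = 1
    ("a", "a"): "ε",   # row(aa) = 0
    ("a", "b"): "a",   # row(ab) = 1
}

def guess_accepts(w):
    q = "ε"
    for c in w:
        q = GUESS_DELTA[(q, c)]
    return q == "a"

# Assumed target DFA (start 1, accepting {2, 3}).
TARGET_DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}

def target_accepts(w):
    q = 1
    for c in w:
        q = TARGET_DELTA[(q, c)]
    return q in {2, 3}

print(guess_accepts("ba"), target_accepts("ba"))  # False True

# All strings of length <= 2 on which the guess and the target disagree.
short = ["".join(p) for n in range(3) for p in product("ab", repeat=n)]
print([w for w in short if guess_accepts(w) != target_accepts(w)])  # ['ba', 'bb']
```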

S is now {ε, “a”, “b”, “ba”}:

T     ε
ε     0
a     1
b     1
ba    1
---------
aa    0
bb    0
ab    1
baa   0
bab   1

Now the table is no longer consistent: row(b) = row(ba), but row(bb) = 0 while row(bab) = 1.

We add “b” (the symbol b followed by the experiment ε) to E.

Second guess

The table is now closed and consistent, so we make a guess.

Note that the unique row “bitmask values” translate directly to states: the three distinct rows 01, 11 and 10 become the three states of the guess.

T     ε  b
ε     0  1
a     1  1
b     1  0
ba    1  1
------------
aa    0  1
bb    0  1
ab    1  1
baa   0  1
bab   1  1
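The second guess can be read off mechanically. The sketch below uses, as membership oracle, a three-state DFA that reproduces the tables (an assumption, not copied from Sipser) and hypothetical names `row`, `build_hypothesis` and `run`. It turns each distinct row into a state, takes row(ε) as the start state, marks rows whose ε column is 1 as accepting, and sends row(s) on symbol a to row(s·a). The final bounded comparison with the target is only evidence; the teacher's equivalence query is what actually confirms a guess.

```python
from itertools import product

# Assumed target DFA (start 1, accepting {2, 3}), consistent with the tables.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}

def member(w):
    q = 1
    for c in w:
        q = DELTA[(q, c)]
    return q in {2, 3}

def row(s, E):
    """Row of s as a tuple of 0/1 cells, one per experiment in E."""
    return tuple(int(member(s + e)) for e in E)

def build_hypothesis(S, E, A):
    """One state per distinct row of S; requires a closed, consistent table
    and E[0] == "" (the ε column decides acceptance)."""
    states = {row(s, E) for s in S}
    start = row("", E)
    accepting = {r for r in states if r[0] == 1}
    delta = {(row(s, E), a): row(s + a, E) for s in S for a in A}
    return start, accepting, delta

def run(hyp, w):
    q, accepting, delta = hyp
    for c in w:
        q = delta[(q, c)]
    return q in accepting

S, E, A = ["", "a", "b", "ba"], ["", "b"], "ab"
hyp = build_hypothesis(S, E, A)
print(sorted({q for (q, _) in hyp[2]}))  # [(0, 1), (1, 0), (1, 1)]

# The guess agrees with the target on every string of length <= 8.
words = ["".join(p) for n in range(9) for p in product(A, repeat=n)]
print(all(run(hyp, w) == member(w) for w in words))  # True
```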

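Running time

The equivalence test can be carried out with EQ_DFA, the decision procedure for equivalence of two DFAs. Each counterexample makes the next guess have at least one more state than the previous one, and no guess has more states than the minimal target DFA, so in the worst case we make one guess for each of the n states in the target machine.

Between guesses, each string added to S (to make the table closed) or to E (to make it consistent) increases the number of distinct rows of S by at least one. Since there are never more than n distinct rows, this happens at most n - 1 times over the whole run.

The running time is $O(m^2 n^2 + m n^3)$, where m is the length of the longest counterexample produced and n is the number of states in the target machine.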
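To make the counts concrete, here is a compact end-to-end sketch with query counters, run on a three-state DFA that reproduces the example tables (an assumption, reconstructed from the tables rather than copied from Sipser). `member` and `equivalent` stand in for the teacher; the equivalence query returns a shortest counterexample by breadth-first search over the product of the two machines, which is one way to decide EQ_DFA. It is a sketch of the steps described above, not Angluin's original pseudocode, and all names are hypothetical.

```python
from collections import deque

# Assumed target DFA (start 1, accepting {2, 3}) over the alphabet A.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}
ACCEPTING, A = {2, 3}, "ab"
queries = {"membership": 0, "equivalence": 0}

def member(w):
    """Membership query (counted)."""
    queries["membership"] += 1
    q = 1
    for c in w:
        q = DELTA[(q, c)]
    return q in ACCEPTING

def equivalent(start, accepting, delta):
    """Equivalence query (counted): BFS over (target state, guess state) pairs;
    returns a shortest counterexample, or None if the guess is correct."""
    queries["equivalence"] += 1
    seen, queue = {(1, start)}, deque([((1, start), "")])
    while queue:
        (p, q), w = queue.popleft()
        if (p in ACCEPTING) != (q in accepting):
            return w
        for c in A:
            nxt = (DELTA[(p, c)], delta[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, w + c))
    return None

def lstar():
    S, E, T = [""], [""], {}

    def t(w):                      # cached cell value: each string is queried once
        if w not in T:
            T[w] = member(w)
        return T[w]

    def row(s):
        return tuple(t(s + e) for e in E)

    while True:
        while True:                # make the table closed and consistent
            top = {row(s) for s in S}
            unclosed = next((s + a for s in S for a in A if row(s + a) not in top), None)
            if unclosed is not None:
                S.append(unclosed)
                continue
            fix = next((a + e
                        for i, s1 in enumerate(S) for s2 in S[i + 1:] if row(s1) == row(s2)
                        for a in A for e in E if t(s1 + a + e) != t(s2 + a + e)), None)
            if fix is None:
                break
            E.append(fix)
        start = row("")            # guess: one state per distinct row
        accepting = {row(s) for s in S if t(s)}
        delta = {(row(s), a): row(s + a) for s in S for a in A}
        cex = equivalent(start, accepting, delta)
        if cex is None:
            return S, E, len(set(map(row, S)))
        for i in range(len(cex) + 1):   # add the counterexample and its prefixes
            if cex[:i] not in S:
                S.append(cex[:i])

S, E, n_states = lstar()
print(S, E, n_states, queries["equivalence"])  # ['', 'a', 'b', 'ba'] ['', 'a'] 3 2
```

On this target the sketch receives the same counterexample “ba” as the example run and needs two guesses for a three-state machine, within the bound of one guess per state. Its consistency search happens to add “a” to E instead of “b”; both are valid distinguishing suffixes.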

Further work

Many consider the requirement of a teacher unfair, since it requires too much knowledge of the automaton.

The estimation/exploration algorithm (EEA) of Bongard and Lipson (2005) is a genetic algorithm that:
◦ creates many random state machines and many random test strings;
◦ compares the output of the random state machines with the output of the target machine;
◦ iteratively refines the random state machines and the test strings, alternating between the two, either until convergence or until some desirable behaviour is displayed;
◦ verifies the result with a new set of test strings.
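A heavily simplified estimation/exploration loop in the spirit of the bullets above, for illustration only. It does not reproduce Bongard and Lipson's operators, population sizes or test-selection method; the mutation scheme, the “most evenly split” test heuristic, all parameters and all names are assumptions. The target is a three-state DFA that reproduces the example tables (itself an assumption, reconstructed from the tables). The printed agreement depends on the random seed, so no particular value is claimed.

```python
import random
from itertools import product

A = "ab"
# Assumed target DFA (start 1, accepting {2, 3}).
T_DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}

def target(w):
    q = 1
    for c in w:
        q = T_DELTA[(q, c)]
    return q in {2, 3}

def random_model(rng, k):
    """Random k-state DFA as (accepting flags, transitions); state 0 is the start."""
    return ([rng.random() < 0.5 for _ in range(k)],
            {(q, c): rng.randrange(k) for q in range(k) for c in A})

def run(model, w):
    accepting, delta = model
    q = 0
    for c in w:
        q = delta[(q, c)]
    return accepting[q]

def mutate(rng, model, k):
    """Copy the model, then flip one accepting flag or rewire one transition."""
    accepting, delta = list(model[0]), dict(model[1])
    if rng.random() < 0.3:
        i = rng.randrange(k)
        accepting[i] = not accepting[i]
    else:
        delta[(rng.randrange(k), rng.choice(A))] = rng.randrange(k)
    return accepting, delta

def random_test(rng, max_len=8):
    return "".join(rng.choice(A) for _ in range(rng.randint(0, max_len)))

def agreement(model, tests):
    """Fraction of tests on which the model's output matches the target's."""
    return sum(run(model, w) == target(w) for w in tests) / len(tests)

def disagreement(models, w):
    """How evenly the models split on w; higher means a more informative test."""
    yes = sum(run(m, w) for m in models)
    return min(yes, len(models) - yes)

def eea(k=3, pop=30, generations=100, seed=0):
    rng = random.Random(seed)
    models = [random_model(rng, k) for _ in range(pop)]
    tests = [random_test(rng) for _ in range(10)]
    for _ in range(generations):
        # Estimation: keep the better half of the models, refill with mutants.
        models.sort(key=lambda m: agreement(m, tests), reverse=True)
        parents = models[: pop // 2]
        models = parents + [mutate(rng, rng.choice(parents), k) for _ in range(pop - len(parents))]
        # Exploration: add the random candidate test the models disagree on most.
        candidates = [random_test(rng) for _ in range(20)]
        tests.append(max(candidates, key=lambda w: disagreement(models, w)))
    best = max(models, key=lambda m: agreement(m, tests))
    # Verification on a separate exhaustive set (every string of length <= 8).
    fresh = ["".join(p) for n in range(9) for p in product(A, repeat=n)]
    return best, agreement(best, fresh)

best, score = eea()
print(f"verification agreement: {score:.2f}")
```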

References

Angluin, D., 1987. Learning Regular Sets from Queries and Counter-examples.

Gold, E. Mark, 1967. Language Identification in the Limit.

Bongard, J., Lipson, H., 2005. Active Coevolutionary Learning of Deterministic Finite Automata.
