Inferring Finite Automata from Queries and Counter-examples
Eggert Jón Magnússon
Learning a language
Inferring finite automata is analogous to learning a language. In fact, two automata that recognize the same language cannot be told apart without examining their state structure.
We focus on finding the minimum equivalent automaton.
Requirements for learning
Gold showed that a class of languages containing every finite language and at least one infinite language cannot be learned from positive data only.
The idea is proof by contradiction. Assume that we have a guessing algorithm that can build an automaton recognizing the finite language L from the series of strings w1...wn, all members of L.
Build an infinite language L’ that contains the strings w1...wn plus at least one string that is not a member of L. The samples w1...wn are consistent with both L and L’, so the infinite language can always fool any guessing algorithm.
Teacher
Angluin introduced the concept of a minimally adequate teacher, which can answer two kinds of questions:
◦ “Is the string s a member of L?” – yes / no
◦ “Is the given DFA, D, the answer?” – yes, or a string from the symmetric difference of L(D) and L (either a string that is in L and not in L(D), or a string that is in L(D) and not in L).
Given such a teacher, there is an algorithm that learns any regular set in polynomial time.
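As a rough illustration of the two query types, the sketch below implements a teacher in Python for a known target DFA. The names DFA, Teacher, member and equivalent, and the three-state target over {a, b}, are assumptions made for this example and do not come from Angluin's paper. The equivalence query searches the product of the two automata breadth-first, so when the guess is wrong it returns a shortest string from the symmetric difference.

```python
from collections import deque

ALPHABET = "ab"


class DFA:
    """A complete DFA; delta maps (state, symbol) to the next state."""

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


class Teacher:
    """Answers membership and equivalence queries about a hidden target DFA."""

    def __init__(self, target):
        self.target = target

    def member(self, word):
        return self.target.accepts(word)

    def equivalent(self, guess):
        """Return None if the guess is correct, else a shortest counterexample."""
        start = (self.target.start, guess.start)
        queue, seen = deque([(start, "")]), {start}
        while queue:
            (p, q), word = queue.popleft()
            # The two automata disagree on `word`: it is a counterexample.
            if (p in self.target.accepting) != (q in guess.accepting):
                return word
            for symbol in ALPHABET:
                pair = (self.target.delta[(p, symbol)], guess.delta[(q, symbol)])
                if pair not in seen:
                    seen.add(pair)
                    queue.append((pair, word + symbol))
        return None


# Hypothetical target: start state 1, accepting states 2 and 3.
target = DFA(1, {2, 3}, {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1,
                         (2, "b"): 2, (3, "a"): 2, (3, "b"): 1})
# A deliberately wrong two-state guess.
guess = DFA(0, {1}, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})

teacher = Teacher(target)
print(teacher.member("ab"))        # True
print(teacher.equivalent(guess))   # ba
print(teacher.equivalent(target))  # None
```

A real teacher need not hold an explicit DFA; any oracle that can answer both questions would do.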
Angluin’s Algorithm
Iteratively, the algorithm builds a DFA using membership queries, then presents the teacher with the DFA as a solution.
If the DFA is accepted, the algorithm is finished. Otherwise, the teacher responds with a counter-example, a string that the DFA presented would either accept or reject incorrectly.
The algorithm uses the counter-example to refine the DFA, going back to the first step.
Angluin’s Algorithm, details
The algorithm uses two sets of strings, S for states and E for experiments, and one observation table, T. Elements of S ∪ S·A form the rows and elements of E form the columns; the value of each cell is the outcome of a membership test for the concatenation of the row and column strings.
The set S is prefix-closed, the set E is suffix-closed.
Before making a guess, the observation table is required to be closed and consistent.
◦ Closed means that every row in the bottom part of the observation table (the elements of S·A) is equal to the row of some element of S.
◦ - If the observation table isn’t closed, we find a row in the bottom part that matches no row of S, and move its corresponding element from S·A into S.
◦ Consistent means that if the rows for two elements s1, s2 in S are the same, then for all a in A, the rows for s1a and s2a are the same.
◦ - If the table isn’t consistent, we find a symbol a and an experiment e in E for which this doesn’t hold, and add the string ae to E.
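The two checks can be sketched in Python as follows. This is a minimal illustration, not a full learner: the function names unclosed and separating_experiments are made up for this example, and the hidden three-state DFA behind member is an assumption standing in for the teacher. Rows are recomputed from membership queries on demand instead of being stored in a table.

```python
ALPHABET = "ab"

# Hypothetical hidden DFA (an assumption of this sketch): start state 1,
# accepting states 2 and 3.
DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1,
         (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}
ACCEPTING = {2, 3}


def member(word):
    """Membership query: run the hidden DFA on `word`."""
    state = 1
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def row(s, E):
    """The row of the observation table for string s, one cell per experiment."""
    return tuple(member(s + e) for e in E)


def unclosed(S, E):
    """Elements of S·A whose row matches no row of S (empty if T is closed)."""
    rows_of_S = {row(s, E) for s in S}
    return [s + a for s in S for a in ALPHABET
            if row(s + a, E) not in rows_of_S]


def separating_experiments(S, E):
    """Strings a·e that could be added to E (empty if T is consistent)."""
    found = set()
    for i, s1 in enumerate(S):
        for s2 in S[i + 1:]:
            if row(s1, E) != row(s2, E):
                continue
            # s1 and s2 look identical; check whether some extension differs.
            for a in ALPHABET:
                for e in E:
                    if member(s1 + a + e) != member(s2 + a + e):
                        found.add(a + e)
    return sorted(found)


print(unclosed([""], [""]))                                   # ['a', 'b']
print(unclosed(["", "a"], [""]))                              # []
print(separating_experiments(["", "a", "b", "ba"], [""]))     # ['a', 'b']
print(separating_experiments(["", "a", "b", "ba"], ["", "b"]))  # []
```

When several rows or experiments qualify, any one of them may be chosen; the order is an implementation detail, so different implementations can end up with different but equally valid tables.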
Example Run
Let’s use an example DFA from Sipser (Example 1.68, p. 76 in the International version).
The alphabet is A = {a, b}.
Example, continued
S = E = {ε}
T initialized with:

T       ε
ε       0
a       1
b       1

T is not closed – T(a) ≠ T(ε).
Add “a” to S, extend T:

T       ε
ε       0
a       1
b       1
aa      0
ab      1

T is now both closed and consistent.
First guess
The teacher rejects, and gives the counterexample “ba” – which the target accepts but the first guess does not.
We add “ba” and all its prefixes (“b”) to S.
S is now: {ε, “a”, “b”, “ba”}
Now, the table is no longer consistent – row(b) = row(ba), but row(bab) ≠ row(bb).
We add “b” to E.

T       ε
ε       0
a       1
b       1
ba      1
aa      0
bb      0
ab      1
baa     0
bab     1

Second guess
The table is now consistent and closed, so we make a guess.
Note that each distinct row, read as a bitmask, translates directly to a state.

T       ε   b
ε       0   1
a       1   1
b       1   0
ba      1   1
aa      0   1
bb      0   1
ab      1   1
baa     0   1
bab     1   1

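The step from a closed, consistent table to a guess can be sketched in Python as below. The table values are copied from the example, assuming its two columns are the experiments ε and b; the names TABLE, build_hypothesis and accepts are made up for this illustration. Each distinct row of S becomes a state, the row of ε is the start state, a state is accepting when its ε column is 1, and reading a symbol a from the row of s leads to the row of sa.

```python
ALPHABET = "ab"

S = ["", "a", "b", "ba"]
# Observation table: string -> (value in column ε, value in column b).
TABLE = {
    "": (0, 1), "a": (1, 1), "b": (1, 0), "ba": (1, 1),
    "aa": (0, 1), "bb": (0, 1), "ab": (1, 1), "baa": (0, 1), "bab": (1, 1),
}


def build_hypothesis(S, table):
    """Turn a closed, consistent table into (states, start, accepting, delta)."""
    states = sorted({table[s] for s in S})
    start = table[""]
    accepting = {r for r in states if r[0] == 1}  # column ε comes first
    # Consistency guarantees that equal rows agree on every successor row.
    delta = {(table[s], a): table[s + a] for s in S for a in ALPHABET}
    return states, start, accepting, delta


def accepts(hypothesis, word):
    _, state, accepting, delta = hypothesis
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


guess = build_hypothesis(S, TABLE)
print(guess[0])               # [(0, 1), (1, 0), (1, 1)]
print(accepts(guess, "ba"))   # True
print(accepts(guess, "bb"))   # False
```

The guess has three states, one per distinct row, and it now accepts the earlier counterexample “ba”.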
Running time
The equivalence test uses EQ_DFA.
Since each rejected guess adds at least one state to the next guess, in the worst case we make one guess for each state in the target machine.
In general, before each guess, we add only one string to either S or E.
The running time is O(m²n² + mn³), where m is the length of the longest counterexample produced and n is the number of states in the target machine.
Further work
The requirement of a teacher is considered unfair by many, as it requires too much knowledge of the automaton.
The estimation/exploration algorithm (EEA) is a genetic algorithm.
◦ Creates many random state machines, and many random test strings
◦ Compares the output of the random state machines with the output of the target machine
◦ Iteratively refines, alternately, the random state machines and the test strings, either until convergence or until some desirable behaviour is displayed.
◦ Verification is done with a new set of test strings.
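The loop described above can be illustrated with a toy Python sketch. It is written only in the spirit of that description and is not Bongard and Lipson's actual algorithm: the mutation scheme, the population sizes, the rule of picking the test string the candidates disagree on most, and every function name are assumptions of this example. The target is treated as a black box that only labels strings.

```python
import random

ALPHABET = "ab"
N_STATES = 3  # assumed upper bound on the number of states


def target(word):
    """Hypothetical black-box target machine; only its output is observed."""
    delta = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1,
             (2, "b"): 2, (3, "a"): 2, (3, "b"): 1}
    state = 1
    for symbol in word:
        state = delta[(state, symbol)]
    return state in {2, 3}


def random_machine(rng):
    delta = {(q, a): rng.randrange(N_STATES)
             for q in range(N_STATES) for a in ALPHABET}
    accepting = {q for q in range(N_STATES) if rng.random() < 0.5}
    return delta, accepting


def run(machine, word):
    delta, accepting = machine
    state = 0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def score(machine, labelled):
    """Number of labelled test strings the machine classifies correctly."""
    return sum(run(machine, w) == y for w, y in labelled.items())


def mutate(machine, rng):
    delta, accepting = dict(machine[0]), set(machine[1])
    if rng.random() < 0.5:
        delta[rng.choice(sorted(delta))] = rng.randrange(N_STATES)
    else:
        accepting ^= {rng.randrange(N_STATES)}  # flip one state's acceptance
    return delta, accepting


def random_word(rng, max_len=6):
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randrange(max_len + 1)))


def eea(rounds=20, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_machine(rng) for _ in range(pop_size)]
    labelled = {}
    for _ in range(rounds):
        # Refine the tests: keep the random string the machines disagree on most.
        def disagreement(word):
            yes = sum(run(m, word) for m in population)
            return min(yes, len(population) - yes)
        test = max((random_word(rng) for _ in range(20)), key=disagreement)
        labelled[test] = target(test)  # compare against the target's output
        # Refine the machines: replace the worst one when a mutant is no worse.
        for _ in range(50):
            child = mutate(rng.choice(population), rng)
            worst = min(range(pop_size), key=lambda i: score(population[i], labelled))
            if score(child, labelled) >= score(population[worst], labelled):
                population[worst] = child
    return max(population, key=lambda m: score(m, labelled)), labelled


def holdout_accuracy(machine, n=200, seed=1):
    """Verification on a new set of random test strings."""
    rng = random.Random(seed)
    words = [random_word(rng) for _ in range(n)]
    return sum(run(machine, w) == target(w) for w in words) / n


best, labelled = eea()
print(score(best, labelled), "of", len(labelled), "training strings correct")
print("hold-out accuracy:", holdout_accuracy(best))
```

Unlike the teacher-based algorithm, nothing here guarantees that the final machine is correct; the hold-out accuracy is only an estimate.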
References
Angluin, D., 1987. Learning Regular Sets from Queries and Counter-examples.
Gold, E. Mark, 1967. Language Identification in the Limit.
Bongard, J., Lipson, H., 2005. Active Coevolutionary Learning of Deterministic Finite Automata.