Formal Learning Theory
Michele Friend (Philosophy) and Valentina Harizanov (Mathematics)


Page 1: Formal Learning Theory Michele Friend (Philosophy) and Valentina Harizanov (Mathematics)

Formal Learning Theory
Michele Friend (Philosophy)
and
Valentina Harizanov (Mathematics)

Page 2:

Example: Guessing strings over the alphabet {a, b}

a ; Guess?
a, aa ; Guess?
a, aa, aaa ; Guess?
a, aa, aaa, b ; Guess?
a, aa, aaa, b, bb ; Guess?
a, aa, aaa, b, bb, bbb ; Guess?
a, aa, aaa, b, bb, bbb, aba ; Guess?
…

An infinite process called identification in the limit.
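The guessing game above can be sketched as a tiny "identification by enumeration" learner: conjecture the first candidate language consistent with all data seen so far. The two candidate languages below (nonempty strings of a's only, and all nonempty strings over {a, b}) are illustrative assumptions chosen so that the data stream forces exactly one mind change; they are not the languages defined later in the slides.

```python
# Toy learner: conjecture the first candidate language consistent with
# every sentence seen so far ("identification by enumeration").

def in_L1(w):
    # L1 = {a, aa, aaa, ...}: nonempty strings of a's only (an assumption)
    return w != "" and set(w) == {"a"}

def in_L2(w):
    # L2 = all nonempty strings over {a, b} (an assumption)
    return w != "" and set(w) <= {"a", "b"}

CANDIDATES = [("L1", in_L1), ("L2", in_L2)]

def guess(data):
    """Name of the first candidate containing every sentence seen so far."""
    for name, member in CANDIDATES:
        if all(member(w) for w in data):
            return name
    return None

seen = []
for w in ["a", "aa", "aaa", "b", "bb", "bbb", "aba"]:
    seen.append(w)
    print(seen, "->", guess(seen))  # guesses L1 until "b" arrives, then L2
```

After the sentence "b" the guess changes once and then never again: on any text for L2 the guesses stabilize, which is exactly identification in the limit.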

Page 3:

Learning paradigm

Language or class of languages to be learned
Learner
Environment in which a language is presented to the learner
Hypotheses that occur to the learner about the language to be learned, on the basis of the environment

Page 4:

Formal language L

Given a finite alphabet, say T = {a, b}.
A sentence w is a string of symbols over the alphabet T: λ, a, aa, ab, ba, bba, … (λ = the empty string).
A language L = {w0, w1, w2, …} is a set of correct sentences, say

L1 = {a, aa, aaa, aaaa, …}
L2 = {a, b, bab, aba, babab, …}

w0, w1, w2, … is a text for L (order and repetition do not matter).
w2, w1, w2, w1, w3, … is another text for L.

Page 5:

Chomsky grammar G = (T, V, S, P)
T = {a, b}, V = {S}, P = production rules

L1 = {a, aa, aaa, …}

P1:
1. S → aS
2. S → a

L(G1) = L1

S → aS → aaS → aaaS → aaaa

Regular grammar
Finite state automaton

L2 = {a, b, bab, aba, babab, aabaa, …}

P2:
1. S → aSa
2. S → bSb
3. S → a
4. S → b

L(G2) = L2

Context-free grammar
Push-down automaton
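A minimal sketch of how the two grammars generate their languages: breadth-first rewriting of the start symbol S with the productions listed above. The length bound and the rewriting strategy are implementation choices of ours, not part of the slides.

```python
# Enumerate short sentences of L(G1) and L(G2) by breadth-first rewriting
# of the start symbol S, using the productions from the slide.
from collections import deque

def generate(productions, max_len=5):
    """All terminal strings of length <= max_len derivable from S."""
    results, queue, seen = set(), deque(["S"]), {"S"}
    while queue:
        form = queue.popleft()
        if "S" not in form:
            results.add(form)          # fully terminal: a sentence
            continue
        for rhs in productions:
            new = form.replace("S", rhs, 1)
            # drop sentential forms whose terminal part is already too long
            if len(new.replace("S", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda w: (len(w), w))

P1 = ["aS", "a"]               # G1: S -> aS | a          (regular)
P2 = ["aSa", "bSb", "a", "b"]  # G2: S -> aSa | bSb | a | b  (context-free)

print(generate(P1))  # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
print(generate(P2))  # the odd-length palindromes over {a, b} up to length 5
```

The same routine works for any grammar with a single nonterminal S, which covers both examples on the slide.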

Page 6:

Coding Chomsky languages

Chomsky languages = computably enumerable languages
Gödel coding by numbers of finite sequences of syntactic objects
The code of L is e: L = Le
Algorithmic enumeration of all (unrestricted) Chomsky languages:

L0, L1, L2, …, Le, …
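To make "the code of L is e" concrete, here is one simple coding of sentences by natural numbers: read a string as a numeral in bijective base 2, with a as the digit 1 and b as the digit 2. This particular code is our illustrative choice, not the Gödel coding used on the slides; any effective bijection between strings and naturals serves the same purpose, turning languages into sets of numbers that can be enumerated as L0, L1, L2, …

```python
# Bijective base-2 coding of strings over T = {a, b} by natural numbers
# (an illustrative choice of code, not the slides' Gödel numbering).

def code(w):
    """'' -> 0, 'a' -> 1, 'b' -> 2, 'aa' -> 3, 'ab' -> 4, ..."""
    n = 0
    for ch in w:
        n = 2 * n + (1 if ch == "a" else 2)
    return n

def decode(n):
    """Inverse of code: recover the unique string with this number."""
    w = ""
    while n > 0:
        n, r = divmod(n - 1, 2)
        w = ("a" if r == 0 else "b") + w
    return w

print([code(w) for w in ["", "a", "b", "aa", "ab", "ba", "bb"]])
# [0, 1, 2, 3, 4, 5, 6]
```

Because the map is a bijection, decode(code(w)) == w for every string w, so nothing is lost in passing from syntax to numbers.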

Page 7:

Decidable (computable) language

A language is decidable if there is an algorithm that distinguishes correct from incorrect sentences.
A Chomsky language is decidable exactly when its incorrect sentences also form a Chomsky language.
Not every Chomsky language is decidable.
There is no algorithmic enumeration of all decidable languages.

Page 8:

Learning from text

An algorithmic learner is a Turing machine being fed a text for the language L to be learned, sentence by sentence.
At each step the learner guesses a code for the language being fed:

w0 ; e0
w0, w1 ; e1
w0, w1, w2 ; e2
…

Learning is successful if the sequence e0, e1, e2, e3, … converges to the "description" of L.

Page 9:

Syntactic convergence: EX-learning

EX = explanatory
For some n, we have

e0, e1, …, en, en, en, en, …

and L_{e_n} = L (the final hypothesis en is a code for L).

The set of all finite languages is EX-learnable from text.
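The EX-learnability of the class of all finite languages can be sketched directly: the learner's n-th hypothesis is simply the finite set of sentences seen so far (we use the set itself in place of a numeric code e_n, an illustrative simplification). On any text for a finite language L, the hypotheses are eventually the fixed set L, so the sequence converges syntactically.

```python
# EX-learner for the class of all finite languages: hypothesize exactly
# the data seen so far.  The hypothesis stands in for the code e_n.

def ex_learn(text):
    """Yield the learner's hypothesis (a finite set) after each sentence."""
    seen = set()
    for w in text:
        seen.add(w)
        yield frozenset(seen)

# A text for L = {a, ab}; order and repetition do not matter.
hyps = list(ex_learn(["a", "ab", "a", "ab", "a"]))
print(hyps)  # stabilizes at frozenset({'a', 'ab'}) from the second step on
```

Once every sentence of the finite language has appeared in the text, the hypothesis never changes again, which is exactly the e0, e1, …, en, en, en, … pattern above.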

Page 10:

Semantic convergence: BC-learning

BC = behaviorally correct
For some n, we have

e0, e1, …, en, e_{n+1}, e_{n+2}, e_{n+3}, …

and L_{e_n} = L_{e_{n+1}} = L_{e_{n+2}} = … = L.

There are classes of languages that are BC-learnable but not EX-learnable.

Page 11:

Learning from an informant

L = {w0, w1, w2, w3, …}
(not L) = {u0, u1, u2, u3, …} = incorrect sentences in the proper vocabulary

Learning steps:

w0 ; e0
w0, u0 ; e1
w0, u0, w1 ; e2
w0, u0, w1, u1 ; e3
…
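The informant scenario can be sketched with labelled examples: each pair (w, label) says whether w belongs to L, and the learner conjectures the first candidate language whose membership test agrees with every labelled example so far. The three candidate languages below are illustrative assumptions of ours, as is the enumeration strategy.

```python
# Learning from an informant by enumeration over a fixed list of
# candidate decidable languages, given as membership tests (assumptions).

def only_as(w):     return w != "" and set(w) <= {"a"}
def odd_palind(w):  return len(w) % 2 == 1 and w == w[::-1]
def any_string(w):  return w != ""

CANDIDATES = [("only a's", only_as),
              ("odd palindromes", odd_palind),
              ("all nonempty strings", any_string)]

def informant_learn(examples):
    """examples: (sentence, label) pairs, label True iff the sentence is in L."""
    data = []
    for pair in examples:
        data.append(pair)
        for name, member in CANDIDATES:
            if all(member(w) == lab for w, lab in data):
                yield name   # first candidate consistent with all labels
                break
        else:
            yield None       # no candidate fits the data
```

Negative data does real work here: the label ("aba", True) rules out "only a's", and ("aa", False) keeps "all nonempty strings" from being adopted, so the guesses settle on the odd palindromes.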

Page 12:

Locking sequence for EX-learner

(Blum-Blum) If a learner can learn a language L, then there is a finite sequence σ of sentences in L, called a locking sequence for L, on which the learner "locks" its correct hypothesis; that is, after that sequence the hypothesis does not change.

The learner outputs the same hypothesis on σ and on (σ, τ) for any finite sequence τ of sentences in L.

Page 13:

Angluin criterion: maximum finite fragment property

Consider a class of Chomsky languages. The class is EX-learnable from text exactly when for every language L in the class there is a finite fragment D of L (D ⊂ L) such that every other, possibly bigger, fragment U of L with

D ⊆ U ⊂ L

cannot be in the class.
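Angluin's condition can be checked by brute force on a small, explicitly listed class; the chain of three finite languages below is a toy example of ours, not from the slides. D witnesses the property for L if no class member U sits strictly between them, D ⊆ U ⊂ L. For a finite L, D = L itself always works, which is one way to see that the class of all finite languages meets the criterion.

```python
# Brute-force check of Angluin's maximum-finite-fragment condition on a
# small, explicitly listed class of finite languages (a toy example).
from itertools import combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def has_tell_tale(L, family):
    """Is there a finite D ⊆ L such that no U in family has D ⊆ U ⊊ L?"""
    return any(not any(D <= U and U < L for U in family)
               for D in subsets(L))

family = [frozenset({"a"}),
          frozenset({"a", "b"}),
          frozenset({"a", "b", "ab"})]
print([has_tell_tale(L, family) for L in family])  # [True, True, True]
```

Here `<` on frozensets is proper inclusion, so `D <= U and U < L` encodes D ⊆ U ⊂ L directly.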

Page 14:

Problem

How can we formally define and study certain learning strategies?
Constraints on hypotheses: consistency, confidence, reliability, etc.

Page 15:

Consistent learning

A learner is consistent on a language L if at every step it guesses a language which includes all the data given to it up to that point.
The class of all finite languages can be identified consistently.
If a language is consistently EX-learnable by an algorithmic learner, then it must be a decidable language.

Page 16:

Popperian learning of total functions

A (total) computable function f can be tested against finite sequences of data given to the learner.
A learner is Popperian on f if on any sequence of positive data for f, the learner guesses a computable function.
A learner is Popperian if it is Popperian on every computable function.
Not every algorithmically EX-learnable class of functions is Popperian.

Page 17:

Confident learning

A learner is confident when it is guaranteed to converge to some hypothesis, even if it is given text for a language that does not belong to the class to be learned.
It must also be accurate on the languages in the class.
There is a class that is learnable by an algorithmic EX-learner, and is learnable by (another) confident EX-learner, but cannot be learned by an algorithmic and confident EX-learner.

Page 18:

Reliable learning

A learner is reliable if it is not allowed to converge incorrectly (although it might never converge on the text for a language not in the class).
Reliable EX-learnability from text implies that every language in the class must be finite.

Page 19:

Decisive learning

A decisive learner, once it has put out a revised hypothesis for a new language, replacing an earlier hypothesized language, never returns to the old language again.
Decisive EX-learning from text is not restrictive for general learners, nor for algorithmic learners of computable functions.
Decisiveness reduces the power of algorithmic learning for languages.

Page 20:

U-shaped learning (Baliga, Case, Merkle, Stephan, and Wiehagen)

A variant of non-decisive learning.
Mimics the learning-unlearning-relearning pattern.
See Overregularization in Language Acquisition, a monograph by Marcus, Pinker, Ullman, Hollander, Rosen, and Xu.

Page 21:

Problem

How can we develop algorithmic learning theory for languages more complicated than Chomsky languages, in particular ones closer to natural language?

(Case and Royer) Correction grammars: L1 – L2, where G1 is a Chomsky (unrestricted, type-0; or context-free, type-2) grammar generating the language L1 and G2 is the grammar generating the editing (corrections) L2.

Burgin: "Grammars with prohibition and human-computer interaction," 2005.
Ershov's difference hierarchy in computability theory for limit computable languages.

Page 22:

Problem

What is the significance of negative versus positive information in the learning process?

Learning from switching the type of information (Jain and Stephan):
The learner can request positive or negative information about L; when, after finitely many switches, it keeps requesting information of the same type, it receives all of it (in the limit).

Page 23:

Harizanov-Stephan's result

Consider a class of Chomsky languages. Assume that there is a language L in the family such that for every finite set of sentences D, there are languages U and U′ in the family with

U ⊂ L ⊂ U′ and D ∩ U = D ∩ U′.

Then the family cannot even be BC-learned from switching.

U approximates L from below; U′ from above.
U and U′ coincide on D.

Page 24:

Problem

What are good formal frameworks that unify deduction and induction?
Martin, Sharma, and Stephan: use parametric logic (5 parameters: vocabulary, structures, language, data sentences, assumption sentences).
A model-theoretic approach, based on the Tarskian "truth-based" notion of logical consequence.
The difference between deductive and inductive consequences lies in the process of deriving a consequence from the premises.

Page 25:

Deduction vs induction

A sentence s is a deductive consequence of a theory T if s can be inferred from T with absolute certainty.
A sentence s is an inductive consequence of a theory T if s can be correctly (only hypothetically) inferred from T, but can also be incorrectly inferred from other theories T′ that have enough in common with T to provisionally force the inference of s.