
Page 1:

Theory Revision

Chris Murphy

Page 2:

The Problem

• Sometimes we:
  – Have theories for existing data that do not match new data
  – Do not want to repeat learning every time we update data
  – Believe that our rule learners could perform much better if given basic theories to build off of

Page 3:

Two Types of Errors in Theories

• Over-generalization
  – Theory covers negative examples
  – Caused by incorrect rules in the theory or by existing rules missing necessary constraints
  – Example: uncle(A,B) :- brother(A,C).
  – Solution: uncle(A,B) :- brother(A,C), parent(C,B).
    (a small runnable illustration follows)
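A minimal runnable illustration of this slide's example. The family facts and the name uncle_overgeneral/2 are invented for illustration (so both clause versions can be loaded together), and the snippet assumes SWI-Prolog:

    % Hypothetical family facts, invented for illustration.
    parent(carol, bob).          % carol is bob's parent
    brother(dave, carol).        % dave is carol's brother

    % Over-general clause from the slide: B is unconstrained, so any B is "proved".
    uncle_overgeneral(A, _B) :- brother(A, _C).

    % Corrected clause with the missing parent/2 constraint.
    uncle(A, B) :- brother(A, C), parent(C, B).

    % ?- uncle_overgeneral(dave, carol).   succeeds -- covers a negative example
    % ?- uncle(dave, carol).               fails    -- negative correctly rejected
    % ?- uncle(dave, bob).                 succeeds -- positive still covered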

Page 4:

Two Types of Errors in Theories

• Over-specialization
  – Theory does not cover all positive examples
  – Caused by rules having additional, unnecessary constraints, or by missing rules in the theory that are necessary for proving some examples
  – Example: uncle(A,B) :- brother(A,C), mother(C,B).
  – Solution: uncle(A,B) :- brother(A,C), parent(C,B).
    (a small runnable illustration follows)
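A matching illustration for the over-specialized case. The family facts, the father/2 and mother/2 predicates, and the name uncle_overspecial/2 are assumptions added for illustration:

    % Hypothetical family facts, invented for illustration.
    father(carl, bob).
    mother(eve, bob).
    brother(dave, carl).         % dave is carl's brother

    parent(X, Y) :- mother(X, Y).
    parent(X, Y) :- father(X, Y).

    % Over-specialized clause: the extra mother/2 constraint misses positives
    % where the linking parent is the father.
    uncle_overspecial(A, B) :- brother(A, C), mother(C, B).

    % Corrected clause.
    uncle(A, B) :- brother(A, C), parent(C, B).

    % ?- uncle_overspecial(dave, bob).   fails    -- a positive example is missed
    % ?- uncle(dave, bob).               succeeds -- positive now covered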

Page 5:

What is Theory Refinement?

• “…learning systems that have a goal of making small changes to an original theory to account for new data.”

• Combination of two processes:
  – Using a background theory to improve rule effectiveness and adequacy on data
  – Using problem detection and correction processes to make small adjustments to said theories

Page 6:

Basic Issues Addressed

• Is there an error in the existing theory?
• What part of the theory is incorrect?
• What correction needs to be made?

Page 7:

Theory Refinement Basics

• System is given a beginning theory about the domain
  – Can be incorrect or incomplete (and often is)

• A well-refined theory will:
  – Be accurate with new/updated data
  – Make as few changes as possible to the original theory
  – Changes are monitored by a “Distance Metric” that keeps a count of every change made

Page 8:

The Distance Metric

• Counts every addition, deletion, or replacement of clauses

• Used to:
  – Measure the syntactic corruption of the original theory
  – Determine how good a learning system is at replicating human-created theories

• Drawback is that it does not recognize equivalent literals such as less(X,Y) and greq(Y,X)

• A table on the original slide shows examples of the distance between theories, as well as its relationship to accuracy (a minimal counting sketch follows)
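A minimal counting sketch of such a metric, assuming a theory is represented as a list of clause terms compared by unification. It inherits the drawback above (equivalent but syntactically different literals still count as changes), and it does not pair edits up, so a replacement shows up as one deletion plus one addition:

    :- use_module(library(lists)).

    % theory_distance(+Old, +New, -D): count clauses that appear in only one
    % of the two theories.
    theory_distance(Old, New, D) :-
        subtract(Old, New, Deleted),
        subtract(New, Old, Added),
        length(Deleted, ND),
        length(Added, NA),
        D is ND + NA.

    % ?- theory_distance([(uncle(A,B) :- brother(A,C))],
    %                    [(uncle(A,B) :- brother(A,C), parent(C,B))], D).
    % D = 2.   % one clause removed, one clause added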

Page 9:

Why Preserve the Original Theory?

• If you understood the original theory, you’ll likely understand the new one

• Similar theories will likely retain the ability to use abstract predicates from the original theory

Page 10:

Theory Refinement Systems

• EITHER
• FORTE
• AUDREY II
• KBANN
• FOCL, KR-FOCL, A-EBL, AUDREY, and more

Page 11:

EITHER

• Explanation-based and Inductive Theory Extension and Revision
• First system with the ability to fix both over-generalization and over-specialization
• Able to correct multiple faults
• Uses one or more failings at a time to learn one or more corrections to a theory
• Able to correct intermediate points in theories
• Uses positive and negative examples
• Able to learn disjunctive rules
• Specialization algorithm does not allow positives to be eliminated
• Generalization algorithm does not allow negatives to be admitted

Page 12:

FORTE

• Attempts to prove all positive and negative examples using the current theory

• When errors are detected:
  – Identify all clauses that are candidates for revision
  – Determine whether each clause needs to be specialized or generalized
  – Determine what operators to test for the various revisions

• The best revision is determined based on its accuracy when tested on the complete training set

• The process repeats until the system perfectly classifies the training set or until FORTE finds that no revisions improve the accuracy of the theory (a sketch of this loop follows)
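A minimal sketch of that outer hill-climbing loop. The helpers candidate_revision/2 (enumerating revised theories) and accuracy/2 (fraction of the training set classified correctly) are hypothetical placeholders, not FORTE's actual interface:

    % revise(+Theory, -Final): greedy hill climbing over candidate revisions.
    revise(Theory, Theory) :-
        accuracy(Theory, Acc), Acc =:= 1, !.     % training set perfectly classified
    revise(Theory, Final) :-
        accuracy(Theory, Acc0),
        findall(Acc-Revised,
                ( candidate_revision(Theory, Revised),
                  accuracy(Revised, Acc) ),
                Scored),
        max_member(BestAcc-Best, Scored),        % highest-accuracy candidate
        BestAcc > Acc0,                          % keep only improving revisions
        !,
        revise(Best, Final).
    revise(Theory, Theory).                      % no revision improves accuracy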

Page 13:

Specializing a Theory

• Needs to happen when one or more negatives are covered

• Ways to fix the problem:
  – Delete a clause: simple, just delete and retest
  – Add new antecedents to an existing clause
    • More difficult
    • FORTE uses two methods:
      – Add one antecedent at a time, like FOIL, choosing the antecedent that provides the best information gain at each point (the gain formula is sketched below)
      – Relational pathfinding: uses graph structures to find new relations in the data
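The FOIL-style information gain used to rank candidate antecedents can be sketched as below. This is the standard FOIL formula rather than FORTE-specific code; P0/N0 are the positives/negatives covered before adding the antecedent, P1/N1 after:

    % foil_gain(+P0, +N0, +P1, +N1, -Gain)
    foil_gain(P0, N0, P1, N1, Gain) :-
        I0 is -(log(P0 / (P0 + N0)) / log(2)),   % information before the antecedent
        I1 is -(log(P1 / (P1 + N1)) / log(2)),   % information after the antecedent
        Gain is P1 * (I0 - I1).

    % ?- foil_gain(10, 10, 6, 1, G).
    % G is roughly 6 * (1.0 - 0.222), about 4.67 bits.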

Page 14:

Generalizing a Theory

• Need to generalize when positives are not covered
• Ways FORTE generalizes:
  – Delete antecedents from an existing clause, either singly or in groups (single deletion is sketched after this list)
  – Add a new clause
    • Copy the clause identified at the revision point
    • Purposely over-generalize it
    • Send the over-general rule to the specialization algorithm
  – Use the inverse resolution operators “identification” and “absorption”
    • These use intermediate rules to provide more options for alternative definitions
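A minimal sketch of the single-antecedent deletion operator, treating a clause as a Head :- Body term. The conj_to_list/2 and list_to_conj/2 helpers are written just for this sketch:

    % delete_one_antecedent(+Clause, -Generalized): remove one body literal.
    delete_one_antecedent((Head :- Body), (Head :- NewBody)) :-
        conj_to_list(Body, Literals),
        select(_Deleted, Literals, Rest),
        Rest \= [],                          % keep at least one antecedent
        list_to_conj(Rest, NewBody).

    conj_to_list((A, B), [A | Rest]) :- !, conj_to_list(B, Rest).
    conj_to_list(A, [A]).

    list_to_conj([A], A) :- !.
    list_to_conj([A | Rest], (A, Conj)) :- list_to_conj(Rest, Conj).

    % ?- delete_one_antecedent((uncle(A,B) :- brother(A,C), parent(C,B)), G).
    % G = (uncle(A,B) :- parent(C,B)) ;
    % G = (uncle(A,B) :- brother(A,C)).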

Page 15:

AUDREY II

• Runs in two main phases:
  – The initial domain theory is specialized to eliminate negative coverage
    • At each step a best clause is chosen, it is specialized, and the process repeats
    • The best clause is the one that contributes most to negative examples being incorrectly classified and is required by the fewest positives (a toy scoring sketch follows)
    • If the best clause covers no positives it is deleted; otherwise, literals are added in a FOIL-like manner to eliminate the covered negatives
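A toy sketch of that clause-selection heuristic. The coverage counts are assumed to be precomputed and supplied as neg_covered/2 and pos_requiring/2 facts, and the simple difference score is an illustrative assumption rather than AUDREY II's actual metric:

    % Invented coverage counts for three clause identifiers c1, c2, c3.
    neg_covered(c1, 8).   pos_requiring(c1, 0).
    neg_covered(c2, 5).   pos_requiring(c2, 4).
    neg_covered(c3, 1).   pos_requiring(c3, 6).

    % best_clause(+Clauses, -Best): prefer clauses that cover many negatives
    % and are required by few positives.
    best_clause(Clauses, Best) :-
        findall(Score-C,
                ( member(C, Clauses),
                  neg_covered(C, N),
                  pos_requiring(C, P),
                  Score is N - P ),
                Scored),
        max_member(_-Best, Scored).

    % ?- best_clause([c1, c2, c3], Best).
    % Best = c1.   % covers 8 negatives, required by no positives, so it is deleted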

Page 16:

AUDREY II

  – The revised theory is generalized to cover all positives (without covering any negatives)
    • An uncovered positive example is randomly chosen, and the theory is generalized to cover it
    • The process repeats until all remaining positives are covered
    • If assumed literals can be removed without decreasing positive coverage, that is done
    • If not, AUDREY II tries replacing literals with a new conjunction of literals (also using a FOIL-type process)
    • If deletion and replacement fail, the system uses a FOIL-like method to determine entirely new clauses for proving the literal

Page 17:

KBANN

• System that takes a domain theory of Prolog-style clauses and transforms it into a knowledge-based neural network (KNN)
  – Uses the knowledge base (background theory) to determine the topology and initial weights of the KNN (a weight-initialization sketch follows)
• Different units and links within the KNN correspond to various components of the domain theory
• Topologies of KNNs can be different from the topologies we have seen in neural networks
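A sketch of the commonly described clause-to-unit initialization for clauses without negated antecedents. The weight magnitude W = 4 and the bias formula follow the usual textbook account of KBANN and should be treated as assumptions here:

    % kbann_init(+Clause, -Head, -WeightedLinks, -Bias)
    % Each antecedent feeds the unit for Head through a link of weight W; the
    % bias is set so the unit only activates when all K antecedents are active.
    kbann_init((Head :- Body), Head, Links, Bias) :-
        W = 4,                                   % assumed weight magnitude
        conj_to_list(Body, Antecedents),
        length(Antecedents, K),
        maplist(weighted(W), Antecedents, Links),
        Bias is -(2*K - 1) * W / 2.              % i.e. -(K - 1/2) * W

    weighted(W, Antecedent, Antecedent-W).

    conj_to_list((A, B), [A | Rest]) :- !, conj_to_list(B, Rest).
    conj_to_list(A, [A]).

    % ?- kbann_init((uncle(A,B) :- brother(A,C), parent(C,B)), H, Links, Bias).
    % H = uncle(A,B), Links = [brother(A,C)-4, parent(C,B)-4], Bias = -6.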

Page 18:

KBANN

• KNNs are trained on example data, and rules are extracted using an N of M method (saves time)

• Domain theories for KBANN need not contain all intermediate theories necessary to learn certain concepts
  – Adding hidden units alongside the units specified by the domain theory allows the network to induce necessary terms not stated in the background info

• Problems arise when interpreting intermediate rules learned from hidden nodes
  – Difficult to label them based on the inputs they resulted from
  – In one case, programmers labeled rules based on the section of info they were attached to in that topology

Page 19:

System Comparison

• AUDREY II is better than FOCL at theory revision, but it still has room for improvement
  – Its revised theories are closer to both the original theory and the human-created correct theory

Page 20:

System Comparison

• AUDREY II is slightly more accurate than FORTE, and its revised theories are closer to the original and correct theories

• KR-FOCL addresses some issues of other systems by allowing user to decide among changes that have the same accuracy

Page 21:

Applications of Theory Refinement

• Used to identify different parts of both DNA and RNA sequences

• Used to debug student-written basic Prolog programs

• Used to maintain working theories as new data is obtained