Theory Revision
Chris Murphy
The Problem
• Sometimes we:
  – Have theories for existing data that do not match new data
  – Do not want to repeat learning every time we update data
  – Believe that our rule learners could perform much better if given basic theories to build on
Two Types of Errors in Theories
• Over-generalization
  – Theory covers negative examples
  – Caused by incorrect rules in the theory or by existing rules missing necessary constraints
  – Example: uncle(A,B) :- brother(A,C).
  – Solution: uncle(A,B) :- brother(A,C), parent(C,B).
Two Types of Errors in Theories
• Over-specialization
  – Theory does not cover all positive examples
  – Caused by rules having additional, unnecessary constraints, or by the theory missing rules that are necessary to prove some examples
  – Example: uncle(A,B) :- brother(A,C), mother(C,B).
  – Solution: uncle(A,B) :- brother(A,C), parent(C,B).
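The two error types can be checked mechanically against labeled examples. The following is a minimal sketch in Python rather than Prolog; the family facts, the example lists, and the helper name `errors` are all hypothetical and chosen only for illustration.

```python
# Toy facts (hypothetical): bob is the brother of both ann and tom.
brother = {("bob", "ann"), ("bob", "tom")}
mother = {("ann", "carl")}
father = {("tom", "dave")}
parent = mother | father

def uncle_overgeneral(a, b):
    # uncle(A,B) :- brother(A,C).   (B is unconstrained)
    return any(x == a for (x, c) in brother)

def uncle_overspecial(a, b):
    # uncle(A,B) :- brother(A,C), mother(C,B).
    return any(x == a and (c, b) in mother for (x, c) in brother)

def uncle_fixed(a, b):
    # uncle(A,B) :- brother(A,C), parent(C,B).
    return any(x == a and (c, b) in parent for (x, c) in brother)

# Labeled examples (hypothetical).
positives = [("bob", "carl"), ("bob", "dave")]
negatives = [("bob", "ann"), ("ann", "carl")]

def errors(rule):
    """Return (covered negatives, uncovered positives) for a rule."""
    false_pos = [e for e in negatives if rule(*e)]
    false_neg = [e for e in positives if not rule(*e)]
    return false_pos, false_neg

for rule in (uncle_overgeneral, uncle_overspecial, uncle_fixed):
    print(rule.__name__, errors(rule))
```

The over-general rule covers a negative example, the over-special rule misses a positive one, and the corrected rule does neither on this toy data.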
What is Theory Refinement?
• “…learning systems that have a goal of making small changes to an original theory to account for new data.”
• Combination of two processes:
  – Using a background theory to improve rule effectiveness and adequacy on data
  – Using problem detection and correction processes to make small adjustments to said theories
Basic Issues Addressed
• Is there an error in the existing theory?
• What part of the theory is incorrect?
• What correction needs to be made?
Theory Refinement Basics
• System is given a beginning theory about the domain
  – Can be incorrect or incomplete (and often is)
• A well-refined theory will:
  – Be accurate with new/updated data
  – Make as few changes as possible to the original theory
• Changes are monitored by a “Distance Metric” that keeps a count of every change made
The Distance Metric
• Counts every addition, deletion, or replacement of clauses
• Used to:
  – Measure the syntactic corruptness of the original theory
  – Determine how good a learning system is at replicating human-created theories
• Drawback: it does not recognize equivalent literals such as less(X,Y) and greq(Y,X)
• Table on the right shows examples of distance between theories, as well as its relationship to accuracy
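A rough sketch of such a count, under stated assumptions: clauses are represented as a head string plus a set of body-literal strings, and a deleted clause and an added clause with the same head are paired and counted as one replacement. The slides do not specify the exact bookkeeping, so the helpers `clause` and `distance` are illustrative only.

```python
from collections import Counter

def clause(head, *body):
    """Represent a clause as (head, frozenset of body literals), all strings."""
    return (head, frozenset(body))

def distance(original, revised):
    """Count clause-level edits between two theories (sets of clauses).

    Assumption: one removed clause and one added clause with the same head
    are paired and counted as a single replacement.
    """
    removed = [c for c in original if c not in revised]
    added = [c for c in revised if c not in original]
    removed_heads = Counter(head for head, _ in removed)
    added_heads = Counter(head for head, _ in added)
    replacements = sum(min(removed_heads[h], added_heads[h]) for h in removed_heads)
    deletions = len(removed) - replacements
    additions = len(added) - replacements
    return replacements + deletions + additions

original = {clause("uncle(A,B)", "brother(A,C)", "mother(C,B)")}
revised = {clause("uncle(A,B)", "brother(A,C)", "parent(C,B)")}
print(distance(original, revised))

# The drawback noted above: a purely syntactic count treats the two literals
# as different, so swapping one for the other still costs one edit.
t1 = {clause("small(X,Y)", "less(X,Y)")}
t2 = {clause("small(X,Y)", "greq(Y,X)")}
print(distance(t1, t2))
```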
Why Preserve the Original Theory?
• If you understood the original theory, you’ll likely understand the new one
• Similar theories will likely retain the ability to use abstract predicates from the original theory
Theory Refinement Systems
• EITHER
• FORTE
• AUDREY II
• KBANN
• FOCL, KR-FOCL, A-EBL, AUDREY, and more
EITHER
• Explanation-based and Inductive Theory Extension and Revision
• First system with the ability to fix both over-generalization and over-specialization
• Able to correct multiple faults
• Uses one or more failings at a time to learn one or more corrections to a theory
• Able to correct intermediate points in theories
• Uses positive and negative examples
• Able to learn disjunctive rules
• Specialization algorithm does not allow positives to be eliminated
• Generalization algorithm does not allow negatives to be admitted
FORTE
• Attempts to prove all positive and negative examples using the current theory
• When errors are detected:
  – Identify all clauses that are candidates for revision
  – Determine whether each clause needs to be specialized or generalized
  – Determine what operators to test for various revisions
• Best revision is determined based on its accuracy when tested on the complete training set
• Process repeats until the system perfectly classifies the training set or until FORTE finds that no revision improves the accuracy of the theory
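The loop described above is a greedy hill climb: propose revisions, score each on the full training set, keep the best, and stop when the theory is perfect or nothing improves. The sketch below illustrates only that control flow on a toy propositional problem. It is not FORTE's actual operator set; the feature names, the single-clause theory representation, and the functions `covers`, `propose_revisions`, and `revise` are assumptions made for the example.

```python
FEATURES = ["has_wings", "lays_eggs", "flies"]  # hypothetical vocabulary

def covers(theory, example):
    """A theory here is one conjunctive clause: a frozenset of required features."""
    return theory <= example

def accuracy(theory, examples):
    """Fraction of (example, label) pairs classified correctly."""
    return sum(covers(theory, x) == label for x, label in examples) / len(examples)

def propose_revisions(theory):
    """Generalize by dropping one antecedent, specialize by adding one."""
    drops = [theory - {f} for f in theory]
    adds = [theory | {f} for f in FEATURES if f not in theory]
    return drops + adds

def revise(theory, examples):
    """Apply the most accurate revision until perfect or no revision helps."""
    best = accuracy(theory, examples)
    while best < 1.0:
        scored = [(accuracy(c, examples), c) for c in propose_revisions(theory)]
        top_acc, top = max(scored, key=lambda s: s[0])
        if top_acc <= best:
            break  # no revision improves accuracy
        theory, best = top, top_acc
    return theory, best

examples = [
    (frozenset({"has_wings", "lays_eggs", "flies"}), True),
    (frozenset({"has_wings", "lays_eggs"}), True),
    (frozenset({"has_wings", "flies"}), False),
    (frozenset({"lays_eggs"}), False),
]
initial = frozenset({"has_wings", "flies"})  # covers a negative, misses a positive
revised, acc = revise(initial, examples)
print(sorted(revised), acc)
```

On this data the initial clause is first generalized (an antecedent is dropped) and then specialized (a different antecedent is added), which mirrors the generalize-or-specialize decision made at each revision point.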
Specializing a Theory
• Needs to happen when one or more negatives are covered
• Ways to fix the problem:
  – Delete a clause: simple, just delete and retest
  – Add new antecedents to an existing clause
    • More difficult
    • FORTE uses two methods:
      – Add one antecedent at a time, like FOIL, choosing the antecedent that provides the best information gain at each point
      – Relational Pathfinding: uses graph structures to find new relations in data
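The FOIL-like step picks, among candidate antecedents, the one with the highest information gain. A minimal sketch using the standard FOIL gain formula, simplified to count examples instead of variable bindings (an assumption); the candidate literals and their coverage counts are made up for illustration.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL-style gain for adding one antecedent to a clause.

    p0, n0: positives / negatives covered before adding the literal.
    p1, n1: positives / negatives still covered after adding it.
    Simplification: counts examples rather than variable bindings.
    """
    if p1 == 0:
        return 0.0  # a literal that keeps no positives gains nothing
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)

def best_antecedent(candidates, p0, n0):
    """candidates maps a literal to its (p1, n1) coverage after being added."""
    return max(candidates, key=lambda lit: foil_gain(p0, n0, *candidates[lit]))

# Hypothetical coverage counts for a clause that covers 4 positives and 4 negatives.
candidates = {
    "parent(C,B)": (4, 0),   # keeps every positive, removes every negative
    "mother(C,B)": (2, 0),   # removes negatives but loses half the positives
    "sibling(A,C)": (4, 3),  # barely removes any negatives
}
print(best_antecedent(candidates, p0=4, n0=4))
```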
Generalizing a Theory
• Need to generalize when positives are not covered
• Ways FORTE generalizes:
  – Delete antecedents from an existing clause (either singly or in groups)
  – Add a new clause
    • Copy the clause identified at the revision point
    • Purposely over-generalize it
    • Send the over-general rule to the specialization algorithm
  – Use the inverse resolution operators “identification” and “absorption”
    • These use intermediate rules to provide more options for alternative definitions
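To make the last pair of operators concrete, here is a sketch of their textbook propositional forms: absorption rewrites p :- A,B using q :- A into p :- q,B, and identification derives q :- B from p :- A,B and p :- A,q. FORTE works on first-order clauses, so this is a simplified illustration only; the clause representation and the rule names are hypothetical.

```python
def absorption(clause_p, clause_q):
    """From p :- A,B and q :- A, build p :- q,B (propositional form)."""
    (p, body_p), (q, body_q) = clause_p, clause_q
    if not body_q <= body_p:
        return None  # q's body must appear inside p's body
    return (p, (body_p - body_q) | {q})

def identification(clause_1, clause_2, q):
    """From p :- A,B and p :- A,q, build q :- B (propositional form)."""
    (p1, body_1), (p2, body_2) = clause_1, clause_2
    if p1 != p2 or q not in body_2:
        return None
    shared = body_2 - {q}  # the common part A
    if not shared <= body_1:
        return None
    return (q, body_1 - shared)

# Clauses are (head, frozenset of body literals); names are illustrative.
bird = ("bird", frozenset({"has_wings", "lays_eggs", "flies"}))
helper = ("winged_egg_layer", frozenset({"has_wings", "lays_eggs"}))

rewritten = absorption(bird, helper)
print(rewritten[0], sorted(rewritten[1]))

recovered = identification(bird, rewritten, "winged_egg_layer")
print(recovered[0], sorted(recovered[1]))
```

Both operators route a definition through an intermediate rule, which is what gives the reviser additional places to generalize.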
AUDREY II
• Runs in two main phases:
  – Initial domain theory is specialized to eliminate negative coverage
    • At each step, a best clause is chosen, it is specialized, and the process repeats
    • Best clause is the one that contributes to the most incorrectly classified negative examples and is required by the fewest positives
    • If the best clause covers no positives, it is deleted; otherwise, literals are added in a FOIL-like manner to eliminate covered negatives
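A small sketch of the clause-selection step. The slides do not say how the two criteria are combined, so this example assumes a lexicographic order (most negatives first, then fewest positives) and treats "required by" and "covers" as the same count; the clause names and numbers are hypothetical.

```python
def pick_best_clause(stats):
    """stats maps clause name -> (negatives it helps cover, positives that need it).

    Assumption: prefer more negatives, break ties by fewer positives.
    """
    return max(stats, key=lambda c: (stats[c][0], -stats[c][1]))

def next_action(stats):
    """Decide how the chosen clause is specialized."""
    best = pick_best_clause(stats)
    _, positives = stats[best]
    if positives == 0:
        return best, "delete"
    return best, "add literals (FOIL-like)"

stats = {"c1": (3, 2), "c2": (3, 0), "c3": (1, 0)}
print(next_action(stats))

# Once c2 is gone, c1 becomes the best clause, and it must be kept and specialized.
remaining = {"c1": (3, 2), "c3": (1, 0)}
print(next_action(remaining))
```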
AUDREY II
  – Revised theory is generalized to cover all positives (without covering any negatives)
    • An uncovered positive example is randomly chosen, and the theory is generalized to cover the example
    • Process repeats until all remaining positives are covered
    • If assumed literals can be removed without decreasing positive coverage, that is done
    • If not, AUDREY II tries replacing literals with a new conjunction of literals (also uses a FOIL-type process)
    • If deletion and replacement fail, the system uses a FOIL-like method of determining entirely new clauses for proving the literal
KBANN
• System that takes a domain theory of Prolog-style clauses and transforms it into a knowledge-based neural network (KNN)
  – Uses the knowledge base (background theory) to determine the topology and initial weights of the KNN
• Different units and links within the KNN correspond to various components of the domain theory
• Topologies of KNNs can be different from the topologies that we have seen in neural networks
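As an illustration of how a rule can fix a unit's initial weights, the sketch below assumes the weight-setting scheme commonly described for KBANN: each positive antecedent gets link weight ω, each negated antecedent gets −ω, and the bias is (P − 1/2)ω, where P is the number of positive antecedents. The slides do not give these details, so the scheme, the value ω = 4, and the function names are assumptions.

```python
import math

OMEGA = 4.0  # assumed link-weight magnitude, for illustration only

def rule_to_unit(positive, negated):
    """Map one conjunctive rule body to (weights, bias) for a single unit."""
    weights = {name: OMEGA for name in positive}
    weights.update({name: -OMEGA for name in negated})
    bias = (len(positive) - 0.5) * OMEGA
    return weights, bias

def activation(weights, bias, inputs):
    """Sigmoid of the net input minus the bias; missing inputs count as 0."""
    net = sum(w * inputs.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-(net - bias)))

# Hypothetical rule:  a :- b, c, not d.
weights, bias = rule_to_unit(positive=["b", "c"], negated=["d"])

for inputs in ({"b": 1, "c": 1, "d": 0}, {"b": 1, "c": 0, "d": 0}, {"b": 1, "c": 1, "d": 1}):
    print(inputs, round(activation(weights, bias, inputs), 3))
```

With these settings the unit is active (above 0.5) only when every positive antecedent holds and no negated one does, so the untrained network starts out reproducing the rule.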
KBANN
• KNNs are trained on example data, and rules are extracted using an N of M method (saves time)
• Domain theories for KBANN need not contain all intermediate theories necessary to learn certain concepts
  – Adding hidden units along with units specified by the domain theory allows the network to induce necessary terms not stated in the background info
• Problems arise when interpreting intermediate rules learned from hidden nodes
  – Difficult to label them based on the inputs they resulted from
  – In one case, programmers labeled rules based on the section of info that they were attached to in that topology
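An N of M rule fires when at least N of its M listed antecedents are true. The sketch below shows that test, plus a simplified reading of a trained unit as such a rule. It assumes every incoming link has the same positive weight and that the unit fires when the weighted sum exceeds the bias; real extraction involves more steps, and both function names are hypothetical.

```python
import math

def n_of_m(n, antecedents, example):
    """True when at least n of the listed antecedents hold in the example."""
    return sum(1 for a in antecedents if example.get(a, False)) >= n

def unit_to_n_of_m(weight, bias, antecedents):
    """Read a unit with equal positive link weights as an N of M rule.

    Assumption: the unit fires when weight * (number of true inputs) > bias,
    so N is the smallest count that exceeds bias / weight.
    """
    n = math.floor(bias / weight) + 1
    return n, list(antecedents)

n, antecedents = unit_to_n_of_m(weight=4.0, bias=6.0, antecedents=["b", "c", "e"])
print(f"if {n} of {antecedents} then a")
print(n_of_m(n, antecedents, {"b": True, "e": True}))
print(n_of_m(n, antecedents, {"b": True}))
```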
System Comparison
• AUDREY II is better than FOCL at theory revision, but it still has room for improvement
  – Its revised theories are closer to both the original theory and the human-created correct theory
System Comparison
• AUDREY II is slightly more accurate than FORTE, and its revised theories are closer to the original and correct theories
• KR-FOCL addresses some issues of other systems by allowing the user to decide among changes that have the same accuracy
Applications of Theory Refinement
• Used to identify different parts of both DNA and RNA sequences
• Used to debug basic student-written Prolog programs
• Used to maintain working theories as new data is obtained