TRANSCRIPT
Know What it Knows : A Framework for Self-Aware Learning
Lihong Li, Michael L. Littman, Thomas J. Walsh (Rutgers University). ICML-08: 25th International Conference on Machine Learning, Helsinki, Finland, July 2008
Presented by Praveen Venkateswaran, 2/26/18
Slides and images based on the paper
KWIK Framework
- The paper introduces a new framework called "Know What It Knows" (KWIK)
- Useful in settings where active exploration can impact the training examples the learner is exposed to, e.g. reinforcement learning and active learning
- At the core of many reinforcement-learning algorithms is the idea of distinguishing between instances that have been learned with sufficient accuracy and those whose outputs are still unknown
KWIK Framework
- The paper makes explicit some properties that are sufficient for a learning algorithm to be used in efficient exploration algorithms
- It describes a set of hypothesis classes for which KWIK algorithms can be created
- The learning algorithm is required to make only accurate predictions, although it can opt out of a prediction by saying "I don't know" (⊥)
- However, there must be a (polynomial) bound on the number of times the algorithm can respond ⊥. The authors call such a learning algorithm KWIK.
Motivation : Example
- Each edge is associated with a binary cost vector of dimension d = 3
- Cost of traversing an edge = cost vector · fixed weight vector
- The fixed weight vector is given to be w = [1, 2, 0]
- The agent does not know w, but knows the topology and all cost vectors
- In each episode, the agent starts from the source and moves along a path to the sink
- Every time an edge is crossed, the agent observes its true cost

(Figure: graph from source to sink with three paths, each edge labeled with its cost vector)
Learning Task: take the cheapest path in as few episodes as possible!
- Given w, the ground-truth costs are: top path = 12, middle path = 13, bottom path = 15
Motivation : Simple Approach
- Simple approach: the agent assumes edge costs are uniform and walks the shortest path under that assumption (middle)
- Since the agent observes true costs upon traversal, it gathers 4 instances of [1,1,1] → 3 and 1 instance of [1,0,0] → 1
- Using standard regression, the learned weight vector is ŵ = [1,1,1]
- Using ŵ, the estimated costs are: top = 14, middle = 13, bottom = 14
- Using these estimates, the agent will take the middle path forever without realizing it's not optimal
Motivation : KWIK Approach
- Consider instead a learning algorithm that "knows what it knows"
- Instead of fitting ŵ, it only attempts to reason about whether edge costs can be deduced from the available data
- After the first episode, it knows the middle path's cost is 13
- The last edge of the bottom path has cost vector [0,0,0], whose cost must be 0; the penultimate edge has cost vector [0,1,1]
- Regardless of w: w·[0,1,1] = w·([1,1,1] − [1,0,0]) = w·[1,1,1] − w·[1,0,0] = 3 − 1 = 2
- Hence, the agent knows the cost of the bottom path is 14, which is worse than the middle path's 13
Motivation : KWIK Approach
- However, [0,0,1] on the top path is linearly independent of the previously seen cost vectors, i.e. its cost is unconstrained in the eyes of the agent
- The agent "knows that it doesn't know"
- It tries the top path in the second episode, observes [0,0,1] → 0, and can then solve for w and accurately predict the cost of any vector
- It also finds, with high confidence, that the optimal path is the top one
Motivation : KWIK Approach
- In general, any algorithm that simply guesses a weight vector may never find the optimal path.
- An algorithm that uses linear algebra to distinguish known from unknown costs will, on each episode, either take an optimal route or discover the cost of a linearly independent cost vector. Thus, it can choose suboptimal paths at most d times.
- KWIK's ideation originated from its use in multi-state sequential decision-making problems (like this one).
- Action selection in bandit and associative-bandit problems (bandits with inputs) can also be addressed by choosing the better arm when payoffs are known, and an unknown arm otherwise.
- KWIK can also be used in active learning and anomaly detection, since both require reasoning about whether a recently presented input is predictable from previous examples.
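The linear-algebra reasoning above can be made concrete: a new cost vector's total is determined exactly when it lies in the span of the vectors observed so far. Below is a minimal sketch (function and variable names are my own), using exact rational arithmetic to avoid floating-point issues:

```python
from fractions import Fraction

def predict_cost(observed, v):
    """observed: list of (cost_vector, cost) pairs seen so far.
    If v is a linear combination of the observed vectors, its cost is
    determined for ANY weight vector w consistent with the data, so we
    return it; otherwise return None (playing the role of ⊥)."""
    rows, cols = len(v), len(observed)
    # Augmented system: columns are the observed vectors, RHS is v.
    A = [[Fraction(observed[j][0][i]) for j in range(cols)] + [Fraction(v[i])]
         for i in range(rows)]
    pivot_cols, r = [], 0
    for c in range(cols):                     # Gauss-Jordan elimination
        piv = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        A[r] = [x / A[r][c] for x in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [x - A[i][c] * y for x, y in zip(A[i], A[r])]
        pivot_cols.append(c)
        r += 1
    if any(A[i][-1] != 0 for i in range(r, rows)):
        return None                           # linearly independent: ⊥
    coeffs = [Fraction(0)] * cols
    for i, c in enumerate(pivot_cols):
        coeffs[c] = A[i][-1]
    return sum(l * cost for l, (_, cost) in zip(coeffs, observed))
```

With the first episode's data ([1,1,1] → 3 and [1,0,0] → 1), this deduces the cost of [0,1,1] is 2 but abstains on [0,0,1], exactly as in the slides.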
- Input set X, output set Y. The hypothesis class H consists of a set of functions from X to Y: H ⊆ (X → Y)
- The target function h* ∈ H is the source of training examples and is unknown to the learner (we assume the target function lies in the hypothesis class)
KWIK : Formal Definition
Protocol for a run:
- H and accuracy parameters ε and δ are known to both the learner and the environment
- The environment selects h* adversarially
- The environment selects an input x ∈ X adversarially and informs the learner
- The learner predicts an output ŷ ∈ Y ∪ {⊥}, where ⊥ means "I don't know"
- If ŷ ≠ ⊥, it should be accurate, i.e. |ŷ − y| ≤ ε where y = h*(x). Otherwise, the run is considered a failure.
- The probability of a failed run must be bounded by δ
KWIK : Formal Definition
- Over a run, the total number of steps on which ŷ = ⊥ must be bounded by B(ε, δ), ideally polynomial in 1/ε, 1/δ, and the parameters defining H.
- If ŷ = ⊥, the learner makes an observation z ∈ Z of the output, where
  - z = y in the deterministic case
  - z = 1 with probability y and z = 0 with probability (1 − y) in the Bernoulli case
  - z = y + η for a zero-mean random variable η in the additive-noise case
KWIK : Connection to PAC and MB
- In all three frameworks, PAC (Probably Approximately Correct), MB (Mistake Bound), and KWIK (Know What It Knows), a series of inputs (instances) is presented to the learner.

(Figure: each rectangular box represents an input)

- In the PAC model, the learner is provided with labels (correct outputs) for an initial sequence of inputs, depicted by crossed boxes in the figure.
- The learner is then responsible for producing accurate outputs (empty boxes) for all new inputs, where inputs are drawn from a fixed distribution.
KWIK : Connection to PAC and MB
(Figure: correct outputs shown as empty boxes, incorrect ones as black boxes)

- In the MB model, the learner is expected to produce an output for every input.
- Labels are provided to the learner whenever it makes a mistake (black boxes).
- Inputs are selected adversarially, so there is no bound on when the last mistake may be made.
- MB algorithms guarantee that the total number of mistakes made is small, i.e. the ratio of incorrect to correct predictions goes to 0 asymptotically.
KWIK : Connection to PAC and MB
- The KWIK model has elements of both PAC and MB. Like PAC, KWIK algorithms are not allowed to make mistakes. Like MB, inputs to a KWIK algorithm are selected adversarially.
- Instead of bounding mistakes, a KWIK algorithm must have a bound on the number of label requests (⊥) it can make.
- A KWIK algorithm can be turned into an MB algorithm with the same bound by having the algorithm guess an output each time it's not certain.
Examples of KWIK algorithms
Memorization Algorithm - The memorization algorithm can learn any hypothesis class with input space X with a KWIK bound of |X|. It can be used when the input space X is finite and observations are noise-free.
- The algorithm keeps a mapping ĥ, initialized to ĥ(x) = ⊥ for all x ∈ X
- The algorithm reports ĥ(x) for every input x chosen by the environment
- If ĥ(x) = ⊥, the environment reports the correct label y (noise-free) and the algorithm reassigns ĥ(x) = y
- Since it reports ⊥ at most once per input, the KWIK bound is |X|
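The memorization algorithm is only a few lines in code; a minimal sketch (class name my own), with `None` playing the role of ⊥:

```python
class MemorizationLearner:
    """KWIK memorization learner for a finite input space with
    noise-free observations: answers ⊥ (None) at most |X| times,
    once per distinct input."""
    def __init__(self):
        self.h = {}                  # the learned mapping ĥ

    def predict(self, x):
        return self.h.get(x)         # None = ⊥ until x has been seen

    def observe(self, x, y):         # called only after predicting ⊥
        self.h[x] = y
```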
Examples of KWIK algorithms
Enumeration Algorithm - The enumeration algorithm can learn any hypothesis class H with a KWIK bound of |H| − 1. It can be used when the hypothesis class H is finite and observations are noise-free.
- The algorithm initializes the version space Ĥ = H.
- For every input x, it computes L̂ = {h(x) | h ∈ Ĥ}, i.e. the set of outputs over all hypotheses that have not yet been ruled out.
- If |L̂| = 0, the version space has been exhausted and h* ∉ H.
- If |L̂| = 1, all hypotheses in Ĥ agree on the output.
- If |L̂| > 1, two hypotheses disagree; ⊥ is returned and the true label y is received. The version space is then updated: Ĥ′ = {h | h ∈ Ĥ ∧ h(x) = y}.
- Hence each ⊥ shrinks the version space: |Ĥ′| ≤ |Ĥ| − 1. Once |Ĥ| = 1, then |L̂| = 1 and the algorithm never returns ⊥ again. Hence |H| − 1 bounds the number of ⊥s that can be returned.
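The steps above can be sketched directly, treating each hypothesis as a callable (class name my own, `None` = ⊥):

```python
class EnumerationLearner:
    """KWIK enumeration learner for a finite hypothesis class with
    noise-free observations: returns ⊥ (None) at most |H| - 1 times."""
    def __init__(self, hypotheses):
        self.H = list(hypotheses)    # the version space Ĥ

    def predict(self, x):
        outputs = {h(x) for h in self.H}
        if len(outputs) == 1:
            return outputs.pop()     # all surviving hypotheses agree
        return None                  # disagreement (or exhausted): ⊥

    def observe(self, x, y):         # true label received after ⊥
        self.H = [h for h in self.H if h(x) == y]
```

For example, with H = {x + c : c ∈ {0, 1, 2}}, a single ⊥ on any input identifies c.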
Illustrative Example
Problem: Given a group of people P, where a ∈ P is a troublemaker and b ∈ P is a nice person who can control a, determine for any subset G ⊆ P whether there will be trouble, without knowing the identities of a and b.
- Input space X = 2^P and output space Y = {trouble, no trouble}
- The memorization algorithm achieves a bound of 2^n, since it may have to see each possible subset of people (where n = |P|)
- The enumeration algorithm can achieve a bound of n(n − 1), since there is one hypothesis for each possible assignment of a and b
- Each time the algorithm reports ⊥, it can rule out at least one possible combination
Examples of KWIK algorithms
KWIK bounds can also be achieved when the hypothesis class and input space are infinite.
Planar-distance Algorithm - Given a target function that maps each input point x to its Euclidean distance ‖x − c‖₂ from an unknown point c, the planar-distance algorithm can learn this hypothesis class with a KWIK bound of 3.
Examples of KWIK algorithms
- Given an initial input x, the algorithm returns ⊥ and receives the output y. Hence the unknown point must lie on the circle of radius y centered at x, depicted in Figure (a).
- For a new input, the algorithm again returns ⊥ and similarly obtains a second circle (Figure (b)). If the circles intersect at one point, the problem becomes trivial; otherwise there are 2 candidate locations.
- Next, for any new input that is not collinear with the first 2 inputs, the algorithm returns ⊥ one more time, which then allows it to identify the location of the unknown point.
- Generally, the d-dimensional version of this problem has a KWIK bound of d + 1.
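The final disambiguation can be done in closed form: subtracting the circle equations ‖c − pᵢ‖² = rᵢ² pairwise cancels the quadratic terms and leaves a 2×2 linear system in c. A sketch (function name my own), assuming three non-collinear observation points:

```python
def locate(p1, r1, p2, r2, p3, r3):
    """Recover the unknown point from three distance observations
    (the three ⊥ steps of the planar-distance learner)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Subtract circle 2 (resp. 3) from circle 1: 2(p2 - p1)·c = const.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1      # nonzero iff the inputs are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
```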
Examples of KWIK algorithms
What if the observations are not noise-free?
Coin-Learning Algorithm - The coin-learning algorithm can accurately predict the probability that a biased coin comes up heads, given Bernoulli observations, with a KWIK bound of B(ε, δ) = O((1/ε²) ln(1/δ)).
- The unknown probability of heads is p, and we want an estimate p̂ with high accuracy (|p̂ − p| ≤ ε) and high probability (1 − δ).
- Since observations are noisy, we see 1 with probability p and 0 with probability (1 − p).
- Each time the algorithm returns ⊥, it gets an independent trial it can use to compute p̂ = (1/T) Σ_{t=1}^{T} z_t, where z_t ∈ {0,1} is the t-th observation in T trials.
- The number of trials T needed to be (1 − δ)-certain that the estimate is within ε can be computed using a Hoeffding bound.
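The Hoeffding step is concrete: solving 2·exp(−2Tε²) ≤ δ for T gives T = ⌈ln(2/δ) / (2ε²)⌉. A sketch (function names my own):

```python
import math
import random

def hoeffding_trials(eps, delta):
    """Smallest T with 2 * exp(-2 * T * eps**2) <= delta (Hoeffding)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def coin_learner(flip, eps, delta):
    """KWIK coin learner: answer ⊥ for the first T flips, then predict
    the empirical mean forever after; the estimate is within eps of the
    true bias with probability at least 1 - delta."""
    T = hoeffding_trials(eps, delta)
    return sum(flip() for _ in range(T)) / T
```

For ε = 0.1 and δ = 0.05 this asks for 185 flips before committing to an answer.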
Combining KWIK Learners
Can KWIK learners be combined to provide learning guarantees for more complex hypothesis classes?
Union Algorithm - Let F : X → Y, and let H₁, …, Hₖ be learnable hypothesis classes with bounds B₁(ε, δ), …, Bₖ(ε, δ), where Hᵢ ⊆ F for 1 ≤ i ≤ k, i.e. the hypothesis classes share the same input and output sets. The union algorithm can learn the joint hypothesis class H = ∪ᵢ Hᵢ with a KWIK bound of B(ε, δ) = (k − 1) + Σᵢ Bᵢ(ε, δ).
- The union algorithm maintains a set of active sub-algorithms A, one for each hypothesis class.
- Given an input x, the algorithm queries each i ∈ A for its prediction ŷᵢ and adds it to the set L̂.
- If ⊥ ∈ L̂, it returns ⊥, obtains the correct output y, and sends it to every sub-algorithm i for which ŷᵢ = ⊥, to allow it to learn.
- If |L̂| > 1, the sub-algorithms disagree; the algorithm again returns ⊥ and removes any sub-algorithm that made a prediction other than y or ⊥.
- So on each input for which ⊥ is returned, either some sub-algorithm reported ⊥ (at most Σᵢ Bᵢ(ε, δ) times) or two sub-algorithms disagreed and at least one was removed (at most k − 1 times), which yields the bound.
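A minimal sketch of the union algorithm (class name my own), in the noise-free setting where agreement is exact equality; sub-learners are assumed to expose `predict` (returning `None` for ⊥) and `observe`:

```python
class UnionLearner:
    """KWIK union algorithm over sub-learners for classes H_1..H_k."""
    def __init__(self, learners):
        self.active = list(learners)

    def predict(self, x):
        preds = [l.predict(x) for l in self.active]
        if None in preds or len(set(preds)) > 1:
            return None                   # ⊥: some unknown, or disagreement
        return preds[0]

    def observe(self, x, y):
        # Inform sub-learners that abstained; drop those that were wrong.
        survivors = []
        for l in self.active:
            p = l.predict(x)
            if p is None:
                l.observe(x, y)
                survivors.append(l)
            elif p == y:
                survivors.append(l)
        self.active = survivors
```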
Combining KWIK Learners : Example
Let X = Y = ℝ. Define H₁ as the set of functions mapping an input x to its distance |x − c| from an unknown point c; H₁ can be learned with a bound of 2 using a 1-D version of the planar-distance algorithm. Let H₂ be the set of lines {f | f(x) = ax + b}, which can also be learned with a bound of 2 using the regression algorithm. We need to learn H = H₁ ∪ H₂.
- Let the first input be x₁ = 2. The union algorithm asks H₁ and H₂ for outputs; both return ⊥, and say it receives the output y₁ = 2.
- For the next input x₂ = 8, H₁ and H₂ still return ⊥, and it receives y₂ = 4.
- Then, for the third input x₃ = 1, H₁ deduces the unknown point must be 4 (since |2 − 4| = 2 and |8 − 4| = 4) and returns |1 − 4| = 3, while H₂, having computed a = 1/3 and b = 4/3, returns 5/3.
- Since H₁ and H₂ disagree, the algorithm returns ⊥, finds out y₃ = 3, removes H₂, and makes all future predictions accurately.
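The numbers in this example can be checked mechanically (variable names are my own):

```python
# The two labeled points from the ⊥ steps of the union-algorithm run.
pts = [(2, 2), (8, 4)]                       # (x_i, y_i)

# H1: y = |x - c|. The first observation allows c = 0 or c = 4; keep the
# candidate consistent with the second observation -> c = 4.
cands = {c for c in (pts[0][0] - pts[0][1], pts[0][0] + pts[0][1])
         if abs(pts[1][0] - c) == pts[1][1]}
c = cands.pop()

# H2: the line y = a*x + b through the two points -> a = 1/3, b = 4/3.
a = (pts[1][1] - pts[0][1]) / (pts[1][0] - pts[0][0])
b = pts[0][1] - a * pts[0][0]

x3 = 1
pred_h1 = abs(x3 - c)       # H1's prediction for x3
pred_h2 = a * x3 + b        # H2's prediction for x3
```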
Combining KWIK Learners
What about disjoint input spaces, i.e. Xᵢ ∩ Xⱼ = ∅ for i ≠ j?
Input-Partition Algorithm - Let H₁, …, Hₖ be learnable hypothesis classes with bounds B₁(ε, δ), …, Bₖ(ε, δ), where Hᵢ ⊆ (Xᵢ → Y). The input-partition algorithm can learn the hypothesis class H ⊆ (X₁ ∪ ⋯ ∪ Xₖ → Y) with a KWIK bound of B(ε, δ) = Σᵢ Bᵢ(ε, δ/k).
- The input-partition algorithm runs the learning algorithm for each subclass Hᵢ.
- When it receives an input x ∈ Xᵢ, it returns the response from Hᵢ, which is ε-accurate.
- To achieve (1 − δ) certainty, it insists on (1 − δ/k) certainty from each sub-algorithm. By the union bound, the overall failure probability is at most the sum of the sub-algorithms' failure probabilities, i.e. at most δ.
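A minimal sketch of the input-partition algorithm (names my own), parameterized by a function mapping each input to the index of the region that owns it; each sub-learner would be run with confidence δ/k:

```python
class InputPartitionLearner:
    """KWIK input-partition algorithm: route each input to the
    sub-learner responsible for its region of the input space."""
    def __init__(self, region_of, learners):
        self.region_of = region_of   # maps input x -> region index i
        self.learners = learners     # one KWIK learner per region

    def predict(self, x):
        return self.learners[self.region_of(x)].predict(x)

    def observe(self, x, y):
        self.learners[self.region_of(x)].observe(x, y)
```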
Disjoint Spaces Example
- An MDP consists of n states and m actions. For each combination of state, action, and next state, the transition function returns a probability.
- As the reinforcement-learning agent moves around the state space, it observes state-action-state transitions and must predict the probabilities for transitions it has not yet observed.
- In the model-based setting, an algorithm learns a mapping from the size-n²m input space of state-action-state combinations to probabilities, via Bernoulli observations.
- Thus, the problem can be solved via the input-partition algorithm over a set of individual probabilities learned via the coin-learning algorithm. The resulting KWIK bound is B(ε, δ) = O((n²m/ε²) ln(n²m/δ)).
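Under this composition, the total ⊥ budget is just the number of (s, a, s′) cells times the per-cell coin-learning bound run at confidence δ/(n²m). A sketch of that arithmetic (function name my own, using the Hoeffding trial count):

```python
import math

def mdp_kwik_bound(n, m, eps, delta):
    """Total ⊥ budget for model-based MDP learning: one coin learner
    per (s, a, s') triple, each at confidence delta / (n^2 m)."""
    cells = n * n * m                            # input-partition regions
    per_cell = math.ceil(math.log(2 * cells / delta) / (2 * eps ** 2))
    return cells * per_cell                      # O((n^2 m / eps^2) ln(n^2 m / delta))
```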
Conclusion
- The paper describes the KWIK ("Know What It Knows") model of supervised learning, which identifies and generalizes a key step in many efficient-exploration algorithms.
- The authors provide algorithms for some basic hypothesis classes, for both deterministic and noisy observations.
- They also describe methods for composing hypothesis classes to create more complex learning algorithms.
- Open problem: how do you adapt the KWIK framework when the target hypothesis may be chosen from outside the hypothesis class H?