Vingean reflection: Reliable reasoning for self-improving agents
TRANSCRIPT
The “procrastination paradox” · A formal toy model · Partial solutions · Logical uncertainty · Conclusions
Vingean reflection: Reliable reasoning for self-improving agents
Benja Fallenstein
Machine Intelligence Research Institute
May 16, 2015
Benja Fallenstein Vingean reflection
Motivation
Smarter-than-human intelligence isn’t around the corner
but it’ll (probably) be developed eventually.
Important to ensure it’s aligned with our interests
But how do we specify beneficial goals?
How do we make sure system actually pursues them?
How do we correct the system if we get it wrong?
Want solid theoretical understanding of problem & solution
What is correct reasoning and decision making?
Probability theory, decision theory, game theory, statistical learning theory, Bayesian networks, formal verification, . . .
. . . go in the right direction, but are not enough.
Need for foundational research—which can be done today.
Vingean reflection
Can we create a self-modifying system. . .
. . . that goes through a billion modifications. . .
. . . without ever going wrong?
Need an extremely reliable way for an AI to reason about agents smarter than itself, much more reliable than a human!
Need to use abstract reasoning
Vinge: Can’t know exactly what a smarter successor will do
Instead, have abstract reasons to think its choices are good
Standard decision theory doesn’t model this
Formal logic as a model of abstract reasoning
1 The “procrastination paradox”
2 A formal toy model
3 Partial solutions
4 Logical uncertainty
5 Conclusions
The “procrastination paradox”
Agent in a deterministic, known world; discrete timesteps.
In each timestep, the agent chooses whether to press a button:
If pressed in 1st round: Utility = 1/2
If pressed in 2nd round (and not before): Utility = 2/3
If pressed in 3rd round (and not before): Utility = 3/4
. . .
If never pressed: Utility = 0
(No optimal strategy, but sure can beat 0!)
The agent is programmed to press the button immediately. . .
. . . unless it finds a “good argument” that the button will get pressed later.
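The payoff schedule above can be written down directly: pressing first in round n yields n/(n+1), never pressing yields 0. A minimal sketch (function name and `None` convention are ours, not from the talk):

```python
def utility(press_round):
    """Utility when the button is first pressed in round `press_round`
    (1-indexed); `None` means the button is never pressed."""
    if press_round is None:
        return 0.0  # never pressed: worst possible outcome
    return press_round / (press_round + 1)  # 1/2, 2/3, 3/4, ...

# Waiting one more round always looks better, yet waiting forever yields 0.
assert utility(2) > utility(1)
assert utility(None) == 0.0
```

This is why there is no optimal strategy: every strategy that presses is beaten by pressing one round later, yet any press at all beats never pressing.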
The agent reasons:
Suppose I don’t press the button now.
Either I press the button in the next step, or I don’t.
If I do, the button gets pressed, good.
If I don’t, I must have found a good argument that the button gets pressed later. So the button gets pressed, good!
Either way, the button gets pressed.
So the agent can always find a “good argument” that the button will get pressed later. . .
. . . and therefore never presses the button!
If we want to have reliable self-referential reasoning, we must understand how to avoid this paradox (and others like it).
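The reasoning loop above can be simulated. Here the agent’s search for a “good argument” is replaced by a stub that always succeeds, which is exactly what the argument shows happens; the names and the finite cutoff are ours, a sketch rather than the talk’s formal model:

```python
def finds_good_argument(t):
    # Stub: per the paradox, the "it will get pressed later" argument
    # is available in *every* round.
    return True

def first_press(max_rounds):
    """Return the round in which the agent presses the button, or None
    if it never does within the finite cutoff `max_rounds`."""
    for t in range(1, max_rounds + 1):
        if finds_good_argument(t):
            continue  # defer: "the button will get pressed later"
        return t      # no argument found: press now
    return None

assert first_press(10_000) is None  # the agent procrastinates forever
```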
So what went wrong? (And how do we fix it?)
The paradox doesn’t go through with finite time horizons, or with temporal discounting: Utility = ∑_{t=0}^{∞} γ_t · R_t, where ∑_{t=0}^{∞} γ_t < ∞ and R_t ∈ [0, 1].
Does using temporal discounting fix all such problems?
In our toy model: no, not by itself.
Still get (more technical) paradoxes of self-reference.
But: there are ways to fix these problems. . .
. . . which work if we use finite horizons or discounting.
(Suggests this is key to avoiding the problem.)
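To see why discounting blocks the incentive to wait, compare the discounted value of pressing at step t against never pressing. A sketch with geometric discounting γ_t = γ^t and a single reward of 1 at the pressing step (toy numbers of ours, not from the talk):

```python
def discounted_utility(press_time, gamma=0.9):
    """Utility = sum over t of gamma**t * R_t, where R_t = 1 only at the
    step the button is pressed (`None` = never pressed)."""
    if press_time is None:
        return 0.0
    return gamma ** press_time  # the only nonzero term of the sum

# Under discounting, delaying strictly hurts, so "pressing later is always
# better" -- the engine of the paradox -- no longer holds.
assert discounted_utility(0) > discounted_utility(1) > discounted_utility(None)
```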
For our toy model, use formal logic.
But not because we think realistic smarter-than-human agents work like this.
The problem seems to be much more general: any scheme for highly reliable self-referential reasoning will need to deal with it somehow.
Rather: because we can prove theorems about it—
and then see what this tells us about the real problem.
Write P(n) for “the button is pressed in the nth timestep”.
Define a computable function f(n):
f(n) searches for proofs in Peano Arithmetic (PA) of length ≤ 10^(100+n)
of "∃k > n. P(k)" — i.e., "button pressed later".
If a proof is found =⇒ returns 0 ("don't press button"). Else =⇒ returns 1 ("press button").
PA ⊢ P(n) ↔ [f(n) = 1].
(Self-referential definition by Kleene's second recursion theorem.)
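The structure of f(n) can be sketched in code. The bounded PA proof search is replaced here by a stub that always succeeds, since in the toy model the search does succeed at every n; the stub and its name are illustrative assumptions, not a real prover:

```python
# Sketch of the procrastinating agent f(n). A real implementation would
# enumerate PA proofs of length <= 10**(100 + n) of "exists k > n. P(k)";
# here `proof_search_succeeds` is a stub reflecting the toy model's
# outcome, in which the search succeeds at every step.

def proof_search_succeeds(n):
    """Stub for bounded proof search in PA (hypothetical placeholder)."""
    return True  # in the toy model, a proof is always found

def f(n):
    length_bound = 10**(100 + n)  # the (astronomical) proof-length bound
    if proof_search_succeeds(n):
        return 0  # proof found that the button is pressed later: don't press
    return 1      # no proof found: press the button now

# Since P(n) <-> [f(n) = 1], the button is never pressed at any step:
assert all(f(n) == 0 for n in range(10))
```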
By looking at f(n + 1), we can prove (in ≪ 10^(100+n) symbols):
"Either the button will be pressed in the next timestep or not":
PA ⊢ P(n + 1) ∨ ¬P(n + 1)
"If the button is not pressed in the next step, f must have found a proof that it will be pressed later":¹
PA ⊢ ¬P(n + 1) → □_PA⌜∃k > n + 1. P(k)⌝
(???) "If there's a proof that the button will be pressed, then it will indeed be pressed":
PA ⊢ □_PA⌜∃k > n + 1. P(k)⌝ → ∃k > n + 1. P(k)
"Hence, either way, the button is pressed":
PA ⊢ P(n + 1) ∨ ∃k > n + 1. P(k)
PA ⊢ ∃k > n. P(k)
Hence, f(n) = 0 (for all n ∈ ℕ)... the button is never pressed.
=⇒ So PA ⊬ □_PA⌜ϕ⌝ → ϕ.
¹Notation: □_PA⌜ϕ⌝ means "ϕ is provable in PA".
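The step marked (???) is an instance of the soundness schema, and Löb's theorem (which the "Gödel/Löb" remark later in the talk refers to) says exactly when PA can prove such an instance:

```latex
% L\"ob's theorem: PA proves the reflection instance for $\varphi$
% only if PA already proves $\varphi$ itself.
\text{If }\ \mathrm{PA} \vdash \Box_{\mathrm{PA}}\ulcorner\varphi\urcorner \rightarrow \varphi,
\ \text{ then }\ \mathrm{PA} \vdash \varphi .
```

So PA can prove "if provable then true" only for sentences it already proves outright; for an unprovable ϕ the (???) step is unavailable, which is how PA escapes the argument above.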
PA avoids the paradox since PA ⊬ □_PA⌜ϕ⌝ → ϕ.
→ Generalize this beyond our logic-based toy example?
Why do we think our agent will work correctly?
We reason: "It will only take actions if it has very good reason to believe those actions will be safe; therefore, any action it takes will almost certainly be safe."
An agent should be able to use the same argument when reasoning about rewriting itself!
Need something like T ⊢ □_T⌜ϕ⌝ → ϕ...
Gödel/Löb: But that's inconsistent, finite time horizons or not!
1 The “procrastination paradox”
2 A formal toy model
3 Partial solutions
4 Logical uncertainty
5 Conclusions
Partial solutions
1 Can have theories T_0, T_1, T_2, ... s.t. T_{n+1} ⊢ □_{T_n}⌜ϕ⌝ → ϕ.
Agent using T_{n+1} can rewrite into agent using T_n.
Stops working when we reach T_0.
Works for finite time horizons.
2 Can have theories s.t. T_n ⊢ □_{T_{n+1}}⌜ϕ⌝ → ϕ for all ϕ ∈ Π_1.
Agent using T_n can rewrite into agent using T_{n+1}.
Can rewrite forever!
(But: the agent doesn't know this! :-()
Works with temporal discounting (Fallenstein & Soares, 2014).
Do these approaches generalize beyond formal logic?
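Partial solution 1 can be pictured as "trust fuel": an agent reasoning in T_{n+1} can verify a successor reasoning in T_n, so every verified self-rewrite spends one level of the tower. A toy sketch (the function names and the fuel framing are illustrative, not part of the formal result):

```python
# Toy model of the descending theory tower T_0, T_1, T_2, ...,
# where T_{n+1} proves the soundness of T_n. Each verified self-rewrite
# moves the agent one theory down; an agent at T_0 trusts no weaker
# theory, so the process bottoms out.

def verified_successor(level):
    """Theory level of a verifiable successor, or None at the bottom."""
    return level - 1 if level > 0 else None

def verified_rewrites(start_level):
    """Count how many successive verified self-rewrites are possible."""
    count, level = 0, start_level
    while (nxt := verified_successor(level)) is not None:
        level, count = nxt, count + 1
    return count

# Enough for any *finite* horizon, as long as we start high enough:
assert verified_rewrites(5) == 5
assert verified_successor(0) is None   # T_0 is the end of the line
```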
1 The “procrastination paradox”
2 A formal toy model
3 Partial solutions
4 Logical uncertainty
5 Conclusions
Logical uncertainty
Standard probability theory = environmental uncertainty.
Agents are assumed to be logically omniscient.
No theoretical understanding of mathematical uncertainty!
Example: Choose between O(n²) and O(n log n) algorithm
Realistic Vingean reflection needs logical uncertainty.
Approach for study:
Probability distribution over complete theories in some first-order language.
e.g. complete theories extending Peano Arithmetic (PA)
→ uncertainty about whether PA is consistent
Reflection is still difficult
Reflection in probabilistic logic
Assign probabilities P[ϕ] to sentences ϕ. . .
. . . in a language with a symbol for P[·].
Require e.g.: if ZFC ⊢ ϕ → ψ, then P[ϕ] ≤ P[ψ].
Reflection: α ≤ P[ϕ] ≤ β =⇒ P[α ≤ P[ϕ] ≤ β] = 1.
But let ZFC ⊢ ϕ ↔ P[ϕ] < 1 (diagonal lemma).
Suppose P[ϕ] = 1. Then P[ϕ] = P[P[ϕ] < 1] = 0.
Suppose P[ϕ] ≤ 1 − ε < 1. Then
P[ϕ] = P[P[ϕ] < 1] ≥ P[P[ϕ] ≤ 1 − ε] = 1.
Contradiction!
Christiano (2013): consistent to have for all α, β ∈ Q, all ϕ:
α < P[ϕ] < β =⇒ P[α < P[ϕ] < β] = 1
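The contradiction can be viewed as a missing fixed point: the diagonal sentence forces p = P[ϕ] to satisfy a discontinuous equation with no solution in [0, 1], and Christiano's schema works because it only constrains P on open intervals, leaving room at the discontinuity. A numerical sketch (the grid and function name are illustrative):

```python
# The diagonal sentence phi <-> (P[phi] < 1) would force p = P[phi] to
# be a fixed point of the map below: if p < 1 then phi is true, so
# reflection pushes P[phi] to 1; if p = 1 then phi is false, pushing
# P[phi] to 0. No p in [0, 1] is a fixed point.

def forced_value(p):
    return 1.0 if p < 1.0 else 0.0

grid = [i / 1000 for i in range(1001)]  # p = 0.000, 0.001, ..., 1.000
assert [p for p in grid if forced_value(p) == p] == []

# The *closure* of the map's graph does contain the point (1, 1), since
# forced_value(p) -> 1 as p -> 1 from below; that is the slack the
# open-interval reflection principle exploits.
```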
Procrastination in probabilistic logic
Christiano (2013): consistent to have for all α, β ∈ Q, all ϕ:
α < P[ϕ] < β =⇒ P[α < P[ϕ] < β] = 1.
Let ZFC ⊢ P(n) ↔ P[∃k > n. P(k)] < 1 − 1/n
“Button pressed in step n unless very sure it’s pressed later”
P[∃n.P(n)] = 1
For all n, P[P(n)] = 0
Unclear how to interpret this!
P can’t be σ-additive probability measure on standard models
But can be finitely additive measure
Clearer understanding needed!
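The assignment behind these bullets can be checked mechanically: if P[∃k > n. P(k)] = 1 for every n, each defining biconditional forces ¬P(n), yet "pressed eventually" keeps probability 1. A quick arithmetic check (illustrative only):

```python
# Slide's assignment: P[exists k > n. P(k)] = 1 for every n.
# The biconditional says P(n) holds iff that probability is < 1 - 1/n.
# Since 1 < 1 - 1/n is false for every n >= 1, the button is pressed
# at no individual step -- while "pressed eventually" still has
# probability 1, which a sigma-additive measure could not deliver.

def pressed_at_step(n, prob_pressed_later=1.0):
    return prob_pressed_later < 1 - 1 / n

assert not any(pressed_at_step(n) for n in range(1, 10_000))
```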
1. The “procrastination paradox”
2. A formal toy model
3. Partial solutions
4. Logical uncertainty
5. Conclusions
Conclusions
Gave example of self-referential reasoning gone wrong.
Any reliable system for self-referential reasoning will need to deal with this somehow.
Analyzed the problem using a toy model, and looked for solutions that generalize.
Can extend to utility-based agents (Fallenstein & Soares, 2014).
Looked for extensions to logical uncertainty.
Reflection is still difficult.
Still get versions of the procrastination paradox.
Better understanding needed.
Extremely reliable self-referential reasoning isn’t trivial. . .
but we can make progress towards it!
Thanks for listening!