Simplicity and Simplicity and Truth:Truth:
an Alternative an Alternative Explanation of Explanation of
Ockham's RazorOckham's RazorKevin T. Kelly Kevin T. Kelly Conor Mayo-WilsonConor Mayo-Wilson
Department of PhilosophyDepartment of PhilosophyJoint Program in Logic and ComputationJoint Program in Logic and Computation
Carnegie Mellon UniversityCarnegie Mellon University
www.hss.cmu.edu/philosophy/faculty-www.hss.cmu.edu/philosophy/faculty-kelly.phpkelly.php
I. The Simplicity I. The Simplicity Puzzle Puzzle
Which Theory is Right?Which Theory is Right?
???
Ockham Says:Ockham Says:
Choose theSimplest!
But Why?But Why?
Gotcha!
PuzzlePuzzle
An indicator must be An indicator must be sensitivesensitive to to what it indicates.what it indicates.
simple
PuzzlePuzzle
A reliable indicator must be A reliable indicator must be sensitivesensitive to what it indicates. to what it indicates.
complex
PuzzlePuzzle
But Ockham’s razor But Ockham’s razor alwaysalways points points at simplicity.at simplicity.
simple
PuzzlePuzzle
But Ockham’s razor But Ockham’s razor alwaysalways points points at simplicity.at simplicity.
complex
PuzzlePuzzle
How can a broken compass help How can a broken compass help you find something unless you you find something unless you already know where it is?already know where it is?
complex
Standard AccountsStandard Accounts
1. Prior Simplicity Bias1. Prior Simplicity BiasBayes, BIC, MDL, MML, etc.Bayes, BIC, MDL, MML, etc.
2. Risk Minimization2. Risk MinimizationSRM, AIC, cross-validation, etc. SRM, AIC, cross-validation, etc.
1. Prior Simplicity Bias1. Prior Simplicity Bias
The simple theory is more plausible now because it was more plausible yesterday.
More Subtle VersionMore Subtle Version
Simple data are a Simple data are a miraclemiracle in the in the complex theory but not in the complex theory but not in the simple theory.simple theory.
P C
Regularity: retrograde motion of Venus at solar conjunction
Has to be!
However…However…
e e would would notnot be a miracle given be a miracle given PP((););
Why not this?
CP
The Real MiracleThe Real MiracleIgnoranceIgnorance about model:about model:
pp((CC) ) pp((PP); );
++ IgnoranceIgnorance about parameter setting:about parameter setting:
p’p’((PP(() | ) | PP) ) pp((PP((’ ’ ) | ) | PP).).
== Knowledge Knowledge about about CC vs. vs. PP(():):
pp((PP(()) << )) << pp((CC).).
CP
Lead into gold.Perpetual motion.Free lunch.
Ignorance is knowledge.War is peace.I love Big Bayes.
Standard Paradox of Standard Paradox of IndifferenceIndifference
IgnoranceIgnorance ofof redred vs. not- vs. not-redred
++ IgnoranceIgnorance over not-over not-redred::
== KnowledgeKnowledge about about red vs. white.red vs. white.
Knognorance = All the priveleges of knowledgeWith none of the responsibilitiesYeah!
The Ellsberg ParadoxThe Ellsberg Paradox
1/3 ? ?
Human PreferenceHuman Preference
1/3 ? ?
a > b
a c < cb
b
Human ViewHuman View
1/3 ? ?
a > b
a c < cb
bknowledge ignorance
knowledgeignorance
Bayesian ViewBayesian View
1/3 ? ?
a > b
a c > cb
bknognoranceknognorance
knognoranceknognorance
In Any EventIn Any Event
The coherentist foundations of The coherentist foundations of Bayesianism have Bayesianism have nothing to nothing to do with short-run truth-do with short-run truth-conduciveness.conduciveness.Not so loud!
Bayesian ConvergenceBayesian Convergence
Too-simple theories get shot down…Too-simple theories get shot down…
ComplexityTheories
Updated opinion
Bayesian ConvergenceBayesian Convergence
Plausibility is transferred to the next-Plausibility is transferred to the next-simplest theory…simplest theory…
Blam! ComplexityTheories
Updated opinion
Plink!
Bayesian ConvergenceBayesian Convergence
Plausibility is transferred to the next-Plausibility is transferred to the next-simplest theory…simplest theory…
Blam! ComplexityTheories
Updated opinion
Plink!
Bayesian ConvergenceBayesian Convergence
Plausibility is transferred to the next-Plausibility is transferred to the next-simplest theory…simplest theory…
Blam! ComplexityTheories
Updated opinion
Plink!
Bayesian ConvergenceBayesian Convergence
The true theory is nailed to the fence.The true theory is nailed to the fence.
Blam! ComplexityTheories
Updated opinion
Zing!
ConvergenceConvergence
But But alternative strategies alternative strategies also also converge: converge: AnythingAnything in the short run is in the short run is
compatible with convergence in the compatible with convergence in the long run.long run.
Summary of Bayesian Summary of Bayesian ApproachApproach
Prior-basedPrior-based explanations of Ockham’s explanations of Ockham’s razor are circular and based on a razor are circular and based on a faulty model of ignorance.faulty model of ignorance.
Convergence-basedConvergence-based explanations of explanations of Ockham’s razor fail to single out Ockham’s razor fail to single out Ockham’s razor.Ockham’s razor.
2. Risk Minimization2. Risk Minimization Ockham’s razor minimizes Ockham’s razor minimizes
expected distance of empirical expected distance of empirical estimates from the true value.estimates from the true value.
Truth
Unconstrained Unconstrained EstimatesEstimates
are are CenteredCentered on truth but on truth but spreadspread around it.around it.
Pop!Pop!Pop!Pop!
Unconstrained aim
Off-centerOff-center but but less spread.less spread.
Clamped aim
Truth
Constrained EstimatesConstrained Estimates
Off-centerOff-center but but less spreadless spread Overall improvement in Overall improvement in expected expected
distancedistance from truth… from truth…
Truth
Pop!Pop!Pop!Pop!
Constrained EstimatesConstrained Estimates
Clamped aim
Doesn’t Find True Doesn’t Find True TheoryTheory
The theory that minimizes estimation The theory that minimizes estimation risk can be quite risk can be quite falsefalse..
Four eyes!
Clamped aim
Makes SenseMakes Sense……when loss of an answer is similar in when loss of an answer is similar in
nearby distributions.nearby distributions.
Similarityp
Close is goodenough!Loss
But Truth MattersBut Truth Matters……when loss of an answer is when loss of an answer is
discontinuousdiscontinuous with similarity. with similarity.
Similarityp
Close is no cigar!Loss
E.g. ScienceE.g. Science
If you want true laws, false laws aren’t good enough.
E.g. ScienceE.g. Science
You must be a philosopher.This is a machine learning conference.
E.g., Causal Data MiningE.g., Causal Data Mining
Protein A
Protein B
Protein C Cancer protein
Practical enough?
Now you’re talking! I’m ona cilantro-only diet to get myprotein C level under control.
Correlation Correlation doesdoes imply causation if imply causation if there are multiple variables, some there are multiple variables, some of which are common effects. of which are common effects. [Pearl, [Pearl, Spirtes, Glymour and Scheines]Spirtes, Glymour and Scheines]
Central IdeaCentral Idea
Protein A
Protein B
Protein C Cancer protein
Joint distribution Joint distribution pp is is causally compatible causally compatible with directed, acyclic graph with directed, acyclic graph GG iff: iff:
Causal Markov Condition: Causal Markov Condition: each variable each variable XX is independent of its non-effects given its is independent of its non-effects given its immediate causes.immediate causes.
Faithfulness Condition: Faithfulness Condition: no other conditional no other conditional independence relations hold in independence relations hold in pp. .
Core assumptionsCore assumptions
F1 F2
Tell-tale DependenciesTell-tale DependenciesH C
F
Given F, H gives some info about C(Faithfulness)
C
Given C, F1 gives no further info about F2(Markov)
Common ApplicationsCommon Applications Linear Causal Case:Linear Causal Case: each variable each variable
XX is a linear function of its parents is a linear function of its parents and a normally distributed hidden and a normally distributed hidden variable called an “error term”. variable called an “error term”. The error terms are mutually The error terms are mutually independent.independent.
Discrete Multinomial Case:Discrete Multinomial Case: each each variable variable XX takes on a finite range of takes on a finite range of values. values.
No unobserved latent confounding No unobserved latent confounding causescauses
A Very Optimistic A Very Optimistic AssumptionAssumption
Genetics
Smoking Cancer
I’ll give you this one. What’s he up to?
Current Nutrition Current Nutrition WisdomWisdom
Protein A
Protein B
Protein C Cancer protein
Are you kidding? It’s dripping with Protein C!English Breakfast?
As the Sample As the Sample Increases…Increases…
Protein A
Protein B
Protein C Cancer proteinweak
Protein D
I do! Out of my way!
This situation approximatesThe last one. So who cares?
As the Sample Increases As the Sample Increases Again…Again…
Protein A
Protein B
Protein C Cancer protein
Wasn’t that last approximationto the truth good enough?
weak
Protein D
Aaack! I’m poisoned!
Protein Eweak
weak
Causal Flipping Causal Flipping TheoremTheorem
No matter what a consistent causal No matter what a consistent causal discovery procedure has seen so far, discovery procedure has seen so far, there exists a pair there exists a pair GG, , pp satisfying the satisfying the assumptions so that the current assumptions so that the current sample is arbitrarily likely and the sample is arbitrarily likely and the procedure produces arbitrarily many procedure produces arbitrarily many opposite conclusions in opposite conclusions in pp as sample as sample size increases.size increases.
oops
I meant oops
oopsI meant
I meant
The Wrong ReactionThe Wrong Reaction The demon The demon underminesundermines justification of justification of
science.science. He must be He must be defeated defeated to forestall to forestall
skepticism.skepticism. Bayesian circularityBayesian circularity Classical instrumentalismClassical instrumentalism
Grrrr!Urk!
Another ViewAnother View
Many explanations have been offered to Many explanations have been offered to make sense of the here-today-gone-make sense of the here-today-gone-tomorrow nature of medical wisdom — tomorrow nature of medical wisdom — what we are advised with confidence one what we are advised with confidence one year is reversed the nextyear is reversed the next — but the — but the simplest one is that it is simplest one is that it is the natural the natural rhythm of sciencerhythm of science. .
((Do We Really Know What Makes us Do We Really Know What Makes us HealthyHealthy, NY Times Magazine, Sept. 16, , NY Times Magazine, Sept. 16, 2007). 2007).
Zen ApproachZen Approach Get to Get to knowknow the demon. the demon. Locate the justification of Locate the justification of
Ockham’s razor in his Ockham’s razor in his powerpower..
Connections to the TruthConnections to the Truth Short-run Short-run
ReliabilityReliability Too strong to be Too strong to be
feasible when theory feasible when theory matters.matters.
Long-run Long-run Convergence Convergence Too weak to single Too weak to single
out Ockham’s razorout Ockham’s razor
ComplexSimple
ComplexSimple
Middle PathMiddle Path Short-run ReliabilityShort-run Reliability
Too strong to be Too strong to be feasible when theory feasible when theory matters.matters.
““Straightest” Straightest” convergenceconvergence Just right?Just right?
Long-run Convergence Long-run Convergence Too weak to single out Too weak to single out
Ockham’s razorOckham’s razor
ComplexSimple
ComplexSimple
ComplexSimple
II. Navigation by Broken II. Navigation by Broken CompassCompass
simple
Asking for DirectionsAsking for Directions
Where’s …Where’s …
Asking for DirectionsAsking for Directions
Turn around. The freeway ramp is on the left.Turn around. The freeway ramp is on the left.
Asking for DirectionsAsking for Directions
Goal
Best RouteBest Route
Goal
Best Route to Any GoalBest Route to Any Goal
Disregarding Advice is Disregarding Advice is BadBad
Extra U-turn
Best Route to Any GoalBest Route to Any Goal
…so fixed advice can help you reach a hidden goalwithout circles, evasions, or magic.
In Step with the DemonIn Step with the Demon
ConstantLinear
Quadratic
Cubic
There yet?Maybe.
In Step with the DemonIn Step with the Demon
ConstantLinear
Quadratic
Cubic
There yet?
Maybe.
In Step with the DemonIn Step with the Demon
ConstantLinear
Quadratic
Cubic
There yet?Maybe.
In Step with the DemonIn Step with the Demon
ConstantLinear
Quadratic
Cubic
There yet?Maybe.
Ahead of Mother NatureAhead of Mother Nature
ConstantLinear
Quadratic
Cubic
There yet?Maybe.
Ahead of Mother NatureAhead of Mother Nature
ConstantLinear
Quadratic
Cubic
I know you’re coming!
Ahead of Mother NatureAhead of Mother Nature
ConstantLinear
Quadratic
Cubic
Maybe.
Ahead of Mother NatureAhead of Mother Nature
ConstantLinear
Quadratic
Cubic
!!!
Hmm, it’s quite nice here…
Ahead of Mother NatureAhead of Mother Nature
ConstantLinear
Quadratic
Cubic
You’re back!Learned your lesson?
Ockham Violator’s PathOckham Violator’s Path
ConstantLinear
Quadratic
Cubic
See, you shouldn’t run aheadEven if you are right!
Ockham PathOckham Path
ConstantLinear
Quadratic
Cubic
Empirical ProblemsEmpirical Problems
T1 T2 T3
Set Set KK of infinite of infinite input sequencesinput sequences.. Partition of Partition of KK into alternative into alternative
theoriestheories..K
Empirical MethodsEmpirical Methods
T1 T2 T3
Map finite input sequences to Map finite input sequences to theories or to “?”.theories or to “?”.
K
T3
e
Method ChoiceMethod Choice
T1 T2 T3
e1 e2 e3 e4
Input history
Output historyAt each stage, scientist can choose a new method (agreeing with past theory choices).
Aim: Aim: Converge to the Converge to the TruthTruth
T1 T2 T3
K
T3 ? T2 ? T1 T1 T1 T1 . . .T1 T1 T1
RetractionRetraction
Choosing Choosing TT and then not choosing and then not choosing TT next next
T’T’
TT
??
Aim: Aim: Eliminate Eliminate Needless Needless RetractionsRetractions
Truth
Aim: Aim: Eliminate Eliminate Needless Needless RetractionsRetractions
Truth
Ancient RootsAncient Roots
"Living in the midst of ignorance and considering themselves intelligent and enlightened, the senseless people go round and round, following crooked courses, just like the blind led by the blind." Katha Upanishad, I. ii. 5, c. 600 BCE.
Aim: Eliminate Aim: Eliminate Needless Needless DelaysDelays to Retractions to Retractions
theory
applicationapplicationapplication
applicationcorollary
applicationtheory
applicationapplication
corollary applicationcorollary
Aim: Eliminate Aim: Eliminate Needless Needless DelaysDelays to Retractions to Retractions
Why Timed Retractions?Why Timed Retractions?
Retraction minimization =generalized significance level.
Retraction time minimization = generalized power.
Easy Retraction Time Easy Retraction Time ComparisonsComparisons
T1 T1 T2 T2
T1 T1 T2 T2 T3 T3T2 T4 T4
T2 T2
Method 1
Method 2
T4 T4 T4
. . .
. . .
at least as many at least as late
Worst-case Retraction Time Worst-case Retraction Time BoundsBounds
T1 T2
Output sequences
T1 T2
T1 T2
T4
T3
T3
T3
T3
T3 T3
T4
T4
T4
T4 T4
. . .
(1, 2, ∞)
. . .
. . .
. . .. . .
. . .T4
T4
T4
T1 T2 T3 T3 T3 T4T3 . . .
IV. Ockham Without IV. Ockham Without Circles, Evasions, Circles, Evasions,
or Magic or Magic
Curve FittingCurve Fitting
Data = open intervals around Data = open intervals around YY at rational values of at rational values of X.X.
Curve FittingCurve Fitting
No effects:No effects:
Curve FittingCurve Fitting
First-order effect:First-order effect:
Curve FittingCurve Fitting
Second-order effect:Second-order effect:
Empirical EffectsEmpirical Effects
Empirical EffectsEmpirical Effects
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical TheoriesEmpirical Theories True theory determined by which True theory determined by which
effects appear.effects appear.
Empirical ComplexityEmpirical Complexity
More complex
Background ConstraintsBackground Constraints
More complex
Background ConstraintsBackground Constraints
More complex
Ockham’s RazorOckham’s Razor Don’t select a theory unless it is Don’t select a theory unless it is
uniquely simplest in light of uniquely simplest in light of experience.experience.
Weak Ockham’s RazorWeak Ockham’s Razor Don’t select a theory unless it Don’t select a theory unless it
among the simplest in light of among the simplest in light of experience.experience.
StalwartnessStalwartness Don’t retract your answer while it Don’t retract your answer while it
is uniquely simplestis uniquely simplest
StalwartnessStalwartness Don’t retract your answer while it Don’t retract your answer while it
is uniquely simplestis uniquely simplest
Timed Retraction BoundsTimed Retraction Bounds
rr((M, e, nM, e, n)) = = the least timed retraction the least timed retraction bound covering the total timed bound covering the total timed retractionsretractions of of M M along input streams along input streams of complexityof complexity n n that extendthat extend e e
Empirical Complexity 0 1 2 3 . . .
. . .
M
EfficiencyEfficiency of Method of Method M M at at ee
MM convergesconverges to the truth no matter to the truth no matter what;what;
For each convergent For each convergent M’M’ that agrees that agrees with with MM up to the end of up to the end of ee, and for , and for each each nn:: rr((MM, , ee, , nn) ) rr((M’M’, , ee, , nn))
Empirical Complexity 0 1 2 3 . . .
. . .
M M’
MM is is Beaten Beaten at at ee
There exists convergent There exists convergent M’M’ that that agrees with agrees with MM up to the end of up to the end of ee, , such thatsuch that For each For each nn, , rr((MM, , ee, , nn) ) rr((M’M’, , ee, , nn);); Exists Exists nn, , rr((MM, , ee, , nn) ) > > rr((M’M’, , ee, , nn).).
Empirical Complexity 0 1 2 3 . . .
. . .
M M’
Basic IdeaBasic Idea
Ockham efficiency:Ockham efficiency: Nature can Nature can force arbitary, convergent force arbitary, convergent MM to to produce the successive answers produce the successive answers down an effect path arbitrarily late, down an effect path arbitrarily late, so stalwart, Ockham solutions are so stalwart, Ockham solutions are efficient.efficient.
Basic IdeaBasic Idea
Unique Ockham efficiency: Unique Ockham efficiency: A A violator of Ockham’s razor or violator of Ockham’s razor or stalwartness can be forced into an stalwartness can be forced into an extra retraction or a late retraction extra retraction or a late retraction in complexity class zero at the time in complexity class zero at the time of the violation, so the violator is of the violation, so the violator is beaten by each stalwart, Ockham beaten by each stalwart, Ockham solution. solution.
Ockham Efficiency Ockham Efficiency TheoremTheorem
Let Let MM be a be a solutionsolution. The . The following are equivalent:following are equivalent: MM is is alwaysalways strongly Ockham and strongly Ockham and
stalwart;stalwart; MM is is alwaysalways efficient; efficient; MM is is nevernever weaklyweakly beaten. beaten.
Example: Causal Example: Causal InferenceInference EffectsEffects are are conditional statistical conditional statistical
dependence relationsdependence relations..
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
. . .. . .
Causal Discovery = Causal Discovery = Ockham’s RazorOckham’s Razor
X Y Z W
Ockham’s RazorOckham’s Razor
X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Causal Discovery = Causal Discovery = Ockham’s RazorOckham’s Razor
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {Y,W}
Causal Discovery = Causal Discovery = Ockham’s RazorOckham’s Razor
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}
Causal Discovery = Causal Discovery = Ockham’s RazorOckham’s Razor
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}Z dep W| {X}, {Y}, {X,Y}Y dep W| {Z}, {X,Z}
Causal Discovery = Causal Discovery = Ockham’s RazorOckham’s Razor
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}Z dep W| {X}, {Y}, {X,Y}Y dep W| {X}, {Z}, {X,Z}
IV. Simplicity IV. Simplicity DefinedDefined
ApproachApproach
Empirical complexityEmpirical complexity reflects reflects nested problems of inductionnested problems of induction posed by the problem.posed by the problem.
Hence, simplicity is Hence, simplicity is problem-problem-relativerelative but but topologically topologically invariantinvariant..
Empirical ProblemsEmpirical Problems
T1 T2 T3
Set Set KK of infinite of infinite input sequencesinput sequences.. Partition Partition QQ of of KK into alternative into alternative
theoriestheories..
K
Simplicity ConceptsSimplicity Concepts A A simplicity conceptsimplicity concept for for KK is just a well- is just a well-
founded order < on a partition founded order < on a partition SS of of KK with with ascending chains of order type not ascending chains of order type not exceeding omega such that:exceeding omega such that:
1.1. Each element of Each element of SS is included in some is included in some answer in answer in QQ..
2.2. Each downward union in (Each downward union in (SS, <) is closed;, <) is closed;
3.3. Incomparable sets share no boundary Incomparable sets share no boundary point.point.
4.4. Each element of Each element of SS is included in the is included in the boundary of its successor.boundary of its successor.
General Ockham Efficiency General Ockham Efficiency TheoremTheorem
Let Let MM be a be a solutionsolution. The . The following are equivalent:following are equivalent: MM is is alwaysalways strongly Ockham and strongly Ockham and
stalwart;stalwart; MM is is alwaysalways efficient; efficient; MM is is nevernever beaten. beaten.
ConclusionsConclusions
Causal truths are necessary for Causal truths are necessary for counterfactual predictions.counterfactual predictions.
Ockham’s razor is necessary for Ockham’s razor is necessary for staying on the straightest path to the staying on the straightest path to the true theory but does not point at the true theory but does not point at the true theory.true theory.
No evasions or circles are required.No evasions or circles are required.
Future DirectionsFuture Directions
Extension of unique efficiency theorem Extension of unique efficiency theorem to stochastic model selection.to stochastic model selection.
Latent variables as Ockham conclusions.Latent variables as Ockham conclusions. Degrees of retraction.Degrees of retraction. Pooling of marginal Ockham Pooling of marginal Ockham
conclusions.conclusions. Retraction efficiency assessment of Retraction efficiency assessment of
MDL, SRM.MDL, SRM.
Suggested ReadingSuggested Reading
"Ockham’s Razor, Truth, and Information""Ockham’s Razor, Truth, and Information", in , in Handbook of the Philosophy of Handbook of the Philosophy of InformationInformation, J. van Behthem and P. , J. van Behthem and P. Adriaans, eds., to appear. Adriaans, eds., to appear.
"Ockham’s Razor, Empirical Complexity, a"Ockham’s Razor, Empirical Complexity, and Truth-finding Efficiency"nd Truth-finding Efficiency", , Theoretical Computer ScienceTheoretical Computer Science, 383: 270-, 383: 270-289, 2007. 289, 2007.
Both available as pre-prints at: Both available as pre-prints at: www.hss.cmu.edu/philosophy/faculty-www.hss.cmu.edu/philosophy/faculty-kelly.phpkelly.php