Statistics and Philosophy
Mathematical models and reality
Frequentist probabilities
The Bayes-frequentist controversy
Cluster analysis and truth

Model Assumptions and Truth in Statistics

Christian Hennig

4 February 2015

Christian Hennig Model Assumptions and Truth in Statistics

1. Statistics and Philosophy

Statistics is about getting knowledge from data.

What can we know from data?

How can we find out?

Issues

◮ What is probability? - frequentist vs. Bayesian vs. ?
  Observers see what happens, but probability is about “what could have happened other than what happened”.

◮ Assessing evidence - testing hypotheses, p-values, probabilities of hypotheses ...
  Often confused with interpretations of probability, but Bayesian methods can work with frequentist probabilities.

◮ Modelling and model assumptions - where does the model come from? How can assumptions be assessed?

Issues

◮ Statistics vs. Data Analysis - our own “PoS vs. STS”.

Data Analysis is about much more than probability models and classical inference: prediction, exploratory analysis, data compression, pattern recognition, data visualisation, stability, ...

Data Analysts: Only use statistical models where they help; often other methods are better.

Statisticians: Statistical models improve pretty much everything. (And we have experimental design.)

(Parts of the field are now claimed by Machine Learning; role of “black box prediction machines”?)

Issues

◮ Objectivity vs. subjectivity - automatic/standardised decisions vs. researcher’s decisions

◮ Measuring quality of methodology - assumed truth (what is the truth we want to find)? prediction error? replication? interpretability?

◮ ... and more - theory of measurement, role of visualisation, statistics communication ...

Overview

1. Statistics and Philosophy

2. Mathematical models and reality

3. Frequentist probabilities

4. The Bayes-frequentist controversy

5. Cluster analysis and truth

6. Conclusion

2. Mathematical models and reality - some data

[Figure: two histograms, “Teacher A results” and “Teacher B results”. x-axis: Marks out of 100 (0 to 100); y-axis: Frequency (0 to 5 for teacher A, 0 to 15 for teacher B).]

Standard model:

$x_1, \dots, x_n \sim \mathcal{N}(\mu_1, \sigma_1^2)$ i.i.d.,

$y_1, \dots, y_m \sim \mathcal{N}(\mu_2, \sigma_2^2)$ i.i.d.

Estimate $\mu_1, \mu_2$ by means, $\bar{x}_1 = 58.6$, $\bar{x}_2 = 56.9$: teacher A students do better, but not significantly so ($p = 0.455$ under $\mu_1 = \mu_2$).

What do the model assumptions mean, and do they make sense?
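A comparison of two groups of marks like the one above can be sketched in code. The slides do not give the raw marks, so the two lists below are made-up values and will not reproduce 58.6, 56.9 or p = 0.455. The sketch also swaps in a permutation test for the normal-theory test on the slide, purely so that it runs with the standard library; `permutation_p_value` is a hypothetical helper name.

```python
import random
import statistics


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in sample means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(x)])
                   - statistics.fmean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical marks out of 100 (NOT the data behind the slides).
teacher_a = [35, 48, 52, 58, 61, 63, 67, 70, 74, 88]
teacher_b = [41, 50, 53, 55, 56, 57, 58, 60, 62, 66, 68]

p_value = permutation_p_value(teacher_a, teacher_b)
print(f"mean A = {statistics.fmean(teacher_a):.1f}, "
      f"mean B = {statistics.fmean(teacher_b):.1f}, p = {p_value:.3f}")
```

Note that the permutation test is not assumption-free either: it treats the students as exchangeable, which is again a modelling decision.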

With fitted distributions $\mathcal{N}(\mu_1, \sigma_1^2)$, $\mathcal{N}(\mu_2, \sigma_2^2)$

[Figure: the histograms “Teacher A results” and “Teacher B results” (x-axis: Marks out of 100; y-axis: Frequency), shown with the fitted distributions.]

We actually know that the model assumptions are violated.
The normal distribution has an unlimited value range, and produces integer values with probability zero.
Actually, for such reasons we know that no data observed by humans can ever be “truly normal”.

But how much harm does this do?
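One way to get a rough feeling for the size of these violations is to ask how much probability a fitted normal puts on impossible marks. This is a minimal sketch: the mean 58.6 is taken from the slides, but the standard deviation of 20 is an assumed value, since the slides do not report it.

```python
from statistics import NormalDist

# Mean from the slide (teacher A); sigma = 20 is a hypothetical value.
fitted = NormalDist(mu=58.6, sigma=20.0)

# Probability the fitted normal puts outside the possible marks 0..100.
p_outside = fitted.cdf(0.0) + (1.0 - fitted.cdf(100.0))

# The single point 60 has probability zero under any continuous model.
# Reading an observed mark of 60 as the interval [59.5, 60.5) gives the
# probability the model assigns to that rounded value.
p_rounds_to_60 = fitted.cdf(60.5) - fitted.cdf(59.5)

print(f"P(mark < 0 or mark > 100) = {p_outside:.4f}")
print(f"P(59.5 <= mark < 60.5)    = {p_rounds_to_60:.4f}")
```

Whether a given amount of misplaced probability does harm depends on what the model is used for, which is the question the slide raises.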

General mathematical modelling

Identification of items of perceived reality with mathematical objects, and interpretation of results of mathematical operations in terms of the items.

“Trivial” manipulation of counts and measurements, deterministic and stochastic models, analytical and computer models, data fitting and mechanistic models.

Constructivist view (H., 2010, Foundations of Science)

Implications for mathematical modelling

◮ Science: establishing agreement in open exchange.
◮ Mathematics is about creating a system that makes absolute agreement possible (but only within mathematics).
◮ Mathematical modelling is not about how things are, but about how we think and communicate about them. (Models communicate and change views of reality.)
◮ It cannot be formally analysed how formal models are related to informal reality.
◮ Pragmatist attitude: what do we get out of it?

Realism?

Models are about personal and social perceptions.
Science: can check models against observations from agreed measurement procedures.

Compatible with Chang’s Active Scientific Realism: “I take reality as whatever is not subject to one’s will, and knowledge as an ability to act without being frustrated by resistance from reality.”

We acknowledge reality’s “resistance” and that this is what science is about - without accessing “truth” about observer-independent reality.

Antony Gormley - Allotment

What do the model assumptions mean?
Independent repetitions
Can frequentist model assumptions be checked?
The goodness-of-fit paradox
Approximately true?
“Frequentism-as-model”

3. Frequentist probabilities (e.g., von Mises)

3.1 What do the model assumptions mean?

For example,

$x_1, \dots, x_n \sim \mathcal{N}(\mu_1, \sigma_1^2)$ i.i.d.,

$y_1, \dots, y_m \sim \mathcal{N}(\mu_2, \sigma_2^2)$ i.i.d.

“We think of the situation as ...”

◮ Potentially infinite repetition (of experimental conditions)
◮ $P(A)$: relative frequency limit of occurrence of $A$
  (e.g., the normal distribution is defined by $P(A)\ \forall A$.)

This is obviously an idealisation - and what constitutes a “repetition”?

“Whatever can be distinguished cannot be identical.” (B. de Finetti)
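The “relative frequency limit” can be illustrated with a small simulation. This is only a sketch: the value `p_true = 0.3`, the checkpoints and the helper name `relative_frequencies` are arbitrary choices for illustration, and a finite run can only hint at a limit.

```python
import random


def relative_frequencies(p_true, checkpoints, seed=1):
    """Relative frequency of an event A after each checkpoint number of trials."""
    rng = random.Random(seed)
    count = 0
    trials_done = 0
    results = {}
    for n in sorted(checkpoints):
        while trials_done < n:
            # A "occurs" in this trial with probability p_true.
            count += rng.random() < p_true
            trials_done += 1
        results[n] = count / n
    return results


freqs = relative_frequencies(p_true=0.3, checkpoints=[10, 100, 10_000, 100_000])
for n, f in freqs.items():
    print(f"n = {n:>6}: relative frequency of A = {f:.4f}")
```

The simulation itself already presupposes a probability (`p_true`) and independent pseudo-random draws, so it illustrates the idealisation rather than justifying it.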

3.2 Independent repetitions (“i.i.d.”)

Frequentism relies on “repetitions of experiments”, e.g. results from different patients in a clinical study, different students in the exam.

“$x_1, \dots, x_n \sim \mathcal{N}(\mu_1, \sigma_1^2)$ i.i.d.” defines a probability distribution for the full dataset, e.g., $P(x_1 \in A_1, x_2 \in A_2) = P(x_1 \in A_1)\,P(x_2 \in A_2)$.

“i.i.d.” is defined in terms of probabilities.

Probabilities cannot be defined in terms of i.i.d. repetitions.
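The product formula for independent observations can be checked numerically in a simulated setting. A minimal sketch, assuming hypothetical parameters (mean 58.6 as on the slides, an assumed standard deviation of 20) and two arbitrarily chosen events `A1` and `A2`:

```python
import random

rng = random.Random(2)
n_pairs = 200_000

# Hypothetical parameters and events, chosen only for illustration.
mu, sigma = 58.6, 20.0


def in_a1(x):
    """Event A1 for the first observation: a mark above 70."""
    return x > 70.0


def in_a2(x):
    """Event A2 for the second observation: a mark between 40 and 60."""
    return 40.0 <= x <= 60.0


n_a1 = n_a2 = n_both = 0
for _ in range(n_pairs):
    x1 = rng.gauss(mu, sigma)
    x2 = rng.gauss(mu, sigma)  # generated separately from x1
    a1, a2 = in_a1(x1), in_a2(x2)
    n_a1 += a1
    n_a2 += a2
    n_both += a1 and a2

p_joint = n_both / n_pairs
p_product = (n_a1 / n_pairs) * (n_a2 / n_pairs)
print(f"P(x1 in A1, x2 in A2) ~ {p_joint:.4f}")
print(f"P(x1 in A1) P(x2 in A2) ~ {p_product:.4f}")
```

The check works here only because many pairs `(x1, x2)` are available, i.e. the pair itself is repeated; this is the extra level of repetition that a single real dataset does not provide.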

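In order to define “i.i.d.” sequences, i.i.d. repetitions and defining repetitions are required on different levels.

$$
\begin{array}{cccccccc}
x_1 & x_2 & x_3 & x_4 & x_5 & \dots & x_n & \dots \\
x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & \dots & x_{2n} & \dots \\
x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & \dots & x_{3n} & \dots \\
x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & \dots & x_{4n} & \dots
\end{array}
$$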

In practice, there’s only one level of repetition.

The effective sample size for (in-)dependence assumptions is usually 1.

But independent repetition of some kind is always required in order to learn from data.

Independent repetition is constructed by conscious decision to ignore potential dependencies and differences.

3.3 Can frequentist model assumptions be checked?

Goodness-of-fit/misspecification tests (Mayo & Spanos)

If something modelled as very unlikely happens, the model is interpreted to be falsified.

For example, could “falsify” independence from data where students at the same table tend to have similar results.

Implicitly assumes tables to be i.i.d.

Dependence (between what’s “close”) can only be found by contrast to independence between what’s further away.

If everything’s dependent on everything else in the same manner (or in irregularly different manners), this cannot be found.
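A toy simulation can make the last point concrete. In this sketch every observation in a dataset shares one random shift `z`, so all observations are equally dependent on each other; the model, the sample sizes and the helper names are assumptions made for illustration only.

```python
import math
import random
import statistics


def pearson(u, v):
    """Plain Pearson correlation of two equally long sequences."""
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def shared_shift_dataset(n, rng):
    """n observations that all share one random shift z (equal dependence)."""
    z = rng.gauss(0.0, 1.0)
    return [z + rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(3)

# One observed dataset: correlation between neighbouring observations.
data = shared_shift_dataset(2000, rng)
within = pearson(data[:-1], data[1:])

# Many replications of the whole dataset (not available in practice):
# correlation between the first two observations across datasets.
pairs = [shared_shift_dataset(2, rng) for _ in range(2000)]
across = pearson([p[0] for p in pairs], [p[1] for p in pairs])

print(f"within one dataset:  {within:+.3f}")
print(f"across replications: {across:+.3f}")
```

Inside the single dataset the shared shift is absorbed into the sample mean and the data look independent; the dependence only shows up across replications of the whole dataset, of which there is usually just one.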

3.4 The goodness-of-fit paradox (H, 2007)

Checking the model assumptions violates them automatically, because the possibility of unlikely events is a constitutive part of the models.
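The effect can be sketched with a deliberately crude check. Here the model N(0, 1) is true for every simulated dataset, and a dataset is “rejected” whenever any value exceeds 2.5 in absolute value; this rule is a hypothetical stand-in for a misspecification test, not a test taken from the slides, and the sample sizes are arbitrary.

```python
import random

rng = random.Random(4)
n, n_datasets, cutoff = 20, 20_000, 2.5


def passes_check(sample):
    """Crude stand-in for a misspecification test: reject if any |x| > cutoff."""
    return max(abs(x) for x in sample) <= cutoff


def tail_frequency(values):
    """Share of values with |x| > cutoff."""
    return sum(abs(x) > cutoff for x in values) / len(values)


all_values, passed_values = [], []
n_passed = 0
for _ in range(n_datasets):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the model holds here
    all_values.extend(sample)
    if passes_check(sample):
        n_passed += 1
        passed_values.extend(sample)

freq_all = tail_frequency(all_values)
freq_passed = tail_frequency(passed_values)
pass_rate = n_passed / n_datasets

print(f"share of datasets passing the check: {pass_rate:.3f}")
print(f"P(|x| > {cutoff}) over all datasets:     {freq_all:.4f}")
print(f"P(|x| > {cutoff}) after passing check:  {freq_passed:.4f}")
```

The model says that values beyond the cutoff occur with small positive probability, but the datasets that survive the check never contain one, so the data that are actually analysed no longer follow the model.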

3.5 “Approximately true?”

Model assumptions are violated. Is it good enough if they are “approximately true”?


“The model assumptions hold approximately”: the precise meaning could only be: “We assume $Q \in \mathcal{P}$, and there is a true $P$ such that $\min_{Q \in \mathcal{P}} d(P, Q)$ is small.”

A dissimilarity measure $d$ is required.

Distances for data analysis (L. Davies): $d(P, Q)$ small $\Leftrightarrow$ difficult to distinguish data from $P$ and $Q$.


Problem: In every small $d$-neighbourhood of $Q$ there are distributions that behave very differently.

$Q = N(\mu, \sigma^2)$ has expectation $\mu$. The expectation of $P$ may be anywhere or may not exist. The likelihood ratio of $Q$ and $P$ may take any value.
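The statement about expectations can be illustrated with a small sketch. It assumes, purely for illustration, that $d$ is the Kolmogorov distance (the slides do not fix $d$), that $Q = N(0, 1)$, and that $P$ is the contaminated distribution $(1 - \varepsilon) N(0, 1) + \varepsilon N(m, 1)$; the function names are made up for this example.

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def contaminated_cdf(x, eps, shift):
    """CDF of P = (1 - eps) * N(0, 1) + eps * N(shift, 1)."""
    return (1.0 - eps) * norm_cdf(x) + eps * norm_cdf(x, mu=shift)

def kolmogorov_distance(eps, shift, steps=20000):
    """Grid approximation of sup_x |F_P(x) - F_Q(x)| with Q = N(0, 1)."""
    lo = min(0.0, shift) - 10.0
    hi = max(0.0, shift) + 10.0
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(contaminated_cdf(x, eps, shift) - norm_cdf(x)) for x in grid)

def contaminated_mean(eps, shift):
    """Expectation of P: (1 - eps) * 0 + eps * shift."""
    return eps * shift

if __name__ == "__main__":
    eps = 0.01
    for shift in (10.0, 1000.0, 100000.0):
        d = kolmogorov_distance(eps, shift)
        print(f"shift={shift:>9}: distance to Q = {d:.4f}, mean of P = {contaminated_mean(eps, shift)}")
```

The distance never exceeds `eps`, while the expectation of $P$ grows without bound as the shift grows. Replacing the shifted normal component by a Cauchy component would give a $P$ in the same neighbourhood whose expectation does not exist.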


If $P \neq N(\mu, \sigma^2)$, what is being estimated when estimating $\mu$?

Median, expectation and mode are identical for $N(\mu, \sigma^2)$, but different for asymmetric $P$.
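As a hedged illustration of how far the three location characteristics can drift apart, the sketch below uses a lognormal distribution as one arbitrary example of an asymmetric $P$ (the choice of distribution and the function names are assumptions, not part of the slides). The closed-form expressions are the standard ones for the lognormal; the simulation is only a sanity check.

```python
import math
import random
import statistics

def lognormal_summaries(mu=0.0, sigma=1.0):
    """Closed-form mean, median and mode of a lognormal(mu, sigma) distribution."""
    return {
        "mean": math.exp(mu + sigma ** 2 / 2.0),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma ** 2),
    }

def simulated_summaries(mu=0.0, sigma=1.0, n=100000, seed=3):
    """Sample mean and sample median from simulated lognormal data."""
    rng = random.Random(seed)
    sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return {"mean": statistics.fmean(sample), "median": statistics.median(sample)}

if __name__ == "__main__":
    print("theory:    ", lognormal_summaries())
    print("simulation:", simulated_summaries())
```

A method that estimates "the centre" therefore targets different quantities depending on whether it is built around the mean, the median or the mode, and the normal model hides this distinction.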

Robustness theory: which characteristics of distributions don’t change much in $d$-neighbourhoods?

In any case, we cannot rule out too irregular distributions (such as complex dependence patterns). Some assumptions need to be imposed without checking.

They should at least reflect our perception.


3.6 “Frequentism-as-model”, a new perspective

(Frequentist) models are still useful to . . .
◮ communicate the researcher’s perception of the situation,
◮ inspire methodology,
◮ check the quality of methodology in situations with known (made-up) truth (Tukey, L. Davies),
◮ give an idea of the potential variation to be expected.
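The idea of checking methodology against a known, made-up truth can be sketched as a small simulation. Everything here is an assumption chosen for illustration: the two "truths" (a standard normal and a normal contaminated with 10% wide-tailed observations), the two methods compared (sample mean and sample median), and the function names.

```python
import random
import statistics

def draw_normal(rng, n):
    """Made-up truth 1: i.i.d. N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def draw_contaminated(rng, n, eps=0.1, wide_sd=10.0):
    """Made-up truth 2: N(0, 1) with a fraction eps drawn from N(0, wide_sd^2)."""
    return [rng.gauss(0.0, wide_sd if rng.random() < eps else 1.0) for _ in range(n)]

def mse_of_estimators(draw, n=25, reps=5000, seed=2):
    """Mean squared error of sample mean and sample median; the true centre is 0."""
    rng = random.Random(seed)
    se_mean = 0.0
    se_median = 0.0
    for _ in range(reps):
        sample = draw(rng, n)
        se_mean += statistics.fmean(sample) ** 2
        se_median += statistics.median(sample) ** 2
    return se_mean / reps, se_median / reps

if __name__ == "__main__":
    for name, draw in (("normal", draw_normal), ("contaminated", draw_contaminated)):
        mse_mean, mse_median = mse_of_estimators(draw)
        print(f"{name:>12}: MSE(mean) = {mse_mean:.4f}, MSE(median) = {mse_median:.4f}")
```

Neither truth is claimed to describe real data; the models serve only as test beds in which the quality of a method can be measured because the target is known.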


Re-formulation of “model checking”: find out whether the data could lead the statistical method (derived from the model) astray (requires statistical theory/simulations).

Some violations are not problematic (e.g., g.o.f. testing is rarely problematic; discreteness of data assumed normal doesn’t hurt much).

Which features of the model are important depends on the aim of the analysis (the researcher’s construct).

The major motivation of the methodology should come from the desired interpretation of the results.
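The remark that discreteness of data assumed normal does little harm is the kind of claim that can be examined by simulation. The sketch below assumes, for illustration only, marks generated as N(60, 15²) and then rounded to whole numbers, a one-sample t-test of the true mean with n = 30, and a hard-coded critical value; none of these settings come from the slides.

```python
import math
import random

T_CRIT_29 = 2.045  # two-sided 5% critical value of the t distribution with 29 df

def t_statistic(sample, mu0):
    """One-sample t statistic for H0: expectation equals mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)

def rejection_rates(mu=60.0, sigma=15.0, reps=4000, seed=4):
    """Rejection rates of a true H0 for continuous data and for rounded data.

    The sample size is fixed at 30 so that T_CRIT_29 is the right cutoff.
    """
    n = 30
    rng = random.Random(seed)
    rejected_raw = 0
    rejected_rounded = 0
    for _ in range(reps):
        raw = [rng.gauss(mu, sigma) for _ in range(n)]
        rounded = [float(round(x)) for x in raw]  # discretised version of the same data
        rejected_raw += abs(t_statistic(raw, mu)) > T_CRIT_29
        rejected_rounded += abs(t_statistic(rounded, mu)) > T_CRIT_29
    return rejected_raw / reps, rejected_rounded / reps

if __name__ == "__main__":
    raw_rate, rounded_rate = rejection_rates()
    print(f"rejection rate, continuous data: {raw_rate:.3f}")
    print(f"rejection rate, rounded data:    {rounded_rate:.3f}")
```

Rounded data cannot be normally distributed, yet in this setting the test behaves almost as it does on the continuous data. The same template can be reused with violations that do matter, such as dependence.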


Students from the same year: clearly not independent (but may not show a dependence pattern), “identical” only by ignoring background (i.i.d. is constructed by selective ignorance).

(That’s the view the model carries.)


[Figure: histograms of “Teacher A results” and “Teacher B results”; x-axis: marks out of 100, y-axis: frequency.]

Means: 58.6 (A), 56.9 (B). Medians: 58 (A), 59 (B).


Can reject equality of distributions (Kolmogorov-Smirnov p = 0.012), but neither means nor medians differ significantly.


Lower outliers in B suggest median.


Favour mean: observations are not erroneous, so all observations should contribute!?


But it may not be seen as very relevant how bad exactly the clearly failed students are. (This would be different for upper outliers.)


Could look at the pass (≥ 50) rate (B clearly better) and other meaningful statistics (rank sum).


“Which aspect is of interest?” (the meaning of the data) dominates “Which model is close to the truth?”


Data analytic approach: evaluated several characteristics, got a more comprehensive picture.
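A sketch of what evaluating several characteristics side by side might look like. The two mark lists are invented for this example and are not the data behind the histograms; the helper names (`pass_rate`, `ks_statistic`, `rank_sum`) are likewise illustrative. Only descriptive statistics are computed, no p-values.

```python
import statistics

def pass_rate(marks, threshold=50):
    """Fraction of marks at or above the pass threshold."""
    return sum(m >= threshold for m in marks) / len(marks)

def ecdf(sample, x):
    """Empirical distribution function of sample evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

def rank_sum(a, b):
    """Sum of the ranks of sample a in the pooled sample (midranks for ties)."""
    pooled = list(a) + list(b)

    def midrank(v):
        below = sum(p < v for p in pooled)
        equal = sum(p == v for p in pooled)
        return below + (equal + 1) / 2

    return sum(midrank(v) for v in a)

def summarise(a, b):
    """Collect several characteristics of two groups of marks in one place."""
    return {
        "means": (statistics.fmean(a), statistics.fmean(b)),
        "medians": (statistics.median(a), statistics.median(b)),
        "pass rates": (pass_rate(a), pass_rate(b)),
        "KS statistic": ks_statistic(a, b),
        "rank sum of first group": rank_sum(a, b),
    }

if __name__ == "__main__":
    # Hypothetical marks, invented for illustration only.
    marks_a = [35, 42, 48, 52, 55, 58, 61, 64, 70, 78, 85, 92]
    marks_b = [5, 12, 51, 54, 56, 58, 59, 60, 61, 63, 65, 68]
    for name, value in summarise(marks_a, marks_b).items():
        print(f"{name}: {value}")
```

Each statistic answers a different question about the two groups, so which one to report depends on what the comparison is meant to express.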

Warning: I ran several significance tests, but this compromises the test error probabilities.

Don’t necessarily expect that a “significantly different pass rate” will reproduce.

“Exploratory” interpretation of tests.

May check on independent data.
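A rough worked calculation shows how quickly the error probability inflates. It assumes $k$ tests that are each run at level $\alpha$, that all null hypotheses hold, and that the tests are independent; the last assumption is made only to keep the arithmetic simple and would not hold for several tests on the same data.

$$P(\text{at least one false rejection}) = 1 - (1 - \alpha)^k, \qquad \alpha = 0.05,\ k = 5:\quad 1 - 0.95^5 \approx 0.226.$$

Under these assumptions, a nominal 5% error probability becomes more than 20% once five tests have been run.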


p-values: highly controversial! I use them routinely in an exploratory manner for checking whether what I spot can be explained by random variation alone.

Relying on p-values as the dominating tool to “secure” scientific findings is dangerous.

It’s not the idea that is at fault, but people want too strong results too easily.


4. The Bayes-frequentist controversy

Long-standing controversy about foundations of statistics: Should a different probability interpretation be applied?

Can problems with frequentism be resolved by (subjective or objective) Bayesian approaches?


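Bayes’s theorem:

$$P(H_1 \mid \text{data}) = \frac{P(\text{data} \mid H_1)\, P(H_1)}{\int_{H \in \mathcal{H}} P(\text{data} \mid H)\, P(H)}$$

Needed: sampling distribution $P(\text{data} \mid H)$, prior $P(H)$ on the set of sampling models $\mathcal{H}$.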
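A minimal numerical sketch of the theorem, under assumptions chosen only for illustration: the set of sampling models is a finite grid of Bernoulli success probabilities, so the integral in the denominator becomes a sum; the prior is uniform on the grid; and the data (7 successes, 3 failures) are made up.

```python
def posterior(thetas, prior, successes, failures):
    """Posterior probabilities over a finite set of Bernoulli models.

    thetas: candidate success probabilities (the sampling models H)
    prior:  prior probabilities P(H), same length as thetas
    """
    likelihood = [t ** successes * (1.0 - t) ** failures for t in thetas]  # P(data | H)
    joint = [lik * p for lik, p in zip(likelihood, prior)]                 # numerator
    evidence = sum(joint)                                                  # denominator
    return [j / evidence for j in joint]

if __name__ == "__main__":
    thetas = [i / 10 for i in range(1, 10)]     # 0.1, 0.2, ..., 0.9
    prior = [1.0 / len(thetas)] * len(thetas)   # uniform prior on the grid (an assumption)
    post = posterior(thetas, prior, successes=7, failures=3)
    for theta, p in zip(thetas, post):
        print(f"P(theta = {theta:.1f} | data) = {p:.3f}")
```

Changing `prior` changes the output, which is the practical side of the question of where the prior comes from.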


4.1 Bayesian interpretations of probability

◮ Subjective Bayes (de Finetti): probabilities measure an individual’s strength of belief in future outcomes, formalised as betting rates (epistemic probabilities).
◮ Objective Bayes (Jaynes): similar, but with the belief that objectively rational strengths of belief can be found.
◮ Hidden frequentist/“falsificationist” (Gelman): use Bayesian methodology with the sampling distribution interpreted in a frequentist way.

Only the latter can check models against data.


The big question: Where does the prior come from?

Objective Bayesians want either a non-informative prior or a strong “objective” justification from background knowledge, which is rarely available.

“Falsificationist”: could use the frequentist idea even for the prior (“all similar studies”); this could make sense for the course mark data but needs strong backing with past data.

Subjective impact cannot be suppressed.


4.2 Exchangeability

If you observe $x_1, \ldots, x_n$, future probabilities don’t depend on the order of the observations.

It’s essential for most Bayesian data analysis (at some level); it constructs “Bayesian repetition”.

De Finetti’s theorem: Exchangeability $\Leftrightarrow$

$$P(\text{data}) = \int_{H \in \mathcal{H}} P(\text{data} \mid H)\, P(H)$$

with i.i.d. model $P(\text{data} \mid H)$.


Exchangeability constructs “Bayesian repetition”; it plays a similar role as i.i.d. for frequentists.

Exchangeability implies that $P\{1\}$ in the next go doesn’t depend on whether you observe

0,0,1,0,1,1,1,0,0,1,0,1,1,0,1,1,0,0,1,0,1,0,0,1

or

0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0.

This seems counterintuitive as an epistemic assumption; it is rather a conscious decision to ignore deviations (as with i.i.d.).
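The two sequences can be fed into a concrete exchangeable model to see the effect. The sketch assumes a Beta-Bernoulli model with a uniform Beta(1, 1) prior, which is one common exchangeable model and not necessarily the one the speaker has in mind; `recent_frequency` is a hypothetical, deliberately order-sensitive rule added only for contrast.

```python
def predictive_prob_one(sequence, a=1.0, b=1.0):
    """P(next = 1 | sequence) under i.i.d. Bernoulli sampling with a Beta(a, b) prior.

    The prior is updated one observation at a time, in the observed order.
    """
    for x in sequence:
        if x == 1:
            a += 1.0
        else:
            b += 1.0
    return a / (a + b)

def recent_frequency(sequence, window=5):
    """Order-sensitive contrast: relative frequency of 1 in the last few observations."""
    tail = sequence[-window:]
    return sum(tail) / len(tail)

if __name__ == "__main__":
    seq_mixed = [0,0,1,0,1,1,1,0,0,1,0,1,1,0,1,1,0,0,1,0,1,0,0,1]
    seq_blocks = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0]
    for name, seq in (("mixed", seq_mixed), ("blocks", seq_blocks)):
        print(f"{name:>6}: exchangeable model {predictive_prob_one(seq):.3f}, "
              f"recent frequency {recent_frequency(seq):.3f}")
```

Both sequences contain the same number of ones, so the exchangeable model returns the same predictive probability for both, whereas the order-sensitive rule reacts to the long final run of zeros in the second sequence.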


From my point of view, the major philosophical problems with Bayesian statistics are about the same as with frequentism.

General problems of mathematical modelling, “creation” of repetition by i.i.d./exchangeability.

Bayes and frequentism formalise different concepts of probability (data generating mechanism vs. epistemic) that can make sense in different situations.


5. Cluster analysis and truth

Cluster analysis: methods to group data (“unsupervised classification”).


[Figure: example data sets: a scatterplot of “dc 1” against “dc 2”, a scatterplot of x against y, a scatterplot matrix of “var 1” to “var 4”, and a plot whose axis lists observation numbers.]


What are the “true” clusters?

Need to connect substantial requirements to characteristics of cluster analysis methods.

Need a formal definition to measure method quality.


Intuitive clusters? (Often in literature)

[Figure: scatter plot of y against x; both axes run from −4 to 4.]

Intuitive clusters?

[Figure: three scatter plots of y against x; in each, both axes run from −4 to 4.]

. . . but may be inappropriate, e.g., if small within-cluster distances are required.

(Normal) mixture components in mixture distribution?

[Figure: density plotted against x; x from −5 to 10, density from 0.00 to 0.30.]

[Figure: 0.48 * dnorm(x, 0, 1) + 0.48 * dnorm(x, 5, 1) + 0.04 * dnorm(x, 2.5, 1) plotted against x; x from −4 to 8, vertical axis from 0.00 to 0.15.]

. . . but may not “cluster”.
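
The mixture 0.48 * dnorm(x, 0, 1) + 0.48 * dnorm(x, 5, 1) + 0.04 * dnorm(x, 2.5, 1) from the figure can be checked numerically. The following is only a minimal sketch, assuming that the R-style `dnorm(x, mean, sd)` denotes the normal density; the helper names `normal_pdf`, `mixture_pdf` and `count_modes` and the grid settings are illustrative choices, not taken from the talk. It counts the local maxima of the mixture density on a grid and compares that count with the number of components.

```python
import math

# Mixture from the figure: (weight, mean, sd) for each normal component.
COMPONENTS = [(0.48, 0.0, 1.0), (0.48, 5.0, 1.0), (0.04, 2.5, 1.0)]


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x (the role played by R's dnorm)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components=COMPONENTS):
    """Weighted sum of the component densities at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def count_modes(lo=-5.0, hi=10.0, steps=1500, components=COMPONENTS):
    """Count strict local maxima of the mixture density on an even grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    fs = [mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps) if fs[i - 1] < fs[i] > fs[i + 1])


if __name__ == "__main__":
    print("components:", len(COMPONENTS))
    print("modes found on grid:", count_modes())
    # The small component sits at 2.5; compare the density there with its neighbours.
    for x in (2.0, 2.5, 3.0):
        print(f"density at {x}: {mixture_pdf(x):.4f}")
```

On this grid the density has two local maxima although the model has three components: the component centred at 2.5 sits in the valley between the other two and does not form a peak of its own. Whether it should count as a "cluster" therefore depends on the cluster concept, not on the model alone.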

Sometimes we know “true” clusters, don’t we?

[Figure: scatter plot of dc 2 against dc 1, points labelled M or F; dc 1 from −20 to 0, dc 2 from −10 to 0.]

(These look quite “unclustery”.)

Actually we know some more:

[Figure: scatter plot of dc 2 against dc 1, points labelled B or O; dc 1 from −20 to 0, dc 2 from −10 to 0.]

In reality there may be many valid classifications and only one may be known to us.

In literature, methods are advertised as finding the “natural” clusters, assumed unique.

Literature is not open about the researcher’s need to decide what kind of clusters are required in a given application. It suggests that the data can make all the decisions.

Researcher needs to “construct” cluster concept: by what kind of principle should clusters be separated?

(Small within-cluster distances, separation, probability mixture components, centroids etc.)

E.g., need to specify how the idea of a “species” is connected to genetic or phenotype measurements.
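
How much the choice of principle matters can be seen on a toy example. The following is a minimal sketch on made-up one-dimensional data; the points, the two candidate partitions and the function names `max_diameter` and `min_separation` are all illustrative assumptions and do not come from the talk. It scores the same two partitions by two of the principles listed above: small within-cluster distances and separation.

```python
from itertools import combinations

# Hypothetical toy data: a chain of equally spaced points plus a tight pair.
POINTS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.5, 10.0]

# Two candidate partitions of the same points into two clusters.
PARTITION_GAP = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], [9.5, 10.0]]
PARTITION_COMPACT = [[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 9.5, 10.0]]


def max_diameter(partition):
    """Largest within-cluster distance (smaller means more homogeneous clusters)."""
    return max(
        max((abs(a - b) for a, b in combinations(cluster, 2)), default=0.0)
        for cluster in partition
    )


def min_separation(partition):
    """Smallest distance between points in different clusters (larger means better separated)."""
    return min(
        abs(a - b)
        for c1, c2 in combinations(partition, 2)
        for a in c1
        for b in c2
    )


if __name__ == "__main__":
    for name, part in (("gap", PARTITION_GAP), ("compact", PARTITION_COMPACT)):
        print(name, "max diameter:", max_diameter(part),
              "min separation:", min_separation(part))
```

Splitting at the largest gap gives the better separation (2.5 against 1.0) but the larger within-cluster distance (7.0 against 5.0). Neither partition is the "true" one until the researcher has decided which criterion fits the application.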

6. Conclusion

◮ Foundations of statistics are quite shaky for those who hope to discover “objective truth”.

◮ Acknowledge what has to be decided subjectively (hopefully well informed).

◮ Acknowledge basic problems and limits of mathematical modelling.

◮ Acknowledge need for assumptions that cannot be checked.

◮ Motivate chosen method and (model) checks from desired interpretation.
