model assumptions and truth in statisticsucakche/presentations/modelass0115.pdf · cluster analysis...
TRANSCRIPT
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Model Assumptions and Truth in Statistics
Christian Hennig
4 February 2015
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
1. Statistics and Philosophy
Statistics is about getting knowledge from data.
What can we know from data?
How can we find out?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ What is probability? - frequentist vs. Bayesian vs. ?
Observers see what happens but probabilityis about “what could have happenedother than what happened”.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ What is probability? - frequentist vs. Bayesian vs. ?
Observers see what happens but probabilityis about “what could have happenedother than what happened”.
◮ Assessing evidence - testing hypotheses,p-values, probabilities of hypotheses. . .
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ What is probability? - frequentist vs. Bayesian vs. ?
Observers see what happens but probabilityis about “what could have happenedother than what happened”.
◮ Assessing evidence - testing hypotheses,p-values, probabilities of hypotheses. . .
often confused with interpretations of probability,but Bayesian methods can work with frequentistprobabilities.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ What is probability? - frequentist vs. Bayesian vs. ?
Observers see what happens but probabilityis about “what could have happenedother than what happened”.
◮ Assessing evidence - testing hypotheses,p-values, probabilities of hypotheses. . .
often confused with interpretations of probability,but Bayesian methods can work with frequentistprobabilities.
◮ Modelling and model assumptions -where does the model come from?How can assumptions be assessed?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Statistics vs. Data Analysis - our own “PoS vs. STS”.
Data Analysis is about much more than probability modelsand classical inference.Prediction, exploratory analysis, data compression,pattern recognition, data visualisation, stability,. . .
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Statistics vs. Data Analysis - our own “PoS vs. STS”.
Data Analysis is about much more than probability modelsand classical inference.Prediction, exploratory analysis, data compression,pattern recognition, data visualisation, stability,. . .
Data Analysts: Only use statistical models where they help,often other methods are better.
Statisticians: St. models improve pretty much everything.(And we have experimental design.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Statistics vs. Data Analysis - our own “PoS vs. STS”.
Data Analysis is about much more than probability modelsand classical inference.Prediction, exploratory analysis, data compression,pattern recognition, data visualisation, stability,. . .
Data Analysts: Only use statistical models where they help,often other methods are better.
Statisticians: St. models improve pretty much everything.(And we have experimental design.)
(Parts of the field claimed by Machine Learning now,role of “black box prediction machines”?)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Objectivity vs. subjectivity -
automatic/standardised decisions vs. researcher’sdecisions
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Objectivity vs. subjectivity -
automatic/standardised decisions vs. researcher’sdecisions
◮ Measuring quality of methodology -assumed truth (what is the truth we want to find)?prediction error? replication? interpretability?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Issues◮ Objectivity vs. subjectivity -
automatic/standardised decisions vs. researcher’sdecisions
◮ Measuring quality of methodology -assumed truth (what is the truth we want to find)?prediction error? replication? interpretability?
◮ . . . and more - theory of measurement,role of visualisation, statistics communication. . .
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Overview
1. Statistics and Philosophy
2. Mathematical models and reality
3. Frequentist probabilities
4. The Bayes-frequentist controversy
5. Cluster analysis and truth
6. Conclusion
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
2. Mathematical models and reality - some data
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Standard model:
x1, . . . , xn ∼ N (µ1, σ21) i.i.d.,
y1, . . . , ym ∼ N (µ2, σ22) i.i.d.
Estimate µ1, µ2 by means, x̄1 = 58.6, x̄2 = 56.9,teacher A students do better, but not significantly so(p = 0.455 under µ1 = µ2).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Standard model:
x1, . . . , xn ∼ N (µ1, σ21) i.i.d.,
y1, . . . , ym ∼ N (µ2, σ22) i.i.d.
Estimate µ1, µ2 by means, x̄1 = 58.6, x̄2 = 56.9,teacher A students do better, but not significantly so(p = 0.455 under µ1 = µ2).
What do the model assumptions mean,and do they make sense?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
With fitted distributions N (µ1, σ21),N (µ2, σ
22)
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
We actually know that the model assumptions are violated.The normal distribution has an unlimited value range,and produces integer values with probability zero.Actually for such reasons we know thatno data observed by humans can ever be “truly normal”.
But how much harm does this do?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
General mathematical modelling
Identification of items of perceived realitywith mathematical objectsand interpretation of results of mathematical operationsin terms of the items.
“Trivial” manipulation of counts and measurements,deterministic and stochastic models,analytical and computer models,data fitting and mechanistic models.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Constructivist view (H., 2010, Foundations of Science)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Implications for mathematical modelling◮ Science: establishing agreement in open exchange.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Implications for mathematical modelling◮ Science: establishing agreement in open exchange.◮ Mathematics is about creating a system that makes
absolute agreement possible(but only within mathematics).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Implications for mathematical modelling◮ Science: establishing agreement in open exchange.◮ Mathematics is about creating a system that makes
absolute agreement possible(but only within mathematics).
◮ Mathematical modelling is not about how things are,but about how we think and communicate about them.(Models communicate and change views of reality.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Implications for mathematical modelling◮ Science: establishing agreement in open exchange.◮ Mathematics is about creating a system that makes
absolute agreement possible(but only within mathematics).
◮ Mathematical modelling is not about how things are,but about how we think and communicate about them.(Models communicate and change views of reality.)
◮ It cannot be formally analysed how formal models arerelated to informal reality.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Implications for mathematical modelling◮ Science: establishing agreement in open exchange.◮ Mathematics is about creating a system that makes
absolute agreement possible(but only within mathematics).
◮ Mathematical modelling is not about how things are,but about how we think and communicate about them.(Models communicate and change views of reality.)
◮ It cannot be formally analysed how formal models arerelated to informal reality.
◮ Pragmatist attitude: what do we get out of it?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Realism?
Models are about personal and social perceptions.Science: can check models againstobservations from agreed measurement procedures.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Realism?
Models are about personal and social perceptions.Science: can check models againstobservations from agreed measurement procedures.
Compatible with Chang’s Active Scientific Realism :“I take reality as whatever is not subject to one’s will, andknowledge as an ability to act without being frustrated byresistance from reality.”We acknowledge reality’s “resistance”and that this is what science is about -without accessing “truth”about observer-independent reality.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Antony Gormley - Allotment
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3. Frequentist probabilities (e.g., von Mises)3.1 What do the model assumptions mean?For example,
x1, . . . , xn ∼ N (µ1, σ21) i.i.d.,
y1, . . . , ym ∼ N (µ2, σ22) i.i.d.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
What do the model assumptions mean?
“We think of the situation as . . . ”◮ Potentially infinite repetition (of experimental conditions)◮ P(A): relative frequency limit of occurrence of A
(e.g., normal distribution is defined by P(A) ∀A.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
What do the model assumptions mean?
“We think of the situation as . . . ”◮ Potentially infinite repetition (of experimental conditions)◮ P(A): relative frequency limit of occurrence of A
(e.g., normal distribution is defined by P(A) ∀A.)
This is obviously an idealisation -and what constitutes a “repetition”?
“Whatever can be distinguished cannot be identical.”(B. de Finetti)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.2 Independent repetitions (“i.i.d.”)
Frequentism relies on “repetitions of experiments”,e.g. results from different patients in clinical study,different students in the exam.
“x1, . . . , xn ∼ N (µ1, σ21) i.i.d.” defines
a probability distribution for the full dataset,e.g., P(x1 ∈ A1, x2 ∈ A2) = P(x1 ∈ A1)P(x2 ∈ A2).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.2 Independent repetitions (“i.i.d.”)
Frequentism relies on “repetitions of experiments”,e.g. results from different patients in clinical study,different students in the exam.
“x1, . . . , xn ∼ N (µ1, σ21) i.i.d.” defines
a probability distribution for the full dataset,e.g., P(x1 ∈ A1, x2 ∈ A2) = P(x1 ∈ A1)P(x2 ∈ A2).
“i.i.d.” is defined in terms of probabilities.
Probabilities cannot be defined in terms ofi.i.d. repetitions.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
In order to define “i.i.d.” sequences, i.i.d. repetitions anddefining repetitions are required on different levels.
x1 x2 x3 x4 x5 ... xn ...
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
In order to define “i.i.d.” sequences, i.i.d. repetitions anddefining repetitions are required on different levels.
x1 x2 x3 x4 x5 ... xn ...
x21 x22 x23 x24 x25 ... x2n ...
x31 x32 x33 x34 x35 ... x3n ...
x41 x42 x43 x44 x45 ... x4n ...
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
In practice, there’s only one level of repetition.
The effective sample size for(in-)dependence assumptions is usually 1.
But independent repetition of some kindis always required in order to learn from data.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
In practice, there’s only one level of repetition.
The effective sample size for(in-)dependence assumptions is usually 1.
But independent repetition of some kindis always required in order to learn from data.
Independent repetition is constructed by conscious decisionto ignore potential dependencies and differences.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.3 Can frequentist model assumptions be checked?
Goodness-of-fit/misspecification tests (Mayo & Spanos)If something modelled as very unlikely happens,the model is interpreted to be falsified.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.3 Can frequentist model assumptions be checked?
Goodness-of-fit/misspecification tests (Mayo & Spanos)If something modelled as very unlikely happens,the model is interpreted to be falsified.
For example, could “falsify” independencefrom data where students at same tabletend to have similar results.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.3 Can frequentist model assumptions be checked?
Goodness-of-fit/misspecification tests (Mayo & Spanos)If something modelled as very unlikely happens,the model is interpreted to be falsified.
For example, could “falsify” independencefrom data where students at same tabletend to have similar results.
Implicitly assumes tables to be i.i.d.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Dependence (between what’s “close”) can only be foundby contrast to independence between what’s further away.
If everything’s dependent on everything else in same manner(or in irregularly different manners),this cannot be found.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.4 The goodness-of-fit paradox (H, 2007)
Checking the model assumptions violates them automaticallybecause the possibility of unlikely eventsis constitutive part of the models.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.5 “Approximately true?”
Model assumptions are violated -is it good enough if they are “approximately true”?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
“The model assumptions hold approximately” -precise meaning could only be:“We assume Q ∈ P and there is a true P so thatminQ∈P d(P,Q) is small.”
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
“The model assumptions hold approximately” -precise meaning could only be:“We assume Q ∈ P and there is a true P so thatminQ∈P d(P,Q) is small.”
Dissimilarity measure d is required.
Distances for data analysis (L. Davies):d(P,Q) small ⇔ difficult to distinguish data from P,Q.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Problem: In every small d -neighbourhood of Qthere are distributions that behave very differently.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Problem: In every small d -neighbourhood of Qthere are distributions that behave very differently.
Q = N (µ, σ2) has expectation µ.Expectation of P may be anywhere or not exist.Likelihood ratio of Q and P may take any value.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
If P 6= N (µ, σ2), what is estimated estimating µ?
Median, expectation, mode are identical for N (µ, σ2),but different for asymmetric P.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
If P 6= N (µ, σ2), what is estimated estimating µ?
Median, expectation, mode are identical for N (µ, σ2),but different for asymmetric P.
Robustness theory: what characteristics of distributionsdon’t change much in d -neighbourhoods?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
If P 6= N (µ, σ2), what is estimated estimating µ?
Median, expectation, mode are identical for N (µ, σ2),but different for asymmetric P.
Robustness theory: what characteristics of distributionsdon’t change much in d -neighbourhoods?
In any case, cannot rule out too irregular distributions(such as complex dependence patterns).Some assumptions need to be imposed without checking.
Should at least reflect our perception.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
3.6 “Frequentism-as-model”, a new perspective
(Frequentist) models are still useful to. . .◮ communicate researcher’s perception of situation,◮ inspire methodology,◮ check quality of methodology
in situations with known (made up) truth.(Tukey, L. Davies)
◮ give an idea of potential variation to be expected.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Re-formulation of “model checking” :Find out whether data could leadstatistical method (derived from model) astray(requires statistical theory/simulations).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Re-formulation of “model checking” :Find out whether data could leadstatistical method (derived from model) astray(requires statistical theory/simulations).
Some violations are not problematic(e.g., g.o.f.-testing is rarely problematic,discreteness of data assumed normal doesn’t hurt much).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Re-formulation of “model checking” :Find out whether data could leadstatistical method (derived from model) astray(requires statistical theory/simulations).
Some violations are not problematic(e.g., g.o.f.-testing is rarely problematic,discreteness of data assumed normal doesn’t hurt much).
Depends on aim of analysis (researcher’s construct)which features of model are important.
Major motivation of methodology should come fromdesired interpretation of results.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Students from same year: clearly not independent(but may not show dependence pattern),“identical” only by ignoring background(iid is constructed by selective ignorance).
(That’s the view the model carries.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Means: 58.6 (A), 56.9 (B)Medians: 58 (A), 59 (B).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 1000
12
34
510 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Can rejectequality of distributions (Kolmogorov-Smirnov p = 0.012),neither means nor medians differ significantly.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Lower outliers in B suggest median.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Favour mean: observations are not erroneous,so all observations should contribute!?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
But it may not be seen as very relevanthow bad the clearly failed students exactly are.(Would be different for upper outliers.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
Could look at pass (≥ 50) rate (B clearly better)and other meaningful statistics (rank sum).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Teacher A results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
01
23
45
10 20 30 40 50 60 70 80 90 100
Teacher B results
Marks out of 100
Fre
quen
cy
0 20 40 60 80 100
05
1015
10 20 30 40 50 60 70 80 90 100
“Which aspect is of interest?”/meaning of datadominates “Which model is close to the truth.”
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Data analytic approach:evaluated several characteristics,got more comprehensive picture.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
Data analytic approach:evaluated several characteristics,got more comprehensive picture.
Warning:I ran several significance tests,but this compromises test error probabilities.
Don’t necessarily expect that“significant different pass rate” will reproduce.
“Exploratory” interpretation of tests.
May check on independent data.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
p-values : highly controversial!I use them routinely in exploratory mannerfor checking whether what I spotcan be explained by random variation alone.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What do the model assumptions mean?Independent repetitionsCan frequentist model assumptions be checked?The goodness-of-fit paradoxApproximately true?“Frequentism-as-model”
p-values : highly controversial!I use them routinely in exploratory mannerfor checking whether what I spotcan be explained by random variation alone.
Relying on p-values as dominating toolto “secure” scientific findings is dangerous.
It’s not the idea that is at faultbut people want too strong results too easily.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
4. The Bayes-frequentist controversy
Long-standing controversy about foundations of statistics:Should a different probability interpretation be applied?
Can problems with frequentism be resolved by(subjective or objective) Bayesian approaches?
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
Bayes’s theorem:
P(H1|data) =P(data|H1)P(H1)∫
H∈HP(data|H)P(H)
Needed: sampling distribution P(data|H),prior P(H) on set of sampling models H.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
4.1 Bayesian interpretations of probability◮ Subjective Bayes (de Finetti) Probabilities measure
individual’s strength of belief in future outcomes,formalised as betting rates (epistemic probabilities).
◮ Objective Bayes (Jaynes) Similar, but believe thatobjectively rational strengths of belief can be found.
◮ Hidden frequentist/ “falsificationist” (Gelman)Use Bayesian methodologywith sampling distribution interpreted frequentist.
Only the latter can check models against data.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
The big question: Where does the prior come from?
Objective Bayesians want either non-informative orstrong “objective” justification from background -rarely available.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
The big question: Where does the prior come from?
Objective Bayesians want either non-informative orstrong “objective” justification from background -rarely available.
“Falsificationist”: could usefrequentist idea even for prior (“all similar studies”);could make sense for course mark databut needs strong backing with past data.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
The big question: Where does the prior come from?
Objective Bayesians want either non-informative orstrong “objective” justification from background -rarely available.
“Falsificationist”: could usefrequentist idea even for prior (“all similar studies”);could make sense for course mark databut needs strong backing with past data.
Subjective impact cannot be suppressed.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
4.2 Exchangeability
If you observe x1, . . . , xn, future probabilitiesdon’t depend on order of observations.
It’s essential for most Bayesian data analysis(at some level) - constructs “Bayesian repetition”.
De Finetti’s theorem:Exchangeability ⇔ P(data) =
∫H∈H
P(data|H)P(H)with i.i.d. model P(data|H).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
Exchangeability constructs “Bayesian repetition”;similar role as i.i.d. for frequentists.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
Exchangeability constructs “Bayesian repetition”;similar role as i.i.d. for frequentists.
Exchangeability implies thatP{1} in the next go doesn’t depend onwhether you observe0,0,1,0,1,1,1,0,0,1,0,1,1,0,1,1,0,0,1,0,1,0,0,1 or
0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0.
Seems counterintuitive as epistemic assumption,rather conscious decision to ignore deviations (as iid).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
From my point of view,the major philosophical problemswith Bayesian statisticsare about the same as with frequentism.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
From my point of view,the major philosophical problemswith Bayesian statisticsare about the same as with frequentism.
General problems of mathematical modelling,“creation” of repetition by i.i.d./exchangeability.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Bayesian interpretations of probabilityExchangeability
From my point of view,the major philosophical problemswith Bayesian statisticsare about the same as with frequentism.
General problems of mathematical modelling,“creation” of repetition by i.i.d./exchangeability.
Bayes & frequentismformalise different concepts of probability(data generating mechanism vs. epistemic)that can make sense in different situations.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
5. Cluster analysis and truth
Cluster analysis: methods to group data(“unsupervised classification”).
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
−20 −15 −10 −5 0
−10
−8
−6
−4
−2
0
dc 1
dc 2
−4 −2 0 2 4
−4
−2
02
4
x
y
var 1
−0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.2
−0.
4−
0.2
0.0
0.2
0.4
−0.
4−
0.2
0.0
0.2
0.4
var 2
var 3
−0.
20.
00.
20.
4
−0.4 −0.2 0.0 0.2 0.4
−0.
4−
0.2
0.0
0.2
−0.2 0.0 0.2 0.4
var 4
81 82 66 70 68 73 76 65 63 61 67 72 74 37 75 71 69 64 41 62 46 54 47 51 45 38 42 48 56 58 43 57 60 49 55 50 39 40 53 44 36 59 52 8 12 30 26 16 32 5 4 9 19 28 3 23 20 17 11 6 18 31 25 10 35 34 29 7 1 15 14 2 13 21 22 27 24 33 80 79 77 78 171
170
173
172 90 85 93 84 83 88 87 86 89 91 92 94 95 96 98 102
101
100 97 104 99 103
106
105
168
169
117
151
136
156
116
138
148
166
167
133
121
165
131
149
142
120
111
127
130
112
135
126
119
122
155
109
115
145
125
152
124
154
158
108
107
114
144
143
157
132
146
159
150
110
137
113
123
141
164
139
118
128
134
140
162
153
129
147
160
161
163
234
226
235
236
231
233
229
228
222
230
227
232
224
223
225
208
182
177
198
197
185
199
219
189
183
181
220
216
205
194
178
200
176
193
179
195
215
190
188
175
191
201
184
203
186
211
210
209
217
204
206
207
213
196
180
187
202
212
192
218
214
174
221
818266706873766563616772743775716964416246544751453842485658435760495550394053443659528123026163254919283232017116183125103534297115142132122272433807977781711701731729085938483888786899192949596981021011009710499103106105168169117151136156116138148166167133121165131149142120111127130112135126119122155109115145125152124154158108107114144143157132146159150110137113123141164139118128134140162153129147160161163234226235236231233229228222230227232224223225208182177198197185199219189183181220216205194178200176193179195215190188175191201184203186211210209217204206207213196180187202212192218214174221
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
What are the “true” clusters?
Need to connect substantial requirements tocharacteristics of cluster analysis methods.
Need a formal definition to measure method quality.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Intuitive clusters? (Often in literature)
−4 −2 0 2 4
−4
−2
02
4
x
y
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Intuitive clusters?
−4 −2 0 2 4
−4
−2
02
4
x
y
−4 −2 0 2 4
−4
−2
02
4
x
y
−4 −2 0 2 4
−4
−2
02
4
x
y
. . . but may be inappropriate, e.g.,if small within-cluster distances required.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
(Normal) mixture components in mixture distribution?
−5 0 5 10
0.00
0.05
0.10
0.15
0.20
0.25
0.30
x
dens
ity
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
(Normal) mixture components in mixture distribution?
−5 0 5 10
0.00
0.05
0.10
0.15
0.20
0.25
0.30
x
dens
ity
−4 −2 0 2 4 6 8
0.00
0.05
0.10
0.15
x
0.48
* d
norm
(x, 0
, 1)
+ 0
.48
* dn
orm
(x, 5
, 1)
+ 0
.04
* dn
orm
(x,
2
.5, 1
)
. . . but may not “cluster”.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Sometimes we know “true” clusters, don’t we?
M
M
M
MM
M
M
M
M
M MM
M
MM
MM
M
M
M M
MMM
MMM M
M
M
M
M
M
MM
MMM
M
MM
MM
M
M
M
M
M
M
M
F
F
F FF
F
FF
F
F
FF
F
F
F
F
FF
F
F
F
F
F
F
FF
F
F
F
F
F
FF
F
F
F
F
F
FF
F
F
F
F
F
F
FF
F
F
MMM
MM
M
M
M
M
MM
MM
MM
M M
M
M
MMMM
M
M
M
M
M
M
M
M
M
M
MM
M MM
M
M
M
M
M
MM
M
MMM
M
F
FF
F
F
FF
F
F
F
F
F
FF
FF
FFFF
F
F
F
F
F
F
F
F
F
F
F
F
F
FF
F
F
F
F
F
F
F
F
FF
F
F
F
F
F
−20 −15 −10 −5 0
−10
−8
−6
−4
−2
0
dc 1
dc 2
(These look quite “unclustery”.)
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Actually we know some more:
B
B
B
BB
B
B
B
B
B BB
B
BB
BB
B
B
B B
BBB
BBB B
B
B
B
B
B
BB
BBB
B
BB
BB
B
B
B
B
B
B
B
B
B
B BB
B
BB
B
B
BB
B
B
B
B
BB
B
B
B
B
B
B
BB
B
B
B
B
B
BB
B
B
B
B
B
BB
B
B
B
B
B
B
BB
B
B
OOO
OO
O
O
O
O
OO
OO
OO
O O
O
O
OOOO
O
O
O
O
O
O
O
O
O
O
OO
O OO
O
O
O
O
O
OO
O
OOO
O
O
OO
O
O
OO
O
O
O
O
O
OO
OO
OOOO
O
O
O
O
O
O
O
O
O
O
O
O
O
OO
O
O
O
O
O
O
O
O
OO
O
O
O
O
O
−20 −15 −10 −5 0
−10
−8
−6
−4
−2
0
dc 1
dc 2
In reality there may be many valid classificationsand only one may be known to us.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
In literature, methods are advertised asfinding the “natural” clusters, assumed unique.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
In literature, methods are advertised asfinding the “natural” clusters, assumed unique.
Literature is not open about researcher’s needto decide what kind of clusters are requiredin given application.Suggests data can make all the decisions.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
Researcher needs to “construct” cluster concept:by what kind of principle should clusters be separated?
(Small within-cluster distances, separation,probability mixture components, centroids etc.)
E.g., need to specify how the idea of a “species”is connected to genetic or phenotype measurements.
Christian Hennig Model Assumptions and Truth in Statistics
Statistics and PhilosophyMathematical models and reality
Frequentist probabilitiesThe Bayes-frequentist controversy
Cluster analysis and truth
6. Conclusion◮ Foundations of statistics are quite shaky
for those who hope to discover “objective truth”.◮ Acknowledge what has to be decided subjectively
(hopefully well informed).◮ Acknowledge basic problems and limits
of mathematical modelling.◮ Acknowledge need for assumptions
that cannot be checked.◮ Motivate chosen method and (model) checks
from desired interpretation.
Christian Hennig Model Assumptions and Truth in Statistics