team: l. gustavo nardin, berna devezer, bert baumgaertner
Post on 22-Dec-2021
THEORY, SOCIAL ASPECTS, AND PRACTICE OF REPRODUCIBILITY
Team: L. Gustavo Nardin, Berna Devezer, Bert Baumgaertner, Erkan Buzbas
Updated: 2017/09/26
Overview
1. Philosophy, Bert Baumgaertner
2. Theory, Erkan Buzbas
3. Agent Based Model, L. Gustavo Nardin
Karl Popper (1902-1994)
Falsifiability; inductive skepticism (confirmation is a myth); deductive logic; logical empiricism; foundationalism.
Foundationalism, Holism, and Division of Labor
# Contra Popper, confirmation ought to play a role.
# Holism: no hypothesis can be tested in isolation; all tests are done against a backdrop of assumptions and background knowledge (K). There is no “view from nowhere” and inferences are conditioned on K.
# Science is a collective enterprise. There is a cognitive division of labor and social structure matters.
A brief flavor of the history. . .
Robert Boyle (1627-1691)
Science is a way to resolve disputes by bringing relevant questions into contact with an experimental apparatus. E.g., Aristotelian physics and vacuums.
Systematic experiment is institutionally organized (the Boyle circle established the Royal Society).
Francis Bacon (1561-1626)
# Inspired Boyle with the idea of a cooperative research institution.
# Science by “induction”: gather lots of particular facts and observations, then generalize and make hypotheses.
# Said process requires experimentation and systematic reporting of results.
Roger Bacon (1214/20-1292)
# Aim: develop a method of science, like logic tests for validity; certification of knowledge (halos are visual errors).
# Role of experiment: confirm, refute, or challenge theoretical claims. He repeated experiments himself.
And more. . .
# Avicenna (980–1037) Persian physician and philosopher; principle of replication in medical experiments.
# Alhazen (965–1040) Theoretical physicist and polymath; science is a repeating cycle of observations, hypothesis, experimentation, and independent verification.
# Al-Biruni (973–1048) Islamic scholar and polymath; experiments should be repeated to mitigate human and instrumentation error.
# Athenaeus of Naucratis (late 2nd to early 3rd century) Wrote of an Egyptian judge ordering repeated experiments to determine whether criminals condemned to death by venomous asps could survive by eating lemons beforehand.
# Galen of Pergamon (c. 129–200 AD) Roman physician and philosopher; repeated observations.
Reproducibility in Science
Multiple independent validations of scientific claims lend credibility to those claims. This is referred to as the reproducibility of research findings.
# Many disciplines fail to reproduce their major findings.
# A number of factors have been suggested to contribute to the nonreproducibility problem.
# A variety of solutions have been offered.
# No underlying theory guiding these efforts.
Insights from Philosophy and Motivation
1. The issues surrounding reproducibility of results are not new to science.
2. Open, collaborative practices in scientific inquiry have been intuitively proposed as a key to solve these issues.
3. A formal theory to validate this intuition is needed for a technical discussion of reproducibility.
Our programme is to provide precise definitions for concepts involved in (1) and (2), and build a theory satisfying (3).
Two Theoretical References
Ioannidis (2005). Why Most Published Research Findings Are False.
McElreath and Smaldino (2015). Replication, Communication, and the Population Dynamics of Scientific Discovery.
A Model-Centric View of Scientific Inquiry
# A model-centric view of scientific inquiry: There exists a true model generating the data and the goal is to approximate it.
# A hypothesis-centric view of scientific inquiry: Make claims about plausible values of parameters, assuming that a model is true.
Idealized Experiment under Model-Centric View
[Diagram labels: NATURE, True Model (T), Background knowledge (K), Data (D), Proposed Model (M), Method (S), E]
E = (K, D, M, S) is an idealized experiment.
Method under Model-Centric View
[Diagram labels: NATURE, True Model (T), Background knowledge (K), Data (D), Proposed Model (M), Method (S), E]
Method measures model performance:
1. Rewards model fit: P(M|D,K) > P(M|K)
2. Penalizes model complexity, everything else being equal
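One criterion with both properties of a Method is the Akaike information criterion (AIC), which the Experiment slide names as the comparison method. As a reminder only, and assuming a linear model with Gaussian errors fitted by least squares (an assumption; the slides do not spell out the likelihood), with k estimated parameters, n observations, maximized likelihood \hat{L}, and residual sum of squares RSS:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\Longrightarrow\qquad
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}
```

Lower is better: the first term falls as the fit improves, the 2k term grows with every added parameter, and the constant depends only on n, so it cancels when two models are compared on the same data.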
Comparison of Experiments under Model-Centric View
In comparisons of model performance, a useful construct is a current "best" model, which we call a Global Model (G). Method then assigns a score S(M, G).
[Diagram labels: NATURE, True Model (T), Global Model (G), Background knowledge (K), Data (D), Proposed Model (M), Method (S), E]
Reproducibility under Model-Centric View
[Diagram: two experiments, E1 with (K1, D1, M1, S) and E2 with (K2, D2, M2, S), each labeled with NATURE, True Model (T), and Global Model (G)]
The result of E1 is reproduced by E2 only if:
1. S(M1, G) and S(M2, G) agree
2. E2 knows (1)
(2) is true only if K2 includes the information given in (1). This implies that Open Science is a logical necessity for the reproducibility of scientific results.
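The two conditions can be written as a small check. This is a sketch under stated assumptions, not the authors' formalization: "agree" is read here as both scores falling on the same side of 1 (the slides later use S(M, G) > 1 for "M improves G"), the `Experiment` class and its field names are hypothetical, and the score values are made up for illustration. Because the slide states "only if", the function tests necessary conditions, not sufficient ones.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    score: float  # S(M, G) assigned by the Method (illustrative value)
    # Part of K: names of earlier experiments whose scores this one has access to.
    known_results: set = field(default_factory=set)


def improves(score: float) -> bool:
    """Assumed reading of the score: M improves G when S(M, G) > 1."""
    return score > 1.0


def necessary_conditions_hold(e1: Experiment, e2: Experiment) -> bool:
    """Check the two 'only if' conditions for E2 reproducing the result of E1."""
    scores_agree = improves(e1.score) == improves(e2.score)  # condition 1
    e2_knows = e1.name in e2.known_results                   # condition 2
    return scores_agree and e2_knows


e1 = Experiment("E1", score=1.4)
e2_open = Experiment("E2", score=1.2, known_results={"E1"})
e2_closed = Experiment("E2", score=1.2)  # same score, but E1's result is not in K2

print(necessary_conditions_hold(e1, e2_open))
print(necessary_conditions_hold(e1, e2_closed))
```

The second call fails on condition 2 alone, which is the point of the slide: without access to E1's result, E2 cannot count as a reproduction even when the scores agree.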
From Scientific Inquiry to Scientific Process
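The Global Model is updated in a sequence of experiments. Later experiments have some knowledge of previous experiments. Experiments are of the same type.
[Diagram: under the True Model (T), experiment E1 (D1, K1, S, M1) yields the score S(M1, G0) and moves G0 to G1; experiment E2 (D2, K2, S, M2) yields S(M2, G1) and leads to G2]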
Scientific Process with Multiple Strategies
Given G_t, at time t:
1. K_t uses
  ◦ M = {M_1, M_2, ..., M_L}
  ◦ R = {R_1, R_2, ..., R_J}
  to probabilistically choose a research strategy R_j and propose a model M_t
2. D_t is generated under T
3. S(M_t, G_t) determines G_{t+1}
4. Convergence of the scientific community to T is monitored
Theoretical Model Probabilities
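The system defines a Markov chain transitioning between global models. Transition probabilities are given by:
\[
\begin{aligned}
P(M \mid G_t) &= \sum_R P(M \mid R, G_t)\, P(R) \\
              &= \sum_R P(M \text{ improves } G_t)\, P(\text{propose } M \mid R, G_t)\, P(R) \\
              &= \sum_R P\big(S(M, G_t) > 1\big)\, \frac{1}{\#\{M_R\}}\, P(R)
\end{aligned}
\]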
Theoretical Model Probabilities
Assigning positive probabilities implies the existence of a stationary distribution for this Markov chain: in the long term, the probability that each model in the model space becomes the global model converges to a constant that is independent of the current global model.
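The independence from the current global model can be seen numerically on a toy chain. The sketch below is illustrative only: the 3-state transition matrix is hypothetical (it is not derived from the model in these slides), and power iteration is just one simple way to approximate a stationary distribution. Every entry is positive, matching the assumption on the slide.

```python
def step(dist, P):
    """One transition: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, start, iters=200):
    """Approximate the stationary distribution by repeated transitions."""
    dist = start[:]
    for _ in range(iters):
        dist = step(dist, P)
    return dist


# Hypothetical transition probabilities between three candidate global models.
# Each row sums to 1 and every entry is positive.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

from_first = stationary(P, [1.0, 0.0, 0.0])  # chain starts at model 1
from_last = stationary(P, [0.0, 0.0, 1.0])   # chain starts at model 3

print(from_first)
print(from_last)
```

Both starting points should end at the same long-run probabilities up to floating-point error, which is what "independent of the current global model" means here.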
Agent Based Model
# ABM is a computational technique that enables simulating the actions and interactions of agents.
# Agents are entities that behave on the basis of behavioral mechanisms and representations.
# ABMs may be used to evaluate how different initial conditions and behavioral mechanisms affect the system outcomes. Thereby they enable
  ◦ users to gain a deeper understanding of the system’s causal relationships and
  ◦ policy-makers to analyze the effects of different policies in silico before intervening on the real system.
Objective
Develop an ABM of scientific discovery that incorporates behavioral, institutional, and methodological aspects, to inform evidence-based policies for addressing the problem of nonreproducibility.
ABM Description
# Set of heterogeneous scientist agents
# that act using different strategies for doing their research
# aiming to converge on the model that represents the phenomenon under investigation, the True Model.
Agents: Replicator Rey, Robust Rob, Concept Cara, Novel Nell
Research Questions
1. Does the scientific community converge on the True Model?
2. Does the complexity of the True Model affect reproducibility?
3. Does scientists' behavior affect reproducibility?
ABM Structure
# True model: β1x1 + β2x2 + β12x1x2
Model under which the investigated phenomenon generates the data.
[Figure: probability density plotted against the model mean]
# Global model: β1x1
Model that the scientific community agrees on as generating the data at a certain moment in time.
# Scientific Community
Agents: Replicator Rey, Robust Rob, Concept Cara, Novel Nell
Scientist Agents
# Rey (the Replicator) conducts an exact replication of the study that immediately precedes her in the population.
# Rob (the Robust) proposes a new model from the subset of models that are one term away from the Global Model.
# Cara (the Concept) proposes a new model by adding a new interaction to the Global Model.
# Nell (the Novel) does not build off of the literature but comes up with her original ideas, choosing a random model from the Universe of Models.
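The four strategies can be sketched as functions that each return a proposed model. Everything in this block is an assumption made for illustration: a model is a frozenset of terms, a term is a tuple of predictor indices, and the universe is simply every non-empty set of terms over three predictors (127 models, which is not the 14-model universe used in the experiment). The fallback in `cara` for a Global Model that already holds every interaction is also an assumption.

```python
import random
from itertools import combinations

PREDICTORS = (1, 2, 3)
# Main effects (1,), (2,), (3,) and interactions (1, 2), ..., (1, 2, 3).
ALL_TERMS = [c for r in range(1, 4) for c in combinations(PREDICTORS, r)]
INTERACTIONS = [t for t in ALL_TERMS if len(t) > 1]
# Hypothetical universe: every non-empty set of terms.
UNIVERSE = [
    frozenset(s)
    for r in range(1, len(ALL_TERMS) + 1)
    for s in combinations(ALL_TERMS, r)
]


def rey(global_model, previous_model, universe, rng):
    """Replicator: repeat the immediately preceding study's model exactly."""
    return previous_model


def rob(global_model, previous_model, universe, rng):
    """Robust: pick a model exactly one term away from the Global Model."""
    neighbors = [m for m in universe if len(m ^ global_model) == 1]
    return rng.choice(neighbors)


def cara(global_model, previous_model, universe, rng):
    """Concept: add one interaction that the Global Model lacks."""
    candidates = [global_model | {t} for t in INTERACTIONS if t not in global_model]
    candidates = [m for m in candidates if m in universe]
    # Assumed fallback: keep the Global Model when no interaction can be added.
    return rng.choice(candidates) if candidates else global_model


def nell(global_model, previous_model, universe, rng):
    """Novel: ignore the literature and draw any model from the universe."""
    return rng.choice(universe)


rng = random.Random(0)
global_model = frozenset({(1,), (2,), (3,), (1, 2)})  # b1x1 + b2x2 + b3x3 + b12x1x2
previous_model = frozenset({(1,), (3,)})              # b1x1 + b3x3

proposals = {
    f.__name__: f(global_model, previous_model, UNIVERSE, rng)
    for f in (rey, rob, cara, nell)
}
for name, model in proposals.items():
    print(name, sorted(model))
```

Passing the universe and the random generator as arguments keeps each strategy a pure function of its inputs, which makes the strategies easy to swap or reweight in a simulation loop.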
ABM Dynamics: Initialization
Given a set of input parameters, the ABM:
# generates the Scientific Community (Prop. Agent Strategies).
# generates the Universe of Scientific Models (Predictors).
# defines the deterministic part of the True Model by randomly generating
  ◦ a set of predictor values (Sample Size and Correlation), and
  ◦ β values (Error Variance).
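The last initialization step can be sketched as follows. The slides do not say how predictors or β values are drawn, so the choices here are assumptions: standard normal predictors that share one common pairwise correlation r (built from a shared component, which requires 0 <= r <= 1), and β values drawn uniformly from [-1, 1]. How the Error Variance parameter enters the β values is not specified in the slides and is left out.

```python
import math
import random


def correlated_predictors(n, k, r, rng):
    """Draw n rows of k unit-variance predictors with pairwise correlation r."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # component common to all predictors in this row
        rows.append([
            math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0, 1)
            for _ in range(k)
        ])
    return rows


def deterministic_part(x, betas):
    """Sum of beta * product of predictors; a term is a tuple of 0-based indices."""
    return sum(b * math.prod(x[i] for i in term) for term, b in betas.items())


rng = random.Random(1)
xs = correlated_predictors(n=100, k=3, r=0.2, rng=rng)  # Sample Size 100, Correlation 0.2

true_terms = [(0,), (1,), (0, 1)]                    # b1x1 + b2x2 + b12x1x2
betas = {t: rng.uniform(-1, 1) for t in true_terms}  # assumed sampling scheme
mu = [deterministic_part(x, betas) for x in xs]      # deterministic part of the True Model

print(len(xs), len(xs[0]), len(mu))
```

The shared-component construction gives each pair of predictors covariance r and each predictor variance r + (1 - r) = 1, so no matrix factorization is needed for the equal-correlation case.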
ABM Dynamics (1)
# Selects a Scientist Agent randomly
[Diagram: step 1, select a scientist agent randomly from the Agents (Replicator Rey, Robust Rob, Concept Cara, Novel Nell)]
# True Model: β1x1 + β2x2 + β12x1x2
# Global Model: β1x1 + β2x2 + β3x3 + β12x1x2
ABM Dynamics (2)
# The Scientist Agent proposes a new model based on the current Global Model and its own strategy
[Diagram: step 2, the selected agent takes the Global Model and draws a Proposed Model from the Universe of Models, for example by adding or removing a term]
# True Model: β1x1 + β2x2 + β12x1x2
# Global Model: β1x1 + β2x2 + β3x3 + β12x1x2
# Proposed Model: β1x1 + β3x3
ABM Dynamics (3)
# The Proposed Model and the Global Model are compared on new data generated by the True Model (e.g., AIC, R-squared, t statistics)
[Diagram: step 3, the True Model generates new data, the Global Model and the Proposed Model are compared on it, and the result is the New Global Model]
# True Model: β1x1 + β2x2 + β12x1x2
# Global Model: β1x1 + β2x2 + β3x3 + β12x1x2
# Proposed Model: β1x1 + β3x3
# New Global Model: β1x1 + β2x2 + β3x3 + β12x1x2
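Step 3 can be sketched end to end with AIC as the comparison. This is a minimal illustration, not the authors' implementation: it assumes independent standard normal predictors, no intercept, ordinary least squares solved with a small hand-written elimination routine, all β values equal to 1, and a noise standard deviation of 0.5. Models are lists of terms, and a term is a tuple of 0-based predictor indices.

```python
import math
import random


def design_row(x, model):
    """Regressors for one observation: the product of predictors for each term."""
    return [math.prod(x[i] for i in term) for term in model]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def aic(xs, ys, model):
    """Gaussian AIC, up to an additive constant, of a least-squares fit."""
    rows = [design_row(x, model) for x in xs]
    k = len(model)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    n = len(ys)
    return n * math.log(rss / n) + 2 * k


def step(true_model, true_betas, global_model, proposed_model, n, sigma, rng):
    """Generate new data under the True Model and return the New Global Model."""
    xs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    ys = [
        sum(b * v for b, v in zip(true_betas, design_row(x, true_model)))
        + rng.gauss(0, sigma)
        for x in xs
    ]
    # The model with the lower AIC becomes (or stays) the Global Model.
    if aic(xs, ys, proposed_model) < aic(xs, ys, global_model):
        return proposed_model
    return global_model


TRUE = [(0,), (1,), (0, 1)]          # b1x1 + b2x2 + b12x1x2
GLOBAL = [(0,), (1,), (2,), (0, 1)]  # b1x1 + b2x2 + b3x3 + b12x1x2
PROPOSED = [(0,), (2,)]              # b1x1 + b3x3

new_global = step(TRUE, [1.0, 1.0, 1.0], GLOBAL, PROPOSED,
                  n=100, sigma=0.5, rng=random.Random(0))
print(new_global == GLOBAL)
```

With these settings the Global Model should be retained, as in the slide's example, because the Proposed Model omits two terms of the True Model and fits far worse than its smaller size can offset.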
Experiment
Parameter            Values
Replications         100
Timesteps            10000
Universe of Models   14 (3 predictors)
Sample Size          100
Comparison Method    AIC
True Model           β1x1 + β2x2 + β3x3
                     β1x1 + β2x2 + β12x1x2
                     β1x1 + β2x2 + β3x3 + β12x1x2 + β13x1x3 + β23x2x3 + β123x1x2x3
Error Variance       20%, 60%
Correlation          0.2, 0.8
Experiment
Parameter            Values
Agent Types          5 combinations
# 4 combinations in which the Scientific Community is dominated by 85% of one Agent Strategy, with 5% each of the other three Agent Strategies
# 1 combination in which the Scientific Community is composed of 25% of each Agent Strategy
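The five community compositions, and the random selection of an agent from one of them, can be sketched as below. The function and variable names are hypothetical; only the proportions (85% / 5% and 25% each) come from the slide.

```python
import random

STRATEGIES = ("Rey", "Rob", "Cara", "Nell")


def community_compositions(dominant=0.85, minority=0.05):
    """Four communities dominated by one strategy, plus one balanced community."""
    combos = []
    for lead in STRATEGIES:
        combos.append({s: dominant if s == lead else minority for s in STRATEGIES})
    combos.append({s: 1 / len(STRATEGIES) for s in STRATEGIES})
    return combos


def draw_agent(composition, rng):
    """Select a scientist agent at random, weighted by the strategy proportions."""
    names = list(composition)
    weights = [composition[s] for s in names]
    return rng.choices(names, weights=weights, k=1)[0]


combos = community_compositions()
agent = draw_agent(combos[0], random.Random(0))  # a Rey-dominated community

for combo in combos:
    print(combo)
```

Drawing from proportions, rather than building an explicit list of agents, is a simplification: it treats the community as large enough that selection does not change its makeup.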
Results: Main Effect of the True Model Complexity
The more complex the True Model, the longer it stays as the Global Model.
Results: Main Effect of the Agent Strategy
Although the Agent Strategies have similar means and standard deviations, the blobs in the plot indicate that there are interaction effects.
Results: True Model and Agent Strategy Interaction
Cara leads to the lowest proportion of True Model selection as Global Model under the simplest True Model because the True Model lacks interactions.
Research Questions
1. Does the scientific community converge on the True Model? Yes, although the convergence is not always stable.
2. Does the complexity of the True Model affect reproducibility? Yes, it does. Simple True Models tend to be reproduced less than more complex models.
3. Does scientists' behavior affect reproducibility? The dominant Agent Strategy has a small effect on the reproducibility of True Models.