What I learned last summer....
or
Physics is a “hard” science, biology is a “difficult” one.....
or
Hard-to-Define-Events 2013.1
Bradly Alicea
http://www.msu.edu/~aliceabr http://syntheticdaisies.blogspot.com
If your results are unpredictable, does it make them any less true?
Artificial Life XIII Conference, East Lansing, MI, July 2012
Recursive me! Giving this talk at HTDE 2012.
Residuals of the workshop hosted at Synthetic Daisies and Vimeo (videos).
It’s not the phenomenon, it’s you………
Pashler, H. and Harris, C.R. Is the Replicability Crisis Overblown? Three Arguments Examined. Perspectives on Psychological Science, 7, 531 (2012).
1) Direct vs. conceptual replication:
* successful direct replication can validate findings, but conceptual replication (between research groups) is often a more attainable goal.
Conceptual replication: same type of experiment without replicating exact conditions.
TRADEOFF: generalization vs. accuracy.
MY INTERPRETATION (figure labels): General tendencies (THEORY) vs. Accurate Repetition (EMPIRICISM).
2) Science is ultimately self-correcting:
* given enough time, the consensus of the scientific community will prevail (e.g. wisdom of the crowd, swarm intelligence).
* in the absence of information, people will flock to ideas that sound good (e.g. popular fads, internet memes).
Is Science Ultimately Self-correcting? A historical model of scientific consensus
Heliocentrism, 17th century Astronomy
One gene, One protein, 20th century Biology
Cultural Relativism, 20th century Anthropology
Curved Spacetime, 20th century Physics
Phrenology, 19th century Psychology
Plate Tectonics, 20th century Geology
Lamarckism, 18th century Zoology
Rates of change differ, all geometry is qualitative
MY INTERPRETATION
CONSENSUS THOUGHT
“The Half-life of Facts”, Samuel Arbesman
Facts (scientific and otherwise) decay at a certain rate (see book):
* overturned (lose consensus status).
* hard-to-kill ideas (useless but still popular).
Are most “true” results wrong? And why?
False positive report probability (H_SIGNIFICANT : H_0)
Level of significance (e.g. 0.05)
Statistical power (vs. TYPE II error rate)
High rate of nonreplication: formal analysis of all statistically significant hypotheses (supported null hypotheses are excluded).
Biases include: experimental design, data analysis, and presentation factors (technical variation).
* this list assumes technical variation is likely always bad (e.g. particle physics envy).
Ioannidis, J.P.A. Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124 (2005).
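The quantities named on this slide (level of significance, statistical power, bias) can be combined into a positive predictive value, PPV, following the formula in the cited Ioannidis paper. The sketch below is only an illustration: the function names, the pre-study odds values, and the example numbers are assumptions chosen for this example, not values from the talk.

```python
def ppv(alpha, power, prior_odds, bias=0.0):
    """Positive predictive value of a statistically significant finding.

    alpha      -- level of significance (Type I error rate, e.g. 0.05)
    power      -- statistical power, 1 - beta (beta = Type II error rate)
    prior_odds -- pre-study odds R that a probed relationship is true
    bias       -- proportion u of analyses reported as positive only due to bias
    """
    beta = 1.0 - power
    true_positives = power * prior_odds + bias * beta * prior_odds
    all_positives = (prior_odds + alpha - beta * prior_odds
                     + bias - bias * alpha + bias * beta * prior_odds)
    return true_positives / all_positives


def false_positive_report_probability(alpha, power, prior_odds, bias=0.0):
    """Probability that a 'significant' finding is actually a false positive."""
    return 1.0 - ppv(alpha, power, prior_odds, bias)


if __name__ == "__main__":
    # Hypothetical scenarios: a well-powered confirmatory study versus an
    # underpowered exploratory one with low pre-study odds.
    print(ppv(alpha=0.05, power=0.8, prior_odds=1.0))
    print(ppv(alpha=0.05, power=0.2, prior_odds=0.1))
```

With the bias term left at zero this reduces to PPV = (1 − β)R / (R − βR + α), so low power and low pre-study odds both pull the PPV down.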
Findings are less likely to be true (for a given field), i.e. have a low PPV (positive predictive value), when:
1) the studies conducted are smaller (low N, sparse data).
2) the effect sizes are smaller (small 1 – β, low sensitivity).
3) the number of potential relationships is greater.
4) the flexibility of designs and analytical modes is greater.
SOLUTION: large studies with low levels of bias (e.g. post-modernist envy).
* easy to suggest, harder to do.
What are biases that affect the practice of science?
Ioannidis, J.P.A. Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124 (2005).
“Incestuous Amplification” Effect: a small set of ideas is perpetually circulated among people in a certain field or social group without external feedback.
Inattentional Blindness: biases lead to an inability to notice features of data or theory that would otherwise be obvious.
“When objective information is low, follow the herd”: absent informed dissent or debate, argumentum ad populum (popular but incorrect ideas) tends to predominate.
Let’s be random…….
Number 8, Jackson Pollock
Random Walk algorithm, 1000 iterations
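For readers who want to reproduce something like the random-walk panel, here is a minimal sketch of a 1000-step walk on a 2D lattice. The lattice, the four unit moves, and the function name are assumptions; the slide does not say which random walk variant produced the figure.

```python
import random


def random_walk_2d(steps=1000, seed=None):
    """Return the list of (x, y) positions visited by a simple lattice walk."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # unit steps: E, W, N, S
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rng.choice(moves)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = random_walk_2d(steps=1000, seed=13)
    print("end point:", path[-1])
    print("distinct sites visited:", len(set(path)))
```

Running it with different seeds gives a different trace each time, which is the point of the comparison with the quasi-periodic examples: unpredictable in detail, but with stable statistical regularities.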
Now let’s be quasi-periodic….
Figure 1, Journal of Sound and Vibration, 330(11), 2565–2579 (2011)
Quasi-periodic Crystals
COURTESY: Wired Science
Replicates Have Information Content, H(x)
1) Low variance between replicates, low H(x).
2) High variance between replicates, high H(x).
* other meaningful, useful patterns beyond tests of the null hypothesis.
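One way to make H(x) concrete is to bin the replicate measurements and compute the Shannon entropy of the bin frequencies. This is a sketch under stated assumptions: the binning approach, the bin width, the function name, and the toy replicate values are all hypothetical, and the result depends on the bin width chosen.

```python
import math
from collections import Counter


def replicate_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of replicate outcomes after binning."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


if __name__ == "__main__":
    tight = [10.1, 10.2, 10.3, 10.4]   # hypothetical low-variance replicates
    spread = [2.5, 7.5, 12.5, 17.5]    # hypothetical high-variance replicates
    print("tight replicates: ", replicate_entropy(tight))
    print("spread replicates:", replicate_entropy(spread))
```

Replicates that all land in one bin carry no additional information, while replicates scattered across many bins carry more, matching the two cases listed above.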
Hmmmm…….it turns out I’m pretty good at science. Perhaps it’s the phenomenon after all!
WHY? Humans and Mice are phylogenetically related, and share much of the same genomic content.
Kolata, G. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. NYT, February 11 (2013).
See paper: "Genomic responses in mouse models poorly mimic human inflammatory diseases". PNAS, doi:10.1073/pnas.1222878110.
Gene expression correlations are highest within humans, lowest between humans and mice, and low within mice.
MICROARRAY: Human “burn” and “trauma” are more closely related than mouse “burn” and “trauma”.
FROM: "Genomic responses in mouse models poorly mimic human inflammatory diseases". PNAS, doi:10.1073/pnas.1222878110.
Why would there be such a massive difference between humans and mice (shared evolutionary history, highly homologous genomes)?
BLACK BOX ARGUMENT: Multiple layers of physiological regulation explain much of the variance.
1) Epigenetics has many subtle effects.
2) Endless “–omes”.
Satisfying?
COMPLEXITY ARGUMENT: The generative nature of gene expression explains much of the variance.
1) One gene, many products.
2) Genes of large effect and number of genes involved.
Satisfying?
NOISE ARGUMENT: Noise (variation) in gene expression explains much of the variance.
1) Useful information can be generated (and lost) from fluctuations.
2) Synchronized noise is good, white noise is bad.
Satisfying?
Ramsden, E. Model Organisms and Model Environments: A Rodent Laboratory in Science, Medicine and Society. Medical History, 55, 365–368 (2011).
Surprising and unexpected elements of model organisms:
* does standardization of environment (e.g. social settings, cages, diet) = a standard result (e.g. replication)?
Wikgren, J. et al. Selective breeding for endurance running capacity affects cognitive but not motor learning in rats. Physiology and Behavior, 106, 95–100 (2012).
Does artificial selection (e.g. selective breeding, etc.) affect the response of laboratory animals?
Francis, G. The Psychology of Replication and Replication in Psychology. Perspectives on Psychological Science, 7(6), 585–594 (2012).
Is there a “psychology” of replication (e.g. a bias for results and settings that make results more replicable but less generalizable and informative)?
My Interpretation
Mode of function in Homo sapiens
Mode of function in Mus musculus
EXACT: single path to activating physiological response. System not robust to perturbation.
VARIATIONAL: several alternate pathways, all are effective. System can be robust to perturbation (one route blocked, take the alternate with minimal cost).
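The EXACT versus VARIATIONAL distinction can be sketched as a reachability question on a small directed graph: block one link and ask whether the response can still be triggered. The node names, the edge lists, and the `reachable` helper are hypothetical and only illustrate the idea; they are not taken from any real pathway.

```python
from collections import deque


def reachable(edges, start, goal, blocked=frozenset()):
    """Breadth-first search: can `goal` be reached from `start` without blocked edges?"""
    graph = {}
    for a, b in edges:
        if (a, b) not in blocked:
            graph.setdefault(a, []).append(b)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# EXACT: a single path from signal to response.
EXACT = [("signal", "relay"), ("relay", "response")]
# VARIATIONAL: the same path plus one alternate route.
VARIATIONAL = EXACT + [("signal", "alt_relay"), ("alt_relay", "response")]

if __name__ == "__main__":
    perturbation = {("signal", "relay")}  # block one step of the main route
    print("EXACT survives:      ", reachable(EXACT, "signal", "response", perturbation))
    print("VARIATIONAL survives:", reachable(VARIATIONAL, "signal", "response", perturbation))
```

The same perturbation abolishes the response in the single-path system but not in the system with an alternate route, which is the sense of "robust to perturbation" used here.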
SINGLE ROUTE (TOP): 4,373 km.
VARIATIONAL ROUTES (BOTTOM): 4,373 km (left); 4,551 km (right).
Choice of route depends on weather, topography, etc.
FIGURE (2 × 2 grid of pathway diagrams with nodes labeled A and B): columns are EXACT and VARIATIONAL; rows are CONSERVED (HIGHLY HOMOLOGOUS) and DIVERGENT (NOT HIGHLY HOMOLOGOUS).
Panels: EXACT CONSERVED, EXACT DIVERGENT, VARIATIONAL DIVERGENT, VARIATIONAL CONSERVED.
Interaction between evolution of pathways and function of pathways:
* degeneracy: signals and receptors become more promiscuous over evolutionary time (enables further complexity). See the Synthetic Daisies post.
* what is the role of diversity within species? Unknown (may explain within-species, between-function gene expression outcomes).
TWO OTHER EXAMPLES (Role of Evolutionary Conservation in Model Organisms)
COURTESY: Figure 2, Longo and Fabrizio.
REGULATION OF STRESS AND LONGEVITY
Longo and Fabrizio, CMLS: Cellular and Molecular Life Sciences, 59, 903–908 (2002).
YEAST
WORMS
HUMANS
Similar set of proteins regulated by growth factors
General downregulation of IGF-1 pathway
Stress resistance pathways, and the switch from reproductive to non-reproductive phase, evolved to induce longevity and cellular maintenance.
Unknown if the same factors and pathways are involved, or if they have a common ancestry.
Aging is modulated by a simple, coarse-grained intervention: caloric restriction.
Conserved genes modulate longevity in fruit flies, but may not translate into a conserved mechanism.
Howe et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature, doi:10.1038/nature12111 (2013).
Figure 3, Howe et al. (2013)
A: Overlap between species = number of orthologues (copies of a gene, not the genes themselves) at the time of their phylogenetic split.
B: Relationship among “ohnologues”: TSD (teleost-specific genome duplication)-related genes.
How can we better assess the parallels and differences between Zebrafish (an NIH-approved model organism) and Humans?
What is experimental replication (the big picture)?
What is experimental replication (the bigger picture)?
Replication as generative model:
TREATMENTS (combinatorial input)
BLACK BOX (incompletely-known mechanism)
Range of Outcomes
Experiments populate a prior probability distribution:
* distribution can never truly be known (only estimated).
* may be highly complex (non-Gaussian, multimodal).
* algorithmic techniques might help find best approximations (or priors).
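The generative view above (treatments in, black box, range of outcomes) can be sketched by repeatedly "running" a toy experiment and tabulating the outcomes into an estimated distribution. Everything here is an assumption made for illustration: the two-regime `black_box` mechanism, its parameters, and the simple histogram estimator stand in for whatever real mechanism and estimation technique one would use.

```python
import math
import random
from collections import Counter


def black_box(treatment, rng):
    """Hypothetical incompletely-known mechanism with two hidden regimes."""
    if rng.random() < 0.5:
        return rng.gauss(treatment, 0.5)        # regime 1: outcome near the treatment level
    return rng.gauss(treatment + 5.0, 0.5)      # regime 2: outcome shifted upward


def estimate_outcome_distribution(treatment, n_replicates, bin_width=1.0, seed=0):
    """Histogram estimate of the outcome distribution from repeated experiments."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        outcome = black_box(treatment, rng)
        counts[math.floor(outcome / bin_width)] += 1
    # Map each bin's lower edge to its estimated probability.
    return {b * bin_width: c / n_replicates for b, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = estimate_outcome_distribution(treatment=0.0, n_replicates=2000)
    for edge, p in dist.items():
        print(f"[{edge:5.1f}, {edge + 1.0:5.1f})  {'#' * int(60 * p)}")
```

A single replicate from this toy system could land in either mode, so two "failed" replications of each other may both be faithful samples of the same multimodal distribution; only many replicates reveal its shape.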