Post on 26-Jul-2020

What I learned last summer....

or

Physics is a “hard” science, biology is a “difficult” one.....

or

Hard-to-Define-Events 2013.1

Bradly Alicea

http://www.msu.edu/~aliceabr http://syntheticdaisies.blogspot.com

If your results are unpredictable, does it make them any less true?

Artificial Life XIII Conference East Lansing, MI July, 2012

2012

Recursive me! Giving this talk at HTDE 2012.

Residuals of the workshop hosted at Synthetic Daisies and Vimeo (videos).

It’s not the phenomenon, it’s you………

Pashler H. and Harris, C.R. Is the Replicability Crisis Overblown? Three Arguments Examined. Perspectives on Psychological Science, 7, 531 (2012).

1) Direct vs. conceptual replication:

* successful direct replication can validate findings, but often conceptual replication (between research groups) is a more attainable goal.

Conceptual replication: same type of experiment without replicating exact conditions.

TRADEOFF: generalization vs. accuracy.

General tendencies (THEORY)

Accurate Repetition (EMPIRICISM)

MY INTERPRETATION

2) Science is ultimately self-correcting:

* given enough time, the consensus of the scientific community will prevail (e.g. wisdom of the crowd, swarm intelligence).

* in the absence of information, people will flock to ideas that sound good (e.g. popular fads, internet memes).

Is Science Ultimately Self-correcting? A historical view of scientific consensus

CONSENSUS THOUGHT

Heliocentrism 17th century Astronomy

One gene, One protein 20th century Biology

Cultural Relativism 20th century Anthropology

Curved Spacetime 20th century Physics

Phrenology 19th century Psychology

Plate Tectonics 20th century Geology

Lamarckism 18th century Zoology

Rates of change differ, all geometry is qualitative

“The Half-life of Facts” Samuel Arbesman

MY INTERPRETATION

Facts (scientific and otherwise) decay at a certain rate (see book):

* overturned (lose consensus status).

* hard-to-kill ideas (useless but still popular).

Are most “true” results wrong? And why?

False positive report probability, H_significant : H_0

Level of significance (e.g. 0.05)

Statistical power (vs. TYPE II error rate)

Ioannidis, Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124 (2005).
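
The labels above (significance level, statistical power) can be tied together with a short worked example. This is only a sketch, assuming the no-bias formula for positive predictive value from the cited Ioannidis paper, PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true. The function names and example values are illustrative and do not come from the slides.

```python
def ppv(prior_odds, alpha=0.05, power=0.8):
    """Positive predictive value of a 'significant' finding (no-bias case).

    prior_odds: assumed pre-study odds R that a tested relationship is true.
    alpha:      level of significance (Type I error rate).
    power:      statistical power, 1 - beta (beta = Type II error rate).
    """
    beta = 1.0 - power
    true_positives = (1.0 - beta) * prior_odds
    return true_positives / (prior_odds - beta * prior_odds + alpha)


def false_positive_report_probability(prior_odds, alpha=0.05, power=0.8):
    """Chance that a significant result is actually a false positive."""
    return 1.0 - ppv(prior_odds, alpha, power)


if __name__ == "__main__":
    # Illustrative scenarios only: a well-powered confirmatory test
    # versus an underpowered exploratory one.
    for odds, pw in [(1.0, 0.8), (0.1, 0.2)]:
        print(odds, pw, round(ppv(odds, power=pw), 3))
```

Under this formula a significant finding is more likely true than false only when (1 − β)R > α, which is why low power and low pre-study odds together are so damaging.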

High rate of nonreplication: formal analysis of all statistically significant hypotheses (supported null hypotheses are excluded).

Biases include: experimental design, data analysis, and presentation factors (technical variation).

* this list assumes technical variation is likely always bad (e.g. particle physics envy).

Findings are less likely to be true for a given field (i.e. low PPV):

1) the smaller the studies conducted (low N, sparse data).

2) the smaller the effect size (small 1 – β, low sensitivity).

3) the greater the number of potential relationships.

4) the greater the flexibility of designs and analytical modes.

SOLUTION: large studies with low levels of bias (e.g. post-modernist envy).

* easy to suggest, harder to do.

What are the biases that affect the practice of science?

Ioannidis, Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124 (2005).

“Incestuous Amplification” Effect

“When objective information is low, follow the herd”

A small set of ideas is perpetually circulated among people in a certain field or social group without external feedback.

Absent informed dissent or debate, argumentum ad populum (popular but incorrect ideas) tends to predominate.

Inattentional Blindness

Biases lead to an inability to notice features of data or theory that would otherwise be obvious.

Let’s be random…….

Number 8, Jackson Pollock

Random Walk algorithm, 1000 iterations
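
The slides do not say which random walk produced the image. As a minimal sketch, assuming a simple two-dimensional lattice walk with unit steps (the function name, the seed argument, and the step set are all assumptions), 1000 iterations could look like this:

```python
import random


def random_walk_2d(steps=1000, seed=None):
    """Return the visited (x, y) points of a simple 2D lattice random walk."""
    rng = random.Random(seed)          # local generator, reproducible if seeded
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rng.choice(moves)     # each direction is equally likely
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = random_walk_2d(steps=1000, seed=8)
    print("final position:", path[-1])
```

Plotting the returned points as a connected line gives the kind of tangled trace the slide sets beside the painting.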

Now let’s be quasi-periodic….

Figure 1, Journal of Sound and Vibration, 330(11), 2565–2579 (2011)

Quasi-periodic Crystals

COURTESY: Wired Science

Replicates Have Information Content, H(x)

1) Low variance between replicates, low H(x).

2) High variance between replicates, high H(x).

* other meaningful, useful patterns beyond tests of the null hypothesis.
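
One way to make the two cases above concrete is sketched here, assuming H(x) means the Shannon entropy of replicate values after binning. The bin width, function name, and toy replicate values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def replicate_entropy(values, bin_width=1.0):
    """Shannon entropy (bits) of replicate measurements after binning."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in bins.values())


# Hypothetical replicate sets (not real data).
tight_replicates = [10.1, 10.2, 10.3, 10.4, 10.2, 10.6, 10.5, 10.3]
spread_replicates = [2.3, 5.1, 7.8, 10.2, 12.9, 15.4, 18.0, 20.7]

if __name__ == "__main__":
    print("tight :", replicate_entropy(tight_replicates))
    print("spread:", replicate_entropy(spread_replicates))
```

The estimate depends on the chosen bin width, so it is only comparable between replicate sets binned the same way.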

Hmmmm…….it turns out I’m pretty good at science. Perhaps it’s the phenomenon after all!

WHY? Humans and Mice are phylogenetically related, and share much of the same genomic content.

Kolata, G. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. NYT, February 11 (2013).

See paper: "Genomic responses in mouse models poorly mimic human inflammatory diseases". PNAS, doi:10.1073/pnas.1222878110.

Gene expression correlations are highest within humans, low within mice, and lowest between humans and mice.

MICROARRAY: Human “burn” and “trauma” are more closely related than mouse “burn” and “trauma”.

FROM: "Genomic responses in mouse models poorly mimic human inflammatory diseases". PNAS, doi:10.1073/pnas.1222878110.

Why would there be such a massive difference between humans and mice (given their shared evolutionary history and highly homologous genomes)?

BLACK BOX ARGUMENT: Multiple layers of physiological regulation explain much of the variance.

1) Epigenetics has many subtle effects.

2) Endless “–omes”.

Satisfying?

COMPLEXITY ARGUMENT: The generative nature of gene expression explains much of the variance.

1) One gene, many products.

2) Genes of large effect and number of genes involved.

Satisfying?

NOISE ARGUMENT: Noise (variation) in gene expression explains much of the variance.

1) Useful information can be generated (or lost) through fluctuations.

2) Synchronized noise is good, white noise is bad.

Satisfying?

Ramsden, E. Model Organisms and Model Environments: A Rodent Laboratory in Science, Medicine and Society. Medical History, 55, 365–368 (2011).

Surprising and unexpected elements of model organisms:

* does standardization of environment (e.g. social settings, cages, diet) = a standard result (e.g. replication)?

Wikgren, J. et al. Selective breeding for endurance running capacity affects cognitive but not motor learning in rats. Physiology and Behavior, 106, 95–100 (2012).

Does artificial selection (e.g. selective breeding, etc) affect the response of laboratory animals?

Francis, G. The Psychology of Replication and Replication in Psychology. Perspectives on Psychological Science, 7(6), 585–594 (2012).

Is there a “psychology” of replication (e.g. a bias for results and settings that make results more replicable but less generalizable and informative)?

My Interpretation

Mode of function in Homo sapiens

Mode of function in Mus musculus

EXACT: single path to activating physiological response. System not robust to perturbation.

VARIATIONAL: several alternate pathways, all are effective. System can be robust to perturbation (one route blocked, take the alternate with minimal cost).

SINGLE ROUTE (TOP): 4,373 km.

VARIATIONAL ROUTES (BOTTOM): 4,373 km (left); 4,551 km (right).

Choice of route depends on weather, topography, etc.
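
The EXACT versus VARIATIONAL distinction can be sketched as reachability in a small graph: block one intermediate step and check whether the response at B can still be triggered from A. This is an illustration under assumptions, not the author's model. The intermediate node names (`x1`, `y1`, `y2`) and the two toy pathway graphs are hypothetical.

```python
from collections import deque


def reachable(graph, start, goal, blocked=frozenset()):
    """Breadth-first search: can `goal` be reached from `start`
    without passing through any node in `blocked`?"""
    if start in blocked or goal in blocked:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical pathways from signal A to response B.
EXACT = {"A": ["x1"], "x1": ["B"]}                      # single route
VARIATIONAL = {"A": ["x1", "y1"], "x1": ["B"],          # short route
               "y1": ["y2"], "y2": ["B"]}               # longer alternate

if __name__ == "__main__":
    for name, graph in [("EXACT", EXACT), ("VARIATIONAL", VARIATIONAL)]:
        print(name, reachable(graph, "A", "B", blocked={"x1"}))
```

As with the road routes, the alternate path in the variational graph is longer, which is the "minimal cost" paid for robustness to a blocked step.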

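[Figure: 2 × 2 grid crossing mode of function (EXACT, VARIATIONAL) with evolutionary status (CONSERVED, highly homologous; DIVERGENT, not highly homologous). Each panel shows nodes labeled A and B. Panels: EXACT CONSERVED, EXACT DIVERGENT, VARIATIONAL CONSERVED, VARIATIONAL DIVERGENT.]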

Interaction between evolution of pathways and function of pathways:

* degeneracy: signals and receptors become more promiscuous over evolutionary time (enables further complexity). Synthetic Daisies post.

* what is the role of diversity within species? Unknown (may explain within-species, between-function differences in gene expression outcomes).

TWO OTHER EXAMPLES (Role of Evolutionary Conservation in Model Organisms)

COURTESY: Figure 2, Longo and Fabrizio.

REGULATION OF STRESS AND LONGEVITY

Longo and Fabrizio, CMLS: Cellular and Molecular Life Sciences, 59, 903–908 (2002).

YEAST

WORMS

HUMANS

Similar set of proteins regulated by growth factors

General downregulation of IGF-1 pathway

Stress resistance pathways (switch from reproductive to non-reproductive phase) evolved to induce longevity and cellular maintenance.

Unknown if the same factors and pathways are involved, or if they have a common ancestry.

Aging is modulated by a simple, coarse-grained intervention: caloric restriction.

Conserved genes modulate longevity in fruit flies, but may not translate into a conserved mechanism.

Howe et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature, doi:10.1038/nature12111 (2013).

Figure 3, Howe et al. (2013)

A: Overlap between species = number of orthologues (copies of gene, not genes themselves) at the time of their phylogenetic split.

B: Relationship among “ohnologues”: TSD (teleost-specific genome duplication)-related genes.

How can we better assess the parallels and differences between Zebrafish (an NIH-approved model organism) and Humans?

What is experimental replication (the big picture)?

What is experimental replication (the bigger picture)?

Replication as generative model:

TREATMENTS (combinatorial input)

BLACK BOX (incompletely known mechanism)

Range of Outcomes

Experiments populate a prior probability distribution:

* distribution can never truly be known (only estimated).

* may be highly complex (non-Gaussian, multimodal).

* algorithmic techniques might help find best approximations (or priors).
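
The treatments → black box → range of outcomes picture can be sketched as a small simulation. Everything in it is an assumption made for illustration: the `black_box` mechanism (a hidden two-state switch plus Gaussian noise), the treatment levels, and the bin width are hypothetical and do not come from the talk.

```python
import itertools
import random
from collections import Counter


def black_box(treatment, rng):
    """Hypothetical incompletely known mechanism: a hidden state,
    invisible to the experimenter, selects one of two outcome regimes."""
    dose, condition = treatment
    hidden_state = rng.random() < 0.5
    center = dose * (3.0 if hidden_state else 1.0)
    shift = 0.5 if condition == "B" else 0.0
    return center + shift + rng.gauss(0.0, 0.2)


def replicate(treatments, n_replicates, seed=0):
    """Run every treatment n_replicates times and collect the outcomes."""
    rng = random.Random(seed)
    return {t: [black_box(t, rng) for _ in range(n_replicates)]
            for t in treatments}


def empirical_distribution(outcomes, bin_width=0.5):
    """Binned estimate of the outcome distribution (never the true one)."""
    counts = Counter(round(v / bin_width) * bin_width for v in outcomes)
    n = len(outcomes)
    return {b: c / n for b, c in sorted(counts.items())}


# Combinatorial input: every dose paired with every condition.
TREATMENTS = list(itertools.product([1.0, 2.0], ["A", "B"]))

if __name__ == "__main__":
    results = replicate(TREATMENTS, n_replicates=200)
    for t in TREATMENTS:
        print(t, empirical_distribution(results[t]))
```

Because of the hidden switch, the estimated distribution for each treatment is bimodal, so a single mean and variance would summarize the replicates poorly.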
