Bioinformatics at NASA, or
Yes Virginia, NASA does do biology!
Michael New, Astrobiology Discipline Scientist
Maryland
Bioinformatics Technology Forum 2
Bioinformatics at NASA?
Bioinformatics is used at NASA in several ways:
• Fundamental Space Biology: How do organisms, including humans, adapt to the space environment?
• Planetary Protection: What is the nature of the community of micro-organisms living in spacecraft assembly areas and on spacecraft?
• Astrobiology: What can the genomes of life on Earth tell us about the origin, evolution, distribution and future of life on Earth and the potential for life elsewhere?
4 March 2009
Fundamental Space Biology
How are molecular signals, pathways, and products in humans and model organisms (e.g., mice) altered by exposure to microgravity and space radiation factors?
How is drug metabolism affected by space related effects?
Are there critical stages in development that are affected by altered gravity?
Why does the virulence of pathogens appear to increase in space?
Small Sats + On-board Expression Measurements + Bioinformatics
[Figure: small satellite, 30 cm x 10 cm x 10 cm]
How to make inferences?
Making good inferences is the key
[Diagram labels: Experimental Data, Background Knowledge, Previous Results, Analysis Algorithm, New Knowledge]
Andrew Pohorille, Jeff Shrager and Steve Racunas
NASA Center for Astrobioinformatics, Karl Schweighofer
An example: the Jnk Pathway
Hypothesis: Jnk1 activates c-Jun
[Pathway diagram labels:
External Stimuli: TGF-, TPA
Kinases: Jnk1, Jnk2
Transcription Factors: c-Jun, JunD
mRNA Transcripts: IL-11, p53, nur77, p19, . . .]
Expression studies are inconclusive
Hypothesis      Stanford Medical Sch. Experiments
Jnk → c-Jun     p = 0.51
Jnk ↛ c-Jun     p = 0.46
p value: Probability that posterior of H, p(D|H), is just spurious (i.e., same posterior likely with random D when ¬H)
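One way to read the p value defined above is as a permutation test: score how well the data support the hypothesis, then ask how often randomly shuffled data score at least as well. The sketch below illustrates that reading only. The Pearson-correlation score, the function names, and the expression values are all assumptions made for illustration; they are not the method or the data behind the table.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation, used here as a stand-in score for 'x activates y'."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Fraction of shuffled data sets that score at least as well as the real one."""
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(xs, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical expression levels across eight conditions (made-up numbers).
jnk1_activity = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.3]
c_jun_mrna = [0.2, 0.5, 0.4, 1.0, 1.1, 1.7, 1.8, 2.5]

p = permutation_p_value(jnk1_activity, c_jun_mrna)
print(f"permutation p value: {p:.4f}")
```

A small value means that shuffled data rarely match the observed score, so the apparent support is unlikely to be spurious.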
Background knowledge makes a difference!
Hypothesis      Stanford Medical Sch. Experiments    With Background Knowledge
Jnk → c-Jun     p = 0.51                             p = 0.77
Jnk ↛ c-Jun     p = 0.46                             p = 0.003
Planetary Protection
What organisms are present in and on spacecraft?
How can we assess the “bioburden” of spacecraft?
How can we ensure that no Terran life hitchhikes to a clement spot on another planet?
How can we assess the safety of returned samples?
Assessing “crud”
What is the diversity of low-biomass samples taken from a space-craft assembly clean room?
Comparing two new techniques: Affymetrix’s PhyloChip and 454 sequencing.
Third Generation PhyloChip
                              G2 PhyloChip        G3 PhyloChip
Probes                        500,000             1 million
Reference database            30,000 sequences    320,000 sequences
Hierarchical                  No                  Yes
Strain-specific probe sets    No                  Yes
OTUs                          ~9,000              30,000+
• Additional advancements
  – Smaller feature size -> no increase in chip cost.
  – Smaller sample volumes: decreased cost in reagents.
  – Improved analysis
    • More sophisticated fragmentation method
    • Refined analysis software
    • Improved validation approach
Relatively inexpensive and suitable for repeated assays. Less robust quantitation.
454 Sequencing: The Sogin Survey Method
• In a single run, 454 technology can generate up to 200,000 independent sequence reads of ~100 bases each. It comprehensively samples short variable rRNA regions.
• The first report on deep-sea diversity estimates 10-100 times more species than previously suspected (Sogin et al., PNAS 2006). A few species are common; the vast majority are rare.
• This method is easily adapted to spacecraft bioburden inventory. It gives some estimate of quantity as well as phylogeny.
Method is expensive and requires large amounts of DNA.
More suitable for infrequent assays of pooled samples.
454 Inc
Family-level Comparisons
G2 PhyloChip: Families detected: 96. Detected exclusively on PhyloChip: 31
454 V6 Pyrosequencing: Families detected: 87. Detected exclusively by 454 sequencing: 22
[Venn diagram: 65 families detected by both methods, 31 by PhyloChip only, 22 by 454 only]
• Overall both methods showed high agreement of detection at the family level, but only when data from all temperature gradients was compiled.
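The reported counts are consistent with a shared core of 65 families (96 − 31 = 87 − 22 = 65). The following minimal sketch shows the set arithmetic behind such a comparison. The family IDs are placeholder integers chosen only to reproduce the reported counts, and `compare_detections` is a hypothetical helper, not part of either platform's software.

```python
def compare_detections(a, b):
    """Split two sets of detected families into shared and method-exclusive parts."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Placeholder family IDs chosen only to reproduce the reported counts.
phylochip = set(range(0, 96))    # 96 families detected by the G2 PhyloChip
pyroseq = set(range(31, 118))    # 87 families detected by 454 V6 pyrosequencing

result = compare_detections(phylochip, pyroseq)
counts = {name: len(members) for name, members in result.items()}
print(counts)  # {'shared': 65, 'only_a': 31, 'only_b': 22}
```

With real data the sets would hold family names, and the same three set operations give the overlap and the method-specific detections.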
Astrobiology: Life in a Universal Context
How does life begin and evolve?
• What do the rock record and genomes tell us?
Does life exist elsewhere in the Universe?
• Life as we know it?
• “Weird” life?
• How can either be detected?
What is the future for life on Earth and beyond?
Three case studies
Development of a new tool to assess horizontal gene transfer (HGT).
• Peter Gogarten and Olga Zhaxybayeva
Use of standard tools to look for independent “leaps to land.”
• Zoe Cardon, Louise Lewis, and Harry Frank
Resurrecting ancient proteins.
• Steve Benner, et al.
How can we assess the degree of HGT present on the early Earth?
• A quartet is the smallest unit of phylogenetic information
• Each quartet can have three unrooted tree topologies
• Support for the different quartet topologies can be summarized across all gene families
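To make the three topologies concrete, the sketch below enumerates them for four taxa and tallies how many gene families favour each one. The taxon labels, the family names, and the per-family "favoured topology" values are hypothetical; how support is actually measured for each family is not specified here.

```python
from collections import Counter

def quartet_topologies(a, b, c, d):
    """Return the three unrooted topologies for four taxa as pairs of pairs."""
    return [
        ((a, b), (c, d)),  # ab|cd
        ((a, c), (b, d)),  # ac|bd
        ((a, d), (b, c)),  # ad|bc
    ]

def label(topology):
    """Render a topology as a split string such as 'AB|CD'."""
    (w, x), (y, z) = topology
    return f"{w}{x}|{y}{z}"

# Hypothetical data: the topology favoured by each gene family for quartet A, B, C, D.
favoured_by_family = {
    "family_1": "AB|CD",
    "family_2": "AB|CD",
    "family_3": "AC|BD",
    "family_4": "AB|CD",
}

topologies = [label(t) for t in quartet_topologies("A", "B", "C", "D")]
support = Counter(favoured_by_family.values())
summary = {t: support.get(t, 0) for t in topologies}
print(summary)  # {'AB|CD': 3, 'AC|BD': 1, 'AD|BC': 0}
```

In this toy tally, AB|CD is the plurality topology and the single family favouring AC|BD is the conflicting signal.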
Why use embedded quartets?
No assumption that all genes in a genome have the same phylogenetic history.
The total number of quartets is much smaller than the number of tree topologies, which makes it possible to evaluate all quartets.
Gene families present in only a few of the analyzed genomes can be included in the analyses.
Phylogenetic signal can be divided into the plurality consensus and the conflicting signal.
Allows us to partition analyzed genomes according to some scenario (e.g., grouping by ecology) and retrieve gene families that support or conflict with it.
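The size argument can be checked with a short calculation: the number of quartets grows as "n choose 4", while the number of unrooted, strictly bifurcating trees on n labelled taxa is (2n − 5)!!. The function names below are illustrative, and 11 taxa is used because the cyanobacteria example analyses 11 genomes.

```python
from math import comb

def n_quartets(n_taxa):
    """Number of embedded quartets: ways to choose 4 taxa out of n."""
    return comb(n_taxa, 4)

def n_unrooted_topologies(n_taxa):
    """Number of unrooted, strictly bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 11):
    print(n, n_quartets(n), n_unrooted_topologies(n))
# 4 1 3
# 11 330 34459425
```

For 11 genomes there are 330 quartets with three topologies each, so 990 candidate topologies to evaluate, against more than 34 million full trees.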
Example: Cyanobacteria & their Genes
• Analyzed gene families in 11 sequenced cyanobacterial genomes using the newly developed quartet decomposition method
• Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes.
• Across short phylogenetic distances, all types of genes appear to be equally affected by transfer. Across large phylogenetic distances, genes encoding metabolic functions are more frequently transferred, and genes involved in transcription and translation are less frequently transferred.
Olga Zhaxybayeva, J. Peter Gogarten, Robert L. Charlebois, W. Ford Doolittle and R. Thane Papke: "Phylogenetic Analyses Of Cyanobacterial Genomes: Quantification Of Horizontal Gene Transfer Events", Genome Research, 2006, 16:1099-1108.
What traits were needed for “leap to land”?
[Figure: phylogeny of green plants.
5 major green algal classes (sensu Mattox and Stewart, 1984; a recent revision divides Charophyceae into 6 classes): Chlorophyceae, Trebouxiophyceae, Ulvophyceae, Charophyceae, Prasinophyceae
Terrestrial green plants: Embryophytes
The famous leap to land: N=1
Leaps of eukaryotic green algae from aquatic or marine habitats to land: N=?]
Numerous independent habitat transitions provide statistical power for detecting traits correlated with successful leaps from water to land.
Bioinformatics used to:
• Infer evolutionary relationships among known aquatic and recently isolated desert algae using data from nucleotide sequences (large data sets, multiple genes) to estimate diversity and describe new species.
• Estimate the number of transitions from aquatic to terrestrial habitats (Bayesian methods). To date, we estimate at least 40 evolutionarily independent transitions!
• Test the correlation of source habitat type with traits that occur in our desert and related aquatic algae, using comparative statistical methods that take into account evolutionary relationships among taxa.
Lewis and Lewis 2005, Systematic Biology, 54:936-947; Gray et al. 2007, Plant Cell and Environment, 30:1240-1255; Cardon et al. 2008, Bioscience, 58:114-122; Lewis, unpublished
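To illustrate what counting independent habitat transitions on a tree means, here is a minimal sketch using Fitch parsimony. This is a deliberately simpler stand-in for the Bayesian methods named above, not a reproduction of them, and the tree, taxon names, and habitat assignments are hypothetical.

```python
def fitch_changes(tree, habitat):
    """Minimum number of habitat changes on a rooted binary tree (Fitch parsimony).

    tree: a leaf name (str) or a 2-tuple of subtrees.
    habitat: dict mapping leaf name -> state, e.g. "aquatic" or "desert".
    Returns (possible states at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {habitat[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_changes(left, habitat)
    right_states, right_changes = fitch_changes(right, habitat)
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:
        return shared, changes
    # Children disagree: one change is needed on a branch below this node.
    return left_states | right_states, changes + 1

# Hypothetical six-taxon tree and habitats (illustrative only).
tree = ((("alga_1", "alga_2"), ("alga_3", "alga_4")), ("alga_5", "alga_6"))
habitat = {
    "alga_1": "aquatic", "alga_2": "desert",
    "alga_3": "aquatic", "alga_4": "aquatic",
    "alga_5": "desert", "alga_6": "aquatic",
}
_, n_changes = fitch_changes(tree, habitat)
print(n_changes)  # 2
```

The two desert taxa sit in different parts of this toy tree, so at least two separate habitat changes are required. A Bayesian analysis would additionally account for uncertainty in the tree and in the ancestral states.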
Moving from single cells to multicellular animals
This seems hard to do from the perspective of molecular biology:
• Change the goal of life from replicating cells as fast as possible (what bacteria do) to replicating cells under control, and then not at all (what you do)
• The fossil record makes the transition seem sudden (but the fossil record may be missing many things)
• We are not certain that the transition is not driven by planetary change, such as the emergence of abundant oxygen in the atmosphere
Understanding how this transition took place on Earth helps NASA infer how likely it is to have taken place elsewhere, a key part of the Drake equation to estimate the likelihood of intelligent life elsewhere in the cosmos.
Since fossils are no help, turn to genomes
Exhaustive matching supported models for protein sequence evolution
• New tools to score amino acid replacements
• Tools to extend the model that scores replacements
• Tools to exploit homoplasy, compensatory covariation, and other non-Markovian behaviors in the evolution of real proteins diverging under functional constraints
Gonnet, G. H., Cohen, M. A., Benner, S. A. (1992) Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445
[Figure labels: Multicellularity emerges; What happened here in the genome?]
Sequencing of a choanoflagellate provides an outgroup, an organism diverging just before multicellularity emerges.
King, N. et al. (2008) The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788
So what happened?
Many things
• Steroid receptors emerged, together with oxygen-dependent proteins that make steroid hormones; key at many places in metazoan biology
• Protein tyrosine phosphorylating kinases emerged from serine kinases
• Protein tyrosine phosphatases emerged (from an unknown source)
• Kinase substrates emerged that were phosphorylated on tyrosines
• SH2 domains that bind to phosphotyrosine emerged (unknown source)
And not just one example. Lots of them, with correlated evolution.
JAK is a two-domain kinase. The domains are duplicates of a single domain; the duplication occurred in this episode.
STAT is a family of substrates for JAK, also arising by duplication at the same time as the JAK domains duplicated.
[Figure labels: JAK, STAT]
How do we know that the ancestral proteins were doing phosphorylation, being phosphorylated etc. at that time?
Bring the experimental method to bear on historical hypotheses using biotech to resurrect genes and proteins having the inferred ancestral sequence, studying their behavior in the lab. Consider the SH2 domains, which bind to phosphotyrosine, a new function emerging together with multicellularity. The SH2 domains are a large family having various binding specificities. Resurrection shows that the ancestral proteins bind as well, and shows their specificity. (Benner, et al., unpublished)
[Figure labels: Binds (Gln or Tyr)-Asn-Tyr; Binds (Ile or Val)-Asn-(Val or Pro); outgroup]
Acknowledgements
Andrew Pohorille (NASA ARC)
Jeff Shrager (Stanford)
Stephen Racunas (Stanford)
Karl Schweighofer (SETI Inst)
Catharine Conley (NASA ARC)
Mitch Sogin (MBL)
Kasthuri Venkateswaran (JPL)
Gary Andersen (LBL)
J. Peter Gogarten (U Conn)
Olga Zhaxybayeva (Dalhousie)
Zoe Cardon (MBL)
Louise Lewis (U Conn)
Frank Lewis (U Conn)
Steve Benner (FFAME)
Jason Raymond (UC Merced)
Rob Knight (CUB)
Eric Gaucher (GA Tech)