Post on 19-Dec-2015
Making Order from Chaos: Using Metagenome Data as Traits in Individuals and as Markers in Entire Ecosystems
Andrew K. Benson
W.W. Marshall Distinguished Professor of Biotechnology
Director, Core for Applied Genomics and Ecology
Professor, Dept. of Food Science, University of Nebraska
Phylloplane
Rhizosphere
Surface Water
Ground Water
Food
We Live in a World That is Numerically Dominated by Microorganisms
Oceanic
Soil
Rumen
Gastrointestinal
Oral
Organisms in these microbiomes contribute significantly to characteristics of these ecosystems
Phylloplane
Rhizosphere
N2 Fixation
Disease Resistance
Obesity
Inflammatory Bowel Disease
Diabetes
Gastric and Colon cancers
Significant variation in complexity of microbiomes from different ecosystems
Most of our understanding of these microbial ecosystems has relied on culture-based approaches to cultivate, differentiate, and enumerate different species of microorganisms
Community composition: 16S microbiome
Community genetic potential: metagenomics and metaproteomics
Community physiology: metabolomics (nanoscale??)
Community dynamics: microbiomics + FISH
Community interactions: microbiomics + FISH
High-throughput DNA sequencing technologies combined with other “omics” now allow systematic analysis of complex microbial communities
PCR amplify tag gene (16S rRNA)
Shotgun library
Microbiome
Total genomic DNA
454 Pyrosequencing 454 Pyrosequencing
Metagenome
The 16S rRNA: the structural component of the Small Subunit and the most widely used molecular clock for bacteria
Variable regions: V1-V2, V3, V4, V5, V6, V7, V8
Noller et al. 2001, Science
~54 recognized phyla
High-throughput fecal DNA extraction
Lyse bacteria by homogenization with glass beads
Centrifuge to remove debris
Attach gDNA to magnetic particles
Robotic extraction
Amplicon layout: A adapter, 16S 8F primer, R357 primer, B adapter
gDNA from a sample
PCR amplify 16S rRNA gene
Sample-specific barcodes
Pool from 96 samples and sequence
TCTGCATG
TCTGCATG
GGAACTAA
TCCTTAGG
Quality Filtering
Length >200 bases
Barcode present
5’ 16S primer present
Average Q = 20
Trimming
Remove barcode
Remove 5’ primer
Remove 3’ primer
Remove 3’ adapter
Sample    1          2          3
Barcode   TCTGCATG   GGAACTAA   TCCTTAGG
Reads
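The filtering, barcode assignment, and 5' trimming steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the barcode table comes from the slide, but `PRIMER_5` is an assumed stand-in for the 8F primer, the quality rule is read as "mean Q of at least 20", length is measured before trimming, and 3' primer and adapter removal is left out.

```python
from statistics import mean

# Barcode-to-sample map copied from the slide.
BARCODES = {"TCTGCATG": "Sample 1", "GGAACTAA": "Sample 2", "TCCTTAGG": "Sample 3"}
# Assumption: a commonly cited 8F primer sequence, used here only as a placeholder.
PRIMER_5 = "AGAGTTTGATCCTGGCTCAG"
MIN_LENGTH = 200   # slide: length > 200 bases
MIN_MEAN_Q = 20    # assumption: "Average Q = 20" read as a lower bound
BARCODE_LEN = 8


def process_read(seq, quals):
    """Return (sample, trimmed_sequence), or None if the read fails any filter."""
    if len(seq) <= MIN_LENGTH:
        return None
    if mean(quals) < MIN_MEAN_Q:
        return None
    sample = BARCODES.get(seq[:BARCODE_LEN])
    if sample is None:            # barcode must be present
        return None
    rest = seq[BARCODE_LEN:]
    if not rest.startswith(PRIMER_5):   # 5' 16S primer must be present
        return None
    return sample, rest[len(PRIMER_5):]  # barcode and 5' primer removed


insert = "ACGT" * 60
read = "TCTGCATG" + PRIMER_5 + insert
print(process_read(read, [30] * len(read))[0])
```

Exact-match barcode and primer lookup is the simplest choice; real pipelines usually tolerate a small number of mismatches.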
Strategies for data analysis
1. Define species composition and abundance in each sample
2. Define phylogenetic content (genetic diversity) in each sample
3. Quantitative analysis of the distribution of species abundances and genetic diversity between two environments or through a “gradient” of environments or in multiple environments
Sequences
Kmer-based approaches
Kmer distribution
Kmer-based distances
CD-Hit
RDP Classifier
Multiple sequence alignment
BLAST
Phylogenetic tree
Nearest neighbor (bit score)
Last common ancestor with control sequences
Search representative sequence against database
Amenable for high-throughput
The frequency of every 8-base word is calculated from a training set of known taxa, and the probability of these words occurring in a query sequence is calculated
A subset of words is used for the probability calculation
Confidence of assignment is estimated from 100 replicate subsets (bootstrapping)
Ranking at higher order is achieved by summing results from all taxa at the lower level
Prob of Kmers from training set
          AAAATTTT   AAATTTTT   AATTTTTT
Taxon 1   0.1        0.01       0
Taxon 2   0.15       0.03       0.05
Taxon 3   0.08       0.006      0
Taxon 4   0.012      0.1        0.003
Taxon 5   0.09       0.083      0.003
Taxon 6   0.048      0.03       0
Taxon 7   0.1        0.07       0.002
Taxon 8   0.004      0.02       0.01
Taxon 9   0.065      0.027      0.1

Prob of Kmers from query
          AAAATTTT   AAATTTTT   AATTTTTT
Query 1   0.1        0.01       0
Query 2   0.048      0.03       0
Query 3   0.065      0.027      0.1
Query 4   0.012      0.1        0.003
Query 5   0.09       0.083      0.003
Query 6   0.1        0.01       0
Taxonomy-dependent analysis: RDP CLASSIFIER
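A toy version of the word-based classification described above might look like the sketch below. It is an illustration under stated assumptions, not the RDP Classifier itself: the function names `train` and `classify` are hypothetical, the smoothing follows the general naive Bayes scheme (word prior plus per-taxon word counts), each bootstrap replicate draws one eighth of the query words, and the step that sums results up to higher ranks is omitted.

```python
import math
import random
from collections import Counter

K = 8  # word size named on the slide


def words(seq, k=K):
    """Set of distinct k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training):
    """training: dict mapping taxon -> list of sequences. Returns (taxa, log_prob)."""
    n_total = sum(len(seqs) for seqs in training.values())
    overall = Counter()   # number of training sequences containing each word
    per_taxon = {}
    for taxon, seqs in training.items():
        counts = Counter()
        for s in seqs:
            counts.update(words(s))
        per_taxon[taxon] = (counts, len(seqs))
        overall.update(counts)

    def log_prob(word, taxon):
        prior = (overall[word] + 0.5) / (n_total + 1)
        counts, m = per_taxon[taxon]
        return math.log((counts[word] + prior) / (m + 1))

    return list(training), log_prob


def classify(seq, model, reps=100, rng=None):
    """Return (best taxon, bootstrap support in [0, 1])."""
    taxa, log_prob = model
    rng = rng or random.Random(0)
    query = sorted(words(seq))

    def best(ws):
        return max(taxa, key=lambda t: sum(log_prob(w, t) for w in ws))

    call = best(query)
    subset = max(1, len(query) // 8)   # assumption: one eighth of the words per replicate
    hits = sum(best(rng.choices(query, k=subset)) == call for _ in range(reps))
    return call, hits / reps


# Made-up training sequences, for illustration only.
model = train({"Taxon A": ["ACGTACGTTAGCATCGGATCCA"],
               "Taxon B": ["TTTTGGGGCCCCAAAATTGGCC"]})
print(classify("ACGTACGTTAGCATCGGATC", model))
```

Working in log space avoids underflow when many small word probabilities are multiplied together.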
1. Sorts sequences by length and takes the longest sequence as the first cluster representative
2. Distance between this sequence and all remaining sequences is estimated from short word scores
3. Sequences within the defined word-score threshold are added to the cluster
4. Reiterates with the remaining sequences
Godzik Laboratory
Taxonomy-independent analysis: CD-HIT
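The greedy, longest-first clustering idea can be sketched in a few lines. This is only a rough stand-in for CD-HIT: the shared k-mer fraction in `word_similarity` is an assumed proxy for its short-word score, and the word size `k=5` and `threshold=0.9` are illustrative values, not CD-HIT defaults.

```python
def kmer_set(seq, k=5):
    """Distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_similarity(a, b, k=5):
    """Fraction of the smaller k-mer set that is shared with the other sequence."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    shorter = ka if len(ka) <= len(kb) else kb
    return len(ka & kb) / len(shorter) if shorter else 0.0


def greedy_cluster(seqs, threshold=0.9, k=5):
    """Longest sequence seeds a cluster; each later sequence joins the first
    representative it matches, or becomes a new representative."""
    clusters = []  # list of (representative, members)
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if word_similarity(rep, seq, k) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Made-up sequences, for illustration only.
reads = ["ACGTACGTTAGCATCGGATC", "ACGTACGTTAGCATCGGATCCA", "TTTTGGGGCCCCAAAATTGG"]
for rep, members in greedy_cluster(reads):
    print(rep, len(members))
```

Because each sequence is compared only against cluster representatives, the number of comparisons stays far below an all-against-all alignment, which is what makes the approach amenable to high throughput.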
BioServX Cluster
Etsuko Moriyama computer lab
Core for Applied Genomics and Ecology
454 GutMicro Server
Instrument cluster Titanium cluster
Primary data collection
Image analysis
Base calling
Quality filtering
Database upload
Data analysis: CLASSIFIER, OTU-PICKER
Search functions, composite files, and send for analysis pipelines
Simplified database searches
Taxonomy-dependent and taxonomy-independent pipelines for data analysis
Composite experiment files from database available for analysis
Set parameters and submit
Final check on samples in the experiment
CD-HIT output
CLASSIFIER output
Total genomic DNA
V1-V2 region, 16S rRNA gene
PCR amplification
Getting better at taxonomy-independent analysis
Taxonomy-dependent: blind to taxa not in model
Taxonomy-independent: too much data for true alignment
Sample 1 (~10,000 reads)
Sample 2 (~10,000 reads)
Sample 3 (~10,000 reads)
Sample 1,000 (~10,000 reads)
~500 representative sequences ~500 representative sequences ~500 representative sequences
~500 representative sequences
Dereplicate Sequences to 97%
Kmer-based Group distance matrix
            Rep seq 1   Rep seq 2   Rep seq 3
Rep seq 1   1           0.986       0.786
Rep seq 2               1           0.693
Rep seq 3                           1
Complete linkage clustering
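Complete linkage on the three-sequence matrix from the slide can be worked through with the sketch below. The values are treated as similarities (they equal 1 on the diagonal), and the 0.97 cutoff is an assumption borrowed from the dereplication step; the slide does not state the clustering threshold.

```python
def complete_linkage(names, sim, cutoff=0.97):
    """Merge clusters while the least similar cross-cluster pair is >= cutoff."""
    clusters = [[n] for n in names]

    def linkage(a, b):
        # Complete linkage: a pair of clusters is only as close as its worst pair.
        return min(sim[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        pairs = [(linkage(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        score, i, j = max(pairs)
        if score < cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


names = ["Rep seq 1", "Rep seq 2", "Rep seq 3"]
sim = {
    frozenset(("Rep seq 1", "Rep seq 2")): 0.986,
    frozenset(("Rep seq 1", "Rep seq 3")): 0.786,
    frozenset(("Rep seq 2", "Rep seq 3")): 0.693,
}
otus = complete_linkage(names, sim)
print(otus)
```

Under these assumptions Rep seq 1 and Rep seq 2 merge (0.986), while Rep seq 3 stays separate because its worst match to the merged cluster is 0.693.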
Data reduction and creation of “sloppy bins”
[Figure: representative sequences binned into OTU1 through OTU6]
cmAlign OTU rep seqs
>Rep seq 1_OTU1
>Rep seq 2_OTU1
>Rep seq 3_OTU3
>Rep seq 50,000_OTU4
Update rep seq OTU file
>Rep seq 1_OTU1
>Rep seq 2_OTU2
>Rep seq 3_OTU3
>Rep seq 50,000_OTU4
Tightening up the OTUs with the secondary structure-aware Infernal aligner
E. P. Nawrocki, D. L. Kolbe, and S. R. Eddy, "Infernal 1.0: inference of RNA alignments," Bioinformatics (2009)
Sequences, alignment, complete linkage clustering
Quantitative analysis of taxa or OTUs
Diversity estimates
Rarefaction Chao, Shannon
ANOVA and T-tests Confidence intervals
Quantifying abundance of ecological characteristics
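The diversity estimates named above can be computed from a vector of per-OTU read counts. The sketch below assumes the natural-log Shannon index, the bias-corrected Chao1 estimator, and rarefaction by random subsampling without replacement; the slides do not say which variants were used.

```python
import math
import random


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)   # singletons
    f2 = sum(1 for c in counts if c == 2)   # doubletons
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefy(counts, depth, rng=None):
    """Number of OTUs seen in a random subsample of `depth` reads."""
    rng = rng or random.Random(0)
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    return len(set(rng.sample(pool, depth)))


counts = [5, 1, 1, 2]   # made-up OTU counts, for illustration only
print(shannon(counts), chao1(counts), rarefy(counts, 5))
```

Calling `rarefy` over a range of depths and averaging several random draws per depth gives the points of a rarefaction curve.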
From guts to greens: Applications
Within this same complex of host tissues, a huge mass of microbes thrives. This mass is referred to as the microbiome
The gastrointestinal tract ecosystem: the next frontier in biology
Specialized cells and tissues for:
Nutrient breakdown and absorption
Flow (peristalsis)
Immune surveillance
Neural connectivity
How complex is the microbiome?
Population density: 10^6 cells/ml in the ileum; 10^13 cells/gram in the colon
Species richness: 5 major phyla, 1,800 genera, 2,000-10,000 species of bacteria
Genetic coding content: 20-30 billion bases (10 times the human content)
Highly variable between individuals: extensive variation at the species/strain level
The microbiome essentially acts as a metabolic organ, encoding pathways for:
Nutrient breakdown, absorption, utilization
Signaling within the microbiota and to the host
Immune stimulation/suppression
…just to name a few
Fundamental questions about composition of the gut microbiome
What factors influence composition: how much “G” and how much “E”?
Are there keystone species? Mutualists? Engineers?
How do aberrations arise in composition?
What is more important, species composition or function?
1. Sterile at birth; rapidly colonized from maternal environment
2. Successive waves of colonization; stabilizes to climax community after weaning
3. Some resistance to perturbation (memory?)
Health, Disease
Heart disease
Diabetes
Cancer
IBD
Obesity
[Diagram labels: Genes 1 to 6; Genes 100, 158, 573, 744, 2763, 18950, 21305, 22481, 24796; Genes A, B, C; Pathway 1; Pathway 2; Pathway 3]
Anatomy of a polygenic complex disease
Diet Exercise
Genetic predisposition
IBD
Environmental factors
[Diagram labels: Genes 1 to 6; Genes 100, 158, 573, 744, 2763, 18950, 21305, 22481, 24796; Genes A, B, C; Gut microbiota; Pathway 2; Pathway 3]
Where does the gut microbiota fit in?
Diet Exercise
Genetic predisposition
IBD
Environmental factors
Gut microbiota
If the gut microbiota is associated (causally) with certain lifestyle diseases
And
If the gut microbiota is influenced by host genotype
…Then genetic susceptibility to certain complex lifestyle diseases may be manifest, in part, as predisposition to colonization by certain gut bacteria
Changing how we think about disease susceptibility
Metabolic effects
Gut microbiota
Disease
1. Selective breeding models
2. Genetic mapping models
Systematic approaches to measure the degree of genotypic influence at the individual level
Artificial selection models
If host genotype has significant influence…
Then we should be able to observe significant effects of host genotype on microbiome composition in selective breeding experiments
Composition of gut microbiota in selective breeding lines
Founder populations: A (NIH) X B (ICR); B (ICR) X A (NIH); C (CF1) X D (CFW(sw)); D (CFW(sw)) X C (CF1)
F1: AB X CD; BA X DC; CD X AB; DC X BA
15 generations of selection and breeding (heat loss)
~30 generations of closed breeding (no selection)
10 generations of renewed selection and breeding
Lines: MH, MC, ML
Artificial selection models
If host genotype has significant influence…
Then we should observe significant effects of artificial selection on microbiome composition
Multiple generations of selective mating
Host genetic diversity: from high to decreased genetic diversity
16 animals per line (one line rep); pyrosequencing at 5,000-10,000 reads per animal
Did composition of the GI microbiome respond to selection?
UNIFRAC analysis of 16S rRNA phylotypes from MH, ML, and MC
[Figure: two panels, "CD-Hit and cluster analysis" and "weighted UNIFRAC analysis"; in both, MC separates from MH + ML]
Rarefaction curves (97% cutoff) of microbiota from data pooled by line
[Figure: x-axis, number of sequences; y-axis, phylotypes; curves for MC, MH, ML]
Selective breeding: compositional changes in gut microbiome (abundance of taxa)
Compositional changes contributed to phenotype
Statistics and Bioinformatics
Steve Kachman (STATS)
Etsuko Moriyama (BioSci)
Mouse Genomics
Daniel Pomp (Univ. of North Carolina)
What about direct evidence?
If there is a significant effect of host genotype, then it should behave as a polygenic phenotype: microbiome composition should co-segregate with multiple genomic markers in breeding populations
X
F1
Genotyping SNPs
Phenotyping: 454 sequencing of 16S rRNA from poops
QTL mapping to identify genetic architecture controlling composition of the gut microbiome
F4
What is a trait with respect to gut microbiome?
1. Relative abundance of individual taxonomic ranks
2. Groups of taxa with positive or negative correlation
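Both kinds of trait can be derived from a per-animal table of taxon read counts. The sketch below is an illustration with made-up counts: relative abundance is the proportion of total reads, and Pearson correlation across animals is one assumed way to find taxa that rise and fall together or in opposition.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions of total reads."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented counts for four animals; taxon names are classes shown in the talk.
animals = [
    {"Clostridia": 6000, "Bacteroidetes": 3000, "Bacilli": 1000},
    {"Clostridia": 5000, "Bacteroidetes": 4000, "Bacilli": 1000},
    {"Clostridia": 3000, "Bacteroidetes": 5500, "Bacilli": 1500},
    {"Clostridia": 2000, "Bacteroidetes": 7000, "Bacilli": 1000},
]
traits = [relative_abundance(a) for a in animals]
clostridia = [t["Clostridia"] for t in traits]
bacteroidetes = [t["Bacteroidetes"] for t in traits]
print(pearson(clostridia, bacteroidetes))
```

Proportions sum to one within each animal, so some negative correlation between abundant taxa is expected from the normalization alone; that should be kept in mind before reading a correlated group as a biological trait.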
[Figure: relative abundance (y-axis, 0 to 0.8) for samples 1 to 161 (x-axis). Legend: Epsilonproteobacteria, Deltaproteobacteria, Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria, Actinobacteria, Thermodesulfobacteria, Aquificae, Flavobacteria, Sphingobacteria, Bacteroidetes, Mollicutes, Bacilli, Erysipelotrichi, Clostridia]
Outbred ICR Base population
High voluntary wheel running
30 Generations of Selective breeding
HR mice:
Higher VO2 max
Reduced fatness
Higher muscle glycogen
Higher glycolytic and mitochondrial enzyme activities
F4 Mapping population
>800 animals
Weaned at 3 weeks and caged by gender
7-8 weeks: exercise cages
Fecal samples collected at day 1 and day 6 in exercise cages
Genotyping: 768 fully informative markers between ICR and B6 (present study stage at 550 QC'd markers)
Phenotyping: 10,000 454 reads from each animal using the V1-V2 region, taxonomy assignment (RDP CLASSIFIER), normalized as proportion of total reads
QTLs mapped from 200 animals of the F4 cross
10 QTLs mapping to 7 chromosomes 4 different “compositional phenotypes”
Sometimes, you get lucky… QTLs on chromosome 15 control colonization by Helicobacter
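The core idea of testing whether a marker co-segregates with a microbiome trait can be shown with a single-marker regression. This is a deliberately simplified sketch, not the mapping method used in the study: genotypes are coded additively as 0, 1, or 2 copies of one parental allele, the trait is a relative abundance, and the LOD score is computed as (n/2) * log10(RSS0 / RSS1). Family structure in an advanced intercross, covariates, and genome-wide significance thresholds are all ignored, and the data are invented.

```python
import math


def lod_score(genotypes, trait):
    """LOD for a linear regression of trait on additive genotype code."""
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:              # monomorphic marker carries no information
        return 0.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    rss0 = sum((y - mean_y) ** 2 for y in trait)   # null model: mean only
    rss1 = rss0 - (sxy / sxx) * sxy                # model with marker effect
    return (n / 2) * math.log10(rss0 / rss1)


# Invented example: six animals, one marker, one taxon's relative abundance.
genotypes = [0, 0, 1, 1, 2, 2]
abundance = [0.10, 0.12, 0.20, 0.22, 0.30, 0.33]
print(lod_score(genotypes, abundance))
```

Scanning this statistic across all markers and looking for peaks is the basic shape of a QTL scan; each "compositional phenotype" would be scanned as its own trait.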
Experiment               N      Sex    SNPs         Diet               Parent of Origin   Genetic Diversity
1a) C57 x HR F4          800    Both   768          Regular            Y                  Low
1b) C57 x HR F10         400    Both   50 per QTL   High Fat vs. Reg   Y                  Low
2) Phenome Lines         400    Both   600,000+     Regular            N                  Moderate
3) Collaborative Cross   1600   Both   600,000+     High Fat vs. Reg   Y                  High

Experiment            N      Sex    SNPs       Phenotypes   Parent of Origin
Collaborative Cross   1000   Both   600,000+   Cancer       Y
Roadmap for the next two years
Microbiome analysis (Class level) of 700 animals from the F4 mapping population
Are strong effects of host genetics conserved in plants?
Plants are also susceptible to infectious disease
Microbiome of the phylloplane (epiphytes and endophytes) may play a protective role
Much more prone to environmental variation?
Maize genetic resource populations: Nested Association Mapping (NAM) RILs from crosses of B73 X 25 other inbred lines
Preliminary evaluation: 27 Inbred lines = parental inbreds of the NAM collection
Sample unit = 3 plants per pot, 3 pots per line
Leaves harvested at 14 days post planting and phylloplane bacteria removed by soaking
Classes of Proteobacteria showing statistically significant effects of breeding line from Maize Nested Association Mapping lines
[Figure: relative abundance (y-axis, 0 to 0.7) by inbred line (x-axis, lines 12 to 26, up to three replicates each) for Gammaproteobacteria and Betaproteobacteria]
Bacterial families showing statistically significant effects of breeding line from Maize Nested Association Mapping lines
[Figure: relative abundance (y-axis, 0 to 0.6) by inbred line (x-axis, lines 12 to 26, up to three replicates each) for Comamonadaceae, Xanthomonadaceae, Sphingomonadaceae, Bradyrhizobiaceae, Streptococcaceae]
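One simple way to ask whether breeding line affects the abundance of a taxon is a one-way ANOVA with inbred line as the grouping factor and pots as replicates. The sketch below computes only the F statistic from invented numbers; it is an assumption about the kind of test involved, not the analysis reported in the talk, and a p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented relative abundances of one taxon: three inbred lines, three pots each.
by_line = [
    [0.10, 0.12, 0.11],
    [0.30, 0.28, 0.33],
    [0.15, 0.14, 0.18],
]
print(one_way_anova_f(by_line))
```

With many taxa tested across 27 lines, some correction for multiple comparisons would be needed before calling any single effect significant.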
Secreted effector
Carbohydrate transport
Amino acid metabolism
Spore formation
Taxon-based mapping
Function-based mapping
Shotgun library
Metagenome-based mapping
454 Pyrosequencing
Orthologous gene families
Glucose transport
DNA Replication
Capsule polysaccharide
Amylase
Amino acid transport
Species A
Species B
Species C
Species D
The SEED or Pfams
Best Hit
Taxonomic assignment
Functional assignment
The MG-RAST pipeline: functional for low-throughput metagenomics
Computational Bottleneck
[Figure: relative abundance in health vs. disease for functional categories: DNA replication, protein transport, protein secretion, complex CHO transport, disaccharide transport, cell division, motility]
Taxonomy inferred from best-hit of Metagenome data using the SEED database
Taxonomy inferred from rRNA reads from metagenome data using the RDP ribosomal database
Host Metabolic effects
Gut microbiota
Disease
Host Metabolic effects
(microbial functions)
Amino acid metabolism
Carbohydrate metabolism
Disease
Environmental effects
Microbiota
Ecosystem traits
Environmental effects
(microbial functions)
Amino acid metabolism
Carbohydrate metabolism
Ecosystem traits
Environment 1
Environment 2
What factors influence composition: how much “G” and how much “E”?
Are there keystone species? Mutualists? Engineers? Indicators?
How do aberrations arise in composition?
What is more important, species composition or function?
Role for Computational, Mathematical, and Statistical Modeling
1. Develop models that can predict how microbial communities will respond to perturbation
Therapeutics (e.g. antibiotics)
Prebiotics and probiotics
Interventions (chemotherapy)
Biologicals (e.g. anti-TNF-alpha)
Dietary variables (can diet overcome genetic predisposition and vice versa)
2. Develop models that use microbial communities as predictors of:
Ecosystem health and performance
Climate change
Community composition 16S Microbiome
Community Genetic Potential Metagenomics and Metaproteomics
Community Physiology Metabolomics (nanoscale??)
Community Dynamics Microbiomics + FISH
Community interactions Microbiomics + FISH
Drilling down through complex microbial communities
Energy metabolism: Merlyn Nielson (An Sci), Larry Harshman (BioSci)
GI Microbiology: Andy Benson (Food Sci), Jens Walter (Food Sci), Robert Hutkins (Food Sci), Rod Moxley (VBS)
Physiology and Nutrition: Tim Carr (NUTR), Tom Burkey (An Sci), Ji-Young Lee (NUTR)
Statistics and Bioinformatics: Steve Kachman (STATS), Etsuko Moriyama (BioSci)
Mucosal Immunology: Dan Peterson (Food Sci)
Mouse Genomics: Daniel Ciobanu (UNL), Daniel Pomp (UNC), David Threadgill (NCSU)
Steve Kachman (UNL Statistics), Etsuko Moriyama (UNL School of Biological Sci)
Fangrui Ma: 16S rRNA analysis pipelines
The Nguyen: IT/database programming
Srinivas Aluru (ISU Computer Science), Pat Schnable (ISU Agronomy)
Xiao Yang (ISU Computer Science)
Ryan Legge: 454 sequencing, data analysis
Daniel Pomp: Mouse Genomics, UNC Chapel Hill