pizza club - may 2016 - shaman
TRANSCRIPT
Shaman Narayanasamy
Eco-Systems Biology Group
Supervisors: Paul Wilmes and Jorge Goncalves
PHD-2014-1/7934898
Computational approaches to predict bacteriophage-host relationships
Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, Bas E. Dutilh
Review article, FEMS Microbiology Reviews (9 December 2015)
Computational Biology Pizza Club series: 25th May 2016
Article overview
• Metagenomics for identification of viral-host associations
• Introduction of wet-lab methods
• Focused on bacteriophage (phage) and bacterial interactions
• Benchmark data: 820 bacteriophages, associated hosts and publicly available metagenomic datasets
• Assessment of predictive power of in silico phage-host signals:
– Abundance-based methods
– Sequence homology based methods
– Genetic homology
– CRISPRs
– Oligonucleotide profiles
– Compositional based methods
Introduction
Infection!
• Membrane receptor
Resistance
Fitness
Competition
Defense!!!
• Membrane receptor mutation
• CRISPR-Cas
• Restriction-modification
Figure adapted and modified from Gelbart & Knobler (2008)
Experimental approaches for phage isolation
• Spot and plaque assays
• Liquid assays
• Viral tagging
• Microfluidic PCR
• PhageFISH
• Single cell sequencing
• Hi-C sequencing
Spot and plaque assays
Requires
• Pure culture of host
• Pure/environmental culture of phage
Disadvantages
• Low throughput
• Host isolation required
Photo adapted and modified from http://www.slideshare.net/Adrienna/global-food-safety2013
Liquid assays
Requires
• Pure culture of host
• Pure culture of phage
Disadvantages
• Use of OD readout *
• Low sensitivity (single endpoint values) *
• Host and phage isolate required
* Use redox dye, OmniLog platform and real-time/semiquantitative PCR
Figure adapted and modified from Goldberg et al. (2014)
Viral tagging
Requires
• Pure culture of host
• Pure culture/environmental isolate of phages
• Cell sorter (FACS..?)
Disadvantages
• Host isolate required
Figure adapted and modified from http://jgi.doe.gov/dyeing-learn-marine-viruses/
Microfluidic PCR
Requires
• Environmental microbial community sample
• PCR primers for target marker genes
Disadvantages
• Relies on marker genes for design of PCR primers
Figure adapted and modified from Dang & Sullivan (2014)
PhageFISH
Requires
• Environmental microbial community sample
• PCR primers for target marker genes
Disadvantages
• Relies on marker genes for FISH probe design
Figures adapted and modified from Dang & Sullivan (2014) and Allers et al. (2013)
Single cell sequencing
Requires
• Single microbial cell from environmental microbial community sample
Disadvantages
• Biased towards most abundant environmental microbe
Figure adapted and modified from Lasken (2012)
Benchmark dataset
820 complete phage genomes
Field: “host”
153 complete bacterial genomes
NCBI RefSeq
Quality assessment of predictions: ROC curves
• Assessment of binary classifier (Host/Not Host)
• Does not require cut-off value
• Based on the rate of accumulation of true and false positives
• True positive rate (sensitivity), false positive rate (1 - specificity)
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
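The threshold-free idea behind the ROC curve can be sketched in a few lines of Python. This is only an illustration and not the evaluation code used in the review: the function names `roc_points` and `auc`, and the toy scores and labels in the usage note, are assumptions made for the example. Candidate hosts are ranked by score, and the true and false positive rates are recorded as the cut-off is lowered step by step.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the cut-off over all scores.

    scores: prediction scores, higher means "more likely the host".
    labels: 1 for a true phage-host pair, 0 otherwise.
    Assumes at least one positive and one negative label.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Tied scores are processed together so they yield a single point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For example, with made-up scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, the curve ends at (1.0, 1.0) and the area under it is 8/9, which matches the fraction of (positive, negative) pairs that are ranked correctly.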
Computational methods for phage-host signal prediction
• Abundance profiles
• Genetic homology
• CRISPR
• Exact matches
• Oligonucleotide profiles
Abundance profiles
• Stern et al. (2012)
– Good correlation of phage-host abundance across human gut microbiome (metagenomes)
• Reyes et al. (2013)
– 2/5 phages correspond to decrease in host abundance (mouse gut)
• Nielsen et al. (2014)
– Occurrence of phage-like gene sets corresponding to host (bacterial) gene set
– Includes known phage-host pairs
• Dutilh et al. (2014)
– 22% of metagenomic reads may be of phage origin
• Lima-Mendez et al. (2015); Tara Oceans survey
Figure adapted and modified from Nielsen et al. (2014) and Edwards et al. (2015)
• Improves with the availability of multiple samples from same/similar environments
• High spatio/temporal stratification; will improve as publicly available metagenome collection increases
• Time series datasets potentially used for time-lagged associations
• Complicated and non-linear dynamics incompatible with straightforward correlation
• 12% correct identification of host
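A minimal sketch of association by correlation, assuming each phage and each candidate host has an abundance value per metagenomic sample. The function names (`pearson`, `predict_host`, `lagged_pearson`) and the toy profiles in the usage note are hypothetical, and plain Pearson correlation is used only because it is the simplest choice; as noted above, non-linear dynamics can defeat it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / (sx * sy)


def predict_host(phage_profile, host_profiles):
    """Pick the host whose profile correlates best with the phage.

    host_profiles: dict mapping host name -> abundance per sample.
    """
    return max(host_profiles,
               key=lambda h: pearson(phage_profile, host_profiles[h]))


def lagged_pearson(phage, host, lag):
    """Correlate the phage at time t with the host at time t - lag.

    Only meaningful for time series; lag is a number of time points.
    """
    if lag == 0:
        return pearson(phage, host)
    return pearson(phage[lag:], host[:-lag])
```

With a phage profile `[1, 2, 3, 4]` and hosts `{"A": [2, 4, 6, 8], "B": [4, 3, 2, 1]}`, `predict_host` returns `"A"`. The lagged variant covers the time-series case: a phage that tracks its host one time point later correlates perfectly at `lag=1` even if the unlagged correlation is weak.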
Genetic homology
• Phage-host homology is an indication of recent common ancestry, implying interaction
• Host genes may benefit phages!
• Auxiliary metabolic genes
• Modi et al. (2013) and Dutilh et al. (2014)
Figure adapted and modified from Edwards et al. (2015)
• Amino acid based searches applicable for distantly related organisms (29.8%)
• Nucleotide based searches more accurate (38.5%)
• 30% host identified
CRISPR-Cas
[Figure: CRISPR-Cas schematic. Labels: Phage genome 1, Phage genome 2; bacterial genome with cas gene and CRISPR array; R: repeat; S1 to S5: spacers]
CRISPRs
• Studies:
– Human gut microbiome; Stern et al. (2012), Minot et al. (2013)
– Acidophilic biofilms; Andersson & Banfield (2008)
– Cow rumen; Berg Miller et al. (2012)
– Arctic glacial ice and soil; Sanguino et al. (2015)
– Marine environments; Anderson, Brazelton & Baross (2011), Cassman et al. (2012)
– Activated sludge; Narayanasamy et al. (unpublished)
• Little to no homology to known sequences
• Environmentally dependent
• Spacers are rapidly replaced
• Most suitable for recent phage-host interactions
• Not all prokaryotes encode CRISPRs (bacteria: 48 ± 30%, archaea: 63 ± 30%)
• Highly specific, but not sensitive
• Degeneracy of up to 13 mismatches allowed (Fineran et al., 2014)
Figure adapted and modified from Edwards et al. (2015)
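The spacer-to-phage matching idea can be sketched as an ungapped scan with a mismatch budget. This is a toy illustration, not the alignment procedure used in the review (which would rely on a proper aligner): the names `min_mismatches` and `spacer_hits` and the `max_mismatches` parameter are assumptions for the example, and the scan is far too slow for real genomes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(spacer, genome):
    """Fewest mismatches over all ungapped placements on either strand.

    Returns None if the genome is shorter than the spacer.
    """
    best = None
    for target in (genome, reverse_complement(genome)):
        for i in range(len(target) - len(spacer) + 1):
            window = target[i:i + len(spacer)]
            d = sum(a != b for a, b in zip(spacer, window))
            if best is None or d < best:
                best = d
    return best


def spacer_hits(spacers, phage_genomes, max_mismatches=0):
    """List (spacer_id, phage_id, mismatches) for spacers found in phages.

    max_mismatches is a hypothetical tolerance; 0 means exact matches only.
    """
    hits = []
    for sid, spacer in spacers.items():
        for pid, genome in phage_genomes.items():
            d = min_mismatches(spacer, genome)
            if d is not None and d <= max_mismatches:
                hits.append((sid, pid, d))
    return hits
```

A hit links the bacterium carrying the spacer to the phage it matches. Raising `max_mismatches` trades specificity for sensitivity, which is the tension the slide points at: exact hits are highly specific but rare.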
Exact matches
• Integration of phage to host via homologous recombination
• attP (POP’) on phage genome and attB (BOB’) on bacterial genome
• Common identical core sequence (2-15 bp) between phage and host
• Adjacent to integrase gene in phage genome, near tRNA gene in bacterial genomes
Figure adapted and modified from Edwards et al. (2015)
• Longer matches more reliable
• Up to 40% of matches give a correct prediction
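To make the exact-match signal concrete, the sketch below finds the longest stretch shared verbatim by a phage and each candidate host, and picks the host with the longest one. It is an illustration under simplifying assumptions: the function names are hypothetical, only the forward strand is compared, and the quadratic dynamic programme is suitable for short toy sequences only (real genomes would need a suffix-based index).

```python
def longest_exact_match(a, b):
    """Longest common substring of a and b (forward strand only).

    Dynamic programming in O(len(a) * len(b)) time; on ties the
    first longest match found in a is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def predict_host_by_exact_match(phage, hosts):
    """Pick the host sharing the longest exact match with the phage.

    hosts: dict mapping host name -> genome sequence.
    """
    return max(hosts, key=lambda h: len(longest_exact_match(phage, hosts[h])))
```

For the toy sequences `"GGATTACAGG"` and `"CCATTACACC"` the longest shared stretch is `"ATTACA"`. Ranking hosts by match length reflects the point that longer matches are more reliable, since short matches can occur by chance.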
[Figure legend: contig with cas gene; contig with known phage gene; contig with CRISPR locus]
Oligonucleotide profiles
• Phages ameliorate genomic oligonucleotide profiles according to host
• Avoid recognition by restriction enzymes
• Adjustment of codon usage to match available host tRNAs
• Ogilvie et al. (2013) identified 408 metagenomic fragments with phage-like properties (4-mers)
Figure adapted and modified from Narayanasamy et al. (unpublished) and Edwards et al. (2015)
• Profiles cannot be too sparse (shorter k-mers)
• k = 3-8 predicted 8-17% correct hosts
• Codon usage predicted ~10% of hosts correctly
• GC content not informative
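A compositional comparison of this kind can be sketched as follows: build a normalised k-mer frequency vector per genome and assign the phage to the host with the nearest profile. The function names and the Euclidean distance are assumptions for illustration; the review may have used a different distance measure, and only the forward strand is counted here.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Normalised frequencies of all 4**k DNA k-mers in seq.

    k-mers containing characters other than A, C, G, T are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_host(phage, hosts, k=4):
    """Pick the host whose k-mer profile is nearest to the phage's.

    hosts: dict mapping host name -> genome sequence.
    """
    phage_profile = kmer_profile(phage, k)
    return min(hosts,
               key=lambda h: euclidean(phage_profile, kmer_profile(hosts[h], k)))
```

The vector has 4**k entries, so for large k and short sequences most entries are zero. That is the sparsity problem noted above and the reason shorter k-mers are preferred.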
Summary and overview
Signal category | Approach | Performance | Comments
Abundance profiles | Phage-host co-abundance profiles; association by correlation | 9.5% | Non-linear dynamics confound correlations
Genetic homology | Phage-host nucleotide and protein sequence homology | 38.5% (blastn); 29.8% (blastx) | Depends on database
CRISPRs | Spacer alignments to phage genomes | 15.1% (most similar); 21.3% (highest) | Occurrence of CRISPR system (~40% bacteria, ~70% archaea); no matches; not sensitive
Exact matches ** | Exact matches of phage-host genomes | 40.5% | Short exact matches may be random
Oligonucleotide profiles | Similarity of k-mer profiles of phage-host | 17.2% (4-mer); 10.4% (codon) |
Table adapted and modified from Edwards et al. (2015)
Summary and overview
• Blastn and exact matches provide strongest signal
• Most methods predict between 1 and 4 bacteria as most likely host (better than random)
• Significant host genome fraction required (except for abundance-based method)
• Current knowledge still limited
• Phage host range (highly specific vs broad range)
• New methods and technology
Figure adapted and modified from Edwards et al. (2015)
Thank you!