Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity
Morgan G.I. Langille
Department of Molecular Biology & Biochemistry, Simon Fraser University
http://tinyurl.com/genomic-islands
Genomic Island History
Early 1990s: clusters of virulence genes were found in E. coli (Hacker, et al., 1990)
Pathogenicity Islands (PAIs): clusters of genes that are associated with bacterial virulence
Genomic Islands (GIs) (Hacker, et al., 2000): segments of a genome that are thought to have originated from a horizontal transfer event
Genomic Island Interest
Pathogenicity Islands
  Adhesins: fimbriae, intimin, etc.
  Secretion systems: Type III and Type IV
  Toxins: hemolysins, pertussis toxin
  Invasins, modulins, and effectors
Antibiotic Resistance Islands
Metabolic Islands
Methods for Predicting GIs
1. Sequence based
  Abnormal sequence composition: GC% bias, dinucleotide bias, codon bias, etc.
  Genomic features associated with mobile genetic elements: direct repeats, IS elements, presence of tRNA and mobility genes (integrases, transposases, etc.)
2. Comparative genomics based
  Identify genomic regions with anomalous phylogenetic patterns
  Requires multiple genomes
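As an illustration of the sequence composition idea, the sketch below flags windows whose GC fraction deviates from the genome-wide value. It is a minimal example, not any of the published tools: the function names, window size, step, and deviation threshold are all assumptions chosen for the toy input.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=5000, step=2500, max_deviation=0.05):
    """Return (start, end, gc) for windows whose GC fraction differs from the
    genome-wide GC fraction by more than max_deviation.

    window, step, and max_deviation are illustrative values only.
    """
    genome_gc = gc_fraction(genome)
    hits = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        end = min(start + window, len(genome))
        gc = gc_fraction(genome[start:end])
        if abs(gc - genome_gc) > max_deviation:
            hits.append((start, end, gc))
    return hits


# Toy genome: a 50% GC backbone with a short AT-rich insert in the middle.
backbone = "ACGT" * 50
at_rich_insert = "AT" * 50
toy_genome = backbone + at_rich_insert + backbone
flagged = atypical_gc_windows(toy_genome, window=100, step=100, max_deviation=0.2)
print(flagged)
```

Only the AT-rich insert is flagged here. Real genomes need much larger windows, and a region can be missed when its composition resembles that of the host.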
Previous state of GI identification
1. Sequence based methods
  Numerous methods and constant improvement of algorithm design
  Not very user friendly, and the accuracy of the various methods is not well described
2. Comparative based methods
  Used by many researchers, but with no established method (only in-house scripts)
  Limited access to user friendly tools for this type of analysis
Outline
IslandPick: A comparative genomics approach for genomic island identification
Evaluating sequence composition based genomic island prediction methods
IslandViewer: An integrated interface for computational identification and visualization of genomic islands
The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain
CRISPRs and their association with genomic islands
Mauve: whole genome aligner
Allows genome rearrangements and inversions
Fast: aligns two genomes in < 15 minutes
Command line accessible
http://gel.ahabs.wisc.edu/mauve/
(Darling, et al., 2004)
IslandPick: Outline
[Figure: IslandPick workflow. Query Genome A is aligned pairwise against Genomes B, C, and D using Mauve (A & B, A & C, A & D). Steps: Run Mauve; Extract unique regions; Identify overlapping unique regions; BLAST; Putative Genomic Islands.]
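The "identify overlapping unique regions" step can be sketched as an interval intersection: keep only the parts of the query genome that were unique in every pairwise comparison. This is a simplified illustration, not the IslandPick source. The function names, the half-open (start, end) coordinates, and the min_length filter are assumptions, and the Mauve and BLAST steps are not shown.

```python
def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regions_unique_in_all(unique_per_comparison, min_length=1):
    """Regions of the query genome that are unique in every comparison.

    unique_per_comparison: one list of (start, end) intervals per comparison
    genome. min_length is a hypothetical size filter.
    """
    common = sorted(unique_per_comparison[0])
    for regions in unique_per_comparison[1:]:
        common = intersect_two(common, sorted(regions))
    return [(s, e) for s, e in common if e - s >= min_length]


# Hypothetical unique regions of Query Genome A versus Genomes B, C, and D.
unique_vs_b = [(100, 500), (900, 1200)]
unique_vs_c = [(150, 450), (1000, 1500)]
unique_vs_d = [(0, 400), (1100, 1300)]
putative = regions_unique_in_all(
    [unique_vs_b, unique_vs_c, unique_vs_d], min_length=200
)
print(putative)
```

Requiring a region to be unique in all comparisons is what makes the choice of comparison genomes matter: one genome that is too close or too distant changes the intersection.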
Selecting Comparative Genomes
[Figure: the IslandPick workflow (Run Mauve; Extract unique regions; Identify overlapping unique regions; BLAST; Putative Genomic Islands) with an added first step, Comparative Genome Selection (using CVTree distances), which chooses Genomes B, C, and D for Query Genome A.]
What genomes to use?
We want to compare the query genome to other comparative genomes within certain evolutionary distances
Need a phylogenetic tree or a distance matrix for all sequenced bacterial species
CVTree
Uses matching K-strings between the proteomes of two organisms
Constructs phylogenetic trees without alignment
Avoids choosing genes for phylogenetic reconstruction
Web Server http://cvtree.cbi.pku.edu.cn
Downloadable command line executable
(Qi, et al., 2004)
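To make the alignment-free idea concrete, here is a simplified sketch of a K-string distance between two proteomes. It counts overlapping K-strings and converts the cosine correlation C of the two count vectors into a distance D = (1 - C) / 2. It is not the CVTree implementation: CVTree also subtracts an expected background from the counts before correlating, which is omitted here, and the function names and the value of k are assumptions.

```python
from collections import Counter
from math import sqrt


def kstring_counts(proteins, k=5):
    """Count overlapping K-strings across a list of protein sequences."""
    counts = Counter()
    for protein in proteins:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    return counts


def composition_distance(counts_a, counts_b):
    """Distance D = (1 - C) / 2, where C is the cosine correlation of the
    two K-string count vectors (no background subtraction in this sketch)."""
    dot = sum(v * counts_b[key] for key, v in counts_a.items() if key in counts_b)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each proteome must contain at least one K-string")
    return (1 - dot / (norm_a * norm_b)) / 2


# Toy proteomes (hypothetical sequences).
proteome_a = kstring_counts(["MKTAYIAKQR", "MKTAYLLKQR"], k=3)
proteome_b = kstring_counts(["MKTAYIAKQR"], k=3)
proteome_c = kstring_counts(["GGGGGGGGGG"], k=3)
print(composition_distance(proteome_a, proteome_b))
print(composition_distance(proteome_a, proteome_c))
```

With raw counts the distance lies between 0 (identical composition) and 0.5 (no shared K-strings).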
Example: Pseudomonas Tree
[Figure: phylogenetic tree of P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, and Acinetobacter ADP1. Distance labels on the tree: 0, 0.227, 0.256, 0.393, 0.397, 0.411, 0.428, 0.430, 0.481.]
Tree built using conserved genes, Omp85 & CarB, and maximum parsimony
CVTree distances from P. syringae B728a are shown
Determining Distance Cutoffs
Given the distances between any two species, how do we choose comparison genomes?
Maximum Distance Cutoff
  Eliminates the use of genomes that have diverged too much (noise)
Minimum Distance Cutoff
  Eliminates the use of genomes that have not diverged enough (very closely related strains)
Minimum Number of Genomes
  Eliminates the use of too few comparative genomes
Example: Pseudomonas Tree
[Figure: the same Pseudomonas tree, labelled with CVTree distances from P. syringae B728a, with the cutoffs below applied.]
Minimum Distance Cutoff = 0.10
Maximum Distance Cutoff = 0.42
Minimum Number of Genomes = 3
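The three cutoffs amount to a simple filter over the distances from the query genome. The sketch below applies the example values above. The genome labels and distances in the usage example are hypothetical, and treating the cutoffs as inclusive bounds is an assumption.

```python
def select_comparison_genomes(distances, min_distance=0.10, max_distance=0.42,
                              min_genomes=3):
    """Pick comparison genomes whose distance from the query lies within
    [min_distance, max_distance]; return [] if too few genomes qualify.

    distances: dict mapping genome name to its distance from the query.
    Inclusive bounds are an assumption of this sketch.
    """
    selected = sorted(
        (name for name, d in distances.items()
         if min_distance <= d <= max_distance),
        key=distances.get,
    )
    if len(selected) < min_genomes:
        return []
    return selected


# Hypothetical distances from a query genome.
example_distances = {
    "strain_B": 0.05,  # too close: removed by the minimum distance cutoff
    "strain_C": 0.23,
    "strain_D": 0.26,
    "strain_E": 0.40,
    "strain_F": 0.48,  # too distant: removed by the maximum distance cutoff
}
chosen = select_comparison_genomes(example_distances)
print(chosen)
```

If fewer than the minimum number of genomes survive the distance filter, no comparison set is returned and no prediction would be attempted for that query.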
Predicting Similar Aged GIs
[Figure: two panels showing a GI insertion relative to the Query Genome. Cutoff: 1 genome < distance X]
Accuracy of GI methods
Sequence based GI prediction methods
  Only require a single genome
  Can easily make false predictions
    Highly expressed genes
  May miss predictions
    Amelioration of DNA to host genome
    Source genome has same composition as host genome
  Accuracy is usually evaluated using simulated horizontal gene transfer events or small datasets of verified GIs
IslandPick is independent of sequence composition methods, so it was used to generate a “positive” dataset of islands
Developing a Negative Dataset
To identify false positives we need a “negative” dataset that does not contain GIs
Identify regions that are conserved across several genomes using Mauve whole genome alignment
Use the same genomes as selected by IslandPick with one additional cutoff
Negative Dataset
[Figure: two panels showing a GI insertion relative to the Query Genome. Additional cutoff: 1 genome > distance X]
IslandPick Cutoffs
[Figure: results of applying the IslandPick cutoffs. Labels: 118 chromosomes (771 GIs, ~100 genes/strain); 173 chromosomes; 736 chromosomes.]
(Langille, et al., 2008)
GI Prediction Accuracy
[Figure: overlap of the Predicted Dataset with the Positive Dataset and the Negative Dataset, marking TP, FP, FN, and TN.]
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
GI Prediction Accuracy
Tool               Average number of nucleotides   Precision   Recall   Overall
                   in GIs per genome (kb)                               Accuracy
SIGI-HMM           233                             92          33.0     86
IslandPath/Dimob   171                             86          36       86
PAI IDA            163                             68          32       84
Centroid           171                             61          28       82
IslandPath/Dinuc   444                             55          53       82
Alien Hunter       1265                            38          77       71
Literature*        639                             100         87       96
(Langille, et al., 2008)
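A small sketch of how such figures can be computed at the nucleotide level from a positive, a negative, and a predicted dataset. The position sets are toy data, and the overall accuracy formula (TP + TN) / (TP + FP + FN + TN) is an assumption, since the slides give formulas only for precision and recall. Predicted positions that fall in neither dataset are ignored.

```python
def confusion_counts(positive, negative, predicted):
    """Count TP, FP, FN, TN over nucleotide positions (sets of ints).
    Positions outside both the positive and negative datasets are ignored."""
    tp = len(predicted & positive)
    fp = len(predicted & negative)
    fn = len(positive - predicted)
    tn = len(negative - predicted)
    return tp, fp, fn, tn


def prediction_metrics(tp, fp, fn, tn):
    """Precision and recall as defined on the slide; accuracy is assumed to be
    (TP + TN) / (TP + FP + FN + TN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Toy datasets (hypothetical coordinates).
positive = set(range(0, 100))        # island positions
negative = set(range(200, 400))      # conserved positions
predicted = set(range(50, 120)) | set(range(380, 400))

tp, fp, fn, tn = confusion_counts(positive, negative, predicted)
precision, recall, accuracy = prediction_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn, precision, recall, accuracy)
```

A tool that predicts many nucleotides as islands tends to gain recall at the cost of precision, which is the pattern the table shows for Alien Hunter.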
IslandViewer (Langille, et al., 2009)
Website that integrates the most accurate GI prediction programs: SIGI-HMM, IslandPath-DIMOB, and IslandPick
Genomic island predictions pre-calculated for all genomes; automatically updated monthly
User genome submission available
IslandPick can be run using manually selected comparison genomes
Download data for a genomic island, a chromosome, or entire dataset
http://www.pathogenomics.sfu.ca/islandviewer/
IslandPick – Manual genome selection
User Genome Submission
Pseudomonas aeruginosa Liverpool Epidemic Strain (LES)
Highly successful at colonizing cystic fibrosis (CF) patients
Has replaced previously established strains
Caused infections of non-CF patients
Can cause greater morbidity in CF than other strains of P. aeruginosa
(Salunkhe, et al., 2005)
LES Analysis
Genome sequenced by Sanger Centre
I led annotation of the genome and analysis of GIs
6 Prophages
5 Genomic Islands
(Winstanley, Langille, et al., 2008)
Signature-tagged mutagenesis (STM)
STM is a method to identify genes associated with pathogenesis
LES used in a chronic rat lung infection model
47 genes identified by STM
5 of these genes are within GIs and prophage regions
http://www.traill.uiuc.edu/uploads/porknet/papers/LitchtensteigerPaper.pdf
LES Prophage
[Figure: gene maps (scale bar: 5 kb) of the six LES prophages, with STM mutations marked and similarity to Pseudomonas Phage F10, Pseudomonas Phage D3112, Pseudomonas Phage D3, Pseudomonas Phage Pf1, and Pyocin R2 indicated. Duplication 1 and Duplication 2 are marked on prophages 2 and 3.]
Prophage 1: PLES 6091 to PLES 6271
Prophage 2: PLES 7891 to PLES 8321
Prophage 3: PLES 13201 to PLES 13711
Prophage 4: PLES 15491 to PLES 15961
Prophage 5: PLES 25021 to PLES 25661
Prophage 6: PLES 41181 to PLES 41281
(Winstanley, Langille, et al., 2008)
LES Genomic Islands
(Winstanley, Langille, et al., 2008)
LES in-vivo competitive index
Mutants grown for 7 days in rat lung with the wild type LES
A CI of less than 1 indicates attenuation of virulence
4 genes within prophage and GIs had strong impact on competitiveness
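The slides do not state how the competitive index (CI) is calculated. The sketch below uses a commonly used definition, which is an assumption here: the mutant to wild type ratio recovered after infection divided by the same ratio in the inoculum. The counts are hypothetical.

```python
def competitive_index(mutant_in, wild_type_in, mutant_out, wild_type_out):
    """CI = (mutant_out / wild_type_out) / (mutant_in / wild_type_in).

    A commonly used definition, assumed here; inputs are bacterial counts
    (for example CFU) in the inoculum (in) and after recovery (out).
    """
    return (mutant_out / wild_type_out) / (mutant_in / wild_type_in)


# Hypothetical counts: equal inoculum, mutant recovered at one fifth of wild type.
ci = competitive_index(mutant_in=1e6, wild_type_in=1e6,
                       mutant_out=2e5, wild_type_out=1e6)
print(ci)
```

Under this definition a mutant that competes as well as the wild type gives a CI near 1, and values below 1 indicate attenuation, consistent with the interpretation on the slide.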
(Winstanley, Langille, et al., 2008)
Overview of CRISPRs
CRISPRs: Clustered regularly interspaced short palindromic repeats
Able to provide phage resistance and block conjugation
Thought to act similarly to RNAi, except that DNA (instead of RNA) is believed to be the target
CRISPRs and HGT
Previous studies have shown some evidence of HGT of CRISPRs
  Phylogenetic profiles of CAS genes (Haft, et al., 2005)
  CRISPRs within 10 megaplasmids (Godde, et al., 2006)
  CRISPRs within two prophages in Clostridium difficile (Sebaihia, et al., 2006)
Analysis of CRISPRs and GIs had not been conducted previously
CRISPRs within GIs
Domain of Life       Number of   Number   Proportion of   Total Number   Expected CRISPRs   Observed CRISPRs   Significance
                     Genomes     of GIs   Genome in GIs   of CRISPRs     in GIs             in GIs             (Chi-square Test)*
Archaea              49          298      3.7%            206            7.7                14                 0.020
Bacteria             306         4874     6.4%            837            53.3               114                8.1 x 10^-18
Archaea & Bacteria   355         5172     6.1%            1043           64.0               128                1.6 x 10^-16
CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php
GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM
The numbers of CRISPRs inside and outside GIs were compared
CRISPRs are over-represented in GIs
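The test behind the table can be approximated as a two-category chi-square goodness-of-fit test (inside versus outside GIs, one degree of freedom). That reading of the method is an assumption, as are the function and parameter names. The example uses the Archaea row with its rounded 3.7% proportion, so the result is close to, but not identical with, the reported values.

```python
from math import erfc, sqrt


def chi_square_gi_enrichment(total, observed_in_gis, proportion_in_gis):
    """Two-category chi-square test with one degree of freedom.

    Expected count inside GIs = total * proportion of the genome in GIs.
    For one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).
    """
    expected_in = total * proportion_in_gis
    expected_out = total - expected_in
    observed_out = total - observed_in_gis
    chi2 = ((observed_in_gis - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    p_value = erfc(sqrt(chi2 / 2))
    return expected_in, chi2, p_value


# Archaea row: 206 CRISPRs in total, 14 observed in GIs, 3.7% of genome in GIs.
expected, chi2, p_value = chi_square_gi_enrichment(206, 14, 0.037)
print(expected, chi2, p_value)
```

Because the expected count scales with the proportion of the genome inside GIs, about twice as many CRISPRs as expected in a region covering only a few percent of the genome is enough to reach significance.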
Phage genes within GIs
Many GIs are known to contain phage genes
What proportion of GI genes have links to phage?
Identified genes with “phage” in their annotation within GIs
Genomic Regions   Number of ‘phage genes’     Total number of      Chi-Square Test
                  Observed     Expected^3     genes in region
Inside GIs^1      6990         1264.22        165784               ~0
Outside GIs^1     12868        18593.78       2438303
35% of all ‘phage genes’ are within GIs (6% expected)
Phage genes are over-represented in GIs
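The expected counts and the 35% versus 6% comparison follow from the gene totals in the table. In this sketch the expected number of phage genes in a region is the total number of phage genes multiplied by that region's share of all genes. The function name is illustrative, and the calculation is inferred from the table rather than stated on the slide.

```python
def expected_by_gene_share(observed_in, observed_out, genes_in, genes_out):
    """Expected phage genes inside and outside GIs if phage genes were spread
    in proportion to the number of genes in each region."""
    total_observed = observed_in + observed_out
    share_in = genes_in / (genes_in + genes_out)
    expected_in = total_observed * share_in
    expected_out = total_observed - expected_in
    observed_share_in = observed_in / total_observed
    return expected_in, expected_out, observed_share_in, share_in


# Values from the table above.
exp_in, exp_out, obs_share, gene_share = expected_by_gene_share(
    observed_in=6990, observed_out=12868, genes_in=165784, genes_out=2438303
)
print(round(exp_in, 2), round(exp_out, 2))
print(f"{obs_share:.0%} of phage genes are in GIs; {gene_share:.0%} expected")
```

This reproduces the expected counts in the table to within rounding.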
Archaea and CRISPRs
                                      Archaea   Bacteria
Genomes containing a CRISPR           90%       40%
Proportion of phage genes             0.10%     0.79%
Proportion of GIs with a phage gene   5.1%      17.6%
Prevalence of CRISPRs in Archaea genomes could result in reduced phage genes
GIs with CRISPRs and phage genes
Is there evidence supporting that some CRISPRs are being transferred by phage?
Genomic Regions              Number of ‘phage genes’     Total number of      Chi-Square Test
                             Observed     Expected^3     genes in region
GIs containing CRISPR(s)^2   13           4.5            1500                 5.7 x 10^-5
Outside GIs^2                812          820.5          274073
GIs containing CRISPR(s) also contain an over-representation of phage genes, suggesting that some CRISPRs are transferred by phage
CRISPR conclusions
CRISPR over-representation in GIs suggests that they are being horizontally transferred
Some GIs that contain CRISPRs may have phage origins
CRISPRs in Archaea could be limiting HGT by increasing resistance to phage
Conclusions
Several advances in GI computational prediction
  IslandPick, a novel automated comparative genomics based GI prediction program
  Analysis of the accuracy of several sequence based GI prediction methods
  IslandViewer: An integrated interface for computational identification and visualization of genomic islands
Insights into GI evolution and their pathogenicity
  P. aeruginosa LES: evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model
  CRISPRs and their association with genomic islands
Acknowledgements
Supervisor: Dr. Fiona Brinkman
Supervisory Committee: Dr. Baillie, Dr. Pio
P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson