Cross-Domain and Within-Domain Horizontal Gene Transfer: Implications
for Bacterial Pathogenicity
1. Pathogenomics Project
2. Cross-Domain Horizontal Gene Transfer Analysis
3. Horizontal Gene Transfer: Identifying Pathogenicity Islands
Pathogenomics
Goal:
Identify previously unrecognized mechanisms of microbial pathogenicity using a combination of informatics, evolutionary biology, microbiology and genetics.
Explosion of data
23 of the 37 publicly available microbial genome sequences are for bacterial pathogens
Approximately 21,000 pathogen genes with no known function!
>95 bacterial pathogen genome projects in progress …
The need for new tools
Prioritize new genes for further laboratory study
Capitalize on the existing genomic data
Bacterial Pathogenicity
Processes of microbial pathogenicity at the molecular level are still minimally understood
Pathogen proteins identified that manipulate host cells by interacting with, or mimicking, host proteins
Yersinia Type III secretion system
Approach
Idea: Could we identify novel virulence factors by identifying bacterial pathogen genes more similar to host genes than you would expect based on phylogeny?
Search pathogen genes against databases. Identify those with eukaryotic similarity.
Evolutionary significance.
– Horizontal transfer?
– Similar by chance?
Prioritize for biological study.
– Previously studied in the laboratory?
– Can UBC microbiologists study it?
– C. elegans homolog?
Modify screening method/algorithm
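The screening idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the `eukaryote_like` function, the E-value cutoff and the toy hits are all hypothetical and are not taken from the project's actual pipeline. The sketch flags a pathogen protein when its strongest database match, after setting aside relatives from its own family, comes from a eukaryote.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One database match for a query protein (hypothetical record layout)."""
    subject: str   # name of the matching database protein
    domain: str    # "Bacteria", "Archaea" or "Eukaryota"
    family: str    # taxonomic family of the subject organism
    evalue: float  # BLAST Expect value (smaller means a stronger match)


def eukaryote_like(hits: List[Hit], query_family: str,
                   max_evalue: float = 1e-10) -> Optional[Hit]:
    """Return the top hit if it is eukaryotic once same-family relatives
    are excluded; otherwise return None.

    The max_evalue cutoff is an arbitrary assumption for this sketch.
    """
    candidates = [h for h in hits
                  if h.family != query_family and h.evalue <= max_evalue]
    if not candidates:
        return None
    best = min(candidates, key=lambda h: h.evalue)
    return best if best.domain == "Eukaryota" else None


# Toy hits for one query protein from a Pasteurellaceae bacterium (made-up values).
hits = [
    Hit("relative_A", "Bacteria", "Pasteurellaceae", 1e-120),
    Hit("euk_protein_X", "Eukaryota", "Trichomonadidae", 1e-90),
    Hit("bact_protein_Y", "Bacteria", "Enterobacteriaceae", 1e-40),
]

flagged = eukaryote_like(hits, query_family="Pasteurellaceae")
print(flagged.subject if flagged else "no eukaryote-like top hit")
```

A flagged protein is only a candidate: the later steps on the slide (evolutionary significance, prioritization) would still be needed to separate horizontal transfer from chance similarity.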
Approach
Genome data for…
Anthrax                      Necrotizing fasciitis
Cat scratch disease          Paratyphoid/enteric fever
Chancroid                    Peptic ulcers and gastritis
Chlamydia                    Periodontal disease
Cholera                      Plague
Dental caries                Pneumonia
Diarrhea (E. coli etc.)      Salmonellosis
Diphtheria                   Scarlet fever
Epidemic typhus              Shigellosis
Mediterranean fever          Strep throat
Gastroenteritis              Syphilis
Gonorrhea                    Toxic shock syndrome
Legionnaires' disease        Tuberculosis
Leprosy                      Tularemia
Leptospirosis                Typhoid fever
Listeriosis                  Urethritis
Lyme disease                 Urinary Tract Infections
Melioidosis                  Whooping cough
Meningitis                   + Hospital-acquired infections
Bacterial Pathogens
Chlamydophila psittaci       Respiratory disease, primarily in birds
Mycoplasma mycoides          Contagious bovine pleuropneumonia
Mycoplasma hyopneumoniae     Pneumonia in pigs
Pasteurella haemolytica      Cattle shipping fever
Pasteurella multocida        Cattle septicemia, pig rhinitis
Ralstonia solanacearum       Plant bacterial wilt
Xanthomonas citri            Citrus canker
Xylella fastidiosa           Pierce’s Disease - grapevines
Bacterial wilt
World Research Community
Approach
Prioritized candidates
Study function of homolog in model host (C. elegans)
Study function of gene in bacterium.
Infection of mutant in model host
C. elegans
DATABASE
Collaborations with others
Informatics/Bioinformatics
• BC Genome Sequence Centre
• Centre for Molecular Medicine and Therapeutics
Evolutionary Theory
• Dept of Zoology
• Dept of Botany
• Canadian Institute for Advanced Research
Pathogen Functions
• Dept. Microbiology
• Biotechnology Laboratory
• Dept. Medicine
• BC Centre for Disease Control
Host Functions
• Dept. Medical Genetics
• C. elegans Reverse Genetics Facility
• Dept. Biological Sciences SFU
Interdisciplinary group
Coordinator
Pathogenomics Database: Bacterial proteins with unusual similarity with Eukaryotic proteins
Haemophilus influenzae Rd-KW20 proteins most strongly matching eukaryotic proteins
PhyloBLAST – a tool for analysis Brinkman et al. (2001) Bioinformatics. In Press.
Trends in the Initial Analysis
• Identifies the strongest cases of lateral gene transfer between bacteria and eukaryotes
• Most common “cross-domain” horizontal transfers:
Bacteria → Unicellular Eukaryote
• Identifies nuclear genes with potential organelle origins
• A control: Method identifies all previously reported Chlamydia trachomatis “eukaryote-like” genes.
First case: Bacterium → Eukaryote Lateral Transfer
[Figure: phylogenetic tree, scale bar 0.1. Taxa, in the order shown: Bacillus subtilis, Escherichia coli, Salmonella typhimurium, Staphylococcus aureus, Clostridium perfringens, Clostridium difficile, Trichomonas vaginalis, Haemophilus influenzae, Actinobacillus actinomycetemcomitans, Pasteurella multocida]
N-acetylneuraminate lyase (NanA) of the protozoan Trichomonas vaginalis is 92-95% similar to NanA of Pasteurellaceae bacteria.
de Koning et al. (2000) Mol. Biol. Evol. 17:1769-1773
N-acetylneuraminate lyase – role in pathogenicity?
Pasteurellaceae
• Mucosal pathogens of the respiratory tract
T. vaginalis
• Mucosal pathogen, causative agent of the STD trichomoniasis
N-acetylneuraminate lyase (sialic acid lyase, NanA)
Involved in sialic acid metabolism
Role in Bacteria: Proposed to parasitize the mucous membranes of animals for nutritional purposes
Role in Trichomonas: ?
Sialidase: hydrolysis of glycosidic linkages of terminal sialic residues in glycoproteins and glycolipids → free sialic acid
Transporter: uptake of free sialic acid
NanA: free sialic acid → N-acetyl-D-mannosamine + pyruvate
Another case: A Sensor Histidine Kinase for a Two-component Regulation System
Signal Transduction
Histidine kinases common in bacteria
Ser/Thr/Tyr kinases common in eukaryotes
However, a histidine kinase was recently identified in fungi, including pathogens Fusarium solani and Candida albicans
How did it get there?
[Figure: phylogenetic tree of histidine kinases, scale bar 0.1. Branch support values range from 39 to 100, most being 100.]
Fungi: Neurospora crassa NIK-1; Fusarium solani FIK1 and FIK2; Candida albicans CaNIK1
Bacteria: Streptomyces coelicolor SC4G10.06c and SC7C7.03; Escherichia coli RcsC, BarA and TorS; Erwinia carotovora RpfA / ExpS; Salmonella typhimurium BarA; Pseudomonas aeruginosa GacS and PhoQ; Pseudomonas fluorescens GacS / ApdA; Pseudomonas tolaasii RtpA / PheN; Pseudomonas syringae GacS / LemA; Pseudomonas viridiflava RepA; Azotobacter vinelandii GacS; Xanthomonas campestris RpfC; Vibrio cholerae TorS
Streptomyces Histidine Kinase. The Missing Link?
Legend: tree symbols mark sequences that are a virulence factor or a possible virulence factor (?)
Reduced virulence of a Pseudomonas aeruginosa transposon mutant disrupted in the histidine kinase gene gacS
Groups of 7-8 neutropenic mice challenged on two separate occasions with doses ranging from 8 to 8 × 10^6 bacteria
Wildtype LD50 = 10 ± 1 bacteria
gacS mutant LD50 = 7,500 ± 100 bacteria
750-fold increase
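The fold change quoted on this slide follows directly from the two LD50 values. A minimal check, using only the central values from the slide (the variable names are illustrative):

```python
# LD50 central values as given on the slide (number of bacteria).
wildtype_ld50 = 10
gacs_mutant_ld50 = 7500

# A higher LD50 means more bacteria are needed to kill half the mice,
# so the ratio measures how much virulence is reduced in the mutant.
fold_increase = gacs_mutant_ld50 / wildtype_ld50
print(f"{fold_increase:.0f}-fold increase in LD50")
```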
Recent report: P. aeruginosa eukaryote-type Phospholipase plays a role in infection
Wilderman et al. 2001. Mol Microbiol 39:291-304
• Phospholipase D enzymes (PLDs) are virtually ubiquitous in eukaryotes (relatively uncommon in prokaryotes)
• P. aeruginosa expresses PLD with significant (1e-38 BLAST Expect) similarity to eukaryotic PLDs
• Part of a mobile 7 kb genetic element
• Role in P. aeruginosa persistence in a chronic pulmonary infection model
Eukaryote → Bacteria Horizontal Transfer?
[Figure: phylogenetic tree, scale bar 0.1. Taxa, in the order shown: Rat, Human, Escherichia coli, Caenorhabditis elegans, Pig roundworm, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Bacillus subtilis, Streptococcus pyogenes, Aquifex aeolicus, Acinetobacter calcoaceticus, Haemophilus influenzae, Chlorobium vibrioforme]
E. coli Guanosine monophosphate reductase 81% similar to corresponding enzyme in humans and rats
Role in virulence not yet investigated.
Expanding the Cross-Domain Analysis
• Identify cross-domain lateral gene transfer between bacteria, archaea and eukaryotes
• No obvious correlation seen with protein functional classification
• Most cases: no obvious correlation seen between “organisms involved” in potential lateral transfer
Exceptions:
– Unicellular eukaryotes
– “Organelle-like” proteins in Rickettsia and Synechocystis
– “Plant-like(?)” genes in the obligate intracellular bacteria Chlamydia
“Plant-like” genes in Chlamydia
Enoyl-acyl carrier protein reductase (involved in lipid metabolism) of Chlamydia trachomatis is similar to those of Plants
Organelle relationship?
Notably more similar to plants than Synechocystis
[Figure: phylogenetic tree, scale bar 0.1. Taxa, in the order shown: Aquifex aeolicus, Haemophilus influenzae, Escherichia coli, Anabaena, Synechocystis, Chlamydia trachomatis, Petunia x hybrida, Nicotiana tabacum, Brassica napus, Arabidopsis thaliana, Oryza sativa. Branch support values range from 52 to 100.]
Eukaryote Top Hits in Bacterial Genomes (after excluding relatives of the same Family)
[Plot: x-axis, No. of Proteins in each Bacterial Genome (0 to 6000); y-axis, No. of Eukaryotic Top Hits (0 to 300). Labeled point: Synechocystis.]
Eukaryote Top Hits in Bacterial Genomes (excluding "Family" and Synechocystis )
[Plot: x-axis, No. of Proteins (0 to 6000); y-axis, No. of Eukaryotic Top Hits (0 to 80). Labeled points: Rickettsia and Chlamydia.]
Proteins Homologous to Eukaryote Proteins (according to BLAST Exp=1)
[Plot: x-axis, No. of Proteins (0 to 6000); y-axis, No. of Proteins with Eukaryotic Homology (0 to 1800).]
Horizontal Gene Transfer and Bacterial Pathogenicity
Transposons: ST enterotoxin genes in E. coli
Prophages: Shiga-like toxins in EHEC; Diphtheria toxin gene; Cholera toxin; Botulinum toxins
Plasmids: Shigella, Salmonella, Yersinia
Horizontal Gene Transfer and Bacterial Pathogenicity
Pathogenicity Islands:
Uropathogenic and Enteropathogenic E. coli
Salmonella typhimurium
Yersinia spp.
Helicobacter pylori
Vibrio cholerae
Pathogenicity Islands
Associated with
– Atypical %G+C
– tRNA sequences
– Transposases, Integrases and other mobility genes
– Flanking repeats
IslandPath: Identifying Pathogenicity Islands
Yellow circle = high %G+C
Pink circle = low %G+C
tRNA gene lies between the two dots
rRNA gene lies between the two dots
Both tRNA and rRNA lie between the two dots
Dot is named a transposase
Dot is named an integrase
Neisseria meningitidis serogroup B strain MC58 Mean %G+C: 51.37 STD DEV: 7.57
%G+C   SD  Location          Strand  Product
39.95  -1  1834676..1835113  +       virulence associated pro. homolog
51.96      1835110..1835211  -       cryptic plasmid A-related
39.13  -1  1835357..1835701  +       hypothetical
40.00  -1  1836009..1836203  +       hypothetical
42.86  -1  1836558..1836788  +       hypothetical
34.74  -2  1837037..1837249  +       hypothetical
43.96      1837432..1838796  +       conserved hypothetical
40.83  -1  1839157..1839663  +       conserved hypothetical
42.34  -1  1839826..1841079  +       conserved hypothetical
47.99      1841404..1843191  -       put. hemolysin activ. HecB
45.32      1843246..1843704  -       put. toxin-activating
37.14  -1  1843870..1844184  -       hypothetical
31.67  -2  1844196..1844495  -       hypothetical
37.57  -1  1844476..1845489  -       hypothetical
20.38  -2  1845558..1845974  -       hypothetical
45.69      1845978..1853522  -       hemagglutinin/hemolysin-rel.
51.35      1854101..1855066  +       transposase, IS30 family
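The SD column in this listing is consistent with a simple rule: a gene is flagged -1 when its %G+C lies at least one standard deviation below the genome mean, and -2 when it lies at least two below. The sketch below reproduces that rule. It is an assumption inferred from the listed values, not a description of IslandPath's actual code, and the function names and the positive flags are hypothetical.

```python
MEAN_GC = 51.37  # genome mean %G+C, from the slide
SD_GC = 7.57     # standard deviation of per-gene %G+C, from the slide


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def sd_flag(gc: float, mean: float = MEAN_GC, sd: float = SD_GC) -> int:
    """Return -2, -1, 0, 1 or 2: the number of whole standard deviations
    (capped at 2) by which a gene's %G+C departs from the genome mean."""
    z = (gc - mean) / sd
    if z <= -2:
        return -2
    if z <= -1:
        return -1
    if z >= 2:
        return 2
    if z >= 1:
        return 1
    return 0


# Check the rule against a few rows of the listing.
for gc in (39.95, 51.96, 34.74, 43.96, 20.38):
    print(gc, sd_flag(gc))
```

A run of consecutive genes with non-zero flags, next to a transposase or tRNA gene, is the kind of pattern the slide associates with pathogenicity islands.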
Variance of the Mean %G+C for all Genes in a Genome: Correlation with bacteria’s clonal nature
[Figure: genomes arranged along an axis from non-clonal to clonal]
Variance of the Mean %G+C for all Genes in a Genome
Is this a measure of clonality of a bacterium?
Are intracellular bacteria more clonal because they are ecologically isolated from other bacteria?
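The statistic discussed on these slides can be computed in a few lines. The sketch below uses made-up per-gene %G+C values for two toy genomes; they are not real data, and the names `uniform_genome` and `mosaic_genome` are hypothetical. Whether a larger variance actually reflects a less clonal organism is the open question the slide poses, so the code only shows how the number is obtained.

```python
from statistics import mean, pvariance

# Hypothetical per-gene %G+C values (toy data, not from any real genome).
uniform_genome = [50.0, 51.0, 49.0, 50.0]
mosaic_genome = [50.0, 35.0, 62.0, 41.0]

for name, values in (("uniform", uniform_genome), ("mosaic", mosaic_genome)):
    # pvariance treats the gene list as the whole population of genes.
    print(name, "mean %G+C:", mean(values), "variance:", pvariance(values))
```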
Pathogenomics Project: Future Developments
• Identify eukaryotic motifs and domains in pathogen genes
• Threader: Detect proteins with similar tertiary structure
• Identify more motifs associated with
  – Pathogenicity islands
  – Virulence determinants
• Functional tests for new predicted virulence factors
• Expand analysis to include viral genomes
• Fundamental research
• Interdisciplinary
• Lack of fit with alternative funding sources
Peter Wall Major Thematic Grant
Pathogenomics group
Ann M. Rose, Yossef Av-Gay, David L. Baillie, Fiona S. L. Brinkman, Robert Brunham, Rachel C. Fernandez, B. Brett Finlay, Hans Greberg, Robert E.W. Hancock, Steven J. Jones, Patrick Keeling, Audrey de Koning, Don G. Moerman, Sarah P. Otto, B. Francis Ouellette, Ivan Wan.
www.pathogenomics.bc.ca
Universal role of this Histidine Kinase in pathogenicity?
Pathogenic Fungi
• Senses change in osmolarity of the environment
• Role in hyphal formation → pathogenicity
Pseudomonas species plant pathogens
• Role in excretion of secondary metabolites that are virulence factors or antimicrobials
Virulence factor for human opportunistic pathogen Pseudomonas aeruginosa?
A Histidine Kinase in Streptomyces. The Missing Link?
[Figure: phylogenetic tree, scale bar 0.1. Taxa, in the order shown: Neurospora crassa NIK-1, Streptomyces coelicolor SC7C7, Fusarium solani FIK, Candida albicans CHIK1, Erwinia carotovora EXPS, Escherichia coli BARA, Pseudomonas aeruginosa LEMA, Pseudomonas syringae LEMA, Pseudomonas viridiflava LEMA, Pseudomonas tolaasii RTPA]
Eukaryotic top hits in bacterial genomes (after excluding "tertiary" relatives)
[Plot: x-axis, No. of Proteins (0 to 6000); y-axis, No. of Eukaryote Hits (0 to 350). Labeled points: Synechocystis, Rickettsia.]