Comparative bacterial genomics João Carlos Setubal VBI/Virginia Tech for EMBO course Florianopolis, July 2008


Comparative bacterial genomics

João Carlos Setubal
VBI/Virginia Tech
for EMBO course

Florianopolis, July 2008


Contents

• Tree of Life
• Basic notions of genomics
• Motivation for comparative genomics
• Whole replicon alignment: pairwise and multiple
• Gene-centric comparisons
• Orthology and Synteny
• Exercises


Ciccarelli et al, Science, 2006


Williams, Sobral, and Dickerman, J. Bacteriol., 2007

proteobacteria


Genomes

• The entire DNA complement of a single cell
• Abstraction
– a string s over the alphabet Σ = {A, C, G, T}
– Example

CTTCCAGTTCAACCGGCCGGTCGTCGCGGACGACGCGGCCGCCGGCGCCGCGATGCTGGCGGACGTACCGCACACCCGCCCCATCTCCATCTTCGCTTC
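The string abstraction can be tried directly in code. The sketch below is only an illustration: it takes the first 20 characters of the example above as a stand-in genome, and the helper name gc_content is an assumption, not something from the course material.

```python
from collections import Counter

ALPHABET = set("ACGT")

def gc_content(genome: str) -> float:
    """Fraction of positions in the string that are G or C."""
    counts = Counter(genome.upper())
    return (counts["G"] + counts["C"]) / len(genome)

# First 20 characters of the slide example, used as a toy genome s.
s = "CTTCCAGTTCAACCGGCCGG"

print(len(s))                # |s|, the length in base pairs
print(set(s) <= ALPHABET)    # True when s uses only A, C, G, T
print(gc_content(s))         # a simple whole-string statistic
```

Treating the genome as a plain string is what makes general text algorithms (exact matching, suffix trees) applicable to sequence comparison.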


Genome sizes

• Genomes are measured in
– kb (kilo base pairs), Mb (mega), or Gb (giga)
• Viruses: |s| = [5 – 200] kb
• Bacteria: |s| = [1 – 10] Mb
• Eukaryotes: |s| = [10 Mb – 100 Gb]
• Humans: 3 Gb
• Marbled lungfish: 130 Gb (T. Gregory, www.genomesize.com)


Famous bacteria

• Haemophilus influenzae (1.8 Mb)
– Human pathogen, first bacterial genome to be sequenced (1995)

• Escherichia coli (4.6 Mb)
– Human pathogen and model organism (1997)

• Agrobacterium tumefaciens (6 Mb)
– Plant pathogen and biotechnology tool (2001)


What is a gene

• A small substring of s that contains information

• Bacteria generally have 1 gene every 1 kb
– 5 Mb genome ≈ 5,000 genes


>A small section of a genome
AGCTCGCGCTCCGCATCCATCCAGTAGGGTTCGGTGTCGACGAGCGTGCC

GTCCATATCCCAGAAGACGGCGGCCGGCATCGCGTGCGGAGTCAGTTCGG

TCACGGCTGACAAGTCTATCCCGGCGGCCCCGGGCCTATTCTTGAGGGAC

GGCGTCCTGACCGGTCGCCGGATGAAAGGACCAGAACGCCCCGTGACTGA

CGCGAACAGCATCCTCGGAGGGCGCATCCTCGTGGTGGCCTTCGAAGGGT

GGAACGACGCTGGCGAGGCCGCCAGCGGGGCCGTCAAGACGCTCAAGGAC

CAGCTGGATGTCGTCCCGGTCGCCGAGGTCGATCCCGAGCTGTACTTCGA

CTTCCAGTTCAACCGGCCGGTCGTCGCGGACGACGACGGCCGCCGGCGCC

TCATCTGGCCGTCCGCGGAGATCCTGGGCCCAGCTCGCCCCGGCGACACC

GGCGATGCGCGCCTGGACGCCACCGGCGCCAACGCGGGCAATATCTTCCT

TCTCCTCGGCACCGAGCCGTCGCGCAGCTGGCGCAGCTTCACCGCGGAGA

TCATGGATGCGGCCCTGGCCTCCGACATCGGCGCCATCGTCTTCCTCGGT

GCGATGCTGGCGGACGTACCGCACACCCGCCCCATCTCCATCTTCGCTTC

GAGCGAGAACGCGGCCGTCCGTGCGGAGCTCGGCATCGAACGCTCTTCGT

ACGAGGGGCCGGTCGGTATCCTGAGCGCGCTCGCCGAAGGGGCGGAGGAC

GTGGGCATTCCGACCATCTCCATCTGGGCGTCGGTTCCGCACTATGTCCA

CAATGCGCCCAGCCCGAAGGCGGTGCTCGCACTGATCGACAAGCTCGAAG

AGCTGGTGAATGTCACCATCCCGCGTGGCTCGCTGGTGGAGGAGGCCACG

GCCTGGGAAGCCGGGATCGACGCGCTGGCTCTGGACGACGACGAGATGGC

TACGTACATCCAGCAGCTGGAGCAGGCACGCGACACCGTGGACTCCCCTG

AGGCCAGCGGCGAGGCGATCGCCCAGGAGTTCGAGCGCTACCTCCGCCGC

CGCGACGGCCGCGCCGGCGATGACCCCCGCCGTGGCTGACGTCACCCCCT

CTCTGCGTCCGCCGTCCTCTGTTCCCCCCGCTCGGCCTCCCCTGAGGCCG

AGGAGTCGCGCCCACATGCCGGAAACTCCTCCTTTCCTGACTTTCTGGAG

A bacterial gene


“Central Dogma” of molecular biology

• gene (DNA) → messenger (RNA) → protein (amino acids)
– first arrow: transcription; second arrow: translation

Proteins are 3D objects made out of a linear sequence of amino acids
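The two steps can be sketched on strings. This is a simplified illustration, assuming the coding strand is given 5' to 3', the reading frame starts at position 0, and the standard genetic code applies; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(dna: str) -> str:
    """Coding-strand DNA to messenger RNA: replace T with U."""
    return dna.upper().replace("T", "U")

def translate(dna: str) -> str:
    """Read codons from position 0 until a stop codon or the end of the string."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":      # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCCTAA"                 # toy gene: start codon, one codon, stop codon
print(transcribe(gene))
print(translate(gene))
```

The output of translate is only the linear amino acid sequence; the 3D structure mentioned above is not something this sketch attempts to model.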


A protein

www.berkeley.edu/.../ images/ras-rid-protein.gif

Molecular Plant-Microbe Interactions

Sugar cane pathogen

Ratoon stunting disease

Monteiro-Vitorello et al 2004


Comparative genomics

• There are currently more than 300 completely sequenced microbial genomes publicly available
• Many are of closely related species
• In a few years there will be thousands
• Why compare?
• How to do it?


Why comparative genomics?

• To understand the genomic basis of the present
– Differences in lifestyle
   • pathogen vs. nonpathogen
   • obligate vs. free-living
– Host specificity
   • animals vs. plants, plant X vs. plant Y, etc.
– In the case of pathogens: this understanding should help us in fighting disease
• To understand the past
– How organisms evolved to be what they are


Citrus canker: Xanthomonas axonopodis pathovar citri


Black rot: Xanthomonas campestris pathovar campestris


What is comparative genomics

• Assuming input is the sequence and its annotation
• There are many ways that genomes can be compared
– Different resolutions
• Whole genome
– Genome alignments
– Synteny (gene order conservation)
– Anomalous regions
• Gene-centric
– Gene families and unique genes
– Gene clustering by function
• Gene sequence variations
– Codon usage, SNPs, indels, pseudogenes


Resolution

• Low resolution
– Scope: entire genomes
– Example event: rearrangement
• High resolution
– Scope: nucleotide sequences
– Example event: single mutation


Genome-wide evolutionary events

• Replicon rearrangements
• Gene/region duplication
• Gene/region loss
• Chromosome ↔ plasmid DNA exchange
• Lateral transfer


Copyright ©2004 by the National Academy of Sciences

Boussau, Bastien et al. (2004) Proc. Natl. Acad. Sci. USA 101, 9722-9727

Fig. 4. Net gene loss or gain throughout the evolution of the α-proteobacterial species


Example of a “multipartite genome”

Agrobacterium tumefaciens C58


Replicon structure in all completely sequenced Rhizobiaceae plus M. loti

replicon  c58   s4    k84   Retli  Rleg  Sm    Ml
1         2.84  3.73  4.00  4.38   5.06  3.65  7.04
2         2.07  1.28  2.65  0.64   0.87  1.68  0.35
3         0.54  0.63  0.39  0.51   0.68  1.35  0.21
4         0.21  0.26  0.18  0.37   0.49  -     -
5         -     0.21  0.04  0.25   0.35  -     -
6         -     0.13  -     0.19   0.15  -     -
7         -     0.08  -     0.18   0.15  -     -

Rows are replicons, columns are genomes. Numbers are replicon size in Mbp; "-" marks no replicon.


Whole replicon alignments: the pairwise case

If the sequences were identical we would see

[Figure: dot plot of sequence A against sequence B; identical sequences give a single unbroken diagonal]

an inversion

[Figure: dot plots comparing a replicon with blocks A B C D against a copy in which the segment B C is inverted]

Such inversions seem to happen around the origin or terminus of replication
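A minimal sketch of an inversion on the A B C D example, assuming a replicon is modelled as a list of (block, strand) pairs. The representation and the function name invert are illustrative choices, not part of the slides.

```python
def invert(blocks, start, end):
    """Invert blocks[start:end]: reverse their order and flip each strand."""
    segment = [(name, -strand) for name, strand in reversed(blocks[start:end])]
    return blocks[:start] + segment + blocks[end:]

replicon = [("A", +1), ("B", +1), ("C", +1), ("D", +1)]
inverted = invert(replicon, 1, 3)   # invert the segment holding B and C

print(inverted)
# Inverting the same segment again restores the original order.
print(invert(inverted, 1, 3) == replicon)
```

In a dot plot the flipped strand is what makes the inverted segment run against the main diagonal instead of along it.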


Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1(6):RESEARCH0011


Replicon sequence comparisons

• Basic tool: MUMmer
– Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002 Jun 1;30(11):2478-83.

– Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12

• http://mummer.sourceforge.net


Xanthomonas axonopodis pv citri

E. coli K12 Promer alignment

Both are proteobacteria!
Red: direct; green: reverse


Basics of MUMmer

• It finds Maximal Unique Matches
• These are exact matches above a user-specified threshold that are unique
• Exact matches found are clustered and extended (using dynamic programming)
– Result is approximate matches
• Data structure for exact match finding: suffix tree
– Difficult to build but very fast
• Nucmer and promer
– Both very fast
– O(n + #MUMs), n = genome lengths
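To make the definition of a maximal unique match concrete, here is a brute-force sketch on toy strings. It is not how MUMmer works internally (MUMmer uses a suffix tree and runs far faster); the quadratic scan, the function names, and the min_len parameter are assumptions made for illustration, and reverse-strand matches are ignored.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count

def find_mums(a: str, b: str, min_len: int = 3):
    """Return (start_in_a, start_in_b, length) for each maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue                      # extendable to the left, not maximal
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1                        # extend to the right as far as possible
            match = a[i:i + k]
            if (k >= min_len
                    and count_overlapping(a, match) == 1
                    and count_overlapping(b, match) == 1):
                mums.append((i, j, k))        # exact, maximal, unique in both
    return mums

print(find_mums("GATTACA", "CATTAGA"))
```

In the real tool these exact anchors are then clustered and extended into approximate alignments, a step this sketch leaves out.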


sample nucmer output (coords file)

/home/setubal/agro/comp/mummer/../../rhizogenes/v1/ctgs.fasta /home/setubal/agro/comp/mummer/../../vitis/v3/all.fasta
NUCMER

[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [TAGS]
=====================================================================================
73024 73193 | 242351 242181 | 170 171 | 93.60 | Contig789 Contig608
220 6244 | 38759 32766 | 6025 5994 | 86.64 | Contig791 Contig604
2798 6297 | 174039 177532 | 3500 3494 | 83.31 | Contig791 Contig606
3828 6297 | 124183 126645 | 2470 2463 | 81.80 | Contig791 Contig606
4767 5392 | 551684 551059 | 626 626 | 82.11 | Contig791 Contig607
8214 8453 | 30747 30508 | 240 240 | 84.65 | Contig791 Contig604
15408 15987 | 181050 181624 | 580 575 | 86.23 | Contig791 Contig606
63864 74254 | 191954 181567 | 10391 10388 | 89.08 | Contig791 Contig604
77203 79534 | 178882 176555 | 2332 2328 | 84.35 | Contig791 Contig604
157451 158456 | 139804 140812 | 1006 1009 | 82.09 | Contig791 Contig606
157483 157800 | 58429 58110 | 318 320 | 89.13 | Contig791 Contig604
163575 166223 | 62781 60133 | 2649 2649 | 78.80 | Contig791 Contig605
166754 168442 | 49403 47716 | 1689 1688 | 85.79 | Contig791 Contig604
171247 173701 | 45005 42556 | 2455 2450 | 88.17 | Contig791 Contig604
171261 172115 | 157617 158476 | 855 860 | 86.30 | Contig791 Contig606
181828 184458 | 41748 39140 | 2631 2609 | 93.13 | Contig791 Contig604
184829 185852 | 38838 37821 | 1024 1018 | 91.61 | Contig791 Contig604
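Rows shaped like the ones above are easy to read programmatically. The parser below is a sketch that assumes the pipe-separated layout shown on this slide; the Hit type, its field names, and the reading of a descending second interval as a reverse-strand match are illustrative assumptions.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    s1: int
    e1: int
    s2: int
    e2: int
    len1: int
    len2: int
    identity: float
    tag1: str
    tag2: str

    @property
    def reverse(self) -> bool:
        """True when the second interval runs backwards (start > end)."""
        return self.s2 > self.e2

def parse_coords_line(line: str) -> Hit:
    """Parse one data row: five groups separated by '|'."""
    (s1, e1), (s2, e2), (len1, len2), (idy,), (tag1, tag2) = (
        part.split() for part in line.split("|")
    )
    return Hit(int(s1), int(e1), int(s2), int(e2),
               int(len1), int(len2), float(idy), tag1, tag2)

row = "220 6244 | 38759 32766 | 6025 5994 | 86.64 | Contig791 Contig604"
hit = parse_coords_line(row)
print(hit)
print(hit.reverse)
```

Filtering such records by length or identity is a common next step before plotting or counting conserved regions.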


A suffix tree for BANANAS

www.somethinkodd.com/.../2006/01/suffixtree.png
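The idea behind the figure can be sketched with an uncompressed suffix trie, which stores the same suffixes as a suffix tree but without merging single-child paths. This is a simplification chosen for brevity: it uses quadratic space, whereas a true suffix tree can be built in linear time and space. The function names are hypothetical.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root

def contains(trie: dict, pattern: str) -> bool:
    """A pattern is a substring of the text iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("BANANAS")
print(contains(trie, "NAN"))
print(contains(trie, "NAB"))
```

Each query costs time proportional to the pattern length, independent of the text length, which is why this family of structures suits exact-match finding in genomes.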


Proteome alignment done with LCS (top: Xcc; bottom: Xac)

Blue: BBHs that are in the LCS; dark blue: BBHs not in the LCS; red: Xac specifics; yellow: Xcc specifics
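The slide does not expand LCS; the sketch below assumes it means longest common subsequence, applied to the gene order of two genomes after matched genes (for example BBH pairs) have been given a shared identifier. The gene names are made up.

```python
def longest_common_subsequence(x, y):
    """Classic dynamic-programming LCS over two sequences of gene identifiers."""
    n, m = len(x), len(y)
    # dp[i][j] = LCS length of x[i:] and y[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal subsequence.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            result.append(x[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result

order_top = ["g1", "g2", "g3", "g4", "g5"]
order_bottom = ["g1", "g3", "g2", "g4", "g5"]   # g2 and g3 swapped
collinear = longest_common_subsequence(order_top, order_bottom)
print(collinear)
```

Genes in the returned list keep the same relative order in both genomes; matched genes left out of it correspond to the "BBHs not in the LCS" category in the legend.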


Whole replicon multiple alignment

• The program MAUVE
• Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004 Jul;14(7):1394-403.


RSA 493

RSA 331

Dugway

Chromosome alignment (MAUVE)


Genome Alignments MAUVE


How MAUVE works

• Seed-and-extend hashing
• Seeds/anchors: Maximal Multiple Unique Matches of minimum length k
• Result: Locally collinear blocks (LCBs)
• O(G²n + Gn log Gn), G = # genomes, n = average genome length


Alignment algorithm

1. Find Multi-MUMs
2. Use the multi-MUMs to calculate a phylogenetic guide tree
3. Find LCBs (subset of multi-MUMs; filter out spurious matches; requires minimum weight)
4. Recursive anchoring to identify additional anchors (extension of LCBs)
5. Progressive alignment (CLUSTALW) using guide tree


Gene-centric comparisons

• Homologs: genes that have the same ancestor; in general retain the same function

• Orthologs: homologs from different species (arise from speciation)

• Paralogs: homologs from the same species (arise from duplication)
– Duplication before speciation (ancient duplication)
   • Out-paralogs; may not have the same function
– Duplication after speciation (recent duplication)
   • In-paralogs; likely to have the same function


Orthologs

[Figure: orthologs arising from a speciation event]

Out-paralogs


In-paralogs


Published April 16, 2008

10 genomes

Orthology + Phylogeny


AG: ancestral (bellii [2], canadensis)
TG: typhus (prowazekii, typhi)
TRG: transitional (akari, felis)
SFG: spotted fever (rickettsii, conorii, sibirica)


How to find orthologs

• Desired features of ortholog clustering
– Ability to distinguish between in- and out-paralogs
   • In-paralogs should be clustered with their orthologs
– Ability to cluster genes that have the same domain architecture, rather than simply sharing just one domain
• Methods
– Phylogenetic trees
– BLAST
– MCL
– orthoMCL
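One common BLAST-based heuristic is the bidirectional best hit (BBH) criterion, the same notion used in the proteome alignment legend earlier. The sketch below assumes similarity scores have already been computed and are supplied as dictionaries keyed by (query, subject); the gene names and scores are invented for illustration.

```python
def best_hits(scores):
    """Map each query gene to its single top-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical scores between genome A and genome B proteins.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 120, ("b2", "a2"): 90, ("b2", "a1"): 50}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

BBH pairs are a quick approximation only; by construction they keep one partner per gene, so they cannot place in-paralogs together with their orthologs, which is one of the desired features listed above.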


OrthoMCL

• Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003 Sep;13(9):2178-89

• Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002 Apr 1;30(7):1575-84


OrthoMCL

1. BLAST all-against-all
2. Weighting scheme
3. MCL algorithm
• Nota bene: orthoMCL is not perfect!
– Two or more families may be wrongly joined
– One family may be wrongly split


Li Li et al. Genome Res. 2003; 13: 2178-2189

orthoMCL pipeline


Li Li et al. Genome Res. 2003; 13: 2178-2189

OrthoMCL weighting scheme for similarity graph


(Tribe)MCL

• Enright, Van Dongen, Ouzounis [2002]
• Adaptation of MCL clustering algorithm of Van Dongen
• Markov cluster
• Simulates random walks in the graph
• Expands and inflates certain matrices until equilibrium is reached
• Expansion: matrix squaring
• Inflation: raise each entry to a power, then rescale columns so the matrix is stochastic again
• Has been reasonably validated
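The expansion and inflation steps can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: a dense matrix, a fixed number of iterations instead of a convergence test, self-loops added to every node, and an inflation exponent of 2. It is not the TribeMCL or orthoMCL implementation.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def expand(m):
    """Expansion: square the matrix (two steps of the random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation: entrywise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])

def mcl(adjacency, inflation=2.0, iterations=50):
    n = len(adjacency)
    # Add self-loops so that no column is empty.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row with nonzero entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Two triangles (0,1,2) and (3,4,5) joined by a single edge between 2 and 3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(sorted(sorted(c) for c in mcl(graph)))
```

Raising the inflation exponent tends to give more, smaller clusters, which is the main tuning knob when the method is applied to protein similarity graphs.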


Gene Set Computations

• Given a set of genomes, represented by their ‘proteomes’ or sets of protein sequences

• Given homologous relationships (as given for example by orthoMCL)
– Which genes are shared by genomes X and Y?
– Which genes are unique to genome Z?
– Venn or extended Venn diagrams
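Once every gene has been assigned to a family, these questions reduce to set operations. The sketch below assumes each genome is represented by the set of family identifiers it contains; the genome names X, Y, Z follow the questions above, and the family identifiers are invented.

```python
# Hypothetical family membership per genome.
families = {
    "X": {"fam1", "fam2", "fam3"},
    "Y": {"fam2", "fam3", "fam4"},
    "Z": {"fam3", "fam5"},
}

def shared(families, *genomes):
    """Families present in every one of the named genomes."""
    return set.intersection(*(families[g] for g in genomes))

def unique_to(families, genome):
    """Families present in one genome and in none of the others."""
    others = set().union(*(f for g, f in families.items() if g != genome))
    return families[genome] - others

print(shared(families, "X", "Y"))        # shared by X and Y
print(unique_to(families, "Z"))          # unique to Z
print(shared(families, "X", "Y", "Z"))   # present in all three
```

Each region of a Venn diagram corresponds to one such combination of intersections and differences.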


3-way genome comparison

[Figure: three-way comparison of genomes A, B, and C]

Brucella gene set computations


Joining synteny and homology


OAK: ortholog alignment for prokaryotes

[Pipeline diagram. Boxes: Genome 1, Genome 2, ..., Genome n; Ortholog set builder (orthoMCL); Script 1; Script 2; HTML tables; graph; report; annotators]

R/G  S4              C58                K84             R. etli          R. leguminosarum     S. meliloti    M. loti MAFF  M. loti BNC
1    2nd chromosome  linear chromosome  2nd chromosome  -                -                    -              -             -
2    plasmid 630kb   AT plasmid         plasmid 390kb   plasmid F 640kb  plasmid pRL12 870kb  plasmid pSymA  plasmid 1     plasmid 1
3    plasmid 259kb   Ti plasmid         plasmid 179kb   plasmid E 510kb  plasmid pRL11 680kb  plasmid pSymB  plasmid 2     plasmid 2
4    plasmid 210kb   -                  plasmid 44kb    plasmid D 370kb  plasmid pRL10 490kb  -              -             plasmid 3
5    plasmid 130kb   -                  -               plasmid C 250kb  plasmid pRL9 350kb   -              -             -
6    plasmid 79kb    -                  -               plasmid A 190kb  plasmid pRL7 150kb   -              -             -
7    -               -                  -               plasmid B 180kb  plasmid pRL8 150kb   -              -             -

Replicon color key for HTML tables


What do the tables show

• conserved blocks (aka “microsyntenic regions”), and how these blocks appear in different replicons across the genomes compared

• some of these blocks are not operons (would need to show strand)

• possible block losses


Polymorphism detection

• indels, SNPs
• pseudogenes


[Figure 4, panels I and II]

Pseudogenes

• Nonfunctional protein coding genes
• Mutations introduce “sequence problems” (frameshifts, stop in frame, absence of stop)
• Natural mutation or sequencing error?
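The three sequence problems named above can be screened for with simple string checks. This is a rough sketch, assuming the coding sequence is given on the coding strand starting at its first codon and that the standard stop codons apply; the function name and the wording of the messages are illustrative. It cannot tell a natural mutation from a sequencing error.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def sequence_problems(cds: str):
    """List the sequence problems found in a putative coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("stop in frame")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("absence of stop")
    return problems

print(sequence_problems("ATGGCCTAA"))       # intact toy gene
print(sequence_problems("ATGTAAGCCTAA"))    # premature stop
print(sequence_problems("ATGGCCGCC"))       # no terminal stop
```

A length check alone only flags frameshifts that change the total length; detecting a compensated frameshift would need comparison against an intact homolog.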


Pseudogene cases


Why study pseudogenes?

• “Normal” bacterial genomes have 1–5% pseudogenes [Liu et al]
• Pseudogenes can give interesting clues to evolutionary pathways


Why study pseudogenes? Cont’d

• High fractions of pseudogenes suggest a “genome degradation” process

• May be cause or effect of niche restriction
• Examples
– Mycobacterium leprae: 36% (~1,100 genes)
– Leifsonia xyli subsp. xyli: 13% (~300 genes)
• Pseudogenes do not show up in protein-level BLAST searches
– Ortholog computations will in general not include them!


Brucella Pseudogene Analysis

Pseudogene Identification by Sequence Similarity: Study of 8 Brucella Genomes

BLASTN: Annotated Pseudogenes vs. Genome Sequences

Hit categories: Previously Known Pseudogene; Known Gene (Homologous to Pseudogene); Newly Identified Pseudogene

[Bar chart: Identification of New Pseudogenes by Homology. Genomes: Bab9941, BabS19, Bcan23365, Bmel16M, Bab2308, Bovi25840, Bsui1330, Bsui23445. Series: PG Count: Initial; Tot. Alignments; Known Genes; PG Count: Final. Vertical axis: 0 to 600.]

Total alignments: 4120 (0.98)
Gene hits: 2627 (0.62)
Pseudogenes: 1493 (0.35)


Genomics is just the beginning

Genomics/proteomics

Interactions between molecules

Cell processes

complexity

Whole organisms


populations


21st century Biology: integration


Acks

• Nalvo Almeida
• Chris Lasher
• Brett Tyler
• Rebecca Wattam
