
Post on 20-Dec-2015

Genomics: 1

Genomics-sequencing of microbial genomes

This lecture illustrates the strategies used in microbial genome sequencing projects, compares genome content and organisation amongst microbes, and shows how to derive information on gene function across a genome.

Objectives for students:
• Expected to describe strategies involved in microbial genome sequencing and functional genomics
• Provide examples of information that can be derived from genomics

Genomics: 2

Microbial Genome Sequencing
• Genome Sequencing Projects
  – strategy & methods
  – annotation
• Comparative genomics
  – organisation
  – gene content
• Functional genomics
  – transcriptome
  – proteome
  – genome-wide mutation
• Concentrate on strategy & ideas

Genomics: 3

Bacterial genome projects
• Many completed:
  – Haemophilus influenzae
  – Escherichia coli
  – Bacillus subtilis
  – Mycoplasma genitalium
  – Helicobacter pylori (x2)
  – Campylobacter jejuni
  – Treponema pallidum
  – Neisseria meningitidis
  – Neisseria gonorrhoeae
  – Vibrio cholerae
  – E. coli O157
• Good link to projects:
  – http://www.tigr.org/
  – http://www.ncbi.nlm.nih.gov/
  – http://www.sanger.ac.uk/
  – http://www.genomesonline.org/

Genome sequencing progress
• Complete:
  – Archaeal: 70 (2007 & 2008: 49 & 55)
  – Bacterial: 945 (554 & 728)
  – (Eukaryal: 121) (76 & 97)
• Ongoing:
  – Prokaryotic: 3498, Archaeal: 111
  – (Eukaryotic: 1223)
• Metagenome projects: 200

Genomics: 4

www.genomesonline.org

Genomics: 5

Microbial eukaryote projects
• Complete
  – Yeast - Saccharomyces cerevisiae
  – Plasmodium falciparum
  – Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus
  – Trypanosoma cruzi & brucei
  – Leishmania
  – Entamoeba histolytica
  – Giardia lamblia
  – Candida albicans & glabrata
  – Paramecium
• Underway
  – Pneumocystis carinii
  – Plasmodium vivax
  – some complete chromosomes finished
  – Other species and isolates from completed list

Genomics: 6

Why bother? To sequence or not to sequence (considerations in the pre-genome era)
• piecemeal collection of sequenced genes
  – slow
  – costly
  – ever complete?
• genome project
  – rational approach
  – efficient and rapid
  – quality assurance
  – address novel questions
• problems/issues
  – ownership
  – strain choice
  – cost
  – approach
  – data release
  – some now less relevant
• Post genomic era
  – Comparative genomics
  – Functional genomics

Genomics: 7

Genome sequencing strategy
• Strategy choice
• large collaborative cosmid/BAC-based projects
  – now better suited for larger genomes
  – slow
• small insert shotgun approach
  – centralised
  – rapid and efficient
  – choice for bacteria
• Strain choice
  – fresh isolate vs lab strain
  – clinical vs environmental
  – subsequent genetic analysis

Genomics: 8

Yeast genome sequence strategy
• Yeast chromosomes (16) individually sequenced
• several approaches used
• Make genome library in cosmids
• order cosmid library
  – which cosmid overlaps with which
  – link cosmid to genome map
  – produce tiled set of cosmids
  – only sequence minimum number
• Use chromosome specific probe to identify chr-specific cosmids
• sequence cosmid inserts by subcloning
• Solve problems by direct PCR sequencing, walking and other libraries (lambda)
• Telomeres
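The clone-ordering idea above (work out which cosmids overlap, then sequence only a minimal tiled set) can be sketched as a greedy interval cover. This is an illustrative sketch only, not the method used by the yeast project: the clone names, map positions and the function `minimal_tiling_path` are hypothetical, and it assumes the map position of every clone is already known.

```python
def minimal_tiling_path(clones, start, end):
    """Greedy interval cover: pick the fewest clones that span [start, end].

    clones: dict of clone name -> (left, right) map position (hypothetical units).
    Returns clone names in map order; raises ValueError if the region has a gap.
    """
    path = []
    covered_to = start
    while covered_to < end:
        # Of the clones that reach back to the covered point, keep the one extending furthest.
        candidates = [
            (right, name)
            for name, (left, right) in clones.items()
            if left <= covered_to < right
        ]
        if not candidates:
            raise ValueError(f"no clone covers position {covered_to}")
        right, name = max(candidates)
        path.append(name)
        covered_to = right
    return path


# Hypothetical cosmid positions (kb) across a 100 kb region.
cosmids = {
    "c1": (0, 40),
    "c2": (10, 45),
    "c3": (35, 75),
    "c4": (40, 60),
    "c5": (70, 100),
}
tiling = minimal_tiling_path(cosmids, 0, 100)
print(tiling)
```

Here c2 and c4 are redundant, so only three of the five clones would need to be sequenced.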

Genomics: 9

Tiled set

Genomics: 10

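Ordering clones (figure): clones c1 to c5, labelled A B, C D, E F, G H and I J respectively, with a grid of markers A to J (rows) against clones c1 to c5 (columns); grid entries not shown.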

Genomics: 11

PH011 (figure): map with a scale marked from 80 to 200, showing overlapping clones 70512, 70449, 70893, 70515, 70124, 70266, 7202, 70265, 70871 and 70463.

Genomics: 12

Whole genome/chromosome shot-gun strategy (WGS)

• Rapid
• Generation of small insert genomic library
• Library is not initially ordered
• DNA sequence ends of inserts
• Depends on powerful computing to assemble sequence reads

Genomics: 13

Main steps in generating a complete genome sequence

Step                  Minimum time period (weeks)
Isolation             2
Construction          4-6
Shotgun sequencing    2-4
Finishing             12
Annotation            12

Genomics: 14

Figure (shotgun library construction): bacterial chromosome; random shearing; size selection; plasmid vector; library of clones; individual clones; sequence end of each clone.

Genomics: 15

Assembly

Sequencing individual clones

genome sequence with gaps

Genomics: 16

Automated sequencers: ABI 3700
• Made by Applied Biosystems
• Most widely used automated sequencers:
  – 96 capillaries
  – robot loading from 384-well plates
• Two to three hours per run
• 600–700 bases per run

Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar

Genomics: 17

Automated sequencers: MegaBACE
• Made by Amersham
• 96 capillaries
• Robotic loading from 384-well plate
• Two to four hours per run
• Can read up to 800 bases

Source: GE Healthcare Life Science, Uppsala, Sweden

Genomics: 18

Automatic gel reading
• Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA
• Bottom image: computer image of sequence read by automated sequencer

Genomics: 19

Industrialization of sequencing
• Most genome sequencing projects divide tasks among different teams
  – Genome libraries
  – Production sequencing
  – Finishing
• Sequencing machines run 24/7
• Many tasks performed by robots

The Broad Institute of MIT and Harvard, www.genome.gov

Genomics: 20

The future is here? 454 sequencing

Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Margulies et al., 437: 376), copyright (2005)

Genomics: 21

454 sequencing: the system

Workflow: DNA library preparation (4.5 hours); emPCR (8 hours); sequencing (7.5 hours)

• Well diameter: average 44 μm
• 400,000 reads obtained in parallel
• A single cloned amplified sstDNA bead is deposited per well
• 4 bases (TACG) cycled 100 times
• Chemiluminescent signal generation
• Signal processing to determine base sequence and quality score

Source: 454 Sequencing © Roche Diagnostics
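The last step above (signal processing to determine base sequence) can be illustrated with a toy flowgram decoder. This is a simplified sketch, not the instrument's actual software: it assumes each light signal has already been normalised so that 1.0 corresponds to one incorporated base, and the function name `decode_flowgram` and the signal values are hypothetical. Only the TACG flow order comes from the slide.

```python
FLOW_ORDER = "TACG"  # order in which the four bases are flowed (from the slide)


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Convert per-flow light signals into a base sequence.

    Assumes signals are normalised so that 1.0 means one incorporated base;
    rounding each signal gives the homopolymer length for that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        run_length = int(round(signal))
        bases.append(base * run_length)
    return "".join(bases)


# Hypothetical normalised signals for two cycles of T, A, C, G.
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 1.08, 0.00, 3.05]
read = decode_flowgram(signals)
print(read)
```

Because a run of identical bases is read from a single signal, long homopolymers are the natural weak point of this kind of decoding.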

Genomics: 22

WGS: Just how much effort?
• individual sequencing reads accumulate
  – each read about 500bp
  – computing used to assemble reads
  – contiguous sequences called contigs
• Aim for 8-10-fold read coverage of genome for accuracy
• example:
  – H.influenzae
    • 19,687 templates
    • 24,304 reads assembled
    • 11,631,485 bp
    • 9
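A back-of-envelope check of the H. influenzae figures above. The genome size of about 1.83 Mbp is an assumption taken from the 1.830 Mbp figure quoted later in the lecture, and the helper `reads_needed` is a hypothetical name used only for this sketch.

```python
total_bases = 11_631_485   # bp in assembled reads (from the example above)
reads = 24_304             # reads assembled (from the example above)
genome_size = 1_830_000    # bp; assumed from the 1.830 Mbp quoted later in the lecture

mean_read_length = total_bases / reads
coverage = total_bases / genome_size


def reads_needed(target_coverage, genome_size, read_length):
    """Reads required so that reads * read_length / genome_size reaches the target."""
    return round(target_coverage * genome_size / read_length)


print(f"mean read length: {mean_read_length:.0f} bp")
print(f"coverage achieved: {coverage:.1f}-fold")
for target in (8, 10):
    n = reads_needed(target, genome_size, 500)
    print(f"{target}-fold coverage with 500 bp reads: {n} reads")
```

Under that assumption the example works out at a little over six-fold coverage from reads averaging just under 500 bp.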

Genomics: 23

Sequencing a genome

contiguous sequence:
vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic

fragments of sequence (unordered reads):
luatedgeneticsrel, tatedgene, ourcesforteach, cisahubofevaluatedgenc, hprofessionalsandgeneralpub, cisahubofevaluatedgen, esforteachershealt, chershealthprofession, atedgene

overlaps (reads in order, each overlapping the next):
vgecisahubof
bofevaluatedgenetics
icsrelatedresourcesforteachershealth
lthprofessionalsandgeneralp
generalpublic
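The overlap idea on this slide can be tried directly on its five overlapping fragments. The following is a minimal greedy overlap-merge sketch, not a real assembler: the function names and the minimum overlap of 3 letters are choices made for this example, and production assemblers handle sequencing errors and repeats that this ignores.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    contigs = [f for f in contigs if not any(f != g and f in g for g in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# The five overlapping fragments shown on the slide.
fragments = [
    "vgecisahubof",
    "bofevaluatedgenetics",
    "icsrelatedresourcesforteachershealth",
    "lthprofessionalsandgeneralp",
    "generalpublic",
]
contigs = greedy_assemble(fragments)
print(contigs)
```

With these fragments the merges collapse into a single contig; removing any one fragment would leave two contigs separated by a gap.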

Genomics: 24

Gaps

Figure labels: genome; library clone; sequence read; contig; sequence gap; physical gap

Genomics: 25

Bridging Gaps

• rise in contig number as the number of reads increases
• steady fall as accumulating sequence bridges gaps between contigs
• levels off as new reads more likely in known contig than gap
• start finishing

Figure: number of contigs (falling towards 1) plotted against number of reads, with phases labelled rapid gap bridging, difficult gap bridging and finishing.
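The rise-then-fall shape described above matches the idealised Lander-Waterman model, in which the expected number of contigs for N random reads is N·exp(−c) with coverage c = N·L/G. The sketch below is only that idealisation (random reads, no repeats, no minimum detectable overlap); the genome size is an assumed H. influenzae-like value and the function name is hypothetical.

```python
import math


def expected_contigs(n_reads, read_len, genome_size):
    """Lander-Waterman estimate of contig count, ignoring minimum detectable overlap."""
    coverage = n_reads * read_len / genome_size
    return n_reads * math.exp(-coverage)


GENOME = 1_830_000  # bp; assumed, roughly H. influenzae-sized
READ = 500          # bp; "each read about 500bp"

for n in (1_000, 3_660, 10_000, 20_000, 30_000):
    print(n, round(expected_contigs(n, READ, GENOME), 1))
```

In this model the contig count peaks at about one-fold coverage (N = G/L) and then decays; the long tail is why the last gaps are closed by directed finishing rather than by more random reads.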

Genomics: 26

Finishing
• Why are gaps present?
• Gap bridging
  – sequence gaps
    • choose appropriate clone and walk
  – physical gaps
    • alternative libraries (which?)
    • PCR across gap
• Mistakes/poor sequence
  – areas where read coverage is less than 8-10-fold
  – repeated sequences, e.g. rRNA
• closure and completion

Genomics: 27

Finished Yet?atgaatccaagccaaatacttgaaaatttaaaaaaagaattaagtgaaaacgaatacgaaaactatttatcaaatttaaaattcaacgaaaaacaaagcaaagcagatcttttagtttttaatgctccaaatgaactcatggctaaattcatacaaacaaaatacggcaaaaaaatcgcgcatttttatgaagtgcaaagcggaaataaagccatcataaatatacaagcacaaagtgctaaacaaagcaacaaaagcacaaaaatcgacatagctcatataaaagcacaaagcacgattttaaatccttcttttacttttgaaagttttgttgtaggggattctaacaaatacgcttatggagcatgtaaagccatagcacataaagacaaacttggaaaactttataatccaatctttgtttatggacctacaggacttggaaaaacacatttacttcaagcagttggaaatgcaagcttagaaatgggaaaaaaagttatttacgctaccagtgaaaatttcatcaacgattttacttcaaatttaaaaaatggttctttagataaatttcatgaaaagtatagaaactgcgatgttttacttatagatgatgtacagtttttaggaaaaaccgataaaattcaagaagaatttttctttatatttaatgaaatcaaaaataacgatggacaaatcatcatgacttcagacaatccacccaacatgctaaaaggtataaccgaacgcttaaaaagtcgttttgcacatgggatcatagctgatataactccacctcaactagatacaaaaatagccatcataagaaaaaaatgtgaatttaacgatatcaatctttctaatgatattataaactatatcgctacttctttaggggataatataagagaaatcgaaggtatcatcataagtttaaatgcttatgcaaccatactaggacaagaaatcacactcgaacttgccaaaagtgtgatgaaagatcatatcaaagaaaagaaagaaaatatcactatagatgacattttatctttggtatgtaaagaatttaacatcaaaccaagcgatgtgaaatccaataaaaaaactcaaaatatagtcacagcaagacgcattgtgatttacctagctagggcacttacggctttgactatgccacaacttgcgaattattttgaaatgaaagatcatacagctatttcacataatgttaaaaaaatcacagaaatgatagaaaatgatgcttctttaaaagcaaaaatcgaagaacttaaaaacaaaattcttgttaaaagtcaaagttaagtgaaaggatgtgaaaaataaattctagagtgtgaaaaaaagaaattaagcaaagtatgataaaatacaaatttgattattttgctttgaaaaatttcacaatttcaacaagcttattattacaacgaatttaaaattaaaataaaccaaggagaaaaaatgaagttaagtatcaataaaaatactttagaatctgcagtgattttatgtaatgcttatgtagaaaaaaaagactcaagcaccattacttctcatcttttttttcatgctgatgaagataaacttcttattaaagctagtgattatgaaataggtatcaactataaaataaaaaaaatccgcgtagaatcaagtggttttgctactgcaaatgcaaaaagtattgcagatgttattaaaagcttaaacaatgaagaagttgttttagaaaccattgataattttttatttgtaagacaaaaaagtacaaaatacaaacttcctatgtttaatcatgaagattttccaaattttccaaatacagaaggaaaaaaccaatttgacattgattcaagtgatttaagccgttctcttaaaaagatattaccaagtattgatacaaataacccaaaatactccttaaatggtgcatttttagatataaaaacagataaaattaacttcg
taggaactgatacaaaacgccttgcaatctatactttagaaaaagcaaataatcaagaatttagttttagtatccctaaaaaagctattatggaaatgcaaaaacttttctatgaaaaaatagaaattttttatgatcaaaatatgcttattgccaaaaatgaaaattttgaattctttacaaaacttatcaatgataaatttccagattatgaaaaagttataccaaaaactttcaaacaagaactcagtttttcaactgaagattttatagatagtcttaaaaaaatcagcgttgtaactgaaaaaatgagacttcattttaacaaagataaaatcatctttgaaggtataagtttagacaatatggaagcaaaaacagaacttgaaattcaaacaggagtaagtgaagaatttaatcttactataaaaatcaaacatttacttgatttcttaacttctatagaagaagaaaaattcactttaagtgtaaatgaacctaattcagcatttatagtcaaatcccaaggactatcaatgattatcatgcctatgattttgtaataaaacaagtaaaagataaaggaaaaatatgcaagaaaattacggtgcgagtaatattaaagtcctaaaaggcttagaagctgttagaaaacgcccaggtatgtatataggagatacaaacataggcggacttcatcatatgatttatgaagttgtggataattctatcgatgaagctatggcaggacattgcgatactatagatgtagaaatcactactgaaggaagctgtatagttagtgataatggtcgtggtattcctgttgatatgcacccaactgaaaatatgccaactttaactgttgttttaactgtcctacatgcagggggaaaattcgataaagatacttataaagtttcaggcggtttgcacggtgttggggtttcggttgtaaatgcactctctaaaaaacttgtagctacagttgaaagaaatggagaaatttatcgtcaagaattttcagaaggtaaagttatcagtgaatttggtgtgataggaaaaagtaaaaaaacaggaacaactatagaattttggcctgatgatcaaatttttgaagtgactgaatttgattatgaaattttggctaaaagatttcgtgaacttgcatacttaaatccaaaaatcactataaattttaaagataaccgcgtaggcaaacatgaaagttttcactttgaaggtggaatttctcagtttgttacagacttaaataaaaaagaagctttaactaaagcaattttctttagtgtagatgaagaagatgtgaatgttgaagtagctttgctttacaatgatacttatagtgaaaatttactctcttttgtaaataatattaaaaccccagatggtggaacacacgaagctggttttagaatgggtttaactcgtgtgataagtaactatatagaagcaaatgcaagtgctagagaaaaggataataaaatcacgggtgatgatgtgcgtgaaggtttgatcgctattgtgagtgtaaaggtacctgaaccacaatttgaaggacaaaccaaaggaaaacttggttcaacttatgtgcgtcctatagtttcaaaagcaagttttgagtatttgactaaatattttgaagaaaatcctatcgaagctaaagctataatgaataaagctttaatggcagctagaggaagagaagcagcgaaaaaagctagagaattaacgcgcaaaaaagaaagtttaagcgtaggaactttaccagggaaattagctgattgtcaaagtaaagatccaagtgaaagtgaaatttatcttgtggaaggggattctgcaggaggttctgcaaaacaaggtagagaaagatctttccaagctatactgcctttgcgtggtaaaattttaaatgttgaaaaagcaagactagataaaattttaaaatctgagcaaattcaaaatatgattaccgcttt
tggctgtggtataggtgaagattttgatctttcaaaacttagatatcataaaatcatcatcatgacagatgcggatgttgatggatctcatatacaaaccttgcttttaactttcttcttccgttttatgaatgaacttgtggcaaatggacatatttatctagcacaaccacctttatatctttataaaaaagctaaaaagcaaatttatttaaaagatgaaaaagctttgagcgaatacctgatagaaacgggaatagaaggtttaaactatgaaggtataggaatgaatgatttaaaagattatttaaaaatcgttgcagcttatcgtgcgattttaaaagatcttgaaaagcgttttaatgtgatttctgtgatacgctatatgatagaaaattcaaatttagttaaaggaaataatgaagaattatttagtgtaatcaaacaatttttagaaacacaaggacacaatatcttaaatcattatatcaacgaaaatgaaattcgagctttcgttcaaactcaaaatggcttagaagaacttgtgatcaatgaagaacttttcactcatccactatatgaagaagcgagttatatttttgataagattaaagatagaagcttggaatttgataaagatattttagaagttcttgaagatgttgaaaccaatgctaaaaaaggtgctactatacaacgctataaaggtttaggggaaatgaatcctgagcaactttgggaaaccacaatggatccaagcgtaagaagacttttaaaaatcactattgaagatgcacaaagtgcaaatgatacctttaatctctttatgggtgatgaggttgaaccaagacgcgattatatccaagcgcacgctaaagatgtaaagcatttggatgtgtaaaaatttatcattgaagaaatcatttcttcaatgagttttgttttgtaagagtatagctagaggaattcttcttcttgtatcgtatttttctccataatatttttcaagataatttaaaattttttcttcatcttcaggttctatttcccaaagtccttcactatcttgcatccatcttatagctgctaaccaagcttttctacttgcatgcatattggtaatgagattggatccatgacaagctaaacaatttgcttccactaaaggtgaatcaggatcgataatcaatcctgtatcagggttaatttcaagattttgagcccaacttgcacttaaaaacaatgctaagatcaatataatttttttcatacttaaactccataaacattaactctatggcatgcattattgatatatcctcctggattccactgtgctaaaaccataggttgactgttaccttgactatcgatagctcttgcccaaatttcataatatccttttgttggtattgatatttgagcactccatttttgccatgctaatctatttaatggtttttctacctttgc ………………….

Genomics: 28

Sequencing a genome

annotation:
VGEC is a hub of evaluated genetics related resources for teachers, health professionals and general public.

contiguous sequence:
vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic

fragments of sequence and overlaps: as on the earlier "Sequencing a genome" slide (Genomics: 23)

Genomics: 29

Genome Annotation
• Find ORFs
  – look for ATG-Stop (+alternatives)
  – over certain size
  – overlaps
  – computer based (“Glimmer” & “Orpheus”) and trained eye
• ORF function
  – Search databases with predicted translated sequences – BLASTX
  – Consider level of similarity and context
  – Domain comparisons
    • Pfam/Prosite
• Other features
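The "ATG-Stop over a certain size" rule above can be sketched as a naive six-frame scan. This is a teaching sketch only and is not how Glimmer or Orpheus work (they use statistical models): it accepts ATG as the only start codon, ignoring the alternatives the slide mentions, and the function names and the size cut-off are choices made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return (strand, start, end, length) for ATG..stop ORFs of at least min_len nt.

    Coordinates are 0-based, end-exclusive, on the strand that was scanned.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((strand, start, i + 3, length))
                    start = None  # look for the next ATG in this frame
    return orfs


# Tiny hypothetical sequence: ATG, four AAA codons, then TAA.
orfs = find_orfs("CCATGAAAAAAAAAAAATAAGG", min_len=18)
print(orfs)
```

Each predicted ORF would then be translated and searched against the databases, as in the "ORF function" bullet.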

Genomics: 30

www.yeastgenome.org

Genomics: 31

http://mips.gsf.de/genre/proj/yeast/index.jsp

http://www.yeastgenome.org/MAP/GENOMICVIEW/GenomicView.shtml

Genomics: 32

Artemis: sequence viewer and annotation tool from the Sanger Centre (http://www.sanger.ac.uk/Software/Artemis/)

Genomics: 33

Genomics: 34

Genomics: 35

http://xbase.bham.ac.uk/

xBASE is a database for comparative genome analysis of all bacterial genome sequences

Chaudhuri RR, Pallen MJ. xBASE, a collection of online databases for bacterial comparative genomics. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D335-7.

Genomics: 36

Figure: A conceptual diagram of the flux and information in a network-based genome-sequencing project. Labels: coordinator; DNA; shotgun templates; shotgun sequences; finishing instructions; finishing sequences; annotation tasks; annotations; bioinformatics lab; many nodes marked S; working draft sequence; finished sequence; finished annotated sequence.

Genomics: 37

Post Genome Sequence
• Comparative genomics
  – comparing genome organisation and content
  – genome size
  – genome repeats/Tn/phages
  – gene content
  – minimal gene content
• Functional genomics – ascribing gene function across a genome
  – gene function – knowns
  – phenotype prediction
  – gene function – unknowns
  – investigating function
• Bacteria-Yeast

Genomics: 38

Bacteria: Does size matter?
• Link genome size to adaptive capability
  – biosynthetic capability
    • synthesis of nutrients
  – Stress resistance
    • resist environmental insults
  – structural complexity
    • surface structures, sporogenesis
  – Regulation – sensing signals and transcriptional responses
    • detect change or requirement and respond appropriately
    • transcriptional regulation

Genomics: 39

Not just Size but how you use it…
• Small genomes
  – Mycoplasma genitalium
    • 580,070 bp
    • smallest genome for self-replicating organism
    • free living but only just; infects host cells (guess which!)
    • few biosynthesis and regulatory systems
    • has replication & transcription & translation, metabolism etc functions
  – Borrelia burgdorferi
    • 910,725 bp
    • Lyme disease
    • few cellular biosynthetic systems
  – Mycoplasma pneumoniae (0.8 Mbp); Chlamydia trachomatis (1.0 Mbp)

Genomics: 40

bigger genomes
• Haemophilus influenzae
  – 1.830 Mbp
  – colonises human respiratory tract
  – limited environment
• Helicobacter pylori
  – 1.667 Mbp
  – colonises human stomach
  – limited environment
• Campylobacter jejuni
  – 1.641 Mbp
  – colonises intestine
  – limited environment

Genomics: 41

and bigger…
• Escherichia coli (K-12)
  – 4.639 Mbp
• Bacillus subtilis
  – 4.214 Mbp
  – soil/plant organism
  – secondary metabolites
• Pseudomonas aeruginosa
  – incomplete (5.9 Mbp)
• Yersinia pestis (4.4 Mbp)
• Clostridium spp (4-5 Mbp)
• Mycobacterium tuberculosis
  – 4.411 Mbp
  – slow growing (double in 24h)
  – large proportion of genome on lipid metabolism
• Streptomyces coelicolor (~8 Mbp)
  – secondary metabolites – antibiotics!

Genomics: 42

Organisation
• Linear chromosomes
  – Borrelia burgdorferi
  – Streptomyces coelicolor
• Multiple chromosomes
  – Vibrio cholerae
• Plasmids
  – Borrelia burgdorferi
  – 17 linear & circular plasmids
  – 50% genome size
  – plasmid replication, “decaying genes”, ?Ag variation
• Transposons, IS elements, phages
  – found in most genomes
  – Campylobacter has none
• Repeats

Genomics: 43

Replication
• Origin (oriC) and termination (terC) of replication
  – OriC often near dnaA gene (replication initiation protein)
  – In Borrelia burgdorferi (linear) oriC (& dnaA) in centre
• strand bias
  – which strand is each gene on?
  – transcription in same direction as replication – more efficient
  – variation in level of strand bias
    • Mt (M. tuberculosis) 55% vs Bs (B. subtilis) 75%
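Strand bias as described above can be expressed as the fraction of genes transcribed in the same direction as the replication fork that passes over them. The sketch below assumes a circular chromosome replicated bidirectionally from ori to ter; the toy gene list, coordinates and function name are hypothetical and are not taken from any real genome.

```python
def leading_strand_fraction(genes, ori, ter, genome_len):
    """Fraction of genes co-oriented with replication on a circular chromosome.

    genes: list of (position, strand) with strand '+' or '-'.
    Assumes forks travel from ori to ter in both directions.
    """
    def on_clockwise_replichore(pos):
        # The clockwise replichore runs from ori towards increasing coordinates up to ter.
        return (pos - ori) % genome_len < (ter - ori) % genome_len

    co_oriented = 0
    for pos, strand in genes:
        clockwise = on_clockwise_replichore(pos)
        # '+' genes follow the clockwise fork; '-' genes follow the anticlockwise fork.
        if (clockwise and strand == "+") or (not clockwise and strand == "-"):
            co_oriented += 1
    return co_oriented / len(genes)


# Hypothetical toy chromosome: 1000 bp, ori at 0, ter at 500.
genes = [(100, "+"), (200, "+"), (300, "-"), (400, "+"),
         (600, "-"), (700, "-"), (800, "+"), (900, "-")]
fraction = leading_strand_fraction(genes, ori=0, ter=500, genome_len=1000)
print(f"{fraction:.0%} of genes co-oriented with replication")
```

A value near 50% would indicate no bias; the percentages quoted on the slide are measures of this kind.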

Genomics: 44

Gene Content
• Annotation
  – sequence similarity
    • gene families
    • regulators, transport, biosynthesis
  – domain matches
    • trans-membrane domains, DNA binding
• Paralogues and Orthologues
  – Paralogues:
    • Members of same family (homologous) in same genome.
    • Likely to have different exact function
  – Orthologues:
    • homologues (same family) in different genomes
    • May have identical function

Genomics: 45

Vibrio cholerae as predicted by genome…

Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Heidelberg et al., 406, 477-483), copyright (2000)

Genomics: 46

Gene content (cont.)
• ORFans
  – significant proportion of genome contains ORFs of unknown function
  – some may be orthologues of unknowns in other organisms
  – some unique to organism
    • important for biology of organism
  – examples:
    • H.influenzae: 42%
    • H.pylori: 33%
    • E.coli: 38%
    • M.tuberculosis: 60% to 16%
  – number decreasing
• Gene size – most about 1kb

Genomics: 47

Genomic rearrangements

• Example comparison

• Comparison of: S. enterica serovar Typhi CT18 with S. enterica serovar Typhi Ty2

• inversion that spans terminus

http://www.sanger.ac.uk/resources/software/act/

Genomics: 48

Variation by gain and/or loss
• Core regions
  – shared by closely related species
• Additional “flexible” gene pool
  – variable regions
  – acquired from mobile genetic elements
• First described as pathogenicity islands
  – in non-pathogens too
  – wider role
• Genomic Islands
  – pathogens
  – commensals
  – symbionts
  – environmental
• Gain of GI sometimes associated with gene loss
  – reduction in obligate intracellular pathogens
• Genome organisation as well as genome content correlates with microbial lifestyle

Figure labels: common bacterial ancestor; gene acquisition by HGT (GEI, plasmid); genome reduction by deletion events; mutations, rearrangements; intracellular bacterium, obligate intracellular pathogen, endosymbiont; extracellular bacterium, facultative pathogen, symbiont; all lifestyles.

Genomics: 49

Other tRNA-associated elements: tRNAPProL

Black arrows = Sal + Ec; white arrows = Sal or Ec; grey = strain/serovar specific

GC is for S. Typhi

Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5

Genomics: 50

Other tRNA-associated elements: tRNAArgU

Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5

Genomics: 51

The supragenome
• The distributed-genome hypothesis (DGH)
• Bacteria have a (supra) genome much larger than the genome of any single bacterium.
• Core and non-core gene sets
  – Example: Hiller et al. sequenced 8 strains of Streptococcus pneumoniae + 9 already available
  – Core set of genes in all strains
  – 20-30% genes non-core (not present in all strains)
• Genetic recombination generates diversity across strains.
• Also for Haemophilus influenzae (Hogg et al.)
  – ~1400 in core set and ~1300 non-core in subset of strains

• Hiller et al. Journal of Bacteriology, November 2007, p. 8186-8195, Vol. 189, No. 22
• Hogg et al. Genome Biology 2007, 8:R103 (doi:10.1186/gb-2007-8-6-r103).
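The core and non-core split above is, at its simplest, set arithmetic on the gene content of each strain. The sketch below uses made-up strain and gene names purely for illustration; real studies such as those cited first have to cluster genes into orthologous groups, which is not shown here.

```python
# Hypothetical gene content per strain (gene names are placeholders).
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g4", "g6"},
}

supragenome = set.union(*strains.values())    # every gene seen in any strain
core = set.intersection(*strains.values())    # genes present in all strains
non_core = supragenome - core                 # genes missing from at least one strain

print("supragenome size:", len(supragenome))
print("core genes:", sorted(core))
print("non-core genes:", sorted(non_core))
for name, genes in strains.items():
    share = len(genes - core) / len(genes)
    print(f"{name}: {share:.0%} of genes are non-core")
```

The supragenome (6 genes in this toy case) is larger than any single strain's genome (4 genes), which is the point of the distributed-genome hypothesis.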

Genomics: 52

Yeast
• 16 chromosomes totalling 12.068 Mbp
• 5885 ORFs (6275 predicted, but 390 unlikely to be translated)
• Few introns ~4%
• Avg gene size 2kb (worm ~6kb and human >30kb)
• GC varies along chr length
  – low GC at telomere & centromere
  – GC rich regions correlate with higher recombination
• Tn and remnants in genome
  – evidence of hotspots
• 50% ORFs known function
  – some exact role unclear
• http://genome-www.stanford.edu/Saccharomyces/
• http://mips.gsf.de/projects/fungi

Genomics: 53

Functional genomics

• Functional genomics – ascribing gene function across a genome
• function and inter-relationships
• strategy
  – [bioinformatic analysis - gene identification]
  – Transcriptome - expression pattern
  – Proteome - expression pattern
  – Mutantome - mutant phenotype
  – Interactome - protein-protein interactions

GENOME

TRANSCRIPTOME: RNA copies of the active protein-coding genes

PROTEOME: The cell’s repertoire

Genomics: 54

Arrays: micro and chip
• Microarrays
  – Glass slides with <10000 individual samples applied in known position
  – Use of robotics
  – Samples can be PCR products or oligos
  – example: oligo/PCR product complementary to each ORF
• Chip arrays
  – silicon based
  – >10,000 sequences
  – http://www.affymetrix.com/index.html
• Redundancy
• fluorescent labels

Genomics: 55

Figure (chip arrays): one cell = one specific sequence; individual sequences & bound sample; laser.

Genomics: 56

Transcriptome
• Genome-wide determination of expression level of each ORF
• when expressed relates to role
• also assess mutants
• compare expression of each ORF in different conditions
• Genome wide expression maps
• global patterns of expression


Genomics: 57

When expressed?

[Figure: Bacillus genieae, 2 x ORF. The array carries one spot per ORF (orf 1, orf 2; sequences AGGCAT and AATGAA). Grow in conditions when only orf 2 is expressed, isolate mRNAs and make a cDNA copy; the cDNA (TTACTT) pairs with the AATGAA spot.]
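The hybridisation step in this two-ORF example can be sketched as a base-pairing check. This is a toy illustration only: the assignment of AGGCAT to orf 1 and AATGAA to orf 2 is inferred from the slide (only orf 2 is expressed and the cDNA TTACTT pairs with AATGAA), and the function and variable names are made up for the example.

```python
# Watson-Crick pairing rules
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pairs_with(spot: str, probe: str) -> bool:
    """True if each probe base pairs with the spot base at the same position
    (the two strands are written aligned, as they are drawn on the slide)."""
    return len(spot) == len(probe) and all(
        COMPLEMENT[s] == p for s, p in zip(spot, probe)
    )

# Array spots; which sequence belongs to which ORF is an assumption
array_spots = {"orf 1": "AGGCAT", "orf 2": "AATGAA"}

# Labelled cDNA made from cells grown where only orf 2 is expressed
labelled_cdna = ["TTACTT"]

# A spot gives signal if any labelled cDNA pairs with it
signal = {
    name: any(pairs_with(seq, probe) for probe in labelled_cdna)
    for name, seq in array_spots.items()
}
print(signal)  # {'orf 1': False, 'orf 2': True}
```

Only the orf 2 spot lights up, which is how the array reports when each ORF is expressed.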


Genomics: 58

Grow under different conditions

Extract mRNA

Probe array with labelled copy of mRNA


Genomics: 59

Differentially labelled probes

Red channel

Green channel

Combined
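A common way to read a two-colour array is to take the log2 ratio of the red and green intensities for each spot. The sketch below is a minimal illustration under stated assumptions: the intensities are invented, the spot names are placeholders, and the two-fold threshold is a conventional but arbitrary choice, not something taken from the slides.

```python
import math

def log2_ratio(red: float, green: float, floor: float = 1.0) -> float:
    """log2(red/green); intensities are floored to avoid dividing by zero."""
    return math.log2(max(red, floor) / max(green, floor))

def combined_colour(ratio: float, threshold: float = 1.0) -> str:
    """Colour of the spot in the combined image (threshold 1.0 = two-fold)."""
    if ratio >= threshold:
        return "red"      # more signal from the red-labelled sample
    if ratio <= -threshold:
        return "green"    # more signal from the green-labelled sample
    return "yellow"       # similar signal in both channels

# Hypothetical (red, green) intensities for three spots
spots = {
    "orf 1": (8000.0, 1000.0),
    "orf 2": (500.0, 4000.0),
    "orf 3": (2000.0, 2000.0),
}

calls = {name: combined_colour(log2_ratio(r, g)) for name, (r, g) in spots.items()}
print(calls)  # {'orf 1': 'red', 'orf 2': 'green', 'orf 3': 'yellow'}
```

Which growth condition is labelled with which dye is an experimental choice, so "red" and "green" here only mean higher in one sample or the other.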


Genomics: 60

http://www.bio.davidson.edu/courses/genomics/chip/chip.html


Genomics: 61

Expression profiling C. jejuni in low iron

Cj1659 (P19)

Cj0177

Cj0037c


Genomics: 62

Proteome
• Genome-wide determination of protein expression
• Gives information on stimulons
• protein expression linked to function
• assess mutants (regulatory mutants affect several proteins)

• Grow bacteria under defined conditions
• Extract proteins
• 2D-gel electrophoresis
• Protein spot identification
• Mass Spectrometry
• peptide size predictions from Genome data
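The last step above, predicting peptide sizes from genome data, can be sketched as an in silico digest. This is a rough illustration, not the method used in the lecture: the slides say only "protease", so trypsin (cleaving after K or R, but not before P) is an assumption, the two protein sequences are made up, and the "observed" masses are simulated from one of them.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide

def tryptic_peptides(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_counts(observed, proteome, tolerance=0.01):
    """Per ORF, count observed masses that match a predicted peptide mass."""
    counts = {}
    for orf, protein in proteome.items():
        predicted = [peptide_mass(p) for p in tryptic_peptides(protein)]
        counts[orf] = sum(
            any(abs(obs - m) <= tolerance for m in predicted) for obs in observed
        )
    return counts

# Hypothetical predicted proteins from the genome sequence
proteome = {"orf 1": "MKRPLKAG", "orf 2": "MSTKGGR"}
# Simulated spectrum: masses of two peptides from the orf 1 protein
observed = [peptide_mass("RPLK"), peptide_mass("AG")]

print(tryptic_peptides("MKRPLKAG"))      # ['MK', 'RPLK', 'AG']
print(match_counts(observed, proteome))  # {'orf 1': 2, 'orf 2': 0}
```

The ORF whose predicted peptides explain the most observed masses is the best candidate for the protein in the gel spot.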


Genomics: 63

Defining the Campylobacter proteome –chasing spots

Which protein? Which conditions?

Which other proteins are co-expressed?


Genomics: 64

C. jejuni iron example


Genomics: 65

[Figure: 2D gel (axes: pI and Mol mass) with marked spots (*); a spot is digested with protease and analysed by Mass Spec.]

http://depts.washington.edu/yeastrc/pages/ms.html


Genomics: 66

Mass Mutagenesis: mutantome
• Mutate every ORF in genome
– organism specific technology
• High throughput analysis of phenotype
– need to analyse many 1000s of mutants under many conditions
• Signature-tagged technology
– enables analysis of mutant pools
– requires array technology for genome-wide projects
• Association of ORF with mutant phenotypes
• Regulators might be pleiotropic


Genomics: 67

Arrays: micro and chip
• Microarrays
– Glass slides with <10,000 individual samples applied in known position
– Use of robotics
– Samples can be PCR products or oligos
– example: oligos complementary to each unique Tag
– example: oligo/PCR product complementary to each ORF
• Chip arrays
– silicon based
– >10,000 sequences
– http://www.affymetrix.com/index.html
• Redundancy
• fluorescent labels


Genomics: 68

One cell = one specific sequence

[Figure: Chip arrays. Labels: Laser; individual sequences (AC, GT, AT) & bound sample (TG, CA, TA).]


Genomics: 69

Signature Tagged

• Tags are short unique DNA sequences

• Tag linked to mutation

• Each individual mutant has unique tag

• Each mutant ORF has unique Tag

ORF X

Chromosomal Mutants


Genomics: 70

ORF X

Chromosomal Mutants Mutant Pools

compare

condition ‘normal’

functional role ?


Genomics: 71

Bar coding genes

“normal”, un-mutated Campylobacter

mutant 1 (mutant-specific DNA sequence)

mutant 2

mutant 3

mutant 4

and so on… to mutant 1654.


Genomics: 72

Which bar codes are missing?

• Which bar coded mutants are missing?

• Gene involved in process

[Figure: mutant pool and post-treatment mutant pool; copies of the barcodes present in each pool are compared on a bar code array (positions 1 to 100), with each position scored + or -. Image: www.freedigitalphotos.net/]
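The comparison of the two pools can be sketched as a simple set difference over barcode counts. Everything in the example is hypothetical: the counts, the tag-to-ORF table and the function name are invented for illustration, and real signature-tagged screens use array signal or sequencing reads rather than exact copy numbers.

```python
def lost_barcodes(input_counts, output_counts, min_copies=1):
    """Barcodes detected in the starting pool but not after treatment."""
    return sorted(
        tag
        for tag, copies in input_counts.items()
        if copies >= min_copies and output_counts.get(tag, 0) < min_copies
    )

# Hypothetical barcode copy numbers before and after treatment
input_pool = {1: 12, 2: 9, 3: 15, 4: 11}
post_treatment_pool = {1: 10, 2: 0, 3: 14}   # barcode 4 not detected at all

# Hypothetical table linking each barcode to the ORF it disrupts
tag_to_orf = {1: "orf A", 2: "orf B", 3: "orf C", 4: "orf D"}

lost = lost_barcodes(input_pool, post_treatment_pool)
candidates = [tag_to_orf[tag] for tag in lost]

print(lost)        # [2, 4]
print(candidates)  # ['orf B', 'orf D']
```

Mutants whose barcodes disappear after treatment point to genes involved in surviving that treatment.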


Genomics: 73

Reprinted by permission from Macmillan Publishers Ltd: [NATURE REVIEWS GENETICS] (Mazurkiewicz et al. 7 929-939), copyright (2006)


Interactome: Yeast 2 hybrid

Genomics: 74

http://en.wikipedia.org/wiki/Two-hybrid_screening

Which proteins can interact?

•Expression library of binding-domain::protein 1 (bait)

•Expression library of activation-domain::protein 2 (prey)

•Test combinations of all genome orfs

•Which combinations turn on the reporter gene?
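Testing all combinations of ORFs as bait and prey amounts to walking an N x N matrix and recording where the reporter turns on. The sketch below illustrates only the bookkeeping: the ORF names and the single interacting pair are invented, and `reporter_on` stands in for the actual yeast assay.

```python
from itertools import product

# Hypothetical ORFs cloned into both the bait and the prey library
orfs = ["orf A", "orf B", "orf C"]

# Hypothetical ground truth: only orf A and orf B proteins interact
interacting_pairs = {frozenset(("orf A", "orf B"))}

def reporter_on(bait: str, prey: str) -> bool:
    """Stand-in for the assay: reporter is on if bait and prey interact."""
    return frozenset((bait, prey)) in interacting_pairs

# Every bait is tested against every prey, in both orientations
tests = list(product(orfs, repeat=2))
hits = [(bait, prey) for bait, prey in tests if reporter_on(bait, prey)]

print(len(tests))  # 9 combinations for 3 ORFs
print(hits)        # [('orf A', 'orf B'), ('orf B', 'orf A')]
```

The number of tests grows with the square of the ORF count, which is why genome-wide two-hybrid screens rely on pooled libraries and automation.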


Protein-protein interaction networks

Genomics: 75

Parrish et al. 2007. A proteome-wide protein interaction map for Campylobacter jejuni. Genome Biol 8:R130.


Genomics: 76

Genomotyping or Genomic indexing

• Array of all known genes in microbe
• Genes 1, 2, 3 & 14 form the minimal gene set
• Hybridise array with labelled chromosomal DNA

[Figure: four copies of a 15-spot array (genes 1 to 15), with panels for Isolate 1, Isolate 2 and Isolate 3; highlighted spots include genes 1, 2, 3 and 14.]
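The minimal gene set is the set of genes that hybridise in every isolate, which is a set intersection. In the sketch below the per-isolate gene lists are invented (the individual spot patterns are not recoverable from the slide text) and were chosen so that the shared set matches the genes 1, 2, 3 and 14 named above.

```python
# Hypothetical hybridisation results: genes detected in each isolate
isolates = {
    "Isolate 1": {1, 2, 3, 5, 6, 14},
    "Isolate 2": {1, 2, 3, 8, 9, 11, 14},
    "Isolate 3": {1, 2, 3, 4, 14, 15},
}

# Genes present in every isolate form the minimal (core) gene set
minimal_gene_set = set.intersection(*isolates.values())

# Genes outside the core distinguish one isolate from another
variable_genes = {
    name: sorted(genes - minimal_gene_set) for name, genes in isolates.items()
}

print(sorted(minimal_gene_set))     # [1, 2, 3, 14]
print(variable_genes["Isolate 2"])  # [8, 9, 11]
```

The variable genes are what make the hybridisation pattern usable as a genomic fingerprint for each isolate.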