Exploration of neighbourhoods for inductive reasoning (adanchin/lectures/bologna_05_summary.pdf ·...)

© Genetics of Bacterial Genomes Institut Pasteur / HKU-Pasteur Research Centre http://www.pasteur.fr/recherche/unites/REG/ [email protected] Exploration of neighbourhoods for inductive reasoning




Authors

In silico: Eduardo Rocha, Ivan Moszer, Claudine Médigue

In vivo / in vitro: Agnieszka Sekowska, Anne Marie Gilles, Octavian Barzu

Collective: Stanislas Noria


A virtuous circle

Context, Data, Hypotheses => today's presentation

A Chinese view for…

vision_en.html


Anglo-American: NATO; bottom-up; data-driven

Greco-Latin: OTAN; top-down; hypothesis-driven

Chinese: « Bombardment of the Chinese Embassy in Belgrade »; sideways; context-driven

causeries/Western.html


Data vs Hypotheses

What biological question are you asking?


Variation / Selection / Amplification

Evolution creates

Function recruits

Structure coding process

Sequence

Empedocles / Maupertuis / Malthus / Darwin

vision_en.html


What is Life?

Physics: matter, energy, time

Biology: Physics + information, coding, control...


What functions for Life? An extension of Cuvier's view…

• Physical stability ([cyto]skeleton)
• Reproduction
• Respiration
• Locomotion
• Perception
• Transport (import / export)
• Circulation (internal fluxes)
• Digestion and recycling
• Assimilation
• Accommodation (regulation)
• Maintenance (repair)
• Etc…

causeries/causeries.html


Three processes are needed for Life:

Information transfer (Living Turing Machines)

Driving force for a coupling between the genome structure and the structure of the cell:

Metabolism (Internal organisation)
Compartmentalization (General structure)

Because of these two processes, note that “concentration” usually does not have a meaning inside a cell

What is Life?


Inductive strategy: exploring “neighborhoods”

Genes do not operate in isolation. Proteins are part of complexes, as are parts in an engine.

It is important to understand their relationships, like those of the planks that make up a boat.

The Delphic Boat, Harvard University Press, February 2003


Induction: exploring neighborhoods

To make discoveries we explore the general « neighborhoods » of genes of interest: proximity in the chromosome, in evolution, in the literature, in biochemical complexes, in metabolism, etc.

Comparative genomics is essential, hence the use of « subtractive » genomics (comparison of pairs or larger sets of similar genomes)
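As a minimal sketch of the « subtractive » comparison, assuming each genome has already been reduced to a set of gene-family identifiers (the identifiers and the helper names `subtract` and `specific_to` are hypothetical, not taken from the lecture), the comparison boils down to set operations:

```python
def subtract(genome_a: set[str], genome_b: set[str]) -> dict[str, set[str]]:
    """Pairwise subtractive comparison of two gene-family sets."""
    return {
        "shared": genome_a & genome_b,   # families present in both genomes
        "only_a": genome_a - genome_b,   # candidates specific to genome A
        "only_b": genome_b - genome_a,   # candidates specific to genome B
    }


def specific_to(target: set[str], others: list[set[str]]) -> set[str]:
    """Families of `target` absent from every genome in `others`."""
    return target - set().union(*others)


# Hypothetical families: a pathogen and a closely related commensal strain.
pathogen = {"famA", "famB", "famC", "island1", "island2"}
commensal = {"famA", "famB", "famC", "famD"}
print(sorted(subtract(pathogen, commensal)["only_a"]))  # ['island1', 'island2']
```

The hard part in practice is defining the gene families themselves (for example by sequence similarity); that step is not shown here.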


From sequence to function

Combining genome sequence data and in silico prediction (bioinformatics), we test our hypotheses using large-scale genomics techniques (transcriptome and proteome analysis) as well as other types of neighborhoods, such as common electric charge or codon usage bias.

↓ Note that regulation evolves much faster than all other processes


Genome organisation: is the gene order random in the chromosomes?

At first sight, despite different DNA management processes, not much is conserved, and horizontally transferred genes are distributed throughout genomes

However, groups of genes, such as operons or pathogenicity islands, tend to cluster in specific places, and they code for proteins with common functions

A question: where are repeats located?


Caveat: repeats are meaningful

Remember also:

This clock has a minute minute hand


[Figure: dot plots of repeats (genome position vs. genome position) for Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Mycoplasma pneumoniae, Helicobacter pylori, Bacillus subtilis and Methanobacterium thermoautotrophicum. Panel annotations (NR/NT): 397/283, 170/54, 204/111, 139/82, 260/187, 552/250, 183/75, 280/137.]

DNA management: repeats in genomes

E. Rocha, A. Viari & A. Danchin. Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes. Mol. Biol. Evol. (1999) 16: 1219-1230
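The repeat plots on this slide mark pairs of positions that carry the same sequence. A simplified sketch, assuming exact matches of a fixed length `k` only (the cited analysis of long repeats uses more elaborate methods that also handle approximate repeats; `exact_repeats` and `inverted_repeats` are illustrative names), indexes every k-mer once and reports the position pairs such a plot would display:

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the sorted list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def exact_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, where the same k-mer occurs on the same strand."""
    pairs = []
    for occ in kmer_index(seq, k).values():
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                pairs.append((occ[a], occ[b]))
    return sorted(pairs)


def inverted_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, where the k-mer at j is the reverse complement of the one at i."""
    index = kmer_index(seq, k)
    pairs = []
    for i in range(len(seq) - k + 1):
        for j in index.get(revcomp(seq[i:i + k]), []):
            if j > i:
                pairs.append((i, j))
    return sorted(pairs)


print(exact_repeats("ACGTTTACGT", 4))       # [(0, 6)]
print(inverted_repeats("AAACCCGGGTTT", 3))  # [(0, 9), (1, 8), (2, 7), (3, 6), (4, 5)]
```

The pair lists grow quadratically with the copy number of a k-mer, so on whole genomes one would choose `k` large enough that chance matches are rare.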


Genome organisation

The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, which differentiates the leading strand from the lagging strand


E. Rocha


[Figure: circular chromosome maps (Ori, Ter; 0°, 90°, 180°, 270°) of CDS density and leading-strand CDS density. Escherichia coli: 55% leading; Treponema pallidum: 65% leading; Bacillus subtilis: 75% leading; Thermoanaerobacter tengcongensis: 87% leading. Updated from Kunst et al., Nature, 1997.]

Different “Operating Systems”?


To lead or to lag...

Is it possible to see whether genes are randomly distributed between the leading and lagging strands of the chromosome?

Choosing an origin of replication arbitrarily, together with a property of the strand (base composition, codon composition, codon usage, amino acid composition of the encoded protein…), one can use discriminant analysis to see whether the hypothesis holds.
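A minimal sketch of this test, assuming a circular chromosome with the terminus opposite the origin and using synthetic composition features instead of real genome data (`on_leading_strand`, `fisher_lda_accuracy` and `scan_origins` are hypothetical helpers; the cited work may differ in its features and validation). Genes are labelled leading or lagging for each candidate origin, and a two-class Fisher linear discriminant scores how well the features recover those labels:

```python
import numpy as np


def on_leading_strand(start, strand, ori, ter, genome_len):
    """True if a gene lies on the leading strand of a circular chromosome.

    Assumes bidirectional replication from `ori` to `ter`, the right
    replichore running from ori to ter in increasing coordinates. Genes
    on '+' in the right replichore, or on '-' in the left one, are
    transcribed in the direction of fork movement (leading strand).
    """
    in_right = (start - ori) % genome_len < (ter - ori) % genome_len
    return (strand == "+") == in_right


def fisher_lda_accuracy(X, y):
    """Training accuracy of a two-class Fisher linear discriminant."""
    if y.all() or not y.any():
        return float("nan")  # a single class: nothing to discriminate
    X0, X1 = X[~y], X[y]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter, with a tiny ridge for numerical safety
    sw = (np.atleast_2d(np.cov(X0, rowvar=False)) * (len(X0) - 1)
          + np.atleast_2d(np.cov(X1, rowvar=False)) * (len(X1) - 1))
    w = np.linalg.solve(sw + 1e-9 * np.eye(X.shape[1]), m1 - m0)
    threshold = w @ (m0 + m1) / 2
    return float(((X @ w > threshold) == y).mean())


def scan_origins(starts, strands, X, genome_len, candidates):
    """Discriminant accuracy for each candidate origin (Ter assumed opposite)."""
    scores = {}
    for ori in candidates:
        ter = (ori + genome_len // 2) % genome_len
        y = np.array([on_leading_strand(s, st, ori, ter, genome_len)
                      for s, st in zip(starts, strands)])
        scores[ori] = fisher_lda_accuracy(X, y)
    return scores


# Synthetic genome: two "composition" features shift with the true strand
# (true origin at 0). Real analyses would use measured sequence features.
rng = np.random.default_rng(0)
GENOME_LEN = 1_000_000
starts = rng.integers(0, GENOME_LEN, size=400)
strands = rng.choice(["+", "-"], size=400)
truth = np.array([on_leading_strand(s, st, 0, GENOME_LEN // 2, GENOME_LEN)
                  for s, st in zip(starts, strands)])
X = rng.normal(size=(400, 2)) + np.where(truth[:, None], [2.0, 1.0], [-2.0, -1.0])
scores = scan_origins(starts, strands, X, GENOME_LEN,
                      [0, GENOME_LEN // 4, GENOME_LEN // 2])
print(scores)
```

Swapping origin and terminus exchanges the two labels, so the score is the same for an origin and its antipode, while a candidate a quarter-turn away scrambles the labels and the accuracy drops towards 0.5. Training accuracy is optimistic; on real data a cross-validated estimate would be needed.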


To lag or to lead...

E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16


To lag or to lead, that is the question

[Figure: discriminant analysis accuracy (leading vs. lagging) as a function of position (%), computed from bases, amino acids, codons and dinucleotides, for Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis and Treponema pallidum.]


Visible even in proteins…


Replication-transcription conflicts

Transcription may proceed in the direction opposite to the movement of the replication fork. This will abort transcription, leading to truncated mRNA. If translated, truncated mRNA may lead to truncated proteins, which become dominant negative if they are part of complexes…

E.P.C. Rocha & A. Danchin Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics (2003) 34 : 377-378


Essentiality in B. subtilis

[Figure: percentage (0-100%) of genes on the leading vs. lagging strand in B. subtilis, for essential and non-essential genes, each split into highly expressed and non-highly expressed genes.]


When polymerases collide (II)

Co-oriented: DNAP deceleration; end of transcription. Consequences: 1. replication slow-down; 2. loss of transcripts.

Head-on: arrest of RNAP & DNAP; transcription abortion. Consequences: 1. aborted transcripts; 2. truncated essential proteins.

E. Rocha


Distribution of highly expressed genes

Highly expressed genes cluster near the origin in fast-growing bacteria

[Figure: distribution (0-70%) of highly expressed genes between the origin, middle and terminus regions of the chromosome (Ori to Ter) in C. crescentus, M. tuberculosis, E. coli and B. subtilis, grouped into fast growers and slow growers.]

E. Rocha
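The origin / middle / terminus comparison on this slide can be sketched as follows, assuming the terminus lies opposite the origin and that each chromosome arm is cut into three equal thirds (the actual region boundaries used for the figure are not stated; `region` and `region_distribution` are hypothetical names):

```python
from collections import Counter


def region(pos, ori, genome_len):
    """Classify a position as 'origin', 'middle' or 'terminus'.

    Assumes the terminus is diametrically opposite the origin, so the
    normalised distance to the origin runs from 0 (Ori) to 1 (Ter); each
    third of that range is one region.
    """
    d = (pos - ori) % genome_len
    frac = min(d, genome_len - d) / (genome_len / 2)
    if frac < 1 / 3:
        return "origin"
    if frac < 2 / 3:
        return "middle"
    return "terminus"


def region_distribution(positions, ori, genome_len):
    """Fraction of the given genes falling in each region."""
    counts = Counter(region(p, ori, genome_len) for p in positions)
    total = sum(counts.values())
    return {r: counts[r] / total for r in ("origin", "middle", "terminus")}


# Hypothetical positions (kb) of highly expressed genes on a 900 kb chromosome
print(region_distribution([10, 890, 200, 400, 450], ori=0, genome_len=900))
# {'origin': 0.4, 'middle': 0.2, 'terminus': 0.4}
```

Comparing this distribution with that of all genes of the same genome controls for uneven gene density along the chromosome.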


Gene vicinity: synteny

C. Médigue


Multivariate Analyses

In contrast to standard genetics, genomics analyses large collections of genes and gene products.

Multivariate analyses try to extract information by reducing the number of relevant descriptors of the objects of interest.

Principal Component Analysis centres the data on the mean and uses a simple distance (the identity metric, i.e. the Euclidean distance); it is the reference method.

Correspondence Analysis belongs to the same family, but it uses the χ² measure as a distance. This allows the user not only to work with highly heterogeneous objects but also to work simultaneously on the space of objects and on the space of descriptors.

Independent Component Analysis exploits the non-Gaussian character of the values associated with the descriptors.
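Correspondence analysis can be written in a few lines as a singular value decomposition of the table of standardized residuals; the χ² distance enters through the weighting of each row and column by the inverse square root of its mass. This is a textbook formulation, sketched under the assumption of a non-negative table with no empty rows or columns (`correspondence_analysis` is a hypothetical name, not a particular package's API):

```python
import numpy as np


def correspondence_analysis(N):
    """Correspondence analysis of a non-negative contingency table N.

    Returns row principal coordinates, column principal coordinates and
    the principal inertias (squared singular values), largest first.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # independence model
    # Standardized residuals; their squared sum equals chi-square / n
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2


# Rows could be genes and columns codons; here a small made-up table.
table = [[10, 20, 5], [30, 5, 10], [15, 15, 15]]
rows, cols, inertia = correspondence_analysis(table)
print(inertia / inertia.sum())  # share of total inertia carried by each axis
```

Because rows and columns are placed in the same low-dimensional space, genes and the descriptors that pull them apart can be read off the same plot, which is the "simultaneous" property mentioned above.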


Neighborhood: distribution of amino acids in the proteome

G. Pascal

Bias in amino acid distribution


Universal biases in protein amino acid composition

First axis: separates Integral Inner Membrane Proteins (IIMP) from the rest; driven by an opposition between charged and large hydrophobic residues

Second axis: separates proteins according to an opposition driven by the G+C content of the first codon base

Third axis: separates proteins by their content in aromatic amino acids; enriched in orphan proteins


Temperature-dependent biases in protein amino acid composition

The amino acid composition of proteins depends heavily on phylogeny => one needs to compare organisms that are related to each other

The general trend of amino acid composition bias is to avoid some amino acids at higher temperatures

Mesophilic bacteria belong to at least two different classes (in a 5-cluster analysis)

Biases are always dominated by the IIMP clustering


Codon usage biases

20 amino acids, 61 codons. Study of the genes in the codon space, using Correspondence Analysis (χ² measure). At least three classes of genes, including one corresponding to horizontal transfer.

C. Médigue, T. Rouxel, P. Vigier, A. Hénaut & A. Danchin. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. (1991) 222: 851-856
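Before any analysis in codon space, each gene must be turned into a row of codon counts. A minimal sketch, assuming in-frame coding sequences and the standard genetic code (`codon_counts` and the example sequences are hypothetical):

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [codon for codon, aa in CODE.items() if aa != "*"]  # the 61 sense codons


def codon_counts(cds: str) -> list[int]:
    """Counts of the 61 sense codons in an in-frame coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(SENSE, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skips stop codons and ambiguous bases
            counts[codon] += 1
    return [counts[c] for c in SENSE]


# Hypothetical genes; one row per gene gives the gene x codon table.
genes = {"geneX": "ATGAAAAAGTAA", "geneY": "ATGGCTGCAGCCTGA"}
table = {name: codon_counts(seq) for name, seq in genes.items()}
print(len(SENSE), sum(table["geneX"]), sum(table["geneY"]))  # 61 3 4
```

Raw counts mix gene length with codon preference; dividing each row by its total, or working with synonymous codon frequencies within each amino acid, is a common next step before clustering.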


Gene exchange

Class I: core metabolism of the cell

Class II: genes expressed at a high level under exponential growth conditions

Class III: horizontally exchanged genes (horizontal transfer)


Codon usage, organisation and evolution of the B. subtilis genome

(Moszer, 98)

Correspondence analysis

Classification: highly expressed; atypical / HGT; others


The cell organizers

It is too early to understand the selection pressures that organize the cell architecture. However, at least in bacteria, gases and highly reactive chemical radicals probably play a major role. Most of the corresponding genes are still unknown….


Selection pressure for organisation: oxido-reduction

• Sulfur undergoes oxido-reduction reactions from -2 to +6
• Incorporation of sulfur into metabolism usually requires reduction to the gaseous form H2S
• H2S is highly reactive, in particular towards dioxygen => these two gases, despite their diffusion properties, must be kept separate as much as possible
• Sulfur scavenging is energy-costly => sulfur-containing molecules have to be recycled

A. Sekowska, H-F. Kung & A. Danchin. Sulfur metabolism in Escherichia coli and related bacteria, facts and fiction. J. Mol. Microbiol. Biotechnol. (2000) 2: 145-177


Sulfur metabolism: an unexpected organiser of the cell's architecture

• Sulfur metabolism-related proteins are more acidic (average pI 6.5) than bulk proteins (richer in Asp and Glu), and they are poor in serine residues

• They are significantly poor in sulfur-containing amino acids

• Their genes are very poor in codons ATA, AGA and TCA

• There are almost no class III (horizontal transfer) genes in the class (only 2 in 150 genes)

• => sulfur-metabolism genes are ancestral and may form a core structure for the E. coli genome
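The compositional signals listed above can be measured gene by gene with simple counts. A minimal sketch with hypothetical helper names; computing the isoelectric point itself needs a pKa table and is left out, so acidity is approximated here by the Asp + Glu fraction:

```python
def residue_fractions(protein: str) -> dict[str, float]:
    """Fractions of the residue groups used to characterise a protein set."""
    p = protein.upper()

    def frac(residues: str) -> float:
        return sum(p.count(a) for a in residues) / len(p)

    return {
        "acidic (D+E)": frac("DE"),
        "serine (S)": frac("S"),
        "sulfur-containing (C+M)": frac("CM"),
    }


def codon_fraction(cds: str, codons=("ATA", "AGA", "TCA")) -> float:
    """Fraction of in-frame codons of a CDS that belong to `codons`."""
    cds = cds.upper()
    triplets = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return sum(t in codons for t in triplets) / len(triplets)


print(residue_fractions("MDESK"))
print(codon_fraction("ATAGCTAGATCA"))  # 0.75
```

These values only become informative when compared with the proteome-wide (or genome-wide) averages of the same organism.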


Proximity in the chromosome: sulphur islands

E.P.C. Rocha, A. Sekowska & A. Danchin. Sulfur islands in the Escherichia coli genome: markers of the cell's architecture? FEBS Lett. (2000) 476: 8-11


The error catastrophe

Similarity in sequence leads to functional inference

Because of the recruitment of pre-existing structures, there is often no obvious link between a structure and a function (the book used as a paperweight)

Hence the propagation of annotation errors: ykrS, annotated as « translation factor », is a component of sulfur metabolism!

A. Sekowska, V. Dénervaud, H. Ashida, K. Michoud, D. Haas, A. Yokota & A. Danchin. Bacterial variations on the methionine salvage pathway. BMC Microbiol. (2004) 4: 9


A new metabolic pathway

A. Sekowska


Just so story: proximity in the genome

Escherichia coli: cmk (mssA) rpsA

Bacillus subtilis: cmk ypfD (no rpsA!!!)

Haemophilus influenzae: cmk rpsA

Sinorhizobium meliloti: cmk rpsA
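The neighbourhood argument of this slide amounts to asking in how many genomes two genes sit next to each other. A minimal sketch that encodes only the genes drawn on the slide (real gene orders would come from genome annotations; `adjacent` is a hypothetical helper, and gene orientation and chromosome circularity are ignored):

```python
# Gene neighbourhoods as drawn on the slide (only the genes shown there).
NEIGHBOURHOODS = {
    "Escherichia coli": ["cmk", "rpsA"],
    "Bacillus subtilis": ["cmk", "ypfD"],
    "Haemophilus influenzae": ["cmk", "rpsA"],
    "Sinorhizobium meliloti": ["cmk", "rpsA"],
}


def adjacent(order, a, b):
    """True if genes a and b are neighbours in a linear gene order."""
    return any({x, y} == {a, b} for x, y in zip(order, order[1:]))


conserved = [org for org, order in NEIGHBOURHOODS.items()
             if adjacent(order, "cmk", "rpsA")]
print(f"cmk-rpsA adjacent in {len(conserved)}/{len(NEIGHBOURHOODS)} genomes")
# cmk-rpsA adjacent in 3/4 genomes
```

The interesting case is the exception: where the conserved partner is missing, the gene found in its place becomes a candidate for the same function.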


The pyrimidine diphosphate paradox

OMP → UMP → UDP → UTP → CTP

In order to make deoxyribonucleotides, the cell uses ribonucleoside diphosphates, not triphosphates:

NDP → dNDP (NDR) → dNTP (NDK)

And here is the paradox: no CDP!!!


How is the paradox resolved?

OMP → UMP → UDP → UTP → CTP → mRNA

mRNA → CMP (RNases) → CDP (Cmk) → dCDP → DNA

mRNA → CDP (PNPase)
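The paradox and its resolution can be phrased as a reachability question on a reaction graph. A minimal sketch, assuming a graph reduced to the steps drawn on these two slides (node names as on the slides; `reachable` and `merge` are hypothetical helpers, and enzymes appear only as comments):

```python
from collections import deque

# De novo pyrimidine steps: CTP is made from UTP, so no CDP appears.
DE_NOVO = {
    "OMP": ["UMP"],
    "UMP": ["UDP"],
    "UDP": ["UTP"],
    "UTP": ["CTP"],
}
# Route through RNA proposed on the slide.
RNA_ROUTE = {
    "CTP": ["mRNA"],          # transcription
    "mRNA": ["CMP", "CDP"],   # RNases release CMP; PNPase releases CDP
    "CMP": ["CDP"],           # Cmk
    "CDP": ["dCDP"],          # NDR (NDP -> dNDP)
}


def merge(*graphs):
    """Union of adjacency lists (input graphs are left unchanged)."""
    merged = {}
    for graph in graphs:
        for node, products in graph.items():
            merged.setdefault(node, []).extend(products)
    return merged


def reachable(graph, start):
    """All metabolites reachable from `start` by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print("CDP" in reachable(DE_NOVO, "OMP"))                     # False
print("dCDP" in reachable(merge(DE_NOVO, RNA_ROUTE), "OMP"))  # True
```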


Phylogenetic neighbors: the S1 box

• rpsA codes for ribosomal protein S1. It contains the S1 box (PROSITE PS50126). Many other proteins contain a similar box: polynucleotide phosphorylase, RNases E, G and R, RNA helicases, etc.
• Protein RegB of bacteriophage T4, associated with S1, cuts mRNA at GAGG motifs.
• S1 is a subunit of bacteriophage Qβ replicase…

=> All this points to a function for S1 in RNA metabolism


Codon usage bias neighbors

Gene: comment
bla, cat, dicB, lpp, ompA: long mRNA turnover
pyrF: pyrimidine metabolism
hflB, ftsH, mrsACF, lpp: cell architecture
nusA, pcnB, metY, pnp, rna, rnb, rnc, rne/ams, rng, rph: RNA maturation and turnover
trxA: oxido-reduction, subunit of T7 replicase, needed for synthesis of deoxyribonucleotides


Protein complexes: the Degradosome

Components: PNPase, PolyA polymerase, RNase E, S1, polyphosphate kinase, enolase

mRNA degradation

CDP for de novo DNA synthesis

GDP recycling of GTP for carbohydrate secretion: GDP + PEP → GTP (NDK + PYK)


Just so story: the cmk rpsA operon

Escherichia coli: cmk (mssA) rpsA

Conclusion: the function of the cmk rpsA operon is to make CDP for DNA synthesis

mssA was discovered as a suppressor of smbA (pyrH), itself a suppressor of MukB, a myosin-like protein involved in chromosome segregation => DNA synthesis is involved in the function.


Selection pressure for compartmentalization: a dangerous intermediate

OMP → UMP → UDP → UTP → CTP

UDP → dUDP → dUTP; dUTP → DNA, or dUTP → dUMP + PPi → dTMP → dTDP → dTTP → DNA

Uridylate kinase (UMK): pyrH (smbA). No CDP: no DNA…

S. Noria & A. Danchin Just so genome stories : what does my neighbor tell me? International Congress Series 1246 Elsevier Science (2002) 3-13


In conclusion:

UMK must be compartmentalized

S. Landais, P. Gounon, C. Laurent-Winter, J.C. Mazié, A. Danchin, O. Barzu & H. Sakamoto. Immunochemical analysis of UMP kinase from Escherichia coli. J. Bacteriol. (1999) 181: 833-840


A prediction: ribosome recycling and UTP

Escherichia coli: pyrH frr

Bacillus subtilis: pyrH frr

Photorhabdus luminescens: pyrH frr

This organisation is conserved in most Gram+ and Gram- bacteria. Why?


Ribosome recycling and UTP

frr codes for the ribosome recycling factor, which allows 70S ribosomes to split into 30S and 50S subunits. In polycistronic operons, the 70S ribosome can go on from one gene to the next without recycling (this requires formylation of the first methionine). At the end of the message, the ribosomes must recycle. This happens in a context where transcripts form stem-loops ending with a polyU sequence.

Conjecture: is UTP controlling the activity of Frr? Remember that one cannot speak of « concentrations » of molecules in a cell: 1 micromolar would mean about 600 molecules per cell. There are about 20,000 ribosomes, therefore 1 mM means only about 30 individual molecules in the immediate vicinity of each ribosome...
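The orders of magnitude in this conjecture can be checked with a short calculation. The cell volume is an assumption not given on the slide (about 1 femtolitre, typical of E. coli); the ribosome count is the slide's figure:

```python
AVOGADRO = 6.022e23      # molecules per mole
CELL_VOLUME_L = 1e-15    # assumption: ~1 femtolitre per E. coli cell
RIBOSOMES = 20_000       # ribosomes per cell, from the slide


def molecules_per_cell(molar: float) -> float:
    """Number of molecules in one cell at a given molar concentration."""
    return molar * CELL_VOLUME_L * AVOGADRO


per_cell_1uM = molecules_per_cell(1e-6)
per_ribosome_1mM = molecules_per_cell(1e-3) / RIBOSOMES
print(round(per_cell_1uM))      # 602
print(round(per_ribosome_1mM))  # 30
```

With a larger cell volume (for example in fast growth) both numbers scale proportionally, but the conclusion that only tens of molecules surround each ribosome does not change.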


Transcription termination

UUUUUUUUUU

At Rho-independent transcription termination sites, the messenger RNA ends with rows of U. This must lower the local availability of UTP….

This suggests Frr as a drug target, with analogs of UTP as leads...