How do Replication and Transcription Change Genomes?
Andrey Grigoriev
Director, Center for Computational and Integrative Biology, Rutgers University
What are we going to do?
• Observe effects of fundamental processes
• Estimate their relative contribution
• Link them to genome features
• Analyze nucleotide composition
How do Replication and Transcription Change Genomes?
Well, do they?
Replication and Transcription
• textbook view
– faithful reproduction machinery
• basis for selection
– parental DNA fitness advantages
Replication and Transcription
• paradox
– both systematically change the genomes which they faithfully reproduce
• and they leave traces
What is in the sequence?
• The usual – coding, regulatory regions, exons, introns, RNAs, etc.
• Biases in nucleotide composition
– Traces of organism's "lifestyle"
– Links to genome features
Counting nucleotides: GC Skew
$s_w = \dfrac{[G]-[C]}{[G]+[C]}$
• Short sequence interval (window) w
• Relative excess of G vs C [-1;1]
• Plot vs % of genome position [0;100]
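The window skew defined above can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code: the function names `gc_skew` and `windowed_skew` are hypothetical, windows are taken as adjacent and non-overlapping, and a window with no G or C is assigned a skew of 0.0 by assumption.

```python
def gc_skew(window: str) -> float:
    """Return ([G]-[C])/([G]+[C]) for one window.

    Assumption: a window containing neither G nor C gets skew 0.0.
    """
    seq = window.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(sequence: str, w: int) -> list[tuple[float, float]]:
    """Return (position in % of genome length, skew) for adjacent windows of size w.

    The position reported for each window is its midpoint.
    A trailing partial window shorter than w is ignored.
    """
    length = len(sequence)
    points = []
    for start in range(0, length - w + 1, w):
        midpoint_pct = 100.0 * (start + w / 2) / length
        points.append((midpoint_pct, gc_skew(sequence[start:start + w])))
    return points
```

Plotting the second element of each pair against the first gives the kind of skew plot described on this slide.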
[Figure: two panels, Simian virus 40 and Haemophilus influenzae; x-axis: position, % genome length]
Cumulative Skew Diagrams
$s_w = \dfrac{[G]-[C]}{[G]+[C]}$
$S = \sum_{W} s_w \, \dfrac{w}{L}$
For W adjacent windows of size w << L
S is an integral of skew function
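A minimal sketch of the cumulative diagram, reading S as the running sum of window skews, each weighted by w/L. The function names are hypothetical, and the handling of windows without G or C (skew 0.0) is an assumption.

```python
def gc_skew(window: str) -> float:
    """([G]-[C])/([G]+[C]) for one window; 0.0 if it has no G or C (assumption)."""
    seq = window.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cumulative_skew(sequence: str, w: int) -> list[tuple[float, float]]:
    """Return (position in % of genome length, S) after each adjacent window.

    S accumulates s_w * w / L over the windows seen so far, so the curve
    approximates the integral of the skew function along the genome.
    """
    length = len(sequence)
    total = 0.0
    curve = []
    for start in range(0, length - w + 1, w):
        total += gc_skew(sequence[start:start + w]) * w / length
        end_pct = 100.0 * (start + w) / length
        curve.append((end_pct, total))
    return curve
```

Because each term is scaled by w/L, the curve has the same overall shape for any reasonable choice of w much smaller than L.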
[Figure: Simian virus 40, with replication origin (ori) and replication terminus (ter) marked; x-axis: position, % genome length]
[Figure: Haemophilus influenzae, with replication origin (ori) and replication terminus (ter) marked; x-axis: position, % genome length]
Genome of Escherichia coli
[Figure: origin and terminus marked; x-axis: position, % genome length]
Genome of Bacillus subtilis
[Figure: x-axis: position, % genome length]
Genome of Borrelia burgdorferi
[Figure: x-axis: position, % genome length]
Cumulative Skew Diagrams
• Now widely used to predict ori and ter in novel and less studied microbial genomes
• Predictions confirmed experimentally
• Constant skews over half-genomes
• ori→ter: G>C; ter→ori: G<C
• Strand properties change at ori and ter
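If G exceeds C from ori to ter and C exceeds G from ter to ori, the cumulative skew falls until it reaches ori and rises until it reaches ter. A rough sketch of the resulting prediction rule follows: take the minimum of the curve as ori and the maximum as ter. The function names and the toy sequence are hypothetical, and a real genome needs a window size suited to its length.

```python
def gc_skew(window: str) -> float:
    """([G]-[C])/([G]+[C]) for one window; 0.0 if it has no G or C (assumption)."""
    seq = window.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cumulative_skew(sequence: str, w: int) -> list[tuple[float, float]]:
    """(position in % of genome length, running sum of s_w * w / L)."""
    length = len(sequence)
    total = 0.0
    curve = []
    for start in range(0, length - w + 1, w):
        total += gc_skew(sequence[start:start + w]) * w / length
        curve.append((100.0 * (start + w) / length, total))
    return curve


def predict_ori_ter(curve: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (ori, ter) positions in % as the curve's minimum and maximum."""
    ori_pct, _ = min(curve, key=lambda point: point[1])
    ter_pct, _ = max(curve, key=lambda point: point[1])
    return ori_pct, ter_pct


# Toy circular "genome": C-rich, then G-rich, then C-rich again.
toy_genome = "C" * 8 + "G" * 16 + "C" * 8
ori, ter = predict_ori_ter(cumulative_skew(toy_genome, 4))
```

On the toy sequence the skew switches from C-rich to G-rich at 25% and back at 75%, which is where the minimum and maximum of the curve fall.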
Causes: Selection vs. Mutation
• Properties of encoded proteins
• Regulatory sequences
• Most pronounced in 3rd codon position
• Suggests mutation, not selection pressure
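One way to check the codon-position observation is to compute the skew separately for the first, second and third base of each codon. The sketch below assumes an in-frame coding sequence whose length is a multiple of three; the function name `skew_by_codon_position` is hypothetical.

```python
def skew_by_codon_position(cds: str) -> dict[int, float]:
    """Return GC skew for codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumptions: the sequence starts at the first base of a codon, and a
    position with no G or C gets skew 0.0.
    """
    seq = cds.upper()
    result = {}
    for offset in range(3):
        bases = seq[offset::3]  # every third base, starting at this codon position
        g = bases.count("G")
        c = bases.count("C")
        result[offset + 1] = (g - c) / (g + c) if (g + c) else 0.0
    return result
```

Since most third-position changes are synonymous, a stronger skew there than at positions 1 and 2 points to mutation pressure rather than selection on the encoded proteins.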
[Figure: two panels, Transcription and Replication. Labels: DNA single-stranded, not protected; continuous DNA synthesis; discontinuous DNA synthesis; mRNA synthesis; template DNA]
Most Consistent Explanation
• spontaneous deamination of C or 5-methylcytosine (5-MetC)
– by far the most frequent mutation (rates rise over 100-fold when DNA is single-stranded)
– the mutated base is fixed during the next round of replication
– result: depletion of cytosines vs guanines
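A toy simulation can illustrate why deamination on an exposed strand produces a positive GC skew. Everything in it is an assumption made for illustration: the random starting sequence, the 5% per-base rate, the fixed seed, and the shortcut of writing a deaminated C directly as T (skipping the uracil intermediate and the replication round that fixes it).

```python
import random


def gc_skew(seq: str) -> float:
    """([G]-[C])/([G]+[C]); 0.0 if the sequence has no G or C (assumption)."""
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def deaminate(strand: str, rate: float, rng: random.Random) -> str:
    """Turn each C into T with probability `rate`; all other bases are unchanged."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in strand
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
strand = "".join(rng.choice("ACGT") for _ in range(10_000))
mutated = deaminate(strand, rate=0.05, rng=rng)

skew_before = gc_skew(strand)
skew_after = gc_skew(mutated)
```

The G count is untouched while the C count can only drop, so `skew_after` exceeds `skew_before` whenever at least one C is deaminated.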
Cytosine Deamination
[Figure: cytosine, uracil and thymine]
Replication
• Leading strand exposed in replication bubble, generation after generation
• Unusual replication models consistent with the single-strand hypothesis
– adenovirus
– mitochondria
[Figure: Adenovirus, with replication origins marked; x-axis: position, % genome length]
Replication or Transcription
• Leading-lagging switch at ori and ter
• Consistent with replication models
• Transcription often colinear with replication
• Direction often changes at ori and ter
Replication vs. Transcription
[Figure: HPV-16; x-axis: position, % genome length]
Replication vs. Transcription
• Comparable contribution to skew
• [G]=900, [C]=690 where both run in the same direction: additive effect on skew
• [G]=758, [C]=773 where they run in opposite directions: effects cancel each other out
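Taking the counts on this slide at face value, the two skew values can be worked out directly. The helper name `gc_skew_from_counts` is hypothetical.

```python
def gc_skew_from_counts(g: int, c: int) -> float:
    """([G]-[C])/([G]+[C]) computed from base counts."""
    return (g - c) / (g + c)


# Replication and transcription in the same direction: (900 - 690) / 1590
same_direction = gc_skew_from_counts(900, 690)

# Replication and transcription in opposite directions: (758 - 773) / 1531
opposite_direction = gc_skew_from_counts(758, 773)
```

The first value is about 0.13, a clear excess of G, while the second is about -0.01, close to zero. That is what one would expect if the two processes contribute comparably and offset each other when they run in opposite directions.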
Genome of Bacillus subtilis
[Figure: x-axis: position, % genome length]
Diagrams "jagged"
• Sequence constraints – amino acid composition, regulatory sequences, etc.
• Sequence inversions – swap the strands and flip the sign of the skew between the borders of the inversion
• Horizontal transfer between species
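The effect of an inversion on the skew can be shown with a short sketch: inverting a segment replaces it with its reverse complement, which swaps the G and C counts on the reported strand. The function names are hypothetical, and the sequence is assumed to contain only uppercase A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def invert(sequence: str, start: int, end: int) -> str:
    """Invert sequence[start:end] in place (0-based, end exclusive)."""
    return sequence[:start] + reverse_complement(sequence[start:end]) + sequence[end:]


def gc_skew(seq: str) -> float:
    """([G]-[C])/([G]+[C]); 0.0 if the sequence has no G or C (assumption)."""
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


genome = "AAGGGCTT"
inverted = invert(genome, 2, 6)   # segment GGGC becomes GCCC
skew_before = gc_skew(genome[2:6])
skew_after = gc_skew(inverted[2:6])
```

The skew inside the inverted segment changes sign (here from 0.5 to -0.5) while the flanking sequence is untouched, which produces a local reversal of slope in a cumulative diagram.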
Inversion
[Figure: double-stranded segment (5' to 3' over 3' to 5') with blocks A B C D rearranged to A C B D]
Rearrangements in two sequenced strains of Helicobacter pylori
Colored areas under the curve correspond to inversions and translocations
cagPAI – pathogenicity island (likely horizontal transfer)
Conclusions
• Analyze nucleotide composition
• Observe effects of fundamental processes
• Link them to genome features
• Estimate their relative contribution
• Start asking own questions