evolution of protein coding sequences
![Page 1: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/1.jpg)
Evolution of protein coding sequences
![Page 2: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/2.jpg)
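Kinds of nucleotide substitutions
Given 2 nucleotide sequences, how did their similarities and differences arise from a common ancestor? We assume A is the common ancestral nucleotide:
• Single substitution: one lineage keeps A, the other changes A → C. 1 change, 1 difference.
• Multiple substitution: one lineage changes A → C → T, the other keeps A. 2 changes, 1 difference.
• Coincidental substitution: one lineage changes A → C, the other A → G. 2 changes, 1 difference.
• Parallel substitution: both lineages change A → C. 2 changes, no difference.
• Convergent substitution: one lineage changes A → C → T, the other A → T. 3 changes, no difference.
• Back substitution: one lineage changes A → C → A, the other keeps A. 2 changes, no difference.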
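The distinction between changes (substitution events that occurred) and differences (what is seen when the two present-day sequences are compared) can be made concrete with a short sketch. The function name `changes_and_differences` and the string encoding of a lineage (ancestral state first, present-day state last) are assumptions made for this illustration, and the lineage histories are one reading of the six cases on this slide.

```python
def changes_and_differences(lineage1, lineage2):
    """Count substitution events and observed differences at one site.

    Each lineage is a string of successive states, ancestral state first,
    e.g. "ACT" means A -> C -> T. This encoding is hypothetical and assumes
    that consecutive states in a lineage are never identical.
    """
    if lineage1[0] != lineage2[0]:
        raise ValueError("both lineages must start from the same ancestral state")
    # Every step along either lineage is one substitution event.
    changes = (len(lineage1) - 1) + (len(lineage2) - 1)
    # Only the present-day states are observable.
    differences = int(lineage1[-1] != lineage2[-1])
    return changes, differences


# Illustrative histories for the six kinds of substitution.
CASES = {
    "single": ("A", "AC"),
    "multiple": ("ACT", "A"),
    "coincidental": ("AC", "AG"),
    "parallel": ("AC", "AC"),
    "convergent": ("ACT", "AT"),
    "back": ("ACA", "A"),
}

for name, (l1, l2) in CASES.items():
    print(name, changes_and_differences(l1, l2))
```

In every case except the single substitution, the number of observed differences is smaller than the number of changes, so a simple count of differences underestimates the amount of evolution that took place.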
![Page 3: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/3.jpg)
Important properties inherent to the standard genetic code
![Page 4: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/4.jpg)
Synonymous vs nonsynonymous substitutions
• Nondegenerate sites: codon positions where every mutation results in an amino acid substitution.
(e.g. the first position of TTT (phenylalanine): CTT (leucine), ATT (isoleucine) and GTT (valine) each encode a different amino acid.)
• Twofold degenerate sites: codon positions where 2 of the 4 nucleotides encode the same amino acid (aa) and the other 2 encode a different aa.
(e.g. GAT and GAC code for aspartic acid (Asp, D),
whereas GAA and GAG both code for glutamic acid (Glu, E).)
• Threefold degenerate sites: codon positions where 3 of the 4 nucleotides encode the same aa, while the fourth encodes a different aa. There is only 1 threefold degenerate site: the 3rd position of an isoleucine codon. ATT, ATC and ATA all encode isoleucine, but ATG encodes methionine.
![Page 5: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/5.jpg)
Standard genetic code
• Three amino acids (arginine, leucine and serine) are each encoded by 6 different codons.
• Five amino acids are each encoded by 4 codons that differ only in the third position. These sites are called “fourfold degenerate” sites.
• Fourfold degenerate sites: codon positions where changing the nucleotide to any of the 3 alternatives has no effect on the aa.
e.g. GGT, GGC, GGA, GGG (glycine);
CCT, CCC, CCA, CCG (proline)
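The degeneracy classes defined on these slides can be checked mechanically against the standard genetic code. The following is a minimal sketch, not a reference implementation: the names `CODON_TABLE` and `degeneracy` are chosen for this example, codons use the DNA alphabet, and positions are 0-based in the code.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degeneracy(codon, pos):
    """Return how many of the 4 nucleotides at `pos` encode the same amino acid.

    1 = nondegenerate, 2 = twofold, 3 = threefold, 4 = fourfold degenerate.
    """
    aa = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa for base in BASES
    )


# The examples used on the slides: (codon, 0-based position).
for codon, pos in [("TTT", 0), ("GAT", 2), ("ATT", 2), ("GGT", 2)]:
    print(f"{codon} position {pos + 1}: {degeneracy(codon, pos)}-fold")
```

The count includes the nucleotide already present, so a result of 1 means that every possible mutation at that position changes the amino acid.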
![Page 6: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/6.jpg)
Standard genetic code
• Nine amino acids are each encoded by a pair of codons that differ by a transition substitution at the third position. These sites are called “twofold degenerate” sites.
• Isoleucine is encoded by three codons (with a threefold degenerate site).
• Methionine and tryptophan are each encoded by a single codon.
• Three stop codons: TAA, TAG and TGA.
Transition: A ↔ G; C ↔ T
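The codon-family counts listed for the standard genetic code, and the statement that the two-codon families differ by a transition at the third position, can be verified with a few lines of code. This is a sketch under the assumption that the table below is the standard code written in the DNA alphabet; `is_transition`, `codons_by_aa` and `family_sizes` are names invented for this example.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Transitions are purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(b1, b2):
    """True if replacing b1 by b2 is a transition (not a transversion)."""
    return frozenset((b1, b2)) in TRANSITIONS


# Group codons by the amino acid they encode.
codons_by_aa = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_by_aa[aa].append(codon)

# Group amino acids (stop codons excluded) by the size of their codon family.
family_sizes = defaultdict(list)
for aa, codons in codons_by_aa.items():
    if aa != "*":
        family_sizes[len(codons)].append(aa)

for size in sorted(family_sizes):
    print(size, "codon(s):", "".join(sorted(family_sizes[size])))

# Check that each two-codon family differs only by a third-position transition.
all_transitions = all(
    a[:2] == b[:2] and is_transition(a[2], b[2])
    for a, b in (codons_by_aa[aa] for aa in family_sizes[2])
)
print("two-codon families differ by a 3rd-position transition:", all_transitions)
```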
![Page 7: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/7.jpg)
Evolution of protein coding sequences
• Some amino acid substitutions require more DNA substitutions than others
• Ile → Thr: at least one DNA change
– AUU → ACU
– AUC → ACC
– AUA → ACA
• Ile → Cys: at least two DNA changes
– AUU (Ile) → AGU (Ser) → UGU (Cys)
– AUU (Ile) → UUU (Phe) → UGU (Cys)
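The minimum number of nucleotide changes needed to turn one amino acid into another can be computed by comparing every codon of the first with every codon of the second. The sketch below assumes the standard genetic code written in the DNA alphabet (T in place of U); the function name `min_changes` is an invention for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def min_changes(aa1, aa2):
    """Smallest number of nucleotide differences between any codon of aa1
    and any codon of aa2 (one-letter amino acid codes)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1, a in CODON_TABLE.items() if a == aa1
        for c2, b in CODON_TABLE.items() if b == aa2
    )


print("Ile -> Thr:", min_changes("I", "T"))
print("Ile -> Cys:", min_changes("I", "C"))
```

This counts differences between codons only; it does not check whether the intermediate codons along a multi-step path are themselves viable.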
![Page 8: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/8.jpg)
Example: 2 homologous sequences
SEQ.1 GAA GTT TTT (Glu Val Phe)
SEQ.2 GAC GTC GTA (Asp Val Val)
• Codon 1: GAA → GAC; 1 nucleotide difference, 1 nonsynonymous difference.
• Codon 2: GTT → GTC; 1 nucleotide difference, 1 synonymous difference.
• Codon 3: TTT → GTA; counting is less straightforward because two pathways are possible:
– Path 1: TTT (F: Phe) → GTT (V: Val) → GTA (V: Val); implies 1 nonsynonymous and 1 synonymous substitution.
– Path 2: TTT (F: Phe) → TTA (L: Leu) → GTA (V: Val); implies 2 nonsynonymous substitutions.
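The two pathways for codon 3 (TTT to GTA) can be enumerated automatically by applying the differing positions in every possible order. This is a sketch only: the function name `pathways` is hypothetical, the DNA alphabet is assumed, and pathways that pass through a stop codon are not treated specially.

```python
from itertools import permutations, product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pathways(codon1, codon2):
    """List every shortest mutational path from codon1 to codon2.

    Returns tuples (steps, synonymous, nonsynonymous), one per ordering
    of the positions at which the two codons differ.
    """
    diff_positions = [i for i in range(3) if codon1[i] != codon2[i]]
    results = []
    for order in permutations(diff_positions):
        current, steps = codon1, [codon1]
        syn = nonsyn = 0
        for i in order:
            mutated = current[:i] + codon2[i] + current[i + 1:]
            if CODON_TABLE[mutated] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = mutated
            steps.append(current)
        results.append((steps, syn, nonsyn))
    return results


paths = pathways("TTT", "GTA")
for steps, syn, nonsyn in paths:
    print(" -> ".join(steps), f"synonymous={syn}", f"nonsynonymous={nonsyn}")

# Assumption: weight all pathways equally when averaging.
mean_syn = sum(p[1] for p in paths) / len(paths)
mean_nonsyn = sum(p[2] for p in paths) / len(paths)
print(mean_syn, mean_nonsyn)
```

The slide does not say how the two pathways should be combined. Weighting them equally, as some counting methods do, is an assumption here; it would give 0.5 synonymous and 1.5 nonsynonymous differences for this codon.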
![Page 9: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/9.jpg)
Evolution of protein coding sequences
• Redundancy of the genetic code
• Biochemical properties of amino acids
• Under neutral evolution (no effect of selection) amino acids should replace each other with a probability determined by the number of DNA substitutions
![Page 10: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/10.jpg)
![Page 11: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/11.jpg)
Rates and patterns of nucleotide substitution
• Influenced by three things
– Functional constraint (negative selection)
– Positive selection
– Mutation rate
![Page 12: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/12.jpg)
Rate of nucleotide substitution
• K = mean number of substitutions per site
• T = time since divergence
• rate = r = number of substitutions per site per year
• r = K/(2T)
[Figure: an ancestral sequence diverging into Sequence 1 and Sequence 2, with each branch of length T.]
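The factor of 2 in the rate formula reflects that substitutions accumulate independently along both branches, so the two sequences are separated by a total of 2T years of evolution. A small worked example follows; the values of K and T are hypothetical numbers chosen only to show the arithmetic, and `substitution_rate` is a name invented for this sketch.

```python
def substitution_rate(K, T):
    """Rate of substitution per site per year, r = K / (2T).

    K: mean number of substitutions per site between the two sequences.
    T: time since divergence, in years.
    """
    if T <= 0:
        raise ValueError("divergence time T must be positive")
    return K / (2 * T)


# Hypothetical inputs, for illustration only.
K = 0.1    # substitutions per site between the two sequences
T = 80e6   # years since the two lineages diverged

r = substitution_rate(K, T)
print(f"r = {r:.3e} substitutions per site per year")
```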
![Page 13: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/13.jpg)
Gene tree - Species tree
[Figure: a species tree and a gene tree for species A, B and C drawn along a time axis, with duplication and speciation events marked. Source: T.A. Brown, Genomes, 2nd edition, 2002.]
![Page 14: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/14.jpg)
[Figure: alleles A and B in the ancestral species, the speciation event leading to human and gorilla, and the common ancestor of the sequences, drawn along a time axis.]
![Page 15: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/15.jpg)
Evolution of protein-coding sequences
• The genetic code is redundant
• Some nucleotide changes do not change the amino acid coded for
– 3rd codon position: often synonymous
– 2nd position: never
– 1st position: sometimes
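The "often / never / sometimes" pattern for the three codon positions can be quantified by trying every single-nucleotide change in every sense codon and counting how many leave the amino acid unchanged. This sketch assumes the standard genetic code in the DNA alphabet, skips stop codons as starting points, and treats a change to a stop codon as nonsynonymous; the variable names are chosen for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

synonymous = [0, 0, 0]  # synonymous changes at codon positions 1, 2, 3
total = [0, 0, 0]       # all single-nucleotide changes at each position

for codon, aa in CODON_TABLE.items():
    if aa == "*":
        continue  # only sense codons are mutated
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total[pos] += 1
            if CODON_TABLE[mutant] == aa:
                synonymous[pos] += 1

for pos in range(3):
    share = synonymous[pos] / total[pos]
    print(f"position {pos + 1}: {synonymous[pos]}/{total[pos]} synonymous ({share:.1%})")
```

Under these assumptions, 8 of 183 possible changes are synonymous at the first position (about 4%), none at the second, and 126 of 183 at the third (about 69%).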
![Page 16: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/16.jpg)
Standard Genetic Code
Phe UUU   Ser UCU   Tyr UAU   Cys UGU
Phe UUC   Ser UCC   Tyr UAC   Cys UGC
Leu UUA   Ser UCA   ter UAA   ter UGA
Leu UUG   Ser UCG   ter UAG   Trp UGG
Leu CUU   Pro CCU   His CAU   Arg CGU
Leu CUC   Pro CCC   His CAC   Arg CGC
Leu CUA   Pro CCA   Gln CAA   Arg CGA
Leu CUG   Pro CCG   Gln CAG   Arg CGG
Ile AUU   Thr ACU   Asn AAU   Ser AGU
Ile AUC   Thr ACC   Asn AAC   Ser AGC
Ile AUA   Thr ACA   Lys AAA   Arg AGA
Met AUG   Thr ACG   Lys AAG   Arg AGG
Val GUU   Ala GCU   Asp GAU   Gly GGU
Val GUC   Ala GCC   Asp GAC   Gly GGC
Val GUA   Ala GCA   Glu GAA   Gly GGA
Val GUG   Ala GCG   Glu GAG   Gly GGG
![Page 17: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/17.jpg)
Rates
• In general ...
• Rates of nucleotide substitution are lowest at nondegenerate sites (0.78 × 10⁻⁹ per site per year)
• Intermediate at twofold degenerate sites (2.24 × 10⁻⁹)
• Highest at fourfold degenerate sites (3.71 × 10⁻⁹)
![Page 18: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/18.jpg)
Effect of amino acid substitutions
• Deleterious 86%
• Neutral 14%
• Advantageous 0.0% ? (very low)
• In protein coding sequences, selection is often acting to remove changes
• Less common outcome is drift of neutral changes
• Rarely see positive selection for advantageous changes
![Page 19: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/19.jpg)
Functional Constraint
• Proteins often have some functional constraint
• The stronger the functional constraint, the slower the rate of evolution
![Page 20: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/20.jpg)
Haemoglobin
• Haem pocket is highly constrained at the protein sequence level
• Remainder of the protein is only constrained to be hydrophilic
![Page 21: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/21.jpg)
Histone 4
• Two copies in Histone octamer
• Forms complex with other histones and binds DNA into chromatin
• Almost the whole protein is highly constrained
![Page 22: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/22.jpg)
Fibrinopeptides
• Hardly any sequence constraint
![Page 23: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/23.jpg)
![Page 24: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/24.jpg)
Rates and Patterns
• Patterns of change can be informative of the function of a protein
• Different genes evolve at different rates
• Amino acids that are always conserved are likely to be critical to the function
![Page 25: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/25.jpg)
Biochemical properties
![Page 26: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/26.jpg)
![Page 27: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/27.jpg)
Histone 4
• Highly conserved protein
• Compare human and wheat H4 genes
• 55 DNA differences
• 2 amino acid differences
– Val ↔ Ile (both aliphatic)
– Lys ↔ Arg (both charged)
![Page 28: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/28.jpg)
Evolution of non-coding regions
• Homologous sequences
• e.g., compare introns of homologous genes
• 5’ UTR and 3’ UTR (untranslated region)
• Pseudogenes
![Page 29: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/29.jpg)
![Page 30: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/30.jpg)
![Page 31: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/31.jpg)
Synonymous substitution rate variation
• Synonymous rates may differ between genes
• How come?
• Maybe different mutation rates in different parts of the genome
![Page 32: Evolution of protein coding sequences](https://reader036.vdocuments.us/reader036/viewer/2022062309/568134c0550346895d9be46b/html5/thumbnails/32.jpg)
Variation in the rates of synonymous substitutions: Secondary structure constraints
• Stems in secondary RNA structures are more constrained than loops.