
Page 1: On the evolution of the genetic codes, represented as ...p-adics2015.matf.bg.ac.rs/slides/yurova2.pdf · Introduction I Deoxyribonucleic acid (DNA) is a molecule that carries most

On the evolution of the genetic codes, represented as attractors of 2-adic functions

Dr. Ekaterina Yurova Axelsson

Linnaeus University, Sweden

September 10, 2015


P-adic numbers have found numerous applications, e.g., in cognitive models and psychology, and in genetics:

1. A. Khrennikov, Information dynamics in cognitive, psychological, social, and anomalous phenomena. Ser.: Fundamental Theories of Physics, Kluwer, Dordrecht, 2004.

2. Khrennikov, A. Yu., 2006, P-adic information space and gene expression. In: Integrative approaches to brain complexity, editors Grant S., Heintz N., Noebels J., Wellcome Trust Publ., p. 14.

3. B. Dragovich, A. Dragovich, A p-Adic Model of DNA Sequence and Genetic Code, p-Adic Numbers, Ultrametric Analysis and Applications, 1, N 1, 34-41 (2009). arXiv:q-bio/0607018v1

4. A. Khrennikov, Gene expression from polynomial dynamics in the 2-adic information space, Chaos, Solitons, and Fractals, 42, 341-347 (2009).

5. A. Khrennikov and S. Kozyrev, p-Adic numbers in bioinformatics: from genetic code to PAM-matrix; arXiv:0903.0137v3 (2009).

6. Dragovich, B.: p-Adic Structure of the Genetic Code. NeuroQuantology, Vol. 9, No. 4, 716-727 (2011). arXiv:1202.2353v1.

7. A. Khrennikov, S. V. Kozyrev, Genetic code on the diadic plane, Physica A: Statistical Mechanics and its Applications, 381, 265-272 (2007).


Outline

- Short introduction
- Proposed 2-adic model
- Some observations


Introduction

- Deoxyribonucleic acid (DNA) is a molecule that carries most of the genetic instructions used in the development, functioning and reproduction of all known living organisms and many viruses.

- Within cells, DNA is organized into long structures called chromosomes. During cell division these chromosomes are duplicated in the process of DNA replication, providing each cell with its own complete set of chromosomes.

- Eukaryotic organisms (animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts.

- In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.


Introduction

- Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria, structures within eukaryotic cells that convert chemical energy from food into a form that cells can use, adenosine triphosphate (ATP).

- Mitochondrial DNA is only a small portion of the DNA in a eukaryotic cell; most of the DNA can be found in the cell nucleus and, in plants, in the chloroplast as well.

- Mitochondria are thought to have originated from incorporated α-purple bacteria. During its evolution into the present-day powerhouses of the eukaryotic cell, the endosymbiont transferred many of its essential genes to the nuclear chromosomes. Nevertheless, the mitochondrion still carries hallmarks of its bacterial ancestor.

- Soon after mtDNA sequences became available, comparisons with mitochondrial protein sequences revealed deviations from the standard genetic code, and later even variations in codon usage were found in mitochondria from different species.


Introduction

- The genetic code is the map $g : K \to A$, $|K| = 64$, $|A| = 21$, which gives the correspondence between codons in DNA and amino acids.

- 4 nucleotides: C (Cytosine), A (Adenine), G (Guanine), T (Thymine). In ribonucleic acid (a polymeric molecule implicated in various biological roles in coding, decoding, regulation, and expression of genes) Thymine is replaced by U (Uracil).

- A codon is an ordered triple of nucleotides.

- 20 amino acids and 1 stop codon (Ter): alanine (Ala), threonine (Thr), glycine (Gly), proline (Pro), serine (Ser), aspartic acid (Asp), asparagine (Asn), glutamic acid (Glu), glutamine (Gln), lysine (Lys), histidine (His), arginine (Arg), tryptophan (Trp), tyrosine (Tyr), phenylalanine (Phe), leucine (Leu), methionine (Met), isoleucine (Ile), valine (Val), cysteine (Cys).


Table for the Standard Nuclear Genetic Code, 64 codons and 21 amino acids


The origin of the genetic code? The evolutionary history of organisms? Taxonomy?


Preliminaries, P-adic approach

- For every nonzero integer $n$ and any prime $p \ge 2$, let $\mathrm{ord}_p(n)$ be the highest power of $p$ which divides $n$, i.e. $n \equiv 0 \pmod{p^{\mathrm{ord}_p(n)}}$ and $n \not\equiv 0 \pmod{p^{\mathrm{ord}_p(n)+1}}$. Then the p-adic norm is $|n|_p = p^{-\mathrm{ord}_p(n)}$, $|0|_p = 0$. For rationals $\frac{n}{m} \in \mathbb{Q}$ we set $\left|\frac{n}{m}\right|_p = p^{-\mathrm{ord}_p(n) + \mathrm{ord}_p(m)}$.

- The completion of $\mathbb{Q}$ with respect to the p-adic metric $\rho_p(x, y) = |x - y|_p$ is called the field of p-adic numbers $\mathbb{Q}_p$. The norm satisfies the strong triangle inequality $|x \pm y|_p \le \max\{|x|_p, |y|_p\}$, where equality holds if $|x|_p \ne |y|_p$.

- The set $\mathbb{Z}_p = \{x \in \mathbb{Q}_p : |x|_p \le 1\}$ is called the set of p-adic integers.

- Every $x \in \mathbb{Z}_p$ can be expanded in canonical form, i.e. as a series that converges in the p-adic norm:

$$x = x_0 + p x_1 + \ldots + p^k x_k + \ldots, \quad x_k \in \{0, 1, \ldots, p - 1\},\ k \ge 0.$$

- $\mathbb{Z}_p$ is equipped with the Haar measure $\mu_p$ normalized so that $\mu_p(\mathbb{Z}_p) = 1$.
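The definitions above can be tried out numerically. The following is a minimal sketch, not part of the slides: the helper names `ord_p`, `p_adic_norm` and `canonical_digits` are illustrative choices, and only rationals and non-negative integers are handled.

```python
from fractions import Fraction


def ord_p(n: int, p: int) -> int:
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p is undefined for 0")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_norm(x, p: int) -> Fraction:
    """|x|_p for a rational x, with the convention |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    # |n/m|_p = p^(-ord_p(n) + ord_p(m))
    return Fraction(p) ** (ord_p(x.denominator, p) - ord_p(x.numerator, p))


def canonical_digits(n: int, p: int, count: int) -> list:
    """First `count` digits x_0, x_1, ... of a non-negative integer n in base p."""
    digits = []
    for _ in range(count):
        digits.append(n % p)
        n //= p
    return digits
```

For instance, `p_adic_norm(8, 2)` is 1/8, so 8 is 2-adically "small", while every odd integer has 2-adic norm 1.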


Proposed model

- We consider a 2-adic dynamical system $\langle \mathbb{Z}_2, \mu_2, f \rangle$, $f : \mathbb{Z}_2 \to \mathbb{Z}_2$.

- An attractor of $\langle \mathbb{Z}_2, \mu_2, f \rangle$ is a subset $A \subseteq \mathbb{Z}_2$ such that:

  1. $A$ is invariant with respect to $f$, i.e. $f(A) = A$;
  2. There exists a set $U \subset \mathbb{Z}_2$ which shrinks to $A$ under the action of the function $f$, i.e. $f^{(k)}(U) \to A$ for $k \to \infty$.

- The representation of the nucleotides C, A, T (U), G can be chosen in 24 variants (the 4! ways of assigning the four binary pairs). To obtain the function $f$ in a compact form we set the nucleotides as T (U) ↔ (1, 0), C ↔ (1, 1), A ↔ (0, 0), G ↔ (0, 1).

- Each codon is represented as a binary vector of length 6, or as the corresponding 2-adic number. For example, CAG ↔ (1, 1, 0, 0, 0, 1). This vector defines the 2-adic number $1 + 2 + 2^5 = 35$.
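The encoding above can be sketched in a few lines of Python. The nucleotide-to-bits assignment is the one given on the slide. The function names are illustrative, and reading the vector as digits $x_0, \ldots, x_5$ from left to right is inferred from the CAG example.

```python
# Nucleotide -> pair of binary digits, as chosen on the slide.
BITS = {"T": (1, 0), "U": (1, 0), "C": (1, 1), "A": (0, 0), "G": (0, 1)}


def codon_to_vector(codon: str) -> tuple:
    """Binary vector (x0, ..., x5) of a three-letter codon."""
    return tuple(bit for nucleotide in codon.upper() for bit in BITS[nucleotide])


def codon_to_number(codon: str) -> int:
    """2-adic integer x0 + 2*x1 + ... + 32*x5 defined by the codon."""
    return sum(bit << k for k, bit in enumerate(codon_to_vector(codon)))
```

With this reading, `codon_to_number("CAG")` gives 35, and the 64 codons map onto the 64 integers 0, ..., 63 without collisions.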


Proposed model

Let us choose the function f so that each of its attractors (as a set of 2-adic integers) coincides with the set of codons that code for one amino acid.

For example, the attractors of the function that defines the Standard Nuclear Genetic Code are:

Amino acid   Attractor                   Amino acid   Attractor
Ala          {14, 46, 30, 62}            Arg          {8, 40, 27, 59, 11, 43}
Asn          {16, 48}                    Asp          {18, 50}
Cys          {25, 57}                    Gln          {3, 35}
Glu          {2, 34}                     Gly          {10, 42, 26, 58}
His          {19, 51}                    Ile          {4, 20, 52}
Leu          {5, 37, 23, 55, 7, 39}      Lys          {0, 32}
Met          {36}                        Phe          {21, 53}
Pro          {15, 47, 31, 63}            Ser          {13, 45, 24, 56, 29, 61}
Thr          {12, 44, 28, 60}            Trp          {41}
Tyr          {17, 49}                    Val          {6, 38, 22, 54}
Stop         {1, 33, 9}
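The attractor sets in the table can be regenerated by grouping encoded codons by amino acid. The sketch below assumes the amino-acid string of NCBI translation table 1 (base order TCAG, `*` for stop); that string is not on the slides, and the names `STANDARD` and `attractors` are illustrative.

```python
from collections import defaultdict

BITS = {"T": (1, 0), "C": (1, 1), "A": (0, 0), "G": (0, 1)}
BASES = "TCAG"
# Assumption: NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_to_number(codon: str) -> int:
    """2-adic integer x0 + 2*x1 + ... + 32*x5 defined by the codon."""
    bits = [bit for nucleotide in codon for bit in BITS[nucleotide]]
    return sum(bit << k for k, bit in enumerate(bits))


def attractors(code: str = STANDARD) -> dict:
    """Map each one-letter amino acid (or '*') to its set of codon numbers."""
    groups = defaultdict(set)
    for i, amino in enumerate(code):
        codon = BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4]
        groups[amino].add(codon_to_number(codon))
    return dict(groups)
```

Under these assumptions `attractors()["A"]` is the alanine set {14, 30, 46, 62}, and `attractors()["*"]` is the stop set {1, 9, 33}.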


Variation of genetic codes

1. The Standard Code
2. The Vertebrate mtCode
3. The Yeast mtCode
4. The Mold, Protozoan, Coelenterate mtCode
5. Mycoplasma, Spiroplasma Code
6. The Invertebrate mtCode
7. The Ciliate, Dasycladacean and Hexamita Nuclear Code
8. The Echinoderm and Flatworm mtCode
9. The Euplotid Nuclear Code
10. The Bacterial, Archaeal and Plant Plastid Code
11. The Alternative Yeast Nuclear Code
12. The Ascidian mtCode
13. The Alternative Flatworm mtCode
14. Chlorophycean mtCode
15. Trematode mtCode
16. Scenedesmus obliquus mtCode
17. Thraustochytrium mtCode
18. Pterobranchia mtCode
19. Candidate Division SR1 and Gracilibacteria Code
20. Blepharisma Nuclear Code


Example of representations

- We represented 20 known genetic codes (National Center for Biotechnology Information) by the attractors of 2-adic functions, using the van der Put and coordinate forms.

- The function that defines the Vertebrate mitochondrial code has the following van der Put representation: $F_m(x) = \sum_{k=0}^{63} M_k \chi_k(x)$.

- The function $F_m$ can be represented in explicit form, depending on the values of the binary digits in the canonical representation of the 2-adic numbers, in the following way:

$$F_m(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5) = \Omega_0 - \Omega_1 - \Omega_2,$$

where

$$
\begin{aligned}
\Omega_0 &= x_0 + 2x_1 + 4x_2 + 8x_3 + 16x_4 + 32x_5, \\
\Omega_1 &= (x_3 + x_1 x_2 x_3)(32x_4 - 16)x_5, \\
\Omega_2 &= x_0 x_1 x_2 x_3 (16 - 32x_4) x_5 + x_0 x_1 x_2 x_3 (23 - 44x_4) x_5 \\
&\quad + x_0 x_1 x_2 (23x_3 - 18) x_4 x_5 + x_0 (-7 x_1 x_2 + 18 x_1 x_2) x_3 x_4 x_5.
\end{aligned}
$$


"Universal" function

All considered variations of the genetic code can be obtained using "operations" on the cycles of some "Universal" function (6 variants).

For example, the "Universal" function F can be defined by the following cycles (attractors):

{0, 32}            {8, 40}             {16, 48}
{1, 33}            {9, 41}             {17, 49}
{2, 34}            {10, 42, 26, 58}    {18, 50}
{3, 35}            {11, 43, 27, 59}    {19, 51}
{4, 36}            {12, 44, 28, 60}    {20, 52}
{5, 37}            {13, 45, 29, 61}    {21, 53}
{6, 38, 22, 54}    {14, 46, 30, 62}    {24, 56}
{7, 39, 23, 55}    {15, 47, 31, 63}    {25, 57}


"Universal" function

- Analytically, the considered function F has the following form:

$$F(x) = F(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5) = x + 32(-1)^{x_5} + 16 x_5 (-1)^{x_4} I(x_1 + x_2 + x_3 \ge 2), \tag{0.1}$$

where $I(x_1 + x_2 + x_3 \ge 2) = 1$ if $x_1 + x_2 + x_3 \ge 2$ is satisfied, and $I = 0$ otherwise.

- In other words, $I$ is the characteristic function of the event $x_1 + x_2 + x_3 \ge 2$.
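Formula (0.1) is easy to evaluate on the 64 codon numbers. The sketch below implements it directly and lists the resulting cycles. The helper names `cycle_through` and `all_cycles` are illustrative, and the function is only evaluated on the integers 0, ..., 63.

```python
def F(x: int) -> int:
    """'Universal' function of formula (0.1), evaluated on 0..63."""
    x1, x2, x3, x4, x5 = ((x >> k) & 1 for k in range(1, 6))
    indicator = 1 if x1 + x2 + x3 >= 2 else 0  # I(x1 + x2 + x3 >= 2)
    return x + 32 * (-1) ** x5 + 16 * x5 * (-1) ** x4 * indicator


def cycle_through(start: int) -> list:
    """Orbit of `start` under F, listed in the order it is visited."""
    orbit = [start]
    y = F(start)
    while y != start:
        orbit.append(y)
        y = F(y)
    return orbit


def all_cycles() -> list:
    """All cycles of F on 0..63, each listed once."""
    seen, cycles = set(), []
    for x in range(64):
        if x not in seen:
            orbit = cycle_through(x)
            seen.update(orbit)
            cycles.append(orbit)
    return cycles
```

For example, `cycle_through(7)` returns `[7, 39, 23, 55]`, and `all_cycles()` yields 16 cycles of length 2 and 8 cycles of length 4.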


"Universal" function

- The "Universal" function F consists of 8 cycles of length 4 and 16 cycles of length 2.

- "Universal" function ≠ Genetic code!


"Universal" function, "Operations"

1. We write a(b) for the cycle of the "Universal" function F that has length a and contains the element b.

2. For example, {7, 39, 23, 55} is written as 4(7).

3. We need 3 types of "operations" on such cycles and 1 "iteration" (for the Alternative Yeast nuclear, Chlorophycean, Scenedesmus obliquus, Thraustochytrium and Pterobranchia codes) in order to define any of the 20 genetic codes.


"Operations"

- "Addition": let $a_1(b_1)$ and $a_2(b_2)$ be cycles of the function F. Let us consider the new cycle $a_1(b_1) \oplus a_2(b_2) = (a_1 + a_2)(b_1)$. For example, if

  4(7) = {7, 39, 23, 55} and 4(12) = {12, 44, 28, 60},

  then we get 8(7) = {7, 39, 23, 55, 12, 44, 28, 60}, which corresponds to the amino acid threonine (Thr) in the Yeast mt code.

- "Division": let $2(b_1) = \{b_1, b_2\}$ and $2(c_1) = \{c_1, c_2\}$. Then $2(b_1) \vee 2(c_1) = \{b_1, c_1, c_2\} \cup \{b_2\}$.

- "Cleavage": for some codes we need to split a cycle of length 2 into 2 cycles of length 1 each. For example, ∆2(9) = ∆{9, 41} = {9} ∪ {41}.
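The three operations can be sketched as plain set manipulations. This is an illustrative reading, not the author's implementation: cycles are passed as tuples in visiting order, so a 2-cycle is `(b1, b2)`, and the function names are hypothetical.

```python
def addition(cycle_a, cycle_b) -> set:
    """a1(b1) (+) a2(b2): merge two cycles into one set of a1 + a2 elements."""
    return set(cycle_a) | set(cycle_b)


def division(cycle_b, cycle_c) -> tuple:
    """2(b1) v 2(c1): b1 joins the cycle {c1, c2}, b2 is left on its own."""
    b1, b2 = cycle_b
    c1, c2 = cycle_c
    return {b1, c1, c2}, {b2}


def cleavage(cycle) -> tuple:
    """Delta 2(b1): split the 2-cycle {b1, b2} into {b1} and {b2}."""
    b1, b2 = cycle
    return {b1}, {b2}
```

With 2(4) = {4, 36} and 2(20) = {20, 52}, `division((4, 36), (20, 52))` returns `({4, 20, 52}, {36})`, the pair of sets listed for isoleucine and methionine in the Standard code table.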


Proposed model

NUCLEAR CODE DNA

PROCARYOTA EUKARYOTA

Bacterial, Archaeal, Plant Plastid
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(1) ∨ 2(9) = {1, 33, 9} + {41}

Mycoplasma, Spiroplasma
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}

Candidate Division SR1, Gracilibacteria
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(9) ∨ 4(10) = {10, 42, 58, 26, 9} + {41}

Standard nuclear code
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(1) ∨ 2(9) = {1, 33, 9} + {41}

Ciliate, Dasycladacean, Hexamita
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(1) + 2(3)
  ∆2(9) = {9} + {41}

Euplotid
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(9) ∨ 2(25) = {9, 25, 57} + {41}

Blepharisma
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  ∆2(9) = {9} + {41}
  2(1) ∨ 2(3) = {3, 35, 33} + {1}
  1(9) + 1(1) = {1, 9}

Alternative Yeast nuclear code
  2(5) ∨ 4(7) = {5, 37, 7, 55, 23} + {39}
  2(8) + 4(11)
  2(24) + 4(13)
  2(1) ∨ 2(9) = {1, 33, 9} + {41}
  1(39) + [2(24) + 4(13)]
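As a consistency check, the five operations listed for the Standard nuclear code can be applied to the 24 cycles of the "Universal" function. The sketch below makes one loud assumption: "division" is implemented by naming the element that changes sets (the hypothetical helper `move`), because the right-hand sides on the slide fix the result while the operand order varies. All function names are illustrative.

```python
def F(x: int) -> int:
    """'Universal' function of formula (0.1), evaluated on 0..63."""
    x1, x2, x3, x4, x5 = ((x >> k) & 1 for k in range(1, 6))
    indicator = 1 if x1 + x2 + x3 >= 2 else 0
    return x + 32 * (-1) ** x5 + 16 * x5 * (-1) ** x4 * indicator


def cycle(b: int) -> frozenset:
    """The cycle of F that contains b, as a set."""
    orbit, y = {b}, F(b)
    while y != b:
        orbit.add(y)
        y = F(y)
    return frozenset(orbit)


def move(keep: frozenset, donor: frozenset, moved: int) -> tuple:
    """Assumed reading of 'division': `moved` leaves `donor` and joins `keep`."""
    return keep | {moved}, donor - {moved}


def standard_code_sets() -> set:
    """Sets obtained from the cycles of F by the Standard-code operations."""
    sets = {cycle(x) for x in range(64)}  # the 24 cycles of F

    def replace(old, new):
        sets.difference_update(old)
        sets.update(new)

    # "Addition": 2(5)+4(7), 2(8)+4(11), 2(24)+4(13)
    for small, large in [(5, 7), (8, 11), (24, 13)]:
        replace([cycle(small), cycle(large)], [cycle(small) | cycle(large)])
    # "Division": 2(4) v 2(20) = {4, 20, 52} + {36}
    replace([cycle(4), cycle(20)], move(cycle(20), cycle(4), 4))
    # "Division": 2(1) v 2(9) = {1, 33, 9} + {41}
    replace([cycle(1), cycle(9)], move(cycle(1), cycle(9), 9))
    return sets
```

Under this reading the result has 21 sets, among them {5, 37, 23, 55, 7, 39} (Leu), {4, 20, 52} (Ile), {36} (Met), {1, 33, 9} (Stop) and {41} (Trp), as in the attractor table of the Standard code.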


Proposed model

mt CODE DNA

Chlorophycean
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  ∆2(9) = {9} + {41}
  2(1) ∨ [2(5) + 4(7)] = {1} + {5, 37, 33, 7, 55, 39, 23}
  1(1) + 1(9)

Scenedesmus obliquus
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) ∨ 4(13) = {24, 56, 61, 45, 29} + {13}
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  ∆2(9) = {9} + {41}
  2(1) ∨ [2(5) + 4(7)] = {5, 37, 33, 7, 55, 39, 23} + {1}
  1(13) + 1(1) + 1(9)

Thraustochytrium
  2(5) ∨ 4(7) = {37, 7, 39, 55, 23} + {5}
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(1) ∨ 2(9) = {1, 33, 9} + {41}
  {1, 33, 9} + {5} = {1, 33, 9, 5}

Mold, Protozoan, Coelenterate
  2(5) + 4(7)
  2(8) + 4(11)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}

Echinoderm, Flatworm
  2(5) + 4(7)
  2(8) + 2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(0) ∨ 2(16) = {0, 48, 16} + {32}

Alternative Flatworm
  2(5) + 4(7)
  2(8) + 2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(0) + 2(16) = {0, 48, 16} + {32}
  2(1) ∨ 2(17) = {1, 49, 17} + {33}

Trematode
  2(5) + 4(7)
  2(8) + 2(24) + 4(13)
  2(0) ∨ 2(16) = {0, 48, 16} + {32}

Invertebrate
  2(5) + 4(7)
  2(8) + 2(24) + 4(13)

Pterobranchia
  2(5) + 4(7)
  2(24) + 4(13)
  2(4) ∨ 2(20) = {4, 20, 52} + {36}
  2(0) ∨ 2(8) = {0, 32, 40} + {8}
  1(8) + [2(24) + 4(13)]

Yeast
  4(7) + 4(12)
  2(8) + 4(11)
  2(24) + 4(13)

Ascidian
  2(5) + 4(7)
  2(8) + 4(10)
  2(24) + 4(13)

Vertebrate
  2(5) + 4(7)
  2(1) + 2(8)
  2(24) + 4(13)


Proposed model, Observations

- The presented approach can be seen as a contribution to the discussions about evolutionary systematics and the evolutionary origins of the genetic code.

- Classification (relationships) of organisms based on the structure and the method of producing their genetic code from the "universal" function?

- Differences between the genetic codes of (groups of) species that are located on the same branch of the phylogenetic (evolutionary) tree?

- The operation of "Cleavage" ∆ appears in the genetic codes of organisms that perform photosynthesis.

- Flatworm mtCode vs. Alternative Flatworm mtCode, "shift":

  2(5) + 4(7), 2(8) + 2(24) + 4(13), 2(4) ∨ 2(20) = {4, 20, 52} + {36}, 2(0) + 2(16) = {0, 48, 16} + {32}

  2(1) ∨ 2(17) = {1, 49, 17} + {33}.


Paper

E. Yurova Axelsson, On the representation of the genetic code by the attractors of 2-adic function, Physica Scripta, IOP Publishing, September 2015