Post on 16-Jan-2016


1

Genome Evolution

Dan Graur

2

Topics:

• Genome Size
• Genome Content
• Gene Geography
• Nucleotide Composition

3

The entire complement of genetic material carried by an individual is called the genome.

4

[Diagram: the genome divided into genic and non-genic fractions; the division is labeled "ad hoc".]

5

[Diagram: the genome divided into genic and non-genic fractions, each subdivided into transcribed and untranscribed parts. Label: Transcriptome.]

6

[Diagram: the genome divided into genic and non-genic fractions, each subdivided into transcribed and untranscribed parts, with a further split into translated and untranslated. Label: Proteome.]

7

Genome Size: The Anthropocentric View

8

atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc

Page 1 out of 1,500,000 (total = ~3.5 billion bp)

No index… No annotation… No explanation… Only a, c, t, & g, ad nauseam…
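The page count above implies a rough number of letters per page. The following is a minimal sketch of that arithmetic; the two constants come from the slide, while the per-page figure is derived here and is not stated in the original.

```python
# Figures quoted on the slide (both are approximate).
TOTAL_BP = 3.5e9          # total genome length in base pairs
TOTAL_PAGES = 1_500_000   # number of printed pages


def letters_per_page(total_bp: float, pages: int) -> float:
    """Average number of nucleotide letters printed on one page."""
    return total_bp / pages


if __name__ == "__main__":
    print(f"about {letters_per_page(TOTAL_BP, TOTAL_PAGES):,.0f} letters per page")
```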

9

1-2 trillion cells

2 × 23 = 46 chromosomes

The human haploid genome is 3.5 × 10^9 bp.

10

On average, a single human chromosome consists of 5 cm of DNA.

A human cell contains 2-3 meters of DNA.

The total length of DNA in one adult human is 2.0 × 10^13 meters.
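These lengths can be checked with a back-of-the-envelope calculation. The sketch below assumes a rise of 0.34 nm per base pair (the usual figure for B-form DNA; this value is an assumption and does not appear on the slides) and uses the slide's 3.5 × 10^9 bp haploid genome and 46 chromosomes.

```python
# Values from the slides.
BP_HAPLOID = 3.5e9            # haploid genome size in base pairs
CHROMOSOMES_DIPLOID = 46      # chromosomes per diploid cell
TOTAL_BODY_DNA_M = 2.0e13     # total DNA length quoted for one adult, in meters

# Assumption (not on the slides): B-form DNA rises about 0.34 nm per base pair.
RISE_PER_BP_M = 0.34e-9

# Length of one haploid genome, then of the diploid complement in one cell.
haploid_m = BP_HAPLOID * RISE_PER_BP_M
diploid_m = 2 * haploid_m

# Average DNA length per chromosome, in centimeters.
per_chromosome_cm = diploid_m / CHROMOSOMES_DIPLOID * 100

# Number of diploid cells implied by the quoted whole-body total.
implied_cells = TOTAL_BODY_DNA_M / diploid_m

if __name__ == "__main__":
    print(f"DNA per cell:        {diploid_m:.2f} m")
    print(f"DNA per chromosome:  {per_chromosome_cm:.1f} cm")
    print(f"implied cell count:  {implied_cells:.1e}")
```

Under these assumptions the per-cell and per-chromosome figures land inside the ranges quoted on the slide, and the whole-body total corresponds to a cell count on the order of 10^13.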

The marvels of packaging

[Figure: levels of DNA packaging, from the DNA double helix (2 nm), through DNA wound around a cluster of histone molecules to form nucleosomes (11 nm), packed nucleosomes (30 nm fiber), extended chromatin with scaffolding protein (300 nm), and condensed chromatin (700 nm), to the condensed chromosome (1400 nm).]

12

The total length of DNA in an adult human is 2.0 × 10^13 meters (the equivalent of 70 trips from Earth to the Sun and back, or 5 trips from the Sun to Neptune and back).

With the right public relations you can make the genome look big…

13

3.5 billion letters in a four-letter alphabet

Information content < 1 CD

… or small.
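The comparison with a CD can be sketched as a short calculation. A four-letter alphabet needs at most 2 bits per letter; the 700 MB disc capacity below is an assumption (a nominal CD-ROM size), not a number from the slide.

```python
import math

# From the slide.
GENOME_LETTERS = 3.5e9    # letters in the haploid genome
ALPHABET_SIZE = 4         # a, c, g, t

# Assumption (not on the slide): nominal capacity of one CD-ROM, in bytes.
CD_CAPACITY_BYTES = 700e6

# Upper bound on information per letter when all four letters are equally likely.
bits_per_letter = math.log2(ALPHABET_SIZE)

# Size of the genome in a naive, uncompressed 2-bit-per-letter encoding.
total_bytes = GENOME_LETTERS * bits_per_letter / 8

if __name__ == "__main__":
    print(f"{bits_per_letter:.0f} bits per letter")
    print(f"{total_bytes / 1e6:.0f} MB uncompressed")
    print(f"{total_bytes / CD_CAPACITY_BYTES:.2f} CDs at the assumed capacity")
```

With these numbers the naive encoding comes out at the scale of a single disc, slightly above the assumed 700 MB. The slide's "less than one CD" presumably counts on the genome's repetitiveness making it compressible, or on rounding; that reading is an interpretation, not something the slide states.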

14

Human chromosome 22

48,000,000 bp

December 1999

15

Does the human genome size reflect the fact that we are the pinnacle of creation?

16

17

How to lie with statistics

The case of the missing axis.

18

Measures of genome size

1. Chromosome number
2. DNA length
3. Number of genes

19

1. Chromosome number

20

[Figure: chromosome number on a logarithmic scale from 1 to 10,000 (4 orders of magnitude), with the minimum, mean, and maximum marked.]

21

Human karyotype = 46 chromosomes

22

Myrmecia pilosula, males (1 chromosome)

Jumping jack

23

Haplopappus gracilis (4 chromosomes)

Yellow spiny daisy

24

Pisum sativum (14 chromosomes)

25

Helianthus annuus (34 chromosomes)

Sunflower

26

Felis catus (38 chromosomes)

27

Homo sapiens (46 chromosomes)

Canis familiaris (78 chromosomes)

28

Tympanoctomys barrerae (102 chromosomes)

Red viscacha rat

29

Senecio roberti-friesii (90 chromosomes)

Robert & Friesi’s groundsel (belongs to the daisy family)

Compare: yellow spiny daisy (4 chromosomes)

30

Lysandra atlantica (250 chromosomes)

Atlantic Adonis blue

31

Ophioglossum reticulatum (~1260 chromosomes)

Netted adder's-tongue (a fern)

…and we are only here.

32

K-value paradox: Complexity does not correlate with chromosome number.

Homo sapiens: 46
Lysandra atlantica: 250
Ophioglossum reticulatum: ~1260
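The chromosome numbers shown on the preceding slides can be collected into one small table to see where humans fall. This is a minimal sketch; the dictionary and function names are illustrative, and the fern's count is treated as exactly 1260 even though the slide gives it as approximate.

```python
import math

# Chromosome numbers as quoted on the slides (the fern's value is approximate).
CHROMOSOME_NUMBERS = {
    "Myrmecia pilosula (males)": 1,
    "Haplopappus gracilis": 4,
    "Pisum sativum": 14,
    "Helianthus annuus": 34,
    "Felis catus": 38,
    "Homo sapiens": 46,
    "Canis familiaris": 78,
    "Senecio roberti-friesii": 90,
    "Tympanoctomys barrerae": 102,
    "Lysandra atlantica": 250,
    "Ophioglossum reticulatum": 1260,
}


def rank_of(species: str) -> int:
    """1-based position of a species when sorted by ascending chromosome number."""
    ordered = sorted(CHROMOSOME_NUMBERS, key=CHROMOSOME_NUMBERS.get)
    return ordered.index(species) + 1


def log10_span() -> float:
    """Orders of magnitude between the smallest and largest count in the table."""
    values = CHROMOSOME_NUMBERS.values()
    return math.log10(max(values) / min(values))


if __name__ == "__main__":
    total = len(CHROMOSOME_NUMBERS)
    print(f"Homo sapiens ranks {rank_of('Homo sapiens')} of {total}")
    print(f"range covers {log10_span():.1f} orders of magnitude")
```

In this sample humans sit in the middle of the ranking, which is the point of the slide: the organism presumed most complex is nowhere near the top.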

33

2. DNA length

34

[Figure: DNA length (bp) on a logarithmic scale from 10^5 to 10^12 (7 orders of magnitude), with the smallest, mean, and largest genomes marked.]

35

Carsonella ruddii

An endosymbiont of psyllids, which parasitize hackberry. The smallest known genome of any free-living organism.

36

Plasmodium falciparum

The human malaria parasite.

37

Tetrodon fluviatilis

Green-spotted pufferfish

38

Miniopterus schreibersii

Schreiber's long-wing bat

39

Homo sapiens

40

Triturus cristatus

Great crested newt

41

Ophioglossum petiolatum

Stalked adder's tongue (fern)

42

Amoeba dubia

200 times more than me?

43

44

[Figure: DNA length (bp), logarithmic scale from 10^5 to 10^12.]

45

Crepis laciniata
Cuminum cyminum
Blatta orientalis
Papaver tauricola
Uca pugilator
Homo sapiens
Salvelinus fontinalis

46

C-value paradox: Complexity does not correlate with genome size.

Homo sapiens: 3.4 × 10^9 bp
Amoeba dubia: 6.8 × 10^11 bp
Allium cepa: 1.5 × 10^10 bp
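The three genome sizes above can be turned into fold differences relative to the human genome. This is a minimal sketch using only the values quoted on the slide; the names are illustrative.

```python
# Genome sizes in base pairs, as quoted on the slide.
GENOME_SIZES_BP = {
    "Homo sapiens": 3.4e9,
    "Amoeba dubia": 6.8e11,
    "Allium cepa": 1.5e10,
}


def fold_vs_human(species: str) -> float:
    """Genome size of a species divided by the human genome size."""
    return GENOME_SIZES_BP[species] / GENOME_SIZES_BP["Homo sapiens"]


if __name__ == "__main__":
    for name in GENOME_SIZES_BP:
        print(f"{name}: {fold_vs_human(name):.1f}x the human genome")
```

The ratio for Amoeba dubia reproduces the "200 times more" figure from the earlier slide, and the onion (Allium cepa) comes out at a little over four times the human value.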

47

3. Number of genes

48

It is very difficult to estimate accurately the number of protein-coding genes in the genome of eukaryotes.

Reason 1: the large number and large size of the introns.

Reason 2: the low density of genes.

49

Example 1: factor-IX gene

Only about 4% of the sequence actually encodes the protein.

50

Example 2: dystrophin gene

Dystrophin has 79 exons and spans over 2.4 million base pairs of DNA.

Only about 0.3% of the sequence encodes the protein.
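To get a feel for what such a small coding fraction means, the sketch below multiplies the quoted gene span by the quoted fraction. Both inputs are rounded on the slide ("over 2.4 million", "about 0.3%"), so the result is only an order-of-magnitude figure, and the function name is illustrative.

```python
# Values quoted on the slide for the dystrophin gene (both rounded).
DYSTROPHIN_SPAN_BP = 2.4e6         # genomic span of the gene
DYSTROPHIN_CODING_FRACTION = 0.003  # "about 0.3%"
DYSTROPHIN_EXONS = 79


def coding_bp(span_bp: float, coding_fraction: float) -> float:
    """Number of protein-coding base pairs implied by a span and a fraction."""
    return span_bp * coding_fraction


if __name__ == "__main__":
    coding = coding_bp(DYSTROPHIN_SPAN_BP, DYSTROPHIN_CODING_FRACTION)
    noncoding = DYSTROPHIN_SPAN_BP - coding
    print(f"coding:     ~{coding:,.0f} bp")
    print(f"non-coding: ~{noncoding:,.0f} bp")
    print(f"coding bp per exon: ~{coding / DYSTROPHIN_EXONS:,.0f}")
```

Taken at face value, the slide's figures imply a few thousand coding base pairs scattered across more than two million, which is why gene prediction in eukaryotes is hard.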

51

atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc

52

atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc

1

53

gatggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggaatacgctacgacgctcaggagcttgacacaag
gtaggggaacctttataatgaaattttcccactacgacgaagtt

2

54

ttgatggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacaca
aggtaggggaacctttataatgaaattttcccactacgacgaag

3

55

atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc

4

56

gagagaggtcctataagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaaggtagggga
acctttataatgaaattttcccactacgacgaagttccctttga

5

57

tggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaaattccccttctcaggaggctgtgggaattcagggtggataaccccgaagagttccagtcaggtcaacagctcaaagtggaagacgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaaggtaggggaacctttataatgaaattttcccactacgacgaagttcgctacgacgctcaggagcttgacac
aaggtaggggaacctttataatgaaattttcccactacgacgaa

6

58

The DNA as text...

59

nnsnnhumbmjfdiooospfptyrewnzxcmopleprotiuwwrqdngjklsmsnabmjfdioobnmppoewqasdtratyusisosmamgkkpsretwyospfptyrewnzxcmopleprotiuwwrqdngjklsmsnakeytivvkvldtpgppvnvtvkeiskdsayvtweppiidggspiinyvvqkrdaerkswstvttecsktsfrvanleegksyffrvfaeneygigdpgetrdavkasqtpgpvvdlkvrsvsksscsigwkkphsdggsriigyvvdflteenkwqrvmkslslqysakdltegkeytfrvsaenengegtpseitvvarddvvapdldlkglpdlcylakensnfrlkipikgkpapsvswkkgedplatdtrvsvessavnttlivydcqksdagkytitlknvagtkegtisikvvgkpgiptgouqxbzzzzzpikfdevtaeamtlkwappkddggseitnyilekrdsvnnkwvtcasavqkttfrvtrlhegmeytfrvptydumsaenkygvgeglksepivarhpfdvpdappppnivdvrhdsvsltwtdpkktggspitgyhlefkernsllwkranktpirmrdfkvtgltegleyefrvmain1lagvgkpslpsepvvaldpidppgkpevinitrnsvtliwtepkydgghkltgyivekrdlpskswmkanhvnvpecaftvtdlveggkyefrirakntagaisapsestetiickdeyeaptivldptikdgltikagdtivlnaisilgkplpksswskagkdirpsditqitstptssmltikyatrkdageytitatnpfgtkvehvkvtvldvpgppgpveisnvsaekatltwtppledggspiksyilekretsrllwtvvsediqscrhvatkliqgneyifrvsavnhygkgepvqsepvkmvdrfgppgppekpevsnvtkntatvswkrpvddggseitgyhverrekkslrwvraiktpvsdlrckvtglqegstyefrvsaenragigptysappseasdsvlmkdaayppgppsnphvtdttkksaslawgkphydggleitgyvvehqkvgdeawikdttgtalritqfvvpdlqtkekynfrisaindagvgepavipdveiveremapdfeldaelrrtlvvraglsirifvpikgrpapevtonawatwtkdninlknranientesftlliipecnrydtgkfvmtienpagkksgfvnvrvldtpghjiuopzxnllmt

60

nnsnnhumbmjfdiooospfptyrewnzxcmopleprotiuwwrqdngjklsmsnabmjfdioobnmppoewqasdtratyusisosmamgkkpsretwyospfptyrewnzxcmopleprotiuwwrqdngjklsmsnakeytivvkvldtpgppvnvtvkeiskdsayvtweppiidggspiinyvvqkrdaerkswstvttecsktsfrvanleegksyffrvfaeneygigdpgetrdavkasqtpgpvvdlkvrsvsksscsigwkkphsdggsriigyvvdflteenkwqrvmkslslqysakdltegkeytfrvsaenengegtpseitvvarddvvapdldlkglpdlcylakensnfrlkipikgkpapsvswkkgedplatdtrvsvessavnttlivydcqksdagkytitlknvagtkegtisikvvgkpgiptgouqxbzzzzzpikfdevtaeamtlkwappkddggseitnyilekrdsvnnkwvtcasavqkttfrvtrlhegmeytfrvptydumsaenkygvgeglksepivarhpfdvpdappppnivdvrhdsvsltwtdpkktggspitgyhlefkernsllwkranktpirmrdfkvtgltegleyefrvmain1lagvgkpslpsepvvaldpidppgkpevinitrnsvtliwtepkydgghkltgyivekrdlpskswmkanhvnvpecaftvtdlveggkyefrirakntagaisapsestetiickdeyeaptivldptikdgltikagdtivlnaisilgkplpksswskagkdirpsditqitstpptytssmltikyatrkdageytitatnpfgtkvehvkvtvldvpgppgpveisnvsaekatltwtppledggspiksyilekretsrllwtvvsediqscrhvatklisaqgneyifrvsavnhygkgepvqsepvkmvdrfgppgppekpevsnvtkntatvswkrpvddggseitgyhverrekkslrwvraiktpvsdlrckvtglqegstyefrvsaenragigppseasdsatonawavlmkdaayppgppsnphvtdttkksaslawgkphydggleitgyvvehqkvgdeawikdttgtalritqfvvpdlqtkekynfrisaindagvgepavipdveiveremapdfeldaelrrtlvvraglsirifvpikgrpapevtwtkdninlknranientesftlliipecnrydtgkfvmtienpagkksgfvnvrvldtpghjiuopzxnllm

humpty dumpty sat on a wall...
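The letter soup above hides a readable phrase in the same way that exons are hidden among introns. The sketch below illustrates the idea on a short toy string; the string and the exon coordinates are invented for illustration and are not taken from the slide.

```python
def splice(sequence: str, exons: list) -> str:
    """Join the half-open [start, end) exon intervals in the order given."""
    return "".join(sequence[start:end] for start, end in exons)


def coding_fraction(sequence: str, exons: list) -> float:
    """Fraction of the sequence that is covered by exons."""
    return sum(end - start for start, end in exons) / len(sequence)


# Hypothetical toy "gene": meaningful fragments separated by filler letters.
TOY_GENE = "xxhumqqqptyzzdumvvptykk"

# Hypothetical exon coordinates (0-based, end-exclusive) within TOY_GENE.
TOY_EXONS = [(2, 5), (8, 11), (13, 16), (18, 21)]

if __name__ == "__main__":
    print(splice(TOY_GENE, TOY_EXONS))
    print(f"coding fraction: {coding_fraction(TOY_GENE, TOY_EXONS):.0%}")
```

Splicing is trivial once the coordinates are known. The hard part, as the slides argue, is finding those coordinates when nothing in the raw text marks them.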

61

It is very difficult to estimate accurately the number of protein-coding genes in the genome of eukaryotes.

Reason 1: the large number and large size of the introns.

Reason 2: the low density of genes.

62

From 23 genes per million base pairs on chromosome 19 (3%) to only 5 genes per million base pairs on chromosome 13 (0.7%).

There are gene-dense (urban centers) and gene-poor (deserts) chromosomes.
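The density figures above can be cross-checked against the percentages. The sketch below assumes that each percentage is the fraction of the chromosome's sequence that is protein-coding; the slide does not say what the percentages measure, so this reading is an assumption.

```python
# (genes per million base pairs, quoted percentage as a fraction), from the slide.
CHROMOSOME_19 = (23, 0.03)
CHROMOSOME_13 = (5, 0.007)

BP_PER_MB = 1_000_000


def implied_coding_bp_per_gene(genes_per_mb: float, coding_fraction: float) -> float:
    """Average coding length per gene, assuming the fraction is coding sequence."""
    return coding_fraction * BP_PER_MB / genes_per_mb


if __name__ == "__main__":
    for label, (density, fraction) in (("chr19", CHROMOSOME_19), ("chr13", CHROMOSOME_13)):
        per_gene = implied_coding_bp_per_gene(density, fraction)
        print(f"{label}: {density} genes/Mb, ~{per_gene:,.0f} coding bp per gene")
    print(f"density ratio chr19/chr13: {CHROMOSOME_19[0] / CHROMOSOME_13[0]:.1f}")
```

Under that assumption both chromosomes imply a similar coding length per gene, so the difference between them lies in how many genes there are per megabase rather than in how long the genes' coding regions are.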

63

64

Gene Numbers:

Pre-draft and post-draft predictions.

65

66

Two months later

Correction: Nature Genet. 25, 239–240 (2000)

“These improved estimates provide a lower bound of 56,960 and an upper bound of 81,273 genes in the human genome.”

67

15 February 2001

1st draft

68

69

The gene number game: Genesweep©

July 2000
Bets: 165
Mean: 61,710
Lowest: 27,462
Highest: 153,478

July 2001
Bets: 281
Median: 61,302
Lowest: 27,462
Highest: 212,278

70

Finished sequence

21 October 2004

71

Ensembl (October 2004):

20,134 protein-coding genes

72

Genebuild last updated: October 2008
Known protein-coding genes: 21,343
Novel protein-coding genes: 73
Pseudogenes: 9,899
RNA-specifying genes: 5,732
Exons: 297,252
RNA transcripts: 62,877
SNPs: 15,040,632
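The 2008 genebuild counts can be set against the Genesweep bets from the earlier slide. The sketch below uses only numbers quoted in this deck; the variable names are illustrative.

```python
# Ensembl genebuild counts (October 2008), as quoted on the slide.
KNOWN_PROTEIN_CODING = 21_343
NOVEL_PROTEIN_CODING = 73

# Genesweep figures (July 2000), as quoted on the earlier slide.
GENESWEEP_MEAN_BET = 61_710
GENESWEEP_LOWEST_BET = 27_462

# Total protein-coding genes in the genebuild.
total_protein_coding = KNOWN_PROTEIN_CODING + NOVEL_PROTEIN_CODING

# How far the bets overshot the annotated count.
mean_overshoot = GENESWEEP_MEAN_BET / total_protein_coding
lowest_overshoot = GENESWEEP_LOWEST_BET / total_protein_coding

if __name__ == "__main__":
    print(f"protein-coding genes: {total_protein_coding:,}")
    print(f"mean bet overshoot:   {mean_overshoot:.2f}x")
    print(f"lowest bet overshoot: {lowest_overshoot:.2f}x")
```

Even the lowest bet in the pool was above the annotated count, and the mean bet was close to three times too high.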

73

N-value paradox: Complexity does not correlate with protein-coding gene number.

~25,000 genes    ~25,000 genes    ~60,000 genes

74

Summary: 3 genomic paradoxes

K, N, C

75

Genomic paradox:

Lack of correspondence between measures of genome size and the presumed amount of genetic information “needed” by the organism (its “complexity”).

76

What is complexity?

77

959 cells; 1,031 cells; 19,000 genes

~10^8 cells; 13,600 genes

78

If humans are indeed the pinnacle of creation, they should have the biggest, largest, tallest, & fattest Texas-size genome.

They don’t, so they aren’t.

79

The human genome is disappointing…

80

The human genome is:

• small
• empty
• repetitive
• unoriginal
• inelegant