
Genome organisation and evolution

Level 3 Molecular Evolution and Bioinformatics

Jim Provan

Page and Holmes: Sections 3.1.4/5 and 3.3


The eukaryotic genome

Coding DNA
– Single-copy protein coding genes
– Multigene families: dispersed or tandemly repeated

Non-coding DNA
– Regulatory sequences
– Tandemly repeated DNA: satellite DNA, minisatellites and microsatellites
– Transposable elements and retroviruses
– Spacer DNA


The C-value paradox

The amount of DNA per haploid genome is known as the C-value. Contrary to expectation, the amount of DNA is not correlated with complexity:

– The protist Amoeba dubia has about 200 times more DNA (670,000,000 kbp) than humans (3,300,000 kbp)

– This cannot be explained by differences in gene number

[Figure: genome sizes (C-values) of Mycoplasma pneumoniae, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Xenopus laevis, Homo sapiens, Pisum sativum, Lilium longiflorum, Protopterus aethiopicus and Amoeba dubia, plotted on an axis running from 0 to 12]
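The "about 200 times" comparison is simple arithmetic on the two C-values quoted above. A minimal sketch; the dictionary and function names are illustrative, not from the source:

```python
# C-values (DNA per haploid genome) quoted on the slide, in kilobase pairs.
C_VALUES_KBP = {
    "Amoeba dubia": 670_000_000,
    "Homo sapiens": 3_300_000,
}


def fold_difference(larger_kbp: float, smaller_kbp: float) -> float:
    """Return how many times more DNA the first genome has than the second."""
    return larger_kbp / smaller_kbp


ratio = fold_difference(C_VALUES_KBP["Amoeba dubia"], C_VALUES_KBP["Homo sapiens"])
print(f"Amoeba dubia has {ratio:.0f} times more DNA than Homo sapiens")
```

The ratio comes out at roughly 203, i.e. about 670 Gbp against about 3.3 Gbp.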


The structure of genes

There are many forms of genes:
– Those which produce a protein, a tRNA or an rRNA are referred to as structural genes
– Those which control how and when genes are expressed are called regulatory genes
– Some housekeeping genes need to be expressed in all tissues, e.g. those involved in protein synthesis
– Other genes, tissue-specific genes, are only expressed in a particular cell or tissue type, e.g. the insulin gene is only expressed in the pancreatic β-cells

Whatever their function, all genes contain a coding region which specifies a polypeptide or an RNA molecule


Regulation of gene expression

Coding regions of genes are usually flanked by regulatory regions which control gene expression through transcription and translation

Upstream promoter regions:
– In bacteria, there is a Pribnow box (TATAAT) about 10 bp upstream from where transcription starts, the ‘-35 site’ (TTGACA) about 35 bp upstream, and the Shine-Dalgarno box (AGGAGG), a ribosome-binding site, about 7 bp before the initiation codon

– In eukaryotes, as well as the TATA box, some promoter regions contain a CAAT box about 40 bp before the initiation codon and a GC box (GGGCGG) about 110 bp upstream

Downstream elements such as the polyadenylation signal (AATAAA) signify the end of transcription and increase the stability of RNA transcripts
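Because the bacterial elements above are defined by consensus sequences, the simplest way to look for them is an exact string search. This is a hedged illustration only: real promoters usually differ from the consensus at one or more positions, so practical tools score mismatches (for example with position weight matrices), which this sketch does not do. The toy sequence and function name are hypothetical, assembled so that the spacings match those described above.

```python
import re

# Consensus sequences named on the slide for the bacterial promoter region.
BACTERIAL_MOTIFS = {
    "-35 site": "TTGACA",
    "Pribnow box": "TATAAT",
    "Shine-Dalgarno box": "AGGAGG",
}


def find_motifs(seq: str, motifs: dict[str, str]) -> list[tuple[str, int]]:
    """Return (motif name, 0-based start) for every exact match, sorted by position."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        # A zero-width lookahead also reports overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence: -35 site, 17 bp spacer, Pribnow box, filler,
# Shine-Dalgarno box, 7 bp gap, then the ATG initiation codon.
toy = "CC" + "TTGACA" + "GC" * 8 + "G" + "TATAAT" + "GCGCGCG" + "AGGAGG" + "C" * 7 + "ATG"

for name, start in find_motifs(toy, BACTERIAL_MOTIFS):
    print(f"{name} at position {start}")
```

On the toy sequence this reports the -35 site at position 2, the Pribnow box at 25 and the Shine-Dalgarno box at 38.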


Structure of a typical gene: alcohol dehydrogenase (Adh)

Eukaryote: the promoter region (TATA box; CAAT box, in mammals; GC box, GGGCGG) lies upstream (5’) of the gene. The transcribed region consists of four exons (Exon 1 to Exon 4) separated by three introns (Intron 1 to Intron 3) and includes the initiation and stop codons. The polyadenylation signal (AATAAA) lies downstream (3’).

Prokaryote: the promoter region (-35 site, TTGACA; Pribnow box, TATAAT; Shine-Dalgarno box, AGGAGG) lies upstream (5’) of an uninterrupted coding region running from the initiation codon to the stop codon (3’).


Introns

Introns occur frequently within eukaryotic genomes and make up most of the length of very long genes. The number, size and organisation of introns vary:

– Histone genes have no introns, whereas the chicken pro-α2-collagen gene has over fifty
– The SV40 virus contains an intron of 31 bp, whereas the human dystrophin gene has an intron of over 210,000 bp
– Some introns have genes contained within them: the Adh gene in Drosophila is located within an intron of the outspread gene

Intron-exon boundaries are strongly conserved: introns nearly always begin with GT and end with AG
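The GT...AG rule can be checked directly, and splicing can be mimicked by joining the exons. A minimal sketch, assuming a hypothetical gene given as a DNA string plus 0-based, half-open intron coordinates (the sequence, coordinates and function names are invented for illustration):

```python
def check_gt_ag(gene: str, introns: list[tuple[int, int]]) -> list[bool]:
    """For each (start, end) intron, report whether it begins with GT and ends with AG."""
    return [gene[start:start + 2] == "GT" and gene[end - 2:end] == "AG"
            for start, end in introns]


def splice(gene: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns (0-based, half-open, non-overlapping) and join the exons."""
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        exons.append(gene[previous_end:start])
        previous_end = end
    exons.append(gene[previous_end:])
    return "".join(exons)


# Hypothetical gene: exon 1, intron 1, exon 2, intron 2, exon 3.
exon1, intron1, exon2, intron2, exon3 = "ATGGCC", "GTAAGTTTCAG", "TTGGAA", "GTGAGCAG", "TAA"
gene = exon1 + intron1 + exon2 + intron2 + exon3
introns = [
    (len(exon1), len(exon1 + intron1)),
    (len(exon1 + intron1 + exon2), len(exon1 + intron1 + exon2 + intron2)),
]

print(check_gt_ag(gene, introns))  # [True, True]
print(splice(gene, introns))       # ATGGCCTTGGAATAA
```

The spliced product reads ATG GCC TTG GAA TAA: the introns are gone and the exons join in frame.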


Types of introns

Most introns in eukaryotes are spliceosomal introns (‘nuclear introns’) because they are spliced by a spliceosome of proteins and RNA. Some introns can splice without the aid of proteins (“self-splicing introns”):
– One class, group I introns, are sometimes mobile because they encode proteins such as DNA endonucleases. They are found in mitochondrial and chloroplast genomes, in the rRNA genes of some eukaryotes and in T4 bacteriophage
– Group II introns are found in organelles and their bacterial ancestors and contain reverse transcriptase-like sequences
– Group III introns are found in a few protists and are similar to group II introns with the central portion removed


The evolution of introns

There are two competing hypotheses for the evolution of spliceosomal introns:

– The introns-early hypothesis, proposed by Walter Gilbert, suggests that introns mark the boundaries between ancient genes which encoded distinct proteins. Throughout evolution these once-independent proteins have been put together in new combinations to produce more complex proteins by exon shuffling

– An alternative hypothesis (introns-late) suggests that introns only invaded eukaryote genomes fairly recently


The evolution of introns (continued)

A crucial prediction of the introns-early hypothesis is that spliceosomal introns delineate structural or functional units within proteins:
– Introns are found in the same places in all known globin genes, including myoglobin and plant leghaemoglobins
– More frequently, however, introns do not appear to separate functionally distinct parts of proteins

Another problem with the introns-early hypothesis is the absence of spliceosomal introns from Archaea and Bacteria:
– Massive intron loss has been postulated but does not explain why introns are found in nuclear copies of organelle genes but not in the genes of the organelles themselves or their precursors
– Exon shuffling has probably been a factor in later eukaryotes


Multigene families

Many genes are found not as individual copies but as part of multigene families, larger families of related genes:
– An important evolutionary innovation: proteins with similar functions can be arranged so that they are regulated efficiently
– Vertebrates have a variety of globin genes, produced by gene duplication, encoding the polypeptide chains of multipolypeptide haemoglobins adapted to the varying oxygen requirements of different developmental stages

Not all genes are functional:
– Pseudogenes arise through gene duplication but accumulate mutations, since only one functional copy is required
– Processed pseudogenes, which lack promoters and introns, have been produced by reverse transcription of mRNA


Multigene families (continued)

[Figure: evolution of the globin gene family, showing embryonic, foetal, pseudogene and adult genes against a time axis of 0 to 200 millions of years ago]


Evolution of multigene families

The most obvious way in which gene number can change between species is through gene duplication:
– Duplication can arise through unequal crossing-over
– It may occur by duplication of entire genomes (polyploidy):
  – Common in plants: around 50% of angiosperms are polyploid
  – Xenopus laevis is tetraploid: normal meiosis is possible
  – Other members of the genus Xenopus have chromosome numbers ranging from 20 to 108
– Another mechanism of gene duplication is transposition

The fate of a new gene depends on its function: redundancy vs. natural selection

Genes can also acquire new functions without duplication, e.g. ε-crystallin and LDH
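Unequal crossing-over changes gene number because misaligned homologous arrays exchange unequal segments: one product gains copies and the other loses them. A deliberately simplified sketch, assuming each tandem array is a list of gene copies and a single crossover occurs after copy i of one array and copy j of the other (the function and copy labels are hypothetical):

```python
def unequal_crossover(array_a: list[str], array_b: list[str],
                      i: int, j: int) -> tuple[list[str], list[str]]:
    """Exchange the segments after position i in array_a and position j in array_b.

    When i != j the homologous arrays were misaligned, so the two
    recombinant arrays end up with different numbers of copies.
    """
    product_1 = array_a[:i] + array_b[j:]
    product_2 = array_b[:j] + array_a[i:]
    return product_1, product_2


# Two homologous arrays, each carrying three copies of a gene.
chrom_a = ["A1", "A2", "A3"]
chrom_b = ["B1", "B2", "B3"]

# Misaligned pairing: crossover after copy 2 on one chromosome but after copy 1 on the other.
gain, loss = unequal_crossover(chrom_a, chrom_b, i=2, j=1)
print(gain, len(gain))  # ['A1', 'A2', 'B2', 'B3'] 4
print(loss, len(loss))  # ['B1', 'A3'] 2
```

Total copy number is conserved (4 + 2 = 6); what changes is how the copies are shared between the two chromosomes, and repeated rounds can expand or contract an array.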


Gene duplication in the Hox gene family

Homeotic genes control the development of the body plan in animals

In both vertebrate Hox and invertebrate HOM genes, there is a highly conserved sequence known as the homeobox, which encodes a DNA-binding protein motif (the homeodomain)

Mutations in Hox/HOM genes can drastically affect the organisation of body parts


Gene duplication in the Hox gene family

Although Hox/HOM genes are related, their organisation differs between organisms:
– In vertebrates, there are multiple clusters of Hox genes: the mouse has four clusters, each located on a different chromosome and covering over 100 kb
– HOM genes in Drosophila are found in two clusters, Antennapedia and Bithorax, on the same chromosome
– In amphioxus, a marine invertebrate chordate closely related to the vertebrates, there is a single cluster of at least 10 Hox genes, each of which is homologous to a different Hox gene in vertebrates: the origin of the vertebrates coincided with a series of gene duplications

The vertebrate Hox genes are an example of a dispersed gene family


Gene duplication in the Hox gene family

[Figure: Hox/HOM gene clusters of Drosophila (lab, pb, Dfd, Scr, Antp, Ubx, AbdA, AbdB) and amphioxus traced back to a hypothetical common ancestor, with gene duplications giving rise to four clusters]


Tandem arrays

Tandem arrays contain multiple copies of genes with the same function

A good example is the rDNA array:

[Figure: rDNA repeat unit, ETS | 18S | ITS1 | 5.8S | ITS2 | 28S, with adjacent units separated by an NTS (ETS, external transcribed spacer; ITS, internal transcribed spacer; NTS, non-transcribed spacer)]

– Large quantities of rRNA are required
– Genes and spacers are co-transcribed, and repeat units are separated by a non-transcribed spacer
– Array size varies:
  – 1 copy in Tetrahymena
  – 19,300 copies in Amphiuma


Evolution of rDNA arrays

Because they contain both highly conserved (18S) and highly variable (NTS) regions, rDNA sequences have been used frequently in molecular systematics. Despite this, they do not evolve in a simple manner:

– Although there is a high degree of sequence similarity within species, there is great divergence between species
– Due to unequal crossing-over and gene conversion, concerted evolution can take place, which allows genes to evolve together by spreading mutations throughout the members of an array
– This makes phylogenetic analysis difficult, since it is not easy to discern which genes are truly homologous
– It often leads to “mosaics” of sequences, each with a different phylogenetic history
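Gene conversion homogenises an array because the sequence of one copy is copied, non-reciprocally, over another. The sketch below is a toy simulation, not a population-genetic model: it assumes an array of copies labelled only by variant, repeatedly picks a random donor and recipient, and overwrites the recipient with the donor. The function name, array size and number of events are hypothetical choices.

```python
import random


def gene_conversion(array: list[str], events: int, seed: int = 1) -> list[str]:
    """Apply `events` random gene-conversion events to a copy of `array`.

    In each event a randomly chosen donor copy overwrites a different,
    randomly chosen recipient copy (non-reciprocal transfer).
    """
    rng = random.Random(seed)
    copies = list(array)
    for _ in range(events):
        donor, recipient = rng.sample(range(len(copies)), 2)
        copies[recipient] = copies[donor]
    return copies


# A hypothetical array of 10 rDNA copies in which one copy carries a new mutation.
initial = ["wild-type"] * 9 + ["mutant"]
result = gene_conversion(initial, events=200)

# Conversion never creates new variants, so diversity within the array can only
# stay the same or fall; given enough events one variant (either one) takes over.
print(sorted(set(result)))
```

Whether the mutant spreads or is lost depends on the random events, which is why copies within a species end up similar while arrays in different species can diverge.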


Non-coding repetitive DNA

Class | Copy number | Organisation
Satellite DNA | Highly repetitive (>10⁴) | Tandemly repeated
Mini-/microsatellites | Moderately repetitive | Tandemly repeated
Transposable elements | Moderately/highly repetitive | Dispersed


Tandemly repeated DNA

Much of the non-coding repetitive DNA in eukaryotes consists of tandem repeats of short sequence motifs:

Satellite DNA is located mainly in the heterochromatin and consists of motifs up to 40 kb in length:
– The α-satellite DNA of primates is based on a 171 bp motif repeated for hundreds of kilobases
– Over 60% of the genome of Drosophila nasutoides is satellite DNA

Minisatellites and microsatellites are composed of shorter motifs duplicated through unequal crossing-over and DNA slippage:
– Minisatellite motifs are 11 to 60 bp in length and contain a G-rich “core” sequence
– Microsatellites are shorter, generally dinucleotide repeats
– Both exhibit extremely high mutation rates, and multiple alleles are usually found in populations
– Both are used in population genetics and forensics
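Because a microsatellite is a short motif repeated in tandem, a regular expression with a back-reference can find one. A minimal sketch, assuming only perfect (uninterrupted) dinucleotide repeats of at least five units count; the threshold, function name and example sequence are arbitrary choices for illustration:

```python
import re

# A dinucleotide motif (group 1) followed by at least four more copies of itself,
# i.e. five or more tandem units in total.
DINUCLEOTIDE_REPEAT = re.compile(r"([ACGT]{2})\1{4,}")


def find_dinucleotide_repeats(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, number of repeat units) for each perfect repeat."""
    hits = []
    for match in DINUCLEOTIDE_REPEAT.finditer(seq.upper()):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # skip mononucleotide runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // 2))
    return hits


# Hypothetical flanking sequence around a (CA)8 repeat.
seq = "GATTC" + "CA" * 8 + "GGTAC"
print(find_dinucleotide_repeats(seq))  # [(5, 'CA', 8)]
```

Alleles at a microsatellite locus differ in the number of repeat units (here 8), and that length difference is what population genetic and forensic typing measures.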


Transposable elements

Transposable elements increase their copy number by moving around the genome and making additional copies of themselves:
– Around 50% of the maize genome may be transposable elements
– Transposable elements make up 10-20% of the Drosophila genome

Three groups of transposable elements:
– Class I (retroelements) transpose through an intermediate RNA stage via reverse transcriptase, cf. retroviruses
– Class II (DNA elements) transpose directly from DNA to DNA
– Little is known about miniature inverted-repeat transposable elements (MITEs): they are around 100 to 400 bp in length and transpose by as yet unknown means


Transposable elements

Class I transposable elements (retroelements):
– Retrotransposons: LTR | reverse transcriptase | LTR
– Retroposons: reverse transcriptase | AAAAAA (poly-A tail)

Class II transposable elements (DNA elements):
– Ac-like elements: terminal repeat | transposase | terminal repeat
– Miniature inverted-repeat transposable elements (MITEs), e.g. Tourist and Stowaway: short repeat | internal region | short repeat


Retroelements

Two subgroups:
– Retrotransposons contain long terminal repeats (LTRs) at both ends: an example is the copia element, which is found 20 to 60 times in the genome of D. melanogaster
– Retroposons have no LTRs and have a poly-A tail:
  – Long interspersed nuclear elements (LINEs) are 6 to 8 kb in length and present in thousands of copies: the L1 family is present in 590,000 copies in the human genome (17% of the total)
  – Short interspersed nuclear elements (SINEs) do not produce reverse transcriptase and so are not considered true retroelements: they vary in size from 130 to 300 bp and have copy numbers from 50,000 to over 1,000,000
  – SINEs were originally derived from RNA transcripts

Endogenous retroviruses are proviruses which have been integrated into the germ-line of eukaryotes
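The two retroelement subgroups differ at their ends: a retrotransposon's LTRs are direct repeats (the same sequence, in the same orientation, at both ends), while a retroposon ends in a poly-A tail. A rough sketch under simplifying assumptions: the element boundaries are already known, the two LTRs are identical (real LTRs diverge with age) and the minimum LTR and tail lengths are arbitrary. The function names and toy elements are hypothetical.

```python
def longest_direct_terminal_repeat(element: str) -> int:
    """Length of the longest left end that reappears, in the same orientation, as the right end."""
    for k in range(len(element) // 2, 0, -1):
        if element[:k] == element[-k:]:
            return k
    return 0


def classify_retroelement(element: str, min_ltr: int = 10, min_poly_a: int = 6) -> str:
    """Very rough classification of a retroelement from its ends alone.

    min_ltr and min_poly_a are arbitrary illustrative thresholds.
    """
    element = element.upper()
    if longest_direct_terminal_repeat(element) >= min_ltr:
        return "retrotransposon (LTRs)"
    if element.endswith("A" * min_poly_a):
        return "retroposon (poly-A tail)"
    return "unclassified"


ltr = "TGTTGGAATCCA"                            # hypothetical 12 bp LTR
retrotransposon = ltr + "GCATGCGTCCGTTA" + ltr  # LTR, internal region, LTR
retroposon = "GGCTCAGTCGATCGTTCGA" + "A" * 12    # body followed by a poly-A tail

print(classify_retroelement(retrotransposon))  # retrotransposon (LTRs)
print(classify_retroelement(retroposon))       # retroposon (poly-A tail)
```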


Class II (DNA) elements

Class II elements possess terminal repeats, but unlike those of retrotransposons these are short (generally < 100 bp) and usually inverted. They encode a special transposase protein. The best known types are:

– Mariner elements in animals
– Hobo and P elements in Drosophila:
  – P elements can move between species and affect host phenotype
  – Increased infertility due to chromosome breakage (hybrid dysgenesis) occurs in D. melanogaster. P elements are not found in closely related species (D. simulans, D. sechellia, D. mauritiana) but are found in more distantly related species, e.g. the D. willistoni group: they were transferred after D. melanogaster split from its sibling species
  – Insertion can have a “knock-out” effect on phenotype, e.g. the white gene in flies lacking red eye pigment
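Short terminal inverted repeats mean that the left end of a Class II element reads the same as the reverse complement of its right end. A minimal sketch, assuming the element's boundaries are known and the repeats are perfect (real terminal inverted repeats may contain mismatches); the function names and toy element are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str, max_len: int = 100) -> int:
    """Length of the longest left end equal to the reverse complement of the right end.

    max_len defaults to 100 bp, echoing the "generally < 100 bp" above.
    """
    element = element.upper()
    limit = min(max_len, len(element) // 2)
    for k in range(limit, 0, -1):
        if element[:k] == reverse_complement(element[-k:]):
            return k
    return 0


left_tir = "CAGGGATGAAA"                  # hypothetical 11 bp inverted repeat
internal = "TTGCCGATTACGGCATTACCGT"       # stands in for a transposase gene
element = left_tir + internal + reverse_complement(left_tir)

print(terminal_inverted_repeat_length(element))  # 11
```

Placing the same end in the same orientation at both ends, as in an LTR (left_tir + internal + left_tir), scores 0 here, which is the distinction between the inverted repeats of DNA elements and the direct repeats of retrotransposons.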