
http://cs273a.stanford.edu [BejeranoFall13/14] 1

MW  12:50-2:05pm in Beckman B302

Profs: Serafim Batzoglou & Gill Bejerano

TAs: Harendra Guturu & Panos Achlioptas

CS273A

Lecture 9: Repetitive Elements


Announcements

• HW1 done.
• HW2 en route.

The Functional Genome

Type            # in genome    % of genome
genes           25,000         2%
ncRNA           15,000         1%
cis elements    1,000,000      >10%


TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG



One Cell, One Genome, One Replication

Every cell holds a copy of all its DNA = its genome.

The human body is made of ~10^13 cells.

All originate from a single cell through repeated cell divisions.

[Figure: a cell holds its genome (all DNA); the DNA strings are the chromosomes. Egg, then cell division, then chicken: chicken ≈ 10^13 copies (DNA) of egg (DNA).]
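As a rough check on the numbers above, the sketch below counts how many rounds of doubling a single cell needs to reach about 10^13 cells. It assumes, purely for illustration, that every cell divides in every round and that no cell dies; `min_doublings` is a hypothetical helper name, not something from the lecture.

```python
def min_doublings(target_cells):
    """Smallest number of doubling rounds that turns 1 cell into >= target_cells.

    Idealized assumption: every cell divides each round and none are lost.
    """
    cells = 1
    rounds = 0
    while cells < target_cells:
        cells *= 2      # one round of cell division doubles the population
        rounds += 1
    return rounds


CELLS_IN_BODY = 10**13  # order of magnitude quoted on the slide
rounds_needed = min_doublings(CELLS_IN_BODY)
print(rounds_needed)
```

Under this idealized model only a few dozen rounds of division separate the egg from the adult, which is why each round of imperfect copying matters.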


Every Genome is Different

DNA replication is imperfect, so genomes differ between individuals of the same species, and even between the cells of one individual.

[Figure: the sequence ...ACGTACGACTGACTAGCATCGACTACGA... passed from chicken to egg to chicken, with example changes (TT, CAT) marked. In junk DNA "anything goes"; in functional DNA many changes are not tolerated.]

This has bad implications – disease, and good implications – adaptation.


Drift, Negative & Positive Selection

[Figure: three panels (Neutral Drift, Negative Selection, Positive Selection) plotted against Time.]


Human Mutation Rate

• 10^-9 per base pair per generation

• This refers to mutations that are not repaired

• Thus, there are at least six new mutations in each child that were not present in either parent

• Mutations range from the smallest possible (single base pair change) to the largest – whole genome duplication.

• Selection does not tolerate all of these mutations, but it sure does tolerate some.
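The "at least six" figure follows from simple arithmetic, sketched below. Only the mutation rate comes from the slide; the genome size (about 3 x 10^9 base pairs per haploid copy) and the factor of two for a diploid child are assumptions added here for illustration.

```python
MUTATION_RATE = 1e-9       # per base pair per generation (from the slide)
HAPLOID_GENOME_BP = 3e9    # assumption: ~3 billion bp in one haploid human genome
COPIES_PER_CHILD = 2       # assumption: a child inherits two haploid copies


def expected_new_mutations(rate, haploid_bp, copies):
    """Expected count of new, unrepaired mutations in one child."""
    return rate * haploid_bp * copies


expected = expected_new_mutations(MUTATION_RATE, HAPLOID_GENOME_BP, COPIES_PER_CHILD)
print(round(expected))
```

With these inputs the expectation comes out at about six, matching the bullet above; a different assumed genome size or rate would scale the result proportionally.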




Why this cartoon?


Sequences that repeat many times in the genome

• Cumulatively take up a whopping half of the genome
• Come in two major, very different, flavors


I. Interspersed Repeats / TEs

[Adapted from Lunter]


DNA Transposons


Genomic Transmission

For repeat copies to accumulate through the generations, they must make it into the germline cells (eggs & sperm).

Equally true for any genomic mutation.

[Figure: a cell holds its genome (all DNA); the DNA strings are the chromosomes. Egg, then cell division, then chicken: chicken ≈ 10^13 copies (DNA) of egg (DNA).]


LINE & SINE Elements


Retrovirus-like Elements


TE composition and assortment vary among eukaryotic genomes

[Figure: bar chart with a percentage axis (20% to 100%) comparing DNA transposons, LTR retrotransposons, and non-LTR retrotransposons in slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, rice, nematode, Drosophila, mosquito, Fugu, mouse, and human.]

Feschotte & Pritham 2006


Repeat Ages


Figure from Ryan Gregory (2005)

INTERSPECIES VARIATION IN GENOME SIZE WITHIN VARIOUS GROUPS OF ORGANISMS


The amount of TE DNA correlates positively with genome size

[Figure: bar chart, in Mb (0 to 3000), of genomic DNA, TE DNA, and protein-coding DNA for Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, and human.]

Feschotte & Pritham 2006


[Figure panels: TEs; Protein-coding genes]

The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size

Gregory, Nat Rev Genet 2005


Repeat Insertions Can Break Things


Repeat Insertions Can Become Functional


Regulatory Elements from Mobile Elements

[Yass is a small town in New South Wales, Australia.]

Co-option event, probably due to favorable genomic context


Britten & Davidson Hypothesis: Repeat to Rewire!

Enhancer structure reminder


The Road to Co-Option


Transposition Event

Random Mutations

Neutral decay

Potential Co-Option States


Inferring Phylogeny Using Repeats

[Nishihara et al, 2006]


Assembly Challenges


Transposons as Genetic Engineering Tools

Human Gene Therapy


II. Simple Repeats

• Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.

• These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.

• Highly polymorphic in the human population.
• Highly heterozygous in a single individual.
• As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.

• There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be.

• Highly variable between species: e.g., using the same search criteria, the mouse & rat genomes have 2-3 times more microsatellites than the human genome. They’re also longer in mouse & rat.

AAAAAAAAACACACACACCAACAACAA
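Because there is no agreed definition of a simple repeat, any scanner has to pick its own thresholds. The sketch below is one minimal way to find perfect mono- to tetranucleotide runs with a regular expression. The function name `find_simple_repeats`, the thresholds `max_unit=4` and `min_copies=3`, and the toy sequence are all illustrative assumptions, and imperfect copies are deliberately ignored.

```python
import re


def find_simple_repeats(seq, max_unit=4, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    max_unit and min_copies are arbitrary illustrative thresholds
    (min_copies should be at least 2).
    """
    # The lazy group captures the shortest candidate unit (1..max_unit bases);
    # the backreference then requires that unit to recur back to back.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence (an assumption, not real genomic data): a mono-, a di- and a
# trinucleotide run separated by single spacer bases.
toy = "GT" + "A" * 9 + "G" + "CA" * 5 + "T" + "CAA" * 3 + "G"
repeats = find_simple_repeats(toy)
for start, unit, copies in repeats:
    print(start, unit, copies)
```

Changing `min_copies` or `max_unit` changes what counts as a repeat, which is exactly the ambiguity the slide points out; real tools also tolerate mismatches between copies.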


DNA Replication


Simple Repeats Create Funky DNA structures


These Bumps Give The DNA Polymerase Hiccups


Expandable Repeats and Disease


Restriction Enzymes

• Restriction enzymes recognize and make a cut within specific DNA sequences, known as restriction sites.
• This is usually a 4-6 base pair palindromic sequence.
• Naturally found in different types of bacteria.
• Bacteria use restriction enzymes to protect themselves from foreign DNA.
• Many have been isolated and sold for use in lab work.

[Figure labels: blunt end; sticky end]
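"Palindromic" here means that a site reads the same as its own reverse complement. The sketch below checks that property and classifies a cut as blunt or sticky. The slide names no enzymes; EcoRI (G^AATTC) and SmaI (CCC^GGG) are brought in as well-known examples, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)


def end_type(site, cut_offset):
    """Classify the cut of a palindromic site.

    cut_offset is the number of bases left of the cut on the top strand.
    On the bottom strand the cut falls at the mirrored position, so the
    ends are blunt only when both cuts coincide.
    """
    mirrored = len(site) - cut_offset
    return "blunt" if cut_offset == mirrored else "sticky"


# Example enzymes (assumptions added for illustration): site and top-strand cut offset.
ENZYMES = {"EcoRI": ("GAATTC", 1), "SmaI": ("CCCGGG", 3)}

for name, (site, offset) in ENZYMES.items():
    print(name, site, is_palindromic(site), end_type(site, offset))
```

EcoRI cuts off-center and leaves overhanging (sticky) ends, while SmaI cuts in the middle of its site and leaves blunt ends.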


DNA Fingerprint Basics

DNA fragments of different size will be produced by a restriction enzyme that cuts at the points shown by the arrows.


DNA fragments are then separated based on size using gel electrophoresis.
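The two steps, cutting and then sorting by size, can be mimicked in a few lines. The sketch below is a toy model only: it digests two made-up sequences that differ in the copy number of a CA repeat between two cut sites and lists fragment lengths from longest to shortest, the order in which bands would appear from the top of a gel. The EcoRI site, the sequences, and the function names are assumptions for illustration.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site; return the fragments in order.

    cut_offset is the number of site bases kept on the left of each cut.
    Only the top strand is modelled.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def band_pattern(seq, site, cut_offset):
    """Fragment lengths, longest first (shorter fragments migrate further)."""
    return sorted((len(f) for f in digest(seq, site, cut_offset)), reverse=True)


SITE, OFFSET = "GAATTC", 1  # EcoRI used as an example enzyme (assumption)

# Two hypothetical individuals with 5 vs. 9 copies of CA between the cut sites.
person_a = "TT" + SITE + "CA" * 5 + SITE + "GG"
person_b = "TT" + SITE + "CA" * 9 + SITE + "GG"

pattern_a = band_pattern(person_a, SITE, OFFSET)
pattern_b = band_pattern(person_b, SITE, OFFSET)
print(pattern_a)
print(pattern_b)
```

The flanking fragments are identical in both toy individuals; only the fragment spanning the repeat changes length, and that difference is what shows up as a shifted band.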


DNA fingerprinting can be used in paternity testing or murder cases.


There are Tracks for it


Interspersed vs. Simple Repeats

From an evolutionary point of view transposons and simple repeats are very different.

Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).

Different instances of the same simple repeat most often do not.


Categories are NOT mutually exclusive

• We already discussed repeat instances that became
  • Coding exons
  • Enhancers
• There are known genomic loci that
  • Code for protein-coding exons and act as enhancers
  • Ditto for non-coding RNA + enhancer
• There are bi-directional exons
  • Coding in both directions
  • Coding and anti-sense non-coding
  • Both non-coding
