http://cs273a.stanford.edu [bejerano fall10/11] 1



Lecture 15:

HW2 Feedback

Ultraconservation



Ultraconserved Elements in the Human Genome: The Hip & The Hype

Dept. of Developmental Biology, Dept. of Computer Science

Stanford University

Gill Bejerano


Sequence Conservation implies Function

(but which function/s?...)

Comparative Genomics of related species highlights:

[Figure: human and mouse sequences descended from a common mammalian ancestor; the stretch they still share marks a functional region!]

...CTTTGCGA-TGAGTAGCATCTACTATTT...

...ACGTGGGACTGACTA-CATCGACTACGA...


Human Genome full of Conserved Non-Coding Elements

Human Genome: 3×10^9 letters
• 1.5% known function
• >50% junk

Compare to other species:
• >5% of the human genome is functional
• 3x more functional DNA than known!
• ~10^6 genomic loci do not code for protein

What do they do then?
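The "3x" follows directly from the slide's own rounded figures. As a rough, order-of-magnitude check using only those numbers (3×10^9 letters, 1.5% known function, >5% functional):

```latex
\begin{aligned}
\frac{5\%}{1.5\%} &\approx 3.3 \\
0.05 \times \left(3\times 10^{9}\ \text{bp}\right) &\approx 1.5\times 10^{8}\ \text{bp (a lower bound, since the slide says } >5\%\text{)}
\end{aligned}
```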


Conserved elements in the Human Genome

[Figure: conservation of all human-mouse alignments vs. the human-mouse ancestral repeats alignment. Difference (Selection): 5% of Human Genome. Callout: 85%id on average.]

[Mouse consortium, Nature 2002]


Conserved elements in the Human Genome

[Figure: all human-mouse alignments vs. the human-mouse ancestral repeats alignment. Difference (Selection): 5% of Human Genome; 85%id on average. The extreme tail is labeled Ultraconservation.]


Typical DNA Conservation levels

Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002]

(dot = base identical to human)
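Both quantities on this slide, percent identity and the dot notation, can be sketched in a few lines of Python. This is a minimal illustration that assumes two already-aligned sequences of equal length, with '-' marking gaps. Skipping gapped columns in the identity denominator is our own assumption; tools differ on this convention.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent of aligned columns, gaps excluded, in which the two bases match."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns where either sequence has a gap ('-') are skipped.
    columns = [(a, b) for a, b in zip(ref.upper(), other.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def dot_notation(ref: str, other: str) -> str:
    """Show `other` with '.' wherever its base is identical to `ref` (the slide's convention)."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    return "".join("." if a == b and a != "-" else b
                   for a, b in zip(ref.upper(), other.upper()))


if __name__ == "__main__":
    human = "ACGTACGTAC"  # toy sequences, not real genomic data
    mouse = "ACGTTCGTAA"
    print(percent_identity(human, mouse))  # 80.0
    print(dot_notation(human, mouse))      # ....T....A
```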


Ultraconserved Elements

[Bejerano et al., Science 2004]

481 elements perfectly conserved (100%id) over 200bp or more between human, mouse and rat.

[Figure labels: fish; using 2 vs. 3 species]
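The definition above (100% identity over at least 200bp in every species) translates directly into a scan for long runs of identical alignment columns. The sketch below makes several simplifying assumptions:

- The multiple alignment is given as equal-length strings.
- Gaps and Ns break a run.
- Coordinates are alignment columns rather than genome positions.

It is not the pipeline used in the paper.

```python
from typing import List, Tuple


def ultraconserved_runs(alignment: List[str], min_length: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) alignment-column ranges in which every
    sequence carries the same base (A/C/G/T, no gaps or Ns) for at least
    `min_length` consecutive columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    runs: List[Tuple[int, int]] = []
    start = None  # first column of the current identical run, if any
    for col in range(length):
        bases = {seq[col].upper() for seq in alignment}
        identical = len(bases) == 1 and bases <= set("ACGT")
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_length:
                runs.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_length:
        runs.append((start, length))
    return runs


if __name__ == "__main__":
    # Toy three-way alignment (human, mouse, rat); min_length lowered from 200 to 3.
    human = "AAAAACGGG"
    mouse = "AAAAATGGG"
    rat = "AAAAACGGG"
    print(ultraconserved_runs([human, mouse, rat], min_length=3))  # [(0, 5), (6, 9)]
```

Adding more species to the list can only shorten or break runs. That is one way to see why the set changes when going from 2 to 3 species and beyond.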


Contamination


What exactly is an Ultraconserved Element?

Aha!!

[Figure: using 3 vs. 43 species]


Ultraconservation as a Phenomenon

[Figure: from few species to more and more species]

Hmmm….


Ultraconserved Elements: Why?

Hundreds of long genomic regions are identical between amniotes: they must have rejected many different changes.

But... all functions we understand in our genome are encoded using redundant codes.

[Figure: redundant codes in CDS, ncRNA and TFBS sequences, with individual positions marked *]
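For protein-coding sequence (CDS), this redundancy is the degeneracy of the genetic code. The Python sketch below uses the standard genetic code (NCBI translation table 1). For each codon position, it counts how many single-base changes from a sense codon leave the amino acid unchanged. It illustrates only the CDS case, not ncRNA or TFBS codes.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_changes_by_position() -> Dict[int, List[int]]:
    """For codon positions 0, 1, 2 return [synonymous, total] counts of single-base
    changes starting from sense codons ('*' marks stop codons)."""
    counts = {pos: [0, 0] for pos in range(3)}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only start from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[pos][1] += 1
                if CODE[mutant] == aa:  # changes to a stop codon count as non-synonymous
                    counts[pos][0] += 1
    return counts


if __name__ == "__main__":
    for pos, (syn, total) in synonymous_changes_by_position().items():
        print(f"codon position {pos + 1}: {syn}/{total} single-base changes are synonymous")
```

Third codon positions tolerate by far the most changes, which is the kind of slack an ultraconserved element apparently lacks.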


Conserved elements in the Human Genome

[Figure: all human-mouse alignments vs. the human-mouse ancestral repeats alignment. Difference (Selection): 5% of Human Genome; 85%id on average; the tail is labeled Ultraconservation.]

Why did I look at the tail?


DNA Replication is Imperfect

It’s imperfect on all scales: small, medium and large.

In particular it begets novel functional entities:

...ACGTACGACTGACTAGCATCGACTACGA...   (functional | junk)

regional duplication

...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...   (functional | functional)

functional divergence

(functional’’ | functional’)

Protein & RNA gene families come to life this way. What else does?


Computational Approach I

Group them into paralog families of human functional regions of common origin:
• Annotated members induce function on all.
• Examine the core, and the substitutions within the family.
• Test for “guilt by association”.
[Bejerano et al., ISMB 2004]

.....ACGTGCATGACTGACTAGCATCAGACGACTAC..GATAATACGCTACGACTAGCTAC..... (human DNA)

...TGACTAGCATCGACTAC..GATAATACGAC...   ...CATCGACTAC..GATAATACGACGGTTGGT...

(~400bp)
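The slide does not spell out how the families were built. One simple way to turn pairwise similarity hits into families is single-linkage grouping with a union-find structure.

In this sketch, the element IDs and the `hits` list are hypothetical; in practice they might come from some alignment search. The single-linkage rule is our choice, not necessarily the ISMB 2004 method. Elements without hits end up as singletons, which is what "appear unique" looks like in this representation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def paralog_families(elements: Iterable[str], hits: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Single-linkage grouping: elements joined by any chain of hits share a family."""
    elements = list(elements)
    uf = UnionFind()
    for e in elements:
        uf.find(e)
    for a, b in hits:
        uf.union(a, b)
    families: Dict[Hashable, List[str]] = {}
    for e in elements:
        families.setdefault(uf.find(e), []).append(e)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    # Hypothetical element IDs and hypothetical similarity hits between them.
    elements = ["e1", "e2", "e3", "e4", "e5"]
    hits = [("e1", "e2"), ("e2", "e3")]
    families = paralog_families(elements, hits)
    singletons = [f for f in families if len(f) == 1]
    print(families)                         # [['e1', 'e2', 'e3'], ['e4'], ['e5']]
    print(len(singletons) / len(elements))  # 0.4
```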


Functional Annotation by Families

[Bejerano et al., ISMB 2004]

After removing all annotated regions, and more, from the top 5% of the Human Genome:

700,000 elements, covering 3.5% of the Human Genome

Good News: we still find 12,027 families: novel putative ncRNAs, cis-regulatory elements, etc.

Puzzling News: 96% of the 700,000 appear unique(!)


Computational Approach II

[Figure: human, mouse and rat. Same element: 96%id over 200bp; same element: 95%id over 200bp; related genes / related elements: 75%id over 200bp.]

Classical Biological approach: experiment to understand these regions.

Computational approach: how many regions like this, or “better”, are there?


Out popped the Ultraconserved Elements

Puzzling News: 96% of the 700,000 conserved non-coding elements appear unique(!)

Same with Ultras


What could ultras be doing?

• exonic
• non-exonic
• possibly


Associating distal peaks in a gene-based context is statistically inappropriate


[Figure legend: gene transcription start site; Ultraconserved Element; ontology term (e.g. ‘development’)]

N = 8 genes in genome

K = 3 genes annotated with the term

n = 2 genes selected by proximal peaks

k = 1 selected gene annotated with the term

P = Pr(k ≥ 1 | n = 2, K = 3, N = 8)

1. Set gene regulatory domain.
2. Associate Ultras with genes.
3. Per ontology term, count annotated genes selected.
4. Rank terms by enrichment hypergeometric p-value.

Evolved into

http://great.stanford.edu/
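The four steps can be sketched in Python on a toy input. The input is chosen so that N = 8, K = 3, n = 2 and k = 1, matching the conditioning in the slide's P. The regulatory-domain rule (a fixed window of ±`half_window` around each TSS) and all gene names and positions are hypothetical simplifications, not GREAT's rules. As the slide's title argues, this gene-based count is statistically inappropriate for distal elements; the sketch only illustrates the mechanics.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """Pr(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K are annotated."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return total / comb(N, n)


def term_enrichment(tss: Dict[str, int], elements: List[int],
                    annotations: Dict[str, Set[str]],
                    half_window: int) -> List[Tuple[str, int, float]]:
    genes = set(tss)
    # Step 1 (assumed rule): a gene's regulatory domain is TSS +/- half_window.
    domains = {g: (pos - half_window, pos + half_window) for g, pos in tss.items()}
    # Step 2: associate each element with every gene whose domain contains it.
    selected = {g for x in elements for g, (lo, hi) in domains.items() if lo <= x <= hi}
    N, n = len(genes), len(selected)
    results = []
    for term, annotated in annotations.items():
        K = len(annotated & genes)
        # Step 3: count annotated genes among the selected ones.
        k = len(annotated & selected)
        results.append((term, k, hypergeom_sf(k, n, K, N)))
    # Step 4: rank terms by hypergeometric p-value (smallest first).
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy genome: 8 hypothetical genes with TSSs 1 kb apart (N = 8).
    tss = {f"g{i}": 1000 * (i - 1) for i in range(1, 9)}
    ultras = [10, 2005]  # hypothetical element positions near g1 and g3 (n = 2)
    annotations = {"development": {"g1", "g5", "g7"},  # K = 3, k = 1
                   "other": {"g2"}}
    for term, k, p in term_enrichment(tss, ultras, annotations, half_window=100):
        print(term, k, round(p, 4))
```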


Enrichment Association of Ultraconserved Elements

[Figure panels: Exonic Ultras; Non-exonic Ultras]


Ultras are Functional

Back in 2004 we hypothesized, for the 481 ultraconserved elements:

• exonic subset: post-transcriptional regulation [Ni et al., Genes Dev.; Lareau et al., Nature, 2007]

• “nonexonic” subset: transcriptional regulators [Pennacchio et al., Nature, 2006]


Ultraconserved Non-coding RNA

[Calin et al., Cancer Cell, 2007]

[Figure: miRNA complementarity]

About 1/3 of all ultras are expressed.

Some are predicted to provide microRNA targets.

A few are anti-correlated with miRNA expression levels.

A few even act as oncogenes.


Ultras are Under Strong Human Selection

[Figure: Ultra DAF vs. NonSyn DAF (derived allele frequency distributions)]

[Katzman et al., Science, 2007]

Mutational cold spots? NO. Rare (new) mutations are introduced into the population.

Fierce purifying selection? YES. Very few of these get anywhere near fixation.

[Figure: chimp carries A; sampled humans carry G, A, A, A]
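The chimp/human cartoon is a derived allele frequency (DAF) calculation. Below is a minimal Python sketch that treats the chimp allele as the ancestral state, a common outgroup proxy and an assumption here.

To compare Ultra and NonSyn sites, one would compute this per site and compare the two distributions. `fraction_rare` and its 0.1 cutoff are hypothetical helpers for summarizing such a distribution.

```python
from typing import List, Sequence


def derived_allele_frequency(ancestral: str, alleles: Sequence[str]) -> float:
    """Fraction of sampled chromosomes carrying a non-ancestral (derived) allele."""
    if not alleles:
        raise ValueError("need at least one sampled allele")
    derived = sum(1 for a in alleles if a != ancestral)
    return derived / len(alleles)


def fraction_rare(dafs: List[float], threshold: float = 0.1) -> float:
    """Share of sites whose derived allele frequency is below `threshold` (hypothetical cutoff)."""
    if not dafs:
        raise ValueError("need at least one site")
    return sum(d < threshold for d in dafs) / len(dafs)


if __name__ == "__main__":
    # The slide's cartoon: chimp carries A (taken as ancestral); four human chromosomes G, A, A, A.
    print(derived_allele_frequency("A", ["G", "A", "A", "A"]))  # 0.25
```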


Touch an Ultra And You - DIY

[Ahituv et al., PLoS Biology, 2007]


What can’t we measure in the lab?

$$\Pr(\text{fixation} \mid N_e, s) = \frac{1 - e^{-2s}}{1 - e^{-4 N_e s}}$$

N_e is the effective population size, s the selective dis/advantage. Both are VERY wrong in the lab.
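To see why N_e and s matter so much, Kimura's diffusion approximation can be evaluated directly. This assumes the standard diploid form for a new mutation, (1 - e^{-2s}) / (1 - e^{-4 N_e s}), whose neutral limit is 1/(2N_e). The sketch compares fixation odds to that neutral baseline; the population sizes and s values in the usage loop are illustrative only.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's diffusion approximation for a new mutation in a diploid population:
    (1 - exp(-2s)) / (1 - exp(-4 N_e s)); the neutral limit (s = 0) is 1 / (2 N_e)."""
    if s == 0:
        return 1.0 / (2.0 * n_e)
    x = -4.0 * n_e * s
    if x > 700:
        # Strongly deleterious: expm1(x) would overflow, so use
        # expm1(-2s) / (e^x - 1) ~= expm1(-2s) * e^(-x).
        return math.expm1(-2.0 * s) * math.exp(-x)
    # Same ratio with numerator and denominator both negated; expm1 keeps
    # precision when |s| is tiny.
    return math.expm1(-2.0 * s) / math.expm1(x)


if __name__ == "__main__":
    # Illustrative values only: compare each s to the neutral 1 / (2 N_e).
    for n_e in (100, 10_000):
        neutral = fixation_probability(0.0, n_e)
        for s in (-0.001, 0.001):
            ratio = fixation_probability(s, n_e) / neutral
            print(f"N_e={n_e:>6} s={s:+.3f} Pr(fix)/neutral = {ratio:.3g}")
```

In a small (lab-sized) population, an s of ±0.001 barely moves the ratio away from 1. At N_e = 10,000, the same s shifts the odds by orders of magnitude.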


So it can happen – but does it FIX?

[Figure labels: DNA element; mouse]


Count Fraction Lost, Binned by %id

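[Figure: alignment of human, macaque, dog, mouse and rat, scanned with a 100bp sliding window. Windows are binned by %id; per bin, count_all and count_hole are tallied, and fraction lost = count_hole / count_all.]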
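The counting scheme can be sketched in Python under explicit assumptions:

- The alignment rows are equal-length strings with '-' for gaps.
- %id is measured over the gapless columns of the `reference` rows (e.g. human, macaque, dog).
- A window counts as a "hole" only if every `lost_in` row (e.g. mouse, rat) is entirely gapped there.
- The 5% bin width is hypothetical.

These are not necessarily the paper's exact criteria.

```python
from collections import defaultdict
from typing import Dict, List


def fraction_lost_by_identity(reference: List[str], lost_in: List[str], window: int = 100,
                              step: int = 1, bin_width: int = 5) -> Dict[int, float]:
    """Slide a window along a multiple alignment, bin windows by the %id shared by
    the `reference` rows, and report count_hole / count_all per bin."""
    reference = [s.upper() for s in reference]
    lost_in = [s.upper() for s in lost_in]
    length = len(reference[0])
    if any(len(s) != length for s in reference + lost_in):
        raise ValueError("all aligned rows must have equal length")
    count_all: Dict[int, int] = defaultdict(int)
    count_hole: Dict[int, int] = defaultdict(int)
    for start in range(0, length - window + 1, step):
        end = start + window
        columns = [tuple(row[c] for row in reference) for c in range(start, end)]
        gapless = [col for col in columns if "-" not in col]
        if not gapless:
            continue  # no alignable reference columns in this window
        identical = sum(len(set(col)) == 1 for col in gapless)
        pct_id = 100.0 * identical / len(gapless)
        bin_start = int(pct_id // bin_width) * bin_width  # 100%id gets its own bin
        count_all[bin_start] += 1
        # Assumed definition of a "hole": every lost_in row is all gaps in this window.
        if all(set(row[start:end]) == {"-"} for row in lost_in):
            count_hole[bin_start] += 1
    return {b: count_hole[b] / count_all[b] for b in sorted(count_all)}


if __name__ == "__main__":
    # Toy rows; real use would be human/macaque/dog as reference, mouse/rat as lost_in.
    human, macaque, dog = "ACGTACGT", "ACGTACGT", "ACGTACGA"
    mouse, rat = "----ACGT", "----ACGT"
    print(fraction_lost_by_identity([human, macaque, dog], [mouse, rat], window=4, step=4))
```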


Quite Some Time Later

[McLean & Bejerano, Genome Res., 2008]


Ultras are Fiercely Retained through Evolution

Ultras are >300 fold more persistent than neutral DNA (25% deleted).

100%id primates-dog: 1,691,090bp

rodents deleted: 1,447bp (0.086%)

the genomic deletion is

$$\Pr(\text{fixation} \mid N_e, s) = \frac{1 - e^{-2s}}{1 - e^{-4 N_e s}}$$


How special are the Ultras?

[Figure: the conservation distribution, with Selection and its Ultraconservation tail marked]


Ultraconservation as a Phenomenon

[Figure: from few species to more and more species]

Hmmm….

We do not see a bump in the curve


Ultraconserved Elements: What do we know?

• Excessive sequence conservation exists.
• The set is heterogeneous from a functional perspective.
• Four can be knocked out (KO-ed) with no clear phenotype.
• Yet the set is under extreme selection in natural populations, both for mutations and deletions.
• Most ultras have deep orthology, and no paralogy.
• One ultra comes from a mobile element co-option event.
• Others may have come from similar events.
• Ultras appear to be the tip of a continuum, not a unique peak.


Ultraconserved Elements: What we don’t know

• What maintains so much conservation?
