Tools for Comparative Sequence Analysis www.dcode.org Ivan Ovcharenko Lawrence Livermore National Laboratory

Tools for Comparative Sequence Analysis

www.dcode.org

Ivan Ovcharenko

Lawrence Livermore National Laboratory

A set of problems: http://www.dcode.org/bioquest.php

1. Browsing genomes using synteny links
2. Aligning sequences to vertebrate genomes
3. Aligning sequences to identify evolutionarily conserved regions
4. Assigning function to regulatory elements
5. Decoding gene regulation using microarray data

zPicture: Dynamic Alignment of Megabase-long Sequences and Genomes

http://zpicture.dcode.org

zPicture http://zpicture.dcode.org/

Automated sequence extraction and gene annotation

I. Ovcharenko, G. Loots, R.C. Hardison, W. Miller, and L. Stubbs. Genome Research, 14(3), 472-477 (2004)

>hg16_dna range=chr16:55400000-55800000
Tataatggctacctatttggagtgcctaccatgtattagtcattgtgcta
actgatgtataggcatctcatttacagttcaactcatttgaacctaaatg
aagaatagttgtttgtcccttattttatttaacaaaatttaaaactattt
ctaagtcgctcattaaatgacaaagcttaaaccaaattttgtctgattgt
aaaggccatacttttAATCATTTATATAAAACAACGCAGCCATATTTAAC
TTCTGCCATATATTTTCTTACCGATGAATGATATATATCAAATGTTGACT
TAGTTTTTAAATGGAAGACAGAAGCGGTTTAGAATGGCCTATTTTCAGTC
AGCCAAAAATGTCAAAACCTTCTGTGAGTAGTCCAGGTACTGGAAATCAG
ACAATTTGAACTTCAGGATACTACAATAATTTTTTCCTTTGTGGGTAGTG
GTGGAGCATGAATTCTCTACTTCTTATTGGTCCTTCTGCTATGATGGCCC
TTTCAGTCACACCTCTGTTCTCAAAATAAGAATATAATCAATAAAGTAGA
GTTTGAGGGAACGGAGGACTAAGTCAAAAGTGGGATACCTAGGACTTCAT
TCTAGttactgtggaattatctcctttgcttttcttcctgtttgtgcttt
ttctatcctgttaattctcctgccttatggaaagcacagtgattgtttca
cagcataaaccagacatcacttttccagtttaattttttttcaaaggccc
ccattgcattttggaaaaaattcaaaatattcaacatggcctacaaagcc
ctgtcacccttaaatagtgtgttgagtctggctcctacccacagtctaaa
tctcaactgtctccaatcttctccctcactaaactcctaccagcaaatct
tttcttcaaactggctaatgccctattctagcctcagagttttgtgctgc
tgttctcttaggtacagtgtttttccccaagatttttatctggctttctc
ttcttcatttagacttttaaacaaacagcttcatgaattacttgagatgt
aattaatatacatacaatttacccatttaaggtatacattttaatgtttt
tattatattcacagagttgtacaaccatcacactctaatttcagaacgtt
ttcatcttgattcagattttaaatcaaatgtcacatcatccagtaggaac
tccagtcactaattagaaatacccattatgtttttacacacattctcaat
cccactacctgtttgttattgcacttgaacttacatgaaactatttactt
gtttatacatttattgtctGTTATTCCTAGCACATAGAAGGTATGTCTGG
CACATAGCAAACACTCGATCTTTGATGAATGAATGAATAATGATAACATT
AACTTTTTTGCTTATTCTGCCTTGTATTGTGTAAGATTAGAGACaatcct
tacaacaaacttgaaaacccagacttaacgatctctaaaactcacatgta
agttaaggctcagagaagtttcatcacttgctcagagttacgtaactggt
gaataccgaggctagatttcaaacccaaggctgcccggctctaaaTGAGG
GGATATTTGATTAGGCCAAAGTAACCTGAACCCTTAAAATAACcaggctt
taacttccagaaacatgggaactagataacctaagaacctgctggccacg
aaacccctagaatactgaacacaatatcacaaacatattttgaaatgcat
agatgagcatgtaaaatactgagggaactcctcaatggccaaaagtggaa
agcagatgaaaaccagaactgtgtaaaagcctgaaagttacagtcgtcct
gcagacatttgtcaatctcagtaacaaagggacttagtattttttggcta
tggaagacaaaaacaagctttttgtataaggtgggaatgttgaactgaga
cctcatgggagaaaaagcagatgaagggttagaggctcagtaaaagaatg
aactggaaaaatccatcttctgacaaagaaagacaatgaggaaacttttc
tgtcttgggctgggtgCTTGGTTGGAGCAGGGGGAAAGAATCTCTGATTT

> 69149 115179 SLC6A2
69149 69197 UTR
69198 69471 exon
82066 82197 exon
84439 84676 exon
97643 97781 exon
104518 104652 exon
106610 106713 exon
107878 108002 exon
108825 108937 exon
110497 110625 exon
111069 111168 exon
112154 112254 exon
112739 112906 exon
114463 114534 exon
114923 114946 exon
114947 115179 UTR

> 173279 186382 CESR
173279 173321 UTR
173322 173373 exon
177416 177623 exon
180095 180239 exon
182703 182836 exon
184865 185018 exon
185907 186077 exon
186078 186382 UTR

> 173303 203537 CES1
173303 173321 UTR
173322 173373 exon
177419 177623 exon
180095 180239 exon
182703 182836 exon
184865 185018 exon
185907 186014 exon
186747 186851 exon
189424 189462 exon
193343 193483 exon
195380 195460 exon
195723 195870 exon
199927 200058 exon
202790 202862 exon
203159 203342 exon
203343 203537 UTR

< 212212 242464 CES1
212212 212406 UTR
212407 212590 exon
212887 212959 exon
215691 215822 exon
219879 220026 exon
220289 220369 exon
222266 222406 exon
226287 226325 exon
228898 229002 exon
229735 229842 exon
230731 230884 exon
232913 233046 exon
235514 235658 exon
238133 238337 exon
242394 242445 exon
242446 242464 UTR

< 229367 242488 CESR
229367 229671 UTR
229672 229842 exon
230731 230884 exon
232913 233046 exon
235514 235658 exon
238133 238340 exon
242394 242445 exon
242446 242488 UTR

< 255598 284772 FLJ31547
255598 255832 UTR
255833 256064 exon
256150 256222 exon
262265 262412 exon
265761 265829 exon
268931 269071 exon
270794 270898 exon
272730 272834 exon
275344 275497 exon
279013 279146 exon
281027 281165 exon
283235 283439 exon
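The annotation listing above is made of gene records: a marker (> or <), a start, an end, and a gene name, followed by start/end/type triples for each UTR or exon. A minimal Python sketch of a reader for that layout is given below. The function name parse_annotation is hypothetical, and reading > as the forward strand and < as the reverse strand is an assumption, not something the slide states. The sketch works on whitespace-separated tokens, so line breaks in the input do not matter.

```python
from typing import Dict, List


def parse_annotation(text: str) -> List[Dict]:
    """Parse records of the form '> start end NAME' followed by 'start end TYPE' features."""
    tokens = text.split()
    genes: List[Dict] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in (">", "<"):
            # Assumption: '>' marks the forward strand, '<' the reverse strand.
            genes.append({
                "strand": "+" if tokens[i] == ">" else "-",
                "start": int(tokens[i + 1]),
                "end": int(tokens[i + 2]),
                "name": tokens[i + 3],
                "features": [],
            })
            i += 4
        else:
            if not genes:
                raise ValueError("feature found before any gene header")
            # Feature triple: start, end, type (for example 'UTR' or 'exon').
            genes[-1]["features"].append(
                (int(tokens[i]), int(tokens[i + 1]), tokens[i + 2])
            )
            i += 3
    return genes


# First lines of the CESR record shown on the slide.
example = """
> 173279 186382 CESR
173279 173321 UTR
173322 173373 exon
177416 177623 exon
"""
genes = parse_annotation(example)
# Total exon length, treating both coordinates as inclusive (an assumption).
exon_bp = sum(end - start + 1 for start, end, kind in genes[0]["features"] if kind == "exon")
print(genes[0]["name"], genes[0]["strand"], exon_bp)
```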

Automated sequence and gene annotation extraction http://zpicture.dcode.org/

chr16:55,400,000-…

zPicture: a dynamic and interactive alignment visualization tool. http://zpicture.dcode.org/

Dynamic rotation from pip plots to smooth plots

Interactive parameter changes

zPicture: dynamic annotation

zPicture: dynamic selection of conservation parameters

100 bp / 70% identity

500 bp / 85% identity
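The two settings above can be read as a minimum length and a minimum percent identity for a conserved region. The Python sketch below shows one simple way to apply such a pair of thresholds to a pairwise alignment: slide a window of the minimum length along the alignment and merge every window that passes the identity cutoff. It is an illustration under that assumption, not zPicture's actual algorithm, and the function name find_ecrs is hypothetical.

```python
from typing import List, Tuple


def find_ecrs(aln_a: str, aln_b: str, min_len: int = 100,
              min_identity: float = 0.70) -> List[Tuple[int, int]]:
    """Return merged [start, end) alignment-column ranges covered by passing windows."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base, 0 for mismatches and gaps.
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    n = len(match)
    regions: List[Tuple[int, int]] = []
    if n < min_len:
        return regions
    window_sum = sum(match[:min_len])
    start = 0
    while True:
        if window_sum / min_len >= min_identity:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
        if start + min_len >= n:
            break
        # Slide the window one column to the right.
        window_sum += match[start + min_len] - match[start]
        start += 1
    return regions


# Toy alignment: two identical 12-column blocks around a 12-column mismatch block.
seq_a = "A" * 12 + "C" * 12 + "A" * 12
seq_b = "A" * 12 + "G" * 12 + "A" * 12
loose = find_ecrs(seq_a, seq_b, min_len=10, min_identity=0.70)
strict = find_ecrs(seq_a, seq_b, min_len=30, min_identity=0.85)
print(loose, strict)
```

With the looser setting the toy alignment yields two regions, each extending a few columns into the mismatch block; with the stricter setting no window of 30 columns reaches 85% identity, so nothing is reported.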

Mycobacterium leprae vs. Mycobacterium tuberculosis.

Conservation of genes:

Non-hypothetical genes: 97% are conserved

Hypothetical genes: ~20% are conserved

zPicture: Aligning complete microbial genomes

rVista 2.0: Identification of Evolutionarily Conserved Transcription Factor Binding Sites

http://rvista.dcode.org

rVista 2.0 http://rvista.dcode.org/ Identification of Evolutionarily Conserved Transcription Factor Binding Sites

http://zpicture.dcode.org

http://ecrbrowser.dcode.org

http://globin.cse.psu.edu/gala

Human ACTTTCCTACATCTATCTATA
      |||||::|||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human ACTTTGATACATCTATCTATA
      ||||||||||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human -----GATACATCTATCTATA
           |||||
Mouse ACTTTGATAC-----------

Human ACTTTGATACATCTATCTATA
      |||||
Mouse ACTTT----------------
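The four human/mouse alignments above differ in how well a given stretch of sequence is preserved. The Python sketch below scores them for one window. It is only an illustration: the window (alignment columns 5 to 13, human GATACATCT in the second alignment), the 80% cutoff, and the function names are all hypothetical choices, and this is not presented as rVista's scoring procedure.

```python
def aligned_identity(seq_a: str, seq_b: str, start: int = 0, end: int = None) -> float:
    """Fraction of alignment columns in [start, end) where both sequences share a base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(seq_a[start:end], seq_b[start:end]))
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return matches / len(columns)


def site_is_conserved(seq_a: str, seq_b: str, start: int, end: int,
                      min_identity: float = 0.8) -> bool:
    """Call a site conserved if the window is gap-free and identical enough."""
    if "-" in seq_a[start:end] or "-" in seq_b[start:end]:
        return False
    return aligned_identity(seq_a, seq_b, start, end) >= min_identity


# The four alignments from the slide, as (human, mouse) pairs.
alignments = [
    ("ACTTTCCTACATCTATCTATA", "ACTTTGATACATCTCTCTATA"),
    ("ACTTTGATACATCTATCTATA", "ACTTTGATACATCTCTCTATA"),
    ("-" * 5 + "GATACATCTATCTATA", "ACTTTGATAC" + "-" * 11),
    ("ACTTTGATACATCTATCTATA", "ACTTT" + "-" * 16),
]

SITE_START, SITE_END = 5, 14  # hypothetical site window, half-open
overall = [aligned_identity(h, m) for h, m in alignments]
conserved = [site_is_conserved(h, m, SITE_START, SITE_END) for h, m in alignments]
print(overall)
print(conserved)
```

Under these assumptions only the second alignment keeps the window intact in both species; the first has two mismatches inside it, and the last two place gaps across it.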

Figure 1 (panels A, B, C)

Input: Seq A, Seq B

New / pre-computed alignments: (1) blastz, (2) zPicture, (3) ECR Browser

Select transcription factors / matrix similarity:
• Biobase matrices
• User-defined consensus sequences

zPicture-rVista 2.0 interconnection

zPicture

rVista 2.0

ECR Browser: Tool for Browsing Genome Conservation Profiles

http://ecrbrowser.dcode.org

http://ecrbrowser.dcode.org

Grab ECR :: direct access to a conserved element

Genome Alignment: Align your sequence to a vertebrate genome

Genome Alignment

AC146831

Genome alignment: Output page

ECR Browser contains rVista portal

Figure 2 (panels A, B, C): Cardiac enhancer

Human GGAATGTCATTAATGCGCTGGGGAGACGTCCATTGGAGACAGGCGGCGTTATCCG
      |||||||||||||||||| ||||||||||||||||||||||||| ||||||||||
Mouse GGAATGTCATTAATGCGCCGGGGAGACGTCCATTGGAGACAGGCAGCGTTATCCG

Smad sites (two marked in the figure)

Wild-type: …AGACAGGCA…
Smad mutation: …AGCCCGGGA…

eShadow: Phylogenetic Shadowing of Closely Related Species

http://eshadow.dcode.org

eShadow: Phylogenetic Shadowing

http://eshadow.dcode.org

Phylogenetic shadowing on multiple (10-14) primate sequences

Apo-B

Plasminogen

LXR-alpha

CETP

Boffelli et al., Science, 2003
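Phylogenetic shadowing compares many closely related sequences and looks for regions that have accumulated less variation than their surroundings. The Python sketch below illustrates the idea with a plain per-column variation score and a moving average. It is a simplification for illustration only: the scoring is not eShadow's statistical model, and the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import List


def column_variation(alignment: List[str]) -> List[float]:
    """For each column, the fraction of sequences that differ from the most common character."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(1 - most_common_count / len(column))
    return scores


def smooth(values: List[float], window: int) -> List[float]:
    """Centered moving average, truncated at both ends of the list."""
    half = window // 2
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Toy alignment of four sequences (a real analysis would use 10-14 primates).
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAAC",
    "ACGTGC",
]
variation = column_variation(toy_alignment)
profile = smooth(variation, window=3)
print(variation)
print(profile)
```

Low values in the smoothed profile mark stretches with little variation across the aligned sequences. Gap characters are counted like any other character here, which is a further simplification.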

CREME: Using Microarray Data to Decode Genome Regulation

http://crem.dcode.org

TFBS in Promoter ECRs of RefSeq genes

Testing Motif Abundances

• Identify enriched motifs in a gene set relative to a background set.
• Take into account the length of promoters.
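The slide does not say which statistical test is used. As one possible illustration, the Python sketch below estimates a per-base-pair hit rate from the background set, scales it by the total promoter length of the gene set, and asks how surprising the observed hit count is under a Poisson model. The Poisson choice, the function names, and all numbers in the example are assumptions made for this sketch.

```python
import math
from typing import Tuple


def poisson_upper_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def motif_enrichment(fg_hits: int, fg_total_bp: int,
                     bg_hits: int, bg_total_bp: int) -> Tuple[float, float]:
    """Return (expected hits in the gene set, p-value for seeing at least fg_hits)."""
    rate_per_bp = bg_hits / bg_total_bp   # background motif density
    expected = rate_per_bp * fg_total_bp  # scaled by promoter length
    return expected, poisson_upper_tail(fg_hits, expected)


# Made-up numbers: 12 hits in 20 kb of gene-set promoters,
# against 300 hits in 2 Mb of background promoters.
expected_hits, p_value = motif_enrichment(12, 20_000, 300, 2_000_000)
print(expected_hits, p_value)
```

Working with hits per base pair, rather than hits per gene, is what lets promoters of different lengths be compared on the same footing.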

Filtering Similar PWMs

• TRANSFAC contains many redundancies:
– Different PWMs for the same TF.
– Similar PWMs for TFs from the same family.

• Filtering strategy:
– For two PWMs that tend to co-occur in a very small window (4 bp), remove the less enriched one.
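The filtering strategy above can be sketched in Python as a greedy pass over PWMs ordered by enrichment. The slide does not define "tend to co-occur", so the sketch assumes it means that at least half of one PWM's sites have a site of the other PWM within the window on the same promoter. That 0.5 cutoff, the function names, and the toy PWM names and p-values are all hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (promoter id, position)


def cooccurrence_fraction(sites_a: List[Site], sites_b: List[Site], window: int = 4) -> float:
    """Fraction of sites in A that have a site of B within `window` bp on the same promoter."""
    if not sites_a:
        return 0.0
    near = sum(
        1 for prom_a, pos_a in sites_a
        if any(prom_a == prom_b and abs(pos_a - pos_b) <= window
               for prom_b, pos_b in sites_b)
    )
    return near / len(sites_a)


def filter_redundant_pwms(sites: Dict[str, List[Site]], pvalues: Dict[str, float],
                          window: int = 4, min_fraction: float = 0.5) -> List[str]:
    """Keep PWMs from most to least enriched, dropping any that shadows a kept PWM."""
    kept: List[str] = []
    for name in sorted(sites, key=lambda n: pvalues[n]):  # smallest p-value first
        redundant = any(
            max(cooccurrence_fraction(sites[name], sites[other], window),
                cooccurrence_fraction(sites[other], sites[name], window)) >= min_fraction
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data: PWM_B hits almost the same positions as the more enriched PWM_A.
toy_sites = {
    "PWM_A": [("g1", 100), ("g2", 250), ("g3", 40)],
    "PWM_B": [("g1", 102), ("g2", 251), ("g3", 300)],
    "PWM_C": [("g1", 400), ("g2", 10)],
}
toy_pvalues = {"PWM_A": 1e-5, "PWM_B": 1e-3, "PWM_C": 1e-4}
kept_pwms = filter_redundant_pwms(toy_sites, toy_pvalues)
print(kept_pwms)
```

In the toy data PWM_B is removed because two of its three sites fall within 4 bp of PWM_A sites and PWM_A has the smaller p-value.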

Human Cell Cycle

16 enriched PWMs

1089 modules

336 genes, Whitfield et al., 2002

7 significant modules

5 coherently expressed

E2F, NFY, CREB…

Human Cell Cycle

DELTAEF1, EVI1, GR: 11 genes, p=0.01

Validation on a known module

• NFAT-AP1:
– 10 known genes containing multiple regulatory elements. In all of them, NFAT is upstream of AP1.
– CREME reported only the correct module (p=0.01).
– CREME correctly identified the orientation of the TFBS.
– The module was identified even after adding 10 random promoters to the gene set.

Colleagues and collaborators

Lawrence Livermore National Laboratory

UC Berkeley

Stanford

Lawrence Berkeley National Laboratory

Pennsylvania State University

www.dcode.org

Gaby Loots

Lisa Stubbs

Roded Sharan

Asa Ben-Hur

Ross Hardison

Webb Miller

Marcelo Nobrega

Dario Boffelli

Sha Hammond