http://cs273a.stanford.edu [Bejerano Fall10/11] 1 HW1 Due This Fri 10/15 at noon. TA Q&A: What to ask, How to ask

Post on 21-Dec-2015


Page 1

http://cs273a.stanford.edu [Bejerano Fall10/11] 1

HW1 Due This Fri 10/15 at noon.

TA Q&A: What to ask, How to ask

Page 2

Lecture 8

Chains & Nets cont’d

Genome Reconstruction

Genomic Conservation & Function

Page 3

Mutations

• Chromosomal mutations:
  – Deletion
  – Inversion
  – Translocation
  – Duplication
  – (Nondisjunction)
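To make the four rearrangement types concrete, here is a minimal sketch that models a chromosome as a list of signed block IDs (a positive ID for a block in forward orientation, a negative ID for the same block reverse-complemented), a common abstraction in genome-rearrangement work. The function names and the half-open `[i, j)` index convention are assumptions for illustration. Nondisjunction changes chromosome copy number rather than the order of blocks within a chromosome, so it is not modeled here.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed block IDs; -b means block b in reverse orientation


def deletion(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Remove blocks chrom[i:j]."""
    return chrom[:i] + chrom[j:]


def inversion(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the order of chrom[i:j] and flip the orientation of each block."""
    return chrom[:i] + [-b for b in reversed(chrom[i:j])] + chrom[j:]


def tandem_duplication(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Insert a second copy of chrom[i:j] immediately after the original."""
    return chrom[:j] + chrom[i:j] + chrom[j:]


def reciprocal_translocation(a: Chromosome, b: Chromosome,
                             i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Swap the tail of chromosome a (from index i) with the tail of b (from index j)."""
    return a[:i] + b[j:], b[:j] + a[i:]


if __name__ == "__main__":
    chrom = [1, 2, 3, 4, 5]
    print(inversion(chrom, 1, 4))           # [1, -4, -3, -2, 5]
    print(deletion(chrom, 1, 3))            # [1, 4, 5]
    print(tandem_duplication(chrom, 1, 3))  # [1, 2, 3, 2, 3, 4, 5]
    print(reciprocal_translocation([1, 2, 3], [4, 5, 6], 1, 2))  # ([1, 6], [4, 5, 2, 3])
```

Signed blocks are used so that an inversion is visible even when the block order alone would look unchanged.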


Page 4

Gene Families

Orthologs: Genes related via speciation (e.g. C,M,H3)

Paralogs: Genes related through duplication (e.g. H1,H2,H3)

Homologs: Genes that share a common origin (e.g. C,M,H1,H2,H3)

[Figure: species tree and gene tree descending from a single ancestral gene, with speciation, duplication and loss events marked.]
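The definitions above can be read off a reconciled gene tree: two genes are orthologs if their most recent common ancestor is a speciation node, and paralogs if it is a duplication node. The sketch below encodes one hypothetical tree, assumed here for illustration and not taken from the figure, that reproduces the slide's example groupings (two early duplications, with the extra copies lost in the C and M lineages). The `Node` class, `make`, `lca` and `relation` helpers are likewise hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes, None for leaves
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def make(name: str, event: Optional[str] = None, *children: Node) -> Node:
    node = Node(name, event, list(children))
    for child in children:
        child.parent = node
    return node


def path_to_root(node: Optional[Node]) -> List[Node]:
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a: Node, b: Node) -> Node:
    """Lowest common ancestor: first node on b's root path that is also on a's."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def relation(a: Node, b: Node) -> str:
    return "ortholog" if lca(a, b).event == "speciation" else "paralog"


# Hypothetical gene tree: duplications d1, d2, then speciations s1, s2.
C, M, H1, H2, H3 = (make(n) for n in ("C", "M", "H1", "H2", "H3"))
s2 = make("s2", "speciation", M, H3)
s1 = make("s1", "speciation", C, s2)
d2 = make("d2", "duplication", H2, s1)
d1 = make("d1", "duplication", H1, d2)

if __name__ == "__main__":
    print(relation(C, H3))   # ortholog
    print(relation(H1, H2))  # paralog
```

All five genes share the root, so every pair is homologous; the event at the pair's LCA decides between the two finer labels.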

Page 5

Chaining (Paralogs)

Protease Regulatory Subunit 3

Page 6

Netting (Ortholog)

Page 7

Convert / LiftOver

"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process.

LiftOver – batch utility
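The core of a liftOver-style conversion is walking the aligned blocks of a chain. The sketch below parses one chain in the UCSC chain text format (a `chain` header line `score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, then `size dt dq` lines, then a final `size` line) and maps a single 0-based target position onto the query assembly. It is a simplified illustration, not the liftOver tool itself: it assumes both strands are `+`, handles a single chain, and returns `None` for positions that fall in a chain gap. The chain in the usage example is made up.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (target start, query start, size), 0-based


def parse_chain(text: str) -> Tuple[str, str, List[Block]]:
    """Parse one chain: header, then 'size dt dq' lines, then a final 'size' line."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain" or header[4] != "+" or header[9] != "+":
        raise ValueError("this sketch only handles a single '+'/'+' chain")
    t_name, q_name = header[2], header[7]
    t, q = int(header[5]), int(header[10])  # tStart, qStart
    blocks: List[Block] = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t, q, size))
        if len(fields) == 3:  # gap after this block: dt bases in target, dq in query
            t += size + int(fields[1])
            q += size + int(fields[2])
    return t_name, q_name, blocks


def lift(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a target position to the query, or None if it is not in an aligned block."""
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None


if __name__ == "__main__":
    chain = """chain 1000 chr1 1000 + 100 160 chrX 2000 + 500 565 1
20 5 10
35
"""
    _, _, blocks = parse_chain(chain)
    print(blocks)             # [(100, 500, 20), (125, 530, 35)]
    print(lift(blocks, 110))  # 510
    print(lift(blocks, 122))  # None (inside the 5-base target gap)
```

Positions inside a gap have no counterpart in this chain, which is one reason batch conversion reports some regions as unmapped.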

Page 8

Net highlights rearrangements

A large gap in the top level of the net is filled by an inversion containing two genes. Numerous smaller gaps are filled in by local duplications and processed pseudogenes.
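The gap filling described here follows from the greedy way nets are built: chains are considered from highest to lowest score, and each one claims whatever target bases are still unclaimed. The sketch below is a deliberately flattened version of that idea, assuming each chain is given only by a score and its aligned target intervals; real nets are hierarchical (gaps within gaps) and also track query coordinates, which this sketch omits. Chain names, scores and coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Chain = Tuple[str, int, List[Tuple[int, int]]]  # (name, score, aligned target intervals [start, end))


def greedy_net(chains: List[Chain], target_len: int) -> List[Optional[str]]:
    """Assign each target base to the highest-scoring chain that covers it."""
    owner: List[Optional[str]] = [None] * target_len
    for name, _score, intervals in sorted(chains, key=lambda c: c[1], reverse=True):
        for start, end in intervals:
            for pos in range(start, end):
                if owner[pos] is None:  # only fill bases no better chain has claimed
                    owner[pos] = name
    return owner


def runs(owner: List[Optional[str]]) -> List[Tuple[int, int, Optional[str]]]:
    """Collapse per-base owners into (start, end, chain) runs."""
    out = []
    start = 0
    for i in range(1, len(owner) + 1):
        if i == len(owner) or owner[i] != owner[start]:
            out.append((start, i, owner[start]))
            start = i
    return out


if __name__ == "__main__":
    chains = [
        ("top", 1000, [(0, 40), (60, 100)]),  # long chain with a gap at 40-60
        ("inv", 300, [(42, 58)]),             # e.g. an inversion inside the gap
        ("dup", 50, [(30, 70)]),              # weak chain, e.g. a local duplication
    ]
    for run in runs(greedy_net(chains, 100)):
        print(run)
```

Even the weak `dup` chain ends up owning the two 2-base slivers at 40-42 and 58-60, which is the behaviour behind the "nets relentlessly try to fill in gaps" drawback.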

Page 9

Drawbacks

• Nets relentlessly try to fill in gaps

Heuristic guess of best chain!

True dog ortholog just hasn’t been sequenced

Page 10

And Retrogenes

Page 11

Conservation Track Documentation

Page 12

Useful in finding pseudogenes

Ensembl and Fgenesh++ automatic gene predictions are confounded by numerous processed pseudogenes. The domain structure of the resulting predicted protein must be interesting!

[Figure: gene prediction ("gene pred.") track]

Page 13

Cautionary Note 2

Page 14

Same Region…

[Figure annotations: “same in all”, “the other fish”]

Page 15

A Rearrangement Hot Spot

Rearrangements are not evenly distributed. Roughly 5% of the genome is in hot spots of rearrangements such as this one. This 350,000-base region lies between two very long chains on chromosome 7.

Page 16

Drawbacks

• Inversions not handled optimally

Chains:
> > > > chr1 > > > > > > > chr1 > > >
< < < < chr1 < < < <
< < < < chr5 < < < <

Nets:
> > > > chr1 > > > > > > > chr1 > > >
< < < < chr5 < < < <

Page 17

Drawbacks

• High copy number genes can break orthology


Page 18

Self Chain

Page 19

From pairwise to multiple alignments


Page 20

Multidimensional DP

• Example: in 3D (three sequences x, y, z):

• 7 neighbors/cell

$$
F(i,j,k) = \max \begin{cases}
F(i-1,j-1,k-1) + S(x_i, y_j, z_k) \\
F(i-1,j-1,k) + S(x_i, y_j, -) \\
F(i-1,j,k-1) + S(x_i, -, z_k) \\
F(i-1,j,k) + S(x_i, -, -) \\
F(i,j-1,k-1) + S(-, y_j, z_k) \\
F(i,j-1,k) + S(-, y_j, -) \\
F(i,j,k-1) + S(-, -, z_k)
\end{cases}
$$
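The recurrence translates directly into a triple loop. The sketch below computes only the optimal score (no traceback), assuming a sum-of-pairs column score S with hypothetical match/mismatch/gap values that are not part of the slide. Each cell looks at the 7 predecessors given by the non-zero moves in {0,1}^3, so time and memory grow as the product of the three sequence lengths, which is why exact multidimensional DP quickly becomes impractical for more sequences.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scores


def pair_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def S(a: str, b: str, c: str) -> int:
    """Sum-of-pairs score of one alignment column (a, b, c); '-' marks a gap."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3_score(x: str, y: str, z: str) -> float:
    """Optimal 3-way global alignment score via the 7-neighbour recurrence F(i, j, k)."""
    NEG = float("-inf")
    F = [[[NEG] * (len(z) + 1) for _ in range(len(y) + 1)] for _ in range(len(x) + 1)]
    F[0][0][0] = 0
    moves = [d for d in product((0, 1), repeat=3) if d != (0, 0, 0)]  # the 7 neighbours
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for k in range(len(z) + 1):
                if i == j == k == 0:
                    continue
                best = NEG
                for di, dj, dk in moves:
                    if i < di or j < dj or k < dk:
                        continue  # predecessor would fall outside the table
                    column = (x[i - 1] if di else "-",
                              y[j - 1] if dj else "-",
                              z[k - 1] if dk else "-")
                    best = max(best, F[i - di][j - dj][k - dk] + S(*column))
                F[i][j][k] = best
    return F[len(x)][len(y)][len(z)]


if __name__ == "__main__":
    print(align3_score("AC", "AC", "AC"))  # 6: two fully matched columns
```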

Page 21

Progressive Alignment

• When evolutionary tree is known:

Align closest first, in the order of the tree. In each step, align two sequences x, y, or profiles p_x, p_y, to generate a new alignment with associated profile p_result.

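[Figure: guide tree over sequences x, y, z, w; x and y are aligned into profile p_xy, z and w into p_zw, and the two profiles into p_xyzw.]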

E.g.: Blastz – Multiz alignments, shown in the UCSC browser
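A minimal sketch of progressive alignment along a guide tree, assuming a nested-tuple tree such as `(("x", "y"), ("z", "w"))`, a sum-of-pairs column score, and hypothetical match/mismatch/gap values. Each internal node aligns the two child profiles with a Needleman-Wunsch DP in which whole columns are matched against whole columns. This is a generic textbook scheme for illustration only; it is not how Blastz or Multiz are implemented.

```python
from typing import List, Tuple, Union

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scores

Profile = List[str]  # equal-length aligned rows
Tree = Union[str, Tuple["Tree", "Tree"]]


def char_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def col_score(c1: str, c2: str) -> int:
    """Sum-of-pairs score between every residue of column c1 and every residue of c2."""
    return sum(char_score(a, b) for a in c1 for b in c2)


def align_profiles(p: Profile, q: Profile) -> Profile:
    """Global (Needleman-Wunsch) alignment of two profiles, column against column."""
    pc = ["".join(row[i] for row in p) for i in range(len(p[0]))]
    qc = ["".join(row[j] for row in q) for j in range(len(q[0]))]
    gp, gq = "-" * len(p), "-" * len(q)  # all-gap columns
    n, m = len(pc), len(qc)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + col_score(pc[i - 1], gq)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + col_score(gp, qc[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + col_score(pc[i - 1], qc[j - 1]),
                          F[i - 1][j] + col_score(pc[i - 1], gq),
                          F[i][j - 1] + col_score(gp, qc[j - 1]))
    cols, i, j = [], n, m  # traceback from the bottom-right cell
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + col_score(pc[i - 1], qc[j - 1]):
            cols.append(pc[i - 1] + qc[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + col_score(pc[i - 1], gq):
            cols.append(pc[i - 1] + gq)
            i -= 1
        else:
            cols.append(gp + qc[j - 1])
            j -= 1
    cols.reverse()
    return ["".join(col[r] for col in cols) for r in range(len(p) + len(q))]


def progressive(tree: Tree) -> Profile:
    """Align following the guide tree: leaves first, then their profiles, up to the root."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return align_profiles(progressive(left), progressive(right))


if __name__ == "__main__":
    print(align_profiles(["ACGT"], ["AGT"]))  # ['ACGT', 'A-GT']
```

A known limitation of this greedy scheme is that a gap placed in an early, closely related pair is never revisited when later profiles are added.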

Page 22

Anchor-based alignment


Example:

Page 23

Anchor-based alignment


E.g.: Enredo – Pecan alignments, shown in the Ensembl browser
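Anchor-based methods first fix confident matches and then align only the regions between them. A minimal sketch, assuming anchors are k-mers that occur exactly once in each of two sequences and that we want a single colinear (order-preserving) chain of them; anchors that break colinearity, such as those from a rearranged segment, are simply dropped. This shows only the general anchoring idea. Tools such as Enredo use more elaborate, rearrangement-aware procedures that are not reproduced here, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[int, int]  # (position in x, position in y)


def unique_kmers(s: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in s to its start position."""
    counts: Dict[str, int] = {}
    where: Dict[str, int] = {}
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        counts[word] = counts.get(word, 0) + 1
        where[word] = i
    return {w: where[w] for w, c in counts.items() if c == 1}


def find_anchors(x: str, y: str, k: int) -> List[Anchor]:
    ux, uy = unique_kmers(x, k), unique_kmers(y, k)
    return sorted((ux[w], uy[w]) for w in ux.keys() & uy.keys())


def colinear_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Longest subset of anchors increasing in both coordinates (simple O(n^2) DP)."""
    if not anchors:
        return []
    best = [1] * len(anchors)
    prev = [-1] * len(anchors)
    for b in range(len(anchors)):
        for a in range(b):
            increasing = anchors[a][0] < anchors[b][0] and anchors[a][1] < anchors[b][1]
            if increasing and best[a] + 1 > best[b]:
                best[b], prev[b] = best[a] + 1, a
    end = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    x, y = "AAACCCGGGTTT", "CCCGGGAAATTT"  # the AAA segment has moved in y
    anchors = find_anchors(x, y, 3)
    print(colinear_chain(anchors))  # [(3, 0), (4, 1), (5, 2), (6, 3), (9, 9)]
```

The regions between consecutive chained anchors would then be aligned with an ordinary DP.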

Page 24

Reconstruct the Boreoeutherian ancestor

Page 25

Ancestral Genome Reconstruction

Given:
- Genomic sequences of several mammals
- Phylogenetic tree

Find: The genomic sequence of all their ancestors

ARMADILLO TGCTACTAATATTTAGTACATAGAGCCCAGGGGTGCTGCTGAAAGTCTTAAAATGCACAGTGTAGCCCCTCCTCC

COW GCCTCTCTTTCTGCCCTGCAGGCTAGAATGTATCACTTAGATGTTCCAAATCAGAAAGTGTTCAGCCATTTCCATACC

HORSE GTCACAATTTAGGAAGTGCCACTGGCCTCTAGAGGGTAGAAGACAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCC

CAT GTCACAGTTTAGGGGGTACTACTGGCATCTATCGGGTGGAGGATAGGGATACTGATAATCATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCC

DOG GTCACAATTTGGGGGATACTACTGGCATCTAATGGGTAGAGGACAGGGATACTGATAATTGCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCC

HEDGEHOG GTCATAGTTTGATTATATGGGCTTCTTAGTAGACAAAGAAAAAGATGTTCTGGTAGTCATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTC

MOUSE GTCACAGTTTGGAGGATGTTACTGACATCTAGAGAGTAGACTTTAAAGATACTGATAGTCACCCCATTGTGCACCTCC

RAT GTCACAATTTGGAGGATGTTACTGGCATCTAGAGAGTAGACTTTAAGGACACTGATAATCATACTATGCTGCACTTCC

RABBIT ATCACAATTTGGGGAACACCACTGGCATCTCGGGTAGCAGGCCAGGCATGCTGGTAATTATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACC

LEMUR ATCACAATTGGGGGTGCCACGGTCCTCCAGTGGGTAGAGAACAGGGAGGCTGATAACCACCCTGCAGTGCACAGGGCAGTGCCCCACTCCCACCAC

MOUSE-LEMUR ATCACAGTTGGGGGATGCCACTGGCCTCAAGTGGGTAGAGAACAGGGAGGCTGAAAACCACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCC

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGAATGCTTATAATCATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCC

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAAAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTCGACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCC

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGTGGGGATGCTTATACTCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

Mutational operations:
• Small scale: substitutions, deletions, insertions (incl. transposons)
• Large scale: genome rearrangements, segmental/tandem duplications

All of it: functional, non-functional, introns, intergenic, repeats, everything*!

(*) Heterochromatin not included.

Page 26

Reconstruction algorithm

1) Identify orthologous regions in each species

Page 27

Reconstruction algorithm

2) Compute multiple genome alignment

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

• Goal: phylogenetic correctness
• Two nucleotides are aligned if and only if they have a common ancestor.

Page 28

Reconstruction algorithm

3) Reconstruct insertion/deletion history

• Find the most likely explanation for the gaps observed



Page 30

Reconstruction algorithm

3) Reconstruct insertion/deletion history

• Find the most likely explanation for the gaps observed

• This defines the presence/absence of a base at each position of each ancestor


NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN----------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
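One simple way to turn the gap pattern of a single alignment column into an ancestral presence/absence call is Fitch parsimony on a binary character (1 = base present, 0 = gap). This is a stand-in assumed for illustration: the slides ask for the most likely explanation of the gaps, which a full method would obtain by modelling whole insertion and deletion events across adjacent columns rather than scoring each column independently. The rooted binary tree and the leaf states below are hypothetical.

```python
from typing import Set, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # leaf: 1 (base) or 0 (gap); internal: pair of subtrees


def fitch(tree: Tree) -> Tuple[Set[int], int]:
    """Bottom-up Fitch pass: candidate ancestral states and minimum number of changes."""
    if not isinstance(tree, tuple):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


if __name__ == "__main__":
    # ((human, chimp), (mouse, rat)) plus an outgroup; only the rodents show a gap
    column = (((1, 1), (0, 0)), 1)
    print(fitch(column))  # ({1}, 1): base present at the root, one change (a rodent deletion)
```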

Page 31

Reconstruction algorithm


GTCACAATTTGGGGGATGCTACTGGCAT-----C-TAGTG-GGTAGAG-AA-CAGGGATGCTGATAATC----------ATCCTACAGTGCACAGGACAGTGCCCCCACCCCCACTCCAACAACAAAGAATTATCCGGCCCAAAATGCCAATA--------GT--GCCCAGG

4) Infer the maximum-likelihood nucleotide at each position

• Ancestral sequences are inferred!
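For step 4, a minimal sketch of maximum-likelihood inference for one alignment column, assuming the Jukes-Cantor (JC69) substitution model, a rooted tree with known branch lengths, equal base frequencies, and Felsenstein's pruning algorithm for the conditional likelihoods. It returns the most likely base at the root only (marginal reconstruction), and it assumes the column has no gaps among the leaves. The model, tree encoding and branch lengths are assumptions for illustration, not the method used to produce the slide.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"
Tree = Union[str, List[Tuple["Tree", float]]]  # leaf: observed base; internal: [(child, branch length), ...]


def jc_prob(a: str, b: str, t: float) -> float:
    """JC69 probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def conditional_likelihoods(node: Tree) -> Dict[str, float]:
    """Felsenstein pruning: P(leaf data below node | node has base a), for each a."""
    if isinstance(node, str):
        return {a: 1.0 if a == node else 0.0 for a in BASES}
    like = {a: 1.0 for a in BASES}
    for child, t in node:
        child_like = conditional_likelihoods(child)
        for a in BASES:
            like[a] *= sum(jc_prob(a, b, t) * child_like[b] for b in BASES)
    return like


def ml_root_base(tree: Tree) -> str:
    like = conditional_likelihoods(tree)
    return max(BASES, key=lambda a: 0.25 * like[a])  # uniform prior over root bases


if __name__ == "__main__":
    # Hypothetical column: two close leaves read C, a more distant leaf reads T
    tree = [([("C", 0.05), ("C", 0.05)], 0.1), ("T", 0.3)]
    print(ml_root_base(tree))  # C
```

Positions called absent in the ancestor by step 3 would simply be skipped before this step.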

Page 32

Reconstructing Cancer Genomes

• Gleevec™ (Novartis, 2001) targets the BCR-ABL oncogene.

• Rearrangements change gene structure and the regulatory “wiring” of the genome.

• They create “bad” novel fusion genes and break “good” old genes.

• Example: translocation in leukemia.

[Figure: a translocation between chromosome 9 (ABL gene) and chromosome 22 (BCR gene) places ABL downstream of the BCR promoter, creating the BCR-ABL fusion oncogene.]

Page 33

Complex Tumor Genomes

1) What are the detailed architectures of tumor genomes?

2) What rearrangements/duplications produce these architectures and what is the order of these events?

3) What are the novel fusion genes and old “broken” genes?

Page 34

Tumor Genomes Projects

1) Identify recurrent aberrations

2) Identify the temporal sequence of aberrations

3) Use these data for tumor diagnostics and therapeutics

[Figure: a human genome giving rise, through mutation and selection, to several distinct tumor genomes.]

Page 35

Meet Your Genome cont’d

[Human Molecular Genetics, 3rd Edition]

Page 36

Sequence Conservation implies Function

• (but which function/s?...)

[Figure: human and another species, descended from a common ancestor; the stretch conserved between them is labeled “functional region!”]

...CTTTGCGA-TGAGTAGCATCTACTATTT...

...ACGTGGGACTGACTA-CATCGACTACGA...

Comparative Genomics of Distantly related species:

Note: the inverse, “no conservation, no function”, is a much weaker statement given current knowledge.
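A minimal sketch of how conservation is often first screened in a pairwise alignment: slide a fixed-size window along the aligned rows and report the fraction of identical, ungapped columns. The window size and any threshold for calling a window conserved are assumptions here; in practice the threshold has to be judged against the neutral substitution rate expected between the two species being compared.

```python
from typing import List


def windowed_identity(a: str, b: str, window: int) -> List[float]:
    """Fraction of identical, ungapped columns in each window of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    if not 0 < window <= len(a):
        raise ValueError("window must be between 1 and the alignment length")
    same = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    return [sum(same[i:i + window]) / window for i in range(len(same) - window + 1)]


def conserved_windows(a: str, b: str, window: int, threshold: float) -> List[int]:
    """Start columns of windows whose identity meets a (hypothetical) threshold."""
    return [i for i, f in enumerate(windowed_identity(a, b, window)) if f >= threshold]


if __name__ == "__main__":
    print(windowed_identity("ACGT", "ACGA", 2))  # [1.0, 1.0, 0.5]
```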

Page 37

Vertebrates: what to sequence?

[Human Molecular Genetics, 3rd Edition]

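[Figure: vertebrate phylogeny with human marked “you are here”, species including opossum, lizard and stickleback, and evolutionary distances labeled “too close”, “sweet spot” and “too far”.]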

Which species to compare to?

Too close and purifying selection will be largely indistinguishable from the neutral rate.

Too far and many functional orthologs will diverge beyond our ability to accurately align them.

Page 38

The Dawn of Whole Genome Comparative Genomics

2001 2002

40% DNA alignable

95% coding genes shared

Page 39

More Species Have Joined Since