Post on 18-Jan-2018

TRANSCRIPT

Of Sea Urchins, Birds and Men
Algorithmic Functions of Computational Biology - Course 1 Professor Istrail

Darwin's Finches 2 and Coco

The Father of All Dot Plots

The Human Genome

The Synteny Problem
Between distant species: can reveal function. Conservation reveals selective pressure.
Between near species: conservation reveals evolutionary history.
Between similar or the same species: recent events in subpopulations, phenotypic differences.

Matching, Chaining, Extension
Extension Phase
Chaining Phase
Matching Phase

Dot Plots 101
a, b, c, d stand for letters
A, B, C, D for words
Where letters match, put a dot
Where words match, put a line (words can be reverse-complemented, "rc-ed")

Dot Plots 101
When words line up
Reversed
Misplaced
Something gained (relative to horizontal)
Something lost (relative to horizontal)

Some large reversals in GP

NCBI has more of the centromere than anyone else (or is that Ns?)

Many reversals in GP, a piece of the end is re-ordered to the middle, Celera assemblies boringly good.

Again everyone misses the first 10MB (or are those Ns) of NCBI31

Rube Goldberg's Innovation

GENOMIC REGULATORY SYSTEMS
Mixed character of the problem: continuous mathematics, discrete mathematics

Open window (A) and fly kite (B). String (C) lifts small door (D) allowing moths (E) to escape and eat red flannel shirt (F).
As weight of shirt becomes less, shoe (G) steps on switch (H), which heats electric iron (I) and burns hole in pants (J). Smoke (K) enters hole in tree (L), smoking out opossum (M), which jumps into basket (N), pulling rope (O) and lifting cage (P), allowing woodpecker (Q) to chew wood from pencil (R), exposing lead. Emergency knife (S) is always handy in case opossum or the woodpecker gets sick and can't work.
Rube Goldberg's Pencil Sharpener invention

A Tale of Two Networks
Sea Urchin
Drosophila

A Proposal for Nobel Prize
Programs built into the DNA of every animal.
Eric H. Davidson
Genomic Regulatory Systems
One gene, 30 years of study, 300 docs and postdocs

The Dogma

Genomic Regulatory Regions

TF Binding Site Complexity

Genome Complexity
1 Billion DNA bases
20,000 Genes

cis-Regulatory Modules Complexity
200,000 cis-Modules

The DNA program that regulates the expression of endo16 in sea urchin
THE FIRST GENE
THE FIRST NETWORK

The View from the Genome

The View from the Nucleus

Building Protein-DNA Assemblies
Inter-cismodule linkage
Insulation
Communication
cismodule
DNA
Cooperativity
Linear-amp
Gates
Potentiality

The Building Blocks
Protein
Free Energy
DNA
Protein-DNA Binding (free energy)
Free energy is the GLUE

Information Processing

Boolean Circuit
Synchronous input and output
Completely defined gates

Boolean Circuit: synchronous input and output, completely defined gates
Boolinear Circuit: asynchronous input and output, incompletely defined gates
[Circuit diagram: gates OR, AND, NOT, OR]
IF (x1 = 1 AND x2 = 1) THEN ..
GTAGGATTAAG ... CATCCTAATTC . GTATCTAGAAG .

Web page : edu/~chyuh/cathy- mirsky-info.html
Caltech, Davidson Lab
October 2004

Introduction
SNPs, HAPLOTYPES

Single Nucleotide Polymorphism (SNP)
A SNP is a position in a genome at which two or more different bases occur in the population, each with a frequency >1%.
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
The most abundant type of polymorphism
The two alleles at the site are G and T

tttctccatttgtcgtgacacctttgttgacaccttcatttctgcattctcaattctatttcactggtctatggcagagaacacaaaatatggccagtggc
ctaaatccagcctactaccttttttttttttttgtaacattttactaacatagccattcccatgtgtttccatgtgtctgggctgcttttgcactctaatggcag
agttaagaaattgtagcagagaccacaatgcctcaaatatttactctacagccctttataaaaacagtgtgccaactcctgatttatgaacttatc
attatgtcaataccatactgtctttattactgtagttttataagtcatgacatcagataatgtaaatcctccaactttgtttttaatcaaaagtgttttggcc
atcctagatatactttgtattgccacataaatttgaagatcagcctgtcagtgtctacaaaatagcatgctaggattttgatagggattgtgtagaat
ctatagattaattagaggagaatgactatcttgacaatactgctgcccctctgtattcgtgggggattggttccacaacaacacccaccccccac
tcggcaacccctgaaacccccacatcccccagcttttttcccctgctaccaaaatccatggatgctcaagtccatataaaatgccatactatttgc
atataacctctgcaatcctcccctatagtttagatcatctctagattacttataatactaataaaatctaaatgctatgtaaatagttgctatactgtgtt
gagggttttttgttttgttttgttttatttgtttgtttgtttgtattttaagagatggtgtcttgctttgttgcccaggctggagtgcagtggtgagatcatagctt
actgcagcctcaaactcctggactcaaacagtcctcccacctcagcctcccaaagtgctgggatacaggtgtgacccactgtgcccagttatt
attttttatttgtattattttactgttgtattatttttaattattttttctgaatattttccatctatagttggttgaatcatggatgtggaacaggcaaatatggag
ggctaactgtattgcatcttccagttcatgagtatgcagtctctctgtttatttaaagttttagtttttctcaaccatgtttacttttcagtatacaagactttg
acgttttttgttaaatgtatttgtaagtattttattatttgtgatgttatttaaaaagaaattgttgactgggcacagtggctcacgcctgtaatcccagca
ctttgggaggctgaggcgggcagatcacgaggtcaggagatcaagaccatcctggctaacatggtaaaaccccgtctctactaaaaataga
aaaaaattagccaggcgtggtggcgagtgcctgtagtcccagctactcgggaggctgaggcaggagaatggtgtgaacctgggaggcgg
agcttgcagtgagctgagatcgtgccactgcattccagcctgcgtgacagagcgagactctgtcaaaaaaataaataaaatttaaaaaaag
aagaagaaattattttcttaatttcattttcaggttttttatttatttctactatatggatacatgattgatttttgtatattgatcatgtatcctgcaaactagct
aacatagtttattatttctctttttttgtggattttaaaggattttctacatagataaataaacacacataaacagttttacttctttcttttcaacctagactg
gatgcattttttgtttttgtttgtttgtttgctttttaacttgctgcagtgactagagaatgtattgaagaatatattgttgaacaaaagcagtgagagtgg
acatccctgctttccccctgattttagggggaatgttttcagtctttcactatttaatatgattttagctataggtttatcctagatccctgttatcatgttga
ggaaattcccttctatttctagtttgttgagattttttaattcatgtgattgcgctatctggctttgctctca
tctc gaga gaga gaga gaga gaga gcgc gcgc gcgc tctc gaga gaga gaga gaga gaga tctc tctc tctc tctc gaga gaga gaga tctc gcgc tctc tctc tctc

Human Genome contains ~ 3 G basepairs arranged in 46 chromosomes.
Two individuals are 99.9% the same, i.e., they differ in ~ 3 M basepairs.
SNPs occur once every ~600 bp
Average gene in the human genome spans ~27 Kb
~50 SNPs per gene

Two individuals
G C T C G A C A A C A G
G T T C G T C A A C A G
Haplotypes: C A G and T T G
SNP

Haplotype Mutations
Infinite Sites Assumption: Each site mutates at most once

Haplotype Pattern
C A G T T T G A C A T G C T G T
At each SNP site label the two alleles as 0 and 1.
The choice which allele is 0 and which one is 1 is arbitrary.

G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A

Recombination
G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A
The two alleles are linked, i.e., they are traveling together.
Recombination disrupts the linkage

Recombination

Variations in Chromosomes Within a Population
Common Ancestor
Emergence of Variations Over Time
time
present
Disease Mutation

Linkage Disequilibrium (LD)
Time = present
2,000 gens. ago
Disease-Causing Mutation
1,000 gens. ago

Extent of Linkage Disequilibrium

A Data Compression Problem
Select SNPs to use in an association study
Would like to associate single nucleotide polymorphisms (SNPs) with disease.
Very large number of candidate SNPs
Chromosome-wide studies, whole-genome scans
For cost effectiveness, select only a subset.
Closely spaced SNPs are highly correlated
It is less likely that there has been a recombination between two SNPs if they are close to each other.

Disease Associations

Association studies
Disease, Responder; Control, Non-responder
Allele 0, Allele 1
Marker A is associated with Phenotype
Marker A: Allele 0 = Allele 1 =
Evaluate whether nucleotide polymorphisms associate with phenotype
TA GA A
CG GA A
CG TA A
TA TC G
TG TA G
TG GA G

Data Compression
ACGATCGATCATGAT
GGTGATTGCATCGAT
ACGATCGGGCTTCCG
ACGATCGGCATCCCG
GGTGATTATCATGAT

A------A---TG--
G------G---CG--
A------G---TC--
A------G---CC--
G------A---TG--

Haplotype Blocks based on LD (Method of Gabriel et al. 2002)

Selecting Tagging SNPs in blocks

Real Haplotype Data
Two different runs of the Gabriel et al. block detection method + Zhang et al. SNP selection algorithm
Our block-free algorithm
A region of Chr Caucasian samples
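
The "Dot Plots 101" rules (a dot where letters match, a line where words match, words possibly reverse-complemented) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's matching phase: the function names, the fixed word length `k`, and the `"+"`/`"-"` strand labels are assumptions made for the example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[c] for c in reversed(seq))


def dot_plot(x, y):
    """Letter-level dot plot: one row per letter of y, one column per letter of x."""
    return ["".join("*" if a == b else "." for a in x) for b in y]


def word_matches(x, y, k):
    """Word-level matches: (i, j, strand) where the k-mer at x[i:] matches y[j:].

    strand is "+" for a direct match and "-" when the word in y matches
    the reverse complement of the word in x.
    """
    index = {}
    for i in range(len(x) - k + 1):
        index.setdefault(x[i:i + k], []).append(i)
    hits = []
    for j in range(len(y) - k + 1):
        word = y[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, "+"))
        for i in index.get(reverse_complement(word), []):
            hits.append((i, j, "-"))
    return hits


if __name__ == "__main__":
    for row in dot_plot("AB", "BA"):
        print(row)
    print(word_matches("AAACCC", "GGGTTT", 3))
```

Runs of "+" hits along a diagonal correspond to words that line up; runs of "-" hits along the anti-diagonal correspond to the "Reversed" case on the slide.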
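
The SNP slides (a site where aligned sequences differ, with the two alleles labeled 0 and 1) can be illustrated as follows. This is a sketch under stated assumptions: the sequences are already aligned and of equal length, the >1% population-frequency requirement is ignored, every site is treated as biallelic, and the allele carried by the first sequence is arbitrarily labeled 0. The function names are hypothetical.

```python
def snp_sites(sequences):
    """Indices (0-based) at which the aligned sequences do not all agree."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]


def binary_encode(sequences):
    """Return (sites, haplotypes) with each haplotype as a 0/1 string.

    The allele of the first sequence is labeled 0 at every site; as the
    slide notes, the choice of which allele is 0 is arbitrary.
    """
    sites = snp_sites(sequences)
    reference = sequences[0]
    haplotypes = [
        "".join("0" if s[i] == reference[i] else "1" for i in sites)
        for s in sequences
    ]
    return sites, haplotypes


if __name__ == "__main__":
    # The G/T example from the SNP definition slide.
    print(snp_sites(["GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"]))
    # The "two individuals" example, with spaces removed.
    print(binary_encode(["GCTCGACAACAG", "GTTCGTCAACAG"]))
    # Back-of-the-envelope figures from the slide.
    print(3_000_000_000 * 0.001)  # differing base pairs between two individuals
    print(27_000 / 600)           # SNPs in an average gene, in line with "~50"
```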
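
The slides state that closely spaced SNPs are highly correlated but do not give a formula. One common way to quantify this is with the standard pairwise measures D and r², computed from 0/1-encoded haplotypes. The sketch below uses those textbook definitions and is not taken from the slides; the function name `ld_stats` is hypothetical.

```python
def ld_stats(haplotypes, a, b):
    """Return (D, r2) between SNP columns a and b of 0/1 haplotype strings.

    D  = P(1 at a and 1 at b) - P(1 at a) * P(1 at b)
    r2 = D^2 / (pA * (1 - pA) * pB * (1 - pB))
    r2 is NaN when either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(h[a] == "1" for h in haplotypes) / n
    p_b = sum(h[b] == "1" for h in haplotypes) / n
    p_ab = sum(h[a] == "1" and h[b] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Perfectly linked sites: the alleles always travel together.
    print(ld_stats(["00", "00", "11", "11"], 0, 1))
    # All four combinations equally frequent: no association.
    print(ld_stats(["00", "01", "10", "11"], 0, 1))
```

In the first sample the two alleles always travel together, so r² is 1; in the second, recombination has produced every combination and r² is 0.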
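
The "Data Compression" slide keeps four columns of the five haplotypes and blanks the rest. The sketch below reproduces that masking and checks that the kept columns still tell all five haplotypes apart. It also includes a brute-force search for the smallest set of columns that distinguishes the haplotypes. That search is an illustrative criterion chosen for this example, not the block-based method of Gabriel et al. or the selection algorithm of Zhang et al. named on the slides; all function names are hypothetical, and the haplotypes are assumed to be distinct.

```python
from itertools import combinations

# The five haplotypes from the "Data Compression" slide.
HAPLOTYPES = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]

# 0-based columns kept on the slide (positions 1, 8, 12 and 13 when counting from 1).
SLIDE_COLUMNS = [0, 7, 11, 12]


def mask(haplotypes, keep):
    """Blank every column not in keep with '-'."""
    keep = set(keep)
    return [
        "".join(c if i in keep else "-" for i, c in enumerate(h))
        for h in haplotypes
    ]


def distinguishes(haplotypes, columns):
    """True if the chosen columns alone give every haplotype a unique pattern."""
    patterns = {tuple(h[i] for i in columns) for h in haplotypes}
    return len(patterns) == len(haplotypes)


def smallest_distinguishing_set(haplotypes):
    """Exhaustive search; exponential in general, fine for a toy example."""
    n = len(haplotypes[0])
    for size in range(1, n + 1):
        for columns in combinations(range(n), size):
            if distinguishes(haplotypes, columns):
                return columns
    return None


if __name__ == "__main__":
    for row in mask(HAPLOTYPES, SLIDE_COLUMNS):
        print(row)
    print(distinguishes(HAPLOTYPES, SLIDE_COLUMNS))
    print(smallest_distinguishing_set(HAPLOTYPES))
```

On this toy data the exhaustive search finds a two-column set, because one column carries three different bases. The slide keeps four columns, which suggests its selection follows a different criterion from the one sketched here.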