TRANSCRIPT
- Slide 1
- Tools for understanding the sequence, evolution, and function of the human genome. Jim Kent and the Genome Bioinformatics Group, University of California, Santa Cruz
- Slide 2
- The Goal: Make the human genome understandable by humans.
- Slide 3
- Step 1: Sequence the human genome.
- Slide 4
- Idealized Hierarchical Shotgun Sequencing
- Slide 5
- Mapping: 300,000 BAC clones were digested and run on agarose gels. Cari Soderlund's FPC and Wash U Pathfinders made fingerprint map contigs. Genetic and radiation hybrid maps placed contigs on chromosomes. Bob Waterston escaping management.
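To make the fingerprint idea concrete, here is a minimal sketch of counting the bands two clones share, on the assumption that clones with many shared restriction fragments probably overlap. It is not FPC's actual scoring; the function name `shared_band_count`, the band sizes, and the tolerance are all hypothetical.

```python
def shared_band_count(bands_a, bands_b, tolerance=7):
    """Count bands that match between two fingerprints.

    Two bands are treated as the same fragment when their sizes differ
    by at most `tolerance` (an arbitrary, illustrative value).
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = shared = 0
    # Walk both sorted lists in step, pairing each band at most once.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Made-up band sizes for two hypothetical clones.
clone_a = [1200, 2500, 4100, 8000]
clone_b = [1203, 2600, 4098, 9000, 12000]
shared = shared_band_count(clone_a, clone_b)
print(shared)
```

A real fingerprint map would turn counts like this into a statistical score before joining clones into contigs; the raw count is only the starting point.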
- Slide 6
- Sequence and Assembly: BAC clones shotgun sequenced at high throughput to 4x draft. Assembled with Phil Green's Phrap.
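As a rough illustration of what "4x draft" implies, the sketch below estimates the fraction of a clone touched by at least one read. It assumes the idealized Lander-Waterman model, in which reads land uniformly at random; that model is not stated on the slide, and the function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read.

    Assumes reads are placed uniformly at random, so the number of reads
    over any base is approximately Poisson with mean `coverage`.
    """
    return 1.0 - math.exp(-coverage)


draft = expected_fraction_covered(4.0)
print(f"{draft:.3f}")  # prints 0.982
```

Under this idealization roughly 2% of bases are left uncovered at 4x, which is one way to see why a draft still contains gaps that have to be closed later.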
- Slide 7
- GigAssembler: Jim Kent, David Haussler (meanwhile, Celera working on a whole-genome shotgun version)
- Slide 8
- The Truth: +-?++?+-?--?+-?+-?++?+-?--?+-? Keeping strands straight is the hard part. + light, - darkness.
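The +, -, and ? labels can be illustrated with a small sketch that decides which strand a fragment lies on relative to a reference sequence. It uses exact substring matching, which is a simplifying assumption (real assemblers work from alignments that tolerate errors), and the names `reverse_complement` and `relative_strand` are hypothetical.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def relative_strand(fragment, reference):
    """Return '+', '-', or '?' for a fragment relative to a reference.

    '+' : the fragment occurs as written
    '-' : only its reverse complement occurs
    '?' : neither occurs, so the orientation is unknown
    """
    if fragment in reference:
        return "+"
    if reverse_complement(fragment) in reference:
        return "-"
    return "?"


reference = "ATGGCCATTGTAATGGGCCGC"  # made-up example sequence
print(relative_strand("CCATTG", reference))  # prints +
print(relative_strand("CAATGG", reference))  # prints -
print(relative_strand("AAAAAA", reference))  # prints ?
```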
- Slide 9
- Finishing Sequence: Using primers at the ends of contigs to close gaps. Checking automatic assembly, especially near tandem repeats. Checking that the in-silico restriction digest of each BAC matches the actual digest. Time consuming: 1 year to draft the genome, 2 years to finish. Human is finished. Mouse will be finished (currently half finished). Other genomes may stay at draft stage, though draft stage can be very good these days.
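The in-silico digest check can be sketched as follows: predict fragment lengths from the assembled sequence, then compare them with the band sizes read off the gel. The slide does not name an enzyme, so EcoRI (site GAATTC, cutting after the first G) is used purely as an example. The sketch also assumes a linear molecule and ignores the cloning vector, and the function names and the 5% tolerance are hypothetical.

```python
def digest_fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Predict sorted fragment lengths for a linear sequence.

    `site` is the recognition sequence and `cut_offset` is how many bases
    into the site the enzyme cuts (1 for EcoRI: G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def digests_agree(predicted, observed, tolerance=0.05):
    """True if both digests have the same number of fragments and each
    predicted length is within `tolerance` (as a fraction) of the
    correspondingly ranked observed length."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= tolerance * o for p, o in pairs)


# Toy sequence with two EcoRI sites; real BACs are on the order of 100 kb or more.
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
predicted = digest_fragment_lengths(toy)
print(predicted)  # prints [5, 7, 14]
```

A mismatch between predicted and observed fragments would flag the assembly of that clone for manual review. A production check would also have to allow for small fragments that run off the gel and for bands too close to resolve.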
- Slide 10
- Now What? TGGCTTTTGAAGGGAGTTCTGTTTATATATACGTCAACATCCAGTTGGAGGTGAAAAGGTTAGCACTTGACCCAGGAAGTATCCATGTTTGTTTCAAAAA TAAATCTGCTTCATAAATTTCTTCATCAGTCTTTTTTTCCATTATGAGCTTTGATTATAATAAAGGAGCTGTTATTAACTTTTATTCAAGAAAAGGCCCA TCTCTTTGAAAATATTTACCACCCTTCTCCCTTTCCCCTCATGAAATGTGCCAACTTCATAGGAATTAACAAATTGTAGCCCAGCCAAATACACGGATGC TTAAGCATACCTGAAACTTGAGTATATTTATTTATTACAGACATCCTAAGACCCGTAAACTCTGCTCTGGATCATATCACTCCAGGATCTCAGAGCTGTT CATGATTGTACAGGAAATGGGGAATATCATAGGCTCACAAAGGATAACTGATAGAACTCAGTGTGGTACTTTGGGGACATCAAACATTGTGCGACATGCA AAAGACTATTCACGAATAACACAAAATATACATTCATTGTGCCATCCATCACATTAACAATTGAGCTGAAAATACATTATATCCAGCTAAGATAACTGTG GAAGGAAGAAATTGGTTTGAATAATACTTTTAGGTTCTGAATAACCCAGCACAAATTTTAAACAGAGGGTGGCCCGAGAAGAAAGGGGTAGAGATTGGGA AAGACTTAGCACAGGAAGCCGGGTTTCTGAAGTTTGTGCTCTGCAGGGCTTCTTAACTGTAAGAACAAATCAAGGCTACCCTCTGAGGCATCTGATTGGG TTTAAATGAGGGAATTTTTTCTTTCACCTATAAAATTGTACCAGTTTAGAGAGTTTGCCCACCCTGTTTTAGTAACCTAAACATTTCTAGAAAATCTGTA TAAAGATAAATCTCTTAGGACAAAGTATTTACAACCAGCAAACTCACACACATGAAAATGACTTAAATTAAGGGATGAATTAATTGTGTAAACATATAGT GCATCTCTTCTTCCTGAGCTCCTGGACTCGCCTTTCGCTATATCCTACTTTCAAGGACAAGGGAGGGGAGAGCTGTACATATAGTTAGATAAAAGATGAG AAGATTCCTTCTGGCATGTTTCTGTTGGCAAAGGGAACTATTTTCCAAAAGGTCATCTGAAAGGAACAGTAGGTTCTGTGAATTCTCCTAAAAGCAGGAG GGATGTTAAGGCCCACCAGAAAATGTATGCTGGCACCCAATCTGGATGAAGGTGTTAACCCCGCACCAAGTCTCTGGTCCAGAATTATCTGCAAATATAT TATCCTGGCCAGGAGCTCCCCAGATAGGATTAGAAAGGAAGAAAGAGACTGTAAATGGAAAGAAAGATAAGCTAAGCATGTGCTTTGGGTAAGAAGTCCC AGCCCAAGGAGATGCCTGGGCTGTTGTCTGGGGCTGGAGCCGCCTCAGTGGGAGGTAGTCAGAGTGTCTGAGGTAGAAGACCCCGGGGAAGGAACGCAGG GCGAAGAGCTGGACTTCTCTGAGGATTCCTCGGCCTTCTCGTCGTTTCCTGGCGGGGTGGCCGGAGAGATGGGCAAGAGACCCTCCTTCTCACGTTTCTT TTGCTTCATTCGGCGGTTCTGGAACCAGATCTTCACTTGGGTCTCGTTGAGCTGCAGGGATGCAGCGATCTCCACCCTGCGGGCGCGCGTCAGGTACTTG TTGAAGTGGAACTCCTTCTCCAGTTCCGTGAGCTGCTTGGTAGTGAAGTTGGTGCGCACCGCGTTGGGTTGACCCAGGTAGCCGTACTCTCCAACTTTCC CTGGGGCAAAGTGGGAAGCCATGAGACGGAAATGTAAAAATTTTTAAATCGACTTGAGATTCCCCACACGCTTCATGGCAACACTCAGGTAAAGAAAAGA 
TCAAGAACTCAGCACAAATCGGGCTGTGGAGGGTGAGTGATGAGGTGTAAAGTGTTAACCTGATGTAAACCATTAGCATGGTCAGACCGGTGATTAATGG AGCCTCAAGATATTAACAGAACACTACCGTCACAATAACCACCCCCACATACTTCCTATTTCCCAAATGTATAAAATCCTTGAAAACACACCAATCCCTG AGACTTCTTTGCCCCAACACCTCTGGGCACCCTCTCCATGCACTACAACACTAGTCTGATACAAAAGCCTTTTAAAAAAAAGATCATTATTAATTTCCTT GGAAATTAAGCATACCAGCTCCTTCCAGAATAATCAAGGAGCATCCACCAACCAGCAGGACTGACCTGTTTTGGGAGGGTTTCTTTTGACTTTCATCCAG TCAAAAGTCTGCGCTGGAGAAGATGTCTCCGATGCGGGGGAGCGACAGGCTTCTTGGTGGCTGGCGTGGAGAGGGGACAAGGAGTTATTATACGTAGCCA GGGCCAGGCTCTGGTGCTCCTGTCCATATGAGTGGTGAATGTATTGAGGCGAGCCCACCGCGCCCCCAGCATAACCCTGGTGGTGGTGGTGATGCTGGAC CATGGGAGATGAGAGATTTCCAGAGTAAACAGCGGGAGCGCACTGGGGGTACCCACCACTTACGTCTGCTTCCTGATTTAACGCGTAGGGGCTGTAAGGC GCACTGAAGTTCTGTGAGCCATAGCTTGGACCACAACTTGAGTGGGAGTAGGACACCCCCAGGTTCCCGGAAGTCTGGTAGGTAGCCGGCTGGGGGTGGC GATGGTGGTGGTGGTGGTGGTGGTGGGGCGAACCGATCTGCACCCCCCTGCCCACTAGGAAGCGGTCGTCGCCGCCGCAACTGTTGGCGCTGACCGCGCA CGACTGGAAAGTTGTAATCCTATGGTCCGAGGGGTAGGCTCGGGCTGAGCAGGTCCCCGAGTCGCCACTGCTAAGTATGGGGTATTCCAGGAAGGAGTTC ATTCTTGCATTGTCCATCTGTCACTGAGTGACCTGGTCCTGCGAAGCCCGGCGTGACTGTGCCAACTTTCTCACTTCCTC
- Slide 11
- Finding the Genes: Dr. Blat helping a gene find itself.
- Slide 12
- SIGLEC7 - a gene with some transcriptional complexity. Sialic Acid Binding/Ig-like Lectin 7 displayed in UCSC Genome Browser
- Slide 13
- Genes: Lines of Evidence. Full-length human mRNA (the best!). Protein homology with other species. EST evidence - 1st step for much mRNA. Evidence from genome/genome alignments. HMM-based gene finders.
- Slide 14
- Transferrin Receptor in UCSC Genome Browser
- Slide 15
- Clicking on a known gene brings up a large page of information on the gene. Transferrin
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Current state of the human genome: ~99% of human genome sequenced. Last 1% will still be a challenge. ~85% of human genes located. Substantial resources are being devoted to last 15%. ~20% of human genes with any depth of functional annotation. Curation and integrated database are key to progress.