Post on 21-Dec-2015
Summer 2005
Show Me the CpG Islands!
(With Statistical Significance)
Alicia Laughton (Mathematics ’06)
Jessica Minnier (Mathematics ’07)
Guided by Yung-Pin Chen (Mathematics/Statistics)
Outline
• DNA Overview
• CpG Islands
• Methods
– Traditional Method
– Our Method
• Future plans
DNA
• Deoxyribonucleic acid
• Double-helix
• Chain of nucleotide subunits
• Contains genetic information
Nucleotides
• Each is made up of a sugar, a phosphate, and a base
• Four bases
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)
• CpG represents a C directly followed by a G in the DNA sequence
Methylation
• Causes C to turn into T (a methylated C in a CpG tends to mutate into T)
• Accounts for low occurrence of CpG dinucleotides in vertebrates
– Expectation is 6.25% if bases were random and equally likely (1/4 × 1/4)
– Actually 1% of total sequence (Bird 1986)
Sequence AL031723
• Human DNA sequence on chromosome 16
• 3 known CpG Islands
• Percentage of Content:
– A - 22.7%
– C - 29.5%
– G - 28.3%
– T - 19.5%
– CpG - 3.1%
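As a rough check of the percentages above, the sketch below compares the listed CpG share with what the C and G frequencies alone would predict. The variable names are illustrative, and treating neighboring bases as independent is a simplifying assumption, not something stated on the slide.

```python
# Base frequencies for AL031723 as listed on the slide.
freq_c = 0.295
freq_g = 0.283
observed_cpg = 0.031

# Assumption: if neighboring bases were independent, a C would be
# followed by a G with probability freq_c * freq_g.
expected_cpg = freq_c * freq_g

# With all four bases equally likely the same product is 1/16 = 6.25%.
uniform_expected = 0.25 * 0.25

print(f"expected CpG share: {expected_cpg:.4f}")
print(f"observed / expected: {observed_cpg / expected_cpg:.2f}")
```

Under that assumption the sequence would carry roughly 8.3% CpG, so the listed 3.1% is well under half of the expected share even though this sequence is richer in CpG than the 1% vertebrate average.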
CpG Islands
• “regions of DNA with a high G + C content and a high frequency of CpG dinucleotides relative to the bulk genome” -- Gardiner-Garden and Frommer (1987)
CpG islands & Genes
[Figure: CpG islands (CpGi) relative to a gene, labeled from the 5’ end: promoter CpG islands, CpG islands in the gene body, and 3’ end CpG islands]
What is important about CpG Islands?
• Useful in identifying protein-coding regions (Yoon and Vaidyanathan, 2004)
– Associated with “housekeeping genes” and 40% of tissue-specific genes
• Aberrant methylation of CpG sites may cause silencing of tumor-suppressor genes (Deng, Zhou et al., 2002)
aggcccgcccgcctagatcacttgaggtcacccgttcactcagtggctgacagcatcccctaaatcagcccttcaccaattattgacagtgtgtcctcaaccaaaagtagtcctccctgctccctccctcccctgatgtaattacatctcttcccatctttatttattttttgagacggagtcttgctctgtcacccaggctggagtgtagtggtgcaatctcggctcactgcacctctgcctcccgggttcaagcgattttcctgcctcagcccccggagtagctgggattacaggtgcccgccaccacacccagctaatttttgtatttttagtagagacggggtttcgccatgttggccaggctggtctcgaactcctgacctcaggtgatccgcctgcctcagcctcccaaagtgctgggattacagacgtgagccactgcggctggcctctctccccgtctttaactgtagccctgtgaattctcatcagcctgggcctggactcagcaggccaaaaagttaccagcagagcccagcacatgtgaggaaagtcggagacgtggcggcgccggccggaggatccttcccaagaccctgggccgctgtggccccctagatcttgcaggttgccagggtgccaggccagggagggggcctttctgagattctcctcattctgacacaggagaggagggcactgacccagtcccaaggtcccgggggaatcagccgaccacagcccaggactgtcccacctgggcagagagcccattctgggtgcccagcccgggcaggcccaggcacccccagcagtgccccgggcagcacctgccagccaggtagtgcagggtgaggttgggcagggcagggcgtggtaggtcagctgagcaaacagctcggagggagagctggggagggctgggaactaggtcgatagaaacacagggactgtgttagggaggggatgccttgccagtcacgcccagccctgactcctgccctctgagggggcttcccccacccctgctgacagccccaggaccggcccctgccaggaggctgacctgccaggagtgaccgccccagacttgagcccttgggaggcaggttctgagtccccttttcctgctcagacccccagggaaacgcaggctgggccagaggcagctgcacagacccctgcagtggggtgctcggtggagagcgctggaggtgggagggaggatgtgtgaggcagcgggagagaatccaggcttcccccacaacacccaccatgagcggtgcagagtaggggtgggcggcacgggagccttcccaccccgcagaaccaggccctgggcagagctggcctacagacgataccggacaagtcctcctccgtcttggtgacagagggagctgggactccctccacccacccactgccacttcagaagcagccacagggagactgggaggggcaggggtgctggggatgagcgtggggctcagccctccctcttcccaccctggagggctgcctccttccagcccacctggaagggtggtgtcagtcccagagcccctgcactccccgccccacctcctgcagctggaacccgcgtgggagccgcacccagcgtcccagggacaaacacagaggccttgggtggtggcggtaccaaggtctgaggcctggcagctcaggggcacccccgtccctgagagaggtcaagaaggggaggcaccaccccccaccacgggacctcgctgacgatgcccatagagagaaaccaggccagtgctgggaggggaaagaccccaggcctcatgagaagtcactgcctgcttttcccctcggccaggaaggaagccccaggcccttccctcccgtctcgggcatactgaccccaggcaccaagcgagaccaggagcccacccctttcctttcccagatggcacaccagtgactctgaatatcggagcgcacccctgctccctgggaggcaggatatc
gtgccgctgctccctggggcgcacgataccctccccaggaaggcgccggtcagggcggacgggccagggtgctcaccggtaccaggcgaggccgcgctcgtagcacctgtcgaagaagtggggctcagagcccagcgcgcggacgtcggggtgcagccgcagaaactccagcagggcgcgcgtgccgcccttcttcacgccaacgatgagcgcttgcgggaagcgccggcggccgggaccgctggccaaaggcaggccgggtgctcccgggcggtggacggagctggacggctcggagggcgcgggggccggcgcgggggcgcgggcggccggcgggcagcggccggggagggcgcagaggcagtaggcgccgagcaccagggccacgagcagcatcggcgcgcgggacgcccgcagagcggccccttgcccggcccctgcgccctggccgcccccggccccgccgcccaggccgccgctacctgccatggggtcgcgccgctccaggcccgggagcgggggcagcaggcgggcgcgcatctcggcccgcgcgccgctcagtccgtgggtgcccggcttgtgctctgcgcccggcggtcccgcagcctgggagcgggcgcggggcgggaccgggggcggggtctggacgccctcccccctccccctcccccgcccactccgcctccgaggccactgcctgggctggacccgccggcagccgccaccacccgggcgcgactcgagctgccgggaccaccaggacgctcctgctccgagatcccaggccctggctcgcttgactccggcatcttcacctctgcgcggggaggatgcggcggcggtggccgttcgggacgcagggcagggacagggcggcgcgcgggcctcgggaccctctgtttgaagaccgatccccttccccccccaccccactccgggacgtgcgcggcaggtgcataggccaagccttggcctgcaggagcgggagcctcatcgccaggccaaggggacccaggaaaagcgtcgatccgggcactcggcctgccaagggagaaagaggccgggacagcaccctagtgtgcagagagggatcccagaacgtgtggggggagtctgcggccgggaatggcgtgcgcctcctcttcctgcctgctggagggaccagcaccaaaacaggaaagttcaccctgccaggccttctctccaaagagtcagagggagctccgtagggggatggggttcccggaccccctgccgtggaaggggagtgggaacacagacaggcggcaagggctttcgaggccccctcttgcacaaaccagctcagagatcggagatctttgggatcaattactttccctccccaggcatccgaagcctatcctagcccaggtgtggatgagggtgggagagacgggggaggagggagaggagcaggactggacccccgtgtgacaaacatctgacaagttgctctgaggactgcccccctccttgtggagcccacctcatctggtgtgcatttccctgcggctttcatccagccctgggcgaccctccctcctccatctcagcctccctcctcctgccccacacctcaggcctgggactcgcagatgccaaaagggcctggcagatgccaaagccagaaagtgcagggggactgcatcccccacaggagaccgggttcttccccactacatactcagaccccactccctgcacccactgctcttgcaaaccaggaactaaggggttcccctacccaccccgctccttgcctcctcttgcttttcttttgttttgtttgtttttgagacagagctgcactccagctgactcttgtcgcccaggctggagtgcagtggcacaatctcagctcactacaacctctgcctcccgggttcaagcgattctcctgcctcagcctcccaagtagctgggaatacaggcacccatcaccacgcctggctaatttttgtatttttagtagagatggg
gtttcaccatgttagtcaggctggtctcaaactcctgacctcaggtaatctgcccacctcagcctcccaaagggctgggattacaggcgtgagccactgtgccccaccctcctcttgcttttctaaaagatgatggtcaaagtacagcccccatttgcccccagacagggcacccttcccagatcgagaccttggggagtctgcgtgacccccacacctggcagacacaggtgcttcactagtgggggaacggctgagcatgtgctgagctcgggggcactagtgggctacagtccccaagtgggaggcccctcaagagcctggatgagctgactgacggtggagaggagggaaggagggcctatggccaaagtcaatccaggacccaactgccgaggccacaggaaggccgggtcaccgcctggaactaggtcggtcacagcccagtgggagccgtggcccggagactcaactgggggccctggttactctgctcgcctccccgcgtcggcacccagaacagagcttgcaggcactgggggcccagtccagggtctcaagagcagacaatgctgccttgcagttggggaaactgagacagggtgagaactttcagaggctcattgcaggctcctagcaggctgaaaggacggaggcacaggcacctaggagcacaccagccccacgtggccacggcccctcggagagcatgaggacacttgcaatgcggaagctcagcaggcccagctctactggctctgcaccgcccagtgaggggtcagcacagttggtccaagggacaataccagattaatgaggcagaagccacgggactgaccccttggaattctccacacccacactgtgcatccttaacccaaagcttctagcttggtagcccctcctaccctcctccctgcagcagggattagggatgcattctgacccctgcctgccgtcaggggagtgaggtctctccctggagcctgagctgaggatgcccaattcagccaggtgagccccgggatggactccatgtcccctagccaccacctgacttccccagcaccccacactggcaccagcccttcagatctcagaagcgagccaccctattctcacggagccccttcctgcctgccctccaaacccaagagtagttttagtacaaaaggcaaagttaacaaataggggtaggcgtcagggaaggaagaggatcagaggatcgggaacggagaaactggagcacctggagaagcgtctgggtcctgccacccccactgactccccaactggccttgggcagggtcctctctgcaggcgctgggtccaagcttggggatgagcagccaccagcgcgggctgcttcagctgaggctgccgcacccccacgtccatcctgggtagaggcaggacagccacagagccccatgcacggggctggactcaccctgggcactcacctaaaggcagtctcctcctttccaaagcccagactttctccggactcccaggaccaccaacaagggttcctgtgcgcagactcgggggtcttggggaggaaggacgctttctaggtggctgcctggaacctggaggcccctttctacagtacctggccagcggtcggtcacacctgagtgcccagagtgagcgggcggcagaggcatttctgacgctgccaggtaatcccacgggctggaaacgacctctgggctgggaagccaccgcctcccccagtcctgctgggtccctcagcagagagaacggaaccggggctttccccacagttttcaaagtttcagggaatcctagccaagtatcattccttcttccggagccgggaccccaggtcaagcctggggcccccacagggcggtcccaaccccactgcccggagcgcacccctgctccctgggaggcaggatatcgtgccgctgctccctggggcgcacgataccctccccaggaaggcgccggtcagggcggacgggccaggg
tgctcaccggtaccaggcgaggccgcgctcgtagcacctgtcgaagaagtggggctcagagcccagcgcgcggacgtcggggtgcagccgcagaaactccagcagggcgcgcgtgccgcccttcttcacgccaacgatgagcgcttgcgggaagcgccggcggccgggaccgctggcgtttccctcccaggggcccagtggtgaactgaattcaggcctgagacatactctgtctactaagtcaccccatctgcccagccttggtccacctggcactgcccagagacatcagtgatgcatttcggaagctggcaaagtggaccccactggagtacaaaggactcagggacccctgtgctggggaagagaaggagcccaggacctcccccaggggctgcctctgaggggcgtgagattcaggggcctctcgggtgggacctgcgggggccgctagacactgcgggaacttcacatccccaacgcccagcagcagcctgcagggaaggcaggggaggcgagccgggctcagagagggcgagcaacttgccccatccgaaggcaaaggtggtatgagacccgggtcctctctccacctctgccccagccttcctggccacagggctggcgccaggcaggcacggcacaggctcccggcagaggccacggtctcagccatccccacggtctcaggagtccccacggtctcagccgtccccacggtctgagtccccacggtctcagctgttcccacggtctcaggagtccccacaggttcagcagtccccacggtctcagccatccccacggtctcagccgtccccacagtctcagccatccccacggtctcagcagtccctactcaggacttgaaattccagcactggttccgtgatggctcctccagccccctgcccagcccagcatggtcatttccatctcctggcctttccgctgccgtctctctgctggatgctttatccttagtccccgctgagggcagaaggactttccaggaggaattgaccagaacgcagaacagcaggatgtggaatggactggggacagggagagagagatgcagggaccaggagtcggctcggagggttctcctggaagctgacccctccctccatcaggcactcggctgacggtggctacacacctcggggcgcccaggatggcagcactggggctgttcattcaccagtggatccccagcacctaacagagcctggcacgcagtggacattccattaatgtcgctcagtggaagggtatacgtgggaggagaggtcgggaaggctttctggaggtgacggccaggtgaagacgaggagaacagcattccaggccaaggaaccgtgtgggtgaaggctcagcagcagagagcccgggcagtagaggatggggtggagcttaaggccctgcgggaacaggggcggggcttagagtctggcctgaggctggtccagccccgcctcctcctcaggctcccaccaactctgagccaccagaccctcctttgtaaaatgaagacctcagtcatgactcgcatgagtctctgaagagtaacagctttattgtgatgtaattcacacaccactcaatccagccatttgtcgcatgcaaatcaatggttttcagtatattcatagtcgtgcaatcacaatcaattttagaacatttctatcaccccaaaaagaaatcctgtgtccattagcaatgacgccctcttctccccttcccacagcccctggcaaccacgaatctactttctgtctctatgggtttgcctattctggacatttcacaaaaagagaatcattgcttgaagccaggagttcaagaccaacctgggcaacaaagcgagaaccccgtctgtacaaaatattttaaatttagccaggcacagtggcgcacaccagtagtcccagcactttggaagtctgaggcaggaggttcacttgaggcggggaattcaaaaccagcctgggcaacat
agggagtaccagtctctacaaaaaatttcaaaatttgccaagcgtgatggtatgcacctatagtcctagcttactcaggaggctgaggtgggaggatcgcttgagcccaggagtacgaggctgcagtgagccatgatcataccactgcattccagcctgggcgacagagtgagagcccatctctaaaacagaaagaaagaaagaaagaaatatggccagtcacagtggctcatgcctgtaatcccagcattttgggaggccaaggcaggtggatcacttgaggtcaggagttcgagaccagcctggccaacatggtgaaaccctgtctctaccaaaaatacataaattagccaggtgtgggccaggcgccatggcttacacttgtaatcccagcactttgggaggccgaggtgggcagatcacctgaggttgagagttcgagaccagcctgaccaacatgaagaaaccctgtctctactaaaaatacaaaaaattagctgggtgtggtggtgcatgcctgtaatctcagctacttgggaggctgaggaaggagaatggcttgaacccgggaggcagaggttgtggtgagccgagatcgcgcgattgcactccagcctgggcaacaacagcaaaactccatctcaaataataataataataaattagccaggtgtggtggtgcacgcctgtagtcccagctactcgggaggctgaggcacaagaaacccttgaacccgggaggcagaggttgcagtgaagctgaaattgcaccattccactccagcctgggagacagagtgagacaccatctctaaaatgaaaaaaaaaaaagagaatcatacaatgttcgtccttttgtgtctgggtctcttactcagcatgttctccaggttcatcaacactgtggcatgtgccagtacctccttcctcttcctgactgagtaatactccatcgtatggatggaccaccttttgttgattccctcattcgttgatggacatctaggttgtttccactgcggggttcttagtaacggtattacagggaaccatagattaccaggtatt
How do you locate the CpG island in a DNA sequence?
agaattgcttgaaccgggaggcggaggttgcaatgagctgagatcacaccactgcactccagcatggtgacagagcaagactccatctcaaatcgagtaaaaaaaaaaaaatagctgggtgcggtggctcacgcctgtaatcccagcactttgggaggctgaggcgggtagatcacgaggtcaggagatcgtagccatcctggctaacacggtgaaaccccgtctctactaaaaatacaaaaagaaattagctgggcgtggtggtgggcgcctgtagtcccagctactcgggaggctgaggcaggagaatggcgtgaacccgggaggtggagcttgcagtgagtcgagatcacgccactgcactccagcctgggcgacagagcgagactcgatctcaaaaaaaaaaaaaaaaaaaaagtgcgacacgaggcacacagtcagtgcccagtggagttcgctgatatggttaccacatccctggggacagcgcctccaccctccaacctcgaggtttgtggaaaaatctgggtccaagctttatttcttaaatattcctctctgcccagcatgtgcacgcagcccgctctggccaggcgagcgggtgtcaatcaaggtgctgagcatccccagggtgccgctcagccccagccgaagtcctggcccgtcatctggtagaacctgcggttgaagggccggtagaactcctgcaggcgccggaccagggcctggggcacgcgtgggtgtggccggcccttggacttgcccaggcagcggggacggctgccgccctgggccttcttgaggcaggggaagcccttggtggcgttgaagtagaagtgcttgtccgtgacgacccgtttcaggcccaggaagtcctgcacgcggccgacctctccggccgggtcgctgaccagacgctccccgctgacgaacaggaagtgggacagggggaagtagcgcagccagtggtccaggtgctgggcgtacaggccgatgcggacggcgctccaggctgtgtccacggggcccaggccgtggcggaaggccagggcgcggaagctgggcaggcccggggtcttggagagcgtctgggcgtagtcggagatggcccgggtcacggggttccgcaccaccacgatcagcttcgtgtccggggacatggcgtggatgcggcggggggcctctcgcgtcacgaagtagctgggggtcttctccatggtgatctgcccatccagggttcggggcatcagactcctgcgggacgggtgcaaggagagggggcctgagcctccccagccctagaccggcccccaggggcccgggaccaaggcccccttatgcccgggaagcccaggcctccagggcgagcaagtcttcctccctgctcgggcccacccctgctagcgtgcgcggctgggcagcctggaacatggactgtgagggtgcccagcccggcacctgcctgcagcccggcctgttccgccggcctgccccgcctgctgctgcactgaggattagggtgacggtcgctggtcgggaggcccaaatgctcctcaccacccacatatcttccctgtgcaatccctgccgtcctcgcttccagagccagctccctcccaccggacccacactttcctggaactaggctgcccccagctcctttctcatcccagaccaagtaccccgaggcccgcccgcctagatcacttgaggtcacccgttcactcagtggctgacagcatcccctaaatcagcccttcaccaattattgacagtgtgtcctcaaccaaaagtagtcctccctgctccctccctcccctgatgtaattacatctcttcccatctttatttattttttg
C+G Content: 0.492 Observed/Expected: 0.548
C+G Content: 0.501 Observed/Expected: 0.568
C+G Content: 0.500 Observed/Expected: 0.560
200 steps later… C+G Content: 0.712 Observed/Expected: 0.604
600 steps later… C+G Content: 0.598 Observed/Expected: 0.421
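Just a couple formulas…
\[ \text{G+C content} = \frac{(\#\text{ of C's}) + (\#\text{ of G's})}{\text{length of window}} \]
\[ \text{Obs/Exp ratio} = \frac{\text{Observed \# of CpGs}}{\text{Expected \# of CpGs}} = \frac{\#\text{ of CpGs in window}}{\big((\#\text{ of C's}) \times (\#\text{ of G's})\big) / \text{length}} \]
(C and G counts and length in the expected part are taken from the window)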
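The two window statistics can be sketched in a few lines of Python. The function names `gc_content` and `obs_exp_ratio` are illustrative, and returning 0 for a window that contains no C or no G is an assumption made here to avoid dividing by zero.

```python
def gc_content(window: str) -> float:
    """Fraction of bases in the window that are C or G."""
    w = window.lower()
    return (w.count("c") + w.count("g")) / len(w)


def obs_exp_ratio(window: str) -> float:
    """Observed CpG count over the count expected from the window's own C and G totals."""
    w = window.lower()
    observed = w.count("cg")
    expected = w.count("c") * w.count("g") / len(w)
    # Assumption: a window with no C or no G gets ratio 0 instead of an error.
    return observed / expected if expected > 0 else 0.0


if __name__ == "__main__":
    toy = "acgtcgcg"
    print(gc_content(toy), obs_exp_ratio(toy))
```

For the toy window `acgtcgcg` there are three C's, three G's and three CpGs in eight bases, so the G+C content is 0.75 and the expected CpG count is 9/8.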
Traditional Methods
• Gardiner-Garden and Frommer (1987)
– Window size 100 bp and Shift size 1 bp
– Criteria
  • At least 200 base pairs
  • G + C content greater than 50%
  • Expected portion of the Obs/Exp ratio calculated over the window
  • Obs/Exp ratio greater than 0.6
• Takai and Jones (2002)
– Window size 200 bp and Shift size 1 bp
– Criteria
  • At least 500 base pairs
  • At least 7 CpG dinucleotides in 200 base pair sequence
  • G + C content greater than 55%
  • Obs/Exp ratio calculated in same fashion as above method
  • Obs/Exp ratio greater than 0.65
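A minimal sketch of a sliding-window scan using the Gardiner-Garden and Frommer thresholds listed above. The slides do not say how qualifying windows are joined into islands, so merging runs of consecutive qualifying windows is an assumption, and `find_candidate_islands` is a hypothetical name.

```python
def find_candidate_islands(seq, window=100, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Return (start, end) pairs, 1-based inclusive, built from runs of qualifying windows."""
    seq = seq.lower()
    regions = []
    current = None  # [start, end) of the run of qualifying windows being extended
    for i in range(len(seq) - window + 1):  # shift size of 1 bp
        w = seq[i:i + window]
        c, g, cpg = w.count("c"), w.count("g"), w.count("cg")
        gc = (c + g) / window
        expected = c * g / window  # expected part computed over the window
        ratio = cpg / expected if expected > 0 else 0.0
        if gc > min_gc and ratio > min_ratio:
            if current is None:
                current = [i, i + window]
            else:
                current[1] = i + window
        elif current is not None:
            regions.append(current)
            current = None
    if current is not None:
        regions.append(current)
    # Keep only merged regions that reach the minimum island length.
    return [(s + 1, e) for s, e in regions if e - s >= min_len]


if __name__ == "__main__":
    toy = "at" * 150 + "cg" * 150 + "at" * 150
    print(find_candidate_islands(toy))
```

The Takai and Jones variant could be approximated by passing `window=200, min_len=500, min_gc=0.55, min_ratio=0.65`; their additional requirement of at least 7 CpGs per 200 bp is not covered by this sketch.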
The Traditional Method
[Figure: C+G content and Obs/Exp ratio plotted against base position, sequence AL031723]
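Our Method
• Modifying the traditional methods
– Window size 200 bp and Shift size 1 bp
– Expected portion of the Obs/Exp ratio is based on whole sequence
• And….
\[ \text{Obs/Exp ratio} = \frac{\text{Observed \# of CpGs}}{\text{Expected \# of CpGs}} = \frac{\#\text{ of CpGs in window}}{\big((\#\text{ of C's}) \times (\#\text{ of G's})\big) / \text{length}} \]
(C and G counts and length in the expected part are taken from the entire sequence)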
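The modified ratio differs from the traditional one only in where the expected count comes from. The sketch below is one reading of that change; the function names are hypothetical and the handling of a sequence with no C or no G is an assumption.

```python
def whole_sequence_expected(seq: str) -> float:
    """Expected CpG count from the C and G totals of the entire sequence."""
    s = seq.lower()
    return s.count("c") * s.count("g") / len(s)


def modified_obs_exp(seq: str, window: int = 200) -> list[float]:
    """CpG count of each window (shift 1 bp) divided by the whole-sequence expectation."""
    s = seq.lower()
    n_windows = max(len(s) - window + 1, 0)
    expected = whole_sequence_expected(s)
    if expected == 0:
        # Assumption: report zeros when the sequence has no C or no G.
        return [0.0] * n_windows
    return [s[i:i + window].count("cg") / expected for i in range(n_windows)]


if __name__ == "__main__":
    toy = "cg" * 100 + "at" * 100
    ratios = modified_obs_exp(toy)
    print(ratios[0], ratios[-1])
```

Because the denominator is fixed for the whole sequence and grows with its length, the ratio for a long sequence is a small number that simply rescales the CpG count of each window.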
• Cutoffs greater than 97th percentile of observed sequence
Statistic            Obs/Exp Ratio   G+C Content
Mean                 0.0018          0.5815
Standard Deviation   0.0014          0.0818
97th percentile      0.0058          0.7350
[Figure: histograms of Obs/Exp ratio and G+C content (number of observations), sequence AL031723]
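A percentile cutoff of this kind can be sketched with Python's standard library. The helper names are illustrative, and `statistics.quantiles` uses its default "exclusive" interpolation here, so other percentile definitions may give slightly different cutoffs than the ones reported on the slide.

```python
import statistics


def percentile_cutoff(values, pct=97):
    """Return the pct-th percentile of the observed values."""
    # quantiles(n=100) returns the 99 cut points between the 100 percentile bins.
    return statistics.quantiles(values, n=100)[pct - 1]


def flag_above(values, pct=97):
    """Mark each observation that lies above the percentile cutoff."""
    cutoff = percentile_cutoff(values, pct)
    return [v > cutoff for v in values]


if __name__ == "__main__":
    observations = list(range(1, 101))
    print(percentile_cutoff(observations))
    print(sum(flag_above(observations)))
```

In use, `values` would hold one statistic (for example the Obs/Exp ratio) for every window position, and the flagged windows are the ones that exceed the sequence's own 97th percentile.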
Kullback-Leibler: 0.508 Our Obs/Exp: 0.0029 C+G: 0.492
Kullback-Leibler: 0.509 Our Obs/Exp: 0.0030 C+G: 0.501
Kullback-Leibler: 0.507 Our Obs/Exp: 0.0029 C+G: 0.500
200 steps later… Kullback-Leibler: 0.520 Our Obs/Exp: 0.0033 C+G: 0.712
600 steps later… Kullback-Leibler: 0.510 Our Obs/Exp: 0.0030 C+G: 0.598
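The slides report a Kullback-Leibler value for each window but do not say which two distributions are compared. Purely as an assumption, the sketch below compares the base composition of a window with that of a reference sequence; the function names and the pseudocount are hypothetical, and the code is not expected to reproduce the values shown above.

```python
import math
from collections import Counter


def base_distribution(seq, pseudocount=1.0):
    """Smoothed frequencies of a, c, g, t; the pseudocount keeps every probability positive."""
    counts = Counter(seq.lower())
    total = sum(counts[b] for b in "acgt") + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in "acgt"}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in p)


if __name__ == "__main__":
    window = base_distribution("c" * 96)          # assumed: composition of one window
    reference = base_distribution("acgt" * 24)    # assumed: composition of the whole sequence
    print(kl_divergence(window, reference))
```

The divergence is zero when the window matches the reference composition and grows as the window becomes more unusual, which is the property that would make it usable as a third window statistic.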
Our Method
[Figure: Kullback-Leibler divergence (×12), Observed/Expected ratio (×160), and C+G content plotted against base position, sequence AL031723]
Comparison of AL031723
Traditional Method
Possible CpG Islands
3878-4534
5849-6136
6541-6820
8479-8698
10745-11049
18435-19580
25131-26359
35182-35441
36245-36576
36827-37606
Actual CpG Islands
18928-19547
25201-26371
36997-37693
Comparison of AL031723
Our Method
Possible CpG Islands
19227-19435
25197-26147
36982-37420
Actual CpG Islands
18928-19547
25201-26371
36997-37693
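The two comparisons can be summarized by counting how many predicted regions overlap a known island. The coordinates below are copied from the slides; the `overlaps` and `summarize` helpers are illustrative, and treating any shared base as a hit is an assumption about how a match should be scored.

```python
actual = [(18928, 19547), (25201, 26371), (36997, 37693)]
traditional = [
    (3878, 4534), (5849, 6136), (6541, 6820), (8479, 8698), (10745, 11049),
    (18435, 19580), (25131, 26359), (35182, 35441), (36245, 36576), (36827, 37606),
]
ours = [(19227, 19435), (25197, 26147), (36982, 37420)]


def overlaps(a, b):
    """True when two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def summarize(predicted, known):
    """Return (predictions hitting an island, predictions hitting none, islands found)."""
    hits = [p for p in predicted if any(overlaps(p, k) for k in known)]
    found = [k for k in known if any(overlaps(p, k) for p in predicted)]
    return len(hits), len(predicted) - len(hits), len(found)


if __name__ == "__main__":
    print("traditional:", summarize(traditional, actual))
    print("ours:", summarize(ours, actual))
```

With these coordinates the traditional method finds all three islands but also reports seven regions that overlap none of them, while the modified method reports three regions, each overlapping one known island.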
Cons
• Traditional Method
– Criteria not stringent enough
– If the expected part of the Obs/Exp ratio is unusually high, then even a high CpG count may not bring the ratio above the cutoff
• Our Method
– Criteria sometimes too stringent
Future Plans
• CpG Islands
• Linkage Disequilibrium and SNPs
– Statistical analysis of the linkage disequilibrium coefficient
– Kullback-Leibler Divergence II