computational molecular biology biochem 218 – biomedical informatics 231 discovering...
TRANSCRIPT
![Page 1: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/1.jpg)
Computational Molecular BiologyBiochem 218 – BioMedical Informatics 231
http://biochem218.stanford.edu/
Discovering Transcription FactorBinding Sites in Co-Regulated Genes
Doug BrutlagProfessor Emeritus
Biochemistry & Medicine (by courtesy)
![Page 2: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/2.jpg)
Motivation
Searching for conserved sequencemotifs regulating the expression
MicroArray analysis of whole genome gene expression
Clustering of genes based on their expression pattern
![Page 4: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/4.jpg)
T Cells Signaling
DNA Damage
Fibroblast Stimulation
B Cells Signaling
CMV Infection
Anoxia
Polio InfectionMonocytes Signaling IL4
Hormone
Human Gene Expression Signatures
![Page 5: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/5.jpg)
Upstream Regions Co-expressed
Genes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC
Finding Transcription Factor Binding Sites
Pho 5
Pho 8
Pho 81
Pho 84
Pho …
Transcription Start
![Page 6: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/6.jpg)
Upstream Regions Co-expressedGenes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT
Finding Transcription Factor Binding Sites
![Page 7: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/7.jpg)
Upstream Regions Co-expressedGenes
ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT
Pho4 binding
Finding Transcription Factor Binding Sites
![Page 8: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/8.jpg)
Three Algorithms
• BioProspectoro Presented in 2000o Extends Gibb’s sampling (stochastic
method)o For any cluster of sequences
• MDScano Deterministic approacho Enumerativeo Very fasto For sequences with some ranking
information• MotifCut and MotifScan
o Graph-basedo Does not use PSSMso Novel and sensitive
![Page 9: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/9.jpg)
Representing Ambiguous DNA Motifs
• Sequence Patterns (Regular expressions)
• IUPAC nomenclatures for DNA ambiguities
Consensus motif: CACAAAADegenerate motif: CRCAAAW
A/TA/G
![Page 10: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/10.jpg)
Weight Matrix for Transcription Factor Binding Sites
A DNA Motif as a position specific frequency weight matrix
SitesATGGCATG
AGGGTGCG
ATCGCATG
TTGCCACG
ATGGTATT
ATTGCACG
AGGGCGTT
ATGACATG
ATGGCATG
ACTGGATG
Pos A C G T1 9 0 0 12 0 1 2 73 0 1 7 24 1 1 8 05 0 7 1 26 8 0 2 07 0 3 0 78 0 0 8 2
Alignment Matrix Frequency weight MatrixPos A C G T Con
1 0.9 0 0 0.1 A2 0 0.1 0.2 0.7 T3 0 0.1 0.7 0.2 G4 0.1 0.1 0.8 0 G5 0 0.7 0.1 0.2 C6 0.8 0 0.2 0 A7 0 0.3 0 0.7 T8 0 0 0.8 0.2 G
![Page 11: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/11.jpg)
Weight Matrix with Consensus Sequence & Logotype with Degenerate
Consensus
TTWHYCGGHY
Weight Matrix or Position Specific Scoring Matrix
![Page 12: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/12.jpg)
BioProspector Initialization
Gather together upstream regulatory regions
![Page 13: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/13.jpg)
BioProspector Initialization
a1
a2
a3
a4
ak
Actual Location of Regulatory Motifs is Unknown
![Page 14: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/14.jpg)
BioProspector Initialization
Initial Motif
Randomly initialize the beginning motif
a3'a4'ak'
a2'a1'
![Page 15: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/15.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
a1'
BioProspector Iterative Update
Take out one sequence at a time with its segment
![Page 16: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/16.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Segment (1-6): 1.5 Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 17: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/17.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (2-7): 3
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of SegmentS
eg
me
nt
Sco
re
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 18: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/18.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (3-8): 2.7
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 19: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/19.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (4-9): 9.0
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 20: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/20.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (5-10): 3.2
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 21: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/21.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (6-11): 27.1
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 22: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/22.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (7-12): 11.2
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 23: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/23.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (8-13): 2.9
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 24: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/24.jpg)
a3'a4'ak'
a2'
Motif Withouta1' Segment
Segment (9-14): 9.1
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Sequence 1
BioProspector Iterative Update
Score each segment with the current motif
![Page 25: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/25.jpg)
a3'a4'ak'
a2'
Segment Scores of Sequence 1
0
10
20
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Se
gm
en
t S
core
Motif Withouta1' Segment
a1"
Candidate Motif
BioProspector Iterative Update
Score sequence 1 in all possible alignments
![Page 26: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/26.jpg)
a3'a4'ak'
a2'
a1"
BioProspector Iterative Update
Repeat the process until convergence
Motif Withouta2' Segment
![Page 27: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/27.jpg)
Challenges for BioProspector http://bioprospector.stanford.edu/
• Variable (0-n) motif sites per sequence• Motif enriched only in upstream
sequences, not in the whole genome • Some motifs could have two conserved
blocks separated by a variable length gap
• Motifs are not highly conserved (~50%)• Some motifs show a palindromic
symmetry• Assign motifs a measure of statistical
significance
![Page 28: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/28.jpg)
Thresholds Allow forVariable Motif Copies
• Sequences that do not have the motif• Sequences with multiple copies of motif
Sampling with Two Threshold
0
5
10
15
20
1 2 3 4 5 6 7 8 9 0 11 12 13 14 15 16 17 18 19 20
Starting Position of Segment
Seg
men
t S
coreTH
TL
Sample
Discard
Keep
![Page 29: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/29.jpg)
BioProspector Finds Motif With Two Blocks
Two-block motifs: GACACATTACCTATGC TGGCCCTACGACCTCTCGC
CACAATTACCACCA TGGCGTGATCTCAGACACGGACGGC
GCCTCGATTACCGTGGTA TGGCTAGTTCTCAAACCTGACTAAA
TCTCGTTAGATTACCACCCA TGGCCGTATCGAGAGCG
CGCTAGCCATTACCGAT TGGCGTTCTCGAGAATTGCCTAT
![Page 30: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/30.jpg)
BioProspector Finds MotifsWith Two Blocks
Two-block motifs
Sequence
Min Gap
Max Gap
blk1 block222.426.530.118.9
97.9
Sample Block2 start
![Page 31: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/31.jpg)
BioProspector Finds Motif With Inverse Complementary Blocks
Two-block motifsPalindrome motifs:
AATGCG
GCGTAA
![Page 32: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/32.jpg)
• B. subtilis transcription best studied• 136 σA-dependent promoter sequences [-
100, 15]
• Look for w1 = w2 = 5, gap[15, 20] two-block motif
• Correctly identified motif [TTGACA, TATAAT]and 70% of all the sites
• Occasionally predicted two promoters““Correct” siteCorrect” site Second site Second site
abrBabrB TTGACG TTGACG TACAATTACAAT
vegveg TTGACA TTGACA TATAATTATAAT
f105f105 TTTACA TTTACA TACAATTACAAT
BioProspector Results:B. subtilis two-block promoter
![Page 33: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/33.jpg)
BioProspector Web Server:http://bioprospector.stanford.edu/
![Page 34: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/34.jpg)
BioProspector Web Server:http://bioprospector.stanford.edu/
![Page 35: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/35.jpg)
Compare Prospectorhttp://compareprospector.stanford.edu/
Liu et al, 2004, Genome Res 14(3): 451-458.
![Page 36: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/36.jpg)
Compare Prospectorhttp://compareprospector.stanford.edu/
1 kb
Liu et al, 2004, Genome Res 14(3): 451-458
Regions conserved between two species
Motif
![Page 37: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/37.jpg)
Compare Prospectorhttp://compareprospector.stanford.edu/
Gene 1
Gene 2
Gene 3
Gene 4
Gene 5
Gene n
Biased sampling:
Initial iterations: Tch
Later iterations: Tcl
Tch
Tch
Tcl
Tcl
Liu et al, 2004, Genome Res 14(3): 451-8,
![Page 38: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/38.jpg)
Compare Prospectorhttp://compareprospector.stanford.edu/
(Liu Y et al, Nucleic Acids Res 32:W204-7)
![Page 39: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/39.jpg)
Compare Prospectorhttp://compareprospector.stanford.edu/
(Liu Y et al, Nucleic Acids Res 32:W204-7)
![Page 40: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/40.jpg)
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 41: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/41.jpg)
Cross link protein-DNA interaction
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 42: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/42.jpg)
Cross link protein-DNA interactionShear DNA
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 43: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/43.jpg)
Immunoprecipitation
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 44: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/44.jpg)
PCR amplify and label DNA
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 45: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/45.jpg)
Hybridize with microarray and measure reading
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 46: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/46.jpg)
Cross link protein-DNA interaction
Shear DNAImmunoprecipitation
Purify DNAPurify DNA
PCR amplify and label DNA
Hybridize with microarray and measure reading
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
![Page 48: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/48.jpg)
Yeast Rap1 Sequences
• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
• Rap1 IP Enriched 727 DNA fragmentso 45% are intergenico Average length 1-2 KBo Some are false positiveso Some have multiple Rap1 sites
![Page 49: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/49.jpg)
Useful Insights
• In ChIP-array experiments, highly enriched sequences are usually the real targets
• Transcription factor binding sites occurs more abundantly in these real targets
• Search TF sites from high-confidence sequences first before examine the rest sequences?
Motif Discovery Scan (MDscan)
![Page 50: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/50.jpg)
MDscan Algorithm:Define m-matches
For a given w-mer and any other random w-mer
TGTAACGT 8-mer
TGTAACGT matched 8
AGTAACGT matched 7
TGCAACAT matched 6
TGACACGG matched 5
AATAACAG matched 4
m-matches for an 8-mer
Pick a reasonable m, e.g. in yeast
![Page 51: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/51.jpg)
MDscan Algorithm:Finding candidate motifs
TopSeqs
Seed 1
All IP enriched sequences
m-matches
![Page 52: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/52.jpg)
MDscan Algorithm:Finding candidate motifs
TopSeqs
Seed 2
All IP enriched sequences
m-matches
![Page 53: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/53.jpg)
MDscan Algorithm:Finding candidate motifs
TopSeqs
Seed 3
All IP enriched sequences
m-matches
![Page 54: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/54.jpg)
MDscan Algorithm:Scanning sequences with top motifs
• Keep 30-50 top scoring candidate motifs:
Motif Signal Abundance
ConservedPositions
Specificity(unlikely in genome)
![Page 55: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/55.jpg)
MDscan Algorithm:Scanning sequences with top motifs
• Keep 30-50 top scoring candidate motifs:
• Scan the rest of the sequences with the candidate motifs
Motif Signal Abundance
ConservedPositions
Specificity(unlikely in genome)
![Page 56: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/56.jpg)
MDscan Algorithm:Finding All Motif Instances
TopSeqs
Seed 3
All IP enriched sequences
m-matches
![Page 57: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/57.jpg)
MDscan Algorithm:Refine the motifs
TopSeqs
Seed 3
All IP enriched sequences
m-matches
X
X
X
![Page 58: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/58.jpg)
MDscan Simulation
• Nine motif matrix models with 3 widths and 3 degeneracy
GACTCCCAGATTGCCTGGCTACCTGACTACCAGAGTACCAGACTATCTGAGTACCAGGCTCCCAGACTCCCA
W8S1More
Conserved
W8S3Less
Conserved
GACTCCGAGGGAACCAGCTTCCAAGACTACCACAGTACGAGGCTAGCAGACTGCCGGACTACCAGACTCCCG
![Page 59: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/59.jpg)
MDscan Simulation
Each test set:• 100 sequences of 600 bases from
yeast intergenic • Motif segments generated and
inserted according to the following abundance:
Higher confidenceMotif more abundant
![Page 60: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/60.jpg)
MDscan Simulation
• 100 tests for 3 widths3 strengths4 abundances
3600 tests
![Page 61: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/61.jpg)
MDscan Simulation
• 100 tests for 3 widths3 degeneracy4 abundance
3 X Consensus• MDscan speed 14 X
BioProspector27 X AlignACE
3600 tests
![Page 65: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/65.jpg)
MDscan Biological Tests
• Gal4 & Ste12 [Ren et al. Science 2000]o Gal4: galactose metabolismo Ste12: responds to mating pheromones
![Page 66: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/66.jpg)
MDscan Biological Tests
• SBF & MBF [Iyer et al. Nature 2001]o SBF: Swi4 + Swi6 budding, membrane, cell
wall biosynthesiso MBF: Mbp1 + Swi6 DNA replication and
repair
![Page 67: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/67.jpg)
MDscan Biological Tests
• Rap1 [Lieb et al. Nature Genetics 2001]o Repressor activatoro 37% pol II events in exponentially growing
cells
![Page 68: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/68.jpg)
TAMO: Tools for the Analysis of Motifs
http://fraenkel.mit.edu/TAMO/
![Page 73: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/73.jpg)
Single Microarray Determination of Transcription
Factor Motifs
One microarray experiment, no clustering needed
Basic idea: more affectedsequences may contain
moremotif TF sites
Exp
ress
ion
log
rati
o
Genes
Induced
Repressed
![Page 74: Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Discovering Transcription](https://reader035.vdocuments.us/reader035/viewer/2022062318/55168a1e550346a2698b617e/html5/thumbnails/74.jpg)
Summary
• BioProspector is stochastic• BioProspector can get trapped in local
maxima• BioProspector must be run multiple times to
discover the true globally optimal motif• BioProspector is slow• MDScan is deterministic• MDScan always gives the same answer with
the same data• MDScan is fast• MDScan uses rank order data to accelerate
the search process and to allow it to be deterministic
• MDScan is fast enough to search intergenic regions from entire genomes.
• MDScan is not as sensitive as BioProspector