MotifClick: cis-regulatory k-length motifs finding in cliques of 2(k-1)-mers
Shaoqiang Zhang
http://bioinfo.uncc.edu/szhang
April 3, 2013
Gene regulation in prokaryotes
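[Figure: a gene with its promoter region (positions -300, -35, -10) and transcription factor binding sites upstream of the TSS (+1), followed by the terminator and the 3’ UTR; transcription produces mRNA]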
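cis-regulatory elements
[Figure: transcription factors TF1 and TF2 regulating an operon (Gene1 Gene2 Gene3)]
Transcription Factor binding sites (TFBS)
[Figure: one TF with binding sites BS1, BS2, BS3 upstream of different genes and operons (Gene1 to Gene6)]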
Co-regulated genes (Regulon) in a single genome
[Figure: binding sites BS1, BS2, BS3 of the co-regulated genes]
Cis-regulatory motif / binding site motif.
Phylogenetic footprinting technique
[Figure: orthologous genes in Genome1, Genome2, Genome3, Genome4, Genome5; their binding sites (TGTGAGATAGATCACA, CATGATTTAAATCGCA, ..., TGTGATCAACATCACA) form a motif, shown as a logo]
MotifsTTGTTACGTTATAACACGGTTATATTATAACACGGTTATGTTATAACATGGTTATGTTATAACATGGTTATGTTATAACATGGTTATGTTATAACA TGGTTATGTTATAACACGGTTATGTTATAACATGGTTATGTTATAACATTGTTATGTTATAACGATGTTATATTATTACATTGTTATGTTATAACATTGTTATGTTATAACATTGTTATGTTATAACATTGTTATGTTATAACATTGTTATGTTATAACATTGTTATAGTATAACATTAAAATGTTATAACATTAAAATGTTATAACATTAATATGTTATAACATTGTTATAATATAACAATGTTACATTATAACAATGTTACATTATAACAATGTTACATTATAACAATGTTACATTATAACACGGTTATGTTATAACATGGTTATGTTATAACATGGTTATGCTATAACATTAAAATGTTATAACATTAATATGTTATAACA
A -0.839 -5.231 -0.839 -0.839 -1.531 1.688 -5.231 -0.187 -2.909 -5.231 1.688 -5.231 1.639 1.688 -5.231 1.639
C -0.607 -4.695 -4.695 -4.695 -4.695 -4.695 -0.302 -4.695 -2.373 -4.695 -4.695 -4.695 -4.695 -4.695 2.224 -4.695
G -4.611 0.88 2.047 -4.611 -4.611 -4.611 -4.611 1.864 -2.289 -4.611 -4.611 -4.611 -4.611 -4.611 -4.611 -2.289
T 1.235 1.093 -5.174 1.484 1.594 -5.174 1.484 -5.174 1.594 1.745 -5.174 1.745 -2.852 -5.174 -5.174 -5.174
A 5 0 5 5 3 30 0 8 1 0 30 0 29 30 0 29
C 4 0 0 0 0 0 5 0 1 0 0 0 0 0 30 0
G 0 11 25 0 0 0 0 22 1 0 0 0 0 0 0 1
T 21 19 0 25 27 0 25 0 27 30 0 30 1 0 0 0
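Motif frequency matrix: $F = (f(b,i))_{4 \times L}$
Motif profile matrix (position weight matrix): $\mathrm{Prf}(F) = (P(b,i))_{4 \times L}$, with $P(b,i) = \log \frac{p(b,i)}{q(b)}$
where p(b,i) is the frequency of base b at position i of the motif and q(b) is the background frequency of base b.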
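A minimal sketch of how a profile matrix can be derived from a frequency (count) matrix. The logarithm base (2) and the pseudocount (1 per base) are assumptions made for illustration; the slides do not state which values were used, and the name `profile_matrix` is hypothetical.

```python
import math

BASES = "ACGT"

def profile_matrix(counts, background, pseudocount=1.0):
    """Turn per-position base counts into log-odds scores.

    counts: dict mapping each base to a list of counts, one per motif position.
    background: dict mapping each base to its background frequency q(b).
    pseudocount and log base 2 are assumptions, not taken from the slides.
    """
    length = len(counts["A"])
    profile = {b: [] for b in BASES}
    for i in range(length):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            p = (counts[b][i] + pseudocount) / total  # smoothed p(b, i)
            profile[b].append(math.log2(p / background[b]))
    return profile

# Toy example: 4 sites of length 2, all "AT", uniform background.
toy_counts = {"A": [4, 0], "C": [0, 0], "G": [0, 0], "T": [0, 4]}
uniform = {b: 0.25 for b in BASES}
toy_profile = profile_matrix(toy_counts, uniform)
```

Conserved positions get positive scores and absent bases get negative scores, which is the pattern visible in the profile matrix above.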
Motif finding from co-regulated/orthologous genes
[Figure: coverage of known BSs (y-axis, 0 to 1) versus top number of output motifs (x-axis, 0 to 35) for All, MEME, BioProspector, CUBIC, MotifSampler, MDscan, Weeder, and CONSENSUS]
Many motif-finding programs have been developed, such as MEME, BioProspector, MotifSampler, MotifCut, MDscan, Weeder, and CONSENSUS.
We have also developed a motif-finding program: MotifClick
http://motifclick.uncc.edu
The binding sites of a TF may be divided into distinct sub-motifs.
Merge cliques
MotifClick: sub-motifs
Previous works
BOBRO
• Graph construction: G=(V,E), an unweighted graph, where
  V = {candidate motif segments}
  E = {for each pair of input sequences, the top 10 pairs of segments with the largest numbers of conserved segments in the input seqs}
• Find a clique starting from an edge
• Expand each clique to a closure by adding candidate segments
• Sort motif closures in p-value order
MotifCut
• Graph construction: G=(V,E,W), a weighted graph, where
  V = {all k-mers}
  E = {each pair of k-mers}
  W = {the probability that two k-mers belong to the same motif under the nucleotide background distribution}
• Maximum density subgraph finding (max-flow min-cut algorithm)
• Refine the density subgraph
• Sort motifs in the order of constructing maximum density graphs
Main idea
• Weighted graph: reduce constructed graph scale by using 2(k-1)-mers.
• Edge weight: use match number and consider the background.
• Clique finding: use the program we designed in GLECLUBS (find clique from each node).
• Expansion: expand cliques into quasi-cliques to include more segments.
• Rank: based on the size of cliques.
Graph construction: Vertex set
• Input: a set of N sequences s1, ..., si, ..., sN
• Cut each sequence into 2(k-1)-mers with step length = k-1
• Each k-mer is located in exactly one 2(k-1)-mer
• The size of the last 2(k-1)-mer of a sequence is in [k, 2(k-1)]
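The vertex construction can be sketched as follows. This is an illustrative reading of the slide, not MotifClick's actual code; the function names are hypothetical, and the rule for where the last window starts (the last multiple of k-1 that still leaves at least k bases) is an assumption chosen so that the last window has size in [k, 2(k-1)].

```python
def split_into_windows(seq, k):
    """Cut seq into 2(k-1)-mers with step k-1; the last one may be shorter."""
    if k < 2:
        raise ValueError("k must be at least 2")
    step = k - 1
    width = 2 * (k - 1)
    # A window starts only where at least one full k-mer still fits.
    return [(start, seq[start:start + width])
            for start in range(0, len(seq) - k + 1, step)]

def windows_containing(kmer_start, k, windows):
    """Return the start positions of all windows that fully contain the k-mer."""
    return [start for start, window in windows
            if start <= kmer_start and kmer_start + k <= start + len(window)]

seq = "ACGTACGTAC"
k = 4
windows = split_into_windows(seq, k)
# Count, for every k-mer of seq, how many windows contain it.
containment = [len(windows_containing(p, k, windows))
               for p in range(len(seq) - k + 1)]
```

For this toy sequence every entry of `containment` is 1, which illustrates the claim that each k-mer lies in exactly one 2(k-1)-mer.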
Graph construction: Edge set
For each pair of 2(k-1)-mers M’ and M”, calculate the maximum match number:
$\max_{a \in k\text{-mer}(M'),\; b \in k\text{-mer}(M'')} \mathrm{matchNumber}(a, b)$
[Figure: k-mer a in M’ aligned with k-mer b in M”]
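Sum of squared distance:
$\mathrm{SSD}(bs, bg) = \sum_{b \in \{A,C,G,T\}} (p(b) - q(b))^2$
where p(b) is the probability of each base in a binding site and q(b) is its probability in the background.
[Figure: distribution of the sum of squared distance for E. coli known binding sites (x-axis 0 to 0.5, y-axis 0 to 0.16), with 0.02 and 0.2 marked]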
If the max match number >= cutoff, and the two k-mers a and b with the max matches have
$\mathrm{SSD}(a, bg) \in (0, \text{SSD cutoff}]$
$\mathrm{SSD}(b, bg) \in (0, \text{SSD cutoff}]$
then link M’ and M” with an edge.
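The edge rule can be sketched in a few lines. Two assumptions are made here that the slides do not spell out: the match number of two k-mers is taken to be the number of positions with identical bases in a gapless comparison, and p(b) for a single k-mer is taken to be the fraction of its positions occupied by base b. All function names are hypothetical.

```python
def match_number(a, b):
    """Assumed definition: count of positions where two k-mers agree."""
    return sum(x == y for x, y in zip(a, b))

def kmers(segment, k):
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]

def max_match(m1, m2, k):
    """Best-matching k-mer pair (one from each 2(k-1)-mer) and its match number."""
    best = (-1, None, None)
    for a in kmers(m1, k):
        for b in kmers(m2, k):
            n = match_number(a, b)
            if n > best[0]:
                best = (n, a, b)
    return best

def ssd(kmer, background):
    """Sum of squared distances between base frequencies and the background."""
    return sum((kmer.count(b) / len(kmer) - background[b]) ** 2 for b in "ACGT")

def has_edge(m1, m2, k, match_cutoff, ssd_cutoff, background):
    n, a, b = max_match(m1, m2, k)
    return (n >= match_cutoff
            and 0 < ssd(a, background) <= ssd_cutoff
            and 0 < ssd(b, background) <= ssd_cutoff)

at_rich = {"A": 0.325, "C": 0.175, "G": 0.175, "T": 0.325}
linked = has_edge("ACGTAC", "TACGTA", 4, match_cutoff=3, ssd_cutoff=0.2,
                  background=at_rich)
```

In this toy case the two 6-mers share the 4-mer ACGT, whose SSD against the AT-rich background is small but nonzero, so the pair is linked.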
How to select the match cutoff and the SSD cutoff?
[Figure: plot with axis ticks 0 to 40 and 6 to 16; series: Random]
• Randomly select a k-mer in the input seqs set, and find the k-mer having the max matches with it in each seq (s1, ..., si, ..., sN).
• Keep 95% of these k-mers by deleting the 5% with the fewest matches, and calculate the average match number of the remaining 95%.
• Sampling times = max{10, N/4}
• The match cutoff is set from this average match number.
NOTE: the cutoff can be amended later
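The sampling procedure above can be sketched as below. Several details are assumptions made for illustration only: the sequence that supplied the sampled k-mer is skipped, N/4 is rounded down, and the per-sample averages are themselves averaged. The function names are hypothetical.

```python
import random

def match_number(a, b):
    """Assumed definition: count of positions where two k-mers agree."""
    return sum(x == y for x, y in zip(a, b))

def best_match_in_sequence(kmer, seq):
    """Largest match number between kmer and any k-mer of seq."""
    k = len(kmer)
    return max(match_number(kmer, seq[i:i + k])
               for i in range(len(seq) - k + 1))

def estimate_average_match(seqs, k, rng=None):
    """Estimate the average match number by repeated random sampling."""
    rng = rng or random.Random(0)
    n_samples = max(10, len(seqs) // 4)
    averages = []
    for _ in range(n_samples):
        src = rng.randrange(len(seqs))
        start = rng.randrange(len(seqs[src]) - k + 1)
        kmer = seqs[src][start:start + k]
        # Best match in every other sequence, sorted from worst to best.
        scores = sorted(best_match_in_sequence(kmer, s)
                        for i, s in enumerate(seqs) if i != src)
        kept = scores[int(0.05 * len(scores)):]  # drop the lowest 5%
        averages.append(sum(kept) / len(kept))
    return sum(averages) / len(averages)

identical = ["ACGTACGT"] * 5
average = estimate_average_match(identical, 4)
```

With identical input sequences every sampled k-mer has a perfect match everywhere, so the estimate equals k; on real data it would fall somewhere below k.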
Graph construction: G=(V,E)
[Figure: graph whose vertices are the 2(k-1)-mers of sequences s1, ..., si, ..., sj, ..., sN]
MotifCut: max density subgraphs
BOBRO: maximal clique starting from an edge
MotifClick: maximal cliques starting from each node
1: We can correct the cutoff by calculating the graph density. If the graph density > 100, set cutoff = cutoff + 1 and update the graph, repeating until the density <= 100.
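A minimal sketch of this correction loop. The slides do not define graph density, so the number of edges divided by the number of vertices is assumed here purely for illustration; the function names and the edge representation are hypothetical as well.

```python
def graph_density(num_vertices, edges):
    """Assumed density: edges per vertex (the slides give no definition)."""
    return len(edges) / num_vertices if num_vertices else 0.0

def raise_cutoff_until_sparse(num_vertices, edges, cutoff, max_density=100):
    """edges maps (u, v) pairs to match numbers, all >= cutoff on entry."""
    edges = dict(edges)
    while graph_density(num_vertices, edges) > max_density:
        # Delete the edges whose match number equals the current cutoff.
        edges = {e: w for e, w in edges.items() if w > cutoff}
        cutoff += 1
    return cutoff, edges

toy_edges = {(0, 1): 5, (0, 2): 5, (0, 3): 5, (1, 2): 6, (1, 3): 6, (2, 3): 7}
new_cutoff, kept_edges = raise_cutoff_until_sparse(4, toy_edges, cutoff=5,
                                                   max_density=1)
```

The loop always terminates because each pass removes the weakest remaining edges, and an empty edge set has density 0.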
Graph construction: G=(V,E)
[Figure: the graph at Cutoff=10 and at Cutoff=11]
Cliques finding
• Neighbor graph of vertex v
• Break ties by deleting the vertex with the minimum sum of weights in the induced subgraph
CliquesGroup = {Clique1, Clique2, ..., CliqueM}, sorted from max sum of matches to min sum of matches
Top 1 motif: Clique1 (core) + other cliques (expansion)
Merge other cliques into Clique1 if
$\frac{|Clique1 \cap otherClique|}{|Clique1|} \ge \frac{3}{5}$ or $\frac{|Clique1 \cap otherClique|}{|otherClique|} \ge \frac{3}{4}$
[Figure: a 5-clique and a 4-clique]
After merging some other cliques into clique1, update the cliques group by removing clique1 and the cliques merged into clique1.
$\frac{|Clique1 \cap otherClique|}{\min\{|Clique1|, |otherClique|\}} \ge \frac{3}{5}$ ?????
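One possible reading of the merge step, sketched under loud assumptions: the overlap is the number of shared vertices, a clique is merged when the shared fraction reaches 3/5 of Clique1 or 3/4 of the other clique, and every candidate is compared against the original core rather than the growing quasi-clique. None of these choices is confirmed by the slides, and the function names are hypothetical.

```python
def should_merge(core, other, core_ratio=3 / 5, other_ratio=3 / 4):
    """Assumed rule: enough of either clique is shared with the other."""
    shared = len(core & other)
    return (shared / len(core) >= core_ratio
            or shared / len(other) >= other_ratio)

def expand_core(cliques):
    """cliques: vertex lists sorted by decreasing sum of matches.

    Returns the quasi-clique grown from cliques[0] and the cliques that
    were not merged (the updated clique group).
    """
    core = set(cliques[0])
    quasi_clique = set(core)
    remaining = []
    for other in cliques[1:]:
        if should_merge(core, set(other)):
            quasi_clique |= set(other)
        else:
            remaining.append(other)
    return quasi_clique, remaining

group = [[1, 2, 3, 4, 5], [3, 4, 5, 6], [5, 6, 7, 8]]
quasi, rest = expand_core(group)
```

Here the 4-clique shares 3 of its 4 vertices with the 5-clique core and is merged, while the third clique shares only one vertex and stays in the group for a later round.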
Gapless alignments
• For all k-mers in the quasi-clique of 2(k-1)-mers, find the k-mer with the max number of neighbors.
• Cutoff = average match number
[Figure: k-mers aligned without gaps to the k-mer with the max number of neighbors; some k-mers are marked "discard"]
MUSCLE4.0: too strict to get ideal results
Final alignment
Main steps
1. Read input fasta file into a matrix
2. Calculate background
3. Select match cutoff by estimating average match number
4. Build graph of 2(k-1)-mers
5. Calculate graph density
6. Update graph by deleting edges with matches=cutoff if graph density > density cutoff
7. Find all cliques associated with each vertex
8. Select the clique with max sum of matches and merge it with other cliques
9. Do gapless alignments on the expanded quasi-clique
10. Update clique group, and go back to step 8
Flowchart of MotifClick
1. Estimate average match number
2. Set match cutoff = average match num + 1
3. Build graph of 2(k-1)-mers
4. Graph density < 100?
   • No: set match cutoff = cutoff + 1, update graph, and check the density again
   • Yes: continue
5. Find all cliques associated with each vertex
6. Select the clique with max sum of matches and merge it with other cliques
7. Gapless alignments using average match number as cutoff
8. Update clique group
Improvement
• How many kinds of nucleotides appear in a binding site?
Kinds of nucleotides   Yeast (SGD)   E. coli (RegulonDB)
1                      0             0
2                      4.4%          1%
3                      32.4%         14%
4                      63.2%         85%
SGD (S. cerevisiae Genome Database): http://www.yeastgenome.org
So, we only search the k-mers containing at least 3 kinds of nucleotides
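This filter is simple enough to sketch directly. The function name and the decision to ignore non-ACGT characters are illustrative assumptions.

```python
def has_enough_nucleotide_kinds(kmer, min_kinds=3):
    """True if the k-mer uses at least min_kinds distinct bases among A, C, G, T."""
    return len(set(kmer.upper()) & set("ACGT")) >= min_kinds

candidates = ["TTTTTTCA", "ATATATAT", "ACGTACGT", "GGGGGGGG"]
searched = [kmer for kmer in candidates if has_enough_nucleotide_kinds(kmer)]
```

Only the k-mers left in `searched` would be considered as candidate binding sites.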
Improvement
• Percent of max length of single-nucleotide segments in BSs
• Example: TTTTTTCA gives 0.75
[Figure: distributions for RegulonDB and SGD (x-axis 0 to 1, y-axis 0 to 0.6)]
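The quantity in the example can be computed as the longest run of one repeated nucleotide divided by the site length. The sketch below uses a hypothetical function name.

```python
from itertools import groupby

def max_run_fraction(site):
    """Length of the longest single-nucleotide run divided by the site length."""
    longest = max(len(list(run)) for _, run in groupby(site))
    return longest / len(site)

fraction = max_run_fraction("TTTTTTCA")
```

For TTTTTTCA the longest run is six T's in a site of length 8, which gives the 0.75 shown on the slide.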
[Figure: distribution of the sum of squared distance for SGD and RegulonDB (x-axis 0 to 0.7, y-axis 0 to 0.14), with marks at 0.02, 0.06, 0.10, 0.14, 0.18, 0.22]
SSD cutoff=0.2
[Figure: percentage (y-axis, 0 to 0.14) versus SSD (x-axis, 0 to 0.7) for SGD, RegulonDB, DBTBS, Redfly, and JASPAR]
Command-line options
*********
USAGE:
*********
MotifClick <dataset> [OPTIONS] > OutputFile

<dataset>  file containing DNA sequences in FASTA format

OPTIONS:
-w  motif width (default=16)
-n  maximum number of motifs to find (default=5)
-b  2 if examine sites on both of DNA strands (default=1 only forward)
-d  upper bound of graph density (default=100)
-s  0 if want more degenerate sites (default=1 if want fewer sites)
*********
-s 1: match cutoff = average match number + 1
-s 0: match cutoff = average match number

Coded in standard C++ and compiled with the GNU C++ compiler under Linux and Mac, and with MinGW (Minimalist GNU for Windows) under Windows (32-bit).
http://bioinfo.uncc.edu/szhang/computing.htm
Synthetic data test
Compare with Motif finding tools: MEME, BioProspector, Weeder and MotifCut
Hu et al. used the RegulonDB database to evaluate five algorithms (AlignACE, MEME, BioProspector, MDscan, and MotifSampler) for the prediction of prokaryotic binding sites, and found that MEME often achieved the best sensitivity and BioProspector often achieved the highest specificity.
Tompa et al. used the TRANSFAC database to assess 13 computational tools for the discovery of transcription factor binding sites in eukaryotes, and found that Weeder was the best and that MEME was also good.
We test programs for k-mer sizes 8, 12, and 16.
Weeder can only find motifs with length 6, 8, 10, 12 (parameters: small (6,8), medium (6,8,10), large (6,8,10,12), extra (6-12, mainly 8,10)).
Shaoqiang Zhang et al. found that MEME and BioProspector cover true BSs best, followed by CUBIC, MDscan, MotifSampler, and CONSENSUS.
Synthetic data test
Binding sites level accuracy:
• Sensitivity: Sn = TP/(TP+FN) = (number of correctly predicted BSs)/(number of actual BSs)
• Specificity: Sp = TP/(TP+FP) = (number of correctly predicted BSs)/(number of predicted BSs)
• Performance coefficient: PC = TP/(TP+FP+FN) = (number of correctly predicted BSs)/(number of {actual ∪ predicted BSs})
• F-measure/Harmonic mean: F = 2*Sn*Sp/(Sn+Sp)
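The four measures can be computed from the counts of true positives, false positives, and false negatives as in the sketch below. The function name is hypothetical, and the sketch assumes at least one true positive so that no denominator is zero.

```python
def site_level_accuracy(tp, fp, fn):
    """Sn, Sp, PC, and F-measure from site-level counts (assumes tp > 0)."""
    sn = tp / (tp + fn)          # sensitivity
    sp = tp / (tp + fp)          # specificity as defined on the slide
    pc = tp / (tp + fp + fn)     # performance coefficient
    f = 2 * sn * sp / (sn + sp)  # harmonic mean of Sn and Sp
    return {"Sn": sn, "Sp": sp, "PC": pc, "F": f}

# Example: 20 actual BSs, 20 predicted BSs, 15 of them correct.
scores = site_level_accuracy(tp=15, fp=5, fn=5)
```

In this example Sn and Sp are both 0.75, PC is 0.6, and F is 0.75.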
Synthetic data test
A motif containing 20 binding sites
The motif instance of 20 BSs was randomly seeded into a synthetic fasta file of 20 seqs, not necessarily one BS per seq.
We generated synthetic sets of background sequences using a 3rd-order Markov model.
[Figure: motif seqs set seeded into the synthetic background seqs set]
We will test on 400 length X 20 seqs, 600X20, 800X20, and 1000X20.
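A background generator of this kind can be sketched as follows. This is not the script used for the experiments: the function names, the handling of contexts never seen in training, and the choice of the starting context are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def train_markov(seqs, order=3):
    """Count which base follows each context of `order` bases."""
    model = defaultdict(Counter)
    for s in seqs:
        for i in range(order, len(s)):
            model[s[i - order:i]][s[i]] += 1
    return model

def generate_background(model, length, order=3, rng=None):
    """Sample one sequence from the trained Markov model."""
    rng = rng or random.Random(0)
    contexts = sorted(model)
    seq = list(rng.choice(contexts))  # assumed: start from a seen context
    while len(seq) < length:
        counts = model.get("".join(seq[-order:]))
        if not counts:
            # Assumed fallback for an unseen context: borrow a seen one.
            counts = model[rng.choice(contexts)]
        bases = sorted(counts)
        seq.append(rng.choices(bases, weights=[counts[b] for b in bases])[0])
    return "".join(seq[:length])

training = ["ACGT" * 4]
model = train_markov(training)
background = generate_background(model, 20)
```

In the experiments the model would be trained on intergenic sequences of the target genome and used to produce, for example, 20 sequences of length 400 before the binding sites are seeded in.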
Synthetic data test (8-mer/Octamer)
• Synthetic background seqs: the dependencies of the 3rd-order Markov model were estimated from all intergenic seqs of the yeast genome.
• Yeast background: AT: 0.65, GC: 0.35
• Motifs containing 20 BSs with information contents of 12 bits (at most 6 positions are conserved) were chosen from the SGD database.
Commands:
meme inputfile.fasta -dna -mod anr -w 8 -nmotifs 1 -text > file.meme.out
weederTFBS.out -f inputfile.fasta -W 8 -O SC -e 3 -R 50 -M -T 1
adviser.out inputfile.fasta S
BioProspector -i inputfile.fasta -W 8 -d 1 -r 1 -o file.biop.out
Motif_cuts.exe inputfile.fasta 8 1
MotifClicker inputfile.fasta -w 8 -n 1 -s 1 > file.motifclick.out
MotifClicker inputfile.fasta -w 8 -n 1 -s 0 > file.motifclick.out
[Figure: bar chart of binding sites count (y-axis, 0 to 1200) versus binding site length (x-axis, 5 to 19) in SGD]
weederlauncher.out inputfile SC medium M T1
Number of mutations allowed: unfair to other tools
[Figure: distribution of the sum of squared distance for SGD and RegulonDB (x-axis 0 to 0.7, y-axis 0 to 0.14), with marks at 0.02, 0.06, 0.10, 0.14, 0.18, 0.22]
Background seqs sets of size 400*20, 600*20, 800*20, 1000*20; seed motifs into 100 instances of each size.
Synthetic data test (8-mer)
Two motifs: average SSD=0.06 and average SSD=0.10
100 instances of 400*20 seq sets
Note: Weeder did not output any results on the two motifs after setting the number of output motifs to “T1”, so we decided to use “T2” and only consider the top 1 motif of “T2”.
[Figure: two bar charts on the 400*20 sets (one per motif) comparing Sensitivity, Specificity, Performance coefficient, and F-measure (in percent) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, BioProspector, and Weeder]
K-mer size 8 (using two motifs with SSD=0.06 and SSD=0.10, respectively, on 100 datasets)
[Figure: four bar charts (Sensitivity, Specificity, PC, F-measure; in percent) versus data set size (400*20, 600*20, 800*20, 1000*20) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, BioProspector, and Weeder]
Dodeca-mer (12-mer)
• Synthetic background seqs: the dependencies of the 3rd-order Markov model were estimated from all intergenic seqs of E. coli K12.
• Motifs containing 20 BSs with information contents of 14 bits (at most 7 positions are conserved) and an average SSD=0.02 between each BS and the background were chosen from the RegulonDB database.
• Seed motifs into 100 background seq sets.
• Test on 400*20, 600*20, 800*20, and 1000*20
• We abandoned Weeder, because it can only set the motif length as “small” (length 6 with 1 mutation, length 8 with 2 mutations), “medium” (like small, plus length 10 with 3 mutations), “large” (like medium, plus length 12 with 4 mutations), and “extra” (length 6 with 1 mutation, length 8 with 3 mutations, length 10 with 4 mutations, length 12 with 4 mutations). That is, Weeder only accepts even motif lengths between 6 and 12, and for length 12 it accepts at most 4 mutations.
K-mer size 12, seed into 100 background seqs sets
[Figure: four bar charts (Sn, Sp, PC, F-measure; in percent) versus data set size (400*20, 600*20, 800*20, 1000*20) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, and BioProspector]
12-mer, add noise
[Figure: four bar charts (Sn, Sp, PC, F-measure; in percent) on 400*20, 400*25 (add 25% noise), and 400*30 (add 50% noise) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, and BioProspector; one legend also lists AlignACE]
16-mer
• Synthetic background seqs: the dependencies of the 3rd-order Markov model were estimated from all intergenic seqs of E. coli K12.
• Motifs containing 20 BSs with information contents of 16 bits (at most 8 positions are conserved) and an average SSD=0.02 between each BS and the background were chosen from the RegulonDB database.
• Seed motifs into 100 background seq sets.
• Test on 400*20, 600*20, 800*20, and 1000*20
16-mer
[Figure: four bar charts (Sn, Sp, PC, F-measure; in percent) versus data set size (400*20, 600*20, 800*20, 1000*20) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, and BioProspector]
16-mer, add noise
[Figure: four bar charts (Sn, Sp, PC, F-measure; in percent) on 400*20, 400*25 (add 25% noise), and 400*30 (add 50% noise) for MotifClicker(-s 1), MotifClicker(-s 0), MEME, MotifCut, and BioProspector]
Motif finding in Yeast (8-mer)
Motif finding tools   Top 1     Top 5     Top 10    Top 15    Top 20    Top 25
MotifClick            67/585    85/1200   92/1638   95/1923   95/2107   96/2222
                      7158      24916     41752     55084     65852     74820
                      0.081     0.048     0.039     0.035     0.032     0.030
MEME                  70/754    85/1202   87/1615   92/1931   95/2087   95/2198
                      10107     34010     49958     60805     69709     77405
                      0.074     0.035     0.032     0.031     0.030     0.028
MotifCut              65/474    85/1189   86/1641   93/1893   95/1983   95/1998
                      7632      28974     47583     61107     67017     67503
                      0.062     0.041     0.034     0.031     0.030     0.030
BioProspector         79/780    84/1145   86/1465   89/1701   92/1911   92/2038
                      10049     20418     31935     42305     52296     61564
                      0.078     0.056     0.046     0.040     0.037     0.033
Weeder                77/969    88/1698   92/2063   94/2255   94/2396   96/2483
                      23417     56440     81374     96046     106872    113346
                      0.041     0.030     0.025     0.023     0.022     0.022
*At least 3 orthologous genes for each intergenic sequence set. http://www.yeastgenome.org
Motif finding in 5137 intergenic sequence sets of orthologous genes, which contain 2932 BSs belonging to 99 TFs in SGD.
Motif finding in Ecoli K12 (16-mer)
Tools                 Top 1     Top 5     Top 10    Top 15    Top 20    Top 25
MotifClick            331/85    793/108   1055/114  1186/114  1262/117  1296/117
                      2575      7706      11056     12779     13592     14026
                      0.129     0.103     0.095     0.093     0.093     0.092
MEME                  298/83    877/109   1134/115  1202/117  1233/117  1254/117
                      3352      14243     20999     23912     25412     26201
                      0.089     0.062     0.054     0.050     0.049     0.048
MotifCut              241/75    487/89    544/96    640/102   744/107   836/108
                      1942      4763      6552      9145      10408     11212
                      0.124     0.102     0.083     0.070     0.071     0.074
BioProspector         354/85    743/103   953/112   1056/112  1150/116  1181/116
                      4950      7678      10090     11287     12306     13041
                      0.072     0.097     0.107     0.094     0.093     0.091
MotifClick+MEME       474/98    1029/114  1259/118  1335/120  1357/120  1377/120
BioProspector+MEME    472/92    1051/115  1258/118  1312/119  1339/119  1367/119
Ecoli K12: 2313 operon groups; RegulonDB v6.0: 122 TFs, 1411 BSs.
Weeder and Consensus are the worst because they need a high-quality input seqs set.
Tools                 Top 1     Top 5     Top 10    Top 15    Top 20    Top 25
MotifClick            331/85    793/108   1055/114  1186/114  1262/117  1296/117
MEME                  298/83    877/109   1134/115  1202/117  1233/117  1254/117
MotifCut              241/75    487/89    544/96    640/102   744/107   836/108
BioProspector         354/85    743/103   953/112   1056/112  1150/116  1181/116
CUBIC                 242/75    563/98    791/108   905/109   999/111   1062/114
MDscan                355/82    552/96    634/99    684/102   758/107   793/109
MotifSampler          168/61    486/92    612/102   729/102   792/107   831/108
Weeder                179/65    350/85    452/92    494/94    532/94    552/94
Consensus             168/63    186/68    200/74    210/76    214/76    220/76
MotifClick+MEME       474/98    1029/114  1259/118  1335/120  1357/120  1377/120
BioProspector+MEME    472/92    1051/115  1258/118  1312/119  1339/119  1367/119
Conclusions
• Synthetic data: MotifCut has the highest specificity. MotifClick has the highest sensitivity. MotifClick is the most complementary to other tools.
• Yeast data and Ecoli data: MotifClick and MEME have close numbers of true predictions, and more true predictions than other tools. MotifClick is the most complementary to other tools.