Regular Meeting
February 26, 2009
Mark Borodovsky, Ivan Antonov
GATech 2
Topics
1. What has been done
2. Results for adjacent genes using a larger gap length threshold
3. Results for adjacent genes using an RBS site score threshold
4. Future work
What has been done
1. A small bug was found in the gene statistics calculation
2. A larger upper-bound threshold on the gap length between adjacent genes is now used
3. A threshold on the RBS site score has been implemented
Bug-free statistics
Typical genes distribution (old)
400 typical genes with FS:
• End/start missing: 1
• End missing: 27
• Start missing: 15
• End/start present: 357
  • Gene overlap: 190
  • Adjacent genes: 167
    • Gap len < 60: 114
    • Gap len > 60: 53
Green squares: categories where the "All others" principle was used.
Typical genes distribution (new)
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Gene overlap: 190
  • Adjacent genes: 149
    • Gap len < 60: 114
    • Gap len > 60: 35
Green squares: categories where the "All others" principle was used.
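The old and new trees can be cross-checked with a short Python sketch. The dictionary keys are labels chosen here for illustration and the counts are copied from the two slides; the code only verifies that every level sums to its parent and lists the categories whose counts differ after the bug fix.

```python
# Category counts copied from the "old" and "new" distribution slides.
# Key names are illustrative labels, not identifiers from FSMark.
OLD = {
    "end/start missing": 1, "end missing": 27, "start missing": 15,
    "end/start present": 357, "gene overlap": 190, "adjacent genes": 167,
    "gap < 60": 114, "gap > 60": 53,
}
NEW = {
    "end/start missing": 1, "end missing": 34, "start missing": 26,
    "end/start present": 339, "gene overlap": 190, "adjacent genes": 149,
    "gap < 60": 114, "gap > 60": 35,
}

TOP_LEVEL = ("end/start missing", "end missing", "start missing", "end/start present")


def check_partitions(d, total=400):
    """Assert that each level of the tree sums to its parent node."""
    assert sum(d[k] for k in TOP_LEVEL) == total
    assert d["gene overlap"] + d["adjacent genes"] == d["end/start present"]
    assert d["gap < 60"] + d["gap > 60"] == d["adjacent genes"]


def diff(old, new):
    """Per-category change (new minus old), omitting unchanged categories."""
    return {k: new[k] - old[k] for k in old if new[k] != old[k]}


check_partitions(OLD)
check_partitions(NEW)
print(diff(OLD, NEW))
```

Only the end-missing (+7), start-missing (+11) and long-gap adjacent (-18) counts change; the overlap and short-gap counts are identical in both versions.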
Reducing the number of False Negatives among adjacent genes
by increasing the upper-bound threshold on gap length
Choosing the upper-bound threshold
[Figure: histogram "Gap lengths in all 149 FS adjacent genes". x-axis: gap length, in 10-nt bins from 0-10 to 290-300, plus >300; y-axis: number of adjacent genes (0 to 35). Marked: old threshold 60 and new threshold 160; the new threshold includes 29 more FS adjacent genes.]
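The gap-length filter can be sketched as follows. This is a minimal illustration, not FSMark code: the 1-based inclusive coordinate convention, the function names, and the toy gap lengths are assumptions, and pairs whose gap equals the threshold are treated here as rejected because the slides do not say how ties are handled.

```python
from typing import Iterable, Tuple


def gap_length(upstream_end: int, downstream_start: int) -> int:
    """Nucleotides strictly between two adjacent same-strand genes.

    Assumes 1-based, inclusive coordinates (hypothetical convention).
    """
    return downstream_start - upstream_end - 1


def split_by_gap(gaps: Iterable[int], threshold: int) -> Tuple[int, int]:
    """Count pairs with gap < threshold (kept) and gap >= threshold (rejected)."""
    gaps = list(gaps)
    kept = sum(1 for g in gaps if g < threshold)
    return kept, len(gaps) - kept


# Invented gap lengths, not the 149 pairs from the slide.
toy_gaps = [5, 12, 48, 59, 75, 110, 158, 240, 400]
for t in (60, 160):
    kept, rejected = split_by_gap(toy_gaps, t)
    print(f"threshold {t}: kept {kept}, rejected {rejected}")
```

Raising the threshold from 60 to 160 moves the toy pairs with gaps of 75, 110 and 158 nt into the kept group, which mirrors the effect shown in the histogram.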
FS genes distribution
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Gene overlap: 190
  • Adjacent genes: 149
    • Gap len < 160: 143
    • Gap len > 160: 6
Green squares: categories where the "All others" principle was used.
FSMark-GM prediction
| | Gene overlaps | Adjacent genes |
|---|---|---|
| GeneMark output | 366 (190) | 1238 (143) |
| FSMark applied | 256 (145) | 418 (103) |

Numbers of FS genes are in brackets.
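The TP/FP/FN counts for each category follow directly from this table. The sketch below assumes that the bracketed numbers in the GeneMark row are all FS genes in that category and that FS genes among the FSMark-flagged pairs are true positives; the variable and function names are illustrative.

```python
# (total pairs, FS genes among them) read from the table; labels are ours.
GENEMARK = {"overlaps": (366, 190), "adjacent": (1238, 143)}
FSMARK = {"overlaps": (256, 145), "adjacent": (418, 103)}


def confusion(before, after):
    """TP/FP/FN for one category after FSMark filtering.

    TP: FS genes still flagged after FSMark.
    FP: flagged pairs that are not FS genes.
    FN: FS genes present in the GeneMark output but dropped by FSMark.
    """
    _, fs_before = before
    flagged, fs_after = after
    return {"TP": fs_after, "FP": flagged - fs_after, "FN": fs_before - fs_after}


for cat in GENEMARK:
    print(cat, confusion(GENEMARK[cat], FSMARK[cat]))
```

The results (overlaps: 145 TP, 111 FP, 45 FN; adjacent genes: 103 TP, 315 FP, 40 FN) agree with the "Gap 160" rows of the performance summary.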
Reducing the number of False Positives among adjacent genes
by introducing a threshold on the maximum RBS site score
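One way to combine the two filters is sketched below, under stated assumptions: each adjacent pair is assumed to carry the RBS site score of its downstream gene, and a pair is kept as a frameshift candidate only when that score does not exceed a maximum (a strong RBS is taken as evidence that the downstream gene starts independently). The `AdjacentPair` record, the function name, and the cut-off values are hypothetical; the slides do not state the RBS threshold used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdjacentPair:
    """Hypothetical record for two adjacent GeneMark genes on the same strand."""
    upstream_id: str
    downstream_id: str
    gap: int           # nucleotides between the two genes
    rbs_score: float   # RBS site score of the downstream gene


def fs_candidates(pairs: List[AdjacentPair], max_gap: int, max_rbs: float) -> List[AdjacentPair]:
    """Keep pairs with a short gap and a weak downstream RBS.

    Both cut-offs are parameters; max_rbs is NOT the value used in FSMark.
    """
    return [p for p in pairs if p.gap < max_gap and p.rbs_score <= max_rbs]


pairs = [
    AdjacentPair("g1", "g2", gap=20, rbs_score=-0.6),
    AdjacentPair("g3", "g4", gap=35, rbs_score=2.4),    # strong RBS: dropped
    AdjacentPair("g5", "g6", gap=190, rbs_score=-1.0),  # long gap: dropped
]
print([p.downstream_id for p in fs_candidates(pairs, max_gap=160, max_rbs=1.0)])
```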
Downstream gene RBS site score distribution
[Figure: histograms of the downstream gene RBS site score for TP_sum and FP_sum. x-axis: RBS site score, from -2 to 4 in steps of 0.2; y-axis: frequency (0 to 700).]
Downstream gene RBS site score distribution
[Figure: histograms of the downstream gene RBS site score for TP_sum and FP_sum, restricted to scores from -2 to 2 in steps of 0.2. x-axis: RBS site score; y-axis: frequency (0 to 500).]
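A threshold on the maximum RBS score can be chosen by sweeping candidate values over the TP and FP score distributions. The sketch assumes that pairs scoring above the threshold are discarded; the scores are invented and the function name is hypothetical.

```python
from typing import List, Tuple


def sweep(tp_scores: List[float], fp_scores: List[float],
          thresholds: List[float]) -> List[Tuple[float, int, int]]:
    """For each candidate max-RBS threshold t, count TP and FP pairs with score <= t."""
    rows = []
    for t in thresholds:
        tp_kept = sum(1 for s in tp_scores if s <= t)
        fp_kept = sum(1 for s in fp_scores if s <= t)
        rows.append((t, tp_kept, fp_kept))
    return rows


# Invented RBS scores for illustration only.
tp = [-1.5, -1.1, -0.7, -0.3, 0.1]
fp = [-0.9, 0.2, 0.6, 1.1, 1.4, 1.9]
for t, tp_kept, fp_kept in sweep(tp, fp, [-0.5, 0.0, 0.5, 1.0]):
    print(f"t={t:+.1f}: TP kept {tp_kept}, FP kept {fp_kept}")
```

Each row shows how many TP and FP pairs would survive a given threshold; a useful threshold keeps most TP pairs while dropping many FP pairs.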
FS genes distribution
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Gene overlap: 190
  • Adjacent genes: 149
    • Gap len < 160: 126
    • Gap len > 160: 23
Green squares: categories where the "All others" principle was used.
FSMark-GM prediction
| | Gene overlaps (FP / TP) | Adjacent genes (FP / TP) |
|---|---|---|
| GeneMark output | 176 / 190 | 501 / 126 |
| FSMark applied | 111 / 145 | 131 / 92 |
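Reading the table as (FP, TP) pairs, the cost and benefit of applying FSMark in each category can be computed directly. This is a sketch; the interpretation of the table layout and the names below are assumptions.

```python
# (FP, TP) per category, as read from the table; the labels are ours.
GENEMARK = {"overlaps": (176, 190), "adjacent": (501, 126)}
FSMARK = {"overlaps": (111, 145), "adjacent": (131, 92)}


def tradeoff(before, after):
    """FP removed, TP lost, and precision TP/(TP+FP) before and after FSMark."""
    fp0, tp0 = before
    fp1, tp1 = after
    return fp0 - fp1, tp0 - tp1, tp0 / (tp0 + fp0), tp1 / (tp1 + fp1)


for cat in GENEMARK:
    fp_removed, tp_lost, p0, p1 = tradeoff(GENEMARK[cat], FSMARK[cat])
    print(f"{cat}: FP removed {fp_removed}, TP lost {tp_lost}, "
          f"precision {p0:.2f} -> {p1:.2f}")
```

Under this reading, FSMark removes 65 FP overlaps at the cost of 45 TP, and 370 FP adjacent pairs at the cost of 34 TP, raising precision in the adjacent category from about 0.20 to about 0.41.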
Today’s FSMark-GM performance
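| | Threshold | Overlaps | Adjacent | Other | Total (new approach) | Total (previous) |
|---|---|---|---|---|---|---|
| TP | Gap 160 nt | 145 | 103 | 0 | 248 | 225 |
| TP | RBS score | 145 | 92 | 0 | 237 | |
| FP | Gap 160 nt | 111 | 315 | 0 | 426 | 394 |
| FP | RBS score | 111 | 131 | 0 | 242 | |
| FN | Gap 160 nt | 45 | 40 | 67 | 152 | 175 |
| FN | RBS score | 45 | 34 | 84 | 163 | |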
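Sensitivity and precision summarize this table. The sketch assumes the standard definitions Sn = TP/(TP+FN) and precision = TP/(TP+FP); in every row TP + FN = 400, the number of FS genes in the test set. The dictionary keys are illustrative labels.

```python
# (TP, FP, FN) totals copied from the performance table.
RESULTS = {
    "Gap 160 nt": (248, 426, 152),
    "RBS score": (237, 242, 163),
    "Previous": (225, 394, 175),
}


def sensitivity_precision(tp, fp, fn):
    """Return (TP/(TP+FN), TP/(TP+FP))."""
    return tp / (tp + fn), tp / (tp + fp)


for name, (tp, fp, fn) in RESULTS.items():
    sn, pr = sensitivity_precision(tp, fp, fn)
    print(f"{name}: Sn = {sn:.3f}, precision = {pr:.3f}")
```

With these totals, the RBS threshold trades a small drop in sensitivity (248 to 237 TP out of 400) for a clear gain in precision (about 0.37 to 0.49).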
Conclusions
• A larger gap threshold slightly increased the number of True Positives among adjacent genes
• The RBS site score threshold substantially decreased the number of False Positives among adjacent genes (from 315 to 131)
Future work
• Try to understand why so many genes have their end missing
• Take a closer look at FSMark results on adjacent genes
• What else?