Regular Meeting, February 26, 2009
![Page 1: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/1.jpg)
Regular Meeting
February 26, 2009
Mark Borodovsky
Ivan Antonov
![Page 2: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/2.jpg)
GATech 2
Topics
1. What has been done
2. Results for adjacent genes using a bigger gap length
3. Results for adjacent genes using an RBS site score threshold
4. Future work
![Page 3: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/3.jpg)
What has been done
1. A small bug in calculating gene statistics was found
2. A bigger threshold on gap length in adjacent genes is used
3. An RBS site score threshold is implemented
![Page 4: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/4.jpg)
Bug-free statistics
![Page 5: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/5.jpg)
Typical genes distribution (old)
400 typical genes with FS:
• End/start missing: 1
• End missing: 27
• Start missing: 15
• End/start present: 357
  • Adjacent genes: 167
    • Gap len <60: 114
    • Gap len >60: 53
  • Gene overlap: 190
Green squares: where the “All others” principle was used
![Page 6: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/6.jpg)
Typical genes distribution (new)
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Adjacent genes: 149
    • Gap len <60: 114
    • Gap len >60: 35
  • Gene overlap: 190
Green squares: where the “All others” principle was used
![Page 7: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/7.jpg)
Reducing the number of False Negatives
among adjacent genes
by increasing the upper bound threshold on gap length
![Page 8: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/8.jpg)
Choosing upper bound threshold
[Histogram: gap lengths in all 149 FS adjacent genes. X-axis: gap length, in bins of 10 from 0-10 to 290-300, plus >300. Y-axis: number of adjacent genes, 0 to 35.]
Old threshold: 60
New threshold: 160 (29 more FS adjacent genes)
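The effect of moving the upper bound can be sketched as below. This is a minimal illustration, not the FSMark implementation: the function name, the `(start, end)` coordinate convention, and the sample coordinates are assumptions made for the example, and strand is ignored. Only the thresholds 60 and 160 come from the slide.

```python
from typing import List, Tuple

Gene = Tuple[int, int]  # (start, end), 1-based inclusive coordinates (assumed convention)


def adjacent_pairs_within_gap(genes: List[Gene], max_gap: int) -> List[Tuple[Gene, Gene]]:
    """Return consecutive gene pairs whose intergenic gap is in [0, max_gap)."""
    ordered = sorted(genes)
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        # Nucleotides strictly between the two genes; negative means overlap.
        gap = downstream[0] - upstream[1] - 1
        if 0 <= gap < max_gap:
            pairs.append((upstream, downstream))
    return pairs


# Made-up coordinates, for illustration only (gaps of 30, 120 and 599 nt).
genes = [(1, 300), (331, 600), (721, 900), (1500, 1800)]
old = adjacent_pairs_within_gap(genes, max_gap=60)
new = adjacent_pairs_within_gap(genes, max_gap=160)
print(len(old), len(new))
```

Raising `max_gap` can only add candidate pairs, which is why it targets False Negatives; any extra False Positives it lets through have to be handled separately.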
![Page 9: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/9.jpg)
FS genes distribution
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Adjacent genes: 149
    • Gap len <160: 143
    • Gap len >160: 6
  • Gene overlap: 190
Green squares: where the “All others” principle was used
![Page 10: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/10.jpg)
FSMark-GM prediction

| | Gene overlaps | Adjacent genes |
|---|---|---|
| GeneMark output | 366 (190) | 1238 (143) |
| FSMark applied | 256 (145) | 418 (103) |

Numbers of FS genes are in brackets.
![Page 11: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/11.jpg)
Reducing the number of False Positives
among adjacent genes
by introducing a threshold on the maximum value of the RBS site score
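A filter of this kind could look like the sketch below. It is an illustration under stated assumptions, not the FSMark code: the `AdjacentPair` record, the function name, the sample values, and the cutoff `0.5` are all hypothetical, and the slides do not state the threshold actually used. The only facts taken from the slides are that the score belongs to the downstream gene's RBS site and that the threshold is an upper bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdjacentPair:
    name: str
    gap_len: int                  # intergenic gap, in nt
    downstream_rbs_score: float   # RBS site score of the downstream gene


def keep_candidates(pairs: List[AdjacentPair], max_gap: int,
                    max_rbs_score: float) -> List[AdjacentPair]:
    """Keep pairs with a short gap and a downstream RBS score at or below the cap."""
    return [
        p for p in pairs
        if 0 <= p.gap_len < max_gap and p.downstream_rbs_score <= max_rbs_score
    ]


# Made-up pairs, for illustration only.
pairs = [
    AdjacentPair("pair_a", 30, -0.8),
    AdjacentPair("pair_b", 120, 2.4),   # dropped: RBS score above the cap
    AdjacentPair("pair_c", 150, 0.2),
    AdjacentPair("pair_d", 400, -1.0),  # dropped: gap too long
]
kept = keep_candidates(pairs, max_gap=160, max_rbs_score=0.5)  # 0.5 is a hypothetical cutoff
print([p.name for p in kept])
```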
![Page 12: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/12.jpg)
Downstream gene RBS site score distribution
[Histogram: X-axis: RBS site score, from -2 to 4 in steps of 0.2. Y-axis: frequency, 0 to 700. Series: TP_sum, FP_sum.]
![Page 13: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/13.jpg)
Downstream gene RBS site score distribution
[Histogram: X-axis: RBS site score, from -2 to 2 in steps of 0.2. Y-axis: frequency, 0 to 500. Series: TP_sum, FP_sum.]
![Page 14: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/14.jpg)
FS genes distribution
400 typical genes with FS:
• End/start missing: 1
• End missing: 34
• Start missing: 26
• End/start present: 339
  • Adjacent genes: 149
    • Gap len <160: 126
    • Gap len >160: 23
  • Gene overlap: 190
Green squares: where the “All others” principle was used
![Page 15: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/15.jpg)
FSMark-GM prediction

| | Gene overlaps FP | Gene overlaps TP | Adjacent genes FP | Adjacent genes TP |
|---|---|---|---|---|
| GeneMark output | 176 | 190 | 501 | 126 |
| FSMark applied | 111 | 145 | 131 | 92 |
![Page 16: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/16.jpg)
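Today’s FSMark-GM performance

| | New approach | Ovlp | Adj | Other | Total | Prev. total |
|---|---|---|---|---|---|---|
| TP | Gap 160 nt | 145 | 103 | 0 | 248 | 225 |
| TP | RBS score | 145 | 92 | 0 | 237 | 225 |
| FP | Gap 160 nt | 111 | 315 | 0 | 426 | 394 |
| FP | RBS score | 111 | 131 | 0 | 242 | 394 |
| FN | Gap 160 nt | 45 | 40 | 67 | 152 | 175 |
| FN | RBS score | 45 | 34 | 84 | 163 | 175 |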
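The totals above can be turned into the usual summary rates. The slides do not report sensitivity or precision; the sketch below applies the standard definitions, Sn = TP / (TP + FN) and precision = TP / (TP + FP), to the reported totals. The function and variable names are chosen for the example.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real FS genes that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predictions that are real FS genes: TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, FP, FN) totals as reported on the performance slide.
totals = {
    "previous": (225, 394, 175),
    "gap 160 nt": (248, 426, 152),
    "RBS score": (237, 242, 163),
}

for name, (tp, fp, fn) in totals.items():
    print(f"{name}: Sn={sensitivity(tp, fn):.3f}, precision={precision(tp, fp):.3f}")
```

In each row TP + FN equals 400, the size of the test set, so sensitivity is simply TP / 400. The comparison that matters is how much precision the RBS score threshold gains for the sensitivity it gives up.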
![Page 17: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/17.jpg)
Conclusions
• A bigger gap threshold slightly increased the number of True Positives in adjacent genes
• The RBS site score threshold significantly decreased the number of False Positives in adjacent genes (from 315 to 131)
![Page 18: Regular Meeting February 26, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814726550346895db4607a/html5/thumbnails/18.jpg)
Future work
• Try to understand why we have so many genes with the end missing
• Take a closer look at FSMark results on adjacent genes
• What else?