Transcript
Page 1: Regular Meeting February 26, 2009

Regular Meeting

February 26, 2009

Mark Borodovsky, Ivan Antonov

Page 2: Regular Meeting February 26, 2009

GATech 2

Topics

1. What has been done

2. Results for adjacent genes using a larger gap-length threshold

3. Results for adjacent genes using an RBS site score threshold

4. Future work

Page 3: Regular Meeting February 26, 2009

What has been done

1. A small bug in the gene statistics calculation was found

2. A larger gap-length threshold is now used for adjacent genes

3. An RBS site score threshold was implemented

Page 4: Regular Meeting February 26, 2009

Bug-free statistics

Page 5: Regular Meeting February 26, 2009

Typical genes distribution (old)

400 typical genes with FS
    End/start missing: 1
    End missing: 27
    Start missing: 15
    End/start present: 357
        Adjacent genes: 167
            Gap len <60: 114
            Gap len >60: 53
        Gene overlap: 190

(Green squares in the original figure mark where the “All others” principle was used.)

Page 6: Regular Meeting February 26, 2009

Typical genes distribution (new)

400 typical genes with FS
    End/start missing: 1
    End missing: 34
    Start missing: 26
    End/start present: 339
        Adjacent genes: 149
            Gap len <60: 114
            Gap len >60: 35
        Gene overlap: 190

(Green squares in the original figure mark where the “All others” principle was used.)
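The counts on this slide can be checked for internal consistency. The sketch below assumes the nesting implied by the sums (the four end/start categories add up to 400, adjacent plus overlap add up to "End/start present", and the two gap bins add up to the adjacent genes). The dictionary keys are illustrative names, not identifiers from the project.

```python
# Counts copied from the "new" distribution slide. The parent/child nesting
# is an assumption reconstructed from the sums, not stated on the slide.
new_counts = {
    "total_fs_genes": 400,
    "end_start_missing": 1,
    "end_missing": 34,
    "start_missing": 26,
    "end_start_present": 339,
    "adjacent": 149,
    "overlap": 190,
    "gap_lt_60": 114,
    "gap_gt_60": 35,
}


def check_tree(c):
    """Return True if every parent count equals the sum of its children."""
    top = (c["end_start_missing"] + c["end_missing"]
           + c["start_missing"] + c["end_start_present"])
    return (top == c["total_fs_genes"]
            and c["adjacent"] + c["overlap"] == c["end_start_present"]
            and c["gap_lt_60"] + c["gap_gt_60"] == c["adjacent"])


print(check_tree(new_counts))
```

The same check passes for the old counts (1, 27, 15, 357, 167, 190, 114, 53).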

Page 7: Regular Meeting February 26, 2009

Reducing the number of False Negatives among adjacent genes

by increasing the upper-bound threshold on gap length

Page 8: Regular Meeting February 26, 2009

Choosing upper bound threshold

[Figure: histogram of gap lengths in all 149 FS adjacent genes. X axis: gap length (nt), in bins of 10 from 0 to 300, plus >300. Y axis: number of adjacent genes, 0 to 35. Old threshold: 60. New threshold: 160, which covers 29 more FS adjacent genes.]
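A minimal sketch of the gap-length filter, assuming each adjacent gene pair is described by the end coordinate of the upstream gene and the start coordinate of the downstream gene. The `GenePair` class, the coordinates, and the strict `<` comparison at the threshold are all illustrative assumptions; the slides do not show the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    """Hypothetical container for two adjacent genes on the same strand."""
    upstream_end: int      # last coordinate of the upstream gene
    downstream_start: int  # first coordinate of the downstream gene

    @property
    def gap(self) -> int:
        # Number of nucleotides strictly between the two genes.
        return self.downstream_start - self.upstream_end - 1


def within_gap(pairs, max_gap):
    """Keep pairs whose gap is non-negative and below the threshold."""
    return [p for p in pairs if 0 <= p.gap < max_gap]


# Made-up coordinates giving gaps of 19, 99 and 399 nt.
pairs = [GenePair(1000, 1020), GenePair(5000, 5100), GenePair(9000, 9400)]
print(len(within_gap(pairs, 60)), len(within_gap(pairs, 160)))
```

Raising `max_gap` from 60 to 160 lets the second pair through, which is the effect the histogram illustrates on the real data.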

Page 9: Regular Meeting February 26, 2009

FS genes distribution

400 typical genes with FS
    End/start missing: 1
    End missing: 34
    Start missing: 26
    End/start present: 339
        Adjacent genes: 149
            Gap len <160: 143
            Gap len >160: 6
        Gene overlap: 190

(Green squares in the original figure mark where the “All others” principle was used.)

Page 10: Regular Meeting February 26, 2009

FSMark-GM prediction

                    GeneMark output    FSMark applied
Gene overlaps       366 (190)          256 (145)
Adjacent genes      1238 (143)         418 (103)

Numbers of FS genes are in brackets
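The TP, FP and FN counts follow from these numbers by simple arithmetic. The sketch below assumes the usual definitions: a prediction that contains an FS gene is a true positive, a prediction without one is a false positive, and an FS gene present in the input but not predicted is a false negative. The function and parameter names are illustrative.

```python
def confusion(fs_in_input, predicted, fs_predicted):
    """Derive (TP, FP, FN) from the counts shown on the slide.

    fs_in_input  -- FS genes among the GeneMark candidates (bracketed number)
    predicted    -- candidates reported after FSMark is applied
    fs_predicted -- FS genes among those predictions (bracketed number)
    """
    tp = fs_predicted
    fp = predicted - fs_predicted
    fn = fs_in_input - fs_predicted
    return tp, fp, fn


overlaps = confusion(fs_in_input=190, predicted=256, fs_predicted=145)
adjacent = confusion(fs_in_input=143, predicted=418, fs_predicted=103)
print(overlaps)  # (145, 111, 45)
print(adjacent)  # (103, 315, 40)
```

These values agree with the "Gap 160" rows of the performance table on page 16.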

Page 11: Regular Meeting February 26, 2009

Reducing the number of False Positives among adjacent genes

by introducing a threshold on the maximum value of the RBS site score

Page 12: Regular Meeting February 26, 2009

Downstream gene RBS site score distribution

[Figure: histogram of downstream gene RBS site scores. X axis: RBS site score, from -2 to 4 in steps of 0.2. Y axis: frequency, 0 to 700. Series: TP_sum and FP_sum.]

Page 13: Regular Meeting February 26, 2009

Downstream gene RBS site score distribution

[Figure: the same histogram over a narrower range. X axis: RBS site score, from -2 to 2 in steps of 0.2. Y axis: frequency, 0 to 500. Series: TP_sum and FP_sum.]
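One way to choose such a cutoff is to sweep candidate values and count how many true and false positives survive each one. The sketch below assumes that a candidate is kept when the downstream gene's RBS site score does not exceed the maximum. The scores and cutoffs are invented for illustration, and the threshold actually used is not given on the slides.

```python
def passes_rbs_filter(rbs_score, max_score):
    """Keep a candidate only if its downstream RBS score is at most max_score."""
    return rbs_score <= max_score


def sweep(tp_scores, fp_scores, cutoffs):
    """For each cutoff, count the TP and FP candidates that pass the filter."""
    rows = []
    for c in cutoffs:
        tp_kept = sum(passes_rbs_filter(s, c) for s in tp_scores)
        fp_kept = sum(passes_rbs_filter(s, c) for s in fp_scores)
        rows.append((c, tp_kept, fp_kept))
    return rows


# Invented scores, not taken from the histograms.
tp_scores = [-1.2, -0.4, 0.1, 0.6]
fp_scores = [0.3, 0.9, 1.5, 2.2, 3.1]
for row in sweep(tp_scores, fp_scores, [0.0, 0.5, 1.0]):
    print(row)
```

A stricter cutoff removes more false positives but also loses true positives, which is the trade-off the TP_sum and FP_sum curves are meant to expose.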

Page 14: Regular Meeting February 26, 2009

FS genes distribution

400 typical genes with FS
    End/start missing: 1
    End missing: 34
    Start missing: 26
    End/start present: 339
        Adjacent genes: 149
            Gap len <160: 126
            Gap len >160: 23
        Gene overlap: 190

(Green squares in the original figure mark where the “All others” principle was used.)

Page 15: Regular Meeting February 26, 2009

FSMark-GM prediction

                    GeneMark output     FSMark applied
                    FP      TP          FP      TP
Gene overlaps       176     190         111     145
Adjacent genes      501     126         131     92

Page 16: Regular Meeting February 26, 2009

Today’s FSMark-GM performance

                      New approach
                      Ovlp   Adj   Other   Total   Prev. Total
TP   Gap 160 nt       145    103   0       248     225
     RBS score        145    92    0       237
FP   Gap 160 nt       111    315   0       426     394
     RBS score        111    131   0       242
FN   Gap 160 nt       45     40    67      152     175
     RBS score        45     34    84      163
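The totals can be summarized as sensitivity and precision. The slides do not report these metrics, so the sketch below assumes the standard definitions, TP / (TP + FN) and TP / (TP + FP), applied to the "Total" and "Prev. Total" columns.

```python
def sensitivity(tp, fn):
    """Fraction of true FS genes that were predicted."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predictions that are true FS genes."""
    return tp / (tp + fp)


# (TP, FP, FN) totals copied from the performance table.
totals = {
    "previous": (225, 394, 175),
    "gap_160": (248, 426, 152),
    "rbs_score": (237, 242, 163),
}

metrics = {
    name: (sensitivity(tp, fn), precision(tp, fp))
    for name, (tp, fp, fn) in totals.items()
}

for name, (sn, pr) in metrics.items():
    print(f"{name}: sensitivity={sn:.3f} precision={pr:.3f}")
```

Under these definitions, sensitivity is roughly 0.56, 0.62 and 0.59 for the previous, gap-160 and RBS-score settings, and precision is roughly 0.36, 0.37 and 0.49. In every row TP + FN equals 400, the number of FS genes.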

Page 17: Regular Meeting February 26, 2009

Conclusions

• A larger gap threshold slightly increased the number of True Positives in adjacent genes (total TP: 225 previously, 248 now)

• The RBS site score threshold significantly decreased the number of False Positives in adjacent genes (from 315 to 131)

Page 18: Regular Meeting February 26, 2009

Future work

• Try to understand why so many genes have a missing end

• Take a closer look at FSMark results on adjacent genes

• What else?

