Functional Annotation of Genes Using Hierarchical Text Categorization Svetlana Kiritchenko, Stan Matwin University of Ottawa, Canada and A. Fazel Famili National Research Council of Canada


Page 1

Functional Annotation of Genes Using Hierarchical Text Categorization

Svetlana Kiritchenko, Stan Matwin
University of Ottawa, Canada

and

A. Fazel Famili
National Research Council of Canada

Page 2

Functional Annotation of Genes from Biomedical Literature

Page 3

Previous Research

• Raychaudhuri et al. (2002)

• BioCreative workshop (2004)

• No hierarchical information has been used

Page 4

Advantages of Hierarchical Approach

• Additional, potentially valuable information
  – Relationships between categories

• Flexibility
  – High levels: general topics
  – Low levels: more detail

• Hierarchical evaluation
  – Give credit to partially correct classification

Page 5

Hierarchical consistency

• If (dj, ci) is True, then (dj, Ancestor(ci)) is True: a document dj labeled with category ci must also be labeled with every ancestor of ci

[Figure: two labelings of the class hierarchy c1 to c7, one consistent and one inconsistent]
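The consistency condition above can be checked mechanically. The following is a minimal sketch, not the authors' code: the edges in `PARENT` are an assumed layout of the seven-node hierarchy (the figure's edges did not survive extraction), and the names `ancestors` and `is_consistent` are hypothetical helpers.

```python
# Assumed hierarchy (child -> parent). The edges are illustrative only;
# the root c1 has no parent entry.
PARENT = {
    "c2": "c1", "c3": "c1",
    "c4": "c2", "c5": "c2",
    "c6": "c3", "c7": "c3",
}


def ancestors(category, parent=PARENT):
    """Return all ancestors of a category, nearest first."""
    result = []
    while category in parent:
        category = parent[category]
        result.append(category)
    return result


def is_consistent(labels, parent=PARENT):
    """True if every label comes together with all of its ancestors."""
    labels = set(labels)
    return all(a in labels for c in labels for a in ancestors(c, parent))


consistent_example = is_consistent({"c1", "c2", "c4"})   # ancestors present
inconsistent_example = is_consistent({"c1", "c4"})       # c2 is missing
```

Under the assumed edges, the first label set satisfies the condition and the second does not, because c4 is assigned without its parent c2.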

Page 6

Hierarchical Local Approach

[Figure: class hierarchy tree with nodes c1 to c9]

Page 10

Hierarchical Local Approach

[Figure: class hierarchy tree with nodes c1 to c9, marked "consistent classification"]

Page 11

New Global Hierarchical Approach

• Make a dataset consistent with a class hierarchy
  – add ancestor category labels

• Apply a regular learning algorithm
  – AdaBoost

• Make prediction results consistent with a class hierarchy
  – for an inconsistent labeling, make a consistent decision based on the confidences of all ancestor classes
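The first and third steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide does not give the exact decision rule, so the averaging rule in `make_consistent` is an assumption, as are the hierarchy edges in `PARENT`, the `threshold` parameter, and all function names. The learning step is omitted; any multi-label learner that outputs a real-valued confidence per category (the slides name AdaBoost) could supply the `confidence` dictionary.

```python
from statistics import mean

# Assumed hierarchy (child -> parent); edges are illustrative only.
PARENT = {
    "c2": "c1", "c3": "c1",
    "c4": "c2", "c5": "c2",
    "c6": "c3", "c7": "c3",
}


def ancestors(category, parent=PARENT):
    """Return all ancestors of a category, nearest first."""
    result = []
    while category in parent:
        category = parent[category]
        result.append(category)
    return result


def expand_with_ancestors(labels, parent=PARENT):
    """Step 1: add every ancestor label so the training data is consistent."""
    expanded = set(labels)
    for category in labels:
        expanded.update(ancestors(category, parent))
    return expanded


def make_consistent(confidence, parent=PARENT, threshold=0.0):
    """Step 3 (assumed rule): keep a positively scored category only if the
    average confidence over the category and all its ancestors exceeds the
    threshold, then add the ancestors of every kept category."""
    kept = set()
    for category, score in confidence.items():
        if score <= threshold:
            continue  # the classifier itself rejected this category
        chain = [category] + ancestors(category, parent)
        if mean(confidence.get(c, 0.0) for c in chain) > threshold:
            kept.add(category)
    return expand_with_ancestors(kept, parent)


training_labels = expand_with_ancestors({"c4"})
scores = {"c1": 0.9, "c2": -0.2, "c4": 0.8, "c3": -1.5, "c6": 0.1}
final_labels = make_consistent(scores)
```

In this example c4 is kept even though its parent c2 scored slightly negative, because the average along the path c4, c2, c1 is positive; c6 is dropped because the strongly negative score of c3 outweighs it. Other aggregation rules over ancestor confidences would fit the slide's description equally well.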

Page 12

New Hierarchical Evaluation Measure

• Precision/Recall considering all ancestors of a correct (predicted) category

• Simple, straightforward to calculate

• Based solely on a given hierarchy (no parameters to tune)

• Gives credit to partially correct classification

• Discriminates by distance and depth

• Allows trading off classification precision against classification depth
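A measure of this kind can be sketched by extending both the correct and the predicted label sets with their ancestors before computing precision and recall. The code below is a hedged illustration: the hierarchy edges in `PARENT` are assumed, the function names are hypothetical, and excluding the root from both sets is an assumption (the slide does not say how the root is treated).

```python
# Assumed hierarchy (child -> parent); edges are illustrative only.
PARENT = {
    "c2": "c1", "c3": "c1",
    "c4": "c2", "c5": "c2",
    "c6": "c3", "c7": "c3",
}


def with_ancestors(labels, parent=PARENT):
    """Return the labels together with all of their ancestors."""
    extended = set()
    for category in labels:
        extended.add(category)
        while category in parent:
            category = parent[category]
            extended.add(category)
    return extended


def hierarchical_precision_recall(true_labels, predicted_labels,
                                  parent=PARENT, root="c1"):
    """Precision and recall over ancestor-extended label sets.

    The root is removed from both sets (an assumption): it is shared by
    every category, so counting it would reward any prediction.
    """
    true_ext = with_ancestors(true_labels, parent) - {root}
    pred_ext = with_ancestors(predicted_labels, parent) - {root}
    overlap = len(true_ext & pred_ext)
    precision = overlap / len(pred_ext) if pred_ext else 0.0
    recall = overlap / len(true_ext) if true_ext else 0.0
    return precision, recall


sibling = hierarchical_precision_recall({"c4"}, {"c5"})   # near miss
parent_only = hierarchical_precision_recall({"c4"}, {"c2"})  # shallower guess
far_away = hierarchical_precision_recall({"c4"}, {"c6"})  # other subtree
```

With the assumed edges, predicting the sibling c5 for a true c4 earns partial credit, predicting only the parent c2 raises precision at the cost of depth (recall stays partial), and predicting a category in the other subtree earns nothing.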

Page 13

Results

dataset          level  branching  Flat   Hier. Local  Hier. Global
biol. process    12     5.41       15.06  59.27        59.31
mol. function    10     10.29      8.78   43.36        38.17
cell. component  8      6.45       44.18  72.07        73.35