Extraction and reflection: early evolution of the Hymenoptera Anatomy Ontology

Upload: katja-c-seltmann

Post on 13-Jun-2015

DESCRIPTION

10-minute presentation from the 2009 Entomological Society of America meeting describing the extraction of classes and labels from OCRed journal articles found in the Biodiversity Heritage Library and the building of the Hymenoptera Anatomy Ontology (http://hymao.org).

TRANSCRIPT

Page 1: Extraction and reflection: early evolution of the Hymenoptera Anatomy Ontology
Page 2:
Page 3:

Ontology

Formal representation of concepts within a domain and the relationships between those concepts.

ontology ≠ ontogeny

domain = Hymenoptera
concept = class (a real anatomical thing)
label = words used to represent anatomical things

*** a class can contain many labels ***
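The class/label distinction on this slide can be illustrated with a small data structure. The following is only a sketch and not the HAO or mx data model: the names `AnatomyClass`, `add_label` and `build_label_index` are hypothetical, and the class id and name are borrowed from the OBO stanza shown later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class AnatomyClass:
    """One ontology class (a real anatomical thing) and the labels used for it."""
    class_id: str
    preferred_label: str
    labels: set[str] = field(default_factory=set)

    def add_label(self, label: str) -> None:
        # Normalise case and whitespace so "Phallus " and "phallus" collapse.
        self.labels.add(" ".join(label.lower().split()))


def build_label_index(classes: list[AnatomyClass]) -> dict[str, set[str]]:
    """Map every label to the ids of the classes it can refer to."""
    index: dict[str, set[str]] = {}
    for cls in classes:
        for label in cls.labels | {cls.preferred_label}:
            index.setdefault(label, set()).add(cls.class_id)
    return index


# Hypothetical usage: one class, several labels (one of them a duplicate).
genitalia = AnatomyClass("HAO:0000312", "external male genitalia")
for text in ["phallus", "genital capsule", "Phallus ", "armatura genitalis"]:
    genitalia.add_label(text)

index = build_label_index([genitalia])
print(len(genitalia.labels))     # 3 distinct labels after normalisation
print(index["genital capsule"])  # {'HAO:0000312'}
```

The index maps each label to a set of class ids because the same word may, in principle, be used for more than one anatomical thing.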

Page 4:

male genitalia, phallus, parameres, copulatoria, genital capsule, genital armature, phallic apparatus, genital apparatus, armatura genitalis, genital appendage, copulatory apparatus, external genital organ, male copulatory organ

www.hymao.org

a class can have many labels

Page 5:

this real thing is a class

male genitalia, phallus, parameres, copulatoria, genital capsule, genital armature, phallic apparatus, genital apparatus, armatura genitalis, genital appendage, copulatory apparatus, external genital organ, male copulatory organ

a class can have many labels

Page 6:

male genitalia, phallus, parameres, copulatoria, genital capsule, genital armature, phallic apparatus, genital apparatus, armatura genitalis, genital appendage, copulatory apparatus, external genital organ, male copulatory organ

a class can have many labels
this real thing is a class

these are all labels

Page 7:

[Term]
id: HAO:0000312
name: external male genitalia
def: "The compound organ that is involved in coupling with the female genitalia and with the intromission of spermatozoa and seminal fluid." [HAO:im]
synonym: "armatura genitalis" []
synonym: "copulatoria" []
synonym: "copulatory apparatus" []
synonym: "genital apparatus" []
synonym: "genital appendage" []
synonym: "genital armature" []
synonym: "genital capsule" []
synonym: "genital organ" []
synonym: "male copulatory organ" []
synonym: "parameres" []
synonym: "phallic apparatus" []
synonym: "phallus" []
relationship: part_of HAO:0000505 ! male genitalia
is_a: HAO:0000024 ! compound organ

[Term]
id: HAO:0000313
name: external paramera
def: "The anatomical cluster that is composed of the gonostipites and volsellae." [HAO:im]
relationship: part_of HAO:0000312 ! external male genitalia
is_a: HAO:0000041 ! anatomical cluster

Open Biomedical Ontologies (OBO)
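To show how the tag-value lines of such a stanza can be read by a program, here is a minimal sketch that parses the second stanza from this slide into a dictionary. The function name `parse_stanza` is hypothetical, and the parser is deliberately simplified: it ignores quoting and escape rules, so a dedicated OBO library would be preferable for real work.

```python
from collections import defaultdict

# Second stanza from the slide, one tag per line.
STANZA = '''[Term]
id: HAO:0000313
name: external paramera
def: "The anatomical cluster that is composed of the gonostipites and volsellae." [HAO:im]
relationship: part_of HAO:0000312 ! external male genitalia
is_a: HAO:0000041 ! anatomical cluster
'''


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Collect the tag: value lines of one stanza; tags may repeat (e.g. synonym)."""
    tags: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        # Drop the trailing "! human-readable comment" (simplification:
        # a " ! " inside a quoted definition would also be cut here).
        value = value.split(" ! ")[0].strip()
        tags[tag.strip()].append(value)
    return dict(tags)


term = parse_stanza(STANZA)
print(term["id"])            # ['HAO:0000313']
print(term["relationship"])  # ['part_of HAO:0000312']
print(term["is_a"])          # ['HAO:0000041']
```

Values are stored as lists because a tag such as synonym can occur many times within one stanza, as the first stanza on the slide shows.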

Page 8:

Nasonia spp. >50 mutants

implications: vehicle to integrate research

Page 9:

process

http://purl.oclc.org/NET/mx-database

MX

Page 10:

process

mx software (development)

community feedback (Hymenoptera, ontology & computer science)

exposure

Page 11:

2,682 labels
1,382 classes
865 references
3,058 times referenced
1,201 from just a few papers

Page 12:

Term extraction from BHL

353 JHR
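The slides do not say how terms were pulled out of the OCRed text, so the following is only a sketch of one plausible approach: counting whole-word occurrences of already-known labels. The function `count_labels`, the sample text and the label list are all hypothetical.

```python
import re
from collections import Counter


def count_labels(text: str, labels: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each known label."""
    # Longest labels first so "wing vein" is tried before "wing" or "vein".
    ordered = sorted(set(labels), key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Collapse line breaks and repeated spaces left behind by OCR.
    flat = " ".join(text.split())
    return Counter(match.group(1).lower() for match in pattern.finditer(flat))


# Hypothetical OCR fragment with a label split across a line break.
ocr_text = """Fore wing with wing
vein complete; propodeum smooth. Wing veins pale."""
counts = count_labels(ocr_text, ["wing", "vein", "wing vein", "propodeum"])
print(counts)
```

Because matches must end on a word boundary, the plural "veins" is not counted here; handling plurals and OCR misreadings would need extra normalisation or manual proofing.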

Page 13:

results

number of new labels added to database: 347
number of papers: 23
--------------------------------------------------------------
137 PATO labels (i.e. NOT ending up in the HAO)
"PATO is an ontology of phenotypic qualities, intended for use in a number of applications, primarily defining composite phenotypes and phenotype annotation"
before: 464
now: 601
only 55 of these presently in PATO
--------------------------------------------------------------
210 new labels for the HAO
(Do not know how many will result in new classes)
--------------------------------------------------------------
In the 23 papers...
• 1145 distinct labels
• 8802 occurrences of those terms
• 4469 < 50 times
• 4333 > 50 times
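The figures on this slide can be cross-checked with a few lines of arithmetic. The numbers below are copied from the slide; how they are grouped (for example, reading "4469 < 50 times" and "4333 > 50 times" as two parts of the 8802 occurrences) is an interpretation, not something the slide states.

```python
# Figures copied from the slide; the groupings below are an interpretation.
new_labels_total = 347
pato_labels = 137
hao_labels = 210
pato_before, pato_now = 464, 601
occurrences_total = 8802
occurrences_below_50, occurrences_above_50 = 4469, 4333

checks = {
    "PATO + HAO labels = new labels": pato_labels + hao_labels == new_labels_total,
    "before + PATO labels = now": pato_before + pato_labels == pato_now,
    "< 50 + > 50 = all occurrences": (
        occurrences_below_50 + occurrences_above_50 == occurrences_total
    ),
}
for description, ok in checks.items():
    print(f"{description}: {ok}")

# Share of the newly added labels that are PATO-type labels.
print(f"{pato_labels / new_labels_total:.1%} of new labels are PATO-type")
```

All three sums agree with the slide, which supports reading the two "50 times" figures as occurrence counts rather than label counts.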

Page 14:

results

highest number > 50

vein (465)
wing (282)
cell (219)
wing vein (114)

Page 15:

results

highest number > 50

carina (187)
tergum (183)
tergites (128)
propodeum (83)
sternum (65)

Page 16:

results

highest number > 50

smooth (94)
small (87)
short (85)

Page 17:

results

highest number > 50

glossa (72)

Page 18:

results

highest number > 50

cell (219)
area (116)
base (84)
apex (76)
body (71)

Page 19:

Future

add the rest of the 353 articles
define new classes
define PATO terms
reevaluate workflow (create better proofing tools)
work with Biodiversity Heritage Library (link to label descriptions?)
capture complex information in MX and assemble information using ontology (HAO, PATO, etc.)

Page 20:

Acknowledgments

funding:
Advances in Biological Informatics (NSF DBI-0850223)
NESCent (NSF EF-0423641)
Morphbank (NSF DBI-0446224)
HymAToL (NSF EF-0337220)
PEET: Monographic research on parasitic Hymenoptera (NSF DEB-0328922)

intellect and enthusiasm:
Fredrik Ronquist (NRM)
Jim Balhoff, Hilmar Lapp, Todd Vision, Wasila Dahdul (NESCent)
Rod Page, Biodiversity Heritage Library, Rick Prelinger
Paula Mabee (USD)
Anne Maglia (MUS & T)
International Society of Hymenopterists
All the contributors