automating generation of textual class definitions from owl to english robert stevens, james malone,...

19
Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone , Sandra Williams, Richard Power

Post on 20-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

Robert Stevens, James Malone, Sandra Williams, Richard Power

Page 2: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.232

Summary

• Motivation• Use Case• Methods and Description Generator• Results• Evaluation• Open Questions (still)

Page 3: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.233

Motivation

• Textual definitions are• cornerstone of good practice in ontology delivery• a requirement of the OBO process• hard work to produce

• Logical definitions • make meaning explicit to the computer• help maintenance of the ontology’s structure, querying, and so on• are also hard to produce but also more difficult to understand

• The information in one form should reflect the information in the other• Need to keep textual and logical definitions synchronised • Aim to produce fluent textual definitions from logical

definitions/description in OWL

Page 4: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.234

OWL Smackdown: Computer vs Human

Page 5: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.235

Our Hypotheses

• Text = humans• Logical = computers (and future human-computer hybrids)• Textual definitions ≈ Logical definition • Textual definitions tend to be more lossy than logical

(cardinalities are often dropped, specific roles not mentioned, etc.)

• Logical definitions are often more explicit than natural language and therefore should contain sufficient content to produce a textual definition.

Page 6: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.236

EFO Use Casewww.ebi.ac.uk/efo

• Experimental Factor Ontology (EFO) is an application ontology which consumes domain ontologies to satisfy specific application focused use cases

• Primarily Gene Expression data from ArrayExpress @ EBI

Page 7: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.237

EFO @ Gene Expression Atlaswww.ebi.ac.uk/gxa

Page 8: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Related Work

• Generating descriptions from ontologies often called ‘ontology verbalisation’

• A number concerned only with ABox verbalisation (Hielkema 2009; Galanis and Androutsopoulos, 2007)

• Others produce only separate sentences, one for each OWL axiom (Kalijurand, 2007)

• Our approach has much in common but differs in;• only a subset of OWL is considered (the simple description logic

EL++) • instead of realising axioms in isolation we apply some rules for

organisation and aggregation to give more natural feel

Automating Generation of Textual Class Definitions from OWL to English

18.04.238

Page 9: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.239

Method Overview• An OWL ontology is just a “pile of axioms”• We can produce individual sentences based on a grammar that

guides transformation from OWL to English (or other natural language)

• Need to group sentences (group axioms with the same subject together)

• Need to aggregate axioms (collapse axioms with the same relationship together)

• Once grouped and aggregated, a paragraph of text can be produced sentence by sentence.

hasPart some leghasPart some bodyhasPart some head

Has parts leg, body and head

Page 10: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2310

Processing stages

• Transcode OWL/XML to Prolog• Construct a lexicon for atomic entities – (next slide)• Group axioms by atomic entity• Aggregate axioms with similar structure• Generate sentences from aggregated axioms.

class(animal).subClassOf(class(cat), class(animal).subClassOf(class(dog), class(animal). => class(animal). subClassOf([class(cat), class(dog)], class(animal)). =>ANIMAL.A cat and a dog are both kinds of animals.

Page 11: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2311

Description Generator• Input: OWL/XML ontology• Output: Text describing atomic entities• generation from label/URL• It is assumed that the syntax of each phrase will be severely

constrained as follows: • individuals are expressed by proper names• classes by common nouns (with singular and plural forms)• properties by transitive verbs (simple or compound) with slots for

a subject and an object.

ANIMAL.The following are kinds of animals: a cat, a duck, a giraffe, a person, a sheep, and a tiger.An animal eats a thing.If X has as pet Y then necessarily Y is an animal.

Page 12: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2312

ResultsClass label OWL axioms (Manchester syntax) Natural Language Definition Extracted

22rv1 bearer_of some 'prostate carcinoma' derives_from some 'Homo sapiens' derives_from some prostate

A 22rv1 is a cell line. A 22rv1 is all of the following: something that is bearer of a prostate carcinoma, something that derives from a homo sapiens, and something that derives from a prostate.

HeLa bearer_of some 'cervical carcinoma' derives_from some 'Homo sapiens' derives_from some cervix derives_from some 'epithelial cell'

A he la is a cell line. A he la is all of the following: something that is bearer of a cervical carcinoma, something that derives from a homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix.

Ara-C-resistant murine leukemia

has subclass b117h* has subclass b140h*

A ara c resistant murine leukemia is a cell line. A b117h, and a b140h are kinds of ara c resistant murine leukemias.

GM18507 derives_from some 'Homo sapiens' derives_from some lymphoblast has_quality some male

A gm18507 is all of the following: something that has as quality a male, something that derives from a homo sapiens, and something that derives from a lymphoblast.

*axioms placed on subclasses

Page 13: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2313

Results• Online survey of ontology users at EBI• 10 of the 50 verbalisations were evaluated based on widest range of

axioms

0

10

20

30

40

50

60

1(meaningis clear)

2 3 4 5(meaningnot clear)

Total

Judgement

Page 14: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2314

Findings• Finding of dodgy class;

• definition for Ara-C-resistant murine leukemia indicated subclasses b117h and b140h types of this, implying that they were diseases rather than cell lines

• Desire amongst this user group for simplicity of language – avoid ontological formality• e.g. bearer of

• Especially property names for qualities• e.g. has as quality male

• Initial verbalisation making semantics clear was not liked• Plural forms occasionally issue:

lex(class(EFO_0000322),noun, ‘cell line’, ‘cell lines’).lex(class(EFO_0002095),noun, ‘22rv1’,’22rv1s’).

Page 15: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2315

Conclusion

• Initial results were largely well received and considered useful in most cases

• Discovery of incorrect class definition demonstrates potential as tool for class validation

• Preference for text definitions was for ‘clear and simple’ over ‘precise and complex’

• Dependent entities could become adjectival forms of the independent entities in which they inhere (cell has quality female becomes female cell)

• Formal relations/class labels reduce understanding and should be brought closer to domain language

• Many ontologies are not amenable to text mining – this is an important use case neglected by most

• Definitions now being imported into EFO

Page 16: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2316

Next Steps

• Systematic study of acceptable wordings• Different wording styles for different users• Adjectival forms for qualities etc; the role of a upper level

ontology• Moving beyond EL++• Parsing for OBO

Page 17: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Next Steps: Round Tripping

Automating Generation of Textual Class Definitions from OWL to English

18.04.2317

Page 18: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Open Questions

• Should textual descriptions ≡ logical descriptions?• Are discrepencies acceptable?

Automating Generation of Textual Class Definitions from OWL to English

18.04.2318

Page 19: Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power

Automating Generation of Textual Class Definitions from OWL to English

18.04.2319

Acknowledgements

• Sandra Williams, Richard Power and Robert Stevens are funded by the SWAT project (EPSRC grants EP/G033579/1 and EP/G032459/1);

• James Malone is funded by EMBL and EMERALD (project number LSHG-CT-2006-037686).

• We would like to thank the members of the EBI’s ontology interest group, functional genomics group and Dr Helen Parkinson for comments and survey participation