1

Building a rich ontology from AGROVOC

Dagobert Soergel
College of Information Studies, University of Maryland
[email protected], www.dsoergel.com

FAO Agricultural Ontology Server Workshop
Beijing, April 27 - 29, 2004


2

The problem

• AI and Semantic Web applications need full-fledged ontologies that support reasoning

• Constructing such ontologies is expensive

• Existing KOS (Knowledge Organization Systems), both large and small, do not provide the full set of precise concept relationships needed for reasoning, but they represent much intellectual capital

• How can this intellectual capital be put to use in constructing full-fledged ontologies?

• Specifically: From AGROVOC to a full-fledged Food and Agriculture Ontology

3

Some applications of a Food and Agriculture Ontology

• Advice on crops and crop management (fertilization, irrigation)

• Advice on pest management

• Tracking contaminants through the food chain

• Advice on safe food processing

• Computing nutrition labels

• Advice on healthy eating

• Improved searching

4

AGROVOC relationships compared with more differentiated relationships of a Food and Agriculture Ontology

5

AGROVOC: undifferentiated hierarchical relationships

milk
     NT cow milk
     NT milk fat

cows
     NT cow milk

Cheddar cheese
     BT cow milk

Food and Agriculture Ontology: differentiated relationships

milk
     <includesSpecific> cow milk
     <containsSubstance> milk fat

cows
     <hasComponent> cow milk*

Cheddar cheese
     <madeFrom> cow milk

Rule 1

Part X <mayContainSubstance> Substance Y
     IF   Animal W <hasComponent> Part X
     AND  Animal W <ingests> Substance Y

Rule 2

Food Z <containsSubstance> Substance Y
     IF   Food Z <madeFrom> Part X
     AND  Part X <containsSubstance> Substance Y
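Rules 1 and 2 can be read as forward-chaining inference over relationship triples. The following is only a minimal sketch of that reading, not the author's implementation: the function name `apply_rules` is hypothetical, the fact that cow milk contains milk fat is assumed for the example (the slide states it only for milk), and dioxin is a made-up contaminant.

```python
# Triples are (subject, relationship, object) tuples.
def apply_rules(facts):
    """Apply Rule 1 and Rule 2 repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        new = set()
        for (w, rel_a, x) in facts:
            if rel_a == "hasComponent":
                # Rule 1: W hasComponent X AND W ingests Y
                #         => X mayContainSubstance Y
                for (w2, rel_b, y) in facts:
                    if w2 == w and rel_b == "ingests":
                        new.add((x, "mayContainSubstance", y))
            if rel_a == "madeFrom":
                # Rule 2: Z madeFrom X AND X containsSubstance Y
                #         => Z containsSubstance Y   (here w plays the role of Z)
                for (x2, rel_b, y) in facts:
                    if x2 == x and rel_b == "containsSubstance":
                        new.add((w, "containsSubstance", y))
        if new <= facts:
            return facts
        facts |= new


facts = {
    ("cows", "hasComponent", "cow milk"),
    ("Cheddar cheese", "madeFrom", "cow milk"),
    ("cow milk", "containsSubstance", "milk fat"),  # assumed for illustration
    ("cows", "ingests", "dioxin"),                  # hypothetical contaminant
}
derived = apply_rules(facts) - facts
for triple in sorted(derived):
    print(triple)
```

With the undifferentiated NT/BT links alone, neither conclusion could be drawn, because the system could not tell a component from a specific kind or an ingredient.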

6

From AGROVOC to FA Ontology

1) Define the FA Ontology structure

2) Fill in values from AGROVOC to the extent possible

3) Edit manually with computer assistance, using the rules-as-you-go approach and an ontology editor:

• make existing information more precise

• add new information

7

Define ontology structure: Overall model

8

[Diagram: overall model. A Concept is <designated by> a Lexicalization/Term, and a Term is <manifested as> a String. Relationships exist between concepts, between terms, between strings, and between relationships. A Note is attached through an annotation relationship. Other information: language/culture, subvocabulary/scope, audience, type, etc.]
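One possible way to hold the three levels of the model (concept, term, string) in code is sketched below. All class and field names are hypothetical, and attaching the language to the term is an assumption; the slide does not say where the "other information" belongs. The spelling variant "cow's milk" is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class String:
    text: str


@dataclass
class Term:
    strings: List[String]      # Term <manifested as> String
    language: str = ""         # assumed location of language/culture information


@dataclass
class Concept:
    concept_id: str
    entity_type: str = ""
    terms: List[Term] = field(default_factory=list)  # Concept <designated by> Term


@dataclass
class Relationship:
    source: str                # concept_id of the first concept
    rel_type: str              # e.g. "madeFrom"
    target: str                # concept_id of the second concept
    note: str = ""             # stands in for the annotation relationship


cow_milk = Concept(
    "c_cow_milk",
    "Substance",
    [Term([String("cow milk"), String("cow's milk")], language="en")],
)
link = Relationship("c_cheddar_cheese", "madeFrom", "c_cow_milk")
```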

9

Define ontology structure: Relationship types

10

Isa

Relationship               Inverse relationship

X <includesSpecific> Y     Y <isa> X
X <inheritsTo> Y           Y <inheritsFrom> X

11

Holonymy / meronymy (the generic whole-part relationship)

Relationship                Inverse relationship

X <containsSubstance> Y     Y <substanceContainedIn> X
X <hasIngredient> Y         Y <ingredientOf> X
X <madeFrom> Y              Y <usedToMake> X
X <yieldsPortion> Y         Y <portionOf> X
X <spatiallyIncludes> Y     Y <spatiallyIncludedIn> X
X <hasComponent> Y          Y <componentOf> X
X <includesSubprocess> Y    Y <subprocessOf> X
X <hasMember> Y             Y <memberOf> X

12

Further relationship examples

Relationship             Inverse relationship

X <causes> Y             Y <causedBy> X
X <instrumentFor> Y      Y <performedByInstrument> X
X <processFor> Y         Y <usesProcess> X
X <beneficialFor> Y      Y <benefitsFrom> X
X <treatmentFor> Y       Y <treatedWith> X
X <harmfulFor> Y         Y <harmedBy> X
X <hasPest> Y            Y <afflicts> X
X <growsIn> Y            Y <growthEnvironmentFor> X
X <hasProperty> Y        Y <propertyOf> X
X <hasSymptom> Y         Y <indicates> X
X <similarTo> Y          Y <similarTo> X
X <oppositeTo> Y         Y <oppositeTo> X
X <hasPhase> Y           Y <phaseOf> X
X <ingests> Y            Y <ingestedBy> X
X <madeFrom> Y           Y <usedToMake> X
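Because every relationship type is paired with an inverse, only one direction needs to be stored; the other can be derived. A minimal sketch follows, using a subset of the pairs listed in the relationship tables. The names `INVERSE_PAIRS` and `invert` are hypothetical.

```python
# A subset of the relationship / inverse pairs from the tables above.
INVERSE_PAIRS = [
    ("includesSpecific", "isa"),
    ("inheritsTo", "inheritsFrom"),
    ("containsSubstance", "substanceContainedIn"),
    ("hasComponent", "componentOf"),
    ("madeFrom", "usedToMake"),
    ("hasPest", "afflicts"),
    ("similarTo", "similarTo"),  # symmetric: the type is its own inverse
]

# Build a lookup that works in both directions.
INVERSE = {}
for forward, backward in INVERSE_PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def invert(triple):
    """Return the same statement expressed with the inverse relationship."""
    subject, rel, obj = triple
    return (obj, INVERSE[rel], subject)


print(invert(("Cheddar cheese", "madeFrom", "cow milk")))
```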

13

Fill in values from AGROVOC

• Fill in values from AGROVOC to the extent possible

• Arrange in structured sequence (to the extent possible based on the information in AGROVOC) to facilitate editing. (The editor can deal with similar problems at the same time.)

14

Undifferentiated relationships from AGROVOC

milk NT cow milk
milk NT goat milk
milk NT buffalo milk
milk NT milk fat
milk RT milk protein
milk RT lactose
cows RT cow milk
goats RT goat milk
ewes RT ewe milk
goat milk RT goat cheese
ewe milk RT ewe cheese
acid soils BT chemical soil types
acrisols BT genetic soil types
alkaline soils BT chemical soil types
alluvial soils BT lithological soil types
chemical soil types BT soil types
Cichorium BT Asteraceae
Cichorium endivia BT Cichorium
Cichorium intybus BT Cichorium
Cichorium intybus RT coffee substitutes
Cichorium intybus RT root vegetables
blood NT blood protein
blood NT blood lipids

Edited relationships

(none yet)

15

Edit manually with computer assistance

• Use the rules-as-you-go approach and good ontology editing software that handles large ontologies efficiently

• make existing information more precise

• add new information

Assumption:

Entity types of concepts are known from AGROVOC or other sources (Langual, UMLS, WordNet); for example

milk fat is a Substance

Asteraceae is a taxon

The editor may need to determine the entity type

16

The rules-as-you-go approach: Exploit patterns to automate the conversion process

Example

1. An editor has determined that
   milk NT cow milk
   should become
   milk <includesSpecific> cow milk

2. She recognizes that this is an example of the general pattern
   milk NT * milk  →  milk <includesSpecific> * milk
   (where * is the wildcard character)

3. Given this pattern, the system can derive automatically that
   milk NT goat milk
   should become
   milk <includesSpecific> goat milk

Result:

17

Edited relationships

milk <includesSpecific> cow milk

milk <includesSpecific> goat milk

milk <includesSpecific> buffalo milk
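The wildcard pattern milk NT * milk can be applied mechanically once the editor has stated it. The sketch below uses Python's standard `fnmatch` module for the wildcard matching; the function name `apply_pattern` and the tuple layout are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def apply_pattern(statements, pattern, ontology_rel):
    """Convert every thesaurus statement that matches a wildcard pattern.

    pattern is (subject_glob, thesaurus_relationship, object_glob).
    """
    subject_glob, thesaurus_rel, object_glob = pattern
    converted = []
    for subject, rel, obj in statements:
        if (rel == thesaurus_rel
                and fnmatchcase(subject, subject_glob)
                and fnmatchcase(obj, object_glob)):
            converted.append((subject, ontology_rel, obj))
    return converted


agrovoc = [
    ("milk", "NT", "cow milk"),
    ("milk", "NT", "goat milk"),
    ("milk", "NT", "buffalo milk"),
    ("milk", "NT", "milk fat"),
    ("cows", "RT", "cow milk"),
]

# Pattern from the slide: milk NT * milk  =>  milk <includesSpecific> * milk
edited = apply_pattern(agrovoc, ("milk", "NT", "* milk"), "includesSpecific")
for triple in edited:
    print(triple)
```

Note that milk NT milk fat is left alone: "milk fat" does not end in " milk", so it stays in the queue for a later pattern.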

18

The rules-as-you-go approach: Exploit patterns to automate the conversion process

1. Editor: milk NT milk fat  →  milk <containsSubstance> milk fat

2. Pattern: Substance NT/RT Substance  →  Substance <containsSubstance> Substance

3. Therefore: milk RT milk protein  →  milk <containsSubstance> milk protein

Result:

19

Edited relationships

milk <includesSpecific> cow milk
milk <includesSpecific> goat milk
milk <includesSpecific> buffalo milk
milk <containsSubstance> milk fat
milk <containsSubstance> milk protein
milk <containsSubstance> lactose
goat milk <containsSubstance> goat cheese
ewe milk <containsSubstance> ewe cheese
blood <containsSubstance> blood protein
blood <containsSubstance> blood lipids
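Patterns based on entity types, such as Substance NT/RT Substance, work the same way but look up the type of each concept instead of matching its name. The following is a sketch under assumptions: the entity-type table is invented for the example, and the function returns the unconverted statements so that later patterns see only what is still open.

```python
def apply_type_pattern(statements, entity_type, pattern, ontology_rel):
    """Convert statements whose concept types match the pattern.

    pattern is (subject_type, allowed_thesaurus_relationships, object_type).
    Returns (converted, remaining).
    """
    subject_type, thesaurus_rels, object_type = pattern
    converted, remaining = [], []
    for subject, rel, obj in statements:
        if (rel in thesaurus_rels
                and entity_type.get(subject) == subject_type
                and entity_type.get(obj) == object_type):
            converted.append((subject, ontology_rel, obj))
        else:
            remaining.append((subject, rel, obj))
    return converted, remaining


# Assumed entity types; in practice they come from AGROVOC or other sources.
ENTITY_TYPE = {
    "milk": "Substance", "milk fat": "Substance",
    "milk protein": "Substance", "lactose": "Substance",
    "cow milk": "Substance", "cows": "Animal",
}

# Statements still open after the "milk NT * milk" pattern has been applied.
still_open = [
    ("milk", "NT", "milk fat"),
    ("milk", "RT", "milk protein"),
    ("milk", "RT", "lactose"),
    ("cows", "RT", "cow milk"),
]

converted, remaining = apply_type_pattern(
    still_open, ENTITY_TYPE,
    ("Substance", {"NT", "RT"}, "Substance"), "containsSubstance",
)
```

Applying the name-based pattern first matters here: milk NT cow milk would also match Substance NT Substance, so a broad type pattern should only see statements that more specific patterns have not already converted. The pattern also explains the entry goat milk <containsSubstance> goat cheese in the result list: it is a mechanical match, which is why automatically created instances go to the editor for checking.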

20

The rules-as-you-go approach: Exploit patterns to automate the conversion process

1. Editor: cows RT cow milk  →  cows <hasComponent> cow milk

2. Pattern: Animal RT BodyPart  →  Animal <hasComponent> BodyPart

3. Therefore: goats RT goat milk  →  goats <hasComponent> goat milk

Result:

21

Edited relationships

milk <includesSpecific> cow milk
milk <includesSpecific> goat milk
milk <includesSpecific> buffalo milk
milk <containsSubstance> milk fat
milk <containsSubstance> milk protein
milk <containsSubstance> lactose
cows <hasComponent> cow milk
goats <hasComponent> goat milk
ewes <hasComponent> ewe milk
goat milk <containsSubstance> goat cheese
ewe milk <containsSubstance> ewe cheese
blood <containsSubstance> blood protein
blood <containsSubstance> blood lipids

22

The rules-as-you-go approach: Exploit patterns to automate the conversion process

1. Editor: acid soils BT chemical soil types  →  acid soils <isa> chemical soil types

2. Pattern: X BT * type*  →  X <isa> * type*

3. Therefore: acrisols BT genetic soil types  →  acrisols <isa> genetic soil types

Result:

23

Edited relationships

milk <includesSpecific> cow milk
milk <includesSpecific> goat milk
milk <includesSpecific> buffalo milk
milk <containsSubstance> milk fat
milk <containsSubstance> milk protein
milk <containsSubstance> lactose
cows <hasComponent> cow milk
goats <hasComponent> goat milk
ewes <hasComponent> ewe milk
goat milk <containsSubstance> goat cheese
ewe milk <containsSubstance> ewe cheese
acid soils <isa> chemical soil types
acrisols <isa> genetic soil types
alkaline soils <isa> chemical soil types
alluvial soils <isa> lithological soil types
chemical soil types <isa> soil types
blood <containsSubstance> blood protein
blood <containsSubstance> blood lipids

24

The rules-as-you-go approach: Exploit patterns to automate the conversion process

1. Editor: Cichorium BT Asteraceae  →  Cichorium <isa> Asteraceae

2. Pattern: Taxon BT Taxon  →  Taxon <isa> Taxon

3. Therefore: Cichorium endivia BT Cichorium  →  Cichorium endivia <isa> Cichorium

Result:

25

Edited relationships

milk <includesSpecific> cow milk
milk <includesSpecific> goat milk
milk <includesSpecific> buffalo milk
milk <containsSubstance> milk fat
milk <containsSubstance> milk protein
milk <containsSubstance> lactose
cows <hasComponent> cow milk
goats <hasComponent> goat milk
ewes <hasComponent> ewe milk
goat milk <containsSubstance> goat cheese
ewe milk <containsSubstance> ewe cheese
acid soils <isa> chemical soil types
acrisols <isa> genetic soil types
alkaline soils <isa> chemical soil types
alluvial soils <isa> lithological soil types
chemical soil types <isa> soil types
Cichorium <isa> Asteraceae
Cichorium endivia <isa> Cichorium
Cichorium intybus <isa> Cichorium
blood <containsSubstance> blood protein
blood <containsSubstance> blood lipids

26

The rules-as-you-go approach: Exploit patterns to automate the conversion process

1. Editor: Cichorium intybus RT coffee substitutes  →  Cichorium intybus <usedToMake> coffee substitutes

2. Pattern: Taxon RT FoodProduct  →  Taxon <usedToMake> FoodProduct

3. Therefore: Cichorium intybus RT root vegetables  →  Cichorium intybus <usedToMake> root vegetables

Result:

27

Edited relationships

milk <includesSpecific> cow milk
milk <includesSpecific> goat milk
milk <includesSpecific> buffalo milk
milk <containsSubstance> milk fat
milk <containsSubstance> milk protein
milk <containsSubstance> lactose
cows <hasComponent> cow milk
goats <hasComponent> goat milk
ewes <hasComponent> ewe milk
goat milk <containsSubstance> goat cheese
ewe milk <containsSubstance> ewe cheese
acid soils <isa> chemical soil types
acrisols <isa> genetic soil types
alkaline soils <isa> chemical soil types
alluvial soils <isa> lithological soil types
chemical soil types <isa> soil types
Cichorium <isa> Asteraceae
Cichorium endivia <isa> Cichorium
Cichorium intybus <isa> Cichorium
Cichorium intybus <usedToMake> coffee substitutes
Cichorium intybus <usedToMake> root vegetables
blood <containsSubstance> blood protein
blood <containsSubstance> blood lipids

28

The rules-as-you-go approach: Discussion

Main idea: Formulate constraints to assist the editor

• Ontology may have many relationship types, perhaps > 100

• Constraints limit the relationship types that are possible in a specific case; show the editor only these

• If the constraints limit possible relationship types to 1, conversion is automatic

• Constraints may depend on the thesaurus being converted

29

Constraints

Thesaurus relationships    Possible ontology relationships

NT / BT                    <hasMember>          |  <memberOf>
                           <includesSpecific>   |  <isa>
                           <hasComponent>       |  <componentOf>
                           <spatiallyIncludes>  |  <spatiallyIncludedIn>
                           etc.

RT                         <similarTo>          |  <similarTo>
                           <growsIn>            |  <growthEnvironmentFor>
                           <treatmentFor>       |  <treatedWith>
                           <hasMember>          |  <memberOf>
                           <hasComponent>       |  <componentOf>
                           <madeFrom>           |  <usedToMake>
                           etc.

30

Constraints

Thesaurus relationships
+ entity types or values         Possible ontology relationships

milk NT * milk                   milk <includesSpecific> * milk
Substance NT Substance           Substance <containsSubstance> Substance
X BT * type*                     X <isa> * type*
Taxon BT Taxon                   Taxon <isa> Taxon
GeogrEntity BT GeogrEntity       GeogrEntity <spatiallyIncludedIn> GeogrEntity
BodyPart BT BodyPart             BodyPart <componentOf> BodyPart
ChemSubstance BT ChemSubstance   ChemSubstance <isa> ChemSubstance

31

Constraints

Thesaurus relationships
+ entity types or values         Possible ontology relationships

Substance RT Substance           Substance <containsSubstance> Substance
                                 Substance <substanceContainedIn> Substance
                                 Substance <usedToMake> Substance
                                 Substance <madeFrom> Substance
LivingOrganism RT BodyPart       LivingOrganism <hasComponent> BodyPart
Taxon RT FoodProduct             Taxon <usedToMake> FoodProduct
GeogrEntity RT GeogrGrouping     GeogrEntity <memberOf> GeogrGrouping
Process RT Object                Process <performedByInstrument> Object
                                 Process <affects> Object
ChemSubstance RT Function        ChemSubstance <usedFor> Function
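The main idea of the discussion (show the editor only the relationship types a constraint allows, and convert automatically when exactly one remains) can be sketched as a table lookup. This is an illustrative sketch only: the table holds a few rows from the constraint slides, relationship names follow the relationship-type tables, and the names `CONSTRAINTS` and `convert` as well as the entity types assigned to the example concepts are assumptions.

```python
# (subject entity type, thesaurus relationship, object entity type)
#   -> ontology relationship types that remain possible
CONSTRAINTS = {
    ("Taxon", "BT", "Taxon"): ["isa"],
    ("LivingOrganism", "RT", "BodyPart"): ["hasComponent"],
    ("Process", "RT", "Object"): ["performedByInstrument", "affects"],
    ("Substance", "RT", "Substance"): [
        "containsSubstance", "substanceContainedIn", "usedToMake", "madeFrom",
    ],
}


def convert(statement, entity_type):
    """Return ("automatic", triple) or ("menu", options_for_the_editor)."""
    subject, rel, obj = statement
    key = (entity_type.get(subject), rel, entity_type.get(obj))
    options = CONSTRAINTS.get(key, [])
    if len(options) == 1:
        # Only one relationship type is possible: conversion is automatic.
        return ("automatic", (subject, options[0], obj))
    # Several options: the editor picks from a short menu.
    # No options: no constraint applies and the editor sees the full list.
    return ("menu", options)


# Assumed entity types for the example concepts.
ENTITY_TYPE = {
    "Cichorium": "Taxon", "Asteraceae": "Taxon",
    "goat milk": "Substance", "goat cheese": "Substance",
}

auto = convert(("Cichorium", "BT", "Asteraceae"), ENTITY_TYPE)
menu = convert(("goat milk", "RT", "goat cheese"), ENTITY_TYPE)
print(auto)
print(menu)
```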

32

Checking by editor

• Relationship instances created by the editor by selecting from a constraint-generated menu are final

• Relationship instances created automatically must be presented to the editor for checking

• If the editor determines that the relationship instances produced by a pattern are almost always correct, she checks a box "accept without checking"
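The checking policy above amounts to a simple filter over the relationship instances. In the sketch below, all class, field, and function names are hypothetical, and treating the check box as a per-pattern flag is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Pattern:
    description: str
    accept_without_checking: bool = False  # the editor's check box


@dataclass
class Instance:
    triple: Triple
    pattern: Optional[Pattern] = None  # None: chosen by the editor from a menu


def review_queue(instances):
    """Return the instances the editor still has to look at."""
    return [
        inst for inst in instances
        if inst.pattern is not None and not inst.pattern.accept_without_checking
    ]


taxon = Pattern("Taxon BT Taxon => isa", accept_without_checking=True)
substance = Pattern("Substance NT/RT Substance => containsSubstance")

instances = [
    Instance(("milk", "includesSpecific", "cow milk")),              # menu choice
    Instance(("Cichorium", "isa", "Asteraceae"), taxon),             # trusted pattern
    Instance(("goat milk", "containsSubstance", "goat cheese"), substance),
]
queue = review_queue(instances)
```

Only the third instance reaches the editor, which is where a questionable result such as goat milk <containsSubstance> goat cheese would be caught.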

33

Overall conversion process

• One master editor must go through the file from start to finish, processing the relationship instances, creating patterns, and creating new relationship types as needed

• Assistant editors can apply the patterns.

• In the first pass, the master editor should deal with the easy cases.

• Deal with the remaining cases later. Groups of similar relationship instances can be seen more easily in a smaller set

34

Adding new relationship types and new relationship instances

• AGROVOC does not contain all relationship types or relationship instances needed for AI applications

• Need to add data. For example:

Organism X <hasPest> Organism Y

ChemSubstance X <actsAgainst> Organism Y

Organism X <actsAgainst> Organism Y

Plant X  <growsIn>  Environment Y

FoodProduct X <suitableFor> Diet Y
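The examples above read like type signatures for the new relationship types, so newly added instances can be checked against them. This sketch rests on assumptions: the names `SIGNATURES` and `is_valid` are hypothetical, the example concepts and their entity types are invented, and types are compared by exact match with no type hierarchy (so a Plant would not count as an Organism here).

```python
# Allowed (subject type, object type) pairs per relationship type,
# taken from the examples on the slide.
SIGNATURES = {
    "hasPest": [("Organism", "Organism")],
    "actsAgainst": [("ChemSubstance", "Organism"), ("Organism", "Organism")],
    "growsIn": [("Plant", "Environment")],
    "suitableFor": [("FoodProduct", "Diet")],
}


def is_valid(triple, entity_type):
    """Check a new relationship instance against the declared signatures."""
    subject, rel, obj = triple
    pair = (entity_type.get(subject), entity_type.get(obj))
    return pair in SIGNATURES.get(rel, [])


# Invented example concepts with assumed entity types.
ENTITY_TYPE = {
    "rice": "Plant",
    "paddy fields": "Environment",
    "cow milk": "FoodProduct",
    "vegetarian diet": "Diet",
}

ok = is_valid(("rice", "growsIn", "paddy fields"), ENTITY_TYPE)
reversed_args = is_valid(("paddy fields", "growsIn", "rice"), ENTITY_TYPE)
```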

35

Conclusion

The rules-as-you-go approach is a realistic method for developing a rich ontology from an existing thesaurus