KYOTO (ICT-211423)
Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/
Piek Vossen
Tenth anniversary of NL-TERM (Tienjarig jubileum NL-TERM), 25 October 2008, Amsterdam
ICT-211423
KYOTO (ICT-211423) Overview
• Title: Yielding Ontologies for Transition-Based Organization
• Funded:
– 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
• Goal:
– Platform for knowledge sharing across languages and cultures
– Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
• URL: http://www.kyoto-project.eu/
• Duration:
– March 2008 – March 2011
• Effort:
– 364 person months of work
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)
KYOTO (ICT-211423) Overview
• Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
– Environmental domain, BUT usable in any domain
• Global:
– Both European and non-European languages
• Available:
– Free: as open source system and data (GPL)
• Future perspective:
– Content standardization that supports worldwide communication
– Global Wordnet Grid -> database that interlinks all wordnets in the world to a shared ontology of meaning
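Wordnet = network of semantic relations between words in a language
[Figure: fragment of the Dutch wordnet. Synsets shown: zieke, patiënt (sick person, patient); chronisch zieke, langdurig zieke (chronically ill, long-term ill person); psychisch/geestelijk zieke (mentally ill person); arts, dokter (doctor); kinderarts (paediatrician); kind (child); ziekte, stoornis (disease, disorder); maagaandoening, nieraandoening, keelpijn (stomach disorder, kidney disorder, sore throat); genezen (to cure); behandelen (to treat); fysiotherapie, medicijnen, etc. (physiotherapy, medicines); ziekenhuis, etc. (hospital). Relation labels: HYPONYM, ρ-PATIENT, STATE, ρ-CAUSE, ρ-AGENT, ρ-PROCEDURE, ρ-LOCATION, co-ρ-AGENT-PATIENT.]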
[Figure: KYOTO overview. User groups: citizens, governors, companies, environmental organizations. Functions: Capture, Index (docs, URLs, images, experts), Search, Dialogue. Knowledge resources: Domain Wikyoto with domain terms (CO2 emission, water pollution), wordnets in the Global Wordnet Grid, and a universal ontology with a top level (Abstract, Physical) and a middle level (Substance, Process; water, CO2). Mining components: Tybots (concept mining) and Kybots (fact mining). Example fact: "Sudden increase of CO2 emissions in 2008 in Europe".]
Lexicon versus Ontology
[Figure: ontology types (Abstract, Physical, Process, PhysicalChange, Organism, Artifacts, Element: H2O, CO2) set against domain terms: Ecosystem services (nature as a resource, nature for waste absorption, state of nature, threats to nature), rural products, sustainable products, green roof, alien invasive species, species migration, ecosystem-based drinking water production, green house gas. Further node labels: Spider, Roof. Link labels: type, qualifies.]
Concepts & Facts
• Conceptual knowledge: general & generic knowledge about
– ClimateChange
• physical change
• affecting the climate => definition of climate
• in a region
• during a period of time
• caused by another change
• causing yet other changes
Concepts & Facts
• Fact:
– A case of ClimateChange has been observed:
• factual and significant change in the climate (temperature, humidity, wind direction, rainfall, etc.)
• in a particular region, e.g. the Alps
• Time period
• Caused by CO2 emissions, North Atlantic Gulf Stream
• Causes decrease of biodiversity measured in specific populations: fish, birds, insects => counts of populations
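One way to picture the difference between the concept and the fact is a frame with open slots that a fact fills in. The sketch below is illustrative only: the class name follows the slide, but the slot names and the `unfilled_slots` helper are assumptions, not part of KYOTO.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClimateChange:
    """Concept frame; slot names are hypothetical, taken from the bullets above."""
    region: Optional[str] = None
    period: Optional[str] = None
    causes: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)

    def unfilled_slots(self) -> List[str]:
        # A slot counts as unfilled when it is None or an empty list.
        return [name for name, value in vars(self).items() if not value]


# The generic concept leaves every slot open.
concept = ClimateChange()

# The observed fact fills the slots named on the slide; the period is not given there.
fact = ClimateChange(
    region="the Alps",
    causes=["CO2 emissions", "North Atlantic Gulf Stream"],
    effects=["decrease of biodiversity"],
)

print(concept.unfilled_slots())
print(fact.unfilled_slots())
```

The concept describes what any climate change looks like; the fact is the same frame bound to a particular region, cause and effect.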
System architecture
System components
• Wikyoto = wiki environment for a social group:
– to model the terms and concepts of a domain and agree on their meaning, within a group, across languages and cultures
– to define the types of knowledge and facts of interest
• Tybots = Term yielding robots, extract term data from a text corpus
• Kybots = Knowledge yielding robots, extract facts from a text corpus
• Linguistic processors:
– tokenizers, segmentizers, taggers, grammars
– named entity recognition
– word sense disambiguation
– generate a layered text annotation in Kyoto Annotation Format (KAF)
[Figure: system architecture. Components and data stores: Capture Server; Document Base (linear KAF); Tybot Server (term extraction); Extracted Terms (generic K-TMF); Term Editor (Wikyoto); Domain Ontology (OWL_DL); Domain Wordnet (K-LMF); Kybot Server (fact extraction); Semantic Annotation; Document Base (linear generic KAF); Kybot Editor; Kybot Profiles. User roles: Concept User, Fact User.]
What Tybots do...
• Input are text documents
– “Green house gases, such as CO2”
– “CO2 and other green house gases”
• Linguistic processors generate KAF annotation (sequential):
– morpho-syntactic analysis
– semantic roles
– named entities
– wordnet and ontology mappings
• Output are term hierarchies in TMF (generic):
– structural parent relations: “CO2 is a green house gas is a gas”
– quantified structural and semantic relations
– statistical data
– generalized semantic mappings
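The structural parent relation ("CO2 is a green house gas is a gas") can be sketched as follows. This is a minimal illustration, not the Tybot implementation. It assumes that a multiword term's parent is its longest proper suffix that is itself a known term, and that pattern-derived parents (from phrases such as "green house gases, such as CO2") are supplied by hand. All function names are hypothetical.

```python
def structural_parent(term, known_terms):
    """Longest proper suffix of `term` that is itself a known term, else None."""
    words = term.split()
    for start in range(1, len(words)):
        candidate = " ".join(words[start:])
        if candidate in known_terms:
            return candidate
    return None


def parent_chain(term, known_terms, explicit_parents):
    """Follow pattern-derived parents first, then structural parents."""
    chain = [term]
    while True:
        current = chain[-1]
        parent = explicit_parents.get(current) or structural_parent(current, known_terms)
        if parent is None or parent in chain:
            return chain
        chain.append(parent)


known_terms = {"CO2", "green house gas", "gas"}
# Assumed to come from patterns like "X, such as Y" or "Y and other X".
explicit_parents = {"CO2": "green house gas"}

chain = parent_chain("CO2", known_terms, explicit_parents)
print(" is a ".join(chain))
```

Restricting structural parents to known terms avoids spurious nodes such as "house gas".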
Generic algorithm
• Extraction of a structural term hierarchy
• Advantage: conceptual coherence
• Steps:
– extraction of potential terms using the morpho-syntactic structure
– statistical selection of salient terms
– conceptual selection of dominant terms
– contextual selection of terms
Terms from morpho-syntactic structure
• Words that are the syntactic head of an NP, e.g.: card, wing-player
• Word combinations (excluding determiners and adverbs) that include the syntactic head, e.g.: yellow card, yellow card for wing-player.
• The head of a compound: player as the head of wing-player, name as the head of username.
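The three rules above can be sketched for a single noun phrase. This is a simplified illustration under stated assumptions: the phrase is already tokenized and tagged, the head position is given, a candidate must end in a noun, and compounds are only split at a hyphen (a case like username would need a real compound splitter). The tag names and `candidate_terms` are hypothetical.

```python
EXCLUDED_POS = {"DET", "ADV"}  # determiners and adverbs are dropped


def candidate_terms(tokens, head_index):
    """tokens: list of (word, pos) for one NP; head_index: position of its head."""
    kept = [(i, w, p) for i, (w, p) in enumerate(tokens) if p not in EXCLUDED_POS]
    h = [i for i, _, _ in kept].index(head_index)
    head = tokens[head_index][0]
    candidates = [head]  # rule 1: the syntactic head itself

    # Rule 2: word combinations that include the head (and end in a noun).
    for start in range(h, -1, -1):
        for end in range(h, len(kept)):
            if kept[end][2] != "NOUN":
                continue
            span = " ".join(w for _, w, _ in kept[start:end + 1])
            if span not in candidates:
                candidates.append(span)

    # Rule 3: the head of a (hyphenated) compound.
    if "-" in head:
        candidates.append(head.rsplit("-", 1)[1])
    return candidates


np1 = [("a", "DET"), ("yellow", "ADJ"), ("card", "NOUN"),
       ("for", "ADP"), ("wing-player", "NOUN")]
np2 = [("the", "DET"), ("wing-player", "NOUN")]

print(candidate_terms(np1, head_index=2))
print(candidate_terms(np2, head_index=1))
```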
Statistical extraction of terms
• Frequency of terms by distribution over reference corpus:
– Salience = normFreq * normRef
• Where normFreq = normalized frequency of terms on the website and normRef = normalized count of website occurrence in the reference corpus:
– normFreq = nTermFrequencynWords / nPages
– normRef = 1 - ((nWebsitesnWords) / (referenceCorpusSize))
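The product Salience = normFreq * normRef can be sketched as below. Only the product and the "1 minus a share of the reference corpus" shape of normRef are taken from the slide. The exact numerators (nTermFrequencynWords, nWebsitesnWords) are not fully legible, so `norm_freq` is passed in as an already normalized value and `reference_count` is an assumed stand-in for the normRef numerator. The threshold and all numbers are made up.

```python
def norm_ref(reference_count: int, reference_corpus_size: int) -> float:
    """1 minus the term's share of the reference corpus.

    ASSUMPTION: reference_count is how often the term occurs in the
    reference corpus; the slide's own numerator may differ.
    """
    if reference_corpus_size <= 0:
        raise ValueError("reference_corpus_size must be positive")
    return 1.0 - reference_count / reference_corpus_size


def salience(norm_freq: float, norm_ref_value: float) -> float:
    """Salience = normFreq * normRef, as stated on the slide."""
    return norm_freq * norm_ref_value


def select_salient(scores: dict, threshold: float) -> list:
    """Terms at or above a (hypothetical) threshold, most salient first."""
    selected = [term for term, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda term: scores[term], reverse=True)


# Made-up numbers for illustration only.
scores = {
    "domain term": salience(1.0, norm_ref(11, 10_000)),
    "general word": salience(0.05, norm_ref(4_000, 10_000)),
}
print(select_salient(scores, threshold=0.5))
```

A term that is frequent on the site but rare in the reference corpus scores close to 1; a word that is common everywhere scores close to 0.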
Statistical extraction of terms

Excluded for Salience
Preferred term | Site freq. | Nr. of pages | Salience
ressource humaine | 1 | 1 | 0.014
sécurité | 1 | 1 | 0.024
sites | 1 | 1 | 0.0242
mobilité | 1 | 1 | 0.0243
qualité produits | 1 | 1 | 0.0265
produits | 1 | 1 | 0.0277
satisfaction client | 1 | 1 | 0.029
contact | 1 | 1 | 0.0294
place | 1 | 1 | 0.0304
formation professionnelle | 1 | 1 | 0.0304
ligne | 1 | 1 | 0.0308
gestion | 1 | 1 | 0.0315
conception | 1 | 1 | 0.032
groupes | 1 | 1 | 0.0323
démarche qualité | 1 | 1 | 0.0323
environnement | 1 | 1 | 0.0324
moyens | 1 | 1 | 0.0331

Selected for Salience
Preferred term | Site freq. | Nr. of pages | Salience
Oxy/Conductimètre portable | 2 | 2 | 1.0
Multiparamétre portable | 2 | 2 | 1.0
convection naturelle | 4 | 2 | 1.0
Conductimètre portable | 4 | 2 | 1.0
Universelle convection naturelle | 2 | 2 | 0.9989
Pipette graduée | 4 | 2 | 0.9989
Pipette | 14 | 2 | 0.9989
Photomètre | 4 | 2 | 0.9989
Perce-bouchon | 4 | 2 | 0.9989
Nettoyants autolaveurs | 2 | 2 | 0.9989
Mini-UniPrep | 12 | 2 | 0.9989
Microscope | 6 | 2 | 0.9989
Micro-pipettes capillaire | 2 | 2 | 0.9989
Micropipettes | 4 | 2 | 0.9989
Loupes binoculaire | 2 | 2 | 0.9989
l'enseignement primaire | 3 | 3 | 0.9989
Incubateurs réfrigérés | 2 | 2 | 0.9989
Structure to relation table for terms
Term phrase | Structure | Role | Type
populations of terrestrial species | of | Part | Species
populations of vertebrate species: | of | Part | Species
populations of 1313 vertebrate species fish, amphibians, reptiles, birds, mammals from all around the world | of | Part | Species
the restoration of wild species populations and their habitats | of | Patient | Restore
The increase in the footprint is driven by modest rates of growth in both population and demand for biocapacity | in | Patient | Increase
at half the rate of population increase | of | Speed | Increase
the relative proportion of current biocapacity or world population in each region | in | Location | Region
the growth of the world population and consumption | of | Patient | Increase
trends in their populations | in | Patient | Trend?
The rapid rate of population decline in tropical species | of | Speed | Decline
all countries with populations greater than 1 million | with | Possess | Country
Increase in population | in | Patient | Increase
species populations | Modifier | Part | Species
MARINE SPECIES POPULATIONS | Modifier | Part | Species
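The rows of this table can be aggregated into counts, which is one plausible reading of the "quantified structural and semantic relations" that Tybots output. The sketch below only tallies the (structure, role, type) triples listed for the term population; how the real system quantifies relations is not stated in the slides.

```python
from collections import Counter

# (structure, role, type) for each of the 14 table rows, in table order.
ROWS = [
    ("of", "Part", "Species"),
    ("of", "Part", "Species"),
    ("of", "Part", "Species"),
    ("of", "Patient", "Restore"),
    ("in", "Patient", "Increase"),
    ("of", "Speed", "Increase"),
    ("in", "Location", "Region"),
    ("of", "Patient", "Increase"),
    ("in", "Patient", "Trend?"),
    ("of", "Speed", "Decline"),
    ("with", "Possess", "Country"),
    ("in", "Patient", "Increase"),
    ("Modifier", "Part", "Species"),
    ("Modifier", "Part", "Species"),
]

triple_counts = Counter(ROWS)
role_by_structure = Counter((structure, role) for structure, role, _ in ROWS)

# The most frequent relation is a candidate "dominant" relation for the term.
dominant, support = triple_counts.most_common(1)[0]
print(dominant, support)
```

The counts also show that a structure alone does not determine the role: "of" occurs with Part, Patient and Speed, so the type of the related term is needed to interpret it.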
[Figure: concept mining by Tybots. Source documents pass through linguistic processors; the morpho-syntactic analysis yields, e.g., [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP] NP. TYBOT concept miners synthesize a term hierarchy (emission, gas, greenhouse gas, area, agricultural area, CO2, with the relations "of" and "in"). Conceptual modeling links the terms to the English Wordnet (emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2, CO2) and to the ontology (Abstract, Physical, Substance: H2O, CO2; Process, Chemical Reaction, GlobalWarming, GreenhouseGas, CO2Emission, WaterPollution). Processing steps: Synthesize, Ontologize, Axiomatize, e.g. (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]
Wikyoto
[Figure: Wikyoto interaction. Shown to the user: Smart KyText passages ("... populations such as terrestrial and marine species ...", "... populations declined ...", "... terrestrial and marine species ...", "in forests ... declined"), an alphabetical term list (A ... decline ... population ... Z) and interview questions: "Do populations consist of marine species?", "Do populations always consist of marine species?", "Are terrestrial species a type of populations?", "Are terrestrial species never marine species?". Hidden on the Kyoto server: a simplified term fragment (population, marine species, terrestrial species) and a simplified ontology fragment (?Population, Group).]
[Figure: data and resources. Labels: pdf, KAF, DE-TN, Tybots, Kybots, plugin, FactAF, Facts in RDF, DE-WN, DE-KON, G-WN, G-KON, Wordnets in LMF, Ontologies in OWL-DL, WIKIPEDIA, SUMO, DOLCE, GEO, FRAMENET.]
Editing the domain wordnet
[Figure: editing the domain wordnet (DE-WN). Smart KyText passage: "... populations such as terrestrial and marine species ...". Alphabetical term list: A ... decline ... population ... Z. Term hierarchy nodes: group, population, people, species population, terrestrial species population, marine species population, population of vertebrate species. Nodes are marked WN & DOC, WN & ⌐ DOC, ⌐ WN & DOC or ⌐ WN & ⌐ DOC, according to whether they occur in the wordnet (WN) and in the document (DOC).]
1. Validate Term Hierarchy:
- Defining phrases:
  - document
  - domain corpus
  - Google
- Other phrases
- Wiki classes
- Generic-WN classes
Interview questions:
- Are terrestrial species a type of populations?
- Are terrestrial species never marine species?
Background shown for the term population:
G-WN: Synset: ENG20-07682918-n {population:2}: a group of organisms of the same species populating a given area
SUMO: +inhabits -> +Group
Wiki: http://en.wikipedia.org/wiki/Population: In sociology and biology a population is the collection of inter-breeding organisms of a particular species.
Difficult wordnet mapping
[Figure: the term land with the domain terms grassland, cropland, woodland, urban land, mediterranean woodland and agricultural urban land, set against the wordnet senses of land. Terms are marked Wordnet & Doc, Wordnet & ⌐ Doc, ⌐ Wordnet & Doc or ⌐ Wordnet, ⌐ Doc.]
Wordnet synsets shown:
– country:1, state:6, land:5
– domain:2, demesne:2, land:4
– land:1
– land:2, ground:7, soil:3
– land:3, dry land:1, earth:3, ground:1, solid ground:1, terra firma:1
– object:1, physical object:1
– real property:1, real estate:1, realty:1
– administrative district:1, administrative division:1, territorial division:1
– region:3
– biome:1
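The four markings on this slide (Wordnet & Doc, Wordnet & ⌐ Doc, ⌐ Wordnet & Doc, ⌐ Wordnet & ⌐ Doc) amount to a two-way membership test. The sketch below shows that test only. The two sets are hypothetical examples; the slides do not say which marking each term actually received.

```python
def classify(term, wordnet_terms, document_terms):
    """Label a term by its presence in the wordnet and in the document."""
    in_wn = "Wordnet" if term in wordnet_terms else "⌐ Wordnet"
    in_doc = "Doc" if term in document_terms else "⌐ Doc"
    return f"{in_wn} & {in_doc}"


# HYPOTHETICAL membership, for illustration only.
wordnet_terms = {"land", "grassland", "woodland"}
document_terms = {"grassland", "mediterranean woodland"}

for term in ["grassland", "woodland", "mediterranean woodland", "urban land"]:
    print(term, "->", classify(term, wordnet_terms, document_terms))
```

Terms absent from both sources would typically be intermediate nodes introduced by the structural hierarchy, which is where human validation matters most.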
Editing the domain ontology
Ontologization of terms
• If a domain term is a disjoint hyponym in the domain wordnet, it is propagated to the domain ontology as a new Type.
• If a domain term is not a disjoint hyponym, no new ontology extension is proposed, but the term must still be mapped to the ontology, i.e. the ontological constraint is made explicit.
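The two cases above can be sketched as a single decision. This is an illustration under assumptions: whether a term is a disjoint hyponym is taken as an input (in KYOTO it would come out of the Wikyoto validation), and the function name, the returned dictionary and the CamelCase type naming are all hypothetical.

```python
def ontologize(term: str, ontology_parent: str, is_disjoint_hyponym: bool) -> dict:
    """Decide whether a domain term becomes a new Type or is only mapped."""
    if is_disjoint_hyponym:
        # Case 1: propagate the term to the domain ontology as a new Type.
        type_name = "".join(word.capitalize() for word in term.split())
        return {"action": "add_type",
                "axiom": f"(subclass {type_name} {ontology_parent})"}
    # Case 2: no ontology extension, but keep an explicit mapping to the ontology.
    return {"action": "map_only", "mapping": (term, ontology_parent)}


print(ontologize("population", "Group", is_disjoint_hyponym=True))
print(ontologize("species population", "Population", is_disjoint_hyponym=False))
```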
[Figure: editing the domain ontology. The domain wordnet (DE-WN) term hierarchy (group, population, people, species population, terrestrial species population, marine species population, population of vertebrate species) is linked to the domain ontology (DE-ON), where ?Population is proposed under Group. Smart KyText passages: "... populations of marine species", "... populations declined ...", "... terrestrial and marine species ...", "in forests ... declined". Alphabetical term list: A ... decline ... population ... Z.]
1. Validate Implied Ontological Constraints:
- Generalize semantic relations
- Interpret relation given ontology parent
- Formulate interview using highlighted text
Interview questions:
- Can populations decline? Do populations always decline?
- Do populations consist of marine species? Do populations always consist of marine species?
- Are populations located in forests? Are populations always located in forests?
2. Validate additional constraints:
- Select dominant relations
- Formulate interviews using highlighted text
SUMO axiom for Group (hidden data):
(=> (and (instance ?GROUP Group) (member ?MEMB ?GROUP)) (instance ?MEMB Agent))
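The interview questions on this slide come in pairs: one asks whether a relation is possible, the other whether it always holds. A template-based sketch reproduces that pairing. The relation names and templates are assumptions made for illustration; the slides do not describe how Wikyoto generates its questions.

```python
# HYPOTHETICAL relation names, each with a (possible, necessary) question template.
TEMPLATES = {
    "consist_of": ("Do {s} consist of {o}?", "Do {s} always consist of {o}?"),
    "located_in": ("Are {s} located in {o}?", "Are {s} always located in {o}?"),
    "can": ("Can {s} {o}?", "Do {s} always {o}?"),
}


def interview_questions(relation, subject, obj):
    """Return the 'possible' and the 'necessary' question for one relation."""
    possible, necessary = TEMPLATES[relation]
    return [possible.format(s=subject, o=obj), necessary.format(s=subject, o=obj)]


relations = [
    ("can", "populations", "decline"),
    ("consist_of", "populations", "marine species"),
    ("located_in", "populations", "forests"),
]
for relation, subject, obj in relations:
    for question in interview_questions(relation, subject, obj):
        print(question)
```

A "yes" to the first question and a "no" to the second would mark the relation as possible rather than defining, which matches the starred relations in the derived constraints.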
Derived hidden structures
• New constraint Population in DE-ON:
(subclass Population Group)
(=>
  (and
    (instance ?POP Population)
    (member ?MEMB ?POP)
    (instance ?MEMB Species)))
• Extended constraint Population in DE-ON (* indicates possible relations):
(subclass Population Group)
(=>
  (and
    (instance ?POP Population)
    (member ?MEMB ?POP)
    (instance ?MEMB Species)
    (*instance ?REGION Region)
    (*inhabits ?MEMB ?REGION)
    (*location ?MEMB ?REGION)))
Cross-lingual validation
• Population is added by Group-1, with constraints derived from language L1
• Group-2 uses language L2 and sees a domain Type in the domain ontology with an English gloss and description, possibly proposed through WSD
• Group-2 selects or confirms the existing domain type as a candidate for validation
• The Smart KyText in language L2 and the term hierarchy are used to generate questions in L2
• Group-2 can confirm or deny constraints for L2 and add new constraints
• Cross-lingual and cross-group validation is added to the constraints in the ontology
Cross validated structures
• Population in DE-ON:
(subclass Population Group)
(=>
  (and
    (instance ?POP Population)
    (member ?MEMB ?POP)
    (instance ?MEMB Species (xval G1-ENG G2-NLD G3-NLD G4-ITA))
    (instance ?REGION Region (xval G1-ENG G2-NLD))
    (*inhabits ?MEMB ?REGION (xval G3-NLD))
    (*location ?MEMB ?REGION (xval G1-ENG G4-ITA))))
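The xval annotations can be read as a record of which group, working in which language, confirmed each constraint. The sketch below tallies that support per constraint. The tags are copied from the slide. The dictionary layout and the `support` helper are assumptions, and the slides do not say how much support counts as sufficient.

```python
# Constraint -> validating groups, written as GROUP-LANGUAGE tags as on the slide.
XVAL = {
    "(instance ?MEMB Species)": ["G1-ENG", "G2-NLD", "G3-NLD", "G4-ITA"],
    "(instance ?REGION Region)": ["G1-ENG", "G2-NLD"],
    "(inhabits ?MEMB ?REGION)": ["G3-NLD"],
    "(location ?MEMB ?REGION)": ["G1-ENG", "G4-ITA"],
}


def support(tags):
    """Return (number of distinct groups, number of distinct languages)."""
    groups = {tag.split("-")[0] for tag in tags}
    languages = {tag.split("-")[1] for tag in tags}
    return len(groups), len(languages)


for constraint, tags in XVAL.items():
    n_groups, n_languages = support(tags)
    print(f"{constraint}: {n_groups} group(s), {n_languages} language(s)")
```

Counting languages separately from groups distinguishes a constraint confirmed by several groups in one language from one that holds across languages.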
What Kybots do
• Input:
– KAF annotations of text: sequential & encoded by language
– Conceptual frame from the ontology
– Expression rules for frame to language mapping:
• Wordnet in a language
• Morpho-syntactic mapping rules
• Output is a database of facts in KAF/FactAF (generic):
– aggregated facts
– inferred facts
– language neutral
Fact mining
• KYBOT = Knowledge Yielding Robot
• Logical expression
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)
• Expression rules per language:
– [N[s1]V[e1]]S
– [N[e1]N[s1]]N
– [[N[e1]][prep][N[s2]]]NP
• Ontology * Wordnets
– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Process: DamageProcess, ProduceProcess
• Kybot compiler
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
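A toy version of one expression rule shows how the pieces of the compiler formula fit together: a logical pattern, a word-to-ontology lexicon standing in for WN[Lx] plus the ontology, and the noun-preposition-noun rule standing in for ER[Lx]. Everything here is a simplification: the tag set, the lexicon entries and the type name Emission are hypothetical, and real Kybots operate on KAF annotations rather than on tagged word lists.

```python
# HYPOTHETICAL wordnet-to-ontology mapping for one language.
LEXICON = {"emission": "Emission", "greenhouse gases": "GreenhouseGas"}


def match_n_prep_n(tagged):
    """Apply the rule 'noun, preposition, noun' to (word, tag) pairs.

    For every match, emit the logical pattern
    (instance e1 <Process>) (instance s2 <Substance>) (patient e1 s2).
    """
    facts = []
    for i in range(len(tagged) - 2):
        (w1, t1), (_, t2), (w3, t3) = tagged[i:i + 3]
        if (t1, t2, t3) == ("N", "prep", "N") and w1 in LEXICON and w3 in LEXICON:
            facts.append([
                ("instance", "e1", LEXICON[w1]),
                ("instance", "s2", LEXICON[w3]),
                ("patient", "e1", "s2"),
            ])
    return facts


# Multiword terms are assumed to be pre-grouped into single tokens.
tagged = [("the", "Det"), ("emission", "N"), ("of", "prep"), ("greenhouse gases", "N")]
facts = match_n_prep_n(tagged)
print(facts)
```

Because the output refers only to ontology types, the same logical pattern could be reused for another language by swapping the lexicon and the expression rule.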
Fact mining by Kybots
[Figure: source documents pass through linguistic processors that produce the morpho-syntactic analysis (KAF), e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP] NP. Generic logical expressions are combined with the ontology (Abstract, Physical, Substance: H2O, CO2; Process, Chemical Reaction; CO2 emission, water pollution) and with wordnets & linguistic expressions.]
Fact analysis:
– [the emission]NP -> Process: e1
– [of greenhouse gases]PP -> Patient: s2
– [in agricultural areas]PP -> Location: a3
[Figure: four layers from text to ontology.]
• Text corpus (linear text): "Sudden increase of green house gases in 2003 ...", "CO2 emission in European countries ...", "Green house gases such as CO2, ..."
• Term database (generic, text based):
– gas
– green house gas -> gas; increase (AG); in 2003 (TIME)
– CO2 -> green house gas; emission (PA); in European countries (LO)
• Lexical database: wordnet (domain): emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1, CO2
• Ontology (maximal abstraction & integrity, language neutral integrity): Abstract, Physical, Substance (H2O, CO2), Process, ChemicalReaction, GlobalWarming, GreenhouseGas, CO2Emission
Steps: concept mining by Tybots (Synthesize), text mining by Kybots, Ontologize, Axiomatize, e.g. (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)
Thank you for your attention