FP7, Information Day Call 5, Luxembourg, May 11-12, 2009
KYOTO (ICT-211423): Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/
Piek Vossen, VU University Amsterdam
Project goals
• Open platform for knowledge sharing across languages and cultures
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
– Bootstrap this knowledge through open text mining & concept learning
– Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
– Enables deep semantic search for facts and knowledge
• Free, open source license (GPL)
Scope
• Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
– Environmental domain, BUT usable in any domain
• Global:
– Both European and non-European languages
• Available:
– Free: as open source system and data (GPL)
• Future perspective:
– Content standardization that supports worldwide communication
KYOTO (ICT-211423)
• Funded:
– 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
• STREP project (Specific Targeted Research Project): research & development
• Duration:
– March 2008 – March 2011
• Effort:
– 364 person-months of work
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)
Current situation in the environmental domain
• Vast amount of information in all kinds of formats and structures: websites, documents, databases, experts, community networks
• Scattered over the world: different regions, languages and cultures
• Highly dynamic and developing
• Increasing time and information pressure
• Technology gap: people use the first results from Google
• Critical knowledge dependency
KYOTO cycle
KYOTO's Solution
• Text mining:
– Massive and accurate indexing of facts from vast amounts of text
– In any language/culture, from scattered sources
– Repeated again and again to detect trends and changes
– Direct relation between knowledge modeling effort and text mining
• Knowledge modeling:
– Automatic learning of terms and concepts from text in any language
– Formalization of knowledge in a computer-usable format -> wordnets & ontologies
• Community software:
– For experts in the field, not knowledge engineers
– Continuous and collaborative effort:
• adapt to the changing domain
• consensus in the field
• consensus across languages and cultures
– Produce interoperable, formal, standardized knowledge structures
– Relate knowledge structures to expressions in languages
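The idea of learning terms from text (the Tybot's job) can be illustrated with a deliberately simple stand-in: candidate terms are runs of content words delimited by stopwords, numbers and punctuation, counted across a document collection. This is a minimal sketch assuming a hand-made English stopword list and a frequency threshold; it is not KYOTO's Tybot, whose method the slides do not describe, and the names `candidate_terms`, `extract_terms` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; not a KYOTO resource).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is",
    "are", "was", "were", "by", "with", "from", "at", "as", "this", "that",
}


def candidate_terms(text: str, max_len: int = 3) -> list[str]:
    """Return runs of 1..max_len content words, split at stopwords,
    number-only tokens and punctuation (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9]+|[^a-z0-9\s]", text.lower())
    candidates: list[str] = []
    run: list[str] = []
    for tok in tokens + ["."]:  # trailing sentinel flushes the last run
        if re.fullmatch(r"[a-z0-9]+", tok) and not tok.isdigit() and tok not in STOPWORDS:
            run.append(tok)
            continue
        if 0 < len(run) <= max_len:
            candidates.append(" ".join(run))
        run = []
    return candidates


def extract_terms(docs: list[str], min_freq: int = 2) -> list[tuple[str, int]]:
    """Count candidate terms over a collection; keep those seen at least min_freq times."""
    counts = Counter(t for doc in docs for t in candidate_terms(doc))
    return [(t, n) for t, n in counts.most_common() if n >= min_freq]


docs = [
    "Sudden increase of CO2 emissions in 2008 in Europe.",
    "CO2 emissions from transport rose in 2008.",
    "Wetlands reduce the CO2 emissions of farms.",
]
print(extract_terms(docs))  # [('co2 emissions', 3)]
```

Splitting at stopwords is only a crude proxy for the linguistic patterns a real term extractor would use; raising `min_freq` trades recall for precision.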
[Figure: KYOTO system overview. Sources: environmental organizations; distributed, diverse & dynamic data. Steps:
1. Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2. Term extraction (Tybot, term yielding robot): CO2 emission
3. Wikyoto: maintain terms & concepts in the wordnets and the ontology (Top, Middle and Domain layers, with concepts such as Physical, Abstract, Process, Substance, CO2, H2O, Emission, Pollution and Greenhouse Gas)
4. Index facts (Kybot, knowledge yielding robot): Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5. Text & Fact Index
6. Semantic search for citizens, governments and companies]
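Turning the captured text "Sudden increase of CO2 emissions in 2008 in Europe" into the indexed fact (Process: Emission, Involves: CO2, Property: increase, sudden, When: 2008, Where: Europe) can be sketched as slot filling against a lexicon. This is a minimal illustration assuming a hard-coded lexicon and a crude year pattern; in KYOTO this knowledge would come from the wordnets and ontology, and `LEXICON`, `Fact`, `extract_fact` and `build_index` are hypothetical names, not the Kybot's actual rules or API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon: surface form -> (fact slot, ontology concept).
LEXICON = {
    "emission": ("process", "Emission"),
    "emissions": ("process", "Emission"),
    "co2": ("involves", "CO2"),
    "increase": ("property", "increase"),
    "sudden": ("property", "sudden"),
    "europe": ("where", "Europe"),
}
YEAR = re.compile(r"(?:1[89]|20)\d\d")  # crude time expression: a year 1800-2099


@dataclass
class Fact:
    process: Optional[str] = None
    involves: List[str] = field(default_factory=list)
    properties: List[str] = field(default_factory=list)
    when: Optional[str] = None
    where: Optional[str] = None


def extract_fact(sentence: str) -> Fact:
    """Fill fact slots by looking each token up in the lexicon."""
    fact = Fact()
    for token in re.findall(r"[A-Za-z0-9]+", sentence):
        word = token.lower()
        if YEAR.fullmatch(word):
            fact.when = word
            continue
        slot, concept = LEXICON.get(word, (None, None))
        if slot == "process":
            fact.process = concept
        elif slot == "involves":
            fact.involves.append(concept)
        elif slot == "property":
            fact.properties.append(concept)
        elif slot == "where":
            fact.where = concept
    return fact


def build_index(sentences: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Index sentences by (process, involved concept) for concept-level lookup."""
    index: Dict[Tuple[str, str], List[str]] = {}
    for sentence in sentences:
        fact = extract_fact(sentence)
        if fact.process is None:
            continue  # no recognised process: nothing to index
        for concept in fact.involves:
            index.setdefault((fact.process, concept), []).append(sentence)
    return index


text = "Sudden increase of CO2 emissions in 2008 in Europe"
print(extract_fact(text))
# Fact(process='Emission', involves=['CO2'], properties=['sudden', 'increase'], when='2008', where='Europe')
print(build_index([text])[("Emission", "CO2")])
```

Because the index is keyed by concepts rather than words, any text whose terms map to Emission and CO2 lands under the same key, which is what makes search at the level of facts rather than strings possible.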
Achievements after 1st year
• First version of all system components
– Wordnets in 7 languages in uniform database formats
– Standard representation for the output of linguistic processing for 7 languages, based on ISO proposals
– Tybot (term extraction), Kybot (fact extraction) and Wikyoto (user editor)
– Semantic search
• Extensive definition of user requirements
• Integration of system components
Potential impact
Kyoto Knowledge Base
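[Figure: the Kyoto Knowledge Base links the wordnets for Italian (WnIT), English (WnEN), Basque (WnEU), Dutch (WnNL), Japanese (WnJP), Chinese (WnCH) and Spanish (WnES), each extended with a domain part, to a shared ontology and a domain ontology.]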
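The knowledge base design, in which every language's wordnet points into one shared ontology, means a term in one language can be related to its equivalents in the others by going through the shared concept. The sketch below assumes toy data: each lemma maps directly to an ontology concept (the synset layer of a real wordnet is skipped), and `WORDNETS`, `concept_of` and `equivalents` are hypothetical names rather than KYOTO's database format.

```python
from typing import Dict, List, Optional

# Hypothetical toy wordnets: language code -> lemma -> shared ontology concept.
WORDNETS: Dict[str, Dict[str, str]] = {
    "en": {"carbon dioxide": "CO2", "emission": "Emission", "pollution": "Pollution"},
    "nl": {"kooldioxide": "CO2", "uitstoot": "Emission", "vervuiling": "Pollution"},
    "it": {"anidride carbonica": "CO2", "emissione": "Emission", "inquinamento": "Pollution"},
    "es": {"dióxido de carbono": "CO2", "emisión": "Emission", "contaminación": "Pollution"},
}


def concept_of(lang: str, lemma: str) -> Optional[str]:
    """Map a lemma in one language to its ontology concept, if known."""
    return WORDNETS.get(lang, {}).get(lemma.lower())


def equivalents(lang: str, lemma: str) -> Dict[str, List[str]]:
    """Find lemmas in the other languages that share the lemma's concept."""
    concept = concept_of(lang, lemma)
    if concept is None:
        return {}
    return {
        other: sorted(word for word, c in entries.items() if c == concept)
        for other, entries in WORDNETS.items()
        if other != lang
    }


print(equivalents("nl", "uitstoot"))
# {'en': ['emission'], 'it': ['emissione'], 'es': ['emisión']}
```

Routing every comparison through the shared concept keeps the number of mappings linear in the number of languages, instead of needing a separate bilingual mapping for every language pair.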
Linking Open Data dataset cloud
http://richard.cyganiak.de/2007/10/lod/
[Figure: KYOTO-style resources added to the Linking Open Data cloud: wordnets of environment, sailing, legal and medical terms; ontologies of environment, sailing, legal and medical concepts; and environment, legal and medical facts.]
Project characteristics
Why a STREP project?
• Major technical challenges
• Cross-cultural and cross-lingual
• Small consortium for intense collaboration and discussion
• Bridge the gap between users and technology: a two-directional process
• Roll-out needs to follow from technical achievements
How to keep focus?
• Use existing state-of-the-art technology
• Start from current practice as the baseline
• Develop a robust platform that adds to the baseline, with the baseline as fallback
• Gradually add richer data, more precision and new functionalities
• Allow end-users to control the process, driven by textual examples
• Open, standardized architecture that can be developed further
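The principle of adding to the baseline while keeping the baseline as a fallback can be illustrated as a search wrapper: try the richer semantic layer first and drop back to plain keyword search when it fails or returns nothing. This is a minimal sketch under stated assumptions; the documents, concept map and the functions `semantic_search`, `keyword_search` and `with_fallback` are hypothetical and not part of the KYOTO platform.

```python
from typing import Callable, Dict, List, Set

Search = Callable[[str], List[str]]

# Hypothetical documents and concept annotations for illustration only.
DOCS = [
    "Sudden increase of CO2 emissions in 2008 in Europe",
    "Wetlands protect coastal habitats",
]
CONCEPTS: Dict[str, str] = {"co2": "CO2", "emission": "Emission", "emissions": "Emission"}
FACTS: Dict[str, Set[str]] = {DOCS[0]: {"CO2", "Emission"}}


def semantic_search(query: str) -> List[str]:
    """Richer layer: match documents whose indexed facts cover all query concepts."""
    wanted = {CONCEPTS[w] for w in query.lower().split() if w in CONCEPTS}
    if not wanted:
        return []
    return [doc for doc, concepts in FACTS.items() if wanted <= concepts]


def keyword_search(query: str) -> List[str]:
    """Baseline layer: plain keyword overlap, standing in for current practice."""
    words = set(query.lower().split())
    return [doc for doc in DOCS if words & set(doc.lower().split())]


def with_fallback(primary: Search, baseline: Search) -> Search:
    """Try the richer search first; use the baseline if it fails or finds nothing."""
    def search(query: str) -> List[str]:
        try:
            results = primary(query)
        except Exception:  # a failing new component must not break search
            results = []
        return results or baseline(query)
    return search


search = with_fallback(semantic_search, keyword_search)
print(search("CO2 emissions"))  # ['Sudden increase of CO2 emissions in 2008 in Europe']
print(search("wetlands"))       # ['Wetlands protect coastal habitats']
```

With this arrangement, richer data and new functionality can be switched in gradually: wherever the new layer has no answer yet, users still get at least what the baseline gave them.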
Thank you for your attention