EnviroInfo 2006, 05/09/06 Graz
Automatic Concept Space Generation in Support of Resource Discovery in Spatial Data Infrastructures
Paul Smits, Anders Friis-Christensen
European Commission, DG Joint Research Centre
Institute for Environment and Sustainability
Spatial Data Infrastructures Unit
TP 262, Ispra (VA), Italy
JRC’s Mission
The mission of the JRC is to provide customer-driven scientific and technical support for the conception, development, implementation and monitoring of EU policies.
As a service of the European Commission, the JRC functions as a reference centre of science and technology for the Union.
Close to the policy-making process, it serves the common interest of the Member States, while being independent of special interests, whether private or national.
Outline
• Introduction
• Objectives of the study
• Approach
• Results
• Conclusions
Introduction – components of a European SDI
• GI policy
• GI standards
• Spatial information services
• Fundamental GI data sets
Introduction
INSPIRE requirements
• metadata*
• spatial data sets and spatial data services*
• network services*
– EU geo-portal
• access and rights of use for Community institutions and bodies**
• monitoring and reporting mechanisms**
• process and procedures
* technical: under JRC responsibility
** legal/procedural: under Eurostat responsibility
Introduction
• European interoperability framework for pan-European eGovernment services
• Recommendations related to multilingualism, e.g.,
– For the pan-European services provided via portals, the top-level EU portal interface should be fully multilingual, the second-level pages (introductory texts and the descriptions of links) should be offered in the official languages, and the external links and related pages on the national websites should be available in at least one other language (for example English) in addition to the national language(s).
http://europa.eu.int/idabc
EcoInformatics meeting, 17/01/06 Ispra
Introduction
Issues on multilingualism identified by the INSPIRE DT on Network Services
– Multilingualism is only mentioned in the context of the interoperability of spatial data sets and services, for key attributes and corresponding multilingual thesauri
– Granularity: should the list of available languages be a service feature, or be defined at the data set or even the feature attribute level?
– Metadata/data: should only metadata be multilingual, or data sets as well?
– Attribute label versus attribute value: should only attribute labels be multilingual, or should attribute values be multilingual as well?
Introduction
[Diagram: information communities 1, 1.1, 1.2, 2, 3 and 4 and a central node, each shown as a «view». Labelled activities: metadata creation; collections of metadata (e.g., portal, search engine); define query and consult metadata. Links are labelled "harvest / distributed search" and "search".]
Objective of the study
• Focus on discovery of resources
• Answer question:
– Is, from a technical point of view, a common ontology or thesaurus desirable and feasible for multi-lingual resource discovery in a European Spatial Data Infrastructure?
Approach
• Implement and extend work of H. Chen, et al., "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence vol. 18 pp. 771-782, 1996.
• Integrate thesauri, vocabularies and gazetteers in resource discovery
• Experiments: P. Smits, A. Friis-Christensen, "Resource Discovery in a European Spatial Data Infrastructure," IEEE Transactions on Knowledge and Data Engineering (accepted for publication).
Approach
• What is a Concept Space?
• Simply put:
– An index of all concepts existing in a metadata repository
– With numerical relationships defined between any two concepts
– To be queried by associative retrieval
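As a minimal sketch of this definition (not the implementation used in the study), a concept space can be modelled as an index from concepts to resource descriptors plus a table of pairwise weights. The weight formula below, the share of resources containing concept i that also contain concept j, is an assumption chosen for illustration; the function name `build_concept_space` and the sample descriptors are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_concept_space(descriptors):
    """Build an index and pairwise weights from {resource_id: set_of_concepts}.

    Assumption: the weight W[i][j] is the share of resources containing
    concept i that also contain concept j (a simple co-occurrence measure).
    """
    index = defaultdict(set)  # concept -> ids of resources that contain it
    for resource_id, concepts in descriptors.items():
        for concept in concepts:
            index[concept].add(resource_id)

    weights = defaultdict(dict)  # weights[i][j] -> numerical relationship
    for i, j in permutations(index, 2):
        shared = len(index[i] & index[j])
        if shared:
            weights[i][j] = shared / len(index[i])
    return dict(index), dict(weights)


# Hypothetical sample data, for illustration only.
descriptors = {
    "r1": {"soil", "sub-surface"},
    "r2": {"soil"},
    "r3": {"sub-surface", "geology"},
}
index, weights = build_concept_space(descriptors)
```

The weights are deliberately asymmetric: a rare concept can point strongly to a common one without the reverse being true. Pairs that never co-occur are simply not stored.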
Approach
• Two-step approach
– Creation of multi-lingual concept space
– Associative retrieval based on a neural network
H. Chen, B. Schatz, T. Ng, J. Martinez, A. Kirchhoff, C. Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 771-782, August 1996.
[Flow diagram from Start to End with three steps: 1. Collect resource descriptors; 2. Filter and index concepts; 3. Cluster analysis. The steps use the following «database» stores: Resource descriptors, Ontology and vocabulary, Index, Unidentified Concepts, Concept Space.]
Approach
• Creation of the multi-lingual concept space
– Collection of resource descriptors
– Object filtering and indexing
• identify the concepts and terms already present in our human-created ontology, which includes any thesauri and vocabularies
• filter out irrelevant terms, such as stop words, to improve performance
• store any remaining terms in the concept space
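The filtering and indexing step can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the tiny `ONTOLOGY` mapping, the `STOP_WORDS` list, the concept id `concept:soil` and the function name `filter_and_index` are all hypothetical, and tokenisation is reduced to a simple word split.

```python
import re

# Hypothetical inputs: a tiny ontology (term -> concept id) and a stop-word list.
ONTOLOGY = {"soil": "concept:soil", "bodem": "concept:soil"}
STOP_WORDS = {"the", "of", "and", "a", "in", "de", "van"}


def filter_and_index(descriptors, ontology=ONTOLOGY, stop_words=STOP_WORDS):
    """Return (index, unidentified) for {resource_id: free_text} descriptors."""
    index = {}            # concept id or term -> set of resource ids
    unidentified = set()  # terms kept although absent from the ontology
    for resource_id, text in descriptors.items():
        for term in re.findall(r"\w+", text.lower()):
            if term in stop_words:
                continue  # drop irrelevant terms
            if term in ontology:
                key = ontology[term]  # term already known to the ontology
            else:
                key = term  # remaining term, stored as-is
                unidentified.add(term)
            index.setdefault(key, set()).add(resource_id)
    return index, unidentified
```

For example, `filter_and_index({"r1": "Map of the soil", "r2": "Bodem van Nederland"})` indexes both resources under the same concept, because the English and Dutch terms map to one ontology entry.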
Approach - Associative query
• Initialize the associative retrieval
– The neural network is initialized at query time by assigning initial membership values to the units of the neural network (= concepts in the Concept Space)
• Terms in the concept space that match exactly a query term: 1
• Partial matches get membership value < 1
• Terms that do not match the query: 0
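The initialization rule can be sketched as below. The slides only state that partial matches receive a value below 1; treating a substring relation as a partial match and giving it the fixed value 0.5 are assumptions made for this illustration, and the function name `initial_memberships` is hypothetical.

```python
def initial_memberships(query_terms, concepts, partial_value=0.5):
    """Assign an initial membership value to every concept in the concept space.

    Assumption: a partial match means the query term is a substring of the
    concept label (or vice versa), and it receives the fixed value
    `partial_value`.
    """
    query = [q.lower() for q in query_terms]
    memberships = {}
    for concept in concepts:
        label = concept.lower()
        if label in query:
            memberships[concept] = 1.0  # exact match
        elif any(q in label or label in q for q in query):
            memberships[concept] = partial_value  # partial match, < 1
        else:
            memberships[concept] = 0.0  # no match
    return memberships
```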
Approach - Associative query
• Initialize the associative retrieval
Query: “soil”
[Diagram, situation at t=0: node "Soil, bodem" has membership value 1; nodes "Sub-surface" and "information" have membership value 0. Link weights shown: Wij = 0 and Wij = 0.7.]
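Approach - Associative query
• Iterate through the neural network
[Diagram, situation at t=0: node "Soil, bodem" has membership value 1; nodes "Sub-surface" and "information" have membership value 0. Link weights shown: Wij = 0 and Wij = 0.7.]
[Diagram, situation at t=1: "Soil, bodem" keeps membership value 1; "Sub-surface" now has 0.7; "information" remains 0. Link weights unchanged.]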
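One iteration of the associative retrieval can be sketched as below, using the three concepts and the membership values from the example. The update rule is an assumption chosen because it reproduces the values shown for t=1: each concept keeps its own value or takes the largest weight × membership contribution from another concept. The rule used in the study may differ. Which link carries the weight 0.7 is also assumed, and the function name `iterate` is hypothetical.

```python
def iterate(memberships, weights):
    """One synchronous update of all membership values.

    Assumption: a concept keeps its own value or takes the largest
    weight * membership contribution from any other concept, whichever
    is higher.
    """
    updated = {}
    for j, current in memberships.items():
        incoming = max(
            (
                weights.get((i, j), 0.0) * value
                for i, value in memberships.items()
                if i != j
            ),
            default=0.0,
        )
        updated[j] = max(current, incoming)
    return updated


# Membership values from the example; the assignment of weights to links is assumed.
t0 = {"soil, bodem": 1.0, "sub-surface": 0.0, "information": 0.0}
weights = {
    ("soil, bodem", "sub-surface"): 0.7,
    ("soil, bodem", "information"): 0.0,
}
t1 = iterate(t0, weights)
```

Calling `iterate` repeatedly lets activation spread further through the network; the results reported later in the talk refer to four such iterations.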
Approach - Associative query
• Link membership values of concepts to resource descriptors
[Diagram, situation at t=1: "Soil, bodem" has membership value 1; "Sub-surface" has 0.7; "information" has 0. Link weights shown: Wij = 0 and Wij = 0.7.]
– Membership > threshold?
– Use index to find resources that contain the concept
– Order found resources by relevance, based on membership values
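The final step, from membership values to an ordered result list, can be sketched as below. The threshold value of 0.5 and the choice to score a resource by the highest membership value among its concepts are assumptions for illustration; the function name `rank_resources` is hypothetical.

```python
def rank_resources(memberships, index, threshold=0.5):
    """Return resource ids ordered by relevance, most relevant first.

    Assumptions: a resource's relevance is the highest membership value among
    its concepts that exceed `threshold`; the threshold value is illustrative.
    """
    relevance = {}
    for concept, value in memberships.items():
        if value <= threshold:
            continue  # membership does not exceed the threshold
        for resource_id in index.get(concept, ()):
            relevance[resource_id] = max(relevance.get(resource_id, 0.0), value)
    # Sort by descending relevance; break ties by id for a stable order.
    return sorted(relevance, key=lambda r: (-relevance[r], r))
```

With the membership values 1, 0.7 and 0 from the example, only resources indexed under the first two concepts are returned, and those containing the exactly matched concept come first.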
[Architecture diagram. Components: Harvester, Wrapper, Concept Space Manager, Associative retriever, Ontology editor, Ontology importer, Reasoner. Data stores and inputs: Metadata repository («database»), Ontology (thesauri, vocabularies), Concept Space, Collection of web addresses, XSLT file library. Interface and format labels: WMS, RDF, OWL, XML, DIG, query.]
Results
• Creating the metadata repository
Results
• Query computationally expensive
Time required for four iterations of the neural network (600 MHz, 512 MB RAM):
Query | Remark | Time
soil (eng) | Query term found in the concept space (GEMET 2001.1 concept no. 7843) | 16.1 s
infrastructuur (nld) | Query term not literally defined in the concept space or ontology | 27.8 s
Conclusions from the study
• It will be impractical to rely only on one common ontology for resource discovery in a European SDI
• The approach of using human-created ontologies in combination with automatic concept space generation and associative retrieval is a powerful means of discovering geospatial resources.
• Proposed approach is useful and merits further investigation and development
• The importance of structured information, using metadata standards, is underlined by our study and is also a basic assumption of our work.