Page 1: Automatic Concept Space Generation in Support of Resource Discovery in Spatial Data Infrastructures

EnviroInfo 2006, 05/09/06 Graz

Paul Smits, Anders Friis-Christensen

European Commission, DG Joint Research Centre
Institute for Environment and Sustainability
Spatial Data Infrastructures Unit
TP 262, Ispra (VA), Italy

Page 2: JRC’s Mission

The mission of the JRC is to provide customer-driven scientific and technical support for the conception, development, implementation and monitoring of EU policies.

As a service of the European Commission, the JRC functions as a reference centre of science and technology for the Union.

Close to the policy-making process, it serves the common interest of the Member States, while being independent of special interests, whether private or national.

Page 3: Outline

• Introduction

• Objectives of the study

• Approach

• Results

• Conclusions

Page 4: Introduction – components of a European SDI

• GI policy

• GI standards

• Spatial information services

• Fundamental GI data sets

Page 5: Introduction – INSPIRE requirements

• metadata*

• spatial data sets and spatial data services*

• network services*
  – EU geo-portal

• access and rights of use for Community institutions and bodies**

• monitoring and reporting mechanisms**

• process and procedures

* technical: under JRC responsibility
** legal/procedural: under Eurostat responsibility

Page 6: Introduction

• European interoperability framework for pan-European eGovernment services

• Recommendations related to multilingualism, e.g.,
  – For the Pan-European services provided via portals, the top-level EU portal interface should be fully multilingual, the second-level pages (introductory texts and the descriptions of links) should be offered in the official languages and the external links and related pages on the national websites should be available in at least one other language (for example English) in addition to the national language(s).

http://europa.eu.int/idabc

Page 7: Introduction

EcoInformatics meeting, 17/01/06 Ispra

Issues on multilingualism identified by the INSPIRE DT on Network Services:

– Multilingualism is only mentioned in the context of the interoperability of spatial data sets and services, for key attributes and corresponding multilingual thesauri

– Granularity: should the list of available languages be a service feature, or be defined at the data set or even the feature attribute level?

– Metadata/data: should only metadata be multilingual, or data sets as well?

– Attribute label versus attribute value: should only attribute labels be multilingual, or should attribute values be multilingual as well?

Page 8: Introduction

[Diagram: seven «view» boxes labelled Central, Information community 1, Information community 1.1, Information community 1.2, Information community 2, Information community 3 and Information community 4. Layer labels: Metadata creation; Collections of metadata (e.g., portal, search engine); Define query and consult metadata. Connector labels: harvest / distributed search; search.]

Page 10: Objective of the study

• Focus on discovery of resources

• Answer the question:
  – Is, from a technical point of view, a common ontology or thesaurus desirable and feasible for multi-lingual resource discovery in a European Spatial Data Infrastructure?

Page 12: Approach

• Implement and extend the work of H. Chen et al., "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 771-782, 1996.

• Integrate thesauri, vocabularies and gazetteers in resource discovery

• Experiments

P. Smits, A. Friis-Christensen, "Resource Discovery in a European Spatial Data Infrastructure," IEEE Transactions on Knowledge and Data Engineering (accepted for publication).

Page 13: Approach

• What is a Concept Space?

• Simply put:
  – An index of all concepts existing in a metadata repository
  – With numerical relationships defined between any two concepts
  – To be queried by associative retrieval
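The three properties above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `ConceptSpace`, its methods, and the sample identifiers such as `dataset-1` are hypothetical, and the weight values are borrowed from the "soil" example used later in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSpace:
    """Toy concept space: an index plus pairwise concept weights (hypothetical API)."""

    # concept -> identifiers of the resource descriptors that mention it
    index: dict[str, set[str]] = field(default_factory=dict)
    # (concept_i, concept_j) -> numerical relationship W_ij in [0, 1]
    weights: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_occurrence(self, concept: str, resource_id: str) -> None:
        """Record that a resource descriptor mentions the concept."""
        self.index.setdefault(concept, set()).add(resource_id)

    def relate(self, source: str, target: str, weight: float) -> None:
        """Store the relationship weight from source to target."""
        self.weights[(source, target)] = weight

    def related(self, concept: str) -> list[tuple[str, float]]:
        """Concepts linked from `concept` with non-zero weight, strongest first."""
        links = [(t, w) for (s, t), w in self.weights.items() if s == concept and w > 0]
        return sorted(links, key=lambda pair: -pair[1])


space = ConceptSpace()
space.add_occurrence("soil", "dataset-1")
space.add_occurrence("sub-surface", "dataset-1")
space.relate("soil", "sub-surface", 0.7)
space.relate("soil", "information", 0.0)
```

Keeping the index and the weights in separate mappings mirrors the separate Index and Concept Space stores in the deck's flow diagram, although the real system may organise them differently.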

Page 14: Approach

• Two-step approach
  – Creation of multi-lingual concept space
  – Associative retrieval based on a neural network

[Flow diagram: Start, 1. Collect resource descriptors, 2. Filter and index concepts, 3. Cluster analysis, End. «database» stores: Resource descriptors, Ontology and vocabulary, Index, Unidentified Concepts, Concept Space.]

H. Chen, B. Schatz, T. Ng, J. Martinez, A. Kirchhoff, C. Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 771-782, August 1996.
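The slides do not spell out the cluster analysis step, so the sketch below swaps in a deliberately simple stand-in: an asymmetric co-occurrence ratio, where the weight from concept j to concept k is the share of resources containing j that also contain k. The function name `cooccurrence_weights` and the sample descriptors are assumptions made for illustration; the weighting used in the cited work by Chen et al. is more elaborate.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_weights(descriptors):
    """Return {(j, k): share of resources containing j that also contain k}."""
    occurrences = Counter()  # concept -> number of resources mentioning it
    pairs = Counter()        # (j, k) -> number of resources mentioning both
    for concepts in descriptors.values():
        unique = set(concepts)
        occurrences.update(unique)
        pairs.update(permutations(unique, 2))  # ordered pairs, so weights can be asymmetric
    return {(j, k): count / occurrences[j] for (j, k), count in pairs.items()}


# Hypothetical resource descriptors, already reduced to lists of concepts.
descriptors = {
    "r1": ["soil", "sub-surface"],
    "r2": ["soil", "sub-surface", "geology"],
    "r3": ["soil"],
}
weights = cooccurrence_weights(descriptors)
```

In this toy data "sub-surface" always appears together with "soil", but not the other way round, so the weight from "sub-surface" to "soil" is higher than the reverse weight.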

Page 15: Approach

• Creation of the multi-lingual concept space
  – Collection of resource descriptors
  – Object filtering and indexing
    • identify the concepts and terms that already exist in our human-created ontology, which includes any thesauri and vocabularies
    • filter out irrelevant terms, such as stop words, in order to improve performance
    • store any remaining terms in the concept space
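The filtering and indexing step can be sketched roughly as follows. The ontology terms, the stop-word list, and the function name `filter_and_index` are hypothetical placeholders. Terms not found in the ontology are kept in a separate mapping here, echoing the "Unidentified Concepts" store in the flow diagram; how the original system stores them is not stated.

```python
import re

# Hypothetical stand-ins for the human-created ontology and a stop-word list.
ONTOLOGY_TERMS = {"soil", "bodem", "water"}
STOP_WORDS = {"the", "of", "and", "in", "a"}


def filter_and_index(descriptors):
    """Return (index, unidentified): each maps a term to the resources containing it."""
    index, unidentified = {}, {}
    for resource_id, text in descriptors.items():
        # Split free text into lower-cased alphabetic terms.
        for term in set(re.findall(r"[^\W\d_]+", text.lower())):
            if term in STOP_WORDS:
                continue  # irrelevant term, dropped
            target = index if term in ONTOLOGY_TERMS else unidentified
            target.setdefault(term, set()).add(resource_id)
    return index, unidentified


index, unidentified = filter_and_index({
    "r1": "Soil map of the Netherlands",
    "r2": "Bodem en water",
})
```

With these sample descriptors, "soil", "bodem" and "water" are recognised as ontology terms, "of" and "the" are dropped, and the remaining words are retained as unidentified terms.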

Page 16: Approach - Associative query

• Initialize the associative retrieval
  – The neural network is initialized at query time by assigning initial membership values to the units of the neural network (the units are the concepts in the Concept Space)
    • Terms in the concept space that exactly match a query term: 1
    • Partial matches: membership value < 1
    • Terms that do not match the query: 0
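A minimal sketch of this initialization is shown below. The slides do not say how partial matches are scored, so the use of `difflib.SequenceMatcher` and the `cutoff` parameter are assumptions chosen only to make the example concrete.

```python
from difflib import SequenceMatcher


def initial_membership(query, concepts, cutoff=0.6):
    """Assign each concept a starting membership value in [0, 1]."""
    query = query.lower()
    membership = {}
    for concept in concepts:
        if concept.lower() == query:
            membership[concept] = 1.0  # exact match
            continue
        similarity = SequenceMatcher(None, query, concept.lower()).ratio()
        # Partial matches stay strictly below 1; weak similarity counts as no match.
        membership[concept] = min(similarity, 0.99) if similarity >= cutoff else 0.0
    return membership


start = initial_membership("soil", ["soil", "soils", "water"])
```

Here "soil" receives membership 1, "soils" receives a value between 0 and 1, and "water" receives 0.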

Page 17: Approach - Associative query

• Initialize the associative retrieval

Query: “soil”

[Diagram, situation at t=0: concept "Soil, bodem" has membership 1; concepts "Sub-surface" and "information" have membership 0. Link weights shown: Wij = 0.7 and Wij = 0.]

Page 18: Approach - Associative query

• Iterate through the neural network

[Diagram, situation at t=0: concept "Soil, bodem" has membership 1; concepts "Sub-surface" and "information" have membership 0. Link weights shown: Wij = 0.7 and Wij = 0.]

[Diagram, situation at t=1: concept "Soil, bodem" has membership 1; "Sub-surface" has membership 0.7; "information" has membership 0. Link weights shown: Wij = 0.7 and Wij = 0.]
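The step from t=0 to t=1 can be reproduced with a simple spreading-activation update. The update rule is not given on the slides, so the one below is an assumption: each concept takes the weighted sum of its neighbours' memberships, capped at 1, and never drops below its current value. The assignment of the 0.7 weight to the link from "Soil, bodem" to "Sub-surface" is inferred from the t=1 values.

```python
def iterate(membership, weights, steps=1):
    """Propagate membership values along weighted links for a number of steps."""
    current = dict(membership)
    for _ in range(steps):
        updated = {}
        for j in current:
            # Activation arriving at concept j from every other concept i.
            incoming = sum(
                weights.get((i, j), 0.0) * current[i] for i in current if i != j
            )
            updated[j] = max(current[j], min(1.0, incoming))
        current = updated
    return current


# Values from the slide: query "soil" at t=0.
t0 = {"Soil, bodem": 1.0, "Sub-surface": 0.0, "information": 0.0}
links = {
    ("Soil, bodem", "Sub-surface"): 0.7,
    ("Soil, bodem", "information"): 0.0,
}
t1 = iterate(t0, links, steps=1)
```

After one step "Sub-surface" rises to 0.7 while "information" stays at 0, which matches the situation shown at t=1.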

Page 19: Approach - Associative query

• Link membership values of concepts to resource descriptors
  – Membership > threshold?
  – Use index to find resources that contain the concept
  – Order found resources in order of relevance, based on membership values

[Diagram, situation at t=1: concept "Soil, bodem" has membership 1; "Sub-surface" has membership 0.7; "information" has membership 0. Link weights shown: Wij = 0.7 and Wij = 0.]
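The three steps above can be sketched as a ranking function. The threshold value, the sample index, and the choice to score a resource by the highest membership among its concepts are all assumptions; the slides only state that resources are ordered by relevance based on membership values.

```python
def rank_resources(membership, index, threshold=0.5):
    """Return (resource_id, score) pairs for concepts above the threshold, best first."""
    scores = {}
    for concept, value in membership.items():
        if value <= threshold:
            continue  # membership too low, concept ignored
        for resource_id in index.get(concept, ()):
            # Score a resource by its best-matching concept (an assumed rule).
            scores[resource_id] = max(scores.get(resource_id, 0.0), value)
    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Membership values at t=1 from the slide, with a hypothetical index.
membership = {"Soil, bodem": 1.0, "Sub-surface": 0.7, "information": 0.0}
index = {
    "Soil, bodem": ["r1", "r2"],
    "Sub-surface": ["r2", "r3"],
    "information": ["r4"],
}
ranking = rank_resources(membership, index)
```

Resource `r4` is only linked to "information", whose membership is below the threshold, so it does not appear in the ranking.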

Page 21

[Architecture diagram. Component labels: Collection of web addresses, Harvester, WMS, RDF, Wrapper, XSLT file library, XML, Metadata repository («database»), Concept Space Manager, Concept Space, Associative retriever, query, Ontology (thesauri, vocabularies), Ontology importer, Ontology editor, OWL, Reasoner, DIG.]

Page 22: Results

• Creating the metadata repository

Page 25: Results

• Query computationally expensive

Query | Remark | Time required for four iterations of the neural network (600 MHz, 512 MB RAM)
soil (eng) | Query term found in the concept space (GEMET 2001.1 concept no. 7843) | 16.1 s
infrastructuur (nld) | Query term not literally defined in the concept space or ontology | 27.8 s

Page 27: Conclusions from the study

• It will be impractical to rely only on one common ontology for resource discovery in a European SDI

• The approach of using human-created ontologies in combination with automatic concept space generation and associative retrieval is a powerful means to the discovery of geospatial resources.

• Proposed approach is useful and merits further investigation and development

• The importance of structured information, using metadata standards, is underlined by our study and is also a basic assumption of our work.