EcoTerm IV NBII/EioNet Demo of Federated KOS Search Mike Frame Vienna, Austria April 2007

DESCRIPTION

EcoTerm IV NBII/EioNet Demo of Federated KOS Search. Mike Frame, Vienna, Austria, April 2007. Discussion topics: project background; NBII Thesaurus; GEMET Thesaurus; prototype client; sample query results (including no, 1, or both thesauri); overall findings.

TRANSCRIPT

Page 1: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

EcoTerm IV

NBII/EioNet Demo of Federated KOS Search

Mike Frame

Vienna, Austria

April 2007

Page 2: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Discussion Topics…
• Project Background
• NBII Thesaurus
• GEMET Thesaurus
• Prototype Client
• Sample Query Results
  • Including no, 1, or both thesauri
• Overall Findings

Page 3: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Biocomplexity Thesaurus

http://thesaurus.nbii.gov

Page 4: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

EIONET GEMET Thesaurus

http://www.eionet.europa.eu/gemet/webservices?langcode=en

Page 5: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII/EIONET Thesaurus Web-service

• Background – collaboration through Ecoinformatics TWG
• Primary Goal – access distributed multi-lingual thesauri
• Results – SKOS web-service & client

Page 6: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Latest Client & Service capabilities
• Access to both NBII and GEMET
• Single language capability
• Results are provided by source
• All documentation is completed

http://thesaurus.nbii.gov

Page 7: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Demo Client

Page 9: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Current State

Users
• Most aren’t aware of the underlying vocabulary

Vocabularies are often unique to an organization and used more for “categorization” than retrieval

Goal
• Include all vocabularies and let the search engine handle results

Page 10: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Demonstration Search Retrieval

Created demonstration datasets
• NBII Cataloged Resources
  • ~30,000 web-sites, publications, images, maps, etc.
  • XML structured data – controlled subject
• NBII FGDC Metadata
  • ~22,000 resources on research studies
  • 150-200 elements
  • Semi-structured with no controlled vocabulary

Page 11: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Catalog Records
• Based on the Dublin Core + 18 elements, of which 10 are mandatory
• In place since 2002
• Used by distributed content managers

Page 12: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Metadata CH

Page 13: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Process

Added thesaurus capabilities to the Development Search Engine for:
• NBII Thesaurus
• EIONET GEMET Thesaurus
• Used BT, RT, NT relationships & weighting

Performed sample queries within the test repositories for:
• No thesaurus
• GEMET-only aided searching
• NBII-only aided searching
• GEMET+NBII aided searching (X)
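The process above (expanding a query through BT, NT, and RT relationships with weighting, using no, one, or both thesauri) can be sketched roughly as follows. This is only an illustration, not the search engine's actual implementation: the `Thesaurus` class, the `expand_query` function, the per-relationship weights, and the assignment of BT/NT to the sample terms are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Assumed weights per relationship type; the slides do not state the real ones.
RELATION_WEIGHTS: Dict[str, float] = {"NT": 0.8, "BT": 0.2, "RT": 0.2}


@dataclass
class Thesaurus:
    """Hypothetical in-memory thesaurus: term -> [(relationship, related term)]."""
    name: str
    relations: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def expand(self, term: str) -> Dict[str, float]:
        """Return the query term (weight 1.0) plus its weighted related terms."""
        weighted = {term: 1.0}
        for relation, related in self.relations.get(term.lower(), []):
            weight = RELATION_WEIGHTS.get(relation, 0.0)
            # Keep the highest weight if a term is listed more than once.
            weighted[related] = max(weighted.get(related, 0.0), weight)
        return weighted


def expand_query(term: str, thesauri: Iterable[Thesaurus]) -> Dict[str, float]:
    """Merge expansions from zero, one, or several thesauri."""
    merged = {term: 1.0}
    for thesaurus in thesauri:
        for related, weight in thesaurus.expand(term).items():
            merged[related] = max(merged.get(related, 0.0), weight)
    return merged


# Illustrative entry only; the BT/NT labels are assumed, not taken from GEMET.
gemet = Thesaurus("GEMET", {
    "endangered species": [
        ("BT", "category of endangered species"),
        ("NT", "endangered animal species"),
        ("NT", "endangered plant species"),
    ],
})

if __name__ == "__main__":
    print(expand_query("endangered species", []))       # "no thesaurus" case
    print(expand_query("endangered species", [gemet]))  # single-thesaurus case
```

Passing an empty list reproduces the "no thesaurus" baseline, so the same function covers all four test conditions listed above.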

Page 14: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Test Repository 1

NBII Resource Catalog (Dublin Core)

Page 15: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

No Thesauri – “invasive species”

Page 16: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Thesaurus – “invasive species”

Page 17: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Thesaurus – “invasive species”

Page 18: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

No Thesauri – “Endangered Species”

Page 19: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Thesaurus – “endangered species”

Page 20: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Only – “endangered species”

Page 21: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

No Thesaurus – “rare species”

Page 22: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Thesaurus – “rare species”

Page 23: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Thesaurus – “rare species”

Page 24: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Thesaurus – “rare species” (expanded degrees of relevance)

Page 25: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

No Thesauri – “protected species”

Page 26: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Thesaurus – “protected species”

Page 27: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Thesaurus – “protected species”

Page 28: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Results – NBII Catalog Resources

term                         None    NBII     GEMET
“invasive species”           2487    10802    2487
“endangered species”         1612    3532     1619
“rare species”               249     7186     290
“rare species” (expanded)    -       -        5847
“protected species”          203     2345     1664

Page 29: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Results – NBII Resource Catalog

[Bar chart: number of results (axis 0 to 12,000) for “invasive species”, “endangered species”, “rare species”, and “protected species”; series: None, NBII, GEMET]

Page 30: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Test Repository 2

NBII FGDC Metadata

Page 31: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – No vocabularies, Metadata CH: “invasive species”

Page 32: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – NBII only, Metadata CH: “invasive species”

Page 33: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – GEMET only, Metadata CH: “invasive species”

Page 34: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – No vocabularies, Metadata CH: “endangered species”

Page 35: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – NBII only, Metadata CH: “endangered species”

Page 36: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – GEMET only, Metadata CH: “endangered species”

Page 37: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

No Thesauri – Metadata CH “rare species”

Page 38: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

NBII Thesaurus – Metadata CH “rare species”

Page 39: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Thesaurus – Metadata CH “rare species”

Page 40: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – No vocabularies, Metadata CH: “protected species”

Page 41: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – NBII only, Metadata CH: “protected species”

Page 42: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Sample Queries – GEMET only, Metadata CH: “protected species”

Page 43: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Results – FGDC Metadata

term                     None    NBII    GEMET
“invasive species”       302     7884    302
“endangered species”     1008    2690    1019
“rare species”           59      4259    64
“protected species”      11      2152    1011
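To compare the two repositories, the counts from the two results tables can be converted into a rough gain factor: thesaurus-aided hits divided by unaided hits. The sketch below does only that arithmetic; the names `CATALOG`, `FGDC`, `gain`, and `report` are made up for the example, and the factor measures the number of results, not their relevance.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int]  # (None, NBII, GEMET) hit counts

# Counts copied from the "Results – NBII Catalog Resources" slide.
CATALOG: Dict[str, Counts] = {
    "invasive species": (2487, 10802, 2487),
    "endangered species": (1612, 3532, 1619),
    "rare species": (249, 7186, 290),
    "protected species": (203, 2345, 1664),
}

# Counts copied from the "Results – FGDC Metadata" slide.
FGDC: Dict[str, Counts] = {
    "invasive species": (302, 7884, 302),
    "endangered species": (1008, 2690, 1019),
    "rare species": (59, 4259, 64),
    "protected species": (11, 2152, 1011),
}


def gain(counts: Counts) -> Tuple[float, float]:
    """Return (NBII gain, GEMET gain) relative to the no-thesaurus count."""
    none, nbii, gemet = counts
    return nbii / none, gemet / none


def report(name: str, table: Dict[str, Counts]) -> List[str]:
    """Format one line per query term with both gain factors."""
    lines = []
    for term, counts in table.items():
        nbii_gain, gemet_gain = gain(counts)
        lines.append(
            f"{name:8} {term:20} NBII x{nbii_gain:.1f}  GEMET x{gemet_gain:.1f}"
        )
    return lines


if __name__ == "__main__":
    for line in report("Catalog", CATALOG) + report("FGDC", FGDC):
        print(line)
```

A GEMET gain of exactly 1.0 for “invasive species” in both repositories reflects the identical None and GEMET counts in the tables.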

Page 44: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Results – FGDC Metadata

[Bar chart: number of results (axis 0 to 8,000) for “invasive species”, “endangered species”, “rare species”, and “protected species”; series: None, NBII, GEMET]

Page 45: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Overall Results

General Findings

The assumption that a thesaurus improves the “number” of results is valid
• The degree varies by term and mappings

Since users search from a number of perspectives, backgrounds, and levels of expertise, multiple thesauri do improve the number of results

Page 46: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Overall Results

Using only GEMET Terminology

Terms in GEMET that were not included in the NBII thesaurus improved search results

GEMET's strength of broad coverage aided searches

In general, for the Metadata repository
• Results varied somewhat, but the top 10 results were often the same

Page 47: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Overall Results

General Findings

“No thesaurus” tests produced poorer #1 results

For the structured set, thesaurus use changed the ordering of the results list more than it did for the unstructured set (Metadata)

Page 48: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Issues

“Integrating” multi-scope and multi-purpose thesauri presents challenges:
• Can’t turn the effort into a thesaurus project
• Degree of relevance of terms is an issue
• Concept matching or different intent
• Differing classification (RT vs. NT) across thesauri
• Differing “weighting” algorithms

Page 49: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Further Study Options

1.) Take multiple thesauri “as is”
2.) Do some “attempted” concept matching, i.e. “endangered animal species” – “endangered animal”
3.) If no match is present, add term and relationship as is
4.) Obtain terms from XMDR
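Options 2 and 3 above could be prototyped along these lines. The slides do not say how “attempted” concept matching would be done, so this sketch swaps in a simple word-overlap (Jaccard) comparison; the function names and the 0.6 threshold are assumptions chosen only so that the slide's own pair, “endangered animal species” and “endangered animal”, counts as a match.

```python
from typing import List, Optional, Set


def tokens(label: str) -> Set[str]:
    """Lower-case a label and split it into a set of words."""
    return set(label.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


def best_match(label: str, vocabulary: List[str],
               threshold: float = 0.6) -> Optional[str]:
    """Return the most similar existing term, or None if nothing reaches
    the (assumed) threshold."""
    best, best_score = None, threshold
    for candidate in vocabulary:
        score = jaccard(label, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


def merge_term(label: str, vocabulary: List[str]) -> str:
    """Option 2: reuse a matched concept. Option 3: otherwise add the term as is."""
    match = best_match(label, vocabulary)
    if match is None:
        vocabulary.append(label)
        return label
    return match


if __name__ == "__main__":
    vocab = ["endangered animal", "rare species"]
    print(merge_term("endangered animal species", vocab))  # reuses a concept
    print(merge_term("vanished species", vocab))           # added as is
    print(vocab)
```

Word overlap ignores intent, which is exactly the “concept matching or different intent” issue raised earlier, so any real effort would need review of the proposed matches.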

Page 50: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Further Study Options – cont.
• Follow up with additional repositories
• Repeat with other query terms
• Re-look at weighting algorithms
• Do queries with a subset of terms
• Repeat with a completely integrated thesaurus, as compared to repeating queries with machine integration

Complete By June

Page 51: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

Questions, Comments,

Page 52: EcoTerm IV NBII/EioNet Demo of Federated KOS Search

GEMET Control file

endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]

protected species,category of endangered species[0.2],endangered species [0.2]

rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
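The control file appears to hold one query term per line, followed by comma-separated related terms, each with a weight in square brackets. A minimal parser under that assumption might look like the following; the function names are hypothetical, and the sketch assumes that terms never contain commas.

```python
import re
from typing import Dict, Tuple

# "related term[weight]", tolerating "[.2]" and a space before the bracket.
ENTRY = re.compile(r"^\s*(?P<term>.*?)\s*\[(?P<weight>\d*\.?\d+)\]\s*$")

SAMPLE = """\
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""


def parse_control_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Split one line into (query term, {related term: weight})."""
    head, *rest = line.strip().split(",")
    related: Dict[str, float] = {}
    for item in rest:
        match = ENTRY.match(item)
        if match is None:
            raise ValueError(f"unrecognised entry: {item!r}")
        related[match.group("term")] = float(match.group("weight"))
    return head.strip(), related


def parse_control_file(text: str) -> Dict[str, Dict[str, float]]:
    """Parse every non-empty line of a control file."""
    return dict(
        parse_control_line(line) for line in text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    for term, related in parse_control_file(SAMPLE).items():
        print(term, related)
```

The resulting mapping has the same shape as a weighted query expansion, so it could feed a search engine's term-boosting step directly.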