EcoTerm IV
NBII/EioNet Demo of Federated KOS Search
Mike Frame
Vienna, Austria
April 2007
Discussion Topics
• Project Background
• NBII Thesaurus
• GEMET Thesaurus
• Prototype Client
• Sample Query Results
  • Including neither, one, or both thesauri
• Overall Findings
Biocomplexity Thesaurus: http://thesaurus.nbii.gov
EIONET GEMET Thesaurus: http://www.eionet.europa.eu/gemet/webservices?langcode=en
NBII/EIONET Thesaurus Web Service
• Background: collaboration through the Ecoinformatics TWG
• Primary goal: access distributed, multilingual thesauri
• Results: a SKOS web service and client
Latest Client & Service Capabilities
• Access to both NBII and GEMET
• Single-language capability
• Results are provided by source
• All documentation is complete
http://thesaurus.nbii.gov
Demo Client
Initial Challenges Identified
Thesaurus scope, intent, purpose, and coverage differ. Example term: “endangered species”
• NBII = a sub-discipline of the environment
  • Broader terms: Species, Special status species, Taxa
• EIONET = the broad environment
  • Broader terms: environmental protection
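To make the mismatch concrete, the broader terms listed for “endangered species” can be compared as sets. This is a toy sketch: only the term lists come from the slide, and the helper names (normalize, compare) are hypothetical.

```python
# Broader terms (BT) for "endangered species" in each thesaurus,
# copied from the "Initial Challenges Identified" slide.
BROADER_TERMS = {
    "NBII": {"Species", "Special status species", "Taxa"},
    "GEMET": {"environmental protection"},
}


def normalize(terms):
    """Lower-case and trim terms so capitalization alone cannot hide a match."""
    return {t.strip().lower() for t in terms}


def compare(a, b):
    """Return (shared, only_in_a, only_in_b) for two broader-term sets."""
    na, nb = normalize(a), normalize(b)
    return na & nb, na - nb, nb - na


if __name__ == "__main__":
    shared, only_nbii, only_gemet = compare(BROADER_TERMS["NBII"], BROADER_TERMS["GEMET"])
    print("shared:", shared)
    print("NBII only:", sorted(only_nbii))
    print("GEMET only:", sorted(only_gemet))
```

The intersection is empty: the two thesauri place the same concept under entirely different parents, which is why a simple term-for-term merge is not enough.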
Current State
• Users: most aren’t aware of the underlying vocabulary
• Vocabularies are often unique to an organization and are used more for “categorization” than for retrieval
Goal
• Include all vocabularies and let the search engine handle the results
Demonstration Search Retrieval
Created demonstration datasets:
• NBII Cataloged Resources
  • ~30,000 websites, publications, images, maps, etc.
  • XML-structured data with controlled subject terms
• NBII FGDC Metadata
  • ~22,000 resources on research studies
  • 150-200 elements
  • Semi-structured, with no controlled vocabulary
NBII Catalog Records
• Based on Dublin Core + 18 elements, of which 10 are mandatory
• In place since 2002
• Used by distributed content managers
NBII Metadata CH
Process
• Added thesaurus capabilities to the development search engine for:
  • NBII Thesaurus
  • EIONET GEMET Thesaurus
• Used BT (broader term), RT (related term), and NT (narrower term) relationships, with weighting
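Weighted query expansion over BT, RT, and NT relationships can be sketched as below. The slides do not give the search engine's actual weights: NT=0.8 and BT=0.2 are assumptions that merely echo the numbers in the GEMET control file, RT=0.5 is invented, and the thesaurus fragment and its relation labels are hypothetical.

```python
# Hypothetical per-relation weights (not the project's actual values).
RELATION_WEIGHTS = {"NT": 0.8, "RT": 0.5, "BT": 0.2}

# Hypothetical thesaurus fragment: term -> list of (relation, related term).
THESAURUS = {
    "endangered species": [
        ("NT", "endangered animal species"),
        ("NT", "endangered plant species"),
        ("BT", "category of endangered species"),
        ("RT", "protected species"),
    ],
}


def expand_query(term, thesaurus, weights=RELATION_WEIGHTS):
    """Return {term: weight}: the query term at 1.0 plus its one-hop neighbours."""
    expanded = {term: 1.0}
    for relation, related in thesaurus.get(term, []):
        weight = weights.get(relation)
        if weight is None:
            continue  # skip relation types that have no weight assigned
        # Keep the highest weight if a term is reachable by more than one relation.
        expanded[related] = max(expanded.get(related, 0.0), weight)
    return expanded


def score(text, expanded):
    """Sum the weights of expanded terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(weight for phrase, weight in expanded.items() if phrase in lowered)


if __name__ == "__main__":
    query = expand_query("endangered species", THESAURUS)
    for doc in ["Endangered plant species of Utah", "Protected species list"]:
        print(f"{score(doc, query):.1f}  {doc}")
```

Neither example document contains the literal query phrase, yet both score above zero, which is the mechanism behind the larger result counts for thesaurus-aided searching.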
Performed sample queries within the test repositories for:
• No thesaurus
• GEMET-only aided searching
• NBII-only aided searching
• GEMET+NBII aided searching (X)
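The four test configurations differ only in which expansion sources feed the query. The sketch below is a minimal illustration under stated assumptions: the GEMET entry copies the GEMET control file sample, while the NBII entry reuses the broader terms listed for “endangered species” with a hypothetical weight of 0.2 each; build_query is a hypothetical helper.

```python
# Expansion tables: query term -> {expansion: weight}.
GEMET = {
    "endangered species": {
        "category of endangered species": 0.2,
        "endangered animal species": 0.8,
        "endangered plant species": 0.8,
    },
}
NBII = {  # hypothetical weights on the NBII broader terms
    "endangered species": {"Species": 0.2, "Special status species": 0.2, "Taxa": 0.2},
}

# The four test configurations: which expansion sources feed the query.
CONFIGURATIONS = {
    "No thesaurus": [],
    "GEMET only": [GEMET],
    "NBII only": [NBII],
    "GEMET+NBII": [GEMET, NBII],
}


def build_query(term, sources):
    """Merge expansions from every source; the original term keeps weight 1.0."""
    query = {term: 1.0}
    for source in sources:
        for related, weight in source.get(term, {}).items():
            # If two sources supply the same term, keep the larger weight.
            query[related] = max(query.get(related, 0.0), weight)
    return query


if __name__ == "__main__":
    for name, sources in CONFIGURATIONS.items():
        query = build_query("endangered species", sources)
        print(f"{name:13s} {len(query)} terms: {sorted(query)}")
```

Taking the maximum weight on collisions is one simple choice; summing or averaging weights across sources would be equally plausible and is exactly the kind of “differing weighting” question the findings raise.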
Test Repository 1: NBII Resource Catalog (Dublin Core)
No Thesauri – “invasive species”
NBII Thesaurus – “invasive species”
GEMET Thesaurus – “invasive species”
No Thesauri – “endangered species”
NBII Thesaurus – “endangered species”
GEMET Thesaurus – “endangered species”
No Thesauri – “rare species”
NBII Thesaurus – “rare species”
GEMET Thesaurus – “rare species”
GEMET Thesaurus – “rare species” (expanded degrees of relevance)
No Thesauri – “protected species”
NBII Thesaurus – “protected species”
GEMET Thesaurus – “protected species”
Results – NBII Catalog Resources
Term                            None    NBII   GEMET
“invasive species”              2487   10802    2487
“endangered species”            1612    3532    1619
“rare species”                   249    7186     290
“rare species” (expanded)          -       -    5847
“protected species”              203    2345    1664
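One way to read the catalog table is as a multiple of the no-thesaurus baseline. The sketch below does only that arithmetic on the counts in the table; the expanded GEMET run for “rare species” is left out because it has no matching None/NBII runs.

```python
# Result counts from the "Results – NBII Catalog Resources" table: (None, NBII, GEMET).
CATALOG = {
    "invasive species": (2487, 10802, 2487),
    "endangered species": (1612, 3532, 1619),
    "rare species": (249, 7186, 290),
    "protected species": (203, 2345, 1664),
}


def gains(counts):
    """Return {term: (NBII/None, GEMET/None)} as multiples of the baseline count."""
    return {
        term: (nbii / none, gemet / none)
        for term, (none, nbii, gemet) in counts.items()
    }


if __name__ == "__main__":
    for term, (nbii_x, gemet_x) in gains(CATALOG).items():
        print(f"{term:20s} NBII x{nbii_x:5.2f}  GEMET x{gemet_x:5.2f}")
```

Rounded, NBII returns about 4.34x, 2.19x, 28.86x, and 11.55x the baseline for the four terms, and GEMET about 1.00x, 1.00x, 1.16x, and 8.20x. The expanded GEMET “rare species” run (5847 results) is about 23.5x the baseline of 249.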
[Chart: Results – NBII Resource Catalog. Number of results per query term (invasive, endangered, rare, and protected species) for None, NBII, and GEMET.]
Test Repository 2: NBII FGDC Metadata
Sample Queries – No vocabularies, Metadata CH: “invasive species”
Sample Queries – NBII only, Metadata CH: “invasive species”
Sample Queries – GEMET only, Metadata CH: “invasive species”
Sample Queries – No vocabularies, Metadata CH: “endangered species”
Sample Queries – NBII only, Metadata CH: “endangered species”
Sample Queries – GEMET only, Metadata CH: “endangered species”
No Thesauri – Metadata CH: “rare species”
NBII Thesaurus – Metadata CH: “rare species”
GEMET Thesaurus – Metadata CH: “rare species”
Sample Queries – No vocabularies, Metadata CH: “protected species”
Sample Queries – NBII only, Metadata CH: “protected species”
Sample Queries – GEMET only, Metadata CH: “protected species”
Results – FGDC Metadata
Term                            None    NBII   GEMET
“invasive species”               302    7884     302
“endangered species”            1008    2690    1019
“rare species”                    59    4259      64
“protected species”               11    2152    1011
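For the FGDC Metadata repository, the absolute number of extra results each thesaurus adds over the no-thesaurus baseline is a direct way to see how unevenly the thesauri help. This sketch does only that subtraction on the table's counts; added_results is a hypothetical helper name.

```python
# Result counts from the "Results – FGDC Metadata" table: (None, NBII, GEMET).
FGDC = {
    "invasive species": (302, 7884, 302),
    "endangered species": (1008, 2690, 1019),
    "rare species": (59, 4259, 64),
    "protected species": (11, 2152, 1011),
}


def added_results(counts):
    """Extra results each thesaurus returns beyond the no-thesaurus baseline."""
    return {
        term: {"NBII": nbii - none, "GEMET": gemet - none}
        for term, (none, nbii, gemet) in counts.items()
    }


if __name__ == "__main__":
    for term, extra in added_results(FGDC).items():
        print(f"{term:20s} +{extra['NBII']:5d} NBII  +{extra['GEMET']:5d} GEMET")
```

NBII adds between 1,682 (“endangered species”) and 7,582 (“invasive species”) results, while GEMET adds 0, 11, 5, and 1,000 for invasive, endangered, rare, and protected species respectively.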
[Chart: Results – FGDC Metadata. Number of results per query term (invasive, endangered, rare, and protected species) for None, NBII, and GEMET.]
Overall Results: General Findings
• The assumption that a thesaurus increases the “number” of results is valid
  • The degree varies by term and by mapping
• Because users search from a range of perspectives, backgrounds, and levels of expertise, multiple thesauri do increase the number of results
Overall Results: Using Only GEMET Terminology
• GEMET terms absent from the NBII thesaurus improved search results
• GEMET’s main strength, its broad coverage, aided searches
• In general, for the Metadata repository:
  • Results varied somewhat, but the top 10 results were often the same
Overall Results: General Findings
• With no thesaurus, the tests produced poorer #1 results
• Thesaurus-aided searching reordered the results list more for the structured set than for the unstructured set (Metadata)
• “Integrating” thesauri of differing scope and purpose presents challenges:
  • Can’t turn the effort into a thesaurus project
  • Degrees of relevance of terms are an issue
  • Concept matching, or differing intent
  • Differing classification (RT vs. NT) across thesauri
  • Differing “weighting” algorithms
Further Study Options
1.) Take multiple thesauri “as is”
2.) Do some “attempted” concept matching, e.g. matching “endangered animal species” to “endangered animal”
3.) If no match is present, add the term and relationship as is
4.) Obtain terms from XMDR
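Options 2 and 3 can be sketched together: try to match an incoming term to an existing concept, and if nothing is close enough, add the term as is. The slides do not say how matching would be done; token-overlap (Jaccard) similarity and the 0.6 threshold are our own stand-ins, and the small target vocabulary is hypothetical.

```python
def tokens(term):
    """Lower-cased word set for a thesaurus term."""
    return set(term.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two terms, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_match(term, vocabulary, threshold=0.6):
    """Most similar vocabulary term, or None if nothing reaches the threshold."""
    best = max(vocabulary, key=lambda candidate: jaccard(term, candidate), default=None)
    if best is None or jaccard(term, best) < threshold:
        return None
    return best


def merge_term(term, vocabulary, threshold=0.6):
    """Option 2, then option 3: reuse a matching concept, else add the term as is."""
    match = best_match(term, vocabulary, threshold)
    if match is None:
        vocabulary.add(term)
        return term
    return match


if __name__ == "__main__":
    target_terms = {"endangered animal", "endangered plant"}  # hypothetical vocabulary
    for incoming in ["endangered animal species", "vanished species"]:
        print(incoming, "->", merge_term(incoming, target_terms))
```

“endangered animal species” shares two of its three words with “endangered animal” (similarity 2/3) and is matched, while “vanished species” shares nothing and is added as a new term. Pure word overlap ignores meaning, so it would not resolve the “different intent” problem noted in the findings.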
Further Study Options – cont.
• Follow up with additional repositories
• Repeat with other query terms
• Re-examine weighting algorithms
• Run queries with a subset of terms
• Repeat with a completely integrated thesaurus, and compare against repeated queries with machine integration
Complete by June
Questions, Comments?
GEMET Control File
endangered species,category of endangered species[0.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species[0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
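Each control-file line appears to hold a query term followed by comma-separated expansion terms, each with a bracketed weight. Assuming that reading of the format (it is inferred from these three sample lines only, and a term containing a comma would break it), a parser could look like this; parse_control_line and parse_control_file are hypothetical names.

```python
import re

# One expansion entry: "<term>[<weight>]". Also tolerates "[.2]" and a space before "[".
ENTRY = re.compile(r"^\s*(.+?)\s*\[\s*([0-9]*\.?[0-9]+)\s*\]\s*$")


def parse_control_line(line):
    """Split one line into its query term and a {expansion: weight} map."""
    head, *rest = line.split(",")
    expansions = {}
    for field in rest:
        match = ENTRY.match(field)
        if match is None:
            raise ValueError(f"malformed expansion entry: {field!r}")
        expansions[match.group(1)] = float(match.group(2))
    return head.strip(), expansions


def parse_control_file(text):
    """Parse every non-blank line into a {query term: {expansion: weight}} map."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            term, expansions = parse_control_line(line)
            table[term] = expansions
    return table


SAMPLE = """\
endangered species,category of endangered species[0.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species[0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""

if __name__ == "__main__":
    for term, expansions in parse_control_file(SAMPLE).items():
        print(term, expansions)
```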