CONTROLLED VOCABULARY WORKING GROUP – REPORT SEPTEMBER 2009
Long-Term Ecological Research
http://intranet.lternet.edu/im/news/committees/working_groups/controlled_vocabulary
Working Group: “Synthesis through data discovery and use: Past, Present and Future” Wed., 10 am to 12 pm
AGENDA FOR VOCAB WORKING GROUP
Background and past activities
Finalizing the list – who approves?
Procedures for managing the list
Next steps: tool development for keywording and searching; hierarchies, polytaxonomies, thesauri and ontologies
THE PROBLEM
For past activities, see the reports at http://intranet.lternet.edu/im/node/114 and http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/06spring/
Summary:
Eclectic keywords make searching difficult: most terms are used only once!
There is no easy way to group or organize similar datasets to facilitate “browse” searches.
STEPS TAKEN
Assembled: a list of LTER EML keywords
Cross-linked that list to: NBII Thesaurus words, GCMD keywords, Metacat searches
Edited: changed words to preferred forms (keeping track of synonyms); removed specific place and taxonomic names
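As a rough illustration of the editing step (mapping words to preferred forms while keeping the synonyms, and dropping place and taxonomic names), here is a minimal sketch that assumes a simple variant-to-preferred lookup table. The table entries and the place/taxon sets are hypothetical examples, not content from the LTER list.

```python
# Hypothetical synonym table: variant form -> preferred form.
# Entries are illustrative only, not taken from the LTER keyword list.
SYNONYMS = {
    "precip": "precipitation",
    "temp": "temperature",
    "soils": "soil",
}

# Hypothetical stand-ins for the place and taxonomic names that were removed.
PLACE_NAMES = {"antarctica"}
TAXON_NAMES = {"quercus"}


def normalize(keyword):
    """Trim and lower-case a keyword, then map it to its preferred form."""
    term = keyword.strip().lower()
    return SYNONYMS.get(term, term)


def clean_keywords(raw):
    """Return sorted preferred terms, dropping place and taxonomic names."""
    cleaned = set()
    for kw in raw:
        term = normalize(kw)
        if term in PLACE_NAMES or term in TAXON_NAMES:
            continue
        cleaned.add(term)
    return sorted(cleaned)
```

With these tables, clean_keywords(["Precip", "TEMP ", "Soils", "Antarctica", "Quercus", "nitrogen"]) gives ["nitrogen", "precipitation", "soil", "temperature"]. Keeping SYNONYMS as data rather than discarding variants is what lets searches on the old forms still resolve.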
STEPS TAKEN
Selected: keywords shared with GCMD and NBII, or keywords used at more than one LTER site
Reviewed: removals and additions were suggested; voting via SurveyMonkey
Edited: added words voted for; removed words voted against; when the vote was close, kept the current status
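The selection rule and the voting rule can be written down as a small sketch. Two assumptions are built in: "shared with GCMD and NBII" is read here as "found in either list", and "close vote" is modeled by a hypothetical margin parameter, since the report does not give a threshold. Site codes in the usage example are placeholders.

```python
from collections import defaultdict


def select_candidates(site_keywords, gcmd_terms, nbii_terms):
    """Keep terms found in GCMD or NBII, or used at more than one site.

    site_keywords maps a (hypothetical) site code to the set of normalized
    keywords that site uses in its EML.
    """
    sites_per_term = defaultdict(set)
    for site, terms in site_keywords.items():
        for term in terms:
            sites_per_term[term].add(site)
    external = set(gcmd_terms) | set(nbii_terms)
    return sorted(
        term for term, sites in sites_per_term.items()
        if term in external or len(sites) > 1
    )


def apply_vote(currently_listed, votes_for, votes_against, margin=2):
    """Return whether a term should be on the list after a vote.

    A vote counts as "close" when the difference is below `margin`
    (a hypothetical threshold); a close vote keeps the current status.
    """
    difference = votes_for - votes_against
    if abs(difference) < margin:
        return currently_listed
    return difference > 0
```

For example, with sites SITE_A {soil, ozone}, SITE_B {soil, streamflow} and SITE_C {fire}, GCMD {ozone} and NBII {fire}, the candidates are fire, ozone and soil; streamflow is dropped because it appears at one site only and in neither external list.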
THE LIST
640 keywords
148 synonyms
201 NBII keywords
21 GCMD keywords
LTER SCIENCE KEYWORD LIST 1.0?
Is additional editing required?
Who decides if it is an LTER “official” list? And what does it mean if it is?
What procedures should be followed for subsequent editing of the list?
Who should manage the list database?
Fields in the list: Term | Scope | Definition | Synonyms
NEXT STEPS – TOOLS
Autocomplete search tool - Duane Costa
Autocomplete keywording tool - Duane Costa
Update-document-keywords tool?
Advanced search tool?
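The report lists Duane Costa's autocomplete tools without describing how they work. As a hedged sketch of one way such a tool could behave, the class below does prefix matching over both preferred terms and synonyms, and reports a synonym match as its preferred term, so a user who types a variant is steered to the list form. The term lists in the example are hypothetical.

```python
import bisect


class KeywordAutocomplete:
    """Prefix autocomplete over preferred terms and their synonyms."""

    def __init__(self, preferred_terms, synonyms):
        # Map every surface form (preferred term or synonym) to its preferred term.
        target = {t.lower(): t for t in preferred_terms}
        for variant, preferred in synonyms.items():
            target[variant.lower()] = preferred
        self._target = target
        self._forms = sorted(target)  # sorted so a prefix range is contiguous

    def suggest(self, prefix, limit=10):
        """Return up to `limit` distinct preferred terms matching the prefix."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._forms, prefix)
        results = []
        for form in self._forms[start:]:
            if not form.startswith(prefix):
                break  # past the block of forms sharing this prefix
            preferred = self._target[form]
            if preferred not in results:
                results.append(preferred)
            if len(results) == limit:
                break
        return results
```

For example, KeywordAutocomplete(["precipitation", "primary production", "soil"], {"precip": "precipitation", "productivity": "primary production"}).suggest("pr") returns ["precipitation", "primary production"]: the synonyms match the prefix but are folded into their preferred terms.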
NEXT STEPS – HIERARCHIES
There is general agreement that keywords are most useful when they can be tied to other keywords.
How do we create the needed keyword taxonomy (or taxonomies)?
Barbara Benson has done some work looking at other hierarchies (KNB, GCMD).
Giri Palanisamy has sent us the broader, narrower and related terms for the roughly one-third of the words that are also in the NBII thesaurus.
HIERARCHIES – RATIONALE
the existing KNB browse hierarchy is rather limited (the LTER version, which gives the number of hits, is a good feature)
a browse hierarchy could serve as a model for sites developing their own
it could be hooked into any tools developed to assist in assigning keywords to datasets
it could be used in a tool that creates a browse hierarchy from a keyword list
it could assist keyword searches by offering an option to go up a level from the keyword to a broader concept, yielding a higher number of hits
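Two of the points above (hit counts in a browse hierarchy, and going up a level to a broader concept) can be sketched with a simple narrower-to-broader lookup. This assumes a single broader term per keyword and an index from keyword to dataset ids; both the links and the dataset ids are hypothetical.

```python
from collections import defaultdict

# Hypothetical narrower -> broader links, e.g. as they might be derived from
# NBII broader/narrower relations; not actual LTER list content.
BROADER = {
    "nitrate": "nitrogen",
    "ammonium": "nitrogen",
    "nitrogen": "nutrients",
    "phosphorus": "nutrients",
}


def narrower_terms(broader):
    """Invert the broader links into parent -> set of children."""
    children = defaultdict(set)
    for child, parent in broader.items():
        children[parent].add(child)
    return children


def subtree(term, children):
    """Return the term together with every narrower term beneath it."""
    found = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term, index, broader, up=0):
    """Datasets tagged with `term` or anything narrower, after moving `up` levels.

    index maps a keyword to the set of dataset ids that use it.
    """
    for _ in range(up):
        term = broader.get(term, term)  # stay put at the top of the hierarchy
    children = narrower_terms(broader)
    hits = set()
    for t in subtree(term, children):
        hits |= index.get(t, set())
    return hits
```

With index = {"nitrate": {"ds1"}, "ammonium": {"ds2"}, "phosphorus": {"ds3"}}, searching "nitrate" finds one dataset, one level up ("nitrogen") finds two, and two levels up ("nutrients") finds three. Under this sketch the hit count shown at a browse node would simply be len(search(node, index, BROADER)).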
NEXT STEPS – OTHER LISTS
Taxonomic and place keywords were excluded from the science keywords.
Do we need a gazetteer for places?
Do we need taxonomic lists and tools for taxonomic information?
Are there other types of lists that are needed?
DISCUSSION TOPICS
Feedback on tools
Ideas for additional tools
Hierarchy
AROUND THE ROOM – NEXT STEP
LTER words are emerging organically
Not just general search
Other efforts:
Vegetation ecology community interested in ontologies for vegetation traits
LTER words are not specialized
Would be good to keep in touch with other efforts
SONET: intercommunication (Gries) is critical
Rob Raskin is taking GCMD and ontologizing it
NASA is developing SWEET, an upper-level ontology
SemTools (O’Brien): using Morpho and making it a better database management system; using subsumption hierarchies in OWL
OWL is a standard format, which allows the use of generic applications (e.g., Jena)
AROUND THE ROOM
Autocompletion tools are helpful for NEW EML
But we need tools for updating existing metadata
Having a first cut of recommendations would help
A tool that makes suggestions based on document content would be helpful
Semantic annotation: hook to parents, children and related terms
Educating PIs on using the list is important
Just making the list available is important
NEXT STEPS
Automatic annotation with broader terms
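Automatic annotation with broader terms could work by adding every ancestor of a dataset's keywords to its keyword set. The sketch below assumes each term maps to a set of broader terms, so a term can sit under more than one parent (one reading of the polytaxonomy idea); the hierarchy shown is hypothetical.

```python
# Hypothetical polyhierarchy: each term maps to a set of broader terms.
BROADER = {
    "nitrate": {"nitrogen", "water quality"},
    "nitrogen": {"nutrients"},
}


def annotate_with_broader(keywords, broader):
    """Return the keywords plus all of their ancestors, sorted."""
    annotated = set(keywords)
    stack = list(keywords)
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in annotated:  # also stops cycles in a hand-edited list
                annotated.add(parent)
                stack.append(parent)
    return sorted(annotated)
```

Here annotate_with_broader(["nitrate", "soil"], BROADER) returns ["nitrate", "nitrogen", "nutrients", "soil", "water quality"]: "soil" has no broader terms, while "nitrate" pulls in both parents and its grandparent.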
Identify “unfindable” datasets: which datasets have no LTER keywords or synonyms?
Go dataset by dataset and see which have no hits
EML is limited in how it assigns keyword lists
Could target tools at the keyword set
Namespacing control could be relaxed to go beyond “theme” and “place”
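EML stores keywords as keyword elements inside a keywordSet, with an optional keywordType attribute (such as "theme" or "place"). A minimal sketch of the dataset-by-dataset check might pull those keywords with Python's standard XML parser and flag datasets where none match a list term or synonym. The function names, dataset ids and term lists are hypothetical.

```python
import xml.etree.ElementTree as ET


def eml_keywords(eml_text, keyword_type=None):
    """Collect <keyword> text from an EML document.

    Matches on the local element name so namespace prefixes do not matter.
    If keyword_type is given, only keywords with that keywordType are returned.
    """
    root = ET.fromstring(eml_text)
    found = []
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] != "keyword":
            continue
        if keyword_type is None or el.get("keywordType") == keyword_type:
            found.append((el.text or "").strip())
    return found


def unfindable(datasets, vocabulary, synonyms):
    """Ids of datasets none of whose keywords is a list term or a known synonym.

    datasets maps a dataset id to its keywords; synonyms maps variant -> preferred.
    """
    known = {t.lower() for t in vocabulary} | {s.lower() for s in synonyms}
    return sorted(
        ds for ds, kws in datasets.items()
        if not any(k.strip().lower() in known for k in kws)
    )
```

A dataset with no keywords at all is reported as unfindable, since any() over an empty list is false; that seems like the right behavior for this check.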
NEXT STEPS
EcoTrends predated the LTER list
Would have been good to have the LTER list; eventually would like to integrate
May be able to exploit synonym rings
When title and dataset don’t match (title says “Productivity” but the attribute is “biomass”), need to examine holistically
Linking terms to definitions is needed
A taxonomic database would also be useful, e.g., for “bugs” (true bugs vs. insects)
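A synonym ring treats a set of terms as equivalent for retrieval, with no single preferred form, which is one way keywords assigned before the LTER list existed could still be matched. The sketch below assumes rings are plain sets of terms and that an index maps lower-cased keywords to dataset ids; the rings and ids are hypothetical.

```python
# Hypothetical synonym rings: every member is treated as equivalent for retrieval.
RINGS = [
    {"precipitation", "precip", "rainfall"},
    {"primary production", "productivity", "npp"},
]


def ring_lookup(rings):
    """Map each lower-cased member to the frozen set of its ring."""
    lookup = {}
    for ring in rings:
        members = frozenset(term.lower() for term in ring)
        for term in members:
            lookup[term] = members
    return lookup


def find(term, index, lookup):
    """Datasets tagged with the term or with any member of its ring.

    index maps a lower-cased keyword to a set of dataset ids.
    """
    term = term.lower()
    hits = set()
    for member in lookup.get(term, {term}):
        hits |= index.get(member, set())
    return hits
```

Unlike a preferred-term mapping, a ring does not require either vocabulary to give up its own terms: a query for "Productivity" would find datasets tagged "npp" and datasets tagged "primary production".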
NEXT STEPS
Practices in design: when developing, always think about how tools are tied to organizational routines
Think proactively about how to make it routine: getting people to think in categories
Pursue polytaxonomies based on Barbara’s list
Develop the synonym list further
See how keyword lists match; the AND site has a 3-level hierarchy
Start at the top or the bottom when adding…