Page 1

CONTROLLED VOCABULARY WORKING GROUP – REPORT SEPTEMBER 2009

Long-Term Ecological Research

http://intranet.lternet.edu/im/news/committees/working_groups/controlled_vocabulary

Working Group: “Synthesis through data discovery and use: Past, Present and Future” Wed. 10-12pm

Page 2

AGENDA FOR VOCAB WORKING GROUP

Background and Past Activities
Finalizing the list – who approves?
Procedures for managing the list
Next steps
    Tool development
        Keywording
        Searching
    Hierarchies/polytaxonomies/thesauri/ontologies

Page 3

THE PROBLEM

For past activities, see the reports at:
    http://intranet.lternet.edu/im/node/114
    http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/06spring/

Summary:
    Eclectic keywords make searching difficult – most terms are used only once!
    No easy way to group or organize similar datasets to facilitate “browse” searches

Page 4

STEPS TAKEN

Assembled list of LTER EML Keywords
Cross linked that list to:
    NBII Thesaurus Words
    GCMD Keywords
    Metacat Searchers
Edited
    Changed words to preferred forms (kept track of synonyms)
    Removed specific places, taxonomic names

Page 5

STEPS TAKEN

Selected
    Keywords shared with GCMD and NBII, or
    Keywords used at more than one LTER site
Reviewed
    Removals and additions were suggested
    Voting via SurveyMonkey
Edited
    Added words voted for
    Removed words voted against
    When vote was close – went with current status

Page 6

THE LIST

640 keywords
148 synonyms
201 NBII keywords
21 GCMD keywords

Page 7

LTER SCIENCE KEYWORD LIST 1.0???

Is additional editing required?
Who decides if it is an LTER “official” list? And what does it mean if it is?
What procedures should be followed for subsequent editing of the list?
Who should manage the list database?
    Term
    Scope
    Definition
    Synonyms
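
The four fields named above (term, scope, definition, synonyms) suggest a simple record per keyword. The following is only a sketch of what such a record and a synonym lookup might look like; the class name `KeywordEntry`, the helper `build_synonym_index`, and the sample entries are hypothetical and are not taken from the actual LTER list database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeywordEntry:
    """One row of a hypothetical keyword list database."""
    term: str                  # preferred form of the keyword
    scope: str = ""            # note on when the term applies
    definition: str = ""       # plain-language definition
    synonyms: List[str] = field(default_factory=list)  # non-preferred forms


def build_synonym_index(entries: List[KeywordEntry]) -> Dict[str, str]:
    """Map every term and synonym (lowercased) to its preferred term."""
    index: Dict[str, str] = {}
    for entry in entries:
        index[entry.term.lower()] = entry.term
        for synonym in entry.synonyms:
            index[synonym.lower()] = entry.term
    return index


# Illustrative entries only; not taken from the actual LTER list.
entries = [
    KeywordEntry("primary production", synonyms=["primary productivity"]),
    KeywordEntry("precipitation", synonyms=["rainfall"]),
]
index = build_synonym_index(entries)
preferred = index.get("Rainfall".lower())
```

Keeping synonyms on the same record as the preferred term means a search or keywording tool only needs one lookup table to map whatever a user types onto the preferred form.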

Page 8

NEXT STEPS - TOOLS

Autocomplete search tool - Duane Costa
Autocomplete keywording tool - Duane Costa
Update-document-keywords tool?
Advanced search tool?
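
To illustrate the general idea behind an autocomplete tool over a controlled list, here is a minimal sketch of prefix completion using a sorted list and binary search. It is not the tool referred to above; the function name `autocomplete` and the sample terms are assumptions made for illustration.

```python
import bisect


def autocomplete(sorted_terms, prefix, limit=10):
    """Return up to `limit` terms that start with `prefix`.

    `sorted_terms` must already be lowercased and sorted.
    """
    prefix = prefix.lower()
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


# Illustrative terms only.
terms = sorted([
    "nitrogen",
    "nitrate",
    "nitrification",
    "net primary production",
    "soil moisture",
])
suggestions = autocomplete(terms, "nit")
```

The same function could serve both search and keywording: in either case the user is steered toward terms that are already on the list.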

Page 9

NEXT STEPS - HIERARCHIES

There is general agreement that keywords are most useful when they can be tied to other keywords
How do we create the needed keyword taxonomy(s)?
    Barbara Benson has done some work looking at other hierarchies (KNB, GCMD)
    Giri Palanisamy has sent us the broader, narrower and related terms for the ~1/3 of the words that are also in the NBII thesaurus

Page 10

HIERARCHIES - RATIONALE

The existing KNB browse hierarchy is rather limited (the LTER version that gives the number of hits is a good feature)

A network-level browse hierarchy could help sites develop their own site-level hierarchy

It could be hooked into any tools that are developed to assist in assigning keywords to datasets

It could be used in a tool that enables the creation of a browse hierarchy from a keyword list

It could assist keyword searches by offering an option to go up a level from the keyword to a broader concept, and thus yield a higher number of hits
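
The last point, going up a level to a broader concept to get more hits, can be sketched as follows. This assumes each term has at most one broader term, which is a simplification (a polytaxonomy would allow several). The `BROADER` links, the `DATASETS` mapping, and all function names are hypothetical.

```python
# Hypothetical narrower -> broader links; not the actual NBII or LTER hierarchy.
BROADER = {"nitrate": "nitrogen", "ammonium": "nitrogen", "nitrogen": "nutrients"}

# Hypothetical dataset id -> assigned keywords.
DATASETS = {
    "ds1": {"nitrate"},
    "ds2": {"ammonium"},
    "ds3": {"nitrogen"},
    "ds4": {"soil moisture"},
}


def narrower_closure(term, broader):
    """Return `term` plus every term that has it as an ancestor."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(term, datasets, broader):
    """Return ids of datasets tagged with `term` or any narrower term."""
    wanted = narrower_closure(term, broader)
    return sorted(ds for ds, kws in datasets.items() if kws & wanted)


def search_one_level_up(term, datasets, broader):
    """Repeat the search on the broader concept, if the term has one."""
    parent = broader.get(term)
    return search(parent if parent else term, datasets, broader)


narrow_hits = search("nitrate", DATASETS, BROADER)
broad_hits = search_one_level_up("nitrate", DATASETS, BROADER)
```

In this toy data a search on "nitrate" finds one dataset, while stepping up to "nitrogen" also picks up the datasets tagged "ammonium" and "nitrogen".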

Page 11

NEXT STEPS – OTHER LISTS

Taxonomic and place keywords were excluded from the science keywords
    Do we need a gazetteer for places?
    Do we need taxonomic lists & tools for taxonomic information?
Are there other types of lists that are needed?

Page 12

DISCUSSION TOPICS

Feedback on tools
Ideas for additional tools
Hierarchy

Page 13

AROUND THE ROOM – NEXT STEP

LTER words emerging organically
    Not just general search
Other efforts
    Vegetation ecology community interested in ontologies for vegetation traits
    LTER words are not specialized
    Would be good to keep in touch with other efforts
    SONET – intercommunication (Gries) critical
    Rob Raskin taking GCMD and ontologizing it
    NASA is developing “SWEET” – upper level ontology
    Semtools – (O’Brien) – using Morpho and making it a better database management system – using subsumption hierarchies in OWL
    OWL allows use of generic applications (Jena) – standard format

Page 14

AROUND THE ROOM

Autocompletion tools helpful for NEW EML
    But need tools for updating existing metadata
    Having a first cut of recommendations would help
Tool that does suggestions based on document content would be helpful
    Semantic annotation
    Hook to parents, children and related
Educating PIs on using the list is important
    Just availability of list is important
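
As a rough illustration of a tool that makes suggestions based on document content, the sketch below scans free text (such as a dataset abstract) for list terms and synonyms and returns the matching preferred terms. The function name `suggest_keywords`, the sample index, and the sample abstract are hypothetical; phrases in the index are assumed to be lowercase with single spaces and no punctuation.

```python
import re


def suggest_keywords(text, synonym_index):
    """Return preferred terms whose term or synonym appears in `text`.

    `synonym_index` maps lowercase phrases to preferred terms.
    """
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = f" {normalized} "
    found = set()
    for phrase, preferred in synonym_index.items():
        # Padding with spaces enforces whole-word matching.
        if f" {phrase} " in padded:
            found.add(preferred)
    return sorted(found)


# Illustrative index and abstract only.
synonym_index = {
    "precipitation": "precipitation",
    "rainfall": "precipitation",
    "soil moisture": "soil moisture",
    "biomass": "biomass",
}
abstract = "Daily rainfall and soil moisture were recorded at each plot."
suggested = suggest_keywords(abstract, synonym_index)
```

Such suggestions would be a first cut for a person to review when updating existing metadata, not a replacement for the person's judgment.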

Page 15

NEXT STEPS

Automatic annotation with broader terms
Identify “unfindable” datasets – what datasets have no LTER Keywords or synonyms?
    Go dataset by dataset and see which have no hits
EML is limited in how it assigns keyword lists
    Could target tools at keyword set
    Namespacing control could be relaxed to go beyond “theme” and “place”
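
The dataset-by-dataset check for “unfindable” datasets can be sketched as below, assuming the keywords have already been extracted from each dataset's metadata into a simple mapping. The function name `find_unfindable` and the sample data are hypothetical.

```python
def find_unfindable(datasets, synonym_index):
    """Return ids of datasets whose keywords match no list term or synonym.

    `datasets` maps dataset id -> iterable of keyword strings.
    `synonym_index` maps lowercase terms and synonyms -> preferred term.
    """
    unfindable = []
    for dataset_id, keywords in datasets.items():
        if not any(kw.strip().lower() in synonym_index for kw in keywords):
            unfindable.append(dataset_id)
    return sorted(unfindable)


# Illustrative data only.
list_index = {"precipitation": "precipitation", "rainfall": "precipitation"}
site_datasets = {
    "ds1": ["Rainfall", "weather station"],
    "ds2": ["plot 14", "bog transect"],
    "ds3": [],
}
no_hits = find_unfindable(site_datasets, list_index)
```

Datasets with no keywords at all are reported too, since they are equally invisible to a search restricted to the list.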

Page 16

NEXT STEPS

Ecotrends – predated LTER list
    Would have been good to have LTER list
    Eventually would like to integrate
    May be able to exploit synonym rings
When title and dataset don’t match – Title says “Productivity” but attribute is “biomass”
    Need to examine holistically
Linking terms to definitions needed
Also taxonomic database would be useful for “bugs” (true bugs vs insects)

Page 17

NEXT STEPS

Practices in design
    When developing – always think about how they are tied to organizational routines
    Think proactively about how to make it routine – getting people to think in categories
Pursue polytaxonomies based on Barbara’s list
Develop synonym list further
See how keyword lists match AND has 3-level hierarchy
Start at top or bottom in adding….