Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Upload: lily-lambert

Post on 27-Dec-2015

TRANSCRIPT

Page 1: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Controlled Vocabulary & Thesaurus Design

Planning & Maintenance

Page 2: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Developed by the Association for Library Collections & Technical Services and the Library of Congress’s Cataloger’s Learning Workshop

Controlled Vocabulary Review

What? What Controlled Vocabulary is right for you?

When? When should the CV be developed and implemented?

Why? Why is this CV a necessary development project?

How? How is the CV going to be developed?

Page 3: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Thesaurus Design Questions

Is a controlled vocabulary really necessary?

What is the lowest level of vocabulary that will get the job done?

Will natural language searching be sufficient?

Will an interface design improvement alleviate the need for a controlled vocabulary?

Will there be more than one indexer?

Is someone available with the time and the skills to develop a thesaurus?

Will someone be available in the future?

Page 4: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

Cost of finding (time, frustration)

Cost of not finding (bad decisions)

Cost of training (staff turnover)

Value of discovery (related information, browsing)

Language is ambiguous – synonyms, abbreviations, acronyms, misspellings, homonyms, antonyms, etc.

Page 5: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

What are the specific objectives of the project?

Are essential objects hidden in a lot of chaff?

Are a few good objects sufficient? Or is it necessary to find the best, the one that makes a difference, or everything on a topic?

Use easily understood terms such as “common vocabulary” rather than technical terms such as “taxonomy”.

Stories tell it best.

Page 6: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

“Users of … intranets frequently express frustration with how much time it takes to find items—both when searching for known items and when browsing to see if items on a particular topic exist in the system. . . Browsing and search functions are much enhanced if the indexing and topic hierarchy, or taxonomy, make sense to the user and are customized to reflect the content of the source documents.” Jan Sykes, Information Management Services, February 2001

Page 7: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

“Power users find great value in using a known, granular indexing language that can surface the most relevant items and filter out items of peripheral or no interest.” Jan Sykes, Information Management Services, February 2001

“Keyword search captures only 33% of relevant information.” Chris Wilkie, BBC Information and Archives, Sept. 2002

Page 8: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

“Most of the complaints we get are due to the way users search – they use the wrong keywords.” Must search stink?, Forrester, 2000

“40% of search failures come from customers and information providers using different terms.” The Business Benefits of Taxonomy, Judi Vernau, SchemaLogic, Oct. 2005

Page 9: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Project Justification

“Knowledge workers spend 35% of their productive time searching for information online, while 40% of the corporate users report that they cannot find the information they need to do their jobs.” Working Council of CIOs, Business Week, Feb. 27, 2001

Page 10: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Process of CV Design

Understand user and organizational needs

Define the subject scope

Identify sources of ‘raw’ vocabulary

Harvest terms (wordstock) that are likely to be search terms in the field

Group the terms into broad categories, subcategories and sub-subcategories

Establish relationships

Collect feedback and revise until stable
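
The grouping and relationship steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: it assumes the harvested terms have already been sorted by hand into a nested dictionary, and the function name `broader_narrower_pairs` and the sample terms are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed input shape: each key is a term, each value holds its narrower terms.
Hierarchy = Dict[str, dict]


def broader_narrower_pairs(tree: Hierarchy) -> List[Tuple[str, str]]:
    """Walk the nested categories and return (broader, narrower) term pairs."""
    pairs: List[Tuple[str, str]] = []
    for term, children in tree.items():
        # Every direct child is a narrower term of the current term.
        for child in children:
            pairs.append((term, child))
        # Recurse so that subcategories and sub-subcategories are covered too.
        pairs.extend(broader_narrower_pairs(children))
    return pairs


# Illustrative sample data only.
sample = {
    "Economic policy": {
        "Economic cooperation": {
            "Economic integration": {},
            "Industrial cooperation": {},
        }
    }
}

for broader, narrower in broader_narrower_pairs(sample):
    print(f"{narrower}  BT: {broader}")
```

Deriving the pairs from a single hierarchy keeps broader and narrower relationships reciprocal by construction, which is one less thing to check during the feedback and revision step.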

Page 11: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Involving Users

More user involvement = better suited to use

Take every opportunity to involve users

Start from user search logs to find commonly used terms

User experience focus groups

Prototyping

Solicit community feedback

Online discussion groups

Surveys

Observation

Term submissions
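
Starting from search logs can be as simple as counting queries. The sketch below assumes the log has already been reduced to one query string per entry; the function name `common_search_terms` and the sample log are hypothetical, and a real log would likely need more cleanup (spelling variants, stop words) than the case and whitespace normalization shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def common_search_terms(queries: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count normalized queries and return the most frequent ones."""
    counts: Counter = Counter()
    for query in queries:
        # Lowercase and collapse runs of whitespace so variants are counted together.
        normalized = " ".join(query.lower().split())
        if normalized:
            counts[normalized] += 1
    return counts.most_common(top_n)


# Illustrative sample data only.
sample_log = [
    "Taxonomy", "taxonomy ", "controlled  vocabulary",
    "thesaurus", "Thesaurus", "taxonomy",
]
print(common_search_terms(sample_log, top_n=2))
# → [('taxonomy', 3), ('thesaurus', 2)]
```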

Page 12: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Interoperability

Searchers want to search multiple databases at once

Indexers want to use a vocabulary they are familiar with to index objects in a different domain

Content producers want to merge multiple databases indexed using different vocabularies

User communities want a single thesaurus that spans multiple domains

International organizations want a single vocabulary that supports searching in multiple languages

Page 13: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Thesauri can differ in:

Specificity

Treatment of synonyms

Pre- vs. post-coordination

Relationships

Warrant

Scope

Page 14: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Methods of Integration

Mapping

Switching language

Integration

Unified Medical Language System’s (UMLS) 3 main components:

Metathesaurus concepts

Semantic Network categories

SPECIALIST Lexicon indices

Super-language Merging
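
To make the switching-language idea concrete: rather than mapping every vocabulary directly to every other one, each vocabulary is mapped once to a neutral set of concept identifiers, and translation goes through that hub. The sketch below is an illustration under stated assumptions, not a description of any particular system; the vocabulary names, terms, concept IDs, and the `translate` function are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical data: each vocabulary maps its own terms to a shared concept ID.
TO_HUB: Dict[str, Dict[str, str]] = {
    "vocab_a": {"Automobiles": "C001", "Lorries": "C002"},
    "vocab_b": {"Cars": "C001", "Trucks": "C002"},
}


def translate(term: str, source: str, target: str) -> Optional[str]:
    """Translate a term between vocabularies by way of the shared concept ID."""
    concept_id = TO_HUB[source].get(term)
    if concept_id is None:
        return None  # the source vocabulary has no such term
    # Look for a target-vocabulary term attached to the same concept.
    for candidate, candidate_id in TO_HUB[target].items():
        if candidate_id == concept_id:
            return candidate
    return None  # the concept exists, but the target vocabulary lacks a term for it


print(translate("Automobiles", "vocab_a", "vocab_b"))
```

With a hub, adding another vocabulary means writing one new mapping to the concept IDs, whereas direct mapping needs a separate crosswalk for each pair of vocabularies.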

Page 15: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Simple Knowledge Organization System

Term: Economic cooperation

UF: Economic co-operation

BT: Economic policy

NT: Economic integration
    European economic cooperation
    European industrial cooperation
    Industrial cooperation

RT: Interdependence

SN: Includes cooperative measures in banking, trade, industry etc., between and among countries.
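
The entry above can be expressed in SKOS by mapping each thesaurus tag to a SKOS property: the term becomes `skos:prefLabel`, UF becomes `skos:altLabel`, BT `skos:broader`, NT `skos:narrower`, RT `skos:related`, and SN `skos:scopeNote`. The sketch below writes the result as Turtle text using only the standard library. The base namespace `http://example.org/thesaurus/`, the `slug` helper, and the `to_turtle` function are assumptions made for illustration, and the code does not escape quotation marks inside labels.

```python
import re
from typing import Dict, List

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/thesaurus/"  # hypothetical namespace

# Tags whose values are text literals, and tags whose values point at other concepts.
LITERAL_PROPS = {"UF": "skos:altLabel", "SN": "skos:scopeNote"}
CONCEPT_PROPS = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}


def slug(label: str) -> str:
    """Turn a label into a URI-safe identifier (an illustrative convention)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def to_turtle(term: str, relations: Dict[str, List[str]]) -> str:
    """Render one thesaurus entry as a SKOS concept in Turtle syntax."""
    parts = ["a skos:Concept", f'skos:prefLabel "{term}"@en']
    for tag, values in relations.items():
        for value in values:
            if tag in LITERAL_PROPS:
                parts.append(f'{LITERAL_PROPS[tag]} "{value}"@en')
            elif tag in CONCEPT_PROPS:
                parts.append(f"{CONCEPT_PROPS[tag]} <{BASE}{slug(value)}>")
    body = " ;\n    ".join(parts)
    return f"{SKOS_PREFIX}\n\n<{BASE}{slug(term)}> {body} ."


TERM = "Economic cooperation"
RELATIONS = {
    "UF": ["Economic co-operation"],
    "BT": ["Economic policy"],
    "NT": [
        "Economic integration",
        "European economic cooperation",
        "European industrial cooperation",
        "Industrial cooperation",
    ],
    "RT": ["Interdependence"],
    "SN": ["Includes cooperative measures in banking, trade, industry etc., "
           "between and among countries."],
}

print(to_turtle(TERM, RELATIONS))
```

Note that in SKOS the broader, narrower, and related properties link concepts rather than labels, which is why the sketch mints a URI for each related term instead of quoting its name.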

Page 16: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

SKOS

Page 17: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

SKOS

[Figure not preserved in transcript; surviving labels: 1750, 2108, 4382]

Page 18: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

SKOS

Page 19: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Testing & Evaluation: Methods

Heuristic Evaluation: evaluation by an expert or a panel of experts

Affinity Modeling: task a sample of users with organizing your terms, then compare the result to your own organization of the terms

Usability Testing: holistic evaluation of the information system, including the content, interface, etc.
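
For affinity modeling, the comparison between a user's grouping and your own can be given a rough number. One possible measure, chosen here for illustration rather than taken from the slides, is pairwise agreement (the Rand index): the share of term pairs that both groupings treat the same way, either placing them together or keeping them apart. The function name and sample groupings below are hypothetical.

```python
from itertools import combinations
from typing import Dict


def pairwise_agreement(ours: Dict[str, str], theirs: Dict[str, str]) -> float:
    """Share of term pairs that both groupings treat alike (together or apart)."""
    # Only terms present in both groupings can be compared.
    terms = sorted(set(ours) & set(theirs))
    pairs = list(combinations(terms, 2))
    if not pairs:
        raise ValueError("need at least two shared terms")
    same = sum(
        (ours[a] == ours[b]) == (theirs[a] == theirs[b])
        for a, b in pairs
    )
    return same / len(pairs)


# Illustrative sample data: term -> category name.
our_grouping = {"Cats": "Animals", "Dogs": "Animals", "Oaks": "Plants", "Pines": "Plants"}
user_grouping = {"Cats": "Pets", "Dogs": "Pets", "Oaks": "Trees", "Pines": "Garden"}

print(round(pairwise_agreement(our_grouping, user_grouping), 2))
# → 0.83
```

The category names themselves do not need to match; only which terms end up together matters. A low score for a particular user is a prompt to look at where the groupings diverge, not a verdict on either grouping.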

Page 20: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Testing & Evaluation: Discussion

Why test a controlled vocabulary?

What are some useful criteria for evaluating a controlled vocabulary?

Page 21: Controlled Vocabulary & Thesaurus Design Planning & Maintenance

Upkeep & Maintenance

Controlled vocabularies are living entities that need:

New material added

Outdated material removed

Changes made

Requires:

A long-term maintenance plan

Institutional support and resources

Someone who serves as maintainer

Look to your users for input!

Term submissions

Search logs

Anticipate change!
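
One way to use search logs for upkeep is to look for frequent queries that match nothing in the vocabulary; these are candidates for new terms or new entry terms. The sketch below assumes exact matching after lowercasing, which is cruder than a production system would want, and the function name `unmatched_queries` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def unmatched_queries(
    queries: Iterable[str],
    entry_terms: Iterable[str],
    top_n: int = 10,
) -> List[Tuple[str, int]]:
    """Return the most frequent queries that match no preferred or variant term."""
    known = {term.strip().lower() for term in entry_terms}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    # Keep only the queries the vocabulary does not already cover.
    misses = Counter({q: n for q, n in counts.items() if q not in known})
    return misses.most_common(top_n)


# Illustrative sample data only: preferred terms plus their UF variants.
vocabulary = ["Economic cooperation", "Economic co-operation", "Economic policy"]
log = [
    "Economic cooperation", "trade blocs", "Trade blocs",
    "economic co-operation", "tariffs",
]

print(unmatched_queries(log, vocabulary))
# → [('trade blocs', 2), ('tariffs', 1)]
```

The output is a review list for the maintainer, not an automatic update: each candidate still needs a decision about whether it becomes a new term, a variant of an existing term, or nothing at all.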