The growing pains of a controlled vocabulary

Karen Loasby, 7 March 2005



Introduction

• Karen Loasby

• Information architect

• Worked for BBC for 4 years on search, navigation, metadata and content management projects

• 2 years previously for the Guardian newspaper archiving the paper and arranging content on the website

• MSc in Information Science from City University, London


Agenda

• Background

• The problem

• Formal classification vs. Folk tags

• Our middle ground

• What happened

• Learning points

• Questions


Background

• Content management project

• Regional websites

• Need for metadata

• Authors around the UK


Problem

• Faceted classification system

• Authors to tag

• Central control

• But …

• Journalists are the specialists – know the domain and the vocabulary.


Formal classification

• Pre-determined terms

• Centralised control

• Rich relationships


Folk tags

• What is it then?

• Folksonomy, ethnoclassification, social classification, social categorisation and so on


Comparing approaches

Formal

• High maintenance

• Consistent/predictable

• Rich relationships

• Can be artificial

Folk

• Low maintenance

• Quirky/surprising

• Less added value

• Real user language


A role for both

• Where we are using folk tagging

• And where we won’t

– Trust & Authority

– High value to business

– Missing motivation from users

– Broad domain/user base

– To avoid tyranny of minority


An experimental middle ground

• Centralised control of terms

• But encouraging absorption of user language

• Higher maintenance than folk tags

• Cheaper than professional cataloguing


BBC Experience

Semi-automatic classification

• Terms suggested from the CVs

– Terms are OK

– The suggested terms do not describe the content

• Search or browse for terms

– Terms are OK

– Send suggestion to the CV team

• CV team evaluates suggestion

– Say no to the term: change the classification on the content object

– Add to CV as a variant term or preferred term
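The last step of this workflow, where the CV team evaluates a suggestion, could be sketched roughly as below. This is only an illustration under stated assumptions, not the BBC system: the class `ControlledVocabulary`, the function `apply_decision`, the decision labels and the sample terms are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ControlledVocabulary:
    """Hypothetical CV: each preferred term maps to its variant terms."""
    terms: Dict[str, Set[str]] = field(default_factory=dict)

    def add_preferred(self, term: str) -> None:
        # Register a new preferred term with no variants yet.
        self.terms.setdefault(term, set())

    def add_variant(self, variant: str, preferred: str) -> None:
        # A variant can only hang off a preferred term that already exists.
        if preferred not in self.terms:
            raise KeyError(f"unknown preferred term: {preferred}")
        self.terms[preferred].add(variant)

    def resolve(self, term: str) -> Optional[str]:
        """Return the preferred term for a preferred or variant term."""
        wanted = term.casefold()
        for preferred, variants in self.terms.items():
            if wanted == preferred.casefold():
                return preferred
            if any(wanted == v.casefold() for v in variants):
                return preferred
        return None


def apply_decision(cv: ControlledVocabulary, tags: List[str], suggested: str,
                   decision: str, target: Optional[str] = None) -> List[str]:
    """Apply the CV team's decision and return the content object's new tags.

    decision is one of (labels are assumptions for this sketch):
      "preferred": add the suggestion to the CV as a preferred term
      "variant":   add the suggestion as a variant of `target`
      "reject":    say no, and reclassify the content object under `target`
    """
    if decision == "preferred":
        cv.add_preferred(suggested)
        replacement = suggested
    elif decision in ("variant", "reject"):
        existing = cv.resolve(target) if target is not None else None
        if existing is None:
            raise ValueError("decision needs an existing CV term as target")
        if decision == "variant":
            cv.add_variant(suggested, existing)
        replacement = existing
    else:
        raise ValueError(f"unknown decision: {decision}")
    # The content object always ends up tagged with a preferred term.
    return [replacement if tag == suggested else tag for tag in tags]


cv = ControlledVocabulary()
cv.add_preferred("Pregnancy")
print(apply_decision(cv, ["Expecting a baby"], "Expecting a baby",
                     "variant", target="Pregnancy"))
```

In this sketch an accepted variant still leaves the content object tagged with the preferred term, which is one way to absorb user language while keeping central control of terms.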


Operational system

• 8000 requests in 10 months

• From 160 journalists

– Average per user of 50 terms

– However this varied wildly. Our top user has suggested 476 terms
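As a quick check of these figures, the arithmetic can be written out as below. It assumes that each request corresponds to one suggested term, which the slide implies but does not state; the variable names are illustrative only.

```python
requests = 8000      # requests in 10 months (from the slide)
journalists = 160    # journalists making requests (from the slide)
top_user = 476       # terms suggested by the top user (from the slide)

# Assumption: one request = one suggested term.
mean_per_user = requests / journalists   # 50.0, matching the stated average
top_vs_mean = top_user / mean_per_user   # 9.52, the top user is ~9.5x the average

print(mean_per_user, top_vs_mean)
```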


Graph showing variation between teams

[Chart: one value per regional team, axis from 0 to 800. Teams: Cumbria, Tyne, Cambridgeshire, Guernsey, Leicester, South Yorkshire, Wiltshire, Suffolk, Liverpool, Manchester, Berkshire, Bristol, Kent, Coventry, Tees, Jersey, Stoke & Staffs, Nottingham, Derby, Humber, Somerset, Northamptonshire, Norfolk, Beds, Bucks & Herts, Leeds, Hereford & Worcs, Birmingham]


Growth in the CVs

• Up 15000 terms in 10 months

• Most growth in person/proper names

• People, venues and organisations

• Up by 50% to 35,000


Growth of facets

CV Requests By Month

[Chart: quantity of CV requests (axis from 0 to 7000) by month, one series per facet: Name, Location, Subject, BBC Brand, Time Period]


Types of terms

• Mostly good

– Only 200 terms actually rejected

• Synonyms vs. entirely new terms

– New for names (only 2% synonyms)

– Synonyms for subject (15% synonyms)

– Location: needed colloquial terms


Resourcing

• Handling the requests from journalists

• First 3 months – one IA

• Subsequently 2 to 3 junior IAs

• Too much – how to reduce?


Lessons learnt

• Success with the journalists

– They suggested terms!

– Got the faceted classification

– Began to suggest terms in “our” format

– Some did engage at a detailed level


Lessons learnt

• Difficulties for journalists

– System looks as if it is totally automatic, as part of a content management system

– “Journalists are people too”

– Users struggling with a content-object tagging system rather than a page-based one


Example

Subject: Pregnancy


Lessons learnt

• Difficulties for journalists, cont.

– They find it boring

– Makes it harder for the aim of “finding and re-use” to apply

– Needed to do more pre-emptive work for them


Lessons learnt

• Number of terms suggested depends on

– Type of facet

– Dynamism of content

– Scope of the content

– Enthusiasm of users


Next?

• High value facets still need control

– Make use of the metadata(!)

– Sell the message

– Federated management

– Earlier in production

• And for folk tagging?


Thanks to the IA team for their analysis work:

– Jon Carey

– Adil Hussein

– Christine Rimmer


Thank you

Questions or comments?

Karen Loasby
