Implementing a Taxonomy in a Content Management Portal Content Week 2005 Miami, Florida Monday, January 31, 2005 Workshop H 2:45pm – 4:45 pm Marjorie M.K. Hlava Access Innovations, Inc. 505-998-0800 [email protected] www.accessinn.com


Implementing a Taxonomy in a Content Management Portal

Content Week 2005, Miami, Florida

Monday, January 31, 2005, Workshop H

2:45 pm – 4:45 pm

Marjorie M.K. Hlava, Access Innovations, Inc.

[email protected]

www.accessinn.com

Introductions

• Name
• Project
• Expectations for these two short hours
• Please fill in the sign-up sheet
• Would you like:
  – 1. A copy of this presentation?
  – 2. Sample software?
  – 3. Other information?

Copyright © 2005 Access Innovations, Inc.

What will we talk about this afternoon?

• 1. Definitions
• 2. Where taxonomy fits in the Information Circle
• 3. Where to use a taxonomy
• 4. Taxonomies for Communities of Practice
• 5. Surrounding theories and applications
• 6. How to build and maintain
• 7. How taxonomies are used in enterprise information

[Diagram: Implementing a Taxonomy in a Content Management Portal. Components: Data Feed; Thesaurus Master; MAI to add metadata; Database Management System (add metadata using MAI); Search; Inverted File.]


1. Definitions


What is a taxonomy?

• A hierarchical thesaurus with authority terms applied at the final nodes
• A browsable web interface
• A Linnaean system
• A browsable list with the term instance at the final leaf


Types of Taxonomies

• Naming and organizing things into groups that share similar characteristics
• 1. Flat – just a list
• 2. Hierarchical
  – Taxonomic view
• 3. Faceted
  – Sorted by a single characteristic
  – Metadata – Dublin Core
  – COSATI – GILS
• 4. Thesaurus
  – Term records
  – Database back end
  – Easier to modify and maintain


Taxonomy in metadata

• Definition
  – Taxonomy is a thesaurus in its hierarchical view with the authority files applied at the final nodes
  – It provides the browsable front end to a portal
  – It provides keyword and name access to the content in the portal


Taxonomy definition

• A taxonomy is a thesaurus in hierarchical view with authority file terms added at the final nodes

• Thesaurus
• Authority file
• Hierarchical form
• Final nodes


Thesaurus

• Concepts
• Methods
• Procedures

• Cognitive approach
• The knowledge capture piece
• The topics or subjects


Authority file

• People
• Places
• Things

• The tangible approach
• Concrete entities


Hierarchical view

• Gives the portal view
• The view of all the preferred terms in categorized order
• An outline of the thesaurus


Final Nodes

• The last position on the hierarchical tree
  – Taxonomy
    • concept
  – narrower terms
    » final node – people, place, or thing term
    » document instance
    » Letter to George Wiesman, Dec 12, 2003
    » Technical report number TR-1039
    » Museum artifact 1706 wooden wagon wheel


Term Records – the Database Part

• Associative terms
  – Related terms

• Equivalence terms
  – Preferred and non-preferred
  – Use and Used For
  – Synonyms

• Hierarchical terms
  – Broader and narrower terms
  – Parent/child


Other term record fields

• Scope notes
• Cross-references
• History
• Term status
• Category
• User-defined
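The term record fields on the last two slides map naturally onto a small data structure. The sketch below is a minimal Python illustration, not the data model of Thesaurus Master or any other product; the class and method names (`TermRecord`, `Thesaurus`, `add_broader`, and so on) are hypothetical. Its one design point is that hierarchical and associative relationships are stored reciprocally, so adding a BT automatically adds the matching NT.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TermRecord:
    """One thesaurus term record (hypothetical field names)."""
    term: str
    broader: Set[str] = field(default_factory=set)   # BT: parent terms
    narrower: Set[str] = field(default_factory=set)  # NT: child terms
    related: Set[str] = field(default_factory=set)   # RT: associative links
    used_for: Set[str] = field(default_factory=set)  # UF: non-preferred synonyms
    scope_note: str = ""                             # SN
    status: str = "candidate"                        # term status


class Thesaurus:
    def __init__(self) -> None:
        self.records: Dict[str, TermRecord] = {}
        self.use: Dict[str, str] = {}  # USE: non-preferred -> preferred

    def add(self, term: str) -> TermRecord:
        """Return the record for a term, creating it if needed."""
        return self.records.setdefault(term, TermRecord(term))

    def add_broader(self, term: str, broader: str) -> None:
        # Hierarchical links are reciprocal: A BT B implies B NT A.
        self.add(term).broader.add(broader)
        self.add(broader).narrower.add(term)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric: A RT B implies B RT A.
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def add_synonym(self, preferred: str, non_preferred: str) -> None:
        # Equivalence: preferred UF non-preferred; non-preferred USE preferred.
        self.add(preferred).used_for.add(non_preferred)
        self.use[non_preferred] = preferred
```

For example, after `add_broader("Automobiles", "Vehicles")` and `add_synonym("Automobiles", "Cars")`, the Vehicles record lists Automobiles as a narrower term, and `use["Cars"]` points to Automobiles without Cars getting a record of its own.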


2. Where does a taxonomy fit in the information circle?


Information Circle - Overview

[Diagram: the information circle, linking Taxonomy, User, Content, and Output]


Content


• Web Pages
• White Papers
• Research Reports
• Licensed Data Feeds
• Intranet
• Internal Reports
• Lotus Notes files
• Databases
• Public Relations Documents/Press Releases
• Market Research Reports
• Customer Relationship Management (CRM)
• HR Files
• Accounting/Financial Records
• Legal Documents
• Patents
• Museum artifacts


Content – cont’d

Content creation:
• HTML – meta name / keywords
• DB – field / meta tag / element
• XML – entity table for valid values
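For HTML content, the taxonomy terms assigned to a page end up in the `<meta name="keywords">` element. A minimal sketch, assuming the terms have already been chosen by an indexer or an indexing system; the function name `meta_keywords_tag` is hypothetical.

```python
from html import escape
from typing import List


def meta_keywords_tag(terms: List[str]) -> str:
    """Render assigned taxonomy terms as an HTML keywords meta element."""
    # Drop blanks and duplicates while keeping the order the terms were assigned.
    unique = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    # escape() with quote=True also escapes double quotes, so a term cannot
    # break out of the content attribute.
    content = escape(", ".join(unique), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(meta_keywords_tag(["Automobiles", "Road safety", "Automobiles", "R&D"]))
# <meta name="keywords" content="Automobiles, Road safety, R&amp;D">
```

The same term list could just as well populate a database field or an XML element; only the rendering step changes.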


Taxonomy


Taxonomy is applied to new and existing content:

Meta tags
• Thesaurus terms
• Authority terms
• Date
• Author
• Description, etc.

[Diagram labels: Rule Base, Taxonomy]
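A rule base connects text patterns to taxonomy terms so that new and existing content can be tagged automatically. The sketch below uses ordinary regular expressions as a stand-in for a real rule language (MAI's rule syntax is not reproduced here); the rules and the function name `suggest_terms` are hypothetical examples, and the output is a list of suggestions for an indexer to review.

```python
import re
from typing import List, Tuple

# Hypothetical rule base: (regular expression, taxonomy term to assign).
RULES: List[Tuple[str, str]] = [
    (r"\b(cars?|automobiles?)\b", "Automobiles"),
    (r"\bpersonal computers?\b", "Microcomputers"),
    # Only assign the metal sense when a supporting word follows "lead".
    (r"\blead\b(?=.*\b(paint|pipes?|poisoning)\b)", "Lead (metal)"),
]


def suggest_terms(text: str, rules: List[Tuple[str, str]] = RULES) -> List[str]:
    """Return the terms whose rules match the text, in rule order, without repeats."""
    found: List[str] = []
    for pattern, term in rules:
        if term not in found and re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


print(suggest_terms("Old cars often have lead paint."))
# ['Automobiles', 'Lead (metal)']
```

Keeping the rules as data rather than code means editors can add a rule when a new candidate term is accepted, without touching the matching logic.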


Taxonomy – cont’d


• Index data
  – Manually
  – Automatically
• Suggest new candidate terms
• Review


Output


Searchable data
  – Internal data
  – External data


User


• Web browsing/searching
• Database browsing/searching
• Query resolution


User – cont’d


User input
  – Suggested candidate terms
  – New documents

Reports based on user searches
  – Search logs
  – Null hits (these also suggest new candidate terms)
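Null hits are a direct source of candidate terms: a query that returns nothing and is not yet in the vocabulary is worth an editor's review. This sketch assumes the search log has already been parsed into `(query, hit_count)` pairs; the function name `candidate_terms` and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def candidate_terms(search_log: Iterable[Tuple[str, int]], vocabulary: Set[str],
                    min_count: int = 2) -> List[Tuple[str, int]]:
    """Rank zero-hit queries that are not already taxonomy terms."""
    known = {v.lower() for v in vocabulary}
    null_hits = Counter(
        query.strip().lower()
        for query, hits in search_log
        if hits == 0 and query.strip() and query.strip().lower() not in known
    )
    # Most frequent first; ties sorted alphabetically for a stable review list.
    ranked = sorted(null_hits.items(), key=lambda item: (-item[1], item[0]))
    return [(query, count) for query, count in ranked if count >= min_count]
```

The threshold keeps one-off typos out of the review queue; a low-traffic site might set it to 1.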


New Content


The cycle begins again



3. Where to use a taxonomy

• Link the taxonomy and indexing
• Always in sync with the industry
• Keep up to date with terminology
• Automatically index the old data
• Filter newsfeeds
• Search using the taxonomy
• File using the taxonomy
• Spell-check using the taxonomy
• Link to a translation system
• Catalog using the taxonomy
• Index a book
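Spell checking against the taxonomy can be approximated with fuzzy string matching. This sketch swaps in Python's standard `difflib.get_close_matches` as the similarity measure; the function name `check_spelling` and the 0.8 cutoff are assumptions, not part of any product.

```python
import difflib
from typing import List


def check_spelling(word: str, vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Return the term itself if it is valid, otherwise up to three close terms."""
    by_lower = {term.lower(): term for term in vocabulary}
    if word.lower() in by_lower:
        return [by_lower[word.lower()]]
    # get_close_matches ranks candidates by difflib's similarity ratio (0 to 1).
    matches = difflib.get_close_matches(word.lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]


print(check_spelling("Aadvark", ["Aardvark", "Alligator", "Apple", "Zebra"]))
# ['Aardvark']
```

Returning the preferred spelling from the vocabulary, rather than a dictionary word, keeps corrections consistent with the terms the indexers actually use.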


Thesaurus Master


[Diagram: Portal Searching. The database management system adds metadata using MAI. Search runs against an inverted file (Aardvark, Alligator, Apple, Advantage, … Zebra) whose record locators, such as Accessinn.com/12345/demofile/recid15, point to database records, each with many elements.]
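An inverted file lists each word once, alphabetically, with pointers back to the records that contain it, which is why searching it is fast. A minimal in-memory sketch, assuming simple lower-case word tokens and hypothetical record locators such as `demofile/recid15` (shortened from the locator shown in the diagram); the function names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Set

WORD = re.compile(r"[a-z0-9]+")


def build_inverted_file(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the locators of the records that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for locator, text in records.items():
        for word in WORD.findall(text.lower()):
            index[word].add(locator)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return locators of records containing every query word (Boolean AND)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


records = {
    "demofile/recid15": "Aardvark and alligator habitats",
    "demofile/recid16": "Apple orchards near the zebra paddock",
}
index = build_inverted_file(records)
print(sorted(index)[:3])             # ['aardvark', 'alligator', 'and']
print(search(index, "apple zebra"))  # {'demofile/recid16'}
```

Because the index holds only locators, the same search can point into many databases at once, as long as each locator says where its record lives.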


[Diagram: Portal Searching. Search runs against an inverted file (Aardvark, Alligator, Apple, Advantage, … Zebra); a record locator such as Accessinn.com/12345/demofile/recid15 points to database records, each with many elements. Many databases can be reached.]


4. Taxonomies for Communities of Practice


Taxonomies in a Community of Practice

• Nature of Communities of Practice (CoP)
• Taxonomies in context
• Value of taxonomies
• Creating a taxonomy
• Applying the taxonomy


Nature of CoPs

• Free flowing, loosely structured

• Simple, ad hoc categorization

• Active CoPs need organization

• Search tends to be hit-or-miss

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA


Taxonomies in Context

A taxonomy aspires to be:
• a correlation of the different functional, regional, and (possibly) national languages used by a community of practice
• a support mechanism for navigation
• a support tool for search engines and knowledge maps
• an authority for tagging documents and other information objects
• a knowledge base in its own right

Reference: “Taxonomies: the vital tool of information architecture”, www.tfpl.com


Value of Taxonomies

• Improves organization & structure
• Facilitates navigation
• Facilitates knowledge discovery
• Reduces effort
• Saves time

“Taxonomies are better created by professional indexers or librarians than by domain experts.”

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA


Naval Postgraduate School’s Homeland Security Taxonomy (1)


Naval Postgraduate School’s Homeland Security Taxonomy (2)


IBM Insight graphical view


Applying a Taxonomy (1)

Manually
• Add terms into metadata fields
• Design navigation & site indexes with the taxonomy hierarchy

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Incorporating Hierarchical Classification from a Taxonomy

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Applying a Taxonomy (2)

System integration
• Search & retrieval systems
• Auto-assignment of metadata
• Categorization systems

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Applying the Taxonomy to a Digital Library

[Diagram: applying the taxonomy to a digital library. Components: web portal; locally held documents; public repositories; commercial data sources; agency data sources; INTERNET (public) with spiders; meta-search tool; filtered content; automated categorization; library catalogs; several search engines.]

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA


5. Surrounding theories and applications


Other Vocabulary types

• Uncontrolled lists
• Classification systems
• Subject headings
• Controlled vocabulary
  – usually synonyms and spelling
• Authority files
• Thesaurus
• Taxonomy


Uncontrolled list – defined

• Add terms as they occur
• No cross-references
• Simple flat structure


Controlled term lists - defined

• State the preferred terms
• Provide allowed term entry
• Heavily cross-referenced
• Not generally hierarchical
• Popular
• Easy to create


Controlled term list - format

• Cars – use Automobiles

• Personal Computer – use Microcomputer
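A controlled term list is, in effect, a lookup table from every allowed entry to its preferred term. A minimal sketch using the two USE references above; the table names and the `preferred_term` function are hypothetical, and matching is simply case-insensitive.

```python
from typing import Dict, Optional

# Hypothetical controlled list built from the two entries above.
PREFERRED = {"Automobiles", "Microcomputer"}
USE: Dict[str, str] = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}


def preferred_term(entry: str) -> Optional[str]:
    """Resolve an entry to its preferred term, or None if the list does not allow it."""
    key = entry.strip().lower()
    for term in PREFERRED:
        if term.lower() == key:
            return term   # already the preferred form
    return USE.get(key)   # follow the USE reference, if there is one
```

An entry that resolves to None is exactly the kind of term an editor would review as a possible addition.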


Classification vs Subject Headings

• Classification
  – single spot or placement
  – browse physical list
  – often a numbering system
  – clear hierarchy
  – no or few cross-references


Classification vs Subject Headings

• Subject headings
  – generic search
  – hidden classification system
  – related terms and cross-references in heavy use
  – usually the inverted form, e.g. "cells, electric"
  – alphabetic access


Authority systems - defined

• Lists of terms in the preferred format for use
• Frequently have cross-references
• Widely available
• Frequently coded lists
• Brand names


Authority lists - examples

• ISO country names and codes
  – International Organization for Standardization (ISO)

• ISO language list
• NAICS (SIC)
  – Standard Industrial Classification (SIC) codes
  – replaced by the North American Industry Classification System (NAICS)


What is a thesaurus?

• Jessica L. Milstead. All Rights Reserved
• “For writers, it is a tool like Roget’s, one with words grouped and classified to help select the best word to convey a specific nuance of meaning.
• For indexers and searchers, it is an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus”
• www.jelem.com


Thesaurus - defined

• For information retrieval (1960s)
  – indexing, either intellectual or automatic
  – in searching
  – searching but not indexing
  – indexing but not searching
  – hierarchical view for searching


Thesaurus - defined

• Monolingual – standard
  – British English – ISO 2788
  – American English – ANSI/NISO Z39.19

• Multilingual – standard ISO 5964
  – concept mapping
  – Eurovoc

• Discipline- or mission-based – ad hoc


Thesaurus – standard format

• Main entries
• Top terms – TT
• Broader terms – BT
• Narrower terms – NT
• Related terms – RT
• Scope notes – SN
• History – HI
• Date term added/changed – DA
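The standard format can be generated directly from a term record. A sketch assuming a record is a dictionary keyed by the tags listed above (TT, BT, NT, RT, SN, HI, DA); the function name `format_entry` and the two-space indentation are illustrative choices, not requirements of Z39.19 or the ISO standards.

```python
from typing import Dict, List

TAG_ORDER = ("TT", "BT", "NT", "RT", "SN", "HI", "DA")


def format_entry(term: str, record: Dict[str, List[str]]) -> str:
    """Render one main entry followed by its tagged lines, in TAG_ORDER."""
    lines = [term]
    for tag in TAG_ORDER:
        for value in record.get(tag, []):  # absent tags are skipped
            lines.append(f"  {tag} {value}")
    return "\n".join(lines)


print(format_entry("Automobiles", {
    "BT": ["Motor vehicles"],
    "NT": ["Sports cars", "Station wagons"],
    "RT": ["Roads"],
    "TT": ["Vehicles"],
}))
# Automobiles
#   TT Vehicles
#   BT Motor vehicles
#   NT Sports cars
#   NT Station wagons
#   RT Roads
```

Fixing the tag order in one constant keeps every printed entry consistent, whatever order the fields were entered in.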


Standards

• Monolingual
  – ANSI/NISO Z39.19
  – ISO 2788

• Multilingual
  – ISO 5964


ISO Standards

• Set up already – easy to adopt
• Multiple broader terms
• The standards outline procedures
  – ISO: better for implementation
  – NISO: much better reading


Why do we index?

• Improve precision
  – define the scope of terms

• Improve recall
  – different terms for the same concept

• Guide to a field of expertise
• Learning tool
• Richer expression


Uses?

• Indexing*
  – …process by which subject terms or classification symbols are assigned to concepts in documents
  – A thesaurus is also known as an indexing language
  – *Not the building of the inverted file in the computer sense of indexing


What are we controlling?

• Synonyms
  – different terms, same concept

• Polysemes or homonyms
  – same word, different meanings
  – Lead
  – Reading


How?

• Meaning
  – delineation of the scope of a term

• Term equivalence
  – linking of synonyms

• Disambiguation of homonyms
  – lead (metal)
  – lead (element)
  – lead (management)
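Disambiguation assigns a qualified term such as lead (metal) or lead (management) instead of the bare homograph. One simple approach, swapped in here for illustration and not taken from this presentation, scores each sense by how many of its cue words appear in the surrounding text. The `HOMOGRAPHS` table, its cue words, and the function name are hypothetical, and the sketch covers only two of the three senses listed above.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical homograph table: qualified term -> cue words that suggest it.
HOMOGRAPHS: Dict[str, Dict[str, Set[str]]] = {
    "lead": {
        "lead (metal)": {"paint", "pipes", "poisoning", "smelting"},
        "lead (management)": {"team", "manager", "project", "staff"},
    },
}


def disambiguate(word: str, context: str) -> Optional[str]:
    """Pick the qualified term whose cue words overlap the context most."""
    senses = HOMOGRAPHS.get(word.lower())
    if not senses:
        return None  # not a known homograph
    context_words = set(re.findall(r"[a-z]+", context.lower()))
    best, best_score = None, 0
    for sense, cues in senses.items():
        score = len(cues & context_words)
        if score > best_score:  # ties keep the earlier sense
            best, best_score = sense, score
    return best  # None when no cue word appears
```

When no cue word appears, returning None rather than guessing leaves the choice to a human indexer, which matches the role scope notes play for homographs.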


Precision options

• Language specificity
• Coordination
• Compound terms – level of precoordination
• Homographs and scope notes
• Word distance indication


Precision options

• Structural relationships
• Links and roles
• Treatment and aspect codes
• Weighting


Disambiguation

Bill (invoice)

Bill (legislative)

Bill (sport)

Bill (person)


Disambiguation

[Diagram: Bills (invoices), Bills (legislation), Bill (animal), and Bill (person), connected by thesaurus relationships labeled PT, NT, BT, and RT.]


6. How to build and maintain a taxonomy


How to build a taxonomy

• Collect the terms
• Pull out authority terms
• Organize into arrays
• Choose top terms
• Organize hierarchically
• Flesh out term records
• Test, review, and edit
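The "choose top terms" and "organize hierarchically" steps can be checked mechanically once broader-term links exist. This sketch assumes each term has at most one broader term (the ISO standards allow several) and stores the links as a hypothetical child-to-parent dictionary; terms with no broader term become top terms.

```python
from collections import defaultdict
from typing import Dict, List


def top_terms(broader: Dict[str, str]) -> List[str]:
    """Terms that appear as a broader term but never have one are top terms."""
    return sorted(set(broader.values()) - set(broader))


def outline(broader: Dict[str, str]) -> str:
    """Render the hierarchy as an indented outline, children sorted alphabetically."""
    narrower: Dict[str, List[str]] = defaultdict(list)
    for child, parent in broader.items():
        narrower[parent].append(child)

    lines: List[str] = []

    def walk(term: str, depth: int) -> None:
        lines.append("  " * depth + term)
        for child in sorted(narrower[term]):
            walk(child, depth + 1)

    for top in top_terms(broader):
        walk(top, 0)
    return "\n".join(lines)


print(outline({"Automobiles": "Vehicles", "Bicycles": "Vehicles",
               "Sports cars": "Automobiles"}))
# Vehicles
#   Automobiles
#     Sports cars
#   Bicycles
```

Printing the outline during the "test, review, and edit" step makes orphaned terms and misplaced branches easy to spot.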


Or said another way …

• Define scope
• Collect terms and relationships
• Identify existing taxonomies
• Identify resources
• Create & refine taxonomy
• Apply taxonomy
• Review and update


Maintain

• Steady stream of terms
  – Web logs
  – Null sets
  – New announcements
  – Indexing team
  – Library
  – Records managers
  – Etc.

• Candidate terms
• An out-of-date taxonomy is nearly useless


Best Results Measures

• Accuracy
• Productivity
• Hits, misses, and noise
• Precision and recall
• Relevance
• Ease of setup
• Time to production
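Hits, misses, and noise combine into the two standard measures: precision = hits / (hits + noise), the share of assigned terms that are correct, and recall = hits / (hits + misses), the share of correct terms that were assigned. A sketch comparing machine-assigned terms with an indexer's reference set; the function name `evaluate` and the convention of scoring an empty set as 0.0 are assumptions.

```python
from typing import Dict, Set


def evaluate(assigned: Set[str], expected: Set[str]) -> Dict[str, float]:
    """Score machine-assigned terms against an indexer's reference terms."""
    hits = len(assigned & expected)    # correct terms that were assigned
    noise = len(assigned - expected)   # assigned but not correct
    misses = len(expected - assigned)  # correct but not assigned
    precision = hits / (hits + noise) if assigned else 0.0
    recall = hits / (hits + misses) if expected else 0.0
    return {"hits": hits, "misses": misses, "noise": noise,
            "precision": precision, "recall": recall}


print(evaluate({"Automobiles", "Roads", "Tires", "Zebra"},
               {"Automobiles", "Roads", "Road safety"}))
# precision 2/4 = 0.5, recall 2/3
```

Averaging these scores over a test set of documents gives a simple accuracy baseline to track as the rule base and taxonomy change.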


Integration

• Thesaurus
  – full-featured
  – multiple views
  – multiple versions
  – multiple languages

• Automatic indexing
  – filtering
  – assisted

• Data Harmony MAI and Thesaurus Master


Visual Taxonomy

• Ways to look
  – Hierarchical
  – Alphabetic – by term
  – Ring diagrams
  – Topic maps
  – Related terms

Visual Taxonomy


API to Many Systems for CMS


Apply to the metadata

• Automatic application?
• Spider setting internally
• External web crawls – use all aliases
• Filter data
• Enhance search experience


Metadata

• The fields
• The elements
  – Class codes
  – Title
  – Author
  – Plaintiff
  – Product
  – Subject / topic

• Meta name keywords in HTML


7. How Taxonomies are used in Enterprise Information


Brand is repeated in several spots and tied to search as well

Another way of listing brands

Category list from the taxonomy is tied to the brand list and product list

Category code from the taxonomy is tied to the brand list and the product list


Enterprise Taxonomy Management

• Consistent application across the entire site
• Synonyms are used interchangeably
• User doesn’t need to know the taxonomy
• Pop-up view is helpful
• Site map for construction and browsing
• Allows hidden sections for internal use
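Because synonyms are used interchangeably, a user can search with any equivalent term and still reach content filed under the preferred one. A minimal query-expansion sketch, assuming a hypothetical `SYNONYMS` table of preferred terms and their variants, and whole-word, case-insensitive matching of document text; the function names are illustrative.

```python
import re
from typing import Dict, List, Set

# Hypothetical equivalence table: preferred term -> non-preferred variants (UF).
SYNONYMS: Dict[str, Set[str]] = {
    "Automobiles": {"cars", "autos", "motorcars"},
    "Microcomputers": {"personal computers", "pcs"},
}


def expand_query(query: str) -> Set[str]:
    """Return the lower-cased query plus every term equivalent to it."""
    q = query.strip().lower()
    if not q:
        return set()
    for preferred, variants in SYNONYMS.items():
        ring = {preferred.lower()} | variants
        if q in ring:
            return ring
    return {q}


def search(documents: Dict[str, str], query: str) -> List[str]:
    """Return IDs of documents containing any equivalent term as a whole word."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE)
                for t in expand_query(query)]
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(p.search(text) for p in patterns))
```

Expanding at query time leaves the documents untouched, which suits content the portal cannot re-index, such as external web crawls.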


Taxonomies

• Form the basis for knowledge sharing
• Add value to discussion
• Allow deeper retrieval
• Are straightforward to create
• Require ongoing maintenance


Your Taxonomy

• There is too much information to pile it on the floor.

• It fits in many places in the information flow


[Diagram: Implementing a Taxonomy in a Content Management Portal (recap). Components: Data Feed; Thesaurus Master; MAI to add metadata; Database Management System (add metadata using MAI); Search; Inverted File.]


Thank you for your time! Questions?

Marjorie M.K. Hlava

Access Innovations, Inc.

505-998-0800

[email protected]

www.accessinn.com