Frequently Asked Questions about Taxonomies and Metadata
Ron Daniel, Taxonomy Strategies LLC, [email protected]
Sept. 28, 2005. Copyright 2005 Taxonomy Strategies LLC. All rights reserved.



Taxonomy Strategies LLC: The business of organized information

Pop Quiz

On a blank piece of paper:

• What questions did you want to have answered by coming to today’s talks?

• What new questions do you have, based on what you’ve learned from the previous presentations?

Flag one question to be answered later.

You do NOT have to provide your name.

Please DO provide your job title, division, and either company or company type.


Agenda

• Pop Quiz
• FAQs – Frequently Asked Questions
• SAQs – Seldom Asked Questions
• Today’s Questions


What is a taxonomy – just a folder structure or something else?

• Irony in action – there is no agreed definition of what a “taxonomy” is.
• When talking with someone about taxonomy, make sure you are talking about the same things.
• We look at taxonomies and metadata together.
• The metadata specification will call for several fields that take pre-defined lists of values.
• Those lists, flat or hierarchical, are “facets” within the overall taxonomy.
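As a rough illustration of that last point, the sketch below models a metadata specification whose controlled fields each point at a facet, either a flat list or a hierarchy. The class name `Facet`, the field names, and every value are hypothetical and invented for this example; they do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """One pre-defined list of values; 'parents' stays empty for a flat list."""
    name: str
    values: set[str]
    parents: dict[str, str] = field(default_factory=dict)  # child -> broader term

    def is_hierarchical(self) -> bool:
        return bool(self.parents)


# Hypothetical facets, invented for illustration only.
CONTENT_TYPE = Facet("content_type", {"Form", "News", "Policy"})
LOCATION = Facet(
    "location",
    {"USA", "California", "Georgia"},
    parents={"California": "USA", "Georgia": "USA"},
)

# The metadata specification maps each controlled field to its facet.
METADATA_SPEC = {"content_type": CONTENT_TYPE, "location": LOCATION}


def invalid_fields(record: dict[str, str]) -> list[str]:
    """Return the controlled fields whose value is not in the facet's list."""
    return [
        name
        for name, facet in METADATA_SPEC.items()
        if name in record and record[name] not in facet.values
    ]


print(invalid_fields({"content_type": "Memo", "location": "USA"}))
```

Under these assumptions, a record is valid when every controlled field holds a value drawn from its facet; free-text fields such as a title would simply sit outside `METADATA_SPEC`.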


Other things sometimes called taxonomy

Type, with remarks:

Synonym Ring
• Connects a series of terms together
• Treats them as equivalent for search purposes
• e.g. (Dog, Canine, Pooch, Mutt), (Cat, Feline, Kitty), …

Authority File
• Used to control variant names with a preferred term
• Typically used for names of countries, individuals, organizations
• e.g. (IBM, Big Blue, International Business Machines Inc.)

Classification Scheme
• A hierarchical arrangement of terms
• May or may not follow strict “is-a” hierarchy rules
• Usually enumerated, e.g. LC or Dewey

Thesaurus
• Expresses semantic relationships of: Hierarchy (broader & narrower terms), Equivalence (synonyms), Associative (related terms)
• May include definitions

Ontology
• Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules
• A model of reality
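To make two of these structures concrete, here is a minimal sketch of an authority file and a small thesaurus. The IBM variants and the dog and cat synonyms come from the slide. The "Mammal" broader term, the related-term link between Dog and Cat, and all function names are assumptions added for illustration. BT, UF, and RT are the conventional thesaurus abbreviations for broader term, used for, and related term.

```python
# Authority file: every variant name maps to one preferred term.
AUTHORITY = {
    "ibm": "IBM",
    "big blue": "IBM",
    "international business machines inc.": "IBM",
}


def preferred_name(name: str) -> str:
    """Return the preferred term, or the input unchanged if it is not controlled."""
    return AUTHORITY.get(name.strip().lower(), name)


# Thesaurus: hierarchy (BT), equivalence (UF), association (RT).
# "Mammal" and the RT links are hypothetical additions for this example.
THESAURUS = {
    "Dog": {"BT": ["Mammal"], "UF": ["Canine", "Pooch", "Mutt"], "RT": ["Cat"]},
    "Cat": {"BT": ["Mammal"], "UF": ["Feline", "Kitty"], "RT": ["Dog"]},
    "Mammal": {"BT": [], "UF": [], "RT": []},
}


def narrower_terms(term: str) -> list[str]:
    """Derive narrower terms by inverting the broader-term links."""
    return sorted(t for t, rel in THESAURUS.items() if term in rel["BT"])


print(preferred_name("Big Blue"))
print(narrower_terms("Mammal"))
```

A synonym ring would be the `UF` lists alone with no preferred term; the authority file adds the preferred term; the thesaurus adds hierarchy and association on top.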


How do taxonomies actually improve search?

Input (Query) Side

“Search” using a small set of pre-defined values instead of trying to guess what word or words might have been used in the content.

Have synonyms mapped together so searches for “car” and “automobile” return the same things.

Output (Results) Side

Organize search results into groups of related items.

Sorting and filtering

Refinement
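Both sides can be sketched in a few lines. The example below is a toy, not a search engine: it expands a query term through a synonym ring (input side) and then groups the hits by a metadata field (output side). The documents, the `content_type` field, and the function names are hypothetical.

```python
from collections import defaultdict

# Synonym rings: terms in one ring are treated as equivalent for search.
SYNONYM_RINGS = [{"car", "automobile"}, {"dog", "canine", "pooch", "mutt"}]

# Hypothetical tagged documents, invented for illustration.
DOCS = [
    {"id": 1, "text": "used car prices", "content_type": "News"},
    {"id": 2, "text": "automobile loan application", "content_type": "Form"},
    {"id": 3, "text": "dog licence application", "content_type": "Form"},
]


def expand_query(term: str) -> set[str]:
    """Input side: replace a term with every member of its synonym ring."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(term: str) -> list[dict]:
    """Return documents containing the term or any of its synonyms."""
    wanted = expand_query(term)
    return [doc for doc in DOCS if wanted & set(doc["text"].split())]


def group_by(results: list[dict], facet: str) -> dict[str, list[int]]:
    """Output side: organize results into groups by a facet value."""
    groups = defaultdict(list)
    for doc in results:
        groups[doc[facet]].append(doc["id"])
    return dict(groups)


print(group_by(search("car"), "content_type"))
```

With this setup, "car" and "automobile" return the same documents, and the grouped output is what a results page could use for filtering and refinement.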


Taxonomy in action on the results side

[Screenshot: search results with refinement facets Position Category, Company, City, State, Salary]


Who should build the taxonomy?

The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.

The team should plan on maintaining the taxonomy as well as building it. Maintenance will not (usually) be anyone’s full-time job. Exact mix of people on team will change.

It should be built in an iterative fashion, with more content and broader review for each iteration.


How big should the taxonomy be?

• Consultant’s answer – “It depends”
  • How much content do you need to organize?
  • How fine-grained does the categorization need to be?
• Overly-simplistic method: Nterms = # items / desired bucket size
  • (1 M documents, 100 documents / bucket => 10k buckets)
  • Bad method – documents don’t distribute evenly
• Second method: # facets ≈ log10(# items) ± 2
  • (1 M items => 5..7 facets)
  • Sum of terms across all facets < 1200 in most cases
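The two rules of thumb are easy to check with a few lines of arithmetic. In this sketch the function names are invented, and the logarithm is taken as base 10, which is the reading consistent with "1 M items" giving a centre of 6. The tolerance is left as a parameter because the slide states ± 2 while its worked example (5..7) corresponds to ± 1.

```python
import math


def naive_term_count(items: int, bucket_size: int) -> int:
    """Overly simple method: number of items divided by desired bucket size."""
    return math.ceil(items / bucket_size)


def facet_range(items: int, tolerance: int) -> tuple[int, int]:
    """Second method: number of facets is about log10(items), plus or minus a tolerance."""
    center = round(math.log10(items))
    return max(1, center - tolerance), center + tolerance


print(naive_term_count(1_000_000, 100))  # the slide's 10k buckets
print(facet_range(1_000_000, 1))         # matches the slide's 5..7 example
print(facet_range(1_000_000, 2))         # the range under the stated +/- 2
```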


How do we know we have a good taxonomy?

Each method is listed with its process, who is involved, what it requires, and what it validates.

Walk-thru
• Process: Show & explain
• Who: Taxonomist, SME, Team
• Requires: Rough taxonomy
• Validation: Approach; appropriateness to task

Walk-thru
• Process: Check conformance to editorial rules
• Who: Taxonomist
• Requires: Draft taxonomy; editorial rules
• Validation: Consistent look and feel

Usability Testing
• Process: Contextual analysis (card sorting, scenario testing, etc.)
• Who: Users
• Requires: Rough taxonomy; tasks & answers
• Validation: Tasks are completed successfully; time to complete task is reduced

User Satisfaction
• Process: Survey
• Who: Users
• Requires: Rough taxonomy; UI mockup; search prototype
• Validation: Reaction to taxonomy; reaction to new interface; reaction to search results

Tagging Samples
• Process: Tag sample content with taxonomy
• Who: Taxonomist, Team, Indexers
• Requires: Sample content; rough taxonomy (or better)
• Validation: Content ‘fit’; fills out content inventory; training materials for people & algorithms; basis for quantitative methods


Taxonomy validation: Tagging content. How many items?

Goal, number of items, and criteria:

• Illustrate metadata schema: 1-3 items. Random (excluding junk).
• Develop training documentation: 10-20 items. Show typical & unusual cases.
• Qualitative test of small vocabulary (<100 categories): 25-50 items. Random (excluding junk).
• Quantitative test of vocabularies: 3-10X number of categories. Use computer-assisted methods when more than 10-20 categories. Pre-existing metadata is the most meaningful.

The best way to validate a taxonomy is to use it to tag some content.


Taxonomy validation: Closed card sorting

• Useful to validate whether the terms in a taxonomy are organized in a way that is commonly understood.
• Ask people to sort narrower terms in a taxonomy into the broad categories or facets.
• The card sort is considered closed if you provide the names of those broad categories.
• Ask people if there are facets that they think should be added and why.
• 15-20 users are sufficient to get useful feedback.
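One simple way to summarize a closed card sort is per-card agreement: for each narrower term, the share of participants who put it in the most popular category. The sketch below assumes that measure; it is not a method given in the talk. The participants, their placements, and the 0.7 threshold are invented, and only five participants are shown for brevity rather than the 15-20 suggested above.

```python
from collections import Counter

# Hypothetical results: participant -> {card (narrower term): chosen category}.
SORTS = {
    "p1": {"Poach": "Cooking Methods", "Brunch": "Meal Type", "Pasta": "Main Ingredients"},
    "p2": {"Poach": "Cooking Methods", "Brunch": "Meal Type", "Pasta": "Cuisines"},
    "p3": {"Poach": "Cooking Methods", "Brunch": "Cuisines", "Pasta": "Main Ingredients"},
    "p4": {"Poach": "Cooking Methods", "Brunch": "Meal Type", "Pasta": "Cuisines"},
    "p5": {"Poach": "Cooking Methods", "Brunch": "Meal Type", "Pasta": "Main Ingredients"},
}


def agreement(sorts: dict[str, dict[str, str]]) -> dict[str, tuple[str, float]]:
    """For each card, return (most common category, share of participants choosing it)."""
    by_card: dict[str, Counter] = {}
    for placements in sorts.values():
        for card, category in placements.items():
            by_card.setdefault(card, Counter())[category] += 1
    result = {}
    for card, counts in by_card.items():
        category, votes = counts.most_common(1)[0]
        result[card] = (category, votes / sum(counts.values()))
    return result


def low_agreement(sorts: dict[str, dict[str, str]], threshold: float) -> list[str]:
    """Cards whose top category falls below the agreement threshold."""
    return sorted(c for c, (_, share) in agreement(sorts).items() if share < threshold)


print(low_agreement(SORTS, threshold=0.7))
```

Cards that fall below the chosen threshold are the ones worth discussing with participants, for example by asking whether a facet is missing or a label is unclear.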


Taxonomy validation: Quantitative Method. How evenly does it divide the content?

Documents will not distribute uniformly across categories

Zipf (1/x) distribution is expected behavior

80/20 rule in action (actually 70/20 rule)

[Figure: bar chart of Number of Records (axis from 0 to 350,000) for the Top 10 Content Types. Categories shown include Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, and Statistics. Annotations: “Leading candidate for splitting”; “Leading candidates for merging”; “Above the curve is better than expected”.]
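A comparison against a 1/x curve can be sketched as follows. This assumes the simplest form of the Zipf expectation, where the category at rank r is expected to hold a share proportional to 1/r. The function names are invented, and the record counts are made up for illustration; they are NOT the values from the chart.

```python
def zipf_expected(total: float, n: int) -> list[float]:
    """Expected size of each rank 1..n if sizes follow a 1/x curve summing to total."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return [total / (rank * harmonic) for rank in range(1, n + 1)]


def compare_to_zipf(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Rank categories by size and return (name, observed, observed / expected)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    return [(name, n, n / e) for (name, n), e in zip(ranked, expected)]


# Invented record counts, NOT the figures from the chart.
SAMPLE_COUNTS = {"type_a": 900, "type_b": 150, "type_c": 50}

for name, observed, ratio in compare_to_zipf(SAMPLE_COUNTS):
    print(f"{name}: {observed} records, {ratio:.2f}x the 1/x expectation")
```

How to act on the ratios is a judgment call. Following the chart annotations, a reviewer might look at the largest categories as candidates for splitting and the smallest as candidates for merging.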


What if I have to do it solo?

Realize:
• It’s not totally solo – IT help, Graphics & UI help, Business Goals help, Funding help, Review & QA help…
• You are the general contractor
• It needs to be part of your objectives
• Limit the objectives to what can be achieved by you, and by your organization

Concentrate:
• Resource allocation (i.e. manage your time)
• Fundamental processes
  • Query log examination
  • Error correction procedure
• Communications!!!

Cherry-pick from roles on a larger team:
• Business Lead – align with organization goals, get needed resources, make cost/benefit decisions, report upstairs
• IT Liaison – work with IT specialists to get software installed, logs gathered, content harvested, etc. Consider impact of changes on tools and data
• Taxonomy / Search Specialist – analyze behavior and suggest changes. Implement changes which pass cost/benefit muster
• Website/User Representative – consider impact of changes on users and job performance


Where do the benefits come from? Common taxonomy ROI scenarios

• Catalog site - ROI based on increased sales through improved:
  • Product findability
  • Product cross-sells and up-sells
  • Customer loyalty
• Call center - ROI based on cutting costs through:
  • Fewer customer calls due to improved website self-service
  • Faster, more accurate CSR responses through better information access
• Compliance – ROI based on:
  • Avoiding penalties for breaching regulations
  • Following required procedures (e.g. medical claims)
• Knowledge worker productivity - ROI based on cutting costs through:
  • Less time searching for things
  • Less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
• Executive mandate
  • No ROI at the start, just someone with a vision and the budget to make it happen


Agenda

• Pop Quiz
• FAQs – Frequently Asked Questions
• SAQs – Seldom Asked Questions
• Your Questions


What should I be thinking about at the start of a taxonomy project?

Taxonomy development and maintenance is the LEAST of three problems:

• The Taxonomy Problem: How are we going to build and maintain the lists of pre-defined values that can go into some of the metadata elements?
• The Tagging Problem: How are we going to populate metadata elements with complete and consistent values?
  • What can we expect to get from automatic classifiers?
  • What kind of error detection and error correction procedures do we need?
  • What fields do we need?
• The ROI (Return On Investment) Problem: How are we going to use content, metadata, and vocabularies in applications to obtain business benefits?
  • More sales? Lower support costs? Greater productivity? Risk avoidance?
  • How much content? How big an operating budget? How to expose to users?

Business Goals and Cultural Factors are major influences on tagging and taxonomy. These must be acknowledged at the start to avoid rework.


What must change when the Taxonomy changes?

There’s more to maintaining the Taxonomy than maintaining just the taxonomy.

The master copy of the taxonomy.

Announcements for stakeholders!

The information sent to downstream users of the taxonomy. The versions and formats of the taxonomy distributed to others. The list of changes.

The data tagged with the taxonomy?

The user interface which uses the taxonomy?

Backend system software which uses the taxonomy?

The training set for automatic classifiers?

The educational material for users, catalogers, programmers, etc.?



Backup Slides


Why do we usually recommend faceted taxonomies?

• Categorize in multiple, independent categories.
• Allow combinations of categories to narrow the choice of items.
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
  • Easier to maintain
  • Easier to reuse existing material
  • Can be easier to navigate, if software supports it

Example facets:

• Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
• Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
• Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
• Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry

42 values to maintain (10+6+11+15)

9900 combinations (10x6x11x15)
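The two totals can be checked directly. This short sketch uses the facet sizes from the recipe example; the variable names are invented for the illustration.

```python
import math

# Number of values in each facet of the recipe example.
FACET_SIZES = {
    "Main Ingredients": 10,
    "Meal Type": 6,
    "Cuisines": 11,
    "Cooking Methods": 15,
}

# Maintenance cost grows with the sum of the facet sizes...
values_to_maintain = sum(FACET_SIZES.values())

# ...while discriminatory power grows with their product.
combinations = math.prod(FACET_SIZES.values())

print(values_to_maintain, combinations)

# The general claim: 4 independent facets of 10 nodes each distinguish 10^4 cases.
print(math.prod([10] * 4))
```

The gap between the sum and the product is the core argument for facets: each added value costs one unit of maintenance but multiplies the number of distinguishable combinations.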


What could possibly go wrong with a little edit?

ERP (Enterprise Resource Planning) team made a change to the product line data element in the product hierarchy.

They did not know this data was used by downstream applications outside of ERP.

An item data standards council discovered the error.

If the error had not been identified and fixed, the company’s sales force would not be correctly compensated.

“Lack of the enterprise data standards process in the item subject area has cost us at least 30 person days of just ‘category’ rework.”

Source: Danette McGilvray, Granite Falls Consulting, Inc.



When should we NOT use facets?

When you have to work with software that can’t handle them. Remember, software is replaced but data is migrated.

When you need to use an existing standard taxonomy.

[Example navigation hierarchy]

…By Content Type
  Calendars & Events
    Top Links…
      Holidays
      Upcoming Events
    Federal Reserve System…
      Beige Book
      Board of Governors
      FOMC
    More Calendars & Events…
      ERAC
      Officer Availability
      Staff Conference
      Toastmasters
      Tours
  Directories
  Documentation
  Forms
  News
  Policies & Procedures
By Organization
  Federal Reserve System
  FRB Atlanta
    Board of Directors
    Executive Office
    Management Committee
    Research Division
    S&R Division

Facets can help you build a useful hierarchy. This one is a mix of content type and organization.


What are facets I might think about?

• E&P Lifecycle
• Hydrocarbon System
• Geologic Age
• Process Mgmt
• Lease Mgmt
• Other Orgs
• Basins, Reservoirs & Fields
• Facilities
• Wells
• Disciplines
• Countries & Regions
• Reserves
• Human Resources
• Content Types
• Production
• Locations
• Org Chart


Questions?

Ron Daniel

925-368-8371

[email protected]