Page 1: Content Categorization Tools Taxonomies & Technologies for Infrastructure Solutions Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture

Content Categorization Tools Taxonomies & Technologies for

Infrastructure Solutions

Tom Reamy, Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

Page 2:



KAPS Group & Categorization Research
The Answer is Taxonomy, What is the Problem?
Machine Categorization
– Companies, Methods, Directions
The Place of Taxonomy in the Enterprise
– Taxonomy as an infrastructure activity
– Foundation for Content Management, Search, Portals, Smart


Page 3:


KAPS Group

KAPS Background
– Knowledge Architecture Consultants
– Organize and contextualize content, communities, and tasks
– Professional Services partner to Categorization Companies
Categorization research
– Evaluated 20+ companies
– More companies, more new technologies
– The answer is categorization, not Google

Page 4:


The Answer is Taxonomy. What is the Problem?
Professionals spend more time looking for information than using it
Professionals spend up to 2 hours a day searching
Corporate Intranets Survey
– Can’t find anything
– Search Stinks - Can’t find good content
– No good content

Page 5:


The Answer is Taxonomy. What is the Problem?

Infoglut: More information is being generated every day in modern companies than exists in our entire corpus from the Athenian golden age

Quantity of information overwhelms our ability to present and classify it.

Search is not enough.
– Humans search concepts, not strings

Page 6:


A Modest Proposal: A Solution to Infoglut
Bury all new content for 2,500 years
Lose most new content in a library fire
Unless you can convince a group of monks that your content is worth copying, it gets tossed
Dark Ages Solution: Stop writing for a thousand years

Page 7:


Infoglut: A Really Radical Solution
Hire librarians, editors, information architects to categorize your content

OR Develop technologies that:

– support and enhance the ability of authors and editors to characterize content

– enhance the ability of users to find content

AND Create a hybrid human/automatic solution
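One way to picture the hybrid human/automatic idea is a categorizer that proposes a category and hands weak matches to a person. The sketch below is only an illustration under stated assumptions: the `RULES` table, the `suggest` function, the overlap score, and the 0.5 threshold are all hypothetical and do not come from any product named in this deck.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules: category -> terms that suggest it.
RULES = {
    "Legal": {"contract", "policy", "compliance"},
    "Products": {"feature", "release", "pricing"},
}


@dataclass
class Suggestion:
    category: Optional[str]  # best automatic guess, or None if nothing matched
    score: float             # fraction of the category's terms found in the text
    needs_review: bool       # True when a human editor should confirm or correct


def suggest(text: str, threshold: float = 0.5) -> Suggestion:
    """Propose a category automatically; route weak matches to a human."""
    words = set(text.lower().split())
    best_category, best_score = None, 0.0
    for category, terms in RULES.items():
        score = len(words & terms) / len(terms)
        if score > best_score:
            best_category, best_score = category, score
    return Suggestion(best_category, best_score, best_score < threshold)


if __name__ == "__main__":
    for doc in ("New contract and compliance policy", "pricing update"):
        print(doc, "->", suggest(doc))
```

The software handles the clear cases and the editor's time goes to the ambiguous ones; the threshold is the knob that trades speed against quality.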

Page 8:


New Technologies: Categorization Explosion

Autonomy, Semio, Verity, Inxight, Topical Net, Mohomine, LingoMotors, H5 Technologies, YellowBrix, Entopia

Bridgewell, MetaTagger, Applied Semantics, Sageware, SmartLogik, Inktomi/Quiver, Stratify, Vivisimo, Textology, Other - Tacit

Page 9:

04/19/23 Inxight Confidential

Auto-Categorization: Methods

– Semi-Automatic: Rules, If-Then
• Maximum precision & flexibility
– Catalog by Example: Bayesian, SVM, Neural
• Training Sets (5-500)
• Speed, Learning
– Statistical Clustering
• Set of Documents & Taxonomy Level
– Semantic Analysis & World Knowledge
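To make "Catalog by Example" concrete, here is a minimal sketch of a Bayesian categorizer trained on a handful of labeled documents. It is a textbook multinomial naive Bayes with add-one smoothing, not the algorithm of any vendor listed earlier; the function names `train` and `classify` and the four training examples are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build a model from (text, category) training pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of training docs
    vocab = set()
    for text, category in examples:
        tokens = text.lower().split()
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        log_prob = math.log(doc_counts[category] / total_docs)  # prior
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[category][token]
            # Add-one (Laplace) smoothing avoids zero probabilities.
            log_prob += math.log((count + 1) / (total_words + len(vocab)))
        scores[category] = log_prob
    return max(scores, key=scores.get)


# Hypothetical training set, far smaller than the 5-500 range on the slide.
EXAMPLES = [
    ("expense report travel reimbursement", "Finance"),
    ("quarterly budget forecast expense", "Finance"),
    ("server outage network restart", "IT"),
    ("password reset network login", "IT"),
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(classify(model, "travel expense budget"))
    print(classify(model, "network password outage"))
```

The appeal of this family of methods is that nobody writes rules: adding a category means supplying examples. The cost is that quality depends on how representative those examples are.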

Page 10:

Origins of Auto-Categorization

News Feeds and Content providers
• Uniform content, size and structure
• Professional writers
• Simple or standard vocabulary
Corporate intranet
• Wildly varied content
• Mix of good, bad, and ugly writers
• Tower of Babel: Acronyms, special meanings

Page 11:

New Technologies: The Human Element

Automatic Categorization is Not
Humans are better, but not as consistent
– Bring outside contexts to the document
• Purpose, similar documents, common sense
– Understandable mistakes
Computers are faster and cheaper
– Faster yes, cheaper?
– Cost of poorer quality categorization
• Intranet: 20,000 users taking 60 seconds longer = $20,000 a week

The Best Answer is Hybrid or Cyborg Categorization
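The intranet cost figure above can be checked with simple arithmetic. The slide gives the user count, the delay, and the weekly total, but not the hourly rate or how often the delay occurs; the sketch below assumes a loaded cost of $60 per hour and one 60-second delay per user per week, chosen only because that combination reproduces $20,000. The function name `weekly_search_cost` is hypothetical.

```python
def weekly_search_cost(users, extra_seconds, hourly_rate, delays_per_week=1):
    """Weekly cost of time lost when each user spends extra_seconds longer
    per delayed search. hourly_rate and delays_per_week are assumptions,
    not values stated on the slide."""
    return users * extra_seconds * delays_per_week * hourly_rate / 3600


if __name__ == "__main__":
    # 20,000 users x 60 s = about 333 hours; at an assumed $60/hour this is $20,000.
    print(weekly_search_cost(20_000, 60, 60.0))
    # A different assumption (five delays a week at $12/hour) gives the same total.
    print(weekly_search_cost(20_000, 60, 12.0, delays_per_week=5))
```

Whatever rate is assumed, the cost scales linearly with users and with seconds lost, so small losses in categorization quality add up quickly on a large intranet.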

Page 12:



No clear leader in categorization
No one has it all
Immature industry and pent-up demand
No out-of-the-box solutions: Support Distributed Hybrid
Look for
• Advanced Algorithms
• Clustering, Auto-Summarization, noun phrase extraction
• World Knowledge, import public & custom taxonomies
• Integration – rules, metadata, components & product
• CM, Search, Portals, Expertise, Collaboration, Applications

Page 13:


Location of Taxonomy in the Enterprise: An Infrastructure Activity

Technology
• $Millions and 1,000’s of

Organizational
• Recognized Value
• Fundamental to business

Intellectual
• A couple of librarians
• No budget
• First to be laid off

3 Infrastructures

Technological Organizational Intellectual



Page 15:


Creating an Intellectual Infrastructure

Knowledge Audit / Knowledge Map
Knowledge Creating
– Innovation, Content Management, E-learning
Knowledge Sharing / Transmission
– Collaboration, Retrieval - content, experts
Knowledge Using
– Smart Applications, CRM, Portals
Knowledge Architecture People

Page 16:


Content Management and Taxonomy

Taxonomic Publishing Model
– Publish by Category, not web site
– Web Site the wrong unit of organization
Distributed Work Flow
• Collaborative Categorization and keywords by Subject Matter Experts, aided by software
Content Re-Organization
– Rich Web of Related Content
• Basic information + contexts
Content Re-Organization: Next Steps
– Document can be wrong unit of organization

Page 17:


Taxonomy and Search
Knowledge Retrieval: Information + Contexts
Information Retrieval: ProductName
– List of Documents, ranked by frequency of keyword
Knowledge Retrieval: ProductName
– Personal & Community & Historical Filters
– List of Documents – about product
– Categorized list:
• Features of Product
• Comparisons of Products
• Legal / Policy documents
• Activities associated with product
– Background Resources
• Glossaries, Communities
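The contrast between the two retrieval styles can be sketched in a few lines. The example below is a toy under stated assumptions: the `DOCS` list, its category labels, and the functions `keyword_ranked` and `categorized` are invented for illustration, and real search engines use far richer ranking than a raw keyword count.

```python
from collections import defaultdict

# Hypothetical documents: (title, category, text).
DOCS = [
    ("Widget datasheet", "Features of Product", "widget specs widget sizes"),
    ("Widget vs Gadget", "Comparisons of Products", "widget gadget comparison"),
    ("Widget warranty", "Legal / Policy documents",
     "warranty terms for the widget widget widget"),
    ("Cafeteria menu", "Other", "lunch menu"),
]


def keyword_ranked(query, docs):
    """Information retrieval: a flat list ranked by keyword frequency."""
    q = query.lower()
    scored = [(text.lower().split().count(q), title) for title, _, text in docs]
    hits = [(n, title) for n, title in scored if n > 0]
    return [title for n, title in sorted(hits, key=lambda pair: -pair[0])]


def categorized(query, docs):
    """Knowledge retrieval: the same hits, grouped by taxonomy category."""
    q = query.lower()
    groups = defaultdict(list)
    for title, category, text in docs:
        if q in text.lower().split():
            groups[category].append(title)
    return dict(groups)


if __name__ == "__main__":
    print(keyword_ranked("widget", DOCS))
    print(categorized("widget", DOCS))
```

The flat list puts the warranty document first simply because it repeats the keyword most often; the categorized view lets the user go straight to features, comparisons, or policy, which is the context a taxonomy supplies.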
