
Taxonomies: Solution or Symptom?

February 2011

ISSN 1554-303X

MONTAGUE INSTITUTE REVIEW

Taxonomies (hierarchical topic lists) work reasonably well for e-commerce and public Web sites because the audience is well defined and the content highly vetted. But for unofficial information of the sort stored in SharePoint lists and libraries, taxonomies are less effective. Often they are only a band-aid that masks a more serious underlying disease: the lack of a functional knowledge ecology. In this article, we look at:

• why unofficial corporate content is hard to find;

• what is needed to make unofficial information findable;

• where to look for suitable publishing models;

• how to adapt commercial publishing models to unofficial information on intranets.

Unofficial information: the corporate content orphan

In most organizations, unofficial information (content that is not used for legal or marketing purposes) is poorly managed, yet it is often a critical factor in performance. In knowledge-intensive firms, sales are not closed, nor new products created, on the basis of marketing brochures, corporate records, or government filings. Instead, revenues are affected by highly detailed, current, cross-functional information that can be stored anywhere: on someone’s laptop, in a shared folder on a network computer, or in someone’s head.

Taxonomies work well for official content for two reasons:

1. The audiences are few and well defined.

2. The content has been vetted through a formal publication process that includes editorial review, executive approval, fact-checking and copy editing, metadata tagging, and regular updating.

For official content, taxonomies serve as a table of contents for highly selective content targeted to a specific audience. They are less effective for unofficial content because the content is of uncertain quality and there are many possible audiences.

The migration mess

Improving access to unofficial content is a major selling point for an integrated development platform like SharePoint. But merely moving file shares into SharePoint libraries does not automatically make them more findable. Duplicates need to be eliminated, versions sorted out, metadata assigned, and confidential information removed. Photos, drawings, and scanned documents are a particular problem because the search engine crawler has no text from which to extract index terms.

SharePoint libraries often start with content from a specific department or job function, such as sales proposals, legal precedents, or project reports. Because the initial user group is the one that created the content, finding library items is not a big problem. But the situation changes when content from other groups is added and the library grows to thousands of items. Without some way of organizing and searching them, the initial benefits of file migration are lost. A taxonomy is often proposed as the answer. But is it?

Findability vs. productivity

It is at this point that the differences between official and unofficial information become important. To the IT person, content is content: something to be stored, secured, organized, tagged, and formatted. To the manager or knowledge worker, content is one of many tools needed to perform a task, and it doesn’t much matter where it’s stored. A single content-oriented taxonomy is not very useful for migrated file shares, especially if it is only one of many places the user must look.

To see why this is so, do an Amazon.com search on a topic of general interest to your organization’s customers or stakeholders. For example, suppose your company sells office furniture and related hard goods. An Amazon search on “office design” and “sustainable office” reveals books targeted to different audiences: architects, commercial interior designers, engineers, and facilities managers. The tables of contents, A-Z indexes, and appendices of these books reflect the perspectives, needs, tasks, and terminologies of each audience (see Figures 1 and 2).

The Montague Institute Review is published by the Montague Institute and edited by Jean Graef. © Copyright 1998-2015 Jean L. Graef. All rights reserved.

Translated into the SharePoint environment, each book would become a Web site for a specific audience, work group, or business process. Each site would have a table of contents (“taxonomy”), links to specialized resources and services, the metadata necessary to facilitate search and retrieval, and a content quality assurance system. This domain-specific approach to migrated file share content is more likely to achieve the CEO’s productivity goals than a single, content-focused topic hierarchy.

User-focused information ecosystem

Unfortunately, creating domain-specific portals is not that easy. In the first place, the people tasked with making migrated file shares searchable (usually IT) start with content, not users. They may not be aware of the differences in information-seeking behavior between, say, a field salesperson, a proposal writer, and a scientific subject matter expert.

When specific user groups rather than content are the focus, the need for something like a book publishing ecosystem becomes obvious. Skilled, knowledgeable staff are needed to:

• weed out obsolete documents and correct errors;

• massage content so that security and confidentiality issues are addressed;

• determine which descriptors are necessary for each type of content;

• decide where the values for each descriptor will come from: free-form user entry, a list of standard values, or automatic entry according to computer rules;

• create terms and categories for those descriptors that require standardized values;

• create templates that incorporate the necessary descriptors so that new documents will be complete and consistent;

• design a function- or subject-specific home page for each major user group that gives convenient access to all the tools and services needed to perform a specific job function or business process;

• create new content when information gaps are revealed.

Figure 1: Taxonomy (table of contents) from the engineering perspective. The author is a professional electrical engineer. The audience is architects, engineers, and public policy makers. The vocabulary is technical and includes many acronyms. The content includes drawings as well as text. In addition to a table of contents, the book includes lists of suppliers and consultants and a glossary.

Figure 2: Table of contents from a high-level, multi-disciplinary perspective. The author is an interior designer who is a “thought leader” and innovator. The audience is architects and interior designers. The vocabulary is nontechnical. The content includes drawings, photos, tables, and charts along with client examples. In addition to the table of contents, the book includes a print and Web bibliography and a glossary.
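The descriptor-related tasks in the list of editorial tasks above (choosing descriptors, deciding where each value comes from, and building templates so new documents are complete and consistent) can be illustrated with a small sketch. It is a hypothetical example, not a SharePoint API: the names `ValueSource`, `Descriptor`, `fill_template`, and the sample `PROPOSAL_TEMPLATE` for a sales-proposal library are invented here. It models the three value sources the article names and rejects records that are incomplete or use a nonstandard value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ValueSource(Enum):
    """The three places a descriptor's value can come from, as listed in the text."""
    FREE_FORM = "free-form user entry"
    STANDARD_LIST = "list of standard values"
    AUTO_RULE = "automatic entry by computer rule"


@dataclass(frozen=True)
class Descriptor:
    name: str
    source: ValueSource
    allowed: tuple[str, ...] = ()                 # only used for STANDARD_LIST
    rule: Optional[Callable[[dict], str]] = None  # only used for AUTO_RULE


def fill_template(descriptors: list[Descriptor], doc: dict, user_input: dict) -> dict:
    """Build a complete metadata record for one document, or raise ValueError."""
    record = {}
    for d in descriptors:
        if d.source is ValueSource.AUTO_RULE:
            if d.rule is None:
                raise ValueError(f"descriptor {d.name!r} has no rule")
            # Computed from document properties; the user never types it.
            record[d.name] = d.rule(doc)
            continue
        value = user_input.get(d.name, "").strip()
        if not value:
            raise ValueError(f"missing value for descriptor {d.name!r}")
        if d.source is ValueSource.STANDARD_LIST and value not in d.allowed:
            raise ValueError(f"{value!r} is not a standard value for {d.name!r}")
        record[d.name] = value
    return record


# Hypothetical template for a sales-proposal library.
PROPOSAL_TEMPLATE = [
    Descriptor("client", ValueSource.FREE_FORM),
    Descriptor("industry", ValueSource.STANDARD_LIST,
               allowed=("finance", "healthcare", "manufacturing")),
    Descriptor("year", ValueSource.AUTO_RULE,
               rule=lambda doc: doc["modified"][:4]),  # assumes an ISO date string
]

if __name__ == "__main__":
    record = fill_template(PROPOSAL_TEMPLATE,
                           doc={"modified": "2010-11-03"},
                           user_input={"client": "Acme Corp", "industry": "finance"})
    print(record)  # {'client': 'Acme Corp', 'industry': 'finance', 'year': '2010'}
```

In SharePoint terms, a template like this roughly corresponds to the columns defined on a content type, with a choice column playing the role of the standard-values list; the sketch captures only the decision logic, not how SharePoint stores it.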

The editorial tasks listed above are straightforward, and there are plenty of people with the skills to perform them.


The trouble is that they usually aren’t available in the corporate environment outside of the marketing/communications function. Worse yet, the need for such an ecosystem for unofficial content is largely unrecognized.

Unfortunately, many of the people who do have some understanding of the job (librarians, competitive intelligence professionals, information architects) have been laid off during recent recessions. Using interns to clean up SharePoint libraries and perform editorial tasks is problematic, not only because they lack the necessary organization-specific knowledge but also because they typically have a very short tenure.

For IT, creating information retrieval systems for different user groups is daunting enough. But there’s also the need to integrate multiple incompatible applications, meet management deadlines, and juggle project plans. Under these conditions, adding a content-oriented taxonomy to a collection of migrated unofficial documents is the equivalent of using a couple of aspirin to cure a serious chronic illness.

End users to the rescue?

But wait. Don’t we install SharePoint in part to give end users the tools to customize their own work environments? Can’t IT off-load some of the editorial, metadata management, and site design tasks to business units? Yes, but they can’t do it alone.

With some SharePoint training, it’s not too difficult for business unit staff to learn how to migrate, weed out, organize, and search their own file shares. Connecting all the individual islands of information is a much bigger, more complex task. Yet it’s necessary if organizations are going to leverage their intellectual capital to achieve performance breakthroughs — and get a decent return on their SharePoint investment.

Cross-functional access to information across business units requires:

• selection of appropriate resources targeted to a specific work group or job function;

• vocabulary tools (glossaries, synonyms) to help users navigate unfamiliar terminology;

• tables of contents, “see also” references, and summary articles to provide a context for information;

• bi-directional links between experts and content, so that users can navigate from content search results to relevant people and vice versa;

• accurate metadata, such as version, publication date, author, client, application, or industry;

• information sharing and tagging functions.

Figure 3: Elsevier’s Engineering Village is a fee-based portal designed for engineers. It’s the end point of a content pipeline that begins with professional journals that are aggregated into electronic databases. Elsevier gets paid to provide a user-focused search and discovery interface to databases of interest to different audiences, in this case engineers. Features include personalized e-mail alerts, the ability to save searches and create personalized folders, both simple and advanced search options, output to a selection of academic citation formats, links to the full text of articles, personal assistance (Ask a Librarian, Ask an Engineer), an online thesaurus, and tags.

Figure 4: Thomson Reuters offers comprehensive information packages for financial, legal, healthcare, media, science, tax, and accounting professionals. These packages include not only sophisticated search across a variety of publications but also specialized tools that focus on workflow and bottom-line issues, such as analytics and benchmarking tools to help hospitals boost performance. Like Elsevier, Thomson Reuters is an information aggregator and distributor with an emphasis on news, including broadcast video. In addition to its news gathering and distribution strategy, Thomson Reuters operates a financial analytics service.

Systems that perform all these functions already exist in the fee-based information industry. Examples are Elsevier (for scientific and engineering audiences) and Thomson Reuters (for corporate functions).
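Two of the cross-functional requirements listed above, vocabulary tools and bi-directional links between experts and content, can be illustrated with a small in-memory sketch. It assumes a hypothetical glossary (`SYNONYMS`, seeded with the office-design terms from the Amazon example) and a hypothetical `ExpertIndex` class; neither is a feature of SharePoint or of the commercial products mentioned. A production system would more likely keep such data in the search platform’s thesaurus or in list metadata than in code.

```python
from collections import defaultdict

# Hypothetical glossary: each preferred term maps to variants users might type.
SYNONYMS: dict[str, set[str]] = {
    "office design": {"workplace design", "workspace planning"},
    "sustainable office": {"green office", "eco-friendly office"},
}


def expand_query(query: str, synonyms: dict[str, set[str]]) -> set[str]:
    """Return the whole synonym ring containing the query, or just the query itself."""
    q = query.lower().strip()
    for preferred, variants in synonyms.items():
        ring = {preferred} | variants
        if q in ring:
            return ring
    return {q}


class ExpertIndex:
    """Stores each expert-document link in both directions."""

    def __init__(self) -> None:
        self._experts_by_doc: defaultdict[str, set[str]] = defaultdict(set)
        self._docs_by_expert: defaultdict[str, set[str]] = defaultdict(set)

    def link(self, doc_id: str, expert: str) -> None:
        # Writing both maps at once keeps the two directions consistent.
        self._experts_by_doc[doc_id].add(expert)
        self._docs_by_expert[expert].add(doc_id)

    def experts_for(self, doc_id: str) -> set[str]:
        """From a search result, navigate to the people behind it."""
        return set(self._experts_by_doc.get(doc_id, ()))

    def docs_for(self, expert: str) -> set[str]:
        """From a person, navigate to the content they wrote or reviewed."""
        return set(self._docs_by_expert.get(expert, ()))


if __name__ == "__main__":
    print(sorted(expand_query("Workspace Planning", SYNONYMS)))
    # ['office design', 'workplace design', 'workspace planning']

    index = ExpertIndex()
    index.link("proposal-042", "J. Smith")  # hypothetical document IDs and names
    index.link("report-007", "J. Smith")
    index.link("report-007", "A. Lee")
    print(sorted(index.docs_for("J. Smith")))       # ['proposal-042', 'report-007']
    print(sorted(index.experts_for("report-007")))  # ['A. Lee', 'J. Smith']
```

Storing both directions explicitly costs a little duplication but makes navigation from a search result to its experts, and back, a single lookup either way.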

An ecosystem for unofficial SharePoint libraries

The fee-based information industry model includes all the parts necessary to provide effective access to unofficial information in SharePoint libraries (user focus, content selection, sophisticated search, specialized workflow functions, and rich metadata), but it’s not an exact fit for intranets. For one thing, the raw material (internal articles, reports, etc.) is usually not subject to standardized quality control processes. For another, responsibility for ensuring quality is ill-defined in most organizations.

Finally, the commercial systems are expensive. It takes time and money to create and maintain the necessary metadata underpinnings, design specialized information functions, and work out technical incompatibilities. Fee-based information providers can recoup these costs because they are spread over a large base of customers who are willing to pay for the added value the providers deliver.

Even though the commercial publishing model isn’t a perfect fit for unofficial corporate content, it’s important to recognize the need for something like it. Once that’s accepted, it’s possible to do a cost/benefit analysis to decide such implementation issues as:

• What to codify? Not all types of knowledge are worth the time and expense of publishing. Do we really want to ask the sales force to give up their email networks and use content repositories instead? How much of what’s on their laptops is worth moving to SharePoint libraries?

• Use of automation. Can auto-classification programs reduce the cost of metadata creation and tagging? If so, for which types of content?

• Use of search and discovery tools. When does full-text search work best? When are topic hierarchies and controlled vocabularies necessary? What about other tools, such as A-Z indexes, glossaries, cross references, and Best Bets?

• Staffing and governance. When is it more cost-effective to embed publishing and research staff at the business unit level, and when should the publishing process be centralized? Can we use freelance contractors for part of the work, and if so, where do we find them? See Models of Embedded Librarianship.

• Standards and external cooperation. Cooperation in standards development efforts, such as the Semantic Web, reduces publishing costs for individual organizations. Which external standards groups should we work with, and who should be our representative?

Conclusion

Hierarchical topic lists (taxonomies) are useful when they serve as a table of contents for highly vetted content collections, such as e-commerce sites and official corporate content. By themselves, they do not work well for unofficial content that is migrated from network file shares into systems like SharePoint. Not only do they fail to solve the search and discovery problem, but they can mask the need for an internal publishing ecology that contains some elements of fee-based commercial information systems.