Beth Golden Manager, Editorial Services Factiva Intelligent Indexing™ SLA 2004

Upload: maria-todd

Post on 27-Mar-2015

Page 1: Beth Golden Manager, Editorial Services Factiva Intelligent Indexing SLA 2004

Beth Golden

Manager, Editorial Services

Factiva Intelligent Indexing™

SLA 2004

Agenda

• Factiva Intelligent Indexing™

• Application of Factiva Intelligent Indexing™

• Pros and Cons

• Quality Control

Factiva Intelligent Indexing™

Factiva Taxonomy

320,000 companies

760+ industries

450+ news subjects

370+ regions

22 languages

FII Structure

• One universal taxonomy

• Building blocks

• Inclusive hierarchy

• Polyarchy

• Synonyms and alias names

• Full descriptions

• Variable depth and breadth

Polyarchy

• Internet/Online services

  • E-commerce

  • Internet browsers

  • Internet portals

  • Internet search engines

  • Internet service providers

  • etc.

• Computers

  • Computer hardware

  • Computer services

  • Computer stores

  • Networking

  • Semiconductors

• Software

  • Applications software

  • GroupWare

  • Intelligent agents

  • Internet browsers

  • etc.
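A polyarchy differs from a strict tree in that one term can sit under several parents: on this slide, "Internet browsers" appears under both "Internet/Online services" and "Software". The following is a minimal Python sketch of that idea, not Factiva's implementation. The function names are hypothetical, and the `matches` helper assumes that "inclusive hierarchy" means a search on a parent term also retrieves documents coded with its descendants.

```python
from collections import defaultdict

# Child -> parents edges taken from the slide. A polyarchy lets one term
# sit under more than one parent.
PARENTS = defaultdict(set)

def add_term(term, *parents):
    """Register a term under zero or more parent terms."""
    PARENTS[term].update(parents)

add_term("Internet/Online services")
add_term("Software")
add_term("E-commerce", "Internet/Online services")
add_term("Applications software", "Software")
add_term("Internet browsers", "Internet/Online services", "Software")

def ancestors(term):
    """Return every term reachable by walking parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def matches(doc_terms, query_term):
    """Assumed inclusive-hierarchy search: a parent query also hits descendants."""
    return any(t == query_term or query_term in ancestors(t) for t in doc_terms)
```

With this structure, a document coded "Internet browsers" is found by a search on either "Software" or "Internet/Online services", while a document coded "E-commerce" is found only by the latter.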

Factiva Intelligent Indexing™

Company Codes

Industry Codes

Subject Codes

Region Codes

[Diagram: codes are applied on documents and used in search]

FII Application

• Code mapping

• Entity extraction

• Rule-based system

• Linguistic analysis software

• Manual review

Code Mapping

• Most information providers supply some form of metadata, which is matched to the relevant Factiva indexing terms.

• Advantages:

  • Easy and quick

  • Efficient use of existing data

• Disadvantages:

  • Mismatches between coding schemes

  • Different interpretations of the same concepts

  • Variable quality – which sources do you trust?
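Code mapping can be pictured as a lookup table from a provider's own tags to the target indexing codes. The sketch below is illustrative only: the provider tags, the code names, and the `map_codes` function are all invented, and a real crosswalk would be maintained per source.

```python
# Hypothetical crosswalk from one provider's subject tags to indexing codes.
# Both the provider tags and the target codes are invented for illustration.
PROVIDER_TO_CODE = {
    "mergers": "SUBJ_ACQUISITIONS",
    "m&a": "SUBJ_ACQUISITIONS",
    "earnings": "SUBJ_EARNINGS",
}

def map_codes(provider_tags):
    """Split provider tags into mapped codes and tags with no known match."""
    mapped, unmapped = set(), []
    for tag in provider_tags:
        code = PROVIDER_TO_CODE.get(tag.strip().lower())
        if code is None:
            unmapped.append(tag)  # candidate for editorial review
        else:
            mapped.add(code)
    return mapped, unmapped

codes, leftovers = map_codes(["M&A", "Earnings", "Takeover rumours"])
```

The `unmapped` list is where the disadvantages show up: a tag with no counterpart, or one the provider uses in a different sense, has to be resolved by a person rather than by the table.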

Entity extraction

• This tool finds company names, which are then compared to our controlled vocabulary.

• Advantages:

  • Consistent

  • Precise

• Disadvantages:

  • Ambiguous names

  • High maintenance costs
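The matching step can be sketched as an alias lookup against a controlled vocabulary. Everything below is an assumption made for illustration: the company names, the codes, and the simple regular-expression matching stand in for whatever extraction tool is actually used.

```python
import re

# Hypothetical controlled vocabulary: alias (lowercase) -> company codes.
# An alias that maps to several codes is ambiguous.
ALIASES = {
    "acme corp": {"ACME"},
    "acme corporation": {"ACME"},
    "apex": {"APEXBK", "APEXAIR"},  # two invented companies sharing a short name
}

def extract_companies(text):
    """Return (confident codes, ambiguous aliases) found in the text."""
    lowered = text.lower()
    confident, ambiguous = set(), set()
    for alias, codes in ALIASES.items():
        # Word boundaries stop "apex" from matching inside longer words.
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            if len(codes) == 1:
                confident |= codes
            else:
                ambiguous.add(alias)
    return confident, ambiguous

found, unclear = extract_companies("Acme Corp said it will lease jets from Apex.")
```

Here "Acme Corp" resolves to a single code, while "Apex" is flagged as ambiguous. Keeping the alias table current as companies rename, merge, and list is the maintenance cost the slide refers to.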

Symbology Snapshot

Rule-based system

• Sets of IF-THEN statements established by editors, information architects, or subject-matter experts.

• Advantages:

  • Good at highly formulaic content

  • Precise

• Disadvantages:

  • Need thousands of rules for a complete system

  • Maintenance of the rules themselves becomes VERY expensive!

  • Only captures explicit concepts
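A rule-based coder can be sketched as a list of condition and code pairs. The phrases and codes below are invented, and real rule languages are usually richer (proximity, field restrictions, negation), so treat this as a toy illustration of the IF-THEN idea.

```python
# Each rule is (condition, code): IF the condition holds THEN assign the code.
# Phrases and codes are invented for illustration.
RULES = [
    (lambda t: "profit warning" in t, "SUBJ_EARNINGS"),
    (lambda t: "agreed to acquire" in t or "takeover bid" in t, "SUBJ_ACQUISITIONS"),
    (lambda t: "files for bankruptcy" in t, "SUBJ_BANKRUPTCY"),
]

def apply_rules(text):
    """Return the codes of every rule whose condition matches the text."""
    lowered = text.lower()
    return {code for condition, code in RULES if condition(lowered)}

explicit = apply_rules("Acme issues profit warning after takeover bid fails")
implicit = apply_rules("Acme says results will fall short of forecasts")
```

The second headline is about earnings too, but no rule fires because the trigger phrase never appears. That is the "only captures explicit concepts" limitation, and closing such gaps one phrase at a time is how rule sets grow into the thousands.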

Example

Linguistics-based categorization

• This tool is currently employed across all English-, French-, German- and Spanish-language publications. A combination of linguistic analysis and statistical algorithms allows new content to be compared to example data and coded appropriately.

• Advantages:

  • Scales to millions of documents, thousands of categories, multiple languages

  • Copes well with change

  • Fits editorial workflow

  • Good fine-tuning tools – editorial control

  • Codes implicit as well as explicit concepts

• Disadvantages:

  • Training time and cost
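The idea of comparing new content to example data can be illustrated with a very small statistical classifier. This sketch swaps in plain bag-of-words cosine similarity for the linguistic analysis and statistical algorithms the slide mentions; the training texts, codes, and threshold are all invented.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; a stand-in for real linguistic analysis."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented example data, merged into one profile per code.
TRAINING = {
    "SUBJ_EARNINGS": "quarterly profit rose revenue earnings per share beat forecasts",
    "SUBJ_LABOUR": "union workers strike walkout pay dispute picket",
}
PROFILES = {code: vectorize(text) for code, text in TRAINING.items()}

def categorize(text, threshold=0.2):
    """Assign every code whose profile is similar enough to the text."""
    vec = vectorize(text)
    scores = {code: cosine(vec, profile) for code, profile in PROFILES.items()}
    return {code for code, score in scores.items() if score >= threshold}

codes = categorize("Staff walkout over pay halts production as union talks stall")
```

The headline never uses the word "strike", yet it lands on the labour code because it shares enough vocabulary with the examples. This is the sense in which an example-driven approach can pick up implicit concepts, at the price of building and maintaining the training data.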

Editorial Control

• Set relevance levels

• Maintain training set

• Stop words: correlation and multiple meanings

  • Example: "Chechnya" was added as a stop word to the industries model because it was triggering the freelance journalist code (so many freelance journalists were dying there)
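The Chechnya example can be reproduced in miniature. The sketch below assumes a model that has picked up "chechnya" as a term for one code purely through correlation in its training set, and shows a per-model stop list and a relevance level acting as the editorial controls. The scoring formula, term list, and threshold are invented for illustration.

```python
import re

# Invented vocabulary a model might have learned for one code; "chechnya"
# crept in through correlation in the training set, as the slide describes.
LEARNED_TERMS = {"freelance", "journalist", "reporter", "byline", "chechnya"}

# Per-model stop list maintained by editors (hypothetical structure).
STOP_WORDS = {"industries": {"chechnya"}}

def score(text, model, relevance_level=0.25, use_stop_words=True):
    """Return (share of tokens hitting learned terms, whether it clears the level)."""
    stop = STOP_WORDS.get(model, set()) if use_stop_words else set()
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop]
    if not tokens:
        return 0.0, False
    value = sum(t in LEARNED_TERMS for t in tokens) / len(tokens)
    return value, value >= relevance_level

story = "Chechnya talks resume as Chechnya ceasefire holds"
before = score(story, "industries", use_stop_words=False)
after = score(story, "industries")
```

Without the stop word the story clears the relevance level and would be miscoded; with it, the score drops to zero. Raising `relevance_level` is the other lever, trading recall for precision across the whole model rather than for one term.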

Manual coding

• About 200 editors spread across main time zones

• Advantages:

  • Humans easily grasp the gist of the story

  • Cope well with exceptions

  • Visible/Controllable

• Disadvantages:

  • Very resource-intensive = Expensive

  • Slow

  • Inconsistent (subjective and temporal)

  • Not scalable

Review process

• Lists reviewed every three months: redefinitions, new codes, expansion changes

• Market research/customer feedback and behavior

• Changes to parent schemes/standards

• Editorial/Quality control feedback

• Internal coding forum

• 45-day notice period

Quality control

• Sampling by editors

• Scoring for precision and recall

• Analysis by source, language, code, editor etc.

• Feedback to editors and systems

• Corrective action
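Precision is the share of assigned codes that are correct; recall is the share of correct codes that were actually assigned. A minimal sketch of scoring a sample and breaking it down by source follows. The sample data and source names are invented, and treating an empty set as a score of 1.0 is a convention chosen here, not something stated in the slides.

```python
def precision_recall(assigned, expected):
    """Score one document: codes the system applied vs. codes the reviewer judged correct."""
    assigned, expected = set(assigned), set(expected)
    true_positives = len(assigned & expected)
    precision = true_positives / len(assigned) if assigned else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall

# Invented sample: (source, assigned codes, editor-approved codes).
SAMPLE = [
    ("wire_a", {"EARN", "ACQ"}, {"EARN"}),
    ("wire_a", {"LABOUR"}, {"LABOUR", "EARN"}),
    ("daily_b", {"ACQ"}, {"ACQ"}),
]

def by_source(sample):
    """Micro-averaged (precision, recall) per source."""
    totals = {}
    for source, assigned, expected in sample:
        tp, n_assigned, n_expected = totals.get(source, (0, 0, 0))
        totals[source] = (tp + len(assigned & expected),
                          n_assigned + len(assigned),
                          n_expected + len(expected))
    return {s: (tp / a if a else 1.0, tp / e if e else 1.0)
            for s, (tp, a, e) in totals.items()}

report = by_source(SAMPLE)
```

The same grouping works for language, code, or editor by changing the key, which is what makes the breakdown useful for targeting feedback and corrective action.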

Results

• Three million articles coded a month

• All receive a level of autocoding

• Seventy-nine percent automation: more than two million articles are autocoded with no further manual review
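The two figures on this slide are consistent with each other, as a quick check shows. The variable names are illustrative, and the remainder is assumed, by implication only, to be the volume that receives some manual review.

```python
ARTICLES_PER_MONTH = 3_000_000  # from the slide
AUTOMATION_PERCENT = 79         # from the slide

# Integer arithmetic avoids floating-point rounding on the percentage.
fully_automated = ARTICLES_PER_MONTH * AUTOMATION_PERCENT // 100
remainder = ARTICLES_PER_MONTH - fully_automated

print(f"{fully_automated:,} autocoded only, {remainder:,} remaining")
```

Seventy-nine percent of three million is 2,370,000, which matches "more than two million" and leaves 630,000 articles a month for the roughly 200 editors.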

Recap

• Factiva’s taxonomy is Factiva Intelligent Indexing™

• Factiva uses a hybrid methodology for application

• Factiva has a coding team for governance and maintenance

• End result: Factiva Intelligent Indexing™ leverages our editorial strengths, combining human experience and expertise with the latest automation software to implement a completely flexible and granular indexing system across all of our content.