TRANSCRIPT
Beth Golden
Manager, Editorial Services
Factiva Intelligent Indexing™
SLA 2004
Agenda
• Factiva Intelligent Indexing™
• Application of Factiva Intelligent Indexing™
• Pros and Cons
• Quality Control
Factiva Intelligent Indexing™
Factiva Taxonomy
320,000 companies
760+ industries
450+ news subjects
370+ regions
22 languages
FII Structure
• One universal taxonomy
• Building blocks
• Inclusive hierarchy
• Polyarchy
• Synonyms and alias names
• Full descriptions
• Variable depth and breadth
Polyarchy
• Internet/Online services
  • E-commerce
  • Internet browsers
  • Internet portals
  • Internet search engines
  • Internet service providers
  • etc.
• Computers
  • Computer hardware
  • Computer services
  • Computer stores
  • Networking
  • Semiconductors
  • Software
    • Applications software
    • GroupWare
    • Intelligent agents
    • Internet browsers
    • etc.
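A polyarchy lets one term sit under more than one broader term, as "Internet browsers" does on this slide. The following is a minimal Python sketch of that idea, not Factiva's actual data model: the `PARENTS` mapping and the `ancestors` function are hypothetical names, and placing Software under Computers is an assumption based on the slide layout.

```python
# Hypothetical child -> parents mapping; the terms come from the slide,
# the structure and names are illustrative assumptions.
PARENTS = {
    "E-commerce": ["Internet/Online services"],
    "Internet browsers": ["Internet/Online services", "Software"],
    "Internet portals": ["Internet/Online services"],
    "Applications software": ["Software"],
    "Software": ["Computers"],  # assumed placement
}


def ancestors(term):
    """Return every broader term reachable from `term`, following all parents."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

In an inclusive hierarchy, a document coded "Internet browsers" would then be retrievable through any of the broader terms that `ancestors("Internet browsers")` returns.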
Factiva Intelligent Indexing™
Company Codes
Industry Codes
Subject Codes
Region Codes
[Diagram labels: Codes, On documents, Search]
FII Application
• Code mapping
• Entity extraction
• Rule-based system
• Linguistic analysis software
• Manual review
Code Mapping
• Most information providers provide some form of metadata. This is
matched to relevant Factiva indexing terms.
• Advantages:
• Easy and quick
• Efficient use of existing data
• Disadvantages:
• Mismatches between coding schemes
• Different interpretations of same concepts
• Variable quality – which sources do you trust?
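Code mapping can be pictured as a lookup table from each provider's own metadata to taxonomy terms. The sketch below is illustrative only: the provider names, provider codes, taxonomy term strings, and the `map_codes` function are all invented for the example.

```python
# Hypothetical mapping table; every provider, code, and term here is invented.
PROVIDER_TO_TAXONOMY = {
    ("provider_a", "TECH"): "industry:computers",
    ("provider_a", "M&A"): "subject:acquisitions-mergers",
    ("provider_b", "technology"): "industry:computers",
}


def map_codes(provider, provider_codes):
    """Split a provider's codes into mapped taxonomy terms and unmapped leftovers."""
    mapped, unmapped = [], []
    for code in provider_codes:
        term = PROVIDER_TO_TAXONOMY.get((provider, code))
        if term is None:
            unmapped.append(code)  # a scheme mismatch: no equivalent term
        elif term not in mapped:
            mapped.append(term)
    return mapped, unmapped
```

The unmapped list is where mismatches between coding schemes surface. Whether a provider's codes are trusted at all would be a separate editorial decision that this sketch does not model.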
Entity extraction
• This tool finds company names which are then compared to our
controlled vocabulary.
• Advantages:
• Consistent
• Precise
• Disadvantages:
• Ambiguous names
• High maintenance costs
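The comparison against a controlled vocabulary can be sketched as alias matching, with ambiguous names set aside rather than guessed. This is a simplified Python illustration, not the tool described on the slide: the alias table, company codes, and `extract_companies` function are hypothetical.

```python
import re

# Hypothetical controlled vocabulary: lowercase alias -> company codes.
# All names and codes are invented for illustration.
ALIASES = {
    "acme corp": ["ACME01"],
    "acme corporation": ["ACME01"],
    "apex": ["APEX01", "APEX02"],  # one name, two companies: ambiguous
}


def extract_companies(text):
    """Return (resolved company codes, ambiguous aliases) found in the text."""
    resolved, ambiguous = set(), set()
    lowered = text.lower()
    for alias, codes in ALIASES.items():
        # Match the alias only as a whole word or phrase.
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            if len(codes) == 1:
                resolved.add(codes[0])
            else:
                ambiguous.add(alias)
    return resolved, ambiguous
```

Keeping the alias table current as companies rename, merge, and launch is the maintenance cost noted above.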
Symbology Snapshot
Rule-based system
• Sets of IF-THEN statements established by editors, information
architects, or subject-matter experts.
• Advantages:
• Good at highly formulaic content
• Precise
• Disadvantages:
• Need thousands of rules for a complete system
• Maintenance of the rules themselves becomes VERY expensive!
• Only captures explicit concepts
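An IF-THEN rule set can be sketched as a list of conditions paired with codes. The rules, code strings, and `apply_rules` function below are hypothetical and far simpler than a production rule base.

```python
# Each hypothetical rule reads: IF all of these phrases appear THEN apply this code.
RULES = [
    ({"profit", "quarter"}, "subject:earnings"),
    ({"appointed", "chief executive"}, "subject:management-moves"),
]


def apply_rules(text):
    """Return the codes of every rule whose phrases all appear in the text."""
    lowered = text.lower()
    return [code for phrases, code in RULES if all(p in lowered for p in phrases)]
```

A story that says "the firm made more money than last year" triggers no rule here, which illustrates why rules capture only explicit concepts and why a complete system needs so many of them.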
Example
Linguistics-based categorization
• This tool is currently employed across all English-, French-, German- and Spanish-language publications. A combination of linguistic analysis and statistical algorithms allows new content to be compared with example data and coded appropriately.
• Advantages:
• Scales to millions of documents, thousands of categories, multiple languages
• Copes well with change
• Fits editorial workflow
• Good fine-tuning tools – editorial control
• Codes implicit as well as explicit concepts
• Disadvantages:
• Training time and cost
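Comparing new content with example data can be illustrated with a toy statistical scorer. The slides do not name the software or its algorithms, so the sketch below swaps in plain bag-of-words cosine similarity and leaves out linguistic analysis entirely; the training texts, code names, and functions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training examples per code; real training sets would be far larger.
TRAINING = {
    "subject:earnings": "profit revenue quarter earnings rose fell forecast",
    "subject:labor-disputes": "strike union workers walkout pay talks",
}


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_codes(text):
    """Score a new document against the example data for each code."""
    doc = vectorize(text)
    return {code: cosine(doc, vectorize(example)) for code, example in TRAINING.items()}
```

Because the score comes from overall similarity rather than a fixed phrase, a document can be coded without ever naming the concept. That is the sense in which such a tool can code implicit as well as explicit concepts.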
Editorial Control
• Set relevance levels
• Maintain training set
• Stop words - correlation and multiple meanings
• Example: "Chechnya" was added as a stop word to the industries model, as it was triggering the freelance journalist code (because so many of them were dying there)
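The two controls above, relevance levels and stop words, can be sketched together. Everything in this Python example is a hypothetical stand-in: the learned weights, the code name, the threshold value, and the `assign_codes` function are invented to show the mechanism, not taken from Factiva's system.

```python
# Hypothetical term weights a model might have learned for one code.
LEARNED_WEIGHTS = {
    "industry:freelance-journalists": {
        "freelance": 0.5,
        "journalist": 0.4,
        "chechnya": 0.3,  # a spurious correlation picked up from training data
    },
}


def assign_codes(text, stop_words=frozenset(), min_relevance=0.25):
    """Assign each code whose summed term weights reach the relevance level."""
    tokens = {t.strip('.,"').lower() for t in text.split()} - set(stop_words)
    assigned = []
    for code, weights in LEARNED_WEIGHTS.items():
        relevance = sum(w for term, w in weights.items() if term in tokens)
        if relevance >= min_relevance:
            assigned.append(code)
    return assigned
```

Without the stop word, a story that merely mentions Chechnya picks up the code. With `stop_words={"chechnya"}` the same story is left uncoded, while a story that actually mentions a freelance journalist still qualifies.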
Manual coding
• About 200 editors spread across main time zones
• Advantages:
• Humans easily grasp the gist of the story
• Cope well with exceptions
• Visible/Controllable
• Disadvantages:
• Very resource-intensive = Expensive
• Slow
• Inconsistent (subjective and temporal)
• Not scalable
Review process
• Lists reviewed every three months for redefinitions, new codes, and expansion changes
• Market research/customer feedback and behavior
• Changes to parent schemes/standards
• Editorial/Quality control feedback
• Internal coding forum
• 45-day notice period
Quality control
• Sampling by editors
• Scoring for precision and recall
• Analysis by source, language, code, editor etc.
• Feedback to editors and systems
• Corrective action
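Scoring a sampled article for precision and recall compares the codes that were assigned with the codes an editor judges correct. The function name and the convention of returning 0.0 for an empty set are assumptions made for this sketch.

```python
def precision_recall(assigned, correct):
    """Score one article's assigned codes against the editor's correct codes.

    precision = share of assigned codes that are correct
    recall    = share of correct codes that were assigned
    """
    assigned, correct = set(assigned), set(correct)
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall
```

For example, if four codes were assigned and two of them are among the three correct codes, precision is 2/4 and recall is 2/3. Grouping such scores by source, language, code, or editor gives the kind of analysis listed above.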
Results
• Three million articles coded a month
• All receive a level of autocoding
• Seventy-nine percent automation: more than two million articles are auto-coded with no further manual review
Recap
• Factiva’s taxonomy is Factiva Intelligent Indexing™
• Factiva uses a hybrid methodology for application
• Factiva has a coding team for governance and maintenance
• End result: Factiva Intelligent Indexing™ leverages our editorial
strengths, combining human experience and expertise with the latest
automation software to implement a completely flexible and granular
indexing system across all of our content.