More Powerful Solr Search with Semaphore - Jeremy Bentley
TRANSCRIPT
Smartlogic™
Apache Lucene Eurocon
Jeremy Bentley, CEO
1st degree of order
Filing management • 80% of enterprise information is unstructured • Doubling every 19 months and accelerating [Gartner] • Increasing burden of compliance • Enterprise 2.0 additions
2nd degree of order
Index management • File plans and metadata schema • Mono-hierarchical standardised taxonomies • Manually applied classification • Low level of consistency and quality
3rd degree of order
Computerised 1st and 2nd degrees
A 10-year Flatline: the Expectation Gap
• 2001, IDC, “Quantifying Enterprise Search”: searchers are successful in finding what they seek 50% of the time or less
• 2011, MindMetre/Smartlogic: more than half (52%) cannot find the information they need using their enterprise search system
The explosion of information (terabytes of data; source: The National Archives): 4 TB in 1993-2001 versus 80 TB in 2001-2009, a 20 times increase in information volume.
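As a rough check on these growth figures, the 4 TB and 80 TB totals can be turned into an implied doubling time. This is a minimal sketch that assumes smooth exponential growth and takes the eight years between the midpoints of the two periods as the elapsed time; it is illustrative arithmetic, not part of the original analysis.

```python
import math


def doubling_time_months(start_tb: float, end_tb: float, period_months: float) -> float:
    """Months per doubling, assuming steady exponential growth over the period."""
    growth = end_tb / start_tb
    # growth = 2 ** (period / doubling)  =>  doubling = period * ln 2 / ln growth
    return period_months * math.log(2) / math.log(growth)


# Chart totals: 4 TB (1993-2001) and 80 TB (2001-2009), midpoints eight years apart.
months = doubling_time_months(4.0, 80.0, 8 * 12)
print(f"growth factor: {80.0 / 4.0:.0f}x, doubling every {months:.1f} months")
```

Under these assumptions the archive doubled roughly every 22 months, comparable to the 19-month enterprise figure attributed to Gartner, although the two numbers describe different collections.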
Search Gets Harder as Data sets Grow
Circa 1996
Different vocabulary and ambiguity
You say / I say
Moon Buggy: Lunar Roving Vehicle; Manned Lunar Surface Vehicle
Swine Flu: Swine Influenza Virus; H1N1
Touchscreen: Touch screen; Multi-touch
You say / What do you mean?
Apple: A fruit? Fiona Apple, a singer-songwriter? An electronics company?
Rights: Employment rights? Equal rights? Right of way?
Ford: Ford Motor? Forward Industrials (ticker=FORD)? A shallow river crossing?
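One common remedy for the "You say / I say" problem is query expansion: every known variant of a term is searched together. The sketch below is a minimal illustration, not Semaphore's mechanism; the synonym rings are hypothetical and simply copy the examples on this slide, and the output is a Lucene/Solr-style OR query of quoted phrases.

```python
# Hypothetical synonym rings copied from the "You say / I say" examples.
SYNONYM_RINGS = [
    {"moon buggy", "lunar roving vehicle", "manned lunar surface vehicle"},
    {"swine flu", "swine influenza virus", "h1n1"},
    {"touchscreen", "touch screen", "multi-touch"},
]


def expand_query(user_term: str) -> str:
    """Return an OR query covering every known variant of user_term."""
    term = user_term.strip().lower()
    variants = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            variants |= ring
    # Quote each variant so multi-word labels match as phrases; sort for stable output.
    return " OR ".join(f'"{v}"' for v in sorted(variants))


print(expand_query("Moon Buggy"))
```

In a real Solr deployment the same mapping could instead live in a synonyms file used by `SynonymGraphFilterFactory` in the query analyzer. Note that synonyms only address different vocabulary; ambiguous terms such as "Apple" or "Ford" need context, which a flat synonym list does not provide.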
Missing results
Too many results
Drawbacks
Apparent
1 Needle in the Haystack
2 Multiple search terms
3 Irrelevant results
4 Out of date results
5 Multiple media forms
6 Unrestricted geography
7 Inappropriate ads
Not So Apparent
8 Can’t filter, select subset
9 No related topics
10 Missing results
11 No context or guidance
12 Best resource not clear
• Time consuming • Inefficient • Ineffective
Conventional Search - Ineffective, Frustrating, and Inadequate
Knowing what you have
Paradox of Effort
Metadata effort: Web high, Enterprise low
Result quality requirement: Web low, Enterprise high
Metadata is to search what pistons are to a petrol engine.
How do I structure it?
Creation Date
Modified Date
Author
Format (PDF, DOC, XLS)
Subject
Location
Project
Function (IT, HR, Finance)
Expert
Protective Marker
Retention
Expiry
Publisher
Site
Structural Process
Information
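The field list above can be read as a schema. A minimal sketch, assuming a Python record per document: the class and field names are hypothetical, and `to_index_fields` only flattens values into a plain dict (dates as ISO strings) that could be mapped onto search-index fields.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record holding the metadata fields listed on this slide."""
    creation_date: date
    modified_date: date
    author: str
    file_format: str  # e.g. "PDF", "DOC", "XLS"
    subject: list[str] = field(default_factory=list)
    location: Optional[str] = None
    project: Optional[str] = None
    function: Optional[str] = None  # e.g. "IT", "HR", "Finance"
    expert: Optional[str] = None
    protective_marker: Optional[str] = None
    retention: Optional[str] = None
    expiry: Optional[date] = None
    publisher: Optional[str] = None
    site: Optional[str] = None

    def to_index_fields(self) -> dict:
        """Flatten to index fields, dropping empty values and ISO-formatting dates."""
        out = {}
        for name, value in asdict(self).items():
            if value in (None, [], ""):
                continue
            out[name] = value.isoformat() if isinstance(value, date) else value
        return out
```

Keeping every field optional except the few that systems fill in automatically (dates, author, format) reflects the slide's point that manual metadata is the expensive part.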
3rd degree content universe
Digital Asset Management
Publishing Systems
Social Collaboration
eDiscovery
Document Management
Content Management
Enterprise Search
Records Management
Portal Infrastructure
Process Management & Workflow
4th degree of order: Content Intelligence
Digital Asset Management
Publishing Systems
Social Collaboration
eDiscovery
Document Management
Content Management
Enterprise Search
Records Management
Portal Infrastructure
Process Management & Workflow
Content Intelligence
Content Intelligence Platform
Solr
Semaphore
Copyright © 2011 Smartlogic Semaphore Limited
Business Vocabulary
Classification
Decision
User Action
Apply
Inform
Expose
Semaphore
Business Vocabulary
Classification
Decision
Apply
Inform
Expose
Metadata
Contextual User Experience
Semantic Models
Semantic Software
User Action
Components • Metadata • Semantic Models • Contextual User Experience • Semantic Software
Metadata
Today / With Content Intelligence
Low quality tags, high cost to apply / High quality tags, low cost to apply
Manual process / Automatic process
Single unified ‘one size fits all’ approach / Multiple approaches for various domains and audiences
Long time to craft & build, manually applied / Short time to build & deploy, automatically
Content-types available
– Flashnotes
– Research reports
– Trade ideas
Analytics available
– Current bond price
– Relative bond spreads
Influenced by
– Credit ratings on Ford Motor Credit Company
– European and US economies
– Changes in consumer demand
Automate compliance and distribution tasks
– ‘Watch list’ lookup
– Distribution according to preset rules
– Automated mapping to create aggregator metadata
Harnessing
User Experience
– Conceptual relevance
– Related topics
– Links to analytics
Search engine enhancement
– Search results
– Email alerts
Contextualising
Key competitors
– BMW
– Daimler Chrysler
– General Motors
– Toyota
– Volkswagen
Products
– Focus
– Ka
– MX5
Preferred term (Agreed Label)
– Ford Motor Company
Subsidiaries
– Ford Motor Credit Company
– Mazda
Parent topics
– Automotive sector
– Bond issuers
Also known as
– Ford
– Ford Motor
– F (Bloomberg)
– FoMoCo
– blue oval
Covered by
– Bob Smith
Location of fundamental data
– Earnings estimates
– Historic sales and profits
Organising
Unstructured content integration
– Published reports
– Related topics
– Links to analytics
– Search results
– Email alerts
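The Ford example is a concept with one agreed label, several alternative labels, parent topics and typed relationships, loosely in the spirit of SKOS `prefLabel`/`altLabel`/`broader`. The sketch below is a hypothetical in-memory model, not Semaphore's data format; it shows how any alias can be normalised to the agreed label before indexing or querying.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept record: agreed label, aliases, parents, typed relations."""
    preferred_label: str
    alt_labels: set[str] = field(default_factory=set)
    parents: list[str] = field(default_factory=list)
    related: dict[str, list[str]] = field(default_factory=dict)


FORD = Concept(
    preferred_label="Ford Motor Company",
    # "F" is the Bloomberg ticker; short aliases like this need context in practice.
    alt_labels={"Ford", "Ford Motor", "F", "FoMoCo", "blue oval"},
    parents=["Automotive sector", "Bond issuers"],
    related={
        "subsidiaries": ["Ford Motor Credit Company", "Mazda"],
        "competitors": ["BMW", "Daimler Chrysler", "General Motors", "Toyota", "Volkswagen"],
        "products": ["Focus", "Ka", "MX5"],
    },
)


def build_label_index(concepts):
    """Map every label, case-insensitively, to the concept it names."""
    index = {}
    for concept in concepts:
        for label in {concept.preferred_label} | concept.alt_labels:
            index[label.lower()] = concept
    return index


def normalise(term, index):
    """Return the agreed label for term, or None if the model does not know it."""
    concept = index.get(term.lower())
    return concept.preferred_label if concept else None
```

Once every alias resolves to one agreed label, the parent topics and relations can drive the related-topic links, facets and alerts listed on the slide.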
Semantic Models
Key Features
1 Taxonomy enables discovery, related searches
2 Related topics and content
3 Facets enable filtering results by:
4 - Source
5 - Numerous topics
6 - Date
7 Best Bets
8 Automated document tagging
9 A-Z
• More relevant results • Fewer “bad hits” • Powerful navigation
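In Solr, faceting by source, topic and date corresponds to standard request parameters (`facet.field`, `facet.range` with its start, end and gap), and a user's facet selection becomes a filter query (`fq`). A minimal sketch, assuming hypothetical index fields named `source`, `topic` and `date` that hold the Semaphore-applied metadata:

```python
from urllib.parse import urlencode


def build_faceted_params(user_query, filters=None):
    """Solr select parameters: query, facet counts and optional filter queries."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        # Counts per value of two hypothetical metadata fields.
        ("facet.field", "source"),
        ("facet.field", "topic"),
        # Yearly buckets on a hypothetical date field, from five years ago to now.
        ("facet.range", "date"),
        ("facet.range.start", "NOW/YEAR-5YEARS"),
        ("facet.range.end", "NOW"),
        ("facet.range.gap", "+1YEAR"),
    ]
    # Each facet value the user clicks becomes a filter query narrowing the results.
    for field_name, value in (filters or {}).items():
        params.append(("fq", f'{field_name}:"{value}"'))
    return params


def to_query_string(params):
    """URL-encode the parameters for a request to a core's /select handler."""
    return urlencode(params)


print(to_query_string(build_faceted_params("ford", {"topic": "Bond issuers"})))
```

A list of pairs is used rather than a dict because Solr expects `facet.field` and `fq` to repeat.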
Contextual User Experience
Content Exploration
Highlighting relationships in a result set greatly improves the user experience.
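One simple way to surface relationships in a result set is to count which topic tags co-occur across the returned documents. The sketch assumes each result carries a hypothetical `tags` list; Solr facet counts on an equivalent tag field would compute the same numbers server-side.

```python
from collections import Counter


def related_topics(results, exclude=(), top_n=5):
    """Count how many results carry each tag and return the most frequent ones."""
    counts = Counter()
    for doc in results:
        # A set avoids counting a tag twice within one document.
        counts.update(set(doc.get("tags", [])) - set(exclude))
    return counts.most_common(top_n)


results = [
    {"tags": ["Ford Motor Company", "Bond issuers"]},
    {"tags": ["Ford Motor Company", "Toyota"]},
    {"tags": ["Ford Motor Company", "Bond issuers"]},
]
print(related_topics(results, exclude={"Ford Motor Company"}))
```

Excluding the concept the user searched for keeps the list focused on what is related to it rather than on the query itself.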
Semantic Software
Semaphore Ontology & Metadata Management
Text Analysis & Extraction
Automatic and Assisted Content Classification
Contextual Navigation Services
Semantic Reasoning & Processing
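Automatic and assisted classification can be illustrated with weighted evidence terms and two thresholds: a high score applies a tag automatically, a middling score only suggests it for human review. This is a deliberately simple keyword-scoring sketch with hypothetical rules and thresholds, not how Semaphore's classification server actually works.

```python
import re

# Hypothetical rules: evidence terms and their weights for each category.
RULES = {
    "Ford Motor Company": {"ford": 1.0, "fomoco": 1.0, "blue oval": 0.5, "focus": 0.3},
    "Bond issuers": {"bond": 0.6, "coupon": 0.4, "spread": 0.4},
}


def score(text, evidence):
    """Sum weight * whole-word occurrences for each evidence term."""
    lowered = text.lower()
    total = 0.0
    for term, weight in evidence.items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        total += weight * hits
    return total


def classify(text, rules=RULES, auto_threshold=1.5, suggest_threshold=0.5):
    """Return (tags applied automatically, tags suggested for review)."""
    applied, suggested = [], []
    for category, evidence in rules.items():
        s = score(text, evidence)
        if s >= auto_threshold:
            applied.append(category)
        elif s >= suggest_threshold:
            suggested.append(category)
    return applied, suggested


print(classify("FoMoCo said Ford will issue a new bond."))
```

The two thresholds are the design lever: raising `auto_threshold` trades coverage for precision and routes more documents to the assisted path.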
Semaphore Search Integration
Search Engine
Query Index
Corpus
Web Services API
Search Enhancement Server
XML API
Classification Server
Collector/Normalizer
Extracted Text
Document “Tags”
Ontology Information
Text Miner
Ontology Manager
User Requests
Portal
Search Application Framework
Sample Interface Code
Semaphore core module
Semaphore optional module
Local Term Index
Classification Rules
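The integration path runs from extracted text to document tags and then into the search index. A minimal sketch of the last step, assuming a hypothetical `classify_stub` in place of a real classification-service call and a hypothetical `tags` field: each document is enriched with tags and serialised as a JSON array, the form Solr's `/update` handler accepts for adding documents.

```python
import json


def classify_stub(text):
    """Hypothetical stand-in for a classification service; returns tag labels."""
    return ["Ford Motor Company"] if "ford" in text.lower() else []


def build_update_payload(docs, classifier=classify_stub, tag_field="tags"):
    """Copy each document, add classifier tags, and return a JSON array string."""
    enriched = []
    for doc in docs:
        tagged = dict(doc)  # leave the caller's document unchanged
        tagged[tag_field] = classifier(doc.get("text", ""))
        enriched.append(tagged)
    return json.dumps(enriched)


payload = build_update_payload([
    {"id": "1", "text": "Ford issues new bonds"},
    {"id": "2", "text": "Toyota quarterly results"},
])
print(payload)
```

The payload could then be POSTed with `Content-Type: application/json` to `/solr/<core>/update`, where `<core>` is whatever core the deployment uses.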
Content Intelligence
Information Manufacturing
Knowledge Recovery
Content Analytics
Data Loss Prevention
Risk & Compliance
Monetisation
Metadata
Content Intelligent Solutions
Web Self Service
Knowledge Acquisition & Recovery
Governance, Risk & Compliance
Cross Platform Content Integration
Micro-Targeting & Distribution
www.smartlogic.com