The Enterprise Search Market in a Nutshell
Iain Fletcher
October 19, 2015
ICIC 2015, Nice
Agenda
• About Search Technologies (30 seconds)
• The enterprise search market
• Likely future architectures for supporting important search applications
Search Technologies: Background
Offices: Washington (HQ), San Diego, San Francisco, Cincinnati, San Jose CR, London UK, Frankfurt DE, Prague CZ
• Founded 2005
• 180 employees
• 600+ customers
• Independent consulting company
• Focus on enterprise search
• Working with all leading platforms
600+ Customers
The Enterprise Search Market
High-level Search Engine Classifications
1. Part of a vendor portfolio; many are recently acquired technologies
– E.g. SharePoint/FAST, HP/Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca
2. Stand-alone specialists, often deployed to address specific apps or challenges
– E.g. GSA (Google Search Appliance), Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: Lucene, Solr, Elasticsearch
– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)
Market Observations
SharePoint, open source, and the GSA currently hold the dominant market share
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells, and a lot of GSAs have been shipped during the past few years
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing / converging technology
• Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing
– Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors
– SharePoint and GSA are delivered with limited content processing functionality and limited connectivity
– Solr, Elasticsearch, AWS CloudSearch and Azure Search don’t provide a formal indexing pipeline, UI, or connectors
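To illustrate what an "indexing pipeline" means here, the following is a minimal Python sketch, assuming documents arrive from a connector as plain dicts. The stage functions (`normalize_whitespace`, `extract_title`, `add_length`) and the `run_pipeline` helper are hypothetical examples, not the API of any product named on this slide.

```python
from typing import Callable, Dict, Iterable, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    # Collapse runs of whitespace in the body text.
    doc["body"] = " ".join(str(doc.get("body", "")).split())
    return doc


def extract_title(doc: Document) -> Document:
    # Hypothetical rule: if no title was supplied, use the first sentence of the body.
    if not doc.get("title"):
        doc["title"] = str(doc.get("body", "")).split(".")[0].strip()
    return doc


def add_length(doc: Document) -> Document:
    # Add a simple metadata field that a UI could sort or facet on.
    doc["word_count"] = len(str(doc.get("body", "")).split())
    return doc


def run_pipeline(docs: Iterable[Document], stages: List[Stage]) -> List[Document]:
    # Apply every stage, in order, to each document before it is sent to the index.
    processed = []
    for doc in docs:
        current = dict(doc)  # shallow copy so the connector's record is left untouched
        for stage in stages:
            current = stage(current)
        processed.append(current)
    return processed
```

For example, `run_pipeline([{"id": "1", "body": "Quarterly   report.  Revenue grew."}], [normalize_whitespace, extract_title, add_length])` yields a record with a cleaned body, a derived title, and a word count. Keeping stages as small, ordered functions is what lets a pipeline be extended one step at a time.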
Further Observations
• The search engines with less focus on peripheral issues such as content processing and connectivity have dominant market share
• Connectivity is often challenging, especially when combined with continual data growth and document-level security requirements
• The movement of data sets to the cloud adds further complexity for enterprise search systems
– Hybrid indexing environments will be with us for some years
– Some content sets in the cloud, some behind the firewall
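Document-level security is part of what makes connectivity hard: every indexed item must carry its access-control information so results can be trimmed per user. Below is a minimal sketch of query-time security trimming, assuming a hypothetical `allowed_groups` field captured by the connector from the source ACL, and assuming the user's group memberships have already been resolved (for example from a directory service).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    # Hypothetical field: groups allowed to see this document, copied from the source ACL.
    allowed_groups: Set[str] = field(default_factory=set)


def trimmed_search(docs: List[IndexedDoc], term: str, user_groups: Set[str]) -> List[str]:
    """Return ids of documents that contain the term AND that the user may see."""
    term = term.lower()
    hits = []
    for doc in docs:
        if term not in doc.text.lower():
            continue
        # Security trimming: drop any hit whose ACL shares no group with the user.
        if doc.allowed_groups & user_groups:
            hits.append(doc.doc_id)
    return hits
```

With one document restricted to `{"hr"}` and another open to `{"everyone"}`, a user in `{"everyone", "engineering"}` sees only the second. Production engines usually push this check into the query as a filter, so hit counts and paging stay correct.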
Great Search Requires Attention to Detail
E.g. in content processing prior to indexing:
• Normalization
– Names, dates, synonyms…
• Entity identification and resolution
• Categorization
• Document vector extraction
• Document splitting and concatenation
• Link & popularity analysis
• Dupe & near-dupe detection
[Diagram: processed content enters the index with security, category, and metadata fields]
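Of the steps listed, near-dupe detection is easy to illustrate. Below is a minimal sketch using word shingles and Jaccard similarity, which is one common technique and not necessarily what any particular engine uses. The shingle size of 3 and the 0.5 threshold are arbitrary assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    # Overlapping k-word sequences; lowercasing acts as a simple normalization step.
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    # Size of the intersection divided by size of the union.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: Dict[str, str], threshold: float = 0.5) -> List[Tuple[str, str]]:
    # Pairwise comparison is O(n^2); large collections would use MinHash/LSH instead.
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sigs), 2):
        if jaccard(sigs[x], sigs[y]) >= threshold:
            pairs.append((x, y))
    return pairs
```

Two nine-word sentences that differ only in the last word share 6 of their 8 distinct 3-word shingles (similarity 0.75), so they are flagged as a near-duplicate pair.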
Future Directions for Search
So what will search architectures look like in the future?
Important influences:
• The business need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and evolution in repository / storage fashions
Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, as evangelized by IBM, Cloudera, etc.
2. Recent Search Architectures
Background Info
The Big Data Architecture
Designed for Structured Data
The Traditional Search Architecture
[Diagram: an integrated search engine containing Connectors, Index Pipeline, Search Index, and UI, fed by content sources such as the employee directory, CMS, and file shares]
Designed for Unstructured Content
• As data volumes grow, re-indexing becomes challenging
• The rate at which content can be acquired from repositories is usually the bottleneck
• Re-indexing at a few documents per second?
• There are only about 2.6 million seconds in a month
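The arithmetic behind this point: a 30-day month has 30 × 86,400 = 2,592,000 seconds, roughly 2.6 million. The sketch below estimates full re-index time at a sustained acquisition rate; the corpus size (20 million documents) and the rate (5 documents per second) are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000 for a 30-day month


def docs_per_month(docs_per_second: float) -> float:
    """Documents that can be acquired in a 30-day month at a sustained rate."""
    return docs_per_second * SECONDS_PER_MONTH


def reindex_days(num_docs: int, docs_per_second: float) -> float:
    """Days needed to re-acquire and re-index num_docs at a sustained rate."""
    return num_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{docs_per_month(5):,.0f} docs per month")  # 12,960,000 docs per month
    print(f"{reindex_days(20_000_000, 5):.1f} days")    # 46.3 days
```

At these assumed numbers, a full re-index pulled directly from the repositories takes about a month and a half, which is why the acquisition rate, rather than the index itself, becomes the bottleneck.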
A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram: content sources (employee directory, CMS, etc.) feed Connectors into a Secure Cache; Content Processing, developed iteratively, reads from the cache and re-indexes into the search engine's Index Pipeline and Search Index]
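A minimal sketch of the cache idea, assuming a hypothetical `SecureCache` class: the connector stores each fetched item with its ACL and a content hash, only new or changed items are reported for reprocessing, and a full re-index iterates over the cache instead of touching the source repositories. A real deployment would use a persistent, access-controlled store; here it is an in-memory dict.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple


@dataclass
class CachedItem:
    content: str
    acl: Set[str]
    digest: str


class SecureCache:
    """Hypothetical cache between connectors and content processing."""

    def __init__(self) -> None:
        self._items: Dict[str, CachedItem] = {}

    def put(self, item_id: str, content: str, acl: Set[str]) -> bool:
        # Returns True if the item is new or its content/ACL changed (needs reprocessing).
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        old = self._items.get(item_id)
        if old is not None and old.digest == digest and old.acl == acl:
            return False
        self._items[item_id] = CachedItem(content, set(acl), digest)
        return True

    def all_items(self) -> Iterator[Tuple[str, str, Set[str]]]:
        # A full re-index reads from here; the source repositories are not contacted.
        for item_id, item in self._items.items():
            yield item_id, item.content, item.acl


def crawl(cache: SecureCache, fetched: Dict[str, str], acl: Set[str]) -> List[str]:
    # Simulates one connector pass; returns the ids that changed since the last pass.
    return [item_id for item_id, text in fetched.items() if cache.put(item_id, text, acl)]
```

The design choice this illustrates is decoupling: connectors can pace their "touch-time" with each repository independently, while content processing can be rerun against the cache as often as iterative development requires.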
The Future Architecture?
[Diagram: the same architecture (content sources, Connectors, Secure Cache, iteratively developed Content Processing, re-index into the search engine's Index Pipeline and Search Index), with Hadoop added]
• This environment will encourage ever more sophisticated text analytics
• We expect to see much innovation in text analytics during the next few years
• The deliverable is a better, richer search index
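To make the Hadoop point more concrete, here is a pure-Python sketch of the map / shuffle / reduce pattern applied to one simple text-analytics job: computing each term's document frequency across cached content. It only mimics the programming model in a single process; it does not use Hadoop's APIs, and the choice of job is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_doc(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    # Map: emit (term, doc_id) once for each distinct term in the document.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Shuffle: group emitted values by key, as the framework does between phases.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term: str, doc_ids: List[str]) -> Tuple[str, int]:
    # Reduce: document frequency is the number of distinct documents containing the term.
    return term, len(set(doc_ids))


def document_frequency(corpus: Dict[str, str]) -> Dict[str, int]:
    mapped = (pair for doc_id, text in corpus.items() for pair in map_doc(doc_id, text))
    return dict(reduce_term(term, ids) for term, ids in shuffle(mapped).items())
```

Because each map call sees one document and each reduce call sees one term, a framework can spread both phases across a cluster; richer analytics (entity extraction, categorization) slot into the same map step.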
An Established Architecture
[Diagram: the Hadoop-based architecture from the previous slide]
• Google.com has worked something like this since 2004
An Integrated Search/Analytics Architecture
[Diagram: content sources (CMS, file system, etc.) enter through Connectors, and data sources (data warehouse, log files, etc.) through ETL; both land in Hadoop, which hosts the Secure Cache, iteratively developed Content Processing, and Rapid Indexing, and serves multiple search apps and analysis apps]
• Encourages agile exploitation of data and content resources
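A minimal sketch of that idea, assuming the processing layer has already produced enriched records with hypothetical `text` and `category` fields: the same records feed an inverted index for a search app and an aggregation for an analysis app.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def build_inverted_index(records: List[Dict[str, str]]) -> Dict[str, Set[int]]:
    # Search app: map each term to the ids of the records containing it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec_id, rec in enumerate(records):
        for term in rec["text"].lower().split():
            index[term].add(rec_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # AND semantics: a record must contain every query term.
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def category_counts(records: List[Dict[str, str]]) -> Counter:
    # Analysis app: aggregate over an enrichment field added during content processing.
    return Counter(rec["category"] for rec in records)
```

Both applications are cheap to add or rebuild because neither goes back to the source systems; each is just another view derived from the shared, enriched content.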
Summary 1
• Search and Big Data applications are tending towards the same architecture
• Autonomous connectivity and content processing simplify and de-risk search projects, if you can get them right
• The foundation of great search is still a clean, rich and detailed index
• The “search index” itself is a mature technology, almost a commodity
• Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing
The compulsory analyst quote…
And finally…
“Enterprise Search Can Bring Big Data Within Reach”
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
The Enterprise Search Market in a Nutshell
Iain Fletcher
October 20, 2015
Questions?
Spare Slides
Reference Architecture
[Diagram: layers for Content Sources, Connectors, Staging Repository, Content Processing Pipelines (Semantics, Text Mining, Quality Metrics) on a Big Data Framework, Indexes, Search Engine (Query Parsing), and Web Browser]
Where is the Focus?
• The Business View
• The Implementation View
[Diagram: the two views compared as stacks of Application, Content Capture & Preparation, and Data Store / Index]