II-SDV 2015, 20 - 21 April, in Nice
Search Technologies: Who We Are
The leading independent IT services firm specializing in the design,
implementation, and management of enterprise search and big data
search solutions.
Search Technologies: Background
San Diego
London UK
San Jose, CR
Cincinnati
San Francisco
Washington (HQ)
Frankfurt DE
• Founded 2005
• 150+ employees
• 600+ customers worldwide
• Deep enterprise search expertise
• Consistent revenue growth
• Consistent profitability
Search Engine and Big Data Expertise
Our Technology and Integration Partners
600+ Customers
Search Technologies: What We Do
• All aspects of search application implementation
– Content access and processing, search system architecture, configuration, deployment
– Accuracy analysis, metrics, engine scoring, relevancy ranking, query enhancement
– User interface, analytics, visualization
• Technology assets to support implementation
– Aspire high performance content processing
– Content Connectors (Document, Jive, SharePoint, Salesforce, Box.com, etc.)
• Engagement models
– Most projects start with an “assessment”
– Fully project-managed solutions, designed, delivered, and supported
– Experts for hire, supporting in-house teams or as a subcontractor
Search Technologies: Expertise by Role
Role: Responsibilities
Project Manager: Ensures project is on time and within budget.
Architect: Designs overall solution architecture.
Requirements Analyst: Documents requirements and solution goals.
Engineer: Hands-on software development and configuration.
Lead Engineer: Lead developer; understands application from end to end.
Search Quality Analyst: Analyzes search metrics, improves quality of results.
Data Analyst: Analyzes data; defines content processing needs.
Support Engineer: Provides 8x5 or 24x7 support for software and managed services.
Q&A
Microsoft Search Expertise
• Over 150 SharePoint/FAST customers
• 50 Engineers trained on Microsoft search technologies
• Projects with Elsevier, CPA, Florida Power & Light, Library of Congress,
GPO, Norton Rose, Daily Mail Group, Accenture, Unilever
• Working with all versions and combinations of SharePoint and FAST
(ESP, FSIS, FSIA, etc.)
• FAST’s Worldwide Partner of the Year back in 2006
– 30,000+ days of implementation experience since…
• Already up-to-speed with SharePoint 2013
Google Search Appliance Expertise
• 30 Engineers trained on GSA
• Over 100 Google Search Appliance Customers
• Projects with AARP, Isilon, Petco, SAIC, ESRI, General Electric, Vistage,
Ning, NVIDIA, Hershey, Mayo Clinic and others
• Google recommends us for their most challenging GSA integration
projects
• Focus on integration and implementation issues that GSA does not
handle out-of-the-box
• 3rd Party Connector Development
Open Source Expertise
• 40 Engineers Trained on Solr/Lucene or ElasticSearch stack (ELK)
• Projects with Comcast, BBC, U.S. House, MemoryLane, Bloomberg,
Citibank, BusinessLink, Genentech, Qualcomm, YP.com
• Focus on extending document processing and query parsing
frameworks to enable open source to function in complex enterprise
scenarios
• Focus on extreme scale and performance scenarios – YP.com, Comcast
Big Data Expertise
• Expertise with Big Data technologies
– NoSQL – Hadoop, Cloudera
– Data Mining and Machine Learning – Mahout
– Distributed databases – Cassandra / Datastax
• Projects in Data Warehouse Search, Fraud Detection, Automated
Candidate Matching
Assessment & Delivery
Primary Engagement Models
• Provide complete search & big data solutions, from
requirements analysis and design through to
implementation and ongoing support
• Provide expert services to support in-house customer
projects or larger system integrators
Delivery Model
Assessment
• Deep dive to evaluate
technical situation and
business objectives
• Document Analysis
• Develop detailed project
plan and schedule
Implementation
• Focus on technical
execution and quality
• Tightly manage objectives
as per Assessment
• Ensure completion to
timeframe and budget
Support
• Knowledge transfer from
Implementation team
ensures smooth hand-off to
Support
– 8x5 or 24x7
– Managed Services
– Hosting
Assessment → Statement of Work → Implementation → Completion → Support
Assessment Process
An Assessment typically involves the following steps, but is always customized to the requirements of the customer.
Phases: DEFINE, REVIEW & ANALYZE, RECOMMEND, REPORT
• Define Business and Technical Objectives
• Review Existing Applications
• Review Data, Environment and User Requirements
• Review Performance Requirements
• Define Future Architecture
• Generate Assessment Report
Assessment Document
• Executive Summary
• High Level Requirements
• System Overview
• Detailed Requirements
• New Initiatives
• Proposed Project Plan
• Conclusions and Next Steps
Assessment Benefits
• Deep focus on Technical and Business Objectives
• Detailed, documented options and recommendations
• Better visibility into the Project Scope
• Better communications and Project Management
• Opportunity to leverage expertise on your team
Project Execution Model
• Projects organized around 3 key personnel
– Technical Lead
– Project Manager
– Architect
• Agile development methodologies (Sprints and Scrums)
• JIRA and Greenhopper used for issue tracking and project management
Support & Managed Services
Technical Support & Managed Services
• Standard and Premium Support available worldwide
• Application Managed Services available worldwide
• Communication Channels
– Support Online Portal (http://support.searchtechnologies.com)
– Support Phone Line (619 564 4351 option 1)
– Support Email ([email protected])
• Support Time Frames
– 8x5 or 24x7
Severity | Regular Support | Premium Support
Critical | 4 business hours after logging the issue | 2 hours after logging the issue and Call Support
Major | 1 business day after logging the issue | 4 business hours after logging the issue
Minor | 2 business days after logging the issue | 1 business day after logging the issue
Trivial | 1 business week after logging the issue | 2 business days after logging the issue
Support Online Portal
http://support.searchtechnologies.com
Other Online Resources (Wiki)
http://wiki.searchtechnologies.com
Hosting Services
• 10+ hosted customer applications
• 24x7 Technical Support
• Cloud Hosted Services
Organization
Executive Team
Executive | Enterprise Search Industry Experience (years in the search / IT industry)
Kamran Khan, President & CEO | 19 years: International Sales, VP Sales, Executive
John Steinhauer, VP Technology | 16 years: Development Management, Project Management, Executive
Pat Booth, Director of Finance | 17 years: Finance, Operations, Executive
Paul Nelson, Chief Architect | 25 years: Development, Innovation, Architecting, Dev. Management
John Back, VP Sales - US | 15 years: Sales, VP Sales
Graham Charlesworth, VP Sales - Europe | 17 years: Business Development, VP Sales, Executive
Dennis Tran, Vice President | 21 years: International Sales, VP Sales
Graham Gillen, VP Marketing | 15 years: VP Marketing, Product Marketing, Analyst & Partner Relations
Iain Fletcher, Director Marketing Europe | 17 years: International Sales, Product Management & Marketing
Organization Chart
Kamran Khan, CEO
Pat Booth, Director of Finance
Joni Morgan, Sr. Bus Analyst
Nathalie Rodriguez, Corp. Accountant
Karen Pramis, Corp. Accountant
Graham Gillen, VP Marketing
Stacy Brooks, Marketing Mgr
Iain Fletcher, Dir. Marketing Europe
Telemarketing Associates
Graham Charlesworth, VP Sales Europe
Graham Jackson, Account Mgr
Bernd Rahmig, DE Acct Mgr
Linda Berry, EU Finance & Admin
John Steinhauer, VP Technology
Phil Lewis, UK Tech Dir (16 Engineers)
Maynor Alvarado, CR Tech Dir (59 Engineers)
Joan Schaech, East PS Mgr (31 Engineers)
Matt Lumsden, West PS Mgr (9 Engineers)
John-Henry Gross, Product Mgr
John Back, VP Sales NA
Mary Jo Houghton, Account Mgr - NE
Jerry Junker, Account Mgr - MW
Joe Abrams, Account Mgr - W
Dennis Tran, Google Accounts
Mimy Indra, Account Mgr - Federal
Jan Seaton, Director HR
Paula Small, Recruiter
Amanda Bolanos, Sr. Admin Asst
Kristin Andrews, Receptionist/AA
Paul Nelson, VP, Chief Architect
Engineering Team
• Project Engineering
– Frontline technical consultants working on customer projects
• Project Management
– Global organization to manage customer projects
• Core Engineering
– Building assets and tools used by project teams and customers
• Technical Support and Managed Services
– Supporting software and Applications
• Sales Engineering
– Technical expertise to drive sales
Technology Assets
Aspire Content Processing, Connectors and QPL
Technology Assets
Complements to commercial and open source search technologies
1. Aspire Framework
– High Performance Content Processing
– Ingests and processes content and publishes to a variety of indexes for commercial and open source search engines
2. Aspire Data Connectors
– API level access to content repositories
3. Query Processing Language (QPL)
– Advanced query processing
[Diagram labels: Content sources, Connectors, Staging Repository, Aspire Content Processing Pipelines, Publishers, Indexes, Search Engine, QPL, Web Browser; callouts 1, 2 and 3 mark the three assets]
Aspire Content Processing
Importance of Content Processing
• Inconsistent and sparse content, especially metadata, is a
leading cause of user dissatisfaction and underperformance
in search applications
• Meticulous preprocessing prior to indexing is a critical, yet
often neglected aspect of search systems
• The original format and structure of the content is typically
optimized for human consumption, content processing
optimizes it for indexing and search
Content Processing Supports
[Diagram: Aspire content processing architecture]
• Optimum Relevancy & Recall
• Search Navigators
• Content Grouping
• Secure Content Hub For Enterprise Content
• Support for Advanced Analytics
Content Processing Stages
[Diagram: Aspire content processing architecture]
• Connectors – Secure access to content
• Staging Repository – Fast & secure re-indexing
• Pipelines – Cleansing, enriching and normalizing prior to indexing
• Publishers – Output to search engine
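The four stages can be sketched in a few lines of Python. This is only an illustration of the flow (connector, staging repository, pipeline, publisher) and not Aspire's actual API: every class and function name below is hypothetical, the content source is an in-memory list, and the "index" is a plain list standing in for a search engine.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    doc_id: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def connector(raw_items: Iterable[dict]) -> Iterable[Document]:
    """Connector stage: read content and metadata from a source (hypothetical)."""
    for item in raw_items:
        yield Document(item["id"], item["body"], dict(item.get("meta", {})))


class StagingRepository:
    """Staging stage: keep fetched documents so re-indexing needs no re-crawl."""

    def __init__(self) -> None:
        self._store: Dict[str, Document] = {}

    def save(self, doc: Document) -> None:
        self._store[doc.doc_id] = doc

    def all(self) -> List[Document]:
        return list(self._store.values())


def cleanse(doc: Document) -> Document:
    """Pipeline step: collapse runs of whitespace."""
    return replace(doc, content=" ".join(doc.content.split()))


def enrich(doc: Document) -> Document:
    """Pipeline step: derive extra metadata from the content."""
    count = str(len(doc.content.split()))
    return replace(doc, metadata={**doc.metadata, "word_count": count})


def run_pipeline(doc: Document, stages: List[Callable[[Document], Document]]) -> Document:
    for stage in stages:
        doc = stage(doc)
    return doc


def publish(docs: Iterable[Document], index: List[dict]) -> None:
    """Publisher stage: write processed documents to the (stand-in) index."""
    for doc in docs:
        index.append({"id": doc.doc_id, "content": doc.content, **doc.metadata})


source = [{"id": "a1", "body": "  Enterprise   search \n basics ", "meta": {"author": "unknown"}}]
staging = StagingRepository()
for fetched in connector(source):
    staging.save(fetched)

index: List[dict] = []
publish((run_pipeline(d, [cleanse, enrich]) for d in staging.all()), index)
print(index)
```

Because the pipeline steps return new documents instead of changing the staged ones, the staging repository still holds the raw content, so the pipeline can be changed and re-run without going back to the source.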
What is Aspire?
• A vendor neutral framework to support high-volume, high-
performance content processing
• A toolkit to create custom components needed to
implement high quality search implementations
• A highly effective and low cost way to prepare data for
indexing by extracting and normalizing metadata, cleansing
and enriching data
• A framework that enables Search Technologies to create
outstanding search experiences for customers
Content Processing Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Derive additional metadata from content
• Discover hierarchy metadata
• Categorization
• Document Matching
• Document segmentation and concatenation
• Link analysis
• Duplicate detection
• Security analysis
[Diagram labels: Index, security, category, metadata]
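Two of the examples above, normalization and duplicate detection, are easy to illustrate. The sketch below is a minimal Python version under stated assumptions: the accepted date formats, the "Last, First" name convention and the fingerprinting rule are all choices made for this example, not a description of how Aspire components work. The duplicate check only catches documents that are identical after lowercasing and stripping punctuation; near-duplicate detection would need a different technique.

```python
import hashlib
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Assumed input formats for this example only.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(text: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(text: str) -> str:
    """Turn 'SMITH,  william' into 'William Smith'."""
    text = " ".join(text.split())
    if "," in text:
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    return text.title()


def content_fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing non-word characters."""
    canonical = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def find_duplicates(docs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (first_seen_id, duplicate_id) pairs for equal fingerprints."""
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for doc_id, text in docs.items():
        fingerprint = content_fingerprint(text)
        if fingerprint in seen:
            duplicates.append((seen[fingerprint], doc_id))
        else:
            seen[fingerprint] = doc_id
    return duplicates


print(normalize_date("April 20, 2015"))
print(normalize_name("SMITH,  william"))
print(find_duplicates({"a": "Red wine, explained!", "b": "red  WINE explained", "c": "white wine"}))
```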
Aspire Reference Architecture with Big Data Scaling for Big Data Solutions
[Diagram labels: the Aspire content processing architecture (Content sources, Connectors, Staging Repository, Aspire Content Processing Pipelines, Publishers, Indexes, Search Engine, QPL, Web Browser) together with a Big Data Framework, a Big Data Array of eight Aspire instances, and the outputs Indexes, Semantics, Text Mining, Quality Metrics]
Aspire Benefits
• Vendor neutral framework “future proofs” solutions
• Mature toolkit provides full set of components to create
solutions faster, economically and reliably
• Improved index quality enabled by content processing
• Java based solution supports a wide array of computing
platforms and is scalable
• Workflow and scripting support enables more flexible and
maintainable solutions
Customers Using Aspire
• Search Technologies
• ACS / Xerox
• Adecco
• ASCO
• Aspermont
• BASF
• Bayer (POC)
• Blackberry
• Bloomberg/BNA
• Boehringer Ingelheim
• Carson-Dellosa
• CBBB
• Celera Systems
• Chick-Fil-A
• CPA Global
• EMC Corporation
• Evonik (Germany)
• Florida Power & Light
• GE Research
• GFR Media
• Haymarket (PistonHeads)
• Haymarket (HIFI)
• Hershey
• JobSite
• Just Eat
• Labour
• LOC
• Mitre
• NARA
• Deloitte
• Nectar
• NetDocuments
• New York Housing
• OLRC
• OSD/CAPE
• Penske Truck leasing
• Reed Business International
• Rolls Royce
• SCIE
• Seagate
• Shire
• Sony Media
• Sprint
• Thoughtworks
• United Nations
Aspire Fundamentals
• An OSGi framework + plug-in components architecture
• Vendor independent
• Intuitive Admin UI
• Rich library of component bundles and components
– Connectors to content sources
– Document processing components
• Parsing, extracting, splitting, joining, metadata mapping, etc.
• Scripting support using Groovy
– Publishers to leading search engines
• Integration with Hadoop
Intuitive Modern Administration UI
Aspire Community
Licensing & Maintenance
• Free to download and use
• Registration Required
• License Agreement Required
• Maintenance & Support is not available
Packaging
• Framework and Core Components
• Publishers: Solr, CloudSearch and GSA
• Connectors for File system and RDB
• No security
• Javadoc for Programming New Components
• Administration Tool
• Archetypes for quickly creating new components and
distributions
• Access to Aspire Wiki
• Access to the Maven repository
– But for a limited set of components
Aspire Enterprise
Licensing & Maintenance
• Priced per server per month
• Maintenance and Technical Support included
Packaging
• Aspire Community, plus
– All currently available publishers*
– Corporate Site Map
– Enterprise Security
– Distributed Processing
– Connectors: CIFS, Heritrix, Enhanced RDB
– Dynamic Crawler Controls
• Access to Wiki
• Access to the Aspire Maven Repository
– Includes access to all released pipeline components
• Technical Support (via support portal, telephone, and e-mail); 8x5 or 24x7 support available (additional cost)
* Except FAST Content API
Connectors
Connectors Provide
• API level access to repositories
• Retrieval of:
– Content and metadata
– ACLs for repositories that support security
– Hierarchy information
• Full and incremental crawling
• Multiple modes for crawl scheduling
• Search engine independence
• Ease of install and configuration from a common Admin UI
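The difference between full and incremental crawling can be shown with a small sketch. It assumes a repository exposed as a dictionary of document id to (content, ACL) and a snapshot of signatures kept between crawls; both are stand-ins invented for this example, and real connectors may detect changes differently (for instance through modification timestamps or change logs). A full crawl is simply the incremental routine run against an empty snapshot.

```python
import hashlib
from typing import Dict, List, Tuple

Repository = Dict[str, Tuple[str, Tuple[str, ...]]]  # doc id -> (content, ACL)


def signature(content: str, acl: Tuple[str, ...]) -> str:
    """Signature covering content and ACL, so permission changes count as updates."""
    payload = content + "\x00" + ",".join(sorted(acl))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def incremental_crawl(
    repository: Repository, snapshot: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare the repository with the previous snapshot.

    Returns (adds, updates, deletes) and replaces the snapshot contents
    with the signatures seen in this crawl.
    """
    adds: List[str] = []
    updates: List[str] = []
    current: Dict[str, str] = {}
    for doc_id, (content, acl) in repository.items():
        sig = signature(content, acl)
        current[doc_id] = sig
        if doc_id not in snapshot:
            adds.append(doc_id)
        elif snapshot[doc_id] != sig:
            updates.append(doc_id)
    deletes = [doc_id for doc_id in snapshot if doc_id not in current]
    snapshot.clear()
    snapshot.update(current)
    return adds, updates, deletes


repo: Repository = {"a": ("hello", ("staff",)), "b": ("world", ("staff",))}
snap: Dict[str, str] = {}
first = incremental_crawl(repo, snap)    # empty snapshot: behaves as a full crawl

repo["a"] = ("hello", ("staff", "legal"))  # ACL change only
del repo["b"]
repo["c"] = ("new", ())
second = incremental_crawl(repo, snap)
print(first, second)
```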
Connectors
• Aspire Enterprise Connectors
– File (CIFS)
– RDB
– Heritrix
• Premium Connectors
– SharePoint 2010
– SharePoint 2013
– Lotus Notes
– Amazon S3
– Confluence
– Documentum
– EMC eRoom
– Socialcast
– IBM Connections
– Salesforce.com
– TeamForge
– Oracle RightNow
– Jive
QPL – Query Processing Language
We Expect Help With Queries
What is Query Processing?
• Analyzing the content of a query to determine the user's intent and
optimize the query for the search engine
• Examples:
– Term consolidation: red wine → “red wine”
– Term expansion: FSA → FSA OR “Financial Services Authority”
– Semantic expansion: Gun → Gun OR Rifle OR Pistol OR Firearm
– Geographic: Near Buffalo NY → &q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
– Normalization: Bill Smith → William Smith
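The first three rewrites can be sketched as small dictionary-driven functions. This is an illustrative Python sketch, not QPL: the phrase list, acronym table and synonym table are hypothetical and would normally come from curated vocabularies, and expanded terms are wrapped in parentheses so they group correctly next to other terms. The geographic example is left out because it needs a geocoder.

```python
import re
from typing import Dict, List, Set

# Hypothetical vocabularies for this example only.
PHRASES: Set[str] = {"red wine"}
ACRONYMS: Dict[str, str] = {"FSA": "Financial Services Authority"}
SYNONYMS: Dict[str, List[str]] = {"gun": ["rifle", "pistol", "firearm"]}


def consolidate_phrases(query: str) -> str:
    """Term consolidation: quote known multi-word phrases that are not yet quoted."""
    for phrase in PHRASES:
        pattern = re.compile(r'(?<!")\b' + re.escape(phrase) + r'\b(?!")', re.IGNORECASE)
        query = pattern.sub(lambda match: f'"{match.group(0)}"', query)
    return query


def expand_terms(query: str) -> str:
    """Term and semantic expansion: OR in long forms and synonyms."""
    expanded: List[str] = []
    for token in query.split():
        if token in ACRONYMS:
            expanded.append(f'({token} OR "{ACRONYMS[token]}")')
        elif token.lower() in SYNONYMS:
            alternatives = " OR ".join([token] + SYNONYMS[token.lower()])
            expanded.append(f"({alternatives})")
        else:
            expanded.append(token)
    return " ".join(expanded)


def process_query(query: str) -> str:
    return expand_terms(consolidate_phrases(query))


for raw in ("red wine", "FSA rules", "Gun"):
    print(raw, "->", process_query(raw))
```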
Benefit of Query Processing
• Improved Precision and Recall
– Users want to type just a few terms
– Search engines want users to speak advanced Boolean
• Improved User Experience
– Query processing acts like a skilled interpreter
• Remove the extraneous
• Fill in the details to bridge the gap between human and machine
Query Processing Language - QPL
• Search Engine Independent Server to Process Queries
– Scripting rule-based approach
– Supports maintainability of business logic
– Search engine independence reduces TCO
– Gives search engineers control, where it belongs
• UI engineers should not be controlling queries
– Search Technologies expert services to implement and tune
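One way to picture search engine independence is to keep the query as a small tree and render it only at the last step. The Python sketch below makes that idea concrete under clear assumptions: the node classes and both renderers are hypothetical and are not QPL syntax. The first renderer emits the familiar AND/OR/quoted-phrase form accepted by Lucene-style query parsers; the second emits a made-up JSON-like structure that stands in for some other engine's format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    text: str


@dataclass
class Or:
    clauses: List["Node"]


@dataclass
class And:
    clauses: List["Node"]


Node = Union[Term, Or, And]


def render_lucene_style(node: Node) -> str:
    """Render as a Lucene-style query string (quotes around multi-word terms)."""
    if isinstance(node, Term):
        return f'"{node.text}"' if " " in node.text else node.text
    joiner = " OR " if isinstance(node, Or) else " AND "
    return "(" + joiner.join(render_lucene_style(child) for child in node.clauses) + ")"


def render_json_style(node: Node) -> dict:
    """Render as a hypothetical JSON-like structure (not a real engine's DSL)."""
    if isinstance(node, Term):
        return {"term": node.text}
    key = "any" if isinstance(node, Or) else "all"
    return {key: [render_json_style(child) for child in node.clauses]}


query = And([Or([Term("FSA"), Term("Financial Services Authority")]), Term("regulation")])
print(render_lucene_style(query))
print(render_json_style(query))
```

The business rules that build the tree stay the same when the search engine changes; only a renderer has to be added or replaced.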
DPMS and Aspire - EXAMPLES
DPMS Example #1 – Federal Register
DPMS Example #2 – World’s Patent Data
• Consolidation of 80 million XML encoded patents from 95 patent offices into a single, searchable application.
• A long and rich history since 1790, with numerous linguistic, normalization, cleansing, enrichment and data linking challenges
• Forward and backward references
• Assignee, inventor, corporate hierarchies for which normalization is required
• Multiple classification hierarchies which change over time
• Hierarchical claims structure
• Whole document comparison features (similarity search)
• KEY ISSUES: Controlling complexity and handling scale
Document Processing Methodology for Search
• The Philosophy
– Understand the Document Model
– Understand the User Model
• Includes business-level requirements
– Create the Search Engine Model
• Search = the pivot point between User and Data
– Document everything
DPMS – The Methodology
1. Assessment (Search Technologies Architect and Business Analyst)
– Assessment Report: Expert assessment and recommendations
2. Detailed Analysis
– DPMS Analysis (Knowledge Engineer, Business Analyst, etc.)
– DMDs
– Review (Architect, Domain Experts, Peers)
3. Execution
– Implementation (Developer)
– Validation: Validate DMDs
– Aspire
– Search Engine
DMD Defines How Data Flows Through System
[Diagram: Business Process Overview. Labels in extraction order: Submission; Ingest Process; Congressional Submission Workflow (folder); Migration Application; Bulk Submission Process; Preservation; Archival Processing Workflow; Archival Updating Workflow; Access; Public User Access & Delivery Application; Authorized User Access & Delivery Application; Processing; Package Updating Workflow; Access Processing Workflow; Publishing Process; ILS Integration Application; Submission Process; Congressional Submission Workflow (interactive)]
• What renditions are available?
• How will metadata be extracted and merged?
• What manual edits may be required?
• How are PDF files processed?
• How will the HTML rendition be created?
• How will parser data and input files be validated?
• What’s on the search form?
• How will the content and metadata be indexed?
• What are the navigators?
• How will the MODS be created?
• How are search results formatted?
• What do content URLs look like?
Google Additional Slides
What Search Technologies Provides
• GSA Search Assessment Analysis
• Search application development
• Corporate Wide Search Solution
• SharePoint GSA search integration
• Custom Connectors, such as RightNow, Lotus Connections, Confluence, etc.
• System architecture and design
• Security integration
• Performance analysis and optimization
• Managed Service and 24x7 Support
GSA Assessment Services
• Search Application Assessment
– Requirements gathering and planning
• Entity Recognition Assessment
– Entity identification and implementation planning
• Sensitive Data Assessment
– Data security above and beyond document-level ACL compliance
Customer Examples
• EMC – Storage Platform
– Corporate Wide Search Platform for internal users and partners
– Aspire connectors: SharePoint, File system, Database, eRoom, JIVE, Teamforge, Socialcast
• Isilon Systems – Storage Platform
– Customer Support – RightNow Connector
– Sales – Salesforce.com
• Amirsys – Medical Diagnosis
– Decision Support Portal
– Used by 40,000 physicians in 50 countries
• Savvis – Service and Web Hosting Company
– Command Center application
– SharePoint Connector
Case Study Slides
Example Customers
Corporate Wide Search
• EMC
• Norton Rose (FAST ESP) – Application Management, Technical Support
• PTC (FAST ESP) – Tier 3 Technical Support, Hosting
• BNA (Solr/Lucene) – Application Management, Consulting, Hosting
• Unilever (Verity K2, RetrievalWare, FAST ESP) – Application Management, Consulting
• NXT Customers (NXT) – 40 Hosted NXT Applications
• Chick-Fil-A (FAST ESP) – Application Management, Consulting
• Seagate - GSA-based CWS. 3 connectors + Aspire Enterprise framework to normalize
Data Warehouse (Big Data)
• State Compensation Insurance Fund
E-Commerce
• Nordstrom
• Apple (anonymous)
• Samsung (anonymous)
Search & Match
• Adecco (anonymous) and/or Jobsite
Media & Publishing
• Reed Elsevier (Reconstruction Data etc.)
• CPA (FAST ESP) – Application Management, Consulting, Hosting
• Haymarket
• Gartner (?)
Government
• GPO (FAST ESP) – Application Management, Consulting
• Library of Congress (FAST ESP) – Application Management, Consulting, Hosting
• NARA – National Archives – Application and Infrastructure Architecture and Development, Consulting
• OLRC
Need more examples in different solution areas; maybe not so many on CWS
Corporate Wide Search / Enterprise Search
Comcast
Background
• Built on Solr/Lucene
• Largest cable and home internet provider in the US
• Search Technologies provides expert architecture, design and
development services to in-house team.
• Replacement of a home-grown system with new Solr / Hadoop
application used to service set-top box requests and browsing of TV
listings by subscribers
Key Details
• Very fast indexer - 500 records per second
• Recommendations engine processes 2.8 billion records in 8 hours
(down from 24 hours).
• Vote-counting recommendations algorithm calculates recommended
movies and TV shows for a million movies and shows in Comcast’s
library.
• Millisecond search response using SolrJ
• Integration with and improvement of existing
ranking/grouping/boosting rules
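The slides do not describe how the vote-counting algorithm works. The sketch below shows one common reading of the idea, assuming that every viewer who watched a title casts one vote for each other title they watched. The function name, the data layout, and the sample titles are hypothetical and are not taken from the Comcast system.

```python
from collections import Counter, defaultdict
from itertools import permutations


def recommend_by_votes(histories, top_n=3):
    """Return, for each title, the titles that co-viewers voted for most.

    histories maps a viewer id to the list of titles that viewer watched.
    This is an illustrative co-occurrence count, not the production algorithm.
    """
    votes = defaultdict(Counter)
    for titles in histories.values():
        # Each viewer casts one vote for every ordered pair of distinct titles.
        for watched, other in permutations(set(titles), 2):
            votes[watched][other] += 1
    return {
        title: [
            name
            # Highest vote count first; ties broken alphabetically.
            for name, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for title, counts in votes.items()
    }


# Hypothetical viewing histories.
histories = {
    "viewer1": ["Alien", "Blade Runner", "Cosmos"],
    "viewer2": ["Alien", "Blade Runner"],
    "viewer3": ["Alien", "Cosmos", "Dune"],
    "viewer4": ["Blade Runner", "Dune"],
}
recs = recommend_by_votes(histories, top_n=2)
print(recs["Alien"])
```

At the scale quoted on the slide (billions of records), the same counting would presumably be distributed, for example as a Hadoop job, rather than run in memory as here.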
Capital Group
Background
• Built on FAST ESP
• Global investment and financial management firm
• Search Technologies built the complete solution
• Intranet search portal serving multiple applications and departments
• Searches prior customer communications, presentations, and legal
documents
• Used across every part of the business
Capital Group
Key Details
• New, highly customised search user interface
• Migration from legacy RetrievalWare system
• Core technologies: JavaServer Faces, WebLogic 9.2, Apache Web
Services API, Apache Commons, embedded Java DB
• Support for Chinese and Japanese
• Customised feeding and document processing
• Data resides in Documentum, Lotus Notes, Oracle
• Full Windows AD-based security
SAIC
Background
• Built on Google GSA
• Large government-focused systems integrator
• Search Technologies provides expert services
• Intranet application
SAIC
Key Details
• Indexing a SharePoint cluster of 50 site collections
• Hundreds of user-managed sub-sites
• Document-level security and NTLM authentication
• XSLT customization to display fields according to document type
• Massive expansion planned
Media & Publishing
Yellowpages.com
Background
• Originally built on FAST ESP; recently migrated to Solr/Lucene
• World's leading Internet Yellow Pages site
• Owned by AT&T
• Search Technologies involved since 2005 providing expert services on
both FAST ESP and Solr/Lucene
Yellowpages.com
Key Details
• Business Listings available for all 50 states
• Massively scalable search clusters in 2 data centers
• ATG-based JSP GUI
• Oracle content updated daily
• Handling over 2,000 queries per second
• Linguistic work (spellings, synonyms)
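The slide mentions the linguistic work only by name. As a rough illustration of what spelling correction and synonym expansion can look like at query time, the sketch below uses Python's standard `difflib` module. The vocabulary, the synonym table, the cutoff value, and both function names are assumptions made for this example; a production Solr or FAST ESP deployment would normally handle this with engine-side dictionaries instead.

```python
import difflib

# Hypothetical category vocabulary and synonym table for a listings site.
VOCABULARY = ["attorney", "dentist", "lawyer", "plumber", "restaurant"]
SYNONYMS = {"lawyer": ["attorney"], "attorney": ["lawyer"]}


def correct_spelling(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the term unchanged if none is close."""
    term = term.lower()
    if term in vocabulary:
        return term
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def expand_query(query):
    """Correct each query term, then attach its synonyms as alternatives."""
    expanded = []
    for raw in query.split():
        term = correct_spelling(raw)
        expanded.append([term] + SYNONYMS.get(term, []))
    return expanded


print(expand_query("plumer"))
print(expand_query("Lawyer"))
```

Each inner list holds the alternatives for one query term, which a search layer could then combine with OR before sending the query to the engine.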
GPO.gov
Background
• Built on FAST ESP & Documentum
• The publishing arm of the Federal Government
• Search Technologies is the main contractor for search, including
architecture, design, development and implementation
• The Federal Digital System www.gpo.gov/fdsys provides public access
to information provided by Congress and other Federal agencies
“The GPO and the Office of the Federal Register accomplished a minor miracle in warp speed time” - Ray Mosley, Director of the Federal Register
GPO.gov
Key Details
• 50+ data sources, each with its own legacy, format & purpose, including
US Laws, Congressional Reports, Daily Congressional Records,
Economic Indicators, Reports to the President and the Budget of the US
Government
• Developed a document processing infrastructure to prepare incoming
data sets for indexing
Computer Patent Annuities (CPA)
Background
• Built on FAST ESP
• Leading legal/intellectual property services provider
• Search Technologies is providing the complete solution
• A major new patent search application involving 90 million patents from
100+ authorities around the world
Computer Patent Annuities (CPA)
Key Details
• Data cleansing, normalization & enrichment
• Establishing new relationships between patents
• Fast “similarity searching” requiring highly optimized indexes
• Collaborative tools for patent research teams
• Search-driven BI features in SharePoint
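The slides do not say how the similarity searching is implemented. The sketch below is a minimal illustration of the general idea, assuming documents are compared by the Jaccard overlap of their terms and that an inverted index is used to limit scoring to documents sharing at least one term. The patent ids, texts, and function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return index


def similar(doc_id, docs, index, top_n=2):
    """Rank other documents by Jaccard similarity of their term sets."""
    terms = set(docs[doc_id].lower().split())
    # Only documents sharing at least one term can score above zero,
    # so the inverted index keeps the candidate set small.
    candidates = set().union(*(index[t] for t in terms)) - {doc_id}
    scored = []
    for other in candidates:
        other_terms = set(docs[other].lower().split())
        score = len(terms & other_terms) / len(terms | other_terms)
        scored.append((score, other))
    # Highest score first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 2)) for score, other in scored[:top_n]]


# Hypothetical patent abstracts reduced to a few keywords.
patents = {
    "P1": "rotary engine cooling system",
    "P2": "engine cooling fan assembly",
    "P3": "wireless antenna assembly",
    "P4": "rotary engine cooling pump",
}
index = build_index(patents)
print(similar("P1", patents, index))
```

With 90 million patents the candidate step matters most, which is consistent with the slide's point that fast similarity search depends on highly optimized indexes.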