Posted on 25-Feb-2016


A New Content Processing Framework for Search Applications

Iain Fletcher, ifletcher@searchtechnologies.com

Agenda

• Briefly About Search Technologies
• Key Issues for Enterprise Search
• A New Content Processing Framework for Search Applications
• How do we use it?
• What does it look like?
• Use case example

Search Technologies overview

• The leading IT services company focused on search engines
  • Consulting
  • Implementation
  • Managed services

• Technology independent, working with most of the leading search engines

• 90 staff, 250+ customers

Search Technologies overview

San Diego, CA
San Jose, CR
Herndon, VA
Ascot, UK
Boston, MA
Cincinnati, OH

Executive team

Enterprise search industry experience (years in the search engine industry):

Kamran Khan, President & CEO
18 years: International Sales, VP Sales, Executive

John Steinhauer, VP Technology
16 years: Development Management, Project Management, Executive

Paul Nelson, Chief Architect
22 years: Development, Innovation, Architecting, Dev. Management

Graham Charlesworth, VP Europe
16 years: Business Development, VP Sales, Executive

Phil Lewis, Tech. Director, Europe
19 years: Development, Innovation, Architecting, Project Management

Dennis Tran, VP & Founder
21 years: International Sales, VP Sales

John Back, VP Sales
15 years: Sales, Federal Sales Director

Iain Fletcher, VP Marketing
16 years: International Sales, Product Management, VP Marketing


Enterprise Search - An Indifferent Reputation

• Major surveys show that no progress has been made during the last 10 years

• Searchers are successful in finding what they seek 50% of the time or less
  • 2001, IDC, “Quantifying Enterprise Search”

• More than half cannot find the information they need using their Enterprise search system
  • 2011, MindMetre/SmartLogic, “Mind the Enterprise Search Gap”

Search Fundamentals

Metadata Supports Relevance Ranking

Supported by great metadata!
• Title
• Meta description
• URL
• Inbound links
• Alt tag text
• Etc.

Provided for free by millions of SEO practitioners

Key Issues

• Almost all modern search functions are driven by data structure

• The majority of serious problems in non-trivial search systems are caused by data quality issues

Also...
• “Big Data” and BI from unstructured data will face the same challenges
• Can you trust an analysis if you are unsure of data provenance?

Data quality examples

• The subscription portal caught out by template information

• The Intranet search skewed by a new piece of hardware

• The Intranet search where great quality was the problem!

Key Issues

• Data structure and quality issues are addressed in the indexing pipelines of search engines
  • Cleaning, enriching, normalizing, granularizing...

• It is about process as much as technology
  • And data constantly evolves

• Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency)
  • Some search engines don’t really have one

• We’ve written our own
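The pipeline stages named above can be illustrated with a minimal Java sketch. It is not Aspire's API and not any search engine's built-in pipeline: the map-of-strings document model, the field names `url`, `body` and `fileType`, and the three stage implementations are hypothetical, chosen only to show how independent cleaning, normalizing and enriching steps chain into one indexing pipeline.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

final class IndexingPipelineSketch {

    // Cleaning: replace simple markup tags in the hypothetical "body" field with spaces.
    static final UnaryOperator<Map<String, String>> CLEAN = doc -> {
        Map<String, String> out = new LinkedHashMap<>(doc);
        out.computeIfPresent("body", (k, v) -> v.replaceAll("<[^>]*>", " "));
        return out;
    };

    // Normalizing: collapse runs of whitespace and trim every field value.
    static final UnaryOperator<Map<String, String>> NORMALIZE = doc -> {
        Map<String, String> out = new LinkedHashMap<>();
        doc.forEach((k, v) -> out.put(k, v.replaceAll("\\s+", " ").trim()));
        return out;
    };

    // Enriching: derive a lower-case "fileType" field from the URL's file extension.
    static final UnaryOperator<Map<String, String>> ENRICH = doc -> {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String url = doc.getOrDefault("url", "");
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            out.put("fileType", fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Apply the stages in order; each stage receives the previous stage's output.
    static Map<String, String> run(List<UnaryOperator<Map<String, String>>> stages,
                                   Map<String, String> doc) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("url", "http://intranet/docs/Report.PDF");
        raw.put("body", "<p>Quarterly   <b>report</b></p>");
        System.out.println(run(List.of(CLEAN, NORMALIZE, ENRICH), raw));
    }
}
```

Because each stage returns a new map rather than mutating its input, any stage can be run and inspected on its own, which is the kind of transparency the deck argues built-in pipelines sometimes lack.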


Document Processing Methodology for Search (DPMS)

• The Philosophy
  • Understand the Document Model
  • Understand the User Model
    • Includes business-level requirements
  • Create the Search Engine Model
    • Search = the pivot point between User and Data
  • Document everything

DPMS – The Methodology

[Diagram] The methodology runs in three phases: 1 Assessment, 2 Detailed Analysis, 3 Execution.
• Assessment (Search Technologies Architect and Business Analyst) → Assessment Report: expert assessment and recommendations
• DPMS Analysis (Knowledge Engineer, Business Analyst, etc.) → DMDs → Review (Architect, Domain Experts, Peers)
• Implementation (Developer) in Aspire → Validation: validate DMDs against the search engine

DPMS – The Implementation

Introducing “Aspire”

• Think of it as a stand-alone indexing pipeline with a framework + component architecture

• Framework built for scalability, performance and flexibility – designed to use cloud elasticity

• Components built to be autonomous and transparent

Technology Suite

• 100% Java
• OSGi™ (see www.osgi.org)
  • The Dynamic Module System for Java™
• Apache Felix
  • Open source implementation of OSGi
• Jetty
  • Embedded HTTP server
• Maven & Maven Repositories
  • For component deployment

Component Configuration

• Any number of document processing pipelines can be used in an application
  • Disparate data sources will need different treatment
  • Components can be shared where appropriate
  • Configurations are easy to change
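Per-source pipelines that share components can be pictured as plain configuration data. This is a minimal Java sketch, not Aspire's configuration format: the source names `intranet` and `fileshare`, the `Component` interface and the three components are hypothetical. `TRIM` is a single instance reused by both pipelines, and changing which components a source uses means editing only the `PIPELINES` map.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

final class PipelineConfigSketch {

    // Hypothetical component contract: transform one document (here a plain string).
    interface Component extends UnaryOperator<String> {}

    // Shared component: one instance, referenced by more than one pipeline.
    static final Component TRIM = String::trim;

    // Source-specific components (hypothetical examples).
    static final Component STRIP_TEMPLATE_FOOTER = s -> s.replace("Copyright Example Corp", "");
    static final Component LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Configuration: each data source gets its own ordered list of components.
    static final Map<String, List<Component>> PIPELINES = Map.of(
            "intranet", List.of(STRIP_TEMPLATE_FOOTER, TRIM),
            "fileshare", List.of(LOWER_CASE, TRIM));

    // Route a document through the pipeline configured for its source.
    static String process(String source, String doc) {
        List<Component> pipeline = PIPELINES.get(source);
        if (pipeline == null) {
            throw new IllegalArgumentException("No pipeline configured for " + source);
        }
        String current = doc;
        for (Component component : pipeline) {
            current = component.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(process("intranet", "  Annual plan Copyright Example Corp "));
        System.out.println(process("fileshare", " Budget.XLS "));
    }
}
```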

Component autonomy

• Components communicate via XML
  • Each component has a known and transparent input and output, and can be tested in isolation
  • This simplifies problem diagnosis, promotes transparency and controls cost of ownership
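Because each component's input and output is XML, a component can be exercised on its own with a hand-written input document. A minimal sketch, assuming a hypothetical `<doc><title>` document shape and a hypothetical `titleNormalized` output element; the deck does not show Aspire's actual XML schema or component API. It uses only the JDK's standard DOM (`javax.xml.parsers`, `org.w3c.dom`) and `javax.xml.transform` classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlComponentSketch {

    // Hypothetical component: reads the first <title> and appends a
    // whitespace-normalized copy as <titleNormalized>. XML in, XML out.
    static Document process(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        if (titles.getLength() > 0) {
            String normalized = titles.item(0).getTextContent().replaceAll("\\s+", " ").trim();
            Element out = doc.createElement("titleNormalized");
            out.setTextContent(normalized);
            doc.getDocumentElement().appendChild(out);
        }
        return doc;
    }

    // Test helper: build a DOM from an XML string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Test helper: serialize a DOM back to a string, without the XML declaration.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document in = parse("<doc><title>  Annual   Report </title></doc>");
        System.out.println(serialize(process(in)));
    }
}
```

A test for this component needs nothing but an input string and an expected element, which is what makes isolated diagnosis cheap.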

Data Quality Monitoring

• Components have built-in quarantine systems to monitor data quality
  • Content is constantly evolving
  • This provides transparency and enables content issues to be diagnosed and resolved faster
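A quarantine can be as simple as routing documents that fail a quality rule to a side list, together with the reason, instead of silently indexing or dropping them. A minimal sketch under stated assumptions: the two rules (a non-blank `title`, at most 200 characters long) and the class and field names are hypothetical, not taken from Aspire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class QuarantineSketch {

    // A document that failed a quality check, kept together with the reason.
    static final class Quarantined {
        final Map<String, String> doc;
        final String reason;

        Quarantined(Map<String, String> doc, String reason) {
            this.doc = doc;
            this.reason = reason;
        }
    }

    final List<Map<String, String>> accepted = new ArrayList<>();
    final List<Quarantined> quarantine = new ArrayList<>();

    // Hypothetical quality rules; returns null when the document passes.
    static String check(Map<String, String> doc) {
        String title = doc.get("title");
        if (title == null || title.trim().isEmpty()) {
            return "missing title";
        }
        if (title.length() > 200) {
            return "title longer than 200 characters";
        }
        return null;
    }

    // Route each document either onward (accepted) or into quarantine.
    void accept(Map<String, String> doc) {
        String reason = check(doc);
        if (reason == null) {
            accepted.add(doc);
        } else {
            quarantine.add(new Quarantined(doc, reason));
        }
    }

    public static void main(String[] args) {
        QuarantineSketch gate = new QuarantineSketch();
        gate.accept(Map.of("title", "Travel policy"));
        gate.accept(Map.of("body", "no title here"));
        for (Quarantined q : gate.quarantine) {
            System.out.println(q.reason + ": " + q.doc);
        }
    }
}
```

Watching which `reason` values accumulate over time is one way such a quarantine could surface content changes, such as a new page template, before they skew relevance.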

The Component Library

• Search Technologies maintains a library of components
  • Currently there are more than 70
  • Components can be as simple as 3 lines of Groovy script, or as complex as third-party technologies

• Many applications can be addressed using existing components + configuration

Component Upgrading

• Components can be upgraded in-situ from a cloud-based service, without stopping/restarting the system

• Helpful in the maintenance of complex or mission-critical systems
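The deck does not say how Aspire fetches or loads a new component version. The sketch below only illustrates one common way to replace a component's implementation without stopping the pipeline: holding the current version in a `java.util.concurrent.atomic.AtomicReference`, so each document is processed by whichever version is installed when it arrives. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class HotSwapSketch {

    // The current component version, readable and replaceable from any thread.
    private final AtomicReference<UnaryOperator<String>> component;

    HotSwapSketch(UnaryOperator<String> initial) {
        component = new AtomicReference<>(initial);
    }

    // Each call uses whichever version is installed at that moment;
    // a document already in progress finishes on the version it started with.
    String process(String doc) {
        return component.get().apply(doc);
    }

    // Install a new version atomically and return the one it replaced.
    UnaryOperator<String> upgrade(UnaryOperator<String> next) {
        return component.getAndSet(next);
    }

    public static void main(String[] args) {
        HotSwapSketch stage = new HotSwapSketch(s -> s.trim());
        System.out.println(stage.process("  version 1 output  "));
        stage.upgrade(s -> s.trim().toUpperCase());
        System.out.println(stage.process("  version 2 output  "));
    }
}
```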

Component control

• Every component has its own control / status page

A very simple example

Security expansion example

Patent Assignee Name Normalization
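Patent data commonly records the same assignee under several spellings (for example "International Business Machines Corp." and "International Business Machines Corporation"), so normalization maps such variants to one canonical form. The slide itself is an image, so the sketch below is a generic rule-based approach, not the method used in the project described here: accent folding, upper-casing, punctuation removal and stripping a short, hypothetical list of legal-form suffixes.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class AssigneeNormalizerSketch {

    // Hypothetical legal-form suffixes; a production list would be far longer
    // and country-specific.
    private static final List<String> SUFFIXES = List.of(
            "INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "COMPANY",
            "LTD", "LIMITED", "LLC", "GMBH", "AG");

    static String normalize(String name) {
        // Fold accents (decompose, then drop combining marks) and upper-case.
        String s = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toUpperCase(Locale.ROOT);
        // Spell out "&", then turn remaining punctuation into spaces.
        s = s.replace("&", " AND ").replaceAll("[^A-Z0-9 ]", " ");
        List<String> tokens = Arrays.stream(s.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toCollection(ArrayList::new));
        // Drop trailing legal-form suffixes, always keeping at least one token.
        while (tokens.size() > 1 && SUFFIXES.contains(tokens.get(tokens.size() - 1))) {
            tokens.remove(tokens.size() - 1);
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(normalize("International Business Machines Corp."));
        System.out.println(normalize("Procter & Gamble Co."));
        System.out.println(normalize("Procter and Gamble Company"));
    }
}
```

Rules like these only catch formatting variants; abbreviations, mergers and subsidiaries typically need curated mapping tables on top.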

Complexity example

• CPA Global Discover
  • The world’s leading patent research portal
  • 80 million patents from 95 patent offices
  • More than a dozen navigators built
  • Numerous graphical search results display options
  • Whole document comparison features

In Summary

• Many applications today don’t need this level of diligence
  • But as data volumes and dynamism grow, more will

• A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies

• Method. Diligence. Transparency – it’s not rocket science...

• Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years


Thank You!

Iain Fletcher
ifletcher@searchtechnologies.com
http://uk.linkedin.com/in/iainfletcher
