Federated Search Webinar for SLA (Special Libraries Assoc.)

DESCRIPTION

A comprehensive presentation on Federated Search (FS) Technologies including the types of FS, FS Challenges & Benefits, a case study, FS Evaluation Criteria, Examples of FS Solutions, Best Practices and Future Vision of where FS Technologies may go.

TRANSCRIPT

Page 1: Federated Search Webinar for SLA (Special Libraries Assoc.)

September 9, 2009

Federated Search in a Disparate Environment

PREPARED FOR:

SLA Webinar Series

Evidence-Based Practice in Libraries

2040 Corbett Rd

Monkton, Md 21111

Phone: 410.472.4631

Email: [email protected]

Helen L. Mitchell Curtis

Principal, Enterprising Solutions

Page 2: Federated Search Webinar for SLA (Special Libraries Assoc.)

Enterprising Solutions

Biography

Helen L. Mitchell Curtis – Principal, Enterprising Solutions

32+ years at FDA leading one of the largest enterprise search implementations among Civilian Federal Agencies

Develop enterprise-wide search strategies & solutions

Integrate search technologies across IT applications and disparate document repositories

Build governance, management and end user buy-in

Promote collaboration, standards, findability and improved organization of data and document assets

Passion – to help clients to reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience

Page 3: Federated Search Webinar for SLA (Special Libraries Assoc.)

Polling Question

• What is Your Role? (select all that apply, if group participants)

• CIO, Executive Director

• Library Director (Corporate, Gov’t, Academia, Solo)

• Librarian/Information Management Professional

• IT Professional or Consultant

• Project/Product Manager

• Sales/Marketing/Communications

• End User (i.e., Scientist, Researcher, Engineering Professional)

• Federated Search Vendor

• Other

Page 4: Federated Search Webinar for SLA (Special Libraries Assoc.)

Agenda

1. Terms Clarified

2. Types of Federated Search (FS)

3. FS Challenges & Benefits

4. FDA Case Study

5. FS Evaluation Criteria

6. Examples of FS Solutions

7. Live Federated Search Demo

8. Best Practices

9. Future Vision

10. Questions & Answers

Page 5: Federated Search Webinar for SLA (Special Libraries Assoc.)

Clarify Terms

Findability (1)

• Reliable and complete retrieval of content based on user need, i.e., everything relevant is recalled (recall) while simultaneously returning only that content relevant to the user’s focus (precision), thus eliminating the review of irrelevant content by the user.

Enterprise Search (ES) (2)

• Systems…within an organization…seeking information held internally…in a variety of formats and locations, including databases, document management systems, and other repositories. Content is pre-indexed, simultaneously searched, and displayed to authorized users.

Federated Search (FS) (3)

• The process of performing a simultaneous real-time search of multiple diverse and distributed sources from a single search page, with the federated search engine acting as intermediary.

Deep Web (4)

• The set of web sites and their documents that cannot be accessed via crawler-type search engines such as Google. Deep web content typically lives inside of databases, and is accessed through search forms. It is also referred to as the Hidden or Invisible Web.

Connector (5)

• Software written to access a content source; it must know the URL of the source, how to send search commands, the source’s search syntax, & how to process the search results returned from the source.

Sources:

1. Definition by AIIM Market IQ

2. Definition by CMS Watch

3. A Federated Search Primer – Part II

4. Deep Web Technologies

5. Federated Search Report & Toolkit – Jill Hurst-Wahl
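The recall and precision mentioned in the Findability definition can be made concrete with a small calculation. This is a minimal sketch in Python; the function name and the document IDs are hypothetical and only illustrate the two ratios.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs for a single query.
retrieved = ["d1", "d2", "d3", "d4"]   # what the search returned
relevant = ["d2", "d4", "d7"]          # what the user actually needed
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four returned documents are relevant, and two of the three relevant documents were found, so the search is neither fully precise nor fully complete.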

Page 6: Federated Search Webinar for SLA (Special Libraries Assoc.)

Polling Question

Information Accessibility (select all that apply)

1. I can easily find information to do my job

2. Less than 50% of our organization’s info is searchable online

3. More than 50% of our organization's info is searchable online

4. I reference fewer than 5 systems (info sources) in any given week

5. I reference 5 or more systems (info sources) in any given week

Page 7: Federated Search Webinar for SLA (Special Libraries Assoc.)

Findability Issues

AIIM Market IQ Research on Findability (of 528 end users):

50% believe Findability in their organization is “Worse to Much Worse” than their consumer-facing web sites

49% have no formal goal for Enterprise Findability within their organizations

49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming

69% believe less than 50% of their organization's information is searchable online

36% reference five or more systems in any given week

Source: AIIM Market Intelligence, 2008

Page 8: Federated Search Webinar for SLA (Special Libraries Assoc.)

Why Use Federated Search

To increase findability to better accomplish business objectives.

To issue a single query across multiple content sources through a common search interface.

When it is not feasible to re-index all of the content available from large public sites like PubMed.

To increase user awareness of all content sources such as deep web for scientific, technical and business content.

To eliminate using multiple database search protocols & passwords.

When you don’t have the rights to index the content (e.g., subscription sites).

Real-time search: for content constantly being updated & impractical to keep the data as timely as it needs to be.

Page 9: Federated Search Webinar for SLA (Special Libraries Assoc.)

Federated Search Sources (examples)

Reason Corporate Academic Gov’t Public Library

Subscription Databases X X X X

Internal or External Repositories X X

Library Catalog(s) X X X X

News X X

Digitized Material X X X

Blogs & Wikis X X X

Intranet/Internet Sites X X

Industry Specific Sources X

DBs available to customers X X

Historical Collections X

Page 10: Federated Search Webinar for SLA (Special Libraries Assoc.)

Typical Non-Federated Search

Courtesy of MuseGlobal, Inc.

Page 11: Federated Search Webinar for SLA (Special Libraries Assoc.)

Typical Federated Search

Courtesy of MuseGlobal, Inc.

Page 12: Federated Search Webinar for SLA (Special Libraries Assoc.)

Federated ‘Master Index’ Search

Index content from multiple data sources into a single master index

Queries & results come from that one master index

Many Enterprise Search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)

Source: New Idea Engineering, Inc.
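The master index approach can be sketched in a few lines. This is a minimal illustration, assuming a simple inverted index with AND semantics; the source names, documents and function names are hypothetical, and commercial products use far richer indexing.

```python
from collections import defaultdict


def build_master_index(sources):
    """Index every document from every source into one inverted index."""
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index


def search(index, query):
    """Return the documents that contain all query terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical repositories; in practice connectors would pull this content.
sources = {
    "doc_mgmt": {"a1": "drug safety review", "a2": "inspection report"},
    "intranet": {"b1": "drug labeling guidance", "b2": "safety inspection checklist"},
}
master = build_master_index(sources)
print(sorted(search(master, "drug")))
print(sorted(search(master, "safety inspection")))
```

Queries never touch the original repositories here; they are answered entirely from the one index, which is why response times and relevancy ranking can be uniform.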

Page 13: Federated Search Webinar for SLA (Special Libraries Assoc.)

Federated ‘Data Silos’ Search

Source: New Idea Engineering, Inc.

‘Search Federator’ processes queries for each data source silo

Transforms search terms to match each content source’s requirements

Submits query to each of the sources simultaneously

Merges each source’s results together - single look & feel

Maintains no indices of its own, relies on linked systems’ capabilities
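The federator steps above (transform, submit simultaneously, merge) can be sketched as follows. This is a minimal sketch under stated assumptions: both connectors are hypothetical stand-ins that return canned results, where a real connector would call its source over the network.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical connectors: each translates the query terms into its own
# source's syntax and returns a list of result records.
def catalog_connector(terms):
    query = " AND ".join(terms)   # this source expects boolean AND
    return [{"title": f"Catalog hit for [{query}]", "source": "catalog"}]


def news_connector(terms):
    query = "+".join(terms)       # this source expects '+'-joined terms
    return [{"title": f"News hit for [{query}]", "source": "news"}]


CONNECTORS = [catalog_connector, news_connector]


def federated_search(user_query):
    terms = user_query.split()
    # Submit the query to every source at the same time.
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_sets = list(pool.map(lambda connector: connector(terms), CONNECTORS))
    # Merge all result sets into one list with a single record shape.
    return [hit for results in result_sets for hit in results]


for hit in federated_search("federated search"):
    print(f"{hit['source']:8} {hit['title']}")
```

The federator keeps no index of its own; everything it returns comes from the sources at query time.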

Page 14: Federated Search Webinar for SLA (Special Libraries Assoc.)

Surface vs. Deep Web Search

Popular search engines (Google, Yahoo…) “crawl” the surface web

FS can drill down to the deep web, where specialized content (e.g., scientific and technical databases) resides

Deep Web FS Examples:

• www.completeplanet.com - 70,000+ searchable DBs & specialty search engines

• www.science.gov - federates U.S. federal agency science info

• http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) Digital Collections & Content, w/descriptions of digital resources developed by IMLS grantees

Source: Juanico-Environmental Consultants, Ltd.

Page 15: Federated Search Webinar for SLA (Special Libraries Assoc.)

Vertical Search Engine

Closely related to Deep Web – searches a particular niche, e.g., a specific industry, topic, or type of content (scientific research, travel, movies, images, blogs)

Example: www.vetseek.info is a search engine focusing on veterinary science and related topics

Page 16: Federated Search Webinar for SLA (Special Libraries Assoc.)

Polling Question

Federated Search Solutions (select one)

1. We are currently conducting an evaluation to procure a Federated Search Product

2. We currently have a Federated Search Solution installed that satisfies our requirements

3. We have a Federated Search Solution but are considering replacing it or enhancing its capabilities & features

Page 17: Federated Search Webinar for SLA (Special Libraries Assoc.)

Challenges

Authentication
• Showing each record’s branding and copyright information
• Licensed or subscription databases

True De-duplication
• Virtually impossible because DBs return 10-20 results at a time
• Vendors usually just de-dupe the first results set returned

Security
• Mapping user credentials and access rights to each repository security model

Speed
• Limited by slowest search engine’s performance
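De-duplicating the first results set, as described under True De-duplication, might look like the sketch below. The matching rule (normalized title plus year) and the record fields are assumptions for illustration; vendors may use other keys.

```python
import re


def dedupe_key(record):
    """Build a comparison key from a normalized title and the year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record.get("year"))


def dedupe(records):
    """Keep the first record seen for each key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical first pages of results from two databases.
first_pages = [
    {"title": "Federated Search: A Primer", "year": 2008, "source": "db_a"},
    {"title": "federated search a primer", "year": 2008, "source": "db_b"},
    {"title": "Deep Web Basics", "year": 2007, "source": "db_b"},
]
for record in dedupe(first_pages):
    print(record["source"], record["title"])
```

Only records already fetched can be compared, so duplicates sitting on a source's later result pages go undetected.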

Page 18: Federated Search Webinar for SLA (Special Libraries Assoc.)

Challenges (continued)

Lack of data standardization
• Each source has a unique access method & needs translation
• Metadata mapping between FSS and underlying systems

Access methods to sources may change
• Requires an interface rewrite or modification

Rules for error handling
• Ex. Query term not available: exclude the query, the repository, or proceed without the term?
• Ex. Timeouts or connection problems

Complex searches usually not available
• Fielded searches

Known items, e.g., article name
• Best to search the database directly
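One possible error-handling rule for timeouts and connection problems is to skip the failing repository and report it, rather than fail the whole search. The sketch below assumes that policy; the three sources and the 0.2-second budget are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


# Hypothetical sources: one fast, one slow, one unreachable.
def fast_source(query):
    return [f"fast result for {query}"]


def slow_source(query):
    time.sleep(0.5)
    return [f"slow result for {query}"]


def broken_source(query):
    raise ConnectionError("source unreachable")


def search_with_policy(query, sources, timeout_s=0.2):
    """Collect results within a shared time budget; skip failing sources."""
    results, skipped = [], {}
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
    deadline = time.monotonic() + timeout_s
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results.extend(future.result(timeout=remaining))
        except FutureTimeout:
            skipped[name] = "timeout"
        except Exception as exc:
            skipped[name] = f"error: {exc}"
    pool.shutdown(wait=False)  # do not block the user on slow sources
    return results, skipped


results, skipped = search_with_policy(
    "drug safety",
    {"fast": fast_source, "slow": slow_source, "broken": broken_source},
)
print(results)
print(skipped)
```

Returning the skipped sources alongside the results lets the interface tell the user which repositories were not searched.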

Page 19: Federated Search Webinar for SLA (Special Libraries Assoc.)

Challenges (continued)

Relevancy scores
• Can’t identify a single relevancy ranking model
• Each repository’s relevancy rankings refer only to its own results
• May not be useful when comparing the results with those from another system

Access to content stored in a variety of places
• Results page may not let user obtain identified documents
• This may involve a built-in viewer or invoking the owning product’s interface

Combining navigators from each result set
• E.g., faceted search, taxonomies and auto-generated clusters

Selecting the right FS engine
• Depends on business goals and type of content sources: structured vs. unstructured, licensed/subscriptions
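To illustrate the relevancy-score challenge: two sources may score on entirely different scales, so raw scores cannot be compared. One crude workaround is to rescale each source's scores to 0..1 before merging. The sketch below assumes that min-max approach and uses made-up scores; it does not make the underlying ranking models equivalent.

```python
def normalize(results):
    """Min-max scale one source's scores to the range 0..1."""
    scores = [r["score"] for r in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [
        {**r, "norm": (r["score"] - lo) / span if span else 1.0}
        for r in results
    ]


def merge_ranked(result_sets):
    """Normalize each non-empty result set, then sort all hits together."""
    merged = [r for rs in result_sets if rs for r in normalize(rs)]
    return sorted(merged, key=lambda r: r["norm"], reverse=True)


# Hypothetical scores: source A scores 0..1000, source B scores 0..1.
source_a = [
    {"id": "a1", "score": 980},
    {"id": "a2", "score": 400},
    {"id": "a3", "score": 120},
]
source_b = [{"id": "b1", "score": 0.92}, {"id": "b2", "score": 0.55}]

for hit in merge_ranked([source_a, source_b]):
    print(hit["id"], round(hit["norm"], 2))
```

Each source's top hit ends up with the same normalized score, which shows the limit of the method: it says nothing about which source's best result is actually more relevant.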

Page 20: Federated Search Webinar for SLA (Special Libraries Assoc.)

Benefits

Single master index
• Quicker response times
• No need to access original data sources
• Relevancy algorithms applied uniformly
• Dynamic navigators are available for all documents

Time savings
• Searches many sources at one time
• Combines results into a single results page

Quality of results
• Client selects the sources to search

Minimum impact on the data silos
• Only accessed when a user performs a query
• Eliminates the added load of crawling/indexing the data source

Page 21: Federated Search Webinar for SLA (Special Libraries Assoc.)

Benefits (continued)

Improve productivity
• Reduces number of searches executed to find relevant results
• Save, reuse, schedule, and share effective search queries

Leverage security controls at queried source
• Access repositories that are secured against crawls but can be accessed by search queries

Reduce costs
• No additional capacity requirements for a content index since the content is not crawled by the search server

Most current content
• Real-time searches - as soon as the source is updated, the info is available to the searcher on the very next query

Increase awareness
• Identify most relevant sources to search based on # of results each source produced

Page 22: Federated Search Webinar for SLA (Special Libraries Assoc.)

FDA Case Study Success (Federated ‘Master Index’ Search System)

Action: Started small with high ‘pain points’.
Result: Increased productivity & popularity.

Action: Modified business processes.
Result: Standardized nomenclature improved efficiencies. Users across organization could find content in silos. Produced more timely & QUALITY work products.

Action: Indexed structured & unstructured content with document level security.
Result: Grew from 1 repository of 500 docs to 50 with 30 million docs. Accessed on ‘need to know’ basis.

Action: Introduced standardized search web services into applications.
Result: Reduced development time & costs. Increased mgmt & user acceptance. Integrated in more applications.

Action: Increased user awareness with training, newsletters & meetings.
Result: Used more & content added. Search requirements now captured at BEGINNING of project development.

Page 23: Federated Search Webinar for SLA (Special Libraries Assoc.)

Evaluation Criteria Overview

Identify Goals

Create an Effective Search Strategy

Collect Business Requirements

Conduct needs assessment

Work Closely with User Community

Page 24: Federated Search Webinar for SLA (Special Libraries Assoc.)

Evaluation Criteria Overview (continued)

Define Features and Functions
• Eliminate emotional decisions re: product, company or others using the product

High Precision
• Return content relevant to user’s focus

High Recall
• Recall everything relevant to user’s need

Thoroughly Research Products, Users & Product Reviewers

Page 25: Federated Search Webinar for SLA (Special Libraries Assoc.)

Sample Evaluation Criteria

Rating Criteria             Importance   Product #1   Product #1       Product #2   Product #2
                            (Rank 1-5)   Score        Weighted Score   Score        Weighted Score
                                         (0-100)      (Rank x Score)   (0-100)      (Rank x Score)
Ease of Use                 5            85           425              70           350
Ability to Customize UI     1            80           80               65           65
Speed                       5            90           450              85           425
De-duplication              4            75           300              75           300
Clustering                  4            85           340              80           320
Help Functionality          3            70           210              0            0
Alerts                      4            90           360              50           200
# of Searchable Sources     3            90           270              80           240
Save Selections/Citations   2            85           170              0            0
Security                    4            90           360              85           340
Product Cost                5            75           375              85           425
Vendor Credibility          4            95           380              85           340
Total Weighted Score                     1010         3720             760          3005

-Courtesy of Federated Search Report & Tool Kit
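The weighted scoring in the table (Rank x Score, summed per product) is easy to reproduce. A minimal sketch using the table's own figures; the variable and function names are illustrative only.

```python
# (criterion, importance rank 1-5, Product #1 score, Product #2 score)
criteria = [
    ("Ease of Use", 5, 85, 70),
    ("Ability to Customize UI", 1, 80, 65),
    ("Speed", 5, 90, 85),
    ("De-duplication", 4, 75, 75),
    ("Clustering", 4, 85, 80),
    ("Help Functionality", 3, 70, 0),
    ("Alerts", 4, 90, 50),
    ("# of Searchable Sources", 3, 90, 80),
    ("Save Selections/Citations", 2, 85, 0),
    ("Security", 4, 90, 85),
    ("Product Cost", 5, 75, 85),
    ("Vendor Credibility", 4, 95, 85),
]


def weighted_total(rows, score_column):
    """Sum rank x score over all criteria for one product column."""
    return sum(row[1] * row[score_column] for row in rows)


product1_total = weighted_total(criteria, 2)
product2_total = weighted_total(criteria, 3)
print("Product #1:", product1_total)   # 3720, matching the table
print("Product #2:", product2_total)   # 3005, matching the table
```

Because importance multiplies each score, a product that lacks a highly ranked feature loses far more than one that lacks a low-ranked one.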

Page 26: Federated Search Webinar for SLA (Special Libraries Assoc.)

FSS Example (uses FAST ESP – Vertical Search)

Features of Interest

Page 27: Federated Search Webinar for SLA (Special Libraries Assoc.)

FSS Example (uses MS & Vivisimo)

Features of Interest

Page 28: Federated Search Webinar for SLA (Special Libraries Assoc.)

FSS Example (uses Deep Web Technologies)

Features of Interest

Page 29: Federated Search Webinar for SLA (Special Libraries Assoc.)

FSS Example (uses Webfeat)

Features of Interest

Page 30: Federated Search Webinar for SLA (Special Libraries Assoc.)

Digital Library FSS Example http://www.calisphere.universityofcalifornia.edu/

Features of Interest

Page 31: Federated Search Webinar for SLA (Special Libraries Assoc.)

Digital Library FSS Example http://www.calisphere.universityofcalifornia.edu

Page 32: Federated Search Webinar for SLA (Special Libraries Assoc.)

FSS Example (LibraryFind® developed by Oregon State Univ Libraries)

Features of Interest

Page 33: Federated Search Webinar for SLA (Special Libraries Assoc.)

Semantic Federated Search (prototype by Collexis & Deep Web Technologies)

SOURCES:

• PubMed

• NCI = Nat’l Cancer Inst

• DTIC = Defense Tech. Info Ctr

• PMC = PubMed Central

• ScrDOEIB = DOE Info Bridge

• Eurekalert = Science News

THESAURI Used:

• MeSH

• DTIC = Defense Tech. Info Ctr

Deep Web Technologies (a federated search provider) and Collexis (a developer of semantic search & knowledge discovery solutions) teamed up to deliver the world’s first semantic federated search.

How does semantic federated search work?

• All results from your initial query are processed through one or more thesauri (e.g., MeSH & DTIC).

• The system then returns terms that are found both in the top results and in the thesauri.
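The two steps just described can be approximated with a toy example: scan the top results for thesaurus terms and count how often each appears. This is only a sketch under simple assumptions (substring matching, a four-term hypothetical thesaurus); it is not how the Collexis prototype is implemented.

```python
from collections import Counter

# Hypothetical mini-thesaurus; a real one (e.g., MeSH) is far larger
# and hierarchically organized.
THESAURUS = {"neoplasms", "memory", "mental recall", "cognition"}


def suggest_terms(top_results, thesaurus, limit=5):
    """Count thesaurus terms that also occur in the top results."""
    counts = Counter()
    for text in top_results:
        lowered = text.lower()
        for term in sorted(thesaurus):
            if term in lowered:
                counts[term] += 1
    return counts.most_common(limit)


# Hypothetical titles of the top results for an initial query.
top_results = [
    "Mental recall and memory in older adults",
    "Memory loss after chemotherapy for neoplasms",
    "Cognition and memory training",
]
for term, evidence in suggest_terms(top_results, THESAURUS):
    print(f"{term:14} {'#' * evidence}")
```

The count per term plays the role of the evidence bar in the screenshots: the more top results mention a thesaurus term, the stronger the suggestion.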

Page 34: Federated Search Webinar for SLA (Special Libraries Assoc.)

Collexis & Deep Web Technologies (Search Results – screenshot 1)

2429 hits

Semantic terms. Unlike clustering, which simply lumps together words that are frequently found near each other, these terms are being suggested from an expert-developed thesaurus (taxonomy) in which terms are meaningfully & consistently organized.

The longer the blue bar, the more semantic evidence found for that term.

Page 35: Federated Search Webinar for SLA (Special Libraries Assoc.)

Collexis & Deep Web Technologies (Search Results – screenshot 2)

• Thesaurus-based search will consistently suggest terms in the same organized way.

• Clustering changes the way it organizes suggestions with every query.

• Clustering tends to be useful for very broad, general or unpredictable content.

• Clicking on the term “Mental Recall” from the prior screen added the term to the search and reduced relevant hits to 3; the terms suggested are organized.

*Thesaurus-based semantic search tends to be better when you are working consistently in knowledge domains, such as medicine, physics or electronics.

Page 36: Federated Search Webinar for SLA (Special Libraries Assoc.)

Best Practices

Strategically plan how to deliver your mission and just DO IT!

Do proof of concept – demos can be deceiving

Establish common set of standards & governance model

Measure results by establishing key performance indicators

Leverage lessons learned to reduce project cycles, increase trust and empower communities

Page 37: Federated Search Webinar for SLA (Special Libraries Assoc.)

Future Vision

Personalized Search
• A simple, persistent box on a user’s browser, cell, or entertainment screen that initiates a search based on what the user was doing, their previous keystrokes, & perhaps historical data.

Better Quality of Search Results
• Number of results retrieved, Relevance Ranking, De-Duplication

Enterprise Mashups
• Combine real-time searching with social networking tools, maps, etc.

Users build the index by their searching
• Know Web pages people display, what’s on them & what apps are showing up on users’ computers

Page 38: Federated Search Webinar for SLA (Special Libraries Assoc.)

Future Vision (continued)

Query analysis & predictive modeling on the fly
• Business users expect to access info behind company firewalls & from the larger web world using the same tools and consistency

Improved Navigators, Facets, Clustering
• Filter result sets dynamically for more relevant results

Web of Interconnected Data
• Automate analysis of database structures and cross-reference results. Ex. - Health site cross-references data from pharmaceutical companies with the latest findings from medical researchers

Visualization Technologies
• Enable extreme-scale knowledge discovery

Page 39: Federated Search Webinar for SLA (Special Libraries Assoc.)

Resources

1. Great resource for many Federated Search topics: www.federatedsearchblog.com – Author: Sol Lederman

2. Open Source & commercial search components & tools list: http://tinyurl.com/l3w8of

3. Federated Search Vendors: http://tinyurl.com/92s8qv

4. Deep Web Databases: http://tinyurl.com/yam3sw

5. Deep Web resources: http://www.internettutorials.net/deepweb.asp

6. Digital Image Resources on the Deep Web: http://tinyurl.com/46vcqp

7. Info on Vertical Search Engines: http://tinyurl.com/lpcufw

8. 50 Niche Search Engines: http://tinyurl.com/lukxwx

9. Library of Congress FS Portal Products/Vendors list: http://tinyurl.com/l6mdy8

10. Resources to Research & Mine the Deep Web: http://tinyurl.com/6g5768

Page 40: Federated Search Webinar for SLA (Special Libraries Assoc.)

References

1) “What’s in a Name: Federated Search” – Miles Kehoe, New Idea Engineering, Inc., Vol. 4 No. 4, 8/07

2) “Federated Search Engine Article” - Online (Weston, Conn.) 28 no2 16-19 Mr/Ap 2004 (Reprint of article by Donna Fryer www.SearchitRight.com )

3) “Growing Up With Federated Search” - by Walt Warnick, OSTI

4) “Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3” – Walt Warnick, OSTI

5) “Vertical Search Engines & the Deep Web” - Laura B. Cohen http://www.internettutorials.net/

6) Blog: www.federatedsearchblog.com – by Sol Lederman

7) “Exploring a ‘Deep Web’ that Google can’t Grasp” - NYT 2-23-09 http://tinyurl.com/mvt42f

8) “Federated Search Primer, Part I-III” – by Sol Lederman

9) www.searchdoneright.com – by Vivisimo –Raoul – CEO & Cofounder

10) “Enterprise Search Grows Up” - Podcast from BizTalk

11) “Federation: Big Need, Still a Challenge” – Stephen Arnold, 4/25/08

12) “The Future of Federated Search or What Will the World Look Like in 10 Years” – Rich Turner

13) “Federated Search Report & Tool Kit” – Jill Hurst-Wahl, 10/08, © Free Pint Limited 2008

Page 41: Federated Search Webinar for SLA (Special Libraries Assoc.)

QUESTIONS

Page 42: Federated Search Webinar for SLA (Special Libraries Assoc.)

THANK YOU!

Helen L. Mitchell Curtis

Principal

Enterprising Solutions

[email protected]

410-472-4631 (w) 410-259-7766 (m)

Page 43: Federated Search Webinar for SLA (Special Libraries Assoc.)

Enterprising Solutions

“Results Driven…Exceeding Expectations”