Federated search engines: an introduction (Nieuwenhuysen, VVBAD, Brussels, January 2010)

Posted on 13-Dec-2014


DESCRIPTION

Presentation by Paul Nieuwenhuysen on federated search engines at the VVBAD study day "Federated Search Engines", 22 January 2010, at the Erasmushogeschool Brussel.

TRANSCRIPT

1

Federated search engines: an introduction

Paul.Nieuwenhuysen @ vub.ac.be

Prepared to support the opening lecture at the 1-day conference about "Federated search engines", organized by VVBAD, section School Libraries, in Brussels, Belgium, on January 22, 2010.

2

These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio) and also from the WWW site of the conference organisers, VVBAD.

3

Contents / summary / structure / overview of this presentation

1. Introduction and definition
2. Problem statement
3. Federated search engines as a partial solution
4. Meaning and confusion
5. Advantages / benefits ☺
6. Difficulties / limitations
7. Implementation
8. Putting federated searching in a wider context
9. Some information sources about federated searching

4

Federated searching

Introduction and definition

5

Introduction: scattering of sources

• Users want to exploit information sources fast and effectively.

• This is hindered by the fact that the digital, electronic information sources that may contain relevant information are scattered over numerous computers, both on the intranet of the user's organization AND on the Internet and the WWW.

6

Introduction: scattering of sources

• In other words: integration / aggregation is still far from perfect.

7

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.

8

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
2. They offer different user interfaces in the retrieval phase, which is confusing.

9

Introduction: scattering of sources: difficulties

• Using many information retrieval systems costs time:
3. They deliver the retrieved items in various data formats, and they display them in different ways on the computer screen.

10

Introduction: scattering of sources: difficulties

Small = BEAUTIFUL

11

Introduction: scattering of sources: difficulties

Small = BEAUTIFUL ?

12

Introduction: scattering of sources: difficulties

Small = BEAUTIFUL ?

13

Federated searching

Problem statement

14

Problem statements

Which methods have been developed and applied to cope with this reality?

15

Federated searching

Federated search engines as a partial solution

16

Method 1: Merging = aggregating into a searchable database

[Diagram: users query one search engine, which searches an aggregated database built by merging several databases or web sites.]

17

Method 2: Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

18

Both methods offer benefits to the users

+ Saves the users the time that they would need to send queries to various servers or to browse through various systems.

19

Both methods offer benefits to the users

+ The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database!

20

Both methods offer benefits to the users

+ The system offers a uniform / consistent display of results in the output phase.

21

Method 1: Merging = aggregating into a searchable database

[Diagram: users query one search engine, which searches an aggregated database built by merging several databases or web sites.]

22

Method 1: Merging = aggregating into a searchable database

[Diagram: users query one search engine, which searches an aggregated database built by merging several databases or web sites.]

23

Method 2: Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

24

Federated searching: definition

An ideal federated search system
1. allows a user to formulate a query,
2. adapts/transforms this query, so that it can be sent with the proper syntax to each search engine of a chosen set/group of disparate databases,
3. broadcasts this query to those databases,
4. collects the results from each database,
5. (perhaps) consolidates these results into 1 result set,
6. (perhaps) detects and removes duplicate items,
7. shows the final results to the user in a unified format,
8. allows the user to sort the results by various criteria.
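The eight steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any product mentioned in this presentation: the two targets are hypothetical in-memory catalogues, each with its own invented query syntax (`word AND word` versus `+word +word`), and duplicate detection (step 6) is reduced to comparing case-folded titles.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Record:
    title: str
    year: int
    source: str


@dataclass
class Target:
    name: str
    translate: Callable[[List[str]], str]   # step 2: user words -> this target's syntax
    search: Callable[[str], List[Record]]   # steps 3-4: run the query, return records


def make_target(name: str, catalogue: List[Tuple[str, int]], joiner: str, prefix: str = "") -> Target:
    """Hypothetical in-memory target with its own query syntax."""
    def translate(words: List[str]) -> str:
        return joiner.join(prefix + w for w in words)

    def search(query: str) -> List[Record]:
        terms = [t[len(prefix):].lower() for t in query.split(joiner)]
        return [Record(title, year, name) for title, year in catalogue
                if all(t in title.lower() for t in terms)]

    return Target(name, translate, search)


def federated_search(words: List[str], targets: List[Target], sort_key: str = "year") -> List[Record]:
    # Step 2: adapt the query to each target's syntax.
    queries = [t.translate(words) for t in targets]
    # Steps 3-4: broadcast in parallel; map() keeps the target order.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        result_lists = list(pool.map(lambda t, q: t.search(q), targets, queries))
    # Step 5: consolidate into one result set.
    merged = [r for results in result_lists for r in results]
    # Step 6: keep only the first record for each normalised title.
    seen, unique = set(), []
    for r in merged:
        key = r.title.casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Steps 7-8: one uniform list, sorted on the chosen field.
    return sorted(unique, key=lambda r: getattr(r, sort_key))


cat_a = make_target("Catalogue A", [("Federated search 101", 2008),
                                    ("Challenges for federated searching", 2007)], " AND ")
cat_b = make_target("Catalogue B", [("Federated Search 101", 2008),
                                    ("The right solution: federated search tools", 2003)], " ", "+")
for r in federated_search(["federated", "search"], [cat_a, cat_b]):
    print(r.year, r.title, "|", r.source)
```

Real targets would be remote servers reached over a network protocol, and real deduplication is much harder than exact title matching, as the later slides on difficulties explain.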

25

Federated searching: approach

• This type of computer system helps to integrate access to distributed databases into one search action, as far as possible.

• The catalogue of local library holdings can be one of the target databases.

26

Federated searching: scheme

[Diagram: end users ☺ → portal for meta-searching (= federated searching = cross-database searching) → information sources]

27

Federated searching through scattered databases: why?

The perfect trip:
1. A cheap and nice flight ☺

28

Federated searching: application: finding a suitable flight

Example: http://CheapTickets.com/ for the USA

29

Federated searching through scattered databases: why?

The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel ☺

30

Federated searching: application: finding a hotel room in some city

Example

31

Federated searching through scattered databases: why?

The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum

32

Federated searching: searching in a museum

Example

33

Federated searching through scattered databases: why?

The perfect trip:
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)

34

Federated searching: searching in a library

Example

35

Federated searching: integrating access

[Diagram: a meta-searching system integrates access to the local library catalog database(s), the catalog database(s) of other libraries, databases (full-text or bibliographic), publishers, journals, articles, the intranet, and WWW search engines.]

36

Federated searching: produce - distribute - implement

Producers = developers = creators

Implementers = users (for instance a library)

Intermediate sellers = distributors

37

Federated searching: examples of commercial software

Product name; producing company; distributing / selling company (as far as recoverable from the original table):

- Groxis
- 360 Search; Serials Solutions
- Deep Web Technologies; Swets and others
- Vivissimo
- MetaLib; Ex Libris
- V-Spaces; Infor (was GEAC)
- MuseSearch; MuseGlobal; CSA and others
- WebFeat; Serials Solutions
- Infotrieve

38

Federated searching

Meaning and confusion

39

Federated searching: terminology / vocabulary / synonyms

federated searching
= meta-searching = metasearching
= cross-database searching
= multi-database searching
= multi-threaded searching
= one-stop searching
= poly-searching = polysearching
= broadcast searching
= searching through a portal (but the term "portal" is also used with other meanings)

40

“Federated searching”: meaning and confusion

Here and in many other contexts, the term “federated searching” is used as a synonym for “meta-searching”.

41

“Federated searching”: meaning and confusion

However, some use the terms “federated searching” and “meta-searching” with DIFFERENT meanings:
» “Federated searching” as searching through a database that results from merging several databases. So this is certainly NOT equal to “meta-searching”.
» “Federated searching” as meta-searching that is followed by merging (federating) the items retrieved from various databases into only 1 set, ordered in one way or another.

This language problem creates confusion.

42

“Federated searching”: meaning and confusion

• Furthermore: a federated search engine as a software product is NOT the same as a federated searching system implemented as a service that can be available for all on the WWW, to search
» public WWW search engines
» bookshop databases
» library catalogs / holdings
» flight databases
» hotel databases

43

Federated searching

Advantages / benefits

44

Federated searching: benefits for the users

+ The need to know which particular database is suitable for a particular search is reduced, because several ones can be searched in one action.

45

Federated searching: benefits for the users

+ The system can help the user to select appropriate sources.

46

Federated searching: benefits for the users

+ Can make users search and exploit databases that they would never use otherwise, that is, without a federated search system!

47

Federated searching: benefits for the users

+ Useful, relevant, interesting items/references can be found/uncovered from unexpected, unknown, unfamiliar databases! This is mainly beneficial in the case of interdisciplinary subjects/topics.

48

Federated searching: benefits for the users

+ The system can help in the process of authentication and authorization when this involves not just simple recognition of the IP address of the user's client computer, but also user IDs and passwords.

49

Federated searching: benefits for the users

+ The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database!

50

Federated searching: benefits for the users

+ Can make users search and exploit databases that they would never use otherwise, that is, without a federated search system!

51

Federated searching: benefits for the users

+ Useful, relevant, interesting items/references can be found/uncovered from unexpected, unknown, unfamiliar databases! This is mainly beneficial in the case of interdisciplinary subjects/topics.

52

Federated searching: benefits for the users

+ Some systems offer tools to refine the display of the results, for instance
» to dedupe very similar items in the result set,
» to sort the results,
» to rank the results,
» to search within the result set,
» …

53

Federated searching: benefits for the users

+ Some systems offer interesting links from a retrieval result to various related sources or services (such as the full text or a document delivery service), using a link generator based on the OpenURL standard.
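A link generator of this kind typically passes the citation of a retrieved item to a link resolver as an OpenURL. The sketch below builds an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value link for a journal article using only the standard library. The resolver address `https://resolver.example.edu/openurl` is a hypothetical placeholder, and which metadata keys a given resolver actually uses is an assumption to check against its documentation.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base: str, article: dict) -> str:
    """Build an OpenURL 1.0 KEV link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",   # journal article format
    }
    # Copy whichever citation fields the retrieved record actually has.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if key in article:
            params["rft." + key] = article[key]
    return resolver_base + "?" + urlencode(params)


url = make_openurl("https://resolver.example.edu/openurl", {
    "atitle": "Managing the implementation of a federated search tool in an academic library",
    "jtitle": "Library Review", "volume": "58", "issue": "1", "spage": "11", "date": "2009",
})
print(url)
```

The resolver, not the federated search system, then decides which targets (full text, document delivery, catalogue) to offer for that citation.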

54

Federated searching: benefits for the users

+ Some systems check, for each retrieved bibliographic description, whether the corresponding full text is available online, and indicate this to the user immediately, on the fly.
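One way to make this check is to compare each retrieved description with a local record of the library's online holdings. The sketch assumes a hypothetical holdings table keyed by journal name with covered years; real systems match on identifiers such as ISSN and use much richer coverage data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical local knowledgebase: journals for which this library offers
# online full text, with the covered years (None = up to the present).
HOLDINGS: Dict[str, Tuple[int, Optional[int]]] = {
    "Journal A": (1995, None),
    "Journal B": (2000, 2005),
}


def full_text_available(journal: str, year: int, holdings=HOLDINGS) -> bool:
    coverage = holdings.get(journal)
    if coverage is None:          # no online access to this journal at all
        return False
    first, last = coverage
    return first <= year and (last is None or year <= last)


def annotate(results):
    """Add a full_text flag to each retrieved description, on the fly."""
    return [dict(r, full_text=full_text_available(r["journal"], r["year"])) for r in results]


print(annotate([{"title": "An article", "journal": "Journal B", "year": 2008}]))
```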

55

Federated searching: benefits for the users

+ Some systems further process the retrieved results and display them in an interesting way that is not always offered by the searched original systems. For instance:

» Clustering of results according to subject, age, or availability of full text

» Displaying the results in a graphical way
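Clustering on one facet can be sketched as grouping the merged result records by a label computed from each record. The record fields (`subject`, `year`, `full_text`) and the decade-based notion of "age" are assumptions for illustration; commercial systems often cluster on text similarity rather than on fixed fields.

```python
from collections import defaultdict


def cluster(results, facet):
    """Group result records by one facet (a function record -> label),
    largest clusters first."""
    groups = defaultdict(list)
    for r in results:
        groups[facet(r)].append(r)
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))


def by_subject(r):
    return r["subject"]


def by_age(r):
    return f"{r['year'] // 10 * 10}s"          # decade of publication


def by_full_text(r):
    return "full text online" if r["full_text"] else "reference only"


results = [
    {"subject": "libraries", "year": 2004, "full_text": True},
    {"subject": "computing", "year": 2009, "full_text": False},
    {"subject": "libraries", "year": 1998, "full_text": False},
]
for facet in (by_subject, by_age, by_full_text):
    print({label: len(items) for label, items in cluster(results, facet).items()})
```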

56

Federated searching: benefits for the users

So far so good !

57

Federated searching

Difficulties / challenges / problems / limitations

58

Federated searching: difficulties / challenges / problems

- Portal software tries to cope with several difficulties/challenges/problems/pitfalls that hinder the application of the “good idea”. The user does not notice most of these problems and shortcomings, because the results from the various databases are merged by the federated search system.

59

Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

60

Federated searching: difficulties / challenges / problems

- Searching in a target database may be restricted by the federated search engine to a particular field (for example, to words occurring in the title, because this is the default way of searching in that system), while this restriction is not present in other target databases. Furthermore, this is perhaps not explained in the user interface. This may lead to lower recall, which is of course NOT desirable. Even worse, the user is perhaps not aware of this.

61

Federated searching: difficulties / challenges / problems

- How to deduplicate/dedupe/cluster very similar entries/results/items (= near-duplicates) from various target sources? When is similar similar enough? Which entry/result/item should be chosen as the representative of a cluster of similar entries?
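One possible answer to these three questions, sketched under explicit assumptions: two records count as near-duplicates when they have the same year and the Jaccard similarity of their title words is at least a threshold (0.8 here, an arbitrary illustrative value), and the representative of a cluster is the most complete record (most non-empty fields). Both rules are hypothetical choices, not a standard.

```python
import re
from typing import Dict, List, Set


def tokens(title: str) -> Set[str]:
    """Lower-cased words of a title, ignoring punctuation."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def completeness(record: Dict) -> int:
    return sum(1 for value in record.values() if value)


def dedupe(records: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Cluster near-duplicates (similar title AND same year), keep one per cluster."""
    clusters: List[List[Dict]] = []
    for rec in records:
        words = tokens(rec["title"])
        for cl in clusters:
            first = cl[0]
            if rec.get("year") == first.get("year") and jaccard(words, tokens(first["title"])) >= threshold:
                cl.append(rec)
                break
        else:
            clusters.append([rec])
    # Representative of each cluster: the most complete record.
    return [max(cl, key=completeness) for cl in clusters]


records = [
    {"title": "Federated search 101", "year": 2008, "source": "A", "abstract": ""},
    {"title": "Federated Search 101.", "year": 2008, "source": "B", "abstract": "An introduction."},
    {"title": "Federated searching: friend or foe?", "year": 2004, "source": "A", "abstract": ""},
]
for r in dedupe(records):
    print(r["source"], r["title"])
```

Lowering the threshold merges more aggressively and risks hiding genuinely different items, which is exactly the "when is similar similar enough?" trade-off.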

62

Federated searching: difficulties / challenges / problems

- How to provide a useful relevance ranking of search results/entries, even when the target databases are quite different in type and quality, and even when no index is created in advance, just in case, well before the search action, as Google and other Internet search engines do?
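Without a prebuilt index, a federated system can only rank what the targets send back. A minimal sketch of that idea: re-score every returned record on the fly by counting query-word occurrences, weighting a match in the title more than one in the abstract. The field names and weights (3.0 and 1.0) are illustrative assumptions, and this ignores the quality differences between sources that the slide mentions.

```python
import re


def score(record: dict, query_terms: list, title_weight: float = 3.0,
          abstract_weight: float = 1.0) -> float:
    """Score one retrieved record using only the fields it came back with."""
    def count(text):
        words = re.findall(r"\w+", (text or "").lower())
        return sum(words.count(t.lower()) for t in query_terms)
    return title_weight * count(record.get("title")) + abstract_weight * count(record.get("abstract"))


def rank(records, query_terms):
    return sorted(records, key=lambda r: score(r, query_terms), reverse=True)


records = [
    {"title": "Library services", "abstract": "Federated search in practice"},
    {"title": "Federated search 101", "abstract": None},
    {"title": "Opening hours", "abstract": ""},
]
for r in rank(records, ["federated", "search"]):
    print(score(r, ["federated", "search"]), r["title"])
```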

63

Federated searching: difficulties / challenges / problems

- Powerful / sophisticated / refined forms of searching may not be applicable in a federated search. Example: limiting to a particular type of document, such as a therapy in medicine. This may cause a LOSS of time, instead of a gain in time.

64

Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

65

Federated searching: difficulties / challenges / problems

- Differences among target sources in the Internet application protocols that they apply by default for connection/communication and retrieval, such as
» (telnet), HTTP
» proprietary, non-standard protocols
» Z39.50, ISO 23950, SRU, and related protocols that were developed for federated searching!
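With SRU, a search is an ordinary HTTP GET request whose parameters are standardised, which is what makes it attractive for federated searching. The sketch builds an SRU 1.2 `searchRetrieve` URL with a CQL query that joins the user's words with an explicit `and`. The server address `https://sru.example.org/catalogue` is hypothetical, and whether a server supports the `dc.title` index and the `dc` record schema is an assumption to verify in its `explain` response.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, words, index: str = "dc.title", max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve request URL."""
    # CQL query with the Boolean operator made explicit.
    cql = " and ".join(f'{index}="{w}"' for w in words)
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,
        "maximumRecords": max_records,
        "recordSchema": "dc",
    }
    return f"{base_url}?{urlencode(params)}"


print(sru_search_url("https://sru.example.org/catalogue", ["federated", "search"]))
```

The response is XML with the records in the requested schema, so the federated system does not need to parse a different HTML page layout for each target.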

66

Federated searching: difficulties / challenges / problems

- Even when the target is compatible with a suitable set of protocols for standardised retrieval (Z39.50, ISO 23950, SRU…), difficulties can arise due to incomplete implementations (the target may lack features that are supported by the protocol and by the software for federated searching).

67

Federated searching: difficulties / challenges / problems

- When a suitable protocol can NOT be used, so that simple HTTP must be used for the connection to the target source, and when the target source presents its results in simple HTML, then capturing and analysing those results is complicated and difficult for the federated search system, and can be disrupted when the target changes the way it presents its results over time.
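The fragility of this "screen scraping" can be shown with Python's standard `html.parser`. The sketch extracts result titles from a hypothetical results page in which each title sits in an element with `class="result-title"`; that markup is an invented example. When the target renames the class, the same code silently finds nothing.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collect the text of elements whose class attribute equals title_class."""

    def __init__(self, title_class: str = "result-title"):
        super().__init__()
        self.title_class = title_class
        self.titles = []
        self._tag = None     # tag name of the matching element we are inside
        self._depth = 0      # nesting depth of that tag name

    def handle_starttag(self, tag, attrs):
        if self._tag is None and dict(attrs).get("class") == self.title_class:
            self._tag, self._depth = tag, 1
            self.titles.append("")
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self.titles[-1] += data


def scrape_titles(html: str, title_class: str = "result-title"):
    parser = TitleScraper(title_class)
    parser.feed(html)
    return [t.strip() for t in parser.titles]


old_page = ('<li><span class="result-title">Federated search <b>101</b></span></li>'
            '<li><span class="result-title">SRU and metasearch</span></li>')
print(scrape_titles(old_page))
# The target changes its layout: same data, different markup, no titles found.
print(scrape_titles(old_page.replace("result-title", "title")))
```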

68

Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

69

Federated searching: difficulties / challenges / problems

- Various search engines may act in different ways! For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?

How can a federated search engine take these differences into account?
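One possible approach is to keep a capability profile per target and translate the query for each target separately, degrading features that a target lacks and returning a warning that the user interface can show. The profile structure, the truncation symbols and the `field:term` syntax below are hypothetical; each real target has its own syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Capabilities:
    truncation: Optional[str] = None                 # e.g. "*"; None = not supported
    fields: Set[str] = field(default_factory=set)    # fields that can be searched


def translate(term: str, truncate: bool, field_name: Optional[str],
              caps: Capabilities) -> Tuple[str, List[str]]:
    """Build one target's query; degrade unsupported features and say so."""
    warnings: List[str] = []
    if truncate:
        if caps.truncation:
            term += caps.truncation
        else:
            warnings.append("truncation not supported: exact word searched")
    if field_name:
        if field_name in caps.fields:
            term = f"{field_name}:{term}"            # hypothetical field syntax
        else:
            warnings.append(f"no '{field_name}' field: all fields searched")
    return term, warnings


rich = Capabilities("*", {"title", "author"})
plain = Capabilities()
for caps in (rich, plain):
    print(translate("librar", True, "author", caps))
```

Reporting the warnings, rather than silently degrading the query, addresses the earlier concern that the user is perhaps not aware of such restrictions.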

70

Federated searching: difficulties / challenges / problems

- A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may apply the Boolean operator AND to combine all the query words, but it may also use OR. If the federated search system does not handle this well, this may lead to lower recall (with AND) or lower precision (with OR).
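The effect can be simulated with a hypothetical target that applies its own default operator when the query contains none. The same two words return one record under an implicit AND and three under an implicit OR; sending the operator explicitly makes the result independent of the target's default. The tiny document list and the `AND`/`OR` syntax are invented for illustration.

```python
DOCS = [
    "federated search in libraries",
    "search engine optimisation",
    "federated identity management",
]


def run_query(query: str, default_op: str) -> list:
    """Hypothetical target: honours explicit AND/OR, else uses default_op."""
    if " AND " in query:
        op, words = "AND", query.split(" AND ")
    elif " OR " in query:
        op, words = "OR", query.split(" OR ")
    else:
        op, words = default_op, query.split()
    match = all if op == "AND" else any
    return [d for d in DOCS if match(w.lower() in d.split() for w in words)]


print(run_query("federated search", "AND"))       # target with implicit AND
print(run_query("federated search", "OR"))        # target with implicit OR
print(run_query("federated AND search", "OR"))    # explicit operator wins
```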

71

Federated searching: difficulties / challenges / problems

- When a specific target database offers special, non-standard, dedicated retrieval software with special features that let the user exploit the database better than a standard retrieval interface does, then that database can probably not be exploited as well by the federated search system. Searches are reduced to the lowest common denominator.

72

Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

73

Federated searching: difficulties / challenges / problems

- Differences among target sources in the formatting/structuring of their database records in fields hinder
» searching limited to a field (for instance the author field)
» displaying selected fields only (such as the retrieved titles)
» sorting of the displayed records on the contents of a particular selected field (such as author or publication date)
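Sorting on publication date, for example, first requires bringing the differently formatted date fields to one comparable value. A minimal sketch, assuming that a four-digit year between 1500 and 2099 somewhere in the field is the publication year; records without a recognisable year are placed last. Real records can contain other numbers that this simple rule would misread.

```python
import re
from typing import Optional

YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def publication_year(raw: Optional[str]) -> Optional[int]:
    """Extract a four-digit year from a date field in any of several formats."""
    if not raw:
        return None
    m = YEAR.search(raw)
    return int(m.group(1)) if m else None


def sort_by_year(records, newest_first: bool = True):
    dated = [r for r in records if publication_year(r.get("date")) is not None]
    undated = [r for r in records if publication_year(r.get("date")) is None]
    dated.sort(key=lambda r: publication_year(r["date"]), reverse=newest_first)
    return dated + undated     # records without a usable date go last


records = [{"date": "March 2007"}, {"date": "n.d."}, {"date": "2009-03-01"}, {"date": "c2004."}]
print([r["date"] for r in sort_by_year(records)])
```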

74

Federated searching or merging: difficulties / challenges / problems

- In many cases there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as
» classifications
» taxonomies
» thesaurus systems
» ontologies

- This hinders the exploitation of the added value of such metadata.

75

Federated searching: difficulties / challenges / problems

- A user of a federated search system may perhaps incorrectly assume that ALL relevant databases are covered simply in 1 action, or that if a database is not included, then it must not be relevant/important. However, even a federated search system can only search a limited number of databases, so that perhaps some relevant databases are NOT covered.

76

Federated searching: difficulties / challenges / problems

- Students who rely on a federated search system may perhaps not learn about the important subject-specific databases in their field, so that when they have no access anymore to the same federated search system, they still do not know which database may help them in their research and how to use it well.

77

Federated searching: difficulties / challenges / problems

- Some databases are accessible only by a limited number of concurrent/simultaneous users from one organisation, as agreed in the licence and controlled by the authorization software of the database. If such a database were included automatically in all or in many federated searches, then some users who really require access to that particular database might not be able to use it.
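A federated search system could limit this problem by never letting federated searches occupy all licensed seats. The sketch assumes a hypothetical local policy: out of the licensed seats, some are reserved for direct use, and federated searches may only use the rest. If no federated seat is free, the target is skipped instead of waited for. Whether a vendor counts sessions this way is an assumption; this only illustrates the idea with a semaphore.

```python
import threading


class LicensedTarget:
    """Hypothetical policy: federated searches may use only the licensed seats
    that are not reserved for direct users of this database."""

    def __init__(self, name: str, licensed_seats: int, reserved_for_direct_use: int):
        self.name = name
        self._federated_seats = threading.BoundedSemaphore(licensed_seats - reserved_for_direct_use)

    def try_search(self, query, search_fn):
        # Never wait for a seat: if none is free, skip this target (return None),
        # so that the federated result list can report it as skipped.
        if not self._federated_seats.acquire(blocking=False):
            return None
        try:
            return search_fn(query)
        finally:
            self._federated_seats.release()


db = LicensedTarget("Database X", licensed_seats=3, reserved_for_direct_use=2)
print(db.try_search("federated search", lambda q: "results for " + q))
```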

78

Federated searching: difficulties / challenges / problems

- When a database is accessible by an unlimited number of concurrent/simultaneous users from one organisation, and such a database is included automatically in all or in many federated searches from many organisations (even when the searcher has no particular interest in that database), then the retrieval system of that database may be overburdened. This is mainly a concern for information vendors, who must maintain servers with sufficient capacity.

79

Federated searching: difficulties / challenges / problems

- Some databases can NOT be included as a target database in a federated searching engine, because their owners/producers do not allow this. This is a difficulty, because in this way interesting / valuable databases are perhaps not exploited by users who rely on federated searching.

80

Federated searching through scattered databases

[Diagram: users query one federated search engine, which passes the query to the separate search engine of each of several databases.]

81

Federated searching: difficulties / challenges / problems

- Users may be less impressed by a federated searching system than by the simple, common, familiar, famous Internet / WWW search engines, as its response time is in most cases less impressive, due to differences such as:
» the computer hardware used by the systems
» slower distributed searching through several computer systems, versus faster searching through a more centralised database of records compiled in advance
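Since the slowest target determines the response time of a naive federated search, a common mitigation is to query all targets in parallel and show whatever has arrived after a time limit. A sketch with Python's `concurrent.futures`; the two targets and the timing values are hypothetical, and targets that raise an error are simply left out of both returned collections.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def broadcast_with_timeout(targets, query, timeout):
    """Query all targets in parallel; return the results that arrived within
    `timeout` seconds and the names of the targets that were too slow."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False, cancel_futures=True)   # do not block on slow targets
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    too_slow = sorted(futures[f] for f in not_done)
    return results, too_slow


def fast_target(query):
    return [f"{query}: record from a fast server"]


def slow_target(query):
    time.sleep(0.5)                                  # simulated overloaded server
    return [f"{query}: record from a slow server"]


results, too_slow = broadcast_with_timeout(
    {"fast": fast_target, "slow": slow_target}, "metasearch", timeout=0.2)
print(results, too_slow)
```

The trade-off is completeness: the user gets a faster answer, but results from slow targets are missing unless the interface adds them later.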

82

Federated searching: difficulties / challenges / problems

- The evaluation of the quality of each search result from a federated search action may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that offer each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular and more biased, unscientific texts from trade literature.

83

Federated searching

Implementation

84

Federated searching: local or remote hosting

• The federated searching system can be developed and maintained
» on a local computer in-house, or
» on a more distant, external, remote computer; this hosting service is offered by some vendors of software for federated searching (partial outsourcing).

85

Federated searching: local hosting: scheme

[Diagram: end users ☺ → in-house portal for meta-searching (= federated searching = cross-database searching) → information sources ☺]

86

Federated searching: remote hosting: scheme

[Diagram: end users ☺ → externally hosted portal for meta-searching (= federated searching = cross-database searching) → information sources ☺]

87

Federated searching: local versus remote hosting

• Remote hosting perhaps requires
» a smaller initial investment in computer hardware and skilled personnel
» less time investment in installation and maintenance of equipment and software

88

Federated searching: tasks for the library

• Of course, providing a computer system for meta-searching

89

Federated searching: tasks for the library

• Maintaining a list of target information sources that are appropriate in the framework of the particular library:
» subjects covered by the target databases should be relevant
» subscriptions must have been made by the library for access to the targets

90

Federated searching: tasks for the library

• Grouping the databases by subject field and offering these groups as pre-selections in the user interface of the federated search system
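The two library tasks of maintaining a list of appropriate, subscribed targets and offering subject groups as pre-selections can be sketched as a small registry plus a filter. All database names and subject labels here are hypothetical, and the `"all"` label for general sources such as the library's own catalogue is an illustrative convention.

```python
# Hypothetical registry of target databases maintained by the library.
TARGETS = {
    "Catalogue of this library": {"subjects": {"all"}, "subscribed": True},
    "Medical database X": {"subjects": {"medicine"}, "subscribed": True},
    "Chemistry database Y": {"subjects": {"chemistry"}, "subscribed": False},
    "Engineering index Z": {"subjects": {"engineering", "chemistry"}, "subscribed": True},
}


def preselection(subject: str, targets=TARGETS) -> list:
    """Targets offered for one subject group: subscribed, and relevant to it."""
    return sorted(name for name, t in targets.items()
                  if t["subscribed"] and (subject in t["subjects"] or "all" in t["subjects"]))


groups = {s: preselection(s) for s in ("chemistry", "medicine", "engineering")}
for subject, names in groups.items():
    print(subject, "->", names)
```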

91

Federated searching: tasks for the library

• Showing the system and its features to potential users

92

Federated searching in a library WWW site?

- Searching for books
- Searching for articles

- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

93

Federated searching in a library WWW site?

- Searching for books
» Catalog of this library
» Other catalogs
» Other book databases
» Electronic books
» Federated searching for books
- Searching for articles

- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

94

Federated searching in a library WWW site?

- Searching for books
- Searching for articles
» Databases to find articles
» Electronic journals
» Collective catalog of periodicals
» Repositories of articles on the Internet and WWW
» Federated searching for articles

- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

95

Federated searching in a library WWW site!

- Find the information that you need (link: to a federated search engine)

- The catalog
- Databases
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

96

Federated searching: conclusion

Federated searching
- is a continuous challenge for the developers of the sophisticated software and for the implementers in libraries and information centers
- offers benefits for those end-users who are not enthusiastic about working with separate target databases
- does not eliminate the need for access to individual databases

97

Libraries and information centres

Putting federated searching in a wider context

98

Federated searching + link generator

[Diagram: the user ☺ runs a federated search over information sources; each retrieved reference offers a menu from a context-sensitive hyperlink generator, which uses a "knowledgebase" (a database about the local situation) to lead the user to the appropriate target information source, such as the full-text document!]

99

Federated search system and link resolver compared

Problem to be solved | Link resolver | Federated search system
How to bring a user from some information to related information? | ! | -
How to bring a user to many information sources in 1 action? | - | !

100

Putting the digital tools together in a library system

[Diagram: on the library WWW site, the user ☺ reaches the catalogue(s) of local holdings, federated searching, and a context-sensitive hyperlink generator that relies on a "knowledgebase" (a database about the local situation).]

101

Access to information sources: tools / methods / systems

In sequence of priority:

1. Online library catalogue (for hard copy and digital documents)
2. Library web site
3. Link generator + “knowledgebase”
4. Federated search system
5. …

102

Methods for efficient information retrieval: conclusions

• The examples given show at least that progress in this field is impressive.

103

Libraries and information centres

Good information sources about federated searching

104

Some good information sources about federated searching

Baer, William. Federated searching: friend or foe? College & Research Libraries News, October 2004, pp. 518-519.

Hofstede, Marten. Portals op de pijnbank. Informatie Professional, 2002, No. 10, pp. 34-39.

Jacso, Peter. Thoughts about federated searching. Information Today, October 2004, pp. 17, 20.

Joint, Nicholas. Managing the implementation of a federated search tool in an academic library. Library Review, Vol. 58, No. 1, 2009, pp. 11-16.

Linoski, Alexis and Walczyk, Tine. Federated search 101. Library Journal Netconnect, Summer 2008, pp. 2-5.

Lockwood, Charles and Mac Donald, Patricia. Implementation of a federated search system in the academic library: lessons learned. Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 73-91, and in Federated search: solution or setback for online library services (edited by Christopher N. Cox), The Haworth Information Press, 2007, pp. 73-91. Available online from: http://irsq.haworthpress.com

McHale, Nina. Accidental federated searching: implementing federated searching in the smaller academic library. Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 93-110, and in Federated search: solution or setback for online library services (edited by Christopher N. Cox), The Haworth Information Press, 2007, pp. 93-110. Available online from: http://irsq.haworthpress.com

Noerr, Peter. Scaling the digital divide: how interoperable systems are making information more accessible. In: Proceedings of the International Conference on Digital Libraries 2004: knowledge creation, preservation, access, and management, ICDL 2004, Habitat Centre, New Delhi, India, 24-27 February 2004, Volume 1, 517 pp. New Delhi: TERI, The Energy and Resources Institute, 2004, ISBN 81-7993-029-7, pp. 66-68.

Reiss, Kevin. SRU, Open Data and the future of metasearch. Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 369-386, and in Federated search: solution or setback for online library services (edited by Christopher N. Cox), The Haworth Information Press, 2007, pp. 369-386. Available online from: http://irsq.haworthpress.com

Sadeh, Tamar. To Google or not to Google: metasearch design in the quest for the ideal user experience. [online] In: Proceedings of the ELAG 2004 Conference, May 2004. Available from: http://www.elag.org/ [cited 2004]

Sadeh, Tamar. Transforming the metasearch concept into a friendly user experience. Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 1-25, and in Federated search: solution or setback for online library services (edited by Christopher N. Cox), The Haworth Information Press, 2007, pp. 1-25. Available online from: http://irsq.haworthpress.com

Tennant, Roy. The right solution: federated search tools. Library Journal, June 15, 2003, p. 28.

Webster, Peter M. Challenges for federated searching. Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 357-368, and in Federated search: solution or setback for online library services (edited by Christopher N. Cox), The Haworth Information Press, 2007, pp. 357-368. Available online from: http://irsq.haworthpress.com

105

Questions? Suggestions? Remarks?

106

• You are free to copy, distribute and display this work under the following conditions:
» Attribution: You must mention the author.
» Noncommercial: You may not use this work for commercial purposes.
» No Derivative Works: You may not change, modify, alter, transform, or build upon this work.

• For any reuse or distribution, you must make clear to others the license terms of this work.
