

Page 1: Federated Searching: The ABC’s of HSE, XML, & Z39.50 Harry Samuels Product Manager Linking & Searching August 27, 2004

Federated Searching: The ABC’s of HSE, XML, & Z39.50

Harry Samuels

Product Manager, Linking & Searching

August 27, 2004


Topics

The Challenge of Federated Searching

Z39.50

XML Gateways

HTTP Searching

So, Where Are We Now?

The Future

    SRW/SRU

    NISO Metasearch Initiative

    The Generic XML Gateway API


The Challenge of Federated Searching

To execute federated searching, one needs a protocol or mechanism to search each of the electronic resources one would like to search

But one protocol does not fit all in the federated search environment - different electronic resources require different mechanisms

The challenge is to figure out how an electronic resource can be searched and have the right mechanism in place for each situation
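
The idea of having the right mechanism in place for each resource can be pictured as a lookup table from resources to connectors. The following is only an illustrative sketch in Python, not a description of any actual product. The resource names, the `Connector` type, and the three stub search functions are all hypothetical.

```python
from typing import Callable, Dict, List

# A connector takes a query string and returns a list of result titles.
Connector = Callable[[str], List[str]]

def search_z3950(query: str) -> List[str]:
    # Stub: a real connector would speak Z39.50 to a library catalog.
    return [f"[Z39.50] {query}"]

def search_xml_gateway(query: str) -> List[str]:
    # Stub: a real connector would exchange XML over HTTP.
    return [f"[XML gateway] {query}"]

def search_http(query: str) -> List[str]:
    # Stub: a real connector would fetch and parse web pages.
    return [f"[HTTP] {query}"]

# Hypothetical resources, each mapped to the mechanism it supports.
CONNECTORS: Dict[str, Connector] = {
    "library-catalog": search_z3950,
    "publisher-database": search_xml_gateway,
    "web-only-resource": search_http,
}

def federated_search(query: str, resources: List[str]) -> Dict[str, List[str]]:
    """Send one query to every requested resource via its own connector."""
    results: Dict[str, List[str]] = {}
    for name in resources:
        connector = CONNECTORS.get(name)
        if connector is None:
            results[name] = []  # no mechanism in place for this resource
            continue
        results[name] = connector(query)
    return results
```

Calling `federated_search("xml", ["library-catalog", "web-only-resource"])` sends the same query through two different mechanisms. A resource with no connector simply returns nothing, which is the situation the challenge describes.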


Z39.50

The protocol we love to hate

Z39.50 is the oldest of the commonly used search mechanisms

Almost every integrated library system can be searched using Z39.50

Despite the issues with Z39.50 it provides a fairly dependable mechanism for searching


Z39.50

The main problem with Z39.50 is that very few content providers implemented Z39.50

But it is the content of the commercial providers that we really want to search from our federated search systems


XML Gateways

Enter the XML gateway

But first of all, what does XML gateway mean?

As in Z39.50, there must be an XML gateway client that transmits search queries and accepts results. This is the part of the XML gateway that is in the federated search system

There must also be an XML gateway server that responds to search queries. This is the part of the XML gateway that is at the content provider site


XML Gateways

An XML gateway client sends a search query over HTTP

The query is (1) packaged into the query string of a URL or (2) packaged into an XML document that is posted to the resource

Regardless of how the query is packaged, the results are sent back in an XML document over HTTP

The use of XML in at least one of the steps gave rise to the name XML Gateway
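
The two ways of packaging a query, and the XML results that come back either way, might look roughly like the Python sketch below. Every XML gateway is different, so the parameter names (`query`, `max`), the element names (`search`, `results`, `record`, `title`, `url`), and the example.org address are assumptions made up for illustration. A canned response stands in for a real gateway server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def build_query_url(base_url: str, terms: str, max_results: int = 10) -> str:
    """Option 1: package the query into the query string of a URL."""
    return base_url + "?" + urlencode({"query": terms, "max": max_results})

def build_query_document(terms: str, max_results: int = 10) -> bytes:
    """Option 2: package the query into an XML document to be POSTed."""
    root = ET.Element("search")
    ET.SubElement(root, "query").text = terms
    ET.SubElement(root, "max").text = str(max_results)
    return ET.tostring(root, encoding="utf-8")

def parse_results(xml_bytes: bytes) -> list:
    """Either way, the results come back as an XML document."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "title": rec.findtext("title", default=""),
            "url": rec.findtext("url", default=""),
        }
        for rec in root.iter("record")
    ]

# Canned response standing in for what a gateway server might return.
SAMPLE_RESPONSE = b"""<results>
  <record><title>Metasearch basics</title><url>http://example.org/1</url></record>
  <record><title>XML gateways</title><url>http://example.org/2</url></record>
</results>"""
```

In a real client, the URL from `build_query_url` would be fetched with an HTTP GET, or the bytes from `build_query_document` would be sent with an HTTP POST, and the response body would be passed to `parse_results`.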


XML Gateways

XML gateways provide an alternative mechanism for searching an electronic resource

Every XML gateway is different and every XML gateway requires special programming or special configuration

As electronic resource providers implement search mechanisms they are implementing XML gateways and not Z39.50 servers

XML gateways are the future – the world of electronic resources and federated searching just needs to catch up with the future


HTTP Searching

Z39.50 was implemented by very few content providers and XML gateways are just now catching on, so how do we search everything else?

The same way a user does…

The federated search system pretends to be a user sitting at a web browser: it simulates the actions of a human user by generating URLs that are understood by the electronic resource, and then extracts the information from the web pages that are returned
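
A rough Python sketch of this approach: build the same URL a browser would request, then pull the hits out of the returned HTML. This is not how the HSE itself is implemented. The example.org URL, the `q` parameter, and the `class="result"` markup are hypothetical, and a canned page stands in for a live web site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

def build_search_url(terms: str) -> str:
    # Hypothetical search URL: the same one a browser would request.
    return "http://example.org/search?" + urlencode({"q": terms})

class ResultExtractor(HTMLParser):
    """Collect the text and href of every <a class="result"> link."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # href of the result link being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("class") == "result":
            self._href = attr.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None

def extract_results(html: str) -> list:
    parser = ResultExtractor()
    parser.feed(html)
    return parser.results

# Canned page standing in for what the resource's web site might return.
SAMPLE_PAGE = """<html><body>
<a class="result" href="/doc/1">First hit</a>
<a href="/help">Help</a>
<a class="result" href="/doc/2">Second hit</a>
</body></html>"""
```

Here `extract_results(SAMPLE_PAGE)` keeps the two result links and ignores the Help link, because only the markup the extractor was written for is recognized.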


HTTP Searching

This is possible because almost all electronic resources are accessed over the web

At Endeavor, we simply call the HTTP Search Engine the HSE

It is capable of searching hundreds of web sites and databases that are inaccessible via Z39.50 or XML gateways

Some federated search engines use HTTP searching as the preferred search mechanism


HTTP Searching

Despite its reach, there are issues with HTTP searching

It usually cannot retrieve a large set of metadata in its result sets

If the user interface of an electronic resource changes then the HSE connector for that resource usually breaks – this means that HTTP searching is fragile and requires constant maintenance
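
The fragility can be shown with a very small Python sketch. The pattern and both page fragments are invented for illustration. The point is only that an extraction rule tuned to one page layout silently finds nothing once the layout changes.

```python
import re

# Hypothetical pattern tuned to one version of a resource's result page.
TITLE_PATTERN = re.compile(r'<span class="title">(.*?)</span>')

OLD_PAGE = '<li><span class="title">Metasearch basics</span></li>'
# The same content after a cosmetic redesign of the resource's interface.
NEW_PAGE = '<li><h3 class="hit-title">Metasearch basics</h3></li>'

def extract_titles(html: str) -> list:
    """Return every title the pattern can find in the page."""
    return TITLE_PATTERN.findall(html)

def connector_looks_broken(html: str) -> bool:
    """Crude health check: a page with no extracted titles is suspicious."""
    return len(extract_titles(html)) == 0
```

The old page yields one title and the redesigned page yields none, even though the record is still there. A check like `connector_looks_broken` cannot tell a broken connector from a search that genuinely has no hits, which is part of why this kind of connector needs ongoing human maintenance.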


So Where Are We Now?

Adoption of Z39.50 has stalled

XML gateway adoption is in the early stages and many content providers do not yet have them

HTTP searching can search far more resources than Z39.50 or XML gateways, but it is fragile and usually does not retrieve a robust set of metadata


The Future

SRW/SRU

NISO Metasearch Initiative

The Generic XML Gateway API


SRW/SRU

The next generation of Z39.50 over the web

“Search and Retrieve Web Service (SRW) and Search and Retrieve URL Service (SRU) are Web Services-based protocols for querying databases and returning search results.”

Eric Lease Morgan

http://www.loc.gov/z3950/agency/zing/srw/

It is a version of an XML gateway that holds the promise of a standard XML Gateway protocol
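
As a rough illustration of what a standard XML gateway protocol looks like in practice, the Python sketch below builds an SRU request URL and reads the hit count from a response. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) and the response namespace follow the published SRU 1.1 specification. The example.org base URL, the CQL query, and the canned response are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by SRW/SRU 1.1 response documents.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

def build_sru_url(base_url: str, cql_query: str,
                  maximum_records: int = 10, start_record: int = 1) -> str:
    """Package a CQL query into an SRU searchRetrieve URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)

def count_hits(response_xml: bytes) -> int:
    """Read the total number of matching records from an SRU response."""
    root = ET.fromstring(response_xml)
    return int(root.findtext("srw:numberOfRecords", default="0",
                             namespaces=SRW_NS))

# Canned, heavily trimmed response standing in for a real SRU server.
SAMPLE_RESPONSE = b"""<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
</srw:searchRetrieveResponse>"""
```

Because the request parameters and the response envelope are the same for every SRU server, a client like this would not need special programming for each resource, which is the promise the slide refers to.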


NISO Metasearch Initiative

“NISO's metasearch Initiative will identify, develop, and frame the standards and other common understandings that are needed to enable an efficient and robust information environment. The goal of NISO's Metasearch Initiative is to enable:

    metasearch service providers to offer more effective and responsive services

    content providers to deliver enhanced content and protect their intellectual property

    libraries to deliver services that distinguish their services from Google and other free web services.”

http://www.niso.org/committees/MS_initiative.html


The Generic XML Gateway API

We couldn’t wait…

ENCompass already had an XML gateway search infrastructure

From that infrastructure, we created a generic gateway and documented it

It is freely available to Endeavor customers

When content providers ask us “how to build an XML gateway” we share the specification with them


Questions?