© 2012 Deep Web Technologies, Inc.
SwetsWise Searcher
Powered by Explorit Research Accelerator
By Abe Lederman, President and CTO
Copenhagen, Denmark
11 June 2012
About Deep Web Technologies...
• Founded by Abe Lederman in 2002
– A co-founder of Verity, acquired by Autonomy
– BS & MS degrees in Computer Science from MIT
– 25 years of experience in Information Retrieval
• 20-person company based in Santa Fe, New Mexico
• Over $5M in DOE SBIR Grants (2002-2011)
• Pioneer/trailblazer in federated search
Customers Include...
Academic:
• Stanford University
• George Mason University
• Texas Medical Center
• University College of Cork
• Tennessee Community College Consortia
Public Portals:
• WorldWideScience.org
• Science.gov
• Biznar
• Mednar
• ScienceResearch.com
Government:
• Defense Technical Info Center (DTIC)
• Office of Sci. & Tech. Info (DOE-OSTI)
• UNECA
• European Space Agency
Corporate:
• Boeing
• BASF
• Intel
• HP
• P&G
What is the Deep Web?
The Deep Web is a collection of internet information sources that are generally not accessible to web spiders or crawlers and therefore cannot be indexed by popular search engines such as Google, Yahoo! or Bing. The content those engines do index is the Surface Web.
It is estimated that the Deep Web contains more than 500 times as much content as the Surface Web.
What is “Federated Search”?
“Federated Search is an application or service that allows users to submit a real-time search in parallel to multiple, distributed information sources and retrieve aggregated, ranked and de-duplicated results.”
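The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Explorit's implementation: the in-memory `SOURCES` table, the `score` function, and the choice to de-duplicate on URL are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real system would query remote services.
SOURCES = {
    "journals": [
        {"title": "Deep Web Search Basics", "url": "http://example.org/a"},
        {"title": "Crawling the Surface Web", "url": "http://example.org/b"},
    ],
    "ebooks": [
        {"title": "Deep Web Search Basics", "url": "http://example.org/a"},
        {"title": "Federated Search in Libraries", "url": "http://example.org/c"},
    ],
}


def score(record, query):
    """Toy relevance score: number of query terms found in the title."""
    title = record["title"].lower()
    return sum(term in title for term in query.lower().split())


def search_source(name, query):
    """Return the records of one source that match at least one query term."""
    return [dict(r, source=name) for r in SOURCES[name] if score(r, query) > 0]


def federated_search(query):
    """Search all sources in parallel, then aggregate, de-duplicate and rank."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, query), SOURCES))

    merged = {}
    for results in per_source:
        for record in results:
            # De-duplicate on URL (an assumption); keep the first copy seen.
            merged.setdefault(record["url"], record)

    # Rank by descending score, breaking ties alphabetically by title.
    return sorted(merged.values(), key=lambda r: (-score(r, query), r["title"]))


for hit in federated_search("deep web search"):
    print(hit["source"], "|", hit["title"])
```

The record that appears in both sources is returned once, credited to the source that was listed first.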
One Search, Many Sources
[Diagram: a single search box ("Enter Your Search…", "Begin Search") connected to Public Web Sources and Subscription Sources: Blogs, eBooks, OPACs, Internal Databases, Journals, Wikis]
Why Federated Search? 4 Big Reasons…
1. Provides greater efficiency than searching sources one by one
2. Returns the most current information because sources are searched in real time
3. Eliminates the need to learn disparate publisher interfaces
4. Simplifies discovery of the most relevant results
Best Science-Focused Engines
5 of 9 created by DWT
• Science.gov
• WorldWideScience.org
• ScienceResearch.com
• ScienceAccelerator
• Scitopia.org
Presentation available at: www.deepwebtech.com/ala2011.ppt
Federated Search Has Gotten a Bad Reputation
• It is too slow
• Connectors break
• Brings back too few results from each source
• Brings back too many results
• Unable to rank results well (metadata differences, lack of info)
SW Searcher vs. Discovery Services
SwetsWise Searcher | Discovery Service
Real-time search of multiple collections | Multiple collections are indexed to one database
Initial results returned in 3-4 seconds; remaining results returned incrementally within up to 30 seconds | Results returned within 1-3 seconds
New results are available as soon as they appear on the publisher's site | New results are available only after re-indexing
Searches full text where possible | Mostly indexes just metadata
Search any collection regardless of publisher | Search only collections the service subscribes to
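The incremental return of results in the comparison above can be pictured as handing back each source's results as soon as that source answers, rather than waiting for the slowest one. A minimal sketch follows. The source names and simulated latencies are invented for illustration, and the 30-second default only mirrors the figure quoted above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources with simulated response times in seconds.
LATENCY = {"fast_source": 0.01, "medium_source": 0.1, "slow_source": 0.2}


def search_source(name, query):
    """Stand-in for a remote search: sleep, then return one canned result."""
    time.sleep(LATENCY[name])
    return [f"{name}: result for {query!r}"]


def incremental_search(query, timeout=30.0):
    """Yield (source, results) pairs in the order the sources respond.

    If some source has not answered within `timeout` seconds,
    as_completed raises TimeoutError; this sketch lets it propagate.
    """
    with ThreadPoolExecutor(max_workers=len(LATENCY)) as pool:
        futures = {pool.submit(search_source, name, query): name for name in LATENCY}
        for future in as_completed(futures, timeout=timeout):
            yield futures[future], future.result()


for source, results in incremental_search("deep web"):
    print(source, results)  # a real interface would refresh the result list here
```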
Drawbacks of Discovery Services
• Lack of transparency about what is in the service
• Incomplete coverage of publisher content
• Lag between when content appears on the publisher's site and when it is available in the discovery service
• Normalized metadata loses source-specific metadata
• Content in the service is limited by the vendor's publisher relationships and favors content of general interest
Landscape is Not So Clear
• Summon (ProQuest): Discovery Service
• EDS (EBSCO): Discovery Service + Federated Search
• WorldCat Local (OCLC): Discovery Service + Federated Search
• Primo (Ex Libris): Discovery Service + Federated Search
• Encore Synergy (Innovative Interfaces): Limited Discovery Service + Federated Search
• Explorit (Deep Web Technologies): Federated Search
When Should You Choose Federated Search?
• Access to up-to-date information is important.
• You want control of your sources.
• You want to search internal/non-mainstream sources.
• Your research is specialized (e.g. medical and legal).
• You have a wide range of subscribed content (e.g. EBSCO and ProQuest).
Major Advantages of SwetsWise Searcher
• Rich, easy-to-use interface
• Incremental display of results
• Sophisticated connector technology
• Retrieve 50-100 results or more per source
• Relevance ranking
• Smart clustering
• Alerts and Search Builder
• Metrics
Easy-to-use Interface
Simple Search Box
– One-search, “Google-like” box
– Can be embedded in your home page, blog or intranet
Advanced Search Page
– Unlimited categories (sources can be in multiple categories)
– Select sources to search
– One or two columns
– Fielded searching
– Boolean searching: AND, OR, NOT
Connectors: Think “Connections”
Connectors make it possible to talk to other data sources
– Each source is unique, so connectors “normalize” a query
– Submit proper authentication to sources
– Extract the right results
– Parse results to display the data
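The four connector duties listed above map naturally onto four methods. The sketch below is hypothetical throughout: `ExampleConnector`, its `ti=` query syntax, the bearer-token header and the `title|url` response format are invented for a made-up source and do not describe any real publisher interface or Explorit's own connectors.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    url: str
    source: str


class ExampleConnector:
    """Hypothetical connector for a made-up source that expects
    'ti=<term>+<term>' queries and answers with 'title|url' lines."""

    name = "example_source"

    def __init__(self, api_key):
        self.api_key = api_key

    def normalize_query(self, user_query):
        # 1. Rewrite the user's query in the source's own syntax.
        return "ti=" + "+".join(user_query.lower().split())

    def auth_headers(self):
        # 2. Build the credentials this source expects with each request.
        return {"Authorization": f"Bearer {self.api_key}"}

    def fetch(self, native_query, headers):
        # 3. Stand-in for the HTTP request; returns a canned raw response.
        return "Deep Web Basics|http://example.org/1\n\nFederated Search|http://example.org/2\n"

    def parse(self, raw):
        # 4. Turn the raw response into uniform records for display.
        results = []
        for line in raw.splitlines():
            if "|" not in line:
                continue  # skip blank or malformed lines
            title, url = line.split("|", 1)
            results.append(Result(title.strip(), url.strip(), self.name))
        return results

    def search(self, user_query):
        native_query = self.normalize_query(user_query)
        raw = self.fetch(native_query, self.auth_headers())
        return self.parse(raw)


connector = ExampleConnector(api_key="secret")
for result in connector.search("Deep Web"):
    print(result)
```

Keeping each step in its own method means a change on the source's side (new query syntax, new response layout) touches only one method of that source's connector.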
Connector Monitoring
• Proactively monitor connectors
• Monitor: source health, speed, responsiveness and errors
• Evaluated by dedicated software maintenance engineers
• Errors are generally discovered by our team before users ever notice a problem
Relevance Ranking
• Occurrence of search terms within titles & snippets
• Assigning weight to sources
• More current results are assigned greater weight
Read: “Ranking: The Secret Sauce for Searching the Deep Web”
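One way to picture how the three signals above could combine is a single score per result. The formula, the source weights and the factor of 2 for title hits below are assumptions chosen for illustration. The slides do not give the actual ranking formula.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    snippet: str
    source: str
    year: int


# Hypothetical per-source weights; unknown sources get 0.75.
SOURCE_WEIGHT = {"journal_db": 1.0, "blog_feed": 0.5}


def relevance(result, query, current_year=2012):
    """Combine term occurrence, source weight and recency into one score."""
    terms = query.lower().split()
    title_words = result.title.lower().split()
    snippet_words = result.snippet.lower().split()

    # Signal 1: occurrences of search terms; title hits count double (assumed).
    title_hits = sum(title_words.count(term) for term in terms)
    snippet_hits = sum(snippet_words.count(term) for term in terms)
    term_score = 2.0 * title_hits + 1.0 * snippet_hits

    # Signal 2: weight assigned to the source.
    source_weight = SOURCE_WEIGHT.get(result.source, 0.75)

    # Signal 3: newer results get a larger boost (1.0 for the current year).
    age = max(0, current_year - result.year)
    recency = 1.0 / (1.0 + age)

    return term_score * source_weight * (1.0 + recency)


def rank(results, query, current_year=2012):
    """Return results ordered from most to least relevant."""
    return sorted(results, key=lambda r: relevance(r, query, current_year), reverse=True)


hits = [
    Result("Federated search notes", "How federated search works", "blog_feed", 2012),
    Result("Federated search history", "How federated search works", "journal_db", 2002),
    Result("Federated search explained", "How federated search works", "journal_db", 2012),
]
for hit in rank(hits, "federated search"):
    print(round(relevance(hit, "federated search"), 2), hit.title)
```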
Clustering
• Real-time semantic analysis of results creates clusters on-the-fly.
• Discover relationships behind the results, not just “keywords.”
Read: “Clusters That Think”
Alerts
– Delivery online or via email
– Daily, weekly, monthly
– Pick and choose your sources
– Export to RSS reader
– Maintain database of past results
Search Builder
– Create search pages easily
– Choose collections and search fields
– Integrates with Course Management Software
– Embed search box using built-in widget
SwetsWise Searcher Metrics
• Graphics-based or tabular
• Single day (hourly breakdown) or entire month
• Downloadable to spreadsheet
• Reports include:
– Number of queries run
– Number of results retrieved per source
– Average time to retrieve results from a source
– Average rank of results retrieved per source
– Timeouts/errors by source
– Searches run (query strings)
– Clickthrough stats
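Several of the reports above are simple aggregations over a request log. The sketch below assumes a log with one record per query and source; the field names and sample values are invented, and only four of the listed reports are computed.

```python
from collections import defaultdict

# Hypothetical log: one record per (query, source) request.
LOG = [
    {"query": "solar cells", "source": "journal_db", "results": 40, "seconds": 2.0, "error": None},
    {"query": "solar cells", "source": "ebook_db", "results": 10, "seconds": 4.0, "error": None},
    {"query": "fuel cells", "source": "journal_db", "results": 60, "seconds": 3.0, "error": None},
    {"query": "fuel cells", "source": "ebook_db", "results": 0, "seconds": 30.0, "error": "timeout"},
]


def summarize(log):
    """Compute queries run, plus results, average time and errors per source."""
    totals = defaultdict(lambda: {"requests": 0, "results": 0, "seconds": 0.0, "errors": 0})
    for record in log:
        entry = totals[record["source"]]
        entry["requests"] += 1
        entry["results"] += record["results"]
        entry["seconds"] += record["seconds"]
        entry["errors"] += record["error"] is not None

    report = {"queries_run": len({record["query"] for record in log}), "sources": {}}
    for name, entry in totals.items():
        report["sources"][name] = {
            "results_retrieved": entry["results"],
            "avg_seconds": entry["seconds"] / entry["requests"],
            "errors": entry["errors"],
        }
    return report


print(summarize(LOG))
```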
Hosted vs. Installed Solutions
Hosted | Installed
Deep Web Technologies hosts the application | Client hosts the application
Technical support through Deep Web Technologies | Client IT staff must support application
Deep Web Technologies can access application at any time | Deep Web Technologies has limited or no access to the application
Deep Web Technologies monitors and maintains connectors | Deep Web Technologies monitors and maintains accessible connectors
Limited or no ability to access internal sources | Can access internal sources
WorldWideScience.org is an Excellent Candidate for Multilingual Search
• A global gateway to international science databases and portals
• All content is from national governments or vetted by national governments
• Developed in partnership with the DOE Office of Scientific and Technical Information (OSTI), WWS Alliance and Microsoft Research
• One-stop searching
• Includes databases from China, Japan, Korea, Germany, and other non-English-speaking countries
How Multilingual Federated Search Works
[Diagram: query and result flow between the user, EXPLORIT, Microsoft Translator and foreign language search engines (German, Chinese, Russian)]
1. Query in user's language
2. Query to be translated for each source (Microsoft Translator)
3. Query in source's language sent to foreign language search engines
4. Results in source's language
5. Ranking
6. Ranked results translated by Microsoft to user's language
7. Ranked results in user's language returned to user
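The multilingual flow can be sketched end to end with a stub in place of the translation service. Nothing below calls Microsoft Translator: `translate` is a toy lookup table, the two engines and their contents are invented, and the ranking rule is a placeholder.

```python
# Toy phrase table standing in for a machine translation service.
PHRASES = {
    ("en", "de"): {"solar energy": "Solarenergie"},
    ("de", "en"): {"Solarenergie in Deutschland": "Solar energy in Germany"},
    ("en", "it"): {"solar energy": "energia solare"},
    ("it", "en"): {"Energia solare in Italia": "Solar energy in Italy"},
}

# Hypothetical foreign language search engines: (language, indexed titles).
ENGINES = {
    "german_db": ("de", ["Solarenergie in Deutschland", "Windkraft heute"]),
    "italian_db": ("it", ["Energia solare in Italia"]),
}


def translate(text, src, dst):
    """Look the text up in the toy table; return it unchanged if unknown."""
    if src == dst:
        return text
    return PHRASES[(src, dst)].get(text, text)


def search_engine(name, native_query):
    """Return titles from one engine that contain the translated query."""
    _, titles = ENGINES[name]
    return [title for title in titles if native_query.lower() in title.lower()]


def multilingual_search(query, user_lang="en"):
    hits = []
    for name, (lang, _) in ENGINES.items():
        native_query = translate(query, user_lang, lang)  # query in source's language
        for title in search_engine(name, native_query):  # results in source's language
            count = title.lower().count(native_query.lower())
            hits.append({"source": name, "lang": lang, "original": title, "hits": count})

    # Placeholder ranking: more query occurrences first, then source name.
    ranked = sorted(hits, key=lambda h: (-h["hits"], h["source"]))

    # Translate the ranked results back into the user's language.
    for hit in ranked:
        hit["translated"] = translate(hit["original"], hit["lang"], user_lang)
    return ranked


for hit in multilingual_search("solar energy"):
    print(hit["source"], "|", hit["original"], "->", hit["translated"])
```

Ranking before translating follows the order suggested by the diagram labels ("Ranked results translated ... to user's language").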
Coming in the Fall
• Visualization
• Full-Faceted Navigation
• Mendeley Integration
• Document Type and Document Format Clusters
• Full Text Filter
Visualization
Using our clustering technology, results visualization allows users to see relationships between topics easily.