Faceted search using Solr and Ontopia
2009-11-03, Geir Ove Grønmo
Agenda
• Short introductions to Solr and Ontopia
• What is faceted search?
• An integration of the two: a prototype
• Demos
Apache Solr
• A search engine
  – implemented as an HTTP service on top of Apache Lucene
  – searching and indexing (no web crawling)
  – adds support for faceted search (and more)
  – sharding and replication
  – distributed search
  – excellent interoperability (i.e. not really Java-specific)
• Next release: Solr 1.4
• Open source:
  – http://lucene.apache.org/solr/
  – Apache License 2.0
Ontopia
• A Topic Maps toolkit:
  – data representation, persistence and querying
  – application development
  – written in Java
• Next release: Ontopia 5.1
• Open source:
  – http://code.google.com/p/ontopia/
  – Apache License 2.0
Where the meat is...
• Solr
  – fast textual search and faceted search support
• Ontopia
  – rich semantic data and structured search
• User interface design
  – providing a useful interface to the user
But first, what is faceted search?
• A technique for refining search results
  – integrates textual search and navigation
• Allows concept composition
  – slow + expensive + red + used + car
  – article + in English + about salmon
  – people + aged 20-30 + SQL expert
  – punk rock songs + < 1 minute + in Norwegian + released 1980-1982
• Supports exploration and learning
• Never returns zero results (only facet values that still have hits are offered)
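Concept composition can be pictured as successive filtering: each added concept ("red", "used", "car") narrows the set to the documents that match every constraint so far. A minimal in-memory sketch, assuming documents are plain dicts; the function names, field names and data are hypothetical, and a real engine such as Solr does this against its index rather than by scanning a list:

```python
def matches(doc, field, value):
    """True if the document's field holds the value (single- or multi-valued)."""
    actual = doc.get(field)
    if isinstance(actual, list):
        return value in actual
    return actual == value


def compose(docs, constraints):
    """Keep only the documents that match every (field, value) constraint."""
    return [doc for doc in docs
            if all(matches(doc, field, value) for field, value in constraints)]


# Hypothetical example data for "red + used + car".
cars = [
    {"id": 1, "type": "car", "colour": "red", "condition": "used"},
    {"id": 2, "type": "car", "colour": "red", "condition": "new"},
    {"id": 3, "type": "car", "colour": "blue", "condition": "used"},
]

selected = compose(cars, [("type", "car"), ("colour", "red"), ("condition", "used")])
print([doc["id"] for doc in selected])  # [1]
```

With no constraints, every document is returned, which corresponds to the usual starting set.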
How is it done?
• Given a starting set
  – usually all documents
  – or the result of filling in the search input box
• ...do the following:
  – count the number of hits for each value of each facet field
  – the fields to facet on are defined at query time
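The counting step can be sketched as a plain loop over the starting set. This illustrates the idea only and is not how Solr implements faceting internally; documents are assumed to be dicts, list values stand for multi-valued fields, and the data is hypothetical:

```python
from collections import Counter


def facet_counts(docs, facet_fields):
    """For each facet field, count how many documents carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc in docs:
        for field in facet_fields:
            value = doc.get(field)
            if value is None:
                continue  # document has no value for this facet
            values = value if isinstance(value, list) else [value]
            for v in set(values):  # count a value at most once per document
                counts[field][v] += 1
    return counts


# Hypothetical starting set: here, all documents.
docs = [
    {"id": "1", "type": "article", "language": "en", "topics": ["salmon", "fishing"]},
    {"id": "2", "type": "article", "language": "no", "topics": ["salmon"]},
    {"id": "3", "type": "person", "language": "en"},
]

# The facet fields are chosen per query, not fixed in advance.
counts = facet_counts(docs, ["type", "language", "topics"])
print(dict(counts["type"]))        # {'article': 2, 'person': 1}
print(counts["topics"]["salmon"])  # 2
```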
An example without faceted search
Facet types
• Standard facets
  – a list of facet values
• Hierarchical facet values
  – taxonomy of facet values
• Range/query facets
  – dates
  – prices
  – alphabet buckets
  – intervals (lower and upper bounds)
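In Solr, standard facets are requested with the facet.field parameter and range/query facets with facet.query. A hedged sketch that only builds such a request URL; the server address, field names and price intervals are assumptions for illustration, and the function name is hypothetical:

```python
from urllib.parse import urlencode


def solr_facet_url(base_url, q, facet_fields=(), facet_queries=(), mincount=1):
    """Build a Solr select URL that requests facet counts alongside the hits.

    facet_fields:  standard facets, one count per distinct field value
    facet_queries: range/query facets, one count per query (e.g. an interval)
    """
    params = [
        ("q", q),
        ("facet", "true"),
        ("facet.mincount", str(mincount)),  # hide values with fewer hits
        ("wt", "json"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    params += [("facet.query", query) for query in facet_queries]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical server URL and field names; range bounds are inclusive.
url = solr_facet_url(
    "http://localhost:8983/solr",
    "salmon",
    facet_fields=["type", "language"],
    facet_queries=["price:[0 TO 99]", "price:[100 TO *]"],
)
print(url)
```

Each facet.query comes back with a single count, which is what makes it suitable for intervals such as price or date ranges.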
Standard facets
Hierarchical facet values
Note: the facets can also be hierarchical
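One common way to get hierarchical facet values (not necessarily the technique behind these screenshots) is to store each value as a path and also count every ancestor, so that a parent's count covers its children. A sketch in Python; the "/"-separated location paths and the function name are hypothetical:

```python
from collections import Counter


def hierarchical_counts(doc_paths, sep="/"):
    """Count each path and all its ancestors, at most once per document."""
    counts = Counter()
    for paths in doc_paths:  # one list of paths per document
        prefixes = set()
        for path in paths:
            parts = path.split(sep)
            for depth in range(1, len(parts) + 1):
                prefixes.add(sep.join(parts[:depth]))
        counts.update(prefixes)  # +1 for each distinct prefix
    return counts


docs = [["Bergen/Arna"], ["Bergen/Åsane"], ["Bergen/Arna", "Bergen/Åsane"], ["Oslo"]]
counts = hierarchical_counts(docs)
print(counts["Bergen"], counts["Bergen/Arna"], counts["Oslo"])  # 3 2 1
```

Collecting prefixes in a set first keeps a document tagged with both Arna and Åsane from counting twice towards Bergen.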
Alphabet buckets
Range facets
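Range facets and alphabet buckets both come down to mapping each value to a bucket and counting. A small sketch; the interval bounds, titles and function names are hypothetical, and the alphabet version sorts by Unicode code point, which is not the same as locale-aware (e.g. Norwegian) alphabetical order:

```python
import bisect


def range_buckets(values, bounds):
    """Count values in half-open intervals [bounds[i], bounds[i+1]).

    bounds must be sorted; values outside [bounds[0], bounds[-1]) are ignored.
    """
    counts = [0] * (len(bounds) - 1)
    for v in values:
        i = bisect.bisect_right(bounds, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def alphabet_buckets(titles):
    """Count titles by their upper-cased first character."""
    counts = {}
    for title in titles:
        letter = title[:1].upper()
        if letter:
            counts[letter] = counts.get(letter, 0) + 1
    return dict(sorted(counts.items()))


print(range_buckets([5, 15, 25, 30, 100], [0, 10, 20, 30]))  # [1, 1, 1]
print(alphabet_buckets(["Arna", "Åsane", "Bergen", "Fana"]))  # {'A': 1, 'B': 1, 'F': 1, 'Å': 1}
```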
User interface considerations
• Single select
  – link
  – radio button
• Multi select
  – checkboxes
• Decide which operator to use: AND/OR
  – within a facet
  – between facets
• How many facet values to display
  – given limited screen real estate
• How to provide an intuitive undo operation
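One frequent answer to the AND/OR question, assumed here rather than prescribed, is OR within a facet and AND between facets: "punk or rock" and "in Norwegian". Undo then amounts to removing a value from the selection. A sketch with hypothetical song data and function names:

```python
def field_values(doc, field):
    """Return a document's values for a field as a set (empty if missing)."""
    value = doc.get(field)
    if value is None:
        return set()
    return set(value) if isinstance(value, list) else {value}


def filter_docs(docs, selections):
    """selections maps field -> set of selected values.

    OR within a facet: a document needs at least one selected value.
    AND between facets: every facet with a selection must be satisfied.
    """
    active = {f: vals for f, vals in selections.items() if vals}
    return [doc for doc in docs
            if all(field_values(doc, f) & vals for f, vals in active.items())]


songs = [
    {"id": 1, "genre": "punk", "language": "no"},
    {"id": 2, "genre": "punk", "language": "en"},
    {"id": 3, "genre": "rock", "language": "no"},
    {"id": 4, "genre": "jazz", "language": "no"},
]

hits = filter_docs(songs, {"genre": {"punk", "rock"}, "language": {"no"}})
print([s["id"] for s in hits])  # [1, 3]
```

Facets with an empty selection are ignored, so deselecting the last value of a facet widens the result back out.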
Examples
Scoring
• Some types of documents should be ranked higher than others
• Solr lets one boost the default score:
  – per document
  – per field
• The total score of a document depends on:
  – the boost and score of each field, adjusted by how relevant the field is to the actual query
  – the boost of the document
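In Solr versions of this era, index-time boosts could be given in the XML update message as a boost attribute on <doc> (per document) and on <field> (per field); later Solr versions removed index-time boosts. A hedged sketch that builds such a message; the document, boost values and function name are hypothetical, and boosting only the first value of a multi-valued field is a cautious choice made here, not a documented requirement:

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc, doc_boost=None, field_boosts=None):
    """Build a Solr XML <add> message for one document, with optional boosts."""
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc_el.set("boost", str(doc_boost))  # document-level boost
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for i, v in enumerate(values):
            # Multi-valued fields are written as repeated <field> elements.
            field_el = ET.SubElement(doc_el, "field", name=name)
            # Field-level boost on the first value only, to avoid the boost
            # possibly being compounded across a multi-valued field.
            if name in field_boosts and i == 0:
                field_el.set("boost", str(field_boosts[name]))
            field_el.text = str(v)
    return ET.tostring(add, encoding="unicode")


message = solr_add_xml(
    {"id": "1234",
     "title": "Structure and Interpretation of Computer Programs",
     "authors": ["Harold Abelson", "Gerald Jay Sussman"]},
    doc_boost=2.0,
    field_boosts={"title": 1.5},
)
print(message)
```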
Sorting
• How to sort the list of facets?
  – by relevance
• How to sort the values of each facet?
  – by number of hits
  – alphabetically
• How to sort the search result?
  – by relevance
  – alphabetically
  – by date
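The two orderings for facet values can be sketched as below. The names "count" and "index" mirror the values of Solr's facet.sort parameter, but the function is only a hypothetical client-side illustration. Plain string sorting uses Unicode code points, so Norwegian Æ, Ø and Å would not come out in Norwegian alphabetical order without locale-aware collation:

```python
def sort_facet_values(counts, by="count"):
    """Order facet values by hit count (descending) or alphabetically."""
    if by == "count":
        # Most hits first; ties broken alphabetically for a stable display.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if by == "index":
        return sorted(counts.items())  # alphabetical by value
    raise ValueError("by must be 'count' or 'index'")


counts = {"Åsane": 5, "Arna": 1, "Fana": 3}
print(sort_facet_values(counts))              # [('Åsane', 5), ('Fana', 3), ('Arna', 1)]
print(sort_facet_values(counts, by="index"))  # [('Arna', 1), ('Fana', 3), ('Åsane', 5)]
```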
Proposition
• “Concept composition, using faceted search, and Topic Maps is a perfect match”
Why not use Ontopia only?
• You can, but it is not optimized for this use case
• It lets you implement faceted search
  – but it’ll be too slow
• The reasons are:
  – all the expensive processing has to happen at runtime rather than at indexing time
  – it involves a lot of traversal
  – it relies on the underlying full-text search engine
  – search has limited cacheability
Trade-offs
• Considerations:
  – Search performance
  – Indexing performance
  – Consistency
• Ontopia
  – no indexing overhead
  – results always up-to-date
• Solr
  – very fast search
  – indexing overhead
  – index must be updated regularly
Solr – the data model
• An index contains documents
• Documents have fields
• A field can have multiple values
{ "id": "1234", "title": "Structure and Interpretation of Computer Programs", "authors": ["Harold Abelson", "Gerald Jay Sussman"] }
Ontopia – the data model
• A topic map contains
  – topics
  – and information about them
• Identities
• Names
• Associations to other topics
• Occurrences (read: non-association properties)
Integrating Solr and Ontopia
• Proposed solution:
  – Solr indexes constructed from Ontopia queries
  – for each document type, create a query that extracts data from the topic map into fields in documents
  – then do faceting on selected fields
• Use-case-specific schema definition
  – should be project-specific (to some degree)
• Perform a full index or an incremental reindex
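Turning an index rule's query result into Solr documents mostly means grouping rows by id, since a topic with several values for a field (several authors, say) comes back as several rows. A sketch assuming the query result arrives as column names plus rows, with one column per Solr field; the function name and this row layout are assumptions, not Ontopia's API:

```python
def rows_to_documents(columns, rows, id_column="id"):
    """Group query-result rows into one document per id.

    Columns with several distinct values for the same id become
    multi-valued fields; None values are skipped.
    """
    grouped = {}
    for row in rows:
        record = dict(zip(columns, row))
        doc = grouped.setdefault(record[id_column], {})
        for column, value in record.items():
            if value is None:
                continue
            values = doc.setdefault(column, [])
            if value not in values:
                values.append(value)
    # Unwrap fields that ended up with a single value.
    return [{k: v[0] if len(v) == 1 else v for k, v in doc.items()}
            for doc in grouped.values()]


columns = ["id", "title", "authors"]
rows = [
    ("1234", "Structure and Interpretation of Computer Programs", "Harold Abelson"),
    ("1234", "Structure and Interpretation of Computer Programs", "Gerald Jay Sussman"),
]
print(rows_to_documents(columns, rows))
```

The result has the same shape as the JSON document shown in the Solr data model above.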
Index rule set
Index rule: Organisasjonsenheter (organisational units)
Query result: Organisasjonsenheter
Solr index: Organisasjonsenhet (organisational unit)
id | title | type | lokalisering
T1001448 | Grønnmyr barnehage | Organisasjonsenhet | Åsane
T1009449 | Sone Arna/Åsane | Organisasjonsenhet | Arna
T1009465 | Sone Fana/Ytrebygda | Organisasjonsenhet | Arna
T1009492 | Bybanekontoret | Organisasjonsenhet | Arna
T1009507 | Sone Fyllingsdalen/Laksevåg | Organisasjonsenhet | Arna
Index rule: Artikler (articles)
Query result: Artikler
Solr index: Artikler
id | title | type | description | author
T1000005 | En kunstner i arbeid | Artikkel |  | Kjersti Nygård
T1000010 | Slagord for Brinken barnehage. | Artikkel | Samspill og glede - det handler om å være tilstede. Slagordet sier noe om hva vi vektlegger i Brinken barnehage. | Siri Olsen
T1000016 | Slagord for Brinken barnehage. | Artikkel | Salhus barnehage er ein typisk nærmiljøbarnehage. Aktiv bruk av lokalmiljøet er ein viktig del av tilbodet. | Ingebjørg Gausemel
Demo
• A prototype for Bergen kommune
Ideas for the future
• Faceted search user interface in Ontopoly
  – could be made declarative
• Incremental reindexing
  – requires tracking changes
  – usually done with a timestamp
  – implement a last-modified field in Ontopoly
• Add an optional fourth column for score boost?
  – a float between 0 and 1
• Ontopia extensions for interacting with Solr
  – JSP tag library
  – tolog predicates
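The timestamp-based incremental reindexing idea can be sketched as selecting only the topics changed since the last run. This assumes each topic carries the proposed last-modified value; the field name last_modified, the function name and the data are hypothetical:

```python
from datetime import datetime


def topics_to_index(topics, last_indexed=None):
    """Return the ids of topics that need (re)indexing.

    With no previous run, everything is indexed (full index); otherwise only
    topics modified after the last run are returned (incremental reindex).
    """
    if last_indexed is None:
        return [t["id"] for t in topics]
    return [t["id"] for t in topics if t["last_modified"] > last_indexed]


# Hypothetical topics carrying the proposed last-modified timestamp.
topics = [
    {"id": "T1000005", "last_modified": datetime(2009, 10, 1, 12, 0)},
    {"id": "T1000010", "last_modified": datetime(2009, 11, 2, 9, 30)},
]
print(topics_to_index(topics))                         # ['T1000005', 'T1000010']
print(topics_to_index(topics, datetime(2009, 11, 1)))  # ['T1000010']
```

A timestamp alone cannot reveal deleted topics, since they are no longer there to carry one; deletions would need to be tracked separately.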
More demos
• Epicurious: recipe search
  – http://www.epicurious.com/tools/searchresults?search=
• Flickr photo search with hierarchical facets
  – http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
• A collection of faceted navigation examples:
  – http://www.flickr.com/photos/morville/collections/72157603789246885/
More information
• 3 Quick Design Patterns for Better Faceted Search
  – http://www.thingsontop.com/3-quick-patterns-better-facet-design-889.html
• How to Make a Faceted Classification and Put It On the Web
  – http://www.miskatonic.org/library/facet-web-howto.html
• Book: Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services), Daniel Tunkelang
Structured, semantics-rich data...
...is easier to find when using faceted search.