Intro to Solr in Drupal
DESCRIPTION
Does your website have a ton of data? How do your users find the relevant pages among all the noise on your site? Solr can help deliver pertinent search results to your users regardless of your site's size. Apache Solr is a Java application that integrates with Drupal through a contributed module, letting your users quickly search millions of records and narrow down the results with minimal system impact.
TRANSCRIPT
Intro to Solr
DrupalCon Portland
Andrew Riley, Director of Drupal Development
@andrewmriley
Agenda
•Search?
•Why Solr?
•Searching
•Behind the Scenes
Search?
What is Search?
Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.
Source: http://dictionary.reference.com/browse/search
@Mediacurrent
Why Users Search
•Navigation doesn't make sense
• It can be faster
•Lots of data
•Frequent data changes
•Might just be looking for something
Search Problems
•Search accuracy
•Too much data
•Slow response
•Wrong results
Why Solr?
History
Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.
Lucene
•Solr is a layer on top of Lucene
•Lucene is a library
•Solr stores files in Lucene format
Speed
Search speed is important!
Speed
Source: Web Performance Today http://j.mp/12h8wLZ
Speed
• Important!
• It scales well
•No database required
•Clustering & Sharding
•Netflix runs 1.2 million queries/day on 4 servers*
*http://wiki.apache.org/solr/SolrPerformanceData
Natural Results
•Stemming: Blogging vs. Blog
•Stop Word Removal: The
•Synonyms: Tissue vs. Kleenex
•Highly Configurable
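As a rough illustration of the synonym bullet above, the sketch below expands a query term into its synonym group before matching. The synonym table and the function names are hypothetical and exist only for this example; Solr handles synonyms through its own configurable analysis settings, not through code like this.

```python
# Hypothetical synonym groups, made up for this example.
SYNONYM_GROUPS = [
    {"tissue", "kleenex"},
]

def expand_term(term):
    """Return the term plus any synonyms from the first group that contains it."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}

def matches(query_term, document_words):
    """True if any word in the document matches the query term or a synonym."""
    wanted = expand_term(query_term)
    return any(word.lower() in wanted for word in document_words)
```

With this table, a search for "kleenex" would also match a document that only mentions "Tissue".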
Drupal Search
•Not stemmed by default
•Queries the database
•Stores tokenized words in a single large table
•Much slower to index
VS
Searching
Ordering
•Score
•Comes from Lucene
•Not "out of 100"
•Bigger score first
More Info: http://lucene.apache.org/core/3_6_1/scoring.html
[Slide graphic: example scores, biggest first: ???, 201, 200, 199, 184]
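The ordering rule on this slide can be sketched in a few lines. The scores are the ones shown on the slide; the document titles and the function name are made up for illustration, and a real score would come from Lucene, not from a hard-coded list.

```python
# Scores taken from the slide; the document titles are hypothetical.
results = [
    {"title": "Doc C", "score": 199},
    {"title": "Doc A", "score": 201},
    {"title": "Doc D", "score": 184},
    {"title": "Doc B", "score": 200},
]

def order_by_score(hits):
    """Return hits sorted so the biggest score comes first."""
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)

ordered = order_by_score(results)

# There is no fixed maximum such as 100, so a score is only useful
# when compared with the other scores in the same result set.
top_score = ordered[0]["score"]
relative = [round(hit["score"] / top_score, 3) for hit in ordered]
```

The `relative` list is only there to show that any "percentage" has to be derived from the top hit of the current result set.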
Facets
•Users do the work
•Fixes too much data
•Native to Solr
•Requires the Facet API module
•Shopping Sites
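To make the facet idea concrete, here is a minimal in-memory sketch, assuming a shopping-site result set. The products, field names, and helper functions are all hypothetical; Solr computes facet counts itself and the Facet API module displays them, so this only mimics the behavior a user sees.

```python
from collections import Counter

# Hypothetical search results for a shopping site.
results = [
    {"title": "Red running shoe", "brand": "Acme", "color": "red"},
    {"title": "Blue running shoe", "brand": "Acme", "color": "blue"},
    {"title": "Red hiking boot", "brand": "Summit", "color": "red"},
]

def facet_counts(hits, field):
    """Count how many hits carry each value of the given field."""
    return Counter(hit[field] for hit in hits)

def apply_facet(hits, field, value):
    """Keep only the hits whose field equals the value the user clicked."""
    return [hit for hit in hits if hit[field] == value]

color_counts = facet_counts(results, "color")      # shown next to the results
red_only = apply_facet(results, "color", "red")    # user clicks "red"
brand_counts = facet_counts(red_only, "brand")     # counts update for the narrowed set
```

Each click narrows the result set and the remaining counts are recalculated, which is how the user ends up doing the filtering work.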
Behind the Scenes
Index?
• Index contains Documents
•Documents have Fields
•Fields have Terms
•Updates take ~2 minutes to appear in the index
•Uses Lucene syntax
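The index, document, field, term hierarchy above can be pictured with a toy inverted index. This is a simplified sketch, not Lucene's actual storage format; the two documents and the helper names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical documents, each with named fields.
documents = {
    1: {"title": "Intro to Solr", "body": "Solr is fast"},
    2: {"title": "Drupal search", "body": "Search with Solr"},
}

def build_index(docs):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index

def lookup(index, field, term):
    """Return sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))

index = build_index(documents)
```

Here `lookup(index, "title", "solr")` plays the role of a fielded query such as `title:solr` in Lucene query syntax.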
Tokenizing
•Splits words and numbers: "this" "is" "blogging"
•Excludes stop words: "this" "blogging"
•Handles stemming (if enabled): "this" "blog"
•Very configurable
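The three steps on this slide can be sketched as a small pipeline. The stop word list is an assumption chosen so the output matches the slide's example, and `naive_stem` is a toy rule written for this illustration; it is not the stemming algorithm Solr uses.

```python
import re

# Hypothetical stop word list chosen so the output matches the slide's example.
STOP_WORDS = {"is", "a", "the"}

def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Toy stemmer: strip a trailing 'ing' and undouble the final consonant."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token

def analyze(text, stem=True):
    """Run the full pipeline; stemming is optional, as on the slide."""
    tokens = remove_stop_words(tokenize(text))
    return [naive_stem(t) for t in tokens] if stem else tokens
```

Running `analyze("This is blogging")` walks through the same stages as the bullets: split, drop "is", then reduce "blogging" to "blog".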
Bias
•Adjusts the order of search results
•Works on: Content Type, Fields, Comments, Promoted to Home Page and more
•Can be dynamic with custom modules.
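A minimal sketch of how bias can reorder results, assuming a simple multiplicative model. The bias values, the field names, and the multiplication itself are assumptions made for illustration; the actual settings live in the module's configuration and may combine with the score differently.

```python
# Hypothetical bias settings; real values would come from site configuration.
TYPE_BIAS = {"article": 2.0, "page": 1.0}
PROMOTED_BIAS = 1.5

def biased_score(hit):
    """Multiply the base relevance score by the biases that apply to this hit."""
    score = hit["score"] * TYPE_BIAS.get(hit["type"], 1.0)
    if hit.get("promoted"):
        score *= PROMOTED_BIAS
    return score

hits = [
    {"title": "About us", "type": "page", "score": 10.0, "promoted": False},
    {"title": "Solr tips", "type": "article", "score": 6.0, "promoted": False},
    {"title": "Welcome", "type": "page", "score": 7.0, "promoted": True},
]

# Bigger biased score first.
ranked = sorted(hits, key=biased_score, reverse=True)
```

Without bias, "About us" would rank first; with the content type and promoted biases applied, "Solr tips" and "Welcome" both move ahead of it.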
Recap
Modules
•Apache Solr (apachesolr)
•Facet API (facetapi)
•Chaos tool suite (ctools)
Overall
•Search is becoming more and more important.
•You want to control your search results.
•If you don't provide a good search experience, somebody else will.
•Solr doesn't have to be complex.
•Solr is fast and scales.
Thank You!
Questions?
@Mediacurrent Mediacurrent.com
@andrewmriley
slideshare.net/mediacurrent