advanced search with solr & django-haystack

ADVANCED SEARCH WITH SOLR + DJANGO-HAYSTACK MARCEL CHASTAIN LA DJANGO – 2014-09-30

Upload: marcel-chastain

Posted on 18-Dec-2014

DESCRIPTION

Search and information discovery is a huge part of almost any modern site. Solr is an incredibly powerful search tool that allows us to quickly add advanced search capabilities such as full-text search, faceting, autocomplete and spelling suggestions to our projects without much effort. We will be using 'django-haystack' to communicate between Django and Solr.

TRANSCRIPT

Page 1: Advanced Search with Solr & django-haystack

ADVANCED SEARCH WITH

SOLR + DJANGO-HAYSTACK

MARCEL CHASTAIN, LA DJANGO – 2014-09-30

WHAT WE’LL COVER

1. THE PITCH:

The Problem With Search

The Solution(s)

Overall Architecture of a Django/Solr/Haystack System

2. THE GOOD STUFF:

Indexing Data for Search

Querying the Search Index

Advanced Search Methods

Resources

THE PITCH

OR, “WHY ANY OF THIS MATTERS”

THE PROBLEM

1. Sites that store information are ONLY as useful as their ability to retrieve and display that information

2. Users have high expectations of search (thanks, Google)

• Spelling Suggestions:

• Hit Highlighting:

• “Related Searches”

• Distance/Geospatial Search

• Faceting:

3. Good search involves lots of challenges

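• Stemming: reducing each word to its stem, so that a search for one form also matches the others

“argue”, “argues”, “argued” → “argu”

“argument”, “arguments” → “argument”

(Left: the word the user searches for. Right: its stem.)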
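A toy suffix-stripping stemmer reproduces the mapping above. It is a deliberately simplified, hypothetical sketch, not the Porter/Snowball algorithms a real Solr field type would use; the suffix list and the minimum stem length are arbitrary assumptions chosen only to match these examples.

```python
# Toy suffix-stripping stemmer (illustration only; NOT Porter/Snowball).
# Suffixes are tried in order; the first match is stripped, provided
# at least MIN_STEM characters remain.
SUFFIXES = ("es", "ed", "s", "e")  # hypothetical rule set
MIN_STEM = 3                       # assumed minimum stem length


def toy_stem(word):
    """Lowercase a word and strip the first matching suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


for word in ["argue", "argues", "argued", "argument", "arguments"]:
    print(word, "->", toy_stem(word))
```

Because the same function runs when indexing and when querying, a search for “argued” and a document containing “argues” both reduce to “argu” and therefore match.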

And more!

• Synonyms

• Acronyms

• Non-ASCII characters

• Stop words (“and”, “to”, “a”)

• Calculating relevance

• Performance with millions/billions(!) of documents
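Several of these are handled by an analysis chain that text passes through at both index time and query time. The sketch below is a minimal, hypothetical chain covering two of the items above (non-ASCII characters and stop words) using only the Python standard library; in Solr the real chain is configured per field type in schema.xml with its own tokenizers and filters.

```python
import re
import unicodedata

# Stop words taken from the slide; real stop-word lists are much longer.
STOP_WORDS = {"and", "to", "a"}


def fold_to_ascii(token):
    """Strip accents, e.g. 'café' -> 'cafe' (NFKD decompose, then drop non-ASCII)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def analyze(text):
    """Tokenize on word characters, lowercase, ASCII-fold and drop stop words."""
    result = []
    for token in re.findall(r"\w+", text):
        term = fold_to_ascii(token.lower())
        if term and term not in STOP_WORDS:
            result.append(term)
    return result


print(analyze("Drive to a Café and Bar"))  # ['drive', 'cafe', 'bar']
```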

THE SOLUTION

“Information Retrieval Systems”, a.k.a. Search Engines

SOLR

THE BACKEND

WHAT IS SOLR?

Open-source enterprise search

Java-based

Created in 2004

Built on Apache Lucene

Most popular enterprise search engine

Apache 2.0 License

Built for millions or billions of documents

WHAT DOES IT DO?

• Full-text search

• Hit highlighting

• Faceted search

• Clustering/replication/sharding

• Database integration

• Rich document (Word, PDF, etc.) handling

• Geospatial search

• Spelling corrections/suggestions

• … loads and loads more

WHO USES SOLR?

HOW CAN WE USE IT WITH DJANGO?

Haystack

From the homepage:

(http://haystacksearch.org/)

LOOK FAMILIAR?

Query style

Declarative search index definitions

THE GOOD STUFF

INSTALLING, CONFIGURING & USING SOLR/HAYSTACK

WHO DOES WHAT

Solr:

• Provides API for submitting to & querying from index

• Stores actual index data

• Manages fields/data types in an XML config (‘schema.xml’)

Haystack:

• Manages connection(s) to Solr

• Provides a familiar, Django ORM-style API for querying

• Uses templates and declarative search index definitions

• Helps generate the Solr XML config

• Provides management commands to index content

• Provides generic views/forms for common search use cases

• Hooks into signals to keep data up to date
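Conceptually, the index data Solr (via Lucene) stores is an inverted index: a map from each term to the documents that contain it, which is what lets a query avoid scanning every record. The class below is a hypothetical, in-memory illustration of the "submit to" / "query from" split described above; it is not how Solr is implemented internally and it ignores analysis, scoring and persistence.

```python
from collections import defaultdict


class ToyIndex:
    """Minimal inverted index: term -> set of ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.docs = {}                    # doc_id -> original text

    def add(self, doc_id, text):
        """'Submit' a document: store it and record each of its terms."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query(self, text):
        """Return ids of documents containing ALL query terms (AND semantics)."""
        terms = text.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = ToyIndex()
index.add(1, "Great pizza in Los Angeles")
index.add(2, "Tacos and pizza")
index.add(3, "Sushi in Santa Monica")
print(sorted(index.query("pizza")))     # [1, 2]
print(sorted(index.query("pizza in")))  # [1]
```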

PART 1: LET’S MAKE AN INDEX

0. GITHUB REPO

git clone https://github.com/marcelchastain/haystackdemo

1. SET UP SOLR (from github repo root)

./solr_download.sh

(or, manually)

wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz

tar -xzvf solr-4.10.1.tgz

ln -s ./solr-4.10.1 ./solr

The one file to care about:

• solr/example/solr/collection1/conf/schema.xml

Stores field definitions and data types. Frequently updated during development.

2. RUN SOLR

(from github repo root)

./solr_start.sh

(or, manually)

cd solr/example && java -jar start.jar

Requires Java 1.7+. To install on Debian/Ubuntu: sudo apt-get install openjdk-7-jre-headless

3. INSTALL HAYSTACK

(CWD haystackdemo/)

sudo apt-get install python-pip python-virtualenv

virtualenv env && source env/bin/activate

(from github repo root)

pip install -r requirements.txt

(or, manually)

pip install Django==1.6.7 django-haystack pysolr

4. HAYSTACK SETTINGS

INSTALLED_APPS = [
    # 'django.contrib.admin', etc.
    'haystack',
    # then your usual apps
    'myapp',
]

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
    },
}

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

5. THE MODEL(S)

6. SYNCDB & INITIAL DATA

(CWD haystackdemo/demo/)

./manage.py syncdb

./manage.py loaddata restaurants

7. DEFINE SEARCH INDEX

myapp/search_indexes.py

7.5 BOOSTING FIELD RELEVANCE

Some fields are simply more relevant! (Note: changes to field boosts require a reindex.)
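In haystack the boost is declared on the search index field itself (for example, a boost argument on a CharField). To show what a boost does to ranking, here is a hypothetical toy scorer in plain Python: a match in the "name" field counts double a match in "description". The field names, boost values and the scoring formula are all assumptions for illustration; Lucene's real relevance formula is considerably more involved.

```python
# Hypothetical per-field boosts: a match in "name" is worth twice as much.
FIELD_BOOSTS = {"name": 2.0, "description": 1.0}


def score(doc, term):
    """Toy score: sum over fields of (boost x occurrences of term). Not Lucene's formula."""
    term = term.lower()
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        words = doc.get(field, "").lower().split()
        total += boost * words.count(term)
    return total


docs = [
    {"name": "Taco Stand", "description": "Pizza and tacos"},
    {"name": "Pizza Palace", "description": "Wood-fired oven"},
]
ranked = sorted(docs, key=lambda d: score(d, "pizza"), reverse=True)
print([d["name"] for d in ranked])  # ['Pizza Palace', 'Taco Stand']
```

Both restaurants mention "pizza" once, but the boosted "name" match ranks Pizza Palace first.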

8. CREATE A TEMPLATE FOR INDEXED TEXT

templates/search/indexes/myapp/note_text.txt

9. UPDATE SOLR SCHEMA

(CWD: haystackdemo/demo/)

./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml

Which adds:

*Restart Solr for the changes to take effect

10. REBUILD INDEX

(CWD haystackdemo/demo/)

$ ./manage.py update_index

Indexing 6 notes

PART 2: LET’S GET TO QUERYIN’

SIMPLE SEARCHQUERYSETS

GREAT, WHAT ABOUT FROM A BROWSER?

EASY MODE

urls.py

templates/search/search.html

Full-document search

HAYSTACK COMPONENTS TO EXTEND

• haystack.forms.SearchForm: a Django form with an extendable .search() method. Define additional fields on the form, then incorporate them into the .search() method’s logic

• haystack.views.SearchView: a class-based view designed to be flexible for common search cases

PART 3: FEATURES

HIT HIGHLIGHTING

Instead of referring to a context variable directly, use the {% highlight %} tag (after {% load highlight %}), e.g. {% highlight result.text with query %}
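Highlighting boils down to finding the query terms in a piece of stored text and wrapping them in markup. The function below is a simplified, hypothetical stand-in written in plain Python; haystack's own highlighter also trims the text to a window around the matches, and the default "highlighted" CSS class used here is an assumption chosen to resemble its output.

```python
import html
import re


def highlight(text, query, css_class="highlighted"):
    """Wrap each whole-word, case-insensitive match of a query term in a <span>."""
    escaped = html.escape(text)  # escape first so the added markup is the only HTML
    terms = [re.escape(html.escape(t)) for t in query.split()]
    if not terms:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: '<span class="%s">%s</span>' % (css_class, m.group(0)),
        escaped,
    )


print(highlight("Best Pizza in LA", "pizza"))
# Best <span class="highlighted">Pizza</span> in LA
```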

SPELLING SUGGESTIONS

Update the connection’s settings dictionary, then reindex

Use the spelling_suggestion() method
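For the HAYSTACK_CONNECTIONS setting from the haystack settings step, the settings change amounts to one extra key, haystack's INCLUDE_SPELLING connection option. This is a settings fragment sketched under that assumption; depending on the Solr version and config, the spellcheck component may also need to be enabled in solrconfig.xml, which this fragment does not cover.

```python
# settings.py (fragment): same connection as before, plus spelling support.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
        # Ask haystack to request spelling suggestions from Solr.
        'INCLUDE_SPELLING': True,
    },
}
```

After a reindex, something like SearchQuerySet().auto_query('piza').spelling_suggestion() in a shell should return Solr's suggested query, if it has one ('piza' is just an example input).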

AUTOCOMPLETE

Create another search index field using EdgeNgramField, then reindex

Use the .autocomplete() method on a SearchQuerySet
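An EdgeNgramField works by indexing the leading fragments ("edge n-grams") of each word, so a partial word typed by the user can match as an ordinary term lookup. The sketch below illustrates the idea in plain Python; the minimum and maximum gram sizes and the sample restaurant names are assumptions, and in the real setup Solr does the n-gramming at index time.

```python
from collections import defaultdict

MIN_GRAM, MAX_GRAM = 2, 15  # assumed gram sizes for this sketch


def edge_ngrams(word):
    """Prefixes of word from MIN_GRAM up to MAX_GRAM characters long."""
    word = word.lower()
    return [word[:n] for n in range(MIN_GRAM, min(len(word), MAX_GRAM) + 1)]


def build_autocomplete(docs):
    """Map every edge n-gram of every word to the ids of documents containing it."""
    grams = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in edge_ngrams(word):
                grams[gram].add(doc_id)
    return grams


docs = {1: "Pizza Palace", 2: "Pico Tacos", 3: "Sushi Bar"}
grams = build_autocomplete(docs)
print(edge_ngrams("pizza"))             # ['pi', 'piz', 'pizz', 'pizza']
print(sorted(grams.get("pi", set())))   # [1, 2]
print(sorted(grams.get("piz", set())))  # [1]
```

The trade-off is a larger index (one entry per prefix) in exchange for prefix lookups that cost no more than a normal term lookup.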

FACETING

Add faceting to the search index definition

Regenerate schema.xml and reindex content

./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml

./manage.py update_index

From a shell:
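Facet counts are simply how many matching documents share each value of a field; Solr computes them server-side across the whole result set. The toy below counts them in Python over a hypothetical list of restaurant results (the "cuisine" field is an assumption, not taken from the demo repo) and wraps the output in a {'fields': {...}} dict loosely modelled on the 'fields' part of what haystack's SearchQuerySet.facet_counts() returns.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results share each value of `field`, most common first."""
    counts = Counter(r[field] for r in results if field in r)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


results = [
    {"name": "Pizza Palace", "cuisine": "Italian"},
    {"name": "Pico Tacos", "cuisine": "Mexican"},
    {"name": "Luigi's", "cuisine": "Italian"},
]
print({"fields": {"cuisine": facet_counts(results, "cuisine")}})
# {'fields': {'cuisine': [('Italian', 2), ('Mexican', 1)]}}
```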

RESOURCES

LET’S SAVE YOU A GOOGLE TRIP

Solr in Action ($45), Apr 2014

Haystack Documentation: http://django-haystack.readthedocs.org/

IRC (freenode): #django, #haystack, #solr