"Searching with Solr" - Tyler Harms, South Dakota Code Camp 2012

DESCRIPTION

"Searching with Solr" by Tyler Harms, given November 10, 2012, at South Dakota Code Camp 2012 in Sioux Falls.

TRANSCRIPT

Page 1: "Searching with Solr" - Tyler Harms, South Dakota Code Camp 2012

Tyler Harms, Developer

@harmstyler

[email protected]

AN INTRODUCTION

Searching with Solr

Saturday, November 10, 12

SEARCHING WITH SOLR

Why Implement Solr?

• Does your site need search?
• Is Google enough?
• Do you need/want to control rankings?
• Just text, or Structured Data?

What is Solr?

Solr is a standalone enterprise search server with a REST-like API. You put documents in it [...] over HTTP. You query it via HTTP GET and receive [...] results.

Solr Versions

• Current Version(s)
  • Solr 3.6.1
  • Solr 4
• Released Versions are always stable

$ wget http://(...)/3.6.1/apache-solr-3.6.1.tgz

$ tar -xzf apache-solr-3.6.1.tgz

$ cd apache-solr-3.6.1/example/

$ java -jar start.jar

(a lot of java log...)

Search Alternatives

• Google
• Lucene
• Elasticsearch
• Whoosh
• Xapian
• Many Others

NOT a Database Replacement

• Solr is designed to live alongside your website as a separate web app

[Diagram: Frontend Servers [1..n], Database Master, Database Slaves [0..n], Solr Master, Solr Slaves [0..n]]

Scaling Solr

• Master/Slave Architecture
  • Write to master -> Read from slaves
• Multicore Setup
  • Multiple Solr ‘cores’ running alongside each other within the same install

Solr’s Data Model

• Solr maintains a collection of documents
• A document is a collection of fields and values
• A field can occur multiple times in a doc
• Documents are immutable
  • They can be deleted and replaced by new versions, however.

Querying

• HTTP request
  • http://localhost:8983/solr/select?q=blend&start=0&rows=10
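
A minimal sketch of how an application might assemble the request above, using only Python's standard library. The base URL and the q, start and rows parameters come from the slide; the helper name select_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL as shown on the slide (default port of the example server).
SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(query, start=0, rows=10):
    """Build a /select URL; start and rows page through the results."""
    params = {"q": query, "start": start, "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


print(select_url("blend"))
# http://localhost:8983/solr/select?q=blend&start=0&rows=10
```

Asking for the second page of ten results would be select_url("blend", start=10).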

Solr Query Syntax

• blend (value)
• company:blend (field:value)
• title:"Searching with Solr" AND text:apache
• id:[* TO *]
• *:* (all fields : all values)
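
One way to produce field:value clauses like the ones above from application code is sketched below. The helper names term and all_of are hypothetical, and the sketch only handles phrase quoting; it does not escape the other special characters of the query syntax.

```python
def term(field, value):
    """Return a field:value clause, quoting values that contain spaces."""
    if " " in value:
        # A multi-word value must be a quoted phrase, otherwise only the
        # first word is tied to the field.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def all_of(*clauses):
    """Require every clause to match by joining them with AND."""
    return " AND ".join(clauses)


query = all_of(term("title", "Searching with Solr"), term("text", "apache"))
print(query)  # title:"Searching with Solr" AND text:apache
```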

Using Solr

• Getting Data into Solr
• Getting Data out of Solr

Getting Data into Solr

• POST it

<add>
  <doc>
    <field name="abstract">Lorem ipsum</field>
    <field name="company">Blend Interactive</field>
    <field name="text">Lorem Ipsum</field>
    <field name="title">Some Title</field>
  </doc>
  [<doc> ... </doc>[<doc> ... </doc>]]
</add>
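
The <add> message can be generated rather than written by hand. The sketch below builds it from plain dictionaries with the standard library; build_add is an illustrative name, and the field names are the ones used on the slide.

```python
import xml.etree.ElementTree as ET


def build_add(docs):
    """Serialize a list of {field name: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


payload = build_add([{"company": "Blend Interactive", "title": "Some Title"}])
print(payload)
```

Passing several dictionaries yields several <doc> elements inside a single <add>.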

Committing

• Nothing shows up in the index until you commit
• You can just POST <commit/> to:
  • http://<host>:<port>/solr/update
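
A sketch of the commit POST using only the standard library. The host and port defaults (localhost, 8983) are assumptions borrowed from the other slides, and update_request is a made-up helper name. The request is only constructed here; sending it needs a running Solr.

```python
import urllib.request


def update_request(xml_body, host="localhost", port=8983):
    """Prepare a POST of an XML update message to /solr/update."""
    url = f"http://{host}:{port}/solr/update"
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit = update_request("<commit/>")
print(commit.get_method(), commit.full_url)
# To actually send it: urllib.request.urlopen(commit)
```

The same helper would carry an <add> or <delete> message, followed by a commit to make the change visible.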

Getting Data out of Solr

• http://localhost:8983/solr/select/?q=solr

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="q">solr</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="abstract">A brief introduction to using Apache Solr for implementing search for your website.</str>
      <str name="django_ct">codecamp.session</str>
      <str name="django_id">19</str>
      <str name="id">codecamp.session.19</str>
      <str name="text">Searching with Solr: An Introduction A brief introduction to using Apache Solr for implementing search for your website.</str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>

Getting Data out of Solr: JSON

• http://localhost:8983/solr/select/?q=solr&wt=json

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "wt": "json",
      "q": "solr"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "django_id": "19",
        "title": "Searching with Solr: An Introduction",
        "text": "Searching with Solr: An Introduction\nA brief introduction to using Apache Solr for implementing search for your website.",
        "abstract": "A brief introduction to using Apache Solr for implementing search for your website.",
        "django_ct": "codecamp.session",
        "id": "codecamp.session.19"
      }
    ]
  }
}
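
Reading the JSON form from application code could look like the sketch below. The sample body is a shortened copy of the response above (only id and title are kept), and the function name titles is made up for the example.

```python
import json

# Shortened copy of the slide's JSON response.
raw = """{"responseHeader": {"status": 0, "QTime": 0},
 "response": {"numFound": 1, "start": 0, "docs": [
   {"id": "codecamp.session.19",
    "title": "Searching with Solr: An Introduction"}]}}"""


def titles(body):
    """Return (id, title) pairs from a Solr JSON response body."""
    data = json.loads(body)
    if data["responseHeader"]["status"] != 0:
        raise RuntimeError("Solr reported a non-zero status")
    return [(doc["id"], doc["title"]) for doc in data["response"]["docs"]]


print(titles(raw))
```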

Deleting Data from Solr

• POST it

<delete><id>codecamp.session.19</id></delete>
<delete><query>company:blend</query></delete>

The Solr Schema

• schema.xml
  • Defines ‘types’ used in the webapp
  • Defines the fields
  • Defines ‘copyfields’
  • Read the schema inside the example project for more

The Solr Schema

• Types
  • Define how a field and query should be processed
    • Word Stemming
    • Case Folding
    • How would you handle a search for ‘C.I.A.’?
  • Dates, ints, floats, etc. are defined here as well
  • 2 Modes
    • Index Time
    • Query Time
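
To make the ‘C.I.A.’ question concrete, here is a toy analyzer in plain Python. It is not Solr's analysis chain, only an illustration of the idea: when the same processing runs at index time and at query time, differently written forms of a term end up as the same token.

```python
def analyze(text):
    """Toy analyzer: split on whitespace, drop punctuation, fold case."""
    tokens = []
    for raw in text.split():
        token = "".join(ch for ch in raw if ch.isalnum()).lower()
        if token:
            tokens.append(token)
    return tokens


indexed = analyze("The C.I.A. released a report")  # index time
query = analyze("cia")                             # query time

print(indexed)  # ['the', 'cia', 'released', 'a', 'report']
print(all(tok in indexed for tok in query))  # True
```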

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Fields

• The elements of a document
• Both Predefined and Dynamic
• Fields may occur multiple times
• May be indexed and/or stored

<fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="django_ct" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="django_id" type="string" indexed="true" stored="true" multiValued="false" />
  <!-- dynamic -->
  <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_l" type="slong" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
  <dynamicField name="*_d" type="sdouble" indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
  <!-- app -->
  <field name="bio" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="text" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="abstract" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="full_name" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="company" type="text" indexed="true" stored="true" multiValued="false" />
</fields>

Copy Fields

• Two Main Uses
  • Analyze fields in different ways
  • Concatenate Fields

<copyField source="bio" dest="df_text" />
<copyField source="year" dest="century" maxChars="2"/>

2000 would be stored as 20
Useful for custom faceting
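
The effect of maxChars="2" can be imitated in a few lines of Python. This is only an illustration of the truncation and of the grouping it makes possible, not Solr code; copy_field is a made-up name and the years are sample values.

```python
from collections import Counter


def copy_field(value, max_chars=None):
    """Copy a value, keeping at most max_chars leading characters."""
    return value if max_chars is None else value[:max_chars]


years = ["1999", "2000", "2012"]
centuries = [copy_field(year, max_chars=2) for year in years]

print(centuries)           # ['19', '20', '20']
print(Counter(centuries))  # facet-style counts per century prefix
```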

The Solr Config File

• solrconfig.xml
  • Defines request handlers, defaults, & caches
  • Read the solrconfig.xml inside the example project for more

Other Solr Tools

• Debug Query
• Boost Functions
• Search Faceting
• Search Filters
• Search Highlighting
• Solr Admin

Debug Query Option

• Add &debugQuery=on to request parameters
• Returns a parsed form of the query

<lst name="debug">
  <str name="rawquerystring">solr</str>
  <str name="querystring">solr</str>
  <str name="parsedquery">text:solr</str>
  <str name="parsedquery_toString">text:solr</str>
  <lst name="explain">
    <str name="codecamp.session.19">
1.2147729 = (MATCH) fieldWeight(text:solr in 17), product of:
  1.4142135 = tf(termFreq(text:solr)=2)
  3.9267395 = idf(docFreq=2, maxDocs=56)
  0.21875 = fieldNorm(field=text, doc=17)
    </str>
  </lst>
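
The numbers in the explain output can be checked by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the field norm is taken from the output as-is.

```python
import math

# Values read from the explain output above.
term_freq, doc_freq, max_docs = 2, 2, 56
field_norm = 0.21875

tf = math.sqrt(term_freq)                       # explain shows 1.4142135
idf = 1 + math.log(max_docs / (doc_freq + 1))   # explain shows 3.9267395
score = tf * idf * field_norm                   # explain shows 1.2147729

print(round(tf, 6), round(idf, 6), round(score, 6))
```

The results agree with the slide up to the rounding of Lucene's single-precision floats.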

Boost Function

• Allows you to influence results at query time
• Really useful for tuning scoring
• You can also boost at index time

q=blend&qf=text^2 company
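
A sketch of the same boost as a properly encoded query string. The qf parameter is read by the DisMax family of query parsers, so the example adds defType=edismax; that parameter is not on the slide and is an assumption about how the request handler is configured.

```python
from urllib.parse import urlencode

params = [
    ("q", "blend"),
    ("defType", "edismax"),       # assumption: not shown on the slide
    ("qf", "text^2 company"),     # matches in text count twice as much
]

query_string = urlencode(params)
print(query_string)  # q=blend&defType=edismax&qf=text%5E2+company
```

The caret and the space have to be URL-encoded, which is easy to miss when typing the query into a browser by hand.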

Solr Faceting

• What is a facet?
  • “Interaction style where users filter a set of items by progressively selecting from only valid values of a faceted classification system” - Keith Instone, SOASIS&T, July 8, 2004
• What does it look like?
• Make sure to use an untokenized field (e.g. string)
  • “San Jose” != “san”+“jose”

q=*:*
facet=on
facet.field=company
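
With wt=json, Solr by default returns the counts for a facet field as one flat list that alternates value and count. The sketch below pairs them up; the company names and counts in the sample are invented for illustration, and facet_counts is a made-up helper name.

```python
def facet_counts(flat):
    """Pair up a flat [value, count, value, count, ...] facet list."""
    return dict(zip(flat[0::2], flat[1::2]))


# Invented sample shaped like the facet section of a JSON response.
sample = {
    "facet_counts": {
        "facet_fields": {"company": ["Blend Interactive", 3, "Apache", 1]}
    }
}

counts = facet_counts(sample["facet_counts"]["facet_fields"]["company"])
print(counts)  # {'Blend Interactive': 3, 'Apache': 1}
```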

Solr Filter Query

• Used to narrow your search query
• Restrict the super set of documents that can be returned
• ‘fq’ parameter (short for Filter Query)

q=*:*
fq=company:blend
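
The fq parameter may be repeated, and a document must pass every filter to be returned. A sketch of building such a request follows; the second filter on django_ct is an assumption added for illustration, using a field that appears in the sample responses earlier in the deck.

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "*:*"),
    ("fq", "company:blend"),
    ("fq", "django_ct:codecamp.session"),  # assumption: second filter for illustration
]

query_string = urlencode(params)
print(query_string)

# Decoding shows both filters survive as separate fq values.
print(parse_qs(query_string)["fq"])
```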

Search Highlighting

• Allow Solr to generate your highlight

hl=true
hl.simple.pre=<b>
hl.simple.post=</b>
hl.fragsize=200
hl.requireFieldMatch=false
hl.fl=text bio title
hl.snippets=1

Solr Admin

• http://localhost:8983/solr/admin/
• Built in app for testing all search options
  • Field Analysis
  • Schema Browser
  • Full Query Interface
  • Solr Statistics
  • Solr Information
  • Many More Options

Solr/Browse

• Test your search configuration using the /browse requestHandler

Resources

• Apache Solr Website
  • http://lucene.apache.org/solr/
  • Wiki, mailing list, bugs/features
• Books
