Add Powerful Full Text Search to Your Web App with Solr

Powerful Full-Text Search with Solr Yonik Seeley [email protected] Web 2.0 Expo, Berlin 8 November 2007 download at http://www.apache.org/~yonik

Upload: adunne

Post on 17-Jan-2015


DESCRIPTION

Speaker: Yonik Seeley

TRANSCRIPT

Page 1: Add Powerful Full Text Search to Your Web App with Solr

Powerful Full-Text Search with Solr

Yonik Seeley
[email protected]

Web 2.0 Expo, Berlin
8 November 2007

download at http://www.apache.org/~yonik

Page 2: Add Powerful Full Text Search to Your Web App with Solr

What is Lucene
• High performance, scalable, full-text search library
• Focus: Indexing + Searching Documents
– “Document” is just a list of name+value pairs
• No crawlers or document parsing
• Flexible Text Analysis (tokenizers + token filters)
• 100% Java, no dependencies, no config files

Page 3: Add Powerful Full Text Search to Your Web App with Solr

What is Solr
• A full text search server based on Lucene
• XML/HTTP, JSON Interfaces
• Faceted Search (category counting)
• Flexible data schema to define types and fields
• Hit Highlighting
• Configurable Advanced Caching
• Index Replication
• Extensible Open Architecture, Plugins
• Web Administration Interface
• Written in Java 5, deployable as a WAR

Page 4: Add Powerful Full Text Search to Your Web App with Solr

Basic App

[Diagram: an Indexer sends documents to http://solr/update and a Webapp (which produces HTML) sends queries to http://solr/select. Solr runs inside a servlet container on top of Lucene and exposes admin, update and select entry points, with a standard and a custom request handler, XML and JSON response writers, and XML and CSV update handlers.]

Document
super_name: Mr. Fantastic
name: Reed Richards
category: superhero
powers: elasticity

Query (powers:agility)
Query Response (matching docs)

Page 5: Add Powerful Full Text Search to Your Web App with Solr

Indexing Data

HTTP POST to http://localhost:8983/solr/update

<add>
  <doc>
    <field name="id">05991</field>
    <field name="name">Peter Parker</field>
    <field name="supername">Spider-Man</field>
    <field name="category">superhero</field>
    <field name="powers">agility</field>
    <field name="powers">spider-sense</field>
  </doc>
</add>

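As a rough sketch of how a client might produce the add message above, the following builds the same XML from a dictionary using only the Python standard library. The helper name build_add_xml and the dictionary layout are illustrative assumptions, not part of Solr; list values are written as repeated fields, which is how the multi-valued powers field appears on the slide.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build a Solr <add> message; list values become repeated <field> elements."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# The document from the slide (hypothetical client-side representation).
spider_man = {
    "id": "05991",
    "name": "Peter Parker",
    "supername": "Spider-Man",
    "category": "superhero",
    "powers": ["agility", "spider-sense"],
}
add_xml = build_add_xml(spider_man)
```

The resulting string would then be sent as the body of the HTTP POST; sending is left out so the sketch runs without a Solr server.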
Page 6: Add Powerful Full Text Search to Your Web App with Solr

Indexing CSV data

Iron Man, Tony Stark, superhero, powered armor | flight
Sandman, William Baker|Flint Marko, supervillain, sand transform
Wolverine,James Howlett|Logan, superhero, healing|adamantium
Magneto, Erik Lehnsherr, supervillain, magnetism|electricity

http://localhost:8983/solr/update/csv?fieldnames=supername,name,category,powers&separator=,&f.name.split=true&f.name.separator=|&f.powers.split=true&f.powers.separator=|
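The URL above contains characters such as "," and "|" that a client would normally percent-encode. A minimal sketch, assuming the same parameters as on the slide, that lets the standard library do the encoding (the function name csv_update_url is hypothetical):

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def csv_update_url(base="http://localhost:8983/solr/update/csv"):
    """Return the CSV update URL with every parameter value percent-encoded."""
    params = [
        ("fieldnames", "supername,name,category,powers"),
        ("separator", ","),
        ("f.name.split", "true"),
        ("f.name.separator", "|"),
        ("f.powers.split", "true"),
        ("f.powers.separator", "|"),
    ]
    return base + "?" + urlencode(params)


csv_url = csv_update_url()
# Decoding the query string again recovers the original values.
decoded = parse_qs(urlsplit(csv_url).query)
```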

Page 7: Add Powerful Full Text Search to Your Web App with Solr

Data upload methods
URL=http://localhost:8983/solr/update/csv

• HTTP POST body (curl, HttpClient, etc)
curl $URL -H 'Content-type:text/plain; charset=utf-8' --data-binary @info.csv
• Multi-part file upload (browsers)
• Request parameter
?stream.body='Cyclops, Scott Summers,…'
• Streaming from URL (must enable)
?stream.url=file://data/info.csv
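The first method (HTTP POST body) can be sketched in Python roughly as follows. It mirrors the curl call above; build_csv_post is an illustrative helper name, and the request is only constructed, not sent, so the snippet runs without a Solr instance.

```python
import urllib.request

URL = "http://localhost:8983/solr/update/csv"


def build_csv_post(csv_text, url=URL):
    """Prepare a POST whose body is the raw CSV text, like curl --data-binary."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# One of the CSV rows shown earlier in the deck.
request = build_csv_post(
    "Magneto, Erik Lehnsherr, supervillain, magnetism|electricity\n"
)
# To actually send it against a running server:
#   urllib.request.urlopen(request).read()
```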

Page 8: Add Powerful Full Text Search to Your Web App with Solr

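Indexing with SolrJ
// Solr's Java Client API… remote or embedded/local!
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("supername", "Daredevil");
doc.addField("name", "Matt Murdock");
doc.addField("category", "superhero");

server.add(doc);    // send the document to Solr
server.commit();    // make it visible to searches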

Page 9: Add Powerful Full Text Search to Your Web App with Solr

Deleting Documents
• Delete by Id, most efficient
<delete>
  <id>05591</id>
  <id>32552</id>
</delete>

• Delete by Query
<delete>
  <query>category:supervillain</query>
</delete>

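A small sketch of generating the two delete messages above from Python. The function names are hypothetical; building the XML with ElementTree rather than string concatenation means characters such as "&" or "<" in a query are escaped automatically.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Build a <delete> message listing one <id> element per document id."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


by_id = delete_by_ids(["05591", "32552"])
by_query = delete_by_query("category:supervillain")
```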
Page 10: Add Powerful Full Text Search to Your Web App with Solr

Commit
• <commit/> makes changes visible
– Triggers static cache warming in solrconfig.xml
– Triggers autowarming from existing caches
• <optimize/> same as commit, merges all index segments for faster searching

Lucene Index Segments
_0.fnm _0.fdt _0.fdx _0.frq _0.tis _0.tii _0.prx _0.nrm _0_1.del
_1.fnm _1.fdt _1.fdx […]

Page 11: Add Powerful Full Text Search to Your Web App with Solr

Searching
http://localhost:8983/solr/select?q=powers:agility&start=0&rows=2&fl=supername,category

<response>
  <result numFound="427" start="0">
    <doc>
      <str name="supername">Spider-Man</str>
      <str name="category">superhero</str>
    </doc>
    <doc>
      <str name="supername">Mystique</str>
      <str name="category">supervillain</str>
    </doc>
  </result>
</response>

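To illustrate how a client might consume the XML response above, here is a minimal parsing sketch. It assumes the simplified response shape shown on the slide (only str fields directly under each doc); multi-valued or typed fields would need extra handling. The name parse_results is illustrative.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the slide, with the typos and quotes normalized.
SAMPLE_XML = """<response><result numFound="427" start="0">
<doc><str name="supername">Spider-Man</str><str name="category">superhero</str></doc>
<doc><str name="supername">Mystique</str><str name="category">supervillain</str></doc>
</result></response>"""


def parse_results(xml_text):
    """Return (numFound, list of {field name: value}) from a select response."""
    result = ET.fromstring(xml_text).find("result")
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


num_found, docs = parse_results(SAMPLE_XML)
```

Note that numFound (427) is the total number of matches, while docs holds only the requested page (rows=2).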
Page 12: Add Powerful Full Text Search to Your Web App with Solr

Response Format
• Add &wt=json for JSON formatted response

{"response": {"numFound":427, "start":0,
  "docs": [
    {"supername":"Spider-Man", "category":"superhero"},
    {"supername":"Mystique", "category":"supervillain"}
  ]}
}

• Also Python, Ruby, PHP, SerializedPHP, XSLT
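A short sketch of reading the JSON form of the same result. The sample string assumes the result list sits under a top-level "response" key, as in the facet example later in this deck; page_summary is a hypothetical helper name.

```python
import json

SAMPLE_JSON = """{"response": {"numFound": 427, "start": 0, "docs": [
  {"supername": "Spider-Man", "category": "superhero"},
  {"supername": "Mystique", "category": "supervillain"}]}}"""


def page_summary(json_text):
    """Return (total hits, offset of this page, supernames on this page)."""
    body = json.loads(json_text)["response"]
    names = [doc["supername"] for doc in body["docs"]]
    return body["numFound"], body["start"], names


summary = page_summary(SAMPLE_JSON)
```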

Page 13: Add Powerful Full Text Search to Your Web App with Solr

Scoring
• Query results are sorted by score descending
• VSM – Vector Space Model
• tf – term frequency: number of matching terms in field
• lengthNorm – number of tokens in field
• idf – inverse document frequency
• coord – coordination factor, number of matching terms
• document boost
• query clause boost

http://lucene.apache.org/java/docs/scoring.html

Page 14: Add Powerful Full Text Search to Your Web App with Solr

Explain
http://solr/select?q=super fast&indent=on&debugQuery=on

<lst name="debug">
 <lst name="explain">
  <str name="id=Flash,internal_docid=6">
0.16389132 = (MATCH) product of:
  0.32778263 = (MATCH) sum of:
    0.32778263 = (MATCH) weight(text:fast in 6), product of:
      0.5012072 = queryWeight(text:fast), product of:
        2.466337 = idf(docFreq=5)
        0.20321926 = queryNorm
      0.65398633 = (MATCH) fieldWeight(text:fast in 6), product of:
        1.4142135 = tf(termFreq(text:fast)=2)
        2.466337 = idf(docFreq=5)
        0.1875 = fieldNorm(field=text, doc=6)
  0.5 = coord(1/2)
  </str>
  <str name="id=Superman,internal_docid=7">
0.1365761 = (MATCH) product of:
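The numbers in the explain output for the Flash document can be re-derived by hand. The sketch below multiplies the factors exactly as the nesting of the output indicates; the only assumption is that tf is the square root of the term frequency, which is consistent with the value 1.4142135 shown for termFreq=2.

```python
import math

# Factors copied from the explain output for id=Flash.
tf = math.sqrt(2)          # tf(termFreq=2), shown as 1.4142135
idf = 2.466337             # idf(docFreq=5)
query_norm = 0.20321926    # queryNorm
field_norm = 0.1875        # fieldNorm
coord = 1 / 2              # coord(1/2): one of the two query terms matched

query_weight = idf * query_norm          # shown as 0.5012072
field_weight = tf * idf * field_norm     # shown as 0.65398633
term_score = query_weight * field_weight # shown as 0.32778263
score = coord * term_score               # shown as 0.16389132
```

Only "fast" matched, so the sum over matching clauses has a single entry, and coord halves the result because "super" did not match.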

Page 15: Add Powerful Full Text Search to Your Web App with Solr

Lucene Query Syntax
1. justice league
• Equiv: justice OR league
• QueryParser default operator is “OR”/optional
2. +justice +league -name:aquaman
• Equiv: justice AND league NOT name:aquaman
3. "justice league" -name:aquaman
4. title:spiderman^10 description:spiderman
5. description:"spiderman movie"~100

Page 16: Add Powerful Full Text Search to Your Web App with Solr

Lucene Query Examples 2
1. releaseDate:[2000 TO 2007]
2. Wildcard searches: sup?r, su*r, super*
3. spider~
• Fuzzy search: Levenshtein distance
• Optional minimum similarity: spider~0.7
4. *:*
5. (Superman AND "Lex Luthor") OR (+Batman +Joker)

Page 17: Add Powerful Full Text Search to Your Web App with Solr

DisMax Query Syntax
• Good for handling raw user queries
– Balanced quotes for phrase query
– ‘+’ for required, ‘-’ for prohibited
– Separates query terms from query structure

http://solr/select?qt=dismax
&q=super man // the user query
&qf=title^3 subject^2 body // fields to query
&pf=title^2,body // fields to do phrase queries
&ps=100 // slop for those phrase q’s
&tie=.1 // multi-field match reward
&mm=2 // # of terms that should match
&bf=popularity // boost function
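One way a web app might keep the raw user query separate from the query structure is to hold the structural parameters in one place and only add q per request. This is a sketch with the parameter values from the slide; DISMAX_DEFAULTS and dismax_url are illustrative names, and http://solr/select is the placeholder host used throughout the deck.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Structural parameters, identical for every request.
DISMAX_DEFAULTS = {
    "qt": "dismax",
    "qf": "title^3 subject^2 body",
    "pf": "title^2,body",
    "ps": "100",
    "tie": ".1",
    "mm": "2",
    "bf": "popularity",
}


def dismax_url(user_query, base="http://solr/select"):
    """Combine the fixed DisMax parameters with the raw user query."""
    params = dict(DISMAX_DEFAULTS, q=user_query)
    return base + "?" + urlencode(params)


url = dismax_url("super man")
decoded = parse_qs(urlsplit(url).query)
```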

Page 18: Add Powerful Full Text Search to Your Web App with Solr

DisMax Query Form
• The expanded Lucene Query:

+( DisjunctionMaxQuery( title:super^3 | subject:super^2 | body:super)
   DisjunctionMaxQuery( title:man^3 | subject:man^2 | body:man)
)
DisjunctionMaxQuery(title:"super man"~100^2 body:"super man"~100)
FunctionQuery(popularity)

• Tip: set up your own request handler with default parameters to avoid clients having to specify them

Page 19: Add Powerful Full Text Search to Your Web App with Solr

Function Query

• Allows adding function of field value to score
– Boost recently added or popular documents
• Current parser only supports function notation
• Example: log(sum(popularity,1))
• sum, product, div, log, sqrt, abs, pow
• scale(x, target_min, target_max)
– calculates min & max of x across all docs
• map(x, min, max, target)
– useful for dealing with defaults
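To make the last two functions more concrete, here is a plain-Python sketch of what scale and map are described as doing. This is an interpretation of the slide, not Solr's implementation: it assumes scale linearly rescales values using the minimum and maximum seen across all documents, and that map replaces any value inside [min, max] with target and leaves other values alone.

```python
def scale(values, target_min, target_max):
    """Linearly rescale values so the smallest maps to target_min, the largest to target_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All documents share one value; avoid dividing by zero.
        return [float(target_min) for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


def map_value(x, lo, hi, target):
    """Replace x with target when lo <= x <= hi; otherwise return x unchanged."""
    return target if lo <= x <= hi else x


scaled = scale([0, 5, 10], 1, 2)
# A missing popularity stored as 0 can be given a default of 1:
defaulted = [map_value(p, 0, 0, 1) for p in [0, 7, 0, 3]]
```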

Page 20: Add Powerful Full Text Search to Your Web App with Solr

Boosted Query

• Score is multiplied instead of added
– New local params <!...> syntax added
&q=<!boost b=sqrt(popularity)>super man
• Parameter dereferencing in local params
&q=<!boost b=$boost v=$userq>
&boost=sqrt(popularity)
&userq=super man

Page 21: Add Powerful Full Text Search to Your Web App with Solr

Analysis & Search Relevancy

Document Indexing Analysis
LexCorp BFG-9000
→ WhitespaceTokenizer: LexCorp | BFG-9000
→ WordDelimiterFilter catenateWords=1: Lex Corp LexCorp | BFG 9000
→ LowercaseFilter: lex corp lexcorp | bfg 9000

Query Analysis
Lex corp bfg9000
→ WhitespaceTokenizer: Lex | corp | bfg9000
→ WordDelimiterFilter catenateWords=0: Lex | corp | bfg 9000
→ LowercaseFilter: lex | corp | bfg 9000

A Match!
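The analysis chains on this slide can be imitated with a few lines of Python to see why the query matches the document. This is a deliberately simplified stand-in, not Solr's WordDelimiterFilter: it splits on punctuation, case changes and letter/digit boundaries, optionally adds the concatenation of adjacent word parts, and treats a match as "every query term occurs among the indexed terms" (token positions are ignored).

```python
import re

# Word parts: Capitalized or lowercase runs, all-caps runs, digit runs.
PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def word_delimiter(token, catenate_words=False):
    """Split a token into parts; optionally also emit joined runs of word parts."""
    parts = PART.findall(token)
    out = list(parts)
    if catenate_words:
        run = []
        for part in parts + [""]:  # "" is a sentinel that flushes the last run
            if part.isalpha():
                run.append(part)
            else:
                if len(run) > 1:
                    out.append("".join(run))
                run = []
    return out


def analyze(text, catenate_words):
    """Whitespace tokenizer, then word delimiter, then lowercase filter."""
    tokens = []
    for raw in text.split():
        for part in word_delimiter(raw, catenate_words):
            tokens.append(part.lower())
    return tokens


indexed = analyze("LexCorp BFG-9000", catenate_words=True)
query = analyze("Lex corp bfg9000", catenate_words=False)
is_match = all(term in indexed for term in query)
```

Because "lexcorp" is also indexed, a query for the single word "lexcorp" would match as well, which is the point of catenateWords=1 at index time.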

Page 22: Add Powerful Full Text Search to Your Web App with Solr

Configuring Relevancy
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Page 23: Add Powerful Full Text Search to Your Web App with Solr

Field Definitions
• Field Attributes: name, type, indexed, stored, multiValued, omitNorms, termVectors

<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="false"/>
<field name="price" type="sfloat" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>

• Dynamic Fields

<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

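The idea behind dynamic fields can be sketched as a lookup: explicitly declared fields win, otherwise the field name is matched against the wildcard patterns. The code below is only an illustration using the names from the slide; how Solr itself orders or prioritizes competing patterns is not modeled here.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic patterns copied from the schema on the slide.
EXPLICIT_FIELDS = {
    "id": "string",
    "sku": "textTight",
    "name": "text",
    "inStock": "boolean",
    "price": "sfloat",
    "category": "text_ws",
}
DYNAMIC_FIELDS = [("*_i", "sint"), ("*_s", "string"), ("*_t", "text")]


def resolve_type(field_name):
    """Return the field type for a name, or None if nothing in the schema matches."""
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, type_name in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return type_name
    return None


weight_type = resolve_type("weight_i")   # hypothetical field, matches *_i
```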
Page 24: Add Powerful Full Text Search to Your Web App with Solr

copyField
• Copies one field to another at index time
• Use case #1: Analyze same field different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex

<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>

• Use case #2: Index multiple fields into single searchable field

Page 25: Add Powerful Full Text Search to Your Web App with Solr
Page 26: Add Powerful Full Text Search to Your Web App with Solr
Page 27: Add Powerful Full Text Search to Your Web App with Solr
Page 28: Add Powerful Full Text Search to Your Web App with Solr

Facet Query
http://solr/select?q=foo&wt=json&indent=on&facet=true&facet.field=cat&facet.query=price:[0 TO 100]&facet.query=manu:IBM

{"response":{"numFound":26,"start":0,"docs":[…]},
 "facet_counts":{
  "facet_queries":{
    "price:[0 TO 100]":6,
    "manu:IBM":2},
  "facet_fields":{
    "cat":[ "electronics",14, "memory",3, "card",2, "connector",2]}}}

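In the response above, each facet field comes back as a flat list that alternates value and count. A small sketch of turning that into pairs for display; the sample string keeps only the facet_counts part of the slide's response, and facet_pairs is an illustrative name.

```python
import json

SAMPLE_FACETS = """{"facet_counts": {
  "facet_queries": {"price:[0 TO 100]": 6, "manu:IBM": 2},
  "facet_fields": {"cat": ["electronics", 14, "memory", 3,
                           "card", 2, "connector", 2]}}}"""


def facet_pairs(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


counts = json.loads(SAMPLE_FACETS)["facet_counts"]
cat_facets = facet_pairs(counts["facet_fields"]["cat"])
in_price_range = counts["facet_queries"]["price:[0 TO 100]"]
```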
Page 29: Add Powerful Full Text Search to Your Web App with Solr

Filters
• Filters are restrictions in addition to the query
• Use in faceting to narrow the results
• Filters are cached separately for speed

1. User queries for memory, query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB&fq=type:DDR2&…
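The drill-down sequence above amounts to keeping q fixed and appending one fq parameter per selected facet. A minimal sketch of building such a query string, with the filter values from the slide (filtered_query is a hypothetical helper):

```python
from urllib.parse import urlencode, parse_qs


def filtered_query(q, filters):
    """Build a query string with one fq parameter per active filter."""
    params = [("q", q)] + [("fq", f) for f in filters] + [("facet", "true")]
    return urlencode(params)


# Step 3 of the slide: both facet selections on top of the base filter.
selected = ["inStock:true", "size:1GB", "type:DDR2"]
query_string = filtered_query("memory", selected)
decoded = parse_qs(query_string)
```

Keeping each restriction in its own fq, rather than folding them into q, lets each filter be cached and reused independently, as the bullets above note.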

Page 30: Add Powerful Full Text Search to Your Web App with Solr

Highlighting
http://solr/select?q=lcd&wt=json&indent=on&hl=true&hl.fl=features

{"response":{"numFound":5,"start":0,"docs":[ {"id":"3007WFP", "price":899.95}, …]},
 "highlighting":{
  "3007WFP":{ "features":["30\" TFT active matrix <em>LCD</em>, 2560 x 1600"]},
  "VA902B":{ "features":["19\" TFT active matrix <em>LCD</em>, 8ms response time, 1280 x 1024 native resolution"]}}}

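Highlighting snippets arrive in a separate section keyed by document id, so a client has to join them back to the result list. A rough sketch under that assumption; the sample is trimmed to one document from the slide, and snippets_by_doc is an illustrative name.

```python
import json

# Raw string so the JSON escape \" survives inside the Python literal.
SAMPLE_HL = r"""{"response": {"numFound": 5, "start": 0,
  "docs": [{"id": "3007WFP", "price": 899.95}]},
 "highlighting": {"3007WFP": {"features":
  ["30\" TFT active matrix <em>LCD</em>, 2560 x 1600"]}}}"""


def snippets_by_doc(json_text, field):
    """Pair each returned doc id with its highlighted snippets for one field."""
    body = json.loads(json_text)
    highlights = body.get("highlighting", {})
    return [
        (doc["id"], highlights.get(doc["id"], {}).get(field, []))
        for doc in body["response"]["docs"]
    ]


pairs = snippets_by_doc(SAMPLE_HL, "features")
```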
Page 31: Add Powerful Full Text Search to Your Web App with Solr

MoreLikeThis
• Selects documents that are “similar” to the documents matching the main query.
&q=id:6H500F0&mlt=true&mlt.fl=name,cat,features

"moreLikeThis":{
  "6H500F0":{"numFound":5,"start":0,
    "docs": [
      {"name":"Apple 60 GB iPod with Video Playback Black", "price":399.0,
       "inStock":true, "popularity":10, […]},
      […]
    ][…]

Page 32: Add Powerful Full Text Search to Your Web App with Solr

High Availability

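[Diagram: HTTP search requests arrive at a load balancer in front of the appservers, which do dynamic HTML generation and query the Solr searchers through a load balancer. An updater reads from the DB and sends updates to the Solr master, and the index is replicated from the master to the searchers. An admin terminal issues updates and admin queries.]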

Page 33: Add Powerful Full Text Search to Your Web App with Solr

Resources
• WWW
– http://lucene.apache.org/solr
– http://lucene.apache.org/solr/tutorial.html
– http://wiki.apache.org/solr/
• Mailing Lists
– [email protected]
– [email protected]