solr 3.1 and beyond
Post on 10-Apr-2018
8/8/2019 Solr 3.1 and Beyond
Solr 3.1 and Beyond
yonik@lucidimagination.com
October 8, 2010
Yonik Seeley, Lucid Imagination
http://www.lucidimagination.com/events/revolution2010
Agenda
Goal: Introduce new features you can try & use now in Solr development versions 3.1 or 4.0
  Relevancy (Extended Dismax Parser)
  Spatial/Geo Search
  Search Result Grouping / Field Collapsing
  Faceting (Pivot, Range, Per-segment)
  Scalability (Solr Cloud)
  Odds & Ends
  Q&A
Solr 3.1? What happened to 1.5?
Lucene/Solr merged (March 2010)
  Single set of committers
  Single dev mailing list (dev@lucene.apache.org)
  Single shared subversion trunk
  Keep separate downloads, user mailing lists
  Other former Lucene subprojects spun off (Nutch, Tika, Mahout, etc.)
Development trunk is now always the next major release (currently 4.0)
  branch_3x will be the base for all 3.x releases
  Branch together, release together, share version numbers
RELEVANCE
Extended Dismax Parser
Superset of dismax
  &defType=edismax&q=foo&qf=body
Fixes edge cases where dismax could still throw exceptions
  OR AND NOT -
Full lucene syntax support
  Tries lucene syntax first
  Smart escaping is done if there are syntax errors
Optionally supports treating and/or as AND/OR in lucene syntax
Fielded queries (e.g. myfield:foo) even in degraded mode
  uf parameter controls what field names may be directly specified in q
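A minimal sketch of how a client might send the request shown above. The parameters defType, q and qf come from the slide; the helper name edismax_url and the select endpoint on localhost are assumptions for illustration only.

```python
from urllib.parse import urlencode


def edismax_url(base, q, qf, **extra):
    """Build a select URL that routes the query through the edismax parser."""
    params = {"defType": "edismax", "q": q, "qf": qf}
    params.update(extra)  # any further request parameters, passed through as-is
    return base + "?" + urlencode(params)


# Assumed endpoint; adjust to the local Solr install.
url = edismax_url("http://localhost:8983/solr/select", "foo", "body")
print(url)
```

Because urlencode escapes the query text, operators and fielded clauses such as myfield:foo can be passed in q without manual escaping.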
Extended Dismax Parser (continued)
boost parameter for multiplicative boost-by-function
Pure negative query clauses
  Example: solr OR (-solr)
Enhanced term proximity boosting
  pf2=myfield results in term bigrams in sloppy phrase queries
  myfield:aa bb cc -> myfield:"aa bb" myfield:"bb cc"
Enhanced stopword handling
  stopwords omitted in main query, but added in optional proximity boosting part
  Example: q=solr is awesome&qf=myfield&pf2=myfield ->
    +myfield:(solr awesome) (myfield:"solr is" myfield:"is awesome")
  Currently controlled by the absence of StopWordFilter in index analyzer, and presence in query analyzer
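The bigram and stopword behaviour above can be illustrated with a few lines of string handling. This is only a sketch of the rewritten query shape shown on the slide, not Solr's implementation; the function name pf2_clauses and the whitespace tokenization are assumptions.

```python
def pf2_clauses(field, query, stopwords=frozenset()):
    """Return the rewritten query string: mandatory main clause plus optional bigram phrases."""
    terms = query.split()
    # Stopwords are dropped from the mandatory part of the query...
    main = [t for t in terms if t not in stopwords]
    # ...but kept when building the phrase bigrams used for proximity boosting.
    bigrams = ['%s:"%s %s"' % (field, a, b) for a, b in zip(terms, terms[1:])]
    return "+%s:(%s) (%s)" % (field, " ".join(main), " ".join(bigrams))


print(pf2_clauses("myfield", "solr is awesome", stopwords={"is"}))
```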
SPATIAL SEARCH
Spatial Search
Step 1: Index some locations!
  The Alpine Shop  44.013617,-73.168264
Step 2: Decide where you are
  &pt=44.0153371,-73.16734&d=1&sfield=store
Step 3: Profit!
  Spatial Filter: &fq={!geofilt}
  Bounding Box: &fq={!bbox}
  Distance Function: &sort=geodist() asc
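The three steps can be sketched from the client side as follows. The pt, d, sfield, fq and sort values are taken from the slide; the match-all q=*:*, the select endpoint, and the helper haversine_km are assumptions. The haversine function is only a local sanity check that the example store lies within d=1 (assumed here to be kilometers), not a claim about how Solr computes geodist().

```python
from math import asin, cos, radians, sin, sqrt
from urllib.parse import parse_qs, urlencode, urlsplit


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/lon points, in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


store = (44.013617, -73.168264)   # The Alpine Shop, from the slide
here = (44.0153371, -73.16734)    # the query point, from the slide

params = {
    "q": "*:*",                    # assumption: match everything, then filter
    "fq": "{!geofilt}",
    "sfield": "store",
    "pt": "%s,%s" % here,
    "d": 1,
    "sort": "geodist() asc",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
distance = haversine_km(*store, *here)
print(url)
print(round(distance, 3), "km")
```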
RESULT GROUPING / FIELD COLLAPSING
Field Collapsing Definition
Field collapsing
  Limit the number of results per category
  category normally defined by unique values in a field
Uses
  Web Search: collapse by web site
  Email threads: collapse by thread id
  Ecommerce/retail
    Show the top 5 items for each store category (music, movies, etc.)
Field Collapsing by Site
Field Collapse on Product Type
Result Grouping by Category
Group by Field
http://...&fl=id,name&q=ipod&group=true&group.field=manu_exact
"grouped":{
  "manu_exact":{
    "matches":3,
    "groups":[{
        "groupValue":"Belkin",
        "doclist":{"numFound":2,"start":0,"docs":[
          {
            "id":"IW-02",
            "name":"iPod & iPod Mini USB 2.0 Cable"}]}},
      {
        "groupValue":"Apple Computer Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
          {
            "id":"MA147LL/A",
Group by Query
http://...&group=true&group.query=price:[0 TO 99.99]&group.query=price:[100 TO *]&group.limit=5
"grouped":{
  "price:[0 TO 99.99]":{
    "matches":3,
    "doclist":{"numFound":2,"start":0,"docs":[
      {
        "id":"IW-02",
        "name":"iPod & iPod Mini USB 2.0 Cable"},
      {
        "id":"F8V7067-APL-KIT",
        "name":"Belkin Mobile Power Cord for iPod"}]
  }},
  "price:[100 TO *]":{
    "matches":3,
    "doclist":{"numFound":1,"start":0,"docs":[{
Grouping Params
parameter        meaning                                                            default
group.field=     Like facet.field: group by unique field values
group.query=     Like facet.query: top docs that also match
group.function=  Group by unique values produced by the function query
group.limit=     How many docs per group                                            1
group.sort=      How to sort documents within a group                               Same as sort param
rows=            How many groups to return                                          10
sort=            How to sort the groups relative to each other (based on top doc)
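A small sketch of how these parameters might be combined on the client, plus a reader for the two response shapes shown on the previous slides (a "groups" list for group.field, a bare "doclist" for group.query). The helper names grouping_query and top_docs are hypothetical, and the sample response is an abridged copy of the slide output.

```python
from urllib.parse import parse_qs, urlencode


def grouping_query(q, fields=(), queries=(), limit=1, rows=10):
    """Encode a grouped request; group.field and group.query may repeat."""
    params = [("q", q), ("group", "true")]
    params += [("group.field", f) for f in fields]
    params += [("group.query", g) for g in queries]
    params += [("group.limit", limit), ("rows", rows)]
    return urlencode(params)


def top_docs(grouped):
    """Yield (group label, doc id) pairs from the 'grouped' response section."""
    for key, section in grouped.items():
        if "groups" in section:              # group.field style
            for group in section["groups"]:
                for doc in group["doclist"]["docs"]:
                    yield group["groupValue"], doc["id"]
        else:                                # group.query style
            for doc in section["doclist"]["docs"]:
                yield key, doc["id"]


# Abridged sample taken from the "Group by Field" slide.
grouped = {"manu_exact": {"matches": 3, "groups": [
    {"groupValue": "Belkin",
     "doclist": {"numFound": 2, "start": 0, "docs": [
         {"id": "IW-02", "name": "iPod & iPod Mini USB 2.0 Cable"}]}}]}}

qs = grouping_query("ipod", queries=["price:[0 TO 99.99]", "price:[100 TO *]"], limit=5)
print(qs)
print(list(top_docs(grouped)))
```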
FACETING
Pivot Faceting
Other names that could have made sense: Grid Faceting, Cross-Product Faceting, Matrix Faceting
Syntax: facet.pivot=field1,field2,field3,
                    #docs   #docs w/ inStock:true   #docs w/ inStock:false
cat:electronics     14      10                      4
cat:memory          3       3                       0
cat:connector       2       0                       2
cat:graphics card   2       0                       2
cat:hard drive      2       2                       0
facet.pivot=cat,inStock
Pivot Faceting
http://...&facet=true&facet.pivot=cat,popularity
"facet_counts":{
  "facet_pivot":{
    "cat,popularity":[{
        "field":"cat",
        "value":"electronics",
        "count":14,
        "pivot":[{
            "field":"popularity",
            "value":"6",
            "count":5},
          {
            "field":"popularity",
            "value":"7",
            "count":4},
          {
            "field":"popularity",
            "value":"1",
            "count":2}]},
      {
        "field":"cat",
        "value":"memory",
        "count":3,
        "pivot":[]},
      []
Reading the output:
  "count":14 on the cat entry: 14 docs w/ cat==electronics
  "count":5 on the nested popularity entry: 5 docs w/ cat==electronics && popularity==6
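The nested pivot structure is easiest to consume with a recursive walk. The sketch below flattens it into (path, count) rows; walk_pivot is a hypothetical helper and the sample data is copied from the output above.

```python
def walk_pivot(nodes, path=()):
    """Yield (path, count) for every node, where path is a tuple of (field, value) pairs."""
    for node in nodes:
        here = path + ((node["field"], node["value"]),)
        yield here, node["count"]
        # Child pivots refine the parent: every constraint on the path applies.
        yield from walk_pivot(node.get("pivot", []), here)


pivot = [
    {"field": "cat", "value": "electronics", "count": 14, "pivot": [
        {"field": "popularity", "value": "6", "count": 5},
        {"field": "popularity", "value": "7", "count": 4},
        {"field": "popularity", "value": "1", "count": 2}]},
    {"field": "cat", "value": "memory", "count": 3, "pivot": []},
]

rows = list(walk_pivot(pivot))
for path, count in rows:
    print(" && ".join("%s==%s" % fv for fv in path), count)
```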
Range Faceting
Like Date faceting, but more generic
http://...&facet=true
&facet.range=price
&facet.range.start=0
&facet.range.end=500
&facet.range.gap=50
"facet_counts":{
  "facet_ranges":{
    "price":{
      "counts":{
        "0.0":5,
        "50.0":2,
        "100.0":0,
        "150.0":2,
        "200.0":0,
        "250.0":1,
        "300.0":2,
        "350.0":2,
        "400.0":0,
        "450.0":1},
      "gap":50.0,
      "start":0.0,
      "end":500.0}}}}
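To make the meaning of start, end and gap concrete, here is a minimal local sketch of the bucketing. It assumes each bucket includes its lower bound and excludes its upper bound; the function name range_facet is hypothetical, and the three sample prices are taken from the CSV example later in the deck.

```python
import math


def range_facet(values, start, end, gap):
    """Count values into [start + i*gap, start + (i+1)*gap) buckets keyed by lower bound."""
    buckets = math.ceil((end - start) / gap)
    counts = {str(float(start + i * gap)): 0 for i in range(buckets)}
    for v in values:
        if start <= v < end:                 # values outside the range are ignored
            i = int((v - start) // gap)
            counts[str(float(start + i * gap))] += 1
    return counts


counts = range_facet([11.5, 19.95, 399.0], start=0, end=500, gap=50)
print(counts)
```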
Existing single-valued faceting algorithm
[Figure: Lucene FieldCache entry (StringIndex) for the hero field, for the request q=Juggernaut&facet=true&facet.field=hero.
  order: for each doc, an index into the lookup array
  lookup: the string values ((null), batman, flash, spiderman, superman, wolverine)
  For each document matching the base query, look up its order value and increment the matching accumulator slot.
  The accumulator counts feed a priority queue (e.g. Batman, 3; flash, 5).]
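The figure's algorithm can be sketched in a few lines. This is an illustration of the technique as described on the slide, not Solr's code; the order array, base document set and function name facet_counts are hypothetical, and heapq.nlargest stands in for the priority queue.

```python
import heapq


def facet_counts(order, lookup, base_docs, limit):
    """Single-valued faceting: one accumulator slot per distinct term."""
    acc = [0] * len(lookup)
    for doc in base_docs:
        acc[order[doc]] += 1                 # order[doc] is the term ordinal for that doc
    # Slot 0 is the (null) entry for docs without a value, so it is skipped.
    candidates = ((count, lookup[i]) for i, count in enumerate(acc) if i != 0 and count > 0)
    top = heapq.nlargest(limit, candidates)  # stands in for the priority queue
    return [(term, count) for count, term in top]


lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
order = [5, 3, 5, 1, 4, 5, 2, 1]             # hypothetical per-document ordinals
base_docs = [0, 1, 2, 3, 5, 7]               # hypothetical docs matching the base query

top = facet_counts(order, lookup, base_docs, limit=2)
print(top)
```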
Per-segment single-valued algorithm
[Figure: Segment1 to Segment4 each have their own FieldCache entry and their own accumulator (accumulator1 to accumulator4).
  Each segment's accumulator is filled by lookup/increment over the base DocSet, one thread per segment (thread1 to thread4).
  A FieldCache + accumulator merger (priority queue) combines the per-segment counts into the final priority queue (e.g. Batman, 3; flash, 5).]
Per-segment faceting
Enable with facet.method=fcs
Controllable multi-threading
  facet.field={!threads=4}myfield
Disadvantages
  Larger memory use (FieldCaches + accumulators)
  Slower (extra FieldCache merge step needed)
Advantages
  Rebuilds FieldCache entries only for new segments (NRT friendly)
  Multi-threaded
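A rough sketch of the per-segment idea: count each segment independently, then merge by term value, because ordinals are only meaningful within one segment. That merge is the extra step listed under disadvantages. All names and the two toy segments are hypothetical, and a thread pool stands in for Solr's own threading.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_segment(segment):
    """Count terms for one segment given (order, lookup, matching doc ids)."""
    order, lookup, docs = segment
    acc = [0] * len(lookup)
    for doc in docs:
        acc[order[doc]] += 1
    # Translate segment-local ordinals back to term strings; slot 0 is (null).
    return Counter({lookup[i]: c for i, c in enumerate(acc) if i != 0 and c})


def per_segment_facets(segments, limit, threads=4):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(count_segment, segments))
    merged = Counter()
    for partial in partials:                 # the extra merge step
        merged.update(partial)
    return merged.most_common(limit)


segments = [
    ([1, 2, 2], [None, "batman", "flash"], [0, 1, 2]),
    ([2, 1, 2, 2, 2], [None, "flash", "superman"], [0, 1, 2, 3, 4]),
]
top = per_segment_facets(segments, limit=2)
print(top)
```

When a new segment appears, only that segment's entry has to be built, which is why the slide calls the approach NRT friendly.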
Faceting Performance Improvements
For facet.method=enum, speed up initial population of the filterCache (i.e. first time facet): from 30% to 32x improvement
Optimized facet.method=fc for multi-valued fields and large facet.limit: up to 3x faster
Optimized deep facet paging: up to 10x faster with really large facet.offsets
Less memory consumed by field cache entries
SCALABILITY
SolrCloud
First steps toward simplifying cluster management
Integrates ZooKeeper
  Central configuration (schema.xml, solrconfig.xml, etc.)
  Tracks live nodes + shards of collections
Removes need for external load balancers
  shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr
Can specify logical shard ids
  shards=NY_shard,NJ_shard
Clients don't need to know shards at all:
  http://localhost:8983/solr/collection1/select?distrib=true
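The shards value above reads as a comma-separated list of shards, each with "|"-separated equivalent servers. The sketch below only illustrates that syntax; Solr does the replica selection itself, and parse_shards and pick_replicas are hypothetical names.

```python
import random


def parse_shards(shards):
    """Split a shards value into one list of equivalent servers per shard."""
    return [group.split("|") for group in shards.split(",")]


def pick_replicas(shards, rng=random):
    """Choose one server per shard, which is the job an external load balancer used to do."""
    return [rng.choice(replicas) for replicas in parse_shards(shards)]


SHARDS = "localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr"
print(parse_shards(SHARDS))
print(pick_replicas(SHARDS))
```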
SolrCloud : The Future
Eliminate all single points of failure
Remove Master/Searcher distinction
  Enables near real-time search in a highly scalable environment
High Availability for Writes
  Eventual consistency model (like Amazon Dynamo, Cassandra)
Elastic
  Simply add/subtract servers, cluster will rebalance automatically
  By default, Solr will handle document partitioning
ODDS & ENDS
Auto-Suggest
Many people currently use terms component
  Can be slow for a large corpus
New auto-suggest builds off SpellCheck component
  Compact memory based trie for really fast completions
  Based on a field in the main index, or on a dictionary file
http://localhost:8983/solr/suggest?wt=json&indent=true&q=ult
"spellcheck":{
  "suggestions":[
    "ult",{
      "numFound":1,
      "startOffset":0,
      "endOffset":3,
      "suggestion":["ultrasharp"]},
    "collation","ultrasharp"]}}
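In the auto-suggest response, "suggestions" is a flat list that alternates names and values, so a client has to pair the entries up. A minimal sketch, with the sample copied from the auto-suggest output and completions as a hypothetical helper name:

```python
def completions(response):
    """Map each input token to its suggested completions."""
    flat = response["spellcheck"]["suggestions"]
    pairs = zip(flat[0::2], flat[1::2])      # (name, value), (name, value), ...
    # Entries such as "collation" carry a plain string, not a suggestion block.
    return {name: value["suggestion"] for name, value in pairs if isinstance(value, dict)}


response = {"spellcheck": {"suggestions": [
    "ult", {"numFound": 1, "startOffset": 0, "endOffset": 3,
            "suggestion": ["ultrasharp"]},
    "collation", "ultrasharp"]}}

print(completions(response))
```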
Index with JSON
$ URL=http://localhost:8983/solr/update/json
$ curl $URL -H 'Content-type:application/json' -d '
{
  "add": {
    "doc": {
      "id": "978-0641723445",
      "cat": ["book","hardcover"],
      "title": "The Lightning Thief",
      "author": "Rick Riordan",
      "series_t": "Percy Jackson and the Olympians",
      "sequence_i": 1,
      "genre_s": "fantasy",
      "inStock": true,
      "price": 12.50,
      "pages_i": 384
    }
  }
}'
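The same update can be prepared from Python's standard library. This sketch only builds the request object and does not send it; add_doc_request is a hypothetical helper, while the URL, header and document fields come from the curl example.

```python
import json
from urllib.request import Request


def add_doc_request(url, doc):
    """Wrap one document in the {"add": {"doc": ...}} envelope and build a POST request."""
    body = json.dumps({"add": {"doc": doc}}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-type": "application/json"},
                   method="POST")


doc = {"id": "978-0641723445", "cat": ["book", "hardcover"],
       "title": "The Lightning Thief", "author": "Rick Riordan",
       "inStock": True, "price": 12.50, "pages_i": 384}
req = add_doc_request("http://localhost:8983/solr/update/json", doc)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it to a running Solr instance.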
Query Results in CSV
http://localhost:8983/solr/select?q=ipod&fl=name,price,cat,popularity&wt=csv
name,price,cat,popularity
iPod & iPod Mini USB 2.0 Cable,11.5,"electronics,connector",1
Belkin Mobile Power Cord for iPod w/ Dock,19.95,"electronics,connector",1
Apple 60 GB iPod with Video Playback Black,399.0,"electronics,music",10
Can handle multi-valued fields (see cat field in example)
Completely compatible with the CSV update handler (can round-trip)
Results are streamed: good for dumping entire parts of the index
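Reading such output back is straightforward with a CSV parser. A minimal sketch, assuming multi-valued fields arrive as one quoted, comma-separated cell as in the cat column above; parse_results is a hypothetical helper and SAMPLE is the slide's output.

```python
import csv
import io

SAMPLE = '''name,price,cat,popularity
iPod & iPod Mini USB 2.0 Cable,11.5,"electronics,connector",1
Belkin Mobile Power Cord for iPod w/ Dock,19.95,"electronics,connector",1
Apple 60 GB iPod with Video Playback Black,399.0,"electronics,music",10
'''


def parse_results(text, multi_valued=("cat",)):
    """Parse wt=csv output into dicts, splitting the listed multi-valued fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for field in multi_valued:
            row[field] = row[field].split(",")
        rows.append(row)
    return rows


rows = parse_results(SAMPLE)
for row in rows:
    print(row["name"], row["cat"])
```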
http://localhost:8983/solr/browse
Q&A
For more information about Solr visit
www.lucidimagination.com
http://www.lucidimagination.com/events/revolution2010