solr: 4 big features

APACHE SOLR Four Big Features: Faceting Query auto-complete Geospatial Scaling 2014 March Presented by David Smiley at the Boston Java Meetup Group

Upload: david-smiley

Post on 26-Jan-2015


DESCRIPTION

Four Big Features: * Faceting * Query auto-complete * Geospatial * Scaling Presented at a Meetup in Boston, 11 March 2014.

TRANSCRIPT

Page 1: Solr: 4 big features

APACHE SOLR

Four Big Features: •  Faceting •  Query auto-complete •  Geospatial •  Scaling

2014 March Presented by David Smiley at the Boston Java Meetup Group

Page 2: Solr: 4 big features

About David Smiley
➢  Software Engineer (14 years)
   ○  Search (5 years)
   ○  Java, Web, Spatial
➢  Part-time employed at MITRE
➢  Part-time search consultant
➢  Apache Lucene / Solr committer & PMC
➢  Published 1st book on Solr
➢  Presented at several conferences
➢  Taught several Solr classes

Page 3: Solr: 4 big features

Faceting
•  Do you know what I mean by “faceting”?
   •  AKA: faceted navigation, or parametric search
•  Popular Apps:
   •  eBay, Amazon, and many e-commerce sites
•  Apps I use that don’t use faceting but I wish they did:
   •  http://search.maven.org and all Maven repository software: Nexus, Artifactory, Archiva
   •  JIRA
      •  Compare this to: http://jirasearch.mikemccandless.com/

Page 4: Solr: 4 big features

Faceted Navigation & Analytics by example…

Notice the counts

Optionally start with a keyword search or filter

Extremely useful feature supported by very few platforms: Solr, ElasticSearch, Sphinx, … (no DBs)

Credit: Trey Grainger; CareerBuilder

Page 5: Solr: 4 big features

How to: Field Faceting
•  Index setup: schema.xml:
   <field name="category" type="string" />
   <field name="manufacturer" type="string" />
•  Facet search:
   http://localhost:8983/solr/collection1/select?
     q=*:*&
     facet=true&
     facet.field=category&
     facet.field=manufacturer
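A minimal sketch of the same request built in Python, assuming the base URL from the slide. The helper names `facet_url` and `pair_counts` and the sample counts are hypothetical; the flat value/count list is the shape Solr's JSON response uses for field facets by default.

```python
from urllib.parse import urlencode

# Base URL copied from the slide; adjust for your own core or collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def facet_url(fields, query="*:*"):
    """Build a field-faceting request URL for the given field names."""
    params = [("q", query), ("facet", "true"), ("wt", "json")]
    # facet.field is simply repeated, once per field to facet on.
    params += [("facet.field", name) for name in fields]
    return SOLR_SELECT + "?" + urlencode(params)

def pair_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))

print(facet_url(["category", "manufacturer"]))
# Hypothetical counts, only to show the response shape.
print(pair_counts(["electronics", 12, "memory", 3]))
```

Pairing the flat list up front makes it easy to render "value (count)" links in a navigation sidebar.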

Page 6: Solr: 4 big features

How to: Numeric/Date Faceting
•  Index setup: schema.xml:
   <field name="timestamp" type="tdate" />
•  Facet search:
   http://localhost:8983/solr/collection1/select?
     q=*:*&
     facet=true&
     facet.range=timestamp&
     facet.range.start=NOW/YEAR-10YEAR&
     facet.range.end=NOW/YEAR+1YEAR&
     facet.range.gap=+1YEAR
   (In a real URL, encode each + as %2B; a bare + is read as a space.)
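The range-facet request can be sketched in Python as below, assuming the base URL and field name from the slide; `range_facet_url` is a hypothetical helper. Letting `urlencode` build the query string takes care of escaping the `+` in the date math, which would otherwise be decoded as a space.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL copied from the slide; adjust for your own core or collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def range_facet_url(field, start, end, gap):
    """Build a range-faceting request URL with properly escaped date math."""
    params = {
        "q": "*:*",
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = range_facet_url("timestamp", "NOW/YEAR-10YEAR", "NOW/YEAR+1YEAR", "+1YEAR")
print(url)
# Decoding the query string gives back the original date math.
print(parse_qs(urlsplit(url).query)["facet.range.gap"])
```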

Page 7: Solr: 4 big features

Query Suggest / Autocomplete

If you aren’t doing this then you really should!

Page 8: Solr: 4 big features

Several Types
•  Instant search
   •  Direct navigation to documents, usually by name/title/id, etc.
   •  Implement via edge n-grams or a Suggester
   •  Ex: iTunes, Netflix, …
•  Query log completion
   •  Searches user queries you’ve captured & indexed
   •  Implement via edge n-grams or FreeTextSuggester
   •  Ex: Google
•  Term completion
   •  Completes indexed words
   •  Implement via facet.prefix technique or a Suggester
•  Facet / field value completion
   •  Ex: Mint.com

Not mutually exclusive!
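To make the edge n-gram idea concrete, here is a small in-memory sketch in Python. It is only an illustration of the technique, not how Solr stores anything: in Solr the grams would come from an analyzer (for example an edge n-gram filter in the field type). The titles and the helper names are hypothetical.

```python
def edge_ngrams(term, min_len=1, max_len=10):
    """Return the leading substrings ("edge n-grams") of a term."""
    term = term.lower()
    longest = min(len(term), max_len)
    return [term[:n] for n in range(min_len, longest + 1)]

def build_index(titles):
    """Map every edge n-gram of every word to the titles containing it."""
    index = {}
    for title in titles:
        for word in title.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(title)
    return index

# Hypothetical document titles.
titles = ["Small Faces", "Smash Mouth", "The Smiths"]
index = build_index(titles)

print(edge_ngrams("smash"))
print(sorted(index["sma"]))
```

Because every prefix is indexed as its own term, what the user has typed so far is matched with an ordinary term lookup rather than a wildcard scan.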

Page 9: Solr: 4 big features

Tools for Completing / Suggesting
•  The Suggester
   •  A specialization of the spell-check Solr component
   •  8 implementations to choose from! Different pros/cons
      •  Weighted? Analyzing? Infix? Highlight? Fuzzy? N-gram model?
•  Faceting with facet.prefix
   •  Respects your current filters – don’t suggest a 0-result response
•  Edge n-grams, with standard search
•  Terms component
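The facet.prefix technique can be sketched as a parameter list in Python. The field name `a_name` and the filter value are hypothetical, and lowercasing the prefix assumes the faceted field was lowercased at index time. Passing the user's active filters as `fq` is what keeps suggestions from leading to a 0-result search.

```python
from urllib.parse import urlencode

def term_completion_params(prefix, field, filters=()):
    """Parameters for completing a typed prefix via facet.prefix."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                    # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix.lower()), # assumes lowercased indexed terms
        ("facet.mincount", "1"),          # hide terms with no matching docs
        ("facet.limit", "10"),
    ]
    # Reuse the user's current filters so suggestions respect them.
    params += [("fq", fq) for fq in filters]
    return params

# Hypothetical field and filter, for illustration only.
params = term_completion_params("Sma", "a_name", filters=["type:artist"])
print(urlencode(params))
```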

Page 10: Solr: 4 big features

Sample Suggester Search

Search

http://localhost:8983/solr/mbartists/a_term_suggest?q=sma

Response

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "spellcheck":{
    "suggestions":[
      "sma",{
        "numFound":4,
        "startOffset":0,
        "endOffset":3,
        "suggestion":[
          "small",
          "smart",
          "smash",
          "smalley"]},
      "collation","small"]}}
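A short Python sketch of reading that response, assuming it arrives as JSON in the shape shown on the slide. The `suggestions` array alternates keys and values, so the hypothetical helper `extract_suggestions` walks it in pairs and keeps only the per-term entries.

```python
import json

# The sample response from the slide, as a JSON string.
raw = """
{"responseHeader": {"status": 0, "QTime": 1},
 "spellcheck": {"suggestions": [
    "sma", {"numFound": 4, "startOffset": 0, "endOffset": 3,
            "suggestion": ["small", "smart", "smash", "smalley"]},
    "collation", "small"]}}
"""

def extract_suggestions(response):
    """Map each input term to its list of suggested completions."""
    flat = response["spellcheck"]["suggestions"]
    result = {}
    # The list alternates key, value, key, value, ...
    for key, value in zip(flat[0::2], flat[1::2]):
        if isinstance(value, dict):   # skips the "collation" string entry
            result[key] = value["suggestion"]
    return result

print(extract_suggestions(json.loads(raw)))
```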

Page 11: Solr: 4 big features

Geospatial Features
•  Lucene/Solr can index text, numbers, dates, and spatial data
•  Features:
   •  Index latitude & longitude coordinates or any X Y pairs
   •  Index polygons or other geometry
   •  Query by point-radius, rectangle, or polygon geometry
      •  Including “IsWithin” vs “Intersects” vs “Contains” predicates
   •  2d/flat Euclidean OR geodetic spherical world model
   •  Sort or relevancy-boost by distance to indexed points

The NoSQL solutions with the best spatial are CouchDB, MongoDB, Solr, and ElasticSearch
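The difference between the flat Euclidean model and the geodetic spherical model can be sketched in Python. This uses the haversine formula with an approximate mean Earth radius, which is an assumption for illustration and not necessarily the exact calculation Solr performs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius; an assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def euclidean(x1, y1, x2, y2):
    """Straight-line distance on a flat plane, in the units of the inputs."""
    return math.hypot(x2 - x1, y2 - y1)

# One degree of longitude covers far less ground at 60N than at the equator,
# while the flat model treats both cases as the same distance.
print(haversine_km(0, 0, 0, 1))
print(haversine_km(60, 0, 60, 1))
print(euclidean(0, 0, 1, 0), euclidean(0, 60, 1, 60))
```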

Page 12: Solr: 4 big features

How to: Spatial Filter & Sort
•  Index setup: schema.xml:
   <field name="geo" type="location_rpt" />
•  Index latitude comma longitude in your document: 37.7752,-100.0232
•  Filter:
   http://localhost:8983/solr/collection1/select?
     q=*:*&
     fq={!geofilt}&
     sort=geodist() asc&
     sfield=geo&
     pt=45.15,-93.85&
     d=5
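A sketch of assembling the filter-and-sort request in Python, assuming the base URL and field name from the slide; `nearby_url` is a hypothetical helper. The `d` parameter of geofilt is a distance in kilometers, and `urlencode` handles the braces, parentheses, comma and space that the raw parameters contain.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL copied from the slide; adjust for your own core or collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def nearby_url(lat, lon, radius_km, sfield="geo"):
    """Filter to documents within radius_km of a point, nearest first."""
    params = {
        "q": "*:*",
        "fq": "{!geofilt}",        # point-radius filter
        "sort": "geodist() asc",   # nearest documents first
        "sfield": sfield,          # spatial field shared by filter and sort
        "pt": f"{lat},{lon}",      # center point as lat,lon
        "d": radius_km,            # radius in kilometers
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = nearby_url(45.15, -93.85, 5)
print(url)
```

Because `sfield` and `pt` are top-level parameters, both the filter and `geodist()` pick them up without repeating the values.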

Page 13: Solr: 4 big features

Cool Technology Under the Hood
•  Grid / tile based recursive indexed structure using a prefix tree / trie indexing approach on standard Lucene inverted index
•  Future:
   •  Precise indexed shapes
   •  Geodetic polygons
   •  Hilbert curve ordering
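One way to picture the grid / prefix-tree idea is the standard geohash encoding, sketched below in Python. Each extra character names a smaller tile inside the previous one, so a tile's string is a prefix of every tile it contains, and such strings can be indexed as ordinary terms. This illustrates the general approach only; it is not a copy of Lucene's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash(lat, lon, length=6):
    """Encode a point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(bits) < length * 5:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        use_lon = not use_lon
    # Every 5 bits select one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = value * 2 + bit
        chars.append(BASE32[value])
    return "".join(chars)

# Coarser tiles are prefixes of finer ones for the same point.
print(geohash(57.64911, 10.40744, 3))
print(geohash(57.64911, 10.40744, 6))
```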

Page 14: Solr: 4 big features

Scaling Solr
Solr’s mechanisms for scaling:
•  Replication
   •  Eliminates single point of failure
   •  Reduces query load on any one node
   •  Backups
•  Distributed-search (for sharded indexes)
   •  For large collections of many millions of documents
•  SolrCloud
   •  Combines distributed-search and real-time replicated indexing
   •  Centrally manages configuration
   •  A higher level logical API, manages lots of coordination underneath
   •  Advanced: doc routing, shard splitting, migration

Page 15: Solr: 4 big features

Replication & Sharding
Illustrated with a metaphor of an encyclopedia at a library

[Figure: three identical rows of encyclopedia volumes A through Z. Each volume is a shard (26 shards); each row is a replica (3 replicas).]
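The encyclopedia metaphor can be sketched in a few lines of Python: the first letter picks the volume (shard), and any of the identical copies (replicas) can answer. The round-robin choice of replica and the helper names are assumptions for illustration; real load balancing may work differently.

```python
import string
from itertools import cycle

SHARDS = list(string.ascii_uppercase)  # 26 volumes, one shard per letter
REPLICAS = 3                           # 3 identical sets of the encyclopedia

def shard_for(term):
    """Pick the volume by first letter, as with a printed encyclopedia."""
    return term[0].upper()

def make_router():
    """Return a lookup function that spreads reads across the replicas."""
    next_replica = cycle(range(REPLICAS))  # simple round-robin, an assumption
    def route(term):
        return shard_for(term), next(next_replica)
    return route

route = make_router()
for term in ["solr", "lucene", "search", "shard"]:
    print(term, "->", route(term))
```

Sharding splits the content so no single volume has to hold everything; replication duplicates each volume so more readers can be served and a lost copy is not fatal.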

Page 16: Solr: 4 big features

Nice Admin Screen UI

Page 17: Solr: 4 big features

More Advanced SolrCloud Features
•  Document routing customization
   •  Answers: Which shard does a document belong in?
   •  Hash (i.e. random) distribution
   •  Or keep certain related documents together (ex: for same user)
      •  Helps scale when searching by a subset
   •  Or manage it yourself manually (ex: index by month)
•  Shard splitting
   •  When your shard(s) get to be too big
   •  Live; no down-time
•  Inter-collection document migration
   •  Copies a subset of one collection to another, possibly new collection
   •  Live; no down-time
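A simplified Python sketch of the two routing styles: plain hashing of the whole id, and keeping related documents together by hashing only a shared prefix (written here as `user!doc`, the convention used by SolrCloud's composite ids). The shard count, the CRC32 stand-in hash and the modulo step are assumptions for illustration; SolrCloud itself uses a different hash function and assigns hash ranges to shards.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

def _hash(text):
    """Stand-in hash for illustration; not the hash Solr uses."""
    return zlib.crc32(text.encode("utf-8"))

def shard_by_hash(doc_id):
    """Spread documents evenly: hash the whole id."""
    return _hash(doc_id) % NUM_SHARDS

def shard_by_prefix(doc_id):
    """Keep related documents together: hash only the part before '!'."""
    key, sep, _rest = doc_id.partition("!")
    return _hash(key if sep else doc_id) % NUM_SHARDS

# Hypothetical ids: both of alice's documents land on the same shard.
for doc_id in ["alice!doc1", "alice!doc2", "bob!doc1"]:
    print(doc_id, "->", shard_by_prefix(doc_id))
```

A search restricted to one user then only needs to visit that user's shard, which is the "searching by a subset" benefit noted on the slide.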

Page 18: Solr: 4 big features

That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me: [email protected]

ETA: June 2014