Ease of use in Apache Solr


Upload: anshum-gupta

Post on 29-Nov-2014


DESCRIPTION

Presentation from my talk at the Minneapolis Apache Lucene/Solr meetup on Sep 23, 2014 hosted by and at Target.

TRANSCRIPT

Page 1: Ease of use in Apache Solr
Page 2: Ease of use in Apache Solr

Who am I?

• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks employee.

• Search and related stuff for 9+ years.

• Apache Lucene since 2006 and Solr since 2010, but consistent community involvement only since 2012.

• Organizations I am or have been a part of:

Page 3: Ease of use in Apache Solr

Apache Solr has a huge install base and tremendous momentum

Most widely used search solution on the planet.

8M+ total downloads

Solr is both established & growing

250,000+ monthly downloads

Solr has tens of thousands of applications in production.

You use Solr every day.

2500+ open Solr jobs.

Activity Summary

30 Day Summary

Aug 18 - Sep 17 2014

• 128 Commits • 18 Contributors

via https://www.openhub.net/p/solr

12 Month Summary Sep 17, 2013 - Sep 17, 2014

• 1351 Commits • 29 Contributors

Page 4: Ease of use in Apache Solr

Solr - Releases

Page 5: Ease of use in Apache Solr

Search - Until recently

• Large organizations (Enterprise)

• Expensive

• Complex

• $$$$$

Page 6: Ease of use in Apache Solr

“Easy is good”

–Someone

Page 7: Ease of use in Apache Solr

New Age Search

• Everyone… startups, websites

• Special use cases

• E-commerce

• Mails and personal data

• Personal data - Across devices

• Social and Local!

• Analytics

Page 8: Ease of use in Apache Solr

Decision making!

• Short time frame

• Confidence measure:

• Getting started quick

• Configure and see the tip of the iceberg

• Issues only surface later in the story

Page 9: Ease of use in Apache Solr

Until recently…

• Getting started:

• Download

• java -jar start.jar

• SolrCloud, getting started….

• Download

• Copy example directory ‘x’ times over.

• java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

• java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

• It runs!
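To make the amount of manual wiring in the old SolrCloud steps concrete, here is a small sketch that assembles the two command lines from this slide. The function name and its parameters are hypothetical; the flags and values are the ones shown above, and nothing is executed.

```python
def legacy_solrcloud_commands(num_shards=2, extra_ports=(7574,), zk_host="localhost:9983"):
    """Return the old-style startup commands as argument lists (nothing is run)."""
    # First node: uploads the config, starts embedded ZooKeeper, fixes the shard count.
    first = [
        "java",
        "-Dbootstrap_confdir=./solr/collection1/conf",
        "-Dcollection.configName=myconf",
        "-DzkRun",
        f"-DnumShards={num_shards}",
        "-jar",
        "start.jar",
    ]
    # Every further node needs its own port and a pointer to ZooKeeper.
    others = [
        ["java", f"-Djetty.port={port}", f"-DzkHost={zk_host}", "-jar", "start.jar"]
        for port in extra_ports
    ]
    return [first] + others


commands = legacy_solrcloud_commands()
for command in commands:
    print(" ".join(command))
```

Each command would still have to be run from its own copy of the example directory, which is the part the newer start scripts take over.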

Page 10: Ease of use in Apache Solr

Times… they are a-changin’…

• Download

• cd solr

• Standalone: bin/solr start

• SolrCloud, example, interactive:

• bin/solr start -e cloud (< 2 minutes!)

Page 11: Ease of use in Apache Solr

Let’s index some data…

• Auto Generation of Unique Key

• Solr accepts a single doc
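As a rough sketch of what sending a single document could look like from a client, the snippet below builds (but does not send) an HTTP request. The host, the collection name `collection1`, the `/update/json/docs` path, and the field names are assumptions for illustration and do not come from the slides. The document carries no `id`, on the assumption that the collection is configured to generate the unique key.

```python
import json
import urllib.request


def build_single_doc_request(doc, base_url="http://localhost:8983/solr/collection1"):
    """Build a POST request carrying one JSON document, not wrapped in a list."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/update/json/docs?commit=true",  # assumed endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document without an "id" field.
request = build_single_doc_request({"title": "Ease of use in Apache Solr", "cat": "talk"})
# urllib.request.urlopen(request) would send it to a running Solr instance.
```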

Page 12: Ease of use in Apache Solr

Managed Schema

• Solr is the schema owner

• REST APIs - Hide the implementation details

• When you know what you got

• Or when you don’t! (Schema-less mode)

• Update and Addition of Fields and FieldTypes

More reading: https://lucidworks.com/blog/schemaless-solr-part-1/
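For a feel of what adding a field through the REST API might involve, here is a minimal sketch that only builds the request body. It assumes the `add-field` command form used by the Schema API in later Solr releases; the URL, the collection name, the field name `popularity`, and the field type `int` are illustrative assumptions.

```python
import json


def add_field_command(name, field_type, stored=True, indexed=True):
    """Build an 'add-field' command body (assumed Schema API command shape)."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }


# Assumed endpoint for a collection called "collection1".
schema_url = "http://localhost:8983/solr/collection1/schema"
payload = json.dumps(add_field_command("popularity", "int"))
print(schema_url)
print(payload)
```

The point is that the client describes the field and Solr, as schema owner, decides how to persist it.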

Page 13: Ease of use in Apache Solr

Configuration APIs

• Configure Solr using APIs

• solrconfig.xml… What did you say?
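A hedged sketch of configuring Solr over HTTP instead of editing solrconfig.xml: the body below assumes the `set-property` command of the Config API found in later Solr releases. The endpoint, the collection name, and the chosen property and value are illustrative assumptions, and the request is only constructed, not sent.

```python
import json


def set_property_command(properties):
    """Build a 'set-property' command body (assumed Config API command shape)."""
    return json.dumps({"set-property": dict(properties)})


# Assumed endpoint for a collection called "collection1".
config_url = "http://localhost:8983/solr/collection1/config"
config_body = set_property_command({"updateHandler.autoSoftCommit.maxTime": 2000})
print(config_url)
print(config_body)
```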

Page 14: Ease of use in Apache Solr

Data Import Handler

• Rocket science no more!

• Make things work

Page 15: Ease of use in Apache Solr

Command Line Utils

• Ping and other tasks for already running instance.

• Works for *nix and Windows too!

Page 16: Ease of use in Apache Solr

Query DSL

q=*:*&rows=0&wt=json
&facet.field=cat&indent=true
&facet.pivot=cat,popularity,inStock
&facet.pivot=popularity,cat
&facet.pivot.mincount=2
&facet.limit=5&facet=true

{ "q" : "*:*",
  "rows" : "0",
  "facet" : {
    "" : "true",
    "pivot" : {
      "" : [
        "cat,popularity,inStock",
        "popularity,cat" ],
      "mincount" : "2"
    },
    "field" : "cat",
    "limit" : "5"
  }
}
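One way to read the two forms on this slide: each dotted parameter name becomes a path into a nested object, and a value set directly on a prefix (such as `facet=true`) is stored under the empty key. The sketch below illustrates that mapping only; the function names are hypothetical, it is not Solr's implementation, and `wt` and `indent` are left out because the JSON on the slide omits them.

```python
def nest_params(pairs):
    """Turn flat (name, value) request parameters into a nested dict."""
    root = {}
    for name, value in pairs:
        node = root
        # Walk (and create) one level per dotted name segment.
        for part in name.split("."):
            node = node.setdefault(part, {})
        # Store the value under the empty key; repeated names become a list.
        if "" not in node:
            node[""] = value
        elif isinstance(node[""], list):
            node[""].append(value)
        else:
            node[""] = [node[""], value]
    return _collapse(root)


def _collapse(node):
    """Replace nodes that only hold a value with the value itself."""
    children = {key: _collapse(child) for key, child in node.items() if key != ""}
    if not children:
        return node.get("", {})
    if "" in node:
        return {"": node[""], **children}
    return children


flat = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet.field", "cat"),
    ("facet.pivot", "cat,popularity,inStock"),
    ("facet.pivot", "popularity,cat"),
    ("facet.pivot.mincount", "2"),
    ("facet.limit", "5"),
    ("facet", "true"),
]
nested = nest_params(flat)
```

With the parameters above, `nested` has the same shape as the JSON on the slide, apart from key order.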

Page 17: Ease of use in Apache Solr

Solr Scale Toolkit

• Easily deploy SolrCloud clusters

• Live patching and rolling restarts

• Dependency on AWS soon to go away

• Chef or Puppet are still valid approaches

More reading: http://lucidworks.com/blog/introducing-the-solr-scale-toolkit/

Page 18: Ease of use in Apache Solr

Talking about the Admin UI…

• Already improved from 3.x

• Uploading documents

• Collections API is coming soon

Collection Actions

Page 19: Ease of use in Apache Solr

There’s so much more…

• Self describing handlers

• Improved SolrJ API

• More support for other languages

• HDFS: Auto addition of replicas

• Cross Data-center replication

• SOLR - Make an application, not ‘war’.

Page 20: Ease of use in Apache Solr

It’s easy… and stable!

• Benchmarking

• Tons of users testing it

• Evolving test framework

Page 21: Ease of use in Apache Solr

Solr scalability is unmatched.

• 10TB+ Index Size • 10 Billion+ Documents • 100 Million+ Daily Requests

Page 22: Ease of use in Apache Solr

Solr scalability is unmatched.

Page 23: Ease of use in Apache Solr

Where is it headed?

• Download

• See that server directory?

• Use start scripts

• Send a document, or a few…

• Things don’t really look the way they should?

• Use the schema APIs

• Add fields… not enough?

• Add field types and then add fields

• Configure Solr using REST APIs

For Production:

• Use Solr Scale Toolkit to deploy, patch and manage!

• Configure Solr using REST APIs

Page 24: Ease of use in Apache Solr

Lucidworks Fusion

Intelligent Search Services/API

Recommendation Module Signal Processing Analytics Service

Discovery Engine

Analytics Store

Enrichment Services

Analyst Workbench

eCommerce Solution

Admin/ Management

SiLK Log Analysis

Search/ Discovery

Partner Solutions

Connector Framework

Page 25: Ease of use in Apache Solr

Connect @

http://www.twitter.com/anshumgupta

http://www.linkedin.com/in/anshumgupta/

[email protected]