tutorial on developing a solr search component plugin

Develop a Solr SearchComponent Plugin

[email protected]

Solr is:
◦ A blazing-fast open source enterprise search platform
◦ A Lucene-based search server
◦ Written in Java
◦ Equipped with REST-like HTTP/XML and JSON APIs
◦ Built around an extensive plugin architecture

Introduction

http://lucene.apache.org/solr/



Solr allows the development of plugins that provide advanced operations.

Types of plugins:
◦ RequestHandlers: use URL parameters and return their own response
◦ SearchComponents: responses are embedded in other responses (such as /select)
◦ ProcessFactory: the response is stored in a field along with the document at index time

Plugin Framework

A quick tutorial on how to program a SearchComponent to:
◦ Be initialized
◦ Parse configuration file arguments
◦ Do something useful on a search request (count some words in indexed documents)
◦ Format and return a response

We’ll name our plugin “DemoSearchComponent” and show how to stick it into the solrconfig.xml for loading

Goal of this Presentation

On the next slide, we’ll specify a list called “words” in which each entry is a string named “word”.

We want to load these specific words and then count their occurrences in each document of a query’s result set.

Ex: config file has “body”, “fish”, “dog”
◦ Indexed document has: dog body body body fish fish fish fish orange
◦ Result should be: body=3.0 fish=4.0 dog=1.0

Plugin Goal
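To make the goal concrete before any Solr code appears, here is a minimal, Solr-free sketch of the counting itself. The class name WordCountSketch and the method countWords are invented for this illustration and are not part of the plugin; the split on single spaces mirrors the simple tokenization the plugin uses later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of the plugin source.
class WordCountSketch {

    // Counts how often each configured word occurs in the text (split on single spaces).
    static Map<String, Double> countWords(List<String> words, String text) {
        Map<String, Double> counts = new LinkedHashMap<String, Double>();
        for (String word : words) {
            counts.put(word, 0.0); // keep the configured order, start at zero
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) {
                counts.put(token, counts.get(token) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        String text = "dog body body body fish fish fish fish orange";
        System.out.println(countWords(words, text)); // prints {body=3.0, fish=4.0, dog=1.0}
    }
}
```

Words that are not in the configured list (such as “orange”) are simply ignored.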

<searchComponent class="com.searchbox.DemoSearchComponent" name="democomponent">
    <str name="field">myfield</str>
    <lst name="words">
        <str name="word">body</str>
        <str name="word">fish</str>
        <str name="word">dog</str>
    </lst>
</searchComponent>

Add Component to solrconfig.xml

• We tell Solr the name of the class which has our component

• Variables will be loaded from this section during the init method

• We set a default field for analyzing the documents

• We specify a list of words we’d like to have counts of

We can see that we’re asking Solr to load com.searchbox.DemoSearchComponent.

This class will be packaged in the .jar file that our project builds.

Copy the .jar file to the lib directory in the Solr installation so that Solr can find it.

That’s it!

Last of the setup

package com.searchbox;

import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

In the beginning…

Just some of the common packages we’ll need to import to get things rolling!

public class DemoSearchComponent extends SearchComponent {

    private static final Logger LOGGER = LoggerFactory.getLogger(DemoSearchComponent.class);
    volatile long numRequests;
    volatile long numErrors;
    volatile long totalRequestsTime;
    volatile String lastnewSearcher;
    volatile String lastOptimizeEvent;
    protected String defaultField;
    private List<String> words;

In the beginning… (part 2)

• We specify that our class extends SearchComponent, so we know we’re in business!

• We decide that we’ll keep track of some basic statistics for future use:
  ◦ Number of requests/errors
  ◦ Total time

• Make a variable to store our defaultField and our words.

Initialization is called when the plugin is first loaded.

This most commonly occurs when Solr is started up.

At this point we can load things from file (models, serialized objects, etc.).

We also have access to the variables set in solrconfig.xml.

Initialization

We have chosen to pass a list called “words”, containing the words “body”, “fish”, and “dog” that we’d like to count.

During initialization we need to load this list from solrconfig.xml and store it locally

Parse Config File Arguments

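@Override
public void init(NamedList args) {
    super.init(args);
    // Default field to analyze, from <str name="field"> in solrconfig.xml
    defaultField = (String) args.get("field");
    if (defaultField == null) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify the default field for analysis");
    }
    // All <str name="word"> entries inside <lst name="words">;
    // guard against a missing list so we fail with a clear message instead of a NullPointerException
    NamedList wordList = (NamedList) args.get("words");
    words = (wordList == null) ? null : wordList.getAll("word");
    if (words == null || words.isEmpty()) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify at least one word in searchComponent config!");
    }
}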

Doing Initialization

Notice that we’ve loaded the list “words”, fetched all of its entries named “word”, and put them into the class-level variable words.

We’ve also identified our defaultField.
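Solr’s NamedList is an ordered list of name/value pairs in which a name may repeat, which is why three entries all named “word” can sit in one list and getAll("word") returns every one of them. The plain-Java stand-in below mimics just that behaviour so the parsing step can be tried without a Solr install. MiniNamedList is a hypothetical class written for this sketch; it is not Solr’s implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Solr's NamedList; for illustration only.
class MiniNamedList {
    private final List<Map.Entry<String, Object>> entries =
            new ArrayList<Map.Entry<String, Object>>();

    // Appends a pair; duplicate names are allowed and order is preserved.
    MiniNamedList add(String name, Object value) {
        entries.add(new SimpleEntry<String, Object>(name, value));
        return this;
    }

    // First value stored under the name, or null when the name is absent.
    Object get(String name) {
        for (Map.Entry<String, Object> e : entries) {
            if (e.getKey().equals(name)) {
                return e.getValue();
            }
        }
        return null;
    }

    // Every value stored under the name, in insertion order.
    List<Object> getAll(String name) {
        List<Object> result = new ArrayList<Object>();
        for (Map.Entry<String, Object> e : entries) {
            if (e.getKey().equals(name)) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the <searchComponent> section shown earlier.
        MiniNamedList words = new MiniNamedList()
                .add("word", "body").add("word", "fish").add("word", "dog");
        MiniNamedList config = new MiniNamedList()
                .add("field", "myfield").add("words", words);

        System.out.println(config.get("field"));                                  // prints myfield
        System.out.println(((MiniNamedList) config.get("words")).getAll("word")); // prints [body, fish, dog]
    }
}
```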

There are 2 phases in a SearchComponent:
◦ Prepare
◦ Process

During a query, the prepare method is called on all components before any work is done.

This allows modifying, adding, or removing variables or components in the stack.

Afterwards, the process methods are called for the components in the exact order specified in solrconfig.xml.

Search Components in two phases
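The call order can be pictured with a small plain-Java sketch. The Component interface and the run method below are hypothetical stand-ins invented for this illustration (they are not Solr classes); the point is only that every prepare call happens before the first process call, and that both loops follow the configured order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-phase call order; these are not Solr classes.
class TwoPhaseSketch {

    interface Component {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    // Builds a component that only records which of its methods were called.
    static Component named(final String name) {
        return new Component() {
            public void prepare(List<String> log) { log.add("prepare:" + name); }
            public void process(List<String> log) { log.add("process:" + name); }
        };
    }

    // Calls prepare on every component first, then process in the same configured order.
    static List<String> run(List<Component> components) {
        List<String> log = new ArrayList<String>();
        for (Component c : components) {
            c.prepare(log);
        }
        for (Component c : components) {
            c.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(named("query"), named("democomponent"))));
        // prints [prepare:query, prepare:democomponent, process:query, process:democomponent]
    }
}
```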

@Override
public void prepare(ResponseBuilder rb) throws IOException {
    // none necessary
}

Empty Prepare

Nothing going on here, but prepare is declared abstract in SearchComponent, so we need to override it; otherwise our class won’t compile.

@Override
public void process(ResponseBuilder rb) throws IOException {
    numRequests++;
    SolrParams params = rb.req.getParams();
    long lstartTime = System.currentTimeMillis();
    SolrIndexSearcher searcher = rb.req.getSearcher();
    NamedList response = new SimpleOrderedMap();

    // A "field" URL parameter overrides the default loaded from solrconfig.xml
    String queryField = params.get("field");
    String field = null;
    if (defaultField != null) {
        field = defaultField;
    }
    if (queryField != null) {
        field = queryField;
    }
    if (field == null) {
        LOGGER.error("Fields aren't defined, not performing counting.");
        return;
    }

Do something useful - 1

• We start by incrementing a volatile counter of the requests we’ve seen (for use later in statistics), and we note the start time so we know how long the process takes.

• We create a new NamedList which will hold this component’s response.

• We look at the URL parameters to see if there is a “field” variable present. We have set this up to override the default we loaded from the config file.
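The override rule boils down to “the request parameter wins, the configured default is the fallback”. A compact way to express it, as a stand-alone sketch (FieldResolutionSketch, resolveField, and the field name “otherfield” are made up for this example):

```java
// Hypothetical sketch of the field-selection rule; not part of the plugin source.
class FieldResolutionSketch {

    // Returns the field from the request when present, otherwise the configured default.
    // May return null when neither is set, in which case the caller should skip counting.
    static String resolveField(String defaultField, String queryField) {
        return (queryField != null) ? queryField : defaultField;
    }

    public static void main(String[] args) {
        System.out.println(resolveField("myfield", null));         // prints myfield
        System.out.println(resolveField("myfield", "otherfield")); // prints otherfield
        System.out.println(resolveField(null, null));              // prints null
    }
}
```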

    DocList docs = rb.getResults().docList;
    if (docs == null || docs.size() == 0) {
        LOGGER.debug("No results");
        return; // nothing to count; also avoids a NullPointerException on docs below
    }
    LOGGER.debug("Doing This many docs:\t" + docs.size());

    // Only load the fields we need: the unique key and the field being analyzed
    Set<String> fieldSet = new HashSet<String>();
    SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
    if (null != keyField) {
        fieldSet.add(keyField.getName());
    }
    fieldSet.add(field);

Do something useful - 2

• Since the search has already been completed, we get a list of documents which will be returned.

• We also need to pull from the schema the field which contains the unique id. This will let us correlate our results with the rest of the response

    DocIterator iterator = docs.iterator();
    for (int i = 0; i < docs.size(); i++) {
        try {
            int docId = iterator.nextDoc();
            HashMap<String, Double> counts = new HashMap<String, Double>();
            Document doc = searcher.doc(docId, fieldSet);
            // getFields returns every value of a multi-valued field
            IndexableField[] multifield = doc.getFields(field);
            for (IndexableField singlefield : multifield) {
                for (String string : singlefield.stringValue().split(" ")) {
                    if (words.contains(string)) {
                        Double oldcount = counts.containsKey(string) ? counts.get(string) : 0;
                        counts.put(string, oldcount + 1);
                    }
                }
            }
            String id = doc.getField(keyField.getName()).stringValue();
            NamedList<Double> docresults = new NamedList<Double>();
            for (String word : words) {
                // Report 0.0 rather than null for words that never occurred
                docresults.add(word, counts.containsKey(word) ? counts.get(word) : 0.0);
            }
            response.add(id, docresults);
        } catch (IOException ex) {
            // Count the failure for getStatistics() and log through the SLF4J logger
            numErrors++;
            LOGGER.error("Failed to load document for word counting", ex);
        }
    }

Do something useful - 3

• Get a document iterator to look through all docs
• Set up the count variable for this doc
• Load the document through the searcher
• Get the value of the field
• BEWARE: if it is a multi-valued field, using getField will only return the first instance, not ALL instances
• Do our basic word counting
• Get the document’s unique id from the keyfield
• Add each word to the results for the doc
• Add the doc result to the overall response, using its id value
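The multi-valued warning is easy to underestimate, so here is a plain-Java illustration of what is lost when only the first value is read. The list of strings stands in for the values of a multi-valued field; MultiValueSketch and its methods are hypothetical names for this sketch and do not use the Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for the values of a multi-valued field.
class MultiValueSketch {

    // Counts occurrences of the word in a single value (split on single spaces).
    static int countIn(String value, String word) {
        int count = 0;
        for (String token : value.split(" ")) {
            if (token.equals(word)) {
                count++;
            }
        }
        return count;
    }

    // Analogous to iterating over every value of the field.
    static int countAllValues(List<String> values, String word) {
        int count = 0;
        for (String value : values) {
            count += countIn(value, word);
        }
        return count;
    }

    // Analogous to reading only the first value of the field.
    static int countFirstValueOnly(List<String> values, String word) {
        return values.isEmpty() ? 0 : countIn(values.get(0), word);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("dog body", "body fish");
        System.out.println(countAllValues(values, "body"));      // prints 2
        System.out.println(countFirstValueOnly(values, "body")); // prints 1
        System.out.println(countFirstValueOnly(values, "fish")); // prints 0
    }
}
```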

    rb.rsp.add("demoSearchComponent", response);
    totalRequestsTime += System.currentTimeMillis() - lstartTime;
}

Wrapping it up

• Add all results to the final response

• The name we pick here will show up in the Solr output

• Note down how long it took for the entire process

@Override
public String getDescription() {
    return "Searchbox DemoSearchComponent";
}

@Override
public String getVersion() {
    return "1.0";
}

@Override
public String getSource() {
    return "http://www.searchbox.com";
}

@Override
public NamedList<Object> getStatistics() {
    NamedList<Object> all = new SimpleOrderedMap<Object>();
    all.add("requests", "" + numRequests);
    all.add("errors", "" + numErrors);
    // totalRequestsTime is the field updated at the end of process()
    all.add("totalTime(ms)", "" + totalRequestsTime);
    return all;
}

Bits and Bobs

• For a production-grade plugin, users expect to see certain pieces of information available in their Solr admin panel

• Description, version and source are just Strings

• We see getStatistics() actually uses the volatile variables we were keeping track of before, sticks them into another named list and returns them. These appear under the statistics panel in Solr.
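One caveat about the counters: ++ on a volatile long is a read-modify-write and is not atomic, so requests running in parallel can occasionally lose an update. A possible alternative, sketched here in plain Java, keeps the counters in AtomicLong instances. StatsSketch and recordRequest are hypothetical names, and a LinkedHashMap stands in for Solr’s NamedList so the example runs without Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of thread-safe statistics; a LinkedHashMap stands in for NamedList.
class StatsSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong numErrors = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    // Records one finished request; safe to call from many threads at once.
    void recordRequest(long elapsedMillis, boolean failed) {
        numRequests.incrementAndGet();
        if (failed) {
            numErrors.incrementAndGet();
        }
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    // Same keys as the getStatistics() method above, in the same order.
    Map<String, String> getStatistics() {
        Map<String, String> all = new LinkedHashMap<String, String>();
        all.put("requests", "" + numRequests.get());
        all.put("errors", "" + numErrors.get());
        all.put("totalTime(ms)", "" + totalRequestsTime.get());
        return all;
    }

    public static void main(String[] args) throws InterruptedException {
        final StatsSketch stats = new StatsSketch();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        stats.recordRequest(2, false); // pretend each request took 2 ms
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(stats.getStatistics()); // prints {requests=4000, errors=0, totalTime(ms)=8000}
    }
}
```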

That’s it!




<requestHandler name="/demoendpoint" class="solr.SearchHandler">
    <arr name="last-components">
        <str>democomponent</str>
    </arr>
</requestHandler>

Adding Component to a Handler

We need some way to run our searchComponent, so we’ll add a quick requestHandler to test it. This is done by declaring a handler backed by the standard SearchHandler and telling it to run the component we defined on an earlier slide after the default components. Of course you could use your component directly in the select handler and/or add it to a chain of other components! Solr is super versatile!

http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield

Testing
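If you would rather build the test URL in code than type it by hand, something along these lines should work. QueryUrlSketch and buildUrl are hypothetical helpers for this sketch; the host, core name, and endpoint are the ones from the URL above. Note that URLEncoder also encodes the comma in the fl parameter as %2C, which is equivalent to the literal comma used on the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for assembling the test request URL; not part of the plugin.
class QueryUrlSketch {

    // URL-encodes one parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    // Builds base?q=...&wt=xml&rows=...&fl=... with encoded parameter values.
    static String buildUrl(String base, String query, int rows, String fields) {
        return base + "?q=" + encode(query) + "&wt=xml&rows=" + rows + "&fl=" + encode(fields);
    }

    public static void main(String[] args) {
        String base = "http://192.168.56.101:8983/solr/corename/demoendpoint";
        System.out.println(buildUrl(base, "*:*", 2, "id,myfield"));
    }
}
```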

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">79</int>
    </lst>
    <result name="response" numFound="13262" start="0">
        <doc>
            <str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
            <arr name="myfield">
                <str>dog body body body fish fish fish fish orange</str>
            </arr>
        </doc>
        <doc>
            <str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
            <arr name="myfield">
                <str>the fish had a small body. the dog likes to eat fish</str>
            </arr>
        </doc>
    </result>
    <lst name="demoSearchComponent">
        <lst name="f73ca075-3826-45d5-85df-64b33c760efc">
            <double name="body">3.0</double>
            <double name="fish">4.0</double>
            <double name="dog">1.0</double>
        </lst>
        <lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
            <double name="body">1.0</double>
            <double name="fish">2.0</double>
            <double name="dog">1.0</double>
        </lst>
    </lst>
</response>

Query results

Our results

Same order + ids for correlation

Stats

• Because we’ve overridden the getStatistics() method, we can get real-time stats from the admin panel!

• In this case, since it’s a component of the SearchHandler, our statistics are listed together with the handler’s other statistics

Happy Developing!

That’s All!

Full Source Code available at: http://www.searchbox.com/developing-a-solr-plugin/
