Tutorial on Developing a Solr Search Component Plugin
DESCRIPTION
In this set of slides we give a step-by-step tutorial on how to develop a fully functional Solr search component plugin. We also provide links to the full source code, which can be used as a template to rapidly start creating your own search components.
TRANSCRIPT
Develop a Solr SearchComponent Plugin
Solr is
◦ Blazing fast open source enterprise search platform
◦ Lucene-based search server
◦ Written in Java
◦ Has REST-like HTTP/XML and JSON APIs
◦ Extensive plugin architecture
Introduction
http://lucene.apache.org/solr/
Allows for the development of plugins which provide advanced operations
Types of plugins:
◦ RequestHandlers: use URL parameters and return their own response
◦ SearchComponents: responses are embedded in other responses (such as /select)
◦ ProcessFactory (in Solr, an UpdateRequestProcessorFactory): response is stored into a field along with the document during index time
Plugin Framework
A quick tutorial on how to program a SearchComponent to
◦ Be initialized
◦ Parse configuration file arguments
◦ Do something useful on a search request (count some words in indexed documents)
◦ Format and return a response
We’ll name our plugin “DemoSearchComponent” and show how to stick it into the solrconfig.xml for loading
Goal of this Presentation
On the next slide we specify a list called “words”, in which each entry is a string named “word”
We want to load these specific words and then count them in all result sets of queries.
Ex: config file has “body”, “fish”, “dog”
◦ Indexed document has: dog body body body fish fish fish fish orange
◦ Result should be: body=3.0 fish=4.0 dog=1.0
Plugin Goal
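The goal above can be sketched in plain Java before any Solr code is involved. This is a minimal sketch, not the plugin itself: the class name WordCountSketch and the method countWords are made up for illustration, and unlike the plugin it starts every configured word at 0.0 so that each word always appears in the result.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only (not part of the plugin).
public class WordCountSketch {

    /** Counts how often each configured word appears in a space-separated text. */
    static Map<String, Double> countWords(List<String> words, String text) {
        // LinkedHashMap keeps the words in the order they were configured.
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0);
        }
        for (String token : text.split(" ")) {
            if (counts.containsKey(token)) {
                counts.put(token, counts.get(token) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        String document = "dog body body body fish fish fish fish orange";
        // prints {body=3.0, fish=4.0, dog=1.0}
        System.out.println(countWords(words, document));
    }
}
```

“orange” is ignored because it is not one of the configured words.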
<searchComponent class="com.searchbox.DemoSearchComponent" name="democomponent">
  <str name="field">myfield</str>
  <lst name="words">
    <str name="word">body</str>
    <str name="word">fish</str>
    <str name="word">dog</str>
  </lst>
</searchComponent>
Add Component to solrconfig.xml
• We tell Solr the name of the class which has our component
• Variables will be loaded from this section during the init method
• We set a default field for analyzing the documents
• We specify a list of words we’d like to have counts of
We can see that we’re asking Solr to load com.searchbox.DemoSearchComponent.
This class will be packaged in the .jar file that our project builds
Copy the .jar file to the lib directory in the Solr installation so that Solr can find it.
That’s it!
Last of the setup
package com.searchbox;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
In the beginning…
Just some of the common packages we’ll need to import to get things rolling!
public class DemoSearchComponent extends SearchComponent {
private static Logger LOGGER = LoggerFactory.getLogger(DemoSearchComponent.class);
volatile long numRequests;
volatile long numErrors;
volatile long totalRequestsTime;
volatile String lastnewSearcher;
volatile String lastOptimizeEvent;
protected String defaultField;
private List<String> words;
In the beginning… (part 2)
• We specify that our class extends SearchComponent, so we know we’re in business!
• We decide that we’ll keep track of some basic statistics for future usage
  • Number of requests/errors
  • Total time
• Make a variable to store our defaultField and our words.
Initialization is called when the plugin is first loaded
This most commonly occurs when Solr is started up
At this point we can load things from file (models, serialized objects, etc)
Have access to the variables set in solrconfig.xml
Initialization
We have chosen to pass a list called “words” containing the words we’d like to count: “body”, “fish” and “dog”.
During initialization we need to load this list from solrconfig.xml and store it locally
Parse Config File Arguments
@Override
public void init(NamedList args) {
super.init(args);
defaultField = (String) args.get("field");
if (defaultField == null) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify the default field for analysis");
}
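// Guard against a missing <lst name="words"> so we fail with a clear message instead of a NullPointerException
NamedList wordArgs = (NamedList) args.get("words");
if (wordArgs == null) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify the words list in searchComponent config!");
}
words = wordArgs.getAll("word");
if (words.isEmpty()) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Need to specify at least one word in searchComponent config!");
}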
}
Doing Initialization
Notice that we’ve loaded the list “words”, collected all of its entries named “word”, and put them into the class-level variable words.
We’ve also read our defaultField
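To see why init uses get for “field” but getAll for “word”, here is a minimal sketch of a list of name/value pairs in which names may repeat. ConfigListSketch is a hypothetical stand-in written for this example; it only imitates the two lookups used above and is not Solr’s NamedList.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a named list: ordered name/value pairs, names may repeat.
public class ConfigListSketch {
    private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

    ConfigListSketch add(String name, Object value) {
        entries.add(new SimpleEntry<>(name, value));
        return this;
    }

    /** Returns the first value stored under the name, or null if there is none. */
    Object get(String name) {
        for (Map.Entry<String, Object> entry : entries) {
            if (entry.getKey().equals(name)) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** Returns every value stored under the name, in insertion order. */
    List<Object> getAll(String name) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> entry : entries) {
            if (entry.getKey().equals(name)) {
                values.add(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the searchComponent config: one "field", one nested "words" list.
        ConfigListSketch words = new ConfigListSketch()
                .add("word", "body").add("word", "fish").add("word", "dog");
        ConfigListSketch config = new ConfigListSketch()
                .add("field", "myfield").add("words", words);

        System.out.println(config.get("field")); // prints myfield
        System.out.println(((ConfigListSketch) config.get("words")).getAll("word")); // prints [body, fish, dog]
    }
}
```

Because every entry in the nested list shares the name “word”, a single get would only return “body”; getAll is what collects all three.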
There are 2 phases in a searchComponent
◦ Prepare
◦ Process
During a query the prepare method is called on all components before any work is done.
This allows modifying, adding or subtracting variables or components in the stack
Afterwards, the process methods are called for the components in the exact order specified in solrconfig.xml
Search Components in two phases
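The ordering described above can be sketched without Solr. This is only an illustration under simplifying assumptions: the Component interface, named and handle are hypothetical names, and the real SearchComponent methods take a ResponseBuilder rather than a log list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPhaseSketch {

    // Hypothetical, simplified contract; not Solr's SearchComponent API.
    interface Component {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    /** Builds a component that just records when each phase is called. */
    static Component named(final String name) {
        return new Component() {
            public void prepare(List<String> log) { log.add("prepare:" + name); }
            public void process(List<String> log) { log.add("process:" + name); }
        };
    }

    /** Calls prepare on every component first, then process in the same order. */
    static List<String> handle(List<Component> components) {
        List<String> log = new ArrayList<>();
        for (Component component : components) {
            component.prepare(log);
        }
        for (Component component : components) {
            component.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [prepare:query, prepare:democomponent, process:query, process:democomponent]
        System.out.println(handle(Arrays.asList(named("query"), named("democomponent"))));
    }
}
```

Because every prepare runs before any process, a component listed last can still adjust the request before the earlier components do their work.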
@Override
public void prepare(ResponseBuilder rb) throws IOException {
    // none necessary
}
Empty Prepare
Nothing going on here, but we need to override it: prepare is abstract in SearchComponent, so our class won’t compile without it
@Override
public void process(ResponseBuilder rb) throws IOException {
numRequests++;
SolrParams params = rb.req.getParams();
long lstartTime = System.currentTimeMillis();
SolrIndexSearcher searcher = rb.req.getSearcher();
NamedList response = new SimpleOrderedMap();
String queryField = params.get("field");
String field = null;
if (defaultField != null) {
field = defaultField;
}
if (queryField != null) {
field = queryField;
}
if (field == null) {
LOGGER.error("Fields aren't defined, not performing counting.");
return;
}
Do something useful -1
• We start off by incrementing a volatile counter of the requests we’ve seen (for use later in statistics), and we note the start time so that we know how long the process takes.
• We create a new NamedList which will hold this component’s response
• We look at the URL parameters to see if there is a “field” variable present. If there is, it overrides the default we loaded from the config file
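The override rule in the last bullet boils down to “request parameter first, configured default second”. A minimal sketch, assuming the request parameters are available as a plain map; resolveField and the example value “title” are hypothetical and not part of the plugin:

```java
import java.util.HashMap;
import java.util.Map;

public class FieldOverrideSketch {

    /** Returns the "field" request parameter if present, otherwise the configured default. */
    static String resolveField(Map<String, String> requestParams, String defaultField) {
        String queryField = requestParams.get("field");
        return queryField != null ? queryField : defaultField;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(resolveField(params, "myfield")); // prints myfield

        params.put("field", "title"); // as if the URL carried &field=title
        System.out.println(resolveField(params, "myfield")); // prints title
    }
}
```

If both are missing the result is null, which is the case the plugin logs as an error before returning.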
DocList docs = rb.getResults().docList;
if (docs == null || docs.size() == 0) {
LOGGER.debug("No results");
// Nothing to count; returning also avoids a NullPointerException on docs.size() below
return;
}
LOGGER.debug("Doing This many docs:\t" + docs.size());
Set<String> fieldSet = new HashSet<String>();
SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
if (null != keyField) {
fieldSet.add(keyField.getName());
}
fieldSet.add(field);
Do something useful - 2
• Since the search has already been completed, we get a list of documents which will be returned.
• We also need to pull from the schema the field which contains the unique id. This will let us correlate our results with the rest of the response
DocIterator iterator = docs.iterator();
for (int i = 0; i < docs.size(); i++) {
try {
int docId = iterator.nextDoc();
HashMap<String, Double> counts = new HashMap<String, Double>();
Document doc = searcher.doc(docId, fieldSet);
IndexableField[] multifield = doc.getFields(field);
for (IndexableField singlefield : multifield) {
for (String string : singlefield.stringValue().split(" ")) {
if (words.contains(string)) {
Double oldcount = counts.containsKey(string) ? counts.get(string) : 0;
counts.put(string, oldcount + 1);
}
}
}
String id = doc.getField(keyField.getName()).stringValue();
NamedList<Double> docresults = new NamedList<Double>();
for (String word : words) {
docresults.add(word, counts.get(word));
}
response.add(id, docresults);
} catch (IOException ex) {
// Count the failure for getStatistics() and log through the SLF4J logger declared above
numErrors++;
LOGGER.error("Failed to load document for word counting", ex);
}
}
Do something useful - 3
• Get a document iterator to look through all docs
• Set up a count variable for this doc
• Load the document through the searcher
• Get the value of the field
• BEWARE: if it is a multifield, using getField will only return the first instance, not ALL instances, which is why the code uses getFields
• Do our basic word counting
• Get the document’s unique id from the keyField
• Add each word to the results for the doc
• Add the doc result to the overall response, using its id value
rb.rsp.add("demoSearchComponent", response);
totalRequestsTime += System.currentTimeMillis() - lstartTime;
}
Wrapping it up
• Add all results to the final response
• The name we pick here will show up in the Solr output
• Note down how long it took for the entire process
@Override
public String getDescription() {
    return "Searchbox DemoSearchComponent";
}
@Override
public String getVersion() {
    return "1.0";
}
@Override
public String getSource() {
    return "http://www.searchbox.com";
}
@Override
public NamedList<Object> getStatistics() {
    NamedList all = new SimpleOrderedMap<Object>();
    all.add("requests", "" + numRequests);
    all.add("errors", "" + numErrors);
    // totalRequestsTime is the field updated at the end of process()
    all.add("totalTime(ms)", "" + totalRequestsTime);
    return all;
}
Bits and Bobs
• For a production-grade plugin, users expect to see certain pieces of information available in their Solr admin panel
• Description, version and source are just Strings
• We see getStatistics() actually uses the volatile variables we were keeping track of before, sticks them into another named list and returns them. These appear under the statistics panel in Solr.
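A note on the counters: volatile makes the latest value visible to all threads, but numRequests++ is still a read followed by a write, so concurrent requests can lose updates. One possible alternative, sketched here as an assumption rather than as what the plugin does, is to keep the statistics in AtomicLong fields. StatsSketch and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StatsSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    /** Records one finished request and how long it took. */
    void record(long elapsedMillis) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    long requests() {
        return numRequests.get();
    }

    long totalTimeMillis() {
        return totalRequestsTime.get();
    }

    public static void main(String[] args) throws InterruptedException {
        final StatsSketch stats = new StatsSketch();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10000; i++) {
                    stats.record(1);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(stats.requests()); // prints 40000
    }
}
```

For a demo the volatile fields are fine; the atomic version only matters when exact counts under concurrent load are needed.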
That’s it!
<requestHandler name="/demoendpoint" class="solr.SearchHandler">
<arr name="last-components">
<str>democomponent</str>
</arr>
</requestHandler>
Adding Component to a Handler
We need some way to run our searchComponent, so we’ll add a quick requestHandler to test it. This is done simply by overriding the normal searchHandler and telling it to run the component we defined on an earlier slide. Of course you could use your component directly in the select handler and/or add it to a chain of other components! Solr is super versatile!
http://192.168.56.101:8983/solr/corename/demoendpoint?q=*%3A*&wt=xml&rows=2&fl=id,myfield
Testing
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">79</int>
  </lst>
  <result name="response" numFound="13262" start="0">
    <doc>
      <str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
      <arr name="myfield">
        <str>dog body body body fish fish fish fish orange</str>
      </arr>
    </doc>
    <doc>
      <str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
      <arr name="myfield">
        <str>the fish had a small body. the dog likes to eat fish</str>
      </arr>
    </doc>
  </result>
  <lst name="demoSearchComponent">
    <lst name="f73ca075-3826-45d5-85df-64b33c760efc">
      <double name="body">3.0</double>
      <double name="fish">4.0</double>
      <double name="dog">1.0</double>
    </lst>
    <lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
      <double name="body">1.0</double>
      <double name="fish">2.0</double>
      <double name="dog">1.0</double>
    </lst>
  </lst>
</response>
Query results
Our results
Same order + ids for correlation
Stats
• Because we’ve overridden the getStatistics() method, we can get real-time stats from the admin panel!
• In this case since it’s a component of the SearchHandler, our fields are concatenated with the other statistics
Happy Developing!
That’s All!
Full Source Code available at: http://www.searchbox.com/developing-a-solr-plugin/