apache solr liferay
Post on 26-Jan-2015
Apache Solr
Enterprise search platform from the Apache Lucene project
Rivet Logic Corporation
1800 Alexander Bell Drive, Suite 400, Reston, VA 20191
Ph: 703.955.3480 Fax: 703.234.7711
What is Solr?
● Search server
● Built upon Apache Lucene
● Very fast
● Scalable in both query load and collection size
● Interoperable
● Extensible
● Lucene's power exposed over HTTP
● Spell checking, highlighting, faceting, etc.
● Caching
● Replication
● Distributed search
How it works
schema.xml
● Field types
  ○ <fieldType name="text" class="solr.TextField" indexed="true" />
● Fields
  ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
● Unique key (optional)
  ○ <uniqueKey>id</uniqueKey>
● Copy fields
  ○ <copyField source="developers" dest="df"/>
● Dynamic fields
  ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
● Similarity configuration
  ○ Similarity is the scoring routine for each document vs. a query
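The way explicit and dynamic field declarations resolve a field name to a type can be sketched in plain Java. This is a toy model, not Solr's schema code: the class `FieldTypeResolver` and its methods are hypothetical, it assumes a dynamic-field pattern has a single `*` at the start or the end, and it takes the first matching pattern in declaration order as a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy resolver (hypothetical, not Solr code): maps a field name to a field type name. */
final class FieldTypeResolver {
    private final Map<String, String> explicitFields = new LinkedHashMap<>();
    private final Map<String, String> dynamicFields = new LinkedHashMap<>();

    void addField(String name, String type) {
        explicitFields.put(name, type);
    }

    /** Assumes the pattern has a single '*' at the start or the end, e.g. "*_dt". */
    void addDynamicField(String pattern, String type) {
        dynamicFields.put(pattern, type);
    }

    /** Returns the type name, or null when neither an explicit nor a dynamic field matches. */
    String resolve(String fieldName) {
        String type = explicitFields.get(fieldName);
        if (type != null) {
            return type; // explicitly declared fields are checked first
        }
        for (Map.Entry<String, String> e : dynamicFields.entrySet()) {
            if (matches(e.getKey(), fieldName)) {
                return e.getValue(); // simplification: first declared pattern that matches
            }
        }
        return null;
    }

    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    public static void main(String[] args) {
        FieldTypeResolver resolver = new FieldTypeResolver();
        resolver.addField("technologies", "text");
        resolver.addDynamicField("*_dt", "date");
        System.out.println(resolver.resolve("technologies"));
        System.out.println(resolver.resolve("created_dt"));
        System.out.println(resolver.resolve("unknown"));
    }
}
```

The field name `created_dt` is made up for the example; any name ending in `_dt` would pick up the `date` type from the dynamic field shown on the slide.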
solrconfig.xml
● Lucene indexing parameters
  ○ <mergeFactor>10</mergeFactor>
  ○ <ramBufferSizeMB>32</ramBufferSizeMB>
● Cache settings
  ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
● Request handler configuration
  ○ <requestHandler name="dismax" class="solr.SearchHandler" >
● HTTP cache settings
  ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
● Search components, response writers, query parsers
  ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
  ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
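To make the `size` and `initialSize` attributes of the cache setting concrete, here is a minimal least-recently-used cache built on `java.util.LinkedHashMap`. It only illustrates the eviction policy the name `solr.LRUCache` suggests; the class `LruCacheSketch` is hypothetical, it is not Solr's implementation, and autowarming is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU cache (hypothetical, not Solr code): evicts the least recently used entry. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruCacheSketch(int initialSize, int maxSize) {
        super(initialSize, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // drop the least recently used entry once over capacity
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2, 2);
        cache.put("q=solr", 1);
        cache.put("q=lucene", 2);
        cache.get("q=solr");       // touch: "q=lucene" is now the eldest entry
        cache.put("q=liferay", 3); // exceeds maxSize, so "q=lucene" is evicted
        System.out.println(cache.keySet());
    }
}
```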
Request Handler
<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="title">Solritas</str>
    <str name="wt">velocity</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">df</str>
    <str name="facet.mincount">1</str>
    <str name="hl">true</str>
    <str name="hl.fl">developers</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
  </lst>
</requestHandler>
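The `qf` default in the handler above is a whitespace-separated list of `field^boost` pairs. The sketch below shows one way to read such a string into a map of per-field boosts; `QfParser` is a hypothetical helper written for this illustration, not part of Solr, and it assumes a field without `^boost` gets a boost of 1.0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads a dismax-style qf string into field -> boost pairs. */
final class QfParser {
    static Map<String, Float> parse(String qf) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String token : qf.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty qf string yields no fields
            }
            int caret = token.indexOf('^');
            if (caret < 0) {
                boosts.put(token, 1.0f); // assumption: no explicit boost means 1.0
            } else {
                String field = token.substring(0, caret);
                float boost = Float.parseFloat(token.substring(caret + 1));
                boosts.put(field, boost);
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        String qf = " text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 ";
        for (Map.Entry<String, Float> e : parse(qf).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With these boosts, a match in `id` weighs far more than a match in the catch-all `text` field.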
Response Writer
● A Response Writer generates the formatted response of a search.
● The wt parameter selects the Response Writer to be used
● json, php, phps, python, ruby, xml, xslt, velocity
<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
Analyzers, Tokenizers, Filters
● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text
<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>
● The job of a tokenizer is to break up a stream of text into tokens
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>
● A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it or discard it
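The tokenizer-then-filters pipeline can be illustrated without any Lucene classes. The sketch below is a toy stand-in, not the Lucene analysis API: `AnalysisChainSketch`, its methods, and the stop-word list are all made up for the example. It splits on whitespace (the tokenizer), lowercases each token (a filter that replaces), and drops stop words (a filter that discards).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analysis chain (hypothetical, not the Lucene API): tokenizer followed by two filters. */
final class AnalysisChainSketch {
    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    /** Tokenizer: breaks a stream of text into tokens on whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Filter that replaces each token with its lowercase form. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Filter that discards stop words and passes every other token along. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** The analyzer: the tokenizer's output flows through the filters in order. */
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Power of Apache Lucene"));
    }
}
```

Filter order matters in this sketch: lowercasing runs first so that "The" is recognized as the stop word "the".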
Other features
● Highlighting
  ○ &hl=true&hl.fl=developers
● Synonyms
  ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
● Spell check
  ○ The spell check component can return a list of alternative spelling suggestions.
  ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
● Content Streams
  ○ Allows Solr server to fetch local or remote data itself. Must enable remote streaming in solrconfig.xml
● Solr Cell
  ○ Leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types
● More like this
  ○ http://wiki.apache.org/solr/MoreLikeThis
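Indexing with SolrJ
import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit();   // after a batch, not per document
solr.optimize(); // periodically, if/when needed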
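For comparison, the same document can be written as a Solr XML update message, the `<add><doc><field .../></doc></add>` format used by the example documents posted in the demo. The sketch below only builds that string with the standard library; `UpdateXmlSketch` is a hypothetical helper, and actually sending the result to Solr's update handler is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders one document as a Solr XML <add> message. */
final class UpdateXmlSketch {
    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "EXAMPLEDOC01");
        doc.put("title", "NOVAJUG SolrJ Example");
        System.out.println(addMessage(doc));
    }
}
```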
Data Import Handler
● Indexes relational databases, XML data, and e-mail sources
● Supports full and incremental/delta indexing
● Highly extensible with custom data sources, transformers, etc.
● http://wiki.apache.org/solr/DataImportHandler
Replication
● Master is polled
● Replicant pulls Lucene index and optionally also Solr configuration files
● Query throughput scaling: replicate and load balance
● http://wiki.apache.org/solr/SolrReplication
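The pull model in the list above can be sketched as a single poll cycle: the replica compares versions with the master and copies the index only when it is behind. This is a deliberately simplified toy, not Solr's replication handler. The `Index` class, its `version` counter and `pollOnce` are assumptions made for the illustration, and copying configuration files is not modeled.

```java
/** Toy model of pull replication (hypothetical, not Solr code). */
final class ReplicationSketch {
    /** Minimal stand-in for an index: a version number plus opaque contents. */
    static final class Index {
        long version;
        String contents;

        Index(long version, String contents) {
            this.version = version;
            this.contents = contents;
        }
    }

    /**
     * One poll cycle: the replica checks the master's version and pulls the
     * index only if the master is newer. Returns true when a pull happened.
     */
    static boolean pollOnce(Index master, Index replica) {
        if (master.version <= replica.version) {
            return false; // replica is already up to date
        }
        replica.contents = master.contents;
        replica.version = master.version;
        return true;
    }

    public static void main(String[] args) {
        Index master = new Index(2, "docs after latest commit");
        Index replica = new Index(1, "older docs");
        System.out.println(pollOnce(master, replica)); // replica was behind, so it pulls
        System.out.println(pollOnce(master, replica)); // nothing new on the second poll
    }
}
```

Because replicas pull on their own schedule, adding query capacity amounts to adding replicas behind a load balancer, which is the throughput scaling the slide refers to.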
Demo
● Download Solr
  ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/
● Start Solr
  ○ cd <solr_home>/example
  ○ java -jar start.jar
● Post documents
  ○ cd <solr_home>/example/exampledocs
  ○ java -jar post.jar *.xml
  ○ java -jar post.jar cw.xml
● Access Solr
  ○ http://localhost:8983/solr/admin/
● Querying Solr
  ○ http://localhost:8983/solr/select/?q=binesh
  ○ http://localhost:8983/solr/select/?q=binny
  ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
  ○ http://localhost:8983/solr/itas/
● Luke
  ○ http://www.getopt.org/luke/
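The query URLs in the demo are ordinary HTTP GET requests, so they can be assembled programmatically. The sketch below builds the faceted query from a parameter map using only the standard library; `SelectUrlSketch` is a hypothetical helper, and it stops at producing the URL string rather than sending the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a /select/ query URL from parameters. */
final class SelectUrlSketch {
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select/");
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // form-style encoding: spaces become '+'
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```

Encoding the values matters once a query contains spaces or characters such as `:`, which would otherwise produce a malformed URL.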
Liferay + Solr: Motivation
● Centralizing search index in clustered Liferay environment
● Performance improvement
  ○ Re-indexing costs too much for large databases
  ○ The indexes of Liferay deployments in a cluster are often not synchronized
Liferay + Solr: Configuration 1
Install Solr (http://lucene.apache.org/solr)
Setting up environment variables
● SOLR_HOME = /${solr installed folder}
● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"
solr.xml
● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" />
</Context>
Liferay + Solr: Configuration 2
schema.xml
● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation.
● Copy this file from the solr-web plugin to $SOLR_HOME/conf (you may have to create the conf directory).
...
<fields>
  <field name="comments" type="text" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
  <field name="description" type="text" indexed="true" stored="true" />
  <field name="name" type="text" indexed="true" stored="true" />
  <field name="properties" type="text" indexed="true" stored="true" />
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="uid" type="string" indexed="true" stored="true" />
  <field name="url" type="text" indexed="true" stored="true" />
  <field name="userName" type="text" indexed="true" stored="true" />
  <field name="version" type="text" indexed="true" stored="true" />
  <dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
...
<copyField source="comments" dest="content"/>
...
...
Liferay + Solr: Configuration 3
Copy WAR file
● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} is the Solr version number, e.g., 1.4.0.
Start Liferay/Tomcat
● Solr will be picked up and "solr" will be deployed automatically under the ${tomcat}/webapps folder
Install solr-web Liferay plugin
● The latest Liferay plugin can be checked out from http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
● Build the checked-out plugin and deploy it
Liferay + Solr: Configuration 4
Final Step
● We need to rebuild the Liferay search indexes
● Control Panel > Server Administration
Liferay + Solr: How it works
...
<bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
  <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
</bean>
<bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
  <property name="solrServer" ref="solrServer" />
</bean>
<bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
  <property name="commit" value="true" />
  <property name="solrServer" ref="solrServer" />
</bean>
...
solr-spring.xml (from solr-web plugin)
Liferay + Solr: Back to the default?
● Simply undeploy the solr-web plugin
● Rebuild the search indexes from the Control Panel, as described in the previous step