Coffee at DBG - Solr Introduction

Post on 26-Jan-2015

Category:

Technology

DESCRIPTION

An event conducted at DBG about Apache Solr as part of Coffee at DBG program.

TRANSCRIPT

Apache Solr

Prepared by Nithin S, Sajin TM

Digital Brand Group

Apache Solr is a search server written in Java, built on the Java search library Lucene.

Open source

Get results through a web service as JSON/XML

UTF-8 support

Introduction

eBay, HP, Guardian, Cisco, AT&T, Intuit, Ford

Source: http://wiki.apache.org/solr/PublicServers

Who uses Solr?

Text search library written in Java

Fast and feature rich, with an active Apache development community

Inverted index mechanism - indexes the content related to each term/word

What is Lucene?
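The inverted index idea can be sketched in a few lines of Python. This is only a toy illustration of the concept, not how Lucene stores its index; the function names and the sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Look a single term up in the index; unknown terms give an empty list."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
    3: "Jetty is a server container",
}
index = build_inverted_index(docs)
print(search(index, "search"))  # [1, 2]
print(search(index, "server"))  # [1, 3]
```

A lookup goes from a term straight to the documents that contain it, which is why searching does not need to scan every document.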

Server

Solr 4.3.0

Java servlet container (Tomcat or Jetty)

Java 1.6 or above

Client

Any system that can send and receive data over HTTP

Requirements

Solr Model

Schema - defines the fields of an index; can be thought of as a database table

Core - container for a schema, with its own configuration and index

Collection - handles multiple cores

DIH - Data Import Handler

Request handler - StandardRequestHandler, DisMaxRequestHandler (searches across multiple fields), IndexInfoRequestHandler

Response handler - XML, JSON, Python, Ruby

Common terms

Start Solr: java -jar start.jar

This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.

Index your data: java -jar post.jar *.xml

Interface: http://localhost:8983/solr

Start server

The Solr Home directory typically contains the following sub-directories...

conf/ This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files would also be kept here.

data/ This directory is the default location where Solr will keep your index, and is used by the replication scripts for dealing with snapshots. You can override this location in the conf/solrconfig.xml. Solr will create this directory if it does not already exist.

lib/ This directory is optional. If it exists, Solr will load any Jars found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (e.g. Analyzers, Request Handlers). Alternatively you can use the <lib> syntax in conf/solrconfig.xml to direct Solr to your plugins. See the example conf/solrconfig.xml file for details.

Basic Directory Structure
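As a rough sketch, the layout described above can be checked from Python before starting the server. It assumes the single-home layout from this slide; the function name check_solr_home is made up for this example. Only conf/ and its two files are checked, because data/ is created by Solr on demand and lib/ is optional.

```python
from pathlib import Path

# Files the slide lists as mandatory inside conf/.
REQUIRED_CONF_FILES = ("solrconfig.xml", "schema.xml")


def check_solr_home(home):
    """Return a list of problems found in a Solr home directory (empty if none)."""
    conf = Path(home) / "conf"
    if not conf.is_dir():
        return ["missing mandatory directory: conf/"]
    return [
        f"missing conf/{name}"
        for name in REQUIRED_CONF_FILES
        if not (conf / name).is_file()
    ]
```

Calling check_solr_home on a correctly prepared home directory returns an empty list.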

solr-php-client

PECL extension for Solr

PHP Clients

Structuring Solr schema

Field options

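indexed - the field can be searched and sorted on

stored - the original value can be returned in search results

multiValued - a document may hold more than one value for the field

compressed - the stored value is kept compressed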
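To show how these options end up in schema.xml, here is a small Python sketch that builds <field> elements. The field names (title, tags) and type names (text_general, string) are assumptions chosen for illustration, and the compressed option is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def field_element(name, field_type, indexed=True, stored=True, multi_valued=False):
    """Build a schema.xml <field> element carrying the given options."""
    def flag(value):
        return "true" if value else "false"

    return ET.Element("field", {
        "name": name,
        "type": field_type,
        "indexed": flag(indexed),
        "stored": flag(stored),
        "multiValued": flag(multi_valued),
    })


# Hypothetical fields: a searchable title and a multi-valued tag list.
title = field_element("title", "text_general")
tags = field_element("tags", "string", multi_valued=True)
print(ET.tostring(title, encoding="unicode"))
print(ET.tostring(tags, encoding="unicode"))
```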

add/update  - allows you to add or update a document to Solr. Additions and updates are not available for searching until a commit takes place.

commit  - tells Solr that all changes made since the last commit should be made available for searching.

optimize  - restructures Lucene's files to improve performance for searching. Optimization is generally good to do when indexing has completed. If there are frequent updates, you should schedule optimization for low-usage times. An index does not need to be optimized to work properly. Optimization can be a time-consuming process. 

delete  - can be specified by id or by query. Delete by id deletes the document with the specified id; delete by query deletes all documents returned by a query.

Indexing options
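The four operations are sent to Solr as small XML messages. The sketch below builds them with Python's standard library; the helper names and the sample document are made up for this example, and posting the messages to the update URL is left out.

```python
import xml.etree.ElementTree as ET

# Commit and optimize are single empty elements.
COMMIT = "<commit/>"
OPTIMIZE = "<optimize/>"


def add_message(doc):
    """Build an <add> message for one document given as {field: value or list of values}."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:  # multi-valued fields repeat the <field> element
            ET.SubElement(doc_el, "field", {"name": name}).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id(doc_id):
    """Build a <delete> message that removes the document with this id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


# Hypothetical document, for illustration only.
print(add_message({"id": "book1", "title": "Solr basics"}))
print(delete_by_query("*:*"))
```

Building the XML through ElementTree rather than string concatenation means characters such as & and < in field values are escaped automatically.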

Supported formats: XML, JSON, CSV, or javabin. Supported document types include Microsoft Office documents and PDFs.

curl http://localhost:8983/solr/collection1/update/csv -H "Content-type: text/csv; charset=utf-8" --data-binary @D:/Projects/solr-4.3.0/example/exampledocs/books.csv

http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E

Upload schema data
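The same upload and commit can be prepared from Python with the standard library. This is a sketch under assumptions: the function names and the sample CSV bytes are made up, the base URL is the one used on this slide, and the request is only built here, not sent (sending it needs a running Solr server).

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8983/solr/collection1"


def csv_update_request(csv_bytes, base_url=BASE_URL):
    """Build (but do not send) a POST request that uploads CSV data to Solr."""
    return urllib.request.Request(
        base_url + "/update/csv",
        data=csv_bytes,
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


def commit_url(base_url=BASE_URL):
    """Build the URL that sends a <commit/> message through stream.body."""
    return base_url + "/update?stream.body=" + urllib.parse.quote("<commit/>")


# Hypothetical CSV content, for illustration only.
request = csv_update_request(b"id,name\nbook1,Example book\n")
print(request.get_method(), request.full_url)
print(commit_url())
# To actually send it: urllib.request.urlopen(request)
```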

Query parameters

q The query to search with in Solr. See "Lucene QueryParser Syntax" in Resources for a full description of the syntax. Sorting information can be included by appending a semicolon and the name of an indexed, non-tokenized field (explained below). The default sort is score desc, which means sort by descending score. Later Solr releases express sorting with a separate sort parameter instead, for example sort=date asc.

q=myField:Java AND otherField:developerWorks; date asc

This query searches the two fields specified and sorts the results based on a date field.

start Specifies the starting offset into the result set. Useful for paging through results. The default value is 0.

start=15

Skips the first 15 results, so the response starts with the sixteenth ranked result.

rows The maximum number of documents to return. The default value is 10.

rows=25

fq Provides an optional filter query. Search results are restricted to documents that also match the filter query. Filter queries are cached by Solr, which makes them very useful for speeding up complex queries.

Accepts any valid query that could be passed in the q parameter, not including sort information.

hl When hl=true, highlight snippets in the query response. Default is false. See the Solr Wiki section on highlighting parameters for more options (in Resources).

hl=true

fl Specify as a comma-separated list the set of Fields that should be returned in the document results. "*" is the default and means all fields. "score" indicates the score should be returned as well.

fl=*,score
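The parameters above can be combined into a request URL as in the following sketch. The helper names select_url and page_url and the filter value category:electronics are assumptions made for this example; the base URL is the one used on these slides.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q, start=0, rows=10, fq=None, fl=None, hl=False, base=SELECT_URL):
    """Build a select URL from the query parameters described on the slide."""
    params = [("q", q), ("start", start), ("rows", rows)]
    for filter_query in fq or []:  # fq may be repeated, once per filter
        params.append(("fq", filter_query))
    if fl:
        params.append(("fl", ",".join(fl)))
    if hl:
        params.append(("hl", "true"))
    return base + "?" + urlencode(params)  # urlencode escapes spaces, colons, brackets


def page_url(q, page, page_size=10, **options):
    """Translate a 1-based page number into the zero-based start offset."""
    return select_url(q, start=(page - 1) * page_size, rows=page_size, **options)


# Third page of 25 results, restricted by a hypothetical filter query.
print(page_url("video", page=3, page_size=25,
               fq=["category:electronics"], fl=["id", "category", "score"]))
```

Because start is a zero-based offset, page 3 with 25 rows per page maps to start=50.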

Full text search: http://localhost:8983/solr/select?q=Searchtext

Search only within a field: http://localhost:8983/solr/select?q=fieldname:searchtext

Control which fields are displayed in the result: http://localhost:8983/solr/select?q=video&fl=id,category

Provide ranges to fields: http://localhost:8983/solr/select?q=price:[0 TO 400]&fl=id,name,price

More like this (MLT): http://localhost:8983/solr/select?q=Searchtext&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100

More information on how this works and the options available can be found at http://wiki.apache.org/solr/MoreLikeThis

Search

Sample search result
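The sample result shown on the slide is not part of this transcript. As a stand-in, the sketch below reads a response in Solr's JSON format (requested with wt=json). The overall shape (responseHeader, then response with numFound, start and docs) follows Solr's JSON output, but the documents and values in SAMPLE_RESPONSE are invented for illustration.

```python
import json

# Invented sample in the shape of a Solr JSON response; not real output.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "doc1", "name": "First example"},
      {"id": "doc2", "name": "Second example"}
    ]
  }
}
"""


def summarize(raw_json):
    """Return (total hits, start offset, ids of the returned documents)."""
    response = json.loads(raw_json)["response"]
    return response["numFound"], response["start"], [d["id"] for d in response["docs"]]


print(summarize(SAMPLE_RESPONSE))  # (2, 0, ['doc1', 'doc2'])
```

numFound is the total number of matches, which can be larger than the number of documents in docs when rows limits the page size.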

Faceted search: http://localhost:8983/solr/query?q=camera&facet=true&facet.field=manu
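In the default JSON output, the counts for a facet field come back as one flat list that alternates value and count. The sketch below pairs them up; the function name and the manufacturer values are made up for this example, and it assumes the default flat list layout is in use.

```python
def facet_pairs(body, field):
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = body["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Invented response fragment for facet.field=manu; not real output.
body = {"facet_counts": {"facet_fields": {"manu": ["acme", 3, "globex", 1]}}}
print(facet_pairs(body, "manu"))  # [('acme', 3), ('globex', 1)]
```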

Features: hit highlighting, auto-suggest, spelling suggestions, spatial search

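Removing data from the index

curl http://localhost:8983/solr/collection1/update -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"

This delete-by-query matches every document (*:*), so it empties the index once the change is committed.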

Thank you
