Apache Solr

Apache Open Source Full Text Search Server

Upload: semih-hakkioglu

Post on 11-Apr-2017


Category:

Software



TRANSCRIPT

Page 1: Apache Solr

Apache Solr

Open Source Full Text Search Server

Page 2: Apache Solr

• What is Solr?

• Solr Architecture

• Install & Configure

• Search, Index, Update & Delete

Page 3: Apache Solr

What is Solr?

• Solr is a full-text search server with a REST-like API

• Index documents as JSON, XML, CSV, or binary over HTTP

• Query documents with HTTP GET

• Receive results as JSON, XML, CSV, or binary

Page 4: Apache Solr

Solr History

• 2004 - Solr was created by Yonik Seeley

• 2006 - Solr was donated to the Apache Software Foundation

• 2006 - Solr version 1.1.0 was released

• 2010 - Solr and Lucene merged

• 2012 - Solr version 4.0 was released

• 2015 - Solr version 5.0 was released

• 2016 - Solr version 6.0 was released

Page 5: Apache Solr

Solr Features

• Fuzzy & Proximity Search

• Filter Query

• Faceting

• Highlighting

• Stats

• Spellcheck

• Grouping

• Admin Panel

Page 6: Apache Solr

Who uses Solr?

Page 7: Apache Solr

Solr Architecture

Page 8: Apache Solr

Solr Terminology

• Core

• Document

• Field

• FieldType

• Analyzer

• Filter

• Tokenizer

Page 9: Apache Solr

Common Field Attributes

• name

• type

• indexed

• stored

• multiValued

• required

• compressed

Page 10: Apache Solr

Install & Configure

• brew install solr

• schema.xml

• solrconfig.xml

• solr start -p port

Page 11: Apache Solr

schema

• uniqueKey

• fieldType

• analyzer

• filter

• tokenizer

• field

• dynamicField

• copyField

Page 12: Apache Solr

solrconfig

• Data directory

• Query Cache parameters

• Request Handlers

• Update Handler (update log, autocommit)

• Lucene version

Page 13: Apache Solr

Custom Field Type

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
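
To make the analyzer chain in this field type easier to picture, here is a toy imitation in Python. It is not Solr's implementation: the regex tokenizer only approximates the standard tokenizer, token positions are ignored, and the stopword and synonym entries are hypothetical stand-ins for stopwords.txt and synonyms.txt.

```python
import re

# Hypothetical file contents, invented for this sketch.
STOPWORDS = {"the", "a", "of"}            # stands in for stopwords.txt
SYNONYMS = {"tv": ["tv", "television"]}   # stands in for synonyms.txt

def tokenize(text):
    """Split text into word tokens (rough stand-in for the standard tokenizer)."""
    return re.findall(r"\w+", text)

def stop_filter(tokens):
    """Drop stopwords, comparing case-insensitively (ignoreCase="true")."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

def synonym_filter(tokens):
    """Replace each token by its synonym list, if it has one (expand="true")."""
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t.lower(), [t]))
    return out

def index_analyzer(text):
    # Index-time chain: tokenizer, then stop filter.
    return stop_filter(tokenize(text))

def query_analyzer(text):
    # Query-time chain: tokenizer, then synonym filter, then stop filter.
    return stop_filter(synonym_filter(tokenize(text)))

print(index_analyzer("The Story of a Toy"))
print(query_analyzer("the TV"))
```

Note that the chain contains no lowercase filter, so tokens that survive keep their original case.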

Page 14: Apache Solr
Page 15: Apache Solr

Logging

default log folder: solr/logs/*

log4j.properties: solr/resources/*

Page 16: Apache Solr

Search Documents

• q

• fq

• start

• rows

• sort

• fl

• wt
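
As a minimal sketch of how these parameters combine into one request, the Python below builds a /select URL. The host, the core name new_core, and the field names are assumptions borrowed from the examples elsewhere in these slides; the page size parameter is spelled `rows` in Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, core, **params):
    """Return a /select URL for the given core and query parameters."""
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

url = build_select_url(
    "http://localhost:8983", "new_core",   # assumed host and core
    q="name:iphone",                       # main query (scored)
    fq="year:[2015 TO 2016]",              # filter query
    start=0, rows=10,                      # paging
    sort="year desc",                      # sort order
    fl="id,name",                          # fields to return
    wt="json",                             # response format
)
print(url)
```

The URL can then be fetched with any HTTP client, since searching is a plain HTTP GET.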

Page 17: Apache Solr

Query Syntax

Keyword Matching

name:foo bar

name:"foo bar"

name:foo -name:bar

Wildcard Matching

title:foo*

title:foo*bar

Range Search ( field:[0 TO 1] )

Boosts ( title:foo^1.5 OR body:foo )
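
When query strings like these are assembled from user input, characters that the query parser treats as syntax need a backslash escape. The helpers below are a hypothetical sketch (the function names are not part of any Solr client) that produces the term, phrase, range, and boost forms shown on this slide.

```python
import re

# Characters and operators the standard query parser treats as syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_term(text):
    """Backslash-escape query syntax characters in a single term."""
    return _SPECIAL.sub(r'\\\1', text)

def term(field, value, boost=None):
    """Build field:value, optionally with a ^boost suffix."""
    clause = f"{field}:{escape_term(value)}"
    return f"{clause}^{boost}" if boost is not None else clause

def phrase(field, value):
    """Build field:"value"; inside quotes only \\ and " need escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

def range_query(field, low, high):
    """Build an inclusive range query field:[low TO high]."""
    return f"{field}:[{low} TO {high}]"

print(term("title", "foo", boost=1.5) + " OR " + term("body", "foo"))
print(phrase("name", "foo bar"))
print(range_query("field", 0, 1))
```

The phrase helper matters for the first example on the slide: in `name:foo bar` only `foo` is searched in `name`, while `bar` goes to the default field.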

Page 18: Apache Solr

Fuzzy & Proximity Search

• Fuzzy Search

• title:iphone~0.5

• Proximity Search

• title:"foo bar"~2

• foo abc def bar

• bar abc foo

Page 19: Apache Solr

Filter Query

• Restricts the result set without influencing document scores

• Often faster than the main query, because filter results are cached and reused
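
A small sketch of how filter queries sit next to the main query in a request. The field names and values are assumptions for illustration; `fq` may be repeated, and the filters are intersected.

```python
from urllib.parse import urlencode

params = [
    ("q", "name:iphone"),     # main query: decides relevance scores
    ("fq", "year:2016"),      # filter: narrows results, no effect on score
    ("fq", "genre:Drama"),    # second filter, cached independently of the first
]
query_string = urlencode(params)
print(query_string)
```

Keeping each restriction in its own `fq` lets Solr reuse a cached filter across requests that share it.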

Page 20: Apache Solr

Faceting

• facet.query

• facet.field

• facet.mincount -> f.<field.name>.facet.mincount

• facet.limit -> f.<field.name>.facet.limit

• facet.offset -> f.<field.name>.facet.offset

• facet.sort=count, facet.sort=index

• Tagging & excluding filters
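
The sketch below puts several of these facet parameters into one query string, including a per-field override and a tagged filter that is excluded when counting one facet. The field names year and genre and the tag name yr are assumptions; faceting also needs `facet=true` to be switched on.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("q", "*:*"),
    ("rows", "0"),                       # only facet counts are wanted
    ("fq", "{!tag=yr}year:1995"),        # filter tagged as "yr"
    ("facet", "true"),
    ("facet.field", "{!ex=yr}year"),     # count years ignoring the "yr" filter
    ("facet.field", "genre"),
    ("facet.mincount", "1"),             # global setting
    ("f.genre.facet.limit", "5"),        # override for the genre field only
    ("facet.sort", "count"),
]
facet_query_string = urlencode(params)
print(facet_query_string)
```

Excluding the tagged filter keeps the other year values visible in the facet list even after one year has been selected.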

Page 21: Apache Solr

Faceting

• facet.range

• facet.range.start

• facet.range.end

• facet.range.gap
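
To illustrate how start, end, and gap divide a numeric field into buckets, here is a sketch that builds the range facet parameters and lists the resulting bucket boundaries locally. In Solr the upper bound parameter is named `facet.range.end`; the field name year and the numbers are assumptions.

```python
from urllib.parse import urlencode

def range_facet_params(field, start, end, gap):
    """Build per-field range facet parameters as a query string."""
    return urlencode({
        "facet": "true",
        "facet.range": field,
        f"f.{field}.facet.range.start": start,
        f"f.{field}.facet.range.end": end,
        f"f.{field}.facet.range.gap": gap,
    })

def range_buckets(start, end, gap):
    """List (lower, upper) bucket bounds; lower is inclusive, upper exclusive.

    If gap does not divide the range evenly, the last bucket runs past end,
    which matches Solr's default when facet.range.hardend is not set.
    """
    buckets = []
    lower = start
    while lower < end:
        buckets.append((lower, lower + gap))
        lower += gap
    return buckets

print(range_facet_params("year", 1990, 2000, 5))
print(range_buckets(1990, 2000, 5))
```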

Page 22: Apache Solr

Faceting

Page 23: Apache Solr

Highlighting

hl=true

hl.fl

hl.simple.pre

hl.simple.post

"highlighting": {
  "37477": {
    "name": ["Apple <em>IPhone</em> 6S"]
  }
}
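
A short sketch of requesting highlights and reading them back, using Solr's full parameter names (`hl.fl`, `hl.simple.pre`, `hl.simple.post`). The query is an assumption; the response fragment is the one shown on this slide, wrapped in braces so it parses as JSON.

```python
import json
from urllib.parse import urlencode

hl_params = urlencode({
    "q": "name:iphone",          # assumed query
    "hl": "true",                # turn highlighting on
    "hl.fl": "name",             # fields to highlight
    "hl.simple.pre": "<em>",     # marker before a match
    "hl.simple.post": "</em>",   # marker after a match
})

response = json.loads(
    '{"highlighting": {"37477": {"name": ["Apple <em>IPhone</em> 6S"]}}}'
)

def snippets(response, field):
    """Map each document id to its highlighted snippets for one field."""
    return {
        doc_id: fields.get(field, [])
        for doc_id, fields in response["highlighting"].items()
    }

print(hl_params)
print(snippets(response, "name"))
```

The highlighting section is keyed by the document's unique key, so snippets are joined to the result documents by id.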

Page 24: Apache Solr
Page 25: Apache Solr

Stats

stats=true&stats.field=field.name

• min

• max

• count

• sum

• sumOfSquares
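
To show what each of these statistics means, the sketch below computes the same values locally for a hypothetical list of field values. It only mirrors the definitions; Solr computes them server-side over the documents matching the query.

```python
def field_stats(values):
    """Compute the statistics listed above for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "sum": sum(values),
        "sumOfSquares": sum(v * v for v in values),
    }

# Hypothetical values of a numeric field across three matching documents.
print(field_stats([1, 2, 3]))
```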

Page 26: Apache Solr

Spelling

spellcheck.q=Keyword&spellcheck=on

"spellcheck": {
  "suggestions": [
    "father", {
      "numFound": 3,
      "startOffset": 0,
      "endOffset": 6,
      "origFreq": 20,
      "suggestion": [
        {"word": "feather", "freq": 3},
        {"word": "farmer", "freq": 4},
        {"word": "fisher", "freq": 3}
      ]
    }
  ],
  "correctlySpelled": false
}
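
In this response format, `suggestions` is a flat list that alternates between the original term and an object describing its suggestions. The sketch below pairs them up, using the response shown on this slide as input; the function name is hypothetical.

```python
import json

response = json.loads("""
{"spellcheck": {
  "suggestions": ["father", {
    "numFound": 3, "startOffset": 0, "endOffset": 6, "origFreq": 20,
    "suggestion": [
      {"word": "feather", "freq": 3},
      {"word": "farmer", "freq": 4},
      {"word": "fisher", "freq": 3}
    ]}],
  "correctlySpelled": false}}
""")

def parse_suggestions(spellcheck):
    """Return {original term: [suggested words]} from the flat list."""
    flat = spellcheck["suggestions"]
    result = {}
    # Even indexes hold terms, odd indexes hold the matching details.
    for original, info in zip(flat[0::2], flat[1::2]):
        result[original] = [s["word"] for s in info["suggestion"]]
    return result

print(parse_suggestions(response["spellcheck"]))
```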

Page 27: Apache Solr

Grouping

group=true&group.field=year

"grouped": {
  "year": {
    "matches": 10683,
    "groups": [
      {
        "groupValue": 1995,
        "doclist": {"numFound": 361, "start": 0, "docs": [
          {
            "movie_id": "movie_32",
            "id": "32",
            "name": "12 Monkeys (Twelve Monkeys)",
            "year": 1995,
            "genre": ["Sci-Fi", "Thriller"],
            "_version_": 1545364353246560258
          }]
        }
      },
      {
        "groupValue": 1994,
        "doclist": {"numFound": 307, "start": 0, "docs": [
          {
            "movie_id": "movie_889",
            "id": "889",
            "name": "1-900 (06)",
            "year": 1994,
            "genre": ["Drama", "Romance"],
            "_version_": 1545364353356660743
          }]
        }
      }
    ]
  }
}
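
A grouped response nests one document list per group value. As a sketch, the code below reads a trimmed copy of the response on this slide (the documents themselves are left out) and extracts the number of matches per year; the function name is hypothetical.

```python
import json

response = json.loads("""
{"grouped": {"year": {"matches": 10683, "groups": [
  {"groupValue": 1995, "doclist": {"numFound": 361, "start": 0, "docs": []}},
  {"groupValue": 1994, "doclist": {"numFound": 307, "start": 0, "docs": []}}
]}}}
""")

def group_counts(response, field):
    """Return {group value: number of matching documents} for one field."""
    groups = response["grouped"][field]["groups"]
    return {g["groupValue"]: g["doclist"]["numFound"] for g in groups}

print(group_counts(response, "year"))
```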

Page 28: Apache Solr

Index Data

• post command: post -c coreName -p port

• REST API

• SolrJ, Spring Data Solr, or other libraries

• DataImportHandler

Page 29: Apache Solr

REST API Sample (XML)

curl -X POST "http://localhost:8080/solr/films/update?commit=true" \
  -H "Content-Type: text/xml" \
  -d '<add>
        <doc>
          <field name="id">100000</field>
          <field name="name">Toy2 Story</field>
        </doc>
      </add>'
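
When the XML body is generated by a program rather than typed by hand, building it with an XML library avoids escaping mistakes in field values. This is a minimal sketch using Python's standard library; the function name is hypothetical and the document is the one from the curl sample.

```python
import xml.etree.ElementTree as ET

def add_xml(doc):
    """Serialize one document as a Solr <add> update message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)   # special characters are escaped on output
    return ET.tostring(add, encoding="unicode")

payload = add_xml({"id": 100000, "name": "Toy2 Story"})
print(payload)
```

The resulting string is what would be sent as the POST body with `Content-Type: text/xml`.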

Page 30: Apache Solr

REST API Sample (JSON)

curl -X POST 'http://localhost:8983/solr/new_core/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[
        {"id": "1", "name": "movie name 1"},
        {"id": "2", "name": "movie name 2"}
      ]'
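
The same JSON update can be prepared from Python with only the standard library. The sketch below builds the request object without sending it, so it runs without a Solr server; the function name is hypothetical, and the host and core name follow the curl sample. The two documents use distinct ids, because a document that reuses an existing id replaces the earlier one.

```python
import json
from urllib.request import Request

def build_update_request(base_url, core, docs):
    """Prepare a POST to /update that adds docs and commits."""
    url = f"{base_url}/solr/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_update_request(
    "http://localhost:8983", "new_core",
    [{"id": "1", "name": "movie name 1"},
     {"id": "2", "name": "movie name 2"}],
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```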

Page 31: Apache Solr

DataImportHandler (Mysql)

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/db"
              user="username"
              password="password" />
  <document>
    <entity name="film"
            query="select id,name from film"
            deltaQuery="select id from film where last_modified > '${dataimporter.last_index_time}'">
      <field column="id" name="id" />
      <field column="name" name="name" />
    </entity>
  </document>
</dataConfig>

Page 32: Apache Solr

Update Data

curl -X POST "http://localhost:8983/solr/new_core/update?commit=true" -H "Content-Type: application/json" -d '[{"id":"1","movie_id":{"set":"new_movie_id"}}]'
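
The body of this request is a JSON atomic update: instead of a plain value, the field carries a modifier object such as `{"set": ...}`, so only that field changes. The helper below is a hypothetical sketch that builds such a payload; the body is JSON, so it is sent with `Content-Type: application/json`.

```python
import json

def atomic_set(doc_id, **fields):
    """Build an atomic update that sets the given fields on one document."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}   # "set" replaces the field's value
    return doc

payload = json.dumps([atomic_set("1", movie_id="new_movie_id")])
print(payload)
```

Other modifiers such as `add`, `remove`, and `inc` follow the same shape.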

Page 33: Apache Solr

Delete Data

http://localhost:8983/solr/new_core/update?commit=true&stream.body=<delete><query>*:*</query></delete>
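
The same delete message can also be sent as the body of a POST instead of through the `stream.body` URL parameter. The sketch below prepares such a request without sending it; the function name is hypothetical, and the host and core follow the URL on this slide.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

def build_delete_request(base_url, core, query):
    """Prepare a POST to /update that deletes all documents matching query."""
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return Request(
        f"{base_url}/solr/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

req = build_delete_request("http://localhost:8983", "new_core", "*:*")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; *:* removes every document.
```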

Page 34: Apache Solr

Admin Panel & Demo