scaling solr with solrcloud

Scaling Solr with SolrCloud Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com

Upload: lucenerevolution

Post on 11-May-2015

DESCRIPTION

Configure your Solr cluster to handle hundreds of millions of documents without even noticing, answer queries in milliseconds, and use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you'll learn how to make your indexing process blazing fast and keep your queries efficient even with large amounts of data in your collections. You'll also see how to optimize your queries to leverage caches as much as your deployment allows, and how to observe your cluster with the Solr administration panel, JMX, and third-party tools. Finally, learn how to make changes to already deployed collections: split their shards and alter their schema by using the Solr API.

TRANSCRIPT

Page 1: Scaling Solr with SolrCloud

Scaling Solr with SolrCloud

Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com

Page 2: Scaling Solr with SolrCloud

About me…

Sematext consultant & engineer
Solr.pl co-founder
Father and husband

Page 3: Scaling Solr with SolrCloud

Solr History

Y. Seeley creates Solr

Solr donated to ASF

Incubator graduation

Solr 1.3 released

Solr 1.4 released

Lucene / Solr merge

Solr 4.0 released

Solr 4.1 and counting

Page 4: Scaling Solr with SolrCloud

The Past

Page 5: Scaling Solr with SolrCloud

Master – Slave Deployment

Application

Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Page 6: Scaling Solr with SolrCloud

Master as SPOF

Application

Solr Slave Solr Slave Solr Slave Solr Slave

Solr Master

Page 7: Scaling Solr with SolrCloud

Replication Time

Indexing App Solr Slave

Solr Slave

Solr Master

Solr Slave

Querying App

Page 8: Scaling Solr with SolrCloud

Solr Slave Solr Slave

Solr Master

Too Much for a Single Shard

Application

Solr Master Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Page 9: Scaling Solr with SolrCloud

Solr Slave Solr Slave

Solr Master

Too Much for a Single Shard

Application

Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Solr Master

Page 10: Scaling Solr with SolrCloud

Doc Response Response

Shard1, shard2, shard3

Shard1, shard2, shard3

Querying in Multi Master Deployment

Solr Slave Shard 2

Solr Slave Shard 3

Solr Slave Shard 1

Application

Page 11: Scaling Solr with SolrCloud

SolrCloud Comes Into Play

Page 12: Scaling Solr with SolrCloud

Basic Glossary

https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary

Cluster
Node
Collection
Shard
Leader & Replica
Overseer

Page 13: Scaling Solr with SolrCloud

Apache ZooKeeper

Quorum is required

Sample configuration

clientPort=2181
dataDir=/usr/share/zookeeper/data
tickTime=2000
initLimit=10
syncLimit=5
server.1=192.168.1.1:2888:3888
server.2=192.168.1.2:2888:3888
server.3=192.168.1.3:2888:3888
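The three server.N entries matter because a ZooKeeper ensemble keeps working only while a majority of its servers can reach each other. The arithmetic behind "quorum is required" can be sketched as follows; the function names are illustrative and not part of ZooKeeper or Solr.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size and tolerated failures side by side.
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
```

A three-server ensemble like the one in the sample configuration survives the loss of one server; adding a fourth does not raise that number, which is why odd ensemble sizes are the usual choice.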

ZooKeeper ZooKeeper ZooKeeper

Page 14: Scaling Solr with SolrCloud

Solr Instances

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

-DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181

Solr Server Solr Server

-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

Page 15: Scaling Solr with SolrCloud

Collection Creation

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

Solr Server Solr Server

$ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'

Page 16: Scaling Solr with SolrCloud

Solr Server

Single Collection Deployment

Solr Server

Solr Server Solr Server

Shard1

Application

Shard2

Page 17: Scaling Solr with SolrCloud

Collection with Replica

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

Solr Server Solr Server

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'

Page 18: Scaling Solr with SolrCloud

Solr Server

Collection with Replicas

Solr Server

Solr Server Solr Server

Shard1 Replica

Shard2 Replica

Shard2 Shard1

Application

Page 19: Scaling Solr with SolrCloud

Solr Server

Querying

Solr Server

QUERY

Application

Id,score Id,score Shard1 Shard2

Solr Server

Page 20: Scaling Solr with SolrCloud

Solr Server

Querying

Solr Server

Application

doc doc

Results

Shard2 Shard1

Solr Server

Page 21: Scaling Solr with SolrCloud

Shard and Replica Number

How your data looks
Expected data growth
Target performance
Target node number

Max number of nodes = number of shards * (number of replicas + 1)
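The formula above is easy to check against the earlier deployment slides. A minimal sketch, assuming "number of replicas" means the extra copies of each shard beyond its leader, so that the replicationFactor used in the CREATE calls equals replicas + 1; the function names are illustrative.

```python
def max_nodes(num_shards: int, num_replicas: int) -> int:
    """Upper bound on useful nodes: one node per shard copy."""
    if num_shards < 1 or num_replicas < 0:
        raise ValueError("need at least one shard and no negative replicas")
    return num_shards * (num_replicas + 1)


def max_nodes_from_replication_factor(num_shards: int, replication_factor: int) -> int:
    """Same bound, expressed with the total number of copies per shard."""
    return max_nodes(num_shards, replication_factor - 1)


if __name__ == "__main__":
    # Two shards with one extra replica each fill the four servers on the slides.
    print(max_nodes(2, 1))
    print(max_nodes_from_replication_factor(2, 2))
```

Nodes beyond this bound would hold no shard copy of the collection, so the shard count chosen at creation time caps how far the collection can spread without splitting.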

Page 22: Scaling Solr with SolrCloud

What should I go for?

More data? Add shards.

More queries? Add replicas.

Page 23: Scaling Solr with SolrCloud

Custom Routing

Default (numShards present, pre 4.5)

Implicit (numShards not present, pre 4.5)

Page 24: Scaling Solr with SolrCloud

Solr Server Solr Server

id=userB!3 id=userA!2

Custom Routing Example

id=userA!1

Shard2 Shard1
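The idea behind the userA! and userB! prefixes is that the part of the id before "!" decides the shard, so all documents of one user stay together. The sketch below only illustrates that property: it uses CRC32 and a simple modulo as a stand-in, whereas Solr's own router uses a different hash function and per-shard hash ranges, so real shard assignments will not match these numbers.

```python
import zlib


def route(doc_id: str, num_shards: int) -> int:
    """Return a 0-based shard index for doc_id.

    Assumption: CRC32 modulo num_shards replaces Solr's real hashing;
    only "same prefix, same shard" is demonstrated here.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    prefix, separator, _ = doc_id.partition("!")
    # Without a "!" the whole id is used as the routing key.
    key = prefix if separator else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for doc_id in ("userA!1", "userA!2", "userB!3"):
        print(doc_id, "-> shard index", route(doc_id, 2))
```

Both userA documents always receive the same index, while userB may or may not share it, depending on the hash.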

Page 25: Scaling Solr with SolrCloud

Querying Solr – Default Routing

Shard 1 Shard 2 Shard 3 Shard 4

Shard 5 Shard 6 Shard 7 Shard 8

Solr Collection

Application

Page 26: Scaling Solr with SolrCloud

Shard 1 Shard 2 Shard 3 Shard 4

Shard 5 Shard 6 Shard 7 Shard 8

Solr Collection

Application

Querying Solr – Custom Routing

q=revolution&_route_=userA!
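When such a query is built in application code, the "!" in the route key should be URL-encoded along with the rest of the parameters. A minimal sketch using only the Python standard library; the base URL and collection name are taken from the earlier slides and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def routed_select_url(base_url: str, collection: str, query: str, route_key: str) -> str:
    """Build a /select URL that restricts the query to one route key."""
    params = urlencode({"q": query, "_route_": route_key})
    return f"{base_url}/{collection}/select?{params}"


if __name__ == "__main__":
    print(routed_select_url("http://solr1:8983/solr", "revolution", "revolution", "userA!"))
```

With the route key present, only the shard holding userA's documents needs to answer, instead of all eight shards in the diagram.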

Page 27: Scaling Solr with SolrCloud

Collection Manipulation Commands

Create
Delete
Reload
Split
Create Alias
Delete Alias
Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud

Page 28: Scaling Solr with SolrCloud

Collection Creation

name
numShards
replicationFactor
maxShardsPerNode
createNodeSet
collection.configName
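These parameters go straight into the query string of the CREATE call shown on the earlier slides. A small sketch of a URL builder, assuming the same host as in the curl examples; the helper name and its signature are illustrative only.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def create_collection_url(
    base_url: str,
    name: str,
    num_shards: int,
    replication_factor: int,
    extra: Optional[Mapping[str, object]] = None,
) -> str:
    """Build a Collections API CREATE URL from the listed parameters."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters such as maxShardsPerNode or collection.configName.
    params.update(extra or {})
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url(
        "http://solr1:8983/solr", "revolution", 2, 1,
        extra={"collection.configName": "revolution"},
    ))
```

The result can be passed to curl or any HTTP client; the sketch itself sends nothing.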

Page 29: Scaling Solr with SolrCloud

Collection Split Example

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'

Page 30: Scaling Solr with SolrCloud

Collection Split Example

$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
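Conceptually, splitting a shard divides the hash range it covers into two halves, each served by a new sub-shard. The sketch below shows only that range arithmetic. It assumes, for illustration, that shard1 covers the negative half of the signed 32-bit hash space; the exact ranges Solr assigns should be read from the cluster state rather than from this example.

```python
from typing import Tuple

HashRange = Tuple[int, int]


def split_hash_range(low: int, high: int) -> Tuple[HashRange, HashRange]:
    """Split an inclusive integer range into two contiguous halves."""
    if low >= high:
        raise ValueError("range must contain at least two values")
    middle = low + (high - low) // 2
    return (low, middle), (middle + 1, high)


if __name__ == "__main__":
    # Assumed range for shard1: the negative half of the signed 32-bit space.
    first_half, second_half = split_hash_range(-2**31, -1)
    print("first sub-shard:", first_half)
    print("second sub-shard:", second_half)
```

Document ids keep hashing to the same values after the split; only the ownership of each half of the range changes.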

Page 31: Scaling Solr with SolrCloud

Collection Aliasing

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'

$ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'

$ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
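The weekly alias above covers seven daily collections named by date. Generating that list by hand is error-prone, so a sketch like the following could compute it; the helper names are hypothetical and the naming scheme (YYYYMMDD) is inferred from the example.

```python
from datetime import date, timedelta
from typing import List
from urllib.parse import urlencode


def daily_collection_names(last_day: date, days: int = 7) -> List[str]:
    """Names of the daily collections ending on last_day, oldest first."""
    first_day = last_day - timedelta(days=days - 1)
    return [(first_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def create_alias_url(base_url: str, alias: str, collections: List[str]) -> str:
    """Build a CREATEALIAS URL; commas are kept unescaped for readability."""
    params = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)},
        safe=",",
    )
    return f"{base_url}/admin/collections?{params}"


if __name__ == "__main__":
    names = daily_collection_names(date(2013, 11, 13))
    print(create_alias_url("http://solr1:8983/solr", "weekly", names))
```

Queries against /solr/weekly/select then span whichever collections the alias currently lists, so the application never needs to know the individual dates.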

Page 32: Scaling Solr with SolrCloud

Caches

Solr Cache

Refreshed with IndexSearcher
Configurable
Different purposes
Different implementations

Page 33: Scaling Solr with SolrCloud

Filter Cache

q=*:*&fq={!cache=false}city:Dublin

q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)

q=lucene+revolution&fq=city:Dublin

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />

q=lucene+revolution+city:Dublin

Page 34: Scaling Solr with SolrCloud

Document Cache

<documentCache class="solr.LRUCache" size="512" initialSize="512" />

Page 35: Scaling Solr with SolrCloud

Query Result Cache

q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>

q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10

<queryResultWindowSize>20</queryResultWindowSize>

<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
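The window size controls how many document ids are collected and cached per query, so that paging to the next page can be served from the cache. The sketch below shows the rounding as I understand it: the requested end position (start + rows) is rounded up to a multiple of the window size. Treat it as an approximation of Solr's behavior rather than a specification; the function name is illustrative.

```python
def cached_window_end(start: int, rows: int, window_size: int) -> int:
    """Number of top documents collected for caching for one request."""
    if start < 0 or rows < 0 or window_size < 1:
        raise ValueError("start and rows must be >= 0, window_size >= 1")
    requested_end = start + rows
    if requested_end < window_size:
        return window_size
    # Round the requested end up to the next multiple of the window size.
    return ((requested_end - 1) // window_size + 1) * window_size


if __name__ == "__main__":
    # With a window of 20, pages one and two (10 rows each) share one entry.
    for start in (0, 10, 20):
        print(start, cached_window_end(start, 10, 20))
```

With queryResultWindowSize set to 20 and 10 rows per page, the first two pages are covered by one cached entry and the third page extends it to 40 ids.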

Page 36: Scaling Solr with SolrCloud

Warming

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    <lst><str name="q">keywords:* OR tags:*</str></lst>
    <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    <lst><str name="q">keywords:* OR tags:*</str></lst>
    <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
  </arr>
</listener>

<useColdSearcher>false</useColdSearcher>

Page 37: Scaling Solr with SolrCloud

The Right Directory

_0.fdt _0.fdx _0.fnm _0.nvd

_1.fdt _1.fdx _1.fnm _1.nvd

StandardDirectory
SimpleFSDirectory
NIOFSDirectory
MMapDirectory
NRTCachingDirectory
RAMDirectory

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

Page 38: Scaling Solr with SolrCloud

Column oriented fields - DocValues

<field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>

<field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>

NRT compatible
Better compression than field cache
Can store data outside of JVM heap
Can improve things for dynamic indices

Page 39: Scaling Solr with SolrCloud

Segment Merge

a b c d e

Level 0 Level 1

c f g

Page 40: Scaling Solr with SolrCloud

Segment Merge Under Control

Merge policy
Merge scheduler
Merge factor
Merge policy configuration

Page 41: Scaling Solr with SolrCloud

Configuring Segment Merge

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

<mergeFactor>10</mergeFactor>

<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>

Page 42: Scaling Solr with SolrCloud

Indexing Throughput Tuning

Maximum indexing threads
RAM buffer size
Maximum buffered documents
Bulk, bulks and bulks
CloudSolrServer
Autocommit
Cutting off unnecessary stuff
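The "bulks" item refers to sending documents in batches rather than one request per document. A minimal, client-agnostic sketch of the batching step; how each batch is sent (CloudSolrServer, HTTP, or anything else) is left out, and the function name and the batch size are illustrative assumptions, not recommendations from the slides.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    documents = ({"id": str(i)} for i in range(2500))
    for batch in batches(documents, 1000):
        # Each batch would be sent to Solr in a single update request.
        print(len(batch))
```

Because the helper works on any iterable, documents can be streamed from a database or file without loading everything into memory first.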

Page 43: Scaling Solr with SolrCloud

TransactionLog

<updateLog> <str name="dir">${solr.ulog.dir:}</str> </updateLog>

Updates durability
Recovering peer replay
Performant Realtime Get

<requestHandler name="/get" class="solr.RealTimeGetHandler"> </requestHandler>

Page 44: Scaling Solr with SolrCloud

Autocommit or Not?

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Automatic data flush
Automatic index view refresh

Page 45: Scaling Solr with SolrCloud

Autocommit & openSearcher=true

<autoCommit>
  <maxDocs>10</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

Page 46: Scaling Solr with SolrCloud

AutoSoftCommit & openSearcher=false

<autoCommit>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxDocs>10</maxDocs>
</autoSoftCommit>

Page 47: Scaling Solr with SolrCloud

Postings Formats to the Rescue

Lucene >= 4.0: flexible indexing
Postings == docs, positions, payloads
Different postings formats available

<codecFactory class="solr.SchemaCodecFactory" />

<field name="id" type="string_pulsing" indexed="true" stored="true" />
<fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />

Bloom
Pulsing
Simple text
Direct
Memory

Page 48: Scaling Solr with SolrCloud

Monitoring

Cluster state
Nodes utilization
Memory usage
Cache utilization
Query response time
Warmup times
Garbage collector work

Page 49: Scaling Solr with SolrCloud

JMX and Solr

Page 50: Scaling Solr with SolrCloud

JMX and Solr

Page 51: Scaling Solr with SolrCloud

Administration Panel

Page 52: Scaling Solr with SolrCloud

Administration Panel

Page 53: Scaling Solr with SolrCloud

Monitoring with SPM

Page 54: Scaling Solr with SolrCloud

Monitoring with SPM

Page 55: Scaling Solr with SolrCloud

Other Monitoring Tools

Ganglia http://ganglia.sourceforge.net/

New Relic http://www.newrelic.com/

Opsview http://www.opsview.com

Page 56: Scaling Solr with SolrCloud

We Are Hiring!

Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html