Transcript
Page 1: Scaling Solr with SolrCloud

Scaling Solr with SolrCloud

Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com

Page 2: Scaling Solr with SolrCloud

Ta  me…

Sematext consultant & engineer
Solr.pl co-founder
Father and husband

Page 3: Scaling Solr with SolrCloud

Solr History

Y. Seeley creates Solr

Solr donated to ASF

Incubator graduation

Solr 1.3 released

Solr 1.4 released

Lucene / Solr merge

Solr 4.0 released

Solr 4.1 and counting

Page 4: Scaling Solr with SolrCloud

The Past

Page 5: Scaling Solr with SolrCloud

Master – Slave Deployment

Application

Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Page 6: Scaling Solr with SolrCloud

Master as SPOF

Application

Solr Slave Solr Slave Solr Slave Solr Slave

Solr Master

Page 7: Scaling Solr with SolrCloud

Replication Time

Indexing App Solr Slave

Solr Slave

Solr Master

Solr Slave

Querying App

Page 8: Scaling Solr with SolrCloud

Solr Slave Solr Slave

Solr Master

Too Much for a Single Shard

Application

Solr Master Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Page 9: Scaling Solr with SolrCloud

Solr Slave Solr Slave

Solr Master

Too Much for a Single Shard

Application

Solr Master

Solr Slave Solr Slave Solr Slave Solr Slave

Solr Master

Page 10: Scaling Solr with SolrCloud

Doc Response Response

Shard1, shard2, shard3

Querying in Multi Master Deployment

Solr Slave Shard 2

Solr Slave Shard 3

Solr Slave Shard 1

Application

Page 11: Scaling Solr with SolrCloud

SolrCloud Comes Into Play

Page 12: Scaling Solr with SolrCloud

Basic Glossary

https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary

Cluster
Node
Collection
Shard
Leader & Replica
Overseer

Page 13: Scaling Solr with SolrCloud

Apache ZooKeeper Quorum is required

Sample configuration

clientPort=2181
dataDir=/usr/share/zookeeper/data
tickTime=2000
initLimit=10
syncLimit=5
server.1=192.168.1.1:2888:3888
server.2=192.168.1.2:2888:3888
server.3=192.168.1.3:2888:3888

ZooKeeper ZooKeeper ZooKeeper
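The quorum requirement comes down to simple arithmetic: an ensemble stays available only while a strict majority of its servers can talk to each other. The sketch below is an illustration added to this transcript, not part of the original slides; the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # The three-server sample configuration above needs two servers up.
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
```

Under this rule a fourth server tolerates no more failures than three do, which is why odd ensemble sizes are the usual choice.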

Page 14: Scaling Solr with SolrCloud

Solr Instances

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

-DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181

Solr Server Solr Server

-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

Page 15: Scaling Solr with SolrCloud

Collection Creation

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

Solr Server Solr Server

$ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'
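When collections are created from a script rather than by hand, the same request can be assembled programmatically. This is a minimal sketch, assuming the host and collection name from the slide; the helper name `create_collection_url` is hypothetical, and the code only builds the URL without sending it.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url("http://solr1:8983/solr", "revolution", 2, 1))
```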

Page 16: Scaling Solr with SolrCloud

Solr Server

Single Collection Deployment

Solr Server

Solr Server Solr Server

Shard1

Application

Shard2

Page 17: Scaling Solr with SolrCloud

Collection with Replica

ZooKeeper ZooKeeper ZooKeeper

Solr Server Solr Server

Solr Server Solr Server

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'

Page 18: Scaling Solr with SolrCloud

Solr Server

Collection with Replicas

Solr Server

Solr Server Solr Server

Shard1 Replica

Shard2 Replica

Shard2 Shard1

Application

Page 19: Scaling Solr with SolrCloud

Solr Server

Querying

Solr Server

QUERY

Application

Id,score Id,score Shard1 Shard2

Solr Server

Page 20: Scaling Solr with SolrCloud

Solr Server

Querying

Solr Server

Application

doc doc

Results

Shard2 Shard1

Solr Server

Page 21: Scaling Solr with SolrCloud

Shard and Replica Number

How your data looks
Expected data growth
Target performance
Target node number

Max number of nodes = number of shards * (number of replicas + 1)
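As a worked example of the formula above, the following sketch computes the node ceiling for a given layout. The function name is hypothetical. "Replicas" here means extra copies beyond the leader, so it is assumed to correspond to replicationFactor minus one in the CREATE calls shown earlier.

```python
def max_nodes(num_shards: int, replicas_per_shard: int) -> int:
    """Max number of nodes = number of shards * (number of replicas + 1)."""
    if num_shards < 1 or replicas_per_shard < 0:
        raise ValueError("need at least one shard and a non-negative replica count")
    return num_shards * (replicas_per_shard + 1)


if __name__ == "__main__":
    # Two shards with one replica each fill four nodes.
    print(max_nodes(2, 1))
    # Two shards without replicas fill two nodes.
    print(max_nodes(2, 0))
```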

Page 22: Scaling Solr with SolrCloud

What should I go for?

More data? Shard

More queries? Replica

Page 23: Scaling Solr with SolrCloud

Custom Routing

Default (numShards present, pre 4.5)
Implicit (numShards not present, pre 4.5)

Page 24: Scaling Solr with SolrCloud

Solr Server Solr Server

id=userB!3 id=userA!2

Custom Routing Example

id=userA!1

Shard2 Shard1
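The point of the `prefix!id` convention is that documents sharing a prefix land on the same shard. The sketch below illustrates only that property. The CRC32-modulo placement is a stand-in chosen for this example and is NOT the hash Solr actually uses, and every function name is hypothetical.

```python
import zlib
from collections import defaultdict


def route_key(doc_id: str) -> str:
    """Return the routing prefix before '!', or the whole id if there is none."""
    prefix, separator, _ = doc_id.partition("!")
    return prefix if separator else doc_id


def toy_shard_for(doc_id: str, num_shards: int) -> int:
    """Illustrative placement only: CRC32 of the prefix modulo the shard count."""
    return zlib.crc32(route_key(doc_id).encode("utf-8")) % num_shards


def group_by_shard(doc_ids, num_shards):
    """Map each toy shard index to the document ids placed on it."""
    placement = defaultdict(list)
    for doc_id in doc_ids:
        placement[toy_shard_for(doc_id, num_shards)].append(doc_id)
    return dict(placement)


if __name__ == "__main__":
    print(group_by_shard(["userA!1", "userA!2", "userB!3"], 2))
```

Whatever hash is used, `userA!1` and `userA!2` always end up together because only the prefix feeds the placement.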

Page 25: Scaling Solr with SolrCloud

Querying Solr – Default Routing

Shard 1 Shard 2 Shard 3 Shard 4

Shard 5 Shard 6 Shard 7 Shard 8

Solr Collection

Application

Page 26: Scaling Solr with SolrCloud

Shard 1 Shard 2 Shard 3 Shard 4

Shard 5 Shard 6 Shard 7 Shard 8

Solr Collection

Application

Querying Solr – Custom Routing

q=revolution&_route_=userA!
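A routed query is an ordinary select request with one extra parameter. This is a minimal sketch of building such a request; the collection URL and the helper name `routed_query_url` are assumptions made for illustration. Note that `urlencode` percent-encodes the trailing `!` as `%21`, which decodes back to the same value on the server.

```python
from urllib.parse import urlencode


def routed_query_url(collection_url: str, query: str, route_prefix: str) -> str:
    """Build a select URL whose _route_ parameter names a routing prefix."""
    params = {"q": query, "_route_": f"{route_prefix}!"}
    return f"{collection_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(routed_query_url("http://solr1:8983/solr/revolution", "revolution", "userA"))
```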

Page 27: Scaling Solr with SolrCloud

Collection Manipulation Commands

Create
Delete
Reload
Split
Create Alias
Delete Alias
Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud

Page 28: Scaling Solr with SolrCloud

Collection Creation

name
numShards
replicationFactor
maxShardsPerNode
createNodeSet
collection.configName

Page 29: Scaling Solr with SolrCloud

Collection Split Example

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'

Page 30: Scaling Solr with SolrCloud

Collection Split Example

$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'

Page 31: Scaling Solr with SolrCloud

Collection Aliasing

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'

$ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'

$ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
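With one collection per day, the alias has to be rebuilt as the window moves. The sketch below generates the seven daily collection names and the matching CREATEALIAS URL. The `YYYYMMDD` naming follows the slide; the helper names are hypothetical and nothing is sent over the network.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(last_day: date, days: int = 7):
    """Names of the daily collections ending at last_day, oldest first."""
    first_day = last_day - timedelta(days=days - 1)
    return [(first_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def create_alias_url(base_url: str, alias: str, collections) -> str:
    """Build a CREATEALIAS URL; commas are percent-encoded by urlencode."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    names = daily_collections(date(2013, 11, 13))
    print(create_alias_url("http://solr1:8983/solr", "weekly", names))
```

Queries keep targeting `weekly` while the set of collections behind it changes.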

Page 32: Scaling Solr with SolrCloud

Caches

Solr Cache

Refreshed with IndexSearcher
Configurable
Different purposes
Different implementations

Page 33: Scaling Solr with SolrCloud

Filter Cache

q=*:*&fq={!cache=false}city:Dublin

q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)

q=lucene+revolution&fq=city:Dublin

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />

q=lucene+revolution+city:Dublin

Page 34: Scaling Solr with SolrCloud

Document Cache

<documentCache class="solr.LRUCache" size="512" initialSize="512" />

Page 35: Scaling Solr with SolrCloud

Query Result Cache

q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>

q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10

<queryResultWindowSize>20</queryResultWindowSize>

<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
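The two settings above interact with `start` and `rows`: the requested range is rounded up to a multiple of the window size, so paging within one window can be served from the cache. The sketch below models that rounding as a simplification of Solr's behavior. It assumes the collected window is full when it is compared against queryResultMaxDocsCached, and the function name is hypothetical.

```python
def cached_window(start: int, rows: int, window_size: int = 20, max_docs_cached: int = 200):
    """Return (documents collected for the cache entry, whether it is cached).

    Simplified model: round start + rows up to a multiple of window_size and
    cache the entry only if it holds at most max_docs_cached documents.
    """
    requested = start + rows
    windows = max(1, -(-requested // window_size))  # ceiling division, at least one window
    superset = windows * window_size
    return superset, superset <= max_docs_cached


if __name__ == "__main__":
    print(cached_window(0, 10))    # first page
    print(cached_window(10, 15))   # spills into the second window
    print(cached_window(190, 20))  # deep paging
```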

Page 36: Scaling Solr with SolrCloud

Warming

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    <lst><str name="q">keywords:* OR tags:*</str></lst>
    <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    <lst><str name="q">keywords:* OR tags:*</str></lst>
    <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
  </arr>
</listener>

<useColdSearcher>false</useColdSearcher>

Page 37: Scaling Solr with SolrCloud

The Right Directory

_0.fdt _0.fdx _0.fnm _0.nvd

_1.fdt _1.fdx _1.fnm _1.nvd

StandardDirectory
SimpleFSDirectory
NIOFSDirectory
MMapDirectory
NRTCachingDirectory
RAMDirectory

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

Page 38: Scaling Solr with SolrCloud

Column oriented fields - DocValues

<field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>

<field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>

NRT compatible
Better compression than field cache
Can store data outside of JVM heap
Can improve things for dynamic indices

Page 39: Scaling Solr with SolrCloud

Segment Merge

a b c d e

Level 0 Level 1

c f g

Page 40: Scaling Solr with SolrCloud

Segment Merge Under Control

Merge policy
Merge scheduler
Merge factor
Merge policy configuration

Page 41: Scaling Solr with SolrCloud

Configuring Segment Merge

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

<mergeFactor>10</mergeFactor>

<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>

Page 42: Scaling Solr with SolrCloud

Indexing Throughput Tuning

Maximum indexing threads
RAM buffer size
Maximum buffered documents
Bulk, bulks and bulks
CloudSolrServer
Autocommit
Cutting off unnecessary stuff
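The "bulks" item is the easiest to apply from client code: send documents in batches instead of one request per document. This is a minimal sketch of the batching step, assuming documents are plain dictionaries serialized as a JSON array for Solr's update handler. The helper names and the default batch size are hypothetical, and no request is sent.

```python
import json
from itertools import islice


def batches(docs, batch_size):
    """Yield consecutive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def update_payloads(docs, batch_size=1000):
    """Yield one JSON array per batch, ready to be posted as an update body."""
    for chunk in batches(docs, batch_size):
        yield json.dumps(chunk)


if __name__ == "__main__":
    sample = ({"id": str(i), "title": f"doc {i}"} for i in range(5))
    for payload in update_payloads(sample, batch_size=2):
        print(payload)
```

Because `batches` consumes an iterator lazily, the full document set never has to fit in memory at once.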

Page 43: Scaling Solr with SolrCloud

TransactionLog

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Updates durability
Recovering peer replay
Performant Realtime Get

<requestHandler name="/get" class="solr.RealTimeGetHandler"> </requestHandler>

Page 44: Scaling Solr with SolrCloud

Autocommit or Not?

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Automatic data flush
Automatic index view refresh
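With both maxTime and maxDocs set, whichever limit is reached first triggers the commit. The sketch below estimates the resulting commit interval for a steady indexing rate. It assumes the timer starts with the first uncommitted document and a constant rate, and the function name is hypothetical.

```python
def seconds_until_commit(docs_per_second, max_time_ms=None, max_docs=None):
    """Estimate the time until an automatic commit fires, or None if no limit is set."""
    candidates = []
    if max_time_ms is not None:
        candidates.append(max_time_ms / 1000.0)
    if max_docs is not None and docs_per_second > 0:
        candidates.append(max_docs / docs_per_second)
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hard commit settings from the slide: maxTime=15000, maxDocs=1000.
    print(seconds_until_commit(100, max_time_ms=15000, max_docs=1000))
    print(seconds_until_commit(10, max_time_ms=15000, max_docs=1000))
    # Soft commit settings from the slide: maxTime=1000.
    print(seconds_until_commit(100, max_time_ms=1000))
```

At high indexing rates the maxDocs limit dominates, so a small value there can cause far more frequent commits than maxTime suggests.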

Page 45: Scaling Solr with SolrCloud

Autocommit & openSearcher=true

<autoCommit>
  <maxDocs>10</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

Page 46: Scaling Solr with SolrCloud

AutoSoftCommit & openSearcher=false

<autoCommit>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxDocs>10</maxDocs>
</autoSoftCommit>

Page 47: Scaling Solr with SolrCloud

Postings Formats to the Rescue

Lucene >= 4.0: Flexible Indexing
Postings == docs, positions, payloads
Different postings formats available

<codecFactory class="solr.SchemaCodecFactory" />

<field name="id" type="string_pulsing" indexed="true" stored="true" />
<fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />

Bloom
Pulsing
Simple text
Direct
Memory

Page 48: Scaling Solr with SolrCloud

Monitoring

Cluster state
Nodes utilization
Memory usage
Cache utilization
Query response time
Warmup times
Garbage collector work
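For the cache utilization item, the number usually watched is the hit ratio, derived from the lookup and hit counters a cache reports. The sketch below computes it and flags caches under a threshold. The 0.5 threshold, the sample counters, and the function names are illustrative assumptions, not recommendations from the talk.

```python
def hit_ratio(hits: int, lookups: int) -> float:
    """Fraction of lookups served from the cache; 0.0 when nothing was looked up."""
    return hits / lookups if lookups else 0.0


def caches_needing_attention(stats, threshold=0.5):
    """Return names of caches whose hit ratio is below the threshold, sorted."""
    return sorted(
        name
        for name, counters in stats.items()
        if hit_ratio(counters["hits"], counters["lookups"]) < threshold
    )


if __name__ == "__main__":
    sample_stats = {
        "filterCache": {"hits": 90, "lookups": 100},
        "queryResultCache": {"hits": 10, "lookups": 100},
        "documentCache": {"hits": 0, "lookups": 0},
    }
    print(caches_needing_attention(sample_stats))
```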

Page 49: Scaling Solr with SolrCloud

JMX and Solr

Page 50: Scaling Solr with SolrCloud

JMX and Solr

Page 51: Scaling Solr with SolrCloud

Administration Panel

Page 52: Scaling Solr with SolrCloud

Administration Panel

Page 53: Scaling Solr with SolrCloud

Monitoring with SPM

Page 54: Scaling Solr with SolrCloud

Monitoring with SPM

Page 55: Scaling Solr with SolrCloud

Other Monitoring Tools

Ganglia http://ganglia.sourceforge.net/

New Relic http://www.newrelic.com/

Opsview http://www.opsview.com

Page 56: Scaling Solr with SolrCloud

We Are Hiring!

Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open-source?
We're hiring worldwide!
http://sematext.com/about/jobs.html
