Scaling search with SolrCloud
DESCRIPTION
Enterprise search can grow big, really big, and it keeps growing. Tens or even hundreds of servers may be involved, locally or in the cloud. Managing this has been complex and time-consuming - until now :)
SolrCloud to the rescue! Using the world's most popular open source search engine, Apache Solr™, we will show you how the upcoming version 4.0 makes scaling search in the cloud really simple and robust. A new feature called SolrCloud adds centralized configuration, distributed indexing & searching, automatic failover, recovery and leader election. Scaling is now as simple as adding a new server to your cluster; it will find its role where it is most needed and start serving searches.
TRANSCRIPT
Scaling search with SolrCloud
Jan Høydahl, Cominvent AS
Jan Høydahl
1995: Developer, telecom
1998: Java developer
2000: Search - FAST
2006: Lucene
2007: new Cominvent()
2009: Lucene/Solr
2011: Lucene committer
2012: Lucene PMC
> 100 projects
About Cominvent
Business-critical search
Domain knowledge & best practices:
Consulting, Training, Support
Tailored training and consulting
Introduction to Solr (non-tech)
Introduction to Solr (tech)
Solr Developer
Scaling and tuning
Developing Solr Plugins
SolrTraining.com
Next course in Oslo:
[Calendar image: September 2012]
CommunityZone talk: «Solr 101»
Thursday 14:20
Next MeetUp:
http://www.meetup.com/Oslo-Solr-Community
ApacheCon Europe 2012
• Sinsheim, Germany
• November 5-8
• Lucene/Solr track
• www.apachecon.eu
Agenda
• Intro to Solr
• Scaling search - before
• Introduction to SolrCloud
• Demo with Wikipedia data
• Plans for Solr going forward
• Q&A
Intro to Solr
Apache Solr
Search Server
Completely HTTP based
Areas of use
Boosting by function
Boosting on review popularity and sales numbers:
log(sum(popularity,numsold))
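To get a feel for what this boost function evaluates to, here is a minimal sketch in Python. It assumes that Solr's `log` function is base 10 and that `sum` simply adds its arguments; the product names and field values are made up for illustration.

```python
import math


def boost(popularity: float, numsold: float) -> float:
    """Mirror log(sum(popularity,numsold)), assuming a base-10 log."""
    return math.log10(popularity + numsold)


# Hypothetical products: name -> (popularity, numsold)
products = {"A": (8, 2), "B": (40, 60), "C": (300, 700)}

for name, (popularity, numsold) in products.items():
    # The log dampens large numbers: 100x more sales adds only +2 to the boost.
    print(name, round(boost(popularity, numsold), 2))
```

The logarithm keeps bestsellers from completely drowning out text relevance, which is presumably why it is wrapped around the sum.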
Auto suggest & phonetic normalization
Who uses Apache Lucene/Solr™?
...and many more: http://wiki.apache.org/solr/PublicServers
Versions
• Current stable = 3.6.1
• Latest release = 4.0-beta
• Next release = 4.0-FINAL --- «soon» :-)
Release timeline:
01/2007  v1.1
09/2008  v1.3
11/2009  v1.4
03/2011  v3.1
06/2011  v3.3
04/2012  v3.6
06/2012  v4.0-alpha
07/2012  v3.6.1
08/2012  v4.0-beta
??/2012  v4.0
Scaling search
Why scale?
• One single Solr server handles...
– millions of documents (per shard)
– hundreds of queries per second (per replica)
• We need to scale if...
– data volume increases
– query volume increases
– we need high availability / fault tolerance
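The two scaling dimensions above can be turned into a back-of-the-envelope sizing calculation. This is only a sketch: the function name, the workload and the per-node capacities are made-up assumptions, and real capacity depends heavily on document size, query complexity and hardware.

```python
import math


def cluster_size(total_docs, peak_qps, docs_per_shard, qps_per_replica, min_replicas=1):
    """Return (shards, copies_per_shard, total_nodes) for a rough capacity plan."""
    # Data volume decides how many shards the index is split into.
    shards = max(1, math.ceil(total_docs / docs_per_shard))
    # Query volume and availability needs decide how many copies of each shard exist.
    replicas = max(min_replicas, math.ceil(peak_qps / qps_per_replica))
    return shards, replicas, shards * replicas


# Made-up workload: 250M documents and 900 queries/s, with an assumed capacity of
# 50M documents per shard and 300 queries/s per replica, and at least 2 copies
# of every shard for fault tolerance.
print(cluster_size(250_000_000, 900, 50_000_000, 300, min_replicas=2))  # prints (5, 3, 15)
```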
Scaling search - before
Solr shard 1, Solr 1 replica, Solr shard 2, Solr 2 replica
(each node with its own copy of config, schema and synonyms)
Adding a shard:
- Add shard node
- Manually copy config
- Manually index to the right shard
- Manually set the shards query parameter
Adding a replica:
- Add replica node
- Copy config
- Set up poll-based replication
- No indexing failover
- Monitor every node
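To make the manual steps concrete, here is a rough sketch of the client-side logic one had to maintain before SolrCloud: picking a shard for each document and listing every shard in the `shards` query parameter. The host names and the CRC32-modulo routing scheme are assumptions chosen for illustration, not something taken from the talk.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses; every client had to know this list.
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id: str) -> str:
    """Pick the shard a document must be indexed to (assumed scheme: CRC32 modulo)."""
    return SHARDS[zlib.crc32(doc_id.encode("utf-8")) % len(SHARDS)]


def distributed_query_url(query: str) -> str:
    """Build a query URL that fans out to all shards via the shards parameter."""
    params = urlencode({"q": query, "shards": ",".join(SHARDS)})
    return f"http://{SHARDS[0]}/select?{params}"


print(shard_for("doc-1"))
print(distributed_query_url("solr cloud"))
```

Adding a third shard means changing `SHARDS` in every client and re-indexing, since the modulo result changes for most documents.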
SolrCloud
What is SolrCloud?
• New in Solr 4.0
• Easier scaling
• Centralized config
• Fault tolerant indexing and querying
• Using Apache ZooKeeper as «registry»
ZooKeeper: «Because coordinating distributed systems is a Zoo»
What is SolrCloud
[Diagram labels: Logical collection, Transaction log, Soft commit]
Scaling search - with SolrCloud
Solr master 1, Solr replica 1, Solr master 2, Solr replica 2
(all nodes ZK aware, coordinated through Apache ZooKeeper)
Adding a shard:
- Add shard node, point it to ZK
- It assumes the role of shard 2
- Automatic document distribution
- Automatic querying across cluster
- Centralized config & monitoring
Adding replicas:
- Add replica node(s)
- Auto role assignment
- Push based replication
- Indexing failover
- Leader election through ZK
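Leader election through ZooKeeper can be pictured with a small in-memory simulation. The sketch below imitates the common ZooKeeper recipe in which each node registers an ephemeral, sequentially numbered entry and the lowest live number is the leader. It is a toy model under that assumption, not SolrCloud's actual implementation, and the class and node names are hypothetical.

```python
class ShardElection:
    """Toy stand-in for ZooKeeper ephemeral sequential entries of one shard."""

    def __init__(self):
        self._next_seq = 0
        self._live = {}  # sequence number -> node name

    def join(self, node: str) -> int:
        """Register a node; it receives the next sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._live[seq] = node
        return seq

    def leave(self, node: str) -> None:
        """Simulate a crash or session expiry: the node's entry disappears."""
        self._live = {s: n for s, n in self._live.items() if n != node}

    def leader(self):
        """The live node with the lowest sequence number leads the shard."""
        return self._live[min(self._live)] if self._live else None


shard2 = ShardElection()
shard2.join("solr-master-2")
shard2.join("solr-replica-2")
print(shard2.leader())   # the first node to register leads
shard2.leave("solr-master-2")
print(shard2.leader())   # the replica takes over, so indexing can continue
```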
Configuration
Solr master 1, Solr replica 1, Solr master 2, Solr replica 2 (all ZK aware), ZK
First node:
-DzkRun
-Dcollection.configName=jz
-DnumShards=2
-Dbootstrap_confdir=./solr/coll/conf
Each of the other three nodes:
-DzkHost=localhost:xxxx
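As a sketch of how these flags end up on a command line, the snippet below assembles start commands for a four-node cluster. The system properties are the ones on the slide; the `java ... -jar start.jar` launcher is an assumption based on the example Jetty setup shipped with Solr, and the ZooKeeper port stays elided as on the slide.

```python
import shlex

# Port elided on the slide; replace with the real ZooKeeper address.
ZK_HOST = "localhost:xxxx"


def first_node_cmd():
    """First node: runs ZooKeeper and uploads the config (flags from the slide)."""
    return [
        "java",
        "-DzkRun",
        "-Dcollection.configName=jz",
        "-DnumShards=2",
        "-Dbootstrap_confdir=./solr/coll/conf",
        "-jar", "start.jar",  # assumed launcher, not shown on the slide
    ]


def other_node_cmd(zk_host=ZK_HOST):
    """Every further node only needs to know where ZooKeeper lives."""
    return ["java", f"-DzkHost={zk_host}", "-jar", "start.jar"]


for cmd in [first_node_cmd()] + [other_node_cmd() for _ in range(3)]:
    print(shlex.join(cmd))
```

With `numShards=2`, the first two nodes become the two shards and the remaining nodes are assigned as replicas, which matches the "auto role assignment" described earlier.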
Demo: indexing & querying
Solr 4.0 and beyond
• Other news in v4.0 FINAL (expected later this autumn)
– Near real-time (NRT) search
– Real-time GET
– Smaller index & memory footprint
– New «modern» Admin GUI
– Incremental updates
– Pseudo-join
• Future plans
– More shard distribution mechanisms
– Re-balancing cluster (split shards)
– ...
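Two of these features lend themselves to a short illustration. Assuming "incremental updates" refers to Solr 4's atomic updates, a client sends only the fields to change, each wrapped in a modifier such as `set` or `inc`, and real-time GET fetches the latest version of a document by id through the `/get` handler. The base URL, document id and field names below are hypothetical, and the sketch only builds the request data without sending anything.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL


def atomic_update(doc_id, **ops):
    """Build a JSON update body that changes single fields of one document.

    ops maps a field name to a (modifier, value) pair, e.g. numsold=("inc", 1).
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in ops.items():
        doc[field] = {modifier: value}
    return json.dumps([doc])


def realtime_get_url(doc_id):
    """URL for fetching the latest version of a document by id."""
    return f"{SOLR}/get?{urlencode({'id': doc_id})}"


body = atomic_update("book-1", numsold=("inc", 1), popularity=("set", 7))
print(body)
print(realtime_get_url("book-1"))
```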
Recap
• Apache Solr: open source enterprise search
• Scaling Solr was hard
• Solr 4.0 with SolrCloud makes it easy :)
– Centralized config
– Effortless scaling of cluster
– Fault tolerant indexing & querying
• Download the 4.0-beta today, 4.0-FINAL soon
Remember
Next Solr course in Oslo:
[Calendar image: September 2012]
CommunityZone talk: «Solr 101»
Thursday 14:20
www.solrkurs.no
Jan Høydahl, Cominvent AS, @cominvent
?