sunbirst

Sunbirst: A distributed worker model for Apache Solr

@sleepyfox for sourcesense

What’s in the box

• Context
• Problem definition
• One possible solution
• Discussion
• ...

Where we are now

Existing system

• Usual Solr production configuration:
  • High-volume search
  • Low-volume indexing
• Our customer:
  • High-volume indexing
  • Low-volume search

Volumes

• 3m new docs indexed/day
• 60-day archive
  • = 180m docs indexed
• 10k searches/day
  • = roughly 1 search every 9 seconds
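The arithmetic behind these figures can be checked with a short sketch. The three input figures come from the slide; the variable names are illustrative, and the per-second rates assume load is spread evenly across the day, which the slides do not state.

```python
# Input figures from the slide; names are illustrative only.
DOCS_PER_DAY = 3_000_000
ARCHIVE_DAYS = 60
SEARCHES_PER_DAY = 10_000
SECONDS_PER_DAY = 24 * 60 * 60

# Total documents held in the index at steady state.
docs_in_index = DOCS_PER_DAY * ARCHIVE_DAYS

# Average gap between searches, assuming an even spread over the day.
seconds_between_searches = SECONDS_PER_DAY / SEARCHES_PER_DAY

# Average indexing rate, under the same even-spread assumption.
docs_indexed_per_second = DOCS_PER_DAY / SECONDS_PER_DAY

print(docs_in_index)                        # 180000000
print(round(seconds_between_searches, 2))   # 8.64
print(round(docs_indexed_per_second, 1))    # 34.7
```

On these averages the system indexes about 35 documents per second while serving one search roughly every 9 seconds, which is the inverse of the usual Solr workload.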

Existing architecture

How it works

• 2 rows, each 20 shards + coordinator
• Partitioning algorithm = (id % 20)
• Each shard has:
  • Solr instance
  • Indexer
  • Optimizer
  • Committer
  • Purger

How it works

• Documents retrieved by coordinator in blocks of 500

• These are allocated by id to shards according to the partitioning scheme

• Shards poll metabases for their content
• Shards index content
• Coordinator archives content
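A minimal sketch of the allocation step, assuming integer document ids. The block size of 500 and the `id % 20` scheme come from the slides; `shard_for` and `allocate_block` are hypothetical names, not part of the actual system.

```python
from collections import defaultdict

SHARDS_PER_ROW = 20   # from the slides
BLOCK_SIZE = 500      # from the slides


def shard_for(doc_id: int, shard_count: int = SHARDS_PER_ROW) -> int:
    """Return the shard index for a document under modulo partitioning."""
    return doc_id % shard_count


def allocate_block(doc_ids, shard_count: int = SHARDS_PER_ROW) -> dict:
    """Group one block of document ids by the shard that should index them."""
    allocation = defaultdict(list)
    for doc_id in doc_ids:
        allocation[shard_for(doc_id, shard_count)].append(doc_id)
    return dict(allocation)


# Example: one block of 500 consecutive ids (the starting id is made up).
block = list(range(1000, 1000 + BLOCK_SIZE))
allocation = allocate_block(block)

print(len(allocation))      # 20
print(len(allocation[0]))   # 25
```

With consecutive ids the block spreads evenly, 25 documents per shard. Real id sequences may be less uniform.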

Challenges

• Coordinator responsible for 2 things:
  • Archiving content
  • Routing searches
• Redundant data flow from metabases
• Partitioning scheme means roughly ((n-1)/n)*100 percent of docs move when a shard is added (about 95% for 20 shards)
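The cost of adding a shard under modulo partitioning can be measured directly. This is a sketch under the assumption of uniformly distributed integer ids; `moved_fraction` is a hypothetical helper, not part of the system described.

```python
def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    doc_ids = list(doc_ids)
    moved = sum(1 for d in doc_ids if d % old_shards != d % new_shards)
    return moved / len(doc_ids)


# 420 is the least common multiple of 20 and 21, so a whole number of
# 420-id periods gives the exact long-run fraction.
ids = range(420 * 100)

print(round(moved_fraction(ids, 20, 21) * 100, 1))   # 95.2
```

Going from 20 to 21 shards, only ids whose remainder is the same under both moduli stay put, so about 20 in every 21 documents have to be re-indexed on a different shard.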

One possible future

Distributed workflow

• Different worker pools:
  • Indexer
  • Searcher
  • Archiver
  • Coordinator
  • Content enricher
  • ...
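The worker-pool idea can be illustrated with in-process queues and threads. This is only a sketch: the proposed system uses a message broker rather than `queue.Queue`, the two stages shown (enricher feeding indexer) are an assumed ordering, and every name below is hypothetical.

```python
import queue
import threading


def run_pool(name, worker_count, handler, inbox, outbox=None):
    """Start worker_count threads that apply handler to each item from inbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is None:              # sentinel: stop this worker
                break
            result = handler(item)
            if outbox is not None:
                outbox.put(result)        # hand the result to the next pool

    threads = [
        threading.Thread(target=loop, name=f"{name}-{i}")
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    return threads


def stop_pool(threads, inbox):
    """Send one sentinel per worker, then wait for all workers to exit."""
    for _ in threads:
        inbox.put(None)
    for thread in threads:
        thread.join()


ingest_queue = queue.Queue()
index_queue = queue.Queue()
indexed = []
indexed_lock = threading.Lock()


def enrich(doc):
    # Stand-in for content enrichment against reference data.
    return {**doc, "enriched": True}


def index(doc):
    # Stand-in for sending the document to Solr.
    with indexed_lock:
        indexed.append(doc["id"])


# Pools are sized independently, which is the point of the model.
enrichers = run_pool("enricher", 3, enrich, ingest_queue, index_queue)
indexers = run_pool("indexer", 2, index, index_queue)

for doc_id in range(10):
    ingest_queue.put({"id": doc_id})

stop_pool(enrichers, ingest_queue)   # every doc is enriched once these exit
stop_pool(indexers, index_queue)

print(sorted(indexed))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because each pool only reads from one queue and writes to the next, a slow stage can be scaled by adding workers to that pool without touching the others.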

Ingest Pipeline

[Diagram: ingest pipeline. Components shown: Ingest queue, Ingester, Enricher, Archiver, Indexer, Ref. data, two Disk stores, Archive, Solr]

Coordinator

• Orchestration, workflow and enterprise integration (EI) patterns by Apache ServiceMix
• Messaging by Apache ActiveMQ
• REST by Apache CXF
• Runtime container by Apache Karaf
• 100% Open Source Software

Call to arms

• Designed to be more generic than the initial itch that needed scratching
• Have Solr/Lucene committers
• Happy to accept outside contributors
• May eventually become an Apache Incubator project
• Contact: Nigel Runnels-Moss
  • @sleepyfox on Twitter
  • n.runnels-moss@sourcesense.com

Questions
