Sunbirst
A distributed worker model for Apache Solr
@sleepyfox for sourcesense
What’s in the box
• Context
• Problem definition
• One possible solution
• Discussion
• ...
Where we are now
Existing system
• Usual Solr production configuration:
  • High-volume search
  • Low-volume indexing
• Our customer:
  • High-volume indexing
  • Low-volume search
Volumes
• 3m new docs indexed/day
• 60-day archive
  • = 180m docs indexed
• 10k searches/day
  • = roughly 1 search every 9 seconds on average
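The volumes reduce to simple rates. A quick sketch, assuming load is spread evenly over 24 hours (real traffic is likely burstier), shows why indexing rather than search dominates: about 35 new documents per second against one search every 8 to 9 seconds. The constant and function names are illustrative only.

```python
# Back-of-envelope arithmetic for the volumes on the slide.
DOCS_PER_DAY = 3_000_000   # 3m new docs indexed per day
RETENTION_DAYS = 60        # 60-day archive
SEARCHES_PER_DAY = 10_000  # 10k searches per day
SECONDS_PER_DAY = 24 * 60 * 60


def docs_in_index(docs_per_day: int, retention_days: int) -> int:
    """Size of the index once the retention window is full."""
    return docs_per_day * retention_days


def per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(docs_in_index(DOCS_PER_DAY, RETENTION_DAYS))  # 180000000
    print(round(per_second(DOCS_PER_DAY), 1))           # 34.7 docs indexed per second
    print(round(1 / per_second(SEARCHES_PER_DAY), 2))   # 8.64 seconds between searches
```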
Existing architecture
How it works
• 2 rows, each of 20 shards plus a coordinator
• Partitioning algorithm: shard = (id % 20)
• Each shard has:
  • Solr instance
  • Indexer
  • Optimizer
  • Committer
  • Purger
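A minimal sketch of the partitioning rule, assuming document ids are non-negative integers. `shard_for` and `NUM_SHARDS` are illustrative names, not taken from the original system, and the sketch covers a single row only, since the slides do not say how work is split between the two rows.

```python
from collections import Counter

NUM_SHARDS = 20  # shards per row, as on the slide


def shard_for(doc_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a document to a shard using the (id % num_shards) scheme."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return doc_id % num_shards


if __name__ == "__main__":
    print(shard_for(12345))  # 5
    # With sequential ids the load is perfectly even: 1,000 ids -> 50 per shard.
    load = Counter(shard_for(i) for i in range(1_000))
    print(set(load.values()))  # {50}
```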
How it works
• Documents retrieved by coordinator in blocks of 500
• These are allocated by id to shards according to the partitioning scheme
• Shards poll metabases for their content
• Shards index content
• Coordinator archives content
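The slides do not describe how the coordinator fetches from the metabases, so this sketch simply assumes document ids arrive as a stream of integers. It groups them into blocks of 500 and splits each block into per-shard work lists with the same modulo rule; the per-shard lists stand in for whatever each shard then pulls from the metabases, and `blocks` and `allocate` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

BLOCK_SIZE = 500  # coordinator fetches documents in blocks of 500
NUM_SHARDS = 20


def blocks(doc_ids: Iterable[int], size: int = BLOCK_SIZE) -> Iterator[List[int]]:
    """Group an id stream into blocks of `size`; the last block may be shorter."""
    block: List[int] = []
    for doc_id in doc_ids:
        block.append(doc_id)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block


def allocate(block: List[int], num_shards: int = NUM_SHARDS) -> Dict[int, List[int]]:
    """Split one block into per-shard work lists using id % num_shards."""
    per_shard: Dict[int, List[int]] = defaultdict(list)
    for doc_id in block:
        per_shard[doc_id % num_shards].append(doc_id)
    return dict(per_shard)


if __name__ == "__main__":
    all_blocks = list(blocks(range(1_200)))
    print([len(b) for b in all_blocks])                      # [500, 500, 200]
    first = allocate(all_blocks[0])
    print(len(first), {len(ids) for ids in first.values()})  # 20 {25}
```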
Challenges
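• Coordinator responsible for 2 things:
  • Archiving content
  • Routing searches
• Redundant data flow from metabases
• Partitioning scheme means ((n-1)/n) × 100 percent of docs move when a shard is added, where n is the shard count after the addition (going from 20 to 21 shards moves about 95%)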
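The formula can be checked by brute force: count the ids whose shard changes when `id % 20` becomes `id % 21`. The sketch assumes integer ids that cover a range spanning whole periods of lcm(20, 21) = 420, so the count is exact rather than an estimate; `fraction_moved` is an illustrative helper.

```python
def fraction_moved(old_shards: int, new_shards: int, num_ids: int) -> float:
    """Share of ids 0..num_ids-1 whose shard changes when id % old becomes id % new."""
    moved = sum(
        1 for doc_id in range(num_ids) if doc_id % old_shards != doc_id % new_shards
    )
    return moved / num_ids


if __name__ == "__main__":
    n = 21                   # shard count after adding one shard to a row of 20
    sample = 20 * 21 * 100   # a whole number of periods of lcm(20, 21) = 420
    print(f"{fraction_moved(20, n, sample):.1%}")  # 95.2%
    print(f"{(n - 1) / n:.1%}")                    # 95.2%
```

Only ids below the old shard count keep their shard in each period, which is why almost the whole index has to move.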
One possible future
Distributed workflow
• Different worker pools:
  • Indexer
  • Searcher
  • Archiver
  • Coordinator
  • Content enricher...
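The slides do not say how the worker pools are implemented (the ESB slide points to message-based integration). As a rough in-process illustration only, each role can be modelled as a fixed-size pool of threads draining its own queue; `WorkerPool` and the `index` handler are hypothetical stand-ins for real message consumers.

```python
import queue
import threading
from typing import Callable, List, Optional


class WorkerPool:
    """Fixed-size pool of threads that all consume one role's job queue."""

    _STOP = None  # sentinel: tells one worker thread to exit

    def __init__(self, role: str, handler: Callable[[str], None], size: int) -> None:
        self.role = role
        self._jobs: "queue.Queue[Optional[str]]" = queue.Queue()
        self._handler = handler
        self._threads = [
            threading.Thread(target=self._work, name=f"{role}-{i}") for i in range(size)
        ]
        for t in self._threads:
            t.start()

    def _work(self) -> None:
        while True:
            job = self._jobs.get()
            if job is self._STOP:
                return
            self._handler(job)

    def submit(self, job: str) -> None:
        self._jobs.put(job)

    def shutdown(self) -> None:
        """Queue one sentinel per thread after pending jobs, then wait for all to finish."""
        for _ in self._threads:
            self._jobs.put(self._STOP)
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    indexed: List[str] = []
    lock = threading.Lock()

    def index(doc: str) -> None:
        with lock:  # the handler runs on several threads at once
            indexed.append(doc)

    indexers = WorkerPool("indexer", index, size=4)
    for n in range(10):
        indexers.submit(f"doc-{n}")
    indexers.shutdown()
    print(len(indexed))  # 10
```

Because the queue is FIFO and the sentinels are queued last, every submitted job is handled before any worker exits. Separate pools per role let each one be sized to its own load, for example many indexers and few searchers for this customer.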
Ingest Pipeline
[Diagram: ingest pipeline. Components: Ingester, Enricher, Archiver, Indexer, Coordinator. Stores and queues: Disk (two), Ref. data, Archive, Solr, Ingest queue]
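The extracted diagram keeps only its labels, so the exact wiring is not recoverable. Purely as an assumption, this sketch reads it as Ingester, then ingest queue, then Enricher (using reference data), then both Indexer and Archiver. A deque, a dict of lists and a plain list stand in for the ingest queue, Solr and the archive; the `Document` fields and the entity-tagging rule are hypothetical, and the Coordinator is left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Iterable, List, Tuple


@dataclass
class Document:
    doc_id: int
    text: str
    entities: List[str] = field(default_factory=list)


def ingest(raw_docs: Iterable[Dict[str, str]], ingest_queue: Deque[Document]) -> None:
    """Ingester: parse raw records and put them on the ingest queue."""
    for raw in raw_docs:
        ingest_queue.append(Document(doc_id=int(raw["id"]), text=raw["text"]))


def enrich(doc: Document, reference_data: Dict[str, str]) -> Document:
    """Enricher: tag the document with entities found in the reference data."""
    doc.entities = [reference_data[tok] for tok in doc.text.split() if tok in reference_data]
    return doc


def run_pipeline(
    raw_docs: Iterable[Dict[str, str]],
    reference_data: Dict[str, str],
    num_shards: int = 20,
) -> Tuple[Dict[int, List[Document]], List[Document]]:
    ingest_queue: Deque[Document] = deque()
    ingest(raw_docs, ingest_queue)
    index: Dict[int, List[Document]] = {s: [] for s in range(num_shards)}  # stand-in for Solr
    archive: List[Document] = []  # stand-in for the on-disk archive
    while ingest_queue:
        doc = enrich(ingest_queue.popleft(), reference_data)
        index[doc.doc_id % num_shards].append(doc)  # Indexer
        archive.append(doc)                          # Archiver
    return index, archive


if __name__ == "__main__":
    raw = [{"id": "7", "text": "ACME results"}, {"id": "27", "text": "other news"}]
    index, archive = run_pipeline(raw, {"ACME": "Acme Corporation"})
    print([d.doc_id for d in index[7]], len(archive))  # [7, 27] 2
    print(index[7][0].entities)                        # ['Acme Corporation']
```

In the proposed design each stage would be its own worker pool fed by a queue rather than one loop; the single loop here only shows the order in which a document passes through the stages.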
ESB
• Orchestration, workflow and Enterprise Integration (EI) patterns by Apache ServiceMix
• Messaging by Apache ActiveMQ
• REST by Apache CXF
• Runtime container by Apache Karaf
• 100% open source software
Call to arms
• Designed to be more generic than the initial itch that needed scratching
• Team includes Solr/Lucene committers
• Happy to accept outside contributors
• May eventually become an Apache Incubator project
• Contact: Nigel Runnels-Moss
  • @sleepyfox on Twitter
  • [email protected]
Questions