box + solr = content search for business
TRANSCRIPT
![Page 1: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/1.jpg)
1
June 2014
Box + Solr = Content Search for Business
![Page 3: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/3.jpg)
3
to make organizations more productive,
competitive and collaborative by connecting
people and their most important information
Box mission
![Page 4: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/4.jpg)
4
25MM+ Users
225K+ Businesses
99% Fortune 500
![Page 5: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/5.jpg)
5
Box search mission is to make user content
easy to discover.
![Page 6: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/6.jpg)
6
10 Billion+ Documents
10TB+ Index size
100M+ Daily requests
Box uses Solr for search
![Page 7: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/7.jpg)
7
Quick Search
![Page 8: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/8.jpg)
8
Quick Search
![Page 9: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/9.jpg)
9
Full Search
![Page 10: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/10.jpg)
10
Agenda
1 Sharding – splitting the index
2 Highly available search
3 A few more things
4 Currently working on
5 Q&A
![Page 11: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/11.jpg)
11
We shard things
![Page 12: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/12.jpg)
12
Shard ID = File ID % Total Shards
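The formula on this slide can be written as a one-line function. This is a minimal sketch; the function name and the shard count of 8 are illustrative assumptions, not values from the talk.

```python
def shard_id(file_id: int, total_shards: int) -> int:
    """Return the shard index for a file: File ID % Total Shards."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# 12345 is the example File ID from the deck; 8 shards is a made-up count.
print(shard_id(12345, 8))  # 12345 = 8 * 1543 + 1, so this prints 1
```

Because the shard depends only on the file ID, any indexer or query node can compute where a document lives without a lookup table. The trade-off is that changing the total number of shards changes the mapping for most files.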
![Page 13: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/13.jpg)
13
Multi-tenant – One big logical index for all users
Solr index
Shard1 Shard2 Shard3 ShardN
![Page 14: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/14.jpg)
14
Search scope
![Page 15: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/15.jpg)
15
File ID: 12345
OwnerID: user1
Parent Folder IDs: folder1, folder2
File Name: Solr.ppt
File Content: blah......
A typical Solr Document
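As a rough illustration, the document on this slide could be represented as the following structure before it is sent to Solr. The field names (`file_id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) are assumptions made for this sketch; the slide only gives the human-readable labels.

```python
import json

# Hypothetical field names; values are taken from the slide.
solr_doc = {
    "file_id": 12345,
    "owner_id": "user1",
    "parent_folder_ids": ["folder1", "folder2"],  # multi-valued field
    "file_name": "Solr.ppt",
    "file_content": "blah......",
}

# Solr's JSON update format accepts a list of such documents.
payload = json.dumps([solr_doc])
print(payload)
```

Keeping the parent folder IDs as a multi-valued field is what would let a query restrict results to a folder without a join.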
![Page 16: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/16.jpg)
16
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 17: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/17.jpg)
17
User1 with no shared folder
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 18: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/18.jpg)
18
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 19: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/19.jpg)
19
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
![Page 20: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/20.jpg)
20
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
Removed out of Folder2
![Page 21: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/21.jpg)
21
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
Removed out of Folder2
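The search-scope slides suggest that a user can find a file if they own it or if one of its parent folders is shared with them. The sketch below models that reading. The labels `a` to `d`, the field names, and the helper functions are all hypothetical, and the deck does not say how Box actually builds its filter.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str      # illustrative label, not a name from the slides
    owner: str
    parents: set   # parent folder IDs indexed with the document


def visible(docs, user, shared_folders):
    """Names of docs the user owns or that sit in a folder shared with them."""
    return [d.name for d in docs
            if d.owner == user or d.parents & shared_folders]


def scope_filter(user, shared_folders):
    """Build a Solr-style filter string (field names are assumptions)."""
    clauses = [f"owner_id:{user}"]
    if shared_folders:
        folders = " OR ".join(sorted(shared_folders))
        clauses.append(f"parent_folder_ids:({folders})")
    return " OR ".join(clauses)


# The four owner/parent rows from the slides, before the file is moved.
docs = [
    Doc("a", "User1", {"Folder1"}),
    Doc("b", "User2", {"Folder3"}),
    Doc("c", "User2", {"Folder2"}),
    Doc("d", "User1", {"Folder1", "Folder4"}),
]

print(visible(docs, "User1", set()))        # no shared folder
print(visible(docs, "User1", {"Folder2"}))  # User2 shares Folder2
print(scope_filter("User1", {"Folder2"}))
```

Under this model, moving the third document from Folder2 to Folder5 only requires updating its parent folder list; it then drops out of User1's results without any change to the sharing data.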
![Page 22: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/22.jpg)
22
Highly Available Search
![Page 23: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/23.jpg)
23
• Index is highly available
• Search functionality is highly available
![Page 24: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/24.jpg)
24
Index workflow
![Page 25: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/25.jpg)
25
Box Front End
Upload
Index Queue
Queue 1
Queue 2
Queue 3
Indexer 1
Indexer 3
Indexer 2
MySQL
Index1
Index2
Index2
![Page 26: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/26.jpg)
26
Search workflow
![Page 27: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/27.jpg)
27
Box Front End
query HA Proxy
Head node
HA Proxy
1 2 3 N
Box Front End
query HA Proxy
Head node
HA Proxy
1 2 3 N
Data center boundary
![Page 28: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/28.jpg)
28
A few more things
![Page 29: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/29.jpg)
29
File Content Search
![Page 30: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/30.jpg)
30
Box Front End
Upload
MySQL
Box File Storage
Indexer
Solr Index
Text Extraction
Extracted Text
![Page 31: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/31.jpg)
31
Multi-language support
![Page 32: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/32.jpg)
32
Raw file content
Language detector
English tokenizer
Spanish tokenizer
Japanese tokenizer
German tokenizer
file_content_en
file_content_es {hola}
file_content_ja....
file_content_de
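The flow on this slide (detect the language, then index the text into a per-language field) can be sketched as follows. The detector here is a toy stand-in written for this example, with made-up keyword hints; the deck does not name the detector Box uses. Only the `file_content_<lang>` field naming comes from the slide.

```python
def detect_language(text: str) -> str:
    """Toy stand-in for a real language detector (an assumption)."""
    # Hiragana or Katakana characters are treated as Japanese.
    if any("\u3040" <= ch <= "\u30ff" for ch in text):
        return "ja"
    words = set(text.lower().split())
    hints = {
        "es": {"hola", "el", "que"},
        "de": {"der", "und", "nicht", "das"},
    }
    for lang, vocab in hints.items():
        if words & vocab:
            return lang
    return "en"  # fall back to English


def content_field(text: str) -> dict:
    """Route the text to the field analyzed by the matching tokenizer."""
    return {f"file_content_{detect_language(text)}": text}


print(content_field("hola"))
```

Each `file_content_<lang>` field would be configured in the Solr schema with its own tokenizer, so the routing decision is made once at index time.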
![Page 33: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/33.jpg)
33
To Dos
• Scale language support
• Support documents with mixed languages
![Page 34: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/34.jpg)
34
Search Warm-up
![Page 35: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/35.jpg)
35
• Front end informs backend to warm up on keyboard focus
• Backend prepares the search filter and caches it in a search session
• Backend sends a warm-up query to Solr
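The three steps above can be sketched as a small session object. Everything here is hypothetical: the class, the `build_filter` and `send_to_solr` callables, and the choice of a `*:*` query with `rows=0` as the warm-up request are assumptions for illustration, not details from the talk.

```python
class SearchSession:
    """Caches each user's search filter and warms Solr before the first query."""

    def __init__(self, build_filter, send_to_solr):
        self._build_filter = build_filter  # potentially expensive permission lookup
        self._send = send_to_solr
        self._filters = {}

    def _filter_for(self, user_id):
        # Build the filter once per session and reuse it afterwards.
        if user_id not in self._filters:
            self._filters[user_id] = self._build_filter(user_id)
        return self._filters[user_id]

    def warm_up(self, user_id):
        """Called when the front end reports keyboard focus on the search box."""
        self._send(q="*:*", fq=self._filter_for(user_id), rows=0)

    def search(self, user_id, text):
        return self._send(q=text, fq=self._filter_for(user_id), rows=10)


# Stub collaborators so the sketch runs without a Solr server.
built, sent = [], []

def build_filter(user_id):
    built.append(user_id)
    return f"owner_id:{user_id}"

def send_to_solr(**params):
    sent.append(params)
    return []

session = SearchSession(build_filter, send_to_solr)
session.warm_up("user1")         # triggered on focus
session.search("user1", "solr")  # the real query reuses the cached filter
print(built, len(sent))
```

The point of the design is that the filter is computed while the user is still typing, so the first real query pays neither the filter-building cost nor a cold Solr filter cache.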
![Page 36: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/36.jpg)
36
What we are working on
![Page 37: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/37.jpg)
37
• Search suggestions
• Search operators
• Use machine learning to influence ranking
• Logical sharding
Things we are working on
![Page 38: Box + Solr = Content Search for Business](https://reader034.vdocuments.us/reader034/viewer/2022051414/55a4e2de1a28abef648b4580/html5/thumbnails/38.jpg)
38
Questions?