Download - Realtime Search at Twitter - Michael Busch
![Page 1: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/1.jpg)
![Page 2: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/2.jpg)
230 MillionTweets per day
![Page 3: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/3.jpg)
2 BillionQueries per day
![Page 4: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/4.jpg)
< 10 sIndexing latency
![Page 5: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/5.jpg)
50 msAvg. query response time
![Page 7: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/7.jpg)
Agenda
‣ Introduction
- Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
Earlybird - Realtime Search @twitter
![Page 8: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/8.jpg)
Introduction
![Page 9: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/9.jpg)
Introduction
• Twitter acquired Summize in 2008
• 1st gen search engine based on MySQL
![Page 10: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/10.jpg)
Introduction
• Next gen search engine based on Lucene
• Improves scalability and performance by orders or magnitude
• Open Source
![Page 11: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/11.jpg)
Realtime Search @twitter
Agenda
- Introduction
‣ Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
![Page 12: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/12.jpg)
Search Architecture
![Page 13: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/13.jpg)
Search Architecture
• Ingester pre-processes Tweets for search
• Geo-coding, URL expansion, tokenization, etc.
IngesterTweets
![Page 14: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/14.jpg)
Search Architecture
• Tweets are serialized to MySQL in Thrift format
ThriftMySQLMaster MySQL
Slaves
IngesterTweets
![Page 15: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/15.jpg)
Earlybird
• Earlybird reads from MySQL slaves
• Builds an in-memory inverted index in real time
ThriftMySQLMaster MySQL
Slaves
IngesterTweets
EarlybirdIndex
![Page 16: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/16.jpg)
Blender
EarlybirdIndex
BlenderThrift
Thrift
• Blender is our Thrift service aggregator
• Queries multiple Earlybirds, merges results
![Page 17: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/17.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
‣ Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
![Page 18: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/18.jpg)
Inverted Index 101
![Page 19: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/19.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
Table with 6 documents
Example from:Justin Zobel , Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR)v.38 n.2, p.6-es, 2006
![Page 20: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/20.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
![Page 21: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/21.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
![Page 22: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/22.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
![Page 23: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/23.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
![Page 24: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/24.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
![Page 25: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/25.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
00000101VInt compression:
Values 0 <= delta <= 127 need one byte
![Page 26: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/26.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression:
Values 128 <= delta <= 16384 need two bytes
00011001
![Page 27: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/27.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression:
First bit indicates whether next byte belongs to the same value
00011001
![Page 28: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/28.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression: 00011001
• Variable number of bytes - a VInt-encoded posting can not be written as a primitive Java type; therefore it can not be written atomically
![Page 29: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/29.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
Read direction
• Each posting depends on previous one; decoding only possible in old-to-new direction
• With recency ranking (new-to-old) no early termination is possible
![Page 30: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/30.jpg)
Posting list encoding
• By default Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically?
![Page 31: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/31.jpg)
Posting list encoding in Earlybird
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
![Page 32: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/32.jpg)
Posting list encoding in Earlybird
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding:
Read direction
5 15 9000 9002 100000 100090
![Page 33: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/33.jpg)
Early query termination
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding:
Read direction
5 15 9000 9002 100000 100090
E.g. 3 result are requested: Here we can terminate after reading 3
postings
![Page 34: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/34.jpg)
Posting list encoding - Summary
• ints can be written atomically in Java
• Backwards traversal easy on absolute docIDs (not deltas)
• Every posting is a possible entry point for a searcher
• Skipping can be done without additional data structures as binary search, even though there are better approaches which should be explored
• On tweet indexes we need about 30% more storage for docIDs compared to delta+Vints; compensated by compression of complete segments
• Max. segment size: 2^24 = 16.7M tweets
![Page 35: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/35.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
- Inverted Index 101
‣ Memory Model & Concurrency
- Top Tweets
![Page 36: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/36.jpg)
Memory Model & Concurrency
![Page 37: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/37.jpg)
Inverted index components
Dictionary
Posting list storage
?
![Page 38: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/38.jpg)
Inverted index components
Dictionary
Posting list storage
?
![Page 39: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/39.jpg)
Inverted Index
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Per term we store different kinds of metadata: text pointer, frequency, postings pointer, etc.
![Page 40: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/40.jpg)
Term dictionary
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
0123456
term text pool
![Page 41: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/41.jpg)
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0 p0 f00123456
c a tt0
Term dictionary
![Page 42: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/42.jpg)
1
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
p0
p1
f0
f1
0123456
c a t f o ot0 t1
Term dictionary
![Page 43: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/43.jpg)
1
0
2
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
t2
p0
p1
p2
f0
f1
f2
0123456
c a t f o o b a rt0 t1 t2
Term dictionary
![Page 44: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/44.jpg)
Term dictionary
• Number of objects << number of terms
• O(1) lookups
• Easy to store more term metadata by adding additional parallel arrays
![Page 45: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/45.jpg)
Inverted index components
Parallel arraysDictionary
pointer to the most recently indexed posting for a term
Posting list storage
?
![Page 46: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/46.jpg)
Inverted index components
Parallel arraysDictionary
pointer to the most recently indexed posting for a term
Posting list storage
?
![Page 47: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/47.jpg)
• Store many single-linked lists of different lengths space-efficiently
• The number of java objects should be independent of the number of lists or number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order
Posting lists storage - Objectives
![Page 48: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/48.jpg)
Memory management
= 32K int[]
4 int[]pools
![Page 49: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/49.jpg)
Memory management
= 32K int[]
4 int[]pools
Each pool can be grown
individually by adding 32K
blocks
![Page 50: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/50.jpg)
Memory management
• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
4 int[]pools
![Page 51: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/51.jpg)
Memory management
• Slices can be allocated in each pool
• Each pool has a different, but fixed slice size
21
24
27
211
slice size
![Page 52: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/52.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
![Page 53: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/53.jpg)
Adding and appending to a list
21
24
27
211
slice size
Store first twopostings in this slice
available
allocated
current list
![Page 54: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/54.jpg)
Adding and appending to a list
21
24
27
211
slice size
When first slice is full, allocate another one in second pool
available
allocated
current list
![Page 55: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/55.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
Allocate a slice on each level as list grows
![Page 56: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/56.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
On upper most level one list can own multiple slices
![Page 57: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/57.jpg)
Posting list format
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
![Page 58: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/58.jpg)
Addressing items
• Use 32 bit (int) pointers to address any item in any list unambiguously:
int (32 bits)
poolIndex2 bits0-3
offset in slice1-11 bits
depends on pool
sliceIndex19-29 bits
depends on pool
• Nice symmetry: Postings and address pointers both fit into a 32 bit int
![Page 59: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/59.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
![Page 60: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/60.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
Parallel arraysDictionary
pointer to the last posting indexed for a term
![Page 61: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/61.jpg)
Concurrency - Definitions
• Pessimistic locking
• A thread holds an exclusive lock on a resource, while an action is performed [mutual exclusion]
• Usually used when conflicts are expected to be likely
• Optimistic locking
• Operations are tried to be performed atomically without holding a lock; conflicts can be detected; retry logic is often used in case of conflicts
• Usually used when conflicts are expected to be the exception
![Page 62: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/62.jpg)
Concurrency - Definitions
• Non-blocking algorithm
Ensures, that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
A non-blocking algorithm is wait-free, if there is guaranteed per-thread progress.
* Source: Wikipedia
![Page 63: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/63.jpg)
Concurrency
• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)
• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!
• In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication
• Safe publication can be achieved in different, subtle ways. Read the great book “Java concurrency in practice” by Brian Goetz for more information!
![Page 64: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/64.jpg)
Java Memory Model
• Program order rule
Each action in a thread happens-before every action in that thread that comes later in the program order.
• Volatile variable rule
A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity
If A happens-before B, and B happens-before C, then A happens-before C.
* Source: Brian Goetz: Java Concurrency in Practice
![Page 65: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/65.jpg)
Concurrency
0RAM
int x;
Cache
Thread 1 Thread 2
time
![Page 66: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/66.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread 1 Thread 2
x = 5;
Thread A writes x=5 to cache
time
![Page 67: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/67.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread 1 Thread 2
x = 5;
while(x != 5);time
This condition will likely never become false!
![Page 68: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/68.jpg)
Concurrency
0RAM
int x;
Cache
Thread 1 Thread 2
time
![Page 69: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/69.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
Thread A writes b=1 to RAM, because b is volatile
b = 1;
![Page 70: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/70.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
Read volatile b
b = 1;
int dummy = b;
while(x != 5);
![Page 71: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/71.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
happens-before
![Page 72: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/72.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
![Page 73: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/73.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
![Page 74: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/74.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
This condition will be false, i.e. x==5
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 75: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/75.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 76: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/76.jpg)
Demo
![Page 77: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/77.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 78: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/78.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
![Page 79: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/79.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
happens-before
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
![Page 80: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/80.jpg)
• Not a single exclusive lock
• Writer thread can always make progress
• Optimistic locking (retry-logic) in a few places for searcher thread
• Retry logic very simple and guaranteed to always make progress
Wait-free
![Page 81: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/81.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
‣ Top Tweets
![Page 82: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/82.jpg)
Top Tweets
![Page 83: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/83.jpg)
Signals
• Query-dependent
• E.g. Lucene text score, language
• Query-independent
• Static signals (e.g. text quality)
• Dynamic signals (e.g. retweets)
• Timeliness
![Page 84: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/84.jpg)
Signals
Many Earlybird segments (8M documents each)
Task: Find best tweet in billions of tweets efficiently
![Page 85: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/85.jpg)
Signals
Many Earlybird segments (8M documents each)
• For top tweets we can’t early terminate as efficiently
• Scoring and ranking billions of tweets is impractical
![Page 86: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/86.jpg)
Query cache
Many Earlybird segments (8M documents each)
Idea: Mark all tweets with high query-independent scores and only visit those for top tweets queries
Query results cache
High static score doc
![Page 87: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/87.jpg)
Query cache
Many Earlybird segments (8M documents each)
• A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache
• Rewriting queries
• User query: q = ‘lucene’
• Rewritten: q’ = ‘lucene AND cached_filter:toptweets’
• Efficient skipping over tweets with low query-independent scores
This clause will be executed as a Lucene ConstantScoreQuery that wraps a BitSet or SortedVIntList
![Page 88: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/88.jpg)
Query cache
Many Earlybird segments (8M documents each)
• A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache
• Configurable per cached query:
• Result set type: BitSet, SortedVIntList
• Execution schedule
• Filter mode: cache-only or hybrid
![Page 89: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/89.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Result set type: BitSet, SortedVIntList
• BitSet for results with many hits
• SortedVIntList for very sparse results
![Page 90: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/90.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Execution schedule
• Per segment: Sleep time between refreshing the cached results
![Page 91: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/91.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Filter mode: cache-only or hybrid
![Page 92: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/92.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
![Page 93: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/93.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Partially filled segment (IndexWriter is currently
appending)
![Page 94: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/94.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Query cache can only be computed until current
maxDocID
![Page 95: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/95.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Some time later: More docs in segment, but query cache was
not updated yet
![Page 96: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/96.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Cache-only mode: Ignore documents added to segment
since cache was updatedSearch range
![Page 97: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/97.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Hybrid mode: Fall back to original query and execute it on
latest documentsSearch range
![Page 98: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/98.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Use query cache up to the cache’s maxDocID
Search range
![Page 99: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/99.jpg)
Query result cache
Many Earlybird segments (8M documents each)
query cache config yml file
![Page 100: Realtime Search at Twitter - Michael Busch](https://reader034.vdocuments.us/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/100.jpg)
Questions?