AARHUSUNIVERSITY
PhD Defense
Jesper Sindahl Nielsen
Implicit Data Structures,Sorting, and Text Indexing
Overview
1/24
Jesper Sindahl Nielsen
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
1/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithm
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
x + yAddition
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
x=5y=7 x + y
Addition
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
12x=5y=7 x + y
Addition
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
12x=5y=7 x + y
Addition
Sort L
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
12x=5y=7 x + y
Addition
L=5,1,3,2,4,6 Sort L
Algorithms
2/24
Jesper Sindahl Nielsen
Algorithminput output
12x=5y=7 x + y
Addition
L=1,2,3,4,5,6L=5,1,3,2,4,6 Sort L
We want to design fast algorithms
Data Structures
3/24
Jesper Sindahl Nielsen
Data Structures
3/24
Jesper Sindahl Nielsen
Data
Data Structures
3/24
Jesper Sindahl Nielsen
DataData
Structure
Data Structures
3/24
Jesper Sindahl Nielsen
DataData
Structure
Data organized insome clever manner
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
input
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
Arnold
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
find phone number
Arnold
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
find phone number
Arnold 123456
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
find phone number
Mom 375874
Arnold 123456
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
DataStructure
Queryinput
answer to query on data
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123
find phone number
Mom 375874
Arnold 123456
efficient when data is organized
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Dad,357951
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Dad,357951
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Dad,357951
Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123
Change phone number
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Dad,357951
Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123
Change phone number
Data Structures
3/24
Jesper Sindahl Nielsen
Data BuildData
Structure
Data organized insome clever manner
Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123
create sorted phonebook
Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data
StructureUpdate
inputUpd. DataStructure
Dad,357951
Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123
Change phone number
We want space efficient structures,fast queries, and fast updates
Outline
4/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Outline
4/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
Data Structure has n comparable elements, laid out in an array of size n.
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
We work with the strict implicit model
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
We work with the strict implicit model
Fundamental trick: pair encoding bits.
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
We work with the strict implicit model
Fundamental trick: pair encoding bits. x yx<y ⇒1
x>y ⇒0
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
We work with the strict implicit model
Fundamental trick: pair encoding bits. x yx<y ⇒1
x>y ⇒0Requires distinct elements.
Implicit Data Structures
5/24
Jesper Sindahl Nielsen
Primary goal: space efficiency
0 1 2
Data Structure has n comparable elements, laid out in an array of size n.
n− 1
Only the array and n is maintained between operations.
Strict implicit model: Nothing but elements and n stored
Weak implicit model: Allow a constant number of extra memory cells
We work with the strict implicit model
Fundamental trick: pair encoding bits. x yx<y ⇒1
x>y ⇒0Requires distinct elements.
Procedures can use a constant number of working memory/registers
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
n=11
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
finger
n=11
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
finger
n=11
Find(m)
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
finger
n=11
Find(m)
Distance: t = 3
Dictionary and Fingersearch
6/24
Jesper Sindahl Nielsen
Insert(k)
Delete(k)
Find(k)
Predecessor(k)
Only in dynamic structures
Time: O(log n)
Successor(k)
Maintain a (dynamic) set of n keys, while supporting:
Special element: the finger element
a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10
finger
n=11
Find(m)
Distance: t = 3
Time: O(log t) forFind, Predecessor, and Successor
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
220
221
222
22`
· · · · · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
220
221
222
D0
22`
· · · · · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
220
221
222
D0 D1
22`
· · · · · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
22`
· · · · · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · · · · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · ·
· · · · · ·
· · ·
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · ·
· · · · · ·
· · ·
Operation Time
Insert
Delete
Find O(log t)
Change-Finger
O(logn)
O(logn)
O(nε)
Implicit Finger Search Idea
7/24
Jesper Sindahl Nielsen
· · ·
· · ·
220
221
222
D0 D1 D2 D`
f
22`
· · ·
Elements in sorted order
· · ·
· · · · · ·
· · ·
Operation Time
Insert
Delete
Find O(log t)
Change-Finger
O(logn)
O(logn)
O(nε)
optimal trade-off
Priority Queues
8/24
Jesper Sindahl Nielsen
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Carlsson et al.
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Carlsson et al.
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
log n1 log n no yesEdelkamp et al.
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Carlsson et al.
Harvey & Zatloukal
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
log n1 log n no yesEdelkamp et al.
? log n? 1 ? log n yes yes
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Franceschini & Munro
Carlsson et al.
Harvey & Zatloukal
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
log n1 log n no yesEdelkamp et al.
? log n? 1 ? log n yes yes
? log n? log n ? 1 yes no
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Franceschini & Munro
Carlsson et al.
Harvey & Zatloukal
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
log n1 log n no yesEdelkamp et al.
? log n? 1 ? log n yes yes
? log n? log n ? 1 yes no
? log n? 1 ? 1 yes yesNew
Priority Queues
8/24
Jesper Sindahl Nielsen
Maintain a set of n keys and support:
• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set
Williams
Franceschini & Munro
Carlsson et al.
Harvey & Zatloukal
Insert ExtractMin Moves StrictIdenticalElements
Amortized bounds are marked with ?
log nlog n log n yes yes
log n1 log n no yes
log n1 log n no yesEdelkamp et al.
? log n? 1 ? log n yes yes
? log n? log n ? 1 yes no
? log n? 1 ? 1 yes yes
log n1 log n yes no
New
New
Outline
9/24
Jesper Sindahl Nielsen
Outline
9/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Outline
9/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
A B
287315
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
A B
287315
Question: Can we sort faster ifwe have access to a digital scale?
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
A B
287315
Question: Can we sort faster ifwe have access to a digital scale?
Answer: Yes.
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
A B
287315
Question: Can we sort faster ifwe have access to a digital scale?
Answer: Yes.
Question: How much faster?
Integer Sorting
10/24
Jesper Sindahl Nielsen
Comparison based sorting: only allowed operation ≤
A B
conclude A > B
To sort n gold pieces we needΘ(n log n) scalings/comparisons
A B
287315
Question: Can we sort faster ifwe have access to a digital scale?
Answer: Yes.
Depends on accuracy of the scale
Question: How much faster?
Integer Sorting
11/24
Jesper Sindahl Nielsen
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
Kirkpatrick &Reisch (1984)
O(n log wlogn
) Only asymptotically better thanvEB when w = O(log n)
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
Kirkpatrick &Reisch (1984)
O(n log wlogn
) Only asymptotically better thanvEB when w = O(log n)
Signature sort(1995)
O(n) O(n) Only for w = Ω(log2+ε n)
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
Kirkpatrick &Reisch (1984)
O(n log wlogn
) Only asymptotically better thanvEB when w = O(log n)
Signature sort(1995)
O(n) O(n) Only for w = Ω(log2+ε n)
Han & Thorup(2002) O(n
√log(w/ log n)) Randomized with mulitplicationO(n)
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
Kirkpatrick &Reisch (1984)
O(n log wlogn
) Only asymptotically better thanvEB when w = O(log n)
Signature sort(1995)
O(n) O(n) Only for w = Ω(log2+ε n)
Han & Thorup(2002) O(n
√log(w/ log n)) Randomized with mulitplication
Thorup (2002) O(n log log n) O(n) Randomized, AC0
O(n)
w is the number of bits per word.
Integer Sorting
11/24
Jesper Sindahl Nielsen
Time Space NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) O(n) optimal for w = O(log n)
vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie
Kirkpatrick &Reisch (1984)
O(n log wlogn
) Only asymptotically better thanvEB when w = O(log n)
Signature sort(1995)
O(n) O(n) Only for w = Ω(log2+ε n)
Han & Thorup(2002) O(n
√log(w/ log n)) Randomized with mulitplication
Thorup (2002) O(n log log n) O(n) Randomized, AC0
O(n)
Han (2004) O(n log log n) O(n) Deterministic, with multiplication
w is the number of bits per word.
Integer Sorting
12/24
Jesper Sindahl Nielsen
Time NotesAlgorithm
Radix sort (1887?) O(n · wlogn
) Optimal for w = O(log n)
Signature sort (1995) O(n) Only for w = Ω(log2+ε n)
Han & Thorup (2002) O(n√log(w/ log n)
)Randomized with mulitplication
Time per element
w
log n
O(1)
O(√log log n)
log2 n log log n log2+ε n
NEW1
2
3
1
2
3
Overview
13/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
13/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Text Indexing
14/24
Jesper Sindahl Nielsen
Text Indexing
14/24
Jesper Sindahl Nielsen
Most of us daily use services like Google, Bing, etc.
Text Indexing
14/24
Jesper Sindahl Nielsen
Most of us daily use services like Google, Bing, etc.
Text Indexing
14/24
Jesper Sindahl Nielsen
Most of us daily use services like Google, Bing, etc.
We want documents that contain both terms
Text Indexing
14/24
Jesper Sindahl Nielsen
Most of us daily use services like Google, Bing, etc.
We want documents that contain both terms
Text Indexing
14/24
Jesper Sindahl Nielsen
Most of us daily use services like Google, Bing, etc.
We want documents that contain both terms
Find all emails sent to madalgo that are not seminars
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
when given two text strings P1, P2 (online)
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
when given two text strings P1, P2 (online)
we can report all di where both P1 and P2 occur
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
when given two text strings P1, P2 (online)
we can report all di where both P1 and P2 occur(The two-pattern problem)
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
when given two text strings P1, P2 (online)
we can report all di where both P1 and P2 occur
we can report all di where both P1 occurs but P2
does not occur
(The two-pattern problem)
Text Indexing - Two Patterns
15/24
Jesper Sindahl Nielsen
Problem definition:
Given documents D = d1, d2, . . . , dD
preprocess and store D such that
when given two text strings P1, P2 (online)
we can report all di where both P1 and P2 occur
we can report all di where both P1 occurs but P2
does not occur
(The two-pattern problem)
(The forbidden pattern problem)
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t Cohen and Porat ’10
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t
Two-Pattern n√nt log n log n + t
Cohen and Porat ’10
Hon et al. ’10
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t
Two-Pattern n√nt log n log n + t
Cohen and Porat ’10
Hon et al. ’10
Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t
Two-Pattern n√nt log n log n + t
Cohen and Porat ’10
Hon et al. ’10
Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12
Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t
Two-Pattern n√nt log n log n + t
Cohen and Porat ’10
Hon et al. ’10
Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12
Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t
Both 2P and FP n√nt log n logε n + t New
Text Indexing - Two Patterns
16/24
Jesper Sindahl Nielsen
Space Query Time AuthorsProblem
Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03
Two-Pattern n logn√nt log n log2 n + t
Two-Pattern n√nt log n log n + t
Cohen and Porat ’10
Hon et al. ’10
Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12
Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t
Both 2P and FP n√nt log n logε n + t New
Is√n query time really necessary?
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
P1P2
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
A
P1P2
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
A B
P1P2
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
A B
Report: A B∩
P1P2
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
A B
Report: A B∩
P1P2
= 4
Text Indexing - Two Patterns
17/24
Jesper Sindahl Nielsen
Build suffix tree on concatenation of the documents
S = d1$d2$ · · · $dD$
Annotate each leaf with thedocument id the suffx starts in
4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and
A B
Report: A B∩
P1P2
= 4set intersection is usually hard
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1)) Kopelowitz et al (3SUM)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))
Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)
Kopelowitz et al (3SUM)
Kopelowitz et al (3SUM)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))
Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)
Kopelowitz et al (3SUM)
Kopelowitz et al (3SUM)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)
Both 2P and FP Pointer Machine Model (new)
P (n) + nQ(n) = Ω(nω)
S(n)Q(n) = Ω(n2−o(1))
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))
Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)
Kopelowitz et al (3SUM)
Kopelowitz et al (3SUM)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)
Both 2P and FP Pointer Machine Model (new)
Semi group model (new)
P (n) + nQ(n) = Ω(nω)
S(n)Q(n) = Ω(n2−o(1))
S(n)Q(n)2 = Ω(n2−o(1))Both 2P and FP
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))
Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)
Kopelowitz et al (3SUM)
Kopelowitz et al (3SUM)
Text Indexing - Two Patterns
18/24
Jesper Sindahl Nielsen
Lower Bound NotesProblem
Both 2P and FP ω is matrix multiplication constant (new)
Both 2P and FP Pointer Machine Model (new)
Semi group model (new)
P (n) + nQ(n) = Ω(nω)
S(n)Q(n) = Ω(n2−o(1))
S(n)Q(n)2 = Ω(n2−o(1))Both 2P and FP
Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))
Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)
Kopelowitz et al (3SUM)
Kopelowitz et al (3SUM)
Q(n) = (nt)12−α ⇒
Both 2P and FPS(n) = Ω(n
1+6α1+2α
−o(1))
Pointer Machine Modelt is the output size (new)
Text Indexing - How to prove the lower bounds?
19/24
Jesper Sindahl Nielsen
We use two different techniques:
1) Reduction from Matrix Multiplication
2) Find a hard input for a Pointer Machine Data Structure
1) Given two matrices, produce a set of documentsand queries such that the product of the twomatrices can be found
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
123
4 5 6 Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6#
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
× =T F TT F FT T T
F F TF T TT F T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
P1 = 1 P2 = 4Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
P1 = 1 P2 = 4Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
1 5P1 = 1 P2 =Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
1 5P1 = 1 P2 =Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
1 5P1 = 1 P2 =Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F TF F TT
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F TF F TT
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
3 5P1 = P2 =Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F TF F TT T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
3 5P1 = P2 =Jesper Sindahl Nielsen
Query:
× =T F TT F FT T T
F F TF T TT F T
T F TF F TT T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
3 6P1 = P2 =Query:
× =T F TT F FT T T
F F TF T TT F T
T F TF F TT T T
A B C
dA1 = 1#2#3#
dA2 = 3#
dA3 = 1#3#
dB1 = 6#
dB2 = 5#6#
dB3 = 4#6#
d1 = 1#2#3#6#
d2 = 3#5#6#
d3 = 1#3#4#6# D
123
Ci,j = T ⇔ ∃` s.t.
Ai,` = B`,j = T
i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T
Text Indexing - How to prove the lower bounds?
20/24
Jesper Sindahl Nielsen
3 6P1 = P2 =Query:
Overview
21/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
Overview
21/24
Jesper Sindahl Nielsen
1. Algorithms and Data Structures(a) Abstract description(b) Example
2. Implicit Data Structures(a) Finger Search(b) Priority Queues
3. Integer Sorting4. Text Indexing5. State and University Library
State and University Library
22/24
Jesper Sindahl Nielsen
State and University Library
22/24
Jesper Sindahl Nielsen
State and University Library
22/24
Jesper Sindahl Nielsen
State and University Library
22/24
Jesper Sindahl Nielsen
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
State and University Library
22/24
Jesper Sindahl Nielsen
”Old Days”
LOTS of data
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
Annoying for customers
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
Annoying for customers
Task: find overlap and eliminate it
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
Annoying for customers
Task: find overlap and eliminate it
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
Annoying for customers
Task: find overlap and eliminate it
A nuisance in a collection
23/24
Jesper Sindahl Nielsen
Danish Radio collection 2 hour chunks
Before one runs out another is started⇒ overlap
Annoying for customers
Task: find overlap and eliminate it
Use Cross Correlation.
24/24
Thank You
Jesper Sindahl Nielsen