Post on 12-Jan-2016
Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo
S. Srinivasa Rao, IT University of Copenhagen
Background: Succinct Data Structures
What are succinct data structures? (Jacobson 1989): representations that use space close to the information-theoretic minimum while still supporting efficient operations
Why succinct data structures? Large data sets in modern applications: textual, genomic, spatial or geometric
An implementation: Delpratt et al. 2006
Succinct integrated encodings: main data and auxiliary data structures
Our Problem: Succinct Indexes
Use of the concept in previous work:
Compact PAT trees: Clark & Munro 1996
Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
Upper bounds: Sadakane & Grossi 2006
Definition of succinct indexes in data structure design:
ADT: primitive operators for accessing the main data
Succinct index: auxiliary data that supports more powerful operators, accessing the main data only through the ADT
[Figure: Succinct integrated encodings (main data and auxiliary data structures encoded together, supporting navigational operations) vs. succinct indexes (main data plus a separate succinct index, supporting navigational operations)]
Succinct Indexes vs. Integrated Encodings
Maximizing the freedom of the encoding of the main data
Allowing incremental design
Supporting implicit data
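To illustrate the separation, including the implicit-data case, here is a minimal Python sketch. It assumes a bit vector that can be reached only through a hypothetical `access(i)` callback (1-indexed), which plays the role of the ADT. The index never copies the bits. It stores one cumulative count per block, so the main data can even be computed on demand. The class name `RankIndex` and the block size are illustrative. Storing the counts as Python integers does not meet the o(n)-bit budget of a real succinct index.

```python
from typing import Callable, List


class RankIndex:
    """Rank index over a bit vector B[1..n] seen only through access(i).

    Hypothetical sketch: a real succinct index would pack the samples into
    o(n) bits; here they are plain Python ints for readability.
    """

    def __init__(self, n: int, access: Callable[[int], int], block: int = 8) -> None:
        self.n = n
        self.access = access  # the ADT: the only way to read the main data
        self.block = block
        # samples[k] = number of 1s in B[1 .. k * block]
        self.samples: List[int] = [0]
        ones = 0
        for i in range(1, n + 1):
            ones += access(i)
            if i % block == 0:
                self.samples.append(ones)

    def rank1(self, x: int) -> int:
        """Number of 1s in B[1..x], for 0 <= x <= n."""
        k = x // self.block
        count = self.samples[k]
        # Finish with fewer than `block` calls to the ADT.
        for i in range(k * self.block + 1, x + 1):
            count += self.access(i)
        return count


# Implicit main data: B[i] = 1 exactly when i is a multiple of 3; nothing is stored.
idx = RankIndex(20, lambda i: 1 if i % 3 == 0 else 0, block=4)
print(idx.rank1(20))  # 6
```

The same index works unchanged whether `access` reads a packed array, a compressed encoding, or a formula. That is the design freedom the slide refers to.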
Strings: Definitions
Notation:
Alphabet: [σ] = {1, 2, …, σ}
String: S[1..n]
Operations:
string_access(x): S[x]
string_rank(α, x): number of occurrences of α in S[1..x]
string_select(α, r): position of the rth occurrence of α in S
Strings: An Example
S = a a b a c c c d a d d a b b b c
string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
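A plain, non-succinct Python reference implementation reproduces the example (1-indexed positions, as on the slide). It scans the string directly, so each call takes linear time. Its only purpose is to pin down the semantics. Returning `None` when there is no rth occurrence is an assumption of this sketch.

```python
from typing import Optional


def string_access(S: str, x: int) -> str:
    """S[x], 1-indexed."""
    return S[x - 1]


def string_rank(S: str, alpha: str, x: int) -> int:
    """Number of occurrences of alpha in S[1..x]."""
    return S[:x].count(alpha)


def string_select(S: str, alpha: str, r: int) -> Optional[int]:
    """Position of the r-th occurrence of alpha in S, or None if there is none."""
    seen = 0
    for pos, c in enumerate(S, start=1):
        if c == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


S = "aabacccdaddabbbc"  # the example string, without spaces
print(string_access(S, 8), string_rank(S, "a", 8), string_select(S, "b", 3))  # d 3 14
```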
Strings: Previous Results
Succinct integrated encodings:
Wavelet trees (Grossi et al. 2003): space nH0 + o(n)∙lg σ bits; O(lg σ) time for all three operations
Golynski et al. 2006: space n(lg σ + o(lg σ)) bits; O(lglg σ) time for string_access and string_rank, O(1) time for string_select
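The following minimal Python sketch gives intuition for the O(lg σ) bounds of wavelet trees. It assumes an integer alphabet [lo, hi]. Each node splits its alphabet range in half and stores one bit per symbol (0 = left half, 1 = right half). Each operation walks one root-to-leaf path, so it makes O(lg σ) bit-vector rank/select steps. The bit vectors are plain lists with prefix counts rather than compressed, succinct structures. The sketch therefore shows the shape of the algorithm, not the nH0 + o(n)∙lg σ space bound.

```python
from typing import List, Optional


class WaveletTree:
    """Pointer-based wavelet tree over symbols in [lo, hi]; positions are 1-indexed."""

    def __init__(self, seq: List[int], lo: int, hi: int) -> None:
        self.lo, self.hi, self.n = lo, hi, len(seq)
        if lo == hi:  # leaf: every symbol here equals lo
            return
        mid = (lo + hi) // 2
        self.bits = [0 if c <= mid else 1 for c in seq]
        self.ones = [0]  # ones[i] = number of 1s in bits[0:i]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def _select_bit(self, b: int, r: int) -> int:
        """Position of the r-th bit equal to b (binary search on prefix counts)."""
        lo, hi = 1, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            count = self.ones[mid] if b == 1 else mid - self.ones[mid]
            if count >= r:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def access(self, x: int) -> int:
        node = self
        while node.lo != node.hi:
            if node.bits[x - 1] == 0:
                x -= node.ones[x]  # now x = number of 0s in bits[0:x]
                node = node.left
            else:
                x = node.ones[x]   # number of 1s in bits[0:x]
                node = node.right
        return node.lo

    def rank(self, alpha: int, x: int) -> int:
        node = self
        while node.lo != node.hi:
            if alpha <= (node.lo + node.hi) // 2:
                x -= node.ones[x]
                node = node.left
            else:
                x = node.ones[x]
                node = node.right
        return x

    def select(self, alpha: int, r: int) -> Optional[int]:
        path = []
        node = self
        while node.lo != node.hi:
            right = alpha > (node.lo + node.hi) // 2
            path.append((node, right))
            node = node.right if right else node.left
        if not 1 <= r <= node.n:
            return None
        for parent, right in reversed(path):  # map the position back up to the root
            r = parent._select_bit(1 if right else 0, r)
        return r


S = "aabacccdaddabbbc"
wt = WaveletTree([ord(c) - ord("a") + 1 for c in S], 1, 4)  # a..d -> 1..4
print(wt.access(8), wt.rank(1, 8), wt.select(2, 3))  # 4 3 14
```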
Strings: Our Results
Succinct indexes
ADT: string_access in f(n, σ) time
Space: n∙o(lg σ) bits
Operations:
string_rank: O(lglg σ lglglg σ (f(n, σ) + lglg σ))
string_select: O(lglglg σ (f(n, σ) + lglg σ))
Other operations: negations
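The key property of the ADT row is that the index reads S only through string_access. The Python sketch below shows that interface using a deliberately naive sampling scheme, much simpler than whatever yields the bounds above. Every `block` positions it stores the count of each symbol so far. It answers string_rank with one sample plus fewer than `block` calls to string_access. It answers string_select with a binary search over the samples followed by a bounded scan. Its samples take Θ(σ∙n/block) words, far more than n∙o(lg σ) bits. The class name, the `block` parameter and the integer alphabet 1..σ are assumptions of the sketch.

```python
from typing import Callable, List, Optional


class SampledStringIndex:
    """Rank/select over S[1..n] (symbols 1..sigma), read only via string_access."""

    def __init__(self, n: int, sigma: int, string_access: Callable[[int], int],
                 block: int = 4) -> None:
        self.n, self.block = n, block
        self.access = string_access
        counts = [0] * (sigma + 1)  # counts[alpha] for alpha in 1..sigma
        # samples[k][alpha] = occurrences of alpha in S[1 .. k * block]
        self.samples: List[List[int]] = [counts[:]]
        for x in range(1, n + 1):
            counts[string_access(x)] += 1
            if x % block == 0:
                self.samples.append(counts[:])

    def string_rank(self, alpha: int, x: int) -> int:
        k = x // self.block
        r = self.samples[k][alpha]
        for i in range(k * self.block + 1, x + 1):  # fewer than `block` accesses
            if self.access(i) == alpha:
                r += 1
        return r

    def string_select(self, alpha: int, r: int) -> Optional[int]:
        if r < 1:
            return None
        # Largest k with samples[k][alpha] < r (samples[0][alpha] == 0 < r).
        lo, hi = 0, len(self.samples) - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.samples[mid][alpha] < r:
                lo = mid
            else:
                hi = mid - 1
        count = self.samples[lo][alpha]
        # The r-th occurrence, if any, lies within the next `block` positions.
        for i in range(lo * self.block + 1, min(self.n, (lo + 1) * self.block) + 1):
            if self.access(i) == alpha:
                count += 1
                if count == r:
                    return i
        return None


S = "aabacccdaddabbbc"
index = SampledStringIndex(len(S), 4, lambda x: ord(S[x - 1]) - ord("a") + 1)
print(index.string_rank(1, 8), index.string_select(2, 3))  # 3 14
```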
Binary Relations: Definitions
Notation:
Binary relation: R ⊆ [n] × [σ]
Number of objects: n; number of labels: σ; number of object-label pairs: t
Operations:
object_access(x, r): rth label associated with x
label_access(x, α): whether x is associated with α
label_rank(α, x): number of objects labeled α up to object x
label_select(α, r): rth object labeled α
Binary Relations: An Example
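(rows: labels 1..σ, σ = 4; columns: objects 1..n, n = 5)
           1 2 3 4 5
  label 1: 0 1 0 1 0
  label 2: 0 0 0 1 0
  label 3: 1 0 1 1 0
  label 4: 1 1 0 0 1
object_access(1, 2) = 4
label_access(2, 3) = false
label_rank(3, 4) = 3
label_select(4, 3) = 5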
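A plain, non-succinct Python reference for the four operations, loaded with the example relation, lets you check the answers. It reads the matrix with rows as labels 1..4 and columns as objects 1..5. All four answers on the slide hold under that reading. Returning `None` for a missing rth label or object is an assumption of this sketch.

```python
from bisect import bisect_right
from typing import List, Optional, Set, Tuple


class BinaryRelation:
    """Reference implementation: objects 1..n, labels 1..sigma."""

    def __init__(self, n: int, sigma: int, pairs: Set[Tuple[int, int]]) -> None:
        # labels_of[x]: sorted labels of object x; objects_of[a]: sorted objects labeled a
        self.labels_of = {x: sorted(a for (y, a) in pairs if y == x) for x in range(1, n + 1)}
        self.objects_of = {a: sorted(y for (y, b) in pairs if b == a) for a in range(1, sigma + 1)}

    def object_access(self, x: int, r: int) -> Optional[int]:
        labels = self.labels_of[x]
        return labels[r - 1] if 1 <= r <= len(labels) else None

    def label_access(self, x: int, alpha: int) -> bool:
        return alpha in self.labels_of[x]

    def label_rank(self, alpha: int, x: int) -> int:
        return bisect_right(self.objects_of[alpha], x)  # objects <= x labeled alpha

    def label_select(self, alpha: int, r: int) -> Optional[int]:
        objects = self.objects_of[alpha]
        return objects[r - 1] if 1 <= r <= len(objects) else None


# Row a = label a, column x = object x (the example above).
MATRIX: List[List[int]] = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
]
pairs = {(x, a) for a, row in enumerate(MATRIX, 1) for x, bit in enumerate(row, 1) if bit}
R = BinaryRelation(5, 4, pairs)
print(R.object_access(1, 2), R.label_access(2, 3), R.label_rank(3, 4), R.label_select(4, 3))
# 4 False 3 5
```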
Binary Relations: Previous Results
Succinct integrated encodings: Barbay et al. 2006
Space: t(lg σ + o(lg σ)) bits
Time: O(lglg σ) time for object_access, label_rank and label_access; O(1) time for label_select
Binary Relations: Our Results
Succinct indexes
ADT: object_access in f(n, σ, t) time
Space: t∙o(lg σ) bits
Time:
label_rank and label_access: O(lglg σ lglglg σ (f(n, σ, t) + lglg σ))
label_select: O(lglglg σ (f(n, σ, t) + lglg σ))
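One standard way to relate binary relations to strings is to concatenate the labels of objects 1..n into a string of length t and record where each object's labels end. The four operations then become string operations plus a search over those object boundaries. The Python sketch below shows only this reduction, assuming each object's labels are listed in sorted order. The string operations are naive scans, and the boundaries are a plain prefix-count list. A succinct version would replace them with a string index and a succinct bit vector. The pairs are the example relation above, written as (object, label).

```python
from bisect import bisect_left
from typing import List, Optional, Set, Tuple


class BinaryRelationViaString:
    """Binary relation on objects 1..n, stored as a string of labels."""

    def __init__(self, n: int, pairs: Set[Tuple[int, int]]) -> None:
        self.S: List[int] = []  # labels of object 1, then object 2, ... (length t)
        self.end = [0]          # end[x] = |S| after object x's labels (end[0] = 0)
        for x in range(1, n + 1):
            self.S.extend(sorted(a for (y, a) in pairs if y == x))
            self.end.append(len(self.S))

    # Naive string operations on S (1-indexed); a real design would use an index.
    def _rank(self, alpha: int, i: int) -> int:
        return self.S[:i].count(alpha)

    def _select(self, alpha: int, r: int) -> Optional[int]:
        hits = [i for i, c in enumerate(self.S, 1) if c == alpha]
        return hits[r - 1] if 1 <= r <= len(hits) else None

    def object_access(self, x: int, r: int) -> Optional[int]:
        if 1 <= r <= self.end[x] - self.end[x - 1]:
            return self.S[self.end[x - 1] + r - 1]  # string_access(end[x-1] + r)
        return None

    def label_rank(self, alpha: int, x: int) -> int:
        return self._rank(alpha, self.end[x])

    def label_access(self, x: int, alpha: int) -> bool:
        return self._rank(alpha, self.end[x]) > self._rank(alpha, self.end[x - 1])

    def label_select(self, alpha: int, r: int) -> Optional[int]:
        p = self._select(alpha, r)
        # The object owning position p is the smallest x with end[x] >= p.
        return None if p is None else bisect_left(self.end, p)


pairs = {(1, 3), (1, 4), (2, 1), (2, 4), (3, 3), (4, 1), (4, 2), (4, 3), (5, 4)}
R = BinaryRelationViaString(5, pairs)
print(R.S)  # [3, 4, 1, 4, 3, 1, 2, 3, 4]
print(R.object_access(1, 2), R.label_access(2, 3), R.label_rank(3, 4), R.label_select(4, 3))
# 4 False 3 5
```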
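Multi-labeled Trees: Definitions
Notation:
Number of nodes: n; number of labels: σ; number of node-label pairs: t
Operations:
α-descendant of x: a descendant of x associated with label α
α-child of x: a child of x associated with label α
α-ancestor of x: an ancestor of x associated with label α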
Multi-labeled Trees: An Example
[Figure: an example multi-labeled tree with 11 nodes, numbered 1 to 11, each associated with a set of labels from {a, b, c, d}]
Node 2 is a c-ancestor of node 6
Node 6 is a b-descendant of node 2
Node 10 is a d-child of node 8
Multi-labeled Trees: Previous Results
Labeled trees: Geary et al. 2004; Ferragina et al. 2005; Barbay et al. 2006
Multi-labeled trees: Barbay et al. 2006
Multi-labeled Trees: Our Approach
Traversal orders: preorder and DFUDS order
Ordinal trees: DFUDS (Benoit et al. 1999 & 2005; Jansson et al. 2007)
Two binary relations: nodes in preorder & labels; nodes in DFUDS order & labels
[Figure: the example tree with its nodes numbered in preorder and in DFUDS order]
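The two orders turn both queries into range queries on a binary relation. In preorder, the descendants of x occupy the contiguous range right after x. In DFUDS order, taken here to mean the root first and then, for each node in preorder, its children listed consecutively, the children of x are contiguous. The Python sketch below uses a small hypothetical tree, not the one from the example slide. It finds the first α-descendant and the first α-child of x with one label_rank and one label_select each, simulated with sorted lists and `bisect`. It illustrates this reduction only and does not model the succinct encodings or α-ancestor queries.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Set


def next_in_range(objs: List[int], lo: int, hi: int) -> Optional[int]:
    """label_select(a, label_rank(a, lo - 1) + 1) on sorted objs, kept if <= hi."""
    r = bisect_right(objs, lo - 1)  # label_rank(a, lo - 1)
    return objs[r] if r < len(objs) and objs[r] <= hi else None


class MultiLabeledTree:
    def __init__(self, children: Dict[int, List[int]], root: int,
                 labels: Dict[int, Set[str]]) -> None:
        self.children = children
        self.pre: Dict[int, int] = {}           # node -> preorder rank
        self.dfuds: Dict[int, int] = {root: 1}  # node -> DFUDS-order rank
        order: List[int] = []
        stack = [root]
        while stack:                            # iterative preorder traversal
            v = stack.pop()
            order.append(v)
            self.pre[v] = len(order)
            for c in children.get(v, []):       # v's children get consecutive ranks
                self.dfuds[c] = len(self.dfuds) + 1
            stack.extend(reversed(children.get(v, [])))
        self.size: Dict[int, int] = {}
        for v in reversed(order):               # subtree sizes, bottom-up
            self.size[v] = 1 + sum(self.size[c] for c in children.get(v, []))
        self.node_at_pre = {r: v for v, r in self.pre.items()}
        self.node_at_dfuds = {r: v for v, r in self.dfuds.items()}
        # The two binary relations: (preorder rank, label) and (DFUDS rank, label).
        self.by_pre: Dict[str, List[int]] = {}
        self.by_dfuds: Dict[str, List[int]] = {}
        for v, labs in labels.items():
            for a in labs:
                self.by_pre.setdefault(a, []).append(self.pre[v])
                self.by_dfuds.setdefault(a, []).append(self.dfuds[v])
        for table in (self.by_pre, self.by_dfuds):
            for lst in table.values():
                lst.sort()

    def alpha_descendant(self, x: int, a: str) -> Optional[int]:
        """First proper descendant of x, in preorder, labeled a."""
        lo, hi = self.pre[x] + 1, self.pre[x] + self.size[x] - 1
        p = next_in_range(self.by_pre.get(a, []), lo, hi)
        return None if p is None else self.node_at_pre[p]

    def alpha_child(self, x: int, a: str) -> Optional[int]:
        """First child of x labeled a."""
        kids = self.children.get(x, [])
        if not kids:
            return None
        lo = self.dfuds[kids[0]]
        p = next_in_range(self.by_dfuds.get(a, []), lo, lo + len(kids) - 1)
        return None if p is None else self.node_at_dfuds[p]


tree = MultiLabeledTree({1: [2, 5], 2: [3, 4], 5: [6]}, 1,
                        {1: {"a"}, 2: {"c"}, 3: {"b"}, 4: {"a", "b"}, 5: {"b", "d"}, 6: {"d"}})
print(tree.alpha_descendant(1, "b"), tree.alpha_child(1, "b"), tree.alpha_descendant(2, "d"))
# 3 5 None
```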
Multi-labeled Trees: Our Results
Succinct indexes
ADT: node_label(x, r), the rth label associated with node x
Supporting α-child/descendant queries: t∙o(lg σ) bits
Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
Supporting α-child/descendant/ancestor queries of node x after another node y
Applications
Compressed succinct encodings of strings:
Space: nHk + o(n lg σ) bits
Operations: string_access: O(1); string_rank: O((lglg σ)² lglglg σ); string_select: O(lglg σ lglglg σ)
First high-order entropy-compressed encoding supporting rank/select efficiently
Other Data Structures
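The time bounds in this application follow from the general string result by substituting the access time. Assume the compressed encoding of S (nHk + o(n lg σ) bits; the slide does not name it here) supports string_access in f(n, σ) = O(1) time. Adding the n∙o(lg σ)-bit index, which is absorbed into o(n lg σ), then gives:

```latex
\begin{aligned}
\text{string\_rank}&:\; O\bigl(\lg\lg\sigma\,\lg\lg\lg\sigma\,(O(1)+\lg\lg\sigma)\bigr) = O\bigl((\lg\lg\sigma)^2\,\lg\lg\lg\sigma\bigr)\\
\text{string\_select}&:\; O\bigl(\lg\lg\lg\sigma\,(O(1)+\lg\lg\sigma)\bigr) = O(\lg\lg\sigma\,\lg\lg\lg\sigma)\\
\text{space}&:\; nH_k + o(n\lg\sigma) + n\cdot o(\lg\sigma) = nH_k + o(n\lg\sigma)\ \text{bits}
\end{aligned}
```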
Applications (Continued)
High-order entropy-compressed text indexes for large alphabets
Notation: n: text size; σ: alphabet size; m: pattern length; occ: number of occurrences
Our results: space nHk + o(n lg σ) bits; pattern searching in O(m lglg σ + occ lg^{1+ε} n lglg σ) time
Previous results: either a lg σ factor instead of lglg σ, or space that is not entropy-compressed
Conclusions
We showed the importance of succinct indexes in the design of succinct data structures by designing:
A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
The first high-order entropy-compressed representation of strings supporting rank/select
High-order entropy-compressed text indexes for large alphabets
Conclusions (Continued)
The concept of succinct indexes is useful in designing succinct data structures … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.
Thank you!