Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees. Jérémy Barbay, Meng He, J. Ian Munro (University of Waterloo); S. Srinivasa Rao (IT University of Copenhagen)


Post on 12-Jan-2016



Page 1: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees

Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo

S. Srinivasa Rao, IT University of Copenhagen

Page 2: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Background: Succinct Data Structures

- What are succinct data structures?
  - Jacobson 1989
- Why succinct data structures?
  - Large data sets in modern applications: textual, genomic, spatial or geometric
  - An implementation: Delpratt et al. 2006
- Succinct integrated encodings
  - Main data and auxiliary data structures

Page 3: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Our Problem: Succinct Indexes

- Use of the concept in previous work
  - Compact PAT trees: Clark & Munro 1996
  - Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
  - Upper bounds: Sadakane & Grossi 2006
- Definition of succinct indexes in data structure design
  - ADT: primitive access operators
  - Succinct index: more powerful operators

Page 4: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Succinct Integrated Encodings

[Diagram: boxes labeled "Main Data" and "Auxiliary Data Structures", joined by "+", with "Navigational Operations"]

Page 5: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Succinct Indexes

[Diagram: boxes labeled "Main Data" and "Succinct Index", joined by "+", with "Navigational Operations"]

Page 6: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Succinct Indexes vs. Integrated Encodings

Maximizing the freedom of the encoding of the main data

Allowing incremental design

Supporting implicit data
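The separation between main data and index can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `SampledRankIndex`, its `block` parameter, and the sampling scheme are hypothetical and are not the construction from the talk, and the sampled counts do not meet the n∙o(lg σ) space bound. What the sketch does show is that the index reads the main data only through an access operator, so the same index code works whether the string is stored explicitly or computed on the fly (implicit data).

```python
from typing import Callable, Dict, List


class SampledRankIndex:
    """Hypothetical index that answers rank using only an access operator."""

    def __init__(self, access: Callable[[int], str], n: int, block: int = 4):
        self.access = access  # the only way the index reads the main data
        self.n = n
        self.block = block
        # samples[k][c] = number of occurrences of c in positions 1..k*block
        self.samples: List[Dict[str, int]] = [{}]
        counts: Dict[str, int] = {}
        for x in range(1, n + 1):
            c = access(x)
            counts[c] = counts.get(c, 0) + 1
            if x % block == 0:
                self.samples.append(dict(counts))

    def rank(self, c: str, x: int) -> int:
        """Number of occurrences of c in positions 1..x (1-based)."""
        k = x // self.block
        total = self.samples[k].get(c, 0)
        # Scan fewer than `block` positions past the last sample.
        for pos in range(k * self.block + 1, x + 1):
            if self.access(pos) == c:
                total += 1
        return total


# Main data stored explicitly as a Python string.
explicit = "aabacccdaddabbbc"
explicit_index = SampledRankIndex(lambda x: explicit[x - 1], len(explicit))

# Implicit main data: position x holds 'a' when x is even and 'b' when x is odd.
implicit_index = SampledRankIndex(lambda x: "ab"[x % 2], 16)
```

Swapping the encoding of the main data only changes the `access` callable; the index class is untouched, which is the design freedom the slide refers to.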

Page 7: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Strings: Definitions

- Notation
  - Alphabet: [σ] = {1, 2, …, σ}
  - String: S[1..n]
- Operations
  - string_access(x): S[x]
  - string_rank(α, x): number of occurrences of α in S[1..x]
  - string_select(α, r): position of the rth occurrence of α in S

Page 8: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Strings: An Example

S = a a b a c c c d a d d a b b b c

string_access(8) = d

string_rank(a, 8) = 3

string_select(b, 3) = 14
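The three operations can be checked against this example with a naive reference implementation. This is a linear-time sketch for checking the definitions, not a succinct structure; the function signatures take the string `s` as an extra argument for convenience, which is an assumption of this sketch.

```python
S = "aabacccdaddabbbc"  # the example string, positions are 1-based


def string_access(s: str, x: int) -> str:
    """Return S[x]."""
    return s[x - 1]


def string_rank(s: str, alpha: str, x: int) -> int:
    """Return the number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s: str, alpha: str, r: int):
    """Return the position of the r-th occurrence of alpha, or None if there is none."""
    seen = 0
    for pos, c in enumerate(s, start=1):
        if c == alpha:
            seen += 1
            if seen == r:
                return pos
    return None
```

Note that rank and select are inverses in the sense that `string_rank(S, alpha, string_select(S, alpha, r)) == r` whenever the r-th occurrence exists.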

Page 9: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Strings: Previous Results

- Succinct integrated encodings
  - Wavelet trees: Grossi et al. 2003
    - Space: nH_0 + o(n)∙lg σ bits
    - Time: O(lg σ) for all three operations
  - Golynski et al. 2006
    - Space: n (lg σ + o(lg σ)) bits
    - Time: O(lg lg σ) for string_access and string_rank, O(1) for string_select
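To make the O(lg σ) bound for wavelet trees concrete, here is a minimal pointer-based sketch of rank on a wavelet tree. It is a simplification under stated assumptions: each node stores a plain prefix-count list where a real implementation would use a succinct bit vector with constant-time binary rank, so the space is far from nH_0 + o(n)∙lg σ bits. Only the query path, one node per level of the alphabet split, mirrors the actual technique. The class and attribute names are illustrative.

```python
class WaveletTree:
    """Wavelet tree over integer symbols in [lo, hi], supporting rank only."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        if lo == hi:
            return  # leaf: every symbol routed here equals lo
        self.mid = (lo + hi) // 2
        # prefix[i] = how many of the first i symbols go to the left child
        self.prefix = [0]
        for c in seq:
            self.prefix.append(self.prefix[-1] + (1 if c <= self.mid else 0))
        self.left = WaveletTree([c for c in seq if c <= self.mid], lo, self.mid)
        self.right = WaveletTree([c for c in seq if c > self.mid], self.mid + 1, hi)

    def rank(self, c, x):
        """Number of occurrences of symbol c among the first x symbols."""
        if not self.lo <= c <= self.hi:
            return 0
        if self.lo == self.hi:
            return x
        to_left = self.prefix[x]
        if c <= self.mid:
            return self.left.rank(c, to_left)
        return self.right.rank(c, x - to_left)


# The example string with a..d mapped to the alphabet [4] = {1, 2, 3, 4}.
text = "aabacccdaddabbbc"
symbols = [ord(ch) - ord("a") + 1 for ch in text]
wt = WaveletTree(symbols, 1, 4)
```

Each call to `rank` descends one level and the alphabet range halves at every level, so a query touches about lg σ nodes.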

Page 10: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Strings: Our Results

- Succinct indexes
  - ADT
    - string_access: f(n, σ) time
  - Space: n∙o(lg σ) bits
  - Operations
    - string_rank: O(lg lg σ ∙ lg lg lg σ ∙ (f(n, σ) + lg lg σ))
    - string_select: O(lg lg lg σ ∙ (f(n, σ) + lg lg σ))
    - Other operations: negations

Page 11: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Binary Relations: Definitions

- Notation
  - Binary relation: R ⊆ [n] × [σ]
  - Number of objects: n; number of labels: σ
  - Number of object-label pairs: t
- Operations
  - object_access(x, r): rth label associated with x
  - label_access(x, α): whether x is associated with α
  - label_rank(α, x): number of objects labeled α up to object x
  - label_select(α, r): rth object labeled α

Page 12: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Binary Relations: An Example

Rows: labels 1..σ (σ = 4), top to bottom; columns: objects 1..n (n = 5), left to right

0 1 0 1 0

0 0 0 1 0

1 0 1 1 0

1 1 0 0 1

object_access(1, 2) = 4

label_access(2, 3) = false

label_rank(3, 4) = 3

label_select(4, 3) = 5
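The four answers in this example can be reproduced with a naive implementation over the matrix. The sketch assumes that rows are labels 1 to 4 from top to bottom and columns are objects 1 to 5 from left to right; that orientation is inferred from the answers on the slide and is the only reading under which all four hold. The code scans the matrix directly and is not a succinct structure.

```python
# Row alpha-1 is label alpha, column x-1 is object x (assumed orientation).
MATRIX = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
]


def label_access(x: int, alpha: int) -> bool:
    """Whether object x is associated with label alpha."""
    return MATRIX[alpha - 1][x - 1] == 1


def object_access(x: int, r: int):
    """The r-th label associated with object x, or None if there is none."""
    labels = [a for a in range(1, len(MATRIX) + 1) if label_access(x, a)]
    return labels[r - 1] if r <= len(labels) else None


def label_rank(alpha: int, x: int) -> int:
    """Number of objects labeled alpha among objects 1..x."""
    return sum(MATRIX[alpha - 1][:x])


def label_select(alpha: int, r: int):
    """The r-th object labeled alpha, or None if there is none."""
    objects = [x for x in range(1, len(MATRIX[0]) + 1) if label_access(x, alpha)]
    return objects[r - 1] if r <= len(objects) else None


# Number of object-label pairs t in this example.
t = sum(sum(row) for row in MATRIX)
```

Here n = 5, σ = 4 and t = 9, so the matrix has 20 cells but only 9 pairs; the space bounds on the following slides are stated in terms of t.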

Page 13: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Binary Relations: Previous Results

- Succinct integrated encodings
  - Barbay et al. 2006
    - Space: t (lg σ + o(lg σ)) bits
    - Time: O(lg lg σ) for object_access, label_rank and label_access; O(1) for label_select

Page 14: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Binary Relations: Our Results

- Succinct indexes
  - ADT: object_access: f(n, σ, t) time
  - Space: t∙o(lg σ) bits
  - Time
    - label_rank and label_access: O(lg lg σ ∙ lg lg lg σ ∙ (f(n, σ, t) + lg lg σ))
    - label_select: O(lg lg lg σ ∙ (f(n, σ, t) + lg lg σ))

Page 15: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Multi-labeled Trees: Definitions

- Notation
  - Number of nodes: n
  - Number of labels: σ
  - Number of node-label pairs: t
- Operations
  - α-descendant
  - α-child
  - α-ancestor

Page 16: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Multi-labeled Trees: An Example

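[Figure: a tree with nodes 1 to 11, each carrying a set of labels. Label sets as extracted (the assignment to nodes is not recoverable from the transcript): {a, c, d}, {c, d}, {a}, {a, c}, {a, b}, {b, d}, {a, b}, {b}, {c}, {c, d}, {b, c, d}]

- Node 2 is a c-ancestor of node 6
- Node 6 is a b-descendant of node 2
- Node 10 is a d-child of node 8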
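The three kinds of query can be sketched naively on a small tree. Two assumptions apply. First, the definitions are read off the examples above: an α-child, α-descendant or α-ancestor of x is a child, proper descendant or proper ancestor of x that carries label α. Second, the tree below is hypothetical (the one in the figure cannot be reconstructed from the transcript), as are the names `CHILDREN`, `LABELS` and the function names.

```python
# Hypothetical multi-labeled tree: node -> children, node -> set of labels.
CHILDREN = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: [6], 6: []}
LABELS = {1: {"a"}, 2: {"a", "c"}, 3: {"b"}, 4: {"b", "c"}, 5: {"c"}, 6: {"a", "b"}}
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}


def alpha_children(x, alpha):
    """Children of x labeled alpha, left to right."""
    return [c for c in CHILDREN[x] if alpha in LABELS[c]]


def alpha_descendants(x, alpha):
    """Proper descendants of x labeled alpha, in preorder."""
    found = []
    for c in CHILDREN[x]:
        if alpha in LABELS[c]:
            found.append(c)
        found.extend(alpha_descendants(c, alpha))
    return found


def alpha_ancestors(x, alpha):
    """Proper ancestors of x labeled alpha, nearest first."""
    found = []
    while x in PARENT:
        x = PARENT[x]
        if alpha in LABELS[x]:
            found.append(x)
    return found
```

These scans take time proportional to the part of the tree they visit; the point of the talk is to answer the same queries from a small index instead.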

Page 17: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Multi-labeled Trees: Previous Results

- Labeled trees
  - Geary et al. 2004
  - Ferragina et al. 2005
  - Barbay et al. 2006
- Multi-labeled trees
  - Barbay et al. 2006

Page 18: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Multi-labeled Trees: Our Approach

- Traversal orders
  - Preorder
  - DFUDS order
- Ordinal trees: DFUDS
  - Benoit et al. 1999 & 2005
  - Jansson et al. 2007
- 2 binary relations
  - Nodes in preorder & labels
  - Nodes in DFUDS order & labels

[Figure: example tree with nodes numbered according to the traversal orders]
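A short sketch may help show why two traversal orders are useful. It assumes that "DFUDS order" means the order in which nodes appear in the DFUDS sequence: during a depth-first traversal, each visited node lists all of its children at once. The slide does not spell this definition out, so treat it as an assumption. The tree and the function names are hypothetical. Under this reading, the descendants of a node are consecutive in preorder, and the children of a node are consecutive in DFUDS order, so descendant and child queries become range queries on the two binary relations.

```python
# Hypothetical ordinal tree: node -> children, left to right.
TREE = {"r": ["u", "v"], "u": ["w", "x"], "w": ["z"], "x": [], "z": [], "v": ["y"], "y": []}


def preorder(children, root):
    """Visit a node, then each child subtree from left to right."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children[node]))
    return order


def dfuds_order(children, root):
    """Depth-first traversal in which a visited node lists all its children at once."""
    order, stack = [root], [root]
    while stack:
        node = stack.pop()
        order.extend(children[node])  # siblings receive consecutive numbers
        stack.extend(reversed(children[node]))
    return order


PRE = preorder(TREE, "r")      # ['r', 'u', 'w', 'z', 'x', 'v', 'y']
DFUDS = dfuds_order(TREE, "r")  # ['r', 'u', 'v', 'w', 'x', 'z', 'y']
```

In `PRE` the subtree of `u` occupies the contiguous block `u, w, z, x`; in `DFUDS` the children `w, x` of `u` sit next to each other even though `z` lies between them in preorder.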

Page 19: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Multi-labeled Trees: Our Results

- Succinct indexes
  - ADT: node_label(x, r)
  - Supporting α-child/descendant queries: t∙o(lg σ) bits
  - Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
  - Supporting α-child/descendant/ancestor queries of node x after another node y

Page 20: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Applications

- Compressed succinct encodings
  - Strings
    - Space: nH_k + o(n lg σ) bits
    - Operations:
      - string_access: O(1)
      - string_rank: O((lg lg σ)^2 lg lg lg σ)
      - string_select: O(lg lg σ lg lg lg σ)
    - First high-order entropy-compressed encoding supporting rank/select efficiently
  - Other data structures
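The string bounds on this slide appear to follow from the succinct index bounds for strings stated earlier, assuming the O(1) string_access here plays the role of f(n, σ). Substituting f(n, σ) = O(1):

```latex
\begin{align*}
\texttt{string\_rank}:\quad
  & O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    = O\bigl((\lg\lg\sigma)^2 \, \lg\lg\lg\sigma\bigr) \\
\texttt{string\_select}:\quad
  & O\bigl(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    = O\bigl(\lg\lg\sigma \, \lg\lg\lg\sigma\bigr)
\end{align*}
```

The index itself takes n∙o(lg σ) = o(n lg σ) bits, so it fits within the lower-order term of the nH_k + o(n lg σ) space bound.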

Page 21: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Applications (Continued)

- High-order entropy-compressed text indexes for large alphabets
  - Notation: n = text size, σ = alphabet size, m = pattern length, occ = number of occurrences
  - Our results
    - Space: nH_k + o(n lg σ) bits
    - Pattern searching: O(m lg lg σ + occ lg^(1+ε) n lg lg σ)
  - Previous results: a lg σ factor instead of lg lg σ, or incompressible

Page 22: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Conclusions

- We showed the importance of succinct indexes in the design of succinct data structures by designing:
  - A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
  - The first high-order entropy-compressed representation of strings supporting rank/select
  - High-order entropy-compressed text indexes for large alphabets

Page 23: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Conclusions (Continued)

The concept of succinct indexes is useful in designing succinct data structures … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.

Page 24: Succinct Indexes for Strings, Binary Relations and  Multi-labeled Trees

Thank you!