Paolo Ferragina, Università di Pisa Compressed Rank & Select on general strings Paolo Ferragina Dipartimento di Informatica, Università di Pisa


Post on 14-Dec-2015


Compressed Rank & Select on general strings

Paolo Ferragina
Dipartimento di Informatica, Università di Pisa


Generalised Rank and Select

Rank(c,i) = #c in L[1,i]

Select(c,i) = position of the i-th c in L

L = a b a a a c b c d a b e c d ...

Rank( a , 7 ) = 4

Select( a , 2 ) = 3
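The two definitions above can be checked with a naive linear-scan sketch. The function names `rank` and `select` and the 1-based convention are assumptions chosen to match the slide, and the string is the visible prefix of L only (the slide elides the rest).

```python
def rank(L: str, c: str, i: int) -> int:
    """Number of occurrences of c in L[1..i] (1-based, inclusive)."""
    return L[:i].count(c)


def select(L: str, c: str, i: int) -> int:
    """1-based position of the i-th occurrence of c in L, or -1 if there is none."""
    seen = 0
    for pos, symbol in enumerate(L, start=1):
        if symbol == c:
            seen += 1
            if seen == i:
                return pos
    return -1


# Visible prefix of the slide's example string.
L = "abaaacbcdabecd"
print(rank(L, "a", 7))    # 4
print(select(L, "a", 2))  # 3
```

Both operations take time linear in the length of L here; the rest of the deck is about answering them fast in small space.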


Generalised Rank and Select

If |Σ| is small (i.e. constant): build a binary Rank data structure for each symbol of Σ.

Rank takes O(1) time and small space.

If |Σ| is large (words?): we need a smarter solution, the Wavelet Tree data structure.

Algorithmic reduction:

>> Reduce Rank&Select over arbitrary strings ... to Rank&Select over binary strings
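For the small-alphabet case, a rough sketch of "one binary Rank structure per symbol" could look as follows. The plain prefix-count table is only a stand-in for a real succinct binary Rank structure, which would use far less space, and the helper names are hypothetical.

```python
from itertools import accumulate


def build_indicator_ranks(L):
    """For each symbol c, prefix counts of the indicator bit vector B_c,
    where B_c[i] = 1 iff L[i] == c. tables[c][i] = number of c in L[1..i]."""
    tables = {}
    for c in sorted(set(L)):
        bits = [1 if s == c else 0 for s in L]
        tables[c] = [0] + list(accumulate(bits))
    return tables


def rank(tables, c, i):
    """Rank(c, i) answered with a single table lookup."""
    return tables[c][i] if c in tables else 0


tables = build_indicator_ranks("abaaacbcdabecd")
print(rank(tables, "a", 7))  # 4
```

The space grows with |Σ| times the length of the string, which is why this approach only suits a constant-size alphabet.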


The Wavelet Tree

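[Figure: a binary tree whose leaves are the symbols of abracadabra. The left subtree of the root has leaves a and c; the right subtree has leaves b and r below one internal node, plus the leaf d.]

(Alphabetic?) Tree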


The Wavelet Tree

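[Figure: the same tree, each node holding the subsequence of abracadabra made of the symbols in its subtree]

root: abracadabra

children of the root: aacaaa (left), brdbr (right)

left child of brdbr: brbr

leaves: aaaaa? bb? rr? d?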


The Wavelet Tree

[Figure: the same tree, each internal node storing one bit per symbol: 0 if the symbol belongs to the left subtree, 1 if it belongs to the right subtree]

abracadabra   01100010110
aacaaa        001000
brdbr         00100
brbr          0101

Fact. Given the tree and the binary strings, we can recover the original string!

In any case, O(|Σ| log |Σ|) bits.

Easier: alphabetic order + heap structure.
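The construction and the recovery Fact can be sketched in a few lines. The nested-tuple encoding of the tree shape (a leaf is a one-character string, an internal node a `(left, right)` pair) and the function names are assumptions made for illustration; the shape below is the one drawn on the slides.

```python
def symbols(shape):
    """Set of symbols found at the leaves of a tree shape."""
    return {shape} if isinstance(shape, str) else symbols(shape[0]) | symbols(shape[1])


def build(s, shape):
    """Return a leaf symbol, or (bits, left_subtree, right_subtree) for an internal node."""
    if isinstance(shape, str):
        return shape
    left_syms = symbols(shape[0])
    bits = "".join("0" if ch in left_syms else "1" for ch in s)
    left = "".join(ch for ch in s if ch in left_syms)
    right = "".join(ch for ch in s if ch not in left_syms)
    return (bits, build(left, shape[0]), build(right, shape[1]))


def recover(node, length):
    """Rebuild the string of the given length stored in the subtree rooted at node."""
    if isinstance(node, str):
        return node * length
    bits, left, right = node
    zeros = bits.count("0")
    from_left = iter(recover(left, zeros))
    from_right = iter(recover(right, len(bits) - zeros))
    return "".join(next(from_left) if b == "0" else next(from_right) for b in bits)


SHAPE = (("a", "c"), (("b", "r"), "d"))  # tree shape drawn on the slides
tree = build("abracadabra", SHAPE)
print(tree[0])            # 01100010110
print(recover(tree, 11))  # abracadabra
```

Only the bit strings and the tree shape are kept; the per-node subsequences are used during construction and then discarded.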


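The Wavelet Tree

[Figure: the wavelet tree of abracadabra with its bit strings, annotated with the query Rank(b,8)]

abracadabra   01100010110   Rank(b,8): reduce to right symbols
brdbr         00100         Rank(b,3): reduce to left symbols
brbr          0101          Rank(b,2): it's binary
aacaaa        001000

Every step can be turned to binary.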


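The Wavelet Tree

[Figure: the wavelet tree of abracadabra with its bit strings, annotated with the binary Rank performed at each level for Rank(b,8)]

abracadabra   01100010110   right move = Rank1:  Rank1(8) = 3
brdbr         00100         left move = Rank0:   Rank0(3) = 3 – Rank1(3) = 2
brbr          0101          left move = Rank0:   Rank0(2) = 2 – Rank1(2) = 1
aacaaa        001000        (not visited)

Hence Rank(b,8) = 1.

Generalised R&S implemented with log |Σ| binary R&S.

Select is similar.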
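The descent for Rank, and the matching ascent for Select, can be sketched as below. The tree is copied by hand from the slides as nested `(bits, left, right)` tuples, the names are hypothetical, and `rank1`, `select_bit` and `contains` are linear scans standing in for constant-time structures and for a symbol-to-path table.

```python
def rank1(bits, i):
    """Number of 1s in bits[1..i] (linear scan, stand-in for an O(1) structure)."""
    return bits[:i].count("1")


def select_bit(bits, b, j):
    """1-based position of the j-th bit equal to b."""
    seen = 0
    for pos, bit in enumerate(bits, start=1):
        if bit == b:
            seen += 1
            if seen == j:
                return pos
    raise ValueError("fewer than j such bits")


# Wavelet tree of abracadabra: node = (bits, left, right), leaf = symbol.
TREE = ("01100010110",
        ("001000", "a", "c"),
        ("00100", ("0101", "b", "r"), "d"))


def contains(node, c):
    """True if symbol c is a leaf of the subtree rooted at node."""
    if isinstance(node, str):
        return node == c
    return contains(node[1], c) or contains(node[2], c)


def wt_rank(node, c, i):
    """Rank(c, i): one binary Rank per level, going down from the root."""
    while not isinstance(node, str):
        bits, left, right = node
        if contains(left, c):
            i -= rank1(bits, i)   # left move = Rank0
            node = left
        else:
            i = rank1(bits, i)    # right move = Rank1
            node = right
    return i if node == c else 0


def wt_select(node, c, j):
    """Select(c, j): one binary Select per level, coming back up from the leaf."""
    if isinstance(node, str):
        return j
    bits, left, right = node
    if contains(left, c):
        return select_bit(bits, "0", wt_select(left, c, j))
    return select_bit(bits, "1", wt_select(right, c, j))


print(wt_rank(TREE, "b", 8))    # 1
print(wt_select(TREE, "a", 2))  # 4
```

Rank walks from the root to the leaf of c, while Select starts from the leaf and maps the position back up, which is the sense in which the two are "similar".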


Representing Trees

Paolo Ferragina
Dipartimento di Informatica, Università di Pisa

Standard representation

Binary tree: each node has two pointers, to its left and right children.

An n-node tree takes 2n pointers, or 2n lg n bits.

Supports finding the left child or right child of a node in constant time.

Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.


Can we improve the space bound?

There are fewer than 2^{2n} distinct binary trees on n nodes.

2n bits are enough to distinguish between any two different binary trees.

Can we represent an n-node binary tree using 2n bits?
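The counting argument can be checked numerically. The number of distinct binary trees on n nodes is the n-th Catalan number, C(2n, n) / (n + 1); the slide itself only states the 2^{2n} upper bound, so the closed form and the function name below are additions for illustration.

```python
from math import comb


def count_binary_trees(n):
    """Number of distinct binary trees on n nodes (the n-th Catalan number)."""
    return comb(2 * n, n) // (n + 1)


for n in (1, 2, 3, 8, 100):
    count = count_binary_trees(n)
    bits_needed = (count - 1).bit_length()  # ceil(log2(count)) for count >= 1
    assert count < 2 ** (2 * n)             # fewer than 2^(2n) trees
    print(n, bits_needed, 2 * n)
```

The number of bits needed to tell all trees apart stays below 2n, so 2n bits is the target for a representation.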

Binary tree representation

A binary tree on n nodes can be represented using 2n+o(n) bits to support:

parent
left child
right child

in constant time.

Heap-like notation for a binary tree

[Figure: a binary tree with external nodes added; each internal node is labelled 1 and each external node 0]

Add external nodes.

Label internal nodes with a 1 and external nodes with a 0.

Write the labels in level order.

1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0

One can reconstruct the tree from this sequence

An n node binary tree can be represented in 2n+1 bits.
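The encoding and the reconstruction claim can be sketched as follows. The nested-tuple tree format (`(left, right)` for a node, `None` for a missing child) and the function names are assumptions; the example tree was reconstructed from the slide's bit sequence, since the drawing did not survive.

```python
from collections import deque


def encode(root):
    """Level-order labels: 1 for each internal node, 0 for each external (missing) node."""
    bits = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append("0")
        else:
            bits.append("1")
            queue.extend(node)  # enqueue left child, then right child
    return "".join(bits)


def decode(bits):
    """Rebuild the nested-tuple tree from its level-order bit string."""
    nodes = [[None, None] if b == "1" else None for b in bits]
    next_child = 1  # 0-based index of the next bit not yet attached to a parent
    for node in nodes:
        if node is not None:
            node[0], node[1] = nodes[next_child], nodes[next_child + 1]
            next_child += 2

    def freeze(node):
        return None if node is None else (freeze(node[0]), freeze(node[1]))

    return freeze(nodes[0])


LEAF = (None, None)
# 8-node tree reconstructed from the slide's sequence.
root = (((None, LEAF), None), (LEAF, (LEAF, None)))
code = encode(root)
print(code)                  # 11110110100100000
print(decode(code) == root)  # True
```

An n-node tree always has n + 1 external nodes, which gives the 2n + 1 bits.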

What about the operations?

Heap-like notation for a binary tree

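bit:       1  1  1  1  0  1  1  0  1  0  0  1  0  0  0  0  0
position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

[Figure: the tree drawn with two level-order numberings: 1 to 17 over all nodes (the positions in the bit sequence) and 1 to 8 over the internal nodes only (the 1s)]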

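The figure colour-codes two numberings of the nodes (red and green): one over all nodes, i.e. positions in the bit sequence, and one over the internal nodes only.

parent(x) = ⌊x/2⌋, on the red numbering

left child(x) = 2x, on the green numbering

right child(x) = 2x+1, on the green numbering

Converting between the two numberings:

x → x: # 1's up to x (Rank)

x → x: position of the x-th 1 (Select)

With x a position in the bit sequence, this gives left child(x) = 2·Rank(x), right child(x) = 2·Rank(x)+1 and parent(x) = Select(⌊x/2⌋).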
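The navigation rules can be tried on the slide's sequence. In this sketch x is always a 1-based position in the bit sequence, and `rank1` and `select1` are linear scans standing in for the o(n)-bit constant-time structures; all function names are assumptions.

```python
BITS = "11110110100100000"  # level-order sequence from the slide


def rank1(x):
    """Number of 1s in BITS[1..x]."""
    return BITS[:x].count("1")


def select1(j):
    """1-based position of the j-th 1 in BITS."""
    seen = 0
    for pos, bit in enumerate(BITS, start=1):
        if bit == "1":
            seen += 1
            if seen == j:
                return pos
    raise ValueError("fewer than j ones")


def is_internal(x):
    return BITS[x - 1] == "1"


def left_child(x):
    """Position of the left child of the internal node at position x."""
    return 2 * rank1(x)


def right_child(x):
    """Position of the right child of the internal node at position x."""
    return 2 * rank1(x) + 1


def parent(x):
    """Position of the parent of the node at position x (None for the root)."""
    return None if x == 1 else select1(x // 2)


print(left_child(4), right_child(4))  # 8 9
print(parent(9))                      # 4
```

A child position may hold a 0, meaning the child is external (absent in the original tree); `is_internal` tells the two cases apart.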