bloom filters

Post on 08-Jan-2016

44 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Bloom Filters. Benoit Donnet November 30th, 2006. 1. Context. Introduced in 1970 ([bloom]) Set membership problem Trade-off between space and computing complexity Lossy summary technique Historical usage Spell checking ([McIlroy]) Database ([Bratbergsengen]). Content. Bloom filters - PowerPoint PPT Presentation

TRANSCRIPT

Bloom Filters

Benoit DonnetNovember 30th, 2006

1

2

Context

• Introduced in 1970 ([bloom])

- Set membership problem- Trade-off between space and

computing complexity- Lossy summary technique

• Historical usage- Spell checking ([McIlroy])

- Database ([Bratbergsengen])

3

Content

• Bloom filters• Extensions• Networking applications• Conclusion• References

4

Bloom filters

5

Construction

6

Membership Query

7

False Positive

• A Bloom filter can suffer of false positives

- The filter returns a positive answer for some elements that do not belong to A

• Can we evaluate a priori the impact of false positives on a Bloom filter?

8

False Positive (2)

9

False Positive (3)

10

False Positive (4)

11

Extensions

12

Content

• Compressed Bloom filters ([mitzenmacher])

• Counting Bloom filters ([Fan et al.])

• Dynamic Bloom filters ([Guo et al.])

• Retouched Bloom filters ([Donnet et al.])

13

Compressed BF

• A Bloom filter can be used a message exchanged by networked monitors

• New performance metric- Bandwidth

• Transmission size can be affected by compression

• Compressed Bloom filters ([mitzenmacher])

14

Compressed BF (2)

• Positive aspects:- Quantity of bit exchanged reduced- False positive rate reduced- Amount of computation per query

reduced• Cost:

- Internal memory increased- Compression/decompression process

15

Compressed BF (3)

16

Counting BF

• The subset A is changing over time- Insertion- Deletion

• How to perform deletion?- Couting Bloom filters ([Fan et al.])

17

Counting BF (2)

18

Counting BF (3)

• Which size for the counter?- 4 bits per counter are OK for most

of the applications• What happens in case of an

overflow?

19

Dynamic BF

• Statement:- During the execution of the

application, |A| can exceed its orignal size n

• Consequence:- The false positive rate is not

maintained anymore• Solution?

- Dynamic Bloom Filters ([Guo et al.])

20

Dynamic BF (2)

• It uses a matrix of s Bloom filters- Each Bloom filter uses m bits and k

hash functions• It starts with s equals to 1.• A new Bloom filter (i.e., a new row in

the matrix) is created when needed.

21

Dynamic BF (3)• How to insert an element?

- Check for an active Bloom filter- If there is no active Bloom filter,

create a new one- Add the element to the Bloom filter

• How to query an element?• If all s Bloom filters return false, the

element does not belong the DBF.• If, at least, one Bloom filter returns

true, the element probably belongs to the DBF.

22

Dynamic BF (4)

23

Retouched BF• Statement:

- Some false positives might be more troublesome than others

- Some applications might tolerate a small level of false negatives

• Question:- Can we trade-off the false positives

against false negatives?• Solution?

- Retouched Bloom filters ([Donnet et al. 06])

24

Retouched BF (2)

25

Retouched BF (3)

• Quid if we randomly reset s bits in the vector?

- Eliminates the same proportion of false positives as the proportion of false negatives generated

- Randomized bit clearing.• The process of removing selected false

positives is called selective clearing

26

Retouched BF (4)

27

Networking Applications

28

Distributed Caching

•Proxies cooperate to exchange cache information

•Instead of sharing URLs list, proxies broadcast Bloom filters ([Fan et al.])

•A Bloom filter represents a proxy’s cache content

29

Multicast

•A router maintains, for each multicast address, a list of associated interfaces/connections

•Replace the list by a Bloom filter ([Grönvall])

•Parallelization possible

•Deletion of an address can be achieved with a counting Bloom filter

30

Measurement

• Topology discovery at the IP interface level

• Traceroute monitors exchange information about what was previously discovered

- Doubletree• This information shared can be encoded

as a Bloom filter- Communication cost reduction ([Donnet et al.

05])

31

Conclusion

•A Bloom filter

•solves the set membership problem

•can generate false positives

•Extensions to standard Bloom filter were presented

•A few networking applications were discussed

32

References

[Bloom]: Space/Time Trade-Offs in Hash Coding with Allowable Errors. In Communications of the ACM. vol. 13, n°7.

[McIlroy]: Development of a Spelling List. In Transactions on Communications. vol. 30, n° 1.

[Mitzenmacher]: Compressed Bloom Filters. In Transactions on Networking. vol. 10, n° 5.

[Fan et al.]: Summary Cache: a Scalable Wide-Area Web Cache Sharing Protocol. In Transactions on Networking. vol. 8, n°3.

[Guo et al.]: Theory and Network Applications of Dynamic Bloom Filters. In Proc. INFOCOM 2006.

33

References (2)

[Bratbergsengen]: Hashing Methods and Relational Algebra Operations. In Proc. ICVLD 1984.

[Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006.

[Donnet et al. 06]: Retouched Bloom Filters: Allowing Networked Applications to Trade-Off Selected False Positives Against False Negatives. In Proc. CoNEXT 2006.

[Grönvall]: Scalable Multicast Forwarding. In Proc. ACM SIGCOMM 2001. Student Workshop.

[Donnet et al. 05]: Improved Algorithms for Network Topology Discovery. In Proc. PAM 2005.

top related