Improving Bloom Filter Configuration for Lazy Transactional Memory Mark Jeffrey and J. Gregory Steffan ECE, University of Toronto November 10, 2011


TRANSCRIPT

Page 1: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Improving Bloom Filter Configuration for Lazy Transactional Memory

Mark Jeffrey and J. Gregory Steffan
ECE, University of Toronto

November 10, 2011

Page 2: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Mark Jeffrey, Improving Bloom Filter Configuration for Lazy TM 2

Parallel Programming is Hard

T1: Rd(a), Rd(b), Wr(a)
T2: Rd(a), Wr(c), Rd(a)
T3: Rd(x), Rd(a)

Tools offload some of the burden of managing data accesses:
– Memory Race Replay
– Atomicity Violation Survival
– Transactional Memory
– Speculative Optimizations

Many tools are using Bloom filters

Page 3: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filter

• Bit-vector-based data structure [1970]
– offers fast set operations
– in exchange for some imprecision

• Recently used to compare memory accesses
• With unconventional practices: intersection (the bitwise AND, &, of two filters)

We show that the new practices are inefficient, both in theory and empirically.

Page 4: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filters in Concurrency Tools

System      Year  Application
Bulk        2006  Hardware TM
BulkSC      2007  Memory Consistency
HARD        2007  Race Detection
DeLorean    2008  Deterministic Race Replay
SoftSig     2008  Code Analysis/Optimization/Debug
RingSTM     2008  Software TM
SigRace     2009  Race Detection
ColorSafe   2010  Atomicity Violation
InvalSTM    2010  Software TM
AdapSig     2010  Software TM
SvS         2011  Auto-protection of shared state

Our propositions will improve parallelism!

Page 5: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Tracking Address-Set Conflicts

Page 6: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Address-Sets

T1: Rd(a), Rd(b), Wr(a)
T2: Rd(a), Wr(c), Rd(a)
T3: Rd(x), Rd(a)

Read Set:
• memory locations read
• R_T1 = {a, b}

Write Set:
• memory locations written
• W_T1 = {a}

Page 7: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Burden: Address-Set Conflicts

T1: Rd(a), Rd(b), Wr(a)
T2: Rd(a), Wr(c), Rd(a)
T3: Rd(x), Rd(a)

Conflicts
– address accesses are dependent
– independence -> parallelism!
– address conflicts -> no parallelism

Conflict detection requires
– read and write set comparison
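The read and write set comparison can be sketched with exact Python sets before any Bloom filter is involved. This is a minimal illustration, not the talk's implementation: the function names are hypothetical, and the conflict rule (one thread writes an address that the other reads or writes) is the usual definition, assumed here because the slide does not spell it out. The traces are those of T1, T2 and T3 above.

```python
def address_sets(trace):
    """Split a trace of ("Rd" | "Wr", address) pairs into (read_set, write_set)."""
    reads = {addr for op, addr in trace if op == "Rd"}
    writes = {addr for op, addr in trace if op == "Wr"}
    return reads, writes


def conflicts(trace_a, trace_b):
    """Addresses that one trace writes and the other reads or writes (assumed rule)."""
    r_a, w_a = address_sets(trace_a)
    r_b, w_b = address_sets(trace_b)
    return (w_a & (r_b | w_b)) | (w_b & r_a)


t1 = [("Rd", "a"), ("Rd", "b"), ("Wr", "a")]
t2 = [("Rd", "a"), ("Wr", "c"), ("Rd", "a")]
t3 = [("Rd", "x"), ("Rd", "a")]

print(address_sets(t1))   # read set {a, b}, write set {a}
print(conflicts(t1, t2))  # {'a'}: T1 writes a, T2 reads a
print(conflicts(t2, t3))  # set(): T2 and T3 are independent
```

Exact sets grow with the number of addresses touched, which is the cost that Bloom filters trade for imprecision.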

Page 8: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Lazy Conflict Detection

Detect conflicts at the end of a transaction
Test address-sets for null-intersections

T1: Wr(b), Rd(a), Rd(c)    R1 = {a, c}, W1 = {b}
T2: Rd(a), Rd(b)

W1 ∩ R2 = ∅ ?
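A commit-time check of this kind might look as follows. This is a sketch under assumptions: the class and function names are hypothetical, and the test shown (the committing transaction's write set against an in-flight transaction's read set) is one common lazy scheme, not necessarily the exact one on the slide. The accesses follow the slide's example, where T1 has R1 = {a, c} and W1 = {b}.

```python
class LazyTx:
    """Buffers a transaction's address sets; nothing is checked until commit."""

    def __init__(self, name):
        self.name = name
        self.reads, self.writes = set(), set()

    def rd(self, addr):
        self.reads.add(addr)

    def wr(self, addr):
        self.writes.add(addr)


def must_abort(committer, in_flight):
    """Commit-time test: do the committer's writes intersect the other's reads?"""
    return not committer.writes.isdisjoint(in_flight.reads)


t1, t2 = LazyTx("T1"), LazyTx("T2")
t1.wr("b"); t1.rd("a"); t1.rd("c")   # R1 = {a, c}, W1 = {b}
t2.rd("a"); t2.rd("b")               # assumed accesses of T2: R2 = {a, b}

print(must_abort(t1, t2))            # True: T1 wrote b, which T2 read
print(must_abort(t2, t1))            # False: T2 wrote nothing
```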

Page 9: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filters (BF)

Page 10: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filter Background

• Bloom filter is a compact set representation
– bit vector, much smaller than the address space

[Figure: inserting an address x into set S; the hash h(x) selects the bit to set in the bit vector BF(S)]

Page 11: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filter Background

Query for an address, y:

[Figure: h(y) selects a bit in BF(S); is y ∈ BF(S)? The answer is in {Yes, No}]

Page 12: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Bloom Filter False Positives (FPs)

• Encode a large address space into a bit-vector
– response to a query is actually No or Maybe

• False Positives
– when “maybe” is wrong

[Figure: addresses x and y mapped into the bit vector; is y in S?]

Page 13: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Partitioned Bloom Filter

Insert an address, x:
– k hash functions encode k bit indices to set

[Figure: x is hashed by h1(), h2(), …, hk(); each hash sets one bit in its own partition of BF(S)]

Page 14: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Partitioned Bloom Filter

Query for an address, y:

[Figure: y is hashed by h1(), h2(), …, hk(); is y ∈ BF(S)? The answer is in {Maybe, No}]

Probability of False Positives is well understood
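Insert and query on a partitioned Bloom filter can be sketched in a few lines. The class below is illustrative only: the name `PartitionedBloomFilter`, the use of a Python integer as the bit vector, and the salted SHA-256 hash family are all assumptions made for the example, not details from the talk.

```python
import hashlib


class PartitionedBloomFilter:
    """m-bit filter split into k partitions of m // k bits, one hash per partition."""

    def __init__(self, m, k):
        if m % k != 0:
            raise ValueError("m must be a multiple of k")
        self.m, self.k = m, k
        self.part_bits = m // k
        self.bits = 0  # a Python int used as an m-bit vector

    def _indices(self, addr):
        # Hypothetical hash family: SHA-256 salted with the partition number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            offset = int.from_bytes(digest[:8], "big") % self.part_bits
            yield i * self.part_bits + offset

    def insert(self, addr):
        for idx in self._indices(addr):
            self.bits |= 1 << idx

    def query(self, addr):
        """False means definitely absent; True only means 'maybe present'."""
        return all((self.bits >> idx) & 1 for idx in self._indices(addr))


bf = PartitionedBloomFilter(m=64, k=4)
for addr in (0x1000, 0x1008):
    bf.insert(addr)

print(bf.query(0x1000))  # True: an inserted address is never reported absent
print(bf.query(0x2000))  # No or Maybe, depending on hash collisions
```

A query can only err in one direction: every inserted address answers Maybe, while an address that was never inserted answers No unless all k of its bits happen to have been set by other addresses.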

Page 15: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Unconventional Bloom Filter Null-Intersection Tests

Page 16: Improving Bloom Filter Configuration  for Lazy Transactional Memory

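Bloom Filter Null-Intersection Tests

Two existing approaches:
1. build a Queue of Queries (QoQ)
2. combine queries into a distinct Bloom filter
– replace many queries with 1 intersection!

[Figure: addresses a1 … a5 queued and queried one at a time against a filter, versus a single filter-to-filter test]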

Page 17: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Partitioned BF Intersection

Do two sets share any elements? S1 ∩ S2 = ∅ ?

[Figure: BF(S1) & BF(S2), partition by partition; the answer is in {Disjoint, Maybe Overlap}]

Page 18: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Unpartitioned BF Intersection

S1 ∩ S2 = ∅ ?

[Figure: BF(S1) & BF(S2); the answer is in {Disjoint, Maybe Overlap}]

Any asserted bits indicate set overlap
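The three null-intersection tests discussed so far (Queue of Queries, partitioned intersection, unpartitioned intersection) can be compared side by side in a short sketch. Everything named here is hypothetical: filters are plain Python integers, the hashes are salted SHA-256, and the partitioned rule (every one of the k partitions of the AND must be non-zero) is an assumed reading based on the fact that a shared address sets one common bit per partition.

```python
import hashlib


def bit_indices(addr, m, k, partitioned):
    """Yield the k bit positions of addr in an m-bit filter (hypothetical hashes)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        if partitioned:
            part = m // k
            yield i * part + h % part  # hash i owns partition i
        else:
            yield h % m                # every hash may set any bit


def build(addrs, m, k, partitioned):
    bits = 0
    for a in addrs:
        for idx in bit_indices(a, m, k, partitioned):
            bits |= 1 << idx
    return bits


def unpartitioned_maybe_overlap(f1, f2):
    # Any asserted bit in the bitwise AND is reported as a possible overlap.
    return (f1 & f2) != 0


def partitioned_maybe_overlap(f1, f2, m, k):
    # Assumed rule: all k partitions of the AND must contain an asserted bit.
    part = m // k
    mask = (1 << part) - 1
    both = f1 & f2
    return all((both >> (i * part)) & mask for i in range(k))


def qoq_maybe_overlap(filter_bits, queries, m, k):
    # Queue of Queries: test each address of one set against the other's filter.
    return any(
        all((filter_bits >> idx) & 1 for idx in bit_indices(q, m, k, True))
        for q in queries
    )


m, k = 64, 4
s1, s2 = {0x10, 0x20}, {0x20, 0x30}   # the sets share address 0x20
print(unpartitioned_maybe_overlap(build(s1, m, k, False), build(s2, m, k, False)))
print(partitioned_maybe_overlap(build(s1, m, k, True), build(s2, m, k, True), m, k))
print(qoq_maybe_overlap(build(s1, m, k, True), s2, m, k))
```

All three tests report Maybe Overlap whenever the sets truly share an address; they differ only in how often disjoint sets are wrongly reported as overlapping.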

Page 19: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Imprecision in BF Intersection

• Bloom filter was intended for fast querying

• Recent systems use the filter for intersection
– Imprecision can produce False Set-Overlaps (FSO)
– We are the first to study Bloom filter FSOs
– Our goal is to understand and improve Bloom filter intersection

Page 20: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Important Questions

When using BFs for testing null-intersection:
1. How do BF Intersection and QoQ compare?
– theoretical study [SPAA ’11]
2. Can we compromise?
– new Bloom filter design
3. Does theory work in practice?
– empirical study

Page 21: Improving Bloom Filter Configuration  for Lazy Transactional Memory

1. How do BF Intersection and QoQ compare?

Bloom Filters for Null-Intersection Tests

Page 22: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Definitions

A, B: disjoint address access sets
m: bits in the Bloom filter

Page 23: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Definitions

A, B: disjoint address access sets
m: bits in the Bloom filter
k: partitions, one per hash function h1(), h2(), …, hk()

Page 24: Improving Bloom Filter Configuration  for Lazy Transactional Memory

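Probability of FSO [SPAA ’11]

• Unpartitioned BF Intersection

$$p_{\text{Unpart}} = 1 - \left(1 - \frac{1}{m}\right)^{k^{2}|A||B|}$$

• Partitioned BF Intersection

$$p_{\text{Part}} = \left(1 - \left(1 - \frac{k}{m}\right)^{|A||B|}\right)^{k}$$

• Queue of BF Queries (each b ∈ B queried against BF(A))

$$p_{\text{QoQ}} = 1 - \left(1 - \left(1 - \left(1 - \frac{k}{m}\right)^{|A|}\right)^{k}\right)^{|B|}$$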

Page 25: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Comparing FSOs [SPAA ’11]

For any length m, and k > 1 hash functions,

$$p_{\text{QoQ}} \le p_{\text{Partitioned}} \le p_{\text{Unpartitioned}}$$

Queue of Queries gives the fewest false conflicts
Partitioned intersection improves on Unpartitioned
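The ordering can be checked numerically. The sketch below evaluates three closed-form FSO probabilities for disjoint sets of sizes a = |A| and b = |B| in an m-bit filter with k hash functions. The expressions are a reconstruction under the usual assumption of independent, uniform hashes (the formulas did not survive in this transcript), and the function names are hypothetical.

```python
def p_unpartitioned(m, k, a, b):
    # Each of the k*a bits set for A can collide with each of the k*b bits set for B.
    return 1 - (1 - 1 / m) ** (k * k * a * b)


def p_partitioned(m, k, a, b):
    # A partition has m/k bits; all k partitions must show a collision.
    return (1 - (1 - k / m) ** (a * b)) ** k


def p_qoq(m, k, a, b):
    # Per-query false-positive rate of a partitioned filter holding A,
    # then the chance that at least one of the b queries is a false positive.
    false_positive = (1 - (1 - k / m) ** a) ** k
    return 1 - (1 - false_positive) ** b


for m, k, a, b in [(1024, 4, 10, 10), (512, 2, 5, 20)]:
    print(m, k, a, b,
          p_qoq(m, k, a, b), p_partitioned(m, k, a, b), p_unpartitioned(m, k, a, b))
```

For both parameter choices the values come out ordered QoQ, then partitioned, then unpartitioned, smallest first. With k = 1 the three expressions coincide, which is why the claim is stated for k > 1.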

Page 26: Improving Bloom Filter Configuration  for Lazy Transactional Memory

2. Can we compromise? A new Bloom filter design

Bloom Filters for Null-Intersection Tests

Page 27: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Batch-of-Bloom-filters (BoB)

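[Figure: inserting x into BoB(S); a pre-hash hpre(x) selects one of b Bloom filters, and that filter's hash functions h1 … hk set its bits]

S = S1 ∪ S2 ∪ … ∪ Sb

BoB(S) = BF(S1), BF(S2), …, BF(Sb)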

Page 28: Improving Bloom Filter Configuration  for Lazy Transactional Memory

BoB Intersection

S1 ∩ S2 = ∅ ?

[Figure: the filters of two BoBs combined with &; the answer is in {Disjoint, Maybe Overlap}]

BoB: compromise between QoQ and Intersect
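A Batch-of-Bloom-filters can be sketched as an array of small partitioned filters plus a pre-hash that picks which one stores each address. This is an illustration under assumptions, not the authors' design: the class name, the salted SHA-256 hashes, and the intersection rule (index-wise partitioned intersection, reporting Maybe Overlap if any pair of corresponding sub-filters may overlap) are guesses. The rule follows from the fact that an address shared by both sets is sent to the same sub-filter index in both BoBs.

```python
import hashlib


def _h(salt, addr, modulus):
    """Hypothetical hash: SHA-256 of a salted address, reduced to [0, modulus)."""
    digest = hashlib.sha256(f"{salt}:{addr}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BatchOfBloomFilters:
    """b partitioned Bloom filters; a pre-hash picks the filter that stores each address."""

    def __init__(self, b, m, k):
        if m % k != 0:
            raise ValueError("m must be a multiple of k")
        self.b, self.m, self.k = b, m, k   # m bits and k partitions per sub-filter
        self.part = m // k
        self.filters = [0] * b             # each int is one m-bit sub-filter

    def insert(self, addr):
        which = _h("pre", addr, self.b)    # h_pre: choose the sub-filter
        for i in range(self.k):
            self.filters[which] |= 1 << (i * self.part + _h(i, addr, self.part))

    def maybe_overlaps(self, other):
        """Assumed rule: index-wise partitioned intersection of the sub-filters."""
        mask = (1 << self.part) - 1
        for f1, f2 in zip(self.filters, other.filters):
            both = f1 & f2
            if all((both >> (i * self.part)) & mask for i in range(self.k)):
                return True                # this sub-filter pair may share an address
        return False


x = BatchOfBloomFilters(b=4, m=32, k=4)
y = BatchOfBloomFilters(b=4, m=32, k=4)
x.insert(0x40)
y.insert(0x40)
print(x.maybe_overlaps(y))  # True: both contain 0x40
```

With b = 1 this reduces to a single partitioned intersection. As b grows, each sub-filter holds fewer addresses, which moves the test toward the precision of individual queries.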

Page 29: Improving Bloom Filter Configuration  for Lazy Transactional Memory

3. Does theory work in practice?

Bloom Filters for Null-Intersection Tests

Page 30: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Methodology

• Augment RingSTM [Spear et al. SPAA ’08] with alternate BF configs
– RingSTM uses unpartitioned Bloom filter intersection

• Stress BF configurations using the STAMP benchmarks

• 8-core Intel Xeon with SSE2 ISA
– 32-bit Linux 2.6.32-5-686

Page 31: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Performance Results: Labyrinth

[Figure: two panels, Execution Time and Aborts; annotations: "21% Speedup", "Better"]

QoQ, BoB, and partitioned intersection outperform the baseline

Page 32: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Performance Results: Kmeans-low

[Figure: two panels, Execution Time and Aborts; annotations: ">25% slowdown", "Better"]

Querying overhead counteracts reduced aborts

Page 33: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Conclusion

Page 34: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Conclusion

Conflict detection often applies Bloom filters
– for fast set operations: y ∈ S and S1 ∩ S2
– unconventionally using BFs for null-intersection

Our recommendations (from theory & practice):
1. strongly consider querying before intersection
2. in hardware, consider intersecting BoBs
3. build adaptive systems for application behaviors

Page 35: Improving Bloom Filter Configuration  for Lazy Transactional Memory

Improving Bloom Filter Configuration for Lazy Transactional Memory

Thank you
[email protected]