HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin Chen LBA Reading Group Presentation

HARD: Hardware-Assisted lockset-based Race Detection

P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07

Shimin Chen

LBA Reading Group Presentation

Motivation

Data race detection is important
S/W solutions are slow (not good for production runs)
Previous H/W solutions focus on the happens-before relation
– Cannot detect potential races

Motivating Example

Solution: HARD (h/w lockset)

Challenges:
– How to efficiently store and maintain the lockset for each variable in hardware?
– How to efficiently perform the set operation in the lockset algorithm?

Main ideas (will be detailed later)
– H/W bloom filter
– Piggybacking on cache coherence protocols
– Reset all bloom filters after exiting a barrier

Outline

LockSet (refresh our memory)
HARD
Evaluation
Conclusion

Main Lockset Algorithm

Idea: accesses to every shared variable should be protected by some common lock.

Data structures:
– Thread t’s current lock set: L(t)
– Candidate set for a variable v: C(v)

Algorithm:
– Modify L(t) upon lock acquire and release
– Initialize C(v) to the set of all locks
– When t accesses v, C(v) = C(v) ∩ L(t)
– If C(v) == ∅ then report a violation on variable v
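
A minimal Python sketch of the steps above. The class and method names (LocksetDetector, acquire, release, access) are illustrative and not taken from the paper.

```python
class LocksetDetector:
    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = {}        # L(t): thread id -> locks currently held
        self.candidates = {}  # C(v): variable -> candidate lock set
        self.violations = []  # variables whose candidate set became empty

    def acquire(self, t, lock):
        self.held.setdefault(t, set()).add(lock)

    def release(self, t, lock):
        self.held.setdefault(t, set()).discard(lock)

    def access(self, t, v):
        # C(v) starts as the set of all locks and shrinks on every access.
        c = self.candidates.get(v, self.all_locks)
        c = c & frozenset(self.held.get(t, ()))
        self.candidates[v] = c
        if not c and v not in self.violations:
            self.violations.append(v)
        return c


d = LocksetDetector({"A", "B"})
d.acquire("t1", "A"); d.access("t1", "x"); d.release("t1", "A")
d.acquire("t2", "B"); d.access("t2", "x"); d.release("t2", "B")
print(d.violations)  # ['x']: no common lock protects x
```

Note that the violation is reported even if the two accesses never overlap in time, which is how lockset flags potential races.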

Reducing False Positives

Outline

LockSet (refresh our memory)
HARD
Evaluation
Conclusion

HARD Overview

LState (2 bits): exclusive, shared, etc.
BFVector (16 bits): candidate lock set for the cache line
Lock Register (16 bits): thread’s lockset
Counter Register (32 bits): used for resolving hash collisions (more detail later)

HARD Overview: Operations

A lock → a ‘1’ in the bloom filter
Fetching a line from memory: set the BFVector to all 1s, LState to exclusive
Update BFVector and LState on accesses
Communicate them through the coherence protocol
Lock register: thread’s lock set
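
A rough Python sketch of the per-line metadata operations listed above. The names LineMeta, on_fetch and on_access are hypothetical, and LState transitions are deliberately not modeled because the slide does not spell them out.

```python
from dataclasses import dataclass

ALL_ONES = 0xFFFF  # 16-bit BFVector with every bit set


@dataclass
class LineMeta:
    lstate: str = "exclusive"
    bfvector: int = ALL_ONES


def on_fetch():
    """A line fetched from memory starts with a full candidate set."""
    return LineMeta(lstate="exclusive", bfvector=ALL_ONES)


def on_access(meta, lock_register):
    """Narrow the line's candidate set by the accessing thread's lockset."""
    meta.bfvector &= lock_register
    return meta


meta = on_fetch()
on_access(meta, 0x1111)  # thread holds a lock whose signature is 0x1111
on_access(meta, 0x1112)  # later access under a different lock signature
print(hex(meta.bfvector))  # 0x1110
```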

Bloom Filter

Bloom filter: a bit vector that represents a set of keys
– A key is hashed d times (e.g. d=3) and represented by d bits
Construct: for every key in the set, set its 3 bits in the vector
Membership test: given a key, check if all its 3 bits are 1
– Definitely not in the set if some bits are 0
– May have false positives

[Figure: a 20-bit filter, 0 0 0 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 0 1, with a key mapped to Bit0=H0(key), Bit1=H1(key), Bit2=H2(key)]
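
A small software sketch of construction and membership testing, assuming a 20-bit vector and d=3. Salting SHA-256 with the hash index is an illustrative stand-in for H0..H2, not what the hardware uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a Python int

    def _positions(self, key):
        # Derive d bit positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
bf.add("lockA")
bf.add("lockB")
print(bf.might_contain("lockA"))  # True: inserted keys are never missed
```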

Representing LockSet as Bloom Filter

4 hash functions
Lockset intersection: bloom filter intersection
Lockset empty: any of the 4-bit parts is all 0s
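
A sketch of this representation, assuming a 16-bit vector split into 4 parts of 4 bits with one bit set per part for each lock. The hash in lock_signature (two address bits per part, above an assumed 4-byte alignment) is hypothetical; the actual hardware hash may differ.

```python
PARTS = 4          # number of hash functions / parts
BITS_PER_PART = 4  # bits in each part; 4 * 4 = 16-bit vector
ALL_ONES = (1 << (PARTS * BITS_PER_PART)) - 1


def lock_signature(lock_addr):
    """Map a lock address to a vector with exactly one bit set per part."""
    sig = 0
    for i in range(PARTS):
        index = (lock_addr >> (2 + 2 * i)) & (BITS_PER_PART - 1)
        sig |= 1 << (i * BITS_PER_PART + index)
    return sig


def intersect(a, b):
    """Lockset intersection is a bitwise AND of the two vectors."""
    return a & b


def is_empty(vec):
    """The set is empty if any part has no bit set."""
    mask = (1 << BITS_PER_PART) - 1
    return any(((vec >> (i * BITS_PER_PART)) & mask) == 0 for i in range(PARTS))


candidate = intersect(ALL_ONES, lock_signature(0x1000))
print(is_empty(candidate))  # False: one lock still protects the line
candidate = intersect(candidate, lock_signature(0x1004))
print(is_empty(candidate))  # True: the two locks differ in part 0
```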

False Negative Caused by Bloom Filter

Prob of False Negatives

Suppose the candidate set contains m locks
Given a lock, probability of recognizing it as a member:
    prob_whole = (prob_part)^k
    prob_part = 1 - (1 - 1/n)^m
When k=4, n=4:
– 0.0039 (m=1), 0.037 (m=2), 0.111 (m=3)
– Paper says: “experiments show that no races were missed”
But what if the thread currently holds multiple locks?

[Figure: the filter is split into k parts of n bits each; here k=4, n=4]
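
The numbers above can be checked with a few lines of Python. The function name false_member_prob is illustrative; the formula is the one on the slide.

```python
def false_member_prob(m, n=4, k=4):
    """Probability that a lock is recognized as a member of an m-lock set."""
    prob_part = 1 - (1 - 1 / n) ** m  # its bit in one part is already set
    return prob_part ** k             # this must happen in all k parts


for m in (1, 2, 3):
    print(m, round(false_member_prob(m), 4))
# 1 0.0039
# 2 0.0366
# 3 0.1117
```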

If threads hold 1 to 8 locks (not in the paper)

n bits = 4, k parts = 4
-----------------------------------------------
         m=1     m=2     m=3     m=4
t=1 :  0.0039  0.0366  0.1117  0.2184
t=2 :  0.0078  0.0719  0.2109  0.3891
t=3 :  0.0117  0.1059  0.2991  0.5225
t=4 :  0.0155  0.1387  0.3774  0.6267
t=5 :  0.0194  0.1702  0.4469  0.7083
t=6 :  0.0232  0.2006  0.5087  0.7720
t=7 :  0.0270  0.2299  0.5636  0.8218
t=8 :  0.0308  0.2581  0.6123  0.8607
-----------------------------------------------
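
The table entries appear consistent with 1 - (1 - prob_whole)^t, treating each of the t held locks as an independent chance of a false match. That reading is an inference from the numbers, not something the slide states, and the name miss_prob is illustrative.

```python
def miss_prob(t, m, n=4, k=4):
    """Chance that at least one of t held locks falsely matches an m-lock set."""
    prob_whole = (1 - (1 - 1 / n) ** m) ** k
    return 1 - (1 - prob_whole) ** t


for t in range(1, 9):
    row = "  ".join(f"{miss_prob(t, m):.4f}" for m in range(1, 5))
    print(f"t={t} :  {row}")
```

Calling miss_prob with n=8, k=8 gives the values for the wider filter design.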

Try another design

n bits = 8, k parts = 8
-----------------------------------------------
         m=1     m=2     m=3     m=4
t=1 :  0.0000  0.0000  0.0001  0.0009
t=2 :  0.0000  0.0000  0.0003  0.0017
t=3 :  0.0000  0.0000  0.0004  0.0026
t=4 :  0.0000  0.0000  0.0006  0.0034
t=5 :  0.0000  0.0000  0.0007  0.0043
t=6 :  0.0000  0.0001  0.0008  0.0051
t=7 :  0.0000  0.0001  0.0010  0.0060
t=8 :  0.0000  0.0001  0.0011  0.0069
-----------------------------------------------

Unlock operation: remove bit from bloom filter?

32-bit counter register: each bloom filter bit has a 2-bit counter
Lock: increment the 2-bit counter of each bloom filter bit that is set
Unlock: decrement the 2-bit counter; if 0, clear the bloom filter bit
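
A sketch of the counter scheme in Python. The class name and the positions argument (the bit positions a lock hashes to) are hypothetical, and saturating at 3 on overflow is an assumption; the slide does not say what happens when a 2-bit counter overflows.

```python
class LockRegister:
    """16-bit bloom filter with one 2-bit counter per filter bit."""

    NUM_BITS = 16
    COUNTER_MAX = 3  # largest value a 2-bit counter can hold

    def __init__(self):
        self.bits = 0
        self.counters = [0] * self.NUM_BITS

    def lock(self, positions):
        for pos in positions:
            self.bits |= 1 << pos
            # Assumption: the counter saturates instead of wrapping.
            self.counters[pos] = min(self.counters[pos] + 1, self.COUNTER_MAX)

    def unlock(self, positions):
        for pos in positions:
            if self.counters[pos] > 0:
                self.counters[pos] -= 1
            if self.counters[pos] == 0:
                self.bits &= ~(1 << pos)  # no remaining lock maps to this bit


reg = LockRegister()
lock_a = [0, 5, 9, 13]   # hypothetical hash positions for lock A
lock_b = [0, 6, 10, 14]  # lock B collides with A on bit 0
reg.lock(lock_a)
reg.lock(lock_b)
reg.unlock(lock_a)
print(bin(reg.bits))  # bit 0 stays set because lock B still uses it
```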

Candidate Set and LState Communications

Must broadcast changes to C(v) if the cache line is in shared state

Handling Barriers

Set BFVectors to all 1s after exiting a barrier

(what if t2 does not hold any lock?)
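
A toy Python illustration of why the reset matters. The function exit_barrier, the line address and the lock signatures (0x1111, 0x2222) are made up for the example.

```python
ALL_ONES = 0xFFFF  # 16-bit BFVector with every bit set


def exit_barrier(bfvectors):
    """Reset every cached line's candidate set after a barrier."""
    return {line: ALL_ONES for line in bfvectors}


# Before the barrier, line 0x40 was accessed under lock signature 0x1111.
bfvectors = {0x40: ALL_ONES & 0x1111}
bfvectors = exit_barrier(bfvectors)
# After the barrier, a different lock (signature 0x2222) protects the line.
bfvectors[0x40] &= 0x2222
print(hex(bfvectors[0x40]))  # 0x2222

# Without the reset the two signatures would intersect to 0, a false alarm.
print(hex(0x1111 & 0x2222))  # 0x0
```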

Three Approximations

Bloom filter to represent lockset
Lockset info only in cache
– Can only detect races in a short window of execution
Cache line granularity
– False sharing
– Compiler to put shared variables on different lines?
– Removing false sharing is generally good

Outline

LockSet (refresh our memory)
HARD
Evaluation
Conclusion

Methodology

SESC: cycle-accurate execution-driven simulator (MIPS instruction set)

Six SPLASH-2 benchmarks
Randomly inject a data race: randomly remove a dynamic instance of lock and the corresponding unlock
Compare with happens-before, ideal lockset

Bug detected, false alarms

Ideal: word granularity, keep state in memory, perfect lockset
# of false alarms is # of source code locations; dynamic errors are far more numerous

Mainly bus traffic increase
Note that HARD requires a bloom filter operation per memory access in the processor pipeline

Conclusion

Main idea: bloom filter to represent lockset
Three approximations:
– Bloom filter to represent lockset
– Lockset info only in cache
– Cache line granularity
Problems:
– Lockset: false positives
– Seems hard to add operations into processor pipeline
– Are these the right approximations for monitoring production runs?