Light64: Lightweight Hardware Support for Data Race Detection during Systematic Testing of Parallel Programs

A. Nistor, D. Marinov and J. Torrellas

To appear in MICRO’09

LBA reading group – 09/29/09

(by Evangelos)

Introduction – Context

• Debugging of parallel applications

• Even for 1 input, too many interleavings

• Systematic Testing: execute many times, explore all interleavings

• Assumptions: input provided; thread interleaving is the only cause of non-determinism

• Goal: Hardware support for data race detection under Systematic Testing

Background of Systematic Testing

• Serializing of threads (multiplexing)

• New scheduler implementation

• Happens-before definition

• Segment-based interleaving


State: represented by a Serial Log; ordered list of segments

Light64 – The Idea

“Two different thread interleavings that have the same happens-before graph but a flipped data race, will very likely have at least a small deviation in the execution history”

Corner cases?

• No false positives; few false negatives

• Systematic tester environment highly deterministic

• Extremely improbable for two different streams of values to generate the same hash

• Cannot identify benign races; races on data that will never be consumed

• By construction…
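The idea can be illustrated with a minimal software sketch. It assumes the execution history is the stream of values each thread observes and that the hash is a running CRC32; the two-thread program, the `run` helper and the schedules are hypothetical and only stand in for the hardware mechanism.

```python
import zlib

def run(schedule):
    """Run a tiny two-thread program under a fixed schedule and return
    one running CRC32 per thread (hypothetical software model)."""
    shared = {"x": 0}
    hashes = {0: 0, 1: 0}

    def record(tid, value):
        # Fold each observed value into the thread's running CRC.
        data = value.to_bytes(8, "little", signed=True)
        hashes[tid] = zlib.crc32(data, hashes[tid])

    for tid in schedule:
        if tid == 0:
            shared["x"] = 42        # thread 0: unsynchronized write
            record(0, 42)
        else:
            record(1, shared["x"])  # thread 1: unsynchronized read
    return hashes

first = run([0, 1])    # write before read
second = run([1, 0])   # no synchronization, so same happens-before; race flipped
race_suspected = first != second
```

Here thread 1 reads 42 in one interleaving and 0 in the other, so its hash deviates and the race is flagged. If the racing read returned the same value in both orders, the hashes would match, which corresponds to the missed benign-race case on the slide.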

Design

• Small hardware modifications

• CRC logic at the head of ROB

• ISA extensions; start/stop – save/load hash history

• Two modes of execution: Passive Mode, Active Mode

• Tradeoff between accuracy and performance
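A rough software model of the hash register and the four operations named above, for illustration only. The class and method names are assumptions (the slide does not give the instruction mnemonics), and `zlib.crc32` stands in for the CRC logic at the head of the ROB.

```python
import zlib

class HistoryHashRegister:
    """Hypothetical model of a per-core execution-history hash register."""

    def __init__(self):
        self.value = 0
        self.enabled = False

    def start(self):
        # Begin folding retired values into the hash.
        self.enabled = True

    def stop(self):
        # Stop hashing, e.g. while tester code runs.
        self.enabled = False

    def save(self):
        # Read the hash out, e.g. at the end of a segment.
        return self.value

    def load(self, saved):
        # Restore a previously saved hash when a thread resumes.
        self.value = saved

    def retire(self, data: bytes):
        # Called for each retired value; ignored while hashing is stopped.
        if self.enabled:
            self.value = zlib.crc32(data, self.value)
```

Because the CRC is a running value, saving it when a thread is switched out and loading it when the thread resumes gives the same result as hashing that thread's history without interruption.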

Passive Mode

• During step 4

• Augment each state with the Execution History Hash

• Check if executions with same happens-before have the same hash value (e.g., S2 & S11)

• No guarantees on coverage: depends on the systematic tester’s exploration strategy and pruning heuristics

• No practical overhead
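The passive check can be sketched as a lookup keyed by happens-before graph. This is an assumption-laden illustration: `hb_key` stands for any canonical encoding of a state's happens-before graph, and the state names and hash values in the usage lines are invented, not results from the paper.

```python
from collections import defaultdict

def passive_check(states):
    """states: iterable of (state_id, hb_key, history_hash).
    Returns pairs of states with equal happens-before but different hashes."""
    seen = defaultdict(dict)  # hb_key -> {history_hash: first state_id seen}
    reports = []
    for state_id, hb_key, history_hash in states:
        by_hash = seen[hb_key]
        if history_hash not in by_hash:
            # Same happens-before, new hash: report against earlier states.
            for earlier in by_hash.values():
                reports.append((earlier, state_id))
            by_hash[history_hash] = state_id
    return reports

# Invented example values for illustration only.
explored = [("S2", "hbA", 0x1111), ("S7", "hbB", 0x2222), ("S11", "hbA", 0x3333)]
suspects = passive_check(explored)
```

The check only compares states the tester happens to visit, which is why coverage depends on the exploration strategy.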

Active Mode

• During step 2

• While re-executing to reach the selected state ‘S’, flip as many segments as possible

• Compare Execution History Hash against original execution

• Heuristic 1 – efficient segment reordering: Smallest-ID Thread first during first run; Biggest-ID Thread first during re-execution

• Heuristic 2 – additional re-executions to increase coverage: ActiveFIN – re-execute all final states; ActiveFULL – re-execute all states
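Heuristic 1 can be sketched as a serializer that always respects happens-before edges and otherwise picks the smallest or biggest ready thread ID. This is a simplified model under stated assumptions: the `serialize` function, the segment names and the `deps` encoding of happens-before are hypothetical, and scheduling is done one segment at a time.

```python
def serialize(segments, deps, biggest_first=False):
    """segments: {tid: [segment names in program order]}
    deps: {segment: set of segments that must happen before it}
    Returns a serial log (ordered list of segments)."""
    next_idx = {tid: 0 for tid in segments}
    done, log = set(), []
    total = sum(len(segs) for segs in segments.values())
    while len(log) < total:
        # A thread is ready if its next segment's predecessors have all run.
        ready = [tid for tid, segs in segments.items()
                 if next_idx[tid] < len(segs)
                 and deps.get(segs[next_idx[tid]], set()) <= done]
        if not ready:
            raise ValueError("unsatisfiable happens-before edges")
        tid = max(ready) if biggest_first else min(ready)
        seg = segments[tid][next_idx[tid]]
        next_idx[tid] += 1
        done.add(seg)
        log.append(seg)
    return log

segments = {0: ["a0", "a1"], 1: ["b0", "b1"]}
deps = {"b1": {"a0"}}                                      # a0 happens before b1
first_run = serialize(segments, deps)                      # smallest ID first
re_execution = serialize(segments, deps, biggest_first=True)
```

In this example the first run yields a0, a1, b0, b1 and the re-execution yields b0, a0, b1, a1: every unordered pair of segments is flipped while a0 still precedes b1. Comparing the history hashes of the two runs would then expose a race between the flipped segments.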

Experimental Setup

• Used Pin to model a system running a systematic tester

• Instruction count as a performance metric

• SPLASH-2 benchmarks (modified & unmodified)

• 6 versions of a system: Plain, Plain+RD, ActiveNO, ActiveFIN, ActiveFULL, Passive

State Space Characterization

Race Detection Capability

Runtime Overhead

Runtime Overhead – Software-based

Conclusions

• Lightweight support for data race detection in a Systematic Tester world

• Relatively low overhead for S.T.

• Not a conventional MICRO paper
