Volition: Scalable and Precise Sequential Consistency Violation Detection
Xuehai Qian, Benjamin Sahelices, Josep Torrellas, Depei Qian
University of Illinois, http://iacoma.cs.uiuc.edu/

Slides: iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos13_3.pdf

Sequential Consistency (SC)

• In SC, memory accesses:

• Appear atomic

• Have a total global order

• For each thread, follow program order

P0: A0: x=1; A1: y=1
P1: B0: p=y; B1: t=x

Total Order: A0A1B0B1, A0B0A1B1, A0B0B1A1, B0B1A0A1, B0A0B1A1, B0A0A1B1
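The six total orders listed on this slide are exactly the ways to merge the two threads while keeping each thread's program order. A minimal sketch that enumerates them follows; the function name `sc_total_orders` is a hypothetical helper, not something from the slides.

```python
from itertools import combinations

def sc_total_orders(p0, p1):
    """Return every merge of p0 and p1 that preserves each thread's program order."""
    n = len(p0) + len(p1)
    orders = []
    # Choose which positions of the total order are taken by p0's accesses.
    for slots in combinations(range(n), len(p0)):
        it0, it1 = iter(p0), iter(p1)
        orders.append([next(it0) if i in slots else next(it1) for i in range(n)])
    return orders

orders = sc_total_orders(["A0", "A1"], ["B0", "B1"])
for order in orders:
    print("".join(order))
```

For two threads of two accesses each, this yields the six orders above; an order such as A1B0B1A0 never appears because it breaks P0's program order.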

Sequential Consistency Violation (SCV)

• SCV: a reordering of accesses that does not conform to SC

• Machines support relaxed memory models, not SC

• Machines may induce SC violations (SCV)

P0: A0: x=1; A1: y=1
P1: B0: p=y; B1: t=x
Initially: x=y=0

Total Order: A0A1B0B1, A0B0A1B1, A0B0B1A1, B0B1A0A1, B0A0B1A1, B0A0A1B1
Reordered: A1B0B1A0, which gives (p=1), (t=0)
Outcomes (p,t) shown: (1,0), (1,1), (0,0), (0,1)

SCV can cause very unintuitive results (bugs)
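To see why (p,t) = (1,0) is the surprising outcome, the example can be replayed over every total order. The sketch below is a simplification: it models the relaxed machine's reordering as a total order that breaks program order, and the names `PROGRAM`, `run`, and `keeps_program_order` are hypothetical.

```python
from itertools import permutations

# The four accesses from the slide; memory starts with x = y = 0.
PROGRAM = {
    "A0": ("store", "x", 1),
    "A1": ("store", "y", 1),
    "B0": ("load", "y", "p"),
    "B1": ("load", "x", "t"),
}

def run(order):
    """Execute the accesses in the given total order and return (p, t)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for name in order:
        kind, addr, arg = PROGRAM[name]
        if kind == "store":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["p"], regs["t"]

def keeps_program_order(order):
    """True if A0 precedes A1 and B0 precedes B1."""
    return (order.index("A0") < order.index("A1")
            and order.index("B0") < order.index("B1"))

sc_outcomes = {run(o) for o in permutations(PROGRAM) if keeps_program_order(o)}
print(sorted(sc_outcomes))              # [(0, 0), (0, 1), (1, 1)]
print(run(("A1", "B0", "B1", "A0")))    # (1, 0)
```

No order that respects program order produces (1,0); it appears only when A1 is performed ahead of A0.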

Why Is Detecting SCVs Important?

• Programmers assume SC

• SCV is almost always a bug: unexpected interleaving

• Single-stepping debuggers cannot reproduce the bug

• Causes portability problems (e.g., Intel TBB)

• Code may not work across machines

• Lock-free data structures sometimes explicitly use races but rely on SC

• Traditional data race detectors won’t work

Contribution

• Volition: first hardware scheme that detects SCVs in a relaxed-consistency machine, such that

• It is precise

• It is scalable

• Works for an arbitrary number of participating processors

• Leverages coherence protocol transactions to detect cycles in memory access orders across threads

• Incurs negligible overhead and is decoupled from the cache coherence protocol

Outline

• Motivation

• Volition Mechanisms

• Evaluation

Understanding SCV

• Two or more data races overlap and form a cycle

• Problem: detecting such cycles without affecting the execution timing

• Design Goals:

• Precise

• Scalable

• Low overhead

[Figure: examples on two processors (P0, P1) and on three processors (P0, P1, P2); accesses shown: wr y, rd x, wr x, rd y; legend: Not completed, Completed]
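The idea that an SCV is a cycle formed by data-race dependences and program order can be sketched in software as a graph search. This is only an illustration of the concept, not Volition's hardware mechanism; the access names mirror the figure, but their placement on P0 and P1 and the direction of the dependences are assumptions.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            # A grey successor is on the current DFS path, so we closed a cycle.
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)

# Assumed layout: P0 runs wr y then rd x; P1 runs wr x then rd y.
program_order = [("P0:wr y", "P0:rd x"), ("P1:wr x", "P1:rd y")]
# Assumed dependences: each read is performed before the remote write to the same word.
races = [("P0:rd x", "P1:wr x"), ("P1:rd y", "P0:wr y")]

print(has_cycle(program_order + races))       # True
print(has_cycle(program_order + races[:1]))   # False
```

With both races present the edges close a loop through both processors; with only one race there is an ordinary data race but no SC violation.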

Definition: Active Accesses

• Active Accesses are the accesses that may participate in SCV

• Active Access:

• Either the access itself or an older local access has not completed, OR

• It is the destination of a dependence from an active access

• Volition provides metadata hardware:

• For all accesses: Sequence Number (SN)

• For active accesses: current state (address, completion status, ...)

[Figure: P0 and P1 with accesses wr x, rd y, wr y, rd x; legend: Not completed, Completed]
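The two-part definition of an active access can be read as a small fixed-point computation. The sketch below follows the two conditions literally; the `Access` record, the `active_accesses` function, and the example completion states are hypothetical and not taken from the figure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    proc: int
    sn: int           # sequence number, increasing in program order
    name: str
    completed: bool

def active_accesses(program, deps):
    """program: {proc: [Access, ...] in program order}; deps: [(src, dst), ...]."""
    active = set()
    # Condition 1: the access itself or an older local access has not completed.
    for accesses in program.values():
        pending = False
        for acc in accesses:
            pending = pending or not acc.completed
            if pending:
                active.add(acc)
    # Condition 2: destination of a dependence from an active access (iterate to a fixed point).
    changed = True
    while changed:
        changed = False
        for src, dst in deps:
            if src in active and dst not in active:
                active.add(dst)
                changed = True
    return active

wr_x = Access(0, 1, "wr x", completed=False)
rd_y = Access(0, 2, "rd y", completed=True)
wr_y = Access(1, 1, "wr y", completed=True)
rd_x = Access(1, 2, "rd x", completed=True)
program = {0: [wr_x, rd_y], 1: [wr_y, rd_x]}
active = active_accesses(program, deps=[(rd_y, wr_y)])
print(sorted(acc.name for acc in active))   # ['rd y', 'wr x', 'wr y']
```

Here rd y is active only because the older wr x is still pending, and wr y becomes active only because it is the destination of a dependence from rd y.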

Definition: Active Data Race (AR)

• Data race where the source (and therefore the destination) is active

• SCVs are composed of two or more ARs forming a cycle

• Volition detects ARs and cycles in hardware dynamically

[Figure: ARs forming cycles on two processors (P0, P1) and on three processors (P0, P1, P2); accesses shown: wr y, rd x, wr x, rd y]

How Does Volition Detect an SCV?

• All active accesses in a processor: recorded in an ACtive Table (ACT)

• If active access is the source of a dependence

• Record AR in AR Source Table (ARST)

• Respond with a bit in the coherence response

• Destination:

• Record the AR in the AR Destination Table (ARDT)

• If source becomes inactive

• Remove the local ARST entry

• Send Expire msg to the destination, which removes its ARDT entry

• SCV: when a processor finds two ARs forming a cycle (i.e., SNsrc > SNdst)

[Figure, reconstructed from the flattened text]
P0 accesses: wr y, rd x        P1 accesses: wr x, rd y
P0 ACT (SN): 94, 100           P1 ACT (SN): 90, 98
P0 ARST: 100➙90                P1 ARST: 98➙94
P0 ARDT: 98➙94                 P1 ARDT: 100➙90
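The bookkeeping on this slide can be mimicked with two small per-processor lists. The sketch below is one reading of the "SNsrc > SNdst" condition for the two-processor case, applied at both ends of the pair of ARs; the class and function names are hypothetical, and the sequence numbers are the ones shown in the figure.

```python
class Processor:
    def __init__(self, pid):
        self.pid = pid
        self.arst = []   # ARs whose source access is local
        self.ardt = []   # ARs whose destination access is local

def record_ar(src_proc, src_sn, dst_proc, dst_sn):
    """Source records the AR in its ARST; destination records it in its ARDT."""
    ar = (src_proc.pid, src_sn, dst_proc.pid, dst_sn)
    src_proc.arst.append(ar)
    dst_proc.ardt.append(ar)

def expire(src_proc, src_sn, dst_proc):
    """Source became inactive: drop the ARST entry and, via Expire, the ARDT entry."""
    src_proc.arst = [ar for ar in src_proc.arst if ar[1] != src_sn]
    dst_proc.ardt = [ar for ar in dst_proc.ardt
                     if not (ar[0] == src_proc.pid and ar[1] == src_sn)]

def finds_scv(proc):
    """Two ARs with the same peer form a cycle if each source is younger than the
    destination of the other AR on the same processor (assumed interpretation)."""
    for (_, out_src, peer, out_dst) in proc.arst:
        for (in_peer, in_src, _, in_dst) in proc.ardt:
            if in_peer == peer and out_src > in_dst and in_src > out_dst:
                return True
    return False

p0, p1 = Processor(0), Processor(1)
record_ar(p0, 100, p1, 90)      # AR 100 -> 90
first = finds_scv(p0)           # only one AR so far
record_ar(p1, 98, p0, 94)       # AR 98 -> 94
second = finds_scv(p0)
expire(p0, 100, p1)             # access 100 becomes inactive
third = finds_scv(p0)
print(first, second, third)     # False True False
```

At P0 the outgoing AR leaves from SN 100 while the incoming AR lands on the older SN 94, and the mirror-image condition holds at P1 (98 > 90), so the pair closes a cycle.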

Active Table (ACT)

• FIFO table of all active accesses

• When to insert an ACT entry?

• Access is issued (program order)

• When to delete an ACT entry?

• Access is at the head of ACT and completed, AND

• Access is not the destination of an AR

• HW keeps deleting entries until the conditions don’t hold

• After deleting an entry: if it is the source of an AR

• Remove the related entry in ARST

• Send Expire msg

• Destination removes the ARDT entry when the Expire msg is received

[Figure, reconstructed from the flattened text]
P0 accesses: wr y, rd x        P1 accesses: wr x
P0 ACT (SN, Addr, PC, ...): (94, y), (100, x)
P1 ACT (SN, Addr, PC, ...): (90, x)
P0 ARST: 100➙90
P1 ARDT: 100➙90
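The insertion and deletion rules for the ACT amount to a FIFO that retires from the head. A minimal software sketch follows; the `ActiveTable` class, its field names, and the `expired` list (standing in for "remove the ARST entry and send Expire") are assumptions made for illustration.

```python
from collections import deque

class ActiveTable:
    def __init__(self):
        self.entries = deque()   # oldest access at the left
        self.expired = []        # SNs whose ARs would be expired after deletion

    def insert(self, sn, addr):
        """Called when an access is issued, in program order."""
        self.entries.append({"sn": sn, "addr": addr, "completed": False,
                             "is_ar_src": False, "is_ar_dst": False})

    def mark(self, sn, **flags):
        """Update flags such as completed=True or is_ar_src=True for one entry."""
        for entry in self.entries:
            if entry["sn"] == sn:
                entry.update(flags)

    def retire(self):
        """Delete head entries while the head is completed and not an AR destination."""
        while self.entries:
            head = self.entries[0]
            if not head["completed"] or head["is_ar_dst"]:
                break
            self.entries.popleft()
            if head["is_ar_src"]:
                self.expired.append(head["sn"])

act = ActiveTable()
act.insert(94, "y")
act.insert(100, "x")
act.mark(100, completed=True, is_ar_src=True)   # source of AR 100 -> 90
act.retire()
before = [e["sn"] for e in act.entries]         # head 94 is still pending
act.mark(94, completed=True)
act.retire()
after = [e["sn"] for e in act.entries]
print(before, after, act.expired)               # [94, 100] [] [100]
```

Entry 100 cannot leave while the older entry 94 is pending, which matches the rule that an access stays active as long as an older local access has not completed.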

Finding Cycles with any # of Processors

[Figure: processors Pi, Pj, Pk with sequence numbers SNi, SNjd, SNjs, SNk; two ARs and the resulting ARtrans]

• AR Propagation: Volition creates a transitive dependence out of two dependences

• ARtrans: from the src of one AR to the dst of the other AR

• Detected by the processor in the middle

• Only if program order connects these two ARs
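A single propagation step at the middle processor can be sketched as a function over two AR records. The tuple layout `(src_proc, src_sn, dst_proc, dst_sn)` and the name `propagate` are hypothetical; the sketch also assumes that "program order connects" means the destination of the incoming AR has a smaller SN than the source of the outgoing AR, and it leaves the equal-SN case out.

```python
def propagate(ar_in, ar_out, middle):
    """Return the transitive AR created at processor `middle`, or None."""
    in_src_proc, in_src_sn, in_dst_proc, in_dst_sn = ar_in
    out_src_proc, out_src_sn, out_dst_proc, out_dst_sn = ar_out
    # Both ARs must meet at the middle processor.
    if in_dst_proc != middle or out_src_proc != middle:
        return None
    # Program order must lead from the incoming destination to the outgoing source.
    if not in_dst_sn < out_src_sn:
        return None
    # ARtrans: from the source of the first AR to the destination of the second.
    return (in_src_proc, in_src_sn, out_dst_proc, out_dst_sn)

# Hypothetical sequence numbers; processors 0, 1, 2 play the roles of Pi, Pj, Pk.
connected = propagate((0, 10, 1, 20), (1, 25, 2, 30), middle=1)
unconnected = propagate((0, 10, 1, 20), (1, 15, 2, 30), middle=1)
print(connected, unconnected)   # (0, 10, 2, 30) None
```

In the second call the outgoing AR leaves from an access older than the incoming destination, so program order does not link the two ARs and nothing is propagated.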


SCV Detection with Arbitrary # of Processors

• In each processor: Volition HW keeps generating transitive dependences

• When a processor detects a local dependence from younger to older access ➙ SCV

[Figure: three panels over processors Pi, Pj, Pk with accesses numbered 1 to 6, showing ARs being propagated until an SCV is flagged; legend: AR Propagation, AR detected or propagated so far]
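Repeating the propagation step until nothing new appears gives a software analogue of detection across any number of processors. This is a centralized sketch, whereas the slides describe each processor doing the work locally in hardware; the tuple layout `(src_proc, src_sn, dst_proc, dst_sn)`, the function name `detect_scv`, and the sequence numbers in the example are all hypothetical.

```python
def detect_scv(ars):
    """Propagate ARs transitively; return a local younger-to-older dependence, or None."""
    known = set(ars)
    while True:
        # A dependence whose endpoints are on one processor, from younger to older, is an SCV.
        for (sp, ss, dp, ds) in known:
            if sp == dp and ss > ds:
                return (sp, ss, dp, ds)
        new = set()
        for a in known:
            for b in known:
                # a arrives at a processor, b leaves the same processor later in program order.
                if a[2] == b[0] and a[3] < b[1]:
                    trans = (a[0], a[1], b[2], b[3])
                    if trans not in known:
                        new.add(trans)
        if not new:
            return None
        known |= new

# Three ARs that close a cycle through processors 0, 1 and 2.
ring = {(0, 6, 1, 1), (1, 2, 2, 3), (2, 4, 0, 5)}
# Same ARs, except the last one lands on an access younger than SN 6 on processor 0.
chain = {(0, 6, 1, 1), (1, 2, 2, 3), (2, 4, 0, 7)}

hit = detect_scv(ring)
miss = detect_scv(chain)
print(hit is not None, miss)    # True None
```

Which processor reports the violation depends on iteration order here; the point is only that some processor ends up holding a dependence from a younger local access to an older one.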

Multi-word Cache Line Issue

[Figure: P0 and P1; words a and b in the same cache line; accesses shown: Wc, Rb, Wa, Rc, Wb]

• Wa does not access the same word as Rb ➙ HW does not record an AR

• Wb does not cause a coherence transaction ➙ HW does not record an AR

• Result: missed AR and cycle

Support for Multi-word Cache Line

• For lines that are being actively shared

• Augment line with Summary of Active Information (SAI)

• Contains current information on system-wide active accesses to any of the words

• Indicates whether a future access needs to record an AR

[Figure: cache line with words a and b and per-word metadata SAIa and SAIb; entries shown: R3, R6, W1]

• When a word is accessed and AR-recording is necessary:

• If a cache coherence transaction occurs ➙ reuse the transaction to record the AR

• Else ➙ create a “metadata access” transaction to record the AR

• SAI info cleared when an access becomes inactive
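The decision made on each word access can be sketched as a lookup in per-word summary information. The representation below, a set of `(processor, is_write)` pairs per word, is an assumption, as is the conflict test (a different processor and at least one write); the slides do not specify the SAI encoding. The function names are hypothetical.

```python
def on_access(sai, word, proc, is_write, has_coherence_transaction):
    """sai: {word: {(proc, is_write), ...}} for active accesses. Returns the action taken."""
    conflicts = [
        (p, w) for (p, w) in sai.get(word, set())
        if p != proc and (w or is_write)
    ]
    # Remember this access as active for later accesses to the same word.
    sai.setdefault(word, set()).add((proc, is_write))
    if not conflicts:
        return "no AR"
    if has_coherence_transaction:
        return "record AR on coherence transaction"
    return "record AR via metadata access"

def on_inactive(sai, word, proc, is_write):
    """Clear the SAI information when the access becomes inactive."""
    sai.get(word, set()).discard((proc, is_write))

# Assumed scenario: P1 has an active read of word b; P0 then writes a and b.
sai = {"b": {(1, False)}}
wa = on_access(sai, "a", proc=0, is_write=True, has_coherence_transaction=True)
wb = on_access(sai, "b", proc=0, is_write=True, has_coherence_transaction=False)
on_inactive(sai, "b", proc=1, is_write=False)
print(wa)   # no AR
print(wb)   # record AR via metadata access
```

The write to b is the case that line-granularity tracking would miss: it causes no coherence transaction, yet the per-word information shows an active remote read, so a metadata access records the AR.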

Summary of Evaluation

• Volition is able to detect SCVs in codes in which we removed fences

• Support for multi-word cache line metadata is needed to avoid missing SCVs

• Using data races as proxies for SCVs is insufficient

Conclusion

• Volition is the first scalable and precise SCV detector

• Works with a directory protocol

• Can detect SCVs involving an arbitrary number of processors

• Incurs negligible execution time and traffic overhead

• Needs modest-sized hardware structures
