SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs
Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric
School of Computing, University of Utah, Salt Lake City, UT 84112
Ignacio Laguna, Greg L. Lee, Dong H. Ahn
Lawrence Livermore National Laboratory, Livermore, CA
Presented at IPDPS 2018
See paper for details
Github.com/PRUNERS
Courtesy Pinterest
What is a data race?
Thread 1: W    Thread 2: R/W (same location; at least one access is a write)
No synchronization orders the two accesses
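The three conditions on this slide can be written down as a predicate. This is a minimal illustration, not Archer's or Sword's code: the `Access` record is a hypothetical layout, and the `ordered` flag stands in for whatever happens-before information a real checker has to compute.

```cpp
#include <cassert>
#include <cstdint>

// One dynamic memory access as a checker might log it (hypothetical layout).
struct Access {
  int thread;           // thread that performed the access
  std::uintptr_t addr;  // first byte touched
  std::uintptr_t size;  // number of bytes touched, at least 1
  bool is_write;
};

// Conflict: different threads, at least one common byte, at least one write.
bool conflicting(const Access& a, const Access& b) {
  assert(a.size > 0 && b.size > 0);
  bool overlap = a.addr < b.addr + b.size && b.addr < a.addr + a.size;
  return a.thread != b.thread && overlap && (a.is_write || b.is_write);
}

// A data race is a conflict that no synchronization orders. `ordered` stands
// in for the happens-before answer a real tool has to compute.
bool is_data_race(const Access& a, const Access& b, bool ordered) {
  return conflicting(a, b) && !ordered;
}
```

Two reads never race, and neither do two accesses by the same thread; the hard part for a tool is computing `ordered`.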
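One way to eliminate this race
T0: LOCK; W; UNLOCK    T1: LOCK; R/W; UNLOCK
Both accesses are guarded by the same lock, so one thread's UNLOCK orders its access before the other thread's LOCK.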
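In OpenMP, the lock discipline above can be expressed with a `critical` construct (an explicit `omp_lock_t` would also work). A minimal sketch; the function name `count_with_critical` is hypothetical.

```cpp
#include <cassert>

// Many iterations increment one shared counter. `#pragma omp critical` plays
// the role of LOCK/UNLOCK: only one thread at a time executes the update, so
// the updates are serialized and no two of them race.
// Build with OpenMP enabled (e.g. -fopenmp); without it the pragmas are
// ignored and the loop simply runs sequentially.
int count_with_critical(int n) {
  assert(n >= 0);
  int x = 0;  // shared by all threads of the parallel region
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    #pragma omp critical
    x += 1;   // protected read-modify-write
  }
  return x;
}
```

Dropping the `critical` line turns `x += 1` back into the racy W vs. R/W pattern from the previous slide.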
Another way to eliminate this race
T0: W, then RELEASE    T1: ACQUIRE, then R/W
Signal using ‘special’ variables:
• Java ‘volatile’ annotations (NOT C ‘volatiles’!)
• C++11 ‘atomic’ annotations
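A C++11 version of the release/acquire hand-off, sketched under the assumption that T0 and T1 are the two sections of an OpenMP `parallel sections` construct; `publish_and_consume` is a hypothetical name. The payload `data` is an ordinary `int`; only the flag is atomic.

```cpp
#include <atomic>
#include <cassert>

// T0 publishes `value` through a plain int; T1 reads it after seeing the flag.
int publish_and_consume(int value) {
  int data = 0;                    // ordinary, non-atomic payload
  std::atomic<bool> ready{false};  // the "special" signalling variable
  int observed = -1;
  #pragma omp parallel sections num_threads(2)
  {
    #pragma omp section
    {  // T0: write the payload, then RELEASE
      data = value;
      ready.store(true, std::memory_order_release);
    }
    #pragma omp section
    {  // T1: ACQUIRE, then read the payload
      while (!ready.load(std::memory_order_acquire)) {
        // spin until T0's release store becomes visible
      }
      observed = data;  // ordered after `data = value`: no data race
      assert(observed == value);
    }
  }
  return observed;
}
```

Without OpenMP enabled the two sections run one after the other in source order (producer first), so the spin loop exits immediately and the result is the same.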
A third way
T0: W    T1: R/W
Put a barrier between the two accesses: everything before the barrier happens-before everything after it.
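A sketch of the barrier approach in OpenMP. The array contents and the name `neighbour_squares` are illustrative assumptions; the point is that the explicit barrier orders every thread's writes before every thread's reads.

```cpp
#include <cassert>
#include <vector>

// Phase 1 writes a[i]; phase 2 reads a neighbour's element a[(i + 1) % n],
// which another thread may have written. `nowait` removes the implicit
// barrier at the end of the first loop, so the explicit barrier is what
// orders every phase-1 write before every phase-2 read.
std::vector<int> neighbour_squares(int n) {
  assert(n > 0);
  std::vector<int> a(n), b(n);
  #pragma omp parallel
  {
    #pragma omp for nowait
    for (int i = 0; i < n; ++i) a[i] = i * i;            // W
    #pragma omp barrier
    #pragma omp for
    for (int i = 0; i < n; ++i) b[i] = a[(i + 1) % n];   // R
  }
  return b;
}
```

Removing the `#pragma omp barrier` line (while keeping `nowait`) reintroduces a race on `a`.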
Why eliminate races?
Popular answer: “avoid nondeterminism”
T0: X = 0    T1: t = X
Unclear what “nondeterminism” means…
Execution Order is Still Nondeterministic
T0: LOCK; X = 0; UNLOCK    T1: LOCK; t = X; UNLOCK
The race is gone, yet either critical section may run first, so t may or may not see 0.
More relevant: Avoid “pink elephants” !
Pink elephant (Sutter): “A value you never wrote but managed to read”
a.k.a. an “out-of-thin-air” value
The birth of a pink elephant…
As written:                    T0: X = 0     T1: t = X    (t is 0 here)
After compiler optimizations:  T0: X = 24    T1: t = X    (24 read here)
You may never have written “24” in your program.
Details of how a pink elephant is made!
As written:                    T0: X = 0; Y = 23; X = Y + 1    T1: t = X
After compiler optimizations:  T0: Y = 23; X = 24              T1: t = X    (24 read here)
The compiler has NO IDEA that the user meant to communicate here!!
Compiler optimizations create these pink-elephant values…
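Why this rewrite is allowed: each thread is optimized as if it ran alone. The hypothetical pair of functions below (names are illustrative) spells out T0 before and after optimization. From T0's own point of view both end in the same state, which, between synchronization points, is all a compiler has to preserve for ordinary, non-atomic variables.

```cpp
#include <cassert>

// Shared state from the slide; plain ints, so the compiler assumes no other
// thread observes them between T0's statements.
struct State { int X; int Y; };

// T0 as written in the source.
void t0_source(State& s) {
  s.X = 0;          // the store the programmer meant T1 to see
  s.Y = 23;
  s.X = s.Y + 1;
}

// T0 as an optimizer may emit it: `X = 0` is a dead store (overwritten
// before any read in this thread), and `Y + 1` is constant-folded to 24.
void t0_optimized(State& s) {
  s.Y = 23;
  s.X = 24;
}
```

A racing reader in T1 can tell these two apart (it may see 0 only in the first); T0 alone cannot, so the compiler is free to pick either.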
This is why code containing data races often fails (only) when optimized!
Race-freedom ensures intended communications
T0: W    T1: R/W (properly synchronized)
• You don’t observe “half-baked” values
• Code is not reordered around sync points
• No “word tearing”
• Pending writes are flushed (fences inserted)
Exploding a myth!
There is no such thing as a benign race!!
Races in OpenMP programs are hard to spot
• See tinyurl.com/ompRaces if you wish (but later!)
• Static analysis tools have never been shown to work well
• First usable OpenMP dynamic race checker (afaik)
  • Archer [Atzeni, IPDPS’16]
  • More on that soon
• This talk presents the second usable dynamic race checker
  • Sword
This talk: Why and how of another OMP race checker
The Pink Elephant Actually Struck Us!
• HYDRA porting on Sequoia at LLNL
  • Large multiphysics MPI/OpenMP application
  • Non-deterministic crashes in an OpenMP region
  • Only when the code was optimized!
  • Suspected data race
• Emergency hack:
  • Disabled OpenMP in Hypre
• Root cause found by Archer: two threads writing 0 to a common location without synchronization
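The talk does not show the Hypre code. The loop below is a hypothetical reconstruction of the reported pattern only (several threads storing the same value 0 into one shared location), together with one possible fix; all names are invented for illustration.

```cpp
#include <cassert>

// HYPOTHETICAL reconstruction of the reported pattern (not the Hypre source):
// many threads store the same value 0 into one shared flag. Identical values
// do not make this safe: unsynchronized concurrent writes are a data race,
// whose behaviour C++ leaves undefined and OpenMP leaves unspecified.
int clear_flag_racy(int n) {
  int flag = 1;
  #pragma omp parallel for
  for (int i = 0; i < n; ++i)
    flag = 0;  // racy as soon as two threads run iterations
  return flag;
}

// One possible fix: turn each store into an OpenMP atomic write.
int clear_flag_atomic(int n) {
  assert(n > 0);
  int flag = 1;
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    #pragma omp atomic write
    flag = 0;
  }
  return flag;
}
```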
Archer to the rescue!
• Archer [IPDPS’16]
  • Utah: Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric
  • LLNL: Dong H. Ahn, Ignacio Laguna, Martin Schulz, Gregory L. Lee
  • RWTH: Joachim Protze, Matthias S. Muller
  • In production use at LLNL
• Part of the “PRUNERS” tool suite
• PRUNERS was a finalist for the 2017 R&D 100 Awards
Archer’s “find”
Two threads writing 0 to the same location without synchronization
Did we live “happily ever after”?
No!
Archer has “memory-outs”; also misses races
• Archer increases memory usage by 500%
• It also misses races!
• These were known issues
  • They finally surfaced with the “right large example”
Reason: Archer employs “shadow cells”
(Figure: application locations A0, A1, …, Amax, accessed from Cores 0-3; each location has its own shadow cells ss0-ss3.)
A programmable number of cells per address (4 shown; this is typical)
~4 shadow cells per application location
Shadow cells immediately increase memory demand by a factor of four.
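A back-of-the-envelope check of the factor of four, assuming (as in ThreadSanitizer, on which Archer builds) that every w = 8-byte application word gets k = 4 shadow cells of c = 8 bytes each; M is the application's memory footprint in bytes:

```latex
\text{shadow}(M) \;=\; \frac{M}{w}\cdot k\cdot c
\;=\; \frac{M}{8}\cdot 4\cdot 8 \;=\; 4M,
\qquad w = 8,\; k = 4,\; c = 8 \ \text{(bytes)}
```

For example, 10 GB of application data would need about 40 GB of shadow memory in addition to the data itself.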
Archer misses races due to shadow cell eviction
Scenario: all threads read A[3]; Thread 3 writes A[3].
Capacity conflict → evict a shadow cell
With a shadow cell evicted, races are missed.
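A toy model of the eviction problem, using a variant of the slide's scenario. It assumes a simplified shadow word: a fixed number of cells per location, round-robin eviction (ThreadSanitizer's actual replacement policy differs), and no happens-before tracking, so every cross-thread pair involving a write counts as a race. `ShadowWord` and `races_reported` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy shadow word: at most `max_cells` remembered accesses per application
// location, oldest-first (round-robin) eviction, no happens-before tracking.
struct ShadowCell { int tid; bool is_write; };

class ShadowWord {
 public:
  explicit ShadowWord(std::size_t max_cells) : max_cells_(max_cells) {
    assert(max_cells > 0);
  }

  // Checks the access against the surviving cells, records it, and returns
  // how many stored accesses it races with.
  int access(int tid, bool is_write) {
    int races = 0;
    for (const ShadowCell& c : cells_)
      if (c.tid != tid && (c.is_write || is_write)) ++races;
    if (cells_.size() < max_cells_) {
      cells_.push_back({tid, is_write});
    } else {  // capacity conflict: overwrite (evict) the oldest slot
      cells_[next_victim_] = {tid, is_write};
      next_victim_ = (next_victim_ + 1) % max_cells_;
    }
    return races;
  }

 private:
  std::size_t max_cells_;
  std::size_t next_victim_ = 0;
  std::vector<ShadowCell> cells_;
};

// Threads 0..readers-1 each read A[3]; thread `readers` then writes A[3].
// Returns the number of races reported for that final write.
int races_reported(std::size_t max_cells, int readers) {
  ShadowWord a3(max_cells);
  for (int t = 0; t < readers; ++t) a3.access(t, /*is_write=*/false);
  return a3.access(readers, /*is_write=*/true);
}
```

With 4 cells and 8 reader threads, the write is checked only against the 4 surviving reads, so `races_reported(4, 8)` returns 4, while a shadow with room for every read (64 cells here) reports all 8.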
Archer misses races due to HB-masking
(Figure annotations: “These are concurrent; there are two races here!” and “These races are missed in this interleaving!”)
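HB-masking in miniature: a happens-before detector judges one observed interleaving. In the sketch below (two threads, one lock, one shared variable x; all names hypothetical), each thread writes x outside the lock. If T0's unlock happens to precede T1's lock, vector clocks order the two writes and nothing is reported; if T1 takes the lock first, the same program yields a report.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>
#include <vector>

using VC = std::array<int, 2>;  // vector clock for exactly two threads (assumption)

enum class Op { Write, Acquire, Release };  // Write = unprotected write of x
struct Event { int tid; Op op; };

// Vector-clock happens-before check over ONE observed interleaving, for a
// single shared variable x and a single lock. Returns true if two writes of
// x by different threads are unordered.
bool hb_reports_race(const std::vector<Event>& trace) {
  std::array<VC, 2> clk{VC{1, 0}, VC{0, 1}};  // per-thread clocks
  VC lock_clk{0, 0};                          // clock published by the last release
  std::vector<std::pair<int, VC>> writes;     // (thread, its clock at the write)
  for (const Event& e : trace) {
    assert(e.tid == 0 || e.tid == 1);
    VC& c = clk[e.tid];
    switch (e.op) {
      case Op::Acquire:  // inherit everything the last releaser had seen
        for (int i = 0; i < 2; ++i) c[i] = std::max(c[i], lock_clk[i]);
        break;
      case Op::Release:  // publish, then start a new epoch
        lock_clk = c;
        ++c[e.tid];
        break;
      case Op::Write:
        for (const auto& w : writes) {
          // w happens-before this write iff its epoch is covered by c
          bool ordered = w.second[w.first] <= c[w.first];
          if (w.first != e.tid && !ordered) return true;
        }
        writes.push_back({e.tid, c});
        break;
    }
  }
  return false;
}
```

Both traces below are legal schedules of the same program (T0: write, lock, unlock; T1: lock, unlock, write): `{T0 Write, T0 Acquire, T0 Release, T1 Acquire, T1 Release, T1 Write}` masks the race, while `{T1 Acquire, T1 Release, T1 Write, T0 Write, T0 Acquire, T0 Release}` exposes it.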
Solution: Get rid of shadow cells!!
Need a new approach with an online/offline split
(Figure: Cores 0-3 each compress their own trace online; an offline analysis then consumes the traces and produces race reports.)
Details of the online phase
• Collect traces per core, uncoordinated
  • Trace-collection speeds increased; we use the OMPT tracing method
• Employ data compression to bring FULL traces out
  • Only a 2.5 MB compression buffer per thread (fits in L3 cache)
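A sketch of why the online phase has bounded memory, under stated assumptions: each thread owns a fixed-capacity buffer (the slide's 2.5 MB is modelled here as a capacity in events), flushes it when full, and never coordinates with other threads. The delta-plus-varint encoding is only a stand-in for a real compressor and records addresses only; `ThreadTrace` is a hypothetical name, not Sword's format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ThreadTrace {
 public:
  explicit ThreadTrace(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
    buf_.reserve(capacity);
  }

  // Called on every instrumented access by the owning thread only, so no
  // coordination with other threads is needed.
  void record(std::uint64_t addr) {
    buf_.push_back(addr);
    if (buf_.size() == capacity_) flush();
  }

  // Stand-in for real compression: zigzag-encode the delta between
  // consecutive addresses as an unsigned LEB128 varint. Strided accesses
  // give small deltas, hence short encodings. A real tool would write the
  // bytes to this thread's own file; here they stay in `out_` for inspection.
  void flush() {
    std::uint64_t prev = 0;
    for (std::uint64_t addr : buf_) {
      auto delta = static_cast<std::int64_t>(addr - prev);
      std::uint64_t z = (static_cast<std::uint64_t>(delta) << 1) ^
                        static_cast<std::uint64_t>(delta >> 63);
      do {
        auto byte = static_cast<std::uint8_t>(z & 0x7F);
        z >>= 7;
        out_.push_back(static_cast<std::uint8_t>(byte | (z != 0 ? 0x80 : 0)));
      } while (z != 0);
      prev = addr;
    }
    buf_.clear();
    ++flushes_;
  }

  std::size_t pending() const { return buf_.size(); }
  std::size_t flushes() const { return flushes_; }
  std::size_t encoded_bytes() const { return out_.size(); }

 private:
  std::size_t capacity_;
  std::vector<std::uint64_t> buf_;  // never holds more than capacity_ events
  std::vector<std::uint8_t> out_;   // stands in for the on-disk trace
  std::size_t flushes_ = 0;
};
```

Recording ten addresses with stride 4 into a 4-event buffer triggers two automatic flushes; after a final explicit flush, the 80 bytes of raw addresses encode to 13 bytes.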
Consequences for the offline phase
• We would have lost all the synchronization information
  • We only know what each thread is doing
• We must recover the concurrency structure
  • and, in the context of its happens-before order, detect races!
Offline synchronization recovery and analysis
(Figure: an execution trace replayed offline with OpSem [HIPS’18]. Each segment carries an offset-span label:
0 - [0,1]
1 - [0,1][0,2]    2 - [0,1][1,2]
3 - [0,1][0,2][0,2]    4 - [0,1][0,2][1,2]
5 - [0,1][1,2][0,2]    6 - [0,1][1,2][1,2]
7 - [0,1][2,2]
8 - [0,1][2,2][0,2]    9 - [0,1][2,2][1,2]
10 - [0,1][4,2]
11 - [0,1][3,2]
12 - [1,1]
Trace events: Barrier(1), Barrier(2), IBarrier(3) to IBarrier(7), a FOR-LOOP, read(x), write(x), read(y), write(y), and lock operations m_acq()/m_rel() and m_acq(M1)/m_rel(M1).
Races reported: R1: race on y; R2: race on y; R3: race on x.)
Offset-Span Labels: How we record concurrency (Mellor-Crummey, 1991)
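A simplified concurrency test on offset-span labels, written to agree with the labels in the figure above. It assumes the labeling rules the figure suggests (a fork appends [tid, n]; a barrier adds n to the last offset; a join drops the last pair and adds the parent's span to the new last offset) and ignores ordering from locks. It is a sketch, not Sword's or Mellor-Crummey's exact algorithm; `Pair`, `Label`, and `concurrent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One [offset, span] pair. Within a team of `span` threads, offset % span is
// the thread's index and offset / span is its barrier interval.
struct Pair { int offset; int span; };
using Label = std::vector<Pair>;

// True if two segments may run concurrently (lock ordering not modelled).
bool concurrent(const Label& a, const Label& b) {
  std::size_t k = 0;
  while (k < a.size() && k < b.size() &&
         a[k].offset == b[k].offset && a[k].span == b[k].span) ++k;
  if (k == a.size() || k == b.size()) return false;  // one is a prefix: ordered
  assert(a[k].span == b[k].span);                     // same team where they diverge
  int s = a[k].span;
  assert(s > 0);
  bool different_thread = a[k].offset % s != b[k].offset % s;
  bool same_interval = a[k].offset / s == b[k].offset / s;
  return different_thread && same_interval;
}
```

With the figure's labels, 3 - [0,1][0,2][0,2] and 5 - [0,1][1,2][0,2] diverge at [0,2] vs. [1,2] (different threads, same barrier interval), so they may race; 3 and 9 - [0,1][2,2][1,2] diverge at [0,2] vs. [2,2] (same thread, later interval), so they are ordered.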
Key state in OpSem: Maintain Barrier Intervals
(Same figure, with regions marked Barrier Interval 1, Barrier Interval 2, Barrier Interval 3, and Barrier Interval 5.)
Examples of Races Reported
(Same figure, highlighting Barrier Interval 3: a race within the same barrier interval.)
(Same figure, highlighting Barrier Intervals 2, 3, and 5: races across parallel regions.)
Good news
• Online analysis proved really good
• No memory pressure!!
Bad news
Offline analysis took a day to finish on “medium-sized” examples
Two Key Innovations Saved the Approach
• Self-balancing red-black interval trees
• On-the-fly generation of Integer Linear Programs
Reducing “a day” to “under a minute”
• Decompress, record strided accesses in self-balancing red-black interval trees
• Generate Integer Linear Programs on-the-fly, and check for overlaps
  • Handles bursts of accesses efficiently
OMP read/writes are bursty with strides!
• Each of these is a multi-word access
• Build Integer Linear Programs for each constant-stride interval
  • The ILP system encodes the byte addresses accessed in each “burst”
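One way to turn a raw per-thread access stream into burst records of the kind shown on these slides ([Begin, End], #Accesses, Kind, Stride, PC). It assumes accesses arrive in program order and that a burst continues only while kind, PC, and a positive stride stay constant. This greedy coalescer and its names are illustrative, not Sword's decompressor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Access { std::uint64_t addr; char kind; std::uint64_t pc; };  // kind: 'R' or 'W'

// Mirrors the slide's record: [begin, end], #accesses, kind, stride, PC.
struct BurstRecord {
  std::uint64_t begin, end, count;
  char kind;
  std::uint64_t stride, pc;
};

// Greedily merges consecutive accesses with the same kind and PC whose
// addresses keep advancing by one constant positive stride. A lone access
// gets `elem_size` as its stride, like the single-access records shown.
std::vector<BurstRecord> coalesce(const std::vector<Access>& trace, std::uint64_t elem_size) {
  assert(elem_size > 0);
  std::vector<BurstRecord> out;
  for (const Access& a : trace) {
    if (!out.empty()) {
      BurstRecord& b = out.back();
      if (b.kind == a.kind && b.pc == a.pc && a.addr > b.end) {
        std::uint64_t step = a.addr - b.end;
        if (b.count == 1 || step == b.stride) {  // start or continue the stride
          b.stride = step;
          b.end = a.addr;
          ++b.count;
          continue;
        }
      }
    }
    out.push_back({a.addr, a.addr, 1, a.kind, elem_size, a.pc});
  }
  return out;
}
```

Feeding it 500 reads at 337888, 337892, …, all from PC 4208985, produces the single record [337888,339884], 500, R, 4, 4208985.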
Overlap of Access Bursts: ILP Generation!
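For two bursts the overlap question is a small integer program: are there indices i, j and byte offsets u, v with begin1 + stride1·i + u = begin2 + stride2·j + v? The talk generates ILPs for this. The sketch below instead solves the two-burst case exactly with the extended Euclidean algorithm (a bounded linear Diophantine equation), which is a different technique from a general ILP solver. It assumes positive strides and values small enough that intermediate products fit in 64 bits; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// `count` accesses of `size` bytes each, at begin, begin + stride, ...
struct Burst { std::int64_t begin, count, stride, size; };

// Extended Euclid for a, b > 0: returns g = gcd(a, b) and sets a*x + b*y = g.
std::int64_t ext_gcd(std::int64_t a, std::int64_t b, std::int64_t& x, std::int64_t& y) {
  if (b == 0) { x = 1; y = 0; return a; }
  std::int64_t x1 = 0, y1 = 0;
  std::int64_t g = ext_gcd(b, a % b, x1, y1);
  x = y1;
  y = x1 - (a / b) * y1;
  return g;
}

std::int64_t floor_div(std::int64_t a, std::int64_t b) {  // requires b > 0
  return a / b - ((a % b != 0) && (a < 0) ? 1 : 0);
}
std::int64_t ceil_div(std::int64_t a, std::int64_t b) { return -floor_div(-a, b); }

// Does s1*i - s2*j == c have an integer solution with 0 <= i < n1, 0 <= j < n2?
bool bounded_solution(std::int64_t s1, std::int64_t n1,
                      std::int64_t s2, std::int64_t n2, std::int64_t c) {
  std::int64_t x = 0, y = 0;
  std::int64_t g = ext_gcd(s1, s2, x, y);            // s1*x + s2*y == g
  if (c % g != 0) return false;
  std::int64_t i0 = x * (c / g), j0 = -y * (c / g);  // one unbounded solution
  std::int64_t di = s2 / g, dj = s1 / g;             // i = i0 + di*k, j = j0 + dj*k
  std::int64_t lo = std::max(ceil_div(-i0, di), ceil_div(-j0, dj));
  std::int64_t hi = std::min(floor_div(n1 - 1 - i0, di), floor_div(n2 - 1 - j0, dj));
  return lo <= hi;
}

// True if the bursts touch a common byte, i.e. if
// a.begin + a.stride*i + u == b.begin + b.stride*j + v has a solution with
// 0 <= i < a.count, 0 <= j < b.count, 0 <= u < a.size, 0 <= v < b.size.
bool bursts_overlap(const Burst& a, const Burst& b) {
  assert(a.count > 0 && b.count > 0 && a.stride > 0 && b.stride > 0);
  assert(a.size > 0 && b.size > 0);
  for (std::int64_t t = -(a.size - 1); t <= b.size - 1; ++t)  // t = v - u
    if (bounded_solution(a.stride, a.count, b.stride, b.count, b.begin - a.begin + t))
      return true;
  return false;
}
```

The two 500-access bursts recorded on the next slide ([337888,339884] reads and [337892,339888] writes, 4-byte elements) overlap, whereas two stride-8 bursts offset by 4 bytes interleave without touching.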
Interval Trees to record accesses
[335820,335820],1R,4,4208860
[335820,335820],1W,4,4208658
[335824,335824],1W,4,4208639
[335816,335816],1W,4,4208677
[335820,335820],1R,4,4208822
[335820,335820],1W,4,4208884
[335920,335920],1R,4,4208736
[335812,335812],1W,4,4208696
[335820,335820],1R,4,4208926
[335824,335824],1R,4,4208902
[337888,339884],500R,4,4208985
[337892,339888],500W,4,4209028
• Recorded info is: [Begin, End], #Accesses, Kind, Stride, PC value at which the access occurred
• Allows efficient comparison of access bursts across threads
• These red-black trees are highly tuned
  • They are used within Linux to implement fair scheduling
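A minimal augmented interval tree for such records: each node also stores the largest end point in its subtree, so a query can skip subtrees that end before the queried range. Sword uses self-balancing red-black trees; this sketch omits rebalancing for brevity, so its depth (and recursion) grows linearly on sorted input. `Interval` and `IntervalTree` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

struct Interval { std::uint64_t begin, end; char kind; };  // inclusive [begin, end]

// Augmented binary search tree keyed on `begin` (no rebalancing).
class IntervalTree {
  struct Node {
    Interval iv;
    std::uint64_t max_end;  // largest `end` anywhere in this subtree
    std::unique_ptr<Node> left, right;
    explicit Node(const Interval& i) : iv(i), max_end(i.end) {}
  };
  std::unique_ptr<Node> root_;

  static void insert_at(std::unique_ptr<Node>& n, const Interval& iv) {
    if (!n) { n = std::make_unique<Node>(iv); return; }
    n->max_end = std::max(n->max_end, iv.end);
    insert_at(iv.begin < n->iv.begin ? n->left : n->right, iv);
  }

  static void collect(const Node* n, std::uint64_t lo, std::uint64_t hi,
                      std::vector<Interval>& out) {
    if (n == nullptr || n->max_end < lo) return;  // whole subtree ends before lo
    collect(n->left.get(), lo, hi, out);
    if (n->iv.begin <= hi && lo <= n->iv.end) out.push_back(n->iv);
    if (n->iv.begin <= hi)                        // right subtree begins at >= n->iv.begin
      collect(n->right.get(), lo, hi, out);
  }

 public:
  void insert(const Interval& iv) {
    assert(iv.begin <= iv.end);
    insert_at(root_, iv);
  }
  // All recorded intervals sharing at least one address with [lo, hi].
  std::vector<Interval> overlapping(std::uint64_t lo, std::uint64_t hi) const {
    std::vector<Interval> out;
    collect(root_.get(), lo, hi, out);
    return out;
  }
};
```

Inserting four of the records above and querying [337892, 339888] returns both 500-access bursts, while a query on [335821, 335823] returns nothing.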
Concluding Remarks: Sword is now practical!
Both Archer and Sword are available
Github.com/PRUNERS
Conclusions: Time for “Medium” Examples
          Online   Offline   Total   Efficacy
Archer    1        0         1       Misses races
Sword     1        10*       11      Finds all races within the execution**
*  Can be brought down to 1 by using an MPI cluster
** We define the formal semantics of OMP race checking [HIPS’18]
Conclusions: Time for Larger Examples
          Online   Offline   Total   Efficacy
Archer    1        0         1       Misses races
Sword     1        10*       11      Finds all races within the execution**
(The slide adds a “Memory” annotation.)
More Concluding Remarks
• Sword works well; finds more races than Archer
  • Applied to realistic benchmarks
  • Archer test suite
  • RaceBench from LLNL
• Offline analysis can be parallelized
  • Still “decent” on standard multicore platforms
• It took many ideas working together to realize Sword
  • Formal semantics of OpenMP concurrency
  • Online/offline checking split
  • Data compression
  • Self-balancing interval trees
  • ILP systems to compress traces
• Employs standard tracing methods based on OMPT
Future Work
• Continue to debug/tune Sword
• Incorporate ideas from upcoming pubs
• GPU race checking
Group Credits
Simone Zvonimir Dong Ignacio Greg
Extras
Data Races: Gist
• High-level code is just “fiction”
  • Code optimizations are done on a PER-THREAD basis
  • Races occur if you don’t tell a compiler what’s shared
• while (!f) {}  →  r = f; while (!r) {}  : this is OK if “f” is purely local
• while (!f) {}  →  r = f; while (!r) {}  : NOT OK if f is shared and you don’t tell this to the compiler
• How to inform a compiler
  • Put the variables inside a mutex (or other synchronization block)
  • Declare them to be a Java volatile or C++11 atomic
• C volatiles won’t do (they have no definite concurrency semantics)
GPU races can also lead to “pink elephants”
Analogy due to Herb Sutter

    __global__ void kernel(int* x, int* y) {
      int index = threadIdx.x;
      y[index] = x[index] + y[index];
      if (index != 63 && index != 31)
        y[index + 1] = 1111;
    }

Initially: x[i] == y[i] == i
Warp size = 32
The kernel races: thread i writes y[i+1], which thread i+1 also reads and writes, with no synchronization.
The hardware schedules these instructions in “warps” (SIMD groups).
However, this “warp view” often appears to be lost, e.g. when compiling with optimizations.
Expected answer: 0, 1111, 1111, …, 1111, 64, 1111, …
New answer: 0, 2, 4, 6, 8, …