Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy

TRANSACT 2014

Multicore Performance Scaling

Hardware Transactional Memory (HTM)

• Intel’s Haswell TSX: RTM & HLE
• IBM’s Blue Gene/Q, System z, and Power Architecture
• Low overhead (cache based)

Haswell RTM

if (_xbegin() == _XBEGIN_STARTED)
    Speculate Execution, without any locks
    _xend()
else
    Abort Handler

• Read and Write Sets
• Abort on memory conflict
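The control flow above maps onto the RTM intrinsics from `<immintrin.h>`. What follows is a minimal sketch rather than the authors' code: the shared `counter`, the function name `increment_once`, and the atomic increment used as the abort handler are assumptions made for illustration. The transactional path is compiled only when the compiler defines `__RTM__` (for example with `-mrtm` on GCC or Clang); otherwise only the abort-handler path runs.

```c
#include <stdatomic.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */
#endif

/* Shared counter, used only for this illustration. */
static atomic_long counter;

/* Returns 1 if the update committed as a hardware transaction,
 * 0 if the abort handler performed it instead. */
int increment_once(void)
{
#if defined(__RTM__)
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Speculative execution, without any locks: the load places
         * counter in the read set, the store places it in the write set. */
        long v = atomic_load_explicit(&counter, memory_order_relaxed);
        atomic_store_explicit(&counter, v + 1, memory_order_relaxed);
        _xend();   /* commit: the store becomes visible atomically */
        return 1;
    }
#endif
    /* Abort handler: reached when the transaction aborts, or when the
     * transactional path is not compiled in. An atomic increment keeps
     * this toy example correct without a lock. */
    atomic_fetch_add(&counter, 1);
    return 0;
}
```

Whichever path runs, each call adds exactly one to `counter`; a realistic critical section would need a lock-based fallback in place of the single atomic operation.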

Haswell RTM

Transaction 1:
    _xbegin()
    Read X      Add to Read Set
    Write Y     Add to Write Set
    _xend()

Transaction 2:
    _xbegin()
    Write X     Add to Write Set
    Write Y     Add to Write Set
    _xend()

COMMIT: Make the change to Y visible
ABORT: the speculative changes are discarded

Lock Elision

<HLE_Acquire_Prefix> Lock(L)
    Atomic region executed as a transaction or mutually exclusive on L
<HLE_Release_Prefix> Release(L)

• Execute optimistically, without any locks
• Track Read and Write Sets
• Abort on memory conflict: roll back, acquire lock

[Figure credit: AnandTech]

Best-effort

Haswell RTM

• Transactions can abort because of conflicts, overflow, unsupported instructions, and interrupts
• Small & medium transactions
• Needs software fallback

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

Single Global Lock HyTM (simple and common)

Try_SPEC:
    Wait until Lock is free
    Transactional_Read(Lock)
    If Lock is taken ABORT
    Speculate critical section
    End speculation

On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC

HW txn: Begin HW txn, Read L, End HW txn
SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
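The Try_SPEC / On_ABORT steps for the eager scheme can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation: the spinlock built from an `atomic_int`, the names `sgl`, `try_spec_eager`, `run_eager_sgl` and `add_one`, and the abort code `0xff` are all hypothetical. The speculative path is compiled only when `__RTM__` is defined; without it every call takes the lock.

```c
#include <stdatomic.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _xabort */
#endif

typedef void (*critical_fn)(void *arg);

static atomic_int sgl;   /* the single global lock: 0 = free, 1 = taken */

static int try_lock(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &sgl, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void release_lock(void)
{
    atomic_store_explicit(&sgl, 0, memory_order_release);
}

/* Try_SPEC with eager subscription. Returns 1 on commit, 0 on abort
 * (always 0 when RTM support is not compiled in). */
static int try_spec_eager(critical_fn cs, void *arg)
{
#if defined(__RTM__)
    /* Wait until Lock is free */
    while (atomic_load_explicit(&sgl, memory_order_acquire) != 0) { }
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Transactional_Read(Lock): the lock joins the read set first,
         * so a later acquisition by another thread aborts this txn. */
        if (atomic_load_explicit(&sgl, memory_order_relaxed) != 0)
            _xabort(0xff);   /* If Lock is taken ABORT */
        cs(arg);             /* Speculate critical section */
        _xend();             /* End speculation */
        return 1;
    }
#else
    (void)cs;
    (void)arg;
#endif
    return 0;
}

/* Returns 1 if the critical section committed in hardware,
 * 0 if it ran under the lock. */
int run_eager_sgl(critical_fn cs, void *arg)
{
    for (;;) {
        if (try_spec_eager(cs, arg))
            return 1;
        /* On_ABORT */
        if (try_lock()) {
            cs(arg);         /* Critical section */
            release_lock();
            return 0;
        }
        /* Else Try_SPEC again */
    }
}

/* Example critical section: increment a long. */
void add_one(void *arg) { ++*(long *)arg; }
```

Calling `run_eager_sgl(add_one, &x)` applies the critical section exactly once per call, either speculatively or under the lock.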

[Figure: Single Global Lock HyTM (simple and common) timeline. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) runs alongside four HW txns (1)-(4), each of which begins, reads L, and ends. Legend: X = ABORT. The figure carries four X marks.]

[Figure: timelines for Thread 1 and Thread 2. Each thread runs a sequence of hardware transactions (Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L)); one critical section runs as a software transaction under the lock (Acquire(L), CRITICAL SECTION (SW TXN), Release(L)). Two schedules are shown, labelled Execution Time 1 and Execution Time 2.]

Lazy SGL

Try_SPEC:
    Speculate critical section
    Transactional_Read(Lock)
    If Lock is taken ABORT
    End speculation

On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC

SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
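The lazy variant differs from the eager one only in where the lock is read: at the end of the speculative region instead of the beginning, and without waiting for the lock first. A hedged C sketch follows; the names `sgl`, `try_spec_lazy`, `run_lazy_sgl` and `bump`, the `atomic_int` spinlock, and the abort code are assumptions for illustration, and the speculative path exists only when `__RTM__` is defined.

```c
#include <stdatomic.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _xabort */
#endif

typedef void (*critical_fn)(void *arg);

static atomic_int sgl;   /* the single global lock: 0 = free, 1 = taken */

static int try_lock(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &sgl, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void release_lock(void)
{
    atomic_store_explicit(&sgl, 0, memory_order_release);
}

/* Try_SPEC with lazy subscription. Returns 1 on commit, 0 on abort
 * (always 0 when RTM support is not compiled in). */
static int try_spec_lazy(critical_fn cs, void *arg)
{
#if defined(__RTM__)
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Speculate critical section. The lock has not been read yet,
         * so cs may run while a software transaction holds the lock. */
        cs(arg);
        /* Transactional_Read(Lock), just before the commit point */
        if (atomic_load_explicit(&sgl, memory_order_relaxed) != 0)
            _xabort(0xff);   /* If Lock is taken ABORT */
        _xend();             /* End speculation */
        return 1;
    }
#else
    (void)cs;
    (void)arg;
#endif
    return 0;
}

/* Returns 1 if the critical section committed in hardware,
 * 0 if it ran under the lock. */
int run_lazy_sgl(critical_fn cs, void *arg)
{
    for (;;) {
        if (try_spec_lazy(cs, arg))
            return 1;
        /* On_ABORT */
        if (try_lock()) {
            cs(arg);         /* Critical section */
            release_lock();
            return 0;
        }
        /* Else Try_SPEC again */
    }
}

/* Example critical section: add 10 to a long. */
void bump(void *arg) { *(long *)arg += 10; }
```

Because the lock check is deferred, a hardware transaction that starts while the lock is held can still commit, provided the lock has been released by the time the check runs.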

[Figure: Lazy SGL timeline. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) runs alongside four HW txns (1)-(4), each of which begins and reads L just before End HW txn. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]


Transactional Memory Correctness

[Figure: timeline of Transaction 1 (SW) and Transaction 2 (HW), each ending in COMMIT. Two serializations are possible: order T2 AFTER T1, or order T2 BEFORE T1.]

Case 1: HW begins, SW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END
X value over time: a, b
Without the lock check: Correct: a, Actual: b
With Check Lock before TXN_END: ABORT. Correct: a, Actual: a

Case 2: SW begins, HW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END
Without the lock check: Correct: a, Actual: b
With Check Lock before TXN_END: ABORT. Correct: a, Actual: a

Case 3: SW begins, HW begins, SW ends, HW ends

Thread 1 (SW): Acquire Lock, …, X = a, …, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END
X value over time: a, b
With Check Lock before TXN_END: COMMIT. Correct: b, Actual: b

Case 4: HW begins, SW begins, SW ends, HW ends

Thread 1 (SW): Acquire Lock, …, X = a, …, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END
With Check Lock before TXN_END: COMMIT. Correct: b, Actual: b
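The four cases can be checked with a small model. The sketch below is an illustration, not part of the talk: it assumes logical timestamps for the four events, ignores data conflicts detected by the hardware, and treats "eager" as "the lock must be free for the whole hardware transaction" and "lazy" as "the lock must be free when it is checked at the end". The type `schedule` and the names `eager_commits`, `lazy_commits`, and `case1` to `case4` are hypothetical.

```c
/* Logical timestamps of the four events; all four are assumed distinct. */
typedef struct {
    int sw_begin;   /* software txn acquires the lock */
    int sw_end;     /* software txn releases the lock */
    int hw_begin;   /* hardware txn begins */
    int hw_end;     /* hardware txn reaches its commit point */
} schedule;

/* Eager subscription: the lock is in the read set from the start, so the
 * hardware txn commits only if it does not overlap the locked interval. */
int eager_commits(schedule s)
{
    return s.hw_end < s.sw_begin || s.hw_begin > s.sw_end;
}

/* Lazy subscription: the lock is read only at the end, so the hardware
 * txn commits if the lock is free at that moment. */
int lazy_commits(schedule s)
{
    return s.hw_end < s.sw_begin || s.hw_end > s.sw_end;
}

/* The four interleavings from the slides. */
const schedule case1 = { .sw_begin = 2, .sw_end = 4, .hw_begin = 1, .hw_end = 3 };
const schedule case2 = { .sw_begin = 1, .sw_end = 4, .hw_begin = 2, .hw_end = 3 };
const schedule case3 = { .sw_begin = 1, .sw_end = 3, .hw_begin = 2, .hw_end = 4 };
const schedule case4 = { .sw_begin = 2, .sw_end = 3, .hw_begin = 1, .hw_end = 4 };
```

Under this model the eager scheme aborts the hardware transaction in all four cases, while the lazy scheme aborts it only in cases 1 and 2, which matches the ABORT and COMMIT labels on the case slides.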


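Hardware Sandboxing

Initially: X = 5; Y = 6

Thread 1 (SW): Acquire Lock, …, ++X, ++Y, …, Release Lock
Thread 2 (HW): TXN_BEGIN, Z = 1/(Y-X), TXN_END

If the hardware transaction runs between ++X and ++Y, it sees X = 6 and Y = 6: Z = 1/0 !!!

Indirect Jumps

Initially: X = 5; Y = 6

Thread 1 (SW): Acquire Lock, …, ++X, ++Y, …, Release Lock
Thread 2 (HW):
    _xbegin
    if (X == Y) *p = garbage
    p()
    …
    if (lock) abort
    _xend

If the hardware transaction sees X == Y, p() is an indirect jump to a garbage location.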
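The division example can be traced by hand. The sketch below is a worked illustration, not code from the talk: it tabulates the values of X and Y at each step of the software transaction and computes the divisor a hardware transaction would see depending on when it reads each variable. The arrays `X_at` and `Y_at` and the function `divisor` are hypothetical names.

```c
/* Values of X and Y as the software transaction progresses:
 * step 0: before ++X, step 1: between ++X and ++Y, step 2: after ++Y. */
static const int X_at[3] = { 5, 6, 6 };
static const int Y_at[3] = { 6, 6, 7 };

/* Divisor Y - X as seen by a hardware transaction that reads X at
 * step x_step and Y at step y_step (each in the range 0..2). */
int divisor(int x_step, int y_step)
{
    return Y_at[y_step] - X_at[x_step];
}
```

Reading both variables before the software transaction (step 0) or after it (step 2) gives a divisor of 1. Reading them at step 1, in the middle of the software transaction, gives 0, which is the state a lazily subscribed hardware transaction can observe before its lock check runs.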


[Figure: speedup vs. number of threads (1, 2, 4, 8) for Ssca2 (small txns), Labyrinth (large txns), and Intruder (medium txns). Series: TL2, SGL, HLE, E-SGL, L-SGL.]

Improved Lock Acquisition Rate

[Figure: % lock acquisitions vs. number of threads (1, 2, 4, 8) for Vacation Low (medium txns), Kmeans High (small txns), Intruder (medium txns), and Labyrinth (large txns). Series: HLE, E-SGL, L-SGL.]

No single thread overhead

[Figure: slowdown relative to sequential for 1 thread, for bayes, genome, intruder, km_low, km_high, labyrinth, vacation_low, vacation_high, ssca2, and yada. Series: TL2, SGL, HLE, E-SGL, L-SGL.]


Bloom Filters

• Efficient probabilistic data structure to compute fast set intersection

• Can admit false positives

• No false negatives

• Used in TM for Conflict Detection
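The properties listed above can be seen in a very small Bloom filter. The sketch below is an illustration only: the 64-bit filter size, the two multiplicative hash functions, and the names `bloom64`, `bloom_add`, `bloom_may_contain`, and `bloom_may_intersect` are assumptions, not the filter used in the talk. Keys stand in for memory addresses.

```c
#include <stdint.h>

typedef uint64_t bloom64;   /* 64-bit filter; size chosen for illustration */

/* Two hash functions, each mapping a key to a bit index in 0..63. */
static unsigned hash1(uintptr_t key)
{
    return (unsigned)(((uint64_t)key * 0x9E3779B97F4A7C15ULL) >> 58);
}

static unsigned hash2(uintptr_t key)
{
    return (unsigned)(((uint64_t)key * 0xC2B2AE3D27D4EB4FULL) >> 58);
}

static uint64_t key_bits(uintptr_t key)
{
    return (UINT64_C(1) << hash1(key)) | (UINT64_C(1) << hash2(key));
}

/* Record a key (for example an address read or written by a txn). */
void bloom_add(bloom64 *bf, uintptr_t key)
{
    *bf |= key_bits(key);
}

/* 0 means the key was definitely never added (no false negatives);
 * 1 means it may have been added (false positives are possible). */
int bloom_may_contain(bloom64 bf, uintptr_t key)
{
    uint64_t bits = key_bits(key);
    return (bf & bits) == bits;
}

/* Fast set-intersection test: 0 means the two sets share no key;
 * 1 means they may share one. A shared key sets the same bits in
 * both filters, so a real overlap is never missed. */
int bloom_may_intersect(bloom64 a, bloom64 b)
{
    return (a & b) != 0;
}
```

With one filter per transaction, a single bitwise AND answers "could these two transactions have touched the same location?", at the cost of occasional spurious positives.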

[Figure: Lazy SGL timeline, repeated for comparison. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) runs alongside four HW txns (1)-(4), each of which reads L just before End HW txn. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]

[Figure: BF SGL timeline. Same schedule as above. HW txns (1) and (2) perform Check BF before End HW txn; HW txns (3) and (4) read L before End HW txn. Legend: X = ABORT. Two HW txns are marked COMMIT.]

Case 1: HW begins, SW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END
X value over time: a, b
Without any check: Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: If BFs intersect: ABORT, Else: COMMIT

Case 2: SW begins, HW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END
Without any check: Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: If BFs intersect: ABORT, Else: COMMIT

Conclusions

• HTMs are becoming more available
• Best-effort: needs software fallback
• Eager SGL
    • simple and fast fallback
    • often preferred to more efficient solutions
• Lazy SGL
    • as simple as Eager SGL
    • more efficient
• Bloom Filter SGL
    • more accurate conflict detection
    • slower
• Can be implemented directly in hardware


Medium transactions

[Figure: speedup vs. number of threads (1, 2, 4, 8) for Intruder, Vacation Low, Vacation High, and Genome. Series: TL2, SGL, HLE, Hyswell.]

Small transactions

[Figure: speedup vs. number of threads (1, 2, 4, 8) for Kmeans Low, Kmeans High, and Ssca2. Series: TL2, SGL, HLE, Hyswell.]

Large transactions

[Figure: speedup vs. number of threads (1, 2, 4, 8) for Bayes, Labyrinth, and Yada. Series: TL2, SGL, HLE, Hyswell.]

Speedup over sequential for 8 threads

[Figure: bar chart of speedup for bayes, genome, intruder, kmeans low, kmeans high, labyrinth, ssca2, vacation low, vacation high, and yada. Series: TL2, SGL, HLE, Hyswell.]

Case  Software   Hardware   First access  Outcome
(1)   Read(x)    Read(x)    -             Not a conflict
(2)   Read(x)    Write(x)   Software      Software transaction ordered before hardware transaction -> CORRECT
(3)   Read(x)    Write(x)   Hardware      Hardware abort
(4)   Write(x)   Read(x)    Software      Software transaction ordered before hardware transaction -> CORRECT
(5)   Write(x)   Read(x)    Hardware      Hardware abort
(6)   Write(x)   Write(x)   Software      Software transaction ordered before hardware transaction -> CORRECT
(7)   Write(x)   Write(x)   Hardware      Hardware abort
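The seven cases can be encoded as a small lookup. The sketch below rests on one reading of the slide, stated here as an assumption: in cases (2), (4) and (6) the software access happens first, and in cases (3), (5) and (7) the hardware access happens first. The enum and function names are hypothetical.

```c
typedef enum { OP_READ, OP_WRITE } op;
typedef enum { SW_FIRST, HW_FIRST } order;
typedef enum {
    NOT_A_CONFLICT,         /* case (1) */
    SW_ORDERED_BEFORE_HW,   /* cases (2), (4), (6): CORRECT */
    HARDWARE_ABORT          /* cases (3), (5), (7) */
} outcome;

/* Outcome when a software txn performs sw on x, a hardware txn performs
 * hw on x, and first says which of the two accesses happened first. */
outcome classify(op sw, op hw, order first)
{
    if (sw == OP_READ && hw == OP_READ)
        return NOT_A_CONFLICT;   /* two reads never conflict */
    return first == SW_FIRST ? SW_ORDERED_BEFORE_HW : HARDWARE_ABORT;
}
```

For example, `classify(OP_WRITE, OP_READ, HW_FIRST)` corresponds to case (5) and yields `HARDWARE_ABORT`.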
