Page 1: Software Transactional Memory for Large Scale Clusters …research.ihost.com/ppopp08/presentations/bocchino.pdf · Software Transactional Memory for Large Scale Clusters Robert L

Software Transactional Memory for Large Scale Clusters

Robert L. Bocchino Jr. and Vikram S. Adve
University of Illinois at Urbana-Champaign

Bradford L. Chamberlain
Cray Inc.


Transactional Memory (TM)

Can simplify parallel programming

Well studied for small-scale, cache-coherent platforms

No prior work on TM for large scale platforms

• Potentially thousands of processors

• Distributed memory, no cache coherence

• Slow communication between nodes


Why STM On Clusters?

TM is a natural fit for PGAS Languages

• UPC, CAF, Titanium, Chapel, Fortress, X10, ...

• Address space is global (unlike message passing)

• But data distribution is explicit (unlike cc-NUMA, DSM)

Commodity clusters are in widespread use

Software transactional memory (STM) is a natural choice

• Communication done in software anyway

• Could leverage hardware TM support if it exists


What’s New About STM on Clusters?

                                             Classic STM        Cluster STM
Distributing computation for data locality   N/A                on p { ... }
STM metadata                                 Uniform            Distributed
Heap address space                           Uniform            Partitioned
Read and write                               Words of data      Blocks of data
Primary overhead                             Extra scalar ops   Extra remote ops

Research Contributions

First STM designed for high performance on large clusters

• Block data movement

• Computation migration

• Distributed metadata

Experimental evaluation of prototype

• Performance vs. locks

• New design tradeoffs

Decomposition of STM design space into eight axes


Outline

Interface Design

Algorithm Design

Evaluation

• Cluster STM vs. Manual Locking

• Read Locks vs. Read Validation

Conclusion


Interface Design

Compiler target, not primarily for programmers

Correct use guarantees serializability of transactions

Chapel:

    atomic {
      // Array assignment
      cache = A;
      compute(cache);
      A = cache;
    }

The compiler translates this to the Cluster STM API:

    stm_start(...);
    stm_get(&cache, &A, ...);
    compute(&cache);
    stm_put(&A, &cache, ...);
    stm_commit(...);
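The call sequence on this slide can be tried out with a small single-process stand-in. This is only a sketch under stated assumptions: the class `ToySTM`, its Python method signatures, and the write-buffer behaviour are hypothetical and are not the real Cluster STM interface, whose full argument lists the slides elide. There are no remote nodes and no conflict detection here; the point is only the order start, get, compute, put, commit.

```python
class ToySTM:
    """Hypothetical single-process stand-in for a transactional store."""

    def __init__(self):
        self.store = {}      # block name -> list of words
        self.pending = None  # buffered writes of the open transaction

    def stm_start(self):
        self.pending = {}

    def stm_get(self, dest, src):
        # Block read: copy the whole block, preferring this transaction's own writes.
        block = self.pending.get(src, self.store[src])
        dest[:] = block

    def stm_put(self, dest, src):
        # Block write: buffered until commit (write-buffer policy is an assumption).
        self.pending[dest] = list(src)

    def stm_commit(self):
        # Make all buffered writes visible at once.
        self.store.update(self.pending)
        self.pending = None


def compute(cache):
    # Placeholder for the user computation on the slide.
    for i in range(len(cache)):
        cache[i] += 1


stm = ToySTM()
stm.store["A"] = [1, 2, 3]

cache = [0, 0, 0]
stm.stm_start()
stm.stm_get(cache, "A")   # cache = A
compute(cache)
stm.stm_put("A", cache)   # A = cache
before_commit = list(stm.store["A"])
stm.stm_commit()
after_commit = list(stm.store["A"])
```

In this sketch the update to `A` stays invisible in `stm.store` until `stm_commit` runs, which is the all-or-nothing behaviour the `atomic` block asks for.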


Interface Summary

Transaction start and commit

Transactional memory allocation

Block data movement to and from transactional store

Remote execution of transactional computations


Block Data Movement

[Figure: processors p1 and p2, transactional (tx) and non-transactional (non-tx) memory, data blocks A, B, and C]

All ops occur on p1

    stm_get(work_proc=p2, src=A, dest=B, size=n, ...)
    stm_put(work_proc=p2, src=B, dest=A, size=n, ...)
    stm_read(src=C, dest=B, size=n, ...)
    stm_write(src=B, dest=C, size=n, ...)

Remote Work for Exploiting Locality

[Figure: processors p1 and p2]

• Remote puts & gets

• stm_on(work_proc=p2, function=f, ...): local reads & writes

• Chapel: on p2 { f(...); }

Remote Work Inside Transaction

Chapel: atomic { on p2 { f(...); } }

[Figure: processors p1 and p2]

    stm_start(src_proc=p1)
    stm_on(src_proc=p1, work_proc=p2, function=f, ...)
    stm_read(src_proc=p1)       local ops
    stm_write(src_proc=p1)      local ops
    stm_commit(src_proc=p1)     single commit message

Transaction Inside Remote Work

Chapel: on p2 { atomic { f(...); } }

[Figure: processors p1 and p2]

    stm_on(src_proc=p1, work_proc=p2, function=f, ...)
    stm_start(src_proc=p1)
    stm_read(src_proc=p1)
    stm_write(src_proc=p1)
    stm_commit(src_proc=p1)

local ops


Algorithm Design

Shared metadata

Transaction-local metadata

STM design choices



Shared Metadata

Some metadata must be visible to all transactions

• Read and write locks

• Validation ID

Memory overhead compromise

• Each metadata word guards one conflict detection unit (CDU)

• CDU size is s ≥ 1 words

• s > 1 may introduce false sharing

Keep metadata on same processor as corresponding CDU
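The memory/precision compromise behind the CDU size can be worked through with a little arithmetic. The sketch below is illustrative only: the 8-byte word size and the helper names `cdu_index`, `conflicts`, and `metadata_words` are assumptions, not part of Cluster STM. It maps a byte address to the CDU that guards it and shows how s > 1 saves metadata but makes two distinct words look like a conflict.

```python
WORD_BYTES = 8  # assumed word size


def cdu_index(addr, s):
    """Index of the CDU (and of its metadata word) guarding byte address addr."""
    return (addr // WORD_BYTES) // s


def conflicts(addr_a, addr_b, s):
    """Two accesses are treated as conflicting when they fall in the same CDU."""
    return cdu_index(addr_a, s) == cdu_index(addr_b, s)


def metadata_words(heap_words, s):
    """Metadata words needed to guard heap_words words (ceiling division)."""
    return -(-heap_words // s)


x, y = 0, 8  # byte addresses of two adjacent but distinct words

exact = conflicts(x, y, s=1)         # one word per CDU: no conflict reported
false_sharing = conflicts(x, y, s=4) # four words per CDU: reported as a conflict
overhead_s1 = metadata_words(1024, s=1)
overhead_s4 = metadata_words(1024, s=4)
```

With s = 4 the metadata shrinks to a quarter, at the price of aborting transactions that touch different words in the same CDU.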


Transaction-Local Metadata

[Figure: processors p1 and p2]

    stm_on(src_proc=p1)
    stm_start(src_proc=p1)
    stm_read(src_proc=p1)
    stm_write(src_proc=p1)
    stm_commit(src_proc=p1)

p2 stores tx metadata for p1

p2 uses stored metadata to commit on behalf of p1

Keep metadata local to processor where access occurred
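One way to picture this rule is a per-node table keyed by the originating processor. The following is a minimal sketch, assuming a write-buffer design and a hypothetical `Node` class; the method names echo the slide, but the signatures and data layout are invented for illustration. The node where an access happens keeps the log entry, and a commit request for `src_proc` is served entirely from that local log.

```python
class Node:
    """Hypothetical model of one processor and the memory it owns."""

    def __init__(self, name):
        self.name = name
        self.memory = {}   # address -> committed value owned by this node
        self.tx_meta = {}  # src_proc -> {address: buffered value}

    def stm_read(self, src_proc, addr):
        # Prefer the transaction's own buffered write, else committed memory.
        log = self.tx_meta.get(src_proc, {})
        return log[addr] if addr in log else self.memory[addr]

    def stm_write(self, src_proc, addr, value):
        # The access happens on this node, so its log entry stays on this node.
        self.tx_meta.setdefault(src_proc, {})[addr] = value

    def stm_commit(self, src_proc):
        # Commit on behalf of src_proc using only locally stored metadata.
        self.memory.update(self.tx_meta.pop(src_proc, {}))


p2 = Node("p2")
p2.memory[0] = 10

# Work issued by a transaction that started on p1 but runs on p2.
p2.stm_write("p1", 0, p2.stm_read("p1", 0) + 1)
visible_before = p2.memory[0]
p2.stm_commit("p1")  # stands in for the single commit message from p1
visible_after = p2.memory[0]
```

Because the log never leaves p2, the originating processor only has to send a commit request rather than ship read and write sets across the network.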



STM Design Choices

Eight-axis design space (see paper)

Choose four axes to explore

• CDU size

• Read locks (RL) vs. read validation (RV)

• Undo log (UL) vs. write buffer (WB)

• Early acquire (EA) vs. late acquire (LA)

Some tradeoffs come out differently on clusters

Discuss RL vs. RV here

See paper for additional results


Evaluation

Benchmarks

• Micro Benchmarks: Intset, Hashmap Swap

• Graph clustering: SSCA 2 Kernel 4

Machine

• Intel Xeon cluster

• Two cores per node

• Myrinet


Intset Results

[Figure: time (seconds, log scale) vs. number of processors (log scale, 1 to 512). Series: Locks, 6M operations; STM, 6M operations; Locks, 100M operations; STM, 100M operations]

• Switching problem sizes at p = 8

• Bump at p < 4

• Good scaling after p = 4

• Locks, STM about the same

SSCA 2 Results

[Figure: time (seconds, log scale) vs. number of processors (log scale, 1 to 512). Series: locks; STM]

• Bump is smaller

• Good scaling after p = 4

• STM overhead about 2.5x


Implementation

Read locks (RL)

• Metadata word holds write bit and reader count

• Abort on attempt to read or write conflicting CDU

Read validation (RV)

• Metadata word holds write bit and validation ID

• Increment ID with each write

• Abort at commit if ID changes after CDU is read

Validation requires extra remote op at commit

No global cache ⇒ no helpful cache effects
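The two schemes can be contrasted with a toy model of the per-CDU metadata word. This is a sketch, not the Cluster STM implementation: the class names, the `Abort` exception, and the choice to abort an RV read of a write-locked CDU are assumptions made for illustration. What it does take from the slide is the content of each word (write bit plus reader count, or write bit plus validation ID) and the point at which a conflict is detected.

```python
class Abort(Exception):
    """Raised when a transaction must abort."""


class ReadLockWord:
    """RL metadata for one CDU: a write bit plus a reader count."""

    def __init__(self):
        self.write_bit = False
        self.readers = 0

    def acquire_read(self):
        if self.write_bit:
            raise Abort("CDU is write-locked")
        self.readers += 1

    def acquire_write(self):
        if self.write_bit or self.readers > 0:
            raise Abort("CDU is locked by another transaction")
        self.write_bit = True


class ValidationWord:
    """RV metadata for one CDU: a write bit plus a validation ID."""

    def __init__(self):
        self.write_bit = False
        self.validation_id = 0

    def read(self):
        if self.write_bit:  # assumption: readers do not wait on writers
            raise Abort("CDU is write-locked")
        return self.validation_id  # the reader remembers this ID

    def write(self):
        if self.write_bit:
            raise Abort("CDU is write-locked")
        self.write_bit = True
        self.validation_id += 1  # increment ID with each write

    def release_write(self):
        self.write_bit = False

    def validate(self, seen_id):
        # On a cluster, this check is the extra remote operation at commit.
        if self.validation_id != seen_id:
            raise Abort("CDU changed after it was read")


def aborts(fn, *args):
    """Return True if calling fn(*args) raises Abort."""
    try:
        fn(*args)
    except Abort:
        return True
    return False


# RL: a reader holds the CDU, so a conflicting writer aborts immediately.
rl = ReadLockWord()
rl.acquire_read()
rl_conflict = aborts(rl.acquire_write)

# RV: the write goes through; the reader only finds out when it validates.
rv = ValidationWord()
seen = rv.read()
rv.write()
rv.release_write()
rv_conflict = aborts(rv.validate, seen)
rv_clean = aborts(rv.validate, rv.read())
```

In the RL case the conflict surfaces at access time with no further messages, whereas the RV reader must revisit the CDU's home processor at commit, which is the extra remote operation the slide refers to.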


Ratio of RV to RL Runtimes

[Figure: ratio of RV to RL runtime (axis 0 to 4) vs. number of processors (1 to 512). Series: Intset; SSCA 2]

Results Summary

Good scalability to p = 512

Good performance

• Nearly identical to locks on micro benchmarks

• Some STM overhead for SSCA2

Different design tradeoffs

• RL outperforms RV

• EA outperforms LA (see paper)

• Minimal penalty for WB (see paper) 


Conclusion

Presented design and evaluation of Cluster STM

• First STM for high performance on large scale clusters

• Good performance, scaling to p = 512

• New evaluation of design tradeoffs

Future work

• Exploiting shared memory within a node

• Nested parallelism in a transaction

• Dynamic spawning of threads


Thank You
