Extended Memory Semantics for Thread Synchronization
Sheng Li, Ying Zhou
Operating System Progress Report
Nov 1st, 2007
Problems
Hardware multithreading is no longer exclusive to supercomputing; it is already part of major microprocessors.
E.g., Sun Niagara 2 supports 64 threads per chip and 256 threads per server.
Concurrency management is one of the biggest challenges in multithreaded systems
Key requirement: low-overhead, scalable thread synchronization
Synchronization mechanisms
Atomic primitives (Test-and-Set, Compare-and-Swap, LL/SC): software routines built on them have poor performance and scalability
Empty/Full bits: an extension bit on each memory location denotes its empty/full state
Better performance [1], but still not enough
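To make the first bullet concrete, here is a minimal sketch (not code from the report) of a spinlock built directly on an atomic Test-and-Set primitive, using C11's `atomic_flag`. The names `tas_lock`, `tas_acquire`, `tas_release`, and `tas_demo`, as well as the thread and iteration counts, are hypothetical. Every waiting thread spins with repeated atomic read-modify-writes on the same word, which is one reason such software routines scale poorly as the thread count grows.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Spinlock built on Test-and-Set: atomic_flag_test_and_set returns the
   previous value, so a thread owns the lock when it observes "clear". */
typedef struct {
    atomic_flag held;
} tas_lock;

static void tas_init(tas_lock *l) { atomic_flag_clear(&l->held); }

static void tas_acquire(tas_lock *l) {
    /* Busy-wait: every failed attempt is another atomic RMW on the same
       word, so waiting threads keep competing for it. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical demo: several threads increment one counter under the lock. */
#define TAS_THREADS 4
#define TAS_ITERS 100000

static tas_lock demo_lock;
static long demo_counter;

static void *tas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TAS_ITERS; i++) {
        tas_acquire(&demo_lock);
        demo_counter++; /* critical section */
        tas_release(&demo_lock);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
static long tas_demo(void) {
    pthread_t t[TAS_THREADS];
    assert(TAS_THREADS > 0);
    tas_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_create(&t[i], NULL, tas_worker, NULL);
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

With this setup, `tas_demo()` should return 400000 (4 threads times 100000 increments); only the time spent spinning changes with contention.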
Our Goal
Solve the synchronization bottleneck by using Extended Memory Semantics (EMS)
Better performance and scalability
Quantify the performance gain of EMS over other synchronization mechanisms (e.g., Empty/Full bits)
Extended Memory Semantics
Memory instructions are characterized by their synchronization behavior:
Load.ff, Load.fe, Store.xf, Store.ef, Store.xe (f = full, e = empty, x = don't care)
[Figure: memory word layout, 64 bits of data/metadata plus an extension bit]
EMS handler
There is no free lunch: the EMS handler has overhead
Creating the handler threads to queue up memory requests and build the data structure
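One way to picture where the handler's overhead comes from is a software queue of blocked requests. The following sketch is an assumption, not the authors' handler design: requests whose full/empty precondition fails are appended to a per-word FIFO, and after every state change the hypothetical `drain` routine rescans that queue to complete whatever can now proceed. Only `Load.fe` and `Store.ef` are modeled, and `ems_cell`, `ems_issue`, and the fixed queue sizes are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64  /* hypothetical queue capacity per word */
#define MAX_LOG 128     /* hypothetical completion-log capacity */

/* A memory request issued by a thread. */
typedef struct {
    int thread_id;
    bool is_store;  /* true: Store.ef, false: Load.fe */
    uint64_t value; /* value to store, or value loaded once completed */
} ems_request;

/* One EMS word plus the handler's FIFO of requests blocked on it. */
typedef struct {
    uint64_t data;
    bool full;
    ems_request queue[MAX_PENDING];
    size_t count;
    int log[MAX_LOG];             /* thread ids in completion order */
    uint64_t log_value[MAX_LOG];  /* value stored or loaded */
    size_t log_len;
} ems_cell;

/* Performs r if the full/empty bit allows it and records completion.
   Store.ef needs an empty word, Load.fe needs a full one. */
static bool try_complete(ems_cell *c, ems_request *r) {
    if (r->is_store == c->full)
        return false;
    if (r->is_store)
        c->data = r->value;
    else
        r->value = c->data;
    c->full = r->is_store; /* a store fills the word, a load empties it */
    assert(c->log_len < MAX_LOG);
    c->log[c->log_len] = r->thread_id;
    c->log_value[c->log_len] = r->value;
    c->log_len++;
    return true;
}

static void remove_at(ems_cell *c, size_t i) {
    for (; i + 1 < c->count; i++)
        c->queue[i] = c->queue[i + 1];
    c->count--;
}

/* Handler pass: repeatedly complete the oldest queued request that the
   current state allows, until no queued request can make progress. */
static void drain(ems_cell *c) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < c->count; i++) {
            if (try_complete(c, &c->queue[i])) {
                remove_at(c, i);
                progress = true;
                break; /* state changed: rescan from the oldest request */
            }
        }
    }
}

/* Issues a request: complete it now, or queue it for the handler. */
static void ems_issue(ems_cell *c, int thread_id, bool is_store, uint64_t value) {
    ems_request r = { thread_id, is_store, value };
    if (try_complete(c, &r)) {
        drain(c); /* the state change may unblock queued requests */
    } else {
        assert(c->count < MAX_PENDING);
        c->queue[c->count++] = r;
    }
}
```

If threads 1 and 2 issue `Load.fe` on an empty word and threads 3 and 4 then store, the completion log reads 3, 1, 4, 2: each store fills the word and the handler immediately passes it to the oldest waiting load. The queue bookkeeping (insertion, removal, rescans) is only exercised when requests actually have to wait.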
What we have done so far
Built the EMS model, at both the architecture and OS levels, in the Structural Simulation Toolkit (SST)
SST is a simulation environment for massive lightweight multithreading, developed at Notre Dame and Sandia National Laboratories
Modified glibc to use EMS
Especially the pthread library
Designed benchmarks for different categories of parallelism (tightly coupled, loosely coupled, and embarrassingly parallel)
Ran the simulations to evaluate EMS performance
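The report does not show how glibc's pthread library was changed. As an illustration only, a mutex maps naturally onto a single full/empty word: "full" means the lock is free, locking is a `Load.fe` (wait until full, then leave the word empty), and unlocking is a `Store.xf`. Because ordinary hardware has no extension bit, the sketch below emulates one word with a POSIX mutex and condition variable; `emu_word`, `ems_mutex`, `ems_mutex_demo`, and the demo constants are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Software emulation of one EMS word; with EMS the extension bit and the
   blocking would come from the memory system instead. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t changed; /* signaled whenever the full/empty state changes */
    uint64_t data;
    bool full;
} emu_word;

static void emu_init(emu_word *w, bool full) {
    assert(w != NULL);
    pthread_mutex_init(&w->m, NULL);
    pthread_cond_init(&w->changed, NULL);
    w->data = 0;
    w->full = full;
}

/* Load.fe: wait until full, read the data, leave the word empty. */
static uint64_t emu_load_fe(emu_word *w) {
    pthread_mutex_lock(&w->m);
    while (!w->full)
        pthread_cond_wait(&w->changed, &w->m);
    uint64_t v = w->data;
    w->full = false;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->m);
    return v;
}

/* Store.xf: write the data regardless of state, leave the word full. */
static void emu_store_xf(emu_word *w, uint64_t v) {
    pthread_mutex_lock(&w->m);
    w->data = v;
    w->full = true;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->m);
}

/* Hypothetical EMS-backed lock: "full" means the lock is free. */
typedef emu_word ems_mutex;
static void ems_mutex_init(ems_mutex *l)   { emu_init(l, true); }
static void ems_mutex_lock(ems_mutex *l)   { (void)emu_load_fe(l); }
static void ems_mutex_unlock(ems_mutex *l) { emu_store_xf(l, 0); }

/* Demo: threads increment a shared counter under the EMS-style lock. */
#define DEMO_THREADS 4
#define DEMO_ITERS 10000

static ems_mutex g_lock;
static long g_count;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < DEMO_ITERS; i++) {
        ems_mutex_lock(&g_lock);
        g_count++;
        ems_mutex_unlock(&g_lock);
    }
    return NULL;
}

static long ems_mutex_demo(void) {
    pthread_t t[DEMO_THREADS];
    ems_mutex_init(&g_lock);
    g_count = 0;
    for (int i = 0; i < DEMO_THREADS; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < DEMO_THREADS; i++)
        pthread_join(t[i], NULL);
    return g_count;
}
```

On EMS hardware, the waiting done here by the condition variable would presumably be handled by the memory system or the EMS handler, so an uncontended lock/unlock pair would reduce to two memory instructions.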
Tightly Coupled Parallel
Each thread competes with all the others for a single lock before updating the shared counter
Very high contention: the worst case
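A hedged sketch of what this tightly coupled benchmark kernel could look like with standard pthreads (the report's actual benchmark code is not shown): every thread repeatedly takes the one shared lock, increments the one shared counter, and releases the lock. `tight_shared`, `tight_worker`, and `run_tight` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t lock; /* the single lock every thread competes for */
    long counter;         /* the single shared counter */
    int iters;            /* updates per thread */
} tight_shared;

static void *tight_worker(void *arg) {
    tight_shared *s = arg;
    for (int i = 0; i < s->iters; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs nthreads workers; returns the final counter, or -1 on allocation failure. */
static long run_tight(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    tight_shared s = { .counter = 0, .iters = iters };
    pthread_mutex_init(&s.lock, NULL);
    pthread_t *t = malloc(sizeof *t * (size_t)nthreads);
    if (t == NULL) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, tight_worker, &s);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    free(t);
    pthread_mutex_destroy(&s.lock);
    return s.counter;
}
```

`run_tight(8, 1000)` should return 8000; the result itself is deterministic, and only the time spent waiting on the lock grows with the thread count.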
Loosely Coupled Parallel
Each thread competes with the others for locks before updating the counters
Mild contention
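For contrast, a loosely coupled variant can spread the same work over several counters, each with its own lock, so that threads only occasionally collide. This sketch is again an assumption about the benchmark's shape, not the authors' code; `NCOUNTERS`, `run_loose`, and the round-robin choice of counter are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NCOUNTERS 16 /* hypothetical number of independently locked counters */

typedef struct {
    pthread_mutex_t lock;
    long value;
} locked_counter;

typedef struct {
    locked_counter counters[NCOUNTERS];
    int iters; /* updates per thread */
} loose_shared;

typedef struct {
    loose_shared *s;
    int id;
} loose_arg;

static void *loose_worker(void *p) {
    loose_arg *a = p;
    for (int i = 0; i < a->s->iters; i++) {
        /* Threads start at different counters, so contention is spread
           over NCOUNTERS locks instead of one. */
        locked_counter *c = &a->s->counters[(a->id + i) % NCOUNTERS];
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Returns the sum over all counters, or -1 on allocation failure. */
static long run_loose(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    loose_shared s;
    s.iters = iters;
    for (int k = 0; k < NCOUNTERS; k++) {
        pthread_mutex_init(&s.counters[k].lock, NULL);
        s.counters[k].value = 0;
    }
    pthread_t *t = malloc(sizeof *t * (size_t)nthreads);
    loose_arg *args = malloc(sizeof *args * (size_t)nthreads);
    long total = -1;
    if (t != NULL && args != NULL) {
        for (int i = 0; i < nthreads; i++) {
            args[i] = (loose_arg){ &s, i };
            pthread_create(&t[i], NULL, loose_worker, &args[i]);
        }
        for (int i = 0; i < nthreads; i++)
            pthread_join(t[i], NULL);
        total = 0;
        for (int k = 0; k < NCOUNTERS; k++)
            total += s.counters[k].value;
    }
    for (int k = 0; k < NCOUNTERS; k++)
        pthread_mutex_destroy(&s.counters[k].lock);
    free(t);
    free(args);
    return total;
}
```

The total over all counters is still `nthreads * iters` (e.g., 8000 for `run_loose(8, 1000)`), but each lock now sees only a fraction of the updates.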
Embarrassingly Parallel and Loosely Coupled Parallel
Low synchronization overhead, provided by EMS
EMS shows very good scalability
[Figure: speedup vs. number of threads (32 to 1024); series: Ideal, Gene-Embarrassingly Parallel, Gene-Loosely Coupled]
[Figure: Synchronization distribution across Non-Contention, Hardware Supported, and Software Supported (EMS Handler) operations; chart labels ~5.1G and ~2.5G synchronization operations; slice values 4.78%, 11%, 84.2%]
Tightly Coupled Parallel
EMS performs poorly in this worst case
Most of the threads are used for synchronization, not for real work
[Figure: execution time (cycles, log scale, 10^5 to 10^7) vs. number of competing threads (1 to 128); series: Serial, Parallel Using EMS]
[Figure: number of threads vs. number of competing threads (1 to 128); series: EMS handlers, Total Threads]
The Road Ahead
Build or complete other synchronization mechanisms (e.g., Empty/Full bits) in SST
Modify glibc to support these other synchronization mechanisms
Compare performance between EMS and other synchronization mechanisms
Bibliography
[1] Larry Carter, John Feo, and Allan Snavely, "Performance and Programming Experience on the Tera MTA," PPSC, 1999.
Lightweight Threads
Thread context (frame) is 32 double words (256 bytes): two double words are reserved for the thread status, and 30 hold general-purpose registers.
No other per-thread state, which makes multithreading easy.
Frames are stored in memory (no register file); registers are aliases for memory locations.
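The frame arithmetic checks out: 32 double words of 8 bytes each is 256 bytes, of which 2 hold status and 30 hold registers. A minimal C sketch of such a frame, with hypothetical field and function names (`thread_frame`, `reg_addr`), makes the layout explicit and shows what "registers are aliases for memory locations" means: a register number is just an offset into the thread's frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a lightweight-thread frame: 32 double words
   (64 bits each) = 256 bytes, stored in ordinary memory. */
#define FRAME_STATUS_WORDS 2
#define FRAME_GPRS 30

typedef struct {
    uint64_t status[FRAME_STATUS_WORDS]; /* reserved for thread status */
    uint64_t gpr[FRAME_GPRS];            /* general-purpose registers */
} thread_frame;

_Static_assert(sizeof(thread_frame) == 256, "frame must be 32 x 8 bytes");
_Static_assert(offsetof(thread_frame, gpr) == FRAME_STATUS_WORDS * sizeof(uint64_t),
               "registers follow the status words");

/* With no register file, register n of a thread is simply a memory
   address inside that thread's frame. */
static uint64_t *reg_addr(thread_frame *f, int n) {
    assert(f != NULL && n >= 0 && n < FRAME_GPRS);
    return &f->gpr[n];
}
```

Under this layout, switching threads would amount to switching which frame a core addresses, rather than saving and restoring a register file.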