Introduction Methodology Performance Evaluation Conclusions References
Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications
Fernando Rui¹, Márcio Castro², Dalvan Griebler¹, Luiz Gustavo Fernandes¹
¹Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS - GMAP
²Universidade Federal do Rio Grande do Sul - UFRGS - INF
February 2014
-
Summary
1 Introduction
2 Methodology
3 Performance Evaluation
4 Conclusions
5 References
-
Introduction
1 Motivation
  - Multi-core architectures
  - Applications are not embarrassingly parallel
  - Traditional synchronization structures (locks, mutexes and semaphores):
      - Low-level mechanisms
      - Cause blocking
      - Hard to manage
      - Vulnerable to failures and faults
-
Introduction
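1 Transactional Memory (TM)
  - High-level abstraction
  - Allows parallel code to be written as transactions
  - At runtime, the TM system detects conflicts between transactions and resolves them (typically by aborting and re-executing one of the conflicting transactions)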
-
Introduction
1 Challenges of TM systems
  - Which kinds of applications can really take advantage of TM?
  - Why do some TM applications perform poorly?
2 Contributions of this research
  - Performance evaluation of state-of-the-art STM systems and applications
  - We extend the analysis of [1] to include the RSTM [2] system
  - We identify characteristics that affect the performance of TM applications
  - We identify bottlenecks of TM applications that limit their scalability
  - We show possible improvements to achieve better performance
-
Methodology
1 Comparative Analysis
  1 Evaluation of four state-of-the-art STM systems using the Stanford Transactional Applications for Multi-Processing (STAMP) benchmark [3];
  2 Evaluation of the STM systems using EigenBench [1];
  3 Evaluation of the impact of certain transactional characteristics using EigenBench.
2 Test Environment
  - All experiments were performed on a Dell PowerEdge R610 machine with two quad-core Intel Xeon E5520 2.27 GHz processors (8 MB of L3 cache each) and 16 GB of shared memory;
  - All results are arithmetic means of at least 30 runs, reported at a 95% confidence level.
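The slides do not say how the 95% confidence level was computed. A common approach, assumed here, is a normal-approximation confidence interval for the mean of n independent runs: mean ± 1.96 · s / √n, where s is the sample standard deviation. A minimal Python sketch (the function name and the sample data are hypothetical):

```python
import math
import statistics


def mean_and_ci95(samples):
    """Return (mean, half_width) so the 95% interval is mean +/- half_width.

    Assumes independent runs and uses the normal approximation (z = 1.96),
    a common choice once there are about 30 or more samples; for fewer runs
    a Student-t quantile would be more appropriate.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)          # sample standard deviation (n - 1)
    half_width = 1.96 * s / math.sqrt(n)   # z * standard error of the mean
    return mean, half_width


# Hypothetical execution times (seconds) of 30 runs of one configuration.
runs = [1.0, 3.0] * 15
mean, hw = mean_and_ci95(runs)
print(f"{mean:.3f} s +/- {hw:.3f} s")
```

The half-width shrinks with √n, so the number of runs needed depends on the observed variance rather than being fixed at 30.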
-
STM Systems Using STAMP Benchmark
1 STM Systems
  - Transactional Locking II (TL2) [4]: second version of the original Transactional Locking (TL) algorithm;
  - TinySTM [5]: uses a shared counter as a clock to detect conflicts between transactions, and locks to protect shared memory locations;
  - SwissTM [6]: its main innovation is a hybrid conflict detection scheme;
  - Rochester Software Transactional Memory (RSTM) [2]: reduces cache misses by employing a single level of indirection to access shared objects.
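TL2 and TinySTM both rely on a shared counter used as a global version clock, combined with per-location versioned locks. The Python sketch below illustrates that idea under simplifying assumptions: it is single-threaded (interleavings are simulated by hand), buffers writes until commit and acquires locks at commit time as TL2 does (TinySTM can instead acquire them when a location is first written), and all class and function names are hypothetical. It is not the code of either system, which operate on real memory words with atomic instructions.

```python
class AbortError(Exception):
    """Raised when a transaction detects a conflict and must be re-executed."""


class GlobalClock:
    """Shared counter used as a global version clock (incremented on commit)."""

    def __init__(self):
        self.value = 0


class Location:
    """A shared memory word guarded by a versioned lock."""

    def __init__(self, value):
        self.value = value
        self.version = 0      # clock value of the last commit that wrote it
        self.locked = False   # held only while a writer is committing


class Transaction:
    def __init__(self, clock):
        self.clock = clock
        self.rv = clock.value  # read version: clock snapshot at start
        self.read_set = set()
        self.write_set = {}    # Location -> buffered value (lazy versioning)

    def read(self, loc):
        if loc in self.write_set:          # read-after-write: use own buffer
            return self.write_set[loc]
        if loc.locked or loc.version > self.rv:
            raise AbortError               # written after we started: conflict
        self.read_set.add(loc)
        return loc.value

    def write(self, loc, value):
        self.write_set[loc] = value        # no shared state touched until commit

    def commit(self):
        acquired = []
        try:
            for loc in self.write_set:     # 1. lock every location to be written
                if loc.locked:
                    raise AbortError
                loc.locked = True
                acquired.append(loc)
            self.clock.value += 1          # 2. advance the shared clock (atomic in a real STM)
            wv = self.clock.value
            for loc in self.read_set:      # 3. validate the read set against rv
                if loc.version > self.rv or (loc.locked and loc not in self.write_set):
                    raise AbortError
            for loc, value in self.write_set.items():  # 4. publish the writes
                loc.value = value
                loc.version = wv
        finally:
            for loc in acquired:           # release locks on commit or abort
                loc.locked = False


def atomically(clock, body, max_retries=100):
    """Run body(tx) as a transaction, re-executing it whenever it aborts."""
    for _ in range(max_retries):
        tx = Transaction(clock)
        try:
            result = body(tx)
            tx.commit()
            return result
        except AbortError:
            continue
    raise RuntimeError("transaction aborted too many times")


# Two transactions increment the same location; the second to commit aborts.
clock = GlobalClock()
x = Location(0)
t1, t2 = Transaction(clock), Transaction(clock)
a, b = t1.read(x), t2.read(x)      # both read x == 0
t1.write(x, a + 1)
t2.write(x, b + 1)
t1.commit()                        # succeeds: x == 1, version 1
try:
    t2.commit()                    # x.version (1) > t2.rv (0): conflict
except AbortError:
    print("t2 aborted")            # prints: t2 aborted
atomically(clock, lambda tx: tx.write(x, tx.read(x) + 1))
print(x.value)                     # prints: 2
```

Each abort in this sketch corresponds to one wasted execution of the transaction body, which is what the aborts-per-commit metric used later in this evaluation counts.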
-
STM Systems Using STAMP Benchmark
1 Performance Evaluation
[Figure: speedups of the STAMP applications (bayes, genome, intruder, kmeans, labyrinth, ssca2, vacation and yada) on 2, 4 and 8 cores, one panel per STM system: SwissTM, RSTM, TinySTM and TL2. X-axis: applications; y-axis: speedup (0 to 5).]
-
SwissTM vs. RSTM using EigenBench
1 Set-up:
  - The STM systems that presented the best performance on STAMP (SwissTM and RSTM);
  - STAMP applications with poor (ssca2), medium (intruder and vacation) and good (labyrinth and genome) scalability;
  - The evaluation is based on speedup and aborts per commit (ApC).
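The two metrics can be written down directly. In the Python sketch below, the baseline for the speedup is assumed to be a single-thread execution (the slides do not state which baseline is used), ApC is taken as aborted transaction attempts divided by committed transactions, and the input numbers are hypothetical.

```python
def speedup(baseline_time, parallel_time):
    """Speedup of a parallel run relative to a baseline (assumed: single-thread) run."""
    return baseline_time / parallel_time


def aborts_per_commit(aborts, commits):
    """ApC: aborted transaction attempts per committed transaction."""
    if commits == 0:
        raise ValueError("no committed transactions")
    return aborts / commits


# Hypothetical numbers for one application on 8 cores.
print(f"speedup = {speedup(80.0, 20.0):.2f}")             # 4.00
print(f"ApC     = {aborts_per_commit(1500, 10000):.1%}")  # 15.0%
```

Expressing ApC as a percentage matches how it is plotted in the results that follow.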
2 EigenBench Input Parameters
Table: Characteristics of the STAMP applications

Characteristic        ssca2     intruder  vacation  labyrinth  genome
Working-set size      400 MB    20 MB     256 MB    16 MB      20 MB
Transactional length  3         24        226       357        88
Pollution             33%       5%        2%        50%        5%
Temporal locality     0.33      0.52      0.59      0.77       0.58
Contention            0.0005%   22%       0.2%      5%         0.5%
Predominance          Low       Low       High      Low        High
Density               High      High      High      Low        High
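To make two of the table's characteristics concrete, here is a hypothetical Python sketch of how a synthetic transaction could be generated from them. It is not EigenBench's code: the definitions of pollution (probability that an access is a write) and temporal locality (probability of repeating an address already accessed in the same transaction), and the reading of transactional length as a number of memory accesses, are assumptions.

```python
import random


def synthetic_transaction(length, pollution, temporal_locality, working_set_words, rng):
    """Generate one transaction's access trace as a list of (op, address) pairs.

    Assumed (simplified) definitions of two EigenBench-style characteristics:
      pollution         -- probability that an access is a write
      temporal_locality -- probability that an access repeats an address
                           already touched earlier in the same transaction
    length is taken to be the number of memory accesses in the transaction.
    """
    trace = []
    touched = []
    for _ in range(length):
        op = "W" if rng.random() < pollution else "R"
        if touched and rng.random() < temporal_locality:
            addr = rng.choice(touched)                # repeated address
        else:
            addr = rng.randrange(working_set_words)   # uniform over the working set
            touched.append(addr)
        trace.append((op, addr))
    return trace


# vacation-like parameters from the table; 8-byte words are an assumption.
rng = random.Random(42)
tx = synthetic_transaction(226, 0.02, 0.59, 256 * 2**20 // 8, rng)
writes = sum(op == "W" for op, _ in tx)
repeats = len(tx) - len({addr for _, addr in tx})
print(f"{writes} writes, {repeats} repeated accesses out of {len(tx)}")
```

Because each parameter is drawn independently, one characteristic can be varied while the others stay fixed, which is the kind of controlled variation used in the later experiments.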
-
SwissTM vs. RSTM using EigenBench (Cont.)
1 Performance Evaluation
[Figure: EigenBench speedups (0 to 8) of genome, intruder, labyrinth, ssca2 and vacation on 2, 4 and 8 cores, one panel each for SwissTM and RSTM, plus aborts per commit (ApC) versus the number of cores (2, 4, 8) for each application.]
-
SwissTM vs. RSTM using EigenBench (Cont.)
1 Findings
  - TM applications that use large amounts of memory did not perform well, since the STM systems need to keep track of much more data to detect conflicts;
  - Variations in transaction length during execution are not handled well by most of the STM systems;
  - Low degrees of predominance and density help TM applications perform better;
  - High levels of ApC generally limit the performance of TM applications.
-
Evaluating the Impact of Transactional Characteristics
[Figure: speedups (0 to 5) of the Original version and versions V1 to V4, one panel per application and varied characteristic: Genome - Transactional Length, Intruder - Temporal Locality, Ssca2 - Working-set Size, Vacation - Working-set Size. X-axis: versions; y-axis: speedups.]
-
Conclusions
About this paper
  - Certain transactional characteristics drive the performance of TM applications;
  - Applications must be analysed carefully to identify the relevant characteristics.
Future opportunities
  - We intend to extend this work using tracing mechanisms such as those proposed in [7];
  - We intend to study the impact of the TM characteristics on the performance of TM applications when executed on a real HTM processor, such as Intel Haswell.
-
References I
[1] Sungpack Hong et al. EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics. In IEEE International Symposium on Workload Characterization (IISWC), pages 1–11, Washington, USA, 2010. IEEE Computer Society.
[2] Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, and Michael L. Scott. Lowering the Overhead of Nonblocking Software Transactional Memory. In ACM SIGPLAN Workshop on Transactional Computing, June 2006.
[3] Chi Cao Minh et al. STAMP: Stanford Transactional Applications for Multi-Processing. In IEEE International Symposium on Workload Characterization (IISWC), pages 35–46, Seattle, USA, 2008. IEEE Computer Society.
[4] Dave Dice et al. Transactional Locking II. In International Symposium on Distributed Computing (DISC), pages 194–208, 2006.
[5] Pascal Felber, Christof Fetzer, and Torvald Riegel. Dynamic Performance Tuning of Word-based Software Transactional Memory. In Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 237–246, Salt Lake City, USA, 2008. ACM.
[6] Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka. Stretching Transactional Memory. In Programming Language Design and Implementation (PLDI), pages 155–165, 2009.
-
References II
[7] Márcio Castro et al. Analysis and Tracing of Applications Based on Software Transactional Memory on Multicore Architectures. In Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 199–206. IEEE Computer Society, 2011.