  • Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications

    Fernando Rui (1), Márcio Castro (2), Dalvan Griebler (1), Luiz Gustavo Fernandes (1)

    (1) Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS - GMAP

    (2) Universidade Federal do Rio Grande do Sul - UFRGS - INF

    February 2014

  • Summary

    1. Introduction
    2. Methodology
    3. Performance Evaluation
    4. Conclusions
    5. References

  • Introduction

    1. Motivation
       - Multi-core processors
       - Applications are not embarrassingly parallel
       - Traditional synchronization structures (locks, mutexes and semaphores):
         - Low-level mechanisms
         - Cause blocking
         - Hard to manage
         - Vulnerable to failures and faults

  • Introduction

    1. Transactional Memory (TM)
       - High-level abstraction
       - Allows parallel code to be written as transactions
       - Conflicts are detected and resolved at runtime
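The idea of writing parallel code as transactions, with conflicts detected and resolved at runtime, can be illustrated with a toy software TM. This is only a minimal sketch under stated assumptions: the names `TVar`, `Transaction`, `Conflict` and `atomically` are hypothetical, commits are serialized by a single global lock, and none of this reflects how TL2, TinySTM, SwissTM or RSTM are actually implemented.

```python
import threading


class TVar:
    """Hypothetical shared variable; its version is bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a transaction read data that another transaction changed."""


_commit_lock = threading.Lock()  # assumption: one global lock guards commits


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        # Record the version before reading the value, so a concurrent
        # commit is always caught by validation at commit time.
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            # Conflict detection: every value read must still be current.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            # No conflict: publish buffered writes (value first, then version).
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) as a transaction; on conflict, abort and retry.

    Returns (result, number_of_aborts).
    """
    aborts = 0
    while True:
        tx = Transaction()
        result = body(tx)
        try:
            tx.commit()
            return result, aborts
        except Conflict:
            aborts += 1                  # conflict resolution: re-execute


# Usage: four threads increment one shared counter without explicit locking.
counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker(iterations):
    for _ in range(iterations):
        atomically(increment)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = counter.value
```

The application code (`increment`) contains no locks; aborts and retries are handled by the runtime. The sketch does not guarantee that a running transaction always sees a consistent snapshot; it only guarantees that inconsistent transactions never commit.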

  • Introduction

    1. Challenges of TM systems
       - What kinds of applications can really take advantage of TM?
       - Why do some TM applications perform poorly?

    2. Contributions of this research
       - Performance evaluation of state-of-the-art STM systems and applications
       - We extend the analysis of [1] to include the RSTM [2] system
       - We identify characteristics that affect TM performance
       - We identify bottlenecks that limit the scalability of TM applications
       - We show possible improvements to achieve better performance

  • Methodology

    1. Comparative analysis
       1. Four state-of-the-art STM systems are compared using the Stanford Transactional Applications for Multi-Processing (STAMP) benchmark [3];
       2. The STM systems are evaluated using EigenBench [1];
       3. The impact of certain transactional characteristics is evaluated using EigenBench.

    2. Test environment
       - All experiments were performed on a Dell PowerEdge R610 machine with two quad-core Intel Xeon E5520 2.27 GHz processors, 8 MB of L2 cache and 16 GB of shared memory;
       - All results are arithmetic means of at least 30 runs, to guarantee a confidence level of 95%.
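Reporting the arithmetic mean of at least 30 runs at a 95% confidence level can be sketched as follows. The slides do not say how the interval was computed, so this sketch assumes a normal approximation (reasonable for 30 or more runs); the function name `mean_with_ci` and the sample timings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, half_width) of a confidence interval for the mean.

    Assumption: normal approximation, i.e. half_width = z * s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(samples) / sqrt(n)
    return mean(samples), half_width


# Hypothetical execution times (seconds) of 30 runs of one configuration.
runs = [10.0] * 15 + [12.0] * 15
avg, half = mean_with_ci(runs)
print(f"{avg:.2f} s +/- {half:.2f} s")
```

A narrow interval relative to the mean suggests that 30 runs were enough; a wide one suggests that more runs are needed before comparing systems.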

  • STM Systems Using STAMP Benchmark

    1. STM systems
       - Transactional Locking II (TL2) [4]: the second version of the original TL;
       - TinySTM [5]: uses a shared counter as a clock to control conflicts between transactions, and locks to protect shared memory locations;
       - SwissTM [6]: its main innovation is a hybrid conflict detection scheme;
       - Rochester Software Transactional Memory (RSTM) [2]: reduces cache misses by employing a single level of indirection to access shared objects.

  • STM Systems Using STAMP Benchmark

    1. Performance evaluation

    [Figure: speedups of the STAMP applications (bayes, genome, intruder, kmeans, labyrinth, ssca2, vacation, yada) on 2, 4 and 8 cores, one panel per STM system: SwissTM, RSTM, TinySTM and TL2. x-axis: Applications; y-axis: Speedups (0 to 5).]

  • SwissTM vs. RSTM using EigenBench

    1. Set-up
       - The two STM systems that presented the best performance;
       - STAMP applications with poor (ssca2), medium (intruder and vacation) and good (labyrinth and genome) scalability;
       - The evaluation is based on speedup and aborts per commit (ApC).

    2. EigenBench input parameters

    Table: Characteristics of the applications from the STAMP benchmark

    Characteristic         ssca2     intruder  vacation  labyrinth  genome
    Working-set Size       400 MB    20 MB     256 MB    16 MB      20 MB
    Transactional Length   3         24        226       357        88
    Pollution              33%       5%        2%        50%        5%
    Temporal Locality      0.33      0.52      0.59      0.77       0.58
    Contention             0.0005%   22%       0.2%      5%         0.5%
    Predominance           Low       Low       High      Low        High
    Density                High      High      High      Low        High
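The two metrics used in this evaluation, speedup and aborts per commit (ApC), can be written down as a small sketch. The slides do not define them, so the usual definitions are assumed: speedup as sequential time divided by parallel time, and ApC as the number of aborted transactions divided by the number of committed ones. The function names and input values are hypothetical.

```python
def speedup(sequential_time, parallel_time):
    """Assumed definition: T_sequential / T_parallel (both in the same unit)."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


def aborts_per_commit(aborts, commits):
    """Assumed definition: aborted transactions per committed transaction."""
    if commits <= 0:
        raise ValueError("commits must be positive")
    return aborts / commits


# Hypothetical measurements for one application on 8 cores.
s = speedup(sequential_time=120.0, parallel_time=30.0)
apc = aborts_per_commit(aborts=50, commits=1000)
print(f"speedup = {s:.1f}x, ApC = {apc:.1%}")
```

ApC is reported as a percentage here because the plots in this deck use percentage axes for aborts per commit.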

  • SwissTM vs. RSTM using EigenBench (Cont.)

    1. Performance evaluation

    [Figure: speedups (0 to 8) of genome, intruder, labyrinth, ssca2 and vacation with SwissTM and RSTM, and aborts per commit for the same applications on 2, 4 and 8 cores. Axes: Applications vs. Speedups; Number of cores vs. Aborts per commit.]

  • SwissTM vs. RSTM using EigenBench (Cont.)

    1. Findings
       - TM applications that use large amounts of memory did not perform well, since STM systems need to keep track of much more data to detect conflicts;
       - Variation in transaction length during execution is not handled well by most STM systems;
       - Low degrees of predominance and density help TM applications to perform better;
       - High levels of ApC generally limit the performance of TM applications.

  • Evaluating the Impact of Transactional Characteristics

    [Figure: speedups (0 to 5) of modified application versions (Original, V1, V2, V3, V4), one panel per varied characteristic: Genome - Transactional Length; Intruder - Temporal Locality; Ssca2 - Working-set Size; Vacation - Working-set Size. x-axis: Versions; y-axis: Speedups.]

  • Conclusions

    1. About this paper
       - Some characteristics drive the performance of TM applications;
       - Applications must be analysed carefully to identify the relevant characteristics.

    2. Future opportunities
       - We intend to extend this work using tracing mechanisms such as those proposed in [7];
       - We intend to study the impact of TM characteristics on the performance of TM applications executed on a real HTM processor such as the Intel Haswell.

  • References I

    [1] Sungpack Hong et al. EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics. In IEEE International Symposium on Workload Characterization (IISWC), pages 1–11, Washington, USA, 2010. IEEE Computer Society.

    [2] Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, and Michael L. Scott. Lowering the Overhead of Nonblocking Software Transactional Memory. In ACM SIGPLAN Workshop on Transactional Computing, Jun 2006.

    [3] Cao Minh et al. STAMP: Stanford Transactional Applications for Multi-Processing. In IEEE International Symposium on Workload Characterization (IISWC), pages 35–46, Seattle, USA, 2008. IEEE Computer Society.

    [4] Dave Dice et al. Transactional Locking II. In International Symposium on Distributed Computing (DISC), pages 194–208, 2006.

    [5] Pascal Felber, Christof Fetzer, and Torvald Riegel. Dynamic Performance Tuning of Word-based Software Transactional Memory. In Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 237–246, Salt Lake City, USA, 2008. ACM.

    [6] Aleksandar Dragojević, Rachid Guerraoui, and Michal Kapalka. Stretching Transactional Memory. In Programming Language Design and Implementation (PLDI), pages 155–165, 2009.

  • References II

    [7] Márcio Castro et al. Analysis and Tracing of Applications Based on Software Transactional Memory on Multicore Architectures. In Euromicro International Conference on Parallel, Distributed and Network-Based Computing (PDP), pages 199–206. IEEE Computer Society, 2011.
