preparing sandia's application portfolio for the future using kokkos · 2017. 2. 27. · §...

13
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2016-2672 C Preparing Sandia's Application Portfolio for the Future Using Kokkos Christian Trott, Daniel Sunderland, Carter Edwards, Si Hammond [email protected] Center for Computing Research Sandia National Laboratories, NM SAND2017-2110 C

Upload: others

Post on 26-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2016-2672 C

PreparingSandia'sApplicationPortfoliofortheFutureUsingKokkos

Christian Trott,DanielSunderland, CarterEdwards,[email protected]

Center forComputingResearchSandiaNationalLaboratories,NMSAND2017-2110C

Page 2: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

NewProgrammingModels

§ HPCisataCrossroads§ DiversifyingHardwareArchitectures§ MoreparallelismnecessitatesparadigmshiftfromMPI-only

§ NeedforNewProgrammingModels§ PerformancePortability:OpenMP 4.5,OpenACC,Kokkos,RAJA,SyCL,

C++20?,…§ ResilienceandLoadBalancing:Legion,HPX,UPC++,...

§ Vendordecouplingdrivesexternaldevelopment

2

My(slightlychanged)GoalfortheTalk:DescribewhatittooktogetKokkos acceptedbylegacyapplications

Page 3: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

Kokkos:Performance,PortabilityandProductivity

DDR#

HBM#

DDR#

HBM#

DDR#DDR#

DDR#

HBM#HBM#

Kokkos#

LAMMPS# Sierra# Albany#Trilinos#

https://github.com/kokkos

Page 4: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

PerformancePortabilitythroughAbstraction

Kokkos

Execution Spaces (“Where”)

Execution Patterns (“How”)

Execution Policies

- N-Level- Support Heterogeneous Execution

- parallel_for/reduce/scan, task spawn- Enable nesting

- Range, Team, Task-Dag- Dynamic / Static Scheduling- Support non-persistent scratch-pads

Memory Spaces (“Where”)

Memory Layouts (“How”)

Memory Traits

- Multiple-Levels- Logical Space (think UVM vs explicit)

- Architecture dependent index-maps- Also needed for subviews

- Access Intent: Stream, Random, …- Access Behavior: Atomic- Enables special load paths: i.e. texture

Parallel ExecutionData Structures

Separating of Concerns for Future Systems…

Page 5: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

Timeline

5

Initial Kokkos: Linear Algebra for Trilinos

Restart of Kokkos: Scope now Programming Model

Mantevo MiniApps: Compare Kokkos to other Models

LAMMPS: Demonstrate Legacy App Transition

Trilinos: Move Tpetra over to use Kokkos Views

Multiple Apps start exploring (Albany, Uintah, …)

Sandia Multiday Tutorial (~80 attendees)

Sandia Decision to prefer Kokkos over other models

Github Release of Kokkos 2.0

Kokkos-Kernels and Kokkos-Tools Release

DOE Exascale Computing Project starts

2008

2011

2013

2012

2014

2016

2015

2017

Page 6: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

17"

0"2000"4000"6000"8000"10000"12000"14000"

FOM+(Z

/s)+

LULESH+Figure+of+Merit+Results+(Problem+60)+

HSW"1x16" HSW"1x32" P8"1x40"XL" KNC"1x224" ARM64"1x8" NV"K40"Higher"is"

Better"

Results"by"Dennis"Dinge,"Christian"Trott"and"Si"Hammond"

InitialDemonstrations

6

§ DemonstrateFeasibilityofPerformancePortability§ DevelopmentofanumberofMiniApps fromdifferentsciencedomains

§ DemonstrateLowPerformanceLossversusNativeModels§ MiniApps areimplementedinvariousprogrammingmodels

§ DOETriLab Collaboration§ ShowKokkos worksfor

otherlabsapp§ Notethisishistoricaldata:

Improvementswerefound,RAJAimplementedsimilaroptimizationetc.

Page 7: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

TrainingtheUser-Base§ TypicalLegacyApplicationDeveloper

§ ScienceBackground§ MostlySerialCoding(MPIappsusuallyhavecommunicationlayerfew

peopletouch)§ Littlehardwarebackground,littleparallelprogrammingexperience

§ NotsufficienttoteachProgrammingModelSyntax§ Needtraininginparallelprogrammingtechniques§ Teachfundamentalhardwareknowledge(howdoesCPU,MICandGPU

differ,andwhatdoesitmeanformycode)§ Needtraininginperformanceprofiling

§ RegularKokkos Tutorials§ ~200slides,9hands-onexercisestoteachparallelprogramming

techniques,performanceconsiderationsandKokkos§ HeldatGTC,andSC;Alsoatrequestofinstitutions§ NowdedicatedECPKokkos supportproject:developonlinesupport

community 7

Page 8: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

KeepingApplicationsHappy§ Neverunderestimatedevelopersabilitytofindnewcorner

cases!!§ HavingaProgrammingModeldeployedinMiniApps orasinglebigappis

verydifferentfromhavinghalfadozenmulti-millionlinecodecustomers.§ 430Issuesin22months§ ~25%aresmallenhancements§ ~20%biggerfeaturerequests§ ~25%arebugs:oftencornercases

80

100

200

300

400

500

Issues

Issues since 2015

Other

Question

Compiler Issue

Bug

Feature RequestEnhancements

§ Example:Subviews§ Initiallydatatypeneededtomatch

includingcompiletimedimensions§ Allowcompile/runtimeconversion§ AllowLayoutconversionifpossible§ Automaticallyfindbestlayout§ Addsubviewpatterns

Page 9: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

TestingandSoftwareQuality§ ProgrammingModelsareinvasive

§ Reachmanycodelocations:allparallelizableloops§ Sometakeoverlowleveldatastructures§ Potentiallycostlytobackoutagain

§ PerformancePortabilityimpliesmultiplatform§ Muchgreatervarietyofcompilersandarchitectures§ Programmingmodelneedstosupportunionofcustomerneeds

§ TestingonSNLTestbeds§ IntelHaswell,KNL;IBMPower;CaviumARM;NVIDIAKepler,Pascal§ 15compilers(GCC,Intel,Clang,IBM,PGI)§ >200configurationseverynight

§ SEMS:SupportCommonSoftwareStackaccross SNL§ Applicationteamsdon’thavetheresourcesformultiplesoftwarestacks§ Delivertestedcompiler/tpl combinationsacrossdiversemachines 9

Page 10: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

BuildinganEcoSystem

10

Algorithms(Random, Sort)

Containers(Map, CrsGraph, Mem Pool)

Kokkos(Parallel Execution, Data Allocation, Data Transfer)

Kokkos – Kernels(Sparse/Dense BLAS, Graph Kernels, Tensor

Kernels)

Kokk

os–

Tool

s(K

okko

saw

are

Pro

filin

g an

d D

ebug

ging

Too

ls)

Trilinos(Linear Solvers, Load Balancing, Discretization, Distributed Linear

Algebra)

Kokk

os–

Supp

ort C

omm

unity

(App

licat

ion

Sup

port,

Dev

elop

er T

rain

ing)

ApplicationsMiniApps

std::thread OpenMP CUDA ROCm

Page 11: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

NecessaryResources§ Longtermdevelopment:

§ ~6yearseffortsofar§ onlynowseriouslyworkingonmajorapplications

§ NowmoreResourcesforSupport/ToolsthancoreModelR&D§ ~2FTEoncoreKokkos development§ ~1.5FTEapplicationsupport§ ~2FTEonToolsandKokkos Kernels

§ Diversehardwareresourcesfortestinganddevelopment§ Equivalentof2-3nodesfordedicatedtesting§ ~5differentarchitecturetestbedsfordevelopment§ BetaaccesstoallmajorHPCcompilers

§ IntensiveCollaborationwithVendors§ WorkingonCompilerBugs,Compilerimprovementsandnewbackends

11

Page 12: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

FurtherMaterial

§ https://github.com/kokkos Kokkos Github Organization§ Kokkos: Corelibrary,Containers,Algorithms§ Kokkos-Kernels: SparseandDenseBLAS,Graph,Tensor(under

development)§ Kokkos-Tools: ProfilingandDebugging§ Kokkos-MiniApps:MiniApp repositoryandlinks§ Kokkos-Tutorials: ExtensiveTutorialswithHands-OnExercises

§ https://cs.sandia.gov Publications(searchfor’Kokkos’)§ ManyPresentationsonKokkos anditsuseinlibrariesandapps

§ www.gputechconf.com/gtcnew/on-demand-gtc.php§ SearchforKokkos:recordedtalksonKokkos andsomeusage

12

Page 13: Preparing Sandia's Application Portfolio for the Future Using Kokkos · 2017. 2. 27. · § Development of a number of MiniAppsfrom different science domains § Demonstrate Low Performance

http://www.github.com/kokkos