argobots and its application to charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23...

30
Sangmin Seo Assistant Computer Scientist ArgonneNational Laboratory [email protected] April 19, 2016 Argobots and its Application to Charm++ Charm++ Workshop 2016

Upload: others

Post on 25-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

SangminSeo

AssistantComputerScientistArgonneNationalLaboratory

[email protected]

April19,2016

Argobots and its Application to Charm++

Charm++Workshop2016

Page 2: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argo Concurrency Team

• ArgonneNationalLaboratory(ANL)– Pavan Balaji (co-lead)– SangminSeo– AbdelhalimAmer– MarcSnir– PeteBeckman(PI)

• UniversityofIllinoisatUrbana-Champaign(UIUC)– Laxmikant Kale(co-lead)– Prateek Jindal– JonathanLifflander

• UniversityofTennessee,Knoxville(UTK)– GeorgeBosilca– ThomasHerault– DamienGenet

• PacificNorthwestNationalLaboratory(PNNL)– Sriram Krishnamoorthy

PastTeamMembers:• CyrilBordage (UIUC)• EstebanMeneses

(UniversityofPittsburgh)• Huiwei Lu(ANL)• Yanhua Sun (UIUC)

Charm++Workshop2016 2

Page 3: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Massive On-node Parallelism

• Thenumberofcoresisincreasing• Massiveon-nodeparallelismisinevitable• Existingsolutionsdonoteffectivelydealwithsuchparallelismwith

respecttoon-nodethreading/taskingsystemsorwithrespecttooff-nodecommunicationinthepresenceofsuchtasks/threads

• Howtoexploit?

core

Core-levelParallelism3Charm++Workshop2016

Page 4: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Shortcomings today? Pthreads (1/2)

Nesting

int in[1000][1000], out[1000][1000];

#pragma omp parallel for

for (i = 0; i < 1000; i++) {

petsc_voodoo(i);

}

petsc_voodoo(int x)

{

#pragma omp parallel for

for (j = 0; j < 1000; j++)

out[x][j] = cosine(in[x][j]);

}

Executiontimefor36threadsintheouterloop

WhyistraditionalOpenMP’s performancesobad?Thecompilercannotanalyzepetsc_voodoo toknowwhetherthefunctionmighteverblockoryield,so ithastoassumethatitmight.Thereforeastackisneededtofacilitateit.CreatingadditionalPthreads foreachnestingisthesimplestwaytoachievethis.

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35

Time(s)

#OMPThreads|ArgobotsULTs/tasks(innerloop)

GCC/pthreads GCC/ArgobotsULTs GCC/Argobotstasks

Lowerisbetter

Charm++Workshop2016 4

Page 5: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Shortcomings today? Pthreads (2/2)

TasksofapplicationmappedtoagroupofPthreads

Needlightweightmechanismstoswitchtasks!computation

communication

C

Pthreads

C C C

map&schedule

Howaboutthesecommunications?Waitorcontextswitch?

Workunitsintermixedwithblockingcalls (suchascommunication calls)cancauseidlecores

Charm++Workshop2016 5

Page 6: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Outline

• Background• Argobots• Charm++withArgobots• OtherProgrammingModels• Summary

Charm++Workshop2016 6

Page 7: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

User-Level Threads (ULTs)

• Whatisuser-levelthread(ULT)?– Providesthreadsemanticsinuserspace– Executionmodel:cooperativetimesharing

• MorethanoneULTcanbemappedtoasinglekernelthread

• ULTsonthesameOSthreaddonotexecuteconcurrently• Wheretouse?

– Tobetteroverlapcomputationandcommunication/IO– Toexploitfine-grainedtaskparallelism

timeline

Context switch

Context switch

ULT1

ULT2

Core Core Core Core Core Core Core Core

ULTs :

Kernel threads :

Charm++Workshop2016 7

Page 8: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Pthreads vs. ULTs

• Averagetimeforcreatingandjoiningonethread• pthread:6.6us- 21.2us(avg.34,953cycles)• ULT(Argobots):78ns- 130ns(avg.191cycles)• ULTis64x- 233xfaster thanPthread

– HowfastisULT?• L1$access:1.112ns,L2$access:5.648ns,memory access:18.4ns• Context switch (2processes):1.64us

1

10

100

1000

10000

100000

1 2 4 8 16 32 64 128 256 512 1024 2048

Avg.Create&JoinTime/thread

(ns)

NumberofThreads

pthread ULT(Argobots)

*measuredusingLMbench3

Charm++Workshop2016 8

Page 9: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Growing Interests in ULTs

• ULTandtasklibraries– Conversethreads,Qthreads,MassiveThreads,Nanos++,

Maestro,GnuPth,StackThreads/MP,Protothreads,Capriccio,StateThreads,TiNy-threads,etc.

• OSsupports– Windowsfibers,Solaristhreads

• Languageandprogrammingmodels– Cilk,OpenMP task,C++11task,C++17coroutineproposal,

Stackless Python,Gocoroutines,etc.

• Pros– EasytousewithPthreads-likeinterface

• Cons– Runtimetriestodosomethingsmart(e.g.,work-stealing)– Thismayconflictwiththecharacteristicsanddemandsof

applications

Charm++Workshop2016 9

Page 10: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots

Overview• Separationofmechanismsandpolicies• Massiveparallelism

– Exec.Streams guaranteeprogress– WorkUnits executetocompletion

• User-levelthreads(ULTs)vs.Tasklet• Clearlydefinedmemorysemantics

– Consistencydomains• ProvideEventualConsistency

– Softwarecanmanageconsistency

Argobots Innovations• Enablingtechnology,butnotapolicymaker

– High-levellanguages/librariessuchasOpenMP,Charm++havemoreinformationabouttheuserapplication(datalocality,dependencies)

• Explicitmodel:– Enablesdynamism,butalwaysmanaged

byhigh-levelsystems

Argobots

coreProcessor

Programming Models(MPI, OpenMP, Charm++, PaRSEC, …)

U User-Level Thread T TaskletLightweightWork Units

Exe

cutio

n S

tream

Private pool Private poolShared pool

U U

U T

TTU TU

Exe

cutio

n S

tream

Exe

cutio

n S

tream

Alow-levellightweightthreadingandtaskingframework(http://collab.cels.anl.gov/display/argobots/)

*Teammembers:Sangmin Seo,Abdelhalim Amer,Pavan Balaji (ANL),Laxmikant Kale,Prateek Jindal(UIUC)

Charm++Workshop2016 10

Page 11: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots Execution Model

• ExecutionStreams(ES)– Sequentialinstructionstream

• Canconsistofoneormoreworkunits– Mappedefficientlytoahardware

resource– Implicitlymanagedprogresssemantics

• OneblockedEScannotblockotherESs

• User-levelThreads(ULTs)– Independentexecutionunitsinuser

space– AssociatedwithanESwhenrunning– Yieldableandmigratable– Canmakeblockingcalls

• Tasklets– Atomicunitsofwork– Asynchronouscompletionvia

notifications– Notyieldable,migratablebefore

execution– Cannotmakeblockingcalls

S

Scheduler Pool

U

ULT

T

Tasklet

E

Event

ES1 Sched

U

U

E

E

E

E

U

S

S

T

T

T

T

T

Argobots Execution Model

...

ESn

• Scheduler– Stackableschedulerwithpluggable

strategies• Synchronizationprimitives

– Mutex,conditionvariable,barrier, future• Events

– Communicationtriggers

Charm++Workshop2016 11

Page 12: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Explicit Mapping ULT/Tasklet to ES

• TheuserneedstomapworkunitstoESs• Nosmartscheduling,nowork-stealingunlesstheuserwants

touse

ES1

U0

U1

T1

T2

U2

U3

ES2

U4

U5

• Benefits– Allowlocalityoptimization

• ExecuteworkunitsonthesameES

– NoexpensivelockisneededbetweenULTsonthesameES

• Theydonotrunconcurrently• Aflagisenough

Charm++Workshop2016 12

Page 13: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Stackable Scheduler with Pluggable Strategies

• AssociatedwithanES• CanhandleULTsandtasklets• Canhandleschedulers

– Allowstostackschedulershierarchically• Canhandleasynchronousevents• Userscanwriteschedulers

– Providesmechanisms,notpolicies– Replacethedefaultscheduler

• E.g.,FIFO,LIFO,PriorityQueue,etc.• ULTcanexplicitlyyieldto anotherULT

– Avoidscheduleroverhead

Sched

U

U

E

E

E

E

U

S

S

T

T

T

T

T

U S U U U

yield() yield_to(target)

Charm++Workshop2016 13

Page 14: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Performance: Create/Join Time

• Idealscalability– IftheULTruntimeisperfectlyscalable,thetimeshouldbethesame

regardlessofthenumberofESs

10

100

1000

10000

1 2 4 8 16 24 32 36 40 48 56 64 72

Create/Jo

inTimeperU

LT(cycles)

NumberofExecutionStreams(Workers)

Qthreads MassiveThreads (H) MassiveThreads (W)

Argobots(ULT) Argobots(Tasklet)

Charm++Workshop2016 14

Page 15: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

JonathanLifflander,Prateek Jindal,Yanhua SunLaxmikant Kale

UniversityofIllinoisatUrbana-Champaign(UIUC)

Charm++ with Argobots

15Charm++Workshop2016

Page 16: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Charm++ with Argobots

• Goals– TestthecompletenessandperformanceofArgobotswith

Charm++programmingmodel– TakeadvantageofArgobots features(tasklets,stackable

schedulers,etc.)withoutmodifyingapplicationcodes– ForCharm++applications,interoperatewithapplications

writteninothermodels(MPI,Cilk,etc.)

16

Mini-apps and real world applications

Charm++ model

Converse runtime(threading, messaging, scheduler)

Communication libraries (MPI, uGNI, PAMI, Verbs)

Intelligent runtime

Argobots(ULTs, Tasks, scheduling, etc.)

Charm++ infrastructure Charm++ with Argobots *Teammembers:LaxmikantKale,JonathanLifflander, PrateekJindal (UIUC)

Charm++Workshop2016

Page 17: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Replacing the Converse Runtime with Argobots

• Converse– TheactivemessaginglayerinCharm++

• Approaches– EachCharm++Pthread insideanode(includingthecommunication

thread)isimplementedasanArgobots ES• CreateanESforeveryConverseinstance

– AcustomArgobots scheduleriscreatedinsteadofusingtheConversescheduler

– Conversemessagesareenqueued intoArgobots poolsastasklets– Conversethreads(CthThread)areimplementedontopofArgobots

ULTs,withconditionalvariablestoimplementsuspend/resume

• Only180linesofcodehadtobechanged!

Charm++Workshop2016 17

Converse runtime(threading, messaging, scheduler)

Argobots(ULTs, Tasks, scheduling, etc.)

Page 18: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

LeanMD Performance: Runtime Comparison

Charm++Workshop2016 18

• Evaluationmachine– 2xIntelXeon E5-2699v3(2.30GHz):36cores (72threads)

• LeanMD simulation– Atotalof20stepsonacellarrayofdimensions7x7x7– 1-awayXYZconfigurationand1000atomspercell

Achievedcomparableperformancealthough itisaverysimpleimplementation

Page 19: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

LeanMD Performance: Manual Implementation

Charm++Workshop2016 19

• Manualimplementation ofLeanMD usingtheArgobots– ExploitedbothULTsandtasklets

• AULTformanagingacellandatasklet formanagingtheinteractionbetweencells– Usedfuturesforthewaitingmechanism– Workstealingbetweenpools

• Betterperformanceofourmanualimplementation implies thatCharm++withArgobots couldbeimproved

Page 20: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots: Interfaces for Shrink/Expand Events

20

ES0

Sched

ES1

Sched

ES2

Sched NRM

ESn-1

Sched...

E

E

E

Argobots

socket

Charm++ CilkBot PaRSECMPI+Argobots

programmingmodelruntimesandapplications

callbackfunctions

1. [Argobots]ConnecttoNRMusingasocketonABT_init()2. [Runtimes/applications]Registercallbackfunctionsforshrink/expandevents3. [Runtimes/applications]Deregistercallbackfunctionswhentheyterminate4. [Argobots]DisconnectfromNRMonABT_finalize()

ABT_ENV_POWER_EVENT_HOSTNAMEABT_ENV_POWER_EVENT_PORT

ABT_event_add_callback()ABT_event_del_callback()

Charm++Workshop2016

Page 21: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots: Shrink/Expand Event Handling

• Shrinking

21

ES0

Sched

ES1

Sched

ES2

Sched

E E E

prog.model runtime/application

1. ES1 picksanevent,whichrequestsES2 tobestopped2. AsktheruntimeusingcallbackswhetherES2 canbestopped3. IfOK,markES2 toneedtostopsowhenthescheduleronES2 checksevents,

itcanbestopped4. NotifytheruntimethatES2 willbestopped5. CreateaULTonES06. WhenthescheduleronES2 stops,ES2 isterminated7. AfterES2 isterminated,theULTfreesES2 andsendsaresponsetoNRM

- Anyscheduler onanyEScancheckandhandle events- ES0 cannotbestopped

U

12

3

4

5NRM

7

6

Charm++Workshop2016

Page 22: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots: Shrink/Expand Event Handling

• Expanding

22

ES0

Sched

ES1

Sched

ES2

Sched

E E E

prog.model runtime/application

1. ES0 picksanevent,whichrequeststocreateanES22. AsktheruntimeusingcallbackswhetheritcancreateES23. IfOK,invokeacallbackfunctionsotheruntimecreatesES24. CreateES25. SendaresponsetoNRM

1 2 3

5NRM

4

Charm++Workshop2016

Page 23: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Charm++ with Argobots: Implementation

• Shrink/ExpandImplementation– Charm++maintainsasetofpoolsforeachscheduler,mappedtoanES

– RanksinCharm++arevirtualizedbysavingthemappingofpooltoanES

– WhenanESisremoved,theassociatedpoolsareputintoagloballist

• TomaintaincorrectnessinCharm++,therankofanytasks/threadsinthegloballistarederivedfromthepool(ranksarevirtualized)

– Shrink• OtherESsexecuteworkunitsfromorphanedpoolswithsomeaddedsynchronization

– Expand• AnewESiscreatedandtakesoverasetoforphanedpools

23Charm++Workshop2016

Page 24: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

LeanMD Results of Shrinking/Expanding ESs

Charm++Workshop2016 24

Page 25: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Argobots Ecosystem

ES1 Sched

U

U

E

E

E

E

U

S

S

T

T

T

T

T

Argobots

...

ESn

MPI+Argobots

ULT

ES

ULT

ES

MPI

Argobots runtime

Communication libraries

Charm++

Applications

Charm++Cilk “Worker”

ArgobotsES

RWSULT

FusedULT1

FusedULT2

FusedULTN

…CilkBots

PO

GE

TR

SYTR TR

PO GE GE

TR TR

SYSY GE

PO

TR

SY

PO

SY SY

PaRSEC

OpenMP MercuryRPC

Origin

Target

RPCproc

RPCproc

OmpSs

GridFTP,Kokkos,RAJA,ROSE,TASCEL,XMP,etc.ExternalConnections

Charm++Workshop2016 25

Page 26: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Summary

• Massiveon-nodeparallelismisinevitable– Needruntimesystemsutilizingsuchparallelism

• Argobots– Alightweightlow-levelthreading/taskingframework– Providesefficientmechanisms,notpolicies,tousers(library

developersorcompilers)• Theycanbuildtheirownsolutions

• Charm++withArgobots– ImplementedbyreplacingtheConverseruntimewithArgobots– AchievedcomparableperformanceonLeanMD– Incorporatedtheshrinking/expandingofESsinArgobots in

ordertorespondtotheexternalevents(e.g.,powercapping)

Charm++Workshop2016 26

Page 27: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Try Argobots

• git repository• http://git.mcs.anl.gov/argo/argobots.git/

• Documentation– Wiki

• https://collab.cels.anl.gov/display/ARGOBOTS/

– Doxygen• http://www.mcs.anl.gov/~sseo/public/argobots/

Charm++Workshop2016 27

Page 28: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Funding AcknowledgmentsFundingGrantProviders

InfrastructureProviders

Charm++Workshop2016

Page 29: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Programming Models and Runtime Systems GroupGroupLead– PavanBalaji(computerscientistandgroup

lead)

CurrentStaffMembers– Abdelhalim Amer (postdoc)– Yanfei Guo (postdoc)– RobLatham(developer)– LenaOden(postdoc)– KenRaffenetti (developer)– Sangmin Seo (assistantcomputerscientist)– MinSi(postdoc)– MinTian(visitingscholar)

PastStaffMembers– AntonioPena(postdoc)– WesleyBland(postdoc)– DariusT.Buntinas (developer)– JamesS.Dinan (postdoc)– DavidJ.Goodell (developer)– Huiwei Lu(postdoc)– YanjieWei(visitingscholar)– Yuqing Xiong (visitingscholar)

– JianYu(visitingscholar)

– Junchao Zhang(postdoc)

– Xiaomin Zhu(visitingscholar)

– AshwinAji (Ph.D.)– Abdelhalim Amer (Ph.D.)– Md.Humayun Arafat(Ph.D.)– AlexBrooks(Ph.D.)– AdrianCastello (Ph.D.)– Dazhao Cheng(Ph.D.)– JamesS.Dinan (Ph.D.)– Piotr Fidkowski (Ph.D.)– PriyankaGhosh(Ph.D.)– SayanGhosh (Ph.D.)– RalfGunter(B.S.)– Jichi Guo (Ph.D.)– Yanfei Guo (Ph.D.)– MariusHorga (M.S.)– John Jenkins(Ph.D.)– Feng Ji (Ph.D.)– PingLai(Ph.D.)– Palden Lama(Ph.D.)– YanLi(Ph.D.)– Huiwei Lu(Ph.D.)– JintaoMeng (Ph.D.)– GaneshNarayanaswamy (M.S.)– Qingpeng Niu (Ph.D.)– Ziaul Haque Olive(Ph.D.)– DavidOzog (Ph.D.)

– Renbo Pang(Ph.D.)– Sreeram Potluri (Ph.D.)– LiRao (M.S.)– Gopal Santhanaraman (Ph.D.)– ThomasScogland (Ph.D.)– MinSi(Ph.D.)– BrianSkjerven (Ph.D.)– RajeshSudarsan (Ph.D.)– LukaszWesolowski (Ph.D.)– Shucai Xiao(Ph.D.)– Chaoran Yang(Ph.D.)– Boyu Zhang(Ph.D.)– Xiuxia Zhang(Ph.D.)– XinZhao(Ph.D.)

Advisory Board– PeteBeckman(seniorscientist)– RustyLusk (retired,STA)– MarcSnir (divisiondirector)– RajeevThakur (deputydirector)

Current andRecentStudents

Charm++Workshop2016

Page 30: Argobots and its Application to Charm++ · 1.0 1.5 2.0 2.5 3.0 3.5 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35) # OMP Threads | Argobots ULTs/tasks (inner loop) GCC/pthreads

Q&A

§ Thankyouforyourattention!

Questions?

Charm++Workshop2016