
High level abstractions for irregular computations

David Padua
Department of Computer Science
University of Illinois at Urbana-Champaign


1. INTRODUCTION: THE PROBLEM

December 4, 2017


Programming for performance

• Is laborious
  – Particularly so when the target machine is parallel.
    • Must decide what to execute in parallel
    • In what order to traverse the data
    • How to partition the data …
• Today, tuning must be done manually.
• Maintenance is difficult
  – and the result is not portable.


Programming for performance

• There is still much room for progress in high performance programming methodology.
• Our goal is to facilitate (automate) tuning and enable machine independent programming.


2. IRREGULAR COMPUTATIONS


What are irregular computations?

• Irregular computations are difficult to define.
• One possible definition is:
  – Computations on “irregular” data structures.
  – Irregular data structures = not dense arrays.
    • Pingali et al. The tao of parallelism in algorithms. PLDI '11
  – And subscript expressions non-linear.


What are irregular computations?

• Examples include
  – Subscripted subscripts:
    • A[C[i]]
  – Linked list traversal:
    • leftCell = leftCell->ColNext;
    • result += leftCell->Value;


Programming irregular computations for performance

• Today, programming of irregular computations is typically at a low level of abstraction.

  do diag=1,Nj,2            ! two-way unroll and jam
    diagLen = min( (jd_ptr(diag+1)-jd_ptr(diag)) , &
                   (jd_ptr(diag+2)-jd_ptr(diag+1)) )
    offset1 = jd_ptr(diag) - 1
    offset2 = jd_ptr(diag+1) - 1
    do i=1, diagLen
      C(i) = C(i) + val(offset1+i)*B(col_idx(offset1+i))
      C(i) = C(i) + val(offset2+i)*B(col_idx(offset2+i))
    end do
    ! peeled off iterations
    offset1 = jd_ptr(diag)
    do i=(diagLen+1), (jd_ptr(diag+1)-jd_ptr(diag))
      C(i) = C(i) + val(offset1+i)*B(col_idx(offset1+i))
    end do
  end do

G. Hager and G. Wellein. Introduction to High Performance Computing for Scientists and Engineers. CRC Press


Programming irregular computations for performance

• Programming and optimizing irregular computations is more challenging than for linear-subscript computations.
• We want compiler support for programming at a low level of abstraction to enable
  – Automatic mapping of computation onto diverse machine classes
  – Locality enhancement
  – Other compiler optimizations (e.g. common subexpression elimination)
• In addition, using higher levels of abstraction and applying optimizations at that level is often a better solution.
• Both approaches will be discussed in this presentation


When computations are not irregular

• Compilers can do a good job at transforming programs:
  – Loop interchange
  – Loop distribution
  – Tiling
  – …
• And we have the technology to generate efficient code for a variety of platforms even when the initial code is sequential
  – Vector extensions
  – Multicores
  – Distributed memory machines
  – GPUs


Example: Automatic transformation of a regular computation

• Consider the loop

  for (i=1; i<n; i++) {
    for (j=1; j<n; j++) {
      a[i][j]=a[i][j-1]+a[i-1][j];
    }
  }


Example: Automatic transformation of a regular computation

• Compute dependences (part 1)

Statement instances of a[i][j]=a[i][j-1]+a[i-1][j]:

        i=1                        i=2
  j=1   a[1][1]=a[1][0]+a[0][1]    a[2][1]=a[2][0]+a[1][1]
  j=2   a[1][2]=a[1][1]+a[0][2]    a[2][2]=a[2][1]+a[1][2]
  j=3   a[1][3]=a[1][2]+a[0][3]    a[2][3]=a[2][2]+a[1][3]
  j=4   a[1][4]=a[1][3]+a[0][4]    a[2][4]=a[2][3]+a[1][4]


Example: Automatic transformation of a regular computation

• Compute dependences (part 2)

[Figure: statement instances of a[i][j]=a[i][j-1]+a[i-1][j] for i=1,2 and j=1..4, as in part 1]


Example: Automatic transformation of a regular computation

[Figure: iteration space of the loop nest, with axes i and j (1, 2, 3, 4, …), starting at point (1,1)]


Example: Automatic transformation of a regular computation

• Find parallelism


Example: Automatic transformation of a regular computation

• Transform the code

  for (i=1; i<n; i++) {
    for (j=1; j<n; j++) {
      a[i][j]=a[i][j-1]+a[i-1][j];
    }
  }

becomes

  for (k=2; k<=2*n-2; k++)
    forall (i=max(1,k-n+1):min(n-1,k-1))
      a[i][k-i]=...
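The wavefront traversal can be checked against the original loop nest with a small sketch. The bounds below are derived for i and j running from 1 to n-1, as in the loop on the slide; `make_grid` and the boundary values are assumptions made only for illustration, and the inner loop of `wavefront` is written sequentially even though its iterations are independent.

```python
def sequential(a, n):
    """Original loop nest: row by row, column by column."""
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i][j - 1] + a[i - 1][j]
    return a


def wavefront(a, n):
    """Same computation, traversed along anti-diagonals k = i + j."""
    for k in range(2, 2 * n - 1):  # k = 2 .. 2n-2
        # All points (i, k - i) on one diagonal depend only on diagonal k - 1,
        # so the iterations of this inner loop could run in parallel (forall).
        for i in range(max(1, k - n + 1), min(n - 1, k - 1) + 1):
            j = k - i
            a[i][j] = a[i][j - 1] + a[i - 1][j]
    return a


def make_grid(n):
    """Hypothetical input: ones on the first row and column, zeros elsewhere."""
    return [[1 if i == 0 or j == 0 else 0 for j in range(n)] for i in range(n)]
```

Both traversals visit the same (i, j) pairs; only the order changes, and that order respects the two dependences of the loop body.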


Analyzing irregular computations

• Analysis of the following code is difficult and sometimes impossible

  for (i=1; i<n; i++) {
    a[c[i]]=a[d[i]]+1;
  }

• Dependences are a function of the values of c and d
• In the absence of additional information, the compiler must assume the worst.
• And this precludes automatic transformations (parallelization, vectorization, data distribution, …)
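A minimal sketch of why the values of c and d matter: the same loop is run in order, and again in a model of a naive parallel run where every iteration reads the values that existed before the loop started. The function names and inputs are assumptions chosen for illustration.

```python
def run_sequential(a, c, d):
    """The loop as written: iteration i sees the writes of iterations before it."""
    a = list(a)
    for i in range(len(c)):
        a[c[i]] = a[d[i]] + 1
    return a


def run_as_if_parallel(a, c, d):
    """Model of a naive parallel run: every iteration reads the old values."""
    old = list(a)
    new = list(a)
    for i in range(len(c)):
        new[c[i]] = old[d[i]] + 1
    return new
```

With c = [0, 1, 2, 3] and d = [4, 5, 6, 7] the written and read elements are disjoint and both versions agree. With c = [1, 2, 3, 4] and d = [0, 1, 2, 3] each iteration reads what the previous one wrote, and the results differ.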


Irregularity is a beast programming language researchers have been fighting for a long time


It has defeated us sometimes

Ignoring the importance of irregular computations was one of the reasons for the failure of High Performance Fortran


Taming the irregular beast

• Compilers and Runtime (Section 3)
• Better notations (Section 4)


3. COMPILER AND RUNTIME SOLUTIONS FOR LOW LEVEL IRREGULAR PROGRAMMING


Two approaches to dealing with irregular computations

• Analyzing irregular codes directly
  – Statically
  – Dynamically
• Converting into higher level of abstraction (dense array computations)


Static analysis

• In some cases, it is possible to analyze irregular algorithms written in conventional notation.
• This takes more effort than for regular / linear subscript computations


Static analysis

Dependence analysis

  do k = 1, n
    q = 0
    do i = 1, p
      if ( x(i) > 0 ) then
        q = q + 1
        ind(q) = i
      end if
    end do
    do j = 1, q
      x(ind(j)) = x(ind(j)) * y(ind(j))
    end do
  end do


• To decide whether the iterations of the j loop are independent, we need information about ind(j)
• This information can be found by analyzing the program
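One property such an analysis could derive is that ind(1:q) is strictly increasing, because it is filled with increasing values of i. The sketch below rebuilds ind the way the i loop does and checks that property at run time; it uses 0-based indices and helper names that are assumptions, and it is a dynamic stand-in for what a compiler would prove symbolically.

```python
def build_ind(x):
    """The i loop from the slide: record the positions of the positive entries."""
    ind = []
    for i in range(len(x)):
        if x[i] > 0:
            ind.append(i)
    return ind


def is_strictly_increasing(ind):
    """True when each entry is larger than the one before it."""
    return all(u < v for u, v in zip(ind, ind[1:]))


def is_injective(ind):
    """True when no index repeats, so no two j iterations touch the same x element."""
    return len(set(ind)) == len(ind)
```

A strictly increasing ind is injective, so x(ind(j)) refers to a different element in every iteration of the j loop and that loop carries no dependence.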


Static analysis

Array privatization

  do i=1, n
    do j=2, iblen(i)
      do k=1, j-1
        x(pptr(i)+k-1) = ...
      end do
    end do
    do j=1, iblen(i)-1
      do k=1, j
        ... = x(iblen(i)+pptr(i)+k-j-1)
      end do
    end do
  end do


• To decide if x can be privatized, it is sufficient to show that, within each iteration of the i loop, a write precedes each read
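The condition can be stated as a check over the accesses made by one iteration of the outer loop. The sketch below is a run-time illustration of the condition only, not the symbolic analysis a compiler would perform; the trace format and the function name are assumptions.

```python
def write_precedes_each_read(trace):
    """trace: the accesses made by ONE outer-loop iteration, in program order,
    as ('w', index) or ('r', index) pairs. Returns True when every element
    that is read was written earlier in the same iteration."""
    written = set()
    for kind, idx in trace:
        if kind == "w":
            written.add(idx)
        elif idx not in written:
            return False  # this read may see a value from another iteration
    return True
```

When the check holds for every iteration, no value flows between iterations through the array, so each iteration can work on its own private copy.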


Static analysis

• A possible approach is to query the control flow graph, bounding the search.

Yuan Lin and David Padua. Compiler analysis of irregular memory accesses. (PLDI '00).


Dynamic analysis

• Embedded dependence analysis
• Inspector executor
• Speculation


Embedded dependence analysis

• This is a clever mechanism due to Zhu and Yew.

Sequential loop:

  for i=1; i<n; i++
    a(k(i)) = c(i) + 1

Parallel version, with a tag per array element:

  a_tag(k(:)) = 0
  parallel for i=1; i<n; i++
    critical a(k(i))
      if a_tag(k(i)) < i
        a(k(i)) = c(i) + 1
        a_tag(k(i)) = i

Zhu, C. Q., & Yew, P. C. (1987). A scheme to enforce data dependence on large multiprocessor systems. IEEE Transactions on Software Engineering, SE-13(6), 726-739
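The tagged loop above can be sketched with threads and one lock per array element standing in for the critical section. This is an illustration of the idea on the slide, not the scheme as specified in the cited paper; indices are 0-based, so tags start at -1 instead of 0, and the thread count and helper names are assumptions.

```python
import threading


def sequential(a, k, c):
    """Reference: the last iteration that writes an element determines its value."""
    a = list(a)
    for i in range(len(k)):
        a[k[i]] = c[i] + 1
    return a


def tagged_parallel(a, k, c, num_threads=4):
    a = list(a)
    tag = [-1] * len(a)                    # -1: no iteration has written yet
    locks = [threading.Lock() for _ in a]  # one critical section per element

    def worker(start):
        # Each thread takes iterations start, start + num_threads, ...
        for i in range(start, len(k), num_threads):
            with locks[k[i]]:
                if tag[k[i]] < i:          # no later iteration has written yet
                    a[k[i]] = c[i] + 1
                    tag[k[i]] = i

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Whatever order the threads run in, each element ends up holding the value written by the highest-numbered iteration that targets it, which is what the sequential loop produces.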


Inspector executor

• Parallel schedule and communication aggregation computed at execution time
  – Analyzing the memory access pattern of a code segment (typically a loop)
• Overhead reduced when the code segment (loop) is executed multiple times with the same access pattern

See Encyclopedia of Parallel Computing: Run Time Parallelization (Saltz and Das)
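A minimal sketch of the inspector/executor idea for the loop a[c[i]] = a[d[i]] + 1: the inspector reads the index arrays and groups iterations into wavefronts that respect flow, anti and output dependences, and the executor then runs one wavefront at a time. The function names and the wavefront scheduling policy are assumptions; real systems also aggregate communication, which is not modeled here.

```python
def run_sequential(a, c, d):
    a = list(a)
    for i in range(len(c)):
        a[c[i]] = a[d[i]] + 1
    return a


def inspect(c, d, size):
    """Inspector: assign each iteration to the earliest wavefront it can run in."""
    last_write = [-1] * size  # highest wavefront that wrote each element
    last_read = [-1] * size   # highest wavefront that read each element
    levels = []
    for i in range(len(c)):
        lvl = 1 + max(last_write[d[i]],  # flow dependence
                      last_write[c[i]],  # output dependence
                      last_read[c[i]])   # anti dependence
        levels.append(lvl)
        last_read[d[i]] = max(last_read[d[i]], lvl)
        last_write[c[i]] = lvl
    schedule = [[] for _ in range(max(levels, default=-1) + 1)]
    for i, lvl in enumerate(levels):
        schedule[lvl].append(i)
    return schedule


def execute(a, c, d, schedule):
    """Executor: iterations inside one wavefront are independent."""
    a = list(a)
    for wave in schedule:
        values = [a[d[i]] + 1 for i in wave]  # gather all operands
        for i, v in zip(wave, values):        # then scatter the results
            a[c[i]] = v
    return a
```

The schedule depends only on c and d, so it can be reused for as long as the access pattern stays the same, which is where the overhead is amortized.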


Speculation

• There is an extensive body of literature on speculation.
• A code segment is executed in parallel in the hope that there is no problem.
• If there is, execution backtracks and the code segment is re-executed serially

See Lawrence Rauchwerger, David A. Padua: The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. PLDI 1995: 218-232

Encyclopedia of Parallel Computing:
• Thread level speculation (Torrellas)
• Transactional Memories (Herlihy)
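The execute, check, backtrack cycle can be sketched for the loop a[c[i]] = a[d[i]] + 1. This is a deliberately simplified illustration and not the LRPD test: it handles neither privatization nor reductions, it treats any repeated write as a conflict, and the "parallel" run is modeled by letting every iteration read the checkpointed values. All names are assumptions.

```python
def speculative_run(a, c, d):
    """Returns (result, True) if speculation succeeded, (result, False) if the
    loop had to be re-executed serially."""
    checkpoint = list(a)   # saved state, used both for reads and for backtracking
    spec = list(a)
    writers = {}           # element -> iteration that wrote it
    readers = {}           # element -> set of iterations that read it
    conflict = False
    for i in range(len(c)):  # stands in for a parallel loop
        readers.setdefault(d[i], set()).add(i)
        if c[i] in writers:
            conflict = True  # two iterations wrote the same element
        writers[c[i]] = i
        spec[c[i]] = checkpoint[d[i]] + 1
    for elem, w in writers.items():
        if readers.get(elem, set()) - {w}:
            conflict = True  # written by one iteration, read by another
    if not conflict:
        return spec, True
    serial = checkpoint      # backtrack to the saved state
    for i in range(len(c)):
        serial[c[i]] = serial[d[i]] + 1
    return serial, False
```

Speculation pays off when conflicts are rare; when they are common, the cost of the discarded parallel run is added to the serial execution.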


Converting into dense array computations

• The objective is to convert sparse computations into equivalent dense form.
• This increases the number of operations involving zeroes (or identity)
  – Sometimes making the computation impossible
• But it allows the application of compile-time transformations.
• Sparsity is recovered by applying a final transformation (see below)


Converting into dense array computations

Sparse (linked list) form:

  for(col=0; col<cols; col++) {
    for(row=0; row<dimensions; row++) {
      leftCell = left.Rows[row];
      while( leftCell != NULL ) {
        result[row][col] += leftCell->Value * right[leftCell->ColIndex][col];
        leftCell = leftCell->ColNext;
      }
    }
  }

Equivalent dense form:

  for( col = 0; col < cols; col++ )
    for( row = 0; row < dimensions; row++ )
      result[row][col] += A_Valuep[row][:] * right[:][col];


Converting into dense array computations

Harmen L. A. van der Spek, Harry A. G. Wijshoff: Sublimation: Expanding Data Structures to Enable Data Instance Specific Optimizations. LCPC 2010: 106-120

Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and data transformations for sparse matrix code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '15).


Where do we stand?

• There is an extensive body of literature on compilation and runtime to facilitate the parallel programming of irregular computations.
• Although more ideas are important, what we need most urgently is an evaluation of their effectiveness.
  – A means for refinement and progress.
• All compiler techniques suffer from lack of understanding of their effectiveness.


Need tools to evaluate progress

• Measuring advances in compiler technology is crucial for progress.
• An idea: a publicly available repository
  – Containing extensive and representative code collections.
  – Keep track of results for compilers/techniques over time.
• Ongoing work on one such repository:
  – LORE: A Loop Repository for the Evaluation of Compilers by Z. Chen, Z. Gong, J. Josef Szaday, D. Wong, D. Padua, A. Nicolau, A. Veidenbaum, N. Watkinson, Z. Sura, S. Maleki, J. Torrellas, G. DeJong. IISWC-2017


4. NOTATIONS


4.1 ABSTRACTION


What are Programming abstractions?

§ An abstraction is formed by reducing information.
§ In everyday life, it forms the world of ideas where we can reason. No (natural language) statement can be made without abstractions


What are Programming abstractions?

§ Abstract machine language / lower level implementation
  § Scalar computations:
    § a=b*c+d^3.4
    § No loads, stores, register allocation.
  § Scalar loops:
    § for (i= …..) {…}
    § No if statements, branch/goto instructions.
    § Gives structure to the program.
§ Abstract algorithm implementation
  § min(A(1:n))
  § it = find (myvec.begin(), myvec.end(), 30);
  § Hide algorithm, data representation


What are the benefits of using abstractions?

• Programmer productivity (expressiveness)
  – Codes are easier to write
  – And easier to understand and maintain
• Portability
• Optimization (manual and automatic)
  – Programs are easier to analyze
  – Easier to transform
  – Deeper transformations are enabled


Abstractions and irregular computations

• What abstractions should we use for programming irregular computations at a higher level?
• Best answer seems to be to represent irregular algorithms in terms of:

  Dense array operations


4.2 ARRAY NOTATION AND IRREGULAR COMPUTATIONS


Why array notation

• Could access arrays in scalar mode
• But operations on aggregates are (often) easier to read.

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      a[i][j]=b[i][j]+c[i][j]

  versus

  a=b+c

• They are well understood abstractions.
• Powerful.


Array operations and performance

• But the usefulness of array notation goes beyond expressiveness.
• There are at least three reasons to use array abstractions for performance
  – To represent parallelism
  – To reduce the overhead of interpretation.
  – To enable automatic optimizations.


Representation of parallelism

• Popular in the early days:

  Illiac IV Fortran

    do 10 i = 1, 100, 2
      M(i) = .true.
      M(i+1) = .false.
    10 continue
    A(*) = B(*) + A(*)
    C(M(*)) = B(M(*)) + A(M(*))

• Used today but not widely

  Cilk Plus

    C[i][j] = __sec_reduce_add(a[i][:] * b[:][j]);

  • TensorFlow
  • Futhark
    – T. Henriksen, N. Serup, M. Elsman, F. Henglein, and C. Oancea. Futhark: Purely Functional GPU-Programming with Nested Parallelism and In-Place Array Updates. PLDI 2017


Array operations and optimizations: An example

• Applying arithmetic rules such as associativity of matrix-matrix multiplication to reduce the number of operations.
  – Which one is better?
    • (((A*B)*C)*D)*E,
    • (A*((B*C)*D))*E,
    • (A*B)*((C*D)*E),
    • …
  – Find best using dynamic programming
  – Yoichi Muraoka and David J. Kuck. 1973. On the time required for a sequence of matrix products. Commun. ACM 16, 1 (January 1973), 22-26.
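The search over parenthesizations can be sketched with the textbook matrix-chain dynamic program. The cost model (number of scalar multiplications for dense matrices) and the function name are assumptions; this is not claimed to be the formulation used in the cited paper.

```python
def best_order(dims):
    """dims[i] and dims[i + 1] are the rows and columns of matrix i.
    Returns (minimum number of scalar multiplications, parenthesization)."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):           # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for s in range(i, j):            # last product splits after matrix s
                trial = (cost[i][s] + cost[s + 1][j]
                         + dims[i] * dims[s + 1] * dims[j + 1])
                if trial < cost[i][j]:
                    cost[i][j], split[i][j] = trial, s

    def show(i, j):
        if i == j:
            return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[i]
        s = split[i][j]
        return "(" + show(i, s) + "*" + show(s + 1, j) + ")"

    return cost[0][n - 1], show(0, n - 1)
```

For A of size 10x30, B of size 30x5 and C of size 5x60, (A*B)*C needs 4500 multiplications while A*(B*C) needs 27000, so the choice of order matters even for three matrices.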


Reduce overhead of interpretation

• See Haichuan Wang, David A. Padua, Peng Wu: Vectorization of apply to reduce interpretation overhead of R. OOPSLA 2015: 400-415


Array operations and irregular computations

• Array operations are an ideal notation to manipulate sparse arrays.


Sparse arrays in MATLAB

  S = sparse(m,n)

  load west0479.mat
  A = west0479;
  S = A * A' + speye(size(A));
  pct = 100 / numel(A);

  figure
  spy(S)
  title('A Sparse Symmetric Matrix')
  nz = nnz(S);
  xlabel(sprintf('nonzeros = %d (%.3f%%)',nz,nz*pct));

• Arrays appear as dense, but are represented internally as sparse.
• The interpreter and libraries handle the sparseness


Sparse arrays can be abstracted as dense arrays in compiled languages

• Bik and Wijshoff* developed a strategy to translate dense array programs into sparse equivalents.

Successive forms of the same loop, from dense notation to sparse storage:

  do i=1,n
    acc=acc+a(i,:)*<1:n>

  do i=1,n
    do j=1,n
      if (i,j) ∈ Ea then
        acc=acc+a'(σa(i,j))*j

  do i=1,n
    do ad ∈ PADa
      j=π2 σa^-1(ad)
      acc=acc+a'(ad)*j

  do i=1,n
    do ad=alow(i),ahigh(i)
      j=aind(ad)
      acc=acc+aval(ad)*j

*Aart J. C. Bik, Harry A. G. Wijshoff: Compilation Techniques for Sparse Matrix Computations. International Conference on Supercomputing 1993: 416-424
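The first and last forms of the loop can be compared with a small sketch. The names alow, ahigh, aind and aval follow the slide; the row-wise storage built by `to_compressed_rows` is an assumption about what they denote (inclusive row bounds, 1-based column numbers) and is not taken from the cited paper.

```python
def dense_sum(a):
    """acc = sum over i, j of a(i, j) * j, with columns numbered from 1."""
    acc = 0
    n = len(a)
    for i in range(n):
        for j in range(n):
            acc += a[i][j] * (j + 1)
    return acc


def to_compressed_rows(a):
    """Hypothetical storage: aval/aind hold the nonzeros and their column
    numbers; alow[i]..ahigh[i] (inclusive) are the entries of row i."""
    aval, aind, alow, ahigh = [], [], [], []
    for row in a:
        alow.append(len(aval))
        for j, v in enumerate(row):
            if v != 0:
                aval.append(v)
                aind.append(j + 1)
        ahigh.append(len(aval) - 1)  # alow > ahigh for an empty row
    return aval, aind, alow, ahigh


def sparse_sum(aval, aind, alow, ahigh):
    """The final form on the slide: visit only the stored entries."""
    acc = 0
    for i in range(len(alow)):
        for ad in range(alow[i], ahigh[i] + 1):
            j = aind[ad]
            acc += aval[ad] * j
    return acc
```

The programmer writes the dense version; the sparse version does the same arithmetic but skips every zero entry.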


Array operations and irregular computations

• Not only can array notation be used to represent dense computations, it can be used for a number of other domains.
• A wide range of domains


How wide?

• At the time (ca. 1959), Illinois, with the first of its ‘ILLIAC’ supercomputers, was a great centre of computing, and Golub showed his affection for his Alma Mater by endowing a chair there 50 years later. Rumour has it that the funds for the gift came from Google stock acquired in exchange for some advice on linear algebra. Google’s PageRank search technology starts from a matrix computation, an eigenvalue problem with dimensions in the billions. Hardly surprising, Golub would have said: everything is linear algebra. Nature 2007


New operators: Arrays for graph algorithms

• A simple algorithm for SSSP (Bellman-Ford) can be represented as follows:

  for(i=1; i<=n; i++)
    d = d ⨁ A ⨂ d

Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM Press

T. Mattson, D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, Steve Reinhardt, M. Stonebraker, S. Wallach, A. Yoo. Standards for Graph Algorithm Primitives
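A sketch of the update above, reading the first operator as element-wise minimum and the second as a matrix-vector product in which multiplication is replaced by addition and summation by minimum. The convention that A[v][u] holds the weight of the edge from u to v (infinity when there is none) is an assumption, as are the function names.

```python
INF = float("inf")


def min_plus_matvec(A, d):
    """Entry v of the product: the minimum over u of A[v][u] + d[u]."""
    n = len(d)
    return [min(A[v][u] + d[u] for u in range(n)) for v in range(n)]


def elementwise_min(x, y):
    return [min(p, q) for p, q in zip(x, y)]


def sssp(A, source):
    """Bellman-Ford as repeated array operations."""
    n = len(A)
    d = [INF] * n
    d[source] = 0
    for _ in range(n):  # n sweeps, as on the slide
        d = elementwise_min(d, min_plus_matvec(A, d))
    return d
```

Each sweep relaxes every edge once, so the loop body contains no explicit reference to vertices or edges, only whole-array operations.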


4.3 EXTENSIONS TO ARRAY NOTATION FOR PARALLELISM


Communication as array operations

• repmat(h, [1, 3])
• circshift(h, [0, -1])
• transpose(h)


Cannon's algorithm

A and B are each partitioned into 3 x 3 blocks:

  A00 A01 A02      B00 B01 B02
  A10 A11 A12      B10 B11 B12
  A20 A21 A22      B20 B21 B22

Initial skew (blocks held at each position):

  A00 B00   A01 B11   A02 B22
  A11 B10   A12 B21   A10 B02
  A22 B20   A20 B01   A21 B12

[Figure: the same grid, annotated with the shift-multiply-add step]


Cannon's Algorithm

  for k=1:n
    c = c + a × b;
    a = circshift(a,[0 -1]);
    b = circshift(b,[-1 0]);
  end

James C. Brodman, G. Carl Evans, Murat Manguoglu, Ahmed H. Sameh, María Jesús Garzarán, David A. Padua: A Parallel Numerical Solver Using Hierarchically Tiled Arrays. LCPC 2010: 46-61
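The loop can be sketched in plain Python, with each matrix element standing in for one tile and the initial skew made explicit. The two shift helpers are assumptions that mimic circshift along one dimension with a per-row or per-column amount; the product inside the loop is element-wise, as each position only uses the data it currently holds.

```python
def shift_rows_left(m, shifts):
    """Rotate row i of m left by shifts[i] positions."""
    n = len(m)
    return [[m[i][(j + shifts[i]) % n] for j in range(n)] for i in range(n)]


def shift_cols_up(m, shifts):
    """Rotate column j of m up by shifts[j] positions."""
    n = len(m)
    return [[m[(i + shifts[j]) % n][j] for j in range(n)] for i in range(n)]


def cannon(a, b):
    n = len(a)
    # Initial skew: row i of a moves left by i, column j of b moves up by j.
    a = shift_rows_left(a, list(range(n)))
    b = shift_cols_up(b, list(range(n)))
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Multiply-add using only the operands held at each position.
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = shift_rows_left(a, [1] * n)  # circshift(a, [0 -1])
        b = shift_cols_up(b, [1] * n)    # circshift(b, [-1 0])
    return c
```

After the skew, position (i, j) holds a[i][(i+j) mod n] and b[(i+j) mod n][j]; each shift advances that shared index by one, so n steps accumulate the full inner product.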


Barrier elimination

  X(1:4)=Y(1:4)*2;
  Z(1:2)=X([1,3]);
  Z(3:4)=X([2,4]);

Element-level operations:

  X(1)=Y(1)*2   X(3)=Y(3)*2   X(2)=Y(2)*2   X(4)=Y(4)*2
  Z(1)=X(1)     Z(2)=X(3)     Z(3)=X(2)     Z(4)=X(4)


Asynchronous semantics

• In some cases, it could be desirable to delay updates across the whole machine. At some point, a global update is made.
• This is a particular example of communication in asynchronous algorithms, where data assignments suffer delays.


Tiled Linear Algebra – SSSP Example

  // the distance vector
  d = inf; d(1) = 0;
  // the mask vector
  L = 0; L(1) = 1;
  while (L.nnz())
    // local computation
    for i = 1:LOC_ITER
      r <- d + (trans(A)*(d*.L));   // partial multiplication
      L <- (r != d);
      d <- r;
    // global reduction
    r += d;
    L <- (r != d);
    d <- r;

Saeed Maleki, G. Carl Evans, David A. Padua: Tiled Linear Algebra a System for Parallel Graph Algorithms. LCPC 2014: 116-130


4.4 WHERE DO WE STAND?


Array notation

• Array notation has a long history
• It is a powerful notation that can be used for a wide range of applications.
• For parallelism, it is an unfulfilled promise
• The highly mathematical notation is a challenge
• We see increasing use of the notation and expect it to become popular also for irregular algorithms.
• Challenges:
  – Develop the notation
  – Design compiler optimizations


5. EPILOG


• In 1976 not much was known about parallel programming of irregular computations.
• 40 years later it is impressive how much has been learned on compilation and notations.
• But the development of widely used tools based on these ideas remains a challenge.
