Extensions of the HPF language for implementing parallel programs manipulating irregular data structures

Frédéric Brégier
Thesis presented at the Université de Bordeaux I
21 December 1999

Frédéric Brégier - LaBRI 1
Frame of Work
•Parallel programming by compilation
•HPF: the standard for data-parallel (regular) programs
•Irregular programs still need investment: efficiencies remain poor
•Optimizations at compile-time
•Optimizations at run-time (generated at compile-time)
Plan
•Optimizations at compile-time
•Irregular Data Structure (IDS)
•A Tree to represent an IDS
•Optimizations at run-time
•Inspection-Execution principles
•Irregular communications: irregular active processor sets
•Irregular iteration spaces
•Scheduling of loops with partial loop-carried dependencies
•New data-parallel irregular operation: progressive irregular prefix operation
•Conclusion and Perspectives
HPF (High Performance Fortran): data-parallel language
May 1993: HPF 1.0; January 1997: HPF 2.0

• Fortran 95 source code + structured comments (!HPF$) (distributions + parallel properties)
• Target code: SPMD parallel code
• « Owner computes » rule
• Run-time guards and communication generation

A(I) = B(J) + X

IF (B(J) is local) THEN
  Send(B(J) to Owner(A(I)))
END IF
IF (A(I) is local) THEN
  Receive(in TMP from Owner(B(J)))
  A(I) = TMP + X
END IF

(Figure: A and B distributed over the processors, X and Y replicated on each.)
Optimizations at compile-time: loop iteration space

!HPF$ INDEPENDENT
DO I = 1, N
  A(I) = A(I) + 1
END DO

•Affine expression: local loop bounds

! Cyclic distribution case
DO I = PID+1, N, NOP
  A(I) = A(I) + 1
END DO

! Block distribution case (N divisible by NOP)
LB = BLOC * PID + 1
UB = min(N, LB + BLOC - 1)
DO I = LB, UB
  A(I) = A(I) + 1
END DO

•Indirect distribution: not optimizable

! Indirect distribution
DO I = 1, N
  IF (A(I) is local) THEN
    A(I) = A(I) + 1
  END IF
END DO

•Irregular = « what is not regular », not optimizable
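The localized loop bounds above can be modeled directly. This is an illustrative Python sketch (the names N, NOP, PID follow the slide; the functions themselves are assumptions, not actual compiler output):

```python
# Sketch (not from the thesis): the local iteration sets an HPF compiler
# derives for "DO I = 1, N" under the two regular distributions.

def cyclic_local_iters(n, nop, pid):
    # CYCLIC: processor PID owns indices PID+1, PID+1+NOP, ... (1-based)
    return list(range(pid + 1, n + 1, nop))

def block_local_iters(n, nop, pid):
    # BLOCK: contiguous chunk of size ceil(N/NOP) per processor
    bloc = (n + nop - 1) // nop
    lb = bloc * pid + 1
    ub = min(n, lb + bloc - 1)
    return list(range(lb, ub + 1))

# With N=8 on NOP=4 processors, every index is executed exactly once:
n, nop = 8, 4
assert sorted(i for p in range(nop) for i in cyclic_local_iters(n, nop, p)) == list(range(1, 9))
assert sorted(i for p in range(nop) for i in block_local_iters(n, nop, p)) == list(range(1, 9))
```

No such closed-form bounds exist for an INDIRECT distribution, which is exactly why the guard `IF (A(I) is local)` remains in the generated code.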
Irregular Data Structure (IDS)
•Standard irregular format: indirect access arrays, example: CSC
  (columns numbered I to VIII, rows numbered 1 to 8)

JA(1:9)  = 1 3 5 6 9 12 16 18 21                        (column pointers)
IA(1:20) = 1 5 2 5 3 4 6 8 1 2 5 4 6 7 8 6 7 4 6 8     (row indices)
DA(1:20) = non-zero values of A

A(1,1) ↔ DA(JA(1))          (IA(JA(1)) = 1)
A(6,4) ↔ DA(JA(4)+1)        (IA(JA(4)+1) = 6)
A(:,4) ↔ DA(JA(4):JA(5)-1)
•Irregular distribution formats:
!HPF$ DISTRIBUTE JA(BLOCK)
!HPF$ DISTRIBUTE IA(GEN_BLOCK(/5, 10, 5/))
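As a sanity check of the CSC indexing above, here is an illustrative Python sketch using the slide's JA/IA values (DA is filled with the positions 1..20 as stand-in values; the helper `col` is my own, not part of the thesis):

```python
# Illustrative CSC lookup, keeping the slide's 1-based pointer convention.
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]                       # column pointers JA(1:9)
IA = [1,5, 2,5, 3, 4,6,8, 1,2,5, 4,6,7,8, 6,7, 4,6,8]      # row indices IA(1:20)
DA = list(range(1, 21))                                    # stand-in nonzero values

def col(i):
    """Nonzeros of column i: DA(JA(i) : JA(i+1)-1), returned with their rows."""
    lo, hi = JA[i - 1], JA[i] - 1          # 1-based bounds, as on the slide
    return [(IA[k - 1], DA[k - 1]) for k in range(lo, hi + 1)]

# A(6,4) is the 2nd nonzero of column IV: DA(JA(4)+1), and IA(JA(4)+1) = 6.
rows_of_col_4 = [r for r, _ in col(4)]
assert rows_of_col_4 == [4, 6, 8]
```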
Problems at compile-time
•Distribution: unknown alignment between arrays of the IDS
•Data accesses: unknown indexes (indirection)

DA(JA(4)+1)   with JA(4) = ? (unknown before run-time)

•Implies additional run-time guards and communications
•Inefficient SPMD code
Related Works
•Regular-to-irregular compilation
  •Bik and Wijshoff: « Sparse Compiler »
    •Sparse matrix with known topology
    •Regular analysis + known topology
    •IDS chosen by the compiler
  •Pingali et al.
    •Relational description (between components and access functions)
    •Non-standard and difficult notations
•Compilation of irregular programs
  •Vienna Fortran Compilation System: SPARSE directive
    •Storage format specification
    •Limited to storage formats known by the compiler
The Tree: a generic data structure with hierarchical access

•From the data to a tree: columns I to VIII, rows 1 to 8

•Representation in HPF2: derived data types of Fortran 95

type level2
  integer ROW   !row number
  real VAL      !non zero value
end type level2

type level1
  type (level2), pointer :: COL(:)   !column
end type level1

type (level1), allocatable :: A(:)   !matrix with a hierarchical access by column
!HPF$ TREE
Tree               Matrix   CSC
A(i)%COL(j)%VAL    A(j,i)   DA(JA(i)+j-1)
A(i)%COL(:)%VAL    A(:,i)   DA(JA(i):JA(i+1)-1)
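A rough Python analogue of this Tree/Matrix/CSC mapping (an assumption for illustration, not the thesis code): A(i)%COL(j)%VAL becomes A[i-1][j-1] here, with each column holding its (ROW, VAL) pairs.

```python
# Build the hierarchical "tree" view from the CSC arrays of the example.
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
IA = [1,5, 2,5, 3, 4,6,8, 1,2,5, 4,6,7,8, 6,7, 4,6,8]
DA = list(range(1, 21))                      # stand-in nonzero values

# One entry per column i, holding (ROW, VAL) pairs: mirrors level1/level2.
A = [[(IA[k - 1], DA[k - 1]) for k in range(JA[i - 1], JA[i])]
     for i in range(1, len(JA))]

# A(i)%COL(j)%VAL  <->  DA(JA(i)+j-1): column IV, 2nd nonzero is row 6, DA(7).
assert A[3][1] == (6, 7)
```

The hierarchical form removes the explicit JA indirection from every access: the compiler sees one nesting level instead of an unknown index expression.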
Distribution of a TREE
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ DISTRIBUTE A(INDIRECT(/1,2,3,2,1,2,3,1/))
Example of improvement

!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ INDEPENDENT
      FORALL (I = 3:N-2)
        A(I)%COL(:)%VAL = A(I-2)%COL(:)%VAL + A(I+2)%COL(:)%VAL
      END FORALL

With the CSC arrays:
!HPF$ DISTRIBUTE DA(GEN_BLOCK(array))
!HPF$ INDEPENDENT
      FORALL (I = 3:N-2)
        DA(IA(I):IA(I+1)-1) = DA(IA(I-2):IA(I-1)-1) + DA(IA(I+2):IA(I+3)-1)
      END FORALL

Generated code, CSC version: global copy + broadcast of DA
TMP(:) = Global Copy with BCAST(DA(:))
DO I = 3, N-2
  local_bound(DA(IA(I):IA(I+1)-1), lb, ub)
  DO J = lb, ub
    DA(J) = TMP(J1) + TMP(J2)
  END DO
END DO
IA(I-2) = ?? : IA(I-1)-1 = ??

Generated code, TREE version: communications on frontiers only, as SHADOW in HPF2
local_bound(A(:), lb, ub)
TMP(lb:ub) = Local Copy of Local Part(A(lb:ub))
Shadow_Update(TMP(:), -2, +2)
local_bound(A(3:N-2), lb, ub)
DO I = lb, ub
  A(I)%COL(:)%VAL = TMP(I-2)%COL(:)%VAL + TMP(I+2)%COL(:)%VAL
END DO
Runtime support layers:
•Arrays: DALIB over MPI
•Trees/Derived types: DALIB + TriDenT over MPI
(Figures, Matrix-Vector Product: « Serial Product », times in seconds for F90 Derived Type, F90 ADAPTOR/Matrix (F77) and F90 ADAPTOR/TriDenT; « Parallel Product (dense notations) », relative efficiencies in % on 1 to 16 processors for HPF2/Matrix and HPF2/TREE. IBM SP2-LaBRI, 4096x4096.)
•Advantages:
  •Fewer indirections
  •Fewer unknown alignments
  •Better compile-time analysis (locality and dependence)
  •Generic (defined by the user)
  •Low overhead
•Disadvantages:
  •Not necessarily implemented in HPF compilers: portability
  •Need to rewrite irregular code (with derived types)
Inspection-Execution
Inspection: scan the program to analyze, in order to collect the useful information.
Execution: execute the true computations according to the optimized scheme induced by the inspected information.
Original code:
DO I = 1, N
  A(I) = B(INDEX(I))
END DO
Modify B

Inspection:
DO I = 1, N
  if (A(I) is local) then
    Add INDEX(I) to local_index
  end if
END DO
Exchange info on local_index (what indexes to send, to receive)

Execution:
Gather (B(local_index(:)) into Copy_B)
I_local = 1
DO I = 1, N
  if (A(I) is local) then
    A(I) = Copy_B(I_local)
    I_local = I_local + 1
  end if
END DO
Modify B

Often iterative schemes: an outer DO STEP = 1, S loop surrounds both phases, so a single inspection is amortized over the S execution steps.
Related works:
•PARTI: iterative scheme
•CHAOS: iterative and adaptive scheme (by steps)
  Integrated in Fortran D and the Vienna Fortran Compilation System
•PILAR: iterative and multi-phase scheme, basic element = section (PARADIGM compiler)
•ADAPTOR: TRACE directive, dynamic adaptive scheme
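A minimal inspector-executor sketch in Python, from one processor's point of view (the helpers `inspect`/`execute` and the ownership predicate are illustrative assumptions, not the PARTI/CHAOS or DALIB API):

```python
# For A(I) = B(INDEX(I)): the inspector records which B indices this
# processor needs; the executor reuses that gather schedule every step
# as long as INDEX is unchanged.

def inspect(index, is_local):
    """Inspection: collect the B indices needed by locally owned A(I)."""
    return [index[i] for i in range(len(index)) if is_local(i)]

def execute(a, b, index, is_local, schedule):
    """Execution: consume the pre-gathered copy of B in loop order."""
    copy_b = [b[j] for j in schedule]     # stands in for Gather(...)
    k = 0
    for i in range(len(index)):
        if is_local(i):
            a[i] = copy_b[k]
            k += 1

# Processor owning the even I, with INDEX = [3, 0, 2, 1]:
index, owner = [3, 0, 2, 1], (lambda i: i % 2 == 0)
sched = inspect(index, owner)             # done once
a, b = [0] * 4, [10, 20, 30, 40]
execute(a, b, index, owner, sched)        # repeated each step
assert a == [40, 0, 30, 0]
```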
HPF2: communication optimizations with active processor sets

•ON HOME directive: to control the computation mapping

!HPF$ ALIGN (I) WITH A(I) :: B, C
!HPF$ INDEPENDENT
      DO I = 1, N
!HPF$ ON HOME (A(I))
        C(INDEX(I)) = A(I) * B(I)
      END DO

Without ON HOME (owner computes on C(INDEX(I))):
DO I = 1, N
  if (A(I) is local) then
    call Send(A(I) to Owner( C(INDEX(I)) ))
    call Send(B(I) to Owner( C(INDEX(I)) ))
  end if
  if (C(INDEX(I)) is local) then
    call Receive(TMP1 from Owner( A(I) ))
    call Receive(TMP2 from Owner( A(I) ))
    C(INDEX(I)) = TMP1 * TMP2
  end if
END DO

With ON HOME (A(I)) (compute locally, send only the result):
DO I = 1, N
  if (A(I) is local) then
    TMP = A(I) * B(I)
    call Send(TMP to Owner( C(INDEX(I)) ))
  end if
  if (C(INDEX(I)) is local) then
    call Receive(TMP from Owner( A(I) ))
    C(INDEX(I)) = TMP
  end if
END DO
Irregular Active Processor Sets
•Fewer active processors in collective communications
•Fewer communications (reduction or broadcast)
•Fewer synchronizations

ON HOME A(1,I) + ON HOME A(1,V)
ON HOME A(2,II) + ON HOME A(2,V)
ON HOME A(3,III)

Extensions to the ON HOME directive:
!HPF$ ON HOME (A(K,:))
!HPF$ ON HOME (A(K,INDEX(K)))

FORALL (J=I:VIII, J .eq. K .or. A(K,J) .ne. 0.0)
!HPF$ ON HOME (A(K,J), J=I:VIII, J .eq. K .or. A(K,J) .ne. 0.0)

!HPF$ ALIGN A(*,K) with B(K)
      B(K) = Sum(A(K,:))
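How an inspector could derive such an irregular active processor set is sketched below (Python; `active_set`, the ownership function and the dependency table are assumptions for illustration, not HPF syntax):

```python
# The active set of "ON HOME (A(K,J), J=..., J == K .or. TEST(K,J))" is
# exactly the set of owners of the referenced elements.

def active_set(owner, k, n, test):
    """Owners needed for column K: J == K or TEST(K, J) holds."""
    return sorted({owner(j) for j in range(1, n + 1) if j == k or test(k, j)})

# Cyclic ownership over 4 processors; dependencies of the running example,
# where iteration 9 reads columns 1, 4, 5 and 8:
deps = {9: {1, 4, 5, 8}}
owner = lambda j: (j - 1) % 4 + 1
assert active_set(owner, 9, 11, lambda k, j: j in deps.get(k, set())) == [1, 4]
```

Only P1 and P4 take part in the reduction for column 9; the other processors neither communicate nor synchronize.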
Cholesky Example: TREE and Set (matrix with 65024 columns)

DO K = 1, N
!HPF$ ON HOME (A(K,J), J = 1:K, J.eq.K .or. A(K,J) .ne. 0.0), NEW(TMP), BEGIN
  allocate (TMP(N))
  TMP(:) = 0.0
!HPF$ INDEPENDENT, REDUCTION (TMP(:))
  DO J = 1, K-1
    IF (A(K,J) .ne. 0.0) THEN
      CMOD (TMP, A(:,J))
    END IF
  END DO
  A(:,K) = A(:,K) + TMP(:)
  CDIV (A(:,K))
!HPF$ END ON
END DO
(Figure: times in seconds on 1 to 16 processors for V0 and Vset. IBM SP2-LaBRI, 2D-Grid 255x255.)
Frédéric Brégier - LaBRI 22
Plan
•Optimizations at compile-time
•Irregular Data Structure (IDS)
•A Tree to represent an IDS
•Optimizations at run-time
•Inspection-Execution principles
•Irregular communications: irregular active processor sets
•Irregular iteration spaces
•Scheduling of loops with irregular loop-carried dependencies
•New data-parallel irregular operation: progressive irregular
prefix operation
•Conclusion and Perspectives
Frédéric Brégier - LaBRI 23
Irregular Iteration Space

!HPF$ DISTRIBUTE A(:,BLOCK)
!HPF$ INDEPENDENT, REDUCTION(B)
      DO J = 1, K-1
        IF (A(K,J) .ne. 0.0) THEN
          …
        END IF
      END DO
(Figure « Cholesky »: times in seconds on 1 to 16 processors for Vset and Vset+Loop. IBM SP2-LaBRI, 2D-Grid 255x255.)
Frédéric Brégier - LaBRI 24
Plan
•Optimizations at compile-time
•Irregular Data Structure (IDS)
•A Tree to represent an IDS
•Optimizations at run-time
•Inspection-Execution principles
•Irregular communications: irregular active processor sets
•Irregular iteration spaces
•Scheduling of loops with partial loop-carried dependencies
•New data-parallel irregular operation: progressive irregular
prefix operation
•Conclusion and Perspectives
Frédéric Brégier - LaBRI 25
Loop with Partial Loop-Carried Dependencies
•Loop-carried dependencies:
DO I = 1, N
  DO J = 1, I-1
    A(I) = A(I) + A(J)
  END DO
END DO

•Partial loop-carried dependencies:
DO I = 1, N
  DO J = 1, I-1
    IF (TEST(I,J)) THEN
      A(I) = A(I) + A(J)
    END IF
  END DO
END DO

•Precomputable partial loop-carried dependencies (PPLD loop): TEST is never modified
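Since TEST is never modified, the dependence graph of a PPLD loop can be built once by an inspector. A Python sketch (the dependency pattern is the 11-iteration example used on the following slides):

```python
# Iteration I depends on every earlier iteration J with TEST(I, J); with
# TEST precomputable, an inspector can build this DAG once and reuse it.

def ppld_dag(n, test):
    """preds[i] = set of iterations that must complete before i."""
    return {i: {j for j in range(1, i) if test(i, j)} for i in range(1, n + 1)}

# Dependency pattern of the 11-iteration example:
DEPS = {2: {1}, 3: {1}, 5: {1, 4}, 6: {2, 3}, 7: {3}, 8: {4},
        9: {1, 4, 5, 8}, 10: {4, 5, 7, 9}, 11: {6, 7}}
dag = ppld_dag(11, lambda i, j: j in DEPS.get(i, set()))
assert dag[9] == {1, 4, 5, 8} and dag[1] == set()
```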
PPLD Loop
DO I = 1, N
!HPF$ ON HOME (A(J), J=I .or. TEST(I,J))
  B = 0.0
!HPF$ INDEPENDENT, REDUCTION(B)
  DO J = 1, I-1
    IF (TEST(I,J)) THEN
      B = B + A(J)
    END IF
  END DO
  A(I) = A(I) + B
!HPF$ END ON
END DO

Dependencies and ownership:
 I | Owner(A(I)) | J with TEST(I,J) = TRUE
 1 | 1 | -
 2 | 2 | 1
 3 | 3 | 1
 4 | 4 | -
 5 | 1 | 1 4
 6 | 2 | 2 3
 7 | 3 | 3
 8 | 4 | 4
 9 | 1 | 1 4 5 8
10 | 2 | 4 5 7 9
11 | 3 | 6 7

Active processor sets:
Set(1)=P1; Set(2)=P1 P2; Set(3)=P1 P3; Set(4)=P4; Set(5)=P1 P4; Set(6)=P2 P3;
Set(7)=P3; Set(8)=P4; Set(9)=P1 P4; Set(10)=P1 P2 P3 P4; Set(11)=P2 P3

Without scheduling (one iteration per step, all processors synchronized):
Steps | P1 P2 P3 P4
  1   |  1  1  1  1
  2   |  2  2  2  2
  3   |  3  3  3  3
  4   |  4  4  4  4
  5   |  5  5  5  5
  6   |  6  6  6  6
  7   |  7  7  7  7
  8   |  8  8  8  8
  9   |  9  9  9  9
 10   | 10 10 10 10
 11   | 11 11 11 11

With scheduling by steps:
Steps | P1 P2 P3 P4
  1   |  1  -  -  4
  2   |  2  2  -  -
  3   |  3  -  3  -
  4   |  5  6  6  5
  5   |  -  -  7  8
  6   |  9  -  -  9
  7   | 10 10 10 10
  8   |  - 11 11  -
PPLD Loop Scheduling
•Associates one iteration with one task
•Precomputable partial loop-carried dependencies = task graph
•Scheduling problem in the HPF context:
  •Known mapping (HPF data distribution => task mapping)
  •Data distribution => possible multi-processor tasks
  •« Scheduling multi-processor tasks on dedicated processors »

Related work:
•Complexity: Drozdowski 97, Krämer 95: NP-hard problem
•Wennink 95: scheduling algorithm
•PYRROS / RAPID libraries: precomputable task graph with mono-processor tasks (inspection-execution)
Scheduling Tasks Associated to a PPLD Loop
1) DAG generation: new SCHEDULE directive
2) Scheduling: simple and Wennink's scheduling
3) Execution: static / dynamic execution, single-thread / multi-thread execution
4) Experimental Results
SCHEDULE directive

Dependencies between iterations (inspection-execution):

DO I = 1, N
!HPF$ SCHEDULE (J = 1:I-1, TEST(I,J))
!HPF$ ON HOME (A(J), J=I .or. TEST(I,J))
  B = 0.0
!HPF$ INDEPENDENT, REDUCTION(B)
  DO J = 1, I-1
    IF (TEST(I,J)) THEN
      B = B + A(J)
    END IF
  END DO
  A(I) = A(I) + B
!HPF$ END ON
END DO

 I | J with TEST(I,J) = TRUE
 1 | -
 2 | 1
 3 | 1
 4 | -
 5 | 1 4
 6 | 2 3
 7 | 3
 8 | 4
 9 | 1 4 5 8
10 | 4 5 7 9
11 | 6 7

(Figure: the resulting DAG, levels bottom-up {1,4}, {2,3,5,8}, {6,7,9}, {10,11}.)
Distributed Scheduling Algorithms
•Simple Scheduling: local tasks only
(Figure: the DAG annotated, for each task, with the processors involved.)

Simple scheduling by steps:
Steps | P1 P2 P3 P4
  1   |  1  -  -  4
  2   |  2  2  -  8
  3   |  3  -  3  -
  4   |  5  6  6  5
  5   |  9  -  7  9
  6   | 10 10 10 10
  7   |  - 11 11  -

Order in task scheduling: priority criteria based on the critical path
List for task execution on P1: 1 2 3 5 9 10

Problem of scheduling coherence between processors: prevent deadlock
=> by-step scheduling algorithm
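The by-step list scheduling described above can be sketched as follows (Python; unit task costs, priority simplified to the task rank, and the data layout is illustrative, not the TriDenT implementation):

```python
# At each step, ready tasks are considered in priority order and scheduled
# only if their whole (multi-processor) set is still free: processors make
# the same decisions independently, which prevents deadlock.

def schedule_by_steps(preds, proc_set, priority):
    done, steps = set(), []
    while len(done) < len(preds):
        busy, step = set(), {}
        ready = [t for t in preds if t not in done and preds[t] <= done]
        for t in sorted(ready, key=priority):
            if not (proc_set[t] & busy):      # whole processor set free?
                step[t] = proc_set[t]
                busy |= proc_set[t]
        done |= set(step)
        steps.append(step)
    return steps

PREDS = {1: set(), 4: set(), 2: {1}, 3: {1}, 5: {1, 4}, 6: {2, 3}, 7: {3},
         8: {4}, 9: {1, 4, 5, 8}, 10: {4, 5, 7, 9}, 11: {6, 7}}
SETS = {1: {1}, 2: {1, 2}, 3: {1, 3}, 4: {4}, 5: {1, 4}, 6: {2, 3},
        7: {3}, 8: {4}, 9: {1, 4}, 10: {1, 2, 3, 4}, 11: {2, 3}}
steps = schedule_by_steps(PREDS, SETS, priority=lambda t: t)
assert len(steps) == 7 and set(steps[0]) == {1, 4}
```

On the example this reproduces the 7-step simple schedule of the slide, against 11 steps for the fully synchronized version.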
Scheduling
•Wennink's scheduling: multi-processor tasks + insertion principle

Task lists on P1: Simple: 1 2 3 5 9 10; Wennink: 1 3 2 5 9 10

Steps | P1 P2 P3 P4
  1   |  1  -  -  4
  2   |  3  -  3  8
  3   |  2  2  7  -
  4   |  5  6  6  5
  5   |  9 11 11  9
  6   | 10 10 10 10

Complexity:     Simple      Wennink
Computations    O(N log N)  O(N²)
Memory          O(|E|)      O(N² + |E|)
Static execution / Dynamic execution
•HPF context: task costs not known at compile-time => unit costs
•Static critical path = longest path (in edges) to the virtual « End » vertex
(Figure: the DAG annotated with unit-cost priorities.)

Static scheduling: static order of execution
P1: 1 2 3 5 9 10
P2: 2 6 10 11
P3: 3 6 7 10 11
P4: 4 8 5 9 10
•Iterative program: the first iteration records the task times, then re-scheduling => dynamic scheduling
(Figure: the same DAG weighted with the times t1 to t11 recorded during the first iteration; re-scheduling yields new per-processor orders.)
P1: 1 3 2 5 9 10
P2: 2 6 11 10
P3: 3 7 6 11 10
P4: 4 8 5 9 10
Single-Thread / Multi-Thread execution

•2 independent tasks on the same processor, same priority: which task first?
•Single thread: the lower rank first
•Multi-thread: both
•User-mode thread system: Marcel from PM² HighPerf

(Figure: Gantt charts of tasks K and K' (computations, waiting for communication, communications) showing the overlapping of communications by computations with threads.)
Experimental Results: matrix with 261121 columns

•Cholesky on sparse matrix with column-block access
•Irregular data structure: TREE
•Distribution: INDIRECT (minimizing communications)

•VSet: V0 + Set
•Stat: VSet + SCHEDULE (static simple scheduling)
•Dyn: VSet + SCHEDULE (dynamic simple scheduling)
•Stat_th: Stat + threads
•W: VSet + SCHEDULE (dynamic Wennink's scheduling)
(Figures: « Relative efficiencies (global time) » and « Relative Efficiencies (Re-execution only) », in % versus Vset, on 1 to 16 processors, for Vset, Stat, Dyn, Stat_th and W. IBM SP2-LaBRI, 2D-Grid 511x511.)
Irregular Progressive PREFIX Operation
•Irregular Progressive PREFIX Operation: found in PPLD Loop
X_i = f(X_i, g(X_k, k ∈ B_i))   with B_i ⊆ [1, i[

•Irregular coefficient: average of |B_i| / i over i ∈ [1, n], in %

•Exploit independencies with specific communication schemes
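Under my reading of the garbled slide formula, the irregular coefficient averages |B_i|/i over the iterations, expressed in percent; an illustrative Python sketch on the 11-iteration example (the exact denominator is an assumption):

```python
# Fraction of the candidate predecessors [1, i[ that really carry a
# dependency, averaged over all iterations and expressed in percent.

def irregular_coefficient(B, n):
    return 100.0 * sum(len(B.get(i, ())) / i for i in range(1, n + 1)) / n

DEPS = {2: {1}, 3: {1}, 5: {1, 4}, 6: {2, 3}, 7: {3}, 8: {4},
        9: {1, 4, 5, 8}, 10: {4, 5, 7, 9}, 11: {6, 7}}
coef = irregular_coefficient(DEPS, 11)
assert 0.0 < coef < 100.0     # the example is far denser than the 0.1%
                              # coefficient of the Cholesky test matrix
```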
Irregular Progressive PREFIX Operation

(Figure: contribution DAG with tasks 1 to 6, executed with asynchronous communications in the progressive PREFIX scheme versus a synchronous REDUCTION.)
Irregular Progressive PREFIX Operation
PREFIX directive/clause: differs from the REDUCTION clause

REDUCTION form (pull the contributions):
      DO I = 1, N
        B = 0.0
!HPF$ INDEPENDENT, REDUCTION(B)
        DO J = 1, I-1
          IF (TEST(I,J)) THEN
            B = B + A(J)
          END IF
        END DO
        A(I) = A(I) + B
      END DO

PREFIX form (push the contributions):
!HPF$ PREFIX(B)
      DO I = 1, N
!HPF$ INDEPENDENT, PREFIX(B)
        DO J = I+1, N
          IF (TEST(J,I)) THEN
            A(J) = A(J) + A(I)
          END IF
        END DO
      END DO

Generated code, PREFIX (send when ready):
Inspection(A,TEST)
DO I = lb, ub (ON HOME A(I))
  Finalize(A(I))   (receive contributions previously sent)
  DO J = I+1, N
    IF (TEST(J,I)) THEN
      A'(J) = A'(J) + A(I)   (send when ready)
    END IF
  END DO
END DO

Generated code, REDUCTION:
DO I = 1, N (Set(I))
  B = 0.0
  DO J = lb, ub (ON HOME A(J))
    IF (TEST(I,J)) THEN
      B = B + A(J)
    END IF
  END DO
  A(I) = A(I) + REDUCTION(B)
END DO
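A sequential Python model of the two formulations (illustrative only: the real PREFIX version sends the contributions asynchronously, whereas here the push is simply done in program order):

```python
# Once A(I) is final, its contribution is pushed to every later A(J) with
# TEST(J, I): this is the "send when ready" shape of the PREFIX scheme.
# The result must equal the REDUCTION form that pulls all A(J), J in B_i.

def prefix_progressive(a, test):
    a = list(a)
    for i in range(len(a)):               # A(I) is final here ...
        for j in range(i + 1, len(a)):
            if test(j, i):
                a[j] += a[i]              # ... push it forward immediately
    return a

def reduction_pull(a, test):
    a = list(a)
    for i in range(len(a)):               # earlier a[j] already finalized
        a[i] += sum(a[j] for j in range(i) if test(i, j))
    return a

t = lambda later, earlier: (later - earlier) % 3 == 0   # arbitrary TEST
x = [1, 2, 3, 4, 5]
assert prefix_progressive(x, t) == reduction_pull(x, t)
```

The two give identical results; the benefit of PREFIX is that contributions leave as soon as they are ready instead of waiting for a synchronous reduction.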
(Figure « Comparisons: PREFIX vs REDUCTION »: ratio between 0 and 2 as a function of the irregular coefficient (TEST), with the equality line at 1. IBM SP2-LaBRI.)
Irregular Progressive PREFIX Operation: Cholesky example (irregular coef. = 0.1%)

(Figures: « Global Time » and « Re-Execution Time », times versus V1 (V1/T, in %) on 1 to 16 processors, for Vset, VsetP, Stat, StatP and PaSTiX. IBM SP2-LaBRI, 2D-Grid 511x511.)
Conclusion
•TREE: irregular data structure, more information at compile-time
  Locality and dependence analysis => TriDenT
•Inspection/Execution: for information still unknown at compile-time => CoLUMBO
•Irregular active processor sets: fundamental inspection/execution
  Up to a factor of 10
•Irregular iteration space: minor improvement
•Loop with partial loop-carried dependencies:
  •DAG associated with loop iterations
  •Semi-automatic task scheduling at run-time
  •PREFIX operation
  •Inspection costs repaid within a single iteration
•Experimental results: efficiency close to hand-made codes (time ratio between 1.25 and 2.5)
Perspectives
•Integration in an HPF compiler: preliminary experiments
  •TREE: ADAPTOR
  •Set inspection/execution, PREFIX inspection/execution: NESTOR (Silber 98)
•Transposition to other parallel languages:
  •Irregular data structures: always a problem => TREE
  •Irregular iteration space
  •OpenMP: virtual shared memory => data distribution, irregular active processor sets