Ian Christopher Maione

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Computer Science, University of Toronto

© Copyright by Ian Christopher Maione 1997




National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


Abstract

Enabling Dependence Analysis in C

Ian Christopher Maione

Master of Science

Graduate Department of Computer Science

University of Toronto

1997

Dependence analysis is a fundamental tool used in many compiler transformations which optimize and parallelize scientific code written for high-performance vector and parallel computer architectures. Implementing dependence analysis for the C programming language is difficult because of complications caused by nonstandard control flow and use of pointers to reference arrays.

In order to enable dependence analysis in C, code can be preprocessed to convert loops which violate FORTRAN-like conventions into a canonical form which can be processed successfully by the dependence analyzer. We developed two algorithms to enable such processing. Loop control flow normalization (LCFN) normalizes loop control flow and array reference subscripts. Pointer array access normalization (PAN) recovers implicit array references through pointers.

A prototype implementation of the LCFN and PAN methods was built using the Parafrase-2 compiler. Experimental results generated using the SPEC95 and NAS benchmark suites showed that these techniques can successfully enable dependence analysis.


Acknowledgements

First of all, I would like to thank my supervisor, Professor Tarek Abdelrahman, for his helpful support and advice in the development of this thesis. I would also like to thank my second reader, Professor Ken Sevcik, for his helpful advice on improving the quality and presentation of this thesis.

I would also like to thank my fellow students in the zoo, for the enjoyable work environment they have provided during the writing of this thesis. I particularly would like to thank the co-creators of DAWG, Jin Lee and Anuj Gujar, for helping to make possible the endless hours of recreation without which this thesis would never have been completed. In this vein, I would also like to thank Daniel Marcu, for keeping me humble by defeating me day after day. I would like to thank Francois Pitt for his help and advice in dealing with the intricacies of LaTeX, Rich Paige for ensuring that I never went into sports-withdrawal, Angela Demke for briefly putting up with me as an officemate, and Jeff Tupper for putting up with me as an officemate for much longer.

I particularly would like to express my appreciation for the loving support and encouragement of my family, especially my parents, which has been invaluable to me during the time that I have been completing this work.

Finally, I would like to gratefully acknowledge the financial support provided by NSERC and the University of Toronto for the development of this thesis.


Contents

1 Introduction
  1.1 The Dependence Analysis Problem
    1.1.1 The Dependence Problem in FORTRAN
    1.1.2 The Dependence Problem in C
  1.2 Admissible Loops
    1.2.1 Admissible Loop Normalization
  1.3 Thesis Contributions
  1.4 Thesis Organization

2 Normalizing Loop Control Flow
  2.1 LCFN Algorithm Overview
  2.2 Loop Syntax Preprocessing
  2.3 Computing Loop Trip Counts
    2.3.1 Handling Zero Trip Loops
  2.4 Subscript Normalization
  2.5 Canonical Loop Generation

3 Pointer Array Access Normalization
  3.1 PAN Algorithm
    3.1.1 One-Dimensional PAN
    3.1.2 Multidimensional PAN

4 Prototype Implementation
  4.1 The Parafrase-2 Compiler
    4.1.1 Overview of Parafrase
    4.1.2 LCFN Implementation
    4.1.3 PAN Implementation
  4.2 Experimental Evaluation
    4.2.1 Experimental Procedure
    4.2.2 Experimental Results

5 Related Work

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work

A Induction Variable Analysis


Chapter 1

Introduction

Dependence analysis is a fundamental tool used in many compiler transformations which have been developed to optimize scientific and numerical code written for high-performance vector and parallel computer architectures. Transformations which attempt to maximize parallelism and/or memory locality typically require dependence analysis, and their efficacy is often directly related to the efficacy of the latter analysis. Bacon et al. summarize a number of such techniques [BGS94]. Because dependence analysis is such a fundamental part of compilation for parallel machines, a number of different techniques have been developed for analyzing dependences between array references in loop nests [Tow76, Wol89, Ban88, BCK79, GKT91, LYZ90, WT92, Pug92]. Dependence analysis techniques have been implemented in several research compiler systems [AK84, ZBG88], and have also been implemented in some commercial compilers such as KAP and VAST.

Implementations of dependence analysis in both research and commercial systems have focused on FORTRAN programs. Since most scientific code has been written in FORTRAN, this is a natural development. By contrast, very few compilers are capable of doing dependence analysis in the C programming language. There is no advantage to writing scientific code in C if compilers cannot effectively parallelize it because of the more varied syntax of the C language. Conversely, since scientific applications coded in C are rare, the need for C compilers to do sophisticated dependence analysis has not existed. However, the growing use of C++ as a language suggests that the ability to handle C constructs may be a requirement for future compiler systems which do dependence analysis.

In order to implement dependence analyzers which are capable of handling C effectively, one can either handle the complexities of C syntax within the dependence analyzer itself, or one can attempt to preprocess the C source code before dependence analysis in order to make it better adhere to the form of FORTRAN-style loops. The latter approach is clearly better from the point of view of modularity, since the particular syntactic structures of a programming language are not essential to any particular dependence analysis algorithm. It is also attractive from a software engineering point of view, since handling C constructs outside the dependence analyzer allows implementors to avoid redesigning and rewriting analyzer code each time a new dependence analysis technique is implemented. As a first step toward this goal, this thesis applies a number of different compiler techniques to the problem of generating C code which is more amenable to standard dependence analysis methods, and implements them within an existing compiler environment (Parafrase-2). We term the process of generating this normalized code admissible loop normalization.

The remainder of this chapter is organized as follows: Section 1.1 outlines the general dependence analysis problem. In Section 1.1.1 and Section 1.1.2, the dependence analysis problem is outlined in terms of the FORTRAN and C languages respectively, and the difficulties introduced by the C language are described. In Section 1.2 a normalized form for C loops is defined, as a goal for admissible loop normalization.

1.1 The Dependence Analysis Problem

Figure 1.1: The dependence analysis problem

    for I_1 = l_1 to u_1 do
      for I_2 = l_2 to u_2 do
        ...
      end for
    end for

Dependence analysis is a well-studied problem, and is fundamental to many other compiler analyses designed to parallelize scientific and numerical programs automatically. A data dependence is said to exist between two statements in a loop nest if one statement writes a value that the other statement uses. There are three different types of data dependence which are relevant in the context of parallelization. A flow dependence exists between statements S1 and S2 if S1 writes a value which is later read by S2. Similarly, an anti-dependence exists if S1 reads a value which is later overwritten by S2. An output dependence exists if both S1 and S2 write to the same location. Parallelizing compilers generally attempt to execute different iterations of loops in parallel. Since a dependence represents a serial semantic relationship between statements, a parallelizer must be able to detect dependences, in particular loop-carried dependences, which exist between different loop iterations. In the absence of dependence analysis, a compiler cannot parallelize, since executing parallel iterations in the presence of dependences can lead to incorrect code.
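The effect of a loop-carried flow dependence can be illustrated with a small sketch. The function names below are hypothetical and are not taken from the thesis; running the iterations in reverse order is used only as a stand-in for the reordering a parallel schedule might cause.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-carried flow dependence: iteration i reads a[i - 1],
 * which was written by iteration i - 1. */
void prefix_forward(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + 1;
}

/* Same statement, iterations executed in reverse order. */
void prefix_reversed(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = n; i-- > 1; )
        a[i] = a[i - 1] + 1;
}

/* No loop-carried dependence: iteration i touches only a[i] and b[i]. */
void independent_forward(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 1;
}

/* Same statement, iterations executed in reverse order. */
void independent_reversed(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = n; i-- > 0; )
        a[i] = b[i] + 1;
}
```

Starting from an array of zeros, the two `prefix_*` functions leave different contents behind, whereas the two `independent_*` functions always agree. Only the second loop could be safely run with its iterations in arbitrary order.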

In general terms, the dependence analysis problem can be formulated as described by Wolfe and Tseng [WT92]. Given a loop nest as in Figure 1.1, a dependence test attempts to determine if different array references can access the same array element during the execution of the loop nest, i.e., that there exist two sets of values $\{I_1 = i_1, \ldots, I_d = i_d\}$ and $\{I_1 = j_1, \ldots, I_d = j_d\}$ such that:

3. $f_m(i_1, \ldots, i_d) = g_m(j_1, \ldots, j_d)$ for all $1 \le m \le s$, where $s$ is the number of array subscript functions.

The above set of equations are known as the dependence equations. This formulation of the dependence analysis problem implicitly assumes that the semantics of the pseudocode for constructs have certain properties:

Figure 1.2: Loop nest in FORTRAN

          DO 100 I1 = l1, u1
            DO 200 I2 = l2, u2
              ...
                DO 300 Id = ld, ud
                  A(a1 * I1 + ... + ad * Id) = ...
                  ... = A(b1 * I1 + ... + bd * Id)
    300         CONTINUE
    200     CONTINUE
    100   CONTINUE

• Each loop has a standard syntactic form, with an explicitly defined index variable $I_k$ which is not modified by statements within the loop nest (other than loop control statements).

• Each of the loops has explicit iteration limits $l_k$ and $u_k$, which are not modified within the loop nest (some dependence analysis techniques may require these to be literal constants, or they may be symbolic expressions, but they must at least be loop invariant).

• Each array reference is made using an expression specifying the name of a statically declared k-dimensional array, as well as k array index expressions. Each array index is a function solely of the loop indices (i.e., the functions $f_1, \ldots, f_s, g_1, \ldots, g_s$ are functions only of $I_1, \ldots, I_d$), as well as possibly constants or loop invariants.

• There is no irregular control flow in the loop (i.e., there are no statements within the loop body which branch outside the loop).
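Under these assumptions, the question posed by the dependence equations can be sketched for the simplest case: a single loop and one linear subscript per reference. The code below is an exhaustive check written for illustration only; it is not one of the dependence tests cited above. The type `linear_subscript` and the function `may_depend` are hypothetical names, and the check assumes that both iteration values are confined to the loop limits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: a subscript of the form c1 * I + c0. */
struct linear_subscript {
    long c1;   /* coefficient of the loop index */
    long c0;   /* loop-invariant constant term  */
};

static long eval_subscript(struct linear_subscript f, long index)
{
    return f.c1 * index + f.c0;
}

/*
 * Returns true if references A(f(I)) and A(g(I)) can touch the same
 * element for some pair of iterations i, j with lo <= i, j <= hi.
 * This enumerates the iteration space, so it is only practical for
 * small constant limits; it assumes a non-empty range.
 */
bool may_depend(struct linear_subscript f, struct linear_subscript g,
                long lo, long hi)
{
    assert(lo <= hi);
    for (long i = lo; i <= hi; i++)
        for (long j = lo; j <= hi; j++)
            if (eval_subscript(f, i) == eval_subscript(g, j))
                return true;
    return false;
}
```

For example, `A(2*I)` and `A(2*I + 1)` never coincide, while `A(I)` and `A(I - 1)` coincide for adjacent iterations. Practical dependence tests answer the same question symbolically rather than by enumeration.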

1.1.1 The Dependence Problem in FORTRAN

In FORTRAN, the dependence problem can be expressed using a DO loop nest, as illustrated in Figure 1.2. The FORTRAN syntax corresponds well to the pseudocode formulation of the dependence problem, since the semantics of a DO loop stipulate that the values of the initial and upper limits of each loop are the values of the corresponding expressions at the beginning of the loop's execution, regardless of whether they are later modified within the loop. The index variables are not modified within the loop, and each access to an array is made through an explicit array reference consisting of the name of a static array and an explicit array index expression. Furthermore, there is a well-defined loop increment (in this case 1), which also does not change within the loop. Most importantly, the index variables themselves, their lower and upper limits,¹ and the loop increment can be determined by simple examination of the program syntax, without recourse to further analysis. Thus a dependence analyzer for FORTRAN code does not usually need complex analysis of the loop to construct the dependence equations.
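The difference between limits that are fixed at loop entry and limits that are re-evaluated can be made concrete with a short sketch. Both functions below are hypothetical illustrations written in C; the second one only imitates DO-loop behaviour by capturing the limit before the loop starts.

```c
#include <assert.h>

/* C semantics: the test i < n is re-evaluated before every iteration,
 * so shrinking n in the body changes how often the loop runs. */
int count_c_style(int n)
{
    int iterations = 0;
    for (int i = 0; i < n; i++) {
        n--;             /* the body modifies the upper limit */
        iterations++;
    }
    return iterations;
}

/* DO-loop-like semantics: the trip count is computed once on entry,
 * so later changes to n do not affect the number of iterations. */
int count_do_style(int n)
{
    int iterations = 0;
    const int trip = n > 0 ? n : 0;   /* limit captured at loop entry */
    for (int k = 0; k < trip; k++) {
        n--;             /* no effect on the number of iterations */
        iterations++;
    }
    return iterations;
}
```

With `n = 10`, the first function stops once `i` catches up with the shrinking limit, while the second always performs ten iterations. An analyzer can read the trip count of the second form directly from the loop header.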

It should be noted that it is possible for aliasing to become an issue in FORTRAN through the use of COMMON blocks. A COMMON block can cause a global variable or array to be referenced by different names inside subroutines. Even this type of simple aliasing can cause serious problems for an optimizing compiler; nevertheless, the situation is not nearly as complicated as in C, where pointer relationships can be dynamically changed at runtime through the use of pointer variables.

¹ Having these values means that the trip count of the loop can be determined, which is important in several dependence analysis methods.

1.1.2 The Dependence Problem in C

Implementing dependence analysis in C is substantially more complex, because of the wider range of loop structures which exist in C, as well as the freer semantics which C allows the programmer in regards to loop control and access to arrays. These aspects of the C language can be divided into two broad categories:

• Loop control issues.

• Pointer access issues.

Loop Control Issues

There are several aspects to C loop control flow which can violate the assumptions described in Section 1.1:

1. The loop can be written with various types of statements (for, while, do-while, if/goto).²

2. C allows for and while loops to have free syntax with respect to loop index variables, i.e.:

   • C does not require that a for loop have an explicit index variable. There is no placeholder in a while loop for an index variable.

   • A for or while loop may have multiple index variables, either explicitly or implicitly.

   • C allows index variables defined in a for statement to be arbitrarily modified within the body of the loop.

   • C does not require that a for statement have an explicit loop increment statement. Similarly, there is no placeholder in a while statement for a loop increment.

Because of these considerations, dependence analysis in C is much more difficult than in FORTRAN, because the basic information which the analyzer needs in order to apply many of the common dependence analysis techniques is no longer immediately available from the source code.
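The loop control issues listed above can be seen in a small sketch. The first function is a hypothetical while loop with no index placeholder and an induction variable updated twice in the body; the second is a hand-derived equivalent with an explicit index and trip count. Neither is taken from the thesis or its benchmarks.

```c
#include <assert.h>
#include <stddef.h>

/* Non-canonical: while loop, no explicit index placeholder, and the
 * induction variable i is advanced in two places in the body. */
void fill_even_while(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        a[i] = 1;
        i++;        /* first update */
        i++;        /* second update: the net stride is 2 */
    }
}

/* Hand-derived equivalent: explicit index ix starting at 0, unit
 * increment, and a loop-invariant trip count of ceil(n / 2). */
void fill_even_canonical(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    const size_t trip = n / 2 + n % 2;
    for (size_t ix = 0; ix < trip; ix++)
        a[2 * ix] = 1;
}
```

In the first form, the stride and the number of iterations have to be discovered by analysing the body. In the second form they can be read from the loop header and the subscript.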

² Some of these structures are also possible in FORTRAN, but are more commonly used in C.

Pointer Related Issues

Another significant complication for a dependence analyzer in C arises from the fact that variables and arrays can be and often are accessed using pointers. The mere presence of pointers in a program can make many kinds of analysis substantially more complex. This is due to the fact that many compiler analyses require conservative assumptions if certain variable references cannot be analyzed. In the case of dependence analysis, pointers pose difficulties which can be separated into two general categories:

1. Modification of scalars by pointer dereference.

   Since the dependence analyzer is trying to determine whether sets of array index expressions are equivalent over the loop iteration space, a dereference through a scalar pointer variable could potentially affect any of the loop control variables, or variables which are involved in array index expressions. In such a case, the dependence analyzer may have to assume dependence in the absence of more specific information, limiting the efficacy of the analysis. Although the general problem of alias analysis in C is an extremely complex one, even partial resolution of scalar pointer references would make C more amenable to dependence analysis.

2. Access to arrays via pointer dereference.

   In C, even determining what the index expressions of an array reference are can be a difficult problem because of the constructs in C which allow the programmer to access array structures. For example, C allows the programmer to use pointer arithmetic to access arrays without using explicit array index expressions. This type of syntax can obscure the array elements being accessed from the compiler. The situation is further complicated by the fact that pointer aliasing can even obscure which array is being accessed in a given reference. Once again, in the absence of appropriate information, the conservative assumption of dependence must be made in these cases.
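Both categories can be sketched in a few lines of C. All function names below are hypothetical. The first pair contrasts an implicit array access through a moving pointer with its explicit subscripted equivalent; the third function shows a store through a pointer that may or may not alias the loop limit.

```c
#include <assert.h>
#include <stddef.h>

/* Implicit array access: the subscript is hidden in the pointer p. */
void scale_by_pointer(double *a, size_t n, double factor)
{
    assert(a != NULL || n == 0);
    double *p = a;
    double *end = a + n;
    while (p < end) {
        *p = *p * factor;
        p++;
    }
}

/* Explicit equivalent: the same accesses written as a[i]. */
void scale_by_index(double *a, size_t n, double factor)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * factor;
}

/* Scalar modified by pointer dereference: if q aliases limit, the
 * store in the body changes the loop limit while the loop runs. */
int count_with_store(int *limit, int *q)
{
    assert(limit != NULL && q != NULL);
    int iterations = 0;
    for (int i = 0; i < *limit; i++) {
        *q = 3;          /* may or may not overwrite the loop limit */
        iterations++;
    }
    return iterations;
}
```

Without knowing whether `q` and `limit` refer to the same object, an analyzer cannot state the trip count of the last loop, which is why the conservative assumption is needed.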

1.2 Admissible Loops

In order to enable dependence analysis in C, it is reasonable to attempt to transform C programs which violate the previously described simplifying assumptions into programs which conform to them. Thus, we can define our goal as follows:

Definition 1 A canonical loop is a loop of the form:

    for (i = 0; i < Φ; i = i + 1)
    {
        (loop body)
    }

such that:

(i) i is an int variable which is the index variable for the loop. i is not used outside the loop, and is initialized to 0 in the for statement. Furthermore, i is not modified by any statement inside the loop body.

(ii) There is no irregular control flow within the loop (i.e. break, continue).

(iii) There is an explicit loop exit test of the form (i < Φ). Φ may be any C expression, but it may not have side effects, and its value must also be loop invariant. Since the index variable i is initialized to 0, the expression Φ directly represents the trip count of the loop.

(iv) There is a loop increment statement inside the for statement. The value of the increment is 1 and does not change during execution of the loop.

(v) Each access to an array a in the loop is of the form a[E1][E2]...[Ek], where E1, E2, ..., Ek are all expressions which involve only i, loop invariant expressions, or constants.
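As an illustration of conditions (i) through (v), the sketch below rewrites a strided loop by hand into the canonical form. The function names and the closed-form trip count are assumptions made for this example, valid only for a positive, loop-invariant step; they are not the algorithm developed in the following chapters.

```c
#include <assert.h>

/* Trip count of: for (j = lo; j < hi; j += step), assuming step > 0. */
long trip_count(long lo, long hi, long step)
{
    assert(step > 0);
    if (hi <= lo)
        return 0;                          /* zero-trip loop */
    return (hi - lo + step - 1) / step;    /* ceil((hi - lo) / step) */
}

/* Original loop: nonzero lower limit and non-unit increment. */
long sum_strided(const long *a, long lo, long hi, long step)
{
    long sum = 0;
    for (long j = lo; j < hi; j += step)
        sum += a[j];
    return sum;
}

/* Canonical form: index starts at 0, increment is 1, the exit test
 * compares against a loop-invariant trip count, and the subscript is
 * written in terms of the index and loop invariants only. */
long sum_canonical(const long *a, long lo, long hi, long step)
{
    long sum = 0;
    const long phi = trip_count(lo, hi, step);
    for (long i = 0; i < phi; i = i + 1)
        sum += a[lo + step * i];
    return sum;
}
```

Both functions visit the same elements in the same order, so the original loop is admissible in the sense used here whenever its step is positive and loop invariant.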

A canonical loop describes our goal: a loop which closely resembles the semantics of a FORTRAN-style DO loop, and from which required information for dependence analysis can be extracted from the source code itself. Thus, following the terminology introduced by Justiani and Hendren [JH94], we define:

Definition 2 Given an arbitrary C loop L, L is admissible if there is a canonical loop L' which is semantically equivalent to L.³

³ Note that strictly speaking, this is a statement about the semantics of a certain piece of C code, regardless of whether a compiler can determine that such a canonical loop exists.

1.2.1 Admissible Loop Normalization

In order to enable dependence analysis for C, we wish to be able to transform as many loops as possible into equivalent canonical forms. We term this transformation admissible loop normalization. This transformation attempts to (a) determine that a given loop is indeed admissible, and (b) construct the appropriate canonical form for that loop. To this end, a compiler pass has been implemented within the Parafrase-2 compiler environment to generate canonical forms in C for some types of admissible loops.

The Parafrase-2 compiler [PGH90] is a vectorizing/parallelizing compiler which operates as a source-to-source translator. Parafrase-2 can compile FORTRAN or C code, and represents source code in an intermediate form which can then be manipulated by various passes. This intermediate form contains sufficient information to reconstruct C source code after analyses or transformations have been applied. The core compiler contains passes implementing a number of existing analyses and transformations, including flow graph construction, code generation, constant propagation, induction variable substitution, dead code elimination, etc. Parafrase-2 itself is implemented in the C programming language, and contains methods which can be used to access the internal representation and implement new passes. Admissible loop normalization has been implemented on top of the existing induction elimination pass in Parafrase-2.

1.3 Thesis Contributions

The primary contributions of this thesis are the development of algorithms for normalizing C code which does not conform to the requirements of dependence analysis, with respect to the loop statement (the presence of an explicit loop index variable and for statement, and an explicit loop trip count expression), the format of array index expressions, and the use of pointers to access arrays. For this purpose, existing compiler analysis techniques have been applied to the problem, and existing techniques for computing loop trip counts have been extended. This thesis also shows how these techniques may be implemented within a real compiler environment, and experiments with existing parallel benchmark programs are used to demonstrate that these techniques are able to successfully enable dependence analysis for real C programs.


1.4 Thesis Organization

This thesis is organized in six chapters. Chapter 1 introduces the dependence analysis problem and the difficulties which arise in the C language with respect to dependence analysis, and defines a goal for enabling dependence analysis in C. Chapter 2 describes loop control flow normalization, which is an algorithm for normalizing C loops in the absence of pointer operations. Chapter 3 describes pointer array access normalization, which is an algorithm for normalizing certain types of C pointer operations. In Chapter 4 an implementation of these algorithms is described within the Parafrase-2 compiler environment, and experimental results are presented showing the efficacy of these methods for selected benchmark programs. Chapter 5 describes related work, reviewing other work done on the C dependence analysis problem, as well as important background material on induction variable analysis and alias analysis. Finally, Chapter 6 presents conclusions and possible future extensions.


Chapter 2

Normalizing Loop Control Flow

In this chapter we will consider the problem of admissible loop normalization under some simplifying assumptions. In particular, we will consider C programs which do not have pointer references in them, either to scalars or to arrays. These restrictions will be eased in Chapter 3. We will also restrict the scope of the analysis to the intraprocedural level, so that there are no function calls that affect non-local variables, and no recursive function calls. Considering only these types of programs allows us to focus on issues relating to loop control flow.

Figure 2.1: Nonstandard Loop Control Flow

(a) input loop

    i = 0; N = 100; j = 0;
    L1: if ((i + j) >= N + k)
            goto L2;
        v = j + (2 * i);
        a[v] = 5;
        i = i + 2;
        j = j + r;
        goto L1;
    L2:

(b) normalized loop

    for (ix = 0; ix < ceil((100 + k) / (2 + r)); ix++) {
        a[(r + 4) * ix] = 5;
    }

Figure 2.1 illustrates a source program that is unnormalized for dependence analysis. The loop in Figure 2.1(a) exhibits several characteristics which do not conform to the definition of a canonical loop (see Section 1.2). We want to be able to take such a loop and generate an equivalent canonical loop (such as the one in Figure 2.1(b)). In order to do this, there are several aspects of the input loop which must be handled:

(a) Loop syntax.

Loops can be written in different syntactic forms. For example, the loop in Figure 2.1(a) is written with if and goto, instead of with for. Similarly, one can write loops using other control structures (while, for, do-while, etc.). Generating a canonical form entails expressing a loop as a for loop regardless of its syntactic form, as in Figure 2.1(b). It is also necessary to associate an explicit index variable with the loop. The variable ix in the normalized loop serves this purpose, whereas the if/goto form has no such explicit variable. In C, only the for statement has a specific placeholder for a loop index variable, and even in this case it is not syntactically required.

(b) Loop trip count computation.

An important aspect of determining admissibility is the computation of a trip count for the loop. If the compiler can generate a loop invariant expression representing the number of iterations of the loop, the for statement of the canonical form can be straightforwardly generated by the compiler. To do this, the compiler must use induction variable analysis in order to attempt to derive a quantity for the trip count based on the type of exit condition for the loop. In Figure 2.1, the compiler must be able to determine that the variables i and j are induction variables for the loop, and subsequently compute an expression for the loop trip count, ceil((100 + k) / (2 + r)).

(c) Subscript normalization.

Given that each array reference has explicit array index expressions for each dimension, the compiler must re-express each of them in terms of the loop index variable, since these expressions may not necessarily be written by the programmer in terms of the loop index variable. For example, a programmer may use an induction variable in a loop to avoid having a linear array index recomputed on each iteration of a loop. While this is desirable when compiling for a serial machine, it can hinder dependence analysis, and therefore parallelization. Furthermore, there are many different equivalent ways of writing polynomial functions syntactically, due to the properties of commutativity, distributivity, and associativity. The array index in Figure 2.1(a) is expressed in terms of an induction variable v, and is re-expressed in terms of the index variable ix in the normalized loop. In general, we would like to be able to express each array index expression in a standard form a_0 + a_1 * ix_1 + ... + a_n * ix_n, where a_0, ..., a_n are loop invariant expressions and ix_1, ..., ix_n are enclosing loop index variables.
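The three requirements above can be checked on the loop of Figure 2.1 with a small C sketch. The statements that update i and j are not visible in the figure, so the sketch assumes i = i + 2 and j = j + r; these are the constant steps consistent with the trip count ceil((100 + k) / (2 + r)) and the subscript (r + 4) * ix quoted in the text. The helper names (ceil_div, goto_loop, normalized_loop, loops_agree) are hypothetical, and an integer ceiling is used instead of the C library ceil function.

```c
#include <string.h>

#define A_LEN 1024

/* Integer ceiling of num/den for den > 0; 0 when num <= 0 (zero-trip case). */
static long ceil_div(long num, long den) {
    return (num <= 0) ? 0 : (num + den - 1) / den;
}

/* Goto loop in the style of Figure 2.1(a).
 * ASSUMPTION: the loop body ends with i = i + 2 and j = j + r. */
static void goto_loop(int *a, int k, int r) {
    int i = 0, N = 100, j = 0, v;
L1:
    if ((i + j) >= N + k)
        goto L2;
    v = j + (2 * i);
    a[v] = 5;
    i = i + 2;            /* assumed update */
    j = j + r;            /* assumed update */
    goto L1;
L2:
    return;
}

/* Normalized loop in the style of Figure 2.1(b). */
static void normalized_loop(int *a, int k, int r) {
    long tc = ceil_div(100L + k, 2L + r);
    for (long ix = 0; ix < tc; ix++)
        a[(r + 4) * ix] = 5;
}

/* Returns 1 when both loops write exactly the same array elements.
 * Requires r >= 0 and small k so that every subscript stays below A_LEN. */
int loops_agree(int k, int r) {
    static int a1[A_LEN], a2[A_LEN];
    memset(a1, 0, sizeof a1);
    memset(a2, 0, sizeof a2);
    goto_loop(a1, k, r);
    normalized_loop(a2, k, r);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Under these assumptions, i + j advances by 2 + r and v by r + 4 on each iteration, which is what the normalized loop encodes directly in terms of ix.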

The remainder of this chapter is organized as follows. In Section 2.1, an overview of the algorithm used to normalize loop control flow is presented. Each of the primary phases in the algorithm is then described in the following sections. Section 2.2 describes loop preprocessing, Section 2.3 describes the computation of trip counts for loops, Section 2.4 describes subscript normalization, and finally Section 2.5 describes the generation of canonical loop forms.

2.1 LCFN Algorithm Overview

The algorithm for loop control flow normalization (LCFN) can be described at a high level in terms of the three aforementioned phases: loop syntax, trip count computation, and subscript normalization. These three phases prepare a loop for canonical loop generation, in which the loop is replaced with a for loop satisfying the conditions described in Section 1.2. Figure 2.2 summarizes the steps involved. We can consider the operation of this algorithm on the example given in Figure 2.3.

Figure 2.2: LCFN algorithm overview

Input: A C function f which satisfies the following:

(a) f does not contain function calls that have side effects on variables appearing within f.

(b) There are no assignments or references through pointer variables in f.

(c) All loops in f are either while or for statements.

(d) There are no goto statements in f.

for each loop L in the program
    preprocess L for analysis
    if L violates conditions for analysis
        mark L inadmissible
        continue {skip loop L}
    analyze L for induction variables (IVs)
    compute trip count for L based on IV analysis
    if trip count could not be computed
        mark L inadmissible
        continue
    for each array reference expression E in L
        if E is an induction expression
            replace E by its equivalent induction expression
    end for
    generate canonical form for L
    replace L in parse tree by canonical form
end for

Figure 2.3(a) shows the original loop from Figure 2.1 expressed as a for loop¹. The loop syntax preprocessing phase takes the following steps, whose results are seen in Figure 2.3(b):

(i) The original for loop is converted to an equivalent while.

(ii) A compiler-generated index variable ix is added to the loop. ix is initialized to zero immediately before the while, and is incremented by 1 in the last statement of the while loop body.

(iii) The quantity T = ((N+k) - (i+j)) is added to the loop, representing the expression whose value determines when the loop will exit (see Section 2.3).

¹The LCFN algorithm assumes input loops are either in while or for form.
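A minimal C sketch of what steps (i) to (iii) could produce for the running example is given below. As before, the updates i = i + 2 and j = j + r are assumptions (the loop body is not fully visible in the figures), and the function names are hypothetical. The sketch also checks at run time that T is a linear function of ix, which is the property the later trip count analysis relies on.

```c
/* Integer ceiling of num/den for den > 0; 0 when num <= 0. */
static long ceil_div(long num, long den) {
    return (num <= 0) ? 0 : (num + den - 1) / den;
}

/* Loop after preprocessing. ASSUMED updates: i = i + 2, j = j + r.
 * Returns the final value of ix (the number of iterations executed).
 * *t_is_linear is set to 1 when T == (N + k) - (2 + r) * ix held on
 * every iteration. Requires r >= 0 and small k (subscripts < 1024). */
long preprocessed_loop(int k, int r, int *t_is_linear) {
    static int a[1024];
    int N = 100, i = 0, j = 0, v, T;
    long ix = 0;                       /* step (ii): ix = 0 before the while */
    *t_is_linear = 1;
    while ((i + j) < N + k) {          /* step (i): for converted to while   */
        T = (N + k) - (i + j);         /* step (iii): exit-determining value */
        if (T != (N + k) - (2 + r) * ix)
            *t_is_linear = 0;
        v = j + (2 * i);
        a[v] = 5;
        i = i + 2;                     /* assumed update                     */
        j = j + r;                     /* assumed update                     */
        ix = ix + 1;                   /* step (ii): last statement of body  */
    }
    return ix;
}

/* Trip count quoted in the text, computed with an integer ceiling. */
long expected_trip_count(int k, int r) {
    return ceil_div(100L + k, 2L + r);
}
```

For k = 7 and r = 3, for example, both functions give 22 iterations, and T decreases by 5 on each iteration.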


Figure 2.3: LCFN transformation

(a) original loop    (b) loop after preprocessing

Induction variable (IV) analysis is then employed to express the quantity T, as well as the variable v appearing in the array reference a[v], in terms of the index variable ix. The loop trip count can be computed from this to be ceil((100+k) / (2 + r)).

Induction variable analysis is an important and common technique for analyzing the values of variables within loops at compile time, and is crucial to the success of admissible loop normalization. IV analysis involves examining the assignments to variables within a loop in order to discover whether the values assigned to a given variable on each iteration form a sequence which can be described by a closed-form expression in terms of the loop iteration. Those unfamiliar with induction variable analysis are referred to Appendix A, where both the analysis and the various techniques for doing it are reviewed in detail.

Finally, in Figure 2.3(c), the canonical form is generated for the loop, by moving the compiler-generated index variable initialization and update into the for statement, and placing the computed trip count expression into the exit test of the for. The computed IV expression for the array access is substituted, and dead code elimination removes the extraneous variables.

2.2 Loop Syntax Preprocessing

In the absence of goto statements, handling loop control flow is considerably simplified. Since programmers tend not to write code using gotos in most cases, and elimination of goto statements is a well-studied problem, this is not a crucial issue. However, in many situations the description and implementation of compiler algorithms are greatly simplified by removing them from consideration. In particular, Erosa and Hendren [EH94] describe a goto elimination transformation for C programs. This transformation removes goto statements by replacing them with equivalent structured programming constructs (i.e., while, do-while, etc.). Thus, we assume that such a transformation has already been applied before processing begins, and that the loops which are presented to the LCFN analysis are either while or for statements.

Since the subsequent phases of the algorithm assume that loops are in while form, the preprocessing phase first converts any input for loops to while form. If the loop turns out to be an admissible one, then the while will be converted back to the canonical for form at the end of the analysis. Note that for loops can be converted to while form directly [KR88].

In order to further analyze a given loop, the loop must satisfy the following conditions:

• The loop has only one exit, that being the condition appearing in the while statement itself.

• The exit condition appearing in the while statement must be an integer comparison². That is, the while statement must have the form while(E), where E is an expression of the form (ε_1 OP ε_2), ε_1 and ε_2 are arbitrary C expressions, and OP is one of the arithmetic comparison operators (<, >, ≤, ≥).

• The exit condition does not contain side effects.

²Although the operators (<, >, ≤, ≥) are valid for non-integer types, the later induction variable and trip count analysis phases will only be effective for integer variables, so this restriction is made here.

A loop which violates any of these conditions is deemed inadmissible. Given that the above conditions are satisfied, the compiler proceeds by adding an explicit index variable to the loop. At this point, a variable is also added to the loop to hold the trip count test expression (TCTE) for the loop (see Section 2.3). The complete preprocessing phase is summarized in Figure 2.4.

Figure 2.4: Loop Syntax Preprocessing

Input: A loop L which is either a while or for loop.

if L is a for loop
    let I be the initialization expression of the for
    move I immediately preceding the for statement
    let U be the update expression of the for
    let S be the last statement of the for loop body
    move U to immediately follow S
    let E be the exit expression of the for
    replace the for statement by while(E)
for each basic block BB in L
    if BB not= head(L) and BB contains a branch outside L
        mark L inadmissible
if E is not an arithmetic comparison
    mark L inadmissible
if E contains side effects
    mark L inadmissible
if L already marked inadmissible
    return
add index variable ix to symbol table
insert ix = 0 before while statement
insert ix = ix + 1 as last statement of loop body
add assignment T = TCTE as first statement of loop body

Figure 2.5 illustrates an input loop and the state of the loop after preprocessing has been applied.


Figure 2.5: Altered loop after preprocessing

(a) original loop    (b) loop after preprocessing

2.3 Computing Loop Trip Counts

Given that we are dealing with a while loop which has only a single exit, we wish to be able to compute a trip count for the loop. Under these conditions, the trip count of the loop is the number of loop iterations (possibly zero or ∞) which will be executed until the condition inside the while statement becomes false.

Computing a trip count for a while loop is not as straightforward as for a FORTRAN DO loop, because the exit condition can have various forms, and because the variables appearing in the exit condition may be modified inside the loop. Also, the syntax of the while statement does not specify how the increment to the index variable occurs. Figure 2.6 illustrates three equivalent while loops which have different exit conditions.

Figure 2.6: while loops with nonconstant exit conditions

In order to determine a trip count for a given exit condition, the compiler must be able to analyze the values that the exit condition will take on each iteration of the loop. Wolfe [Wol92] describes how to do this if the loop exit condition is an integer comparison which can be classified by the compiler as an induction expression. Given a pseudocode exit condition of the form (if ε_1 ≤ ε_2 exit loop) for expressions ε_1 and ε_2, Wolfe's method treats the comparison as a subtraction. Thus the exit condition is treated as equivalent to the condition (if ε_1 − ε_2 ≤ 0 exit loop). Wolfe then computes a trip count if the subtraction ε_1 − ε_2 can be classified by the compiler as a linear induction expression of the form ε_1 − ε_2 = (α × ix + β), where

(a) α and β are integer constants.

(b) ix is the loop index variable.

The trip count can then be expressed as follows:

    trip count = ⌈−β / α⌉

Since the semantics of a while loop specify that the loop is to be exited when the condition appearing within the while statement itself becomes false, the other integer comparison operators (<, >, ≥) can be handled using the table in Figure 2.7.

The expression in the third column of Figure 2.7 is termed the trip count test expression (TCTE) for the loop. During preprocessing the compiler adds a temporary variable, which is assigned the TCTE at the beginning of each loop iteration. This allows the compiler to analyze it as an induction variable, as if it were any other ordinary program variable.
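The role of the TCTE can be illustrated with a small C sketch: for a loop of the form while(e1 OP e2), the loop continues exactly when the TCTE is greater than zero. The enum and function names are hypothetical. The entries for > and < follow the subtraction rule described above; the entries for >= and <= add 1 and are an assumption derived here for integer operands, not taken from the text.

```c
typedef enum { OP_GT, OP_LT, OP_GE, OP_LE } cmp_op;

/* TCTE for while(e1 OP e2): the loop exits exactly when the result is <= 0. */
long tcte(cmp_op op, long e1, long e2) {
    switch (op) {
    case OP_GT: return e1 - e2;        /* while(e1 > e2): exit if e1 <= e2   */
    case OP_LT: return e2 - e1;        /* while(e1 < e2): exit if e2 <= e1   */
    case OP_GE: return e1 - e2 + 1;    /* assumption: integer operands only  */
    case OP_LE: return e2 - e1 + 1;    /* assumption: integer operands only  */
    }
    return 0;                          /* not reached for valid operators    */
}

/* Direct evaluation of the while condition, used only for comparison. */
int cond_holds(cmp_op op, long e1, long e2) {
    switch (op) {
    case OP_GT: return e1 > e2;
    case OP_LT: return e1 < e2;
    case OP_GE: return e1 >= e2;
    case OP_LE: return e1 <= e2;
    }
    return 0;
}

/* Returns 1 when "TCTE > 0" agrees with the original condition for all
 * operand pairs in a small grid of integer values. */
int tcte_consistent(cmp_op op) {
    for (long e1 = -5; e1 <= 5; e1++)
        for (long e2 = -5; e2 <= 5; e2++)
            if (cond_holds(op, e1, e2) != (tcte(op, e1, e2) > 0))
                return 0;
    return 1;
}
```

Assigning this value to a temporary at the top of each iteration turns the exit test into an ordinary variable that induction variable analysis can classify.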


In practice, there are several other issues that the compiler must deal with. Since several induction variable analysis techniques exist which are capable of detecting and representing nonlinear induction expressions, the compiler must be able to determine whether a given induction expression is linear in the loop index variable and derive the expressions α and β from it. Furthermore, the compiler should deal with situations in which α and β are symbolic (but loop invariant) expressions at compile time, since in realistic programs α and β may not be known constants. The compiler must generate an appropriate trip count expression in terms of α and β and ensure that it deals with cases involving zero or infinite trip counts reasonably.

Figure 2.7: Trip count test expressions

while statement    Positive Exit Condition    Trip Count Test Expression
(while(C))         (if (C) exit)              (if (C ≤ 0) exit)
(ε_1 > ε_2)        (ε_1 ≤ ε_2)                (ε_1 − ε_2)
(ε_1 < ε_2)        (ε_2 ≤ ε_1)                (ε_2 − ε_1)

Note that in Wolfe's method, the expressions α and β are literal constants, so that the resulting trip count expression is also a constant. When symbolic expressions are involved, the computation of the trip count expression (that is, the expression appearing in the exit test of the canonical for statement) can be described according to one of several cases:

(a) α and β are constant values known at compile time, in which case we can generate a trip count expression based on Wolfe's formula as described above:

    trip count = ⌈−β / α⌉

(b) α and β are symbolically divisible. If α and/or β are not compile time constants, the compiler may still be able to compute a trip count directly if the symbolic value of α divides the symbolic value of β³. Such a situation can occur in a loop such as the following:

    N = foo();
    i = 0;
    ix = 0;
    while(i < (N * N) + N) {
        T = ((N * N) + N) - i;
        i = i + N;
        ix = ix + 1;
    }

In this loop, α = −(N), and β = (N * N + N), and so the trip count can be expressed as N + 1. In general, the result of the symbolic division is used as the trip count expression:

    trip count = −β / α

(c) α and β are neither constant nor symbolically divisible. In this case, the syntactic trip count will involve an explicit call to the C library ceil function, since α and β are not divisible either as constants or symbolic expressions. To generate the trip count expression, the compiler generates the function call ceil(−β/α).

³Parafrase-2 contains functionality to compute such symbolic divisions where α and β are polynomial expressions.
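The three cases can be checked numerically with a short C sketch. It runs the example loop of case (b) for a concrete N and compares the observed iteration count with N + 1 and with an integer version of ceil(−β/α). The function names are hypothetical, and the integer ceiling stands in for the C library ceil call that the compiler would emit.

```c
/* Iteration count of the example loop from case (b).
 * Requires N > 0 so that the loop terminates. */
long count_iterations(long N) {
    long i = 0, ix = 0, T;
    while (i < (N * N) + N) {
        T = ((N * N) + N) - i;     /* TCTE temporary: T = alpha * ix + beta */
        (void)T;                   /* only analyzed, never used by the body */
        i = i + N;
        ix = ix + 1;
    }
    return ix;
}

/* Case (b): the symbolic division -beta/alpha = (N*N + N)/N gives N + 1. */
long symbolic_trip_count(long N) {
    return N + 1;
}

/* Cases (a) and (c) with integer arithmetic: ceil(-beta/alpha).
 * Requires alpha < 0; returns 0 when beta <= 0. */
long ceil_trip_count(long alpha, long beta) {
    long den = -alpha;
    return (beta <= 0) ? 0 : (beta + den - 1) / den;
}
```

For N = 7 the loop bound is 56 and the step is 7, so all three functions agree on 8 iterations; with α = −3 and β = 10, where the division is not exact, the ceiling gives 4.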

2.3.1 Handling Zero Trip Loops

In cases where the values of α and β are not known at compile time, situations in which a loop trip count is zero or ∞ are more difficult to detect. The former are more of a concern, since infinite loops should not be encountered in correct scientific code, and can be considered to be either programmer error or invalid input.

These types of problems can be illustrated by the following programs:

Figure 2.8: C programs with unknown trip counts

(a)
    i = 0;
    scanf("%d", &N);
    while(N > i) {<body>; i++;}

    /* if (N <= i) exit */
    /* T = N - i */
    /* α = -1 */
    /* β = N */
    /* -β/α = -N/(-1) */

In Figure 2.8(a), the while statement will have a trip count of 0 if N ≤ 0, but a trip count of N otherwise. In Figure 2.8(b), the while statement will have a trip count of 0 if i < 0, but ∞ otherwise. Figure 2.9 extends Wolfe's trip count formula for cases in which α or β are unknown at compile time. However, the compiler must still insert an expression into the loop exit which appropriately reflects the various possibilities at run time.

Figure 2.9: Trip count computation for general expressions

          Value at compile time
          α          β          −β/α       trip count
(i)       +          +          −          ∞
(ii)      +          ≤ 0        +          0
(iii)     +          unknown    unknown    ∞ or 0
(iv)      −          +          +          ⌈−β/α⌉
(v)       −          ≤ 0        −          0
(vi)      −          unknown    unknown    ⌈−β/α⌉ or 0
(vii)     unknown    +          unknown    ∞ or ⌈−β/α⌉
(viii)    unknown    ≤ 0        unknown    0
(ix)      unknown    unknown    unknown    ∞ or 0 or ⌈−β/α⌉

From Figure 2.9, in cases (iii), (vii) and (ix), we have a situation where at run time −β/α may be negative, but the trip count is ∞. If we regard such a situation as either programmer error or invalid input (since in scientific applications we do not anticipate infinite loops), we can safely ignore these cases. The other potential difficulty is that there may be a zero trip count at run time that cannot be detected at compile time. Note that if the quantity −β/α < 0, there is no problem with code generation, since the statement for(ix = 0; ix < ceil(−β/α); ix++) will behave as if the trip count is 0. However, there are also cases where −β/α > 0 at run time, but the proper trip count is zero (this occurs in cases (iii) and (ix)). In these cases, the compiler can wrap the loop in a conditional test (a zero trip loop (ZTL) test) which only executes the loop if the trip count is nonzero at run time, regardless of the value of −β/α. This is a common way of addressing the zero-trip loop problem [EHL91].
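The run-time behaviour behind these cases can be sketched in C for a loop that exits when α × ix + β ≤ 0. The sketch below is an illustration under stated assumptions, not the compiler's implementation: the function names and the TC_INFINITE marker are hypothetical, the ZTL test is taken to be "TCTE > 0 on entry" (the TCTE equals β when ix = 0), and an integer ceiling replaces the ceil call. The case α = 0 is not discussed in the text and is treated here as infinite when β > 0.

```c
#define TC_INFINITE (-1L)

/* True trip count of a loop that exits when alpha * ix + beta <= 0. */
long runtime_trip_count(long alpha, long beta) {
    if (beta <= 0) return 0;                 /* exits before the first pass  */
    if (alpha >= 0) return TC_INFINITE;      /* the TCTE never reaches zero  */
    return (beta + (-alpha) - 1) / (-alpha); /* ceil(-beta / alpha)          */
}

/* Ceiling of num/den for integers of either sign; requires den != 0. */
static long ceil_quot(long num, long den) {
    long q = num / den;                      /* C division truncates         */
    if ((num % den != 0) && ((num < 0) == (den < 0)))
        q++;
    return q;
}

/* Iterations executed by for(ix = 0; ix < ceil(-beta/alpha); ix++)
 * with no ZTL test around it. Requires alpha != 0. */
long unguarded_for_count(long alpha, long beta) {
    long tc = ceil_quot(-beta, alpha), n = 0;
    for (long ix = 0; ix < tc; ix++)
        n++;
    return n;
}

/* The same for loop wrapped in a ZTL test on the initial TCTE value.
 * Infinite-loop inputs (alpha > 0, beta > 0) yield 0 iterations here,
 * which matches the text's decision to ignore those cases. */
long guarded_for_count(long alpha, long beta) {
    long n = 0;
    if (beta > 0) {                          /* ZTL test                     */
        long tc = ceil_quot(-beta, alpha);
        for (long ix = 0; ix < tc; ix++)
            n++;
    }
    return n;
}
```

With α = 2 and β = −5 the quotient −β/α is 2.5, so the unguarded for loop would run 3 times although the original loop never executes; the guarded version runs 0 times.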

In cases where the trip count is obtained by symbolic division, a zero trip loop test must still be generated for the loop, since the TCTE (and hence, the trip count) may be zero even if the result of the symbolic division is positive. However, in this case, the compiler must use the original TCTE expression as the ZTL test, instead of the result of the symbolic division. An example of this can be seen in the loop in Figure 2.10.

Figure 2.10: ZTL Test

(a) loop before trip count analysis    (b) canonical loop with ZTL test

In (a), α = −N and β = (10 * N), so the symbolic division gives a trip count of 10. However, this is clearly only a valid value if the initial expression (10 * N) is larger than zero. Thus, the compiler uses the TCTE expression ((10 * N) - i) as a test at run time, instead of the constant 10.
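The effect of the ZTL test in this situation can be sketched in C. The loop below is an assumed reconstruction consistent with α = −N and β = 10 * N (a loop that advances i by N while i < 10 * N); the exact code of the figure is not reproduced here, and the function names are hypothetical.

```c
/* ASSUMED input loop: i advances by N while i < 10 * N.
 * Returns the number of iterations executed. */
long original_count(long N) {
    long i = 0, n = 0;
    while (i < 10 * N) {
        n++;
        i = i + N;
    }
    return n;
}

/* Canonical loop with the constant trip count 10 from the symbolic
 * division, wrapped in a ZTL test that uses the original TCTE. */
long canonical_count(long N) {
    long i = 0, n = 0;
    if (((10 * N) - i) > 0) {              /* ZTL test: TCTE, not 10 */
        for (long ix = 0; ix < 10; ix++)
            n++;
    }
    return n;
}

/* The same canonical loop without the ZTL test, for comparison. */
long canonical_count_without_ztl(long N) {
    long n = 0;
    (void)N;                               /* trip count no longer depends on N */
    for (long ix = 0; ix < 10; ix++)
        n++;
    return n;
}
```

For N ≤ 0 the original loop executes zero times, which the guarded form reproduces, while the unguarded form would incorrectly execute 10 iterations.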

When the trip count is computed using an explicit ceil call, the compiler must again insert a ZTL test, since the sign of the TCTE is unknown at compile time. In this case, the compiler can simply insert the resulting trip count expression itself into the enclosing if statement. The complete algorithm for trip count computation is summarized in Figure 2.11.


Figure 2.11: LCFN Trip Count Computation Algorithm

iv_expr = IV expression associated with TCTE
if there is no IV expression
    mark L inadmissible
    return
ix = index variable associated with L
if iv_expr is not a linear function of ix
    mark L inadmissible
    return
compute α, β such that iv_expr = (α) * ix + (β)
if α and β are both literal constants
    compute trip count as ⌈−β/α⌉
if α symbolically divides −β
    trip count = result of symbolic division
    mark L as requiring ZTL test
    return
if α positive and β unknown
    trip count = 0
    return
if α unknown and β negative or zero
    trip count = 0
    return
else
    construct trip count as ceil(−β / α)
    mark L as requiring ZTL test

2.4 Subscript Normalization

Expressing array subscript expressions in a normalized form is also closely related to induction variable analysis. In order to determine if a given subscript can be expressed in the form a_0 + a_1 * ix_1 + ... + a_n * ix_n, the compiler must classify the subscript as an induction expression in terms of the enclosing index variables ix_1, ..., ix_n. An example of subscript normalization can be seen in the program in Figure 2.12.

In Figure 2.12(a) the three references to the array a all use the induction variable v in place of expressions involving loop index variables. In the normalized code in Figure 2.12(b), the appropriate induction expressions for v have been substituted in each case.


Figure 2.12: Subscript normalization

    for(ix3 = 0; ix3 < 10; ix3++) {
        a[600 * ix3] = 0;
        for(ix2 = 0; ix2 < 20; ix2++) {
            a[30 * ix2 + 600 * ix3] = 5;
            for(ix1 = 0; ix1 < 30; ix1++) {
                a[ix1 + 30 * ix2 + 600 * ix3] = 5;
            }
        }
    }

(b) normalized code
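As an illustration, the following C sketch compares the subscripts of the normalized loop nest with those produced by an induction variable v. The "original" nest is an assumption: it increments v once per innermost iteration, which is one way of producing the three induction expressions shown; the unnormalized code of the figure may differ. Subscripts are recorded in a trace rather than written to an array so that the comparison covers every reference in order.

```c
#define MAX_TRACE 8192

/* ASSUMED original form: every subscript is the induction variable v,
 * incremented once per innermost iteration. Returns the trace length. */
static int trace_original(int *trace) {
    int n = 0, v = 0;
    for (int i = 0; i < 10; i++) {
        trace[n++] = v;                    /* a[v] = 0; */
        for (int j = 0; j < 20; j++) {
            trace[n++] = v;                /* a[v] = 5; */
            for (int k = 0; k < 30; k++) {
                trace[n++] = v;            /* a[v] = 5; */
                v++;
            }
        }
    }
    return n;
}

/* Normalized form: each subscript is a linear function of the
 * enclosing loop index variables. Returns the trace length. */
static int trace_normalized(int *trace) {
    int n = 0;
    for (int ix3 = 0; ix3 < 10; ix3++) {
        trace[n++] = 600 * ix3;
        for (int ix2 = 0; ix2 < 20; ix2++) {
            trace[n++] = 30 * ix2 + 600 * ix3;
            for (int ix1 = 0; ix1 < 30; ix1++)
                trace[n++] = ix1 + 30 * ix2 + 600 * ix3;
        }
    }
    return n;
}

/* Returns 1 when both nests reference the same subscripts in the same order. */
int traces_match(void) {
    static int t1[MAX_TRACE], t2[MAX_TRACE];
    int n1 = trace_original(t1);
    int n2 = trace_normalized(t2);
    if (n1 != n2)
        return 0;
    for (int m = 0; m < n1; m++)
        if (t1[m] != t2[m])
            return 0;
    return 1;
}
```

In the normalized form each subscript depends only on the loop index variables, which is the form a dependence test can work with; in the assumed original, every reference looks like the same opaque a[v].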

2.5 Canonical Loop Generation

Generating a canonical loop is a fairly straightforward task once the loop trip count has been computed. Since the loop is already in while form, the conversion to canonical for form can be done directly. If a ZTL test is necessary, an if clause with the appropriate test expression wraps the for loop. This is summarized in Figure 2.13.


Figure 2.13: Canonical loop generation

Input: while loop L after trip count analysis and addition of loop index variable and update.

Output: for loop in canonical form equivalent to L

w_stmt = while statement of L
ix_init = statement preceding w_stmt
ix_upd = last statement of while body
remove ix_upd from loop body
tc = computed trip count for loop
create for statement with ix_init, ix_upd and tc
if L requires ZTL test
    ztl = ZTL test condition
    enclose for in if(ztl)
remove ix_init
replace while statement by generated for statement


Chapter 3

Pointer Array Access Normalization

In addition to the problems relating to the normalization of loop control flow, the C programming language also presents problems for dependence analysis because of the use of pointer variables by programmers. There are several ways in which such complications can occur:

(i) Use of pointer variables to reference scalars within a loop.

A pointer dereference that affects a scalar variable within a loop can obscure the effect of an array reference or a loop control variable from the compiler. This occurs in the loop in Figure 3.1.

Figure 3.1: Modification of scalars by pointer

If the compiler is unable to determine that the statement *p = *p + 1 updates the variable i by 1, it will be unable to determine that the array reference a[i] is equivalent to a[3 * ix + 1]. Furthermore, if the compiler cannot determine what variable the dereference of the variable p affects, it will have to make the conservative assumption that the array reference a[i] could refer to any array element, significantly reducing the efficacy of any dependence analysis. This type of difficulty arises in nearly any type of compiler analysis when there are unanalyzed pointer references and where conservative assumptions must be made in the absence of specific information. This affects such analyses as constant propagation, induction variable analysis, etc.

(ii) References made through dynamically allocated data structures. References to dynamically allocated data structures are generally made in C through pointers, and through explicit calls to the malloc library. These types of situations typically involve complex data structures such as linked lists and trees, which are difficult to analyze in any case, but ordinary arrays can also be used in this manner.

(iii) References to statically declared arrays made through pointer variables. C allows the equivalence of array references using explicit array indices, and equivalently using pointer dereferences. This is illustrated in Figure 3.2.

In this chapter, we will focus on dealing with the sorts of programs described by (iii). The primary difficulty with such programs is that the index expression(s) used to reference the array are implicit, and are hidden from the compiler because of the use of pointer arithmetic. For example in Figure 3.2(a), the index expression 2 * i is made implicit by the assignment of p before the loop, and by the increment to p which occurs on each iteration of the loop. Our goal is to recover such index expressions where possible, again expressing them in terms of loop index variables.

Figure 3.2: Static array references via pointers for one and multi-dimensional arrays

    int a[100];
    int *p;

(a) One-dimensional array

    int a[10][10];
    int *p;

(b) Multidimensional array

These types of problems are extremely difficult to deal with in the general case, because of the necessity for complex alias analysis when arbitrary pointer operations are allowed. Alias analysis is a problem that has been extensively studied [HHN94, WL95, EGH94, JM81, LR92, Ban79, Bari;] but for which research is still progressing. Because of the complexity of the problem, we will attempt to avoid alias analysis wherever possible, but will still attempt to deal with a reasonable range of programs.

3.1 PAN Algorithm

The aforementioned normalization, which we will term pointer array access normalization (PAN), operates on statically declared arrays. In addition, we make the following simplifying assumptions concerning the type of pointer operations that may occur in input programs:

(a) Any pointers in the program point only to statically declared arrays. There are no pointers to dynamically allocated data structures, nor are there pointers to scalars.

(b) For a given pointer variable p, any assignments to p in the program are of the form:

• p = &(a[ε_1] ... [ε_k])
where a is a statically declared k-dimensional array, and ε_i, i ≤ k are arbitrary expressions of integer type¹, or

• p = a[ε_1] ... [ε_{k-1}]
where a, ε_1 ... ε_{k-1} are as above²

• p = p ± ε
where ε is an arbitrary expression of integer type.

(c) A pointer variable p can only point to one array a during the course of the program, although it can be assigned multiple times using either form of assignment statement.

(d) Any pointer dereference in the program is of the form *p, or *(p ± ε), with p a pointer variable and ε an arbitrary expression of integer type³.

¹Note also that ε_i should not contain side effects.
²Note that for a one-dimensional array, this will be a simple assignment of the form p = a.
³In all of these cases ε should also be free of side effects.

These simplifications allow the compiler to analyze array references without the need for sophisticated alias analysis in determining pointer-array relationships. Under these assumptions, determining which array a given pointer points to requires only a simple scan of the program, and the relationship between a pointer and its associated array does not change during the execution of the program. Thus, the compiler can focus on converting pointer dereferences to equivalent array accesses based on the implicit access pattern created by whatever pointer operations exist.

In order to derive these index expressions, the primary idea employed is the observation that in programs such as the one in Figure 3.2(a) the assignments to pointer variables resemble the pattern of simple induction variables. This is a result both of the simplifying assumptions made and the fact that the C language restricts the manner in which pointer variables can be assigned. In Figure 3.2, we can consider the pointer p to be an induction variable of a special type, which is initially set to an offset of zero from the beginning of the array a, and has its offset increased by a value of 2 on each loop iteration. This is exactly analogous to an ordinary integer induction variable which is assigned a value of zero before the loop and is incremented by 2 on each iteration.
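The analogy can be made concrete with a short C sketch. The loop below is an assumed example in the spirit of the one-dimensional case (p starts at the beginning of a and advances by 2 per iteration); the loop bound, the stored values and the function names are illustrative choices, not taken from the figure.

```c
#include <string.h>

#define LEN 100

/* Access through a pointer that behaves like an induction variable:
 * offset 0 before the loop, offset increased by 2 on each iteration. */
static void via_pointer(int *a) {
    int *p = a;
    for (int i = 0; i < LEN / 2; i++) {
        *p = i;
        p = p + 2;      /* after the last iteration p is one past the end */
    }
}

/* The same accesses written with an explicit index expression 2 * ix. */
static void via_index(int *a) {
    for (int ix = 0; ix < LEN / 2; ix++)
        a[2 * ix] = ix;
}

/* Returns 1 when both versions leave the array in the same state. */
int pointer_and_index_agree(void) {
    static int a1[LEN], a2[LEN];
    memset(a1, 0, sizeof a1);
    memset(a2, 0, sizeof a2);
    via_pointer(a1);
    via_index(a2);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

The second form exposes the subscript 2 * ix to dependence analysis, whereas in the first form it is implicit in the pointer arithmetic.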

3.1.1 One-Dimensional PAN

In this section, a PAN algorithm will be described for one-dimensional arrays which are accessed by pointer. In Section 3.1.2 PAN will be extended to handle multidimensional arrays. Assuming that the same type of induction variable analysis is available as was used in the LCFN algorithm (see Chapter 2), we can use it to do PAN by adding compiler-generated integer variables to the loop which correspond to the modifications of a pointer induction variable. The algorithm for this is summarized in Figure 3.3.

Figure 3.3: One-Dimensional PAN Algorithm

for each statement S in P
    if S is a pointer assignment p = &(a[ε])
        if variable I_pa does not already exist
            add dummy var I_pa to symbol table
            {I_pa denotes var I indexing array a via pointer p}
        add assignment I_pa = ε immediately following S
    if S is a pointer assignment p = a
        add assignment I_pa = 0 immediately following S
    if S is pointer assignment of form p = p ± ε
        add assignment I_pa = I_pa ± ε immediately following S
    if S contains a pointer dereference *p
        add assignment D_pa = I_pa immediately before S
    if S contains a pointer dereference *(p ± ε)
        add assignment D_pa = I_pa ± ε immediately before S
end for
run IV analysis
for each statement S
    if D_pa is an induction variable at S
        let ε = induction expression associated with D_pa
        replace *p in S by a[ε]
end for

Essentially, this algorithm works by treating the pointer variable p as an induction
variable. In the example illustrated in Figure 3.1, p is initialized to point to the first
element of the array a before the loop, and is updated by a constant amount (via pointer
arithmetic) on each loop iteration. Thus, on each iteration of the loop, the pointer p
points to an element of a whose index value is an induction variable of the loop. The
dummy variables I_pa and R_pa which are added into the loop by the compiler correspond to
the index values resulting from pointer operations. The variable I_pa is used to model the
effects of pointer assignments, while the variable R_pa is used to represent the value of the
implicit index at program points where dereferences of p are made. Associated with each
type of pointer assignment is a corresponding assignment to I_pa. A direct assignment
to an array element using the & operator results in I_pa being assigned the corresponding
expression. A pointer increment or decrement results in I_pa being incremented or
decremented by a corresponding amount, respectively. In the case of a pointer dereference,
R_pa is assigned the value of I_pa at that point in the program, unless the dereference
also contains pointer arithmetic, in which case the additional increment or decrement is
included in the assignment to R_pa. Once these dummy variables are in place, p can be
analyzed with ordinary IV analysis techniques. Corresponding pointer dereferences can
then be converted to array accesses.
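The effect of the dummy variables can be sketched in C. This is only an illustration of the idea, not the Parafrase-2 implementation: the function names, the array size N, and the stride of 2 are assumptions chosen for the example. The first function writes out I_pa and R_pa (spelled Ipa and Rpa) as the algorithm would insert them; the second shows the array form obtained once IV analysis finds Rpa = 2 * i.

```c
#include <assert.h>

#define N 16  /* hypothetical array size for the illustration */

/* Pointer form, with the dummy variables Ipa and Rpa written out explicitly.
   Returns the number of loop iterations executed. */
int fill_by_pointer(int a[N])
{
    int *p;
    int i = 0, Ipa, Rpa;

    p = a;        Ipa = 0;            /* p = a      =>  Ipa = 0        */
    while (i < N / 2) {
        Rpa = Ipa;                    /* before *p  =>  Rpa = Ipa      */
        assert(p == &a[Rpa]);         /* Rpa tracks the implicit index */
        *p = 5;
        p = p + 2;  Ipa = Ipa + 2;    /* p = p + 2  =>  Ipa = Ipa + 2  */
        i = i + 1;
    }
    return i;
}

/* Array form: IV analysis gives Rpa = 2 * i, so *p becomes a[2 * i]. */
int fill_by_index(int a[N])
{
    int i = 0;
    while (i < N / 2) {
        a[2 * i] = 5;
        i = i + 1;
    }
    return i;
}

/* Returns 1 if both forms write exactly the same elements. */
int forms_agree(void)
{
    int x[N] = {0}, y[N] = {0};
    int k;

    fill_by_pointer(x);
    fill_by_index(y);
    for (k = 0; k < N; k++)
        if (x[k] != y[k])
            return 0;
    return 1;
}
```

Calling forms_agree() compares the two loops element by element; the assertion inside the pointer loop checks on every iteration that Rpa really is the index of the element p points to.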

3.1.2 Multidimensional PAN

It is fairly straightforward to extend steps 1-15 in Figure 3.3 for multidimensional arrays
referenced using pointers. However, even if the compiler has determined an induction
expression for a given pointer reference in step 19, there can be difficulties in generating
the appropriate array indices. This can be illustrated in loops such as those in Figure 3.5.

Figure 3.4: PAN Processing Example

In Figure 3.5(a), we have a 2-dimensional array which is referenced within two enclosing
loops. The proper array reference can easily be seen to be a[i][2 * j] by inspection.
However, it is more difficult in general for the compiler to derive the proper expression
for each dimension. Specifically, given enclosing loop index variables
$ix_1, \ldots, ix_l$, an induction expression of the form
$\beta_1 * ix_1 + \cdots + \beta_l * ix_l$ (where the $\beta_i$ are loop
invariant expressions), and a k-dimensional array $a[u_1] \cdots [u_k]$, the compiler
must generate index expressions $\varepsilon_1, \ldots, \varepsilon_k$ such that

$$A_1 * \varepsilon_1 + \cdots + A_k * \varepsilon_k = \beta_1 * ix_1 + \cdots + \beta_l * ix_l$$

subject to $0 \le \varepsilon_i < u_i$, and $A_i = \prod_{j=i+1}^{k} u_j$.¹
It is nontrivial for the compiler to derive these expressions, particularly
if the expressions $\beta_i$ are symbolic.
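As a worked instance of this condition, consider the loop of Figure 3.5(a), assuming the declaration int a[10][10] shown there, so that k = 2 and u_1 = u_2 = 10. Writing i and j for the two enclosing index variables, the induction expression for the dereference is 10 * i + 2 * j, and the index expressions can be matched term by term:

```latex
\[
A_1 = u_2 = 10, \qquad A_2 = 1,
\]
\[
10\,\varepsilon_1 + \varepsilon_2 = 10\,i + 2\,j
\quad\Longrightarrow\quad
\varepsilon_1 = i, \quad \varepsilon_2 = 2j,
\]
\[
0 \le i < 10, \qquad 0 \le 2j < 10 \quad \text{for } 0 \le j < 5 .
\]
```

The matching works here only because both coefficients are known constants and 2j stays below u_2. With symbolic coefficients there is in general no such direct way to split the expression across dimensions.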

In Figure 3.5(b), a two-dimensional array is referenced with only a single enclosing
loop. In a case such as this, even if the compiler could derive the proper index expressions,
these expressions would necessarily involve div and mod operators. These types of
expressions typically cannot be analyzed by dependence analysis techniques in any case.

In order to avoid these difficulties and handle multidimensional array references in a
unified manner, array linearization [BC86], [WB87] can be used to convert multidimensional
arrays into equivalent one-dimensional arrays which can then be analyzed using
the techniques of the previous section. Array linearization, which is summarized in Figure 3.6,
has been used as a technique for doing dependence analysis on multidimensional
arrays. There are some advantages and disadvantages to doing so, which are discussed
by Girkar and Polychronopoulos [G PM].

¹Since C requires the bounds of static arrays to be literal constants, A_i can be computed at compile time.


Figure 3.5: Resolving multidimensional array references

    int a[10][10];
    int *p;

    while (i < 10) {
        p = a[i];                                   Ipa = 10 * i;
        while (j < 5) {
            *p = 5;    /* a[i][2 * j] = 5 */        Rpa = Ipa;    /* Rpa = (10 * i) + (2 * j) */
            p = p + 2;                              Ipa = Ipa + 2;
            j = j + 1;
        }
        i = i + 1;
    }

    (a)

    int a[10][10];
    int *p;

    i = 0;
    p = &(a[0][0]);                                 Ipa = 0;
    while (i < 100) {
        *p = 5;    /* a[i / 10][i % 10] = 5 */      Rpa = Ipa;
        p = p + 1;                                  Ipa = Ipa + 1;
        i = i + 1;
    }

    (b)

The full PAN normalization algorithm is summarized in Figure 3.7.


Figure 3.6: Array linearization

    Given an array declaration a[l_1]...[l_k]:
    for i = 1, ..., k
        compute A_i = l_(i+1) * ... * l_k
    end for
    compute S = l_1 * ... * l_k
    replace array declaration by a[S]
    for each expression E in the program
        if E is a reference a[ε_1]...[ε_k]
            compute expression E_l = (A_1 * ε_1) + ... + (A_k * ε_k)
            replace E by a[E_l]
    end for
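The linearization step can be sketched in C as follows. This is a minimal illustration under assumed names (linear_coeffs, linear_index, and the fixed dimension count K are not from the thesis); it computes the coefficients A_i for a declared shape and checks the resulting offsets against the row-major layout that C itself uses.

```c
#include <assert.h>
#include <stddef.h>

#define K 3  /* number of dimensions in the illustration */

/* Compute A[i] = product of dims[j] for all j > i. */
void linear_coeffs(const int dims[K], int A[K])
{
    int i;

    A[K - 1] = 1;                       /* empty product */
    for (i = K - 2; i >= 0; i--)
        A[i] = A[i + 1] * dims[i + 1];
}

/* Map a K-dimensional index to its offset in the linearized array. */
int linear_index(const int dims[K], const int idx[K])
{
    int A[K], i, off = 0;

    linear_coeffs(dims, A);
    for (i = 0; i < K; i++) {
        assert(idx[i] >= 0 && idx[i] < dims[i]);  /* index within bounds */
        off += A[i] * idx[i];
    }
    return off;
}

/* Check the formula against C's own layout for int a[4][5][6].
   Returns 1 if every element lands at the predicted offset. */
int matches_c_layout(void)
{
    static int a[4][5][6];
    const int dims[K] = {4, 5, 6};
    int i, j, k;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 5; j++)
            for (k = 0; k < 6; k++) {
                const int idx[K] = {i, j, k};
                ptrdiff_t bytes = (char *)&a[i][j][k] - (char *)a;
                if (bytes != (ptrdiff_t)linear_index(dims, idx) * (ptrdiff_t)sizeof(int))
                    return 0;
            }
    return 1;
}
```

For the shape 4 x 5 x 6 the coefficients are 30, 6 and 1, so the element at index (1, 2, 3) maps to offset 30 + 12 + 3 = 45 in the linearized array.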

Figure 3.7: PAN Algorithm

    for each array a in the program
        if a is referenced by pointer
            linearize(a)
    end for
    for each statement S in the program
        if S is a pointer assignment
            add appropriate index statement {see Algorithm 3.3}
    end for
    for each expression E in the program
        if E is a pointer dereference
            replace E by equivalent array reference {see Algorithm 3.3}
    end for
    run IV analysis


Chapter 4

Prototype Implementation

4.1 The Parafrase-2 Compiler

A prototype implementation of the LCFN and PAN methods described in the previous
chapters has been built using the Parafrase-2 compiler environment. Parafrase-2 was
chosen as a platform for several reasons. Firstly, Parafrase-2 is capable of compiling C
programs and has a high-level intermediate representation which allows source-to-source
transformations. Most importantly, it has an infrastructure which is extremely well suited
to enabling admissible loop normalization. A number of supporting analyses which are
necessary for admissible loop normalization, including constant propagation, subscript
normalization, symbolic analysis, and especially induction variable (IV) detection, are
present.

In addition to the passes which are part of the native compiler, Parafrase-2 allows
additional passes to be added by the programmer. Each pass manipulates the intermediate
form to implement code transformations. The PAN code has been implemented as
a separate pass in the Parafrase-2 environment, and the code to implement LCFN has
been attached directly to the induction elimination pass of Parafrase-2.

4.1.1 Overview of Parafrase

The primary benefit in using Parafrase-2 as an environment is the strength of its symbolic
analysis framework [HPSG]. Because of the strongly unified nature of this framework,
which is based on an abstract interpretation approach [CC77, CC79], Parafrase-2 is able
to implement supporting analyses in the presence of various syntactic structures, as well
as symbolic expressions. The induction elimination and constant propagation passes are
based upon this framework. The relationships among these native Parafrase-2 passes,
and the added passes and code are illustrated in the following figure:

[Figure: relationships among Parafrase-2 passes. Legend: Existing Module, Added Code, Attached Code. Labels recovered: Intermediate Representation, PAN, Code Generation.]

The symbolic interpretation engine represents the values of source expressions at compile
time as multivariate polynomials of program variables in a canonical sum-of-products
form. These abstract symbolic values are central to the computation of induction expressions,
and allow the compiler to automatically generate normalized array indices without
the need for further processing. This canonical representation also greatly facilitates
other aspects of the implementation. For example, determining that a given induction
expression is linear in a certain variable and then computing the expressions corresponding
to the slope and intercept of the linear function is simple once the expression is in a
unique canonical form.
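To illustrate why a canonical sum-of-products form makes the linearity check simple, the following sketch splits a polynomial into slope and intercept with a single pass over its terms. The data layout is an assumption made for the example (two variables i and n, with names such as struct term and split_linear_in_i invented here) and does not reflect the internal structures of Parafrase-2. It relies on the canonical-form property that like terms are already combined and no term has a zero coefficient.

```c
#include <assert.h>

#define MAX_TERMS 8

/* One product term coef * i^ei * n^en; a polynomial is a sum of such terms. */
struct term { long coef; int ei; int en; };
struct poly { int nterms; struct term t[MAX_TERMS]; };

/* If p is linear in i, write p = slope * i + icpt and return 1.
   Otherwise return 0. Both outputs are polynomials in n only. */
int split_linear_in_i(const struct poly *p, struct poly *slope, struct poly *icpt)
{
    int k;

    assert(p->nterms >= 0 && p->nterms <= MAX_TERMS);
    slope->nterms = 0;
    icpt->nterms = 0;
    for (k = 0; k < p->nterms; k++) {
        struct term t = p->t[k];
        if (t.ei == 0) {
            icpt->t[icpt->nterms++] = t;     /* no i: part of the intercept */
        } else if (t.ei == 1) {
            t.ei = 0;                        /* factor out i */
            slope->t[slope->nterms++] = t;
        } else {
            return 0;                        /* i^2 or higher: not linear */
        }
    }
    return 1;
}

/* Evaluate a polynomial at concrete values of i and n. */
long eval_poly(const struct poly *p, long i, long n)
{
    long sum = 0;
    int k, e;

    for (k = 0; k < p->nterms; k++) {
        long v = p->t[k].coef;
        for (e = 0; e < p->t[k].ei; e++) v *= i;
        for (e = 0; e < p->t[k].en; e++) v *= n;
        sum += v;
    }
    return sum;
}

/* Demo: 3*i*n + 2*i + n*n + 7 splits into slope 3*n + 2 and
   intercept n*n + 7. Returns 1 if the split reproduces p. */
int demo_split(void)
{
    struct poly p = { 4, { {3, 1, 1}, {2, 1, 0}, {1, 0, 2}, {7, 0, 0} } };
    struct poly slope, icpt;
    long i, n = 5;

    if (!split_linear_in_i(&p, &slope, &icpt))
        return 0;
    for (i = 0; i < 4; i++)
        if (eval_poly(&p, i, n) != eval_poly(&slope, 0, n) * i + eval_poly(&icpt, 0, n))
            return 0;
    return 1;
}
```

Without a canonical form the same check would first have to expand and combine terms, since an expression such as (i + 1) * (i - 1) - i * i is constant in i even though i appears squared in its source form.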


Induction Expression Detection in Parafrase-2

The most important factor in successfully detecting admissible loops is the ability to detect
induction expressions in the program. This relates directly to the ability to compute
trip counts, as well as normalizing array index expressions, which are the primary factors
in determining admissibility. Although many compilers implement analyses such as constant
propagation and induction variable detection, Parafrase-2 offers several important
features which allow a wider range of admissible loops to be detected:

• Induction expressions are represented in a canonical, program-point specific form,
  and are computed as explicit functions of a loop index variable. This is in contrast
  to simple IV detection algorithms, like that described by Aho, Sethi, and Ullman
  [ASU86], which express induction variables in terms of other program variables,
  but not necessarily in terms of a single index variable which expresses the iteration
  number of the loop. Also, simple algorithms do not take into account the fact
  that an induction variable may be modified more than once in a loop, and may
  have different characteristic functions¹ at different program points. Furthermore,
  Parafrase-2 associates an induction expression with each source expression, rather
  than simply with program variables.

• Induction expressions can be recognized regardless of the syntactic form of the
  updates to induction variables. Some IV detection algorithms operate by scanning
  source code or pattern matching certain types of syntactic forms. Parafrase-2
  computes IVs based on the semantics of program statements, not their syntax,
  abstracting away these differences, and thus is capable of recognizing a wider range
  of induction expressions.

• Induction expressions which result from updates along different control flow paths
  can be detected. This allows updates to variables along conditional control flow
  paths to be handled, and also allows updates to variables via inner loops to be
  modeled. This is valuable in practice, since we are often not dealing with single
  loops, but loop nests.

• Parafrase-2 is able to detect induction variables which are nonlinear polynomial or
  exponential functions of the loop index variable.

¹The characteristic function of a variable is its closed form at a specific point in the source code.
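The notions of program-point specific closed forms and nonlinear induction variables can be made concrete with a small C sketch. The loop below is invented for illustration, and the closed forms in the comments were derived by hand rather than produced by Parafrase-2. The variable j is updated twice per iteration and so has a different characteristic function at each of the two points, while t and g are polynomial and exponential functions of the index i.

```c
#include <assert.h>

/* Returns 1 if the hand-derived closed forms hold at every marked
   program point for iterations i = 0 .. n-1, and 0 otherwise. */
int closed_forms_hold(int n)
{
    long j = 0;           /* linear IV, updated twice per iteration */
    long t = 0;           /* polynomial IV: t accumulates i         */
    unsigned long g = 1;  /* geometric IV: g doubles each iteration */
    long i;

    assert(n >= 0 && n <= 30);   /* keeps g = 2^(i+1) within 32 bits */
    for (i = 0; i < n; i++) {
        j = j + 1;
        if (j != 2 * i + 1) return 0;          /* first point:  j = 2i + 1 */
        j = j + 1;
        if (j != 2 * i + 2) return 0;          /* second point: j = 2i + 2 */
        t = t + i;
        if (t != i * (i + 1) / 2) return 0;    /* quadratic in i           */
        g = g * 2;
        if (g != (1UL << (i + 1))) return 0;   /* exponential in i         */
    }
    return 1;
}
```

An analysis that records only one linear function per variable could describe j at most at one of the two points, and could not describe t or g at all.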

Enabling of IV Detection in C

Although the native Parafrase-2 compiler handles C, the induction elimination module
was not fully operational for C programs. Hence, several fixes and additions were made
to enable the module for use in admissible loop normalization:

• Code was added to create an explicit index variable for C while loops. In the case
  of a while loop, the new index variable is always initialized immediately before the
  while statement, its increment is always 1, and the increment occurs as the last
  statement in the while loop.

• Code was added to the induction module to reflect the addition of the new variable
  introduced above, and to modify the appropriate internal data structures in the module.
  This ensures that the symbolic execution engine will correctly execute the modified
  loop, since the source program being compiled is being changed.

• Parafrase-2 represents the trip count of FORTRAN DO loops during induction
  variable analysis, and the symbolic analysis engine uses the values to model the
  entire effect of a loop on program variables. This also enables the detection of
  multiloop induction variables. Code was added to this phase of the induction module
  in order to return an appropriate abstract symbolic expression to the IV analysis for
  while loops. The native IV module extracts a loop count directly for FORTRAN
  DO loops, but does not do so for any other type of loop. If the trip count computed
  by the admissible loop analysis is a constant, then an equivalent abstract symbolic
  constant is returned. If the computed trip count is the result of a symbolic division,
  then again an equivalent abstract symbolic expression is returned to the induction
  module. If the trip count involves a call to the C ceil function, a NULL expression
  is returned (indicating an unknown trip count)².
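The first and third items can be illustrated with a short C sketch. It is not taken from the Parafrase-2 source: the function names and the loop shape (start i0, bound n, positive step s) are assumptions made for the example. The first function shows a while loop with the introduced index variable ix placed as described above; the second shows the closed-form trip count, which is where a ceiling operation arises when the step is not 1.

```c
#include <assert.h>

/* While loop with the introduced index variable ix written out.
   Returns the final value of ix, i.e. the number of iterations. */
int count_with_index(int i0, int n, int s)
{
    int i = i0;
    int ix;

    assert(s > 0);
    ix = 0;                          /* initialized immediately before the while */
    while (i < n) {
        assert(i == i0 + s * ix);    /* i is a linear function of ix */
        i = i + s;
        ix = ix + 1;                 /* increment of 1, last statement in the loop */
    }
    return ix;
}

/* Closed-form trip count ceil((n - i0) / s), computed with integers. */
int trip_count(int i0, int n, int s)
{
    assert(s > 0);
    if (n <= i0)
        return 0;
    return (n - i0 + s - 1) / s;
}
```

For example, count_with_index(0, 10, 3) runs the body for i = 0, 3, 6, 9 and returns 4, which agrees with trip_count(0, 10, 3). When i0, n or s is symbolic the ceiling cannot be folded away like this, which is the case the third item reports as an unknown trip count.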

Since much of the existing Parafrase-2 code is not as well enabled for C as for FORTRAN,
several other fixes were made to the existing code to allow C to be handled properly.
This process was hindered somewhat by the lack of detailed documentation available
on the implementation of the native Parafrase-2 passes. In addition, the Parafrase-2
infrastructure does not prevent the programmer from making inconsistent or invalid
modifications to the syntax tree. This, combined with the lack of native routines to do some
common types of manipulations of source code constructs, and sparse documentation of
existing routines, also hindered implementation somewhat.

4.1.2 LCFN Implementation

The code to implement LCFN has been added directly to the induction elimination
module. Since this module handles induction variable analysis and subscript normalization
automatically, the added code implements the preprocessing necessary to generate
canonical loops, and interrupts the induction variable analysis to generate the necessary
information for canonical loop generation as each loop is processed. Once the induction
elimination module is finished processing, any canonical loops generated are then inserted
into the code.

4.1.3 PAN Implementation

Code to implement array access normalization has also been implemented in Parafrase-2.
All of the code necessary for the purposes of PAN implementation has been written as a
prepass which makes the appropriate changes to the input program before the invocation
of the Parafrase-2 induction pass. Since Parafrase-2 automatically computes normalized
subscripts for any array accesses in the loop, the PAN prepass first linearizes arrays
where necessary, and then converts any pointer dereferences to their equivalent array forms,
regardless of whether the array index expression is an induction expression or not. If so,
the Parafrase-2 induction pass automatically completes the PAN process by substituting
appropriate induction expressions (if any) for the array index. If not, the array access is
left with an unnormalized index expression.

²The symbolic analysis engine of Parafrase-2 can only represent values that are polynomial functions of program variables. However, ⌈p(x)⌉, for p(x) a polynomial, cannot itself be represented by a polynomial.

Figure 4.1: Experimental Procedure

    Run FORTRAN 77 code through Parafrase-2 dependence analysis
    Generate statistics on parallelizable loops for FORTRAN code
    Convert FORTRAN 77 code to C
    Modify C code to obscure loop accesses from dependence analysis
    Run admissible loop transformation on C code
    Apply dead code elimination/fixes to C code
    Run normalized C code through dependence analysis and generate statistics

4.2 Experimental Evaluation

In order to evaluate the prototype implementation, three benchmark applications were
chosen as sample inputs for the admissible loop transformation. These benchmarks are
the tomcatv program contained in the SPEC95 benchmark suite and the embar and
conjugate gradient applications which are part of the Numerical Aerodynamic Simulation
(NAS) parallel benchmark suite [BBS?].

4.2.1 Experimental Procedure

The procedure used to evaluate the implementation is summarized in Figure 4.1.
Since the aforementioned applications are coded in FORTRAN 77, each was converted
to C using the GNU f2c FORTRAN-to-C translation tool [FCSO]. f2c provided a base
of C code which was then modified by hand in order to obtain code which obeys the
syntactic requirements of the Parafrase-2 implementation. In most cases, only minimal
changes were required. Additional modifications were made to the program loops in order
to introduce difficulties for the dependence analyzer. These modifications included:

(a) conversion of f2c-generated for loops to while loops to obscure index variables and
    loop trip counts from the compiler.

(b) replacement of array indices by expressions involving introduced loop induction variables.

(c) replacement of array access expressions by equivalent pointer dereference expressions.

(d) use of varying loop exit conditions.

Changes made to the code to conform to syntactic constraints included³:

(a) f2c-generated goto statements replaced by appropriate structured constructs.

(b) f2c-generated << operators replaced by multiplications.

(c) f2c-generated ++, +=, etc. operators replaced by equivalent explicit operators⁴.

(d) explicit code introduced corresponding to FORTRAN MAX and ABS functions⁵.

Dead code elimination and the insertion of declarations for variables introduced by
the induction module were accomplished by hand after the admissible loop normalization
phase because of difficulties encountered with the Parafrase-2 implementation for C.

The data-dependence pass of Parafrase-2 provides a dependence analyzer for array
references which have simple linear index expressions. Parafrase-2 uses the gcd and
bounds dependence tests to construct a data dependence graph (DDG). Parafrase-2 also
provides the dotodoall pass, which analyzes the DDG in order to mark each DO loop as
parallel or non-parallel, based on the dependence information generated. These passes
were used to generate a count of parallelizable loops for each application. Although the
data-dependence and dotodoall passes were only partially enabled for C⁶, an extra mini-pass
was written which modified the internal C syntax tree of each admissible for loop
to correspond to that of a FORTRAN DO. This allowed dependence and parallelization
results to be generated for the normalized C code as well.

³f2c introduces scalar pointer variables to represent reference parameters in function calls. This technically violates the requirements of admissibility/IV analysis. The Parafrase-2 modules do not analyze C pointer syntax, and since none of the introduced code affects either the dependence analysis or the IV analysis, this does not affect the final results.

⁴The Parafrase-2 symbolic analysis engine does not support these C operators.

⁵Although the presence of certain calls in FORTRAN (i.e. SQRT, LOG) does not present problems for Parafrase-2, the equivalent calls did so in C. For this reason, these calls were temporarily removed during the analysis to allow the appropriate IV analysis to proceed.

⁶In particular, they analyzed individual statements properly, but only handled FORTRAN DO loops.

Figure 4.2: Parallel Loop Statistics

    Application         Number of Loops   Parallel Loops (F77)   Parallel Loops (C)   Admissible Loops
    embar (NAS)         10                1                      1                    8
    cg (NAS)            31                10                     10                   28
    tomcatv (SPEC95)    16                5                      5                    14

4.2.2 Experimental Results

Parallelization Results

Figure 4.2 summarizes the results obtained for the three benchmark applications.
As can be seen from the figure, exactly the same parallel loops were detected for each
application in FORTRAN and in C, indicating that the dependence analyzer was able to
process the loops and determine that the loops were parallelizable, despite the presence
of the problematic C constructs. Thus, dependence analysis for C was enabled.

Dependence Results

The ability of the admissible loop normalization to enable dependence analysis for C
can also be illustrated by examining selected loops from the benchmark applications in
further detail.

The following loop from the embar application illustrates a parallelizable loop that
has no loop-carried dependences. The loop is a simple initialization of an array, as seen
in Figure 4.3.

Figure 4.3: Sample loop from embar application

          DO 110 I = 0, NQ - 1
             Q(I) = 0.D0
     110  CONTINUE

    (a) FORTRAN loop

    i = 0;
    p = &(q[0]);
    while (nq - 1 >= i) {
        *p = 0;
        p = p + 1;
        i = i + 1;
    }

    (b) unnormalized C loop

    (c) normalized C loop

Although in FORTRAN this loop can clearly be parallelized, in C the dependence
analyzer must report dependence for the loop if the assignment *p = 0; cannot be
analyzed⁷. However, in the normalized loop, the syntax matches that of the FORTRAN
loop and the loop is detected as parallelizable.

⁷Note however that the C dependence analysis is not properly enabled in the native Parafrase-2 code and it will incorrectly ignore the possible effect of the pointer dereference.
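The effect of normalization on the embar loop can be sketched in C. The unnormalized function follows Figure 4.3(b); the normalized function is only one plausible shape for the result, written by hand for illustration and not the output of the prototype. The function names and the fixed length NQ are assumptions made for the example.

```c
#include <assert.h>

#define NQ 10   /* hypothetical array length for the illustration */

/* Unnormalized form, following Figure 4.3(b). */
void init_unnormalized(double q[NQ])
{
    int nq = NQ;
    int i = 0;
    double *p = &(q[0]);

    while (nq - 1 >= i) {
        assert(p == &q[i]);   /* the pointer tracks the implicit index i */
        *p = 0;
        p = p + 1;
        i = i + 1;
    }
}

/* One plausible normalized form: explicit index and explicit bounds,
   with the pointer dereference replaced by an array reference. */
void init_normalized(double q[NQ])
{
    int nq = NQ;
    int ix;

    for (ix = 0; ix <= nq - 1; ix++)
        q[ix] = 0;
}

/* Returns 1 if both forms leave the array fully zeroed. */
int same_result(void)
{
    double x[NQ], y[NQ];
    int k;

    for (k = 0; k < NQ; k++) { x[k] = 1.0; y[k] = 2.0; }
    init_unnormalized(x);
    init_normalized(y);
    for (k = 0; k < NQ; k++)
        if (x[k] != 0.0 || y[k] != 0.0)
            return 0;
    return 1;
}
```

In the second form each iteration writes q[ix] for a distinct ix, so a subscript-based dependence test can see directly that no element is touched by two iterations.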

We can also look at an example of a loop which does carry dependences and thus
cannot be parallelized. Such a loop can be found in the tomcatv application (see Figure 4.4).

In this case, the inner I loop has no loop-carried dependences. This loop can be
parallelized. However, the outer J loop has a flow and anti dependence carried between
the assignment and references to the arrays RX and RY respectively. Each of the four
resulting dependences has a dependence distance of 1. Thus, the outer loop cannot be
parallelized. In the unnormalized C loop, the array references and the loop trip counts
have been obscured by use of the while loops, as well as by the introduction of induction
variables w, v1, and v2 to the array index expressions. In the normalized C code generated
by the admissible loop normalization, these array references have been converted
back to expressions involving loop index variables⁸. The Parafrase-2 dependence analyzer
detects the same four dependences for the normalized C loop, and correctly determines
that the inner loop is parallelizable. The dependences detected for the tomcatv application
are summarized in Figure 4.5⁹. Each of the 16 loops in the program is listed, with
the count of dependences detected for each. Note that the parallelizable loops are those
which have a dependence count of zero.

⁸Also, the array d_ has been linearized. This is a result of transformations elsewhere in the program.

Figure 4.4: Sample loop from tomcatv

    (a) FORTRAN loop

    (c) normalized C loop

Although the dependences detected match exactly for those loops which are parallelizable,
there are differences in the dependences detected between the FORTRAN code
and the corresponding C code in loops L4, L5, L6, L7, L9, L10 and L12. In the case of L6
and L7, this is due to differences in the coding of these programs. In particular, the C
versions of these loops have explicit code to compute the FORTRAN MAX and ABS
functions, leading to multiple dependences which correspond to the simple function calls
in FORTRAN. The other loops require further analysis, and are shown in Figure 4.6
(loops L12 and L13 are shown in Figure 4.4).

⁹Loops L1, L2 and L16 are I/O loops and were not coded in C. Loop L3 was inadmissible in C and thus not analyzed for dependence.

Figure 4.5: Dependences for tomcatv

In loops L4 and L5, there is a single extra output dependence detected by the
Parafrase-2 dependence analyzer because of the linearization of the array dd, which in
Figure 4.6(a) is assigned by the two-dimensional reference DD(I, J), but in C has been
converted to dd[514 * ix1 + ix2 + 1030]. Although there is no actual dependence, the
Parafrase-2 dependence analysis is relatively unsophisticated, and cannot analyze the
linearized reference. Because both loop index variables appear in C within a single array
index expression, Parafrase-2 treats the other index variable within each loop as an
unanalyzed symbolic variable within the array index, and thus reports dependence. A
similar problem occurs with loop L12. In this case, the original FORTRAN loop counts
downwards with an increment value of -1, but the normalized C code has converted the
loop to count upwards with an increment value of 1. As a result, the array references
RX(I, J) and RY(I, J) become rx[ix9+2][-ix10+n-2] and ry[ix9+2][-ix10+n-2] respectively.
The variable n then becomes an extra symbolic value in the array reference,
leading to an extra dependence.
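The way a count-down loop acquires a symbolic term when normalized can be sketched in C. The actual bounds of loop L12 are not given in the text, so the start value n - 2, the lower bound lo, and all function names below are assumptions chosen only to reproduce an index of the form -ix + n - 2.

```c
#include <assert.h>

/* Count-down loop: visits j = n-2, n-3, ..., lo and records each j.
   Returns the number of values recorded. */
int visit_down(int n, int lo, int out[], int cap)
{
    int j, k = 0;

    for (j = n - 2; j >= lo; j--) {
        assert(k < cap);
        out[k++] = j;
    }
    return k;
}

/* Normalized form: ix counts 0, 1, 2, ... with increment 1, and every
   use of j is rewritten as -ix + n - 2. */
int visit_up(int n, int lo, int out[], int cap)
{
    int ix, k = 0;
    int trips = (n - 2 >= lo) ? (n - 2 - lo + 1) : 0;

    for (ix = 0; ix < trips; ix++) {
        assert(k < cap);
        out[k++] = -ix + n - 2;
    }
    return k;
}

/* Returns 1 if both loops visit the same index values in the same order. */
int same_sequence(int n, int lo)
{
    int down[64], up[64];
    int k, nd, nu;

    nd = visit_down(n, lo, down, 64);
    nu = visit_up(n, lo, up, 64);
    if (nd != nu)
        return 0;
    for (k = 0; k < nd; k++)
        if (down[k] != up[k])
            return 0;
    return 1;
}
```

The two loops are equivalent, but the subscript in the normalized form now mentions n explicitly, which is the extra symbolic value that a simple dependence test cannot cancel.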

In loop L10, there are an extra output, flow and anti dependence detected because
of the reference and assignment to the linearized array d_ in the loop. Similarly, in loop
L9, there is an extra output dependence because of the assignment to d_. However, in
the case of L9, the total number of dependences detected is actually smaller because two
invalid dependences (antidependences with dependence distance -1) which are erroneously
detected by Parafrase-2 in FORTRAN do not appear in C.

Figure 4.6: Non-parallelizable loops in tomcatv

    (a) L4, L5 (F77)        (b) L4, L5 (C)

    (d) L9, L10 (C)

Chapter 5

Related Work

The problem of supporting dependence testing in C has also been tackled by Justiani
and Hendren [JH94]. They have implemented support phases to convert some types of
C loops to a canonical form similar to that implemented by LCFN, within the MCCAT
compiler environment. The differences between the MCCAT analysis and that of LCFN
can be summarized as follows:

(a) MCCAT assumes that any loop is defined as a for loop with an explicit initialization,
    increment, and test of a single loop variable¹. Thus, MCCAT does not handle
    missing or implicit index variables, nor does it compute loop trip counts.

(b) MCCAT does not allow the loop body to modify loop control variables. Such
    modifications are detected by MCCAT, but result in the loop being marked as
    inadmissible.

(c) MCCAT analyzes scalar pointer references, but still requires that array references
    be made using explicit index expressions. Array references made through pointers
    are not handled.

(d) MCCAT incorporates analysis of scalar pointer references into the induction variable
    detection and the subscript normalization process, supported by the strong points-to
    alias analysis available in the MCCAT compiler. Thus, MCCAT can
    incorporate the effects of such dereferences into the IV analysis. LCFN and PAN
    do not deal with scalar pointers.

(e) MCCAT also can analyze some types of stack-based aliases between array names,
    so that an array which is accessed using a different name can be detected. PAN
    makes simplifying assumptions about the form of pointer assignments to make the
    relationship between pointer and array unambiguous.

¹A scalar pointer variable of the form *p can be used as the loop index variable.

The primary strength of the analyses implemented in MCCAT over those present in
Parafrase-2 is their ability to handle scalar pointer references and incorporate their effects
into the other analyses needed to implement admissible loop normalization. Conversely,
LCFN and PAN handle a wider range of syntactic structures with respect to loop control
flow, and can analyze some types of pointer-based array references, which MCCAT does
not handle.

There are several research projects currently involved with developing extensions to
the C++ programming language, for the purposes of parallel computing. The pC++
project [BBGSI] and the CC++ project [C1\'9S] have defined extensions to the C++
language for the purposes of providing a model for parallel C++ programming. In
addition, the HPC++ project [BG.IS.i] has focused on providing a runtime library, as well
as compiler directives to provide parallel programming support. HPC++ provides loop
directives for the purposes of parallelizing well-behaved loops under conditions which
are similar to those defined by a canonical loop. In particular, the HPC_INDEPENDENT
directive allows the compiler to parallelize loops, provided that:

(a) The loop is a for statement.

(b) The loop termination condition involves only loop IVs.

(c) The loop update condition only modifies loop IVs.

The most important aspect of admissible loop normalization is induction variable
analysis. IV analysis is needed to normalize array subscripts, compute loop trip counts,
and resolve array indices for pointer-based array references. The simplest IV algorithms,
such as that described by Aho et al. [ASU86], detect variables whose only assignments
within a loop are increments or decrements by a constant value. This allows such variables
to be classified as linear functions of other IVs. However, since IV algorithms were initially
used for the purposes of strength reduction, these are not necessarily expressed in terms
of a loop index variable. Furthermore, the Aho et al. algorithm depends on detecting
specific syntactic forms for updates, and does not deal with internal loop control flow.

Wolfe [Wol92] describes a more advanced IV algorithm designed specifically for the purposes of advanced loop transformations used in parallelizing compilers. Wolfe's method uses an analysis based on the SSA form [CFR91] to find linear induction expressions in loop nests. Wolfe's method is capable of detecting multiloop induction variables, in which the initial value or step of the induction variable occurring in an inner loop may vary in an outer loop. Wolfe's method is also capable of detecting other types of induction expressions, including wrap-around variables, periodic variables, and monotonic variables. Wolfe's IV analysis is also used for the purpose of computing loop trip counts, and his method has been extended in this thesis to deal with cases in which symbolic expressions are present.
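The three additional classes of induction expression named above can be pictured with small C loops. This is an illustrative sketch only: the functions are hypothetical and are not taken from Wolfe's paper or from the thesis.

```c
#include <stddef.h>

/* Wrap-around variable: prev holds the value the index had on the
 * previous iteration, and a value from outside the loop on the first. */
void wrap_around(double *out, const double *in, size_t n)
{
    size_t prev = n - 1;              /* wraps around to the last element */
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - in[prev];
        prev = i;
    }
}

/* Periodic variable: cur and nxt swap on every iteration, so cur
 * takes the values 0, 1, 0, 1, ... */
int periodic_last(int iterations)
{
    int cur = 0, nxt = 1;
    for (int i = 0; i < iterations; i++) {
        int tmp = cur;
        cur = nxt;
        nxt = tmp;
    }
    return cur;
}

/* Monotonic variable: k never decreases, but its step depends on the
 * data, so it has no closed form in terms of i. */
size_t monotonic_pack(int *dst, const int *src, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0)
            dst[k++] = src[i];
    }
    return k;
}
```

In the first loop, prev equals i - 1 on every iteration except the first, which is what makes it recognizable as a wrap-around variable rather than an ordinary linear IV.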

IV techniques like that of Wolfe, and that in the Polaris compiler [PE94], are also capable of detecting nonlinear induction variables, which are polynomial or geometric functions of the loop index. These types of IVs can arise in triangular loop nests, which appear in some scientific applications, or in situations in which an IV is updated by a nonconstant value on each loop iteration.
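A minimal sketch of how a polynomial IV arises in a triangular loop nest is given below. The function name and the packed-array layout are assumptions made for the example; the point is only that k, which is incremented once per inner iteration, equals i(i + 1)/2 at the top of outer iteration i.

```c
/* Writes into a packed lower-triangular array and returns the number
 * of elements written. Each element receives the closed form of its
 * own position, so packed[k] == k on return. */
long fill_packed(long *packed, long n)
{
    long k = 0;                          /* polynomial IV with respect to i */
    for (long i = 0; i < n; i++) {
        /* at this point k == i * (i + 1) / 2 */
        for (long j = 0; j <= i; j++) {
            packed[k] = i * (i + 1) / 2 + j;   /* closed form of k */
            k++;
        }
    }
    return k;
}
```

Once the closed form is known, the subscript k can be replaced by an expression in i and j, which removes the loop-carried update of k.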

The IV analysis existing in the Parafrase-2 compiler is also capable of detecting nonlinear induction variables, and can detect multiloop IVs within loop nests. The Parafrase-2 IV detection is very strong, and as such provides an excellent framework on which to base admissible loop normalization. The Parafrase-2 IV analysis is based on a symbolic analysis framework, in which the source program is executed within an abstract domain representing the symbolic values of program variables at run time. Parafrase-2 is capable of representing expressions which are multivariate polynomials or exponentials. Parafrase-2 symbolically executes loops, and uses symbolic interpolation to attempt to fit the sequence of values assumed by a given expression to a polynomial or exponential function. Because the Parafrase-2 approach is based on symbolic execution as opposed to pattern-matching or ad hoc approaches, it detects IVs on a program-point specific basis. In addition, it handles multiple updates to IVs, equivalent updates along different control flow paths, and automatically models the effects of inner loops on IVs if the loop trip count is known.

The problem of accurate alias analysis for C programs is one to which increasing attention has been paid, but for which solutions amenable to practical use in real compiler systems have not yet been developed. Early work in detecting alias relationships in programs [Ban79, Bar77] focused on FORTRAN-like programming languages, in which the primary source of aliases is the use of reference parameters in procedure calls. Recent work has focused on attempting to handle the more complex types of pointer relationships that occur in C programs as a result of various C features. These include:

• the creation of new pointer relationships with the C & operator.

• multilevel pointer references (i.e. **p).

• interprocedural alias relationships, including those created by recursive functions.

• pointer analysis for dynamically allocated data structures, as well as statically allocated variables.

• the use of function pointers in C.

• the use of type casts and pointer arithmetic.
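Most of the features in this list can be seen together in a few lines of C. The sketch below is illustrative only; the function names are hypothetical, and recursion is omitted to keep it short.

```c
#include <stdlib.h>

/* The callee writes through an alias of whatever the caller passes. */
static void bump(int *p) { *p += 1; }

int alias_demo(void)
{
    int x = 0;
    int *p = &x;                  /* & creates a new pointer relationship */
    int **pp = &p;                /* multilevel reference: **pp names x   */
    void (*fn)(int *) = bump;     /* call target known only through fn    */
    int *heap = malloc(4 * sizeof *heap);   /* object with no source name */
    if (heap == NULL)
        return -1;
    for (int i = 0; i < 4; i++)
        heap[i] = 0;
    int *q = heap + 2;            /* pointer arithmetic: q aliases heap[2] */

    **pp = 10;                    /* writes x */
    fn(p);                        /* writes x via an interprocedural alias */
    *q = 5;                       /* writes heap[2] */
    unsigned char *bytes = (unsigned char *)&x;   /* cast: bytes overlap x */
    (void)bytes;

    int result = x + heap[2];     /* 11 + 5 */
    free(heap);
    return result;
}
```

An analysis that misses any one of these relationships would either report a wrong value for x or have to assume that every store may modify every variable.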

All of these aspects of C can cause significant complications for an alias analysis scheme. A basic problem occurs in handling dynamically allocated data structures because there are no explicit names for heap objects, as contrasted with statically allocated objects on the stack. In order to handle recursive data structures, some techniques [LR92, JM81] limit the depth to which recursion is modeled by k-limiting to keep a finite number of object names. This, however, can lead to overly conservative information. Emami et al. [EGH94] and Wilson and Lam [WL95] both describe schemes to do context-sensitive interprocedural alias analysis for C programs. Emami et al. attempt to separate stack-based and heap-based pointer relationships, and reanalyze a procedure for each of its calling contexts. This approach has exponential complexity in the worst case, however. Wilson and Lam attempt to avoid this problem by summarizing only those relationships between procedure parameters that actually occur in the program. They also analyze pointers at a low level to avoid problems caused by type casting and compound data structures. Hummel et al. [HHN94] describe a scheme for analyzing complex pointer data structures such as trees and linked lists, but require information specifying relevant properties of the data structures being used.


Chapter 6

Conclusion and Future Work

6.1 Conclusion

Existing dependence analysis techniques rely on the compiler's ability to construct appropriate dependence equations from the program source code. This requires a regular loop syntax, normalized array references, and a known loop trip count. Enabling dependence analysis in the C language relies on being able to transform as many as possible of the diverse syntactic structures for loop control flow and array references that are available to the programmer in C into equivalent forms with which the dependence analyzer can effectively deal. The primary complications that arise in C result from the use of non-for loop syntax, implicit or missing loop control constructs, and the use of pointers to reference static arrays.

LCFN has been implemented in the Parafrase-2 parallelizing compiler to convert certain types of C loops into a canonical form obeying the basic assumptions of dependence analysis. LCFN uses induction variable analysis to compute a trip count for the loop and to compute normalized expressions for array access expressions within the loop, from which a canonical for loop can be generated.
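The kind of rewriting described here can be pictured with a hand-normalized example. The sketch below is not LCFN output; the function names and the particular loop are assumptions chosen to show a computed trip count and a subscript rewritten in terms of a unit-stride index.

```c
/* Original form: a while loop whose index starts at 1 and steps by 2,
 * so the trip count is implicit. */
void loop_original(int *a, int n)
{
    int i = 1;
    while (i < n) {
        a[i] = 0;
        i += 2;
    }
}

/* Canonical form: the trip count ceil((n - 1) / 2) is computed before
 * the loop, and the subscript is expressed through the new index t,
 * using the closed form i == 2 * t + 1. */
void loop_canonical(int *a, int n)
{
    int trip = (n > 1) ? n / 2 : 0;
    for (int t = 0; t < trip; t++)
        a[2 * t + 1] = 0;
}
```

In the second version the loop bounds and the subscript are both linear in t, which is the shape that standard dependence tests expect.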

PAN has also been implemented in Parafrase-2 to allow the compiler to handle simple forms of implicit array references inside loops via pointers. PAN also uses induction variable analysis to attempt to re-extract implicit array index expressions from array references. In the case of multidimensional arrays, PAN uses array linearization in order to resolve index expressions in the presence of multiple dimensions and multiple enclosing index variables.
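A small before-and-after pair may help picture what re-extracting an index expression means for a two-dimensional array. This is a hand-written sketch under assumed names and dimensions, not the output of PAN.

```c
enum { ROWS = 3, COLS = 4 };

/* Implicit array reference: p is advanced once per inner iteration,
 * so the element being written is never named by a subscript. */
void fill_by_pointer(int m[ROWS][COLS])
{
    int *p = &m[0][0];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            *p++ = 10 * i + j;
}

/* Explicit form: p is a linear IV of both loops, so *p is element
 * COLS * i + j of the linearized array. */
void fill_by_index(int m[ROWS][COLS])
{
    int *base = &m[0][0];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            base[COLS * i + j] = 10 * i + j;
}
```

The linearized subscript COLS * i + j involves both enclosing index variables, which is why a dependence analyzer must be able to handle such mixed terms.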

The LCFN and PAN implementations were tested on sample FORTRAN benchmarks extracted from the NAS and SPEC95 benchmark suites and converted to C. After loop normalization was applied, the Parafrase dependence analyzer was able to detect the same parallel loops in C as in the original FORTRAN programs, indicating that the dependence analysis process was successfully enabled for C. However, the linearization of arrays was found to cause problems for simple dependence analyzers which are not capable of handling symbolic terms in array reference expressions.

6.2 Future Work

There are several ways in which the range of loops which are detectable as admissible loops might be extended. Some of these are summarized as follows:

• Handling of scalar pointer references.

The most obvious way to extend LCFN and PAN would be to merge those analyses with the type of scalar pointer alias analysis available in the McCAT compiler. Both scalar and array pointers tend to be used fairly widely in the C language, and a compiler must be prepared to deal with both. Because Parafrase-2 does not natively have a C alias package with the strength of the alias analysis present in McCAT, scalar pointer analysis has not been incorporated into the Parafrase-2 implementation. However, the unified nature of Parafrase-2's symbolic execution engine would allow the supporting analyses such as constant propagation and IV elimination to be enabled by incorporating the effects of pointer operations into the symbolic execution engine.

• Computation of loop trip counts from nonlinear induction expressions.

Given that many compilers are now able to detect nonlinear induction variables, it is reasonable to attempt to extend trip count computation for TCTEs which are polynomial functions of the loop index variable. As in the linear case, this is a matter of finding the smallest value of the loop index variable such that the function f representing the TCTE is nonpositive. Doing this for an arbitrary polynomial function is difficult, particularly if symbolic terms are involved in the function. However, it may be possible to do this in cases that arise in practice where f is a quadratic or 3rd order polynomial and the roots of the polynomial can be computed analytically.

• Computation of loop trip counts from boolean exit conditions.

The current LCFN implementation is capable of handling loop exit expressions that are arithmetic comparisons. An additional extension to this would be to allow exit conditions which are boolean combinations of such arithmetic comparisons, using AND, OR and NOT operations. However, it is uncertain whether such exit conditions would occur often enough in real programs to be useful. In order to implement these types of operators in general, the compiler would need to represent ranges of iterations over which a given condition is true, since an AND operation requires that both of the component conditions be true simultaneously. Obtaining a loop trip count expression would involve executing the appropriate intersection or union operations on iteration ranges corresponding to each arithmetic comparison condition. These operations are easy to implement where the bounds of ranges are known constants, but are problematic where the bounds involve symbolic terms. However, Blume and Eigenmann [BE95] have implemented an extension of the range test in the Polaris compiler for computing symbolic ranges at compile time.
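For the constant-coefficient case, the last two extensions listed above can be sketched in a few lines of C. Everything here is an assumption made for illustration: the function names, the restriction of the quadratic case to a negative leading coefficient, and the restriction of exit conditions to the form step * i < bound with a positive step. Symbolic terms, which the text identifies as the hard part, are not handled.

```c
/* floor(sqrt(n)) for n >= 0, by simple search. */
static long isqrt(long n)
{
    long r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Smallest i >= 0 with f(i) = a*i*i + b*i + c <= 0.
 * Only a < 0 is handled; returns -1 otherwise. */
long trip_count_quadratic(long a, long b, long c)
{
    if (c <= 0)
        return 0;                  /* f(0) is already nonpositive */
    if (a >= 0)
        return -1;                 /* outside this sketch */
    /* With a < 0 < c there is exactly one positive root,
     * (b + sqrt(b*b - 4*a*c)) / (-2*a); start at or just below it. */
    long i = (b + isqrt(b * b - 4 * a * c)) / (-2 * a);
    while (a * i * i + b * i + c > 0)
        i++;                       /* correct the rounded-down estimate */
    return i;
}

typedef struct { long lo, hi; } range;    /* half-open [lo, hi) */

static range range_intersect(range x, range y)
{
    range r;
    r.lo = (x.lo > y.lo) ? x.lo : y.lo;
    r.hi = (x.hi < y.hi) ? x.hi : y.hi;
    return r;
}

/* Iterations i >= 0 for which step * i < bound holds, with step > 0. */
static range true_range(long step, long bound)
{
    range r = { 0, 0 };
    if (bound > 0)
        r.hi = (bound + step - 1) / step;     /* ceil(bound / step) */
    return r;
}

/* Trip count of: for (i = 0; s1*i < b1 && s2*i < b2; i++) */
long trip_count_and(long s1, long b1, long s2, long b2)
{
    range r = range_intersect(true_range(s1, b1), true_range(s2, b2));
    return (r.hi > r.lo) ? r.hi - r.lo : 0;
}
```

For example, trip_count_quadratic(-1, -1, 20) describes a TCTE of 20 - i - i*i and yields 4, and trip_count_and(1, 10, 3, 20) describes the exit test i < 10 && 3*i < 20 and yields 7. An OR of two conditions would need the union of the two ranges instead of the intersection.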


Appendix A

Induction Variable Analysis

Induction variable (IV) analysis is an important part of many optimizing and parallelizing compilers. Although the exact definition of an induction variable has not always been completely consistent, and has tended to evolve along with the techniques for detecting them, a fairly general definition can be given as follows:

Definition 3 Given a loop L and a variable v, the variable v is an induction variable in L if the sequence of values assumed by v at a given program point within the execution of the loop L can be represented by a function $f(i)$, such that i represents the number of a particular iteration of L, and f is an analytic function of a certain type. Specifically, f may be of several different types:

(i) f is a linear function of i, of the form $f(i) = \alpha \times i + \beta$, where $\alpha$ and $\beta$ are compile time constants or invariant expressions in L.

(ii) f is a polynomial function of i of the form $f(i) = a_0 + a_1 i^1 + \cdots + a_n i^n$, where n is a compile time constant, and $a_0, \ldots, a_n$ are either constants or invariants in L.

(iii) f is an exponential function of the form $f(i) = c^{g(i)}$, where c is constant or invariant in L, and $g(i)$ is a linear function of i.


Following the terminology given by Haghighat and Polychronopoulos [HP96], the function f can be termed the characteristic function for the IV v. Note that a given IV may have different characteristic functions at different program points. These different types of induction variables are illustrated in Figure A.1. IVs can also have different characteristic functions with respect to different loops in a loop nest. The concept of an induction variable can also be extended to define an induction expression [HP96], which is any program expression whose value can be represented as above.

Linear induction variables often occur as a result of accessing arrays by step within a loop, and are illustrated in Figure A.1(a). Typically, a linear IV occurs as a result of an update by a constant or invariant expression on each loop iteration, but this can also occur as a result of linear combinations of other IVs. Figure A.1(d) also illustrates a linear IV that is updated through the effect of an inner loop, as opposed to an explicit assignment. Polynomial induction variables commonly occur in triangular loop nests, as illustrated in Figure A.1(c). Figure A.1(b) illustrates an exponential IV, which arises from a multiplication on each loop iteration instead of an addition.
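The three kinds of characteristic function can be checked directly in a small C loop. The sketch below is not a reproduction of Figure A.1; the variable names, initial values, and the checking function are assumptions made for illustration.

```c
/* Returns 1 if, at the top of each of n iterations, every variable
 * equals its characteristic function of i; returns 0 otherwise.
 * Intended for small n (n <= 30) so that 2^i fits in a long. */
int check_characteristic_functions(int n)
{
    long lin = 5;       /* linear:      f(i) = 3 * i + 5       */
    long poly = 0;      /* polynomial:  f(i) = i * (i - 1) / 2 */
    long expo = 1;      /* exponential: f(i) = 2^i             */
    for (int i = 0; i < n; i++) {
        if (lin != 3L * i + 5)
            return 0;
        if (poly != (long)i * (i - 1) / 2)
            return 0;
        if (expo != (1L << i))
            return 0;
        lin += 3;       /* update by a constant       */
        poly += i;      /* update by another IV       */
        expo *= 2;      /* multiplicative update      */
    }
    return 1;
}
```

Because the checks are made at the top of the loop body, the characteristic functions shown are those for that program point; checked after the updates, each would be shifted by one iteration.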

Many similar but different definitions of induction variables have been presented in the literature. The definition presented by Haghighat and Polychronopoulos [HP96] is closest to that presented here, because it is a definition based on program semantics, as opposed to ones based on the syntactic forms of variable updates. This type of definition is desirable because it clearly differentiates what an induction variable is from the methods used by a compiler to detect them. Most of the definitions of the latter kind define an induction variable as a variable whose assignments within the loop have a specific form, generally that of an increment statement v = v ± c. The most basic IV definitions, such as that given by Aho, Sethi, and Ullman [ASU86], require c to be a literal constant. Others, such as that given by Pottenger [Po95], allow increments by loop invariant values, as well as coupled inductions in which IVs may appear in the increment of other induction variables. Wolfe [Wol92] differentiates between basic induction variables, which are obtained through simple increments or coupled inductions, and other induction variables which are linear combinations of other IVs. Wolfe also defines multiloop induction variables, which are variables that occur in an inner loop, and whose initial value is an induction variable in an outer loop, while being incremented in the inner loop by a value that is invariant in the outer loop.

Figure A.1: Example programs involving induction variables. Panels: (a) linear IVs; (c) polynomial IVs; (d) IV with nested loops and control flow.

Along with the various definitions of induction variables are different methods for detecting them. Approaches that define induction variables in terms of the syntax of the variable updates typically rely on the simplified form of these updates to derive the characteristic formula for the variable. For example, Aho et al. [ASU86] require that all updates to a variable be of the form v = v + c, where c is a constant, although multiple updates are allowed. Under these simplifying assumptions, IVs can be detected via a simple scan of the source code, after loop invariant computation has been done. However, these types of approaches implicitly assume that the compiler can always determine immediately which variables are referenced by each statement, which is not always possible in real programs. More complex approaches use more complex supporting analyses to detect IVs while taking into account the effects of nested loops, as well as control flow within the loop. Wolfe [Wol92] uses an approach based on SSA form to relate the detection of different types of induction variables to certain types of graph-theoretic problems. Pottenger [Po95] describes an algorithm which recursively models inner loops, while computing the effects of variable updates by additions or multiplications. The total computed effect on a variable in a single iteration is used to derive the closed form. Very advanced approaches, such as that described by Haghighat and Polychronopoulos [HP96], operate by executing the loop in an abstract domain (abstract interpretation) and deriving closed forms for induction variables from the sequence of values obtained on each simulated iteration. In particular, this approach uses Newton's interpolation formula to fit the sequence of values obtained for the variable to a polynomial or exponential function. This type of approach is very powerful because it depends only on the operations that can be symbolically modelled, not on specific syntactic forms.
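The interpolation step can be pictured numerically with Newton's forward-difference formula, $f(i) = \sum_k \binom{i}{k} \Delta^k f(0)$. The sketch below works on concrete integer samples rather than symbolic values and covers only the polynomial case; the function names are hypothetical, and it is not the implementation described in [HP96].

```c
/* On entry d[0..m-1] holds the values observed on iterations 0..m-1.
 * On return d[k] holds the k-th forward difference at iteration 0. */
void forward_differences(long *d, int m)
{
    for (int k = 1; k < m; k++)
        for (int j = m - 1; j >= k; j--)
            d[j] -= d[j - 1];
}

/* Evaluates the Newton forward-difference polynomial at iteration
 * i >= 0, given the differences produced by forward_differences. */
long newton_eval(const long *d, int m, long i)
{
    long sum = 0;
    long binom = 1;                        /* binom == C(i, k) */
    for (int k = 0; k < m; k++) {
        sum += binom * d[k];
        binom = binom * (i - k) / (k + 1); /* exact: gives C(i, k + 1) */
    }
    return sum;
}
```

For the samples 7, 8, 10, 13, 17 (a variable starting at 7 and incremented by 1, 2, 3, 4 on successive iterations), the differences come out as 7, 1, 1, 0, 0, so the fitted function is quadratic, and newton_eval at i = 10 gives 62, matching 7 + 10 * 11 / 2. Differences that vanish from some order onward indicate a polynomial of lower degree than the number of samples.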

Figure A.2: Induction variable elimination

    liv = 0;
    for (ix = 0; ix < N; ix++) {
        a[liv] = 0;
        liv = liv + 3;
    }

(a) loop with IV

    for (ix = 0; ix < N; ix++) {
        a[3 * ix] = 0;
    }

(b) loop after IV elimination

Induction variables are important in parallelizing compilers because their detection often allows the compiler to eliminate dependences within a loop through induction variable elimination, as illustrated in Figure A.2. In Figure A.2(a), the variable liv hinders parallelization, because the assignment statement causes an output dependence in the loop. However, in Figure A.2(b), in which liv has been eliminated and the array index replaced by the equivalent expression (3 * ix), this dependence does not exist. Note that early uses of IV analysis attempted to detect IVs for exactly the opposite reason: namely, to make loop execution more efficient for serial machines through strength reduction. In Figure A.2(a), the relatively expensive multiplication in each loop iteration has been replaced by a less expensive addition operation, so the code in Figure A.2(a) is actually preferable on a serial machine. For the purposes of admissible loop normalization, induction variable analysis is needed in several contexts. Firstly, the ability to detect linear induction variables is necessary in order to compute loop trip counts, using the techniques described by Wolfe [Wol92]. IV analysis is also important in subscript normalization, where array subscript expressions are rewritten in terms of enclosing loop index variables. In a similar way, IV analysis is needed in order to detect the array access patterns caused by pointer arithmetic operations.


Bibliography

[AK84] J. Allen and K. Kennedy. PFC: a program to convert Fortran to parallel form. Supercomputers: Design and Applications, K. Hwang, editor, IEEE Computer Society Press, pp. 186-203, August 1984.

[ASU86] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading, MA, 1986.

[Ban79] J. Banning. An efficient way to find the side effects of procedure calls and the aliases of variables. Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, pp. 29-41, January 1979.

[Ban88] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, Massachusetts, 1988.

[Bar77] J. Barth. An interprocedural data flow analysis algorithm. Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pp. 119-131, January 1977.

[BB94] D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan and S. Weeratunga. The NAS Parallel Benchmarks. RNR Technical Report RNR-94-007, March 1994.

[BBG91] F. Bodin, P. Beckman, D. Gannon, S. Narayana and Y. Shelby. Distributed pC++: basic ideas for an object parallel language. Proceedings of Supercomputing 91, pp. 273-282, November 1991.

[BC86] M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. Proceedings of SIGPLAN 86 Symposium on Compiler Construction, pp. 162-175, June 1986.

[BCKT79] U. Banerjee, S. Chen, D. Kuck, and R. Towle. Time and parallel processor bounds for FORTRAN-like loops. IEEE Transactions on Computers, vol. 28, no. 9, pp. 660-670, September 1979.

[BE94] W. Blume and R. Eigenmann. The range test: a dependence test for symbolic, non-linear expressions. Proceedings of Supercomputing 94, pp. 528-537, November 1994.


[BE95] W. Blume and R. Eigenmann. Symbolic range propagation. Proceedings of the 9th International Parallel Processing Symposium, pp. 357-363, April 1995.

[BGJ95] P. Beckman, D. Gannon and E. Johnson. Portable parallel programming in HPC++. Available at http://www.extreme.indiana.edu/hpc++/docs/ppphpc++/icpp.html

[BGS94] D. Bacon, S. Graham, and O. Sharp. Compiler Transformations for High-Performance Computing. ACM Computing Surveys, vol. 26, no. 4, pp. 345-420, December 1994.

[CC77] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. Proceedings of the 4th Annual ACM Symposium on Principles of Programming Languages, pp. 238-252, January 1977.

[CC79] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. Proceedings of the 6th Annual ACM Symposium on Principles of Programming Languages, pp. 269-282, January 1979.

[CFR91] R. Cytron, J. Ferrante, B. Rosen, and M. Wegman. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, vol. 13, no. 4, pp. 451-490, October 1991.

[CK93] K. Chandy and C. Kesselman. CC++: A declarative concurrent object-oriented programming notation. Research Directions in Concurrent Object-Oriented Programming, G. Agha, P. Wegner, and A. Yonezawa, eds., MIT Press, pp. 281-313, 1993.

[EGH94] M. Emami, R. Ghiya and L. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. Proceedings of the 1994 SIGPLAN Conference on Programming Language Design and Implementation, pp. 242-256, June 1994.

[EH94] L. Hendren and A. Erosa. Taming control flow: A structured approach to eliminating goto statements. Proceedings of the 1994 International Conference on Computer Languages, IEEE Computer Society Press, pp. 229-240, May 1994.

[EHL91] R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience in the automatic parallelization of four Perfect-Benchmark programs. Proceedings of the Fourth Annual Workshop on Languages and Compilers for Parallel Computing, Springer-Verlag, LNCS 589, pp. 65-82, August 1991.

[FG90] S. Feldman, D. Gay, M. Maimone and N. Schryer. A Fortran-to-C Converter. Computing Science Technical Report No. 149, AT&T Bell Laboratories, 1990.

[GKT91] G. Goff, K. Kennedy, and C. Tseng. Practical dependence testing. SIGPLAN Notices, vol. 26, no. 6, pp. 15-29, June 1991.


[GP88] M. Girkar and C. Polychronopoulos. Compiling issues for supercomputers. Proceedings of Supercomputing 88, pp. 164-172, November 1988.

[HHN94] J. Hummel, L. Hendren and A. Nicolau. A general data dependence test for dynamic, pointer-based data structures. Proceedings of the 1994 SIGPLAN Conference on Programming Language Design and Implementation, pp. 218-229, June 1994.

[HP96] M. Haghighat and C. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems, vol. 18, no. 4, pp. 477-518, July 1996.

[HP90] M. Haghighat and C. Polychronopoulos. Symbolic dependence analysis for high-performance parallelizing compilers. Proceedings of the Third Annual Workshop on Languages and Compilers for Parallel Computing, pp. 310-330, August 1990.

[JH94] Justiani and L. Hendren. Supporting array dependence testing for an optimizing/parallelizing C compiler. Proceedings of the 5th International Conference on Compiler Construction, CC'94, Springer-Verlag, LNCS 786, pp. 309-323, April 1994.

[JM81] N. Jones and S. Muchnick. Flow analysis and optimization of LISP-like structures. In Program Flow Analysis: Theory and Applications, Prentice-Hall, S. Muchnick and N. Jones, eds., pp. 102-131, 1981.

[KR88] B. Kernighan and D. Ritchie. The C Programming Language, Second edition. Prentice Hall, 1988.

[LR92] W. Landi and B. Ryder. A safe approximate algorithm for interprocedural pointer aliasing. Proceedings of the 1992 SIGPLAN Symposium on Programming Language Design and Implementation, pp. 235-248, June 1992.

[LYZ90] Z. Li, P. Yew, and C. Zhu. Data dependence on multi-dimensional array references. IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 1, pp. 26-34, January 1990.

[MHL91] D. Maydan, J. Hennessy and M. Lam. Efficient and exact data dependence analysis. Proceedings of the ACM SIGPLAN 91 Conference on Programming Language Design and Implementation, pp. 1-14, 1991.

[PE94] B. Pottenger and R. Eigenmann. Parallelization in the presence of generalized induction and reduction variables. Technical Report 1396, University of Illinois at Urbana-Champaign.

[PGH90] C. Polychronopoulos, M. Girkar, M. Haghighat, C. Lee, B. Leung, and D. Schouten. The structure of Parafrase-2: an advanced parallelizing compiler for C and Fortran. Proceedings of the Third Annual Workshop on Languages and Compilers for Parallel Computing, MIT Press, August 1990.

[Po95] W. Pottenger. Induction Variable Substitution and Reduction Recognition in the Polaris Parallelizing Compiler. Master's thesis, University of Illinois at Urbana-Champaign, 1995.


[Pug92] W. Pugh. A practical algorithm for exact array dependence analysis. Communications of the ACM, vol. 35, no. 8, pp. 102-114, August 1992.

[Tow76] R. Towle. Control and Data Dependence for Program Transformations. PhD thesis, University of Illinois at Urbana-Champaign, March 1976.

[WB87] M. Wolfe and U. Banerjee. Data dependence and its application to parallel processing. International Journal of Parallel Programming, vol. 16, no. 2, pp. 137-178, April 1987.

[WL95] R. Wilson and M. Lam. Efficient Context-Sensitive Pointer Analysis for C Programs. Proceedings of the 1995 SIGPLAN Conference on Programming Language Design and Implementation, pp. 1-12, June 1995.

[Wol89] M. Wolfe. Optimizing Supercompilers for Supercomputers. Research Monographs in Parallel and Distributed Computing, MIT Press, Cambridge, Massachusetts, 1989.

[Wol92] M. Wolfe. Beyond induction variables. Proceedings of the SIGPLAN 92 Conference on Programming Language Design and Implementation, pp. 162-174, June 1992.

[WT92] M. Wolfe and C. Tseng. The power test for data dependence. IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 5, pp. 591-601, September 1992.

[ZBG88] H. Zima, H. Bast and H. Gerndt. SUPERB - a tool for semi-automatic MIMD/SIMD parallelization. Parallel Computing, vol. 6, pp. 1-18, June 1988.
