
Statistical Selection of Compiler Options

R.P.J. Pinkers, P.M.W. Knijnenburg, M. Haneda, H.A.G. Wijshoff
LIACS, Leiden University, The Netherlands

{rpinkers, peterk, haneda, harryw}@liacs.nl

Abstract

Compilers have many switches or options that enable certain code optimizations. However, it is well known that the optimal set of options to be turned on is dependent on both the application as well as the target architecture. In many cases, standard settings like -O3 produce suboptimal results due to negative interference of some of the options they contain. In this paper, we propose an automatic iterative procedure to turn on or to turn off compiler options. This procedure is based on Orthogonal Arrays that are used for a statistical analysis of profile information to calculate the main effect of the options. We show that our approach outperforms -O3 of GCC on six SPEC benchmarks.

1. Introduction

Ever since compilers were first invented, many code optimizations have been proposed. In fact, many of the optimizations that are present in modern compilers were already present in the very first Fortran compiler [1]. Today, compilers incorporate dozens of optimizations that often can explicitly be enabled by the programmer using compiler switches or options. Compilers generally have pre-defined sets of options that are enabled by -Ox switches. These sets of options have been coded into the compiler based on experience and judgment of the compiler developer. Mostly, the higher the -O level, the more options are enabled. However, interference between these options is not taken into account. As is well known and shown in Section 2, each application requires its own specific setting of these options to obtain maximal performance improvement. Therefore, it is clear that these general -Ox settings are not optimal for each application. Worse, it is not even the case that -O3 outperforms -O2 or -O1 in all cases.

When there are k compiler options, there exist 2^k different compiler settings. Hence, it is clear that an exhaustive search for the optimal compiler setting is infeasible. On the other hand, if a programmer were able to reason about the effect of the different options, it might be possible to find a good setting without the need to explore these large optimization spaces. However, there are a number of problems with this approach. One would need to study the actual code of the compiler in detail to find out what precisely happens when an option is turned on. It can be the case that an option has a negligible effect on the code but prevents another option that has a much better effect from being applied. It is extremely complicated to understand all interactions between options, in particular when there are so many. Finally, on some applications two options may positively interact while on others they can interact negatively.

Nevertheless, an application programmer needs to produce optimal code given the collection of optimizations that happens to be available in his compiler. It is therefore important that there exists a way to find the best compiler setting without fully understanding how the compiler tries to optimize the code and without fully exploring the extremely large optimization spaces involved. This is particularly important for embedded systems, where a strong need for fast code, in particular in the real-time domain, is coupled with short development times. Currently, there does not exist a systematic way of doing this [16].

In this paper, we provide a solution to this problem by employing statistical techniques borrowed from the framework of the Design of Experiments [3]. These techniques are capable of identifying the options that have the largest effect on the resulting code quality by inspecting only a small fraction of the optimization space. We then switch on those options that have a large positive effect and switch off those options with a large negative effect. Next, we iteratively apply this technique to the remaining options. In this way, we reduce the size of the optimization space considerably by cutting out a number of dimensions. We systematically add more and more settings of options

Proceedings of the The IEEE Computer Society’s 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS’04)

1526-7539/04 $20.00 © 2004 IEEE

Figure 1. Fragment optimization space gcc (x-axis: compiler setting; y-axis: improvement (%), ranging from −4 to 10; the -O, -O2, and -O3 settings are marked).

to obtain an optimal total setting. This technique is (almost) fully automatic and requires (almost) no knowledge about the compiler or the target architecture. It can be used on top of any compiler that has a collection of options for optimization. In order to show the applicability of this statistical approach, we ran our technique on 6 SPEC benchmarks using the GCC compiler. We show that our iterative technique outperforms all -Ox levels in all cases considered.

This paper is structured as follows. In Section 2, we show the optimization spaces for three SPEC95 programs and discuss the effects of compiler options on the quality of the produced code. In Section 3, we briefly discuss Orthogonal Arrays and how they can be used to collect statistical information. In Section 4, we propose our iterative algorithm for enabling options and in Section 5 we discuss our experimental framework. In Section 6, we show the results of running our method. We discuss related work in Section 7 and finally we give some concluding remarks in Section 8.

2. Motivation

In this section, we discuss our motivation for tuning compiler backend optimizations for different applications. In Figures 1 through 3, we show a fragment of the optimization space for three SPECint95 benchmarks when compiled with GCC 2.6.3 and run on the SimpleScalar platform. The x-axis shows 512 different settings for the compiler options and the y-axis the improvement obtained over no optimization. The different settings are obtained from an Orthogonal Array, which will be discussed in more detail in the next section. For the purpose of this section, it is sufficient to note that these settings form a representative sample of the entire optimization space.

The optimization space for gcc, shown in Figure 1, is the kind of optimization space we might expect to see: the setting -O gives an improvement of 4%, -O2 improves 5.5%, and -O3 improves the most, 8.6%. Only 2 settings in this reduced optimization space outperform -O3 slightly. This suggests that -O3 is indeed an almost optimal setting. However, in Section 6 we show that our technique can find a setting that is better than the ones shown in this figure. A few compiler settings give a slight degradation of the program and the total spectrum of settings reaches improvements ranging from −2.2% up to 8.7%, about the level of improvement reached by -O3.

Figure 2. Fragment optimization space ijpeg (x-axis: compiler setting; y-axis: improvement (%), ranging from 0 to 16; the -O, -O2, and -O3 settings are marked).

Figure 3. Fragment optimization space li (x-axis: compiler setting; y-axis: improvement (%), ranging from −50 to 20; the -O, -O2, and -O3 settings are marked).

The situation is different for ijpeg, as shown in Figure 2. In this case, -O improves more than -O2, which improves more than -O3. It is also clear that there are many settings that are significantly better than each -Ox setting. Our technique finds one of these settings, by explicitly turning off some options that are enabled in -O3. This shows that tuning compiler settings for a specific application can be worthwhile.


The situation is even worse for li. In this case, -O2 actually degrades performance over non-optimized code by 1%. The standard setting -O gives a slight improvement of 1.64% and -O3 performs quite well, giving an improvement of 13.46%. For this version of GCC, -O3 equals -O2 plus inline-functions [5]. For li, when the option inline-functions is switched on on its own, it severely degrades performance by 25%, as we have shown in [13]. This is caused by a sharp increase in the instruction cache miss rate. -O2 also degrades performance due to an increased I-cache miss rate. However, -O2 plus inline-functions gives a good improvement, mostly due to the fact that instruction scheduling works well after inlining. This shows that selecting compiler settings is non-trivial. It should also be noted that many settings actually degrade the performance of li significantly, by up to almost 40%. This shows that selecting options must be a careful process.

The situation is different for SPECfp benchmarks [13]. Exactly one option (instruction scheduling) gives almost all improvement. Since -O2 and -O3 both contain this option, they perform well. For mgrid, slightly better improvements can be found than for -O2 and -O3.

We conclude that tuning compiler optimization options to the application being compiled can be worthwhile. Relying only on predefined -Ox sets of options does not give optimal results and can even degrade performance.

3. Orthogonal Arrays

Orthogonal Arrays have been proposed as an efficient Design of Experiments [3, 9]. In an experiment, we deliberately change one or more process variables (or factors) in order to observe the effect the changes have on one or more response variables. In this section, we briefly discuss the notion of Orthogonal Arrays and how they can be used to statistically analyze response variables to find main effects. More information on Orthogonal Arrays and how to construct them can be found in [9]. A library of Orthogonal Arrays is given in [14].

3.1. General definition

If a process has k factors or options, the total optimization space contains 2^k settings. This space is called a full factorial design. Obviously, such a space is generally too large to be exhaustively searched. Orthogonal Arrays are an important approach to reduce the number of experiments to be performed and still be able to compute statistically meaningful information for the different factors. Such a design of experiments is called a fractional factorial design.

0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 1 0 0 1 1 0 0 1 1 0 0 1
1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1
1 0 1 1 0 1 0 0 1 0 1 1 0 1
0 1 1 1 1 0 0 0 0 1 1 1 1 0
1 1 0 1 0 0 1 0 1 1 0 1 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
1 0 1 0 1 0 1 1 0 1 0 1 0 1
0 1 1 0 0 1 1 1 1 0 0 1 1 0
1 1 0 0 1 1 0 1 0 0 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 0 0 0
1 0 1 1 0 1 0 1 0 1 0 0 1 0
0 1 1 1 1 0 0 1 1 0 0 0 0 1
1 1 0 1 0 0 1 1 0 0 1 0 1 1

Figure 4. OA with 14 factors and 16 rows.

Formally, an Orthogonal Array (OA) is an N × k matrix of zeros and ones.1 The columns of the array are interpreted as the factors or options. The rows of the array are the settings of these factors for the experiments. In our case, columns correspond to compiler options and each row is a particular compiler setting that can be used to optimize a program. An Orthogonal Array has the property that two arbitrary columns contain the patterns

0 0
0 1
1 0
1 1

equally often. Such an OA has strength 2. In Figure 4 we show an example of an Orthogonal Array of strength 2 with 14 factors or columns and 16 rows, taken from [14], which is used below. In general, an OA has strength t if every collection of t columns contains all t-tuples of zeros and ones equally often. OAs of strength 2 allow us to compute the main effects of each option. For t > 2, interaction effects of ⌊t/2⌋ factors can be computed also. In this paper, we restrict attention to main effects. We plan to include interaction effects in future work.
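The strength-2 property is easy to check mechanically. The Python sketch below is ours, not part of the paper; it verifies the property for the 16-row array of Figure 4 by counting the four patterns 00, 01, 10, 11 in every pair of columns:

```python
from itertools import combinations, product

def has_strength_2(oa):
    """Return True if every pair of columns contains each of the four
    patterns (0,0), (0,1), (1,0), (1,1) equally often (N/4 times)."""
    n_cols = len(oa[0])
    for i, j in combinations(range(n_cols), 2):
        counts = {p: 0 for p in product((0, 1), repeat=2)}
        for row in oa:
            counts[(row[i], row[j])] += 1
        if len(set(counts.values())) != 1:  # all four counts must be equal
            return False
    return True

# The 16-row, 14-factor OA of Figure 4, rows as tuples of bits.
OA16 = [
    (0,0,0,0,0,0,0,0,0,0,0,0,0,0),
    (1,0,1,0,1,0,1,0,1,0,1,0,1,0),
    (0,1,1,0,0,1,1,0,0,1,1,0,0,1),
    (1,1,0,0,1,1,0,0,1,1,0,0,1,1),
    (0,0,0,1,1,1,1,0,0,0,0,1,1,1),
    (1,0,1,1,0,1,0,0,1,0,1,1,0,1),
    (0,1,1,1,1,0,0,0,0,1,1,1,1,0),
    (1,1,0,1,0,0,1,0,1,1,0,1,0,0),
    (0,0,0,0,0,0,0,1,1,1,1,1,1,1),
    (1,0,1,0,1,0,1,1,0,1,0,1,0,1),
    (0,1,1,0,0,1,1,1,1,0,0,1,1,0),
    (1,1,0,0,1,1,0,1,0,0,1,1,0,0),
    (0,0,0,1,1,1,1,1,1,1,1,0,0,0),
    (1,0,1,1,0,1,0,1,0,1,0,0,1,0),
    (0,1,1,1,1,0,0,1,1,0,0,0,0,1),
    (1,1,0,1,0,0,1,1,0,0,1,0,1,1),
]
```

A consequence of strength 2 (used in Section 3.2) is that each column switches its option on in exactly N/2 = 8 of the 16 rows.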

3.2. Main effect of options

In order to define how main effects of options can be computed using an OA, we need some notation. First, we view an OA A as a set of compiler settings, namely, the set of rows in the OA. We write s ∈ A for an arbitrary row (setting) from A. We denote the ith element in s by s_i, that is, the value for the ith option in the setting s. Finally, since we are interested in the effect of compiler options on the execution time in cycles of a program, the response variable of our experiments is the execution time of the program using a specific setting s, denoted by T(s).

Since an OA does not contain all possible settings, exact main and interaction effects cannot be calculated. However, by using a statistical model for the response variables,

1 This is called an Orthogonal Array of level 2. OAs of arbitrary level, that is, in which the entries of the OA can take more values than just 0 and 1, can be defined also [9], but for the purposes of this paper it is sufficient to restrict attention to OAs of level 2.


we can obtain accurate estimates of the effects [9]. By assuming a standard model, a measure for the main effect of an option O_i with respect to an OA A, denoted by E_A(O_i), can be defined as the following sum of squares [9]

E_A(O_i) = (Σ_{s∈A: s_i=1} T(s))² / (N/2) + (Σ_{s∈A: s_i=0} T(s))² / (N/2) − (Σ_{s∈A} T(s))² / N
         = (Σ_{s∈A: s_i=1} T(s) − Σ_{s∈A: s_i=0} T(s))² / N    (1)

Note that the main effect of an option is calculated with respect to the entire OA: we look at execution times when the option is switched on in an arbitrary context of other options and when it is switched off in arbitrary contexts. If the OA were in fact a full factorial design, that is, the full set of 2^k settings, then the main effect would sum over all possibilities for these contexts. In our case, a context in which an option is switched on need not also be present as a context in which the option is switched off. However, the OA has the property that

1. There are exactly the same number of rows that switch an option O_i on as there are rows that switch that option off, namely N/2.

2. For an arbitrary other option O_j, in the set of rows in which O_i is switched on, there are exactly N/4 rows that switch O_j on and there are exactly N/4 rows that switch O_j off. Likewise for the set of rows that switch O_i off.

This last property of an OA ensures that main effects can be calculated with high precision without needing to consider the full factorial design.

However, since execution times are given in cycles, it is difficult to compare effects of options across different benchmarks. Therefore, we show effects as relative effects RE_A(O_i), given by

RE_A(O_i) = E_A(O_i) / Σ_{j=1}^{k} E_A(O_j) · 100%    (2)

The relative effect measures to what extent an option contributes to the entire change in execution time. Below we focus on relative effects, which we call ‘effects’ for simplicity.
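Written out in Python, equations (1) and (2) are only a few lines. The sketch below is ours, not the authors' code; the 4-row strength-2 array A and the cycle counts T are invented for illustration (only option 0 matters: it saves 4 cycles off a 10-cycle run):

```python
def main_effect(oa, times, i):
    """E_A(O_i) of equation (1): the squared difference between the total
    time over rows with option i on and rows with it off, divided by N."""
    n = len(oa)
    t_on = sum(t for row, t in zip(oa, times) if row[i] == 1)
    t_off = sum(t for row, t in zip(oa, times) if row[i] == 0)
    return (t_on - t_off) ** 2 / n

def relative_effects(oa, times):
    """RE_A(O_i) of equation (2): each effect as a percentage of the
    summed effects of all k options."""
    k = len(oa[0])
    effects = [main_effect(oa, times, i) for i in range(k)]
    total = sum(effects)
    return [100.0 * e / total for e in effects]

# Invented example: a 4-run strength-2 OA over 3 options, and
# hypothetical cycle counts in which only option 0 has an effect.
A = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
T = [10, 10, 6, 6]
```

On this toy input, option 0 carries the entire relative effect (100%) and options 1 and 2 carry none, as one would expect.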

3.3. Improvement of options

Since effects are expressed as squares in equation (1), the actual effect of an option can be either positive or negative, that is, an option can improve or degrade the execution time of a program. The notion of main effect given in equation (1) simply expresses whether an option does or does not affect execution time, not whether it does so positively or negatively. To distinguish between these two possibilities, we define the improvement that an option O_i has with respect to an OA A, denoted by I_A(O_i), as follows

I_A(O_i) = (Σ_{s∈A: s_i=0} T(s) − Σ_{s∈A: s_i=1} T(s)) / Σ_{s∈A: s_i=0} T(s)    (3)

This equation can be used to decide whether or not an option is beneficial for performance.
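Equation (3) is equally direct in code. Again a sketch of ours, reusing the same invented 4-row array and cycle counts; a positive value means the option saves cycles and should be switched on:

```python
def improvement(oa, times, i):
    """I_A(O_i) of equation (3): the fraction of cycles saved when
    option i is on, relative to the rows where it is off."""
    t_on = sum(t for row, t in zip(oa, times) if row[i] == 1)
    t_off = sum(t for row, t in zip(oa, times) if row[i] == 0)
    return (t_off - t_on) / t_off

# Same invented example as before: option 0 saves cycles (positive
# improvement, so it would be switched on); option 1 is neutral.
A = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
T = [10, 10, 6, 6]
```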

3.4. Information from OAs

In the previous subsection, we saw that the effects of options found by the procedure do not exactly match the real value of their improvement, due to the fact that only a small subset of the entire space is considered. For example, the maximal improvement that can be reached in the example is 27%, when we switch on Options 1, 2, 3, and 4, and switch off Option 5. Option 1 improves 15%, which is 55.5% of the maximal improvement, but we calculate an effect of 73.5%. Hence the option with a large effect is exaggerated. On the other hand, the computed effects of options with a low effect are too low. This is caused by the definition of the measure E_A using squares, which pulls apart large and small effects. Furthermore, it can be the case that in a small OA one option O is switched on together with an option O′ that has a large effect, but there does not exist a row where O is switched off while O′ is switched on. Hence, options with a low effect tend to get ‘drowned’ by options with a large effect. When the OA has many rows, this phenomenon is obviously not so pronounced. Nevertheless, all OAs we have studied, having 16 to 512 rows, identify the same options with a large effect. We exploit this observation in the next section, where we propose an iterative method to switch on compiler options.

4. Iterative Search Algorithm

In this section we present our iterative algorithm, given in Figure 5, for finding an optimal compiler setting for a given application. The iterative algorithm first identifies options with a large overall effect and switches them on if they cause an improvement. It then looks at the remaining options to see what improvement they can produce given the partial setting already constructed. Thus, the algorithm in Figure 5 starts at a high dimensional optimization space and subsequently cuts down this space by fixing some dimensions and zooming in on the remaining options that have a smaller effect.

Note that we do not select each option that has a positive effect but only those that have an effect larger than a


Algorithm

• Repeat:

– Compile the application with each row from A as compiler setting and execute the optimized application.

– Compute the relative effect of each option using Equations (1) and (2).

– If the effect of an option is larger than a threshold of 10%,

∗ if the option has a positive improvement according to Equation (3), switch the option on.

∗ else if it has a negative improvement, switch the option off.

– Construct a new OA A by dropping the columns corresponding to the options selected in the previous step.

• until all options are set

Figure 5. Iterative search algorithm

O1  unroll-loops, strength-reduce, rerun-cse-after-loop
O2  fast-math
O3  schedule-insns, schedule-insns2
O4  inline, inline-functions
O5  cse-follow-jumps, cse-skip-blocks
O6  float-store
O7  defer-pop
O8  force-mem
O9  force-addr
O10 omit-frame-pointer
O11 thread-jumps
O12 expensive-optimizations
O13 caller-saves
O14 peephole

Figure 6. Factors for Orthogonal Array in Iterative Algorithm

threshold of 10%. The value of this threshold has been determined empirically. We use it since, due to the inaccuracy of small OAs, small measured effects can be distorted. Imposing a threshold filters out this phenomenon.
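The loop of Figure 5 can be rendered as a short, self-contained Python sketch. This is our toy version, not the authors' implementation: `run` stands in for compiling and profiling the application with a given setting, and the OA over the remaining options is taken to be the full factorial (which is trivially an orthogonal array); a real implementation would profile the actual program and draw small OAs from a library such as [14]:

```python
from itertools import product

def iterative_search(option_names, run, threshold=10.0):
    """Sketch of the algorithm in Figure 5. `run` maps a full
    {option: 0/1} setting to a (simulated) cycle count. Options whose
    relative effect exceeds `threshold` percent are fixed on or off;
    the loop then recurses on the remaining options."""
    setting = {}
    remaining = list(option_names)
    while remaining:
        # Toy OA: the full factorial over the remaining options.
        oa = list(product((0, 1), repeat=len(remaining)))
        times = [run({**setting, **dict(zip(remaining, row))}) for row in oa]
        n = len(oa)
        sums = []
        for i in range(len(remaining)):
            t_on = sum(t for row, t in zip(oa, times) if row[i] == 1)
            t_off = sum(t for row, t in zip(oa, times) if row[i] == 0)
            sums.append((t_on, t_off))
        effects = [(t_on - t_off) ** 2 / n for t_on, t_off in sums]
        total = sum(effects)
        if total == 0.0:                  # no option has any effect: stop
            break
        decided = []
        for name, effect, (t_on, t_off) in zip(remaining, effects, sums):
            if 100.0 * effect / total > threshold:
                # Equation (3): positive improvement -> on, else off.
                setting[name] = 1 if t_off > t_on else 0
                decided.append(name)
        if not decided:                   # everything below threshold: stop
            break
        remaining = [name for name in remaining if name not in decided]
    for name in remaining:                # undecided options can safely stay off
        setting[name] = 0
    return setting
```

With a made-up linear cost function in which option 'a' helps a lot, 'b' hurts, and 'c' helps a little, the first iteration fixes 'a' on and 'b' off, and a second iteration picks up 'c'.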

In many cases, there is little improvement found after the first few iterations. For example, we show below that for turb3d we only obtain negligible improvements of 0.01% after the first iteration. The algorithm could detect this situation and stop the search, reducing the number of profiles considerably.

As a final remark, it is obvious that our procedure finds options that have a large positive effect on the execution of a program. If an option has a small effect, this option is selected in the later iterations. At the same time, options that only have a positive effect when turned on together will also likely be selected. Suppose there are two options O and O′ that have a small effect when turned on separately but a significant effect when turned on together. Their main effects are small during the first few iterations. However, in later iterations only those options are considered that have a small effect by themselves, and now the relative effect of O and O′ becomes large. Since they cause an improvement, they will be selected.

5. Experimental Setup

Since, for the purposes of our experiments, we also wanted to measure cache miss rates, pipeline occupancy and other low level information, we decided to use a simulator for our platform. We have used the SimpleScalar simulator [2] since on this platform the compilation infrastructure is stable enough to enable the setting of arbitrary sequences of options. We have used the 4-issue, out-of-order superscalar configuration using default architectural parameters. The SimpleScalar toolset contains a version of GCC 2.6.3 to generate code for the architecture. It contains 19 compiler options which are mostly aimed at minimizing the number of executed instructions and at maximizing available instruction level parallelism [5]. They are not explicitly aimed at improving memory behavior. Therefore, the improvements that we expect are modest.

We define 14 factors for our fractional factorial designin Figure 6. As one can see, factors O1, O3, O4, and O5

are put together from multiple compiler options that are strongly related. For factor O1, if loop unrolling is turned on, GCC always enables strength reduction and reruns common subexpression elimination after loop unrolling is performed [5]. Therefore, we cannot get an independent analysis of strength-reduce and rerun-cse-after-loop if we use unroll-loops. We have created one factor that incorporates all loop transformations present in this version of GCC. The options in factors O3, O4, and O5 have also been grouped because they perform similar optimizations. Note that this grouping of compiler options is the only place where we have used any (high level) knowledge about the compiler in our methodology. This information can immediately be learned from the on-line documentation [5]. The standard -Ox settings are given in Table 1.

      O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O11 O12 O13 O14
-O     0  0  0  0  0  0  1  0  0   1   1   0   0   0
-O2    0  0  1  0  1  0  1  0  0   1   1   1   1   1
-O3    0  0  1  1  1  0  1  0  0   1   1   1   1   1

Table 1. Options for standard -Ox settings.

We used the following three SPECint95 and three SPECfp95 benchmarks: gcc, ijpeg, li, mgrid, turb3d, and apsi. These programs exhibit widely differing behavior under optimization and form a good collection for the present study.

6. Results

In this section, we show which options are switched on in each iteration and we compare our improvements with the improvements from standard settings.

We show by way of example the relative effects after each iteration for gcc in Figure 7. The other benchmarks show similar behavior [13]. We only show the first four iterations since for gcc, during the fifth iteration, all execution times were exactly the same and hence all effects were zero. It is clear that main effects are quite pronounced. Note that it is not the case that options with a large effect in a later iteration already have pronounced effects in earlier iterations. For example, O4 has a large effect in iteration 3 for gcc, but a negligible effect in iteration 2. At the same time, some options seem to have some effect in an early iteration, but lose this effect later. For example, for gcc, option O13 has some effect in iteration 2, but almost no effect in iteration 3. This shows that small effects are not accurately measured in the presence of large effects and that using a threshold is a good solution to this problem.

In Table 2 we show which options are turned on or off during the different iterations for gcc. When it is not yet decided whether or not to switch on an option, we indicate this by a dash ‘-’ in the tables. Sometimes after 5 iterations some options are still not decided. This means that the effect of these options is too low to make a decision. Hence, we can safely turn them off without losing improvement.

We show the improvement after each iteration in Figure 8. For the SPECint benchmarks, in each iteration a noticeable gain in improvement is obtained. The most important option for the SPECint programs is the option omit-frame-pointer. This option results in fewer instructions and frees a register. Hence, programs with many dynamic function calls clearly benefit from this. Other options giving a significant extra improvement vary across the integer benchmarks. However, for the SPECfp benchmarks, things are different. Two of these benchmarks, particularly turb3d, are only improved by one option, namely, schedule-insns, as we have shown in [13]. The other options have a small or even negligible effect on execution time. In particular, fast-math, which gives most improvement for apsi, has no effect for mgrid and turb3d. If we had stopped after the first iteration for turb3d and after the second for mgrid and set the remaining options to 0, the overall result would have been the same. This shows that monitoring the progress of the procedure and stopping when no significant improvements are found can reduce its running time considerably.

Next, we compare the results of our iterative procedure with the standard settings -O, -O2, and -O3 in Figure 9. As is immediately clear, our approach always outperforms the standard settings. In some cases, in particular for the SPECfp benchmarks, we are only marginally better than the standard settings (e.g., 26.91% vs. 26.89% for turb3d). For the SPECint benchmarks, however, the situation is different and we always outperform the standard settings significantly.

One of the reasons that we outperform -Ox is that our algorithm explicitly switches off some options, while -Ox just switches on more options for higher levels. For example, we have shown in [13] that the option peephole, which tries to perform machine specific peephole optimizations, is either turned off (for li and ijpeg) or has so low an effect that it does not get selected at all. Interestingly, for li, it is turned off already in the first iteration, indicating that it causes a significant performance degradation.

Figure 7. Relative effects for gcc after each iteration (four panels, Iterations 1 through 4; x-axis: the options still undecided in that iteration; y-axis: relative effect (%)).

Iter. O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O11 O12 O13 O14
1      -  -  -  -  -  -  -  -  -  1   -   -   -   -
2      1  -  1  -  1  -  -  -  -  1   -   -   -   -
3      1  -  1  0  1  -  -  -  0  1   1   -   -   -
4      1  -  1  0  1  1  -  1  0  1   1   1   1   -
5      1  0  1  0  1  1  0  1  0  1   1   1   1   0

Table 2. Switching of options in different iterations for gcc

7. Related Work

A number of approaches to select best optimizations by searching the optimization space have been proposed. Iterative compilation [10] searches for source level transformations. [12] and [6] use genetic algorithms to find optimal optimizations. [11] and [15] use machine learning techniques to find compiler heuristics. In contrast to these efforts, our approach uses statistical analysis to systematically prune the search space and is focused on compiler switches.

Granston and Holler [8] propose a tool for automatic selection of compiler options, called Dr. Options. This tool uses information about the application supplied by the user and a set of tuning rules that have been created by interviewing tuning experts and analyzing optimization experiments. However, this approach requires much knowledge about the compiler and does not automatically solve the problem of which transformations to enable.

Chow and Wu [4] approach the problem of determining which options to set for a given application as a fractional factorial experiment based on aliasing or confounding [3].

[Figure 8: bar chart of the improvement (%) for gcc, li, ijpeg, mgrid, turb3d, and apsi after each of iterations 1–5.]

Figure 8. Improvements after each iteration


[Figure 9: bar chart of the improvement (%) for gcc, li, ijpeg, mgrid, turb3d, and apsi obtained by -O, -O2, -O3, and our iterative approach.]

Figure 9. Improvements for standard settings and our iterative approach

In this approach, the value of certain options is defined as a function of other options. However, we can then no longer measure the effect of an option alone: we also measure interactions between the options it is aliased to. Since each alias actually is a generator for a collection of other aliases [3], we are likely to end up with many derived aliases that obscure what is measured. The most important difference between [4] and the present paper, however, is that Chow and Wu use a complex statistical analysis requiring many new experiments to resolve ambiguities and to find options with high main effects and interactions. In contrast, we propose a simple analysis that iteratively switches on more and more options, zooming in from options having a large effect to options having less effect.

8. Conclusion

In this paper we have proposed an automatic procedure to select compiler options for a given application, based on a statistical analysis of profile information using Orthogonal Arrays. The approach can be used on top of any compiler that allows a collection of options to be set by the user. It essentially consists of a simple driver that generates a small number of settings and compiles and runs the application with each of them. A statistical analysis of the obtained execution times turns some options on or off; the remaining options are examined in the next iteration. In this way, each iteration bases its decisions on accurate statistical information. We have shown that our approach always outperforms the standard -Ox settings.
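The driver loop summarized above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the compile-and-run step is replaced by a stub cost function `measure`, a full-factorial design stands in for the orthogonal array (which is what keeps the number of runs small in practice), and the names and the selection threshold are ours.

```python
from itertools import product

# Minimal sketch of the iterative driver: each round, run a design over the
# still-open options (with already-fixed options pinned), compute each open
# option's main effect, pin options whose effect is large, and defer the rest.

def select_options(options, measure, rounds=3, threshold=0.05):
    """Iteratively pin options with a large main effect; return {option: 0|1}."""
    fixed = {}
    open_opts = list(options)
    for _ in range(rounds):
        if not open_opts:
            break
        # Full-factorial design over the open options; the paper uses an
        # orthogonal array here instead, needing far fewer runs.
        rows = list(product((0, 1), repeat=len(open_opts)))
        times = []
        for row in rows:
            setting = dict(fixed)
            setting.update(zip(open_opts, row))
            times.append(measure(setting))  # compile + profile run (stubbed)
        mean_t = sum(times) / len(times)
        still_open = []
        for j, opt in enumerate(open_opts):
            on = [t for r, t in zip(rows, times) if r[j] == 1]
            off = [t for r, t in zip(rows, times) if r[j] == 0]
            effect = sum(off) / len(off) - sum(on) / len(on)
            if abs(effect) > threshold * mean_t:
                fixed[opt] = 1 if effect > 0 else 0  # large effect: pin it
            else:
                still_open.append(opt)               # small effect: defer
        if len(still_open) == len(open_opts):
            break                                    # nothing was decided
        open_opts = still_open
    for opt in open_opts:                            # undecided: leave off
        fixed.setdefault(opt, 0)
    return fixed

# Toy cost model: enabling "a" helps, "b" hurts, "c" is negligible.
def measure(setting):
    return 10.0 - 2.0 * setting["a"] + setting["b"] - 0.01 * setting["c"]

print(select_options(["a", "b", "c"], measure))  # → {'a': 1, 'b': 0, 'c': 0}
```

In a real deployment, `measure` would invoke the compiler with the chosen flags (e.g. via subprocess) and time the resulting binary on a training input, exactly as the driver described above does.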

References

[1] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[2] T. Austin, E. Larson, and D. Ernst. SimpleScalar: An infrastructure for computer system modeling. IEEE Computer, 35(2):59–67, 2002.

[3] G.E.P. Box, W.G. Hunter, and J.S. Hunter. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. Wiley and Sons, 1978.

[4] K. Chow and Y. Wu. Feedback-directed selection and characterization of compiler optimizations. In Proc. 2nd Workshop on Feedback Directed Optimization, 1999.

[5] GNU Consortium. GCC online documentation. http://gcc.gnu.org/onlinedocs/.

[6] K.D. Cooper, P.J. Schielke, and D. Subramanian. Optimizing for reduced code space using genetic algorithms. In Proc. Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 1–9, 1999.

[7] G.G. Fursin, M.F.P. O'Boyle, and P.M.W. Knijnenburg. Evaluating iterative compilation. In Proc. Languages and Compilers for Parallel Computers (LCPC), pages 305–315, 2002.

[8] E. Granston and A. Holler. Automatic recommendation of compiler options. In Proc. 4th Workshop on Feedback-Directed and Dynamic Optimization, 2001.

[9] A.S. Hedayat, N.J.A. Sloane, and J. Stufken. Orthogonal Arrays: Theory and Applications. Series in Statistics. Springer Verlag, 1999.

[10] T. Kisuki, P.M.W. Knijnenburg, and M.F.P. O'Boyle. Combined selection of tile sizes and unroll factors using iterative compilation. In Proc. PACT, pages 237–246, 2000.

[11] A. Monsifrot, F. Bodin, and R. Quiniou. A machine learning approach to automatic production of compiler heuristics. In Proc. AIMSA, LNCS 2443, pages 41–50, 2002.

[12] A. Nisbet. GAPS: Genetic algorithm optimised parallelization. In Proc. Workshop on Profile and Feedback Directed Compilation, 1998.

[13] R.P.J. Pinkers. Analysis of compiler optimizations using orthogonal arrays. Master's thesis, LIACS, Leiden University, 2004.

[14] N.J.A. Sloane. A library of orthogonal arrays. http://www.research.att.com/~njas/oadir/.

[15] M. Stephenson, M. Martin, and U.M. O'Reilly. Meta optimization: Improving compiler heuristics with machine learning. In Proc. PLDI, pages 77–90, 2003.

[16] M. Zhao, B. Childers, and M.L. Soffa. Predicting the impact of optimizations for embedded systems. In Proc. LCTES, pages 1–11, 2003.
