
Compilers as Collaborators and Competitors of High-Level Specification Systems

David Padua

University of Illinois at Urbana-Champaign

Towards a Synthesis

There is much interaction and overlap between compilers and code generation from very high level specifications.

Both technologies could merge into a “supercompiler” technology: thesis, antithesis, synthesis.

Higher Levels of Abstraction…

One of the main goals of Software Research is to facilitate program development.

Raise the level of abstraction: what rather than how.

- Subroutines: control abstraction
- Data abstraction mechanisms

… Higher Levels of Abstraction

Programming is simplified by using macro operations from a catalog.

- Modules (subroutines/classes/…)
  - Part of the language (Fortran 90, MATLAB, SETL)
  - Standard libraries
    - Hand-written
    - Automatically generated
  - Application specific (usually hand-written)

Performance and Abstraction

In many cases the main mechanism to attain high performance is to develop high-performance library routines. For example, the MATLAB programming style is to use functions as much as possible.

This approach does not always work: real applications make little use of pre-existing libraries.
- One reason: data structures are not always in the right format.
- Another: the overhead associated with class accesses.

For this reason, with current technology, higher level => lower performance.

Automatic Generation of Modules from Specifications…

Several systems aim at generating the fastest possible routines for certain classes of computations:
- The algorithms are relatively simple.
- Very high-performance implementations can be tedious and time-consuming to produce.

Examples of these systems include ATLAS, FFTW, and Spiral.

… Automatic Generation of Modules from Specifications

Other systems try to simplify the generation of complete applications. Although performance is also a concern, language design and correctness are the most important issues. Examples:
- Ellpack
- GPSS
- Many CAD systems

ATLAS

Generate several versions of the BLAS routines:
- Different tile sizes
- Different degrees of unrolling
- Loop ordering is fixed

Run them all and choose the fastest.
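The generate-and-measure loop can be sketched as follows. This is a hypothetical illustration in Python, not ATLAS code: the tile sizes, the matrix-multiply kernel, and the timing harness are all illustrative assumptions.

```python
# Sketch of ATLAS-style empirical tuning (illustrative, not ATLAS itself):
# generate several tilings of the same matrix-multiply kernel, time each,
# and keep the fastest. Loop order is fixed, as on the slide.
import time

def make_tiled_matmul(tile):
    """Return a matmul specialized for one tile size."""
    def matmul(A, B, n):
        C = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, tile):
            for jj in range(0, n, tile):
                for kk in range(0, n, tile):
                    for i in range(ii, min(ii + tile, n)):
                        for j in range(jj, min(jj + tile, n)):
                            s = C[i][j]
                            for k in range(kk, min(kk + tile, n)):
                                s += A[i][k] * B[k][j]
                            C[i][j] = s
        return C
    return matmul

def autotune(n=64, tiles=(8, 16, 32, 64)):
    """Run every candidate version and return the fastest tile size."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, best_time = None, float("inf")
    for t in tiles:                      # "run all and choose the fastest"
        f = make_tiled_matmul(t)
        start = time.perf_counter()
        f(A, B, n)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time = t, elapsed
    return best_tile
```

Real ATLAS searches at install time over C-level kernels; the point here is only the structure of the search, not the performance of this Python stand-in.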

FFTW

Recursive divide-and-conquer:
- Plan: a factorization tree; factorization stops at certain sizes.
- Execution: call codelets.

Codelets:
- Subroutines for small-size FFTs
- Optimized and fully unrolled
- Generated by a dedicated compiler

Adapts to the environment at run time via dynamic programming.

[Figure: an FFTW plan as a factorization tree. F1024 splits into F128 and F8; F128 in turn splits into F16 and F8.]

F_rs = (I_r ⊗ F_s) L (F_r ⊗ I_s) T
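The dynamic-programming planning step can be sketched roughly as follows. This is a hypothetical illustration, not FFTW's actual planner or API: the set of codelet sizes and the cost model are stand-ins (real FFTW measures execution times of candidate plans).

```python
# Sketch of FFTW-style planning (illustrative, not FFTW's actual planner):
# recursively factor the transform size and memoize the cheapest plan
# (dynamic programming). The codelet sizes and the cost formula are
# assumptions that keep the example self-contained.
from functools import lru_cache

CODELET_SIZES = {2, 4, 8, 16, 32}   # sizes with fully-unrolled codelets

@lru_cache(maxsize=None)
def plan(n):
    """Return (cost, tree) for an FFT of size n (n a power of two >= 2).

    A leaf in the tree is a codelet size; an internal node (ta, tb)
    represents the split F_{a*b} into subplans for sizes a and b.
    """
    if n in CODELET_SIZES:
        return (n, n)                # leaf: call a codelet directly
    best = None
    r = 2
    while r * r <= n:                # try factorizations n = r * s
        if n % r == 0:
            for a, b in ((r, n // r), (n // r, r)):
                ca, ta = plan(a)
                cb, tb = plan(b)
                # Stand-in cost: b FFTs of size a, a FFTs of size b,
                # plus a combine pass linear in n.
                cost = n + ca * b + cb * a
                if best is None or cost < best[0]:
                    best = (cost, (ta, tb))
        r += 1
    return best
```

The memoization (`lru_cache`) is what makes this dynamic programming: each subtransform size is planned once and reused across all factorization trees that contain it.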

SPIRAL

[Diagram: the SPIRAL code-generation loop. A DSP transform is fed to the Formula Generator, which produces SPL formulae; the SPL Compiler translates these into C/FORTRAN programs; Performance Evaluation measures them, and a Search Engine uses the results to steer the Formula Generator. The target architecture and DSP libraries are additional inputs.]

Supercompilers …

Integration of Very High Level Specifications with Conventional Languages

Besides conventional subroutines selected from a catalog, the languages accepted by supercompilers would also call “macros” which could be used to generate code as a function of:
- The target machine
- The value of the data
- The structure of the data
- The shape of the data
- The rest of the program
- Numerical properties

… Supercompilers …

Macros could be subroutines or class methods. Expanding classes could include data representation selection (including data distribution):
- SETL's automatic dense/sparse selection techniques
- Automatic data distribution techniques

… Supercompilers

In theory at least, generating code from specifications rather than from specific HLL implementations should lead to better performance.

All the benefits of abstraction without the performance penalty.

Vectorizers and High Level Specifications

Original loop:

  do i=1,n
    a(i)=b(i)+c(i)
    d(i)=a(i)+d(i-1)
    if (m > d(i)) m=d(i)
  end do

After loop distribution:

  do i=1,n
    a(i)=b(i)+c(i)
  end do
  do i=1,n
    d(i)=a(i)+d(i-1)
  end do
  do i=1,n
    if (m > d(i)) m=d(i)
  end do

Vectorized form:

  a(1:n)=b(1:n)+c(1:n)
  d(1:n)=lin-rec(a,d,1,n)
  m=min(m,d(1:n))

Back End Compilers and Supercompilers …

Back end compilers take care of:
- Machine code generation
- Register allocation
- Conventional optimizations

But they are not really trusted by today's module generation systems (competitors):
- The existence of ATLAS is just an indictment of current compiler technology.
- FFTW does clustering to improve register allocation.
- Spiral does a variety of conventional optimizations.

Optimizations in Spiral

- Formula Generator: high-level scheduling
- SPL Compiler: loop transformations; high-level optimizations (constant folding, copy propagation, CSE, dead code elimination)
- C/Fortran Compiler: low-level optimizations (instruction scheduling, register allocation)
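As a toy illustration of the first of the basic optimizations listed for the SPL compiler, constant folding over straight-line expression trees might look like this. The tuple node format is an assumption made for the sketch, not SPL's actual representation.

```python
# Sketch of constant folding (illustrative; nodes are ("op", left, right)
# tuples, leaves are numbers or variable-name strings).
def fold(node):
    """Recursively replace constant subexpressions with their value."""
    if not isinstance(node, tuple):           # leaf: constant or variable
        return node
    op, a, b = node
    a, b = fold(a), fold(b)                   # fold children first
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    return (op, a, b)
```

For example, `("*", ("+", 1, 2), "x")` folds to `("*", 3, "x")`: the constant subtree collapses while the part depending on the variable is left intact. Copy propagation, CSE, and dead code elimination operate in the same spirit on the fully-unrolled straight-line code the SPL compiler emits.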

[Performance charts: basic optimizations, FFT of size N = 2^5, on SPARC (f77 -fast -O5), Pentium II (g77 -O6 -malign-double), and MIPS (f77 -O3).]

Can Module Generators Rely on Back End Compilers?

Not always, but using back end compilers will always be necessary for portability (collaborators).

But compilers can hinder efforts to get good performance. For example, bad register allocation can have a serious negative impact.

A standard set of commands is needed to control the transformations applied by the compiler.

… Back End Compilers and Supercompilers

In supercompilers, transformations should be done by the back end whenever possible.

Reason: they then apply to all parts of the program, not only to the very high-level components.

Search …

Search is an important component of module generators.

It is also used by conventional compilers, but compilers usually work with static predictions rather than actual execution times:
- KAP tried all possible loop permutations.
- SGI-PRO tries many combinations of unrolling.
- The Superoptimizer and similar systems.

Most compiler optimization algorithms are heuristics with no search involved.

… Search …

In supercompilers, search could also be done across several algorithms, looking for a good data representation and data distribution for the whole program.

… Search …

The search strategy could make use of actual execution times combined with static performance prediction:
- Static prediction is not very accurate today.
- Tight performance bounds are needed to prune the search.

Some decisions could be made at run time:
- IF statements / multiversion loops
- JIT compilers
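A run-time multiversion dispatch can be sketched as follows. The dot-product kernel and the 25% density threshold are hypothetical choices for illustration, not taken from any particular system.

```python
# Sketch of a "multiversion" kernel (illustrative): the generated code
# keeps two variants of the same operation, and a cheap data-dependent
# test (an IF statement) picks one per call at run time.
def dot(x, y):
    """Dot product with a run-time choice between two versions."""
    nnz = sum(1 for v in x if v != 0.0)
    if 4 * nnz < len(x):              # mostly zeros: sparse version
        return sum(v * y[i] for i, v in enumerate(x) if v != 0.0)
    return sum(v * y[i] for i, v in enumerate(x))   # dense version
```

A JIT compiler pushes the same idea further: instead of shipping both versions, it generates the specialized one once the data's properties are known.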

… Search

Some search could be based on data-dependent behavior:
- Profiling
- A “representative” data set

The search strategy is important, given that the space of possibilities is often large and not monotonic. It is also difficult to know how far the search process is from the optimum; tight bounds need to be developed.

Size of Search Space

  N        # of formulas          N        # of formulas
  2^1                  1          2^9             20,793
  2^2                  1          2^10           103,049
  2^3                  3          2^11           518,859
  2^4                 11          2^12         2,646,723
  2^5                 45          2^13        13,648,869
  2^6                197          2^14        71,039,373
  2^7                903          2^15       372,693,519
  2^8              4,279          2^16     1,968,801,519
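The counts in the table coincide with the little Schröder numbers (OEIS A001003), with the count for size 2^n matching a(n-1). This identification is an observation about the numbers, not something stated on the slides; under that assumption, the whole table can be regenerated from the standard recurrence:

```python
# Generate the little Schroeder numbers a(0), a(1), ... (OEIS A001003),
# which appear to match the table's formula counts: count(2^n) = a(n-1).
def schroeder(k):
    """Return the first k little Schroeder numbers."""
    a = [1, 1]
    for n in range(2, k):
        # Standard recurrence: (n+1) a(n) = (6n-3) a(n-1) - (n-2) a(n-2);
        # the division below is always exact.
        a.append(((6 * n - 3) * a[n - 1] - (n - 2) * a[n - 2]) // (n + 1))
    return a[:k]
```

The near-exponential growth visible in the table is why exhaustive search over all formulas is hopeless beyond small sizes, motivating the dynamic programming and pruning strategies discussed above.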

Coverage

Need a class of specifications large enough to represent most of the computation.

The effectiveness of the approach will depend on coverage:
- Current libraries are a good start, but it is not clear how much these libraries typically cover.
- To impact programming in general, current approaches would have to be extended to other domains such as sparse computations, sorting, searching, …

Conclusions

As we come to better understand algorithm choices and their impact on performance, it becomes feasible to automate much of the process of selecting data structures and algorithms to maximize performance.

A first step: a repository of routines/classes with several implementations for each subroutine.

But generation based on context could lead to better performance.

In particular, generation from very high-level specifications could allow the generation of code combining several operations in ways that are impossible to conceive of with current encapsulation mechanisms.