
Optimizing Compilers for Modern Architectures

Other Applications of Dependence

Allen and Kennedy, Chapter 12


Overview

• So far, we’ve discussed dependence analysis in Fortran

• Dependence analysis can be applied to any language and translation context where arrays and loops are useful

• Application to C and C++

• Application to hardware design


Problems of C

• C as “typed assembly language” versus Fortran as “high-performance language”

— C focuses more on ease of use and access to hardware operations

– Post-increments, pre-increments, register variables

— Fortran’s focus is on ease of optimization


Problems of C

• In many cases, optimization is not desired:

while (!(t = *p));

— An optimizer would move the load of *p outside the loop

• C++, like other newer languages, focuses more on simplified software development, at the expense of optimizability

• Use of new languages has expanded into areas where optimization is required


Problems of C

• Pointers

— The memory locations accessed through a pointer are not clear

• Aliasing

— C does not guarantee that arrays passed into a subroutine do not overlap

• Side-effect operators

— Operators such as pre- and post-increment encourage a style in which array operations are strength-reduced by the programmer
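As a hypothetical illustration (the function names are invented), the strength-reduced pointer style described above, contrasted with the plain array form that a dependence analyzer handles far more easily:

```c
/* Invented example: the side-effect style the slide describes, where the
   programmer has already strength-reduced the array walk. */
int sum_ptr(const int *a, int n) {
    int s = 0;
    while (n-- > 0)
        s += *a++;       /* post-increment hides the subscript */
    return s;
}

/* The equivalent subscripted form, which exposes a clean induction
   variable and array reference to dependence analysis. */
int sum_idx(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Both compute the same sum; only the second gives the compiler an obvious loop index and subscript to analyze.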


Problems of C

• Loops

— Fortran loops provide values and restrictions that simplify optimization


Pointers

• Two fundamental problems

— A pointer variable can point to different memory locations during its use

— A memory location can be accessed through more than one pointer variable at a given time, producing aliases for the location

• The result is much more difficult and expensive dependence testing


Pointers

• Without knowledge of all possible references of an array, compilers must assume dependence

• Analyzing the entire program to determine dependences is feasible, but the results are still unsatisfactory

• This leads to the use of compiler options / pragmas

— Safe parameters

– All pointer parameters to a function point to independent storage

— Safe pointers

– All pointer variables (parameter, local, global) point to independent storage
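C99 later standardized the “safe parameters” idea as the `restrict` qualifier; a minimal sketch (the `scale` routine is an invented example):

```c
#include <stddef.h>

/* Invented example: without restrict, the compiler must assume dst and
   src may overlap and so must keep the loads and stores in order.
   restrict (C99) asserts the "safe parameters" property: each pointer
   parameter points to independent storage. */
void scale(double *restrict dst, const double *restrict src,
           double k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];   /* no dst/src dependence assumed */
}
```

With the qualifier, dependence testing on this loop reduces to the easy Fortran-like case.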


Naming and Structures

• In Fortran, a block of storage can be uniquely identified by a single name

• Consider these constructs:

p;

*p;

**p;

*(p+4);

*(&p+4);


Naming and Structures

• Troublesome structures, such as unions

— Naming problem

– What is the name of ‘a.b’?

— Different-sized objects can overlap the same storage

– Reduce references to a common unit of the smallest storage possible


Loops

• Lack of constraints in C

— Jumping into the loop body is permitted

— The induction variable (if there is one) can be modified in the body of the loop

— The loop increment value may also be changed

— The conditions controlling the initiation, increment, and termination of the loop have no constraints on their form


Loops

• To rewrite a C loop as a DO loop:

— It must have one induction variable

— That variable must be initialized with the same value on all paths into the loop

— The variable must have one and only one increment within the loop

— The increment must be executed on every iteration

— The termination condition must match the canonical form

— No jumps into the loop body from outside
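A sketch of a C loop that meets every condition above (the `dot` routine is an invented example):

```c
/* Invented example: one induction variable (i), initialized once on the
   single path into the loop, incremented exactly once per iteration,
   canonical termination test, and no jumps into the body from outside --
   so a compiler can treat it exactly like a Fortran DO loop. */
int dot(const int *a, const int *b, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)   /* effectively: DO i = 1, n */
        s += a[i] * b[i];
    return s;
}
```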


Scoping and Statics

• Create unique symbols for variables with same name but different scopes

• Static variables

— Which procedures can access the variable can be determined from the scope information

— If it contains an address, the contents of that address can be modified by any other procedure


Problematic C Dialects

• Use of pointers rather than arrays

• Use of side-effect operators

— Complicates the work of optimizers

— Need to be removed

• Use of address and dereference operators


Problematic C Dialects

• Requires enhancements to some transformations

— Constant propagation

– Treat address operators as constants and propagate them where essential

– Replace a generic pointer inside a dereference with the actual address

— Expression simplification and recognition

– Need stronger recognition within expressions of which variable is actually the ‘base variable’


Problematic C Dialects

— Conversion into array references

– It is useful to convert pointer accesses into array references

— Induction-variable substitution

– Must handle the strength reduction of array references

– Expanding side-effect operators also requires changes
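A hypothetical before/after sketch of this conversion (`clear_ptr` and `clear_idx` are invented names):

```c
/* Invented example: the pointer-walk form a C programmer writes. */
void clear_ptr(int *p, int n) {
    int *end = p + n;
    while (p != end)
        *p++ = 0;             /* pointer form: induction on p itself */
}

/* The array-reference form the compiler wants to recover: the pointer
   induction is substituted with one counted index into the base p. */
void clear_idx(int *p, int n) {
    for (int i = 0; i < n; i++)
        p[i] = 0;
}
```

The two are behaviorally identical; the second undoes the programmer's strength reduction so subscript-based dependence tests apply.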


C Miscellaneous

• Volatile variables

— Functions containing these variables are best left unoptimized

• Setjmp and longjmp

— Commonly used for error handling

— Store and reload the current state of the computation, which is complex when optimization is performed and variables are allocated to registers

— No optimization
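Tying back to the earlier `while (!(t = *p));` example, a minimal sketch of why volatile matters (the `device_ready` flag is an invented stand-in for a memory-mapped status register):

```c
/* Invented example: volatile tells the compiler the value can change
   outside the program's control, so the load must stay inside the loop
   rather than being hoisted out as an "optimization". */
volatile int device_ready;      /* e.g. a memory-mapped status flag */

int wait_for_device(void) {
    int t;
    while (!(t = device_ready)) /* reloaded on every iteration */
        ;
    return t;
}
```

Without the qualifier, the hoisted load would turn this into an infinite loop whenever the flag starts at zero.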


C Miscellaneous

• Varargs and stdarg

— Variable number of arguments

— No optimization


Hardware Design: Overview

• Today, most hardware design is language-based

• Textual description of hardware in languages similar to those used to develop software

• The level of abstraction is moving from low-level detailed implementation to high-level behavioral specification

• Key factor: compiler technology


Hardware Design: Overview

• Four levels of abstraction

— Circuit / physical level

– Diagrams of electronic components

— Logic level

– Boolean equations

— Register-transfer level (RTL)

– Control state transitions and data transfers, with timing

– Synthesis: conversion from RTL to its implementation

— System level

– Concentrates on behavior

– Behavioral synthesis


Hardware Design

• Behavioral synthesis is really a compilation problem

• Two fundamental tasks

— Verification

— Implementation

• Simulation of hardware is slow


Hardware Description Languages

• Verilog and VHDL

• Extensions in Verilog

— Multi-valued logic: 0, 1, x, z

– x = unknown state, z = high impedance

– E.g. division by zero produces the x state

– Operations on x produce x -> can’t be executed directly

— Reactivity

– Changes are propagated automatically

– “always” statement -> continuous execution

– “@” operator -> blocks execution until one of the operands changes value


Verilog

— Reactivity

always @(b or c)
  a = b + c;

— Objects

– A specific area of silicon

– Completely separate areas on the chip

— Connectivity

– Continuous passing of information

– Input ports and output ports


Verilog

— Connectivity

module add(a, b, c);

output a;

input b, c;

integer a, b, c;

always @(b or c)

a = b + c;

endmodule


Verilog

• Instantiation

— Verilog only allows static instantiation:

integer x, y, z;

add adder1(x, y, z);

• Vector operations

— Viewing other data structures as vectors of scalars


Verilog

• Advantages

— No aliasing

— Restricted form of subscripts

— The entire hardware design is given to the compiler at one time


Verilog

• Disadvantages

— Non-procedural continuation semantics

— Lack of loops

– Loops are implicitly represented by always blocks and the scheduler

— Size


Optimizing simulation

• Philosophy

— Raise the level of abstraction

— Opt for less detail

• Inlining modules

— HDLs have two properties that make module inlining simpler

– The whole design is reachable at one time

– Recursion is not permitted


Optimizing simulation

• Execution ordering

— The order in which statements are executed can have a dramatic effect on efficiency

— What is fast in hardware is not necessarily fast in software

— Grouping increases performance

— Execute blocks in topological order based on the dependence graph of individual array elements

– No memory overhead


Dynamic versus Static Scheduling

• Dynamic scheduling

— Dynamically track changes in values and propagate them

— Mimics hardware

— Incurs the overhead of change checks

• Static scheduling

— Blindly sweeps through all values for all objects, regardless of any changes

— No need for change checks


Dynamic versus Static Scheduling

• If the circuit is highly active, static scheduling is more suitable

• In general, using dynamic scheduling guided by static analysis provides the best results
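A minimal sketch of the two strategies on a toy two-gate "circuit" (all names invented): the static schedule sweeps every block each evaluation, while the dynamic schedule re-evaluates a block only when one of its inputs changed — the change check whose overhead the slide mentions.

```c
/* Invented toy circuit: a = b & c;  d = a | b. */
typedef struct { int a, b, c, d; } Sig;

/* Static scheduling: blind sweep in topological order, no checks. */
void eval_static(Sig *s) {
    s->a = s->b & s->c;
    s->d = s->a | s->b;
}

/* Dynamic scheduling: mimic hardware by propagating only changes. */
void eval_dynamic(Sig *s, const Sig *prev) {
    if (s->b != prev->b || s->c != prev->c)   /* change check, block 1 */
        s->a = s->b & s->c;
    if (s->a != prev->a || s->b != prev->b)   /* change check, block 2 */
        s->d = s->a | s->b;
}
```

When most signals change every cycle, the checks in `eval_dynamic` are pure overhead and the static sweep wins, which is the trade-off the slide describes.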


Fusing always blocks

• High cost of change checks motivates fusing always blocks

• Caution: fusing may change the output of a design


Vectorizing always block

• Regrouping low-level operations to recover higher-level abstractions

• Vectorizing the bit operations


Two state versus four state

• Extra overhead in four state hardware

• Few people like hardware that enters unknown states

• Two state logic can be 3-5x faster

• Use two-valued logic wherever possible

• Determining which parts can execute in two-state logic is difficult

• Use interprocedural analysis


Two state versus four state

• The test for detecting an unknown is low cost: 2-3 instructions

• Check for unknowns but default quickly to two state execution
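One common encoding that supports this fast path represents each signal as a (value, unknown-mask) pair; a sketch assuming that encoding (all names invented):

```c
#include <stdint.h>

/* Invented 4-state encoding: bit i of unk == 1 means bit i is x;
   otherwise bit i of val holds the known 0/1 value. */
typedef struct { uint32_t val, unk; } Wire4;

Wire4 and4(Wire4 a, Wire4 b) {
    Wire4 r;
    if ((a.unk | b.unk) == 0) {        /* cheap unknown test: 2-3 insns */
        r.val = a.val & b.val;         /* two-state fast path */
        r.unk = 0;
    } else {                           /* full 4-state path */
        uint32_t zero = (~a.val & ~a.unk) | (~b.val & ~b.unk);
        uint32_t one  = (a.val & ~a.unk) & (b.val & ~b.unk);
        r.val = one;                   /* forced 1 only if both known 1 */
        r.unk = ~(zero | one);         /* neither forced: result is x */
    }
    return r;
}
```

Most signals are fully known most of the time, so the common case runs at two-state speed while x-propagation (a known 0 still forces the AND to 0) is preserved.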

Rewriting block conditions

always @(posedge(clk)) begin
  sum = op1 ^ op2 ^ c_in;
  c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
end

becomes

always @(op1 or op2 or c_in) begin
  t_sum = op1 ^ op2 ^ c_in;
  t_c_out = (op1 & op2) | …;
end

always @(posedge(clk)) begin
  sum = t_sum;
  c_out = t_c_out;
end


Basic Optimizations

• Raise level of abstraction

• Constant propagation and dead code elimination

• Common subexpression elimination


Synthesis Optimization

• Goal is to insert the details

• Analogous to standard compilers

• Harder than standard compilation

— Not aimed at a fixed target

— No single goal: minimize cycle time, area, and power consumption


Basic Framework

• Selection outweighs scheduling

• Analogous to CISC instruction selection

• A large body of tree-matching algorithms applies

• Needs constraints

Loop Transformations

for (i = 0; i < 100; i++) {
  t[i] = 0;
  for (j = 0; j < 3; j++)
    t[i] = t[i] + (a[i-j] >> 2);
}

for (i = 0; i < 100; i++) {
  o[i] = 0;
  for (j = 0; j < 100; j++)
    o[i] = o[i] + m[i][j] * t[j];
}

Loop Transformations

After loop distribution:

for (i = 0; i < 100; i++)
  t[i] = 0;

for (i = 0; i < 100; i++)
  o[i] = 0;

for (i = 0; i < 100; i++)
  for (j = 0; j < 3; j++)
    t[i] = t[i] + (a[i-j] >> 2);

for (i = 0; i < 100; i++)
  for (j = 0; j < 100; j++)
    o[i] = o[i] + m[i][j] * t[j];

Loop Transformations

After loop interchange and fusion:

for (i = 0; i < 100; i++)
  o[i] = 0;

for (i = 0; i < 100; i++) {
  t[i] = 0;
  for (j = 0; j < 3; j++)
    t[i] = t[i] + (a[i-j] >> 2);
  for (j = 0; j < 100; j++)
    o[j] = o[j] + m[j][i] * t[i];
}

Loop Transformations

After scalar replacement:

for (i = 0; i < 100; i++)
  o[i] = 0;

a0 = a[0];
a1 = a[-1];
a2 = a[-2];
a3 = a[-3];

for (i = 0; i < 100; i++) {
  t = 0;
  t = t + (a0 >> 2) + (a1 >> 2) + (a2 >> 2) + (a3 >> 2);
  a3 = a2; a2 = a1; a1 = a0; a0 = a[i+1];
  for (j = 0; j < 100; j++)
    o[j] = o[j] + m[j][i] * t;
}


Control and Data Flow

• Von Neumann architecture

— Data moves among memory and registers

— Control flow is encapsulated in the program counter and effected with branches

• Synthesized hardware

— Data moves among functional units

— Control flow determines which functional unit should be active on what data at which time steps


Control and Data Flow

• Wires

— Immediate transfer

• Latches

— Values held throughout one clock cycle

• Registers

— Like static variables in C

— Values held for one or more clock cycles

• Memories


Memory Reduction

• Memory access is slow compared with functional-unit access

• Application of familiar techniques

— Loop interchange

— Loop fusion

— Scalar replacement

— Strip mining

— Unroll-and-jam

— Prefetching
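A sketch of scalar replacement, one of the techniques listed above (a small invented matrix-vector example): the repeated reference `o[i]` is replaced by a scalar the synthesizer can keep in a register or latch, so memory is touched once per outer iteration instead of once per inner iteration.

```c
/* Invented example of scalar replacement for memory reduction. */
void matvec(int o[4], int m[4][4], const int t[4]) {
    for (int i = 0; i < 4; i++) {
        int s = 0;                  /* scalar replaces o[i] in the loop */
        for (int j = 0; j < 4; j++)
            s += m[i][j] * t[j];    /* accumulate in a register */
        o[i] = s;                   /* single store to memory per row */
    }
}
```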


Summary

• Dependence analysis is not limited to Fortran

• It has other applications, including C/C++ and hardware design

• Many of these applications are at an early stage of research