Upload: corey-atkins

Post on 13-Dec-2015

Crosscutting Issues: The Rôle of Compilers

• Architects must be aware of current compiler technology

[Figure: Compiler / Architecture]

Modern Compilers

• Front End
• High-level Optimisations
  – E.g. procedure inlining, loop transformations
• Global Optimiser
  – Register allocation
• Code Generator
  – Machine-dependent optimisations

Compiler Technology

• Multiple passes complicate matters
  – E.g. common subexpression elimination must assume that a register will be allocated for the temporary value
  – E.g. procedure inlining must be decided before the size of the resulting code is known
• Register allocation is critical
  – Uses graph colouring techniques
  – Requires at least 16 registers to be effective
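The graph colouring idea can be sketched in a few lines. This is a simplified greedy colouring, not the algorithm of any particular compiler: values that are live at the same time are joined by an edge, and neighbouring values must get different registers. The function name and the example graph are hypothetical.

```python
def colour_interference_graph(interference, num_registers):
    """Greedy colouring: map each value to a register index, or None to spill."""
    # Visit the most constrained values (highest degree) first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    assignment = {}
    for var in order:
        # Registers already taken by neighbours that have been coloured.
        taken = {assignment[n] for n in interference[var]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[var] = free[0] if free else None  # None marks a spill
    return assignment


# Hypothetical interference graph: an edge joins two values live at the same time.
example_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    for k in (2, 3):
        print(k, colour_interference_graph(example_graph, k))
```

With three registers every value in this example gets a register; with two, one value has to be spilled to memory, which is why a small register file makes allocation much less effective.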

Architectural Issues

• How are variables allocated and addressed?
  – Stack: local variables, scalars
  – Global data area: global variables, constants, arrays
  – Heap: dynamic objects, not scalars
• How many registers are needed?
  – Integer: 26 registers
  – FP: 20 registers

Aiding Compiler Writers

• Architectures should:
  – Be regular (orthogonal instruction set)
  – Provide primitives, not solutions
  – Simplify trade-offs among alternatives
  – Not require run-time interpretation of data known at compile-time
    • VAX CALLS

Keep it simple!

Compiler Support for Multimedia Instructions

• SIMD instructions act on multiple smaller data items in a large “word”
  – Solutions, not primitives!
  – Too few registers!
  – Data types not found in programming languages!

Result: Only used by low-level graphics libraries.

Multimedia Instructions

• These SIMD instructions act like a “mini-vector” architecture
  – E.g. MMX in 64 bits
    • 8 × 8-bit vectors
    • 4 × 16-bit vectors
    • 2 × 32-bit vectors
  – SSE: 128 bits
  – Much more limited than genuine vector processors
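As a rough illustration of a partitioned operation, the sketch below treats a 64-bit integer as eight independent 8-bit lanes and adds them lane by lane. It models the idea in plain Python and is not the MMX instruction set; the helper names are hypothetical.

```python
LANE_BITS = 8
LANES = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack eight small integers into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def partitioned_add(x, y):
    """Add the lanes independently; carries never cross a lane boundary."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((x >> shift) & LANE_MASK) + ((y >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift  # wrap around within the lane
    return result


if __name__ == "__main__":
    a = pack([250, 1, 2, 3, 4, 5, 6, 7])
    b = pack([10, 1, 1, 1, 1, 1, 1, 1])
    print(unpack(partitioned_add(a, b)))
```

An ordinary 64-bit add of the same two words would let the overflow from lane 0 carry into lane 1, which is exactly what the partitioned form prevents.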

Putting It All Together: MIPS

• 64-bit load/store design
• RISC features:
  – GPR, load-store architecture
  – Small, simple instruction set
  – Designed for efficient pipelining (fixed length instructions)
  – Efficient compiler target

MIPS

• 32 64-bit integer registers
  – R0…R31
  – R0 fixed: 0
• 32 64-bit or 32-bit floating point registers
  – Supports “paired single” operations

MIPS Data Types

• Integer:
  – Bytes, 16-bit halfwords, 32-bit words, 64-bit double words
    • Operations are all 64-bit
• Floating point:
  – 32-bit and 64-bit

MIPS Addressing Modes

• Only immediate and displacement
  – 16-bit displacements/immediates
  – Register-indirect: set displacement = 0
  – 16-bit absolute: use R0

• Byte addressable with 64-bit addresses

• Big-endian or little-endian

• Alignment required
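The way one displacement mode covers the other cases can be sketched as follows. This is a simplified model with hypothetical function names: it assumes the 16-bit displacement is sign-extended and that register 0 always reads as zero.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, base, displacement):
    """Displacement addressing: contents of the base register plus the displacement."""
    base_value = 0 if base == 0 else regs[base]  # R0 is fixed at zero
    return (base_value + sign_extend16(displacement)) & MASK64


def is_aligned(address, size):
    """An access of size bytes is aligned if its address is a multiple of size."""
    return address % size == 0


if __name__ == "__main__":
    regs = [0] * 32
    regs[4] = 0x1000
    print(hex(effective_address(regs, 4, 8)))      # displacement
    print(hex(effective_address(regs, 4, 0)))      # register-indirect
    print(hex(effective_address(regs, 0, 0x200)))  # absolute, via R0
```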

MIPS Instructions

• Three instruction formats (field widths in bits):

I-type: opcode (6) | rs (5) | rt (5) | immediate (16)
R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
J-type: opcode (6) | offset (26)
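A minimal sketch of packing the three formats into 32-bit words, using the field widths listed above. The function names are hypothetical, and the opcode and funct numbers in the usage lines are the classic MIPS32 values for ADD, ADDI and J, which the slides do not list.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """I-type: opcode(6) rs(5) rt(5) immediate(16)."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | (immediate & 0xFFFF)


def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | ((shamt & 0x1F) << 6) | (funct & 0x3F)


def encode_j_type(opcode, offset):
    """J-type: opcode(6) offset(26)."""
    return ((opcode & 0x3F) << 26) | (offset & 0x3FFFFFF)


if __name__ == "__main__":
    # Assumed encodings: ADD is opcode 0 with funct 0x20, ADDI is opcode 8, J is opcode 2.
    print(hex(encode_r_type(0, 1, 2, 3, 0, 0x20)))  # add  r3, r1, r2
    print(hex(encode_i_type(8, 1, 2, -1)))          # addi r2, r1, -1
    print(hex(encode_j_type(2, 0x100)))             # j    0x100
```

Because every format is exactly 32 bits and the opcode always sits in the top six bits, decoding can start before the rest of the word has been examined, which is part of what makes fixed-length instructions easy to pipeline.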

MIPS Operations

• Load-store
• ALU operations
  – Add, subtract, multiply, divide, and, or, xor, LUI (load upper immediate), shifts
• Control transfer
  – Set conditions
  – Branch (reg=0, reg≠0, reg1=reg2, reg1≠reg2), jump, jump-and-link (call)
  – Conditional move
• Floating point
  – Paired single operations
  – Multiply-add (DSP)
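LUI exists because immediates are only 16 bits wide: a 32-bit constant is built from two instructions. The sketch below models that pairing with an OR-immediate in plain Python. The function names are hypothetical, and the sign extension of the result to 64 bits is left out for simplicity.

```python
def lui(immediate):
    """Load upper immediate: place 16 bits in the upper half, zero the lower half."""
    return (immediate & 0xFFFF) << 16


def ori(value, immediate):
    """OR immediate: the 16-bit immediate is zero-extended before the OR."""
    return value | (immediate & 0xFFFF)


def load_constant32(constant):
    """Build a 32-bit constant as LUI (upper half) followed by ORI (lower half)."""
    upper = (constant >> 16) & 0xFFFF
    lower = constant & 0xFFFF
    return ori(lui(upper), lower)


if __name__ == "__main__":
    print(hex(load_constant32(0x12345678)))
```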

MIPS: Instruction Usage

• Integer applications:
  – Load, add, branch, store, or, compare
• FP applications:
  – Add (int), load (int), load, multiply, add, store

Figure 2.34.

Another View: Trimedia Media Processor

• Embedded processor for multimedia applications
  – E.g. set-top boxes (decoders, etc.) and TVs
• Very different architecture
  – 128 32-bit registers (FP or int)
  – Partitioned (SIMD) instructions
  – 2’s complement and saturating arithmetic
  – VLIW architecture

Trimedia: VLIW Approach

• Compiler can group up to five instructions for simultaneous execution
  – Must be independent
  – Use NOPs if there are insufficient independent instructions
• Large program size

• Trimedia uses memory compression

• Programs are 2-3 times larger than MIPS (even with compression)!
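The effect of NOP padding on code size can be sketched with a toy bundler. This is a deliberately naive in-order packer with a hypothetical instruction representation (operation, destination, sources); a real VLIW compiler reorders instructions to fill far more slots.

```python
ISSUE_WIDTH = 5
NOP = ("nop", None, ())


def bundle(instructions, width=ISSUE_WIDTH):
    """Pack instructions in order into fixed-width bundles, padding with NOPs."""
    bundles, current, written = [], [], set()
    for op, dest, sources in instructions:
        # An instruction depends on the current bundle if it reads or
        # rewrites a register that the bundle already writes.
        depends = any(s in written for s in sources) or dest in written
        if depends or len(current) == width:
            bundles.append(current + [NOP] * (width - len(current)))
            current, written = [], set()
        current.append((op, dest, sources))
        if dest is not None:
            written.add(dest)
    if current:
        bundles.append(current + [NOP] * (width - len(current)))
    return bundles


# Hypothetical straight-line code with a chain of dependences.
example_program = [
    ("load", "r1", ("r10",)),
    ("load", "r2", ("r11",)),
    ("add", "r3", ("r1", "r2")),
    ("mul", "r4", ("r3", "r3")),
    ("store", None, ("r4", "r12")),
]

if __name__ == "__main__":
    for b in bundle(example_program):
        print([op for op, _, _ in b])
```

Here five useful instructions occupy four bundles, so 15 of the 20 slots hold NOPs. Dependent code like this is what inflates VLIW program size.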

Fallacies and Pitfalls

• Pitfall: Designing a “high-level” instruction set to support HLLs
  – Seldom provide an exact match
  – Often too general (VAX CALLS)

Fallacies and Pitfalls

• Fallacy: There is such a thing as a typical program
  – Programs vary very significantly
• Pitfall: Designing an architecture to reduce code size without considering compilers
  – Compilers have much greater impact on code size
  – Start with densest compiled code

Fallacies and Pitfalls

• Pitfall: Expecting good compiled performance for DSPs
  – Hand-tuned assembler is faster and more compact
• Fallacy: An architecture with flaws cannot be successful
  – 80x86!
    • Segments, accumulators, stack-based FP

Fallacies and Pitfalls

• Fallacy: You can design a flawless architecture
  – All designs have trade-offs
    • VAX code size more important than easy decoding
    • Early RISCs: delayed branches
    • Address space

2.15. Concluding Remarks

• 1960s: Stack architectures
  – Matched the compiler technology of the day
• 1970s: CISC era
  – Tried to support HLL features in hardware
• Today: RISC era
  – Simple, load-store architectures

Concluding Remarks

• Trends in the 1990s:
  – Move to 64 bits
  – Conditional instructions
    • Eliminating branches
  – Optimisation of cache access (prefetch instructions)
  – Support for multimedia
  – Faster floating point

The Future

• Trend towards VLIW architectures

• Increased use of conditional execution

• Blending of general-purpose and DSP architectures

• Emulating 80x86 architecture