NCI Report: Zephyr
PLDI NCI Tutorial
University of Virginia
Princeton University
6/16/2000 PLDI NCI Tutorial 2
Zephyr Goals
• Goal
– Deliver high-quality, language-neutral tools for rapidly constructing compilers for experimental computing systems research
• How
– Provide specification languages and processors to automatically generate key compiler components
• Don’t write code, write specifications!
Zephyr Compilers
[Figure: the SUIF and Zephyr infrastructures. Front ends: EDG C++, Java, lcc. Components: SUIF, MachSUIF, SUIF-to-VPO bridge, VPO. Targets: Alpha, SPARC, MIPS, x86. Optimizations shown: interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation, instruction selection, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining.]
Zephyr Building Blocks
• ASDL: Abstract Syntax Description Language
• VPO: Very Portable Optimizer
• CSDL: Computer System Description Language
ASDL: Abstract Syntax Description Language
[Figure: compiler pipeline. Lexer → tokens → parser → AST → semantic analysis → AST → translate → IR → OPT1 ... OPTn → IR → code generator. A glue generator produces the AST and IR glue from a glue description.]
ASDL
• ASDL makes it easy to communicate complex recursive data structures
• ASDL and its tools provide
– Concise descriptions of tree-like structures, including ASTs and compiler intermediate representations (IRs)
– Automatic generation of data structure implementations and pickling functions for C, C++, Java, Standard ML, and Haskell
– Graphical browsing and editing of data structures on disk
ASDL
• For more information about ASDL see:
– Give reference here
– Give URL here
VPO: Very Portable Optimizer
• VPO is a retargetable optimizer that operates on a low-level, machine-independent representation called RTLs (register transfer lists)
• VPO is retargeted by providing a machine description (MD) of the target machine, and revising a few machine-dependent routines
• VPO is small, easily extended, and extremely effective
History Lesson
• PO developed in 1981
– Pioneered use of RTLs
– Demonstrated ability to do optimizations on low-level representation
• Development split in 1982
– gcc development
• Richard Stallman and Len Tower
– VPO development
• Many people at UVa and a few industrial labs
[Figure: PO splits into VPO and gcc]
Register Transfer Lists
• Based on Bell and Newell's ISP notation
• Machine-independent representation of a machine-dependent operation
• Algorithms that manipulate RTLs are machine-independent
Register Transfer Lists
• While assembly language notations may vary, RTLs are very similar across architectures
Example:
Assembly            Machine
add %o1,%o2,%o2     SPARC
addu $10,$10,$9     MIPS
ar 10,9             IBM
In RTL, each operation would be represented as
r[10] = r[10] + r[9];
RTLs
• The form of RTLs is fixed
• dst = src ; dst = src ; dst = src …
– The individual register transfers are performed in parallel
– Example
• r[1] = r[1] + r[2] ; NZ = r[1] + r[2] ? 0
– VPO provides machine-independent primitives for operating on and manipulating RTLs
• Obtain the sources and destinations
• Obtain the memory locations read and written
• Obtain the type of instruction (arithmetic, branch, control transfer, etc.)
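The parallel-transfer rule can be illustrated with a small C sketch. Everything here is hypothetical and is not VPO's own representation: the Transfer struct, the NZ cell, and rtl_execute are invented for the illustration, and "? 0" is modelled simply as the sign of the sum. The point is only that every source is read from the old state before any destination is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature machine state: registers r[0..7] plus one
   condition-code cell, NZ, stored after the registers. */
enum { NREGS = 8, NZ = NREGS, NLOCS = NREGS + 1, MAX_TRANSFERS = 8 };

/* One transfer: dst = r[src1] + r[src2], or dst = sign(r[src1] + r[src2])
   when compare_with_zero is set (an assumed stand-in for "? 0"). */
typedef struct {
    int dst, src1, src2;
    int compare_with_zero;
} Transfer;

static long sign(long v) { return (v > 0) - (v < 0); }

/* Execute one RTL: evaluate every source against the OLD state,
   then commit all destinations together. */
void rtl_execute(long state[NLOCS], const Transfer *t, size_t n)
{
    long result[MAX_TRANSFERS];
    assert(n <= MAX_TRANSFERS);
    for (size_t i = 0; i < n; i++) {
        long sum = state[t[i].src1] + state[t[i].src2];
        result[i] = t[i].compare_with_zero ? sign(sum) : sum;
    }
    for (size_t i = 0; i < n; i++)
        state[t[i].dst] = result[i];
}
```

With r[1] = 5 and r[2] = -3, the example RTL leaves r[1] = 2 and sets NZ from the same sum, 2. A sequential reading would instead compare the already updated r[1] plus r[2].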
RTLs
• Think of RTL as a machine-independent assembly language
– For a machine X, each RTLx describes an instruction in X’s instruction set (may be a synthetic instruction)
– RTLx should specify
• the instruction’s inputs and outputs
• the transformation the instruction makes on the machine state
– VPO uses this information to compute a dataflow graph
Compilation with VPO
[Figure: source code → front and middle ends → RTL → VPO (Mach) → machine code]
You supply the front end and a simple code generator; we supply an optimizing back end
Generating RTLx
• Translate IL ops to semantically equivalent sequences of instructions for the target machine
– Generate RTL representation of instructions, not assembly language
– Do not worry about code quality
• Perform naïve, straightforward translation
• Expose all computations (even effective address computations) to VPO
• Use virtual or pseudo registers for temporaries
• VPO handles activation record and data placement
Generating RTLx
The C code
K = I + 1;
[Figure: expression tree. = <int,32> has children ADDR K <local,32> and + <int,32>; + has children @ <int,32> (applied to ADDR I <local,32>) and CON 1 <int,32>]
IL            SPARC RTL
ADDR int K    r[33]=r[14]+K.;
ADDR int I    r[34]=r[14]+I.;
@ int         r[35]=M[r[34]];       r[34]
CON int 1     r[36]=1;
+ int         r[37]=r[35]+r[36];    r[35]:r[36]
= int         M[r[33]]=r[37];       r[33]:r[37]
VPO design rationale
• All "traditional" optimizations performed at the machine level on a single representation: RTL
– most optimizations are machine-dependent
– better code is produced
– instruction selection can be performed on demand
– avoids phase ordering problems
– simplifies implementation of optimizations
– easier to accommodate emerging architectures
– "plug and play" structure
RTLs in VPO
• VPO optimization algorithm
– repeat
    apply code-improving transformation
  until fixed-point reached or exhausted registers
• Maintaining two invariants
– Semantic invariant (S)
• Observable behavior of program unchanged (according to RTL semantics)
– Machine invariant (M)
• Every RTL equivalent to one machine instruction
VPO code improvements
• Each code-improving transformation is
– machine-level, but
– machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description
– for each new RTL produced, ask MD if OK
– if any is not a target machine instruction, roll back the transformation
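The loop "transform, ask the machine description, roll back if illegal, repeat to a fixed point" can be sketched in C as follows. This is a toy under stated assumptions, not VPO code: the Rtl struct (r[dst] = r[src] + imm), md_is_legal, and optimize are hypothetical names, and the only legality rule modelled is an assumed signed 13-bit immediate field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RTL form: r[dst] = r[src] + imm. */
typedef struct { int dst, src; long imm; } Rtl;

/* Stand-in for the machine description: an RTL is a legal instruction
   only if its immediate fits an assumed signed 13-bit field. */
bool md_is_legal(const Rtl *r) { return r->imm >= -4096 && r->imm <= 4095; }

/* Combine adjacent pairs  r[t]=r[s]+i ; r[t]=r[t]+j  into  r[t]=r[s]+(i+j).
   Returns the new number of RTLs in code[]. */
size_t optimize(Rtl *code, size_t n)
{
    bool changed = true;
    while (changed) {                         /* repeat until a fixed point */
        changed = false;
        for (size_t i = 0; i + 1 < n; i++) {
            Rtl *a = &code[i], *b = &code[i + 1];
            if (b->dst != a->dst || b->src != a->dst)
                continue;                     /* pattern does not apply */
            Rtl merged = { a->dst, a->src, a->imm + b->imm };
            if (!md_is_legal(&merged))
                continue;                     /* roll back: a and b stay as they were */
            *a = merged;                      /* commit the combined RTL */
            for (size_t k = i + 1; k + 1 < n; k++)
                code[k] = code[k + 1];        /* delete b */
            n--;
            changed = true;
        }
    }
    return n;
}
```

The transformation itself never inspects the target; only md_is_legal does, which is the sense in which the slide calls transformations machine-level but machine-independent.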
Code improvement catalog
• Register assignment and allocation
• Common subexpression elimination
• Induction variable elimination
• Code motion
• Constant propagation
• Copy propagation
• Memory access coalescing
• Recurrence detection
• Instruction scheduling
• Dead code elimination
• Constant folding
• Loop unrolling
• Branch minimization
• Evaluation order determination
VPO Optimizations
• Common subexpression elimination
– Davidson, J. W. and Fraser, C. W., ‘Eliminating Redundant Object Code’, in Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, January 1982, pp. 128–132.
• Evaluation order determination
– Davidson, J. W., ‘A Retargetable Instruction Reorganizer’, in Proceedings of the SIGPLAN ’86 Symposium on Compiler Construction, 21(7), June 1986, pp. 234–241.
VPO Optimizations
• Link-time optimization
– Benitez, M. E. and Davidson, J. W., ‘A Portable Global Optimizer and Linker’, in Proceedings of the SIGPLAN ’88 Symposium on Programming Language Design and Implementation, June 1988, pp. 329–338.
• Memory access coalescing
– Davidson, J. W. and Jinturkar, S., ‘Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses’, in Proceedings of the SIGPLAN ’94 Symposium on Programming Language Design and Implementation, Orlando, FL, June 1994, pp. 186–195.
VPO Optimizations
• Code motion
– Benitez, M. E. and Davidson, J. W., ‘The Advantages of Machine-Dependent Global Optimization’, in Proceedings of the 1994 Conference on Programming Languages and Systems Architectures, Zurich, Switzerland, March 1994, pp. 105–124.
• Loop unrolling
– Jinturkar, S. and Davidson, J. W., ‘Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation’, in Proceedings of the 28th Annual IEEE/ACM International Symposium on Microarchitecture, Ann Arbor, MI, November 1995, pp. 125–132.
VPO Optimizations
• Branch minimization
– F. Mueller and D. B. Whalley, ‘Avoiding Conditional Branches by Code Replication’, in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 56–66.
– M. Yang, G. Uh, and D. Whalley, ‘Improving Performance by Branch Reordering’, in Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, June 1998, pp. 130–141.
VPO Optimizations
• Recurrence detection and optimization
– Benitez, M. E. and Davidson, J. W., ‘Code Generation for Streaming: an Access/Execute Mechanism’, in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 132–141.
Building VPO
[Figure: building VPO. A VPO generator takes a CSDL specification for the target (SPARC, MIPS, Alpha, or i486) together with the ZIFLow analysis and transformation libraries (register allocation, access coalescing, common subexpression elimination, evaluation order determination, induction variable elimination, instruction scheduling, code motion, SSA computation, and any new transformation) and produces a VPO for that target, e.g. VPO for MIPS.]
CSDL: Computing System Description Language
• Computing System Description Language
– Modular system of components
– Allows applications to customize a description
– Easily extensible for adding new details
– Reusable/application independent
CSDL
[Figure: CSDL components arranged around the CSDL core: calling convention (CCL), memory system description (MSDL), pipeline description (PLUNGE), instruction representation (SLED), instruction semantics (λ-RTL), and object-file format.]
Zephyr Compilers
• EDG SUIF-to-VPO Compiler
– Five targets (SPARC, Pentium, Alpha, MIPS, SimpleScalar)
[Figure: source code → EDG front end → SUIF pass 1 ... SUIF passes → SUIF-to-LIRA → LIRA-to-SPARC → RTL (SPARC) → VPO (SPARC) → target machine code]
Zephyr Compilers
• EDG-to-VPO C++ compiler
– Funded by Edison Design Group
– Targeted to SPARC only
– Compiles all benchmark suites (SPEC, PGI, lcc)
– Code generator (translator from EDG intermediate representation to RTLs) provided as a literate program
Zephyr Compilers
• lcc-to-VPO C compiler
– Targeted to SPARC, X86, MIPS, ALPHA, and SimpleScalar
– Code generators (translators from LIRA to target-machine RTLs) provided as literate programs
– Currently producing good code; some optimizations are not fully implemented/debugged
SPEC results for SPARC
Benchmark   gcc -O   lcc    vpolcc
go          13.4     6.45   11.0
m88ksim     5.70     4.98   6.2
li          8.98     5.93   7.48
compress    11.6     9.0    9.28
ijpeg       8.79     5.54   8.6
perl        12.3     9.2    10.2
vortex      10.7     8.27   11.2
Acknowledgements
• This work has been funded by:
– Defense Advanced Research Projects Agency
– National Science Foundation
– Panasonic AVC Labs
– Edison Design Group
Afternoon Schedule
Time        Talk
1:30-2:00   ASDL: Dan Wang
2:00-2:55   Using Zephyr for PL Research: Kevin Scott
            – The VPO Code Generation Interfaces
            – LIRA: The lcc intermediate representation
            – SUIF-to-LIRA
2:55-3:15   Using Zephyr for Architecture Research: Jason Hiser and Chris Milner
            – Introduction
            – Handling a target machine’s calling convention
3:15-3:30   Break
3:30-4:30   Using Zephyr for Architecture Research (continued): Jason Hiser and Chris Milner
            – Writing a VPO machine description (md.y)
            – Writing a VPO register specification (regs.rt)
            – EASE: Environment for Architecture Study and Evaluation
            – Case Study: Targeting SimpleScalar
4:30-5:20   Using Zephyr for Optimization Research: Jack Davidson
            – Introduction to VPO’s optimization structure
            – Adding a new optimization to VPO
5:20-5:40   Zephyr support tools: Raja Venkateswaran
            – VET: Observing and debugging VPO
            – VPOISO: Isolating optimization errors
5:40-6:00   Wrap up and Open Discussion
Using Zephyr for Programming Language Research
Kevin Scott
University of Virginia
Overview
• Zephyr organization and philosophy
• VPO code generation interfaces
• Adding a new front-end to Zephyr:
– Using the Lira intermediate representation
– With a custom code expander using the VPO code generation interfaces
• Language-related issues in retargeting Zephyr
• Q & A
What is Zephyr?
• Set of tools for generating and optimizing RTL programs
– VPO (Very Portable Optimizer)
• SPARC, Alpha, x86, MIPS, SimpleScalar (PISA)
– Code Expanders
• Turn a front-end’s IR into RTLs
– Glue for hooking front-ends up to VPO
• VPO code generation interfaces
• Lira IR
– Debugging tools
• VET – interface for controlling and visualizing VPO transformations
• vpoiso – isolates optimizer bugs
National Compiler Infrastructure
[Figure: the National Compiler Infrastructure. Front ends: SML/NJ, EDG C++, Ada95, DEC FORTRAN, Java, lcc, IBM VisualAge C++ (the legend marks some items as optional). Components: SUIF, MachSUIF, SUIF-to-VPO bridge, VPO. Targets: Alpha, SPARC, MIPS, x86. Optimizations shown: interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation, instruction selection, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining. The diagram is divided into the SUIF infrastructure and the Zephyr infrastructure.]
Why use Zephyr?
• You’re a language researcher
– Easy to hook a front-end up to VPO
– Relatively little effort required to get multiple targets
– VPO is a very good optimizer
• Wide range of existing operations
• Leverage work of others contributing new optimizations to VPO
– Lets you concentrate on front-end issues
– Less work than writing a VPO-quality optimizer yourself
Zephyr Organization
[Figure: front ends (lcc, EDG, SUIF, VPCC) feed code expanders: Lira code expanders (SPARC, MIPS, Alpha, x86), EDG code expanders (SPARC), and CVM code expanders (SPARC, x86, MIPS). VPOi and VPOasm sit in front of VPO.]
Four Front Ends
• VPCC – A K&R C compiler
– IR is code for a C virtual machine (CVM)
– Deprecated in favor of lcc front-end
• EDG – Edison Design Group C/C++
– Very flexible IR
• lcc – Retargetable C compiler
– Simple backend emits Lira, an IR based on lcc trees
• SUIF 2.1
– High-level optimizations and analyses
– suif2lira pass transforms SUIF IR into Lira
Code Expanders
• CVM Code Expanders
– SPARC, x86, MIPS
– Generate encoded RTL files directly; don’t use VPOi or VPOasm
• EDG Code Expanders
– SPARC
– First expander to use VPOi and VPOasm interfaces
Lira Code Expanders
• Targets
– SPARC
– X86
– Alpha
– MIPS32
– MIPS64 and SimpleScalar (PISA)
• Input Lira code specialized for target
• Output encoded RTLs for VPO
• All use the VPOi and VPOasm interfaces
VPOi
• VPOi provides a C interface for:
– Creating RTLs
– Sending RTLs to VPO for optimization
• Abstracts away specifics of:
– RTL representation
– How RTLs are sent to VPO
• RTL creation routines can be semi-automatically generated from a machine specification
VPOasm
• VPOasm provides a C interface for sending assembly language statements to VPO.
• Allows a code expander to:
– Change segments
– Define symbols
– Initialize storage locations
– Specify alignments for code or data
More on VPOi and VPOasm
• Why use these interfaces?
– Simpler than writing out VPO encoded RTL files manually.
– Can get some of the implementation for free if doing a new target architecture.
– Allows us to change RTL and assembly language representations w/o fouling you up. Much.
• Reference manual for VPOi and VPOasm:
– http://www.cs.virginia.edu/zephyr/vpoi
VPOi and VPOasm caveats
• Interfaces are written in C.
– Bad if you’re writing a code expander in a language with no mechanism for calling C functions.
• Interfaces are relatively rigid.
– Suppose you want to communicate something to the optimizer that doesn’t look like an RTL or assembly language.
• Interfaces have only been tested on C/C++ front ends.
– Might have to change to accommodate new language features…
Lira
• Simple IR based on lcc trees
• Targets a stack-oriented virtual machine
• Two types of entities in a Lira file:
– Instructions
– Directives
Lira Instructions
• Instruction is composed of:
– Operator (33)
– Type
• F (float), I (signed integer), U (unsigned integer), P (pointer), V (void), B (aggregate)
– Size
• 1, 2, 4, 8, …
– Auxiliary info
Operators:
CALL   GE    MOD  ADD    CVF
ARG    EQ    LSH  NEG    BCOM
NE     ASGN  DIV  INDIR  CNST
LABEL  LT    SUB  BXOR   CVU   ADDRL
JUMP   LE    RSH  BOR    CVP   ADDRG
RET    GT    MUL  BAND   CVI   ADDRF
Lira Instruction Example
• C Fragment
int a;
a = a + 10;
• Lira Translation
ADDRGP4 “a”
INDIRI4
CNSTI4 10
ADDI4
ADDRGP4 “a”
ASGNI4
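To see why this sequence computes a = a + 10 on a stack-oriented machine, here is a minimal C sketch of an evaluator for just these five operators. It is an illustration only: LiraInsn, Cell, and lira_run are hypothetical names, and the operand order assumed for ASGNI4 (address on top of the stack, value beneath it) is inferred from the order shown in the translation, not taken from a Lira specification.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ADDRGP4, INDIRI4, CNSTI4, ADDI4, ASGNI4 } LiraOp;

/* Hypothetical instruction record: addr is used by ADDRGP4, value by CNSTI4. */
typedef struct { LiraOp op; int *addr; int value; } LiraInsn;

/* A stack cell holds either an integer or the address of one. */
typedef union { int i; int *p; } Cell;

void lira_run(const LiraInsn *code, size_t n)
{
    Cell stack[32];
    size_t sp = 0;                      /* number of cells in use */
    for (size_t k = 0; k < n; k++) {
        switch (code[k].op) {
        case ADDRGP4: assert(sp < 32);  stack[sp++].p = code[k].addr;  break;
        case CNSTI4:  assert(sp < 32);  stack[sp++].i = code[k].value; break;
        case INDIRI4: assert(sp >= 1);  stack[sp - 1].i = *stack[sp - 1].p; break;
        case ADDI4:   assert(sp >= 2);  sp--; stack[sp - 1].i += stack[sp].i; break;
        case ASGNI4:  /* assumed order: address on top, value beneath it */
                      assert(sp >= 2);  sp -= 2; *stack[sp + 1].p = stack[sp].i; break;
        }
    }
}
```

Running the six instructions above with a initially 5 pushes the address of a, replaces it with 5, pushes 10, adds to get 15, pushes the address again, and stores 15 back into a.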
Lira Directives
• Change program segments with:
– code, data, bss, lit
• Specify alignment with:
– align
• Control symbol visibility with:
– import, export
• Initialize storage locations with:
– bytes, string, address, skip
Lira Directives (cont)
• Indicate procedure boundaries with:
– proc, endproc
• Describe procedure locals and parameters with:
– local, param
• Describe source coordinates with:
– file, line
Lira Directive Example
• Reserving storage for a global int “a”
-bss
-export a
-align 4
+LABELI4 “a”
-skip 4
The truth about Lira
• Lira can be emitted from lcc using a postorder walk of lcc trees. Almost.
• Typical case:
lcc tree:
ADDI4
  INDIRI4
    ADDRGP4 “a”
  CNSTI4 10
Emitted Lira (postorder):
ADDRGP4 “a”
INDIRI4
CNSTI4 10
ADDI4
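The typical case is an ordinary postorder walk: emit both children, then the node itself. A short C sketch, assuming a hypothetical binary Node type that carries the instruction text (lcc's real tree nodes are richer than this):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tree node: instruction text plus up to two children. */
typedef struct Node {
    const char *text;
    const struct Node *left, *right;
} Node;

/* Append the postorder listing of the tree to out, one instruction per line.
   out must already hold a NUL-terminated string shorter than cap. */
void emit_postorder(const Node *n, char *out, size_t cap)
{
    if (n == NULL)
        return;
    emit_postorder(n->left, out, cap);
    emit_postorder(n->right, out, cap);
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s\n", n->text);
}
```

Applied to the tree above, the walk visits ADDRGP4, INDIRI4, CNSTI4 10, and finally ADDI4, which matches the emitted order.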
The truth about Lira (cont)
• Sometimes, we don’t do a postorder traversal:
[Figure: the lcc tree for a = a + 10 alongside the emitted Lira]
Emitted Lira:
ADDRGP4 “a”
INDIRI4
CNSTI4 10
ADDI4
ADDRGP4 “a”
ASGNI4
The truth about Lira (cont)
• A Lira program is specialized to the compilation target.
– Types, sizes and alignments are target specific
– Front-end must generate appropriate target dependent code for accessing the components of aggregates (arrays and structs)
Lira Code Expander
• Structured for simplicity.
• Code is generated by a big switch statement.
• Two passes made over the input.
– First gathers symbol information.
– Second generates code.
• SPARC expander is about 1800 lines of C. Close to ½ of the code is machine independent or easily reused on new targets.
Retargeting Lira code expander
• Three big tasks:
– Modify dumptree to map Lira ops onto RTLs for the new target. Easiest of the three since there is substantial opportunity for cut & paste coding.
– Modify sp_call to emit target dependent RTLs. On the SPARC we emit the following when the caller returns a struct:
VPOi_rtl(ST(tmp_loc, sp_plus(r[14], SP_OFS-4)),
         VPOi_locSetBuild(tmp_loc, 0));
Retargeting Lira code expander
• Modify setup_frame to:
– Use right offsets for parameters and locals.
– Emit RTLs to do target dependent frame setup on procedure entry. For procedures returning a struct on the SPARC, we emit:
VPOi_rtl(LD(sp_plus(r[30], SP_OFS-4),tmpreg), 0);
locaddr = sp_plus_ra(r[30], locals.t[0].sym, 0);
VPOi_rtl(ST(tmpreg, Rtl_fetch(locaddr, 32)),
VPOi_locSetBuild(locaddr, tmpreg, 0));
Why use Lira?
• Lira is a pretty good intermediate language for C-like languages. (Thanks to Chris Fraser and Dave Hanson!)
– Abstracts away specifics of a target’s calling sequence! Left to code expander to implement.
• Separating Lira from lcc means that we can reuse the Lira code expanders for front-ends other than lcc. E.g., SUIF.
• Very easy to write a Lira code expander.
Lira References
• “A Retargetable C Compiler: Design and Implementation”
• lcc version 4.1 code generation interfaces
– http://www.cs.princeton.edu/software/lcc/pkg/doc/4.html
• More on the way…
Adding a front-end to Zephyr
• Is your language C-like?
– If yes then consider writing code to map your IR onto Lira. This gets you all of Lira’s targets almost for free.
– If no then you might need to write a code expander for each target you want to support.
Adding a front-end to Zephyr
• Is my target already supported?
– If yes then you’re golden.
– If no then you may have to do one or more of the following:
• Create VPOi and VPOasm interfaces for your target. This can be partially automated.
• Write a Lira code expander for the new target, or
• Write a custom code expander for the new target.
• Port VPO to the new target.
Adding a front-end using Lira
• Difficulty depends on your IR.
– Trivial for lcc – almost same IR!
– Pretty easy for SUIF. E.g.
void Translator::trans(BinaryExpression exp) {
  int lira_op;
  // Emit both operands first, then the operator.
  translate(exp->get_source1());
  translate(exp->get_source2());
  switch(op_map(exp->get_opcode())) {
    case SOP_add: lira_op = LIRA_ADD; break;
    ...
  }
  emitter->emit(lira_op, lira_map_ty(exp->get_result_type()));
}
Where can I find out more?
• Should be releasing suif2lira as a literate program around July 1.
– Good starting point for someone familiar with SUIF wanting to hook up a front-end with Lira.
• Literate source for SPARC and x86 Lira code expanders will be available immediately after PLDI.
Adding a front-end using a custom code expander
• Difficulty again depends on your IR.
• Refer to EDG SPARC code expander:
– http://www.cs.virginia.edu/zephyr/dist/edg-sparc-1.0.pdf
Language issues in retargeting Zephyr
• Calling convention
– In addition to emitting RTLs to properly handle language calling conventions on function calls and function entry, also need to consider fixentry in VPO.
– fixentry finalizes a procedure’s prologue after optimization is complete.
– More in next talk.
Using Zephyr for Architecture Research
Jason Hiser and Chris Milner
University of Virginia
A Brief Introduction to Zephyr and Architectural Research
Jason Hiser
University of Virginia
Roadmap
• Handling a machine’s calling convention
– Jason
• Break
– Coffee!
• Writing a VPO machine description and writing a VPO register description
– Chris Milner
• Case Study: Targeting SimpleScalar
– Jason
Handling a Machine’s Calling Convention
fixentry fun (regs.c)
Jason Hiser
University of Virginia
Introduction To regs.c
• Fixentry: The main routine of regs.c
– Responsibilities of fixentry
• Parameters, external and global data used in fixentry
• Other functions: regarg, initmap, map, transfer, leaf
Responsibilities of Fixentry
• Calculate stack space needed
– outgoing parameters, spill locations, local variables, saved registers, and incoming parameters
• Emit function prologue
– adjust stack pointer
– save return address and saved registers
– add RTLs for local equates
Fixentry Responsibilities (continued)
• Create and maintain a “mapping” from the registers used to the actual hardware registers
• Save/restore necessary registers and incoming parameters to stack
• Emit function epilogue (including code to restore saved registers)
Not the responsibility of Fixentry
• Perform any optimization
• Insert spill code
• Make decisions about register usability
• Emit assembly code for any instructions
• Set up registers/stack for making a function call
• Allocate global data
Extern Variables (Where fixentry gets its data)
• struct bblock *top: list of basic blocks in current function
• struct locuse *locs: local variables and parameters
• int isused[MAXREGS]: which registers are used and which aren’t
• int varargs: is this a variable argument function?
Parameters to Fixentry
• struct list *ptr: the RTLs in the current function
• struct blist *retb: the basic blocks that need epilogue code
Global Variables
• int gpregmap[]: the “mapping” of the general purpose registers
• int fpregmap[]: the “mapping” of the float registers
• int spilloff: information to the code emitter about where to place spill variables
Calculating Stack Space
• Loop through RTLs and find out how much space is needed for outgoing params
• Loop through temps and calculate spill space needed
• Loop through locals and calculate local space needed
Calculating Stack Space (cont.)
• Loop through registers and find out which ones need to be saved
• Determine space needed for incoming parameters (register params only)
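The stack-space calculation amounts to summing the regions listed on these two slides and rounding up to the stack alignment. The C sketch below is a simplified illustration, not the code in regs.c: FrameNeeds, saved_register_bytes, frame_bytes, and the callee_saved array are hypothetical, and only isused[] mirrors a name from the slides.

```c
#include <stddef.h>

/* Hypothetical summary of the five regions fixentry accounts for, in bytes. */
typedef struct {
    size_t outgoing_params;
    size_t spills;
    size_t locals;
    size_t saved_regs;
    size_t incoming_reg_params;
} FrameNeeds;

/* A register needs a save slot if the function uses it and the (assumed)
   calling convention marks it as callee-saved. */
size_t saved_register_bytes(const int isused[], const int callee_saved[],
                            size_t nregs, size_t reg_bytes)
{
    size_t bytes = 0;
    for (size_t r = 0; r < nregs; r++)
        if (isused[r] && callee_saved[r])
            bytes += reg_bytes;
    return bytes;
}

/* Total frame size, rounded up to a multiple of stack_align. */
size_t frame_bytes(const FrameNeeds *f, size_t stack_align)
{
    size_t total = f->outgoing_params + f->spills + f->locals
                 + f->saved_regs + f->incoming_reg_params;
    return (total + stack_align - 1) / stack_align * stack_align;
}
```

For example, 16 bytes of outgoing parameters, 8 of spills, 12 of locals, and two saved 4-byte registers total 44 bytes, which an 8-byte stack alignment rounds up to 48.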
Emitting Prologue and Epilogue
• Prologue
– Emit code to adjust stack pointer
– Emit code to spill return address and saved regs
• Epilogue
– For each exit block
• Restore spilled registers
• Restore stack pointer
• Jump to return address
Register Map
• Register allocator determines which variables are in which register
– Fixentry needs to put these variables in the proper register.
• Fixentry attempts to map registers so no movements are necessary, overriding the allocator assignment policy
– If it can’t, register to register moves are necessary
Other Functions of regs.c
• regarg: Boolean function; returns true if a local variable is an argument and enters the function in a register
• initmap: initializes the gpregmap and fpregmap
• map: returns the mapping for a register
Other Functions of regs.c (continued)
• transfer: creates a transfer RTL from two machine locations (memory, register, or spill)
• leaf: Boolean function; determines if a function is a leaf
Summary
• Fixentry is the main portion of regs.c
• Fixentry is responsible for
– function prologue
– function epilogue
– register mapping to avoid register to register moves
• Regs.c also contains a few functions to let other areas know about the mapping.
Using Zephyr for Architecture Research
(continued)
Jason Hiser and Chris Milner
University of Virginia
Writing a VPO Machine Specification
Chris Milner
University of Virginia
Outline of talk
• Structure of VPO
• Machine descriptions
• How to construct the descriptions
• Getting machine dependent information for machine independent transformations
– combiner
– loop (and other) transformations
– scheduler
• EASE
Structure of VPO
[Figure: structure of VPO. Machine independent source (C code for CSE, strength reduction, dead code elimination, ...) and machine dependent source (C code in simp.c, rtl.c, and sched.c, plus C code generated from the register description reg.rt by the register processor regtool, from the instruction description md.y by the instruction processor yyfast, and from the pipeline description pipe.pg by a pipeline processor, "real soon now") are compiled by the C compiler into the VPO optimizer. The optimizer contains machine independent routines (combiner(), loop_strength()) and machine dependent routines (inst_is_legal(), is_basic()).]
VPO
• “Machine independent” transformations on low level “machine dependent” intermediate form (register transfer lists)
• Retargeted portion assists in:
– recognizing legal RTLs
– converting and inserting RTLs to assist transformations
– picking apart RTLs to get information
Role of Machine Descriptions
• md.y - legal instructions
– maintains VPO invariant
– YACC grammars
• regs.rt - register file
– register types
– alignment
– size
– ABI
md.y
• RTL recognizer
– Workhorse
– RTLs come from combiner (at compile time)
– ours are not the usual table-driven ones but directly executable (yyfast)
• How do you do it?
– Work from existing ones (derive Alpha from MIPS); or,
– construct one anew
Sample machine
• Subset SIMPLESCALAR
– e.g. student project on FPGA
– load/store
– chars, half words and words
– constants must be loaded into registers
– add, and, not, sll, sra, srl
– branch on less than, branch on equal, jump, call, return
Constructing md.y (continued)
• Operands - registers
%token REG0 REG1 REG2
(scanner converts 'b' '[' '1' ']' to REG0)
reg: REG0
   | REG1
   | REG2
Constructing md.y (continued)
• Operands - memory
%token BMEM WMEM RMEM
(scanner converts 'B' '[' to BMEM)
mem: BMEM reg ']'
   | WMEM reg ']'
   | RMEM reg ']'
Constructing md.y (continued)
• Operands - misc
%token PC RT ST (used for call and return)
%token LOCAL GLOBAL CON LBL
expr: LOCAL
    | GLOBAL
    | CON
    | LBL
Constructing md.y (continued)
• Operations
%left '=' '+' '&' '"' '{' '}'
%nonassoc '~' ','
rhs : reg '+' reg
    | reg '&' reg
    | reg '{' reg
    | reg '}' reg
    | reg '"' reg
Constructing md.y (continued)
• Binary operations
binops: reg '=' rhs
• Unary operation
not: reg '=' '~' rhs
Constructing md.y (continued)
• Load, load immediate and store
l : reg '=' mem
li: reg '=' expr
s : mem '=' reg
si: expr '=' reg (FORTRAN)
Constructing md.y (continued)
• Branch
bb: PC '=' reg ':' reg
  | PC '=' reg '<' reg
• Jump, call and return
jmp: PC '=' reg
jal: ST '=' expr
ret: PC '=' RT
Constructing md.y (continued)
• All instructions
inst: bb | jmp | jal | ret
    | binops | not
    | l | li | s
• Now, we need some glue and some checking
Glue for parser
• Build up semantic records
• Found in isem.c
– addr() - record for addressing mode
reg: REG0 {$$=addr(BYTE,BREGISTER…)}
– memref() - record for memory access
– brecord() - record for binary op
– rrecord() - record for relational op
– same() - ensure records are same
Semantic routines
• inst.c
– each instruction or instruction class has a routine
– routine checks for legal operands
– is responsible for emitting legal asm
– e.g. bb()
• on MIPS, check the semantics for compare and branch
• if right hand operand is immediate, use immediate form of instruction
• records instruction type
Structure of VPO (again)
[Figure: structure of VPO, repeated. Machine independent and machine dependent C source, including code generated from reg.rt (regtool), md.y (yyfast), and pipe.pg, is compiled into the VPO optimizer.]
regs.rt
• TYPES
– basic types of registers on the machine
– byte, half, word, float, double
– BTREG, WTREG, RTREG, FTREG, DTREG
• CODES
– condition codes
– IC, FC, etc.
regs.rt (continued)
• CLASS
– general_purpose, float, spill
– number
– scratch
– reserve
regs.rt (continued)
• CLASS (continued)
– type
• alignment (even-odd register pairs)
• size - how many to allocate
• invariant - mark as invariant for loops
– e.g. fp and sp
• memchar, regchar - give it a different name
• stack, fifo - tells the allocator about them
regs.rt for MIPS
types BTREG, WTREG, RTREG, FTREG, DTREG
codes FC
class = general_purpose
number = 32
scratch = 2..15, 24, 25
reserve = 0, 1, 26, 27, 28, 29, 31
(notes: MIPS - reg 0 is zero, reg 1 is asm reg, reg 26, 27 are used by os, reg 28 is gp, reg 29 is sp, reg 31 is return address)
regs.rt for MIPS (continued)
type = RTREG
alignment = 1
size = 1
invariant = 28, 29
endtype
type = BTREG, WTREG
alignment = 1
size = 1
endtype
regs.rt for MIPS (continued)
class = floating_point
number = 16
scratch = 0..9
type = FTREG, DTREG
alignment = 1
size = 1
endtype
endclass
regs.rt for MIPS (continued)
class = SPILL
number = 32
type = BTREG, WTREG, RTREG, FTREG
alignment = 1
size = 1
endtype
type = DTREG
alignment = 2
size = 2
endtype
endclass
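As an illustration only, the general_purpose class above could be held in memory as a pair of bitmasks. The struct, the function names, and the reading of "scratch" as "not preserved across calls" are assumptions made for this sketch; none of it is taken from the code that regtool generates.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one regs.rt class (not regtool output). */
typedef struct {
    int      number;   /* how many registers the class has */
    uint32_t scratch;  /* bit i set: register i is listed as scratch */
    uint32_t reserve;  /* bit i set: register i is listed as reserved */
} RegClass;

/* Set bits lo..hi (inclusive) in a 32-bit mask. */
static uint32_t reg_range(int lo, int hi) {
    uint32_t m = 0;
    for (int r = lo; r <= hi; r++)
        m |= (uint32_t)1 << r;
    return m;
}

/* Build the MIPS general_purpose class from the numbers on the slide. */
RegClass mips_general_purpose(void) {
    RegClass c;
    c.number  = 32;
    c.scratch = reg_range(2, 15) | reg_range(24, 25);
    c.reserve = reg_range(0, 1) | reg_range(26, 29) | reg_range(31, 31);
    return c;
}

/* A register can be handed out by the allocator if it is not reserved. */
int reg_is_allocatable(const RegClass *c, int r) {
    return r >= 0 && r < c->number && !((c->reserve >> r) & 1u);
}

/* Assumed meaning: scratch registers need not be saved across calls. */
int reg_is_scratch(const RegClass *c, int r) {
    return r >= 0 && r < c->number && ((c->scratch >> r) & 1u);
}
```

With these masks, register 29 (sp) is never allocatable, while register 16 is allocatable but not scratch.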
Structure of VPO (again)
[Figure: structure of VPO, the same diagram as on the earlier slide of this title: machine-independent and machine-dependent C source, plus the register (reg.rt), instruction (md.y), and pipeline (pipe.pg) descriptions, feed the C compiler that builds the VPO optimizer.]
Other files
• simp.c - helps the combiner
• sched.c - machine-specific portion of scheduling
• rtl.c - routines to find machine idioms in transformations
simp.c
• Combine RTLs in a machine-dependent way
• e.g. SPARC
    1     r[35]=~r[35]
    2 {1} r[33]=r[33]&r[35]
  combines to
    r[33]=r[33]&~r[35]
  semantically OK, but not an instruction
  comp() makes machine idiom substitution
    r[33]=r[33] ANDNOT r[35]
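The idiom substitution above can be pictured as a pattern match on a small expression tree. The following is a minimal sketch under stated assumptions: the Expr type and substitute_andnot() are hypothetical names, not VPO's RTL representation or its comp() routine, and only the shape x & ~y (NOT on the right operand) is recognized.

```c
#include <stddef.h>

typedef enum { OP_REG, OP_NOT, OP_AND, OP_ANDNOT } Op;

/* Hypothetical expression node, used only for this illustration. */
typedef struct Expr {
    Op op;
    int reg;             /* register number when op == OP_REG */
    struct Expr *left;   /* first operand (the only operand of OP_NOT) */
    struct Expr *right;  /* second operand of binary operators */
} Expr;

/* Rewrite x & ~y into the single-instruction form x ANDNOT y.
   Returns 1 if the substitution was made, 0 if the shape did not match. */
int substitute_andnot(Expr *e) {
    if (e == NULL || e->op != OP_AND)
        return 0;
    if (e->right == NULL || e->right->op != OP_NOT)
        return 0;
    e->op = OP_ANDNOT;
    e->right = e->right->left;  /* drop the NOT node, keep its operand */
    return 1;
}
```

Applied to the tree for r[33]&~r[35], the call turns the AND node into an ANDNOT node whose operands are r[33] and r[35].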
simp.c (continued)
• e.g. SPARC constants: 4095 is the biggest immediate
    1     r[40]=4095
    2 {1} r[41]=r[40]+13
  combines and folds to
    r[41]=4108
  comp() converts to
    r[41]=HI[4108]
    r[41]=r[41]|LO[4108]
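The arithmetic behind the HI/LO split can be checked with a few lines of C. This sketch assumes the usual SPARC convention (a 13-bit signed immediate field, HI as the upper 22 bits, LO as the low 10 bits); the function names are made up for the example and are not part of simp.c.

```c
#include <stdint.h>

/* Does v fit in a 13-bit signed immediate field (-4096..4095)? */
int fits_immediate(int32_t v) {
    return v >= -4096 && v <= 4095;
}

/* Upper 22 bits of v, left in place (low 10 bits cleared). */
uint32_t hi_part(uint32_t v) {
    return v & ~(uint32_t)0x3FF;
}

/* Low 10 bits of v. */
uint32_t lo_part(uint32_t v) {
    return v & (uint32_t)0x3FF;
}
```

For the folded constant 4108, which does not fit the immediate field, hi_part gives 4096 and lo_part gives 12, and OR-ing the two parts restores 4108.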
rtl.c
• Manipulate
  – reverse() - reverse a branch
  – dont_bother_with() - tell CSE to ignore
• Predicates
  – is_call(), is_rjmp(), ismem(), writes_mem()
  – is_pc(),
• Pick apart
  – findlabel(), usetype()
rtl.c (continued)
• Insert code to help transformations
  – store(), load()
  – multconst()
    • add series of shifts and adds
  – locsub() - substitute reg for mem
    • SPARC has sign extend on load
    • no single sign extend move
    • have to insert shifts to do sign extend
rtl.c (continued)
r[1] = 0
r[9] = r[14] + a
L32:
r[8] = r[1]*4
R[r[8]+r[9]]=0
r[1]=r[1]+1
IC=r[1]?100
PC=IC<0,L32
• regular induction variable
• induced expression
• basic induction variable
• Assist loop strength reduction
  – might be one instruction or several
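At the source level, the effect of loop strength reduction on a loop like the one above can be sketched in C. Both functions are hypothetical illustrations written for this note: the first recomputes the byte offset with a multiply on every iteration (like r[8] = r[1]*4), the second replaces the multiply with an addition on a running offset.

```c
#include <stddef.h>

/* Before: offset = i * sizeof(int) is recomputed each time round the loop. */
void clear_with_multiply(int *base, int n) {
    for (int i = 0; i < n; i++) {
        size_t offset = (size_t)i * sizeof(int);
        *(int *)((char *)base + offset) = 0;
    }
}

/* After: the offset itself becomes the induction variable and is
   advanced by a constant step, so no multiply remains in the loop. */
void clear_strength_reduced(int *base, int n) {
    if (n <= 0)
        return;
    size_t limit = (size_t)n * sizeof(int);
    for (size_t offset = 0; offset < limit; offset += sizeof(int))
        *(int *)((char *)base + offset) = 0;
}
```

Both versions store the same zeros; only the address computation inside the loop differs.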
sched.c
• SPARC - yes, MIPS - no
• Scheduler uses a mostly machine-independent list scheduling algorithm
• keeps machine-specific dependencies straight
• helps avoid hazards
sched.c (continued)
• md_sets_uses
  – what an instruction does
  – what an instruction is blocked by
  – reads can slide past reads, not past writes
      rtl->does |= READS
      rtl->blocks |= WRITES
  – writes cannot slide past anything
      rtl->does |= WRITES
      rtl->blocks |= WRITES | READS
sched.c (continued)
• md_sets_uses
  – condition code users can't slide past one another
      rtl->does |= ICWRITES
      rtl->blocks |= ICWRITES | ICREADS
    and
      rtl->does |= ICREADS
      rtl->blocks |= ICWRITES | ICREADS
  – calls are treated conservatively
    • assume codes, floats and memory written
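One way to read the does/blocks flags is as a simple reordering test: an instruction may slide past another only if nothing the other one does is in its own blocks set. The sketch below follows that reading. The flag names come from the slides, but the numeric values, the SchedInfo struct, and can_slide_past() are assumptions made for illustration and are not VPO's scheduler code.

```c
/* Flag names follow the slides; the values are arbitrary distinct bits. */
enum { READS = 1, WRITES = 2, ICREADS = 4, ICWRITES = 8 };

/* Hypothetical stand-in for the two fields set by md_sets_uses. */
typedef struct {
    unsigned does;    /* what the instruction does */
    unsigned blocks;  /* what the instruction is blocked by */
} SchedInfo;

SchedInfo mem_read(void)  { SchedInfo s = { READS,    WRITES };             return s; }
SchedInfo mem_write(void) { SchedInfo s = { WRITES,   WRITES | READS };     return s; }
SchedInfo cc_write(void)  { SchedInfo s = { ICWRITES, ICWRITES | ICREADS }; return s; }
SchedInfo cc_read(void)   { SchedInfo s = { ICREADS,  ICWRITES | ICREADS }; return s; }

/* 'later' may be moved above 'earlier' only if 'earlier' does nothing
   that 'later' is blocked by. */
int can_slide_past(const SchedInfo *later, const SchedInfo *earlier) {
    return (later->blocks & earlier->does) == 0;
}
```

Under this reading a memory read can pass another read but not a write, a write can pass neither, and two condition-code users never pass each other, which matches the rules listed above.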
sched.c (continued)
• sched_adv()
  – relative advantage or disadvantage of scheduling this instruction next
  – relative to last instruction scheduled
  – e.g. SPARC
    • space out float instructions
    • avoid consecutive stores
    • make consecutive instructions independent
EASE
• EASE: Environment for Architecture Study and Experimentation
  – VPO includes a facility for obtaining
    • Measurements of instruction usage
    • Instruction cache traces
    • Data cache traces
    • Precise timing
  – VPO provides facilities for emulating architectures
    • Can extend existing architectures
EASE (continued)
• Use control-flow graph to insert instrumentation code
• Low overhead (10 to 15%)
• Cache traces generated on the fly (no need to store)
[Figure: basic blocks of a control-flow graph, each with an inserted "Bump Counter" step.]
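The counter-bumping scheme can be sketched in a few lines. Everything below is a hypothetical illustration (the array, the function names, and the fixed block limit are not from EASE): one counter per basic block is incremented on entry, and the dynamic instruction count is recovered afterwards from the counters and the static block sizes.

```c
#define MAX_BLOCKS 64

/* Executions recorded per basic block. */
static unsigned long block_count[MAX_BLOCKS];

/* Stand-in for the instrumentation inserted at the top of block 'id'. */
void bump_counter(int id) {
    if (id >= 0 && id < MAX_BLOCKS)
        block_count[id]++;
}

/* After the run: sum over blocks of (times executed) * (block size). */
unsigned long total_instructions(const int *block_size, int nblocks) {
    unsigned long total = 0;
    for (int b = 0; b < nblocks && b < MAX_BLOCKS; b++)
        total += block_count[b] * (unsigned long)block_size[b];
    return total;
}
```

Because only one increment is added per block rather than per instruction, the measurement cost stays small, which is consistent with the low overhead quoted above.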
EASE (continued)
• Emulation of new architecture features
  – Add new instructions to machine description
  – Generate code and optimize as if new features exist
  – In last step of VPO, emit code to emulate new features
[Figure: two examples of the last step of VPO Mach. The RTL r[3] = r[3] + r[2] is emitted as "add r2, r3, r3". The RTL r[5] = r[5] + (r[3] * r[2]) is emitted as the sequence "mul r3, r2, r1" followed by "add r1, r5, r5".]
Case Study: Targeting SimpleScalar
Jason Hiser, University of Virginia
Introduction
• What is SimpleScalar? Why use it?
• Why use VPO with SimpleScalar?
  – SimpleScalar comes with gcc, why not use that?
• Experiences in porting VPO to SimpleScalar
• Research with SimpleScalar and VPO
What is SimpleScalar?
• SimpleScalar is a functional simulator designed for use with architectural research
  – sim-safe -- a simple, fast simulator
  – sim-bpred -- measures branch predictor statistics
  – sim-cache -- measures cache statistics
  – sim-outorder -- models a multi-issue, out-of-order superscalar processor
Why Use SimpleScalar?
• Easy to model many common architectural features
  – hybrid branch predictors, arbitrarily many functional units, much more
• Extendible instruction set -- PISA
  – Allows any instruction to be “annotated”
    • easy to create new instructions or add fields to old ones
• Comes with GNU tools for SimpleScalar
  – gcc, gas, gld, glibc, etc.
Why VPO and SimpleScalar? (Why not use gcc?)
• gcc does not generate instruction annotations
• difficult to write new optimizations to take advantage of new instructions
• just building gcc can be a challenge
Why VPO and SimpleScalar? (continued)
• Easily build VPO on any machine where you can build SimpleScalar
• Describe new instructions in the machine description and the optimizer will automatically use them when beneficial
• New optimizations can consult the machine description to see if architectural support is available
  – allows portability of optimizations
Experiences with Porting VPO to SimpleScalar
• PISA is basically MIPS
  – changes to some instruction formats
  – dmfc1 appears to be broken, negu not available, branch if (not) equal to zero instructions don’t exist
• Change instruction format in inst.c
• When compiling for SimpleScalar, tell the machine description that negu, beqz, bneqz and dmfc1 are not available
Research with SimpleScalar and VPO at UVa
• Idea
  – Compiler-managed on-chip memory can provide performance and power benefits
• Framework
  – Add instructions to move data to/from on-chip memory from/to registers
    • to VPO (in md.y, inst.c)
    • to SimpleScalar (machine.def)
  – Add optimization to promote variables from cache to on-chip memory
Summary
• SimpleScalar is a versatile functional simulator
• Porting VPO isn’t difficult
  – SimpleScalar target soon to be included with VPO
• VPO and SimpleScalar make a great vehicle for architectural research
Using Zephyr for Optimization Research
Jack Davidson, University of Virginia
VPO Logical Structure
[Figure: VPO logical structure. A VPO Generator takes a CSDL specification (SPARC, MIPS, ALPHA, or i486) together with the ZIF flow analysis and transformation libraries and produces a target optimizer such as VPO MIPS. Library components shown: Register Allocation, Access Coalescing, Comm. Subexpr. Elim., Eval. Order Determ., Induction Var. Elim., Instruction Scheduling, Code Motion, SSA Computation, plus a slot labeled New Transformation.]
Actual Structure
[Figure: VPO is divided into lib, SPARC, MIPS, X86, and ALPHA.]
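VPO Program Representation
[Figure: TOP points to a chain of BASIC BLOCK nodes; each basic block points to a chain of LIST nodes (RTL structs). Fields shown for a LIST node: RTL, COST, INST TYPE, USES, SETS, DEF/USE. Fields shown for a basic block: PREDS, IDOMS, DOM, NEST LVL, USES, DEFS, OUTS, PHI, REGSTATE.]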
VPO Optimizations
• Review vpo.h
VPO Optimization Algorithm
    repeat
        apply code-improving transformation
    until fixed-point reached or exhausted registers
• Maintaining two invariants
  – Semantic invariant (S)
    • Observable behavior of program unchanged (according to RTL semantics)
  – Machine invariant (M)
    • Every RTL equivalent to one machine instruction
VPO code optimization
• Each code-improving transformation is
  – machine-level, but
  – machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description
  – for each new RTL produced, ask MD if OK
  – if any is not a target machine instruction, roll back transformation
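The transform, check, and roll back cycle can be sketched with a toy model. All names here are hypothetical: "instructions" are reduced to the immediate operand of successive adds to one register, the machine description is a single range check, and the only transformation folds the first two adds into one. It is a sketch of the control structure under those assumptions, not VPO's driver.

```c
#define MAX_INSTS 16

/* Toy program: imm[i] is the constant added by the i-th instruction. */
typedef struct {
    int imm[MAX_INSTS];
    int n;
} Code;

/* Toy machine description: legal if every immediate fits 13 signed bits. */
int md_is_legal(const Code *c) {
    for (int i = 0; i < c->n; i++)
        if (c->imm[i] < -4096 || c->imm[i] > 4095)
            return 0;
    return 1;
}

/* Machine-independent transformation: r=r+a; r=r+b becomes r=r+(a+b).
   Returns 1 if it changed the code. */
int fold_first_pair(Code *c) {
    if (c->n < 2)
        return 0;
    c->imm[0] += c->imm[1];
    for (int i = 1; i + 1 < c->n; i++)
        c->imm[i] = c->imm[i + 1];
    c->n--;
    return 1;
}

/* Repeat until no transformation is accepted. Returns how many were kept. */
int optimize(Code *c) {
    int accepted = 0, changed;
    do {
        changed = 0;
        Code saved = *c;               /* snapshot for rollback */
        if (fold_first_pair(c)) {
            if (md_is_legal(c)) {      /* machine invariant still holds */
                changed = 1;
                accepted++;
            } else {
                *c = saved;            /* roll back the transformation */
            }
        }
    } while (changed);
    return accepted;
}
```

Starting from adds of 1000, 2000, and 3000, the first fold is kept, while the second would need an immediate of 6000 and is rolled back, leaving two instructions.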
VPO Optimization Driver
• Review vpo.c
Adding a new optimization
• Determine where in optimize to insert the function
  – What analyses does the optimization need?
    • Control-flow optimizations usually come first as they need very little data-flow information
    • Data-flow optimizations follow: code motion, induction-variable elimination, common subexpression elimination
  – Does the optimization operate on a single basic block or does it operate across basic blocks?
Adding a new optimization
• Browse controlflow.c/fix_control_flow()
• Browse cdmotion.c/code_motion()
Semantic Safe Points
• A semantic safe point is a point in the optimization process where the code satisfies the M and S invariants
  – Code can be emitted at any semantic safe point and it should run correctly
  – Can insert a new optimization at any semantic safe point
Debugging the compiler
[Figure: Source Code -> Front and Middle Ends -> RTL -> VPO Mach -> Machine Code. Inside VPO Mach the transformations Trans 1, Trans 2, Trans 3, Trans 4, ..., Trans n are applied in sequence.]
VET - VPO Examination Tool
• Allows transformations to be observed
  – Observe data structure (control-flow graph)
  – Set a break point at a transformation
  – Set a break point at a phase
  – Replay a transformation
VET and VPOISO
Raja Venkateswaran, UVA
VET
• VET -> VPO Examination Tool
• GUI for viewing optimizations
• By phase and by transformation
• Ability to revert to previous phases
• Wide range of user options
VPOISO
• Tool for isolating optimizer bugs
• Uses binary search to find the first transformation error
• Works by comparing against the correct output
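The binary search can be sketched as follows. The oracle type, the function names, and the demo oracle are hypothetical and are not VPOISO's interface. The sketch assumes that the program can be compiled with only the first k transformations enabled, that the output is correct for k = 0, and that once the output goes wrong it stays wrong as k grows.

```c
/* Hypothetical oracle: returns 1 if the program built with only the first
   'limit' transformations enabled still matches the correct output. */
typedef int (*OutputOk)(int limit, void *ctx);

/* Returns the 1-based index of the first transformation that breaks the
   output, or 0 if the output is correct with all 'total' enabled. */
int first_bad_transformation(int total, OutputOk ok, void *ctx) {
    if (ok(total, ctx))
        return 0;
    int lo = 0;       /* largest limit known to be good */
    int hi = total;   /* smallest limit known to be bad */
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Demo oracle for the example: ctx points at the index of the faulty
   transformation, so any limit below it is still correct. */
int demo_oracle(int limit, void *ctx) {
    return limit < *(int *)ctx;
}
```

With 100 transformations, the search needs about seven compile-and-compare runs rather than up to 100.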