DIXIE
Binary Translation and Optimization for Multiple ISAs
Computer Architecture Department
Universitat Politècnica de Catalunya-Barcelona
www.ac.upc.es/dixie
UPC people involved
Roger Espasa, Agustín Fernández, Manel Fernández, Victor Moya, Juan Lopez, Silvia Cernuda, Antonio Parada, Albert Ribé, Álex Ramírez
Dixie
Static binary translator: accepts multiple ISAs (Alpha, x86, PPC, Mips, Convex); translates to a common IR (Dixie ISA)
Static binary instrumentation: works on the common IR but reflects the source ISA
Static binary optimizer: optimizes the common IR; generates native code from the common IR; multiple targets also supported (Alpha, Mips)
Dixie Virtual Machine: can run binaries specified in the common IR; also runs binaries with a mixture of common and native code
Dixie overview
[Figure: block diagram. Components: DIXIEC, JANGO, SPEEDY, DVM (Dixie Virtual Machine). Labels: native ISAs (Alpha, Convex, PowerPC, x86, Mips), Dixie binary, user specification, user simulator, target ISAs (Alpha, Mips, ...), target binaries.]
Outline
Motivation
DIXIE Architecture
Debugging Tools
Performance
Summary
Binary Translation
For embedded processors: the embedded market is moving rapidly and changes processors frequently; software (development, porting) is a major cost issue; binary translation is cheaper than retargeting gcc
Goals: retargeting must be FAST and EASY; support different ISAs; provide good debugging tools, both to ease writing ISA descriptions and to verify the correctness of translations
Techniques: static translation (as much as possible); some dynamic translation (only if necessary)
Binary Optimization
Inevitably, binary translation introduces overheads
Use static and dynamic optimization to: adapt better to the new chip; offset the overheads of static binary translation
Goals: eliminate overheads due to the manual translation process and to the intermediate ISA's lack of expressiveness; incremental development of the optimizer
Techniques: static optimization (as much as possible); dynamic optimization (only if necessary); optimized blocks still run within the Virtual Machine
Instrumentation
Instrumentation of program binaries: for computer architecture research; due to lack of access to 'exotic' machines; the historical origin of Dixie
Many classes of tools exist, but: different tools for different machines; porting tools is difficult; few tools allow research on vector machines or new ISAs; lack of wrong-path information
Dixie goals: cross-platform instrumentation; research on multiple and discontinued ISAs; full architecture coverage; wrong-path information
Dixie compiler
[Figure: the Dixie overview block diagram, shown again under this slide title.]
Jango
[Figure: the Dixie overview block diagram, shown again under this slide title.]
Breakpoints: trace
Source instructions:
mov a0,a1
ld.w @8(a1),a2
sub.w #8,a2
After DIXIEC (Dixie code):
MOV.lo.32 r11,r10
LOAD.lo.32 r500,r11,#8
LOAD.lo.32 r12,r500,#0
SUB.c2.32 r12,r12,#8
After JANGO (Dixie code with trace instructions):
MOV.lo.32 r11,r10
TRACE vpc,r11,#8
LOAD.lo.32 r500,r11,#8
TRACE vpc,r500,#0
LOAD.lo.32 r12,r500,#0
SUB.c2.32 r12,r12,#8
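The trace breakpoint on this slide can be pictured as a small rewriting pass over the translated code. The sketch below is only an illustration in Python, not Jango's actual implementation: the function name `insert_traces` is hypothetical, the operand layout `dest,base,#offset` is inferred from the instructions on the slide, and placing each TRACE directly before its LOAD is an assumption.

```python
def insert_traces(dixie_code):
    """Return a copy of dixie_code with a TRACE placed before every LOAD.

    Assumption: LOAD operands have the form "dest,base,#offset", and the
    TRACE reports the same base register and offset that the LOAD uses.
    """
    instrumented = []
    for line in dixie_code:
        opcode, _, operands = line.partition(" ")
        if opcode.startswith("LOAD"):
            _dest, base, offset = operands.split(",")
            # Hypothetical placement: trace the address before the access.
            instrumented.append(f"TRACE vpc,{base},{offset}")
        instrumented.append(line)
    return instrumented


# The four Dixie instructions shown on the slide.
translated = [
    "MOV.lo.32 r11,r10",
    "LOAD.lo.32 r500,r11,#8",
    "LOAD.lo.32 r12,r500,#0",
    "SUB.c2.32 r12,r12,#8",
]
instrumented = insert_traces(translated)
```

Because the pass works on the common IR, the same sketch would apply whatever the source ISA was, which is the property the instrumentation slides emphasize.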
Speedy & DVM
Dixie binary is optimized by Speedy: optimizations at basic block (BB) level; translates Dixie BBs into native code; generates .speedy sections
Dixie binary is runnable on top of the DVM, which emulates the behavior of each Dixie instruction by: interpreting each Dixie instruction; jumping into sequences of "Speedy" BBs
The DVM interacts with the user simulator through trace instructions inserted by Jango
The DVM maps target system calls into host system calls through DixOS
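The mix of interpretation and jumps into "Speedy" basic blocks can be sketched as a dispatch loop. This is a toy model in Python under stated assumptions, not the DVM itself: the two-instruction IR (`ADDI`, `HALT`), the `run` function, and the representation of a pre-translated block as a Python callable that returns the next program counter are all hypothetical.

```python
def run(program, speedy_blocks, regs):
    """Execute program, preferring pre-translated blocks when one exists.

    speedy_blocks maps a program counter to a callable that performs the
    work of a whole basic block and returns the next program counter.
    Returns the number of instructions that had to be interpreted.
    """
    pc = 0
    interpreted = 0
    while pc < len(program):
        block = speedy_blocks.get(pc)
        if block is not None:
            pc = block(regs)          # jump into the "native" block
            continue
        op, *args = program[pc]       # otherwise interpret one instruction
        interpreted += 1
        if op == "ADDI":
            dst, src, imm = args
            regs[dst] = regs[src] + imm
            pc += 1
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return interpreted


program = [
    ("ADDI", "r1", "r1", 1),
    ("ADDI", "r1", "r1", 2),
    ("ADDI", "r2", "r1", 10),
    ("HALT",),
]


def bb0(regs):
    """Stand-in for an optimized version of instructions 0 and 1."""
    regs["r1"] += 3
    return 2


pure_regs = {"r1": 0, "r2": 0}
pure_count = run(program, {}, pure_regs)

mixed_regs = {"r1": 0, "r2": 0}
mixed_count = run(program, {0: bb0}, mixed_regs)
```

Both runs end with the same register contents; the mixed run simply interprets fewer instructions, which is the effect the optimized blocks are meant to have while still executing inside the virtual machine.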
DVM Portability
DVM runs on all major hardware combinations:
Little endian, 32 bits: x86 / Linux
Big endian, 32 bits: Power2 / AIX, Sparc / SunOS
Little endian, 64 bits: Alpha / OSF1, IA64 / Linux
Big endian, 64 bits: MIPS / IRIX
Speedy Architecture
Front End: understands the Dixie ISA; optimizes Dixie code (NOP, VPC, CSE); lowers the representation (loads virtual registers into physical registers, local register allocation, loads large constants into registers)
Back End: translates the Dixie ISA into the target ISA
Instruction translation: opcode selection; big/little endian memory access; alignment issues
Peephole Optimizer: recognizes instruction sequences; removes redundant loads; removes redundant branches
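As an illustration of the "remove redundant loads" step, the following Python sketch drops a load when the same register already holds the value from the same address within one basic block. It is a simplified model, not Speedy's code: the tuple encoding of instructions, the function names, and the conservative rule that any STORE invalidates all known loads are assumptions made for the example.

```python
def _invalidate(available, reg):
    """Forget every remembered load that depends on, or lives in, reg."""
    stale = [k for k, v in available.items() if v == reg or k[0] == reg]
    for key in stale:
        del available[key]


def remove_redundant_loads(block):
    """Peephole pass over one basic block.

    Instructions are tuples: ("LOAD", dest, base, offset),
    ("STORE", src, base, offset), or (op, dest, ...) for anything else.
    """
    available = {}  # (base, offset) -> register holding that memory value
    out = []
    for instr in block:
        op = instr[0]
        if op == "LOAD":
            _, dest, base, offset = instr
            if available.get((base, offset)) == dest:
                continue              # dest already holds this value
            _invalidate(available, dest)
            if dest != base:          # base still valid after the load
                available[(base, offset)] = dest
        elif op == "STORE":
            available.clear()         # conservative: memory may alias
        else:
            _invalidate(available, instr[1])
        out.append(instr)
    return out


block = [
    ("LOAD", "r500", "r11", 8),
    ("LOAD", "r12", "r500", 0),
    ("LOAD", "r500", "r11", 8),   # repeats the first load: removable
    ("SUB", "r12", "r12", 8),
    ("LOAD", "r12", "r500", 0),   # r12 was overwritten: must stay
]
optimized = remove_redundant_loads(block)
```

The pass keeps the final load because the SUB overwrote `r12`; only the exact repeat of the first load is removed.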
Debugging
Porting to a new ISA is not easy: many "cut-and-paste" bugs; a trivial bug may take weeks to find without appropriate tools
We would like developers to: "test-as-you-go" every instruction description; test each instruction almost in isolation; quickly compare DVM and native results
andiu. ra, rs, ui
MOV.lo.32 r(TMP0),ui
SHL.lo.32 r(TMP0),r(TMP0),32
AND.lo.32 (ra),r(rs),r(TMP0)
CMPLT.c2.32 r(ICR(POSCRI(0))),r(ra),0
CMPGT.c2.32 r(ICR(POSCRI(1))),r(ra),0
CMPEQ.lo.32 r(ICR(POSCRI(2))),r(ra),0
AND.lo.32 r(TMP0),r(XER),0x80000000
CMPNE.lo.32 r(ICR(POSCRI(3))),r(TMP0),0
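Testing one instruction description almost in isolation can be sketched as a differential check between a reference model and a model of the translated sequence. The Python below is an assumption-laden illustration, not the Dixie debugging tool: it assumes `andiu.` follows the PowerPC `andis.` definition (16-bit immediate shifted left by 16, CR0 set to LT, GT, EQ, SO), it models 32-bit registers with masking, and the names `reference_andis`, `translated_andis` and `find_mismatches` are hypothetical. The shift amount is left as a parameter so the harness can show how a wrong constant would surface.

```python
MASK32 = 0xFFFFFFFF
SO_BIT = 0x80000000  # summary-overflow bit tested in XER


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def reference_andis(rs, ui, xer):
    """Reference model: AND with the immediate shifted left by 16 bits."""
    result = rs & ((ui & 0xFFFF) << 16) & MASK32
    s = to_signed32(result)
    cr0 = (int(s < 0), int(s > 0), int(s == 0), int(xer & SO_BIT != 0))
    return result, cr0


def translated_andis(rs, ui, xer, shift):
    """Model of a MOV/SHL/AND/CMP sequence with a configurable shift."""
    tmp0 = ui & MASK32                    # MOV
    tmp0 = (tmp0 << shift) & MASK32       # SHL, truncated to 32 bits
    ra = rs & tmp0                        # AND
    s = to_signed32(ra)
    lt, gt, eq = int(s < 0), int(s > 0), int(ra == 0)   # CMPLT/CMPGT/CMPEQ
    so = int((xer & SO_BIT) != 0)         # AND with XER, then CMPNE
    return ra, (lt, gt, eq, so)


def find_mismatches(shift, cases):
    """Return the input cases where the two models disagree."""
    return [c for c in cases
            if reference_andis(*c) != translated_andis(*c, shift=shift)]


CASES = [
    (0xFFFFFFFF, 0x8000, 0),
    (0x12345678, 0x00FF, SO_BIT),
    (0x00000000, 0xFFFF, 0),
]
ok = find_mismatches(16, CASES)
bad = find_mismatches(32, CASES)
```

With a shift of 16 the two models agree on every case; with a shift of 32 the first two cases disagree, which is the kind of result a quick DVM-versus-native comparison is meant to expose.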
Performance
Benchmark suite: SPECint95
Environment: DEC Alpha AXP-21264 running at 625 MHz; OSF/1 v4.0
Two versions of the Dixie binaries: DVM ("pure" Dixie binaries); Speedy (Dixie binaries optimized using Speedy)
DVM slowdown
[Figure: bar chart of slowdown (axis from 0 to 150) for go, m88ksim, gcc, compress, li, ijpeg, perl and vortex; series: DVM and Speedy; Alpha on Alpha.]
Summary
Binary translation & optimization: are becoming important tools in the embedded market; promise lower development costs when changing architectures; are also of interest to major computer manufacturers (IA-64 emulation, Transmeta, FX!32 (now obsolete))
DIXIE: robust tool that meets most translation demands; multi-ISA, multi-platform