Post on 20-Dec-2015
EPIC Architectures and Compiler Technology, Wen-mei Hwu, IMPACT
EPIC Architectures
Wen-mei Hwu
Department of Electrical and Computer Engineering
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
IMPACT Group, http://www.crhc.uiuc.edu/IMPACT/
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
Vision: Bridging the Gap Between Programs and Hardware
if (x>=0)
  if (x==1 || x==2 || x==3)
    m=f(x);
  else
    m=g(x);
[Figure: the same decision shown two ways. Hardware view: blocks f and g, comparators (>=, =, =, =) testing x against 0, 1, 2, 3, an enable signal, and output m. Program view: a chain of branches x>=0, x!=1, x!=2, x!=3 with T/F edges leading to m=g(x) or m=f(x).]
Can we get the best of both worlds?
• Hardware
– highly speculative
– parallel in nature
– efficient logic manipulation
– special purpose
– area efficient
– energy efficient
• Programming
– conservative semantics
– sequential in nature
– awkward logic manipulation
– easily retargeted
– area inefficient
– energy inefficient
EPIC Design Objectives
• To define a programmable architecture model that allows compiled programs to approach special hardware design in
– logic manipulation capability
– speculation and parallelism
– chip area efficiency
– energy efficiency
Significant Milestones
• 1994 Intel/HP forms IA-64 alliance with U. of Illinois contribution
• 1997 Announcement of IA-64
• 1997 Motorola/Lucent forms StarCore alliance with U. of Illinois contribution
• 1998 major computer vendors adopt IA-64
• 1998 Announcement of StarCore
• 1999 Release of user mode architecture
Evolution of VLIW/EPIC
[Figure: timeline with Research and Product tracks, running from first-generation VLIW through second-generation VLIW to EPIC. Entries:]
• Microcode Compaction
• Floating-Point Systems
• Array Processors (Numerix, CDC, etc.)
• NYU Trace Scheduling (1976-1979)
• TRW Polycyclic Processor, Modulo Scheduling (1980-1982)
• Yale ELI, Bulldog Compiler (1980-84)
• Culler PSC (1983-1987)
• Cydrome Cydra 5 (1984-1988)
• Multiflow TRACE Series (1984-1990)
• Univ. of Illinois IMPACT (1987 - )
• HP Labs PlayDoh (1989 - )
• Intel/HP/Moto/Lucent EPIC
EPIC - the IMPACT Perspective
• IMPACT work done since 1987 to lay foundation for EPIC architectures
– Intel/HP IA-64, Motorola/Lucent StarCore
• Key Technologies
– control speculation [ISCA-91] [ASPLOS-92] [MICRO-96]
– data (dependence) speculation [ICS-92] [ASPLOS-94]
– predicated execution [MICRO-92] [ISCA-95] [MICRO-97]
– integrated architecture and inline recovery [ISCA-98]
– logic minimization approach to predication [ISCA-99]
– implementation neutral predication architecture [TBD]
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
Control Speculation
• Executing an instruction before knowing that its execution is required
• Moving an instruction above a branch
– Removes control dependences to increase ILP
– Wins when branch directions are predicted correctly
• Instruction sequence seen by hardware is changed!
– Must ensure that the execution result is unaffected by such movement
Control Speculation Example
Original code:
• A: r6 = r4+1
• B: if (r9==0) goto L1
• C: r1 = MEM(r2+0)
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• G: MEM(r2+r4) = r4
After speculative scheduling:
• C: r1 = MEM(r2+0)
• A: r6 = r4+1
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• B: if (r9==0) goto L1
• G: MEM(r2+4) = r4
Scheduling Error
• An ordering of instructions that will
– cause early program termination or
– produce results that differ from those of the unscheduled program.
• To avoid scheduling errors
– Live values must be properly preserved (register renaming)
– Spurious exception conditions must be suppressed
Safe Speculation
• Compiler analysis to identify
– instructions that are always safe.
– speculation that will not introduce a new exception.
• Trivial analysis examples:
– array references with constant indices
– divide and remainder with non-zero divisor
• Complex analysis examples:
– Branches to ensure legal input operands
– Earlier use of the same input operand
– Loop analysis
Silent Instructions
• Architecture provides silent versions of instructions that may potentially cause exceptions.
– Multiflow - silent FP instructions
– HPPA - silent FP instructions, silent dereference of a null pointer
– SPARC V9 - silent load instruction
• To move an instruction above a branch, convert it into its silent version.
• Both Multiflow TRACE and Cydrome Cydra-5 used similar ideas.
Silent Instructions
• Memory access instructions
– If a segmentation fault condition occurs, the instruction is canceled before it reaches the memory system. An arbitrary garbage value is returned.
– If a page fault happens without a segmentation fault, the OS page fault handler is immediately invoked as usual. Extra page faults may occur from speculation.
• Arithmetic instructions
– If a trap condition occurs, an arbitrary garbage value is deposited into the destination register.
• The exception condition is either immediately handled or simply ignored.
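The segmentation-fault case for a silent load can be sketched in a few lines of Python. This is only an illustration of the behavior described on the slide: the names `load`, `silent_load`, `SegmentationFault`, and `GARBAGE` are hypothetical, memory is modeled as a dictionary, and page faults are not modeled at all.

```python
# Hypothetical model of a trapping load and its silent counterpart.
GARBAGE = 0xDEADBEEF  # stand-in for "an arbitrary garbage value"


class SegmentationFault(Exception):
    """Stand-in for the hardware trap raised by an ordinary load."""


def load(memory, addr):
    """Ordinary load: traps when the address is not mapped."""
    if addr not in memory:
        raise SegmentationFault(hex(addr))
    return memory[addr]


def silent_load(memory, addr):
    """Silent load: the access is canceled and garbage is returned instead of trapping."""
    if addr not in memory:
        return GARBAGE
    return memory[addr]
```

A compiler that hoists `load` above a branch would emit `silent_load` in its place, so a bad address on the not-taken path no longer terminates the program.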
Debugging Implications
• If the speculated instruction should not have been executed:
– the garbage value generated by a silent instruction would not be used.
– the exception condition is correctly ignored since the silent instruction should not have been executed.
• If the branch agrees with compile-time prediction:
– the exception condition that occurred in a silent instruction is incorrectly ignored.
– the garbage value generated may be used by a subsequent instruction without warning.
– not acceptable if exceptions must be reported in a timely and accurate manner
Performance Issues
• Page faults caused by silent loads are handled right away
– no support to defer a page fault until execution of the instruction is confirmed.
– Additional page faults may result from speculation.
– The number of additional page faults should be small for systems that are designed not to page.
• Similar issues exist if TLB misses are handled through the exception mechanism.
Sentinel Scheduling
• Design Objective
– Correctly ignore exceptions generated by speculative instructions whose execution turns out to be unnecessary.
– Correctly report exceptions generated by speculative instructions whose execution is confirmed.
– Support recovery from exceptions thus reported.
– Provide the option to handle page faults after the need for executing a speculative instruction is confirmed.
– Minimize the extra hardware and instructions needed to achieve the objectives above.
Accurate Exception Report
• Each instruction has two parts:
– Non-excepting part which performs the actual operation
– Sentinel part that flags an exception if necessary
• Non-excepting part of I can be speculatively executed provided the sentinel part stays in I's home block
Sentinel Speculation Example
Original code:
• A: r6 = r4+1
• B: if (r9==0) goto L1
• C: r1 = MEM(r2+0)
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• G: MEM(r2+r4) = r4
After speculative scheduling:
• C: r1 = MEM(r2+0)
• A: r6 = r4+1
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• B: if (r9==0) goto L1
• sentinels B, C, D, E
• G: MEM(r2+4) = r4
Sentinel Elimination
• The sentinel of I can be eliminated if
– there is another instruction in I's home block which uses the result of I, OR
– I is non-excepting and is not the last direct or indirect use of an excepting instruction's destination
• Unprotected instruction - an instruction whose sentinel cannot be eliminated.
• If an unprotected instruction is speculated, an explicit instruction must be created to serve as the sentinel
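The first elimination condition is easy to check mechanically. The sketch below is a simplification and not the IMPACT algorithm: it applies only the "result is used later in the home block" test, ignores the second (non-excepting) condition, and does not account for a destination being redefined before its use. The function name `unprotected` and the tuple encoding of instructions are assumptions made for illustration.

```python
# Home block of the running example, instructions C through G.
# Each entry is (label, destination register or None for a store, source registers).
HOME_BLOCK = [
    ("C", "r1", ["r2"]),        # r1 = MEM(r2+0)
    ("D", "r3", ["r2"]),        # r3 = MEM(r2+4)
    ("E", "r4", ["r3"]),        # r4 = r3+1
    ("F", "r5", ["r1"]),        # r5 = r1+1
    ("G", None, ["r2", "r4"]),  # store, no destination register
]


def unprotected(block):
    """Return labels of instructions whose result no later instruction in the block uses."""
    labels = []
    for i, (label, dest, _) in enumerate(block):
        if dest is None:
            continue  # stores produce no register result to protect
        used_later = any(dest in srcs for _, _, srcs in block[i + 1:])
        if not used_later:
            labels.append(label)
    return labels
```

For this block only F is reported: r1 is consumed by F, r3 by E, and r4 by G, while nothing in the block reads r5, so an explicit sentinel on r5 is needed if F is speculated.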
Sentinel Speculation Example
Original code:
• A: r6 = r4+1
• B: if (r9==0) goto L1
• C: r1 = MEM(r2+0)
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• G: MEM(r2+r4) = r4
After speculative scheduling, with an explicit sentinel:
• C: r1 = MEM(r2+0)
• A: r6 = r4+1
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• B: if (r9==0) goto L1
• H: check r5
• G: MEM(r2+4) = r4
Architectural Support
• Additional bit in opcode field to specify speculative instruction.
– can be partially supported by adding speculative versions of all opcodes that should be considered for speculative scheduling and that can directly or indirectly cause exceptions.
• Exception bit (vector) added to each register to mark exceptions caused by a speculative instruction.
– These bits need to be preserved across context switches.
Execution Model
• Speculative instructions
– src(I).except = 0
  • I does not cause an exception: normal execution
  • I causes an exception
    – dest(I).except = 1
    – dest(I).data = pc of I
– src(I).except = 1 (exception propagation)
  • dest(I).except = 1
  • dest(I).data = src(I).data
Execution Model
• Non-speculative instructions
– src(I).except = 0
  • I does not cause an exception - normal execution
  • I causes an exception - I reported as source of exception
– src(I).except = 1
  • (report exception for speculative instruction)
  • signal exception
  • src(I).data is PC of exception
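The speculative and non-speculative rules can be read together as one small interpreter step over tagged registers. The following Python sketch is an assumption-laden illustration: `Reg`, `execute`, and `ReportedException` are hypothetical names, and Python's `ArithmeticError` and `LookupError` stand in for hardware trap conditions.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    exc: bool = False  # the per-register exception bit


class ReportedException(Exception):
    """Signaled exception; pc names the instruction reported as its source."""

    def __init__(self, pc):
        super().__init__(f"exception at pc {pc}")
        self.pc = pc


def execute(pc, op, srcs, dest, speculative):
    """Run one instruction under the sentinel execution model."""
    tagged = next((s for s in srcs if s.exc), None)
    if tagged is not None:
        if speculative:
            # Exception propagation: copy the excepting PC forward.
            dest.data, dest.exc = tagged.data, True
            return
        # A non-speculative use acts as the sentinel and reports the original PC.
        raise ReportedException(tagged.data)
    try:
        result = op(*(s.data for s in srcs))
    except (ArithmeticError, LookupError):
        if speculative:
            dest.data, dest.exc = pc, True  # defer: record own PC, set the tag
            return
        raise ReportedException(pc)
    dest.data, dest.exc = result, False
```

A speculative divide by zero at PC 100 therefore leaves its destination tagged with 100, a dependent speculative add carries that 100 along, and the first non-speculative consumer raises `ReportedException` with `pc == 100`.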
Scheduling Algorithm
• Identify unprotected instructions
• Perform conventional scheduling
– if an unprotected instruction is moved above a branch, an explicit sentinel instruction is inserted into the list of to-be-scheduled instructions
– Explicit sentinel restricted to remain in I's home block with control dependences
– All instructions moved above a branch are marked as speculative
Recovery from Exception
• Important to allow accurate handling of page faults and TLB misses.
• Issues:
– ensure that instructions can be retried after the exception condition is handled
– minimize the negative performance impact in terms of register pressure and instruction count due to recovery.
Recovery Block
• Copy speculative instructions into recovery blocks
– One entrance point per potential exception reported by a sentinel
– Code Expansion vs. Efficiency
– must provide a means to reach recovery block - explicit checks
• Source registers of the instructions not in the recovery blocks are not preserved.
• Instructions re-executed during recovery are reduced.
Recovery Block Example
Original code:
• A: r6 = r4+1
• B: if (r9==0) goto L1
• C: r1 = MEM(r2+0)
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• G: MEM(r2+r4) = r4
After speculative scheduling, with checks that branch to recovery blocks:
• C: r1 = MEM(r2+0)
• A: r6 = r4+1
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• F: r5 = r1+1
• B: if (r9==0) goto L1
• H: check r5, L2
• I: check r4, L3
• G: MEM(r2+4) = r4
Recovery Block Example
• Recovery Block for C
• L2:
• C: r1 = MEM(r2+0)
• F: r5 = r1+1
• Recovery Block for D
• L3:
• D: r3 = MEM(r2+4)
• E: r4 = r3+1
• G: MEM(r2+4) = r4
Multiple Exceptions
• Different basic blocks
– first sequential exception always reported since check instruction guaranteed to remain in home block of each potential trap-causing instruction
• Same basic block
– An exception will be signaled but no guarantee it will be the first according to original source code
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
Predicated Execution
• Conditional execution of instructions based on a Boolean source operand
• Execution model
– Load r1, r2, r3 <p1>
– If p1 is TRUE, instruction executes normally
– If p1 is FALSE, instruction treated as NOP (with some exceptions)
Full Predication Support
• Predicate defining instructions
• Full set of predicated instructions
• Separate predicate register file
• Best performance
• Cydra-5, IA-64, TI-C60, StarCore
Partial Predication Support
• Adds limited set of predicated instructions to existing ISA
– no extension to operand format
– CMOV
• Brings some performance increase to existing ISAs
• SPARC, Alpha, MIPS, P6
HP-PD Predicate Defines
pred< cmp > dest < type >, src1, src2 (Pin)
• < cmp > - condition: =, >, <, etc.
• < type > (each type has a normal and a complemented form, written with an overbar)
– Unconditional (U, Ū)
– OR-type (O, Ō)
– AND-type (A, Ā)
Unconditional Predicate Defines
• For blocks reached on one condition
if (a < 10)
  c = c+1;
else if (b > 20)
  d = d+1;
else
  e = e+1;
Branching code:
    bge a, 10, L1
    add c, c, 1
    jmp L3
L1: ble b, 20, L2
    add d, d, 1
    jmp L3
L2: add e, e, 1
L3:
Unconditional Predicate Define
Pin  Condition | Pout: U  Ū
0    0         |       0  0
0    1         |       0  0
1    0         |       0  1
1    1         |       1  0
(Ū is the complemented output.)
Branching code:
    bge a, 10, L1
    add c, c, 1
    jmp L3
L1: ble b, 20, L2
    add d, d, 1
    jmp L3
L2: add e, e, 1
L3:
Predicated code:
pred p1(U), p2(Ū), a >= 10
add c, c, 1 (p2)
pred p3(U), p4(Ū), b <= 20 (p1)
add d, d, 1 (p4)
add e, e, 1 (p3)
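The unconditional truth table and the if-converted sequence can be checked against the original branching code with a short Python sketch. It assumes the second destination of each define is the complemented form and that the second define writes p3 and p4; the function names `pred_define_u`, `branching`, and `predicated` are made up for this illustration, and each `if p:` models an instruction nullified by a false guard.

```python
def pred_define_u(cond, pin=True):
    """Unconditional define: return (U, U-bar) as in the truth table.

    Both outputs are always written; both are False when pin is False.
    """
    return (pin and cond, pin and not cond)


def branching(a, b, c, d, e):
    """Reference semantics: the original if / else-if / else chain."""
    if a < 10:
        c += 1
    elif b > 20:
        d += 1
    else:
        e += 1
    return c, d, e


def predicated(a, b, c, d, e):
    """Straight-line, branch-free version of the same decision."""
    p1, p2 = pred_define_u(a >= 10)          # p1: a >= 10, p2: a < 10
    if p2:
        c += 1                               # add c, c, 1 (p2)
    p3, p4 = pred_define_u(b <= 20, pin=p1)  # both False unless p1 holds
    if p4:
        d += 1                               # add d, d, 1 (p4)
    if p3:
        e += 1                               # add e, e, 1 (p3)
    return c, d, e
```

Because the second define is guarded by p1, p3 and p4 are both cleared whenever a < 10, which is what keeps the `d` and `e` updates from firing on the first path.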
Or Predicate Defines
• For blocks reached on multiple conditions
if (a && b)
  d = d+1;
else
  e = e+1;
Branching code:
    beq a, 0, L1
    beq b, 0, L1
    add d, d, 1
    jmp L2
L1: add e, e, 1
L2:
Or-type Predicate Define
Pin  Condition | Pout: O  Ō
0    0         |       -  -
0    1         |       -  -
1    0         |       -  1
1    1         |       1  -
(- means the destination is left unchanged; Ō is the complemented output.)
Predicated code:
pred_clr p1
pred p1(O), p2(Ū), a = 0
pred p1(O), p3(Ū), b = 0 (p2)
add d, d, 1 (p3)
add e, e, 1 (p1)
Branching code:
    beq a, 0, L1
    beq b, 0, L1
    add d, d, 1
    jmp L2
L1: add e, e, 1
L2:
And-type Predicate Define
Pin  Condition | Pout: A  Ā
0    0         |       -  -
0    1         |       -  -
1    0         |       0  -
1    1         |       -  0
(- means the destination is left unchanged; Ā is the complemented output.)
Predicated code:
pred_clr p1
pred_set p3
pred p1(O), p3(Ā), a = 0
pred p1(O), p3(Ā), b = 0
add d, d, 1 (p3)
add e, e, 1 (p1)
Branching code:
    beq a, 0, L1
    beq b, 0, L1
    add d, d, 1
    jmp L2
L1: add e, e, 1
L2:
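The point of the OR and AND types is that several defines can update the same predicate with no ordering dependence between them. A minimal Python sketch of that idea follows, assuming p3 is written with the complemented AND form (cleared when the condition is true) and that the source decision is `if (a && b) d++; else e++;`. The helper names `pred_or`, `pred_and_bar`, `if_converted`, and `reference` are hypothetical.

```python
def pred_or(old, cond, pin=True):
    """OR-type: set to True when pin and cond hold; otherwise leave unchanged."""
    return True if (pin and cond) else old


def pred_and_bar(old, cond, pin=True):
    """Complemented AND-type: clear to False when pin and cond hold; otherwise leave unchanged."""
    return False if (pin and cond) else old


def reference(a, b, d, e):
    """Original decision: if (a && b) d++; else e++;"""
    if a and b:
        d += 1
    else:
        e += 1
    return d, e


def if_converted(a, b, d, e):
    p1 = False                        # pred_clr p1
    p3 = True                         # pred_set p3
    p1 = pred_or(p1, a == 0)          # these two defines are independent
    p3 = pred_and_bar(p3, a == 0)
    p1 = pred_or(p1, b == 0)          # and could issue in the same cycle
    p3 = pred_and_bar(p3, b == 0)
    if p3:
        d += 1                        # add d, d, 1 (p3)
    if p1:
        e += 1                        # add e, e, 1 (p1)
    return d, e
```

Unlike the OR-only version, where the second define is guarded by p2, neither define here needs the other's result, which shortens the dependence chain from two levels to one.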
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
IMPACT EPIC Architecture
• Predication
– base model is HP-PD [Schlansker, Rau, Kathail]
– added implicit predicate pR to facilitate speculation
– prefix alternative for code size control [EuroPar-99]
– added new conjunctive and disjunctive types to facilitate minimization of program decision logic
– moving towards implementation-neutral predication
• Control Speculation
– based on Sentinel model [ASPLOS-92]
– added R-Tags (in addition to E-tags) and pR (implicit recovery predicate) to enable inline recovery
IMPACT EPIC Architecture
[Figure: architectural state and instruction formats. Labels in the figure:]
• Register File: Value/PC, E-Tag, R-Tag
• Predicate Register File: T/F, E-Tag, R-Tag
• pR
• Memory Conflict Buffer
• Register Tag and Attribute
• Instructions: OPERATION, LOAD, and CHECK, each with a Pred field and S / DS attribute bits
Control Speculative Execution
• Speculative instruction causes an exception
– write current PC into destination register
– set E-Tag in destination register
• Speculative instruction propagates an exception
– a source register with set E-Tag
– Propagate PC from source to destination register
– set E-Tag in destination register
• Non-speculative instruction detects exceptions
– a source register with set E-Tag
Microprocessor Microarchitecture
• Main Memory: 100 cycle latency
• L2 Cache: non-blocking, 1M-byte, 4-way, 64B block
• System Bus, Backside Bus Interface
• L1 Cache: non-blocking, 16K-byte, 2-way, 32B block
• BTB: 1K, direct-mapped, 2-level
• I-Fetch Unit
• I-Cache: 32K-byte, direct-mapped, 64B block (split)
• Instruction Decoder
• Register Alias Table (64 predicate and 128 regular), Reorder Buffer
• Architecture Register File
• 2 Memory Units, 1 Floating Point Unit, 1 Branch Unit, 2 Integer Units
Result of Applying EPIC Techniques
[Figure: bar chart of speedup over base (axis 1.00 to 2.40, with one off-scale value of 3.10) for 008.espresso, 023.eqntott, 072.sc, 085.cc1, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 147.vortex, cccp, cmp, eqn, grep, lex, wc, yacc. Series: predication only; control speculation only; data speculation only; predication and speculation combined.]
Integrated Predication and Control Speculation
• All of the following must be true for a predicated instruction to take effect
– input predicate true
– input predicate E-Tag false
– either
  • pR false, or
  • R-Tag of at least one input register true
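Written out as a Boolean function, the condition above is a three-way conjunction. The Python sketch below simply transcribes the bullets; the function name `takes_effect` and its parameter names are assumptions, not part of the architecture definition.

```python
def takes_effect(pred_value, pred_etag, p_r, src_rtags):
    """Return True when a predicated instruction should update state.

    pred_value: value of the input (guard) predicate
    pred_etag:  E-Tag of the input predicate
    p_r:        implicit recovery predicate pR (True while in recovery mode)
    src_rtags:  R-Tags of the instruction's input registers
    """
    return pred_value and not pred_etag and (not p_r or any(src_rtags))
```

Outside recovery mode (pR false) this reduces to ordinary predication plus the E-Tag check; inside recovery mode only instructions that depend on a recovered value are allowed to re-execute.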
Speculation Example
[Figure legend:]
• speculative (affected by exception)
• speculative (not affected)
• non-speculative
• branch
• check (non-speculative use)
Inline Recovery Model
• Processor enters recovery mode, set pR
– PC in source register used as recovery PC
– The speculative instruction at recovery PC is executed non-speculatively.
– Exception processing is performed.
– If exception is non-terminating, the result is stored into destination register, set R-Tag.
– Instructions with R-Tag set in source registers are executed, set R-Tag in destination register
Inline Recovery Model (cont.)
– Non-speculative instructions not repeated.
  • Stores, self-incrementing loads and stores, etc. are safe.
  • Same effect is achieved by recovery blocks.
  • Source registers of non-speculative instructions do not need to be preserved.
– Branches and predicate defines repeated
  • to reproduce original control flow
  • input condition must be preserved
– Recovery mode is turned off when reaching a check with set source R-Tag.
Recovery Block - Code Size
[Figure: bar chart of code expansion (axis 1 to 1.45) for 008.espresso, 023.eqntott, 072.sc, 085.cc1, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 147.vortex, cccp, cmp, eqn, grep, lex, wc, yacc.]
Instruction Cache Miss Comparison (32k, direct mapped)
[Figure: bar chart of percent reduction (0% to 100%) for 008.espresso, 023.eqntott, 072.sc, 085.cc1, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 147.vortex, cccp, cmp, eqn, grep, lex, wc, yacc.]
Instruction Cache Miss Comparison (64k, 8way)
[Figure: bar chart of percent reduction (0% to 100%) for 008.espresso, 023.eqntott, 072.sc, 124.m88ksim, 130.li, 132.ijpeg, 147.vortex.]
Spurious Cache Misses and Exceptions
[Figure: bar chart of percent reduction (0% to 100%) for 008.espresso, 023.eqntott, 072.sc, 085.cc1, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 147.vortex, cccp, cmp, eqn, grep, lex, wc, yacc. Series: cache misses; exceptions.]
• Spurious cache misses, TLB misses, and page faults are frequent in speculated code. Failing to suppress them can have a detrimental effect on performance.
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
EPIC Compiler Technology Overview
[Figure: compiler flow from Source to Code Generation. Labels in the figure:]
• Phases: If-Conversion; Classical Optimization; Predicate Optimization; ILP Optimization; Scheduling / Partial Reverse If-Conversion; Register Allocation; Code Generation
• Supporting components: Predicated Dataflow; Predicate Analysis; Memory Disambiguation; Machine Description
• Debugging of Optimized Code
Technology Vision
• Compiler Technology
– analyze programmatic intentions
  • pointer alias analysis
  • integer range analysis
  • predicate analysis
– transformations
  • program decision logic minimization [ISCA-99]
  • fully resolved predicate optimizations
  • data structure optimization
  • algorithm transformations
• Architecture Support
– logic manipulation instructions
  • efficient condition tests
  • instructions to efficiently combine conditions
– highly effective speculative execution
  • cache misses, TLB misses
  • exceptions and dependence violations [ISCA-98]
Vision: Bridging the Gap Between Programs and Hardware
if (x>=0)
  if (x==1 || x==2 || x==3)
    m=f(x);
  else
    m=g(x);
[Figure: the same decision shown two ways. Hardware view: blocks f and g, comparators (>=, =, =, =) testing x against 0, 1, 2, 3, an enable signal, and output m. Program view: a chain of branches x>=0, x!=1, x!=2, x!=3 with T/F edges leading to m=g(x) or m=f(x).]
Analysis of Predicated Codes
• Live Variable Analysis Example:
– Without Predicate Aware Dataflow (only instructions on TRUE predicate can kill)
  • R7 is defined and killed by instruction 5; R7 is used by instruction 6.
  • R7's live range is (5,6).
  • R3 is not defined and killed by instruction 3 in all cases because it is predicated on P1. R3 is used by instruction 4.
  • R3's live range is (1,2,3,4) and it is live out the top of the CB.
– With Predicate Aware Dataflow
  • R7's live range is also (5,6).
  • R3's live range is (3,4) because instruction 3 defines R3 for all uses by instruction 4. This is known by studying the relation of P1 to P2.
1 (p1un) = (r1 < 0)
2 (p2un) = (r2 < 0) (p1)
3 r3 = r4 + r5 (p1)
4 r8 = r3 + 1 (p2)
5 r7 = r4 + r6
6 r4 = r7 - 1 (p1)
7 r9 = r9 / 2 (p2)
• Dataflow without regard to predicates leads to conservative results.
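The difference between the two analyses comes down to when a guarded definition is allowed to kill a pending use. The Python sketch below is a deliberately small model, not the IMPACT predicate analysis: it runs one backward pass over the seven instructions, tracks for each register the set of guards under which a later use needs it, and takes the fact that p2 implies p1 (instruction 2 is guarded by p1) as a hand-written input. Guards are not themselves counted as register uses. The names `CODE`, `IMPLIES`, `kills`, and `live_ranges` are hypothetical.

```python
# instruction number -> (destination, sources, guard); guard "T" means always executed
CODE = {
    1: ("p1", ["r1"], "T"),
    2: ("p2", ["r2"], "p1"),
    3: ("r3", ["r4", "r5"], "p1"),
    4: ("r8", ["r3"], "p2"),
    5: ("r7", ["r4", "r6"], "T"),
    6: ("r4", ["r7"], "p1"),
    7: ("r9", ["r9"], "p2"),
}

# Assumed predicate relation: p2 can only be true when p1 is true.
IMPLIES = {("p2", "p1")}


def kills(use_guard, def_guard, aware):
    """Does a definition under def_guard cover every execution of a use under use_guard?"""
    if def_guard == "T":
        return True
    if not aware:
        return False  # conservative: only TRUE-guarded definitions kill
    return use_guard == def_guard or (use_guard, def_guard) in IMPLIES


def live_ranges(code, aware):
    live = {}    # register -> guards of later uses still waiting for a definition
    ranges = {}  # register -> instruction numbers in its live range
    for i in sorted(code, reverse=True):
        dest, srcs, guard = code[i]
        if live.get(dest):
            ranges.setdefault(dest, set()).add(i)  # defining point with a live result
        remaining = {q for q in live.get(dest, set()) if not kills(q, guard, aware)}
        if remaining:
            live[dest] = remaining
        else:
            live.pop(dest, None)
        for s in srcs:
            live.setdefault(s, set()).add(guard)
        for r in live:
            ranges.setdefault(r, set()).add(i)     # live on entry to instruction i
    return {r: sorted(v) for r, v in ranges.items()}
```

Calling `live_ranges(CODE, aware=False)["r3"]` gives `[1, 2, 3, 4]`, while `aware=True` gives `[3, 4]`; `r7` is `[5, 6]` in both cases, matching the slide.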
Dataflow Analysis of Predicated Code
• Traditional dataflow requires reverse if-conversion (RIC)
• RIC of some codes is exponential (wc: 5,20,80,240,...)
• Factoring reduces order of complexity (wc: 8,15,22,28,...)
[Figure: code example (wc); RIC of one iteration (width 5); RIC of code with 2x unroll (width 20).]
Code Size Control using Predication
Code example (099.go copyshape):
• Predication reduced code size by instruction merging (in example 35%)
[Figure: original versus predicated instruction layout for copyshape.]
Code example (MediaBench Experimental Image Compression reflect1):
• Original: overhead = 8/17 instrs (47%)
• Optimized: 6/19 (30%)
• Predicated: 3/13 (23%)
[Figure: original, optimized, and predicated layouts of reflect1, with branches B1, B2, jumps J, predicate defines P1 to P3, and instructions I0 to I9.]
Program Decision Logic Optimization
• Express control as a predicate network
• Reformulate decision as a logic network mimicking circuit minimization techniques
[Figure: a predicate network over p1 through p8 and its reformulated versions involving p3, p5, and p8.]
Working Example - 132.ijpeg in SPEC95
• Contains 477 functions and 25,889 lines of code
• Spends 200 seconds and 18MB of memory in analysis
• 229 of 266 indirect call-sites are converted into direct ones
[Figure: points-to graphs prior to and after object elevation, over functions f3, f5, f6, f7 and objects s, s1, i, j, v1, v2 with fields p, q, fp, for the code fragments below.]
f3(&s1, &i, &j);
f7(s1);
*s->p = 10;
*s->q = 20;
(*s->fp)(s);
t = malloc();
t->p = v1;
t->q = v2;
t->fp = f5;
*s = t;
Debugging of Optimized Code (PLDI-99)
• When to take over execution and when to stop forward recovery?
– original execution order of instructions has to be tracked
– instructions might be moved up to different paths leading to the breakpoint or down to different paths starting from the breakpoint
[Figure: control-flow graphs over blocks A through F showing instructions I1 to I5, tagged with source statements S1 to S4, and their copies (I1', I1'', I3', I4') after code motion, with a breakpoint at I5 (S4).]
Outline
• History and Background
• Control Speculation
• Predication
• IMPACT EPIC Architecture
• Compiler Technology
• Outlook
EPIC Research Challenges
• Implementation neutral architecture
• Profile independence and program transparent profiling
• Code size optimizations
• Analysis of predicated code
• Interprocedural alias analysis
• Debugging of optimized code
Outlook
• Compilers critical to the performance of EPIC microprocessors
– Use of predication and speculation is a serious challenge
– Any misuse will lead to performance loss.
– Brand new algorithms will be deployed in the EPIC compilers.
– Existing software development models must be supported.
• Expect performance robustness issues
– Awesome performance leap seen for some applications.
– Less for others due to limitations of analyses and optimizations.
– It can take years for the performance gain to be universal.
– A lot of research activities needed, www.trimaran.org.
• Evolution of EPIC architectures
– Revisions of architectures are likely as compilers mature.
– Code size and power consumption are critical for embedded EPICs.