framework for profile-analysis data-layout optimizations
DESCRIPTION
Framework for Profile-Analysis Data-Layout Optimizations. Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research). Data Layout Optimization (What): references sequence A.x, B, A.z; original data layout.
TRANSCRIPT
![Page 1: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/1.jpg)
1
Framework for Profile-Analysis Data-Layout Optimizations
Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research)
![Page 2: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/2.jpg)
2
Data Layout Optimization (What)
[Figure: memory hierarchy. CPU, Cache (1 cycle), Memory (10^2 cycles), Disk (10^6 cycles).]
References sequence: A.x, B, A.z
[Figure: original data layout vs. modified data layout. Each panel shows where A.x, A.z, and B fall in cache blocks 1-4 and memory pages 1-2 over time.]
DL optimization: increase spatial locality of data to prevent memory faults.
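The effect of a layout change on the references sequence A.x, B, A.z can be sketched in a few lines. This is only an illustration: the single-block cache, the 16-byte block size, and the addresses in both layouts are assumptions and do not come from the slides.

```python
BLOCK_SIZE = 16  # bytes per cache block (assumed for illustration)

def count_misses(trace, layout, block_size=BLOCK_SIZE):
    """Count misses in a toy cache that holds exactly one block."""
    cached_block = None
    misses = 0
    for ref in trace:
        block = layout[ref] // block_size  # block that holds this object
        if block != cached_block:
            misses += 1
            cached_block = block
    return misses

trace = ["A.x", "B", "A.z"]
# Hypothetical addresses: the original layout scatters the three objects,
# the modified layout packs them into one block.
original = {"A.x": 0, "A.z": 40, "B": 100}
modified = {"A.x": 0, "B": 4, "A.z": 8}

print(count_misses(trace, original))  # 3 misses: every reference changes block
print(count_misses(trace, modified))  # 1 miss: all references share a block
```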
![Page 3: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/3.jpg)
3
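Data Layout Optimization (How)
[Figure: Program → Data Layout Optimizer → “Good” Layout (a point in the Layout Space, compared with the Optimal Layout) → Enforce layout → Program′. The optimizer works from a Reference Summary of the program.]
• Scientific (array based) programs:
– Reference summary: array dependence analysis (static).
– Layout: optimal for simple loops.
– Enforce layout: compile time.
• General purpose (pointer based) programs:
– Reference summary: reference trace (dynamic).
– Layout: heuristic.
– Enforce layout: 1. compile time, 2. runtime.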
![Page 4: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/4.jpg)
4
Problems with Current Data-Layout Optimization
• Computationally hard to find the optimal layout [Petrank].
• Computationally hard to approximate the optimal layout [Petrank].
• Implication: heuristics are not robust.
– They will not work for all programs.
• From our experience with heuristics:
– Field Reordering [Chilimbi PLDI’99]: no improvement (on perl).
– Custom Memory Allocator [Seidl ASPLOS’98]: degrades performance (on espresso).
• Our approach: replace heuristic with feedback-driven search.
![Page 5: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/5.jpg)
5
Searching For a Data Layout
• Problem: Perform a search in the data layout space.
• Look for:
– a “good” layout.
– an “easy” to enforce layout.
• Search advantage:
– Robust: for each program, finds a “good” layout.
[Figure: the Data Layout Space, showing the current program data layout, the optimal data layout, the region of “good” layouts, and the region of “good” + “easy” to enforce layouts.]
![Page 6: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/6.jpg)
6
Is Search Practical?
[Figure: search loop. Reference Trace → Optimizer (Heuristic) → Data Layout, chosen from the possible layouts → Enforce layout (Edit, Compile) → Execute → Evaluate → Continue? → End.]
• Not clear:
![Page 7: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/7.jpg)
7
Outline
• Background and Problem Definition
• Search is a solution, but may not be practical
– Making the search practical
• Applications
• Summary
![Page 8: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/8.jpg)
8
Making the Search Practical
Framework for Data Layout Optimization
[Figure: the Edit, Compile, Execute, Evaluate, Continue? loop, with a Data Layout Search Engine that operates on the reference trace T.]
• Compress(T) → CST: Trace → Compressed Symbolic Trace.
• Data Object Analysis: DOA(CST, LS) → NLS. Narrows the Layout Space (e.g., Class Splitting, Linearization, Field Reordering) to a Narrowed Space of “good” and enforceable layouts.
• Layout Selector: LS(NLS, B, CST, SS) → DL. Uses the Search Strategy (SS) and the Benefit (B) to pick a Data Layout.
• Enforce Layout: AL(DL, CST) → NT (New Trace).
• Evaluate: Simulate(NT) → B (Benefit).
• Continue(B)? If not, End.
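The stages of the framework can be read as a loop. The sketch below is a minimal skeleton under stated assumptions: the function name `search_layout`, every callable signature, and the toy layouts are hypothetical, and the slide's search strategy (SS) is folded into the `select` callable. It only shows how the stages named on the slide feed each other.

```python
def search_layout(trace, layout_space, compress, doa, select, enforce,
                  simulate, keep_going):
    """Feedback-driven search loop; all callables are supplied by the caller."""
    cst = compress(trace)                  # Compress(T) -> CST
    narrowed = doa(cst, layout_space)      # DOA(CST, LS) -> NLS
    best_layout, best_benefit = None, None
    benefit = None
    while True:
        layout = select(narrowed, benefit, cst)   # layout selector -> DL
        if layout is None:                        # nothing left to propose
            break
        new_trace = enforce(layout, cst)          # enforce layout -> new trace
        benefit = simulate(new_trace)             # Simulate(NT) -> B
        if best_benefit is None or benefit > best_benefit:
            best_layout, best_benefit = layout, benefit
        if not keep_going(benefit):               # Continue(B)?
            break
    return best_layout, best_benefit

# Toy usage with two hypothetical layouts; benefit = -(number of block changes).
candidates = [{"A.x": 0, "A.z": 40, "B": 100}, {"A.x": 0, "B": 4, "A.z": 8}]
pending = iter(candidates)

def block_changes(addresses, block_size=16):
    blocks = [a // block_size for a in addresses]
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev != cur)

best, benefit = search_layout(
    trace=["A.x", "B", "A.z"],
    layout_space=candidates,
    compress=tuple,                                   # stand-in for real compression
    doa=lambda cst, space: space,                     # no narrowing in this toy
    select=lambda space, b, cst: next(pending, None), # try candidates in order
    enforce=lambda layout, cst: [layout[s] for s in cst],
    simulate=lambda nt: -block_changes(nt),
    keep_going=lambda b: True,
)
print(best, benefit)
```

Passing the stages in as callables keeps the loop itself independent of any particular optimization, which is the point the slide makes by separating the search engine from the layout space.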
![Page 9: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/9.jpg)
9
Trace Representation
• Problem: reference trace cannot be easily manipulated since it is too large (>10GB, >100M references).
• Solution: compressed trace (using modified SEQUITUR).
• Example:
- Trace: acbcbcbcbdbdbdbde
- SEQUITUR representation:
S → a c B B B A A e
B → b c
A → C C
C → b d
• Representation advantage:
- Compact; fits into main memory [Chilimbi PLDI’01].
- Exposes repetitions (we use this later).
- It produces a symbolic trace (i.e., a terminal is a data object).
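As a quick check of the example, the grammar on this slide (read as S → a c B B B A A e, B → b c, A → C C, C → b d) can be expanded back into the original trace. The dictionary encoding and the `expand` helper are illustrative assumptions, not part of SEQUITUR itself.

```python
# Grammar from the slide's example; rule bodies are lists of symbols.
GRAMMAR = {
    "S": ["a", "c", "B", "B", "B", "A", "A", "e"],
    "B": ["b", "c"],
    "A": ["C", "C"],
    "C": ["b", "d"],
}

def expand(symbol, grammar=GRAMMAR):
    """Recursively expand a symbol into the string of terminals it derives."""
    if symbol not in grammar:          # terminal: a data object reference
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])

print(expand("S"))  # acbcbcbcbdbdbdbde
```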
![Page 10: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/10.jpg)
10
Framework for Data-Layout Optimization
[Figure: the Data Layout Search Engine operating on the reference trace, with the remaining Compile step and the Continue? / End decision.]
• Compress(T) → CST: Trace → Compressed Symbolic Trace.
• Data Object Analysis: DOA(CST, LS) → NLS. Narrows the Layout Space (e.g., Class Splitting, Linearization, Field Reordering) to a Narrowed Space of “good” and enforceable layouts.
• Layout Selector: LS(NLS, B, CST, SS) → DL. Uses the Search Strategy (SS) and the Benefit (B) to pick a Data Layout.
• Enforce Layout: EL(DL, CST) → CST’ (New Trace).
• Evaluate: Simulate(NT) → B (Benefit).
• Continue(B)? If not, End.
![Page 11: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/11.jpg)
11
Avoid re-compilation
• Problem: evaluating a data layout requires edit + compilation + simulation.
• Solution: “pretend” that the program was edited and compiled.
• Symbolic trace + data layout → concrete address trace.
• Simple, but crucial for an efficient search.
[Figure: a single symbolic trace, A.x, B, A.z, B, and two data layouts, A.x → 10, A.z → 14, B → 20 and A.x → 30, A.z → 34, B → 20. User (optimizer) path: Edit program → Compile → Run (simulate), giving the new concrete trace 30, 20, 34, 20. Framework path: Enforce Layout on the symbolic trace → 30, 20, 34, 20 → Simulate.]
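The "pretend" step amounts to a lookup: each symbolic reference is replaced by the address its data layout assigns. The sketch below uses the symbolic trace and the layout A.x → 30, A.z → 34, B → 20 shown on the slide; the function name `enforce_layout` and the dictionary encoding are assumptions for illustration.

```python
def enforce_layout(symbolic_trace, layout):
    """Map a symbolic trace to a concrete address trace under a given layout."""
    return [layout[obj] for obj in symbolic_trace]

symbolic_trace = ["A.x", "B", "A.z", "B"]
layout = {"A.x": 30, "A.z": 34, "B": 20}

concrete_trace = enforce_layout(symbolic_trace, layout)
print(concrete_trace)  # [30, 20, 34, 20]
```

Because the symbolic trace is collected once, trying another layout only means calling the function again with a different dictionary; no edit or compile step is involved.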
![Page 12: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/12.jpg)
12
Framework for Data-Layout Optimization
[Figure: the Data Layout Search Engine operating on the reference trace, with the remaining Compile step and the Continue? / End decision.]
• Compress(T) → CST: Trace → Compressed Symbolic Trace.
• Data Object Analysis: DOA(CST, LS) → NLS. Narrows the Layout Space (e.g., Class Splitting, Linearization, Field Reordering) to a Narrowed Space of “good” and enforceable layouts.
• Layout Selector: LS(NLS, B, CST, SS) → DL. Uses the Search Strategy (SS) and the Benefit (B) to pick a Data Layout.
• Enforce Layout: EL(DL, CST) → CST’ (New Trace).
• Evaluate: Simulate(CST’) → B (Benefit).
• Continue(B)? If not, End.
![Page 13: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/13.jpg)
13
Memoization: Efficient Trace Simulation
• Evaluation using simulation: $\mathit{MissRate}_T = \mathit{Simulate}(T)$.
• Problem: simulation of the whole trace (T) is too expensive.
• Solution: avoid re-simulation of repeated sub-traces.
• Example:
- Trace T: bcbcbcbdbdbdbd
- SEQUITUR representation: S → B B B A A, B → b c, A → C C, C → b d
• Memoization:
1. Simulate each “low level” rule and compute its memoization value. For cache simulation: memoization value = CacheState [CS].
$CS_C = \mathit{Simulate}'(C)$, $CS_B = \mathit{Simulate}'(B)$
2. Recursively compose memoization values for “higher” rules.
$CS_A = CS_C \circ CS_C$
$CS_S = CS_B \circ CS_B \circ CS_B \circ CS_A \circ CS_A$
• Result: $\mathit{MissRate}_T = \dfrac{CS_S.\mathit{Misses}}{\mathit{Length}(T)}$
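One simple way to avoid re-simulating repeated sub-traces is to cache the result of simulating a rule from a given incoming cache state. The sketch below does that for the grammar on this slide. It is not the authors' CacheState composition: the direct-mapped two-set cache, the symbol-to-block mapping, and all function names are assumptions, and the memo is keyed on (rule, incoming state) rather than on a composable per-rule summary.

```python
from functools import lru_cache

GRAMMAR = {"S": "BBBAA", "B": "bc", "A": "CC", "C": "bd"}  # rules from the slide
BLOCK_OF = {"b": 0, "c": 1, "d": 2}  # hypothetical layout: object -> block number
NUM_SETS = 2                          # toy direct-mapped cache with two sets

def expand(symbol):
    """Expand a grammar symbol into the flat trace it derives."""
    if symbol not in GRAMMAR:
        return symbol
    return "".join(expand(s) for s in GRAMMAR[symbol])

def step(state, terminal):
    """Simulate one reference; return (misses, new_state)."""
    block = BLOCK_OF[terminal]
    index = block % NUM_SETS
    if state[index] == block:
        return 0, state
    new_state = list(state)
    new_state[index] = block
    return 1, tuple(new_state)

@lru_cache(maxsize=None)
def simulate(symbol, state):
    """Simulate a rule from a given cache state, memoized on (symbol, state)."""
    if symbol not in GRAMMAR:
        return step(state, symbol)
    misses = 0
    for child in GRAMMAR[symbol]:
        child_misses, state = simulate(child, state)
        misses += child_misses
    return misses, state

def simulate_flat(trace, state):
    """Reference implementation: simulate the expanded trace one access at a time."""
    misses = 0
    for terminal in trace:
        m, state = step(state, terminal)
        misses += m
    return misses, state

empty = (None,) * NUM_SETS
trace = expand("S")
misses, _ = simulate("S", empty)
print(misses, len(trace), misses / len(trace))
```

In this toy run the second and third occurrences of B, and the later occurrences of C, start from a cache state that has already been seen, so their results come from the memo table instead of being simulated again.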
![Page 14: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/14.jpg)
14
Outline
• Background and Problem Definition
• Search is a solution, but may not be feasible
– Making the search practical:
• Trace representation
• Avoid recompilation
• Efficient simulation
• Applications
• Summary
![Page 15: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/15.jpg)
15
Framework Application (1)
• Application: an implementation of the framework that searches in a sub-space of the layout space.
• Field Reordering:
– Objective: reduce number of cache misses.
– Sub-space: all possible (legal) orders of fields in (heap) objects.
– Our search strategy: (almost) exhaustive search.
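An exhaustive search over field orders can be sketched as follows. Everything concrete here is an assumption for illustration: the struct fields and sizes, the 32-byte block, the single-block cache model, the field access trace, and the simplification that every permutation is a legal order.

```python
from itertools import permutations

FIELD_SIZES = {"key": 8, "pad": 48, "next": 8, "value": 8}  # hypothetical, in bytes
BLOCK_SIZE = 32                                             # assumed cache block size

def offsets(order):
    """Byte offset of each field when fields are laid out in the given order."""
    result, offset = {}, 0
    for field in order:
        result[field] = offset
        offset += FIELD_SIZES[field]
    return result

def count_misses(order, trace):
    """Toy single-block cache: a miss whenever the accessed block changes."""
    layout = offsets(order)
    cached_block, misses = None, 0
    for field in trace:
        block = layout[field] // BLOCK_SIZE
        if block != cached_block:
            misses += 1
            cached_block = block
    return misses

trace = ["key", "next", "key", "next", "value", "key", "next"]
original = ("key", "pad", "next", "value")

# Evaluate every field order and keep the one with the fewest misses.
best = min(permutations(FIELD_SIZES), key=lambda order: count_misses(order, trace))
print(original, count_misses(original, trace))
print(best, count_misses(best, trace))
```

With four fields there are only 24 orders, so full enumeration is cheap; for objects with many fields the space grows factorially, which is presumably why the slide qualifies the strategy as "(almost)" exhaustive.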
![Page 16: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/16.jpg)
16
Field Reordering: Exhaustive Search
• We compared:
– Best field order found by our iterative search.
– Field orders produced by existing heuristics:
• Fields Temporal Affinity [ChilimbiPLDI’99]
• Fields Access Frequency [TruongPACT’98].
[Figure: Miss Rate Reduction (axis from -10.00% to 50.00%) on perl, twolf, and boxsim for three field orders: iteration, affinity, frequency.]
Runtime improvement: 0%-4.5%.
![Page 17: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/17.jpg)
17
Custom Memory Allocator (CMA)
• Objective: reduce number of page faults.
• Reference trace: ABABA
[Figure: address vs. time for two allocators that place objects A and B on Page 1 and Page 2. Allocator 1: poor locality. Allocator 2: good locality.]
• CMA can work well if it has a good placement function: a function that assigns dynamically allocated heap objects to memory pages (heaps).
![Page 18: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/18.jpg)
18
CMA Placement Function (PF)
[Figure: malloc(size s) { } with the placement function. PF: map objects to heaps, PF(heap object) → int.]
• How can we find a placement function using our framework?
• A placement function defines a data layout.
• Learn by measuring the benefits of its data layout.
• How: use a learning algorithm.
[Figure: Profiling Information, Profile(Heap objects) → runtime attributes → Learner → PF(Attributes) → int. The framework is used to evaluate PF. Example PF as a decision tree on Size: size < 24 → heap 1, size ≥ 24 → heap 2.]
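A placement function of this shape, and one way to measure the layout it induces, can be sketched as follows. The split on size 24 follows the decision tree on the slide (reading the second branch as size ≥ 24); the bump allocator, the toy page size, the object list, and the reference trace are all assumptions for illustration.

```python
PAGE_SIZE = 64          # toy page size in bytes (assumed)
HEAP_STRIDE = 1 << 20   # each heap gets its own address region (assumed)

def placement(size):
    """Decision-tree placement function: one split on object size."""
    return 1 if size < 24 else 2

def allocate(objects, pf):
    """Bump-allocate each (name, size) object in the heap chosen by pf."""
    next_free = {}
    address = {}
    for name, size in objects:
        heap = pf(size)
        offset = next_free.get(heap, 0)
        address[name] = heap * HEAP_STRIDE + offset
        next_free[heap] = offset + size
    return address

def pages_touched(trace, address):
    """Number of distinct pages referenced (by each object's first byte)."""
    return len({address[name] // PAGE_SIZE for name in trace})

# Hypothetical program: small objects are hot, large ones are never referenced.
objects = [("s1", 16), ("big1", 64), ("s2", 16), ("big2", 64), ("s3", 16)]
trace = ["s1", "s2", "s3", "s1", "s2", "s3"]

single_heap = allocate(objects, lambda size: 1)
two_heaps = allocate(objects, placement)
print(pages_touched(trace, single_heap))  # small objects spread over 3 pages
print(pages_touched(trace, two_heaps))    # small objects share 1 page
```

A learner would propose functions like `placement` from profiled attributes, and a measurement like `pages_touched` would play the role of the benefit that the framework reports back.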
![Page 19: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/19.jpg)
19
CMA Results
| Program | Number of heaps |
|---|---|
| Espresso | 2 |
| Boxsim | 8 |
| Twolf | 5 |
| Perl | 5 |
| Ghostscript | 10 |
| Lp_solve | 6 |

[Figure: two bar charts of WS Size Reduction¹ (Reduction %, by Benchmark) for Espresso, Boxsim, Twolf, Perl, GhostScript, and lp_solve. First chart: test input, axis from 0 to 18. Second chart: train input and test input, axis from 0 to 20.]
¹ Relative to original working set size.
![Page 20: Framework for Profile-Analysis Data-Layout Optimizations](https://reader036.vdocuments.us/reader036/viewer/2022070406/56814224550346895dae39f5/html5/thumbnails/20.jpg)
20
Contributions and Future Work
• Formulate data layout optimization as a search process.
• Build a framework for an efficient search process.
• Improve existing optimizations; enable new optimizations.
• Framework limitations:
– Difficult to handle very large traces (>0.5B references).
– Requires some guidance from the programmer (search strategy).
• Future work:
– Advanced search strategies that combine several optimizations.
– Other, non-data-layout optimizations, e.g., prefetching.