The Return of Synthetic Benchmarks
Ajay M. Joshi (UT Austin)
Lieven Eeckhout (Ghent University)
Lizy K. John (UT Austin)
Laboratory of Computer Architecture
Department of Electrical & Computer Engineering
The University of Texas at Austin
January 28, 2008
Outline
• The Need for Synthetic Benchmarks
• BenchMaker Framework for Benchmark Synthesis
• Workload Characteristics Used in Synthesis
• Synthetic Benchmark Construction
• Evaluation of BenchMaker
• Applications
• Summary
Benchmark Spectrum
• Toy Benchmarks, e.g. Heap sort
• Microbenchmarks, e.g. STREAM
• Kernel Codes, e.g. Livermore Loops
• Application Suites, e.g. SPEC CPU
• Complete Application Code
Toward the toy-benchmark end: Less Development Effort, More Scalable, More Maintainable, Less Representative
Toward the complete-application end: More Development Effort, Less Scalable, Less Maintainable, More Representative
Synthetic Benchmarks, e.g. Dhrystone, Whetstone
Focus on Simulation Time Reduction
[Figure labels: Benchmark Explosion, Benchmark Run Length, Microprocessor Complexity]
• Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05]
• Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03]
• Representative Sampling [Sherwood et al., ASPLOS’02]
• Reduced Input Set [KleinOsowski, CAN’04]
• Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05]
• Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04]
• Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]
Motivation: Benchmarking Challenges
Using Real-World Applications as Benchmarks
Proprietary Nature of Real-World Applications
Single-Point Performance Characterization
Application Benchmarks are Rigid
Applications Evolve Faster than Benchmarks
Benchmark Suites are Costly to Develop, Maintain, and Upgrade
Studying Commercial Workload Performance
Early Design Stage Power/Performance Studies
Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction
Resurgence of Synthetic Benchmarks…
IEEE Computer, August 2003
Workload Synthesis: Central Idea
[Figure: a Workload Synthesizer exposes ‘knobs’ for changing program characteristics (Instruction Level Parallelism, Program Locality, Instruction Mix, Control Flow Behavior) that span the Application Behavior Space. The Workload Synthesis Algorithm produces a Synthetic Benchmark, which is compiled and executed on an Execution Driven Simulator, Real Hardware, or RTL.]
Instruction sequence shown in the figure:
    ADD R1, R2, R3
    LD R4, R1, R6
    MUL R3, R6, R7
    ADD R3, R2, R5
    DIV R10, R2, R1
    SUB R3, R5, R6
    STORE R3, R10, R20
    ADD R1, R2, R3
    LD R4, R1, R6
    MUL R3, R6, R7
    ADD R3, R2, R5
    DIV R10, R2, R1
    SUB R3, R5, R1
    BEQ R3, R6, LOOP
    SUB R3, R5, R6
    STORE R3, R10, R20
    DIV R10, R2, R1
    …
Just 40 workload characteristics
Modeling Real-World Applications
• Microarchitecture-Independent Workload Profiling: a Workload Profiler (Binary Instrumentation OR Simulation) profiles the Real World Proprietary Workload. Workload Profile = Workload Attributes + Distribution of Attribute Values
• Modeling Workload Attributes into Synthetic Workload: the Workload Synthesizer turns the Workload Profile into a Synthetic Benchmark Clone
• Experiment Environment: Real Hardware or Execution Driven Simulator
Workload Characteristics as ‘Knobs’
Category (Num.): Characteristic
• Instruction mix (10): percentage of integer short latency, integer long latency, floating-point short latency, floating-point long latency, integer load, integer store, floating-point load, floating-point store, and branches
• Instruction-level parallelism (8): register-dependency-distance – 8 distributions for register dependencies: the percentage of dependencies with a distance equal to 1 instruction, and the percentage of dependencies that have a distance of up to 2, 4, 6, 8, 16, 32, and greater than 32 instructions
• Data locality (1 + 10): data footprint (1); distribution of local stride values (10)
• Instruction locality (1): instruction footprint
• Branch predictability (10): distribution of branch transition rate
Capturing The Essence of Workloads
Attributes to capture inherent workload behavior
– Data Locality: Dominant strides of static Load/Store
– Control Flow Predictability: Branch transition rate
Modeling Locality & Control Flow Predictability
– Data Locality of Integer, Scientific, and Embedded Workloads effectively modeled using circular streams
– Replicating transition-rate of static branches
Modeling Data Access Pattern
• Identify streams of data references
• A Stream?
  – Sequence of memory addresses in an arithmetic progression
  – Elements of arrays A, B, and C form 3 streams
      for (ii = 0; ii < N; ii++)
          A[ii] = B[ii] + C[ii];
      Streams: 200, 204, 208 ... | 320, 324, 328 ... | 404, 408, 412 ...
      Issuing Sequence: 320, 404, 200, 324, 408, 204 ...
• Streams are interleaved and may contain noise
      4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28 ...
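The three-stream example above can be reproduced with a short Python sketch. The base addresses (200, 320, 404) come from the slide; the mapping of those bases to A, B, and C, the 4-byte element size, and the helper names `stream` and `issue_sequence` are assumptions made for illustration.

```python
def stream(base, stride):
    """Yield an arithmetic progression of addresses: base, base + stride, ..."""
    addr = base
    while True:
        yield addr
        addr += stride


def issue_sequence(streams, iterations):
    """Interleave the streams round-robin, one access per stream per iteration."""
    seq = []
    for _ in range(iterations):
        for s in streams:
            seq.append(next(s))
    return seq


# Assumed layout: A starts at 200, B at 320, C at 404, with 4-byte elements.
A, B, C = stream(200, 4), stream(320, 4), stream(404, 4)

# Each iteration of A[ii] = B[ii] + C[ii] loads B, loads C, then stores A.
addresses = issue_sequence([B, C, A], iterations=2)
print(addresses)
```

Under these assumptions the interleaved order matches the issuing sequence listed on the slide, which is why a profiler has to separate the streams again before it can measure strides.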
Extracting Streams
• Reference pattern of static Load/Store instructions
  – PC-correlated spatial locality
    - Dependence on address referenced by nearby Ld/St
    - Programs with pointer chasing codes
  – PC-correlated temporal locality
    - Dependence on previous address generated by same Ld/St
    - Programs with multidimensional arrays
• Could static Load/Store instructions be natural sources of streams?
• Profile every static Load/Store instruction
  – Number of different strides with which it accesses data
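A minimal sketch of per-instruction stride profiling, assuming the profiler sees a trace of (PC, address) pairs. The trace format, the PC values, and the function name `stride_profile` are hypothetical; the slides do not describe the profiler's interface.

```python
from collections import Counter, defaultdict


def stride_profile(trace):
    """Map each static load/store (keyed by PC) to a Counter of observed strides.

    `trace` is an iterable of (pc, address) pairs in execution order.
    """
    last_addr = {}
    strides = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:
            strides[pc][addr - last_addr[pc]] += 1
        last_addr[pc] = addr
    return strides


# Hypothetical trace: three static memory instructions, each walking one array.
trace = []
for i in range(4):
    trace += [(0x10, 320 + 4 * i), (0x14, 404 + 4 * i), (0x18, 200 + 4 * i)]

profile = stride_profile(trace)
# Number of different strides per static instruction, and its dominant stride.
distinct = {pc: len(counts) for pc, counts in profile.items()}
dominant = {pc: counts.most_common(1)[0][0] for pc, counts in profile.items()}
print(distinct, dominant)
```

Keying the profile by PC untangles the interleaved address sequence: each static instruction here shows a single stride even though the combined trace jumps between arrays.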
Modeling Instruction Level Parallelism
Dependency Distance
    ADD R1, R3, R4
    MUL R5, R3, R2
    ADD R5, R3, R6
    LD R4, (R1)
    SUB R8, R2, R1
Read After Write Dependency Distance = 3 (the LD reads R1 three instructions after the first ADD writes it)
Measure Distribution of Dependency Distances
Up to 1, Up to 2, Up to 4, Up to 8, Up to 16, Up to 32, >32
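The dependency-distance measurement can be sketched as follows, assuming the destination register is the first operand of each instruction. The tuple encoding and the function names are illustrative, and the bucketing assigns each distance to the smallest bucket that holds it, which may differ from how the actual profiler accumulates its distribution.

```python
def raw_distances(instructions):
    """Return read-after-write dependency distances, counted in instructions.

    Each instruction is a (dest_register, [source_registers]) pair.
    """
    last_writer = {}
    distances = []
    for index, (dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                distances.append(index - last_writer[reg])
        last_writer[dest] = index
    return distances


def bucket(distance, bounds=(1, 2, 4, 8, 16, 32)):
    """Return the first 'up to N' bucket that holds the distance, else '>32'."""
    for bound in bounds:
        if distance <= bound:
            return f"up to {bound}"
    return f">{bounds[-1]}"


program = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]

distances = raw_distances(program)
buckets = [bucket(d) for d in distances]
print(distances, buckets)
```

The first distance found is the LD reading R1 three instructions after the ADD that wrote it, which agrees with the value quoted on the slide; the SUB adds a second dependency on the same write.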
Modeling Control Flow Predictability
Capture behavior of easy and difficult to predict branches
Inherent program feature that captures branch behavior
Transition Rate [Haungs et al., HPCA’00] = # of Taken-Not Taken transitions / # of times executed
• Branches with low transition-rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
• Branches with high transition-rate (easier to predict): TNTNTNTNTN
• Branches with moderate transition-rate (tougher to predict)
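As a small worked example, the transition rate of the outcome strings above can be computed directly from the definition on the slide (direction changes divided by executions). The function name `transition_rate` and the string encoding of outcomes are choices made for this sketch.

```python
def transition_rate(history):
    """Taken/not-taken direction changes divided by the number of executions."""
    transitions = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return transitions / len(history)


low = transition_rate("TTTTTTTTTN")       # one change in ten executions
also_low = transition_rate("NNNNNNNNNT")  # one change in ten executions
high = transition_rate("TNTNTNTNTN")      # changes on almost every execution
print(low, also_low, high)
```

Both extremes are easy for a history-based predictor; rates in between are the harder cases.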
Workload Synthesis (1)
Workload Profile
• Instruction Mix
• Register Dependency Distance
• Stride Pattern of Load/Store
• Branch Transition Rate
• Branch Transition Probabilities
[Figure: control flow graph of basic blocks A, B, C, and D, each ending in a branch (BR), with edge transition probabilities 0.8, 0.2, 1.0, 1.0, 0.9, and 0.1]
Synthetic Clone Generation: 1 Big Loop
A B D A B D A C D A B D
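One way to picture how a block sequence could be drawn from branch transition probabilities is a random walk over the block graph. This is a sketch under stated assumptions, not BenchMaker's actual algorithm: the slide text does not say which edge carries which probability, so the edge assignment below, the `EXIT` sink, and the function name `walk` are all hypothetical.

```python
import random

# Assumed edge assignment for the probabilities shown in the figure.
# "EXIT" is a hypothetical sink for the remaining 0.1 out of block D.
TRANSITIONS = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 0.9), ("EXIT", 0.1)],
}


def walk(start, max_blocks, rng):
    """Random walk over the block graph, stopping at EXIT or after max_blocks."""
    sequence = []
    block = start
    while block != "EXIT" and len(sequence) < max_blocks:
        sequence.append(block)
        successors, weights = zip(*TRANSITIONS[block])
        block = rng.choices(successors, weights=weights)[0]
    return sequence


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sequence = walk("A", max_blocks=12, rng=rng)
print(" ".join(sequence))
```

The resulting list of blocks would then be laid out as the body of one big loop, as in the A B D A B D A C D A B D sequence on the slide.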
Workload Synthesis (2)
Memory Access Model (Strides)
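A minimal sketch of a stride-based memory access model using a circular stream, the modeling device named earlier in the talk. The parameter names (`base`, `stride`, `length`) and the example addresses are illustrative assumptions; the slides do not give the generator's actual parameters.

```python
def circular_stream(base, stride, length, count):
    """Return `count` addresses that advance by `stride` and wrap after `length` steps."""
    return [base + stride * (i % length) for i in range(count)]


# A load that walks four 8-byte elements and then starts over.
addresses = circular_stream(base=0x1000, stride=8, length=4, count=10)

# A stride of 0 models a load/store that keeps touching the same address.
same_address = circular_stream(base=0x2000, stride=0, length=4, count=3)
print([hex(a) for a in addresses], [hex(a) for a in same_address])
```

In this sketch the stride and the wrap length together bound how much memory one synthetic load or store touches, which is how a stride distribution and a data footprint could both be controlled.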
Workload Synthesis (3)
Memory Access Model (Strides)
Branching Model – Based on Transition Rate
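A branching model based on transition rate could be sketched as a generator that flips the branch direction often enough to hit a target rate. This is an illustrative construction, not necessarily the one BenchMaker uses; the function names are hypothetical, and `Fraction` is used so the running credit does not suffer floating-point rounding.

```python
from fractions import Fraction


def branch_pattern(rate, executions):
    """Generate taken (True) / not-taken (False) outcomes with the given transition rate."""
    rate = Fraction(rate)
    outcome = True
    credit = Fraction(0)
    pattern = []
    for _ in range(executions):
        credit += rate
        if credit >= 1:
            # Enough credit accumulated: change direction on this execution.
            outcome = not outcome
            credit -= 1
        pattern.append(outcome)
    return pattern


def count_transitions(pattern):
    """Count direction changes between consecutive executions."""
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)


rare_flips = branch_pattern(Fraction(1, 10), 100)
frequent_flips = branch_pattern(Fraction(1, 2), 100)
print(count_transitions(rare_flips), count_transitions(frequent_flips))
```

Dividing the transition count by the number of executions recovers the requested rate, so a synthetic branch built this way reproduces the profiled transition rate by construction.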
Workload Synthesis (4)
Memory Access Model (Strides)
Branching Model – Based on Transition Rate
Register Assignment
C code with asm & volatile constructs
Evaluation of BenchMaker
• SPEC CPU2000, SPECjbb2005, and DBT2 workloads
• Validated Sim-Alpha Performance Model of Alpha 21264

Benchmark    Input          SimPoint(s)
SPEC CPU2000 Integer
bzip2        graphic        553
crafty       ref            774
eon          rushmeier      403
gcc          166.i          389
gzip         graphic        389
mcf          ref            553
perlbmk      perfect-ref    5
twolf        ref            1066
vortex       lendian1       271
vpr          route          476
gcc          expr           8, 24, 47, 51, 56, 73, 87, 99
SPEC CPU95 Integer
gcc          expr           0, 3, 5, 6, 7, 8, 9, 10, 12
Performance Correlation
[Figure: bar chart of Instructions-Per-Cycle (y-axis, 0 to 1.8) for the Original Benchmark and the Synthetic Benchmark across bzip2, crafty, gcc, gzip, mcf, perlbmk, twolf, vortex, vpr, dbt2, dbms, and SPECjbb2005]
Trade Accuracy for Flexibility – Average Error of 11%
Energy/Power Correlation
[Figure: bar chart of Energy-Per-Instruction (y-axis, 0 to 35) for the Original Benchmark and the Synthetic Benchmark across bzip2, crafty, gcc, gzip, mcf, perlbmk, twolf, vortex, vpr, dbt2, dbms, and SPECjbb2005]
Average Error of 13%
Altering Individual Program Characteristics
[Figure: Instructions-Per-Cycle (y-axis, 0 to 1.4) versus Percentage of References with Stride Value 0 (x-axis: 0, 10, 20, 30, 40, 50, 60, 66, 70, 80, 90, 100)]
Interaction of Program Characteristics
[Figure: L1 D-cache Miss-Rate (y-axis, 0 to 0.35) versus Percentage of References with Stride Value 0 (x-axis: 0, 10, 20, 30, 40, 50, 60, 66, 70, 80, 90, 100), with one series each for Data Footprint 300K, 600K, and 900K]
Modeling Impact of Benchmark Drift
[Figure: Instructions-Per-Cycle (y-axis, 0 to 1.2) versus Factor by which code size is increased (x-axis, 1 to 8)]
Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)
Increase in Code Footprint (hypothetical)
Summary
• Synthetic Benchmarks to Address Benchmarking Challenges
• Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
• Applications of Synthetic Benchmarks
  - Altering Program Characteristics
  - Studying Interaction of Program Characteristics
  - Modeling Benchmark Drift