workload characterization: can it save computer architecture and performance evaluation? ·...

59
Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization: Can it save Computer Architecture and Performance Evaluation? Lizy Kurian John The Laboratory for Computer Architecture (LCA) ECE Department, UT Austin [email protected] (512)-232-1455 http://www.ece.texas.edu/projects/ece/lca/ http://www.ece.utexas.edu/~ljohn

Upload: others

Post on 09-Mar-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Workload Characterization: Can it save Computer Architecture and Performance

Evaluation?Lizy Kurian John

The Laboratory for Computer Architecture (LCA)ECE Department, UT Austin

[email protected](512)-232-1455

http://www.ece.texas.edu/projects/ece/lca/http://www.ece.utexas.edu/~ljohn

Page 2: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

This presentation includes work by several of my students,

especially• Deepu Talla• Ramesh Radhakrishnan• Tao Li• Yue Luo• Rob Bell Jr• Aashish Phansalkar

Page 3: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Basic Belief

• There are bottlenecks that exist in modern computer systems, which if precisely unveiled, will lead to appropriate architectures and architectural enhancements.

Page 4: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Media Workload Characterization Example –Study on effectiveness of MMX- using Discrete

Cosine Transform (DCT)

Pentium II without MMX Pentium II with MMX

Clocks Eff. Comp. Clocks Eff. Comp.

Maximum compiler optimizations 3500 0.15 2375 0.24 (6%)

Ideal case ≈512 1 ≈128 4 (100%)

Page 5: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Media Workload Characterization Example –Study on effectiveness of MMX- using Discrete

Cosine Transform (DCT)

Pentium II without MMX Pentium II with MMX

Clocks IPC Eff. Comp. Clocks IPC Eff. Comp.

Maximum compileroptimizations 3500 1.47 0.15 2375 1.04 0.24 (6%)

Perfect memorySystem (prefetching) 2737 1.88 0.20 1578 1.56 0.36 (9%)

Ideal case ≈512 - 1 ≈128 - 4 (100%)

Page 6: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Only the instructions shown in red are MMX computations. All other instructions are simply supporting these computations.

Pentium III – SIMD code for Discrete Cosine Transform (DCT)

lea ebx, DWORD PTR [ebp+128] load/address overheadmov DWORD PTR [esp+28], ebx load/address overhead$B1$2:xor eax, eax address overheadmove dx, ecx address overheadlea edi, DWORD PTR [ecx+16] load/address overheadmov DWORD PTR [esp+24], ecx load/address overhead$B1$3:movq mm1, MMWORD PTR [ebp] load overheadpxor mm0, mm0 initialization overheadpmaddwd mm1, MMWORD PTR [eax+esi] True Computationmovq mm2, MMWORD PTR [ebp+8] load overheadpmaddwd mm2, MMWORD PTR [eax+esi+8] True Computationadd eax, 16address overheadpaddw mm1, mm0 True Computationpaddw mm2, mm1 True Computationmovq mm0, mm2 load related overheadpsrlq mm2, 32 SIMD reduction overheadpovd ecx, mm0 SIMD load overheadmovd ebx, mm2 SIMD load overheadadd ecx, ebx SIMD conversion Overheadmov WORD PTR [edx], cx store overheadadd edx, 2 address overheadcmp edi, edx branch related overheadjg $B1$3 loop branch overhead$B1$4:move cx, DWORD PTR [esp+24] load/address overheadadd ebp, 16 address overheadadd ecx, 16 address overheadmove ax, DWORD PTR [esp+28] load/address overheadcmp eax, ebp branch related overheadjg $B1$2 loop branch overhead

Page 7: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Breakdown of dynamic instructions

0%

20%

40%

60%

80%

100%

cfa-PIII cfa-SS dct-PIII dct-SS scale-PIII scale-SS motest-PIII

motest-SS

aud-PIII aud-SS g711-PIII g711-SS

memory branch integer SIMD-overhead SIMD-computation

• Approximately 75%-85% of dynamic instructions are supporting

Page 8: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

The MediaBreeze architecture focuses on the parallelism in the supporting instructions rather than the actual media computations.

L1 D-cacheSIMD

computationunit

Addressgeneration

units

Addressgeneration

units

Load/Storeunits

Breeze InstructionBuffer

Hardwarelooping

Hardwarelooping

Instructionstream

InstructionDecoder

Non-SIMD pipeline

BreezeInstructionInterpreter

BreezeInstructionInterpreter

Starting address ofBreeze instruction

Normalsuperscalarexecution

L2 cache

Main memory

SIMD pipeline

IS-1

IS-2

IS-3

OS

IS - input stream

OS - output stream

Overhead

Useful computations

new hardware

existing hardware useddifferently

Data reorg. /Address trans.

Data Station

Page 9: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Performance of MediaBreeze

• The MediaBreeze architecture gives up to 27X performance on DSP kernels and up to 2X on media applications over the best SIMD performance.

Page 10: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Area, power, and timing implications of MediaBreeze

• Area – 0.31 mm2 (overall chip area increase is 0.3%)

• Power – 430 mW at 1 GHz (less than 1% of the overall processor power

• Timing – Overall pipeline depth is not increased.

Page 11: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Simple Solutions

Elegant Solutions

Page 12: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Power Aware Adaptive Architectures

Detects phases during program execution and adapts hardware characteristics to suit features of the phase

Page 13: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

OS Energy DissipationOS Energy Dissipation

0%10%20%30%40%50%60%

pmake

SPECInt.gcc

SPECInt.vorte

xse

ndmail

fileman

Java

.dbJa

va.je

ssJa

va.ja

vac

Java

.jack

Java

.mtrt

Java

.compres

sDBMS.se

lect

DBMS.update

DBMS.join

% of OS Cycles% of OS Energy

92% 89%

Page 14: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

OS Power & Performance TradeoffOS Power & Performance Tradeoff

0

0.2

0.4

0.6

0.8

pmake

gccvo

rtex

sendm

ailfile

man dbjes

sjav

ac jack

postgre

s.sele

ct

postgre

s.upd

ateosb

ootAVGN

orm

aliz

ed E

nerg

y.D

elay

Sampling based Adaptation (Window Size: 2048-cycle)Sampling based Adaptation (Window Size: 128-cycle)Routine based Adaptation

Page 15: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Java Acceleration for General Purpose Processors

• What’s the biggest bottleneck to alleviate?• Object-oriented nature?• Translation• Hardware Translator for Java Bytecodes

Page 16: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

The Java Hardware Interpreter

Java class file

Native executable

Fetchbytecode translator Decode Execute

bytecodes

Native machine instructions

• No changes to processor core– light-weight Java run time environment

Page 17: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

HardInt Performance4-way performance

44.8

109.

3 149.

7

934.

1

911.

7

60.4

135.

9

85.2 12

7.7

492.

2

71.0

133.

7

221.

5

989.

4

867.

8

59.8

108.

8 146.

2

146.

1

321.

9

16.0

27.7

28.8

250.

2

120.

0

0

50

100

150

200

250

300

350

400

db javac jess mpeg mtrt

ex

ecu

tio

n c

ycle

s (

millio

ns)

JDK 1.1.6 Interpreter JDK 1.1.6 JIT JDK 1.2 Interpreter JDK 1.2 JIT Hard-Int

• Hard-Int performs consistently better than the interpreter

• In JIT mode, significant performance boost in 4 of 5 applications.

Page 18: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Simple Solutions

Elegant Solutions

Page 19: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Architect’s Treasure Chest

• Perhaps our treasure chest contains simple solutions to most problems we will encounter

• We need to be able to identify which is the right solution for the right problem

• Diagnosing the problem is the issue• Workload characterization is the key to this

diagnosis

Page 20: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Are computer architects becoming like doctors

prescribing medicines without diagnosing the disease?

Are we getting too excited with all the wonderful things a new medicine

can do?

Page 21: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Performance Evaluation

Page 22: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Page 23: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

If cars were benchmarked like computers

• Mileage chart might have looked like

28.3 mpgI-7527.5 mpgI-9526.6 mpgI-3524.2 mpgI-9025.3 mpgI-8028.1 mpgI-2027.2 mpgI-10

Page 24: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

International Routes

28 mpgJapan’s Hwy 14230 mpgAutoBahn24 mpgMadrid’s M-3024 mpgI-9025 mpgI-8028 mpgI-2027 mpgI-10

Page 25: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

And we would have asked questions like

• Did you drive on I-80 in the summer or winter?

• Was it night or day?• When you drove through Austin on I-35,

was a Longhorn football game going on?

Page 26: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

And in our car conferences, we would have accepted papers that

• Benchmarking results on most number of highways

• Benchmarked from end to end• Benchmarked on highways in multiple

weather conditions

Page 27: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Imagine running cars on

1790 milesI-75 (FL to MI)1920 milesI-95 (Maine-FL)1570 milesI-35 (TX to MN)3020 milesI-90 (WA to MA)2900 milesI-80 (CA to NJ)1540 milesI-20 (TX to SC)

2460 milesI-10 (CA to FL)

Page 28: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Abstracting these long roads

• CITY• HIGHWAY

Page 29: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

If we look at our computer performance evaluation trends,

aren’t we simply adding roads to the list?

Page 30: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Reducing Redundancy in Benchmarking

• SimPoint [Sherwood et. al]• SMARTS [Wunderlich et. al]• Benchmark clustering using PCA Analysis

[Eeckhout et. al]

Page 31: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SimPoint

• Sherwood, et al. ASPLOS 2002• Sample selection:

– Clustering analysis of Basic Block Vectors to identify representative chunks of instructions

• Sampling unit size: 100 million instructions• Sample size: 3-10• Warm up: No explicit warm-up

Page 32: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SMARTS

• Wunderlich, et al. ISCA 2003• Sample selection:

– Selecting chunks evenly distributed in the instruction stream (systematic sampling)

• Sampling unit size: 1,000 instructions• Sample size:

– Depends on confidence interval requirement– Thousands to tens of thousands

Page 33: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SIMPOINT

• Analogous to realizing that no need to go all over I-10 from California to Florida, if 10 miles around Phoenix, and 10 miles from San Antonio and 10 miles from El Paso are 10 miles from the New Mexico desert are taken, that’s sufficient.

Page 34: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SMARTS

• Randomly picking some miles from anywhere will do.

• No need for representative sampling

Page 35: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Cluster analysis

• Linkage clustering

• K-means clustering• Iterative algorithm• Based on distance

between program-input pairs = linkage distance

Page 36: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Rescaled PCA space: PC1 vs PC2 [Eeckhout]

compress

ijpeg

go

gcc

m88ksim

vortex

perlbmkxlisp

TPC-D

PC

2:

hig

h I

LP a

nd

low

bra

nch

pre

dic

tion a

ccura

cy

PC1: fewer branches and low I-cache miss rates

Page 37: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Dendrogram to select representatives [Eeckhout]

Page 38: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Cluster Analysis and PCA [Eeckhout]

• Analogous to realizing that I-10 and I-20 are very similar kinds of roads. Similarly I-80 and I-90 are very similar.

Page 39: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Plot for the scores of L1 cache access behavior of SPECint2000 and SPECjvm98 benchmark suites

-6

-4

-2

0

2

4

6

8

-20 -15 -10 -5 0 5 10 15

PC1

PC2

SPECint2000SPECjvm98jvm.compress

bzip2

gzip

jvm98

gcc

Page 40: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

The Return of Synthetic Benchmarks?

A framework to generate synthetic benchmarks that are:

•Representative of applications or user specifications

•Automatically generated

•Generated and executed using user parameters

•Source code and executables

•Portable to multiple hardware & simulation systems

Page 41: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Statistical Simulation

•Statistical Simulation using Synthetic Traces•Carl and Smith

•Nussbaum and Smith

•Oskin et al.: HLS

•Eeckhout et al.

•Executable code built from the workload characterization of well-correlated statistical simulation systems

•Automatic Benchmark Synthesis

Page 42: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Synthetic Traces in Statistical Simulation

1. Collect global statistics• Basic block size

• Instruction Mix

• Instruction Dependencies

• Branch predictability

• L1/L2 cache statistics

2. Generate basic blocks

3. Connect them together into a graph (HLS) or generate a trace

4. Execute in order, simulating cache misses and branch mispredicts

Page 43: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Improving Correlation

Track information on a per basic block basis•Basic block size

•Instruction sequences

•Merged dependency information

•Cache hit/miss information

•Branch predictability

Page 44: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Summary by Benchmark Class

05

10152025303540

overa

ll

spec

int

spec

fp

tech

overa

ll_ds

p32sp

ecint_ds

p32sp

ecfp_

dsp32

tech_d

sp32

overa

ll_bp

1sp

ecint_bp

1sp

ecfp_

bp1

tech_b

p1

HLS errorBasic Blk error

Tracking characteristics on a basic block granularity reduces error

Page 45: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Adding Structure: BB Maps

0

5

10

15

20

25

30

35

40

45

sdot_ssum1 sdot_sfill sscale_ssum2 ssum2_sfill scopy_sdot ssum2_sdot avg

HLS errorBasic Blk errorBB Map

•Multi-phase programs can benefit from simulation using a graph of basic block frequencies

•Programs are back-to-back two-loop combinations of the technical loops

Page 46: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

IPC from Synthetic Trace

0

0.5

1

1.5

2

2.5

gcc perl m88ksim ijpeg vortex compress go li

SS IPCSynthetic IPC

Converted workload characteristics to C-code/ASM statements and run through SimpleScalar

Page 47: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Use of statistical theory?

We architects – are we unwilling to use statistical theory?

Page 48: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Normalized Standard Deviation (crafty)

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50Normalized number of sampled instructions (j)

larger chunkoriginal chunk size

0.32

0.89

Page 49: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SimPoint

• Sherwood, et al. ASPLOS 2002• Sample selection:

– Clustering analysis of Basic Block Vectors to identify representative chunks of instructions

• Sampling unit size: 100 million instructions• Sample size: 3-10• Warm up: No explicit warm-up

Page 50: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

SMARTS

• Wunderlich, et al. ISCA 2003• Sample selection:

– Selecting chunks evenly distributed in the instruction stream (systematic sampling)

• Sampling unit size: 1,000 instructions• Sample size:

– Depends on confidence interval requirement– Thousands to tens of thousands

Page 51: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Normalized Standard Deviation (crafty)

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50Normalized number of sampled instructions (j)

larger chunkoriginal chunk size

0.32

0.89

Page 52: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Estimating speedupRequired sample size

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

art equake lucas bzip2-source

gcc-166 vpr-route gzip-random

vortex-1

Sam

ple

Size

8-way cpi16-way cpispeedup

Sample size required to achieve 2% relative error at 95% confidence level

Page 53: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

(Comparing reduced data set:Test of population median

• If the populations are the same, the population median/mean should be the same

• Wilcoxon signed rank testMetrics Reduced data set p-value

Test 0.06445 Train 0.02734

CPI on 8-way machine

MinneSPEC 0.04883 Test 0.03711 Train 0.01953

CPI on 16-way machine

MinneSPEC 0.03711 Test 1Train 0.375

Speedup (16-way vs. 8-way)

MinneSPEC 0.6953

Page 54: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Comparing reduced data set:Test result

• None of the reduced data sets have the same population median CPI with reference data set

• All reduced data sets show same median speedup as the reference data set

Page 55: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Future Workloads

Life, Death and Games

Page 56: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Future Workloads – Life, Death and Games

• Life Science - Pharmaceuticals, Drug Discovery

• Death Science – Military Applications, Weapon Simulation, Crash Analysis, Scientific Computing

• Games – Multiplayer, Natural Language Recognition/Semantic Analysis

Page 57: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Risk of missing truly innovative architectures?

Inventions in Search of Applications eg: Laser

If workload characterization can lead to synthetic workloads of the future, endless future workloads can be synthesized that may lead to truly innovative architectures

Page 58: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Workload Characterization: Can it save Computer Architecture and Performance

Evaluation?

Possibly Yes.

Nothing else possibly can.

Page 59: Workload Characterization: Can it save Computer Architecture and Performance Evaluation? · 2005-02-02 · Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7 Workload Characterization:

Feb 15, 2004 Lizy Kurian John, LCA, UT Austin, CAECW-7

Task is on us, the workload characterization community

We need to abstract program behavior into essential attributes

which can help new architectures and performance evaluation