Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation
Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi
Enterprise Platform Group, Intel Corporation
Presented at MICRO-37: Portland, OR, Dec. 6th, 2004
IA32/EM64T/IPF


Page 1:

Enterprise Platforms Group

Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation

Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi

Enterprise Platform Group, Intel Corporation

Presented at MICRO-37: Portland, OR, Dec. 6th, 2004

IA32/EM64T/IPF

Page 2:

Target: LARGE Applications

• With little/no manual intervention

• Within reasonable time

Goal: Accurate Performance Prediction

Page 3:

Instruction Counts: Some Itanium Applications

Program              # Instructions (billions)
SPECINT (average)      142
SPECFP (average)       373
RenderMan magic        463
Fluent L2            3,979
Amber rt             3,994
Ls-Dyna 3cars        4,932

Page 4:

Whole-Program Simulation is Slow

Simulation Time in YEARS @ 10,000 Instructions/Second

Program              Simulation time (years)
SPECINT (average)     0.4
SPECFP (average)      1.2
RenderMan magic       1.5
Fluent L2            12.6
Amber rt             12.7
Ls-Dyna 3cars        15.6
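
The simulation times on this slide follow directly from the instruction counts on the previous slide and the stated rate of 10,000 instructions per second. A minimal sketch of that arithmetic, assuming a 365-day year (the function and constant names are illustrative, not part of the toolkit); the results agree with the slide's figures to within rounding:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year

# Instruction counts in billions, as listed on the "Instruction Counts" slide.
INSTRUCTIONS_BILLIONS = {
    "SPECINT (average)": 142,
    "SPECFP (average)": 373,
    "RenderMan magic": 463,
    "Fluent L2": 3979,
    "Amber rt": 3994,
    "Ls-Dyna 3cars": 4932,
}


def simulation_years(instructions_billions, insts_per_second=10_000):
    """Years needed to simulate the whole program at the given simulator speed."""
    seconds = instructions_billions * 1e9 / insts_per_second
    return seconds / SECONDS_PER_YEAR


for program, billions in INSTRUCTIONS_BILLIONS.items():
    print(f"{program:<20} {simulation_years(billions):5.1f} years")
```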

Page 5:

Solution: Select Simulation Points

• Manually

• Randomly
  – Anywhere
  – From uniform regions

• Fine-grain sampling (SMARTS: CMU)

• By program-phase analysis (SimPoint: UCSD, iPart: Intel/MRL)

Page 6:

Running Commercial Applications on Simulators is Hard

• Resource Requirements: Disks etc.
  – Need to modify/re-configure the simulator

• OS dependencies
  – Need support for specific kernel and device drivers

• License checking
  – Need special action

Page 7:

Solution: Native Execution with Instrumentation

PIN: A dynamic-instrumentation system
+ A tool for writing tools
+ No special compiler/linker flags required

Use PIN to select simulation points (PinPoints) and generate traces

Page 8:

PIN-Tools: Profiling, Trace Generation and more…

[Diagram: PIN-based profiler → Profile → Simulation Point Selection → PinPoints → PIN-based Trace Generator, PIN-based Branch Predictor, "Your Simulator Here"]

Page 9:

Simulation Point Selection with SimPoint [UCSD]

Why SimPoint?

• Instrumentation based

• Microarchitecture independent

• Works well (results later)

Applied to multi-threaded programs

[Diagram: PIN-based profiler → Basic Block Vectors → SimPoint Tools → PinPoints]
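
The profiler's output to SimPoint is a set of basic block vectors, one per fixed-size slice of execution. A rough sketch of how such vectors might be accumulated, assuming a hypothetical trace format of `(block_id, num_instructions)` pairs in execution order; the real PIN-based profiler and the SimPoint file format are not shown in these slides, and the clustering step that SimPoint performs on the vectors is omitted here:

```python
from collections import Counter


def basic_block_vectors(trace, slice_size):
    """Build one normalized basic block vector per slice of execution.

    trace: iterable of (block_id, num_instructions) pairs in execution
           order (hypothetical format, for illustration only).
    slice_size: number of instructions per slice.
    """
    vectors = []
    counts = Counter()
    executed = 0
    for block_id, n_insts in trace:
        # Weight each block by the instructions it contributes to the slice.
        counts[block_id] += n_insts
        executed += n_insts
        if executed >= slice_size:
            vectors.append({b: c / executed for b, c in counts.items()})
            counts = Counter()
            executed = 0
    if executed:
        # Keep the partial final slice so no execution is dropped.
        vectors.append({b: c / executed for b, c in counts.items()})
    return vectors


example_trace = [("A", 4), ("B", 6), ("A", 10), ("C", 5), ("C", 5)]
example_vectors = basic_block_vectors(example_trace, slice_size=10)
```

Slices whose vectors look alike execute the same code in similar proportions, which is why the selection is independent of the microarchitecture.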

Page 10:

Multiple Sources of Error

Goal: Accurate Performance Prediction

PinPoints → Traces → Simulation → Stats (CPI)

Error Source: Phase detection

Error Source: Non-repeatability

Error Source: Warm-up, Modeling

Phase-detection is not enough!

Need Trace Generation and Simulation

Page 11:

Main Contributions

• A Toolkit that automatically:
  – Profiles, finds phases/simulation regions (PinPoints)
  – Validates that PinPoints are representative
  – Generates traces for simulators
  Available for Itanium/IA32/EM64T

• Evaluations in a production environment

Page 12:

The PinPoints Toolkit

[Diagram: Phase Detection + PinPoint Selection produces the PinPoints file, which feeds two paths:
  – H/W counters-based Validation (pfmon: Itanium, PAPI: IA32): Compute CPI for the Whole Program and as a Weighted Sum for PinPoints. Match?
  – Trace Generation/Simulation]
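
The validation step compares the whole-program CPI with a weighted sum of the per-PinPoint CPIs. A minimal sketch of that comparison, assuming each PinPoint carries a weight equal to the fraction of execution its phase represents; the function names and the numbers in the example are hypothetical, not measurements from the talk:

```python
def weighted_cpi(pinpoints):
    """Predict whole-program CPI from (weight, cpi) pairs, one per PinPoint."""
    total_weight = sum(weight for weight, _ in pinpoints)
    # Divide by the total so weights need not sum to exactly 1.
    return sum(weight * cpi for weight, cpi in pinpoints) / total_weight


def cpi_error(whole_program_cpi, predicted_cpi):
    """Relative error of the prediction against the measured CPI."""
    return abs(predicted_cpi - whole_program_cpi) / whole_program_cpi


# Hypothetical example: three PinPoints and a measured whole-program CPI.
example_pinpoints = [(0.5, 1.0), (0.3, 2.0), (0.2, 0.5)]
predicted = weighted_cpi(example_pinpoints)
error = cpi_error(whole_program_cpi=1.25, predicted_cpi=predicted)
print(f"predicted CPI = {predicted:.2f}, error = {error:.1%}")
```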

Page 13:

Evaluations

Applications: Built w/ Intel’s compilers (high opt)
  HPC: Fluent, AMBER, LS-Dyna, RenderMan
  SPEC2000: Processed 8-9 times

Test Configurations: Linux (RedHat)

Processor   Family        Clock     L3 cache
Merced      Itanium (1)   800 MHz   2 MB
McKinley    Itanium-2     900 MHz   1.5 MB
Madison     Itanium-2     1.3 GHz   3-6 MB

Page 14:

PinPoints Generated

Program              # Retired Instructions (billions)   # PinPoints (250 million insts. EACH)
AMBER-rt             3,994                                6
Fluent-m3            2,625                                8
LS-DYNA              4,932                                6
SPECINT2000 (avg.)     142                                4
SPECFP2000 (avg.)      373                                5

• PinPoints << 1% of program execution
• Turnaround time (Traces): Few days
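
The claim that PinPoints cover well under 1% of execution can be checked from the table: each PinPoint is 250 million instructions, so coverage is the number of PinPoints times 250 million divided by the retired instruction count. A small sketch of that check (the names are illustrative):

```python
PINPOINT_INSTRUCTIONS = 250e6  # each PinPoint is 250 million instructions

# (retired instructions in billions, number of PinPoints), from the table.
PINPOINTS_GENERATED = {
    "AMBER-rt": (3994, 6),
    "Fluent-m3": (2625, 8),
    "LS-DYNA": (4932, 6),
    "SPECINT2000 (avg.)": (142, 4),
    "SPECFP2000 (avg.)": (373, 5),
}


def coverage(total_billions, n_pinpoints):
    """Fraction of retired instructions that fall inside the PinPoints."""
    return n_pinpoints * PINPOINT_INSTRUCTIONS / (total_billions * 1e9)


for program, (billions, n_points) in PINPOINTS_GENERATED.items():
    print(f"{program:<20} {coverage(billions, n_points):.3%}")
```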

Page 15:

Results: Overview

• PinPoints: Whole-Program CPI prediction (SPEC2000 and HPC applications):
  – Average CPI prediction error ~5%
  – PinPoints better than random selection

• Predicting speedup between microarchitectures
  – PinPoints can be used to evaluate microarchitecture variations

• PinPoints Traces: Prediction of native SPEC2000 ratios
  – INT within 8%, FP within 3%

More results in the paper

Page 16:

CPI: Actual vs. Predicted
SPEC2000: Itanium-Madison

[Chart: CPI per benchmark; series: Whole_pgm_CPI, PinPoints_CPI]

Page 17:

SPEC2000 CPI Prediction

Average Error: Madison: 2.8%, Merced: 3.2%, McKinley: 2.7%

[Chart: axes CPI and % Abs(Delta in CPI); series: Whole_pgm_CPI, PinPoints_CPI, %Delta]

Page 18:

HPC Applications CPI Prediction

Average Error: Madison: 5.0%

[Chart: axes CPI and % Abs(delta CPI); series: Whole_pgm_CPI, PinPoints_CPI, %Delta]

Page 19:

Comparison With Random Selection [48 unique program runs]

Cumulative Distribution of CPI Errors for SPEC2000

[Chart: % of Runs vs. CPI Error (0% to 30%); series: PinPoints: N Points, Random: N Points, Uniform Random: N Points]

Page 20:

Comparison With Random Selection [18 unique program runs]

Cumulative Distribution of CPI Errors for HPC apps.

[Chart: % of Runs vs. CPI Error (0% to 30%); series: PinPoints: N Points, Random: N Points, Uniform Random: N Points]

Page 21:

Speedup: Merced → McKinley
SPEC2000

[Chart: Speedup (0 to 6) per benchmark; series: McKinley:Actual]

Page 22:

PinPoints Speedup Prediction: SPEC2000: Merced → McKinley

[Chart: Speedup (0 to 6) per benchmark; series: McKinley:Actual, McKinley:Predicted]

Page 23:

PinPoints: Speedup Prediction Across Multiple Microarchitectures

Same Binaries/PinPoints

[Chart: Speedup (0 to 6) per benchmark; series: McKinley:Actual, McKinley:Predicted, Madison:Actual, Madison:Predicted]
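
The slides do not spell out how speedup is derived from the predicted CPIs. One plausible reading, sketched below as an assumption, is that because the same binaries (and so the same instruction counts) run on each machine, execution time is proportional to CPI divided by clock frequency. The clock rates come from the test configurations; the CPI values are hypothetical:

```python
def predicted_speedup(cpi_base, clock_base_hz, cpi_new, clock_new_hz):
    """Speedup of the new machine over the base machine for the same binary.

    Assumes time = instructions * CPI / clock, so the instruction count
    cancels when both machines run identical binaries.
    """
    time_per_inst_base = cpi_base / clock_base_hz
    time_per_inst_new = cpi_new / clock_new_hz
    return time_per_inst_base / time_per_inst_new


# Hypothetical CPIs; 800 MHz (Merced) and 900 MHz (McKinley) are from the slides.
example_speedup = predicted_speedup(
    cpi_base=2.0, clock_base_hz=800e6,
    cpi_new=1.0, clock_new_hz=900e6,
)
print(f"predicted speedup: {example_speedup:.2f}x")
```

The predicted bar in the charts would use PinPoints CPIs in place of whole-program CPIs; comparing the two gives the speedup prediction error.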

Page 24:

Putting it All Together: From PinPoints to Projections

PinPoints → Traces → Simulation → Stats (CPI)

Does simulation of traces for PinPoints predict native performance?

Error Source: Phase detection

Error Source: Non-repeatability

Error Source: Warm-up, Modeling

Error: Cumulative

Page 25:

CPI Prediction with Simulation
SPEC2000: Itanium Madison

[Chart: axes CPI (0.0 to 1.4) and Abs(% Delta) (0% to 100%); series: Actual: Native Hardware, Simulated: PinPoints (traces), % Delta]

Page 26:

Native SPEC2000 Ratios [Spring 2004]

Itanium: Madison 1.5GHz/6MB L3

Suite      SPEC Ratio
SPECfp     2075
SPECint    1174

Page 27:

Performance Prediction from PinPoints Traces

Itanium: Madison 1.5GHz/6MB L3

SPEC Ratio
Suite      Native   Simulated
SPECfp     2075     2126
SPECint    1174     1270
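
The gap between the native and simulated ratios on this slide can be worked out directly. A short sketch of the relative-error arithmetic (names are illustrative); it gives roughly 2.5% for SPECfp and 8.2% for SPECint:

```python
# SPEC ratios from the slide: native hardware vs. simulation of PinPoints traces.
NATIVE = {"SPECfp": 2075, "SPECint": 1174}
SIMULATED = {"SPECfp": 2126, "SPECint": 1270}


def relative_error(native_value, simulated_value):
    """Absolute difference as a fraction of the native measurement."""
    return abs(simulated_value - native_value) / native_value


for suite in NATIVE:
    err = relative_error(NATIVE[suite], SIMULATED[suite])
    print(f"{suite:<8} {err:.1%}")
```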

Page 28:

Summary

PinPoints toolkit: Automatic simulation region selection, tracing, and validation

Dynamic instrumentation (PIN): LARGE programs

• PinPoints: << 1% of execution
  Capture whole-program CPI
  – Average error < 5% for SPEC2000, HPC apps.
  – Better than random selection

• PinPoints traces: Predict SPEC2000 Ratios
  – INT within 8%, FP within 3%

Page 29:

Try it out!

(PIN + PinPoints) toolkit :

http://rogue.colorado.edu/Pin

New

Page 30:

Backup: Simulator Warm-up

• Strategy 1: Large slice-size (250 million instructions)
  – Too coarse-grain for phase detection
  – Too much simulation time

• Strategy 2: 7 warm-up traces per simulation trace (30 million instructions)

Art (SPECFP2000): First pinpoint touches most of the working set
  – Simulate all pinpoint traces in succession