
Automatic Detection of Performance Anomalies in Task-Parallel Programs

Work in progress on the Aftermath trace analysis tool

Andi Drebes

Université Pierre et Marie Curie
Laboratoire d’Informatique de Paris VI

[email protected]

Joint work with: Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach

RACING’14, May 30th, 2014

OpenStream

Context

Hardware and software environment

- Multi-core / many-core systems
- How to exploit the hardware efficiently?
- Task-parallel languages based on fine-grained tasks

Performance debugging

- Requires analysis of complex interactions at execution time: Application / Run-time / Machine
- Possible solution: Record dynamic events to a trace file
- Post-mortem (offline) analysis

Aftermath

- Trace visualization and support for manual analysis
- Originally developed for OpenStream language & run-time
- Work in progress: Automate repetitive tasks

Andi Drebes – Automatic Detection of Performance Anomalies in Task-Parallel Programs 1 / 16


Outline

1. Aftermath

2. Insufficient parallelism and its causes

3. Performance anomalies during task execution

4. Work in progress: Status

5. Summary & Questions

Aftermath

[Screenshot series: the Aftermath user interface, one annotation per slide]

- Main window areas: Timeline, Filters, Statistics, Detailed text view, Menu bar (derived metrics)
- Timeline axes: Time, Processors; the timeline shows the activity during execution
- Task execution (dark blue), with individual task instances labeled
- Task creation (white)
- Searching for work (light blue)
- Basic statistics for run-time states
- Heatmap indicating task duration (white: fast, red: slow)
- NUMA heatmap indicating locality of memory accesses (blue: local, pink: remote)

Navigating through a trace

- Huge amounts of high-dimensional data
- Lots of features → lots of possibilities for analysis
- Where to look? What to look for?
- Expertise & lots of time required

Guide the user through performance analysis

- Refine analysis in several steps
  - Start by analyzing parallelism
  - Analyze what happens inside tasks
- Automate repetitive tasks

Detecting insufficient parallelism

[Figure: two timelines over processors P0, P1, P2, ..., Pn-1 against time, distinguishing "Task execution" from "Other states"]

Ideal situation: All CPUs are in task execution state without any interruption

Realistic scenario: Task creation, memory allocation, idle time, over-synchronization, ...

Detect insufficient parallelism / high overhead automatically

Threshold-based analysis of parallelism

[Figure: timeline over processors P0, P1, P2, ..., Pn-1 for an interval of duration d; the per-processor task execution times d_{e,0}, d_{e,1}, d_{e,2}, ..., d_{e,n-1} are added up]

- $d$: Duration of the interval
- $d_{e,i}$: Time that processor $i$ spends in task execution state
- $t_e$: Threshold for task execution, e.g. $t_e = 0.95$

Consider that there is sufficient parallelism if the following inequality holds:

$$\sum_{i=0}^{n-1} d_{e,i} > t_e \cdot n \cdot d$$
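The threshold test can be sketched in a few lines of Python. This is only an illustration of the inequality, not code taken from Aftermath; the function name `sufficient_parallelism` and the sample numbers are assumptions made for the example.

```python
from typing import Sequence


def sufficient_parallelism(exec_times: Sequence[float], d: float, t_e: float = 0.95) -> bool:
    """Return True if the summed task execution time exceeds t_e * n * d.

    exec_times[i] is d_{e,i}, the time processor i spent in task execution
    during an interval of duration d (hypothetical input layout).
    """
    n = len(exec_times)
    if n == 0 or d <= 0:
        raise ValueError("need at least one processor and a positive interval duration")
    return sum(exec_times) > t_e * n * d


# Four processors observed over an interval of 100 time units.
print(sufficient_parallelism([99.0, 98.0, 97.0, 96.0], d=100.0))  # sum 390 vs. bound 380
print(sufficient_parallelism([99.0, 98.0, 60.0, 96.0], d=100.0))  # sum 353 vs. bound 380
```

With $t_e = 0.95$, four processors and $d = 100$, the bound is $0.95 \cdot 4 \cdot 100 = 380$, so the first interval passes and the second does not.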

Detecting the cause of insufficient parallelism

Multiple stages during analysis

- If the inequality does not hold, find out why
- Possible causes: task creation overhead, memory allocation, not enough tasks available for execution, ...
- Use thresholds for associated states: $t_c$ (task creation), $t_i$ (idle time)

[Figure: execution divided into intervals numbered 1 to 10]

Interval selection

- Multiple intervals: initialization, termination, etc.
- Repeat analysis for different intervals

Per-interval analysis of parallelism & overhead

[Flowchart, reconstructed from the slide labels]

1. Choose interval (the analysis ends when no interval is left)
2. Analyze task execution time
   - Above: Sufficient parallelism
   - Below: continue with the checks of the other states
3. Check the other states, each with the outcomes "Above" and "Below":
   - Analyze idle time: Not enough parallelism exposed
   - Analyze task synchronization: Inefficient synchronization
   - Analyze task creation time: High task creation overhead
   - ...
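The staged analysis above could be sketched as follows. This is a minimal illustration, not Aftermath's implementation: the state names, the overhead thresholds of 0.10 and the assumption that a state exceeding its threshold yields the associated diagnosis are all hypothetical; only $t_e = 0.95$ comes from the slides.

```python
from typing import Dict, List

# Hypothetical per-state thresholds (fractions of n * d); not from the slides.
THRESHOLDS = {"idle": 0.10, "synchronization": 0.10, "task_creation": 0.10}
DIAGNOSES = {
    "idle": "not enough parallelism exposed",
    "synchronization": "inefficient synchronization",
    "task_creation": "high task creation overhead",
}


def analyze_interval(state_times: Dict[str, float], n: int, d: float,
                     t_e: float = 0.95) -> List[str]:
    """Diagnose one interval from the time per state, summed over all n processors."""
    budget = n * d
    # Stage 1: enough time spent in task execution?
    if state_times.get("task_execution", 0.0) > t_e * budget:
        return ["sufficient parallelism"]
    # Stage 2: report every overhead state that exceeds its own threshold.
    return [DIAGNOSES[state] for state, t in THRESHOLDS.items()
            if state_times.get(state, 0.0) > t * budget]


intervals = [
    {"task_execution": 390.0, "idle": 5.0, "task_creation": 5.0},
    {"task_execution": 250.0, "idle": 120.0, "synchronization": 10.0, "task_creation": 20.0},
]
for number, states in enumerate(intervals, start=1):
    print(number, analyze_interval(states, n=4, d=100.0))
```

The second interval spends 120 of 400 processor time units idle, which exceeds the assumed idle threshold, so it is reported as exposing too little parallelism.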

Detecting performance anomalies during task execution

During task execution

- Performance anomaly possible even at 100% task execution (ineffective use of caches, remote memory accesses, branch misprediction)

[Figure: two histograms showing the number of tasks over task duration]

Impact on the distribution of task duration

- Slowdown of all tasks
- Different groups / peaks
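A distribution of task durations like the one in the figure can be obtained with a simple equal-width histogram. The sketch below is an assumption about how such a distribution might be computed; the function name `duration_histogram`, the number of bins and the sample durations are made up for illustration.

```python
from typing import List, Sequence


def duration_histogram(durations: Sequence[float], bins: int) -> List[int]:
    """Count task instances per equal-width duration bin."""
    if bins < 1 or not durations:
        raise ValueError("need at least one bin and one duration")
    lo, hi = min(durations), max(durations)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all durations are equal
    counts = [0] * bins
    for duration in durations:
        # The maximum duration falls on the upper edge and goes into the last bin.
        index = min(int((duration - lo) / width), bins - 1)
        counts[index] += 1
    return counts


durations = [10, 11, 12, 11, 10, 30, 31, 29, 30]
for index, count in enumerate(duration_histogram(durations, bins=4)):
    print(index, "#" * count)
```

Two separate peaks with empty bins in between, as in this sample, correspond to the "different groups / peaks" case; a single peak shifted towards long durations would correspond to a slowdown of all tasks.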

Using performance counters

Hardware performance counters

- Implemented in hardware, no slowdown of the application
- Low tracing overhead if sampled at beginning / end of a task
- Dozens of hardware events can be monitored

Automatic analysis of performance counters

- Which hardware events are relevant?
- Manual testing is tedious & time-consuming

Analyzing performance counters

[Figure: counter value v(c,i,t) of processor P_i over time; task T starts at s and ends at e, and N_{c,T} is the increase of the counter in between]

Per-CPU performance counter

- Absolute value $v(c, i, t)$, monotonically increasing
  - $c$: Counter (e.g. cache misses)
  - $i$: Processor identifier
  - $t$: Timestamp
- Sampled at the beginning and end of a task

Break down counter evolution to task instances

- Increase of $c$ by task $T$: $N_{c,T} = v(c, i, e) - v(c, i, s)$
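The breakdown to task instances amounts to one subtraction per task. The following sketch assumes a hypothetical in-memory layout: a `TaskInstance` record with the processor and the start and end timestamps, and a dictionary mapping `(processor, timestamp)` to the absolute value of one counter. Neither reflects Aftermath's actual trace format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TaskInstance:
    task_id: int
    cpu: int    # processor identifier i
    start: int  # timestamp s
    end: int    # timestamp e


def counter_increase(samples: Dict[Tuple[int, int], int], task: TaskInstance) -> int:
    """N_{c,T} = v(c, i, e) - v(c, i, s) for a single counter c."""
    return samples[(task.cpu, task.end)] - samples[(task.cpu, task.start)]


# Absolute cache-miss counts on processor 0, sampled at task boundaries.
samples = {(0, 10): 1000, (0, 25): 1600, (0, 30): 1650, (0, 50): 2900}
tasks = [TaskInstance(1, 0, 10, 25), TaskInstance(2, 0, 30, 50)]
for task in tasks:
    print(task.task_id, counter_increase(samples, task))
```

The 50 events counted between timestamps 25 and 30 fall between the two tasks and are attributed to neither of them.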

Linear regression

[Figure: scatter plot of the duration $d_{T_j}$ over the performance indicator value $N_{c,T_j}$]

Perform linear regression

- Assume linear model: $d_{T_j} = \alpha \cdot N_{c,T_j} + \beta$ ($\alpha$ and $\beta$ constant)
- Compare coefficient of determination with threshold
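A minimal sketch of this step, using an ordinary least-squares fit written out by hand. The function name `linear_fit`, the threshold of 0.8 and the sample data are assumptions for illustration; the slides do not state which threshold or fitting routine is used.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = alpha * x + beta; returns (alpha, beta, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    # For a fit with one predictor, the coefficient of determination equals
    # the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return alpha, beta, r_squared


R2_THRESHOLD = 0.8  # hypothetical value

cache_misses = [100, 200, 300, 400]  # N_{c,T_j} per task instance
durations = [250, 450, 650, 850]     # d_{T_j} per task instance
alpha, beta, r_squared = linear_fit(cache_misses, durations)
print(alpha, beta, r_squared, r_squared > R2_THRESHOLD)
```

In this made-up data set the duration is exactly `2 * misses + 50`, so the fit recovers those coefficients and the counter would be considered relevant.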

Shortcuts & refinement

Variation of task duration

- Determine coefficient of variation for task duration
- Only perform analysis if significant

Task types

- Different task types in an application
  - Auxiliary tasks: initialization, termination
  - Work tasks: matrix multiplication, decomposition, etc.
- Performance anomaly not necessarily present in all types

Topology of the machine

- Anomaly only present on subset of processors
- Example: Memory accesses local for one NUMA node, remote on another
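The shortcut and the two refinements can be combined by grouping task instances and checking the variation per group. In the sketch below, the grouping key `(task type, NUMA node)`, the threshold of 0.1 and the sample data are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def coefficient_of_variation(durations: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    m = mean(durations)
    if m == 0:
        raise ValueError("mean duration must be non-zero")
    return pstdev(durations) / m


def groups_worth_analyzing(instances: Iterable[Tuple[Hashable, float]],
                           cv_threshold: float = 0.1) -> List[Hashable]:
    """Return the group keys whose task durations vary more than cv_threshold."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, duration in instances:
        groups[key].append(duration)
    return [key for key, ds in groups.items()
            if coefficient_of_variation(ds) > cv_threshold]


# (task type, NUMA node) -> duration of one task instance
instances = [
    (("init", 0), 10.0), (("init", 0), 10.0),
    (("matmul", 0), 100.0), (("matmul", 0), 101.0),
    (("matmul", 1), 100.0), (("matmul", 1), 180.0),
]
print(groups_worth_analyzing(instances))
```

Only the work tasks on the second node show a significant variation here, so the more expensive counter analysis would be restricted to that group.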

Per-interval analysis of parallelism & overhead

[Flowchart, reconstructed from the slide labels]

1. Choose task type
2. Choose set of processors
3. Check variation of task duration
   - Low: no further analysis for this selection
   - High: continue
4. Choose set of performance counters
5. Break down values to task instances
6. Correlate per-task instance values and duration
   - Low: Event set irrelevant
   - High: Event set relevant

of task duration

Low

High

Event set

irrelevant

Event set

relevant

No set left

LowHigh

Andi Drebes – Automatic Detection of Performance Anomalies in Task-Parallel Programs 13 / 16

Page 62: Automatic Detection of Performance Anomalies in Task ...€¦ · I Post-mortem (o ine) analysis Aftermath I Trace visualization and support for manual analysis I Originally developed

Per-interval analysis of parallelism & overhead

[Flowchart: steps and edge labels]

- Choose task type
- Choose set of processors
- Check variation of task duration (edges: Low, High)
- Choose set of performance counters
- Break down values to task instances
- Correlate per-task instance values and duration (edges: Low, High)
- Low: Event set irrelevant
- High: Event set relevant
- No set left (label on two edges)
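The flowchart steps can be sketched roughly as follows. This is a minimal illustration, not Aftermath's implementation: the function names, the two thresholds, and the use of the coefficient of variation and the Pearson coefficient are all assumptions, since the slides do not specify which measures are used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 for a zero mean)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def classify_event_sets(durations, per_task_counters,
                        cv_threshold=0.1, corr_threshold=0.8):
    """Classify counters for one task type on one set of processors.

    durations: duration of each task instance.
    per_task_counters: hypothetical dict mapping a counter name to its
    per-task-instance values, in the same order as durations.
    """
    # Low variation of task duration: nothing to correlate.
    if coefficient_of_variation(durations) < cv_threshold:
        return {}
    result = {}
    # Loop until no counter set is left.
    for name, values in per_task_counters.items():
        r = pearson(values, durations)
        result[name] = "relevant" if abs(r) >= corr_threshold else "irrelevant"
    return result
```

For example, `classify_event_sets([100, 200, 300, 400], {"branch_misses": [10, 20, 30, 40]})` marks the hypothetical `branch_misses` counter as relevant, because its values grow in step with the durations.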

Example: K-means branch misprediction

[Figure: histogram of task durations. x-axis: Task duration [cycles], 0 to 2e+07; y-axis: Number of tasks, 0 to 2500]

[Figure: task duration plotted against branch mispredictions. x-axis: Branch mispredictions, 0 to 100000; y-axis: Task duration [cycles], 0 to 2.0×10^7]

[Figure annotations: "Low misprediction rate", "High misprediction rate"]
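A task duration histogram like the one in this example could be computed along the following lines. The function name, the equal-width binning, and the parameters are illustrative assumptions; the slides do not describe how Aftermath bins durations.

```python
def duration_histogram(durations, num_bins, lo, hi):
    """Count task instances per equal-width bin over the range [lo, hi]."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for d in durations:
        if lo <= d <= hi:
            # The upper bound hi falls into the last bin.
            idx = min(int((d - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts
```

Several separate peaks in such a histogram would point to a task type whose instances fall into distinct duration classes, which is the situation the variation check is meant to detect.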

Work in progress: Status

Analysis of parallelism

- Per-interval analysis of time spent on task execution
- Per-interval analysis of time spent in run-time states
- Support for thresholds
- Loop performing analysis on set of intervals

Correlation of performance indicators

- Support for performance counters
- Task duration histogram
- Analysis of the variation of task durations
- Breaking down performance counter values to task instances
- Linear regression
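One possible way of breaking down performance counter values to task instances is sketched below. It assumes that a counter is sampled at discrete timestamps on each processor and that the value at a task's start and end can be approximated by linear interpolation between neighbouring samples. Both the sampling model and the function names are assumptions, not taken from the slides.

```python
from bisect import bisect_right


def counter_at(samples, t):
    """Interpolate a counter value at time t.

    samples: list of (timestamp, value) pairs with strictly increasing
    timestamps. Outside the sampled range the nearest sample is used.
    """
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def per_task_increase(samples, tasks):
    """Counter increase attributed to each task instance.

    tasks: list of (start, end) execution intervals on the processor
    the samples were taken on.
    """
    return [counter_at(samples, end) - counter_at(samples, start)
            for start, end in tasks]
```

The resulting per-task values can then be correlated with the task durations, for instance by linear regression.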

Summary

Aftermath

- Tool for trace-based analysis of task-parallel programs
- Currently provides only support for manual analysis
- Available at http://openstream.info/aftermath

Automatic analysis of parallelism based on thresholds

- Amount of time spent on task execution sufficiently high?
- If not, perform subsequent threshold-based analysis for states associated with overhead of the run-time system

Automatic correlation of performance indicators

- Indicate which events are relevant
- Break down counter evolution to task instances
- Correlate with task duration
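The threshold-based analysis of parallelism can be illustrated with a short sketch. The state names, the data layout, and the threshold values below are hypothetical; the slides only state that the time spent on task execution is checked first and that run-time overhead states are examined if it is too low.

```python
def analyze_intervals(intervals, exec_threshold=0.9, state_threshold=0.05):
    """Report intervals with too little task execution.

    intervals: list of dicts mapping a state name to the time spent in
    that state during the interval. The key "task_execution" (a
    hypothetical name) holds the time spent executing tasks.
    Returns (interval index, overhead states above state_threshold).
    """
    findings = []
    for index, states in enumerate(intervals):
        total = sum(states.values())
        if total == 0:
            continue
        if states.get("task_execution", 0) / total >= exec_threshold:
            continue  # enough time is spent on task execution
        # Otherwise, look for run-time states that take a significant share.
        overhead = sorted(
            name for name, t in states.items()
            if name != "task_execution" and t / total >= state_threshold
        )
        findings.append((index, overhead))
    return findings
```

Each finding names an interval and the overhead states that exceed the second threshold, which gives a starting point for closer inspection of that part of the trace.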
