Program Performance Analysis Toolkit Adaptor



DESCRIPTION

The Adaptor framework automates experimentation, data collection, and analysis in the field of program performance and tuning. It can be used, for example, for estimating computer system performance during its design, or for searching for optimal compiler settings via iterative compilation and machine-learning-driven techniques.

Contact information: Michael K. Pankov • [email protected] • michaelpankov.com
Source on GitHub: https://github.com/constantius9/adaptor

This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.

TRANSCRIPT

Page 1: Program Performance Analysis Toolkit Adaptor

Outline: Introduction, Methodology, Implementation (general info), Implementation (client), Evaluation of implementation

Program Performance Analysis Toolkit Adaptor

Michael K. Pankov. Advisor: Anatoly P. Karpenko

Bauman Moscow State Technical University

October 11, 2013

Page 2: Program Performance Analysis Toolkit Adaptor

Goal, tasks, and importance of the work

Goal

Develop a method and a software toolkit for modeling program performance on general-purpose computers

Tasks

1. Develop a method of program performance modeling

2. Implement the performance analysis and modeling toolkit

3. Study the efficiency of the toolkit on a set of benchmarks

Importance

1. Estimating computer performance during its design

2. Searching for optimal compiler settings via iterative compilation and machine-learning-driven techniques

Page 3: Program Performance Analysis Toolkit Adaptor

Overview

There is a lot of recent research in this area: see C. Dubach, G. Fursin, B. C. Lee, W. Wu

In particular, there is the public cTuning research repository and the corresponding Collective Mind program, run by G. Fursin

This work models the performance of general-purpose computer programs, ranking features by Earth Importance and performing regression with k-Nearest Neighbors and Earth Regression. We aim at automatic detection of relevant features.

Page 4: Program Performance Analysis Toolkit Adaptor

Method of statistical program performance analysis: Velocitas

1. Perform a series of experiments measuring program execution time and form a set U:

$U = \{(X_i, y_i)\}, \quad X_i = (x_{ij}), \quad i \in [1; m], \; j \in [1; n]$

$X_i$ — feature vector (CPU frequency, number of rows of the processed matrix, etc.), $y_i$ — response (execution time), $m$ — number of experiments, $n$ — number of features
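For concreteness, here is a minimal Python sketch of how such an experiment set might be represented. The field names and values are illustrative, not Adaptor's actual schema:

```python
# Minimal sketch of the experiment set U = {(X_i, y_i)}.
# Field names and values are illustrative, not Adaptor's actual schema.
experiments = [
    # (feature vector X_i, execution time y_i in seconds)
    ({"cpu_mhz": 2333, "size": 256, "width": 256, "height": 256}, 0.41),
    ({"cpu_mhz": 2333, "size": 512, "width": 512, "height": 512}, 3.27),
]

feature_names = sorted(experiments[0][0])  # fixed feature order, j in [1; n]
X = [[f[name] for name in feature_names] for f, _ in experiments]
y = [t for _, t in experiments]
```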

Page 5: Program Performance Analysis Toolkit Adaptor

2. Split the set U into a training sample D and a test sample C by randomly assigning 70% of the experiments to D:

$D = \{d_i \mid f_{rand}(d_i) > 0.3\}$, (1)

$d_i = (X_i, y_i)$, (2)

$f_{rand}(d) \in [0; 1]$, (3)

$i \in [1; m]$, (4)

$C = U \setminus D$ (5)
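A minimal sketch of this split in Python; the 70% training fraction comes from the text, the function name is illustrative:

```python
import random

def split_train_test(experiments, train_fraction=0.7, seed=None):
    """Randomly assign ~70% of the experiments to the training sample D;
    the rest form the test sample C = U \\ D."""
    rng = random.Random(seed)
    train, test = [], []
    for d in experiments:
        (train if rng.random() < train_fraction else test).append(d)
    return train, test

# Reusing the `experiments` list from the sketch above.
D, C = split_train_test(experiments, seed=42)
```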

Page 6: Program Performance Analysis Toolkit Adaptor

3. Extract additional features $x_{ik}$:

$x_{ik} = f(X_i)$, (6)

$X'_i = (x_{ij})$, (7)

$D' = \{(X'_i, y_i)\}$, (8)

$i \in [1; m]$, (9)

$j \in [1; n + r]$, (10)

$k \in [n + 1; n + r]$ (11)

$r$ — number of additional features (e.g. "size of input data")
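A sketch of the extraction step; `input_size` is a hypothetical derived feature, standing in for the "size of input data" example:

```python
def extract_features(features):
    """f: extend the measured feature vector with r derived features."""
    extended = dict(features)
    # `input_size` is a hypothetical derived feature standing in for the
    # "size of input data" example from the text.
    extended["input_size"] = features["width"] * features["height"]
    return extended

# D' = {(X'_i, y_i)}: the training sample with extended feature vectors,
# reusing D from the split sketch above.
D_prime = [(extract_features(X), y) for X, y in D]
```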

Page 7: Program Performance Analysis Toolkit Adaptor

4. Filter the training set $D'$ to remove noise and incorrect measurements:

$D'' = D' \setminus \{(X'_i, y_i) \mid P(X'_i, y_i)\}$

$P$ — experiment selection predicate (we remove all experiments where the measured execution time is less than $t_{min}$)
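A sketch of the filtering step; the `T_MIN` value is an illustrative placeholder for the actual threshold $t_{min}$:

```python
T_MIN = 1e-3  # illustrative threshold in seconds; the real t_min is a tunable

def P(features, time):
    """Experiment selection predicate: true for measurements to discard."""
    return time < T_MIN

# D'' = D' \ {(X'_i, y_i) | P(X'_i, y_i)}, reusing D' from the sketch above.
D_double_prime = [(X, y) for X, y in D_prime if not P(X, y)]
```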

Page 8: Program Performance Analysis Toolkit Adaptor

5. Rank the features and select only those with non-zero importance:

$s_j = f_{rank}(D'')$, (12)

$j \in [1; n + r]$, (13)

$D''' = \{(X'_i, y_i) \mid s_j > 0\}$ (14)

$s_j$ — scalar importance of a particular feature, $f_{rank}$ — feature ranking function (we used MSE, Relief F, and Earth Importance)
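The ranking step might look as follows. Note this is a stand-in: scikit-learn's random-forest importances replace the MSE, Relief F, and Earth Importance rankers actually used in the work:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_and_select(X, y, feature_names):
    """f_rank: score every feature and keep those with importance s_j > 0."""
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(np.asarray(X), np.asarray(y))
    scores = forest.feature_importances_  # one s_j per feature
    selected = [j for j, s in enumerate(scores) if s > 0]
    return selected, {feature_names[j]: scores[j] for j in selected}
```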

Page 9: Program Performance Analysis Toolkit Adaptor

6. Fit a regression model of one of four kinds (linear, random forest, Earth, k nearest neighbors):

$M_p = \{f_{pred}, B\}$, (15)

$B = f_{fit}(D''')$, $p \in [1; 4]$ (16)

$B$ — vector of model parameters, $f_{fit}$ — learning function, $f_{pred}$ — prediction function (they are defined separately for each model)
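A sketch of the fitting step, using scikit-learn models as stand-ins for the Orange models used in the work; the Earth (MARS) model is omitted because it requires a separate package (e.g. py-earth):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Three of the four model kinds; the Earth (MARS) model lives in a
# separate package (e.g. py-earth) and is left out of this sketch.
MODELS = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

def fit_all(X_train, y_train):
    """f_fit: learn the parameters B of every model M_p on the training set."""
    return {name: model.fit(X_train, y_train)
            for name, model in MODELS.items()}
```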

Page 10: Program Performance Analysis Toolkit Adaptor

7. Test the model with the RRSE metric:

$C = U \setminus D = \{(X'_i, y_i)\}$, (17, 18)

$i \in [1; m]$, (19)

$X'_i = (x_{ik})$, (20)

$k \in [1; n + r]$ (21)

$\hat{Y} = f_{pred}(X, B)$, (22)

$RRSE = \sqrt{\dfrac{\sum_{i=1}^{m} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$ (23)

$\hat{y}_i$ — predicted value of the response, $\bar{y}$ — average value of the response in the test sample
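Equation (23) translates directly into a few lines of numpy:

```python
import numpy as np

def rrse(y_true, y_pred):
    """Root Relative Squared Error (eq. 23): model error relative to
    always predicting the mean response of the test sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(((y_pred - y_true) ** 2).sum()
                   / ((y_true - y_true.mean()) ** 2).sum())
```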

Page 11: Program Performance Analysis Toolkit Adaptor

Architecture of the Adaptor framework

Database server:
- Data views

Client:
- Database interaction module
- Program building module
- Experimentation module
- Information retrieval module
- Information analysis module

Page 12: Program Performance Analysis Toolkit Adaptor

Technology stack

Database server:
- CouchDB, a distributed client-server document-oriented storage
- Cloudant cloud platform

Client:
- Python
- Orange statistical framework
- GNU/Linux on the x86 platform

Page 13: Program Performance Analysis Toolkit Adaptor

Database interaction module

Provides a high-level API for storing Python objects as database documents

Uses a local CouchDB server as a fall-back if the remote one isn't available
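A sketch of this fall-back logic, assuming the couchdb-python client; the URLs and the database name are illustrative:

```python
import couchdb

REMOTE = "https://example.cloudant.com/"   # illustrative
LOCAL = "http://127.0.0.1:5984/"

def connect(db_name="adaptor"):
    """Prefer the remote CouchDB server; fall back to the local one."""
    for url in (REMOTE, LOCAL):
        try:
            server = couchdb.Server(url)
            server.version()  # probe availability
            return server[db_name] if db_name in server else server.create(db_name)
        except Exception:
            continue
    raise RuntimeError("no CouchDB server reachable")

db = connect()
db.save({"type": "experiment", "program": "symm", "time_s": 0.41})
```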

Page 14: Program Performance Analysis Toolkit Adaptor

Program building module

Manages paths to the source files of experimental programs

Sources live in a hierarchical directory structure; the module makes specifying just the name of the program to build enough for its sources to be found

Manages build tools and their settings
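A sketch of such a by-name lookup; the `benchmarks` root directory is illustrative:

```python
import os

SOURCES_ROOT = "benchmarks"  # illustrative root of the source hierarchy

def find_program_sources(name, root=SOURCES_ROOT):
    """Locate a program's source directory anywhere under the hierarchy,
    so callers only need the program's name."""
    for dirpath, dirnames, _ in os.walk(root):
        if name in dirnames:
            return os.path.join(dirpath, name)
    raise LookupError("program %r not found under %s" % (name, root))
```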

Page 15: Program Performance Analysis Toolkit Adaptor

Experimentation module

Calibrates the program execution time measurement before every series of runs

Subtracts the execution time of the "simplest" program to avoid systematic error. Runs the program being studied until the relative dispersion of the time measurement becomes low ($d_{rel} < 5\%$)

Passes experiment data to the database interaction module
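A sketch of this measurement loop; only the 5% stop criterion and the calibration subtraction come from the text, the function names and run limits are illustrative:

```python
import statistics
import time

REL_DISPERSION_LIMIT = 0.05  # stop once relative dispersion is below 5%

def timed(f):
    """Wall-clock duration of a single call to f, in seconds."""
    start = time.perf_counter()
    f()
    return time.perf_counter() - start

def measure(run, calibrate, min_runs=5, max_runs=100):
    """Repeat `run`, subtracting the execution time of the 'simplest'
    calibration program, until the timings' relative dispersion is low."""
    overhead = min(timed(calibrate) for _ in range(min_runs))
    times = []
    for _ in range(max_runs):
        times.append(max(timed(run) - overhead, 0.0))
        if len(times) >= min_runs and statistics.mean(times) > 0:
            rel = statistics.stdev(times) / statistics.mean(times)
            if rel < REL_DISPERSION_LIMIT:
                break
    return statistics.mean(times)
```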

Page 16: Program Performance Analysis Toolkit Adaptor

Information retrieval module

Collects information on the platform in use and the experiment being carried out:

CPU:
- Frequency
- Cache size
- Instruction set extensions
- etc.

Compiler

Experiment:
- Studied program
- Size of input data
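On GNU/Linux, much of the CPU information can be read from /proc/cpuinfo; a minimal sketch (the dictionary keys are illustrative):

```python
def read_cpu_info(path="/proc/cpuinfo"):
    """Collect CPU features on GNU/Linux: frequency, cache size,
    instruction set extensions. Dictionary keys are illustrative."""
    info = {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "cpu MHz":
                info["cpu_mhz"] = float(value)
            elif key == "cache size":
                info["cpu_cache"] = value          # e.g. "2048 KB"
            elif key == "flags":
                info["isa_extensions"] = value.split()
    return info
```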

Page 17: Program Performance Analysis Toolkit Adaptor

Data analysis module

Receives data from the database and saves it to CSV files as input for the Orange statistical analysis system

Graphs results using the Python library matplotlib (see the sketch after the lists below)

Two groups of program performance models:
- Simplest (1 feature)
- More complex (3-5 features)

Four regression models in both groups:
- Linear
- k Nearest Neighbors
- Multivariate Adaptive Regression Splines
- Random Forest
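A sketch of the CSV export and plotting steps mentioned above; the helper names and file names are illustrative:

```python
import csv

import matplotlib.pyplot as plt

def save_for_orange(experiments, feature_names, path="experiments.csv"):
    """Dump experiments to a CSV file that Orange can read in."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(list(feature_names) + ["time"])
        for features, t in experiments:
            writer.writerow([features[name] for name in feature_names] + [t])

def plot_predictions(y_measured, y_predicted, path="model.png"):
    """Plot measured vs. predicted execution times with matplotlib."""
    plt.scatter(y_measured, y_predicted)
    plt.xlabel("measured time, s")
    plt.ylabel("predicted time, s")
    plt.savefig(path)
```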

Page 18: Program Performance Analysis Toolkit Adaptor

Data analysis module (cont.)

A scheme of 40 data analysis components in the Orange system:

- Reading in
- Preprocessing
- Filtering
- Feature extraction
- Feature ranking
- Predictor fitting
- Prediction results evaluation
- Saving predictions to a CSV file

Page 19: Program Performance Analysis Toolkit Adaptor

Platform: Intel CPUs
- Core 2 Quad Q8200, 2.33 GHz, 2 MB cache
- Core i5 M460, 2.53 GHz, 3 MB cache
- Xeon E5430, 2.66 GHz, 6 MB cache

Ubuntu 12.04; gcc and LLVM compilers

PolyBench/C 3.2 benchmark set, 28 programs in total

Linear algebra; solution of systems of linear algebraic equations and of ordinary differential equations

Input data is generated by deterministic algorithms

Performance of the chosen programs from the benchmark set is modeled using the Adaptor framework:

- symm. Multiplication of symmetric matrices. Square matrices of dimension $2^i$, $i = f_{rand}(1, 10)$
- ludcmp. LU decomposition. Square matrices of dimension $f_{rand}(2, 1024)$

1000 experiments per CPU

Page 20: Program Performance Analysis Toolkit Adaptor

Feature ranking. symm program

Attribute    Relief F   Mean Square Error   Earth Importance
size         0.268      0.573               4.9
cpu mhz      0.000      0.006               3.3
width        0.130      0.573               0.7
cpu cache    0.000      0.006               0.5
height       0.130      0.573               0.0

Earth Importance selected only the relevant features

Page 21: Program Performance Analysis Toolkit Adaptor

Feature ranking. symm program (cont.)

428 experiments

1 feature: matrix dimensionality

Model                 RMSE     RRSE    R²
k Nearest Neighbors   5.761    0.051   0.997
Random Forest         5.961    0.052   0.997
Linear Regression     15.869   0.139   0.981

Root Relative Squared Error of k Nearest Neighbors is approx. 5%

Page 22: Program Performance Analysis Toolkit Adaptor

Resulting model of performance

[Figure: k Nearest Neighbors model of the performance of the symm program on an Intel Core 2 Quad Q8200 CPU]

Page 23: Program Performance Analysis Toolkit Adaptor

Resulting model of performance

Comparison of models of performance of ludcmp program

468 experiments

2 features: width of matrix, CPU frequency

Model                 RMSE    RRSE    R²
k Nearest Neighbors   1.093   0.048   0.998
Linear Regression     9.067   0.394   0.845

Page 24: Program Performance Analysis Toolkit Adaptor

Where models fail

Amazon throttles its micro servers: the data is split into two "curves"

Earth Regression at least tries to follow the "main curve"

k Nearest Neighbors does much worse in this situation

Page 25: Program Performance Analysis Toolkit Adaptor

Results of evaluation

Most suitable Feature Ranking method — Earth Importance

Most suitable Regression method — k Nearest Neighbors

Page 26: Program Performance Analysis Toolkit Adaptor

Further work

The Velocitas method is promising and scales to larger feature sets

Data filtering to reduce noise can improve it further

Orange is a decent statistical framework, but its interactive workflow limits batch processing

For larger data sets and increased automation of the Adaptor framework, either Orange's API or other libraries (e.g. sklearn) should be used

Support for custom research scenarios is required

It would be interesting to perform experiments on GPUs to study the effects of massively parallel execution

Page 27: Program Performance Analysis Toolkit Adaptor

Thank you!

Contact information: Michael K. Pankov

[email protected]

This is an extended and edited version of my diploma defense keynote, given on June 19, 2013
