246 255 155 190 28 42 dark 1 light 1 dark 2 light 2 accent 1 … · 2020. 1. 14. · the pk/pd...



Page 1:

Parallel Implementation of PK-PD Parameter Estimation on Xeon Phi Using Grid Search Method

Nishant Agrawal, R. Narayanan, Manoj Nambiar, Payal Guha Nandy, Rihab Abdulrazak, Ambuj Pandey, Shyamsundar Das

Performance Engineering Research Center, TCS Innovation Lab, Mumbai

Drug Development R&D Group, TCS Innovation Labs, Hyderabad

Page 2:

Agenda

Pharma R&D Productivity

o Reasons for Poor R&D Productivity

Model Based Drug Development

Generation of Insights from integrated data

PK-PD Modelling

o Initial Parameter Estimation

o Scope and Limitations

o Parallelized Grid Search on Xeon Phi - A new and effective approach

o Miscellaneous Optimization on Xeon Phi

o Result Comparison

Summary

Page 3:

Problem Statement: Pharma R&D Productivity

Steven M. Paul et al., “How to improve R&D productivity: the pharmaceutical industry’s grand challenge”, Nature Reviews Drug Discovery, Vol. 9, 203-214, 2010.

Page 4:

Potential Solution

Model Based Drug Development is based on three themes, Integration, Innovation, and Impact: quantitative integration of multisource data and knowledge through the application of clinical, biomedical, biological, engineering, statistical, and mathematical concepts, resulting in continuous methodological and technological innovation enhancing scientific understanding and knowledge, which in turn has an impact on discovery, research, development, approval, and utilization of new medicines.

Page 5:

MBDD Focus Areas@TCS

Page 6:

PK-PD Modelling

Pharmacokinetics: how the drug is processed by the body, i.e. the kinetics of the drug. The plasma drug concentration vs. time profile is studied.

Pharmacodynamics: how the drug affects the body, i.e. the dynamics of the drug. The effect vs. time profile is studied.

PK-PD Modeling: how the drug effect is governed by plasma concentration (dose). The effect vs. concentration profile is established.

[Figure labels: Urethra, Liver]

Page 7:

PK-PD Parameters Estimation Approaches and Why?

Page 8:

CA- Multi Compartment Model

In our case study, we consider only this multi-compartment model.

Here, A, B, C, Alpha, Beta, and Gamma are the parameters.

The goal is to find the parameter values for this model that best fit the available data.

Page 9:

Impact of the Estimated Parameters in Drug Development

Parameter | PK Parameter derived | Relevance for Drug Development

A

B

C

ALPHA

BETA

GAMMA

Page 10:

PK-PD Parameter Estimation Steps

• Parameter Bounds: Universal?

• IP Estimation method: Deterministic vs. random?

• Which is going to solve this problem?

Is GRID SEARCH a potential universal method for initial parameter estimation?

Page 11:

1. Induce a grid over the parameter space defined by the parameter ranges.

2. Divide the grid into a finite number of grid points (N) per parameter.

3. Evaluate an objective function at each grid point.

4. The point where the objective function takes its minimum (or maximum) value is taken as the optimum solution.

Step 1: Creating the grid

For each parameter i: upper bound = UB_i, lower bound = LB_i, number of grid points = N.

Grid values for each parameter are [equation not captured in the transcript], where r_i = 0, 1, 2, ..., N-1 are the coordinates of the grid points and i = 1, 2, ..., p.
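The grid-value formula itself did not survive in the transcript, so the following is only a minimal sketch of Step 1 under stated assumptions. It assumes an even step of (UB_i - LB_i)/N and borrows the `(r + 1)` offset from the loop shown later in the deck (`parameter[j] = lowerBound[j] + (gIdx[j] + 1)*stepSize[j];`). The names `grid_step` and `grid_value` are hypothetical.

```c
/* Step size for one parameter.
 * Assumption: the range [lower, upper] is split into n_points equal steps. */
double grid_step(double lower, double upper, int n_points)
{
    return (upper - lower) / (double)n_points;
}

/* Value of the grid point with coordinate r (0 <= r < n_points).
 * The (r + 1) offset mirrors the deck's later loop, so the last
 * coordinate lands exactly on the upper bound. */
double grid_value(double lower, double upper, int n_points, int r)
{
    return lower + (double)(r + 1) * grid_step(lower, upper, n_points);
}
```

With LB = 0, UB = 2 and N = 4, this sketch yields the values 0.5, 1.0, 1.5 and 2.0. A different offset convention (for example starting at LB) would shift every value by one step.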

Page 12:

Example: let the number of parameters p = 2 and the number of grid points N = 4.

Total number of points inside the grid = N^p = 4^2 = 16.

Thus, the grid point values are [values not captured in the transcript], which form the parameter sets.

Page 13:

Step 2: Evaluate the objective function at each point

Our goal is to find the point in the grid that best fits the observed data. To do that, calculate the objective function value (SRS) for each point in the grid:

$SRS = \sum_{i=1}^{\#\,\text{obs.}} \left( y_i^{\text{observed}} - y_i^{\text{predicted}} \right)^2$

The goal then becomes finding the point in the grid with the minimum value of SRS.

Step 3: Optimal solution

At each point, compare the SRS value with the previously stored SRS value and keep the smaller of the two, along with its parameters.

Finally, return the minimum SRS value along with its parameters.

Page 14:

Objective: To build a PK/PD model of a low molecular weight (LMW) Heparin drug (dose: 100 µg) in order to choose the optimum dose regimen (loading and maintenance doses) for a patient suffering from acute angina pectoris.

Method: Fit the PCT data of LMW Heparin to a PK model and derive the final estimates of the model parameters (CL & Vd), then use these final estimates to build a PD model and calculate the loading and maintenance doses.

The PK/PD model developed is used for making predictions of Response vs Concentration, which would help in optimizing the dosing regimen.

Computational Methods:

1. Grid Search - Serial version

2. Parallelized Grid Search on Xeon Phi

Model Equation: $\text{Conc}(t) = A e^{-\alpha t} + B e^{-\beta t} + C e^{-\gamma t}$

Data Set

Time (min) | Conc (µg/L)
5 | 1.625
10 | 1.384
15 | 1.28
20 | 1.105
30 | 0.973
45 | 0.806
60 | 0.74
90 | 0.582
120 | 0.53
150 | 0.458
180 | 0.416
240 | 0.342
300 | 0.321
360 | 0.246

Page 15:

For the initial estimate, the critical factors are:

• Number of parameters (p)

• Range of parameters (R)

• Number of grid points (N)

The figure (not captured in the transcript) shows the effect of the number of grid points on convergence.

Increasing the number of grid points yields a result closer to the optimum.

Page 16:

Page 17:

Page 18:

Relation between no. of Grid Points and Execution Time

Page 19:

Goal

Goal: An optimal solution in a much shorter time-frame!

Is it possible through parallelization?

Page 20:

Serial naive implementation in Java – 153 seconds

Serial naive implementation in C – 77 seconds

Speedup – 1.9X

Intel Xeon (Host) Specifications

• Intel Xeon IvyBridge E5-2697 V
• 12 Cores @ 2.70 GHz, 30 MB cache
• 64 GB RAM
• GNU/Linux 2.6.32

Page 21:

For a grid with 20 points for each of 6 parameters, the grid has 20^6 = 64,000,000 points:

Pt0 (0;0;0;0;0;0)
...
Pt19 (0;0;0;0;0;19)
Pt20 (0;0;0;0;1;0)
...
Pt400 (0;0;0;1;0;0)
...
Pt8000 (0;0;1;0;0;0)
...
Pt63999999 (19;19;19;19;19;19)

Serial Approach: the points Pt0, ..., Pt20, ..., Pt400, ..., Pt8000, ..., Pt63999999 are evaluated one after another to obtain the Best Fit Parameters.

Parallel Approach: the 64,000,000 points are divided among ‘n’ threads (Thread1: Pt0 ... Ptx; Thread2: Ptx+1 ...; Thread3: Pt2x+1 ...; Threadn: Ptnx+1 ... Ptfin). The best fit among the results of all threads gives the Best Fit Parameters.
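The division of points among threads can be sketched as a contiguous block split. This is one plausible scheme and not necessarily the one used in the deck. The helper name `chunk_range` and the rule for spreading the remainder are assumptions.

```c
/* Half-open range [start, end) of flat grid indices for one thread.
 * The first (n_points % n_threads) threads get one extra point, so the
 * ranges cover 0 .. n_points-1 exactly once with no gaps or overlaps. */
void chunk_range(long n_points, int n_threads, int thread_id,
                 long *start, long *end)
{
    long base = n_points / n_threads;
    long extra = n_points % n_threads;
    long id = thread_id;

    *start = id * base + (id < extra ? id : extra);
    *end = *start + base + (id < extra ? 1 : 0);
}
```

For 64,000,000 points and 240 threads, each thread receives 266,666 or 266,667 points, so the imbalance between threads is at most one point.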

Page 22:

The grid points are divided among (t) OpenMP threads running on the processor cores. Each thread finds the optimum within its own share (optimum within thread 0, ..., optimum within thread t). After a barrier across all t threads, the optimum among all threads is selected.
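The per-thread optimum followed by a global merge could look roughly like the following OpenMP sketch. It is not the authors' implementation: `toy_objective` is a hypothetical stand-in for the SRS of a grid point, and merging through `omp critical` is one common choice among several.

```c
#include <float.h>

/* Hypothetical stand-in for the SRS of flat grid point k. */
static double toy_objective(long k)
{
    double d = (double)k - 1234.0;
    return d * d;
}

/* Each thread tracks its own minimum, then merges it into the shared
 * result inside a critical section. If the code is built without OpenMP,
 * the pragmas are ignored and the loop runs serially with the same result. */
long parallel_argmin(long n_points, double *min_value)
{
    long best_k = -1;
    double best_v = DBL_MAX;

    #pragma omp parallel
    {
        long local_k = -1;
        double local_v = DBL_MAX;

        #pragma omp for nowait
        for (long k = 0; k < n_points; k++) {
            double v = toy_objective(k);
            if (v < local_v) {
                local_v = v;
                local_k = k;
            }
        }

        /* One thread at a time compares its optimum with the global one. */
        #pragma omp critical
        {
            if (local_v < best_v) {
                best_v = local_v;
                best_k = local_k;
            }
        }
    }   /* implicit barrier: all threads have merged before we return */

    *min_value = best_v;
    return best_k;
}
```

The critical section runs once per thread rather than once per grid point, so its cost stays small compared with the search loop.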

Page 23:

Number of OpenMP threads | Execution Time (sec)
60 | 11.62
120 | 7.59
180 | 6.62
240 | 6.16

v1: Previous = 77 sec, New = 6.16 sec (92% reduction)

Intel Xeon Phi (Coprocessor) Specification

• Intel Xeon E5v2
• 61 cores, 244 threads @ 1.2 GHz, 512 KB cache
• 16 GB RAM
• GNU/Linux 2.6.38.8+mpss3.1.2

Page 24:

https://www.tacc.utexas.edu/c/document_library/get_file?uuid=ed331f32-49db-4c4b-9ea7-f7d9547c79d9&groupId=13601

[Figure: Variation of Execution Time with Threads. x-axis: Number of Threads (60, 120, 180, 240); y-axis: Execution Time (seconds)]

Page 25:

OpenMP parallel region marked in time: larger frame

OpenMP parallel regions are shown as frames on grid

This region is serial: marked outside frame

Smaller frame

Page 26:

Compiler Option | Description | Reduction in run time
-fimf-accuracy-bits=11 | Defines the relative error, measured by the number of correct bits, for math library function results | ~2.02 sec
-fimf-precision=low | Equivalent to accuracy-bits = 11 for single-precision functions and accuracy-bits = 26 for double-precision functions | ~2.02 sec
-fimf-domain-exclusion=31 (all) | Indicates the input argument domain on which math functions must provide correct results. As more classes are excluded, faster code sequences can be used | ~0.64 sec
-no-prec-div | Enables optimizations that give slightly less precise results than full IEEE division | ~0.82 sec
-no-prec-sqrt | Uses a faster but less precise implementation of square root | ~0.82 sec
-fp-model fast=2 | Enables more aggressive optimizations on floating-point data | ~0.68 sec
Used in combination with each other | | ~2.6 sec

Previous = 6.16 sec, New = 3.56 sec (42% reduction)

Page 27:

Trials with different exponentiation functions in the equation that calculates the dose effect, without affecting the accuracy of the results:

• exp()
• expf()
• exp2()
• exp2f()

Page 28:

Arithmetic equation for dose effect calculation, with execution time

Original equation (3 sec, 560231 usec):

    amt = A1*exp(-alpha*tp) + B1*exp(-beta*tp) + C1*exp(-gamma*tp);

Changing from double to float (2 sec, 574300 usec):

    amt = A1*expf(-alpha*tp) + B1*expf(-beta*tp) + C1*expf(-gamma*tp);

Changing from base e to base 2 (3 sec, 2729 usec):

    tmp = -1.0*tp*M_LOG2E;
    amt = A1*exp2(alpha*tmp) + B1*exp2(beta*tmp) + C1*exp2(gamma*tmp);

Changing from base e to base 2 and from double to float (2 sec, 572489 usec):

    tmp = -1.0*tp*M_LOG2E;
    amt = A1*exp2f(alpha*tmp) + B1*exp2f(beta*tmp) + C1*exp2f(gamma*tmp);

Previous = 3.56 sec, New = 2.57 sec (27% reduction)

Page 29:

The equation that accumulates the result was changed because of the high cost of the “fmax” function.

From: res = res + fmax(0,amt);

To: ((amt > 0.0) ? (res = res + amt):0);

v3: Previous = 2.57 sec, New = 2.11 sec (17% reduction)

Page 30:

Loops not getting vectorized !!!

Need to clean up code

Page 31:

Generate a vectorization report using the compile option “vec-report[0-6]”

Reports which loops are not vectorized and why

Sample Output of vec-report

Problems with code:

    for(i=0; i<r; i++) {
        double xf = X[i*Nfun + fn_no];
        double yf = Y[i*Nfun + fn_no];
        if(xf == 999999.9)
            errorMat[i] = 0;
        else {
            fn = func_diff18(par,xf);
            errorMat[i] = yf - fn;
        }
    }

• Checks inside a loop prevent vectorization

• xf is an array element used in a loop in ‘func_diff18’, creating a dependence

Page 32:

Problem: Removing checks inside the loop

• Identify what the checks signify – in our case they prevented the function from being called for outliers

Solution: Remove the outliers from the set of parameters

• Perform the checks at initialization

• Send in a ‘perfect’ set of parameters

v4: Previous = 2.11 sec, New = 1.35 sec (36% reduction)
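Moving the check to initialization might look like the sketch below. It assumes the sentinel value 999999.9 from the loop shown on the previous slide. `drop_outliers` and `residuals` are hypothetical names, and the linear model inside `residuals` is only a placeholder for `func_diff18`.

```c
#define OUTLIER_X 999999.9   /* sentinel taken from the check shown in the deck */

/* Copy the non-outlier observations to the front of x and y, in place.
 * Returns the number of observations kept. Run once at initialization. */
int drop_outliers(double *x, double *y, int n)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (x[i] != OUTLIER_X) {
            x[kept] = x[i];
            y[kept] = y[i];
            kept++;
        }
    }
    return kept;
}

/* The hot loop can then run without a branch on the sentinel. */
void residuals(const double *x, const double *y, int n,
               double slope, double *err)
{
    for (int i = 0; i < n; i++)
        err[i] = y[i] - slope * x[i];   /* placeholder model: y = slope * x */
}
```

The original loop stored a zero residual for each outlier, so dropping those entries leaves the sum of squared residuals unchanged.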

Problem: Removing the dependence of variables between outer and inner loops

• Examine the function – in our case it contained redundant assignments and a single-iteration inner loop

Solution: Collapse the inner loop into the outer loop

Previous = 1.35 sec, New = 1.28 sec (5% reduction)

Page 33:

Page 34:

    // code to choose parameter sets (0,0,0,0,0,0) – (19,19,19,19,19,19)
    for(k=0; k<nPts; k++) {
        tmp = k;
        int gIdx[noOfParam];
        for(j=noOfParam-1; j>=0; j--) {
            gIdx[j] = tmp%gridPts;
            tmp = tmp/gridPts;
            parameter[j] = lowerBound[j] + (gIdx[j] + 1)*stepSize[j];
        }
        ……

Page 35:

v6

    // convert to 6 loops – 1 for each parameter
    for(k1=0; k1<gridPts; k1++) {
      for(k2=0; k2<gridPts; k2++) {
        for(k3=0; k3<gridPts; k3++) {
          for(k4=0; k4<gridPts; k4++) {
            for(k5=0; k5<gridPts; k5++) {
              for(k6=0; k6<gridPts; k6++) {
                parameter[0] = k1*stepSize[0];
                parameter[1] = k2*stepSize[1];
                parameter[2] = k3*stepSize[2];
                parameter[3] = k4*stepSize[3];
                parameter[4] = k5*stepSize[4];
                parameter[5] = k6*stepSize[5];

The VTune hotspot analysis shows that the hotspot has been removed.

Previous = 1.28 sec, New = 0.68 sec (46% reduction)

Page 36:

[Figure: Previous vs. New]

Page 37:

#pragma omp for collapse(3) nowait schedule(dynamic)

v7: Previous = 0.68 sec, New = 0.60 sec (11% reduction)

Page 38:

[Figure: Pre Optimization vs. Post Optimization]

Page 39:

A call to malloc() is taking 12.8% of the time !!!

Page 40:

From:

    double *error(double * par, double * X, double * Y, int fn_no, int * row)
    {
        ……
        double *errorMat = (double *)malloc(sizeof(double)*r);
        ……
    }

To:

    double *error(double * par, double * X, double * Y, int fn_no, int * row)
    {
        ……
        double errorMat[NO_OBS];
        ……
    }

A call to malloc() is executed on Xeon Phi, making it expensive

Previous = 0.60 sec, New = 0.39 sec (35% reduction)
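The slide elides the function body, so it is not known what `error` returns after the change. If it returned the address of the local `errorMat`, the caller would read a dangling pointer. A caller-owned buffer is one way to avoid both `malloc()` and that hazard. The sketch below is a variant under that assumption and is not the authors' code. `error_into` and `sum_sq_error` are hypothetical names, and `NO_OBS = 14` assumes the 14 observations in the deck's data set.

```c
#define NO_OBS 14   /* assumption: the 14 observations in the deck's data set */

/* Fill a caller-owned buffer with residuals; nothing is allocated here. */
void error_into(const double *pred, const double *obs, int n, double *errorMat)
{
    for (int i = 0; i < n; i++)
        errorMat[i] = obs[i] - pred[i];
}

/* The caller keeps one stack buffer alive for as long as it needs the values. */
double sum_sq_error(const double *pred, const double *obs, int n)
{
    double errorMat[NO_OBS];
    double sum = 0.0;

    if (n > NO_OBS)
        n = NO_OBS;   /* guard against overrunning the fixed-size buffer */

    error_into(pred, obs, n, errorMat);
    for (int i = 0; i < n; i++)
        sum += errorMat[i] * errorMat[i];
    return sum;
}
```

Each thread gets its own stack buffer this way, so no heap allocation happens inside the per-grid-point hot path.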

Page 41:

The entry for malloc() is no longer there

Page 42:

The grid points are divided among 24/48 (n) threads on the Xeon cores plus 60/120/240 (t) OpenMP threads on the Xeon Phi cores. Each thread finds the optimum within its own share (thread 0 ... thread n-1 on the Xeon; thread 0 ... thread t-1 on the Xeon Phi). After an MPI barrier across all threads, the optimum among all threads is selected.

Previous = 0.39 sec, New = 0.22 sec (44% reduction)

Page 43:

[Figure: Speedup w.r.t. serial version on 1 CPU (77 sec). y-axis: Speed Up (0 to 400). Execution times labelled on the chart: 11.6 sec, 6.16 sec, 3.56 sec, 2.11 sec, 0.60 sec, 0.39 sec, 0.22 sec, with a bracket marked "Optimizations on Native Mode"]
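The speedups plotted on this chart follow from the timings quoted on the earlier slides. A small helper reproduces the arithmetic. The function names are illustrative.

```c
/* Speedup of a new timing relative to a baseline timing. */
double speedup(double baseline_sec, double new_sec)
{
    return baseline_sec / new_sec;
}

/* Percentage reduction in run time between two consecutive timings. */
double reduction_pct(double prev_sec, double new_sec)
{
    return 100.0 * (prev_sec - new_sec) / prev_sec;
}
```

Against the 77-second serial C baseline, the first OpenMP version at 6.16 seconds is a 12.5X speedup (a 92% reduction), and the final 0.22 seconds corresponds to 77 / 0.22 = 350X.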

Page 44:

Page 45:

Backup slides on OpenFOAM optimization

Page 46:

Runtime of Motorbike Case, 4.2M Workload (time in secs, lower is better)

Configuration | Baseline | Optimized | Speedup
2S Intel® Xeon® processor E5-2697v2 (Native) - 24 Cores | 213 | 187 | 1.14X
Intel® Xeon Phi™ coprocessor 7120A (Native) - 60 Cores | 635 | 194 | 3.3X
2S E5-2697v2 + Intel® Xeon Phi™ coprocessor 7120A (Symmetric) - 24 + 60 Cores | 518 | 135 | 3.9X

1.4X Speedup due to MIC addition

Results:

• 3.3X Speedup on Xeon Phi native execution

• 1.4X Speedup w.r.t Xeon optimized result (Xeon / Xeon + Phi) = (187/135) = 1.4X

Code Optimization Strategy:

• Vectorization (AVX on CPU / 512-bit vectorization Intrinsics on Phi™)

• Prefetching

• Cache Optimizations

• Optimized Decomposition Algorithm modification for Symmetric runs

• Compiler Flags

• Added #pragma unroll to improve loop performance on both Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors

• IO Optimizations

• Cleaning the code, cache blocking, helping auto-vectorization, prefetch distance, unroll factor.

• Detailed Profiling of Hotspots

Execution Model: Native, Symmetric Mode

Software: Intel C++ Compiler, Intel MPI, Vtune Profiler, ITAC

Custom designed Decomposition algorithm for Xeon + MIC

Page 47:

ITAC Message Profile for 84 cores Symmetric run – Shows Load Imbalance

[Figure: ITAC message profiles, Original vs. Modified, each showing Xeon and Xeon Phi ranks]

MPI_WAITALL reduced drastically with the modified decomposition algorithm

Page 48:

Program flow: initializations → omp parallel for (SRS calculation, determine local minimum SRS) → omp single (determine the first minimum SRS value) → omp barrier → omp critical (determine the minimum SRS among all threads)

Times shown beside the flow, in slide order: 0.002896 seconds, 0.370136 seconds, 0.010665 seconds, 0.000037 seconds

Total Execution Time: 0.384007 seconds

Total Time in parallel region (0.37 + 0.01 + 0.000037): 0.380838 seconds

Initialization: 0.002896 seconds