
Page 1: Grid parallelization and tests


Grid parallelization and tests

CERN

GRACE Final Review, Amsterdam, 15-16 February 2005

Page 2: Contents

1. Two GRACE Grid integration models: M1, M2

2. Pre-conditions for the tests

3. Work performed

4. General test results

5. Model 1 test results

6. Simulation of Model 2

7. Model 2 tests results

8. Comparison

9. Conclusions


Page 3: Application workflow

[Workflow diagrams: the application workflow and the single-search Grid workflow for models M1 and M2]

Approach used: M1 and M2

Page 4: Pre-conditions

• Content and Categorization Engines release 4.45 was adopted; these components were later improved and optimized by the partners

• A suitable testing corpus of documents was selected (English documents, correct PDF-to-text conversion, small and large sizes)

• Configuration problems of the GILDA Replica Manager were solved (with the intervention of the site administrators)

• The search result set size is assumed to average between 0.1 and 4 MB of text

• Use of the DAG job model in GILDA was discarded

Page 5: Work performed

• Preparation of a test plan and report template

• Creation of the testing corpus of documents

• Verification of testing pre-conditions

• Creation of test scripts for semi-automatic testing (a sketch follows at the end of this slide)

• Testing on the GILDA testbed

• Creation of scripts for validation of output and parsing of logging

• Collection and analysis of the results

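The deck does not include the test scripts themselves. As a rough illustration of what a semi-automatic submission driver could look like, here is a minimal Python sketch. It assumes a UI machine with the EDG command-line tools that GILDA exposed at the time (edg-job-submit, edg-job-status); the JDL file name is hypothetical, and only bare invocations are used since exact options varied by middleware release.

# Minimal sketch of a semi-automatic test driver (illustration only, not the
# actual GRACE scripts). Assumes the EDG UI tools are installed on the path.
import subprocess

def submit(jdl_path: str) -> str:
    """Submit one job and return the job ID printed by edg-job-submit."""
    out = subprocess.run(["edg-job-submit", jdl_path],
                         capture_output=True, text=True, check=True).stdout
    # The job identifier is the https://... contact string in the output.
    return next(line.strip() for line in out.splitlines()
                if line.strip().startswith("https://"))

def status(job_id: str) -> str:
    """Query the job status (an F2-style check) and return the raw report."""
    return subprocess.run(["edg-job-status", job_id],
                          capture_output=True, text=True).stdout

job_id = submit("grace_test_0.5MB.jdl")   # hypothetical JDL file name
print(status(job_id))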

Page 6: Testing: job submission

                                 M1    M2     General   Total
Total number of jobs submitted   58    727    395       1180

• General tests (RM, RB, functional, etc.) started in October 2004
• Main testing period: November 2004
• More than 1000 jobs were submitted

[Chart: share of submitted jobs by type — general tests, Model 1, Model 2]

Page 7: Variable parameters

V1 Input data size

V1 ID   0       1       2       3       4       5       6
Size    0.1 MB  0.5 MB  1.0 MB  1.5 MB  2.0 MB  3.0 MB  4.0 MB

V2 Worker node specifications

V2 ID        Specifications             Comment
0 / "Spec1"  PIV 2.4 GHz, 512 MB RAM    The fastest machine in the GILDA testbed
1 / "Spec2"  PIII 800 MHz, 1 GB RAM     The slowest machine in the GILDA testbed
2 / "Spec3"  PIII 1000 MHz, 2 GB RAM    The most common machine in the GILDA testbed

V3 Number of parallel jobs

V3 ID   0  1  2  3  4  5  6  7  8  9  10  11  12  13
JobsN   1  2  3  4  5  6  7  8  9  10 11  12  14  16
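For a sense of scale, here is a small Python sketch enumerating the full test matrix these parameters span. It assumes, for illustration only, that every V1 × V2 × V3 combination is a candidate test point; the deck does not state which subset was actually run.

# Hypothetical enumeration of the V1 x V2 x V3 test matrix (illustration only).
from itertools import product

SIZES_MB = [0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]             # V1: input data size
SPECS = ["Spec1", "Spec2", "Spec3"]                         # V2: worker node spec
N_JOBS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16]   # V3: parallel jobs

matrix = list(product(SIZES_MB, SPECS, N_JOBS))
print(len(matrix), "candidate combinations")                # -> 294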

Page 8: Graphs

G1 Total execution time: execution time (P4) as a function of input data size (V1) on worker nodes with different specifications (V2)

G2 Detailed execution time: execution time (P4) as a function of input data size (V1), split into text normalization and categorization; V2 is fixed

G3 Output size: output size (P9) as a function of input data size (V1)

G4 UI waiting time: UI waiting time (P7) as a function of the number of sub-jobs (V3) with fixed input data size (V1)

G5 Spent computing time: spent computing time (P8) as a function of the number of sub-jobs (V3) with fixed input size (V1)

G6 Optimal number of jobs: optimal number of jobs (FN1, FN2) as a function of the input size (V1)

G7 Optimal UI waiting time: UI waiting time (P7) as a function of the input size (V1) when applying the optimal splitting (FN1, FN2)

G8 Optimal spent computing time: spent computing time (P8) as a function of the input size (V1) when applying the optimal splitting (FN1, FN2)

Page 9: Results

[Charts (results overview):
• M1 - G1: execution time (hours) vs. input size (MB) for Spec1, Spec2, Spec3
• M1 - G2: execution time (hours) vs. input size (MB), split into normalization and categorization
• M1 - G3: output size (KB) vs. input size (MB) for categories, OutputSandbox (compressed), and index files
• Jobs per day (triggered, probably executed later): number of jobs per day, 5-26 November, for M1 jobs, M2 jobs, and general jobs
• M2 - G2 (V2=Spec3, TT3B): time (hours) vs. input size (MB) for ContentEngine and CategorizationEngine
• M2 - G6 (V2=Spec3, TT3B): optimal number of jobs vs. input size (MB)
• M2 - G7: UI waiting time (hours) vs. input size (MB) for V2=Spec2, V2=Spec3, V2=*
• M2: spent computing time (hours) with the optimal number of jobs vs. input size (MB), for V2=Spec2, V2=Spec3, V2=*
• Comparing P8 (M1 vs. M2, V2=Spec3): spent computing time (hours) vs. input size (MB)]

Results: collected and published in a study and test report

Page 10: General tests

Page 11: Functional tests

F1 Job submission: submission of a job to the Grid

F2 Job status check: status checking while the job is running

F3 Results retrieval: retrieving the output sandbox after successful execution

F4 Results validation: validate that the results are complete; the output files (indexes, NDF, categories) exist and are not empty (a sketch follows at the end of this slide)

F5 Error testing: testing that error conditions return the proper error messages: input data not available, GRACE application not available, ContentEngine failure, CategorizationEngine failure

The functional tests were successful. Problems related to the configuration of Grid nodes were encountered and fixed:

• RB configuration problems
• RM/SE configuration problems
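As an illustration of the F4 check, here is a minimal Python sketch. The expected file names are hypothetical stand-ins; the deck only says that the indexes, NDF, and categories outputs must exist and be non-empty.

# Hypothetical F4-style results validation (illustration only; file names
# are stand-ins for the real index, NDF, and categories outputs).
import os

def validate_results(output_dir: str,
                     expected=("index.dat", "ndf.out", "categories.out")) -> bool:
    """Results are complete if every expected output file exists and is non-empty."""
    paths = (os.path.join(output_dir, name) for name in expected)
    return all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths)

print(validate_results("./job_output"))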

Page 12: Performance tests (I)

                          M1 [sec]        M2 [sec]

P1 Job submission time    66.9 ± 23.9     34.9 ± 7.3
P2 Job brokering time     28.5 ± 5.1      26.6 ± 3.9
P3 Job queuing time       72.5 ± 19.0     68.6 ± 21.8    (on empty queues)
P4 Job execution time     0.69 + 7.88·I   see graphs     (variable; depends on input data size and on GRACE performance; I = input size in MB, time in hours)
P5 Job retrieving time    18.1 ± 5.9      17.6 ± 1.4     (depends on output data size)

Average Grid overhead     3.1 min         2.5 min
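To make the overhead figure concrete, here is a small Python sketch (my own illustration, not from the deck) that recomputes the average M1 Grid overhead from the P1, P2, P3, and P5 means above and evaluates the P4 fit.

# Reconstructing the average M1 Grid overhead and the P4 execution-time fit
# from the table above (means only; the +/- spreads are ignored).
P1_SUBMIT, P2_BROKER, P3_QUEUE, P5_RETRIEVE = 66.9, 28.5, 72.5, 18.1  # seconds

def m1_execution_hours(input_mb: float) -> float:
    """P4 fit for Model 1: execution time in hours, I = input size in MB."""
    return 0.69 + 7.88 * input_mb

overhead_s = P1_SUBMIT + P2_BROKER + P3_QUEUE + P5_RETRIEVE
print(f"Average Grid overhead: {overhead_s / 60:.1f} min")   # -> 3.1 min
print(f"P4 at 2 MB: {m1_execution_hours(2.0):.2f} h")        # -> 16.45 h

The overhead sum (186 s ≈ 3.1 min) matches the table, and it is small compared with execution times that run into hours, which is why parallelizing the execution pays off despite the per-job overhead.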

Page 13: Grid overhead

Grid overhead is about 3 minutes on average.

[Chart: breakdown of the Grid overhead into submission, brokering, queuing, and retrieving]

Page 14: Performance tests (II)

                       M1                            M2

P6 Job failure rate    19.0 %                        15.3 %
                       (11 failed out of 58 jobs)    (22 failed out of 144 jobs)

The failed jobs were aborted at the broker. Failures due to one CE which broke are not counted here.

We identified the main cause of failure as misbehavior of the resource broker (RB), which needed re-initialization (performed by the GILDA team). After re-initialization, 23 jobs were executed, all successfully.

The Grid performed well: job success rate > 80%.

Page 15: Model 1

Page 16: M1 performance (execution time vs. input size)

[Chart: execution time (hours) vs. input size (MB) for Spec1, Spec2, and Spec3, split into normalization and categorization]

Tests were performed on machines with different specifications.

The normalization job is the most demanding.

Page 17: Model 2

Page 18: M2 description

• Search results are split outside the Grid

• Grid parallel jobs execute text normalization

• Jobs are monitored for status

• Results are stored on the Grid (Replica Manager)

• The Grid categorization job executes:
  – merging of the normalized documents from the SEs
  – categorization processing

• The job is monitored and the results are retrieved
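Expressed as a runnable Python sketch, the M2 flow looks roughly like this. Every function below is a local stub standing in for the GRACE scripts and Grid middleware (job submission, Replica Manager registration); none of them are real GILDA APIs.

# Illustrative sketch of the M2 flow; all functions are local stubs, not
# real Grid middleware calls.
from dataclasses import dataclass

@dataclass
class Job:
    kind: str
    payload: object
    output: str = ""

def split_results(text: str, n_jobs: int) -> list[str]:
    """Split the search result set into n_jobs chunks outside the Grid."""
    step = max(1, -(-len(text) // n_jobs))   # ceiling division
    return [text[i:i + step] for i in range(0, len(text), step)]

def submit_job(kind: str, payload) -> Job:
    """Stub for Grid job submission and execution."""
    job = Job(kind, payload)
    job.output = f"{kind} done"               # stand-in for the real job output
    return job

def run_m2(search_results: str, n_jobs: int) -> str:
    chunks = split_results(search_results, n_jobs)            # outside the Grid
    norm_jobs = [submit_job("normalize", c) for c in chunks]  # parallel jobs
    replicas = [j.output for j in norm_jobs]                  # stored via the RM
    cat_job = submit_job("merge+categorize", replicas)        # single Grid job
    return cat_job.output                                     # retrieved at the UI

print(run_m2("a small corpus of search results", n_jobs=3))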

Page 19: M2 simulation

Kopt: the ideal optimal splitting number
– assumes an infinite-worker-node Grid where any splitting is possible
– the function minimizes the UI waiting time, with a resource-saving parameter α

Keff: the real optimal splitting number
– Kopt considering the constraints: available worker nodes, input data file size, and the splitting sequence

[Chart: UI waiting time and computing time as functions of the number of jobs; the optima Kopt1 and Kopt2 are marked, shifted by α, with the increase due to job submission overhead visible]
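The deck does not give the cost function itself, so the following Python sketch is an assumption-laden illustration of the Kopt/Keff idea: splitting into k jobs divides the computing work but adds per-job Grid overhead, and α penalizes occupying extra worker nodes. The overhead and α values below are made up; the splitting sequence is the V3 sequence from the test parameters.

# Hypothetical model of the optimal splitting number (not the actual GRACE
# cost function): exec time shrinks with k; overhead and alpha grow with k.
def ui_waiting_time(k: int, input_mb: float,
                    overhead_h: float = 0.05, alpha: float = 0.1) -> float:
    """Modelled UI waiting time (hours) for k parallel normalization jobs."""
    exec_h = 0.69 + 7.88 * (input_mb / k)   # P4-style linear fit per sub-job
    return exec_h + k * overhead_h + alpha * k

def k_opt(input_mb: float, k_max: int = 64) -> int:
    """Ideal optimum: infinite worker nodes, any splitting possible."""
    return min(range(1, k_max + 1), key=lambda k: ui_waiting_time(k, input_mb))

def k_eff(input_mb: float, available_nodes: int,
          splitting_sequence=(1, 2, 3, 4, 5, 6, 7, 8,
                              9, 10, 11, 12, 14, 16)) -> int:
    """Real optimum: Kopt restricted to the available worker nodes and the
    splitting sequence supported by the setup."""
    feasible = [k for k in splitting_sequence if k <= available_nodes]
    return min(feasible, key=lambda k: ui_waiting_time(k, input_mb))

print(k_opt(2.0), k_eff(2.0, available_nodes=10))   # -> 10 10 under this model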

Page 20: M2 performances

[Charts: execution time vs. number of parallel jobs (input size = 2 MB), with the Grid overhead and UI waiting time marked; execution time vs. input size (splitting parameter = 9), split into normalization and categorization]

Page 21: Comparison M1 and M2

[Charts: execution time vs. input size and computing time vs. input size, each for Model 1 and Model 2]

Page 22: Conclusions

• The Grid performed well: low failure rate, prompt responses from the Grid administrators to problems, and good coordination with the GILDA team

• Parallelization proved to improve application performance and lower the query failure rate