

Performance assessment of the 2008 configuration of the JINR CICC cluster

A. Ayriyan 1,*, Gh. Adam 1,2, S. Adam 1,2, E. Dushanov 1, V. Korenkov 1, A. Lutsenko 1, V. Mitsyn 1

1 Laboratory of Information Technologies, Joint Institute for Nuclear Research, 141980, Dubna, Russia
2 Horia Hulubei National Institute for Physics and Nuclear Engineering, IFIN-HH, Bucharest, Romania

* E-mail: [email protected]
Work is partly supported by grants RFBR #08-01-00800 and RFBR #08-07-09240.

[Histogram panels: distribution of the efficiency ("Effectiveness", horizontal axis from 0.0 to 1.0) of TOP500 systems, shown separately for InfiniBand and Gigabit Ethernet interconnects; arrows mark the JINR CICC module efficiencies.]

Annotations of the first pair of panels: InfiniBand, entries 121, mean 0.748, hwidth 0.089; Gigabit Ethernet, entries 284, mean 0.532, hwidth 0.055; arrows labelled "3old" and "1old".

Annotations of the second pair of panels: InfiniBand, entries = 119, mean = 0.713, hwidth = 0.093; Gigabit Ethernet, entries = 271, mean = 0.529, hwidth = 0.0787; arrows at 0.44 and 0.713; note "The CERN cluster efficiency 0.51 (2007)".

Fig. 1. The three modules of JINR CICC cluster.

Table 1. Summary of system and performance characterization.

Fig. 3. Comparison of former efficiencies of Modules 1 and 3 (arrows) with NOV’07 TOP500 data (histograms)

Fig. 4. Comparison of the last and present efficiencies of Modules 1 to 3 (arrows) with JUN’08 TOP500 data (histograms)

Table 2. Table of the current results.

Module        Network           R_peak [Gflops]  R_max [Gflops]  Efficiency  Past Eff.
Module 1      Gigabit Ethernet  2553.6           1325.0          0.528       0.44
Module 2      Gigabit Ethernet  2553.6           1414.0          0.554       –
Module 3      Gigabit Ethernet  960              598.4           0.623       –
Module 3      InfiniBand        960              757.5           0.797       0.713
Sum           Gigabit Ethernet  6067.2           3337.4          –           –
CICC Cluster  Gigabit Ethernet  6067.2           2982.0          0.491       0.44
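
The efficiency column is the ratio of the measured Linpack performance R_max to the theoretical peak R_peak. The arithmetic can be sketched in Python as follows, using values quoted in the tables. The peak estimate assumes 4 double-precision floating-point operations per core per clock cycle, a figure that is not stated on the poster, and the function names are illustrative only.

```python
# Assumption (not stated on the poster): each core retires 4 double-precision
# floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4


def peak_gflops(cores, freq_mhz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak R_peak in Gflops: cores x clock [GHz] x flops per cycle."""
    return cores * (freq_mhz / 1000.0) * flops_per_cycle


def efficiency(rmax_gflops, rpeak_gflops):
    """HPL efficiency: measured R_max divided by theoretical R_peak."""
    return rmax_gflops / rpeak_gflops


# (label, cores, clock [MHz], R_max [Gflops]) as quoted in the tables.
modules = [
    ("Module 2, GbE", 240, 2660, 1414.0),
    ("Module 3, GbE", 80, 3000, 598.4),
]

for label, cores, mhz, rmax in modules:
    rpeak = peak_gflops(cores, mhz)
    eff = efficiency(rmax, rpeak)
    print(f"{label}: R_peak = {rpeak:.1f} Gflops, efficiency = {eff:.3f}")

# Whole cluster, using the tabulated R_peak directly.
print(f"CICC cluster: efficiency = {efficiency(2982.0, 6067.2):.3f}")
```

Under the stated assumption the sketch gives R_peak values of 2553.6 and 960 Gflops and efficiencies of 0.554, 0.623 and 0.491, in agreement with the corresponding table entries.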

RDIG Accounting: Normalized CPU time per site and per LHC VO, June – September 2008

JINR CICC Accounting: Normalized CPU time per LHC VO, June – September 2008

INTRODUCTION

JINR covers a broad multidisciplinary spectrum of research topics, and LIT provides the mathematical modeling and software development for them.
The resulting computing tasks are of two kinds:
~ very large single problems (e.g. lattice chromodynamics),
~ very large numbers of different problems of similar nature (as in the LHC experiments).
The Central Information and Computing Complex (CICC) has to make optimal use of the acquired hardware and of open-source software.

MOTIVATION

~ Studying the efficiency of the software implementation and optimizing the connection of the hardware modules.
~ Finding and fixing weak points.
~ Finding a path for the next upgrade.
~ Submitting an entry to the Top50 (www.supercomputers.ru) and comparing with Top500 (www.top500.org) data.

Fig. 2. Example of the data fitting: curve (polynomial of degree m) fitted to February 2008 Module 1 (left) data and Module 3 (right) data.

Fig. 5. Normalized CPU time per site and per LHC VO

Fig. 6. List of top sites sorted by normalized CPU time.

XII International Workshop on Advanced Computing and Analysis Techniques in Physics Research, 3-7 NOV 2008, Erice, Sicily (Italy)

CONCLUSIONS

We have found that low-cost improvements of the hardware connections resulted in substantial gains: lower latency and consistent data handling to and from the associated mass storage peripherals. The increase of the RAM per core also resulted in noticeable performance gains.

With these improvements, the efficiency of the high-performance computing done on each of the three homogeneous modules of the LIT-JINR cluster stays at the upper end of the statistics reported in the TOP500 [4] data. The in-house installation of free software therefore performs consistently with similar installations worldwide.

The efficiency of the distributed computing of interest in Grid data analyses also benefited from these additions, since the improved rack interconnect is directly reflected in the speed of this kind of computation.

The highest measured performance of the CICC cluster provided entries to the Top50:
- 12th rank within the 7th edition, SEP’07,
- 23rd rank within the 8th edition, MAR’08,
- 24th rank within the 9th edition, SEP’08.

Fig. 7. Entries of the CICC cluster to the Top50 (in Russian)

Provider                              [Totals]          T-Platfor.&HP           SuperMicro              SuperMicro
Structure                             Heterogeneous     Homogeneous             Homogeneous             Homogeneous
Year                                  2007-2008         2007                    2008                    2008
CPU/cores                             200/560           120/240                 60/240                  20/80
Intel Processor                       –                 Xeon 5150 (dual-core)   Xeon E5430 (quad-core)  Xeon X5450 (quad-core)
Frequency, MHz                        –                 2660                    2660                    3000
RAM, Gb                               1120              480                     480                     160
Network                               Gigabit Ethernet  Gigabit Ethernet        Gigabit Ethernet        Gigabit Ethernet & InfiniBand
Performance Linpack/Peak, GFlops      2982 / 6067.2     1325 / 2511.04          1414 / 2553.6           GbE: 598.4 / 960; IfB: 757.5 / 960
Efficiency Linpack/Peak               0.491             0.528                   0.554                   GbE: 0.623; IfB: 0.797
Coefficient matrix order              274400            220000                  220000                  GbE: 120000; IfB: 120000
RAM occupancy by coefficient matrix   0.501             0.751                   0.751                   GbE: 0.671; IfB: 0.671
Basic aim of computing                Distributed       Distributed             Distributed             Distributed; Parallel (GbE, IfB)
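
The "RAM occupancy by coefficient matrix" row can be reproduced from the matrix order and the installed memory. The sketch below is a minimal illustration under two assumptions that the poster does not state explicitly: the HPL coefficient matrix is dense and stored in double precision (8 bytes per element), and the RAM figures are binary gigabytes (2^30 bytes). The function and variable names are hypothetical.

```python
BYTES_PER_DOUBLE = 8   # assumption: matrix elements are double precision
GIB = 2 ** 30          # assumption: the RAM figures are binary gigabytes


def ram_occupancy(n, ram_gib):
    """Fraction of the RAM taken by a dense n x n double-precision matrix."""
    return BYTES_PER_DOUBLE * n * n / (ram_gib * GIB)


# (label, coefficient matrix order N, total RAM) as quoted in Table 1.
configs = [
    ("Whole cluster", 274400, 1120),
    ("T-Platform & HP module", 220000, 480),
    ("SuperMicro module (Xeon E5430)", 220000, 480),
    ("SuperMicro module (Xeon X5450)", 120000, 160),
]

for label, n, ram in configs:
    print(f"{label}: N = {n}, occupancy = {ram_occupancy(n, ram):.3f}")
```

With these assumptions the computed fractions round to 0.501, 0.751, 0.751 and 0.671, matching the tabulated row.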

[Fig. 2 panels: log-log plots of Time [sec] versus Dimension of matrix, N. Panel titles: "Module 1: T-Platform'n'HP, September 2007 data" and "Module 3: SuperBladeIB-IB, February 2008 data". Legend in both panels: data; fitted curve, m = 3, Poisson.]

EFFICIENCY IN RDIG

JINR CICC accounts for roughly 40 percent of the active resources involved in the Tier2 RDIG.

FINDING AND FIXING WEAK POINTS

Information transfer is provided by one Gigabit Ethernet (GbE) interface in each machine, while every module connects to the main backbone Ethernet switch via a four-port GbE trunk, so the aggregate rate between modules was upgraded to 4 Gbps.

CONSISTENCY CHECK OF THE TIME DATA

The Top500 High-Performance Linpack benchmark (HPL) was used for the performance assessment. The consistency of the measured times with the computational complexity of the LU decomposition involved in the HPL was checked by solving a least squares problem. The analysis yields a third-power law for the growth of the computing time with N, the order of the solved linear algebraic system.
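
The third-power law can be illustrated with a short Python sketch. It is not the least squares procedure used for the poster (that one fits a polynomial of degree m and is described in the reference given in this section); it swaps in a simpler straight-line least squares fit of log(time) against log(N), whose slope estimates the exponent. The timing model, the 700 Gflops rate and the matrix orders are synthetic assumptions, not measured data.

```python
import math


def loglog_slope(ns, times):
    """Least-squares slope and intercept of log(time) versus log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def synthetic_hpl_time(n, rate_gflops=700.0):
    """Hypothetical timing model: leading-order LU cost (2/3) N^3 at a fixed rate."""
    return (2.0 / 3.0) * n ** 3 / (rate_gflops * 1e9)


# Synthetic matrix orders and times, NOT the measured poster data.
orders = [10000, 20000, 40000, 80000, 120000]
times = [synthetic_hpl_time(n) for n in orders]

slope, _ = loglog_slope(orders, times)
print(f"fitted exponent: {slope:.3f}")
```

For real timings the fitted exponent would be expected to approach 3 only for large N, where the cubic term of the LU cost dominates communication and lower-order overheads.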

A detailed description of the least squares problem was given in:
Gh. Adam, S. Adam, A. Ayriyan, E. Dushanov, E. Hayryan, V. Korenkov, A. Lutsenko, V. Mitsyn, T. Sapozhnikova, A. Sapozhnikov, O. Streltsova, F. Buzatu, M. Dulea, I. Vasile, A. Sima, C. Visan, J. Busa, I. Pokorny, "Performance assessment of the SIMFAP parallel cluster at IFIN-HH Bucharest", Romanian Journal of Physics, Vol. 53, Nos. 5-6, P. 665-677, 2008.