HS06 performance per watt and transition to SL6
Michele Michelotto – INFN Padova
The SL6 transition
Rumors of sizeable differences in HS06 across Scientific Linux releases on the same hardware
I made detailed measurements on an AMD Opteron 6272:
SL5 with gcc 4.1.2
SL6 with gcc 4.4.0
SL6 with the latest compiler available at that time, gcc 4.7.0
At the end of August we started collecting SL6 results from WLCG sites
Alessandra Forti asked sites to send their results to me and Manfred
Manfred created a new page on the HEPiX site
SL6 performance vs. SL5
[Figure: x-axis 1 to 36; y-axis 0 to 300; series: SL5, SL6]
SL6+gcc4.7 vs. SL6 vs. SL5
[Figure: AMD Opteron 6272 HS06 32bit; x-axis: threads (1 to 36); y-axis: HS06 (0 to 300)]
SL6+gcc4.7 and SL6 gcc 4.4: difference with SL5 and gcc 4.1.2
[Figure: AMD Opteron 6272 HS06 32bit; x-axis: threads (1 to 36); y-axis: HS06 (0.00% to 40.00%)]
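The difference with SL5 on this slide is a relative gain over the SL5 baseline at each thread count. A minimal sketch in Python of that calculation follows; the function name and the sample scores are hypothetical and are not the measured values behind the plot.

```python
def percent_gain(baseline, candidate):
    """Relative gain of candidate over baseline, in percent, per thread count."""
    if len(baseline) != len(candidate):
        raise ValueError("series must have the same length")
    return [100.0 * (c - b) / b for b, c in zip(baseline, candidate)]


# Hypothetical HS06 scores at 1, 2 and 4 threads (illustrative only).
sl5_gcc412 = [10.0, 20.0, 40.0]
sl6_gcc440 = [11.0, 23.0, 44.0]

gains = percent_gain(sl5_gcc412, sl6_gcc440)
for threads, gain in zip([1, 2, 4], gains):
    print(f"{threads} threads: {gain:.1f}% over SL5")
```

The same function can be applied to the SL6+gcc4.7 series against the same SL5 baseline to obtain the second curve.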
Let’s do it on Intel Xeon
Differences SL6 gcc4.7 and SL6 gcc4.4 wrt SL5 gcc4.1
[Figure: Xeon E5 HS06 32bit; x-axis: threads (1 to 38); y-axis: HEP-SPEC06 (0.00% to 30.00%); series: SL6 / SL5 ratio, gcc4.7 / SL5 ratio]
HEP-SPEC06 site maintained mainly by Manfred
SL6 vs. SL5 from pairs of similar worker nodes
[Figure: SL6/SL5 (0.00% to 14.00%) per worker node pair: AMD Opteron 6168, AMD Opteron 6174, AMD Opteron 6276, Intel Xeon 5520 (x2), Intel Xeon E5-2665, Intel Xeon E5-2670 (x3), Intel Xeon E5520 (x2), Intel Xeon E5630, Intel Xeon X5650 (x4)]
Ivy Bridge vs. Sandy Bridge
Ivy Bridge vs. Sandy Bridge 64 bit
Performance per clock
Mail from Manfred on Friday 25th
DELL C6620 (2U, 4 nodes), each node:
2 x Intel Xeon E5-2670 v2, 10 cores (20 logical CPUs) @ 2.5 GHz
64 GB (8 x 8 GB PC3-14900), 6 x 900 GB SAS
342 HS06 (20 copies), 411 HS06 (40 copies)
Adding Manfred's new beast
[Figure: HS06 per MHz vs. number of concurrent runs (1 to 48); y-axis 0 to 200; series: dual E5-2697 v2 (2.7 GHz, 24 cores, 48 logical CPUs), dual E5-2660 (2.2 GHz, 16 cores, 32 logical CPUs), dual E5-2670 v2 (2.5 GHz, 20 cores, 40 logical CPUs)]
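Performance per clock normalizes an HS06 score by the nominal clock frequency so that processors running at different speeds can be compared. A small sketch using the dual E5-2670 v2 numbers quoted in Manfred's mail (411 HS06 with 40 copies at 2.5 GHz); the function name is hypothetical, and normalizing per GHz rather than per MHz is a choice made here for readability.

```python
def hs06_per_ghz(hs06, clock_ghz):
    """Normalize an HS06 score by the nominal clock frequency in GHz."""
    if clock_ghz <= 0:
        raise ValueError("clock frequency must be positive")
    return hs06 / clock_ghz


# Dual E5-2670 v2 as quoted in the slides: 411 HS06 with 40 copies at 2.5 GHz.
per_ghz = hs06_per_ghz(411.0, 2.5)
per_logical_cpu = per_ghz / 40  # 40 logical CPUs on this node

print(f"{per_ghz:.1f} HS06/GHz, {per_logical_cpu:.2f} HS06/GHz per logical CPU")
```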
A new architecture: ARM
A new architecture
Exynos4412 Prime CPU
1.7 GHz Cortex-A9 quad core
2 GB LP-DDR2 memory (512 MB/core)
$89 each
Fedora 18, ARMv7, gcc 4.8, ODROID kernel
Courtesy of Peter Elmer, Princeton Univ.
HS06 measured on ARM
[Figure: HS06 and HS06/core on ARM; x-axis 0 to 5; y-axis 0.00 to 14.00]
Measurements of power consumption
Measurements of voltage, amperage and power consumption
The power logger
Measurement setup
Single core
Multicore
32 bit measurements
64 bit measurements
Collecting results from Manfred
Measurements on ARM
Fluke 1735 Three-Phase Power Logger
Measurement setup for single phase
On display
Power logger software
[Figure: logged power trace with phases: idle, compile, first run, second run, third run. Black: average, green: min, red: max]
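The logger trace is read per phase (idle, compile, then three benchmark runs) as a minimum, an average and a maximum. The sketch below shows one way to produce that summary, assuming the samples have already been exported as (phase, watts) pairs. The export format, the function name and the sample values are hypothetical and do not describe the actual output of the Fluke software.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_phase(samples):
    """Group (phase, watts) samples and return {phase: (min, average, max)}."""
    grouped = defaultdict(list)
    for phase, watts in samples:
        grouped[phase].append(watts)
    return {p: (min(w), mean(w), max(w)) for p, w in grouped.items()}


# Hypothetical samples (illustrative only, not measured values).
samples = [
    ("idle", 100.0), ("idle", 102.0),
    ("compile", 150.0), ("compile", 170.0),
    ("first run", 300.0), ("first run", 310.0), ("first run", 320.0),
]

for phase, (lo, avg, hi) in summarize_by_phase(samples).items():
    print(f"{phase}: min {lo:.0f} W, avg {avg:.0f} W, max {hi:.0f} W")
```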
Power consumption (Watt) on Intel Xeon E5-2660
[Figure: Intel Xeon E5-2660, 2 PSU; min, average and max power for idle, gcc, and 1, 2, 3, 4, 6, 8, ..., 32 copies; axis 0.00 to 400.00]
[Figure: bar chart over idle, gcc, and 1, 2, 3, 4, 6, 8, ..., 32 copies; axis 0.00 to 450.00]
Efficiency HS06/Watt
[Figure: HS06/kWatt for idle, gcc, and 1, 2, 3, 4, 6, 8, ..., 32 copies; axis 0.00 to 1200.00]
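Efficiency here is the HS06 score divided by the power drawn at the same load, scaled to kWatt. A minimal sketch of building that curve and finding its peak follows; the function name and every load point are hypothetical, chosen only to illustrate the calculation.

```python
def hs06_per_kwatt(hs06, watts):
    """Efficiency in HS06 per kWatt for one load point."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return 1000.0 * hs06 / watts


# Hypothetical load points: copies -> (HS06, average power in W). Illustrative only.
load_points = {
    8: (120.0, 250.0),
    16: (220.0, 320.0),
    32: (270.0, 360.0),
}

efficiency = {n: hs06_per_kwatt(h, w) for n, (h, w) in load_points.items()}
best = max(efficiency, key=efficiency.get)
print(f"best efficiency at {best} copies: {efficiency[best]:.0f} HS06/kWatt")
```

Because idle power is paid regardless of load, efficiency computed this way tends to rise with the number of copies as long as the score grows faster than the power draw.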
Historical Trend from Manfred
[Figure: HS06/Watt over time, Jan-04 to Aug-13; y-axis 0.00 to 1.20; labeled points: XEON E5 26x0, XEON 54xx, XEON 51xx, AMD 2xx, AMD 6168]
HS06/Watt with ARM processor
[Figure: HS06/Watt over time, Jan-04 to Dec-14; y-axis 0.00 to 3.50; labeled points: XEON E5 26x0, ARM, XEON 54xx, XEON 51xx, AMD 2xx, AMD 6168]
Mail from Manfred on Friday 25th
DELL C6620 (2U, 4 nodes), each node:
2 x Intel Xeon E5-2670 v2, 10 cores (20 logical CPUs) @ 2.5 GHz
64 GB (8 x 8 GB PC3-14900), 6 x 900 GB SAS
342 HS06 (20 copies), 411 HS06 (40 copies)
1450 Watt on four nodes, 362 Watt/node
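The numbers in this mail can be combined into a per-node power figure, the gain from running 40 copies instead of 20, and an HS06/Watt value. The sketch below does that arithmetic. It assumes the 1450 Watt figure was measured while all four nodes ran the 40-copy load, which the mail does not state explicitly.

```python
nodes = 4
chassis_watts = 1450.0      # quoted for the four-node chassis
hs06_20_copies = 342.0      # one copy per physical core
hs06_40_copies = 411.0      # one copy per logical CPU

watts_per_node = chassis_watts / nodes
gain_percent = 100.0 * (hs06_40_copies - hs06_20_copies) / hs06_20_copies
# Assumption: the power figure corresponds to the 40-copy load.
hs06_per_watt = hs06_40_copies / watts_per_node

print(f"{watts_per_node:.1f} W/node")
print(f"{gain_percent:.1f}% gain from 20 to 40 copies")
print(f"{hs06_per_watt:.2f} HS06/Watt")
```

The per-node value comes out at 362.5 W, which the slide quotes as 362 Watt/node.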
[Figure: HS06/Watt over time, Jan-04 to Dec-14; y-axis 0.00 to 3.50]
HS06/Watt + Xeon E5-2670 v2 (Manfred)
Future work
New Xeon E5 v2: very good performance
Detailed measurements on Xeon E5 v2 in HS06/Watt
New Intel server processor: Avoton
New ARM processors: 64 bit processors will be available