
EE HPC Working Group http://eehpcwg.lbl.gov/ http://www.green500.org

New Considerations for the Level 1 Measurement Methodology, The Green500 List and its Continuing Evolution BoF, November 2014

Re-Visiting Power Measurement for the Green500

Thomas R. W. Scogland (LLNL/CASC, Green500)


Level 1 Requirements

• Workload phase: Measure at least 20% of the middle 80% of the core phase

• Machine fraction: Measure at least 1/64th of the system or 1kW, whichever is greater

• Subsystems measured: Measure the compute components; network, storage, and other subsystems are not required
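A minimal sketch of the current Level 1 timing check, assuming the core phase and the measurement window are known as start/end times in seconds. `Window`, `middle_80`, and `meets_current_level1_timing` are hypothetical names. We read "20% of the middle 80%" as 20% of the middle-80% window's length; the `min_share` parameter exists so that reading can be changed.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """A time interval in seconds from job start."""
    start: float
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


def middle_80(core: Window) -> Window:
    """Middle 80% of the core phase: drop the first and last 10%."""
    trim = 0.10 * core.length
    return Window(core.start + trim, core.end - trim)


def meets_current_level1_timing(core: Window, measured: Window,
                                min_share: float = 0.20) -> bool:
    """True if `measured` lies inside the middle 80% of `core` and spans at
    least `min_share` of that middle-80% window (our assumed reading)."""
    allowed = middle_80(core)
    inside = allowed.start <= measured.start and measured.end <= allowed.end
    return inside and measured.length >= min_share * allowed.length


core = Window(0.0, 10_000.0)
print(middle_80(core))                                              # Window(start=1000.0, end=9000.0)
print(meets_current_level1_timing(core, Window(1_000.0, 2_700.0)))  # True
print(meets_current_level1_timing(core, Window(0.0, 2_000.0)))      # False: starts in the first 10%
```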


Workload Phase: A Classic HPL Profile

[Figure: power (kW, roughly 250 to 400) vs. time from start (seconds, 0 to about 20,000) for an HPL run. Annotations: nearly flat, except at job launch and job cleanup.]


The Core Phase

• The time period under test

• Possible core phases:

  • Job scheduling -> job completion

  • Application start -> application end

  • Benchmark start -> benchmark end

• Any of these is valid, as long as it matches the interval used for your other metrics


The Core Phase: Linpack Example

[Figure: HPL power trace (kW vs. time from start, s), segmented into startup, core, and tear-down phases.]

Core phase cuts off most of the cruft


What do we require now?


Workload Timing by Measurement Level

[Figure: HPL power trace (kW vs. time from start, s), segmented into startup, core, and tear-down, marking the portion of the core phase each level measures.]

• Level 1: 20% of the middle 80% of the core phase

• Level 2: evenly spaced average measurements

• Level 3: continuously integrated energy
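To make the Level 2 and Level 3 descriptions concrete, here is a sketch of how each could turn raw readings into an average power for the core phase. The function names and the one-hour example are hypothetical; real meters and PDUs expose readings and energy counters in their own formats.

```python
def level2_average_kw(readings_kw: list[float]) -> float:
    """Level 2 style: mean of evenly spaced average-power readings.

    Assumes every reading is the average power over an equal-length
    interval, so the plain mean equals the time-weighted mean."""
    return sum(readings_kw) / len(readings_kw)


def level3_average_kw(counter_start_kwh: float, counter_end_kwh: float,
                      t_start_s: float, t_end_s: float) -> float:
    """Level 3 style: difference of a cumulative energy counter (kWh) read
    at the start and end of the core phase, divided by elapsed time."""
    energy_kwh = counter_end_kwh - counter_start_kwh
    elapsed_h = (t_end_s - t_start_s) / 3600.0
    return energy_kwh / elapsed_h


# Hypothetical one-hour core phase: four 15-minute average readings, and an
# energy counter that advanced by 385 kWh over the same hour.
print(level2_average_kw([400.0, 390.0, 380.0, 370.0]))   # 385.0
print(level3_average_kw(1_000.0, 1_385.0, 0.0, 3_600.0))  # 385.0
```

The two agree here because each Level 2 reading already averages its whole interval; instantaneous samples taken at the same spacing could miss short spikes that an energy counter would still capture.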


Power Variability

[Figure: HPL power trace (kW vs. time from start, s), segmented into startup, core, and tear-down.]

Core phase average: 398.7 kW

First 20%: 398.1 kW

Last 20%: 398.2 kW


Why Change the Requirement?


Newer system designs have a different pattern.


Piz Daint (GPU-Accelerated) Linpack Profile

[Figure: power (kW, roughly 400 to 800) vs. time from start (seconds, 0 to about 8,000), segmented into startup, core, and tear-down.]

Tail-off is much longer


Core Phase Averaged for Piz Daint

[Figure: Piz Daint power trace (kW vs. time from start, s), segmented into startup, core, and tear-down.]

Core phase average: 833.4 kW

First 20%: 873.8 kW

Last 20%: 698.4 kW


Core Phase Averaged for Piz Daint

[Figure: Piz Daint power trace (kW vs. time from start, s), segmented into startup, core, and tear-down.]

25% higher average power in the first 20% than in the last 20% (873.8 vs. 698.4 kW)!
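The percentages behind these slides follow from the reported window averages. This sketch, with hypothetical helper names, recomputes them and also shows what each window would do to a FLOPS/W figure, under the assumption that the same FLOPS value is divided by the window's average power instead of the core-phase average.

```python
def pct_diff(window_kw: float, reference_kw: float) -> float:
    """Percent difference of a window's average power from a reference average."""
    return 100.0 * (window_kw - reference_kw) / reference_kw


def efficiency_inflation(window_kw: float, core_kw: float) -> float:
    """Percent by which FLOPS/W is overstated (negative: understated) if the
    window's power replaces the core-phase average, FLOPS held fixed."""
    return 100.0 * (core_kw / window_kw - 1.0)


# Averages reported on the slides (kW).
runs = {
    "classic HPL": {"core": 398.7, "first 20%": 398.1, "last 20%": 398.2},
    "Piz Daint": {"core": 833.4, "first 20%": 873.8, "last 20%": 698.4},
}

for name, r in runs.items():
    for window in ("first 20%", "last 20%"):
        print(f"{name:11s} {window:9s} "
              f"power {pct_diff(r[window], r['core']):+6.1f}%  "
              f"FLOPS/W {efficiency_inflation(r[window], r['core']):+6.1f}%")

# Piz Daint: first 20% relative to last 20%.
print(f"{pct_diff(873.8, 698.4):+.1f}%")  # +25.1%
```

For Piz Daint, quoting the last 20% instead of the full core phase would inflate FLOPS/W by roughly 19%, while for the classic HPL run the difference stays below 0.2%.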


What do we propose?


Workload Timing by Measurement Level

[Figure: HPL power trace (kW vs. time from start, s), segmented into startup, core, and tear-down, marking the portion of the core phase each level measures.]

• Level 1: 100% of the core phase

• Level 2: evenly spaced average measurements

• Level 3: continuously integrated energy
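Why measure 100% of the core phase? When the power trace tails off, the average over a shorter window depends on where that window is placed. This sketch uses a synthetic trace (not measured data) loosely shaped like a tail-off, reads Level 1's 20% as 20% of the middle 80% of the core phase (an assumption), and compares every such window against the full-core-phase average. `sliding_averages` is a hypothetical helper.

```python
def sliding_averages(samples: list[float], width: int) -> list[float]:
    """Average of every contiguous run of `width` evenly spaced samples."""
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]


# Synthetic core-phase trace (kW), NOT measured data: 100 evenly spaced
# samples, flat at 870 kW for the first 60, then falling linearly to 600 kW.
n, flat = 100, 60
trace = [870.0] * flat + [870.0 - 270.0 * (i + 1) / (n - flat)
                          for i in range(n - flat)]

full_avg = sum(trace) / n                       # what a 100% window reports
middle = trace[n // 10 : n - n // 10]           # middle 80% of the core phase
windows = sliding_averages(middle, len(middle) // 5)  # every 20%-of-middle window

print(f"100% of core phase: {full_avg:.1f} kW")
print(f"20% windows: {min(windows):.1f} to {max(windows):.1f} kW")
```

With the full core phase there is only one window, so this choice disappears.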


Measurement Fraction

• Level 1 requires 1/64th of the machine

• Which 64th of the machine?
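One way to see why the choice of 1/64th matters: a Level 1 submitter measures a subset and scales it up linearly. The sketch below builds a synthetic machine (hypothetical: 16 racks of 256 nodes, each rack with its own power offset), measures every contiguous 1/64th, and compares the extrapolated totals. All numbers are invented for illustration, and `extrapolate_kw` is a hypothetical helper.

```python
import random
import statistics


def extrapolate_kw(subset_kw: list[float], total_nodes: int) -> float:
    """Scale a measured subset's power linearly to the whole machine."""
    return sum(subset_kw) * total_nodes / len(subset_kw)


# Synthetic machine, NOT measured data: 16 racks x 256 nodes. Each rack gets
# its own offset (e.g. cooling or silicon differences) plus per-node noise.
rng = random.Random(42)
racks, per_rack = 16, 256
total = racks * per_rack
rack_offset = [rng.uniform(-0.02, 0.02) for _ in range(racks)]
node_kw = [0.35 + rack_offset[i // per_rack] + rng.gauss(0.0, 0.005)
           for i in range(total)]
true_kw = sum(node_kw)

block = total // 64  # 1/64th of the machine = 64 contiguous nodes
estimates = [extrapolate_kw(node_kw[s:s + block], total)
             for s in range(0, total, block)]

print(f"true total: {true_kw:.0f} kW")
print(f"1/64 estimates: {min(estimates):.0f} to {max(estimates):.0f} kW "
      f"(stdev {statistics.stdev(estimates):.0f} kW)")
```

The estimates average out to the true total, but any single 1/64th can land noticeably above or below it, depending on which rack it sits in.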


A Power-Measurement Methodology for Large-Scale, High-Performance Computing, International Conference on Performance Engineering, March 2014

Variability Across Levels: SuperMUC

Quality level (subsystems measured)                               Mflops/Watt (full run)  Efficiency drop from L1
L1 (compute only)                                                 1055                    0
L2 (>10 kW) (compute and interconnect)                            1011                    44 (~4%)
L2 (>1/8) (compute and interconnect)                              994                     61 (~6%)
L3 (compute, interconnect, storage, cooling, power distribution)  887                     168 (~16%)
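The drop column of the table can be recomputed from the Mflops/Watt column; this short sketch (hypothetical helper name `drop_from_l1`) does so and reproduces the 44 (~4%), 61 (~6%), and 168 (~16%) drops.

```python
def drop_from_l1(eff: int, l1_eff: int) -> tuple[int, float]:
    """Absolute (Mflops/W) and relative (%) efficiency drop versus Level 1."""
    drop = l1_eff - eff
    return drop, 100.0 * drop / l1_eff


# Full-run SuperMUC efficiencies (Mflops/Watt) as listed in the table.
rows = [
    ("L1 (compute only)", 1055),
    ("L2 (>10 kW)", 1011),
    ("L2 (>1/8)", 994),
    ("L3 (all subsystems)", 887),
]
l1 = rows[0][1]
for name, eff in rows:
    drop, pct = drop_from_l1(eff, l1)
    print(f"{name:20s} {eff:5d}  drop {drop:4d} ({pct:4.1f}%)")
```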


Subsystem Contribution

• Level 1 has so far treated network power as “in the noise”

• We are seeing more and more reports of the network contributing 10-20% of overall power use
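A back-of-the-envelope sketch of what a 10-20% network share means for a Level 1 number, assuming the FLOPS figure is unchanged and only the network's power is left out: measured power becomes (1 - share) × P, so FLOPS/W grows by a factor of 1 / (1 - share). `overstatement_pct` is a hypothetical helper.

```python
def overstatement_pct(excluded_share: float) -> float:
    """Percent by which FLOPS/W is overstated when a subsystem drawing
    `excluded_share` of total power is left out, with FLOPS unchanged."""
    if not 0.0 <= excluded_share < 1.0:
        raise ValueError("excluded_share must be in [0, 1)")
    return 100.0 * (1.0 / (1.0 - excluded_share) - 1.0)


for share in (0.10, 0.20):
    print(f"network at {share:.0%} of power -> "
          f"FLOPS/W overstated by {overstatement_pct(share):.1f}%")
# network at 10% of power -> FLOPS/W overstated by 11.1%
# network at 20% of power -> FLOPS/W overstated by 25.0%
```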


Conclusions

• Our current requirements for Level 1 are no longer sufficient

• We propose raising the requirements of Level 1:

  • Measurement phase: 100% of the core phase

  • System fraction: 1/16th or more

  • Subsystems included: compute and networking
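To compare the current and proposed machine fractions, here is a sketch that counts how many nodes must be measured under each rule. It assumes homogeneous nodes with a hypothetical 0.35 kW per node, reads the current 1 kW floor as "the measured portion must draw at least 1 kW", and applies no power floor to the proposed 1/16th because the proposal does not state one. Both function names are hypothetical.

```python
import math


def nodes_current_l1(total_nodes: int, node_kw: float) -> int:
    """Current rule: at least 1/64th of the system or 1 kW, whichever is greater."""
    by_fraction = math.ceil(total_nodes / 64)
    by_power = math.ceil(1.0 / node_kw)  # nodes needed to reach 1 kW
    return min(total_nodes, max(by_fraction, by_power))


def nodes_proposed_l1(total_nodes: int) -> int:
    """Proposed rule: at least 1/16th of the system (no power floor assumed)."""
    return math.ceil(total_nodes / 16)


for total in (128, 4096):
    print(total, nodes_current_l1(total, 0.35), nodes_proposed_l1(total))
# 128 3 8
# 4096 64 256
```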
