Page 1

2012 ATLAS Technical Interchange Meeting
Annecy, France

Stephen Gray
Dell Global CERN/LHC Technologist
+1.512.574.5032 | [email protected]

Page 2

Dell LHC Program

Building a “Bulldozer” Processor

• Each processor die is composed of 4 “Bulldozer” modules

• Module divisions are transparent to shared hardware, operating system or application

• The modular architecture speeds chip development and increases product flexibility

Server: “Interlagos” – 16 cores (2 dies); “Valencia” – 8 cores (1 die)
Client: “Zambezi” – 8 cores (1 die)

[Die diagram: “Bulldozer” modules, 8MB shared L3 cache per die, NB/HT links, memory controller]

DELL/AMD CONFIDENTIAL

Page 3

Romley EP Platform

[Block diagram: two Sandy Bridge CPUs linked by QPI, DDR3 memory channels on each CPU, PCIe 3 x8 and PCIe 2 x4 links, and the Patsburg PCH attached over DMI2]

• Sandy Bridge CPUs: up to 8 cores / socket with up to 20M of cache
• Memory: DDR3 & DDR3L; RDIMMs, UDIMMs & LR DIMMs; 4 channels per socket, up to 3 DPC; speeds up to DDR3 1600
• PCI Express* 3.0: 40 lanes per socket; extra Gen 2 x4 on 2nd CPU
• QPI: 2 QPI links with bandwidth up to 8 GT/s
• Patsburg: optimized server & WS PCH; integrated storage: up to 8 ports 6Gb/s SAS, RAID 5 optional

DELL/Intel CONFIDENTIAL

Page 4

All HS06 Tests Before 12/2011

[Bar chart: HEPSPEC06/system vs. cores present (6, 8, 12, 16, 32, 64) for Intel Sandy Bridge and AMD Interlagos. Data labels as extracted: 79.09, 101.52, 157.33, 519.46; 48.55, 65.20, 129.61, 330.25, 531.50]

Notes:
* All tests are 32-bit, hyperthreading disabled, clock speed up enabled
* Multiple tests on the same proc type are averaged
* 32 core AMD is 3.0 GHz, all others are 2.3 GHz
* Intel 6 & 12 core is 2.0 GHz, 8 core is 1.6 GHz, 32 core is 2.7 GHz
* 64 core Intel is a 4-socket 2.4GHz R820

Page 5

Core Control

[Bar chart: HEPSPEC06 vs. cores/threads (1, 2, 4, 8, 16, 32, 64); data labels listed per series in legend order]
AMD Interlagos - Numactl Binding: 11, 21, 43, 86, 170, 337, 504
Intel Sandy Bridge - BIOS Core Downing: 0, 0, 0, 101, 175, 358, 566
Intel Sandy Bridge - Numactl Binding: 21, 27, 51, 93, 161, 316, 536
Intel Sandy Bridge - Numactl Binding HT Off: 22, 41, 74, 128, 255, 502, 0

Notes:
- All tests used RHEL 6.2 and GCC 4.4.5
- Intel SB numbers are from an R820 with 4 x 2.4GHz 8 core engineering processors and HT enabled
- AMD Interlagos numbers come from a C6145 with 4 x 6276 2.3GHz production processors

Page 6

Very Cool Scalability

[Line chart: speed up (0.0 to 1.2) vs. cores/threads (1, 2, 4, 8, 16, 32, 64) for AMD Interlagos - 4 Socket Numactl Binding and Intel Sandy Bridge - 4 Socket Numactl Binding. Data labels as extracted: 101.3%, 100.4%, 100.7%, 97.1%, 97.8%, 49.6%, 28.6%, 88.9%, 82.4%, 73.1%, 96.3%, 69.6%]

Notes:
- RHEL 6.2 and GCC 4.4.5 used for all tests
- Sandy Bridge numbers are from an R820 with 4 sockets of 2.4GHz 8 core engineering processors w/ HT enabled
- Interlagos numbers are from a C6145 with 1 tray of 4 socket Opteron 6276 2.3GHz production processors
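One way to read a "speed up" percentage of this kind is as scaling efficiency: the score at n cores/threads divided by n times the single-core score. The sketch below applies that definition to the AMD Interlagos numactl data labels from the Core Control chart. Both the definition and the pairing of labels with core counts are assumptions, and because the labels are rounded to whole HS06 values, the results only approximate the percentages printed on the slide.

```python
# Scaling efficiency sketch. Assumption: efficiency(n) = score(n) / (n * score(1)).
# The data below are the rounded chart labels, paired with core counts by position.
CORES = [1, 2, 4, 8, 16, 32, 64]
AMD_NUMACTL_HS06 = [11, 21, 43, 86, 170, 337, 504]


def scaling_efficiency(counts, scores):
    """Return score(n) / (n * per-core baseline) for each measurement."""
    per_core_baseline = scores[0] / counts[0]
    return [score / (n * per_core_baseline) for n, score in zip(counts, scores)]


if __name__ == "__main__":
    for n, eff in zip(CORES, scaling_efficiency(CORES, AMD_NUMACTL_HS06)):
        print(f"{n:>2} cores: {eff:.1%}")
```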

Page 7


AMD C6145 Interlagos Map

Page 8

Intel Sandy Bridge: Get the Map Right

Page 9

The Problem is an Old One

• New x86 systems think they are SMP
• As many CPUs in 2U as an HP SuperDome in a 42U rack (circa 2004)
• One must relearn process/thread binding
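As an illustration of process binding, the sketch below builds `numactl` command lines that pin each benchmark copy's CPUs and memory to a single NUMA node, assigning nodes round-robin. The node count, the job count and the `./run_benchmark.sh` script name are hypothetical; the real node layout should be read from `numactl --hardware` on the machine in question.

```python
import shlex


def numactl_command(node, command):
    """Build an argv list that runs `command` with CPUs and memory bound to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


def bind_round_robin(num_jobs, num_nodes, command):
    """Assign jobs to NUMA nodes 0..num_nodes-1 in round-robin order."""
    return [numactl_command(job % num_nodes, command) for job in range(num_jobs)]


if __name__ == "__main__":
    # Hypothetical example: 8 benchmark copies spread over 4 NUMA nodes.
    for argv in bind_round_robin(8, 4, ["./run_benchmark.sh"]):
        print(shlex.join(argv))
```

Binding memory along with CPUs keeps each copy's allocations on the node where it runs, which is the point of treating these systems as NUMA rather than SMP.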

Page 10

OS Effect On HS06

[Bar chart: HEPSPEC06 by operating system (5.5/5.7, 6.2, RHEL 7A wo tune, RHEL 7A w tune, RHEL 7A w tune & avx); data labels listed per series in legend order]
Intel Westmere - C6100 w 2 x Intel X5670 2.66GHz 6C: 198, 198, 207
Intel Sandy Bridge - R820 w 2 x Intel SB 2.4GHz 8C: 291, 309, 586, 587
AMD Interlagos - C6145 w 4 x AMD 6276 2.3GHz 16C: 428, 503, 541, 548

Notes:
- R820 with RHEL 7A and GCC 4.6.2 compiled all HEPSPEC06 benchmarks except Deal II (see whitepaper)
- C6145 with RHEL 7A and GCC 4.6.2 compiled all HEPSPEC06 benchmarks except Deal II (see whitepaper)
- The "w tune" designation refers to the linux64-cern.cfg file compiler flags being modified to include -march=bdver1 for AMD's Interlagos and -march=corei7 for Intel's Sandy Bridge
- No patching or tuning was applied to the OS

Page 11

Newer OSes Vs SL 5.5/5.7

[Bar chart: percent increase by operating system, relative to SL or RHEL 5.5/5.7; axis 80% to 220%]
Inter RHEL 7A: 186%
SB RHEL 7A: 201%
Inter RHEL 6.2: 118%
SB RHEL 6.2: 106%
Westmere SL 6.2: 100%

Notes:
- No tuning was performed on RHEL 7A runtimes
- The standard linux32_cern.cfg was used for all testing
- AMD Interlagos numbers are from a C6145 tray with 4 x AMD 6276 2.3 GHz 16 core processors
- Intel Sandy Bridge numbers are from an R820 with 2.4 GHz 8 core processors
- Intel Westmere numbers are from a C6100 tray with Intel X5650 2.66 GHz 6 core processors
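The percentages on this chart can be read as the HS06 score on the newer OS divided by the score on SL 5.5/5.7. The sketch below checks that reading against the data labels of the "OS Effect On HS06" chart. Pairing those labels with platforms and operating systems is an assumption inferred from the label order. The Interlagos RHEL 7A label (186%) is not reproduced by those values and is left out.

```python
# Assumed SL 5.5/5.7 baselines (HS06), read from the "OS Effect On HS06" chart labels.
BASELINE_HS06 = {"Sandy Bridge": 291, "Interlagos": 428, "Westmere": 198}

# (chart category, platform, assumed HS06 on the newer OS)
NEWER_OS_HS06 = [
    ("SB RHEL 7A", "Sandy Bridge", 586),
    ("Inter RHEL 6.2", "Interlagos", 503),
    ("SB RHEL 6.2", "Sandy Bridge", 309),
    ("Westmere SL 6.2", "Westmere", 198),
]


def percent_of_baseline(new_score, baseline_score):
    """Score on the newer OS as a whole-number percentage of the baseline score."""
    return round(100 * new_score / baseline_score)


if __name__ == "__main__":
    for label, platform, score in NEWER_OS_HS06:
        print(f"{label}: {percent_of_baseline(score, BASELINE_HS06[platform])}%")
```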

Page 12

40,000 HS06 Target: Systems Required

[Bar chart: servers required to reach 40,000 HS06, by system and OS; axis 0 to 210]
SB R 7A w tune - 64T: 68
Inter R 7A w tune - 64C: 73
SB R 7A wo tune - 64T: 68
Inter R 7A wo tune - 64C: 74
Inter R 6.2 - 64C: 80
Inter SL 5.7 - 64C: 93
SB R 6.2 - 64T: 129
SB SL 5.7 - 64T: 137
Westmere SL 5.5/6.2 - 12C: 202

Notes:
- The standard linux64_cern.cfg was used for SL 5.7 and RHEL 6.2
- AMD Interlagos numbers are from 4 x 2.3GHz 16c processors
- Intel Sandy Bridge numbers are from 4 x 2.4GHz 8c processors
- Intel Westmere numbers are from 2 x 2.66 GHz 6c processors
- Hyperthreading was enabled on all Intel testing
- HS06 numbers are based on total system cores/threads
- The "w tune" designation refers to the linux64-cern.cfg file compiler flags being modified to include -march=bdver1 for AMD's Interlagos and -march=corei7 for Intel's Sandy Bridge
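The server counts follow from dividing the 40,000 HS06 target by the per-system score. The sketch below uses per-system HS06 values taken from the "OS Effect On HS06" chart labels. Pairing them with the configurations here is an assumption, although rounding to the nearest whole system reproduces the counts shown on this chart. For an actual purchase one would round up instead, which `systems_required_ceil` shows.

```python
import math

TARGET_HS06 = 40_000

# Assumed per-system HS06 scores, paired with the chart categories by inference.
HS06_PER_SYSTEM = {
    "SB R 7A w tune - 64T": 587,
    "Inter R 7A w tune - 64C": 548,
    "SB R 7A wo tune - 64T": 586,
    "Inter R 7A wo tune - 64C": 541,
    "Inter R 6.2 - 64C": 503,
    "Inter SL 5.7 - 64C": 428,
    "SB R 6.2 - 64T": 309,
    "SB SL 5.7 - 64T": 291,
    "Westmere SL 5.5/6.2 - 12C": 198,
}


def systems_required(hs06_per_system, target=TARGET_HS06):
    """Systems needed, rounded to the nearest whole system (matches the chart labels)."""
    return round(target / hs06_per_system)


def systems_required_ceil(hs06_per_system, target=TARGET_HS06):
    """Systems needed when the target must be fully met (round up)."""
    return math.ceil(target / hs06_per_system)


if __name__ == "__main__":
    for label, score in HS06_PER_SYSTEM.items():
        print(f"{label}: {systems_required(score)} (round up: {systems_required_ceil(score)})")
```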

Page 13

Walk Away

• Intel Sandy Bridge is fast (Porsche GT3)
  • Must learn to use numactl to bind threads
  • Expensive - Intel = $18362.22, ~1000 HS06
  Intel Solution: Dell PowerEdge C6220, $18.36/HS06, 8 E5-2670 2.6GHz 8C, 128GB 1600MHz (total RAM per C6220, 2GB/core), 8 500GB drives.

• Interlagos (Volkswagen GTI)
  • Must learn to use numactl to bind threads
  • For some applications you must turn off half the cores
  • Cheaper - AMD = $11011.65, ~1000 HS06
  AMD Solution: Dell PowerEdge C6145, $11.01/HS06, 8 6276 2.3GHz 16C, 256GB 1600MHz (total RAM per C6145, 2GB/core), 8 500GB drives.

• New operating systems and GCC are your friends
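The price-per-HS06 figures above are the quoted system price divided by the approximate HS06 score. A minimal sketch of that arithmetic, using the prices and the ~1000 HS06 figure from this slide (the dictionary layout and function name are illustrative only):

```python
# Quoted system prices (USD) and approximate HS06 per system, as given on the slide.
SOLUTIONS = {
    "Intel (PowerEdge C6220)": {"price_usd": 18362.22, "hs06": 1000},
    "AMD (PowerEdge C6145)": {"price_usd": 11011.65, "hs06": 1000},
}


def cost_per_hs06(price_usd, hs06):
    """Price divided by benchmark score, rounded to cents."""
    return round(price_usd / hs06, 2)


if __name__ == "__main__":
    for name, spec in SOLUTIONS.items():
        print(f"{name}: ${cost_per_hs06(spec['price_usd'], spec['hs06']):.2f}/HS06")
```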