Dell HPC: Solutions Applied to the World of Research
Event presentation on Dell HPC technology (transcript).
HPC: Solutions Applied to the World of Research. How Dell helps its customers simplify high-performance computing
Marco Verardi | Solution Consultant | Dell
Cambridge Solution Centre
What is the HPC Solution Centre?
• It is a Dell - Cambridge HPC Centre of Excellence based at the Cambridge HPC Service in Cambridge, UK
• It is jointly managed by Dell & Cambridge with technical input from Dell, Cambridge & a range of HPC technology partners
• The Centre also contains one of the largest production HPC systems in the UK & a fully equipped & staffed HPC & Storage Solution Lab
Pisa, November 24, 2010
How does the HPC Service work?
• The Cambridge HPC Service is run as a pay-at-point-of-use cloud computing business.
–4 years of development went into creating the service-provision models & software tools needed to enable such an operation
–The whole design & running of the HPC service has been optimised for the cloud computing model
–Transitioned from 100% subsidised to 100% self-sustaining
• This determines the design, usage model & funding profile of the Service
Cambridge HPC Users
• Provides HPC resources & support to 60 research groups and 350 users across 12 departments of the University
• A growing base of external research organisations & industry users
• Broad range of application types, usage modes, job sizes & CPU-to-data ratios
• Common requirements: flexible access, fast turnaround & strong technical support capability
• The Cambridge HPC cloud user base is one of the largest & most demanding anywhere in the world, with world-leading application knowledge
What Does the Centre offer Dell?
• It will produce targeted HPC solutions & POCs to meet real-life customer requirements; these designs can then be used in the sales process
• It provides large benchmarking & test facilities
• It provides a centre for multiple technologies with a broad range of Dell & 3rd-party HPC equipment, acting as an HPC compute & storage lab for the creation of new solution stacks & performance data
– Lustre parallel file systems: storage design, implementation and operation
– GPU clusters: CUDA programming and GPU cluster system integration
– Scientific visualisation: commodity 3D visualisation and remote visualisation
– Commodity shared-memory solutions: ScaleMP
– Multi-user HPC configurations: testing, benchmarking best practices
– Application optimisation and benchmarking, MPI profiling
– Resource management and scheduling
– HPC cloud provisioning
Technical Solutions To Date
During the first quarter of the Solution Centre's operation, a number of technical solutions have been designed, built & tested:
–Dell MD3000 Lustre brick
–Dell MD3000 NFS – IPoIB
–Remote visualization
–GPU cluster
Dell-LHC Team
Enabling researchers to have more compute power and storage space
Dell ATLAS Test Site Research
• Compilers (Intel vs. gcc) for ATLAS codes
• BIOS settings
– Logical processor (hyper-threading) › How many ATLAS jobs can I run at once?
– Turbo mode
– C-states
• Memory configurations
• SRM (BeStMan, StoRM) on Lustre
• Data-aware scheduling
• ATLAS codes running in VMs
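The hyper-threading question above ("How many ATLAS jobs can I run at once?") comes down to which resource saturates first. A minimal sketch, assuming a dual-socket Xeon E5620 node (4 cores per socket, 2 threads per core) and an illustrative 2 GB of memory per ATLAS job; the per-job memory figure is an assumption, not a number from the slides:

```python
# Hypothetical sketch: how many ATLAS jobs fit on one node?
# Assumes a dual-socket Xeon E5620 node (4 cores/socket, 2 threads/core,
# 24 GB RAM) and ~2 GB of memory per job -- illustrative numbers only.

def job_slots(sockets=2, cores_per_socket=4, hyper_threading=True,
              mem_gb=24, mem_per_job_gb=2):
    """Return the job count allowed by logical cores and by memory."""
    logical_cores = sockets * cores_per_socket * (2 if hyper_threading else 1)
    mem_limited = mem_gb // mem_per_job_gb
    return min(logical_cores, mem_limited)

print(job_slots())                       # HT on: 16 logical cores, but memory caps it at 12
print(job_slots(hyper_threading=False))  # HT off: 8 logical cores -> 8 jobs
```

With hyper-threading enabled the node exposes 16 logical cores, but under this memory assumption only 12 jobs fit; with it disabled, the 8 physical cores become the limit.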
Tiers of Storage for Research Computing
[Storage-tier diagram: Lustre, dCache, GPFS; NFS; CASTOR, SAM, StorNext]
Scalable Storage Building Block for LHC
• Redundant storage pairs:
– 240TB raw, 200TB RAID storage per pair (2 LUNs per array, RAID 5, 5+1; no file-system overhead included in the calculation)
– 3.4GB/s raw write bandwidth, 5.2GB/s raw read bandwidth per pair
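The raw-to-usable figures follow directly from the drive count and RAID layout. A quick arithmetic check, assuming 10 enclosures of 12 x 2TB drives per pair (a count inferred from the building-block diagram, not stated explicitly):

```python
# Sanity-check the building-block capacity numbers.
# Assumes 10 enclosures (2 MD3200 + 8 MD1200) of 12 x 2TB drives per pair,
# with RAID 5 in 5+1 groups (5 data disks + 1 parity disk per group).
drives = 10 * 12               # drives per redundant pair (inferred count)
drive_tb = 2                   # 2TB NL-SAS drives
raw_tb = drives * drive_tb     # raw capacity
usable_tb = raw_tb * 5 // 6    # RAID 5 (5+1): 5 of every 6 disks hold data
print(raw_tb, usable_tb)       # 240 200
```

The result matches the slide: 240TB raw and 200TB after RAID 5 parity, before any file-system overhead.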
[Diagram: redundant pair of PowerEdge R710 servers connected to two PowerVault MD3200 arrays, each expanded with MD1200 enclosures]
• PowerEdge R710 (x2): dual Intel® Xeon® E5620, 2.4GHz, 12M cache, Turbo, HT, 1066MHz max memory; 24GB memory (6x4GB, 1333MHz dual-ranked RDIMMs); dual 300GB 15K RPM 6Gbps SAS 3.5in hot-plug hard drives (RAID 1); dual SAS HBAs; QLogic QLE7340 single-port QDR InfiniBand
• PowerVault MD3200 (x2): SAS, 12-bay, dual controller; 12 x 2TB NL-SAS 6Gb 7.2K 3.5in HDDs; High Performance Tier license key
• PowerVault MD1200 (x8): SAS, 12-bay, dual controller; 12 x 2TB NL-SAS 6Gb 7.2K 3.5in HDDs
PowerEdge C6100: Dell's cluster-optimized, shared-infrastructure server
High Density Compute
• Four 2S servers in 2U = 20% denser than blades
• Tylersburg, 12xDDR3 w/QR
• 24 x 2.5” or 12 x 3.5” HDD
• 24x GbE, 1 x16 Gen II, 1 x8 daughter card
• IPMI 2.0 management only
• iKVM, PXE support
• Hot plug motherboards & HDD
• Redundant 1400W power supplies
PowerEdge C410x PCIe Expansion Chassis: maximizing space, weight, energy and cost efficiency, with unprecedented flexibility
PCIe EXPANSION CHASSIS CONNECTING 1-8 HOSTS TO 1-16 PCIe MODULES
• 3U chassis, 19" wide, 143 pounds
• PCIe modules: 10 front, 6 rear
• PCIe form factors: HH/HL and FH/HL
• Up to 225W per module
• PCIe inputs: 8 PCIe x16 iPass ports
• PCIe fan-out options: x16 to 1 slot, x16 to 2 slots, x16 to 3 slots, x16 to 4 slots
• GPUs supported: NVIDIA M1060, M2050, M2070 (TBD)
• Thermals: high-efficiency 92mm fans; N+1 fan redundancy
• Management: on-board BMC; IPMI 2.0; dedicated management port
• Power supplies: 4 x 1400W hot-plug, high-efficiency PSUs; N+1 power redundancy
• Services vary by region: IT Consulting, Server and Storage Deployment, Rack Integration (US only), Support Services
Great for: HPC including universities, oil & gas, biomed research, design, simulation, mapping, visualization, rendering, and gaming
Flexibility of the PowerEdge C410x
• Enabling HPC applications to optimize the cost/performance equation off a single x16 connection
• 4 GPU per x16: 16 GPU in 5U
• 3 GPU per x16: 12 GPU in 5U
• 2 GPU per x16: 16 GPU in 7U
• 1 GPU per x16: 8 GPU in 7U
[Diagram: C410x PCIe switch fan-out, connecting one host x16 link to 1, 2, 3, or 4 GPUs]
GPU/U ratios assume PowerEdge C6100 host with 4 servers per 2U chassis
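The GPU-per-U figures on this slide follow from the chassis geometry. A sketch reproducing them, assuming a 3U C410x holding up to 16 GPUs and 2U C6100 chassis with 4 host servers each (as the note above states):

```python
# Sketch: reproduce the C410x GPU-density figures.
# Assumes a 3U C410x holding up to 16 GPUs and 2U C6100 chassis
# with 4 host servers each, per the slide's note.

def density(gpus_per_x16, c6100_chassis):
    """Return (gpus, rack_units) for a C410x + C6100 configuration."""
    hosts = 4 * c6100_chassis              # 4 servers per C6100
    gpus = min(hosts * gpus_per_x16, 16)   # C410x holds at most 16 GPUs
    rack_units = 3 + 2 * c6100_chassis     # 3U C410x + 2U per C6100
    return gpus, rack_units

print(density(4, 1))  # (16, 5): 16 GPU in 5U
print(density(3, 1))  # (12, 5): 12 GPU in 5U
print(density(2, 2))  # (16, 7): 16 GPU in 7U
print(density(1, 2))  # (8, 7):   8 GPU in 7U
```

The four outputs match the slide's ratios: denser fan-out (4 GPUs per x16) fills the C410x from a single C6100, while 1-2 GPUs per x16 needs two C6100 chassis and more rack units.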
[Diagram: hosts connect to the C410x through x16 HIC cards and iPass cables; 5U = (1) C410x + (1) C6100, 7U = (1) C410x + (2) C6100]
Introducing the PowerEdge M610x
• Full-height blade server coupled to a uniquely capable PCIe expansion module.
• Dual x16 Gen2 PCIe expansion slots accommodate any standard full length or full height PCIe card.
• Supplemental power connectors allow utilization of up to 2 x 250W or 1 x 350W PCIe cards (including GPGPUs).
What we do in Italy
– Federico II, Telespazio, General Electric, INAF, CIRA, ASI, Politecnico MI, INFN, CNAF
– We rely on the support of local partners such as Altair Engineering and DoIT
Summary
• How we help you simplify HPC
– Through partnerships with leading institutions in the field, and with local partners able to deliver the necessary services
– By creating an enabling framework of solutions and technologies (Cambridge, ATLAS)
– By investing in R&D to produce hardware dedicated to specific solutions (C410x, M610x)
– By developing tailored solutions
Q&A
Questions?
Thank you.