Dell HPC: Solutions Applied to the World of Research
Event presentation on Dell HPC technology (transcript).
HPC: Solutions Applied to the World of Research. How Dell helps its customers simplify high-performance computing
Marco Verardi | Solution Consultant | Dell
Cambridge Solution Centre
What is the HPC Solution Centre?
• It is a Dell - Cambridge HPC Centre of Excellence based at the Cambridge HPC Service in Cambridge, UK
• It is jointly managed by Dell & Cambridge with technical input from Dell, Cambridge & a range of HPC technology partners
• The Centre also contains one of the largest production HPC systems in the UK & a fully equipped & staffed HPC & Storage Solution Lab
Pisa, November 24, 2010
How does the HPC Service work?
• The Cambridge HPC Service is run as a pay-at-point-of-use cloud computing business.
–4 years of development went into creating the service-provision models & software tools needed to enable such an operation
–The whole design & running of the HPC service has been optimised for the cloud computing model
–Transitioned from 100% subsidised to 100% self-sustaining
• This determines the design, usage model & funding profile of the Service
Cambridge HPC Users
• Provides HPC resources & support to 60 research groups and 350 users across 12 departments of the University
• A growing base of external research organisations & industry users
• Broad range of application types, usage modes, job sizes & CPU-to-data ratios
• Common requirements: flexible access, fast turnaround & strong technical support capability
• The Cambridge HPC cloud user base is one of the largest & most demanding anywhere in the world, with world-leading application knowledge
What Does the Centre offer Dell?
• It will produce targeted HPC solutions & POCs to meet real-life customer requirements; these designs can then be used in the sales process
• It provides large benchmarking & test facilities
• It provides a centre for multiple technologies with a broad range of Dell & 3rd-party HPC equipment, acting as an HPC compute & storage lab for the creation of new solution stacks & performance data
– Lustre parallel file systems: storage design, implementation and operation
– GPU clusters: CUDA programming and GPU cluster system integration
– Scientific visualisation: commodity 3D visualisation and remote visualisation
– Commodity shared-memory solutions: ScaleMP
– Multi-user HPC configurations: testing, benchmarking best practices
– Application optimisation and benchmarking, MPI profiling
– Resource management and scheduling
– HPC cloud provisioning
Technical Solutions To Date
During the first quarter of the Solution Centre's operation, a number of technical solutions have been designed, built & tested:
–Dell MD3000 Lustre brick
–Dell MD3000 NFS – IPoIB
–Remote visualization
–GPU cluster
Dell-LHC Team
Enabling researchers to have more compute power and storage space
Dell ATLAS Test Site Research
• Compilers (Intel vs. gcc) for ATLAS codes
• BIOS settings
– Logical processor (hyper-threading) › How many ATLAS jobs can I run at once?
– Turbo mode
– C-states
• Memory configurations
• SRM (BeStMan, StoRM) on Lustre
• Data-aware scheduling
• ATLAS codes running in VMs
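The hyper-threading question above ("How many ATLAS jobs can I run at once?") comes down to which resource saturates first. A minimal sketch, assuming a dual-socket Xeon E5620 node (4 cores per socket, 2 threads per core) and an illustrative 2 GB of memory per ATLAS job; the per-job memory figure is an assumption, not a number from the slides:

```python
# Hypothetical sketch: how many ATLAS jobs fit on one node?
# Assumes a dual-socket Xeon E5620 node (4 cores/socket, 2 threads/core,
# 24 GB RAM) and ~2 GB of memory per job -- illustrative numbers only.

def job_slots(sockets=2, cores_per_socket=4, hyper_threading=True,
              mem_gb=24, mem_per_job_gb=2):
    """Return the job count allowed by logical cores and by memory."""
    logical_cores = sockets * cores_per_socket * (2 if hyper_threading else 1)
    mem_limited = mem_gb // mem_per_job_gb
    return min(logical_cores, mem_limited)

print(job_slots())                       # HT on: 16 logical cores, but memory caps it at 12
print(job_slots(hyper_threading=False))  # HT off: 8 logical cores -> 8 jobs
```

With hyper-threading enabled the node exposes 16 logical cores, but under this memory assumption only 12 jobs fit; with it disabled, the 8 physical cores become the limit.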
Tiers of Storage for Research Computing
[Storage-tier diagram: Lustre, dCache, GPFS; NFS; CASTOR, SAM, StorNext]
Scalable Storage Building Block for LHC
• Redundant storage pairs:
– 240TB raw, 200TB RAID storage per pair (2 LUNs per array, RAID 5, 5+1; no file-system overhead included in the calculation)
– 3.4GB/s raw write bandwidth, 5.2GB/s raw read bandwidth per pair
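The raw-to-usable figures follow directly from the drive count and RAID layout. A quick arithmetic check, assuming 10 enclosures of 12 x 2TB drives per pair (a count inferred from the building-block diagram, not stated explicitly):

```python
# Sanity-check the building-block capacity numbers.
# Assumes 10 enclosures (2 MD3200 + 8 MD1200) of 12 x 2TB drives per pair,
# with RAID 5 in 5+1 groups (5 data disks + 1 parity disk per group).
drives = 10 * 12               # drives per redundant pair (inferred count)
drive_tb = 2                   # 2TB NL-SAS drives
raw_tb = drives * drive_tb     # raw capacity
usable_tb = raw_tb * 5 // 6    # RAID 5 (5+1): 5 of every 6 disks hold data
print(raw_tb, usable_tb)       # 240 200
```

The result matches the slide: 240TB raw and 200TB after RAID 5 parity, before any file-system overhead.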
[Diagram: redundant pair of PowerEdge R710 servers connected to two PowerVault MD3200 arrays, each expanded with MD1200 enclosures]
• PowerEdge R710 (x2): dual Intel® Xeon® E5620, 2.4GHz, 12M cache, Turbo, HT, 1066MHz max memory; 24GB memory (6x4GB, 1333MHz dual-ranked RDIMMs); dual 300GB 15K RPM 6Gbps SAS 3.5in hot-plug hard drives (RAID 1); dual SAS HBAs; QLogic QLE7340 single-port QDR InfiniBand
• PowerVault MD3200 (x2): SAS, 12-bay, dual controller; 12 x 2TB NL-SAS 6Gb 7.2K 3.5in HDDs; High Performance Tier license key
• PowerVault MD1200 (x8): SAS, 12-bay, dual controller; 12 x 2TB NL-SAS 6Gb 7.2K 3.5in HDDs
PowerEdge C6100: Dell's cluster-optimized, shared-infrastructure server
High Density Compute
• Four 2S servers in 2U = 20% denser than blades
• Tylersburg, 12xDDR3 w/QR
• 24 x 2.5” or 12 x 3.5” HDD
• 24x GbE, 1 x16 Gen II, 1 x8 daughter card
• IPMI 2.0 management only
• iKVM, PXE support
• Hot plug motherboards & HDD
• Redundant 1400W power supplies
PowerEdge C410x PCIe Expansion Chassis: maximizing space, weight, energy and cost efficiency, with unprecedented flexibility
PCIe EXPANSION CHASSIS CONNECTING 1-8 HOSTS TO 1-16 PCIe MODULES
• 3U chassis, 19" wide, 143 pounds
• PCIe modules: 10 front, 6 rear
• PCIe form factors: HH/HL and FH/HL
• Up to 225W per module
• PCIe inputs: 8 PCIe x16 iPass ports
• PCIe fan-out options: x16 to 1 slot, x16 to 2 slots, x16 to 3 slots, x16 to 4 slots
• GPUs supported: NVIDIA M1060, M2050, M2070 (TBD)
• Thermals: high-efficiency 92mm fans; N+1 fan redundancy
• Management: on-board BMC; IPMI 2.0; dedicated management port
• Power supplies: 4 x 1400W hot-plug, high-efficiency PSUs; N+1 power redundancy
• Services vary by region: IT Consulting, Server and Storage Deployment, Rack Integration (US only), Support Services
Great for: HPC including universities, oil & gas, biomed research, design, simulation, mapping, visualization, rendering, and gaming
Flexibility of the PowerEdge C410x
• Enabling HPC applications to optimize the cost/performance equation off a single x16 connection
• 4 GPU per x16: 16 GPU in 5U
• 3 GPU per x16: 12 GPU in 5U
• 2 GPU per x16: 16 GPU in 7U
• 1 GPU per x16: 8 GPU in 7U
[Diagram: C410x PCIe switch fan-out, connecting one host x16 link to 1, 2, 3, or 4 GPUs]
GPU/U ratios assume PowerEdge C6100 host with 4 servers per 2U chassis
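The GPU-per-U figures on this slide follow from the chassis geometry. A sketch reproducing them, assuming a 3U C410x holding up to 16 GPUs and 2U C6100 chassis with 4 host servers each (as the note above states):

```python
# Sketch: reproduce the C410x GPU-density figures.
# Assumes a 3U C410x holding up to 16 GPUs and 2U C6100 chassis
# with 4 host servers each, per the slide's note.

def density(gpus_per_x16, c6100_chassis):
    """Return (gpus, rack_units) for a C410x + C6100 configuration."""
    hosts = 4 * c6100_chassis              # 4 servers per C6100
    gpus = min(hosts * gpus_per_x16, 16)   # C410x holds at most 16 GPUs
    rack_units = 3 + 2 * c6100_chassis     # 3U C410x + 2U per C6100
    return gpus, rack_units

print(density(4, 1))  # (16, 5): 16 GPU in 5U
print(density(3, 1))  # (12, 5): 12 GPU in 5U
print(density(2, 2))  # (16, 7): 16 GPU in 7U
print(density(1, 2))  # (8, 7):   8 GPU in 7U
```

The four outputs match the slide's ratios: denser fan-out (4 GPUs per x16) fills the C410x from a single C6100, while 1-2 GPUs per x16 needs two C6100 chassis and more rack units.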
[Diagram: hosts connect to the C410x through x16 HIC cards and iPass cables; 5U = (1) C410x + (1) C6100, 7U = (1) C410x + (2) C6100]
Introducing the PowerEdge M610x
• Full-height blade server coupled to a uniquely capable PCIe expansion module.
• Dual x16 Gen2 PCIe expansion slots accommodate any standard full length or full height PCIe card.
• Supplemental power connectors allow utilization of up to 2 x 250W or 1 x 350W PCIe cards (including GPGPUs).
What we do in Italy
– Federico II, Telespazio, General Electric, INAF, CIRA, ASI, Politecnico MI, INFN, CNAF
– We rely on the support of local partners such as Altair Engineering and DoIT
Summary
• How we help you simplify HPC
– Through partnerships with leading institutions in the field, and with local partners able to deliver the necessary services
– By creating an enabling framework of solutions and technologies (Cambridge, ATLAS)
– By investing in R&D to produce hardware dedicated to specific solutions (C410x, M610x)
– By developing tailored solutions
Q&A
Questions?
Thank you.