© 2017 FUJITSU

High Performance Computing with Fujitsu

Ivo Doležel


A complete HPC software stack solution

FUJITSU Software HPC Cluster Suite

• HPC cluster general characteristics
  − HPC clusters consist primarily of compute nodes with exactly the same hardware
  − Clusters can have a few to thousands of compute nodes
  − The software used on each compute node is exactly the same
  − Compute nodes have no keyboards or displays attached to them

• Fundamental operational requirements
  − Bare metal deployment with no intervention
  − Central management and deployment of all needed software components
  − A way to control resource usage across the compute nodes
  − Ability to run many applications (serial & parallel) simultaneously on the cluster
  − High-speed inter-node communication and access to large data storage areas
  − Some sort of shared storage is needed
  − Monitoring and management of nodes

NCI cluster in Australia: 3592 CX250 nodes, ~1.2 PFlop/s
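
As a rough illustration of what the NCI figures imply, the sketch below divides the quoted aggregate performance by the node count. It assumes the ~1.2 PFlop/s figure applies to all 3592 nodes and that the nodes are identical (consistent with the "exactly the same hardware" characteristic above); the function name is hypothetical.

```python
# Figures quoted on the slide for the NCI cluster.
NODES = 3592
AGGREGATE_FLOPS = 1.2e15  # ~1.2 PFlop/s


def per_node_flops(aggregate_flops: float, nodes: int) -> float:
    """Average performance per node, assuming all nodes are identical."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return aggregate_flops / nodes


if __name__ == "__main__":
    gflops = per_node_flops(AGGREGATE_FLOPS, NODES) / 1e9
    print(f"roughly {gflops:.0f} GFlop/s per node")
```

Under these assumptions each node contributes on the order of a third of a TFlop/s.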


Main Features (I)

FUJITSU Software HPC Cluster Suite

• Bare metal deployment of compute nodes
• Central management of node installation images
• Central management of additional software packages
• Automatic management of key configuration files
• Central management of node configuration files
• Central management of users/passwords
• Support for LDAP/AD

[Figure: HPC Cluster Suite software stack. Layers shown:]
• Application programs
• Graphical end-user interface
• Parallel middleware; scientific libraries; parallel file system; compilers, performance and profiling tools
• Workload manager: management of cluster resources; manage serial and parallel jobs; fair share usage between users
• Cluster deployment and management: automated installation and configuration; administrator interface; operation and monitoring; cluster checker; user environment management
• GPGPU and Xeon Phi software support
• Operating system: Red Hat Linux, CentOS; OS drivers
• Fujitsu PRIMERGY HPC Clusters
[Bracket label in the figure: Fujitsu HPC Cluster Suite]


Main Features (II)

FUJITSU Software HPC Cluster Suite

• Central management of NFS settings
• Selection of open source or commercial workload managers
• Selection of message passing environments
• Variation of software configuration within a node group using package groups
• Web based monitoring for node health/usage
• Web based interface for application execution, data management



HPC Cluster Suite (HCS)

FUJITSU Software HPC Cluster Suite

Deployment/Management
• Bare metal deployment
• Software management
• User management
• Node configuration
• Monitoring and Alerting

HPC Gateway - Integrated intuitive web interface
• Simplicity in using the HPC Cluster and Applications
• More effective use of resources
• Broaden HPC and process reuse
• Share and exchange data more widely

Comprehensive & Flexible options
• Flexible choice of Workload Manager
• Libraries, Compilers
• Support for Parallel File Systems

FEFS - Parallel File System
• Single file namespace across all nodes
• Increases storage performance
• Required in large or high load I/O configurations
• Fujitsu Exabyte File System, developed by Fujitsu (Lustre based)


Comparison with the manual installation

Knowledge/Skills/Commands needed for installing HCS

HCS manual install

Actions/Commands:
• Install Linux OS
• Upload ISO images
• Mount ISOs
• Setup SNMP
• Setup SMTP
• Run rpm
• Run CDM installer
• Run ifconfig
• Run fjkit-mgr
• Run cdm-kitops
• Run cdm-repoman
• Run cdm-ngedit
• Run cdm-nfsedit
• Run cdm-mpedit
• Run cdm-addhost
• Reboot a server

Skills required:
• OS install experience
• How to copy files
• How to mount ISOs
• Editing text files
• Configuration of SNMP
• Configuration of SMTP
• How to install/delete RPMs
• Basic HPC architecture
• How to run CDM
• How to configure network interfaces
• How to install CDM kits, update CDM repositories, update/create/configure nodegroups, create CDM NFS export and mount definitions, add new hosts to the configuration
• Reboot a Linux server

Approximate install time: 2-3 days

Using the HCS Installer

Actions/Commands:
• Install Linux OS
• Upload ISO images
• Edit the hcs.cfg file
• Run hcs-installer --unattended
• Turn compute nodes on

Skills required:
• OS install experience
• How to copy files
• Basic HPC architecture
• Editing text files
• How to run HCS installer

Approximate install time: 2-3 hours


Different systems for varying MESH SIZE

HPC found at all Scales

Meshing balance between accuracy/quality and turnaround time

[Figure panels: mesh size = 0.1 m, mesh size = 0.02 m, mesh size = 0.005 m]

Grain Conveyor simulation with Discrete Element Method in STAR-CCM+


Different systems utilized for different MODELS

HPC found in many Products

[Figure panels: car beam optimisation, full car offset impact, bicycle helmet modelling]

Adapted to user segment and capability


HPC Cluster – User expectations

[Diagram: HPC cluster. Components: LAN; Head Node (Management Node); Management network; High-speed network (Interconnect); Compute Cluster (Compute Nodes) for distributed / parallel processing. Callouts: "User submits jobs here", "Jobs are queued here", "Jobs are run here".]

hide cluster complexity

more time for creativity

raising productivity

increase innovation

eliminate waste

reliable and predictable results

stable working environment

transferable best practice workflows

maximize application effectiveness

increase project throughput

ease of use

migrate more projects and new users into HPC

optimize the development process
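
The submit, queue and run flow named in the cluster diagram callouts can be sketched as a toy first-in, first-out queue. This is an illustrative model only, not how HPC Cluster Suite or any particular workload manager is implemented; the `Job` and `HeadNode` names are hypothetical, and real workload managers add priorities, fair share and backfilling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of compute nodes requested


class HeadNode:
    """Toy FIFO queue: jobs wait until enough compute nodes are free."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self.free_nodes = total_nodes
        self.queue = deque()   # jobs are queued here
        self.running = []      # jobs currently on compute nodes

    def submit(self, job: Job) -> None:
        """User submits a job; it starts at once if nodes are available."""
        if job.nodes > self.total_nodes:
            raise ValueError("job requests more nodes than the cluster has")
        self.queue.append(job)
        self._dispatch()

    def finish(self, job: Job) -> None:
        """A running job completes and releases its nodes."""
        self.running.remove(job)
        self.free_nodes += job.nodes
        self._dispatch()

    def _dispatch(self) -> None:
        # Strict FIFO: stop at the first queued job that does not fit.
        while self.queue and self.queue[0].nodes <= self.free_nodes:
            job = self.queue.popleft()
            self.free_nodes -= job.nodes
            self.running.append(job)
```

For example, on a 4-node cluster a 3-node job starts immediately, a following 2-node job waits in the queue, and it is dispatched as soon as the first job finishes.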


Fujitsu HPC Gateway Demo Centre

• Fujitsu HPC Gateway is the end-user interface in PRIMEFLEX for HPC
• An online demonstrator is available for trial
• Users can sign up on a dedicated web page
• Login and secure private area are assigned by return
• Initial trial period of 2 weeks


PRIMEFLEX for HPC First Application Appliances

Industry: CAE

Customers: Product Manufacturing, Engineering

Application: ANSYS Fluent, ANSYS CFX

Models: CFD (Computational Fluid Dynamics)

Industry: CAE, Physics-based simulation

Customers: Product design, Engineering, Geophysics

Application: COMSOL Multiphysics

Model: Geomechanics, subsurface flow, mechanics, chemical

Industry: Automotive, Creative

Customers: Automotive OEM, Creative agencies

Application: VRED

Models: 3D Visualisation, real-time digital prototyping


Platform Entry PRIMERGY RX PRIMERGY CX

Base: No switch, direct interconnect

Increment Not available

Single-switch maximum Single-switch maximum

Rack None Single cabinet Single cabinet

PRIMEFLEX for HPC Appliance Building blocks


Select Your Preferred Hardware Platform

Flexibility to address all kinds of customer requirements

• PRIMERGY CX400 skinless server
  − Massive scale-out due to ultra-dense servers
  − GPU coprocessor support
• PRIMERGY blade server
  − Industry-leading blade server density
• PRIMERGY rack server
• CELSIUS workstations

[Chart: CELSIUS, PRIMERGY Rack Server, PRIMERGY Blade Server and PRIMERGY Scale-out Server positioned between Capacity and Capability. Axis labels: "Scalability, Infrastructure density" and "Scalability, Compute density".]


Supercomputers since 1977, PRIMERGY in HPC for more than 10 Years!

[Timeline figure of Fujitsu HPC systems. *NWT: Numerical Wind Tunnel. Image credit: ©JAXA]

Systems shown: F230-75APU, VP Series, NWT* (developed with NAL), VPP500, VPP300/700, VPP5000, AP1000, AP3000, PRIMEPOWER HPC2500, PRIMEQUEST, SPARC Enterprise, HX600 cluster node, PRIMERGY RX200 cluster node, PRIMERGY BX400/900 cluster node, PRIMERGY CX400 scale-out server, PRIMERGY CX600 scale-out server, FX1, K computer, FX10, FX100, next x86 generation, Exascale

Milestones called out:
• Japan's First Vector (Array) Supercomputer (1977)
• No.1 in Top500 (Nov. 1993)
• Gordon Bell Prize (1994, 95, 96)
• World's Fastest Vector Processor (1999)
• World's Most Scalable Supercomputer (2003)
• Japan's Largest Cluster in Top500 (July 2004)
• Most Efficient Performance in Top500 (Nov. 2008)
• No.1 in Top500 (Jun / Nov 2011)


FUJITSU Server PRIMERGY CX600 M1 Multi-node chassis

Platform for highly parallel computing

• Maximum density with 8 nodes in 2U
• Specialized for parallel workloads
• Compliant with conventional datacenter environments
• Optimized software stack

CX600 CX1640

HPC optimized scale-out server platform based on Intel Xeon Phi 7200 (“Knights Landing”) technology

Single socket Xeon Phi server node for significant performance boost in parallel-processing


FUJITSU Server PRIMERGY CX600 M1 Server Node

HPC Usage Scenarios: Head Node, Login Node, Compute Node, File Server Node, NAS, Accelerator Card Node

Parallel computing node
• Condensed half-width 1U server node
• 8x CX1640 M1 per chassis
• Intel® Xeon Phi™ processor 7200 product family
• 16 GB high-bandwidth on-package MCDRAM memory, >500 GB/sec
• Additional 6x DDR4 memory DIMMs, up to 384 GB, 2,400 MHz
• With air cooling: 1x SATADOM or 1x 2.5” non-hot-plug HDD/SSD; with liquid cooling: 1x SATADOM
• Fanless server node with shared power and cooling

PRIMERGY CX1640 M1


A path towards Exascale …

Higher... Faster... Further...

… enforces a deployment of parallelism at each level to the ultimate extent:
• Node level (distributed memory)
• Multi socket (shared memory on nodes)
• CPU level (number of cores)
• Instruction level (SIMD)

Challenges
• Node parallelism → ultra-high-speed interconnect
• CPU parallelism → higher memory bandwidth → greater complexity of memory hierarchy
• Core parallelism → increase of system errors
• Amdahl's Law → every portion of serial code lowers the overall performance
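
The Amdahl's Law point can be made concrete with the standard formula S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the runtime that can be parallelised and N is the number of workers. The sketch below is a minimal illustration; the function name and the example value p = 0.95 are assumptions chosen for the example, not figures from the slides.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a fixed-size problem."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical code that is 95% parallel, on a growing number of cores.
    for workers in (1, 8, 72, 1024):
        print(workers, round(amdahl_speedup(0.95, workers), 1))
```

With 5% serial code the speedup can never exceed 1 / 0.05 = 20, however many cores are added; on 72 cores it is already below 16.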

FLOPS: floating-point operations per second
• exaFLOPS: 10^18 FLOPS
• petaFLOPS: 10^15 FLOPS

Intel® Xeon Phi™ Processor

Intel Xeon E5 Processor: 22 cores, 44 threads

Towards many-core architectures, e.g. Intel® Xeon Phi™ 7200 product family (up to 72 cores)
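
To relate the core counts to the exaFLOPS scale, theoretical peak performance is commonly estimated as cores × clock frequency × floating-point operations per cycle per core. The sketch below applies that estimate to a 72-core processor. Only the core count comes from the slides; the 1.5 GHz clock and the 32 double-precision operations per cycle per core (two 512-bit vector units with fused multiply-add) are assumptions for illustration, and sustained application performance is typically well below this peak.

```python
import math


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock x FLOPs per cycle per core."""
    return cores * clock_hz * flops_per_cycle


CORES = 72             # "up to 72 cores" (from the slide)
CLOCK_HZ = 1.5e9       # assumed clock frequency
FLOPS_PER_CYCLE = 32   # assumed: 2 vector units x 8 doubles x 2 (FMA)

NODE_PEAK = peak_flops(CORES, CLOCK_HZ, FLOPS_PER_CYCLE)
# Single-socket nodes needed to reach 10^18 FLOPS at theoretical peak.
NODES_FOR_EXAFLOPS = math.ceil(1e18 / NODE_PEAK)

if __name__ == "__main__":
    print(f"peak per node: {NODE_PEAK / 1e12:.3f} TFLOPS")
    print(f"nodes for 1 exaFLOPS: {NODES_FOR_EXAFLOPS}")
```

Under these assumptions a single node peaks at about 3.5 TFLOPS, so an exaFLOPS system would need several hundred thousand such nodes, which is why parallelism has to be exploited at every level.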
