LENOVO System Management Solutions
2015 Lenovo All rights reserved.
Luigi Brochard, Lenovo HPC Distinguished Engineer
HPC Advisory Council 2016, Lugano April 21-23.
HPC Software Solutions through Partnerships
• Building Partnerships to provide the “Best In-Class” HPC Cluster Solutions for our customers
• Collaborating with software vendors to provide features that optimize customer workloads
• Leveraging “Open Source” components that are production ready
• Contributing to “Open Source” (e.g. xCAT, Confluent, OpenStack) to enhance our platforms
• Providing “Services” to help customers deploy and optimize their clusters
[Diagram: solution stack]
Customer Applications
Compute, Storage, Network
OFED, UFM, OmniPath
Lenovo System x Virtual, Physical, Desktop, Server
OS, VM
Systems Management: IBM PCM, xCAT (Extreme Cloud Admin. Toolkit)
Parallel File Systems: IBM GPFS, Lustre, NFS
Workload & Resources: IBM LSF HPC & Symphony, Adaptive Moab, Maui/Torque, Slurm
Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI
Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..
Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
xCAT
§ Open Source
§ Collaboration with IBM
§ Server Hardware Management
§ OS Deployment
§ IP and network service management
§ Virtualization Management
§ CLI
§ Holistic solution management

§ Weak GUI
§ Complex to learn
§ Lacking structure
§ Poor enablement for web development
§ Good for large clusters, difficult for smaller solutions/enterprise networks
WEB ORCHESTRATION Initial Goals
§ Provide easy cluster access to new HPC customers using Open Source HPC Infrastructure
§ Low cost entry into HPC
§ Visual summary views to help understand cluster usage
§ Admin Console – User management, Cluster Monitoring
§ User Console – Job submission, Job/Cluster Monitoring
§ Initial target and Proof of Concept trials – China Market
§ Focus on China Market first – A lot of customers are just coming into HPC workloads
§ Collaborating with customers to understand their usage models and future requirements
§ Very positive feedback and market acceptance
§ LiCO – Lenovo Intelligent Computing Orchestration was released to China market
§ WW Market – Create English version and work with collaborators to release the English version as “Open Source” project: OSMWC
§ Oxford University collaboration
Lenovo Intelligent Cluster Orchestrator (LiCO)
What is Web Console: A Unified GUI
• User Portal (dashboard, submit job, monitor job)
• Admin Portal (dashboard, user/account management)
Future Work Items:
• SLURM integration
• ICINGA integration
• Intel OPA integration
• LDAP integration
[Diagram legend: Lenovo components; Open Source/3rd party; Lenovo Hardware]
xCAT/Confluent
Torque/MAUI, GOLD/Ganglia
WEB CONSOLE GUI; Installation guide / scripts; Admin guide / scripts
OpenMPI, MVAPICH, MPICH, Intel Parallel Studio
CentOS/RHEL, Lustre, OFED
Server, Storage, Network
Main HPC components below the GUI would be part of OpenHPC project
Open System Management Web Console (OSMWC)
What is Web Console: A Unified GUI
• User Portal (dashboard, submit job, monitor job)
• Admin Portal (dashboard, user/account management)
Future Work Items:
• SLURM integration
• ICINGA integration
• Intel OPA integration
• LDAP integration
[Diagram legend: Lenovo components; Open Source/3rd party; Lenovo Hardware]
xCAT/Confluent
Torque/MAUI, Ganglia
WEB CONSOLE GUI; Installation guide / scripts; Admin guide / scripts
OpenMPI, MVAPICH, MPICH, Intel Parallel Studio
CentOS/RHEL, Lustre, OFED
Server, Storage, Network
Main HPC components below the GUI would be part of OpenHPC project
END USER PORTAL – TRANSLATED VIEW
ADMIN PORTAL – TRANSLATED VIEW
Confluent
Confluent Goals
§ Lenovo-led project to improve upon xCAT heritage
§ Carries on strong CLI and other facets of xCAT
§ More structured interface
§ Easier to learn
§ Web development enabled – RESTful APIs – good GUI possible
§ Faster performance/lower memory usage/higher scalability for large solutions
§ Better equipped to work in smaller configurations without full network control
§ Enhanced security model
§ Reuse effort across HPC, OpenStack, xClarity efforts
§ Reuse development effort across multiple projects (Lenovo/external Ecosystem)
§ More contributions from third parties
Confluent updates
• xCAT style noderanges
• Client connections persist across server restart (e.g. consoles)
• xCAT style commands:
– nodehealth (new)
– nodesensors (like rvitals)
– nodepower (like rpower)
– nodeeventlog (like reventlog)
– nodeconsole (like rcons)
– nodesetboot (like rsetboot)
– nodeidentify (like rbeacon)
– nodelist (like nodels)
• Inventory in API (nodeinventory to come, similar to rinv)
• Dynamic nodegroups (groups with a ‘noderange’ attribute get expanded)
• Enriched debugging facilities
• Rotating log support (defaults to daily)
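The command correspondences listed on this slide can be captured as a small lookup table, which may help when porting existing xCAT scripts. The sketch below is illustrative only: the helper name `translate` is hypothetical and not part of xCAT or Confluent, and it only swaps the command name. It assumes nothing about whether the arguments of each pair are compatible, which the slide does not state.

```python
# xCAT command -> Confluent command, exactly as paired on the slide.
# nodehealth is new in Confluent and has no xCAT counterpart listed.
XCAT_TO_CONFLUENT = {
    "rvitals": "nodesensors",
    "rpower": "nodepower",
    "reventlog": "nodeeventlog",
    "rcons": "nodeconsole",
    "rsetboot": "nodesetboot",
    "rbeacon": "nodeidentify",
    "nodels": "nodelist",
}


def translate(command_line: str) -> str:
    """Replace a leading xCAT command name with its Confluent counterpart.

    Hypothetical helper: arguments are passed through unchanged and are
    NOT checked for compatibility between the two tools.
    """
    name, _, rest = command_line.strip().partition(" ")
    new_name = XCAT_TO_CONFLUENT.get(name, name)  # unknown names pass through
    return f"{new_name} {rest}".strip()


if __name__ == "__main__":
    for line in ("rpower n1,n2 stat", "nodels", "nodehealth n1"):
        print(f"{line!r} -> {translate(line)!r}")
```

Commands that are not in the table, such as `nodehealth`, are returned unchanged, so the helper is safe to run over scripts that already mix both command sets.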
Confluent Web UI (consoles without plugin or java)
Confluent CLI – through confetti (RESTful API)
nodesensors (csv, and time series data)
Confluent performance
Future High Performance Computing Open Solutions
• Partnering as founding member of OpenHPC initiative to establish a common Open HPC Framework
• Collaborating with Oxford University to create an Open System Management framework for small to medium clusters
• Leading Open Source system management projects: Confluent and the soon-to-be-formed OSMWC
• Contributing to xCAT Open Source project to enhance our platforms
• Providing “Services” to help customers deploy and optimize their clusters
[Diagram: solution stack]
Customer Applications
Parallel File Systems: Lenovo GSS, Intel Lustre, NFS
Systems Management: Open System Management WEB Console (OSMWC), Confluent, xCAT (Extreme Cloud Admin. Toolkit)
OS, VM, OFED
Compute, Storage, Network, UFM
Lenovo System x Virtual, Physical, Desktop, Server
OmniPath
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
Future High Performance Computing Solutions
• Adding new features
• Power & Energy awareness
• Lightweight virtual HPC
• Big Data / Spark workload
• Managing more than the servers
[Diagram: solution stack]
Customer Applications
Parallel File Systems: Lenovo GSS, Intel Lustre, NFS
Open System Management WEB Console (OSMWC) Integration with
OS, VM, OFED
Compute, Storage, Network, UFM
Lenovo System x Virtual, Physical, Desktop, Server
OmniPath
xCAT (Extreme Cloud Admin. Toolkit), Confluent
Enterprise Professional Services: Installation and custom services, may not include service support for third party software
Future HPC Software Solutions through Partnerships
• Building Partnerships to provide the “Best In-Class” HPC Cluster Solutions for our customers
• Collaborating with software vendors to provide features that optimize customer workloads
• Bright Computing
• Altair
• …
[Diagram: solution stack]
Customer Applications
Compute, Storage, Network
OFED, UFM, OmniPath
Lenovo System x Virtual, Physical, Desktop, Server
OS, VM
Systems Management: IBM PCM, xCAT (Extreme Cloud Admin. Toolkit), BC CM
Parallel File Systems: IBM GPFS, Lustre, NFS
Workload & Resources: IBM LSF HPC & Symphony, Adaptive Moab, Maui/Torque/Slurm/PBSPro
Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI
Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..
Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia
Enterprise Solution Services: Installation and custom services, may not include service support for third party software
ADMIN PORTAL – TRANSLATED VIEW
User Job Submission views
User Job Submission – provide Scheduler job file
Admin / Operator views
ADMIN PORTAL – TRANSLATED VIEW
ADMIN PORTAL – TRANSLATED VIEW
ADMIN PORTAL – TRANSLATED VIEW
nodehealth