vincent keller, ralf gruber, epfl intelligent grid scheduling service (iss) k. cristiano, a. drotz,...
Post on 22-Dec-2015
Managed by
Vincent Keller, Ralf Gruber, EPFL
Intelligent GRID Scheduling Service (ISS)
K. Cristiano, A. Drotz, R. Gruber, V. Keller, P. Kunszt, P. Kuonen, S. Maffioletti, P. Manneback, M.-C. Sawley, U. Schwiegelshohn, M. Thiémard, A. Tolou, T.-M. Tran, O. Wäldrich, P. Wieder, C. Witzig, R. Yahyapour, W. Ziegler, “Application-oriented scheduling for HPC Grids”, CoreGRID TR-0070 (2007), available at http://www.coregrid.net
Outline
• ISS goals
• Applications & resources characterization
• ISS architecture
• Decision model: CFM
• ISS modules/services implementation status
• Testbeds (HW & SW)
Goals of ISS
1. Find the most suitable computational resources in an HPC Grid for a given component
2. Make the best use of an existing HPC Grid
3. Predict the best evolution of an HPC Grid
Γ model : Characteristic parameters of an application task*
$O$: Number of operations per node [Flops]
$W$: Number of main memory accesses per node [Words]
$Z$: Number of messages to be sent per node
$S$: Number of words sent by one node [Words]
$V_a = O/W$: Number of operations per memory access [Flops/Word]
$\gamma_a = O/S$: Number of operations per word sent [Flops/Word]
*Assumes the parallel subtasks are well balanced
Γ model : Characteristic parameters of a parallel machine
$P$: Number of nodes in a machine
$R$: Peak performance of a node [Flops/s]
$M$: Peak main memory bandwidth of a node [Words/s]
$V_M = R/M$: Number of operations per memory access [Flops/Word]
$r_a = \min(R, M \cdot V_a)$: Peak task performance on a node [Flops/s]
$t_c = O/r_a$: Minimum computation time [s]
Note: $r_a = R \min(1, V_a/V_M)$
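The node-level part of the model can be sketched in a few lines of Python. This is only an illustration of the two formulas above (peak task performance and minimum computation time); the function names and the task sizes are hypothetical, and the node figures are illustrative values, not measurements.

```python
import math


def peak_task_performance(R: float, M: float, Va: float) -> float:
    """Return ra = min(R, M * Va), the peak task performance on a node [Flops/s]."""
    return min(R, M * Va)


def min_computation_time(O: float, W: float, R: float, M: float) -> float:
    """Return tc = O / ra, the minimum computation time [s]."""
    Va = O / W  # operations per memory access of the task
    return O / peak_task_performance(R, M, Va)


# Illustrative node: 5.6 Gflops/s peak, 0.8 Gwords/s memory bandwidth.
R = 5.6e9
M = 0.8e9
# Hypothetical task: one operation per memory access (Va = 1).
O = 1.0e12
W = 1.0e12

Va = O / W
VM = R / M
ra = peak_task_performance(R, M, Va)
tc = min_computation_time(O, W, R, M)

# The two expressions for ra given on the slide agree.
assert math.isclose(ra, R * min(1.0, Va / VM))

print(f"VM = {VM:.1f} flops/word, ra = {ra:.3g} flops/s, tc = {tc:.0f} s")
```

With Va below VM, as in this example, the task is limited by memory bandwidth rather than by peak floating-point performance.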
Γ model : Characteristic parameters of the internode network
$C$: Total network bandwidth of a machine [Words/s]
$L$: Latency of the network [s]
$\langle d \rangle$: Average distance (= number of links passed)
$V_c = P R / C$: Number of operations per sent word [Flops/Word]
$b = C/(P \langle d \rangle)$: Inter-node communication bandwidth per node [Words/s]
$t_b = S/b$: Time needed to send $S$ words through the network [s]
$t_L = L Z$: Latency time [s]
$T = t_c + t_b + t_L$: Minimum turnaround time of a task*
$\gamma_M = (r_a/b)(1 + t_L/t_b)$: Number of operations per word sent [Flops/Word]
$B = b L$: Message size whose transfer takes a time equal to the latency $L$ [Words]
*I/O is not considered, and communication cannot be hidden behind computation
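As a rough sketch of how the network parameters combine with the node parameters, the following Python computes the three time contributions and the minimum turnaround time. The class and field names are hypothetical, and all numbers are illustrative rather than taken from a real run.

```python
from dataclasses import dataclass


@dataclass
class Task:
    O: float  # operations per node [Flops]
    W: float  # main memory accesses per node [Words]
    Z: float  # messages sent per node
    S: float  # words sent by one node [Words]


@dataclass
class Machine:
    P: int      # number of nodes
    R: float    # peak node performance [Flops/s]
    M: float    # peak node memory bandwidth [Words/s]
    C: float    # total network bandwidth [Words/s]
    L: float    # network latency [s]
    d: float    # average distance <d> (number of links passed)


def turnaround(task: Task, machine: Machine) -> dict:
    """Evaluate tc, tb, tL and T = tc + tb + tL for one task on one machine."""
    Va = task.O / task.W
    ra = min(machine.R, machine.M * Va)
    b = machine.C / (machine.P * machine.d)   # per-node bandwidth
    tc = task.O / ra                          # computation time
    tb = task.S / b                           # transfer time
    tL = machine.L * task.Z                   # latency time
    return {"ra": ra, "b": b, "tc": tc, "tb": tb, "tL": tL,
            "T": tc + tb + tL, "B": b * machine.L}


# Illustrative cluster: 120 nodes, 60 microseconds latency, <d> = 1.
cluster = Machine(P=120, R=5.6e9, M=0.8e9, C=3.6e9, L=60e-6, d=1.0)
# Hypothetical task.
task = Task(O=1.0e10, W=1.0e10, Z=1000, S=3.0e7)

result = turnaround(task, cluster)
print({k: round(v, 4) for k, v in result.items()})
```

Because the model assumes that communication cannot be overlapped with computation, the three times are simply added.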
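Γ model (one value per application and machine)
$\Gamma = \gamma_a / \gamma_M$
Task/application: $\gamma_a = O/S$ [flops/64-bit word]
Machine (if $LZ/S \ll 1$): $\gamma_M = r_a/b$ [flops/64-bit word]
Efficiency: $E = \Gamma / (\Gamma + 1)$
Speedup: $A = P E = P\,\Gamma / (\Gamma + 1)$
For $\Gamma > 1$: $1 > E > 1/2$ and $P > A > P/2$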
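The link between Γ, speedup, and efficiency can be checked numerically. The sketch below assumes, as a reading of the model rather than something stated on the slide, that a single node would need P·tc for the whole task, so that the speedup is A = P·tc/T; with T = tc + tb + tL this gives E = A/P = Γ/(Γ + 1). Function names and input values are hypothetical.

```python
def gamma_a(O: float, S: float) -> float:
    """Operations per word sent by the task: O / S."""
    return O / S


def gamma_M(ra: float, b: float, tL: float, tb: float) -> float:
    """Machine counterpart: (ra / b) * (1 + tL / tb)."""
    return (ra / b) * (1.0 + tL / tb)


def efficiency(big_gamma: float) -> float:
    """E = Gamma / (Gamma + 1)."""
    return big_gamma / (big_gamma + 1.0)


def speedup(P: int, big_gamma: float) -> float:
    """A = P * E."""
    return P * efficiency(big_gamma)


# Hypothetical task and machine figures.
O, S, Z = 1.0e10, 3.0e7, 1000
ra, b, L, P = 0.8e9, 30e6, 60e-6, 120

tc = O / ra
tb = S / b
tL = L * Z
T = tc + tb + tL

big_gamma = gamma_a(O, S) / gamma_M(ra, b, tL, tb)
E = efficiency(big_gamma)
A = speedup(P, big_gamma)

# The closed form agrees with the direct ratio tc / T.
assert abs(E - tc / T) < 1e-12

print(f"Gamma = {big_gamma:.2f}, E = {E:.3f}, A = {A:.1f} on {P} nodes")
```

At Γ = 1 the communication time equals the computation time, so efficiency drops to one half; this is why Γ > 1 is the regime of interest.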
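Parameters of some Swiss HPC machines

| Cluster | P | R [Gflops/s] | P·R [Gflops/s] | M [Gwords/s] | VM [f/w] | C [Gwords/s] | VC [f/w] | b [Mwords/s] | L [µs] | B [Words] |
|---|---|---|---|---|---|---|---|---|---|---|
| Pleiades1 | 132 | 5.6 | 739 | 0.8 | 7 | 0.4 | 1’792 | 3 | 60 | 180 |
| Pleiades2 | 120 | 5.6 | 672 | 0.8 | 7 | 3.75 | 179 | 30 | 60 | 1’800 |
| Pleiades 2+ | 99 | 21.3 | 2’110 | 2.7 | 8 | 3.1 | 682 | 30 | 60 | 1’800 |
| Mizar | 224 | 9.6 | 2’150 | 1.6 | 6 | 14 | 154 | 62 | 10 | 620 |
| BlueGene | 4’096 | 5.6 | 22’937 | 0.7 | 8 | 1’065 | 22 | 100* | 2.5 | 250 |
| Horizon | 1’664 | 5.2 | 8’650 | 0.8 | 6.5 | 2’650 | 3.3 | 160** | 6.8 | 1’080 |
| Terrane | 48 | 121.6 | 5’836 | 25 | 5 | | | | | |
| SX-5*** | 16 | 8 | 128 | 8 | 1 | 128 | | 128 | | |
| NoW | 10 | 6 | 60 | 0.8 | 7.5 | 0.003 | 19’200 | 0.3 | 60 | 200 |

*⟨d⟩ = 32 for half of C
**⟨d⟩ = 10
***decommissioned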
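The derived columns of this table follow from the definitions given earlier (VM = R/M, VC = P·R/C, B = b·L). The short check below recomputes them for three machines from the values read off the slide; it assumes the latency L is expressed in microseconds, and small differences from the printed figures are expected because the slide values are rounded.

```python
# Values as read from the table: P, P*R [Gflops/s], M [Gwords/s],
# C [Gwords/s], b [Mwords/s], L (assumed to be in microseconds).
MACHINES = {
    "Pleiades2": dict(P=120, PR=672.0, M=0.8, C=3.75, b=30.0, L=60.0),
    "Mizar": dict(P=224, PR=2150.0, M=1.6, C=14.0, b=62.0, L=10.0),
    "Horizon": dict(P=1664, PR=8650.0, M=0.8, C=2650.0, b=160.0, L=6.8),
}


def derived(m: dict) -> dict:
    """Recompute R, VM, VC and B from the primary machine parameters."""
    R = m["PR"] / m["P"]                      # per-node peak [Gflops/s]
    return {
        "R": R,
        "VM": R / m["M"],                     # flops per memory word
        "VC": m["PR"] / m["C"],               # flops per network word
        "B": (m["b"] * 1e6) * (m["L"] * 1e-6),  # words sent during one latency
    }


for name, params in MACHINES.items():
    d = derived(params)
    print(f"{name}: R={d['R']:.2f} VM={d['VM']:.2f} "
          f"VC={d['VC']:.1f} B={d['B']:.0f}")
```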
Example: SpecuLOOS
Pleiades 1 (FE): Γ = 1.4
Pleiades 2 (GbE): Γ = 3.8
Pleiades 2+ (GbE): Γ = 1.6
ISS/VIOLA environment
ISS : Job Execution Process
Goal: Find the most suitable machines in a Grid to run application components
Cost Function Model
• CPU costs Ke
• Licence fees Kl
• Results waiting time Kw
• Energy costs Keco
• Data transfer costs Kd
• All costs are expressed in Electronic Cost Units (ECU)
Cost Function Model : CPU costs
CPU costs Ke account for investment cost, maintenance fees, bank interest, etc.
Cost Function Model : Broker
• The broker computes a list of machines with their relative costs for a given application component
• This ordered list is sent to the MSS for final decision and submission
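The broker step described above amounts to producing a cost-ordered list of candidate machines. The following is a minimal sketch, not the ISS implementation: it assumes that the total cost of running a component on a machine is the plain sum of the five cost terms (Ke, Kl, Kw, Keco, Kd), and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    """Estimated costs, in ECU, of one component on one machine (hypothetical)."""
    machine: str
    cpu: float       # Ke
    licence: float   # Kl
    waiting: float   # Kw
    energy: float    # Keco
    transfer: float  # Kd

    def total(self) -> float:
        # Assumption: the cost terms are simply added.
        return (self.cpu + self.licence + self.waiting
                + self.energy + self.transfer)


def rank_machines(estimates: list) -> list:
    """Return (machine, total cost) pairs ordered from cheapest to most expensive."""
    ordered = sorted(estimates, key=lambda e: e.total())
    return [(e.machine, e.total()) for e in ordered]


# Made-up estimates for three candidate machines.
candidates = [
    CostEstimate("cluster-a", cpu=40.0, licence=0.0, waiting=25.0,
                 energy=5.0, transfer=2.0),
    CostEstimate("cluster-b", cpu=55.0, licence=0.0, waiting=5.0,
                 energy=6.0, transfer=2.0),
    CostEstimate("smp-c", cpu=90.0, licence=10.0, waiting=1.0,
                 energy=9.0, transfer=4.0),
]

ranking = rank_machines(candidates)
for machine, cost in ranking:
    print(f"{machine}: {cost:.1f} ECU")
```

In this sketch the ordered list is what would be handed to the MSS, which keeps the final decision and the submission itself.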
Another important goal of ISS
Simulation to evolve cluster resources in a Grid (uses the same simulator that determines the model parameters), based on statistical application execution data collected over a long period of time (the same data used to determine those parameters)
Support tool for deciding how to choose new Grid resources
Side products
• VAMOS monitoring service (measurement of Ra, among other quantities)
• Application optimization (increase Va, Ra)
• Processor frequency adaptation (reduce energy consumption)
What exists?
• Simulator to determine the model parameters
• VAMOS monitoring service
• Cost Function Model
What is in implementation phase?
• Interface between ISS and MSS (first version ready by end of June 07)
• Ra monitoring (ready by end of May 07)
• Cost Function Model (beta version ready by end of 07)
• Simulator to predict new cluster acquisition (by the end of 07)
Application testbed
• CFD, MPI: SpecuLOOS (3D spectral element method)
• CFD, OpenMP: Helmholtz (3D solver with spectral elements)
• Plasma physics, single proc: VMEC (3D MHD equilibrium solver)
• Plasma physics, single proc: TERPSICHORE (3D ideal linear MHD stability analysis)
• Climate, POP-C++: Alpine3D (multiphysics, components)
• Chemistry: GAMESS (ab-initio molecular quantum chemistry)
First hardware testbed
UNICORE/MSS/ISS GRID
• Pleiades 1 (132 single proc nodes, FE switch, OpenPBS/Maui)
• Pleiades 2 (120 single proc nodes, GbE switch, Torque/Maui)
• Pleiades 2+ (99 dual proc/dual core nodes, GbE switch, Torque/Maui)
• CONDOR pool EPFL (300 single & multi proc nodes, no interconnect network)
ISS as a SwissGrid metascheduler
[Diagram showing ISS, the SWING switch, and the participating sites]
• CSCS: SMP/vector; low m cluster
• EPFL: SMP/NUMA; high m cluster
• ETHZ: SMP/NUMA; high m cluster
• EIF: NoW
• CERN: EGEE Grid
Conclusions
Automatic:
• Find the best-suited machines for a given application
• Monitor application behaviour on single nodes and on the network
Guide towards:
• Better usage of the overall GRID
• Extending the existing GRID with the machines best suited to an application set
• Single-node optimization and better parallelization
http://web.cscs.ch/ISS/