TRANSCRIPT
Swift: A language for scalable scientific computing
Justin M. Wozniak
Swift project: www.ci.uchicago.edu/swift
Presenter contact: [email protected]
Parallel scripting language for clusters, clouds & grids
– For writing loosely-coupled scripts of application programs and utilities linked by exchanging files
– Can call scripts in shell, python, R, Octave, MATLAB, NCL, …
Swift does 3 important things for you:
– Makes parallelism transparent – with functional dataflow
– Makes computing location transparent – can run your script on multiple distributed sites and diverse computing resources (from desktop to petascale)
– Makes basic failure recovery transparent
Numerous many-task applications
Simulation of super-cooled glass materials
Protein folding using homology-free approaches
Climate model analysis and decision making in energy policy
Simulation of RNA-protein interaction
Multiscale subsurface flow modeling
Modeling of power grid for OE applications
A-E have published science results obtained using Swift
T0623, 25 res., 8.2Å to 6.3Å (excluding tail)
Protein loop modeling (initial, native, and predicted structures). Courtesy A. Adhikari
Example of when you need Swift: Process and analyze MODIS satellite land use datasets
Swift preprocessing & analysis script
Input: hundreds of land cover tile files (forest, ice, urban, etc.) & ROI
Output: regions with greatest concentration of specific land types
Apps: image processing, statistics, output rendering
(Figure: 300+ MODIS files in; largest zones of forest land cover in the analyzed region out.)
Dataflow and applications of MODIS pipeline
(Dataflow graph: getLandUse x 317 and colorMODIS x 317 feed analyzeLandUse, markMap, and assemble.)
MODIS script is automatically run in parallel: each loop level can process tens to thousands of image files.
Encapsulation of applications enables automatic distributed parallel execution
(Figure: a Swift app() function interface definition wraps an application program; typed Swift data objects correspond to the files expected or produced by the application program.)
Declaring applications as Swift functions enables the Swift runtime system to make parallel execution implicit and data distribution and fault retry automatic.
app( ) functions specify argument and data passing
(Figure: the getLandUse() app function maps the Swift variables img and sortby to the getlanduse application's -i and -s arguments, and the output o to its stdout.)
To run an application program: getlanduse infile sortfield > outfile
First define this app function:
app (file o) getLandUse (image img, int sortby)
{
  getlanduse "-i" @img "-s" sortby stdout=@o;
}
Then invoke the app function:
file mfile <"h22v19.tiff">;
file use <"h22v19.use">;
use = getLandUse(mfile, 1);   // 1 = sort by usage
MODIS script in Swift: main data flow

foreach g, i in geos {
  land[i] = getLandUse(g, 1);
}
(topSelected, selectedTiles) =
  analyzeLandUse(land, landType, nSelect);
foreach g, i in geos {
  colorImage[i] = colorMODIS(g);
}
gridMap = markMap(topSelected);
montage =
  assemble(selectedTiles, colorImage, webDir);
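This fragment assumes that the input array and sweep parameters were declared earlier in the script. A minimal sketch of such declarations, with hypothetical file names, mapper settings, and parameter values (not taken from the actual MODIS script):

file geos[] <filesys_mapper; location="modis/input", suffix=".tif">;   // the 300+ input tiles
file land[];                   // per-tile land-use summaries, auto-mapped to temporary files
file colorImage[];             // per-tile color renderings
string landType = "forest";    // land-cover class of interest
int nSelect = 10;              // number of top tiles to select
string webDir = "output/web";  // where assemble() places the montage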
Programming model: all execution driven by parallel data flow
– f() and g() are computed in parallel
– myproc() returns r when they are done
– This parallelism is automatic
– Works recursively throughout the program's call graph
(int r) myproc (int i)
{
  j = f(i);
  k = g(i);
  r = j + k;
}
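The same rule applies when the values are files produced by app() functions; a minimal sketch (the app functions and the wrapped programs fprog and gprog are hypothetical, not part of the talk's examples):

app (file o) f (file i) { fprog "-i" @i stdout=@o; }
app (file o) g (file i) { gprog "-i" @i stdout=@o; }

(file j, file k) myfileproc (file i)
{
  j = f(i);   // both app calls are released as soon as i exists,
  k = g(i);   // so they run in parallel on whatever resources are available
}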
Swift runs applications on distributed parallel systems

The Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments.

(Figure: a Swift script on the submit host – login node, laptop, or Linux server – drives application programs on clusters, grids, and clouds (Amazon EC2, XSEDE Wispy, FutureGrid, …), with files staged through a data server.)
Swift is a parallel scripting system for grids, clouds and clusters
– for loosely-coupled applications: application and utility programs linked by exchanging files
Swift is easy to write: a simple high-level C-like functional language
– Small Swift scripts can do large-scale work
Swift is easy to run: contains all services for running Grid workflow in one Java application
– Untar and run – acts as a self-contained Grid client
Swift is fast: uses an efficient, scalable and flexible distributed execution engine
– Scaling close to 1M tasks – 0.5M in live science work, and growing
Swift usage is growing:
– applications in earth systems sciences, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, and more
Use: Parallel analysis & visualization of high-res climate models
Work of S Mickelson, RL Jacob, K Schuchardt and the Pagoda Team, M Woitasek, J Dennis.
Standard CESM diagnostic scripts for atmosphere, ocean, and ice models have been converted to Swift, using NCO and NCL as the primary app tools
Transcription of the csh scripts yields automated parallel execution (a sketch of the pattern follows below)
Swift versions are migrating into the standard UCAR-distributed versions and are in increasingly general community use
Pagoda introduced to replace NCO processing with a tightly-coupled parallel algorithm for hybrid parallelism
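A minimal sketch of that transcription pattern – wrapping an NCL-based diagnostic plot as an app function and sweeping it over variables. The wrapper name, file names, and variable list here are hypothetical, not the actual AMWG package code:

app (file plot) amwgPlot (file climo, string var)
{
  amwg_plot_sh @climo var @plot;   // amwg_plot_sh stands in for the existing NCL/NCO diagnostic script
}

file climo <"case1.climo.nc">;
string vars[] = ["T", "U", "V", "PRECT"];
file plots[] <simple_mapper; prefix="case1_", suffix=".png">;

foreach var, i in vars {
  plots[i] = amwgPlot(climo, var);   // each diagnostic plot becomes an independent parallel task
}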
CESM Diagnostic Package Speedups
(Speedup charts, Swift vs. original: Fig 1: Atmosphere (AMWG) diagnostics; Fig 2: AMWG diagnostics with Pagoda; Fig 3: Ice diagnostics; Fig 4: Ocean (OMWG) diagnostics.)
Use: Impact of cloud processes on precipitation
Work of L Zamboni, RL Jacob, RV Kotamarthi, TJ Williams, IM Held, M Zhao, C Kerr, V Balaji, JD Neelin, JC McWilliams
Swift used to postprocess over 80 1TB high-res GFDL HiRAM model runs
Re-organization of data: combining model segments, annualization, and summarization/statistics.
Initial runs achieved 50 TB model data processing in less than 3 days on 32 8-core nodes of ALCF Eureka.
Encapsulated existing scripts created by GFDL for HiRAM
Only a few lines of Swift were needed to parallelize the HiRAM postprocessing scripts (a sketch of the pattern follows below)
Scripts adjusted to optimize I/O
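A hypothetical illustration of that pattern – the existing GFDL postprocessing script is wrapped as an app function and applied to every model segment in parallel. The names and directory layout are illustrative, not the actual HiRAM scripts:

app (file out) hiramPostproc (file segment)
{
  hiram_postproc_sh @segment @out;   // stands in for the existing GFDL postprocessing shell script
}

file segments[] <filesys_mapper; location="hiram/run01", suffix=".nc">;
file processed[] <simple_mapper; location="postproc/run01", suffix=".nc">;

foreach seg, i in segments {
  processed[i] = hiramPostproc(seg);   // all segments are postprocessed concurrently
}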
Use: land use and energy use models for CIM-EARTH and RDCEP projects
Below: CIM-EARTH energy-economics parameter sweeps of 5,000 AMPL models exploring uncertainty in consumer (top) and industrial (bottom) electricity usage projections by region for the next 5 decades
Left: 120,000 DSSAT models of corn yield are executed in parallel using Swift, per campaign. CENW is also supported as the framework is generalized. Work of J. Elliott, M. Glotter, D. Kelly, K. Maheshwari
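A hedged sketch of how such a sweep looks in Swift – the dssat wrapper, directory names, and campaign layout below are hypothetical, not the project's actual framework code:

app (file yield) dssat (file params)
{
  run_dssat @params @yield;   // run_dssat stands in for the DSSAT model wrapper
}

file params[] <filesys_mapper; location="campaign01/params", suffix=".in">;
file yields[] <simple_mapper; location="campaign01/results", suffix=".out">;

foreach p, i in params {
  yields[i] = dssat(p);   // the 120,000 model runs become independent parallel tasks
}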
USE: Hydrology assessment of Upper Mississippi River Basin
(Figure: model construction, calibration, and projection. Simulated processes: hydrologic cycle, nutrient cycle, crop growth, routing. Outputs – Hydrology: runoff, evapotranspiration, groundwater, soil moisture; Water quality: nutrients, erosion, pesticides; Crops: biomass, yield.)
Work of E. Yan, Y. Demisie, J. Margoliash, and other collaborators at Argonne
Calibration challenge:
– Four processes and 120 parameters
– Processes are not independent
– Long model run time (8 hours)
When do you need Swift? Typical application: protein-ligand docking for drug screening
Work of M. Kubal, T.A. Binkowski, and B. Roux
(Figure: O(10) proteins implicated in a disease x O(100K) drug candidates = 1M compute jobs, yielding tens of fruitful candidates for wet lab & APS.)
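A hedged sketch of how such a screen maps onto nested Swift loops. The dock app function, wrapped program, and file layout are hypothetical, not the actual docking codes used in this work:

app (file score) dock (file protein, file ligand)
{
  dock_prog @protein @ligand @score;   // dock_prog stands in for the docking program
}

file proteins[] <filesys_mapper; location="targets", suffix=".pdb">;
file ligands[]  <filesys_mapper; location="library", suffix=".mol2">;

foreach p, i in proteins {
  foreach l, j in ligands {
    file score <single_file_mapper; file=@strcat("scores/", i, "_", j, ".dat")>;
    score = dock(p, l);   // O(10) x O(100K) = ~1M independent docking tasks
  }
}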
app( ) functions specify cmd line argument passing
(Figure: the predict() app function maps the Swift variables seq, t, and dt to the PSim application's -s, -t, and -d arguments; the outputs pg and log correspond to the -pdb file and stdout.)
To run: psim -s 1ubq.fas -pdb p -t 100.0 -d 25.0 >log
In Swift code:
app (PDB pg, File log) predict (Protein seq, Float t, Float dt)
{
  psim "-c" "-s" @seq.fasta "-pdb" @pg "-t" t "-d" dt stdout=@log;
}
Protein p <ext; exec="Pmap", id="1ubq">;
PDB structure;
File log;
(structure, log) = predict(p, 100., 25.);
Large scale parallelization with simple loops

foreach sim in [1:1000] {
  (structure[sim], log[sim]) = predict(p, 100., 25.);
}
result = analyze(structure);

(Figure: 1000 runs of the "predict" application feed a single analyze() step; example targets T1af7, T1r69, T1b72.)
Nested parallel prediction loops in Swift
Sweep()
{
  int nSim = 1000;
  int maxRounds = 3;
  Protein pSet[] <ext; exec="Protein.map">;
  float startTemp[] = [ 100.0, 200.0 ];
  float delT[] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
  foreach p, pn in pSet {
    foreach t in startTemp {
      foreach d in delT {
        ItFix(p, nSim, maxRounds, t, d);
      }
    }
  }
}
Sweep();
10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 deltas = 300K tasks
Coasters: uniform resource provisioning
ModFTDock
string str_roots[] = readData( @arg("list") );
int n = @toint( @arg("n", "1") );

foreach str_root in str_roots {
  string str_file_static = @strcat( @arg("in", "input/"), str_root, ".pdb" );
  string str_file_mobile = "input/4TRA.pdb";

  file_pdb file_static <single_file_mapper; file=str_file_static>;
  file_pdb file_mobile <single_file_mapper; file=str_file_mobile>;
  file_dat dat_files[] <simple_mapper; padding=3,
                        location=@arg("out", "output"),
                        prefix=@strcat(str_root, "_"),
                        suffix=".dat">;

  foreach mod_index in [0:n-1] {
    string str_modulo = @strcat( mod_index, ":", n );
    dat_files[mod_index] = do_one_dock( str_root, str_modulo, file_static, file_mobile );
  }
}
Spatial normalization of functional MRI runs

(Figure: the dataset-level workflow – reorientRun, random_select, alignlinearRun, resliceRun, softmean, alignlinear, combinewarp, reslice_warpRun, strictmean, binarize, gsmoothRun – and the expanded 10-volume workflow it generates, with per-volume tasks such as reorient/01–57, reslice_warp/22–38, alignlinear/03–17, reslice/04–12, softmean/13, combinewarp/21, strictmean/39, binarize/40, and gsmooth/41–50.)
Complex scripts can be well-structured. Programming in the large: fMRI spatial normalization script example
(Run snr) functional( Run r, NormAnat a, Air shrink )
{
  Run yroRun = reorientRun( r, "y" );
  Run roRun = reorientRun( yroRun, "x" );
  Volume std = roRun[0];
  Run rndr = random_select( roRun, 0.1 );
  AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );
  Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );
  Volume meanRand = softmean( reslicedRndr, "y", "null" );
  Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );
  Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );
  Run nr = reslice_warp_run( boldNormWarp, roRun );
  Volume meanAll = strictmean( nr, "y", "null" );
  Volume boldMask = binarize( meanAll, "y" );
  snr = gsmoothRun( nr, boldMask, "6 6 6" );
}
(Run or) reorientRun( Run ir, string direction )
{
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient( iv, direction );
  }
}
Dataset mapping example: fMRI datasets
type Study {
  Group g[];
}
type Group {
  Subject s[];
}
type Subject {
  Volume anat;
  Run run[];
}
type Run {
  Volume v[];
}
type Volume {
  Image img;
  Header hdr;
}
(Figure: a mapping function or script connects the on-disk data layout to Swift's in-memory data model.)
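A hedged sketch of how such a mapping might be declared – the external mapper script name, its argument, and the directory are hypothetical, not the actual fMRI mapper used:

// fmri_mapper.sh would emit the path of each file in the study's directory tree
Study study <ext; exec="fmri_mapper.sh", location="/data/study01">;

// once mapped, members are addressed through the in-memory model, e.g.:
Volume firstAnat = study.g[0].s[0].anat;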
Simple campus Swift usage: running locally on a cluster login host
Swift can send work to multiple sites
(Figure: Swift on the cluster login host uses Condor or GRAM providers and remote data providers to reach a data server (GridFTP, scp, …) and one or more remote campus clusters, each with a local scheduler behind GRAM.)
Coasters: Flexible worker-node agents for execution and data transport
Main role is to efficiently run Swift tasks on allocated compute nodes, local and remote
Handles short tasks efficiently
Runs on:
– Clusters: PBS, SGE, SLURM, SSH, etc.
– Grid infrastructure: Condor, GRAM
– Cloud infrastructure: Amazon, university clouds, etc.
– HPC infrastructure: Blue Gene, Cray
Runs with few infrastructure dependencies
Can optionally perform data staging
Provisioning can be automatic or external (manually launched or externally scripted)
Coasters deployment can be automatic or manual

Swift can deploy coasters automatically, or the user can deploy manually or with a script. Swift provides scripts for clouds, ad-hoc clusters, etc.

(Figure: a Swift script in a centralized campus-cluster environment auto-boots a coaster service on a remote host via GRAM or SSH; the service starts coaster workers on compute nodes through the cluster scheduler, while a data provider stages files from a data server.)
Coaster architecture handles diverse environments
Coasters: uniform resource provisioning
Motivation: Queuing systems
(Figure: what we have (PBS, SGE) vs. what we would like.)
Coasters: uniform resource provisioning
Big Picture: Reusable service components
(Figure: application invocations, data, and dependencies connect through Coaster Service and Coaster Pilot Job interfaces to resources – CPUs, storage, services – on clusters, HPC systems, grids, and clouds.)
Coasters: uniform resource provisioning
Big Picture: No-hands operation
(Figure: the Coaster Client on the client machine bootstraps the Coaster Service code onto the head node via GRAM/SSH, using a cache of the service code.)
Coasters: uniform resource provisioning
Big Picture: Manage scientific applications
(Figure: a pipeline of application invocations – produce inputs, compute, analyze/reduce – is dispatched through the Coaster Service and Coaster Pilot Jobs onto cluster, HPC, grid, and cloud resources – CPUs, storage, services.)
Coasters: uniform resource provisioning
Motivation: File system use on Grids

Typical file path when using GRAM:
– For a stage-in, a file is read 3 times from disk, written 2 times to disk, and goes 4 times through some network
– Assumption: it is more efficient to copy data to compute-node local storage than to run a job directly on a shared FS

(Figure: stage-in/stage-out path from the submit site's FS server and disk, through the head node's FS server, to the compute node's disk.)
Coasters: uniform resource provisioning
Motivation: … can be made more efficient
– 2 disk reads, one write, and 2 times through the network
– Assumption: the compute node has no outside-world access; otherwise the head node can be bypassed

(Figure: with coasters, stage-in/stage-out flows from the submit site's disk through the head node directly to the compute node's disk.)
Coasters: uniform resource provisioning
Big Picture: Make these resources uniform
• Application-level
  • Task execution
  • Data access and transfer
• Infrastructure-level
  • Pilot job management
  • Configuration management
• Enable automation of configuration
• Enable high portability
Coasters: uniform resource provisioning
Big Picture: Create a live pathway between client side and compute nodes
(Figure: client-side applications, invocations, and data flow from the Coaster Client on the desktop to Coaster Services on a cluster, a grid site, and a cloud, each feeding Coaster Workers – CPU and storage – on compute nodes.)
Execution infrastructure - Coasters
Coasters: a high task rate execution provider
– Automatically deploys worker agents to resources with respect to user task queues and available resources
– Implements the Java CoG provider interfaces for compatibility with Swift and other software
– Currently runs on clusters, grids, and HPC systems
– Can move data along with task submission
– Contains a “block” abstraction to manage allocations containing large numbers of CPUs
Coasters: uniform resource provisioning
Infrastructure: Execution/data abstractions
We need to submit jobs and move data to/from various resources
Coasters is implemented to use the Java CoG Kit providers
Coasters implements the Java CoG abstractions
Java CoG supports: Local execution, PBS, SGE, SSH, Condor, GT, Cobalt
Thus, it runs on: Cray, Blue Gene, OSG, TeraGrid, clusters, Bionimbus, FutureGrid, EC2, etc.
Coasters can automatically allocate blocks of computation time for use in response to user load: “block queue processing”
Runs as a service or bundled with Swift execution
Coasters: uniform resource provisioning
Implementation: The job packing problem (I)
We need to pack small jobs into larger jobs (blocks)
Resources have restrictions:
– Limit on total number of LRM jobs
– Limit on job times
– Non-unity granularity (e.g. CPUs can only be allocated in multiples of 16)
We don’t generally know where empty spots in the scheduling are
We want to fit Coasters pilot jobs into the queue however we can
Coasters: uniform resource provisioning
Implementation: The job packing problem (II) (not to scale)
Sort incoming jobs based on walltime
Partition required space into blocks
(Figure: incoming jobs plotted by walltime and # CPUs, partitioned into blocks.)
Coasters: uniform resource provisioning
Implementation: The job packing problem (II) (also not to scale)
Commit jobs to blocks and adjust as necessary based on actual walltime
The actual packing problem is NP-complete
Solved using a greedy algorithm: always pick the largest job that will fit in a block first
(Figure: jobs committed into blocks over walltime and # CPUs, starting from "now".)
Coasters: uniform resource provisioning
Use on commercial clouds
Swift/Coasters is easy to deploy on Globus Provisioning (GP)
GP provides simple start-up scripts and several other features we may use in the future (NFS, VM image, CHEF)
Usage:
0. Authenticate to AWS
1. start-coaster-service (gp-instance-create, gp-instance-start)
2. swift myscript.swift … – repeat as necessary
3. stop-coaster-service (gp-instance-terminate)
(Figure: Swift on the submit site drives a Coaster Service and Coaster Pilot Jobs on GP+AWS instances, with data shared over NFS.)
NAMD - Replica Exchange Method
Original JETS use case – a sizeable batch of short parallel jobs with data exchange
Method extracts information about a complex molecular system through an ensemble of concurrent, parallel simulation tasks
Application parameters (approx.):
• 64 concurrent jobs x 256 cores per job = 16,384 cores
• 10-100 time steps per job = 10-60 seconds wall time
• Requires 6.4 MPI executions/sec → 1,638 processes/sec; over a 12-hour period = 70 million process starts
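A hedged reading of how these numbers combine (assuming ~10-second jobs, the low end of the stated range):

  64 concurrent jobs / 10 s per job  ≈ 6.4 MPI executions per second
  6.4 executions/s x 256 processes   ≈ 1,638 process starts per second
  1,638/s x 12 h x 3600 s/h          ≈ 7.1 x 10^7 ≈ 70 million process starts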
Performance challenges for large batches

For small application run times, the cost of application start-up, small I/O, library searches, etc. is expensive
Existing HPC schedulers do not support this mode of operation
– On the Blue Gene/P, job start takes 2-4 minutes
– On the Cray, aprun job start takes a full second or so
– Neither of these systems allows the user to make a fine-grained selection of cores from the allocation for small multicore/multinode jobs
Solution pursued by JETS:
– Allocate worker agents en masse
– Use a specialized user scheduler to rapidly submit user work to agents
– Support dynamic construction of multinode MPI applications
JETS: Features
Portable worker agents that run on compute nodes
– Provides scripts to launch agents on common systems
– Features provide convenient access to local storage such as the BG/P ZeptoOS RAM filesystem; storing the application binary, libraries, etc. there results in significant application start-time improvements
Central user scheduler to manage workers (stand-alone JETS or Coasters, discussed on following slides)
MPICH/Hydra modification to allow "launcher=manual": tasks launched by the user (instead of SSH or another method)
User scheduler plug-in to manage a local call to mpiexec
– Processes output from mpiexec over local IPC, launches resultant single tasks on workers
– Single tasks are able to find the mpiexec process and each other to start the user job (via Hydra proxy functionality)
– Can efficiently manage many running mpiexec processes
Execution infrastructure - JETS
Stand-alone JETS: a high task rate parallel-task launcher
– User deploys worker agents via customizable, provided submit scripts
– Currently runs on clusters, grids, and HPC systems
– Great over SSH
– Runs on the BG/P through ZeptoOS sockets – great for debugging, performance studies, ensembles
– Faster than Coasters but provides fewer features
– Input must be a flat list of command lines
– Limited data access features
NAMD/JETS - Parameters
NAMD REM-like case: Tasks average just over 100 seconds
ZeptoOS sockets on the BG/P: 90% efficiency for large messages, 50% efficiency for small messages
• Case provided by Wei Jiang
JETS - Task rates and utilization
Calibration: Sequential performance on synthetic jobs:
Utilization for REM-like case: not quite 90%
NAMD/JETS load levels
Allocation size: 512 nodes
Allocation size: 1024 nodes
Load dips occur during exchange & restart
JETS - Misc. results
Effective for short MPI jobs on clusters
Single-second duration jobs on Breadboard cluster
JETS can survive the loss of worker agents (BG/P)
JETS/Coasters
JETS features were integrated with Coasters/Swift for elegant description of workflows containing calls to MPI programs
The Coasters service manages workers as normal but was extended to aggregate worker cores for MPI execution
Re-uses features from stand-alone JETS
Enables dynamic allocation of MPI jobs; highly portable to BG/P, Cray, clusters, grids, clouds, etc.
Relatively efficient but slower than stand-alone mode
NAMD REM in Swift

Constructed SwiftScript to implement REM in NAMD
– Whole script is about 100 lines
– Intended to substitute for a multi-thousand-line Python script (that is incompatible with the BG/P)
Script core structures shown below
Represents REM data flow from the previous slide as Swift data items, statements, and loops
app (positions p_out, velocities v_out, energies e_out) namd (positions p_in, velocities v_in)
{
  namd @p_out @v_out @p_in @v_in stdout=@e_out;
}

positions p[] <array_mapper; files=p_strings>;
velocities v[] <array_mapper; files=v_strings>;
energies e[] <array_mapper; files=e_strings>;

// Initialize the first segment in each replica
foreach i in [0:replicas-1] {
  int index = i*exchanges;
  p[index] = initial_positions();
  v[index] = initial_velocities();
}

// Launch data-dependent NAMDs …
iterate j {
  foreach i in [0:replicas-1] {
    int current = i*exchanges + j+1;
    int previous = i*exchanges + j;
    (p[current], v[current], e[current]) = namd(p[previous], v[previous]);
  }
} until (j == exchanges);
ExM: Extreme-scale many-task computing
The JETS work focused on performance issues in parallel task distribution and launch
– Task generation (dataflow script execution) is still a single-node process
– To run large instances of the script, a distributed-memory dataflow system is required
Project goal: investigate many-task computing on exascale systems
– Ease of development – fast route to an exaflop application
– Investigate alternative programming models
– Highly usable programming model: natural concurrency, fault tolerance
– Support scientific use cases: batches, scripts, experiment suites, etc.
Build on and integrate previous successes
– ADLB: task distributor
– MosaStore: filesystem cache
– Swift language: natural concurrency, data specification, etc.
Task generation and scalability
In SwiftScript, all data items are futures
Progress is enabled when data items are closed (fully assigned), allowing dependent statements to execute (see the sketch below)
Not all variables, statements are known at program start
SwiftScript allows for complex data definitions, conditionals, loops, etc.
Current Swift implementation constrains the data dependency logic to a single node (as do other systems like Dryad, CIEL) - thus task generation rates are limited
ExM proposes a fully distributed, scalable task generator and dependency graph – built to express Swift semantics and more
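A minimal illustration of the futures rule (the functions f, g, and h here are hypothetical):

int a = f(1);      // a is a future; the call to f() becomes a task
int b = g(2);      // independent of a, so it may run concurrently with f()
int c = h(a, b);   // h() cannot run until both a and b are closed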
Performance target
Performance requirements for distributing the work of Swift-like task generation to an ADLB-like task distributor on an example exascale system:
– Need to utilize O(10^9) concurrency
– For a batch of 1000 tasks per core: 10 seconds per task, a roughly 1 hour 46 minute batch
– Tasks: O(10^12); tasks/s: O(10^8)
– Divide cores into workers and control cores: allocate 0.01% as control cores, O(10^5)
– Each control core must produce O(10^3) = 1000 tasks/second
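One hedged way to check the orders of magnitude:

  O(10^9) cores x 1000 tasks/core          ≈ O(10^12) tasks per batch
  O(10^12) tasks over a batch of ~10^4 s   ≈ O(10^8) tasks/s
  O(10^8) tasks/s / O(10^5) control cores  ≈ O(10^3) tasks/s per control core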
Move to distributed evaluation: Swift/T
What are the challenges in applying the many-task model at extreme scale?
(Figure: "have this" vs. "want this".)
Dealing with high task dispatch rates: depends on task granularity but is commonly an obstacle
– Load balancing
– Global and scoped access to script variables: increasing locality
Handling tasks that are themselves parallel programs
– We focus on OpenMP, MPI, and Global Arrays
Data management for intermediate results
– Object-based abstractions
– POSIX-based abstractions
– Interaction between data management and task placement
Parallel evaluation of Swift/T in ExM
– Big picture
– Software components
– Run-time configuration
– Fully parallel evaluation of complex scripts
int X = 100, Y = 100;
int A[][];
int B[];
foreach x in [0:X-1] {
  foreach y in [0:Y-1] {
    if (check(x, y)) {
      A[x][y] = g(f(x), f(y));
    } else {
      A[x][y] = 0;
    }
  }
  B[x] = sum(A[x]);
}
Swift/T: Basic scaling performance
Swift/T synthetic app: simple task loop
– Scales to the capacity of ADLB
SciSolSim: collaboration graph modeling
– ~400 lines of Swift
– C++ graph model; simulated annealing
– Scales well to 4K cores in initial tests (further tests awaiting app debugging)
Swift/T: PIPS scaling results
Scaling result for the original application: limited by available data from the PIPS team
Scaling result for the current application, with a parameter sweep over the rounding parameter (integer programming)
“One major advantage of Swift is that its high-level programming language considerably shortens coding times, hence it allows more effort to be dedicated to algorithmic developments. Remarkably, using a scripting language has virtually no impact on the solution times and scaling efficiency, as recent Swift/PIPS-L runs show.”
- Cosmin Petra
Swift/T: Use of low-level features
Variable-sized tasks produce trailing tasks: addressed by exposing ADLB task priorities at the language level
FOR MORE INFORMATION and TOOL DOWNLOADS:
To try Swift: http://www.ci.uchicago.edu/swift
ParVis project: http://trac.mcs.anl.gov/projects/parvis
CESM Diagnostic script info: http://www.cesm.ucar.edu
Pagoda project: https://svn.pnl.gov/gcrm/wiki/Pagoda
Extreme-scale Swift: http://www.mcs.anl.gov/exm
Parallel Computing, Sep 2011
Acknowledgments
Swift is supported in part by NSF grants OCI-1148443 and PHY-636265, NIH DC08638, and the UChicago SCI Program
Extreme scaling research on Swift (ExM project) is supported by the DOE Office of Science, ASCR Division
ParVis is sponsored by the DOE Office of Biological and Environmental Research, Office of Science
The Swift team:
– Mihael Hategan, Tim Armstrong, David Kelly, Ian Foster, Dan Katz, Mike Wilde, Zhao Zhang
Sincere thanks to our scientific collaborators whose usage of Swift was described and acknowledged in this talk