swift: a language for scalable scientific computing justin m. wozniak swift project: presenter...

65
Swift: A language for scalable scientific computing Justin M. Wozniak Swift project: www.ci.uchicago.edu/swift Presenter contact: [email protected]

Upload: chad-elijah-davidson

Post on 27-Dec-2015

233 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Swift: A language for scalable scientific computing

Justin M. Wozniak

Swift project: www.ci.uchicago.edu/swiftPresenter contact: [email protected]

Page 2: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Parallel scripting language for clusters, clouds & grids– For writing loosely-coupled scripts of application programs

and utilities linked by exchanging files– Can call scripts in shell, python, R, Octave, MATLAB, NCL, …

Swift does 3 important things for you:– Makes parallelism transparent – with functional dataflow– Makes computing location transparent – can run your script

on multiple distributed sites, diverse computing resources (from desktop to petascale)

– Makes basic failure recovery transparent

2www.ci.uchicago.edu/swift

Page 3: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Numerous many-task applications

Simulation of super-cooled glass materials

Protein folding using homology-free approaches

Climate model analysis and decision making in energy policy

Simulation of RNA-protein interaction

Multiscale subsurfaceflow modeling

Modeling of power gridfor OE applications

A-E have published science results obtained using Swift

T0623, 25 res., 8.2Å to 6.3Å (excluding tail)

Protein loop modeling. Courtesy A. Adhikari

Native Predicted

Initial

E

D

C

A B

A

B

C

D

E

F F

>

3

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 4: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Example of when you need Swift: Process and analyze MODIS satellite land use datasets

Swift preprocessing & analysis script

Input: hundreds of land cover tile files (forest, ice, urban, etc) & ROI Output: regions with greatest concentration of specific land types Apps: image processing, statistics, output rendering

4

Largest zones offorest land-cover

In analyzed region

300+ MODIS

Files

www.ci.uchicago.edu/swift

Page 5: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Dataflow and applications of MODIS pipeline

analyzeLandUse

colorMODIS

assemble

markMaplandUsex 317

analyzeLandUse

colorMODISx 317

getLandUsex 317

assemble

markMap

MODIS script is automatically run in parallel:

Each loop levelcan process tensto thousands ofimage files.

5www.ci.uchicago.edu/swift

Page 6: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Encapsulation of applications enables automatic distributed parallel execution

Swift app( ) functionInterface definition

Application program

Typed Swiftdata object

Files expectedor produced

by application program

Declaring applications as Swift functions enables the Swift runtime system to make parallel execution implicit and data distribution and fault retry automatic.

6www.ci.uchicago.edu/swift

Page 7: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

app( ) functions specify argument and data passing

Swift getLandUse()app function

img

sortby

o

getlanduse application-i -s

>

To run an application program: getlanduse infile sortfield > outfile

First define this app function:

app (file o) getLandUse (image img, int sortby){ getlanduse "-i" @img "-s" sortby

stdout=@o ;}

Then invoke the app function:

file mfile<"h22v19.tiff">; file use<"h22v19.use">; use = landUse(mfile, "landtype");

1 (=sort by usage)

7

H22v19.tif

H22v19.use

www.ci.uchicago.edu/swift

Page 8: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

MODIS script in Swift: main data flowforeach g,i in geos {

land[i] = getLandUse(g,1);

}

(topSelected, selectedTiles) =

analyzeLandUse(land, landType, nSelect);

foreach g, i in geos {

colorImage[i] = colorMODIS(g);

}

gridMap = markMap(topSelected);

montage =

assemble(selectedTiles,colorImage,webDir);

8www.ci.uchicago.edu/swift

Page 9: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

9

Programming model:all execution driven by parallel data flow

f() and g() are computed in parallel myproc() returns r when they are done

This parallelism is automatic Works recursively throughout the program’s call graph

(int r) myproc (int i){ j = f(i); k = g(i); r = j + k;}

www.ci.uchicago.edu/swift

Page 10: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Submit host (login node, laptop, Linux server)

Data server

Swiftscript

Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments

Swift runs applications on distributed parallel systems

Clouds:Amazon EC2, XSEDE Wispy,Future Grid …

ApplicationPrograms

10www.ci.uchicago.edu/swift

Page 11: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Swift is a parallel scripting system for grids, clouds and clusters– for loosely-coupled applications - application and utility programs linked by

exchanging files Swift is easy to write: simple high-level C-like functional language

– Small Swift scripts can do large-scale work Swift is easy to run: contains all services for running Grid workflow - in one

Java application– Untar and run – acts as a self-contained Grid client

Swift is fast: uses efficient, scalable and flexible distributed execution engine.– Scaling close to 1M tasks – .5M in live science work, and growing

Swift usage is growing:– applications in earth systems sciences, neuroscience, proteomics, molecular

dynamics, biochemistry, economics, statistics, and more.

11www.ci.uchicago.edu/swift

Page 12: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Use: Parallel analysis & visualization of high-res climate models

Work of S Mickelson, RL Jacob, K Schuchardt and the Pagoda Team,M Woitasek, J Dennis.

Standard CESM diagnostic scripts for atmosphere, ocean, and ice models have been converted to Swift

Using NCO and NCL as primary app tools Transcription of csh scripts yields

automated parallel execution. Swift versions are migrating into the

standard UCAR-distributed versions and are in increasingly general community use

Pagoda introduced to replace NCO processing with a tightly-coupled parallel algorithm for hybrid parallelism

www.ci.uchicago.edu/swift

Page 13: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

13

CESM Diagnostic Package Speedups

Fig 1: Atmosphere (AMWG) Diagnostics

SwiftOriginal

Fig 2: AMWG Diagnostics with Pagoda

Fig 3: Ice Diagnostics Fig 4: Ocean (OMWG) Diagnostics

www.ci.uchicago.edu/swift

Page 14: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Use: Impact of cloud processeson precipitation

Work L Zamboni, RL Jacob, RV Kotamarthi, TJ Williams, IM Held, M Zhao, C Kerr, V Balaji, JD Neelin, JC. McWilliams

Swift used to postprocess over 80 1TB high-res GFDL HiRAM model runs

Re-organization of data: combining model segments, annualization, and summarization/statistics.

Initial runs achieved 50 TB model data processing in less than 3 days on 32 8-core nodes of ALCF Eureka.

Encapsulated existing scripts created by GFDL for HiRAM

Only a few lines of Swift were needed to parallelize HiRAM postproc scripts

Scripts adjusted to optimize IO

14www.ci.uchicago.edu/swift

Page 15: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Use: land use and energy use models forCIM-EARTH and RDCEP projects

Below: CIM-EARTH energy-economics parameter sweeps of 5,000 AMPL models exploring uncertainty in consumer (top) and industrial (bottom) electricity usage projects by region for the next 5 decades

Left: 120,000 DSSAT models of corn yield are executed in parallel using Swift, per campaign. CENW is also supported as the framework is generalized.Work of J. Elliott, M. Glotter, D. Kelly, K. Maheshwari

www.ci.uchicago.edu/swift

Page 16: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

USE: Hydrology assessment of Upper Mississippi River Basin

Model Construction Model Calibration

Hydrologic Cycle

Nutrient Cycle

Crop Growth

Routing

Hydrology:• Runoff• Evapotranspiration• Groundwater• Soil moisture

Water Quality:• Nutrients• Erosion • Pesticides

Crops:• Biomass• Yield

Model Projection Work of E. Yan, Y. Demisie, J. Margoliash, and other collaborators at Argonne Calibration Challenge

– Four processes and 120 parameters– Process is not independent– Long model run time (8 hours)

www.ci.uchicago.edu/swift

Page 17: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

When do you need Swift?Typical application: protein-ligand docking for drug screening

Work of M. Kubal, T.A.Binkowski,And B. Roux

(B)

O(100K)drug

candidates

Tens of fruitfulcandidates for wetlab & APS

O(10)proteinsimplicatedin a disease

1Mcompute

jobs

X …

17

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 18: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

app( ) functions specify cmd line argument passing

Swift app function“predict()”

t

seq dtlog

PSim application-t -d-s-c

>pdb

pg

To run: psim -s 1ubq.fas -pdb p -t 100.0 -d 25.0 >log

In Swift code:

app (PDB pg, File log) predict (Protein seq, Float t, Float dt) { psim "-c" "-s" @pseq.fasta "-pdb" @pg "–t" temp "-d" dt; }

Protein p <ext; exec="Pmap", id="1ubq">; PDB structure; File log; (structure, log) = predict(p, 100., 25.);

Fastafile

100.0 25.0

18

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 19: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

foreach sim in [1:1000] { (structure[sim], log[sim]) = predict(p, 100., 25.);}result = analyze(structure)

…1000

Runs of the“predict”

application

Analyze()

T1af7

T1r69

T1b72

Large scale parallelization with simple loops

19

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 20: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Nested parallel prediction loops in Swift

20

1. Sweep( )2. {3. int nSim = 1000;4. int maxRounds = 3;5. Protein pSet[ ] <ext; exec="Protein.map">;6. float startTemp[ ] = [ 100.0, 200.0 ];7. float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];8. foreach p, pn in pSet {9. foreach t in startTemp {10. foreach d in delT {11. ItFix(p, nSim, maxRounds, t, d);12. }13. }14. }15. }16. Sweep();

10 proteins x 1000 simulations x3 rounds x 2 temps x 5 deltas

= 300K tasks www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 21: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

21

Coasters: uniform resource provisioning

ModFTDock

12/6/2011

string str_roots[] = readData( @arg( "list" ) );int n = @toint(@arg("n","1"));

foreach str_root in str_roots{ string str_file_static = @strcat( @arg("in", "input/"), str_root, ".pdb"); string str_file_mobile = "input/4TRA.pdb";

file_pdb file_static<single_file_mapper;file=str_file_static >; file_pdb file_mobile<single_file_mapper;file=str_file_mobile >; file_dat dat_files[]<simple_mapper; padding = 3, location=@arg("out", "output"), prefix=@strcat(str_root, "_"), suffix=".dat">;

foreach mod_index in [0:n-1] { string str_modulo = @strcat(mod_index, ":", modulus); dat_files[mod_index] = do_one_dock(str_root, str_modulo, file_static, file_mobile); }} 2dk1.pdb

Page 22: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Spatial normalization of functional run reorientRun

reorientRun

reslice_warpRun

random_select

alignlinearRun

resliceRun

softmean

alignlinear

combinewarp

strictmean

gsmoothRun

binarize

reorient/01

reorient/02

reslice_warp/22

alignlinear/03 alignlinear/07alignlinear/11

reorient/05

reorient/06

reslice_warp/23

reorient/09

reorient/10

reslice_warp/24

reorient/25

reorient/51

reslice_warp/26

reorient/27

reorient/52

reslice_warp/28

reorient/29

reorient/53

reslice_warp/30

reorient/31

reorient/54

reslice_warp/32

reorient/33

reorient/55

reslice_warp/34

reorient/35

reorient/56

reslice_warp/36

reorient/37

reorient/57

reslice_warp/38

reslice/04 reslice/08reslice/12

gsmooth/41

strictmean/39

gsmooth/42gsmooth/43gsmooth/44 gsmooth/45 gsmooth/46 gsmooth/47 gsmooth/48 gsmooth/49 gsmooth/50

softmean/13

alignlinear/17

combinewarp/21

binarize/40

reorient

reorient

alignlinear

reslice

softmean

alignlinear

combine_warp

reslice_warp

strictmean

binarize

gsmooth

Dataset-level workflow Expanded (10 volume) workflow22

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 23: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Complex scripts can be well-structuredprogramming in the large: fMRI spatial normalization script example

(Run snr) functional ( Run r, NormAnat a,

Air shrink )

{ Run yroRun = reorientRun( r , "y" );

Run roRun = reorientRun( yroRun , "x" );

Volume std = roRun[0];

Run rndr = random_select( roRun, 0.1 );

AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );

Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );

Volume meanRand = softmean( reslicedRndr, "y", "null" );

Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );

Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );

Run nr = reslice_warp_run( boldNormWarp, roRun );

Volume meanAll = strictmean( nr, "y", "null" )

Volume boldMask = binarize( meanAll, "y" );

snr = gsmoothRun( nr, boldMask, "6 6 6" );

}

(Run or) reorientRun ( Run ir, string direction){ foreach Volume iv, i in ir.v { or.v[i] = reorient(iv, direction); }}

23

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 24: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Dataset mapping example: fMRI datasets

type Study { Group g[ ];

}

type Group { Subject s[ ];

}

type Subject { Volume anat; Run run[ ];

}

type Run { Volume v[ ];

}

type Volume { Image img; Header hdr;

}

On-DiskDataLayout Swift’s

in-memory data model

Mapping functionor script

24

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 25: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Simple campus Swift usage: running locally on a cluster login host

Swift can send work to multiple sites

Data server(GridFTP, scp, …)

Any remote host

CondorOr GRAMProvider

RemoteData

Provider

Remote campus cluster

LocalScheduler

GRAM

Remote campus cluster

LocalScheduler

GRAM

. . .

25

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 26: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Coasters: Flexible worker-node agents for execution and data transport

Main role is to efficiently run Swift tasks on allocated compute nodes, local and remote

Handles short tasks efficiently Runs on:

– Clusters: PBS, SGE, SLURM, SSH, etc. – Grid infrastructure: Condor, GRAM– Cloud infrastructure: Amazon, University clouds, etc.– HPC infrastructure: Blue Gene, Cray

Runs with few infrastructure dependencies Can optionally perform data staging Provisioning can be automatic or external (manually

launched or externally scripted)26

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 27: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Centralized campuscluster environment

Any remote host

Swiftscript

Swift can deploy coasters automatically, or user can deploy manually or with a script. Swift provides scripts for clouds, ad-hoc clusters, etc.

Coasters deployment can be automatic or manual

Data server

CoasterAutoBoot

DataProvider

CoasterService

Computenodes

f1

a1

RemoteHost

GRAMor SSH

Coasterworker

Coasterworker

. . .

ClusterScheduler

27

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 28: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Coaster architecture handles diverse environments

28

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Page 29: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

29

Coasters: uniform resource provisioning

Motivation: Queuing systems

What we have (PBS, SGE)

What we would like

12/6/2011

Page 30: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

30

Coasters: uniform resource provisioning

Big Picture: Reusable service components

Application invocationsApplications Data Dependencies

ResourcesCPUs Storage Services

Cluster

HPC

Grid Cloud

Inte

rfac

es

Coaster Pilot Job

Coaster Service

12/6/2011

Page 31: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

31

Coasters: uniform resource provisioning

Big Picture: No hands operation

12/6/2011

Client

Coaster Client

Coaster Service Code

Head NodeGRAM/SSH

Bootstrap

Cache

Coaster Service

Page 32: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

32

Coasters: uniform resource provisioning

Big Picture: Manage scientific applications

Application invocations

Produce inputs

ResourcesCPUs Storage Services

Cluster

HPC

Grid Cloud

Inte

rfac

es

Coaster Pilot Job

Coaster Service

Produce inputs

Produce inputs

ComputeProduce inputs

Produce inputs

AnalyzeReduce

12/6/2011

Page 33: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

33

Coasters: uniform resource provisioning

DiskDisk

Motivation: File system use on Grids

Typical file path when using GRAM For a stage-in, a file is read 3 times from disk, written 2 times to disk, and

goes 4 times through some network Assumption: it’s more efficient to copy data to compute node local storage

than running a job directly on a shared FS.

12/6/2011

Stage-inStage-out

FS Server

Submit Site

Head Node

FS Server

Compute Node

Disk

Page 34: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

34

Coasters: uniform resource provisioning

Motivation: … can be made more efficient 2 disk reads, one write, and 2 times through network Assumption: compute node has no outside world access, otherwise the

head node can be bypassed

12/6/2011

Disk Disk

Submit Site

Head Node

Compute Node

Stage-inStage-out

Page 35: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

35

Coasters: uniform resource provisioning

Big Picture: Make these resources uniform

12/6/2011

• Application-level

• Task execution• Data access and transfer

• Infrastructure-level

• Pilot job management• Configuration management

• Enable automation of configuration

Enable high portability

Page 36: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

36

Coasters: uniform resource provisioning

Big Picture: Create a live pathway between client side and compute nodes

12/6/2011

ClientApplications Invocations Data

DesktopCoaster Service

Cluster Grid Site CloudCoaster Service Coaster Service Coaster Service

Coaster Client

Coaster Worker

CPU Storage

Compute Node

Coaster Worker

CPU Storage

Compute Node

Coaster Worker

CPU Storage

Compute Node

Coaster Worker

CPU Storage

Page 37: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

37

Execution infrastructure - Coasters

Coasters: a high task rate execution provider

– Automatically deploys worker agents to resources with respect to user task queues and available resources

12/6/2011

Coasters: uniform resource provisioning

– Implements the Java CoG provider interfaces for compatibility with Swift and other software

– Currently runs on clusters, grids, and HPC systems

– Can move data along with task submission

– Contains a “block” abstraction to manage allocations containing large numbers of CPUs

Page 38: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

38

Coasters: uniform resource provisioning

Infrastructure: Execution/data abstractions We need to submit jobs and move data to/from various resources

Coasters is implemented to use the Java CoG Kit providers

Coasters implements the Java CoG abstractions

Java CoG supports: Local execution, PBS, SGE, SSH, Condor, GT, Cobalt

Thus, it runs on: Cray, Blue Gene, OSG, TeraGrid, clusters, Bionimbus, FutureGrid, EC2, etc.

Coasters can automatically allocate blocks of computation time for use in response to user load: “block queue processing”

Runs as a service or bundled with Swift execution

12/6/2011

Page 39: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

39

Coasters: uniform resource provisioning

Implementation: The job packing problem (I) We need to pack small jobs into larger jobs (blocks)

Resources have restrictions:– Limit on total number of LRM jobs– Limit on job times– Non-unity granularity (e.g. CPUs can only be allocated in multiples of 16)

We don’t generally know where empty spots in the scheduling are

We want to fit Coasters pilot jobs into the queue however we can

12/6/2011

Page 40: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

40

Coasters: uniform resource provisioning

Implementation: The job packing problem (II)(not to scale) Sort incoming jobs based on walltime

Partition required space into blocks

12/6/2011

# CPUs

walltime

Page 41: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

41

Coasters: uniform resource provisioning

Implementation: The job packing problem (II)(also not to scale) Commit jobs to blocks and adjust as necessary based on actual walltime

The actual packing problem is NP-complete Solved using a greedy algorithm: always pick the largest job that will fit in

a block first

12/6/2011

# CPUs

walltime

now

… …

Page 42: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

42

Coasters: uniform resource provisioning

Use on commercial clouds

Swift/Coasters is easy to deploy on Globus Provisioning (GP)

GP provides simple start-up scripts and several other features we may use in the future (NFS, VM image, CHEF)

Usage:0. Authenticate to AWS1. start-coaster-service

gp-instance-create gp-instance-start

2. swift myscript.swift … Repeat as necessary

3. stop-coaster-service gp-instance-terminate

12/6/2011

Submit site

GP+AWS

Coaster Pilot Job

Coaster Service

Coaster Pilot Job

NFS

Swift

Page 43: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

43

NAMD - Replica Exchange Method

Original JETS use case- sizeable batch of short parallel jobs with data exchange

Method extracts information about a complex molecular system through an ensemble of concurrent, parallel simulation tasks

09/13/2011

JETS

Application parameters (approx.):

• 64 concurrent jobs x 256 cores per job = 16,384 cores

• 10-100 time steps per job = 10-60 seconds wall time

• Requires 6.4 MPI executions/sec. →1,638 processes/sec. over a 12-hour period = 70 million process starts

Page 44: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

44

JETS

Performance challenges for large batches For small application run times, the cost of application start-up, small I/O,

library searches, etc. is expensive

Existing HPC schedulers do not support this mode of operation– On the Blue Gene/P, job start takes 2-4 minutes– On the Cray, aprun job start takes a full second or so– Neither of these systems allow the user to make a fine-grained selection of

cores from the allocation for small multicore/multinode jobs

Solution pursued by JETS:– Allocate worker agents en masse– Use a specialized user scheduler to rapidly submit user work to agents– Support dynamic construction of multinode MPI applications

09/13/2011

Page 45: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

45

JETS

JETS: Features

Portable worker agents that run on compute nodes– Provides scripts to launch agents on common systems– Features provide convenient access to local storage such as BG/P ZeptoOS RAM filesystem.

Storing application binary, libraries, etc. here results in significant application start time improvements

Central user scheduler to manage workers: (Stand-alone JETS or Coasters discussed on following slides)

MPICH /Hydra modification to allow “launcher=manual”: tasks launched by the user (instead of SSH or other method)

User scheduler plug-in to manage a local call to mpiexec – Processes output from mpiexec over local IPC, launches resultant single tasks on workers– Single tasks are able to find the mpiexec process and each other to start the user job (via

Hydra proxy functionality)– Can efficiently manage many running mpiexec processes

09/13/2011

Page 46: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

46

Execution infrastructure - JETS

Stand-alone JETS: a high task rate parallel-task launcher

– User deploys worker agents via customizable, provided submit scripts

09/13/2011

JETS

– Currently runs on clusters, grids, and HPC systems

– Great over SSH– Runs on the BG/P through

ZeptoOS sockets- great for debugging, performance studies, ensembles

– Faster than Coasters but provides fewer features

– Input must be a flat list of command lines

– Limited data access features

Page 47: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

47

JETS

NAMD/JETS - Parameters

NAMD REM-like case: Tasks average just over 100 seconds

09/13/2011

ZeptoOS sockets on the BG/P 90% efficiency for large messages50% efficiency for small messages

• Case provided by Wei Jiang

Page 48: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

48

JETS

JETS - Task rates and utilization

Calibration: Sequential performance on synthetic jobs:

09/13/2011

Utilization for REM-like case: not quite 90%

Page 49: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

49

JETS

NAMD/JETS load levels

Allocation size: 512 nodes

09/13/2011

Allocation size: 1024 nodes

Load dips occur during exchange & restart

Page 50: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

50

JETS

JETS - Misc. results

Effective for short MPI jobs on clusters

Single-second duration jobs on Breadboard cluster

09/13/2011

JETS can survive the loss of worker agents (BG/P)

Page 51: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

51

JETS

JETS/Coasters

JETS features were integrated with Coasters/Swift for elegant description of workflows containing calls to MPI programs

09/13/2011

The Coasters service manages workers as normal but was extended to aggregate worker cores for MPI execution

Re-uses features from stand-alone JETS

Enables dynamic allocation of MPI jobs; highly portable to BG/P, Cray, clusters, grids, clouds, etc.

Relatively efficient but slower than stand-alone mode

Page 52: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

52

JETS

NAMD REM in Swift Constructed SwiftScript to

implement REM in NAMD– Whole script is about 100

lines– Intended to substitute for

multi-thousand line Python script (that is incompatible with the BG/P)

Script core structures shown to the right

Represents REM data flow from previous slide as Swift data items, statements, and loops

09/13/2011

app (positions p_out, velocities v_out, energies e_out)namd(positions p_in, velocities v_in){ namd @p_out @v_out @p_in @v_in stdout=@e_out;}

positions p[]<array_mapper;files=p_strings>;velocities v[]<array_mapper;files=v_strings>;energies e[]<array_mapper;files=e_strings>;

// Initialize first segment in each replicaforeach i in [0:replicas-1] { int index = i*exchanges; p[i] = initial_positions(); v[i] = initial_velocities();}

// Launch data-dependent NAMDs…iterate j { foreach i in [0:replicas-1] { int current = i*exchanges + j+1; int previous = i*exchanges + j; (p[current], v[current], e[current]) = namd(p[previous], v[previous]); }} until (j == exchanges);

Page 53: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

53

ExM: Extreme-scale many-task computing

09/13/2011

JETS

The JETS work focused on performance issues in parallel task distribution and launch– Task generation (dataflow script execution) is still a single node process– To run large instances of the script, a distributed memory dataflow system is

required

Project goal: Investigate many-task computing on exascale systems– Ease of development – fast route to exaflop application– Investigate alternative programming models– Highly usable programming model: natural concurrency, fault-tolerance– Support scientific use cases: batches, scripts, experiment suites, etc.

Build on and integrate previous successes– ADLB: Task distributor– MosaStore: Filesystem cache– Swift language: Natural concurrency, data specification, etc.

Page 54: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

54

Task generation and scalability

09/13/2011

JETS

In SwiftScript, all data items are futures

Progress is enabled when data items are closed, enabling dependent statements to execute

Not all variables, statements are known at program start

SwiftScript allows for complex data definitions, conditionals, loops, etc.

Current Swift implementation constrains the data dependency logic to a single node (as do other systems like Dryad, CIEL) - thus task generation rates are limited

ExM proposes a fully distributed, scalable task generator and dependency graph – built to express Swift semantics and more

Page 55: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

55

Performance target

09/13/2011

JETS

Need to utilize O(109) concurrency

For batch of 1000 tasks per core– 10 seconds per task– 1 hour, 46 minute batch

Tasks : O(1012)

Tasks/s: O(108)

Divide cores into workers and control cores– Allocate 0.01% as control cores, O(105)– Each control core must produce O(103) = 1000 tasks/second

Performance requirements for distributing the work of Swift-like task generation for an ADLB-like task distributor on an example exascale system:

Page 56: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Move to distributed evaluation: Swift/T

56

www.ci.uchicago.edu/swift www.mcs.anl.gov/exm

Have this: Want this:

Page 57: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

What are the challenges in applying themany-task model at extreme scale?

Dealing with high task dispatch rates: depends on task granularity but is commonly an obstacle– Load balancing– Global and scoped access to script variables: increasing locality

Handling tasks that are themselves parallel programs– We focus on OpenMP, MPI, and Global Arrays

Data management for intermediate results– Object based abstractions– POSIX-based abstractions– Interaction between data management and task placement

57

Page 58: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Parallel evaluation of Swift/T in ExM

58

Big picture Software components

Run time configuration

Page 59: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Fully parallel evaluation of complex scripts

59

int X = 100, Y = 100;int A[][];int B[];foreach x in [0:X-1] { foreach y in [0:Y-1] { if (check(x, y)) { A[x][y] = g(f(x), f(y)); } else { A[x][y] = 0; } } B[x] = sum(A[x]);}

Page 60: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Swift/T: Basic scaling performance

Swift/T synthetic app: simple task loop– Scales to capacity of

ADLB

SciSolSim: collaboration graph modeling– ~400 lines Swift– C++ graph model; simulated annealing– Scales well to 4K cores in initial tests

(further tests awaiting app debugging)

Page 61: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Swift/T: PIPS scaling results

Scaling result for original application:Limited by available data from PIPS team

Scaling result for current applicationwith parameter sweep over rounding parameter (integer programming)

61

“One major advantage of Swift is that its high-level programming language considerably shortens coding times, hence it allows more effort to be dedicated to algorithmic developments. Remarkably, using a scripting language has virtually no impact on the solution times and scaling efficiency, as recent Swift/PIPS-L runs show.”

- Cosmin Petra

Page 62: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Swift/T: Use of low-level features

Variable-sized tasks produce trailing tasks:addressed by exposing ADLB task priorities at language level

Page 63: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

FOR MORE INFORMATION and TOOL DOWNLOADS: To try Swift: http://www.ci.uchicago.edu/swift ParVisProject: http://trac.mcs.anl.gov/projects/parvis CESM Diagnostic script info at http://www.cesm.ucar.edu Pagoda project link: https://svn.pnl.gov/gcrm/wiki/Pagoda Extreme-scale Swift: http://www.mcs.anl.gov/exm

63www.ci.uchicago.edu/swift

Page 64: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

64Parallel Computing, Sep 2011www.ci.uchicago.edu/swift

Page 65: Swift: A language for scalable scientific computing Justin M. Wozniak Swift project:  Presenter contact: wozniak@mcs.anl.gov

Acknowledgments

Swift is supported in part by NSF grants OCI-1148443 and PHY-636265, NIH DC08638,and the UChicago SCI Program

Extreme scaling research on Swift (ExM project) is supported by the DOEOffice of Science, ASCR Division

ParVis is sponsored by the DOE Office of Biological and Environment Research, Office of Science

The Swift team:– Mihael Hategan, Tim Armstrong, David Kelly, Ian Foster, Dan Katz,

Mike Wilde, Zhao Zhang Sincere thanks to our scientific collaborators whose usage of Swift

was described and acknowledged in this talk

65www.ci.uchicago.edu/swift