blackpblackbblackdblackr: a sustainable path for …...ppppbbbbddddrrr: a sustainable path for...

39
pbdR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University of Tennessee Workshop on Distributed Computing in R, January 26-27, 2015 HP Labs, Palo Alto, CA pbdR Core Team pbdR: A Sustainable Path for Scalable Statistical Computing

Upload: others

Post on 22-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

ppppppbbbbbbddddddRRRRRR: A Sustainable Path for Scalable StatisticalComputing

George Ostrouchov

Oak Ridge National Laboratory and University of Tennessee

Workshop on Distributed Computing in R, January 26-27, 2015HP Labs, Palo Alto, CA

ppppppbbbbbbddddddRRRRRR Core Team ppppppbbbbbbddddddRRRRRR: A Sustainable Path for Scalable Statistical Computing

Page 2: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

The ppppppbbbbbbddddddRRRRRR Core Team

Wei-Chen Chen1

George Ostrouchov2,3

Pragneshkumar Patel3

Drew Schmidt3

SupportThis work used resources of National Institute for Computational Sciences at the University of Tennessee, Knoxville, which issupported by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No.ARRA-NSF-OCI-0906324 for NICS-RDAV center. This work also used resources of the Oak Ridge Leadership Computing Facilityat the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy underContract No. DE-AC05-00OR22725.

1FDAWashington, DC, USA

2Computer Science and Mathematics DivisionOak Ridge National Laboratory, Oak Ridge TN, USA

3Joint Institute for Computational SciencesUniversity of Tennessee, Knoxville TN, USA

ppppppbbbbbbddddddRRRRRR Core Team ppppppbbbbbbddddddRRRRRR: A Sustainable Path for Scalable Statistical Computing

Page 3: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R

1 Introduction to HPC and Its View from R

2 pbdR

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 4: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R

Contents

1 Introduction to HPC and Its View from RBatch and InteractiveAn R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and SoftwareBeyond “embarrassignly parallel”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 5: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Batch and Interactive

1 Introduction to HPC and Its View from RBatch and InteractiveAn R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and SoftwareBeyond “embarrassignly parallel”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 6: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Batch and Interactive

Data analysis is interactive!

Data reduction to knowledge

S (and R) interactive “answer” to batch data analysis

Iterative process with same data

Exploration, model constructionDiagnostics of fit and quantification of uncertaintyInterpretation

Efficient use of expensive people

Supercomputing is batch!

Traditionally data generation by simulation

Data processed mostly by visualization, often single pass

Efficient use of expensive platforms

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 1/26

Page 7: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Batch and Interactive

When in Rome, do as the Romans do! Batch!

Parallel visualization systems (VisIt and ParaView) are client-server(batch on server)

Current ppppppbbbbbbddddddRRRRRR packages address server side (batch)

Client under development

Bridge laptop to login node to resource manager to clusterSite configuration fileManage relationship of big data (server side) to little data (client side)

Need intuitive parallel data input

Batch and Interactive. Is there a difference?

High-level functional language: a function is a “batch” script

R “An interactive environment to use batch scripts”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 2/26

Page 8: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

1 Introduction to HPC and Its View from RBatch and InteractiveAn R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and SoftwareBeyond “embarrassignly parallel”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 9: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

Big Data and Little Data

HPC Cluster computer ParallelFile System

DiskStorageServers

Compute Nodes I/O Nodes

Login Nodes Your Laptop

Big

Dat

a

“Little Data”

SoftwareComponents

Reader or GeneratorAnalysis scriptDisplay/write results

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 3/26

Page 10: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

Hadoop: File System or HPC Cluster?

HADOOP Cluster Computer

Compute/IO/Storage Nodes and HD File System

Name Nodes, Job Trackers, Clients

Your LaptopB

ig D

ata

“Little Data”

SoftwareComponents

HDFS file systemYarn resource managerMap-reduce limitation

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 4/26

Page 11: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

Adding NVRAM to New HPC Systems

HPC Cluster computerwith NVRAM

ParallelFile System

DiskStorageServers

Compute Nodes I/O Nodes

Login Nodes Your Laptop

Big

Dat

a

“Little Data”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 5/26

Page 12: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

R Interfaces to Low-Level Native Tools

Interconnection Network

PROC+ cache

PROC+ cache

PROC+ cache

PROC+ cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE+ cache

CORE+ cache

CORE+ cache

CORE+ cache

Network

Shared Memory Local Memory

GPUor

MIC

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Default is parallel (SPMD): where is my data and who has what I need?

Default is serial: which tasks can the compiler make parallel?

Offload data and tasks. We are slow but many!

SocketsMPI

Hadoop

snowRmpi

pbdMPI

snow + multicore = parallel

CUDAOpenCL

OpenACC

OpenMPOpenACC

OpenMPPthreads

fork

multicore

RHadoop

Foreign Langauge

Interfaces:.C

.CallRcpp

OpenCLinline

.

.

.

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 6/26

Page 13: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

HPC Libraries: 30+ Years of Research and Development

Interconnection Network

PROC+ cache

PROC+ cache

PROC+ cache

PROC+ cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE+ cache

CORE+ cache

CORE+ cache

CORE+ cache

Network

Shared Memory Local Memory

GPUor

MIC

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Trilinos

PETSc

PLASMA

DPLASMAMKL (Intel)LibSci (Cray)

ScaLAPACKPBLASBLACS

cuBLAS (NVIDIA)

MAGMA

PAPI

Tau

MPI

mpiP

fpmpi

NetCDF4

ADIOS

ACML (AMD)

CombBLAS

cuSPARSE (NVIDIA)

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 7/26

Page 14: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R An R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and Software

R and ppppppbbbbbbddddddRRRRRR R Interfaces to HPC Libraries

Interconnection Network

PROC+ cache

PROC+ cache

PROC+ cache

PROC+ cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE+ cache

CORE+ cache

CORE+ cache

CORE+ cache

Network

Shared Memory Local Memory

GPUor

MIC

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Trilinos

PETSc

PLASMA

DPLASMAMKL (Intel)LibSci (Cray)

ScaLAPACKPBLASBLACS

cuBLAS (NVIDIA)

MAGMA

PAPI

Tau

MPI

mpiP

fpmpi

NetCDF4

ADIOS

magma

HiPLARHiPLARM

pbdMPI

pbdPAPI

pbdNCDF4

pbdADIOS

pbdPROFpbdPROFpbdPROF

ACML (AMD)

CombBLAS

cuSPARSE (NVIDIA)

pbdDMATpbdDMATpbdDMATpbdDMAT

pbdBASE pbdSLAP

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 8/26

Page 15: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Beyond “embarrassignly parallel”

1 Introduction to HPC and Its View from RBatch and InteractiveAn R and ppppppbbbbbbddddddRRRRRR View of Parallel Hardware and SoftwareBeyond “embarrassignly parallel”

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 16: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Beyond “embarrassignly parallel”

A brief personal history of parallel computing

1980

1990

2000

2010

Early distributed systems produced (iPSC, NCUBE)

Numerical linear algebra embraces parallel computing, some interest in statistics

MPI standard developed

First distributed libraries released (PBLAS, ScaLAPACK)

Simulation science is the driver of supercomputing

CPU clock speeds stagnate, multicore emerges

Everyone, including statistics and R, cares about parallel computing mostly from a multicore perspective

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 9/26

Page 17: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Beyond “embarrassignly parallel”

Early Work in Statistics: Crossproduct (Row-Block layout)

Ostrouchov (1987). Parallel Computing on a Hypercube: An overview of the architecture andsome applications. Proceedings of the 19th Symposium on the Interface of Computer Scienceand Statistics, p.27-32.

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 10/26

Page 18: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Beyond “embarrassignly parallel”

Distributed Numerical Linar Algebra

BLACS Library Communication Patterns (ScaLAPACK)

J. Dongarra and R. C. Whaley (1995). A Users Guide to the BLACS v1.1. UT-CS-95-281,March 1995.

Complex communications for block-distributed numerical linearalgebra computations.

Enables use of separate rowwise and columnwise communicators.

Optimizes collective operations of multiple communicators.

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 11/26

Page 19: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

Introduction to HPC and Its View from R Beyond “embarrassignly parallel”

PaRSEC: Parallel Runtime Scheduling and ExecutionController (DPLASMA)

Graphic from icl.cs.utk.edu

Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Herault, T., Dongarra, J. ”PaRSEC: Exploiting Heterogeneity to EnhanceScalability,” IEEE Computing in Science and Engineering, Vol. 15, No. 6, 36-45, November, 2013.

Master data flow controller runs distributed on all cores.

Dynamic generation of current level in flow graph

Effectively removes collective synchronizations

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 12/26

Page 20: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR

Contents

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 21: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR The pbdR Project

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 22: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR The pbdR Project

ppppppbbbbbbddddddRRRRRR Interfaces to Libraries: Sustainable Path

Interconnection Network

PROC+ cache

PROC+ cache

PROC+ cache

PROC+ cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE+ cache

CORE+ cache

CORE+ cache

CORE+ cache

Network

Shared Memory Local Memory

GPUor

MIC

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Trilinos

PETSc

PLASMA

DPLASMAMKL (Intel)LibSci (Cray)

ScaLAPACKPBLASBLACS

cuBLAS (NVIDIA)

MAGMA

PAPI

Tau

MPI

mpiP

fpmpi

NetCDF4

ADIOS

pbdMPI

pbdPAPI

pbdNCDF4

pbdADIOS

pbdPROFpbdPROFpbdPROF

ACML (AMD)

pbdDEMO

CombBLAS

cuSPARSE (NVIDIA)

pbdDMATpbdDMATpbdDMATpbdDMAT

pbdBASE pbdSLAP

Why use HPC libraries?

The HPC community is 30 years beyond “embarrassingly parallel.”

They’re tested. They’re fast. They’re scalable.

Many science communities are invested in their API.

You’re not going to beat Jack Dongarra’s lab at dense linear algebra!

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 13/26

Page 23: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdMPI

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 24: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdMPI

pbdMPI: Simplified, Extensible, and Fast Communication Operations

S4 methods for collective communication: extensible to other Robjects.

Default methods (like Robj in Rmpi) check for data type: safe forgeneral users.

API is simplified: defaults in control objects.

Array and matrix methods without serialization: faster than Rmpi.

pbdMPI (S4) Rmpiallgather mpi.allgather, mpi.allgatherv, mpi.allgather.Robjallreduce mpi.allreduce

bcast mpi.bcast, mpi.bcast.Robjgather mpi.gather, mpi.gatherv, mpi.gather.Robjrecv mpi.recv, mpi.recv.Robjreduce mpi.reduce

scatter mpi.scatter, mpi.scatterv, mpi.scatter.Robjsend mpi.send, mpi.send.Robj

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 14/26

Page 25: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdMPI

pbdMPI: Does the Right Thing for You

Rmpi

1 # int

2 mpi.allreduce(x, type =1)

3 # double

4 mpi.allreduce(x, type =2)

pbdMPI

1 allreduce(x)

Types in R

1 > is.integer (1)

2 [1] FALSE

3 > is.integer (2)

4 [1] FALSE

5 > is.integer (1:2)

6 [1] TRUE

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 15/26

Page 26: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdDMAT

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 27: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdDMAT

Mapping a Matrix to Processors

Processor Grid Shapes

[0 1 2 3 4 5

]

(a) 1 × 6

[0 1 23 4 5

](b) 2 × 3

0 12 34 5

(c) 3 × 2

012345

(d) 6 × 1

Table: Processor Grid Shapes with 6 Processors

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 16/26

Page 28: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdDMAT

2×3 block-cyclic grid on 6 processors: Global view “ddmatrix” class

x =

x11 x12 x13 x14 x15 x16 x17 x18 x19

x21 x22 x23 x24 x25 x26 x27 x28 x29

x31 x32 x33 x34 x35 x36 x37 x38 x39

x41 x42 x43 x44 x45 x46 x47 x48 x49

x51 x52 x53 x54 x55 x56 x57 x58 x59

x61 x62 x63 x64 x65 x66 x67 x68 x69

x71 x72 x73 x74 x75 x76 x77 x78 x79

x81 x82 x83 x84 x85 x86 x87 x88 x89

x91 x92 x93 x94 x95 x96 x97 x98 x99

9×9

Processor grid =

∣∣∣∣ 0 1 23 4 5

∣∣∣∣ =

∣∣∣∣ (0,0) (0,1) (0,2)(1,0) (1,1) (1,2)

∣∣∣∣

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 17/26

Page 29: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdDMAT

2×3 block-cyclic grid on 6 processors: Local view “ddmatrix” class

x11 x12 x17 x18

x21 x22 x27 x28

x51 x52 x57 x58

x61 x62 x67 x68

x91 x92 x97 x98

5×4

x13 x14 x19

x23 x24 x29

x53 x54 x59

x63 x64 x69

x93 x94 x99

5×3

x15 x16

x25 x26

x55 x56

x65 x66

x95 x96

5×2

x31 x32 x37 x38

x41 x42 x47 x48

x71 x72 x77 x78

x81 x82 x87 x88

4×4

x33 x34 x39

x43 x44 x49

x73 x74 x79

x83 x84 x89

4×3

x35 x36

x45 x46

x75 x76

x85 x86

4×2

Processor grid =

∣∣∣∣ 0 1 23 4 5

∣∣∣∣ =

∣∣∣∣ (0,0) (0,1) (0,2)(1,0) (1,1) (1,2)

∣∣∣∣ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 18/26

Page 30: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR pbdDMAT

ppppppbbbbbbddddddRRRRRR Example Syntax

1 x <- x[-1, 2:5]

2 x <- log(abs(x) + 1)

3 x.pca <- prcomp(x)

4 xtx <- t(x) %*% x

5 ans <- svd(solve(xtx))

The above (and over 100 other functions) runs on 1 core with Ror 10,000 cores with ppppppbbbbbbddddddRRRRRR ddmatrix class

1 > showClass("ddmatrix")

2 Class "ddmatrix" [package "pbdDMAT"]

3 Slots:

4 Name: Data dim ldim bldim ICTXT

5 Class: matrix numeric numeric numeric numeric

1 > x <- as.rowblock(x)

2 > x <- as.colblock(x)

3 > x <- redistribute(x, bldim=c(8, 8), ICTXT = 0)

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 19/26

Page 31: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 32: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

Default Choices Throughout

Only free software used (no MKL, ACML, etc.)

Generate random matrix

Global Columns: 500, 1000, and 2000Global Rows: vary with core count (local matrix fixed at 43.4MiB)

1 core = 1 MPI process

Measure wall clock time

“weak scaling” = global problem grows with core count

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 20/26

Page 33: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

Covariance Benchmark (pbdDMAT)

1 x <- ddmatrix("rnorm", nrow=n, ncol=p, mean=mean , sd=sd)

2 cov.x <- cov(x)

21.34

42.6885.35

170.7 341.41

21.34 42.68 85.35 170.7 341.41

21.3442.68

85.35 170.7

341.41

7

8

9

10

11

504 1008 2016 4032 8064504 1008 2016 4032 8064504 1008 2016 4032 8064Cores

Run

Tim

e (S

econ

ds)

Predictors 500 1000 2000

Calculating cov(x) With Fixed Local Size of ~43.4 MiB

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 21/26

Page 34: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

Linear Regression Benchmark (pbdDMAT)

1 beta.true <- ddmatrix("runif", nrow=p, ncol =1)

2 y <- x %*% beta.true

3 beta.est <- lm.fit(x, y)$coefficients

21.34

42.6885.35170.7 341.41

1016.09

21.3442.6885.35 170.7

341.411016.09

21.3442.68

85.35

170.7341.41

1016.09

25

50

75

100

125

504

1008

2016

4032

8064

2400

050

410

0820

1640

3280

64

2400

050

410

0820

1640

3280

64

2400

0

Cores

Run

Tim

e (S

econ

ds)

Predictors 500 1000 2000

Fitting y~x With Fixed Local Size of ~43.4 MiB

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 22/26

Page 35: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

Data Generation (pbdDMAT)

0

200

400

600

800

1 504 1008 2016 4032 8064 24000 48000Cores

Dat

a G

ener

atio

n (G

iB p

er S

econ

d)

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 23/26

Page 36: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Benchmarks

Matrix Exponentiation (pbdDMAT)

Fitting biogeography models requires manymatrix exponentiations

Benchmark: Matrix exponential of5000×5000 matrix.

R 3.1.0, Matrix 1.1-2, rexpokit 0.25,pbdDMAT 0.3-0

Libs: Cray LibSci, NETLIB ScaLAPACK,Compilers: gnu 4.8.2

Configuration: 1 thread == 1 MPI rank ==1 physical core

Schmidt and Matzke (2014) Distributed matrix exponentiation, The R User Conference (UseR! 2014),

Los Angeles, CA, August 2014 .ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 24/26

Page 37: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Future Work

2 pbdRThe pbdR ProjectpbdMPIpbdDMATBenchmarksFuture Work

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing

Page 38: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Future Work

Future Work

Started a 3 year NSF grant to

Bring back interactivity via client/serverSimplify parallel data inputBegin DPLASMA integrationOutreach to the statistics community

DOE funding: In-situ or staging use with simulations

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 25/26

Page 39: blackpblackbblackdblackR: A Sustainable Path for …...ppppbbbbddddRRR: A Sustainable Path for Scalable Statistical Computing George Ostrouchov Oak Ridge National Laboratory and University

pbdR Future Work

Where to learn more?

http://r-pbd.org/

pbdDEMO vignette

Googlegroup:RBigDataProgramming

ppppppbbbbbbddddddRRRRRR Core Team Sustainable Scalable Statistical Computing 26/26